The Condor Team at the University of Wisconsin-Madison has released a new stable version of its workload management system, Condor. The version number is now 7.2.4, and the package is released under the Apache 2.0 license. Condor focuses on managing compute-intensive jobs and can distribute them across multiple connected nodes. The user submits a job to Condor, which then handles it according to the configured policies and the availability of the connected resources, and finally returns the results to the user. Condor can, for example, drive a dedicated Beowulf cluster, but ordinary desktops that are normally used by people can also be put to work while they are idle. When a user returns to their desktop, the current job is automatically passed on to another node. The announcement, together with the list of changes, reads as follows:
Condor 7.2.4 released!
The Condor Team is pleased to announce the release of Condor 7.2.4. This is a bugfix release to the stable series of Condor.
- Fixed a bug in the checkpoint server that caused failure of checkpoint image storage and retrieval if the requesting submission machine was running a 32-bit installation of Condor and the checkpoint server was from a 64-bit installation, or vice versa. The checkpoint image server, in both the 32-bit and 64-bit installations, now handles both protocols. It is recommended that any checkpoint server installation which may be used in a flocking situation or other federated joining of pools use the 64-bit binary, because a checkpoint image could be larger than what is representable in 32 bits. A 32-bit checkpoint image server will now notice if this situation occurs and log a message suggesting an upgrade to the 64-bit version.
- Fixed a bug that caused condor_procd to sometimes fail when monitoring processes with environments larger than 1MB.
- Fixed a bug that caused condor_dagman to fail in recovery mode on a DAG in which any nodes had been retried.
- Xen-based virtual machines now have the correct amount of memory. Previously, the amount of memory was too small by a factor of 1024.
- Fixed a bug in the handling of $$(VARIABLE) submit file expressions.
- Fixed a bug in the code related to USE_VISIBLE_DESKTOP that was causing the windows created by the job to behave incorrectly.
- Fixed a bug that caused Stork to treat successful file transfers as failed.
- Fixed several bugs in the user log reader in the handling of files of size zero.
- Fixed a problem affecting parallel universe jobs with very short tasks. If any of the parallel tasks exited before the first node started, the entire job was prematurely treated as though it had finished. If the job ClassAd attribute ParallelShutdownPolicy was set to "WAIT_FOR_ALL", the job was prematurely treated as though it had finished if all stated tasks completed before the remaining tasks started.
Additions and Changes to the Manual:
- Occasionally, Condor daemons will, for unknown reasons, bind the command socket to the invalid IP address 0.0.0.0, resulting in the daemon crashing or otherwise malfunctioning. The command socket address is always logged in the daemon's log, so the condition can be detected by looking in the daemon's log header for a line with the address 0.0.0.0, as in:
5/6 16:34:26 DaemonCore: Command Socket at <0.0.0.0:53795>
If you encounter this problem, please send an e-mail to email@example.com with any relevant details.
- Descriptions and definitions of all commands that may be placed within the submit description file have been moved from the condor_submit manual page to section 2.5.1.
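The 0.0.0.0 command socket condition described above can be checked for mechanically. The following is a minimal sketch, not part of Condor: it assumes the log line has exactly the shape quoted in the release notes, and the function name `find_invalid_command_sockets` is made up for this illustration.

```python
import re

# Matches the DaemonCore header line quoted in the release notes and captures
# the address and port found inside the angle brackets.
COMMAND_SOCKET_RE = re.compile(r"DaemonCore: Command Socket at <([0-9.]+):(\d+)>")


def find_invalid_command_sockets(lines):
    """Return (line_number, line) pairs whose command socket is bound to 0.0.0.0.

    Hypothetical helper for illustration; `lines` is any iterable of log lines,
    for example an open file object.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = COMMAND_SOCKET_RE.search(line)
        if match and match.group(1) == "0.0.0.0":
            hits.append((number, line.rstrip("\n")))
    return hits


# In-memory sample; the second address is an invented, valid-looking one.
sample_log = [
    "5/6 16:34:26 DaemonCore: Command Socket at <0.0.0.0:53795>\n",
    "5/6 16:34:26 DaemonCore: Command Socket at <192.168.1.5:53795>\n",
]
for number, line in find_invalid_command_sockets(sample_log):
    print(f"line {number}: {line}")
```

To scan a real daemon log, pass an open file object instead of `sample_log`. The location of the log file depends on the local Condor configuration.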
Condor 7.2.3 released!
The Condor Team is pleased to announce the release of Condor 7.2.3. This release adds standard universe support for Debian 5.0 on x86_64. It also fixes a memory leak in the collector that has affected many users. Other bug fixes include file descriptor leaks in the schedd, encryption working with parallel jobs, and remote attributes in Condor-C job ads.
Configuration Variable Additions and Changes:
- Enhanced the Debian 5.0 Condor port on the x86_64 platform to include support for the standard universe.
- The new integer configuration variable SEC_TCP_SESSION_DEADLINE specifies the number of seconds after which the client should give up its attempt to establish a security session with a daemon that it is connecting to. The default value is 120 seconds.
- The new configuration variables SCHEDD_CLUSTER_INITIAL_VALUE and SCHEDD_CLUSTER_INCREMENT_VALUE are integers that specify the cluster number to use for the first job submission, and the stride used to increment the cluster id upon successive submissions. See section 3.3.11 for the complete definitions of these variables.
- Fixed a memory leak in the condor_collector daemon. The growth in memory over time was approximately 10Mbytes per day per 1000 slots. This bug was introduced in Condor version 7.2.0.
- Fixed a problem that caused integrity checking of most UDP packets longer than about 40Kbytes to fail. This bug affected all previous versions of Condor.
- By adding the new configuration variable SEC_TCP_SESSION_DEADLINE, fixed a problem that has existed since Condor version 7.1.2. The problem was that non-blocking read operations in the security handshake had no timeout, and could therefore lead to a socket remaining allocated indefinitely, if the other side of the connection did not respond. When this problem was observed, the following message appeared in the log written by the condor_schedd daemon:
file descriptor safety level exceeded
- Fixed a rarely observed bug in the event log reader code that could cause it to not detect missed events.
- A bug in the Chirp java client has been fixed. The ChirpInputStream's read() method was returning negative values when encountering binary data.
- condor_dagman now rejects negative node retry values.
- condor_dagman no longer generates a rescue DAG if the DAG is aborted but considered successful; this is the case when ABORT-DAG-ON returns the value 0.
- The user log event numbered 27, named "Job submitted to grid resource", is now written for all grid universe jobs. Previously, it was not written for pbs, lsf, nordugrid, or unicore grid types.
- Fixed a bug where a Condor-C job with both remote_ and remote_remote_ attributes would not have a remote_ attribute when submitted to the remote condor_schedd daemon.
- Fixed a bug in condor_configure and condor_install that would leave the configuration variable CONDOR_HOST unset when configuring a central manager without using the -central-manager command-line argument.
- Fixed a bug that could cause the condor_schedd daemon to leak memory and file descriptors when using the EVENT_LOG configuration variable.
- Fixed a bug in the condor_gridmanager that could cause it to not send a clean up signal to the GRAM jobmanager for removed gt2 jobs.
- Fixed a bug that caused parallel jobs to not work when encryption was enabled.
- Fixed a bug in the Windows installer that caused it to fail to start Condor.
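As an illustration of the SCHEDD_CLUSTER_INITIAL_VALUE and SCHEDD_CLUSTER_INCREMENT_VALUE variables introduced in 7.2.3, the sketch below models the cluster id sequence as the notes describe it: a starting number plus a fixed stride per submission. It is a model of the arithmetic only, not Condor code. The function name `cluster_ids`, the default values of 1, and the rejection of strides below 1 are assumptions made for this example.

```python
from itertools import count, islice


def cluster_ids(initial_value=1, increment_value=1):
    """Return an iterator over cluster ids: initial, initial + stride, ...

    Hypothetical model of the two configuration variables; the defaults of 1
    are an assumption of this sketch, not taken from the release notes.
    """
    if increment_value < 1:
        # Guard chosen for this sketch; not documented Condor behavior.
        raise ValueError("increment_value must be a positive integer")
    return count(initial_value, increment_value)


# Cluster ids of the first five submissions with an initial value of 100
# and a stride of 10.
first_five = list(islice(cluster_ids(initial_value=100, increment_value=10), 5))
print(first_five)
```

One consequence of this arithmetic: two sequences that share the same stride but start at different offsets within it (for example initial values 1 and 2 with a stride of 2) never produce the same cluster id.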