The HTCondor Team at the University of Wisconsin-Madison has released two new versions of its workload management system HTCondor: version 8.6.9 in the stable branch and version 8.7.6 in the development branch. HTCondor focuses on managing compute-intensive jobs and can distribute them across the nodes attached to it. A user submits a job to HTCondor, which then handles it according to the configured policies and the availability of attached resources, and finally returns the results to the user. HTCondor can, for example, drive a dedicated Beowulf cluster, but also ordinary desktops that happen to be idle. At SC16, Google, Fermilab and the HTCondor Team demonstrated a 160k-core cloud-based elastic compute cluster. The list of changes in these releases is as follows:
- Changed the default value of configuration parameter IS_OWNER to False. The previous default value is now set as part of the use POLICY : Desktop configuration template. (Ticket #6463).
- You may now use SCHEDD and JOB instead of MY and TARGET in SUBMIT_REQUIREMENTS expressions. (Ticket #4818).
- Added cmake build option WANT_PYTHON_WHEELS and make target pypi_staging to build the framework for Python wheels. This option and target are not enabled by default and are not likely to work outside of Linux environments with a single Python installation. (Ticket #6486).
- Added new job attributes BatchProject and BatchRuntime for grid-type batch jobs. They specify the project/allocation name and maximum runtime in seconds for the job that's submitted to the underlying batch system. (Ticket #6451).
- HTCondor now respects ATTR_JOB_SUCCESS_EXIT_CODE when sending job notifications. (Ticket #6432).
- Added some graph metrics (height, width, etc.) to DAGMan's metrics file output. (Ticket #6470).
- Removed Quill from HTCondor codebase. (Ticket #6496).
- HTCondor now reports all submit warnings, not just the first one. (Ticket #6446).
- The job log will no longer contain empty submit warnings. (Ticket #6465).
- DAGMan previously connected to condor_schedd every time it detected an update in its internal state. This is too aggressive for rapidly changing DAGs, so we've changed the connection to happen in time intervals defined by DAGMAN_QUEUE_UPDATE_INTERVAL, by default once every five minutes. (Ticket #6464).
- DAGMan now enforces the DAGMAN_MAX_JOB_HOLDS limit by the number of held jobs in a cluster at the same time. Previously it counted all holds over the lifetime of a cluster, even if only a small number of them were active at the same time. (Ticket #6492).
- Fixed a bug where on rare occasions the ShadowLog would become owned by root. (Ticket #6485).
- Fixed a bug where using condor_qedit to change any of the concurrency limits of a job would have no effect. (Ticket #6448).
- When copy_to_spool is set to True, condor_submit now attempts to transfer the job executable only once per job cluster, instead of once per job. (Ticket #6459).
- Fixed a bug that could result in an incorrect total reported by condor_rm when the -totals option is used. (Ticket #6450).
- When a daemon crashes, more information about the cause is now written to its log file. (Ticket #6483).
- Fixed a bug in the group quotas that would give too much surplus quota to some groups when ACCEPT_SURPLUS is on and NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION is true (the default). (Ticket #6514).
- Fixed a bug in the Python bindings when doing queries that specify a projection with the "attr_list" argument. The bug could potentially result in memory corruption of the Python interpreter process. (Ticket #6468).
- Reduced the amount of time that condor_preen will block the condor_schedd. condor_preen now connects only when specifically needed, and automatically disconnects after PREEN_MAX_SCHEDD_CONNECTION_TIME seconds. (Ticket #6490).
- Fixed a bug on Windows that would often result in the job sandbox on the execute node not being deleted when the condor_schedd relinquished its claim on the slot before the condor_starter had exited. (Ticket #6497).
- Fixed a bug where the condor_master stopped sending watchdog notifications to systemd after restarting itself. This resulted in systemd killing the condor_master shortly after the restart. (Ticket #6476).
- Updated the systemd configuration to only restart HTCondor upon failure. Otherwise, systemd would restart HTCondor if condor_off requested the condor_master to exit. (Ticket #6503).
- Fixed a bug with the use of the scheduler parameter MAX_JOBS_SUBMITTED. If this limit was ever reached by a submit with more than one proc in the cluster, the limit would be reduced by the difference until the condor_schedd was restarted. (Ticket #6460).
- Fixed a bug that caused very large RequestDisk requests to fail and caused the Disk attribute in the machine ad to go negative. (Ticket #6467).
- Fixed a bug with the RESERVED_DISK parameter that would not accept an argument larger than 2 Gigabytes. (Ticket #6472).
- Improved validation of the lengths of messages in PASSWORD and SSL authentication methods. (Ticket #6493).
- Fixed a problem where the VM universe would be taken offline on the execute node, if the qcow2 disk image was corrupt. The offending job is now put on hold with an appropriate hold message. (Ticket #6505).
- Fixed a problem which would prevent Java universe jobs from working when using a relative path name to a jar file and submitting from Linux to Windows or vice versa. (Ticket #6474).
- Fixed a bug on 32 bit Linux systems that caused the starter to crash on startup if cgroup limits were enabled. (Ticket #6501).
- Fixed a bug in Startd Cron (see 4.4.3) where, in effect, SlotMergeConstraint was ignored. (Ticket #6488).
- Fixed a bug when IPv6 is enabled which could cause the condor_startd to crash when spawning a starter. (Ticket #6462).
- Fixed a bug in condor_q which could cause the DONE amount to be incorrect when multiple clusters shared a batch name. (Ticket #6469).
- Fixed an issue on newer versions of Linux where core files generated by a daemon were not usable by gdb. A side effect of this fix is that the configuration parameter CORE_FILE_NAME no longer has any effect on Linux. (Ticket #6482).
- condor_chirp no longer aborts when given a command that it cannot successfully execute, such as fetching a file that does not exist. (Ticket #6402).
- Removed unneeded copy_to_spool statement from default interactive submit file. (Ticket #6315).
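
The DAGMAN_MAX_JOB_HOLDS change listed above (Ticket #6492) comes down to two different ways of counting holds. The Python sketch below is only an illustration of that difference, not DAGMan's actual code: the event tuples, the function names and the sample data are all hypothetical and chosen for the example.

```python
from typing import Iterable, Tuple

# Hypothetical event format (an assumption for this sketch, not an
# HTCondor data structure): ("hold", proc_id) or ("release", proc_id).
Event = Tuple[str, int]


def lifetime_holds(events: Iterable[Event]) -> int:
    """Previous behaviour as described: count every hold ever seen for the cluster."""
    return sum(1 for kind, _ in events if kind == "hold")


def peak_concurrent_holds(events: Iterable[Event]) -> int:
    """New behaviour as described: count jobs that are held at the same time."""
    held = set()   # proc ids currently on hold
    peak = 0
    for kind, proc in events:
        if kind == "hold":
            held.add(proc)
        elif kind == "release":
            held.discard(proc)
        peak = max(peak, len(held))
    return peak


# Made-up history for one cluster: procs 0 and 1 are each held and released
# once, then procs 0 and 2 go on hold together.
events = [
    ("hold", 0), ("release", 0),
    ("hold", 1), ("release", 1),
    ("hold", 0), ("hold", 2),
]

print(lifetime_holds(events))         # holds over the cluster's lifetime
print(peak_concurrent_holds(events))  # most jobs held simultaneously
```

With this sample history the lifetime count is twice the concurrent count, so a limit of, say, 3 would be exceeded under the old counting but not under the new one.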