Yet another bug appears to have been found in the Distributed.net client: the OGR-24 client computes different results depending on the platform. That is obviously not the intention, so D.net has decided to restart the entire OGR-24 contest from scratch:
Most of you should know by now that for the past week, we have been working on the OGR-24 project, the search for the 24 mark Optimal Golomb Ruler. So far, we have completed approximately 24% of the total stubs, mostly smaller ones. Unfortunately, we have noticed a problem while analyzing returned data in the master log files. Some clients have been returning inordinately large node counts for completed stubs. Once we reran the stubs in question for confirmation, we found that the node count corruption indicated an endianness problem, where platforms like SPARC order groups of bytes in memory differently from platforms like Intel x86.
On Tuesday evening, we were able to pinpoint the missing code and operational circumstances that caused this problem. There was a missing ntohl() call in the buffer handling code. Most of the time this caused no problems, but when sharing buffer files between certain combinations of platforms, the count of how many nodes had been completed would be corrupted. Unfortunately, due to the circumstances necessary for this bug to manifest itself, our testing did not identify it.
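To illustrate the kind of corruption a missing ntohl() call can cause, here is a minimal sketch in Python. It assumes a 32-bit node count stored in network (big-endian) byte order; the announcement does not describe the client's actual buffer layout, so the field width and the function names below are hypothetical.

```python
import struct

def write_count(count: int) -> bytes:
    """Serialize a 32-bit node count in network (big-endian) byte order."""
    return struct.pack("!I", count)

def read_count_buggy(raw: bytes) -> int:
    """Read the bytes as little-endian, as an x86 host would without ntohl()."""
    return struct.unpack("<I", raw)[0]

def read_count_fixed(raw: bytes) -> int:
    """Convert from network byte order, the job ntohl() does on the host."""
    return struct.unpack("!I", raw)[0]

raw = write_count(1000)           # bytes 00 00 03 E8
print(read_count_buggy(raw))      # 3892510720 (0xE8030000)
print(read_count_fixed(raw))      # 1000
```

On a big-endian host such as SPARC, the conversion is a no-op, which would be consistent with the bug only showing up when buffer files are shared between certain platforms.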
The biggest problem with this bug is the difficulty of detecting it. The clients return the number of nodes done in each stub to the master. If the number of nodes actually done is greater than 2^32 (about 4 billion), then the bug is detectable due to the way the bytes get swapped (an impossibly high node count is returned). However, if the number of nodes is less than 2^32, it is not guaranteed that we will always be able to identify bad return values.
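The detection problem for counts below 2^32 can be sketched as follows. This is an illustration only: it assumes a single byte-swapped 32-bit field and a made-up plausibility threshold (`PLAUSIBLE_MAX`); the announcement does not say how the master actually screens results.

```python
import struct

# Hypothetical upper bound on a believable node count for one stub.
PLAUSIBLE_MAX = 10**9

def swap32(value: int) -> int:
    """Reverse the byte order of a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]

def looks_corrupt(reported: int) -> bool:
    """Flag a reported count only if it exceeds the plausibility bound."""
    return reported > PLAUSIBLE_MAX

# A true count of 1000 becomes an obviously inflated value after swapping.
print(swap32(1000), looks_corrupt(swap32(1000)))    # 3892510720 True

# A true count of 65536 becomes 256, which looks perfectly ordinary.
print(swap32(65536), looks_corrupt(swap32(65536)))  # 256 False
```

The second case shows why a size check alone cannot catch every bad result: some swapped values fall well inside the believable range.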
In light of the above, we have decided to suspend the OGR-24 project. Clients that are currently working on OGR will revert back to RC5 the next time they connect to the network; you can force a connection by shutting the client down, deleting the OGR buffers, and restarting the client.
We will need to build new clients to address this bug and to allow us to discard results from bad clients. As a consequence, we will have the opportunity to improve some other aspects of client operation. In particular, we plan to add more configurable checkpointing and a better display of progress.
Having built new clients and added code to discard faulty stubs, we will restart OGR-24 from the beginning, this time with much smaller 5-stubs instead of the 4-stubs used in the initial run. All is not lost from this past run; we will use its results as an additional verification of future work.
Thanks to Tommie for the tip.