Bug in Distributed.net client: OGR-24 has to start over

Yet another bug has been found in the Distributed.net client: the OGR-24 client computes different results depending on the platform. That is obviously not the intention, so D.net has decided to restart the entire OGR-24 contest from scratch:

Most of you should know by now that for the past week, we have been working on the OGR-24 project, the search for the 24 mark Optimal Golomb Ruler. So far, we have completed approximately 24% of the total stubs, mostly smaller ones. Unfortunately, we have noticed a problem while analyzing returned data in the master log files. Some clients have been returning inordinately large node counts for completed stubs. Once we reran the stubs in question for confirmation, we found that the node count corruption indicated an endianness problem, where platforms like SPARC order groups of bytes in memory differently from platforms like Intel x86.
On Tuesday evening, we were able to pinpoint the missing code and operational circumstances that caused this problem. There was a missing ntohl() call in the buffer handling code. Most of the time this caused no problems, but when sharing buffer files between certain combinations of platforms, the count of how many nodes had been completed would be corrupted. Unfortunately, due to the circumstances necessary for this bug to manifest itself, our testing did not identify it.
The biggest problem with this bug is the difficulty of detecting it. The clients return the number of nodes done in each stub to the master. If the number of nodes actually done is greater than 2^32 (about 4 billion), then the bug is detectable due to the way the bytes get swapped (an impossibly high node count is returned). However, if the number of nodes is less than 2^32, it is not guaranteed that we will always be able to identify bad return values.
In light of the above, we have decided to suspend the OGR-24 project. Clients that are currently working on OGR will revert back to RC5 the next time they connect to the network; you can force a connection by shutting the client down, deleting the OGR buffers, and restarting the client.
We will need to build new clients to address this bug and to allow us to discard results from bad clients. As a consequence, we will have the opportunity to improve some other aspects of client operation. In particular, we plan to add more configurable checkpointing and a better display of progress.
Having built new clients and added code to discard faulty stubs, we will restart OGR-24 from the beginning, this time with much smaller 5-stubs instead of the 4-stubs used in the initial run. All is not lost from this past run; we will use its results as an additional verification of future work.
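To illustrate the kind of corruption the announcement describes, here is a minimal sketch of what happens when a count is written in network (big-endian) byte order and then read on a little-endian machine without the ntohl() conversion. The 32-bit field width and the function names are assumptions made for this example; the actual buffer format of the client is not described in the announcement.

```python
import struct


def write_count_network_order(count: int) -> bytes:
    """Serialize a 32-bit count in network (big-endian) byte order.

    Hypothetical helper: the real buffer layout is not known here.
    """
    return struct.pack(">I", count)


def read_count_with_conversion(raw: bytes) -> int:
    """Read the field correctly, which is what an ntohl() call achieves."""
    return struct.unpack(">I", raw)[0]


def read_count_without_conversion(raw: bytes) -> int:
    """Read the same bytes as a little-endian host would if the
    conversion step is missing: the four bytes end up reversed."""
    return struct.unpack("<I", raw)[0]


if __name__ == "__main__":
    for count in (1000, 0x10000000):
        raw = write_count_network_order(count)
        print(count,
              read_count_with_conversion(raw),
              read_count_without_conversion(raw))
```

In this sketch a count of 1000 comes back as 3892510720, which is conspicuously large, while a count of 268435456 (0x10000000) comes back as 16, which looks like an ordinary small value. That asymmetry is why a swapped count cannot always be recognized after the fact.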

Thanks to Tommie for the tip.


Comments (11)

Deleted user 24 February 2000 13:46

Yeah, this is a load of bullsh*t, and there was that nasty bug in CSC only recently too... those guys are really making a mess of it!

So once again I left all my PCs running for days for nothing!

Indeed, I'm getting more and more worried that after 8 years of crunching there will turn out to be a bug in RC5 as well...

It's just like an IT company!

Are there any other outfits that do this kind of thing?
No SETI please, that is about the most mindless thing I can think of.
No, is there a club somewhere that simply tries to solve real-life problems with brute force?
A game of chess against IBM or something? Or a service that recovers forgotten passwords for zips and the like?

Deleted user 24 February 2000 15:00

Uhh, Bunny, I get the impression you don't quite know what is involved in setting up a project like this, but never mind, everyone makes a dumb remark now and then..

The problem with OGR is that the structure of the OGR project, and the way it is approached, is very different from the previous projects (rc5-56, DES I/II/III, rc5-64), and that it is very hard to present the stats properly.
Mind you, the so-called 'stubs' are all computed and returned correctly; it is only the way the statistics are calculated that is not quite right, and apparently that is very hard to do in a representative way.

Deleted user 24 February 2000 13:32

Uuh, if I understand this correctly, this is about that

??? does that mean the whole lot is going to start over???

/me Razorblade

Deleted user 24 February 2000 13:34

Uh-oh... and a year from now they'll find out that the same kind of bug is in the RC5 cruncher too

Deleted user 24 February 2000 13:35

You're talking about the RC5 project, which is something else.
OGR-24 (only just started, I believe) has to start over. In other words: just let your cow graze on RC5!

Deleted user 24 February 2000 14:00

Hummpie. /me Razorblade
everybody happyyy????

Gman 24 February 2000 14:11

Good idea, Razor

I've submitted 3 OGR work units... those take ages!!! With the standard packet size, that is...
And a dual 450 doesn't strike me as exactly slow.

Deleted user 24 February 2000 14:55

A while ago I came across a site that needed computing power for calculations on nuclear waste storage sites. A special kind of image was made of the waste containers to see how much radiation they let through, and those images needed a lot of processing power.

Nice,
BUT... I've lost the URL!
Help!

Deleted user 24 February 2000 15:55

No, I indeed don't know everything that is involved in a project like this, but more than 200,000 PCs have once again crunched for weeks for nothing.

And that sucks big time

It may well be difficult, but they simply need to think harder and test better before they kick another new client out into the world.

hessel 24 February 2000 16:33

* hessel hessel

Comments on this item are closed.
