Any volunteers?
Stephen Brooks
2005-08-11 07:03:14
I'd like some more effort on PhaseRotC_bigS1 - does anyone want to make their client run just that simulation for a while - at least until it gets positive?

You switch off auto-lattice-update (set to 0) in config.txt and then delete everything except the PhaseRotC_bigS1.txt lattice file in your lattices subdirectory.

The lack of CPU on bigS1 is due to two things: (a) it is negative so simulations have few particles, meaning less CPU time on average even if the same number of simulations are run; and (b) a certain high-scoring person who shall remain unnamed has decided to concentrate only on PhaseRotC for the time being Big Grin
2005-08-11 07:23:56
I'll purge outstanding results and force work on bigS1.
Stephen Brooks
2005-08-11 07:53:11
Sorry - it's just odd how big a difference it seems to make at the moment.  With you focussed on PhaseRotC only, it totally dominates and everyone else is spread between the four...
2005-08-11 07:55:32
Not a big deal... It gave me a chance to test out a script for controlling/reconfiguring all of the clients at once Wink Seems to have gone smoothly so far Big Grin
Maniacken [US-Distributed]
2005-08-11 19:52:43
That's all that I have been doing with one machine.

Aw come on Z. I was having fun leading for a while.  Guess I might need to reallocate a few machines to try to keep up with you.
2005-08-12 00:07:00
I've put 1 machine on it
2005-08-12 03:23:50
Yes, all right, I've put a dual AMD on it, let's see where we'll get.  Wink
2005-08-12 03:38:07
I get an error.
FATAL no lattices found compatible with current Muon1 version-please upgrade.
I'm running 4.41f
2005-08-12 03:59:40
Ok will commit around 5 machines to doing it.
Stephen Brooks
2005-08-12 09:40:22
Thanks everyone.  As for the "v4.41f" error, yes, you do need to have 4.42+ installed to run the "C" lattices, as these use the new 10GeV pion file.

It seems to be working, incidentally - the score that was in the doldrums around -0.14 when I started this thread is now approaching -0.10!
2005-08-18 13:19:02
Is this one over?
The new client doesn't have this lattice anymore.  The one client I dedicated exclusively to this is complaining about a new update being available all the time.
2005-08-19 08:34:29
PhaseRotC_bigS1 is a long way from over.  Have you got the auto-updating lattice files turned on, Herb, and have you got 4.42c (which came out the other day)?
2005-08-20 00:37:45
Yes I know, but it has happened before that a lattice was faulty and needed to be replaced.  I was wondering why exactly this one client, where I turned lattice-update off, was nagging with a new version message, while all the others didn't.

The lattice files that came with the last two zipped versions didn't include the PhaseRotC_bigS1 lattice, so I wondered if it was pulled, because with lattice update turned off you wouldn't notice that.
2005-08-23 20:14:46
Z just took us positive on this simulation.  Wahooooo.
2005-08-24 07:44:12
Originally posted by Herb[Romulus2]:
The lattice files that came with the last two zipped versions didn't include the PhaseRotC_bigS1 lattice, so I wondered if it was pulled, because with lattice update turned off you wouldn't notice that.

Would the 'last zipped versions' happen to be the new 'update only' ones?  They only contain the files that have changed, and the lattices haven't.
Stephen Brooks
2005-08-24 08:10:39
Uh-oh.  [TA]z's stats just took a hammering.  I was beginning to wonder if the stats bug was restricted to PhaseRotB but it appears it affects any single large file.  Fortunately this rules out "integer/memory overflow" as an explanation.

AHA!  I've just found [TA]z's original file preserved in a "catchment area" I set up for files that get shrunk to less than 40% of their original size after duplicate checking.  I'll put that one back in to see if it can be restored (and also search for any others who can have their points replaced from the glitch).  Will also change the threshold from "less than 40% of original size" to "losing more than 1MB".
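To see why the threshold change matters, here is a minimal sketch of the two rules side by side. The function names are hypothetical, and "1MB" is assumed to mean 1,000,000 bytes; neither comes from the actual stats generator. A 120MB file that loses 20MB would slip past the 40% rule (it would have to fall below 48MB to be caught) but is caught by the 1MB rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helpers; all sizes are in bytes. */

/* Old rule: flag a file that shrank to less than 40% of its original size. */
static bool flagged_by_ratio(long long before, long long after)
{
    assert(before >= 0 && after >= 0);
    /* after / before < 0.4, written without floating point */
    return before > 0 && after * 10 < before * 4;
}

/* New rule: flag a file that lost more than 1MB.
   Assumption: 1MB is taken as 1,000,000 bytes. */
static bool flagged_by_loss(long long before, long long after)
{
    assert(before >= 0 && after >= 0);
    return before - after > 1000000LL;
}
```

The absolute rule is much stricter for large files, which is where the losses in this thread were seen.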
2005-08-24 08:49:57
If it affects any large file, that could be a real problem for those of us that run large off-line farms Frown
Stephen Brooks
2005-08-24 08:59:36
Large files here means your cumulative total results as stored in the database.  As far as I know, the size of the individual dumps doesn't matter.

Anyway I might now have the tools to figure out what is going wrong - [TA]z's file getting caught means I can see if it's something in the file that's wrong or a transient state in the rest of it.
2005-08-24 09:12:51
I've been dismembered and spat upon!

Honestly though, thanks for looking into it Stephen Wink
Stephen Brooks
2005-08-24 10:11:53
Sit tight... stats update may take a while (considering it's your 120MB+ file it's having to recount).

[update] Now it appears it's re-removed the same stuff from your file again.  REPEATABILITY!  For once... OK now tomorrow morning I guess I'm going to have to make this my priority.
2005-08-24 13:32:45
I think that would be wise.  Stats are our only payment and when they go wrong (or are perceived to have gone wrong) people leave the project and move on.  Hope you find it soon and thanks for looking into it in the first place.
Stephen Brooks
2005-08-25 02:52:15
OK, seems the problem is that the files sometimes become corrupted: the [TA]z one is fine for the first 30MB or so and then for a long while repeats the same 1000000 bytes of results over and over again.  This corresponds to a place in the stats generator where I copy a new file over an old one and I do it in chunks of a million bytes because loading the whole thing into RAM would cause such load as to possibly make the machine unstable.  So I've got to see if I can do the copy some other way or perhaps put error-checking in there.

So the duplicates-removal is actually the symptom, not the cause.  It's the file-copy operation that's overwriting good files with repeated stuff.
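The corruption described above (the same block of bytes repeated back to back) can be checked for directly. This is only a sketch of one possible diagnostic, not the tool actually used here: `count_repeated_blocks` is a hypothetical name, and it only finds repeats that are aligned to the block size counted from the start of the file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count how many adjacent pairs of full blocks in a file are byte-identical.
   Returns -1 if the file cannot be opened or read, or if block is 0. */
static long count_repeated_blocks(const char *path, size_t block)
{
    FILE *f;
    unsigned char *prev, *cur;
    long repeats = -1;

    assert(path != NULL);
    if (block == 0) return -1;
    f = fopen(path, "rb");
    if (!f) return -1;

    prev = malloc(block);
    cur = malloc(block);
    if (prev && cur) {
        repeats = 0;
        if (fread(prev, 1, block, f) == block) {
            /* Compare each full block with the one before it. */
            while (fread(cur, 1, block, f) == block) {
                unsigned char *t;
                if (memcmp(prev, cur, block) == 0) repeats++;
                t = prev; prev = cur; cur = t;   /* current becomes previous */
            }
        }
        if (ferror(f)) repeats = -1;
    }
    free(prev);
    free(cur);
    fclose(f);
    return repeats;
}
```

Calling it with a block size of 1000000 on a results file would give a count well above zero for a file damaged in the way described, and normally zero for a healthy one.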

The offending code - for those of you who know C:
int p=0,l; unsigned char *buf=(unsigned char *)malloc(Mini(bpos,1000000));
while (p<bpos)
{
	printf("\rCopying unchanged file... %d of %dMB",p/1000000,bpos/1000000);
	l=Mini(bpos-p,1000000);  /* size of this chunk: at most a million bytes */
	fread(buf,1,l,in); fwrite(buf,1,l,out);  /* return values are not checked */
	p+=l;
}
printf("\rCopying unchanged file... Done"); clreol();

It copies the first bpos bytes of FILE *in into FILE *out; bpos can be large, in the hundreds of millions (I already check that bpos>0 and in and out are valid streams and opened properly before this).  I notice there's no error checking on the fread and fwrite, though I'm not sure quite what to do if it fails.
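One way to add the missing checks is to compare the return values of fread and fwrite with the number of bytes requested and stop at the first mismatch. The following is a sketch under assumptions, not the code that went into the stats generator: `copy_prefix_checked`, `CHUNK` and the error codes are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 1000000L   /* same million-byte chunk size as in the post */

/* Copy the first `total` bytes of `in` to `out` in chunks.
   Returns 0 on success, -1 on a short read, -2 on a short write
   or failed flush, -3 if the buffer cannot be allocated. */
static int copy_prefix_checked(FILE *in, FILE *out, long total)
{
    long done = 0;
    unsigned char *buf;
    int rc = 0;

    assert(in != NULL && out != NULL);
    if (total <= 0) return 0;
    buf = malloc((size_t)(total < CHUNK ? total : CHUNK));
    if (!buf) return -3;

    while (done < total) {
        long want = (total - done < CHUNK) ? total - done : CHUNK;
        size_t got = fread(buf, 1, (size_t)want, in);
        if (got != (size_t)want) { rc = -1; break; }          /* short read */
        if (fwrite(buf, 1, got, out) != got) { rc = -2; break; } /* short write */
        done += want;
    }
    free(buf);
    if (rc == 0 && fflush(out) != 0) rc = -2;
    return rc;
}
```

On any non-zero result the caller should treat the output as incomplete and keep the old file, rather than letting a partly written copy replace a good one.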
Stephen Brooks
2005-08-25 03:14:38
OK, I've made the stats run terminate if there is a fread failure and log it to a file.  Since a lot of runs occur without these glitches, hopefully we'll just get the occasional missed run until I can figure out how I should retry reading the file (and why I can't read it).
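For the retry question, one possible approach is to clear the stream's error state, seek back to the start of the chunk and read it again a limited number of times, logging each failure. This is a sketch only; `read_chunk_with_retry` and its parameters are hypothetical, and whether a retry helps depends on why the read fails in the first place, which is still unknown here.

```c
#include <assert.h>
#include <stdio.h>

/* Read exactly `len` bytes starting at `offset`, trying up to `max_tries` times.
   Each failed attempt is written to `log` if it is not NULL.
   Returns 0 on success, -1 if every attempt failed. */
static int read_chunk_with_retry(FILE *in, long offset, unsigned char *buf,
                                 size_t len, int max_tries, FILE *log)
{
    int attempt;

    assert(in != NULL && buf != NULL);
    for (attempt = 1; attempt <= max_tries; attempt++) {
        clearerr(in);   /* reset EOF and error flags left by a previous attempt */
        if (fseek(in, offset, SEEK_SET) == 0 && fread(buf, 1, len, in) == len)
            return 0;
        if (log)
            fprintf(log, "read of %lu bytes at offset %ld failed (attempt %d: %s)\n",
                    (unsigned long)len, offset, attempt,
                    feof(in) ? "end of file"
                             : (ferror(in) ? "I/O error" : "seek failed"));
    }
    return -1;
}
```

The log distinguishes hitting the end of the file from a genuine I/O error, which may help narrow down why the file cannot be read.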
2005-08-25 04:29:20
Will you be able to recover any lost points?  I know there are a few who've lost a lot of work!
2005-08-25 06:58:23
That's GREAT news Stephen!
I hope you can find my lost points!  Big Grin
2005-08-11 -15,398,948.3 31,276,286.3
2005-08-10 66,986.8 46,675,234.6