Message boards : Number crunching : Setting for maximum HD space
Author | Message |
---|---|
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=18872518 Swap space: 1692.22 MB; Total disk space: 29.29 GB; Free disk space: 8.53 GB --- Use no more than 10 GB disk space; Leave at least 0.01 GB disk space free; Use no more than 50% of total disk space; Write to disk at most every 60 seconds; Use no more than 75% of total virtual memory ---- Generally, when I look at the memory usage on the machine itself, Rosetta only claims to use around 20 megs. None of the partitions has less than 8 gigs of free space, so did that WU really eat up the 8.52 gigs of HD space on the C: partition before erroring out? ---- 5/2/2006 5:40:48 PM|rosetta@home|Aborting result JUMPTEST_CLOSECHAINBREAKS_1tul__469_2429_0: exceeded disk limit: 100308693.000000 > 100000000.000000 5/2/2006 5:40:48 PM|rosetta@home|Unrecoverable error for result JUMPTEST_CLOSECHAINBREAKS_1tul__469_2429_0 (Maximum disk usage exceeded) From the message log, I see that it's whining about going over 100 megs. Where did it get that value? It doesn't match any of the settings I've chosen for BOINC and Rosetta. Is this a change in the way Rosetta handles HD space, or have I set the BOINC settings wrong for allowing Rosetta to use all but 10 megs of my free hard drive space? I just don't see how 8.6 gigs (what I thought I'd set as the maximum) and 100 megs (what Rosetta picked as the maximum HD space to use) are equal. After all... it's eating up more RAM than 100 megs. |
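The three disk preferences listed in that post can be collapsed into a single BOINC-wide allowance. A minimal sketch, assuming the client takes the most restrictive of the three limits and that space BOINC already occupies (`boinc_used_gb`, a hypothetical parameter set to 0 here) counts as reclaimable; the function name is illustrative, not BOINC's own code, and all values are in GB:

```python
def allowed_disk_gb(total_gb, free_gb, boinc_used_gb,
                    max_used_gb, min_free_gb, max_used_pct):
    """Most restrictive of the three disk preferences, in GB (illustrative sketch)."""
    limit_abs = max_used_gb                       # "Use no more than X GB disk space"
    limit_pct = total_gb * max_used_pct / 100.0   # "Use no more than X% of total disk space"
    # "Leave at least X GB disk space free": current free space plus what BOINC
    # already holds, minus the reserve that must stay free
    limit_min_free = free_gb + boinc_used_gb - min_free_gb
    return max(0.0, min(limit_abs, limit_pct, limit_min_free))


# Host and preference values quoted in the post above
limit = allowed_disk_gb(total_gb=29.29, free_gb=8.53, boinc_used_gb=0.0,
                        max_used_gb=10, min_free_gb=0.01, max_used_pct=50)
print(f"{limit:.2f} GB")  # 8.52 GB
```

Under these assumptions the "Leave at least" rule is the binding one, giving the 8.52 GB figure mentioned in the post, which is still far above the 100,000,000-byte limit in the log.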
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
"Is this a change in the way Rosetta handles HD space, or have I just set the Boinc settings wrong for allowing Rosetta to use all but 10 megs of my free hard drive space?" I only have 2c to offer today, but I would point out that the BOINC controls you have apply to all projects, and these are then further divided based on resource share. So if R@H had a 75% resource share, your maximum would be only 3/4 of your preference. ...That still doesn't explain the 100 million bytes number listed. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
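Feet1st's resource-share point amounts to a single proportion. A sketch, assuming the per-project cap is the BOINC-wide disk allowance scaled by the project's fraction of the total resource share; the project names and the 8.52 GB allowance are illustrative:

```python
def project_disk_share_gb(allowance_gb, shares, project):
    """Scale a BOINC-wide disk allowance by one project's fraction of total resource share."""
    total_share = sum(shares.values())
    if total_share <= 0:
        raise ValueError("resource shares must sum to a positive number")
    return allowance_gb * shares[project] / total_share


# Hypothetical split: Rosetta@home holding 75% of the resource share
shares = {"rosetta@home": 75, "other project": 25}
print(round(project_disk_share_gb(8.52, shares, "rosetta@home"), 2))  # 6.39
```

Even at a 75% share the cap stays in the gigabytes, which fits Feet1st's remark that the split alone doesn't explain the 100,000,000-byte figure.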
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
I've only signed up for Rosetta, and have it set to 100%. 50% of 100% of 10 gigs should be 5 gigs, not 100 megs. Is the memory usage code that was just released looking at HD space or RAM? In other words, is this a bug in which the client is looking at the wrong type of memory? |
mnb Joined: 15 Dec 05 Posts: 51 Credit: 69,458 RAC: 0 |
I had the same problem with this result: 19321664 07/05/2006 20:37:09|rosetta@home|Aborting result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_472_6802_0: exceeded disk limit: 101466227.000000 > 100000000.000000 07/05/2006 20:37:09|rosetta@home|Unrecoverable error for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_472_6802_0 (Maximum disk usage exceeded) 06/05/2006 21:47:34||Memory: 1023.48 MB physical, 1.90 GB virtual 06/05/2006 21:47:34||Disk: 8.79 GB total, 3.68 GB free ----- Use no more than 1 GB disk space; Leave at least 1 GB disk space free; Use no more than 50% of total disk space; Write to disk at most every 120 seconds; Use no more than 50% of total virtual memory ----- I'm currently running Rosetta and SIMAP. |
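The abort lines put the task's disk usage and its limit side by side, so a small parser shows how far over each task went. A sketch, assuming the exact `Aborting result NAME: exceeded disk limit: USED > LIMIT` wording quoted in this thread; other client versions may phrase the message differently:

```python
import re

# Matches the wording of the abort messages quoted in this thread
ABORT_RE = re.compile(r"Aborting result (\S+): exceeded disk limit: ([\d.]+) > ([\d.]+)")


def parse_disk_abort(line):
    """Return (result_name, used_bytes, limit_bytes, overshoot_bytes), or None if no match."""
    m = ABORT_RE.search(line)
    if m is None:
        return None
    used, limit = float(m.group(2)), float(m.group(3))
    return m.group(1), used, limit, used - limit


line = ("07/05/2006 20:37:09|rosetta@home|Aborting result "
        "JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_472_6802_0: "
        "exceeded disk limit: 101466227.000000 > 100000000.000000")
name, used, limit, over = parse_disk_abort(line)
print(name, int(over))  # JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_472_6802_0 1466227
```

For this result the overshoot is about 1.47 MB, i.e. the task was only slightly past a 100,000,000-byte limit that sits well below the 1 GB disk preference.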
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Perhaps the disk space required for the more frequent checkpointing is taking more than expected in some cases?? I've added a post on Ralph asking them to have a look at this. |
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
"5/2/2006 5:40:48 PM|rosetta@home|Aborting result JUMPTEST_CLOSECHAINBREAKS_1tul__469_2429_0: exceeded disk limit: 100308693.000000 > 100000000.000000" I have some similar WUs that errored out complaining about exceeding 100MB of HD. One with 5.07: https://boinc.bakerlab.org/rosetta/result.php?resultid=19606413 and one with 5.12: https://boinc.bakerlab.org/rosetta/result.php?resultid=19714283 There was plenty of HD space, and BOINC is set to use up to 100 GB. I have no idea where the 100 MB limit came from. I have seen the stdout.txt file grow to tens of MB, although I wasn't watching while these two WUs were crunching. |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
If that's the source of the error message, it would be nice for the error to state which file is larger than 100 megs (wait... 100,000,000 bytes is not 100 megs!) |
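Whether 100,000,000 bytes counts as "100 megs" depends on which megabyte is meant: it is exactly 100 MB in decimal units (10^6 bytes) but only about 95.4 MiB in binary units (2^20 bytes). A quick conversion:

```python
def to_mb_and_mib(n_bytes):
    """Express a byte count in decimal megabytes (10**6) and binary mebibytes (2**20)."""
    return n_bytes / 10**6, n_bytes / 2**20


mb, mib = to_mb_and_mib(100_000_000)
print(f"{mb:.2f} MB = {mib:.2f} MiB")  # 100.00 MB = 95.37 MiB
```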
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
"I think there is a limit on the size of your error file. I would have to look this up, but 100MB sounds right." Well, stderr.txt was only a few lines (you can see it listed in the result links). I guess the 100,000,000-byte limit applies to stdout.txt as well. I looked at the .xml files, and it looks like the limit is specified by the WU itself: there seems to be an "rsc_disk_bound" value that's set to 100000000. |
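The `rsc_disk_bound` value can be pulled out of the XML programmatically. A sketch using Python's standard `xml.etree.ElementTree`, assuming each work unit appears as a `<workunit>` element with `<name>` and `<rsc_disk_bound>` children (the layout assumed here for a BOINC client's `client_state.xml`); the sample document and the name `example_wu` are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; a real file would be loaded with ET.parse(path).getroot()
SAMPLE = """<client_state>
  <workunit>
    <name>example_wu</name>
    <rsc_disk_bound>100000000.000000</rsc_disk_bound>
  </workunit>
</client_state>"""


def disk_bounds(root):
    """Map each <workunit> name to its rsc_disk_bound, in bytes."""
    bounds = {}
    for wu in root.iter("workunit"):
        name = wu.findtext("name")
        bound = wu.findtext("rsc_disk_bound")
        if name is not None and bound is not None:
            bounds[name] = float(bound)
    return bounds


bounds = disk_bounds(ET.fromstring(SAMPLE))
print(bounds)  # {'example_wu': 100000000.0}
```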
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
"If that's the source of the error message, it would be nice for the error to state which file is greater than 100 megs, (wait.. 100,000,000 bytes is not 100 megs!)" I believe the WU limit applies to, well, the WU. So it isn't any one file that throws it over the limit; it's all of them collectively. I was thinking the checkpoint files might count as well, and since checkpointing is the new player here, that's why I brought it up. Perhaps the checkpoint data is what's pushing it over the size limit. AMD: did the stdout look like good info? Or more like a loop of repeating messages? Or, actually, back to the project folks: if these results are meaningful and it was just a large number of models produced or a large protein, then perhaps there are cases where the disk space limit needs to be increased. If the results were caused by some sort of loop, then obviously a fix is needed. |
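Feet1st's reading, that the bound covers everything the task writes rather than any single file, can be illustrated by summing a directory. A minimal sketch, assuming the check compares the total size of the task's working (slot) directory against `rsc_disk_bound`; the demo directory and file names are hypothetical stand-ins, and this is not BOINC's own implementation:

```python
import os
import tempfile


def dir_usage_bytes(path):
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for fname in filenames:
            full = os.path.join(dirpath, fname)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def check_disk_bound(path, rsc_disk_bound):
    """Return (exceeded, used_bytes) for a directory against a byte limit."""
    used = dir_usage_bytes(path)
    return used > rsc_disk_bound, used


# Demo with tiny hypothetical files: neither file alone exceeds 100 bytes, together they do
with tempfile.TemporaryDirectory() as slot:
    for name, size in [("stdout.txt", 70), ("checkpoint.dat", 40)]:
        with open(os.path.join(slot, name), "wb") as f:
            f.write(b"x" * size)
    over_limit = check_disk_bound(slot, 100)
    under_limit = check_disk_bound(slot, 200)

print(over_limit, under_limit)  # (True, 110) (False, 110)
```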
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
"AMD: did the stdout look like good info? Or more like a loop of repeating messages?" It turns out I still have a window open where I did some file listings, and scrolling back I see the stdout.txt file for one of those WUs at 70MB. So it was definitely the stdout.txt file that caused the problem. Unfortunately, I didn't notice at the time and the file is long gone. I did take a snapshot of some large stdout.txt files. One was 20MB after 5 hours of crunching. (I use a 10-hour crunch time.) It contained mostly: ... set_omega:: move not allowed: 81 set_phi:: move not allowed: 81 set_psi:: move not allowed: 81 set_omega:: move not allowed: 81 set_phi:: move not allowed: 81 set_psi:: move not allowed: 81 set_omega:: move not allowed: 81 set_phi:: move not allowed: 81 set_psi:: move not allowed: 81 set_omega:: move not allowed: 81 set_phi:: move not allowed: 81 set_psi:: move not allowed: 81 ... And so on. Some other stdout.txt files, around 8MB in size, contained stuff like: ... Searching for dat file: ./1tul.dat Searching for dat file: ./1tul.dat WARNING!! .dat file not found! Looking for fasta file: ./1tul_.fasta [T/F OPT]Default FALSE value for [-find_disulf] [T/F OPT]Default FALSE value for [-fix_disulf] [T/F OPT]New TRUE value for [-n] [T/F OPT]New TRUE value for [-n] [STR OPT]New value for [-n] 1tul.pdb. 
[T/F OPT]Default FALSE value for [-use_native_centroid] WARNING:: end of pdb file reached: angle, secstruct, & res info not found Looking for dssp file: 1tul.dssp dssp file not found calculating secondary structure from torsion angles fragment file: ./aa1tul_03_05.200_v1_3.gz Total Residue 102 frag size: 3 frags/residue: 200 fragment file: ./aa1tul_09_05.200_v1_3.gz Total Residue 102 frag size: 9 frags/residue: 200 generating 1mer library from 3mer library [T/F OPT]Default FALSE value for [-ssblocks] [T/F OPT]Default FALSE value for [-check_homs] [T/F OPT]New TRUE value for [-barcode_mode] [INT OPT]New value for [-barcode_mode] 3 [T/F OPT]Default FALSE value for [-increment_barcode] [T/F OPT]New TRUE value for [-barcode_file] [STR OPT]New value for [-barcode_file] allbarcodes09.bar. Feature: PERMUTE Flavor 0.0394769 Flavor 6e-05 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 56 fval1= 61 fval2= 94 fval3= 102 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 94 fval1= 102 fval2= 5 fval3= 13 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 5 fval1= 13 fval2= 77 fval3= 84 barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 77 fval1= 84 fval2= 18 fval3= 25 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 18 fval1= 25 fval2= 36 fval3= 43 barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 36 fval1= 43 fval2= 63 fval3= 72 Flavor 6e-05 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 56 fval1= 61 fval2= 94 fval3= 102 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 94 fval1= 102 fval2= 5 fval3= 13 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 5 fval1= 13 fval2= 77 fval3= 84 barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 77 fval1= 84 fval2= 18 fval3= 25 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 18 fval1= 25 fval2= 36 fval3= 43 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 36 fval1= 43 fval2= 63 fval3= 72 Flavor 6e-05 barcode_cst: torsion= 17 residue= 0 
cval= - ival= 2 fval= 56 fval1= 61 fval2= 94 fval3= 102 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 94 fval1= 102 fval2= 5 fval3= 13 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 5 fval1= 13 fval2= 77 fval3= 84 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 77 fval1= 84 fval2= 18 fval3= 25 barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 18 fval1= 25 fval2= 36 fval3= 43 barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 36 fval1= 43 fval2= 63 fval3= 72 Flavor 6e-05 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 56 fval1= 61 fval2= 94 fval3= 102 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 94 fval1= 102 fval2= 5 fval3= 13 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 5 fval1= 13 fval2= 77 fval3= 84 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 77 fval1= 84 fval2= 18 fval3= 25 barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 18 fval1= 25 fval2= 36 fval3= 43 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 36 fval1= 43 fval2= 63 fval3= 72 Flavor 6e-05 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 56 fval1= 61 fval2= 94 fval3= 102 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 94 fval1= 102 fval2= 5 fval3= 13 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 5 fval1= 13 fval2= 77 fval3= 84 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 77 fval1= 84 fval2= 18 fval3= 25 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 18 fval1= 25 fval2= 36 fval3= 43 barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 36 fval1= 43 fval2= 63 fval3= 72 Flavor 6e-05 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 56 fval1= 61 fval2= 94 fval3= 102 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 94 fval1= 102 fval2= 5 fval3= 13 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 5 fval1= 13 fval2= 77 fval3= 84 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 77 fval1= 84 fval2= 18 
fval3= 25 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 18 fval1= 25 fval2= 36 fval3= 43 barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 36 fval1= 43 fval2= 63 fval3= 72 Flavor 6e-05 barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 56 fval1= 61 fval2= 5 fval3= 13 barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 5 fval1= 13 fval2= 94 fval3= 102 barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 94 fval1= 102 fval2= 77 fval3= 84 barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 77 fval1= 84 fval2= 18 fval3= 25 barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 18 fval1= 25 fval2= 36 fval3= 43 barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 36 fval1= 43 fval2= 63 fval3= 72 ... And so on, with those 7 lines repeated with minor variations. Those WUs completed, though. |
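Counting repeated lines is a quick way to tell a looping stdout.txt from ordinary output. A sketch using `collections.Counter`, assuming the file holds one message per line (the excerpt above appears flattened by the forum); the sample lines are taken from that excerpt:

```python
from collections import Counter


def top_repeated_lines(lines, n=5):
    """Return the n most common non-blank lines with their counts."""
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return counts.most_common(n)


# Lines taken from the stdout.txt excerpt above
sample = [
    "set_omega:: move not allowed: 81",
    "set_phi:: move not allowed: 81",
    "set_psi:: move not allowed: 81",
] * 4 + ["Searching for dat file: ./1tul.dat"]

for text, count in top_repeated_lines(sample, 3):
    print(count, text)
```

On a real file, `with open("stdout.txt", errors="replace") as f: print(top_repeated_lines(f))` would show whether a handful of lines dominate, as in the 20MB file described above.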
©2025 University of Washington
https://www.bakerlab.org