Message boards : Number crunching : bh0200xx_MonomerDesign2019_ units failing
Author | Message |
---|---|
Trotador Joined: 30 May 09 Posts: 108 Credit: 291,214,977 RAC: 1 |
These units crunch for 12 hours and most of them fail with signal 11; a few manage to complete and validate (getting 20 credits for 12 hours of compute time). Unit failed: <core_client_version>7.14.2</core_client_version> <![CDATA[ <message> process got signal 11</message> <stderr_txt> command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.07_i686-pc-linux-gnu -run:protocol jd2_scripting -parser:protocol template.xml -corrections::beta_nov16 -out:prefix bh020073 @bh020073.flags -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip bh020073.zip -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3946742 Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. BOINC:: CPU time: 43708.9s, 14400s + 28800s[2020- 2-19 6: 8:53:] :: BOINC WARNING! cannot get file size for default.out.gz: could not open file. Output exists: default.out.gz Size: -1 InternalDecoyCount: 0 (GZ) ----- 0 ----- Stream information inconsistent. 
Writing W_0000001 ====================================================== DONE :: 1 starting structures 43708.9 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== 06:08:53 (4877): called boinc_finish(0) Unit validated </stderr_txt> ]]> <core_client_version>7.14.1</core_client_version> <![CDATA[ <stderr_txt> command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting -parser:protocol template.xml -corrections::beta_nov16 -out:prefix bh020019 @bh020019.flags -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip bh020019.zip -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3945123 Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. BOINC:: CPU time: 43765.8s, 14400s + 28800s[2020- 2-19 9:20: 2:] :: BOINC WARNING! cannot get file size for default.out.gz: could not open file. Output exists: default.out.gz Size: -1 InternalDecoyCount: 0 (GZ) ----- 0 ----- Stream information inconsistent. Writing W_0000001 ====================================================== DONE :: 1 starting structures 43765.8 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== 09:20:02 (4095): called boinc_finish(0) </stderr_txt> ]]> |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 8,387 |
WARNING! cannot get file size for default.out.gz: could not open file. I hope you have NO hope that someone will answer you. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Your preferred runtime is shown as 8 hours. The watchdog kicks in once the preferred runtime has been exceeded by more than 4 hours; it then cuts off the work unit and reports the details back to the project. Hence the total of 12 hours of runtime before the work unit is ended. A link to the host and result, or the full name of the failing WU, would be helpful. Rosetta Moderator: Mod.Sense |
Trotador Joined: 30 May 09 Posts: 108 Credit: 291,214,977 RAC: 1 |
The units that failed were mostly crunched with the application Rosetta v4.07 i686-pc-linux-gnu. The units that completed but received just 20 credits mostly used the application Rosetta v4.08 x86_64-pc-linux-gnu. Here are some examples out of the more than 60 units of this type crunched by my hosts. Hope it helps. https://boinc.bakerlab.org/rosetta/result.php?resultid=1122875789 https://boinc.bakerlab.org/rosetta/result.php?resultid=1122890588 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011402918 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011384104 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011413743 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011391746 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011390304 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011411592 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011385693 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011386374 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011344088 https://boinc.bakerlab.org/rosetta/result.php?resultid=1122877802 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011408682 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011387452 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011386983 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011396733 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011397432 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011391280 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011411960 |
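
The 12-hour figure discussed in the thread can be checked against the stderr output quoted in the first post. The sketch below is only an illustration, not part of Rosetta or BOINC: the function name `watchdog_summary` is hypothetical, and reading the `14400s + 28800s` part of the log line as "4-hour watchdog allowance plus 8-hour preferred runtime" is an assumption based on the moderator's explanation and the `-cpu_run_time 28800` flag.

```python
import re

# Matches the summary line seen in the quoted stderr, e.g.
# "BOINC:: CPU time: 43708.9s, 14400s + 28800s"
# Assumption: the two integers are the watchdog allowance and the
# preferred runtime, in that order.
WATCHDOG_LINE = re.compile(
    r"BOINC:: CPU time: (?P<cpu>[\d.]+)s, (?P<grace>\d+)s \+ (?P<target>\d+)s"
)


def watchdog_summary(stderr_text):
    """Return CPU time and the assumed watchdog limit, or None if no match."""
    match = WATCHDOG_LINE.search(stderr_text)
    if match is None:
        return None
    cpu = float(match.group("cpu"))
    limit = int(match.group("grace")) + int(match.group("target"))
    return {
        "cpu_seconds": cpu,
        "limit_seconds": limit,
        "limit_hours": limit / 3600,
        "exceeded": cpu > limit,
    }


if __name__ == "__main__":
    line = "BOINC:: CPU time: 43708.9s, 14400s + 28800s[2020- 2-19 6: 8:53:]"
    print(watchdog_summary(line))
```

For both quoted results the limit works out to 43200 s (12 hours), and the reported CPU times (43708.9 s and 43765.8 s) sit just above it, which matches the description of the watchdog ending the work unit.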
©2024 University of Washington
https://www.bakerlab.org