bh0200xx_MonomerDesign2019_ units failing

Message boards : Number crunching : bh0200xx_MonomerDesign2019_ units failing

Trotador

Joined: 30 May 09
Posts: 108
Credit: 291,214,977
RAC: 1
Message 91742 - Posted: 19 Feb 2020, 8:34:57 UTC

These units crunch for 12 hours and most of them fail with signal 11; a few manage to complete and validate (getting 20 credits for 12 hours of compute time).

Unit failed:
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
process got signal 11</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.07_i686-pc-linux-gnu -run:protocol jd2_scripting -parser:protocol template.xml -corrections::beta_nov16 -out:prefix bh020073 @bh020073.flags -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip bh020073.zip -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3946742
Starting watchdog...
Watchdog active.
Starting watchdog...
Watchdog active.
Starting watchdog...
Watchdog active.
Starting watchdog...
Watchdog active.
Starting watchdog...
Watchdog active.
BOINC:: CPU time: 43708.9s, 14400s + 28800s[2020- 2-19 6: 8:53:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 43708.9 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
06:08:53 (4877): called boinc_finish(0)

</stderr_txt>
]]>

Unit validated:
<core_client_version>7.14.1</core_client_version>
<![CDATA[
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting -parser:protocol template.xml -corrections::beta_nov16 -out:prefix bh020019 @bh020019.flags -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip bh020019.zip -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3945123
Starting watchdog...
Watchdog active.
Starting watchdog...
Watchdog active.
BOINC:: CPU time: 43765.8s, 14400s + 28800s[2020- 2-19 9:20: 2:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 43765.8 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
09:20:02 (4095): called boinc_finish(0)

</stderr_txt>
]]>
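
When comparing many of these results, the tell-tale lines in the dumps above (signal 11, the unreadable default.out.gz, the decoy count, the CPU time) can be pulled out mechanically. The following is only a minimal Python sketch, assuming the stderr text is already available as a string; the function name `summarize_stderr` and the choice of fields are illustrative and not part of BOINC or Rosetta.

```python
import re

def summarize_stderr(stderr_text: str) -> dict:
    """Extract a few symptom fields from a Rosetta@home stderr dump (hypothetical helper)."""
    cpu = re.search(r"BOINC:: CPU time: ([\d.]+)s", stderr_text)
    decoys = re.search(r"InternalDecoyCount:\s*(\d+)", stderr_text)
    return {
        # True when the client reported a segmentation fault for the task
        "signal_11": "process got signal 11" in stderr_text,
        # True when the compressed output file could not be opened
        "output_unreadable": "cannot get file size for default.out.gz" in stderr_text,
        "internal_decoy_count": int(decoys.group(1)) if decoys else None,
        "cpu_seconds": float(cpu.group(1)) if cpu else None,
        # Each "Starting watchdog..." line marks one (re)start of the task
        "watchdog_starts": stderr_text.count("Starting watchdog..."),
    }

# Lines copied from the failed unit quoted in this thread (shortened).
sample = """<message>
process got signal 11</message>
<stderr_txt>
Starting watchdog...
Watchdog active.
BOINC:: CPU time: 43708.9s, 14400s + 28800s[2020- 2-19 6: 8:53:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
"""

summary = summarize_stderr(sample)
print(summary)
```

Running the same function over the validated unit's dump would give the same fields except `signal_11`, which is the only difference visible between the two logs quoted here.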
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 8,387
Message 91748 - Posted: 19 Feb 2020, 14:01:25 UTC - in response to Message 91742.  

WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)


I hope you are not expecting anyone to answer you.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 91749 - Posted: 19 Feb 2020, 14:19:52 UTC - in response to Message 91742.  

Your preferred runtime is shown as 8 hours (the -cpu_run_time 28800 in the command line). The watchdog kicks in once the preferred runtime is exceeded by more than 4 hours; it then cuts off the work unit and reports the details back to the project. Hence the total of 12 hours of runtime before the work unit is ended, which matches the "14400s + 28800s" in the BOINC line of your log.
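
The arithmetic behind that 12-hour figure can be checked directly against the log. This is a small Python sketch, assuming that in the "CPU time" line the first number is the CPU time used, the second is the 4-hour watchdog allowance and the third is the preferred runtime; that reading is inferred from this thread, not from Rosetta documentation, and the function name `watchdog_budget` is made up for illustration.

```python
import re

# BOINC line copied from the failed unit quoted in this thread.
LOG_LINE = "BOINC:: CPU time: 43708.9s, 14400s + 28800s[2020- 2-19 6: 8:53:] :: BOINC"

def watchdog_budget(line: str):
    """Return (used, allowance, preferred) in seconds from a 'CPU time' line."""
    m = re.search(r"CPU time: ([\d.]+)s, (\d+)s \+ (\d+)s", line)
    if m is None:
        raise ValueError("not a BOINC CPU time line")
    # Assumed meaning of the three numbers; see the note above.
    return float(m.group(1)), int(m.group(2)), int(m.group(3))

used, allowance, preferred = watchdog_budget(LOG_LINE)
limit = allowance + preferred

print(preferred / 3600, allowance / 3600, limit / 3600)  # 8.0 4.0 12.0 (hours)
print(used > limit)  # True: the task ran past the 12-hour limit and was cut off
```
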

A link to the host and result, or the full name of the failing WU would be helpful.
Rosetta Moderator: Mod.Sense
Trotador

Joined: 30 May 09
Posts: 108
Credit: 291,214,977
RAC: 1
Message 91752 - Posted: 19 Feb 2020, 15:13:35 UTC

The units that failed were mostly crunched with the application Rosetta v4.07 i686-pc-linux-gnu.

The units that completed but received only 20 credits were mostly crunched with the application Rosetta v4.08 x86_64-pc-linux-gnu.
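
The application version and platform of each result can be read from the "command:" line of its stderr, which makes it easier to tally outcomes per application. Below is a hedged Python sketch; the helper name `app_from_command` is hypothetical, and the two commands are shortened copies of the ones quoted earlier in this thread.

```python
import re

def app_from_command(command: str):
    """Return (version, platform, prefix) parsed from a Rosetta command line, or None."""
    app = re.search(r"rosetta_(\d+\.\d+)_([\w-]+)", command)
    prefix = re.search(r"-out:prefix (\S+)", command)
    if app is None:
        return None
    return app.group(1), app.group(2), prefix.group(1) if prefix else None

# Shortened command lines from the two results posted above.
failed_cmd = ("../../projects/boinc.bakerlab.org_rosetta/rosetta_4.07_i686-pc-linux-gnu "
              "-run:protocol jd2_scripting -out:prefix bh020073 -cpu_run_time 28800")
validated_cmd = ("../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu "
                 "-run:protocol jd2_scripting -out:prefix bh020019 -cpu_run_time 28800")

for cmd in (failed_cmd, validated_cmd):
    print(app_from_command(cmd))
```
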

Here are some examples from the more than 60 units of this type crunched by my hosts. Hope it helps.

https://boinc.bakerlab.org/rosetta/result.php?resultid=1122875789
https://boinc.bakerlab.org/rosetta/result.php?resultid=1122890588
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011402918
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011384104
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011413743
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011391746
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011390304
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011411592
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011385693
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011386374
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011344088
https://boinc.bakerlab.org/rosetta/result.php?resultid=1122877802
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011408682
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011387452
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011386983
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011396733
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011397432
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011391280
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1011411960