Rosetta 4.1+ and 4.2+

Sid Celery

Joined: 11 Feb 08
Posts: 2125
Credit: 41,228,659
RAC: 8,784
Message 100328 - Posted: 7 Jan 2021, 10:09:42 UTC - in response to Message 100310.  

All MOF_ wus:
1316797097

<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @MOF_P4132_12res_testasym_c.33.6_0001_P_41_3_2_hit_GLU_GLU_1_4_2634_cell036_ncontact09_score-10.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3306046
Using database: database_357d5d93529_n_methylminirosetta_database


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00000250198BEF70

Glad you said that. I've been tweaking my new PC very slowly, had numerous errors at one tweak level, dialled it back and seem to have found a sweet spot, apart from two task errors, but both are MOF and show the same error as you

MOF_I213_12res_testasym_c.82.1_0001_I_21_3_hit_ASP_ASP_2_3_46_cell035_ncontact25_score-37_SAVE_ALL_OUT_1056212_310_0

MOF_P4132_12res_testasym_c.10.2_0001_P_41_3_2_hit_ASP_ASP_1_3_153_cell037_ncontact12_score000_SAVE_ALL_OUT_1055733_167_0

Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,228,659
RAC: 8,784
Message 100329 - Posted: 7 Jan 2021, 11:29:42 UTC - in response to Message 100328.  

All MOF_ wus:
1316797097
Reason: Access Violation (0xc0000005) at address 0x00000250198BEF70

Glad you said that. I've been tweaking my new PC very slowly, had numerous errors at one tweak level, dialled it back and seem to have found a sweet spot, apart from two task errors, but both are MOF and show the same error as you

MOF_I213_12res_testasym_c.82.1_0001_I_21_3_hit_ASP_ASP_2_3_46_cell035_ncontact25_score-37_SAVE_ALL_OUT_1056212_310_0

MOF_P4132_12res_testasym_c.10.2_0001_P_41_3_2_hit_ASP_ASP_1_3_153_cell037_ncontact12_score000_SAVE_ALL_OUT_1055733_167_0

Clarification: it's not all MOF tasks for me. I have two running now that are checkpointing fine and are currently more than 5 hours into their 8-hour run.

Kissagogo27
Joined: 31 Mar 20
Posts: 86
Credit: 2,919,932
RAC: 2,098
Message 100338 - Posted: 8 Jan 2021, 10:33:43 UTC

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1179951356

My computer has only 4 GB of RAM, and the MOF WU errored within a few seconds.

The wingman's computer has 128 GB of RAM and the MOF WU finished fine ^^
CPU type AuthenticAMD
AMD EPYC 7452 32-Core Processor [Family 23 Model 49 Stepping 0]
Memory 128766.04 MB




Name MOF_I213_12res_testasym_c.5.1_0001_I_21_3_hit_GLU_GLU_1_4_405_cell032_ncontact09_score-31_SAVE_ALL_OUT_1056124_380_1
Workunit 1179951356
Created 7 Jan 2021, 12:24:44 UTC
Sent 7 Jan 2021, 12:32:31 UTC
Report deadline 10 Jan 2021, 12:32:31 UTC
Received 8 Jan 2021, 6:35:42 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 4285019
Run time 4 hours 0 min 31 sec
CPU time 3 hours 59 min 47 sec
Validate state Valid
Credit 124.61
Device peak FLOPS 4.47 GFLOPS
Application version Rosetta v4.20
x86_64-pc-linux-gnu
Peak working set size 324.82 MB
Peak swap size 387.00 MB
Peak disk usage 24.78 MB


I don't believe these figures:
Peak working set size 324.82 MB
Peak swap size 387.00 MB
Peak disk usage 24.78 MB


And then some other anomalies ...


Rosetta 4.20 i686-pc-linux-gnu
Number of tasks completed 562
Max tasks per day 552
Number of tasks today 0
Consecutive valid tasks 3
Average processing rate 2.78 GFLOPS
Average turnaround time 3.08 days
Rosetta 4.20 x86_64-pc-linux-gnu
Number of tasks completed 58602
Max tasks per day 1126
Number of tasks today 158
Consecutive valid tasks 626
Average processing rate 2.78 GFLOPS
Average turnaround time 0.78 days


for an AMD EPYC 7452


Rosetta 4.20 windows_x86_64
Number of tasks completed 35
Max tasks per day 503
Number of tasks today 5
Consecutive valid tasks 4
Average processing rate 2.78 GFLOPS
Average turnaround time 0.73 days
Rosetta 4.21 windows_intelx86
Number of tasks completed 298
Max tasks per day 512
Number of tasks today 0
Consecutive valid tasks 12
Average processing rate 2.78 GFLOPS
Average turnaround time 1.26 days


for a Core 2 Duo E7600

same GFLOPS ... curious ...

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100341 - Posted: 8 Jan 2021, 14:13:03 UTC - in response to Message 100338.  

same GFLOPS ... curious ...
The 2.78 GFLOPS is the nominal rate at which the application performs work, not a measure of the performance of any individual machine running it. It’s the mechanism by which Rosetta fits its ‘fixed duration / variable work’ approach into BOINC’s original expectation of ‘fixed work / variable duration’. (Every task is declared as having 80 000 GFLOPs of work to perform, so with an application that is declared as achieving 2.78 GFLOPs per second, the initial run-time estimate becomes 8 hours.)
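
To make the arithmetic in that explanation concrete, here is a minimal Python sketch. It uses only the two figures quoted above (80 000 GFLOPs of declared work and a declared rate of 2.78 GFLOPs per second); the function name is made up for illustration, and the sketch assumes the initial estimate is simply work divided by rate, ignoring any per-host corrections the BOINC client may apply.

```python
# Figures quoted in the post above; everything else here is illustrative.
DECLARED_WORK_GFLOP = 80_000   # work declared per task, in GFLOPs
DECLARED_RATE_GFLOPS = 2.78    # declared application rate, in GFLOPs per second


def initial_estimate_hours(work_gflop: float, rate_gflops: float) -> float:
    """Return the initial run-time estimate in hours (work divided by rate)."""
    seconds = work_gflop / rate_gflops
    return seconds / 3600


if __name__ == "__main__":
    hours = initial_estimate_hours(DECLARED_WORK_GFLOP, DECLARED_RATE_GFLOPS)
    print(f"Initial estimate: {hours:.2f} hours")
```

Because both inputs are declared constants rather than measurements, every host gets the same starting estimate, which is why an EPYC and a Core 2 Duo show the same 2.78 GFLOPS.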

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100343 - Posted: 8 Jan 2021, 18:48:01 UTC - in response to Message 100338.  

My computer has only 4 GB of RAM, and the MOF WU errored within a few seconds.

The wingman's computer has 128 GB of RAM and the MOF WU finished fine ^^
I’m not sure memory is the key to it.
Here’s one which failed on both my machine and a 128 GB machine.
All the failures I’ve seen so far are on Windows. Some work units have failed on my machines but succeeded on Android or macOS.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1681
Credit: 17,854,150
RAC: 18,215
Message 100349 - Posted: 8 Jan 2021, 22:18:23 UTC - in response to Message 100343.  

All the failures I’ve seen so far are on Windows. Some work units have failed on my machines but succeeded on Android or macOS.
Add Linux applications to that.
I've got heaps of RAM on my Windows systems but Tasks that crashed and burnt on mine completed OK on some Linux systems (one or 2 also errored out on the Linux application, but most ran to completion OK).
Grant
Darwin NT

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 7,594
Message 100383 - Posted: 10 Jan 2021, 8:52:49 UTC - in response to Message 100349.  
Last modified: 10 Jan 2021, 8:54:59 UTC

I've got heaps of RAM on my Windows systems but Tasks that crashed and burnt on mine completed OK on some Linux systems (one or 2 also errored out on the Linux application, but most ran to completion OK).

Yep.
Seems to be a Windows app problem. More today:
1319285270
1319282688
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address.......


[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 7,594
Message 100401 - Posted: 14 Jan 2021, 6:30:10 UTC

Now a lot of errors on "hHH00001_dummy" WUs

-1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION

<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @hHH00001_dummy_0002_272_abinitio_flags_relax -in:file:native 00001.pdb -in:file:fullatom -in:file:s 00001.pdb -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3082674
Using database: database_357d5d93529_n_methylminirosetta_database


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0000000000000000

Engaging BOINC Windows Runtime Debugger...
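
The two numbers in this report, -1073741819 and 3221225477, are the same 32-bit value (0xC0000005) read as signed and unsigned respectively. A minimal Python sketch of that conversion follows; the helper names are invented for illustration and are not part of BOINC or Rosetta.

```python
import struct


def to_signed32(code: int) -> int:
    """Reinterpret the low 32 bits of `code` as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", code & 0xFFFFFFFF))[0]


def describe(code: int) -> str:
    """Show the unsigned, signed, and hexadecimal forms of an exit code."""
    unsigned = code & 0xFFFFFFFF
    return f"{unsigned} = {to_signed32(code)} = 0x{unsigned:08X}"


if __name__ == "__main__":
    # Exit code copied from the stderr output above.
    print(describe(3221225477))
```

Either form can be passed in, so the signed value from the task page and the unsigned value from the message tag map to the same line.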


Grant (SSSF)
Joined: 28 Mar 20
Posts: 1681
Credit: 17,854,150
RAC: 18,215
Message 100404 - Posted: 14 Jan 2021, 7:56:27 UTC - in response to Message 100401.  

Now a lot of errors on "hHH00001_dummy" WUs
I've had a couple of Tasks Validate, but that's about all so far. Errors outnumber Valids by a huge margin.
Grant
Darwin NT

Kissagogo27
Joined: 31 Mar 20
Posts: 86
Credit: 2,919,932
RAC: 2,098
Message 100406 - Posted: 14 Jan 2021, 10:23:30 UTC

Hi, a new sort of error for me with the "FF_gogogo_0_SAVE_ALL_OUT_IGNORE_THE_REST_6lp4ci5n_1056733_2_0" WU.

Stderr.txt

Lots of:
warning: filename too long--truncating.
[ m_m_m_JHR_b2_03207_n_full_17_000000231_0000100002_0000001_0_89_103_H_._JHR_b2_00801_n_full_17_0001_0001_0000200002_0000001_0_0001_full_85_100_H_._JHR_b2_00430_n_full_17_0000100004_0000031_0_0001_0001_0000200008_0000001_1_0001_7_14_H_._JHR_bd4_00177_nS_0022_00 ]




and

[ m_m_m_JHR_b2_03207_n_full_17_000000231_0000100002_0000001_0_89_103_H_._JHR_b2_00801_n_full_17_0001_0001_0000200002_0000001_0_0001_full_85_100_H_._JHR_b2_00430_n_full_17_0000100004_0000031_0_0001_0001_0000200008_0000001_1_0001_7_14_H_._JHR_bd4_00177_nS_0022_00 ]
checkdir warning: path too long; truncating
m_m_m_JHR_b2_03207_n_full_17_000000231_0000100002_0000001_0_89_103_H_._JHR_b2_00801_n_full_17_0001_0001_0000200002_0000001_0_0001_full_85_100_H_._JHR_b2_00430_n_full_17_0000100004_0000031_0_0001_0001_0000200008_0000001_1_0001_7_14_H_._JHR_bd4_00177_nS_0022_00
-> ./m_m_m_JHR_b2_03207_n_full_17_000000231_0000100002_0000001_0_89_103_H_._JHR_b2_00801_n_full_17_0001_0001_0000200002_0000001_0_0001_full_85_100_H_._JHR_b2_00430_n_full_17_0000100004_0000031_0_0001_0001_0000200008_0000001_1_0001_7_14_H_._JHR_bd4_00177_nS_0022_
error: cannot create ./m_m_m_JHR_b2_03207_n_full_17_000000231_0000100002_0000001_0_89_103_H_._JHR_b2_00801_n_full_17_0001_0001_0000200002_0000001_0_0001_full_85_100_H_._JHR_b2_00430_n_full_17_0000100004_0000031_0_0001_0001_0000200008_0000001_1_0001_7_14_H_._JHR_bd4_00177_nS_0022_
No such file or directory
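
The log above does not say which limit was hit, so the following is only a rough Python sketch of how one might check a name before extracting it. Both limits are assumptions rather than values taken from the log: 260 is the classic Win32 MAX_PATH, and 255 bytes is the per-component limit on many common filesystems. The function name and the sample path are hypothetical.

```python
from typing import List

# Assumed limits, not taken from the log above:
WINDOWS_MAX_PATH = 260  # classic Win32 MAX_PATH (includes the terminating NUL)
TYPICAL_NAME_MAX = 255  # per-component byte limit on many common filesystems


def path_problems(path: str) -> List[str]:
    """Return a list of human-readable length problems for `path`."""
    problems = []
    # A usable path must be shorter than MAX_PATH to leave room for the NUL.
    if len(path) >= WINDOWS_MAX_PATH:
        problems.append(
            f"full path is {len(path)} chars (limit {WINDOWS_MAX_PATH - 1})"
        )
    # Check each component separately, accepting either separator style.
    for part in path.replace("\\", "/").split("/"):
        size = len(part.encode("utf-8"))
        if size > TYPICAL_NAME_MAX:
            problems.append(
                f"component of {size} bytes exceeds {TYPICAL_NAME_MAX}"
            )
    return problems


if __name__ == "__main__":
    hypothetical = "slots/0/" + "m_m_m_JHR_" * 30  # made-up long name
    for problem in path_problems(hypothetical):
        print(problem)
```

A name that passes both checks can still fail on a particular system, since the real limits depend on the filesystem and on the tool doing the extraction.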

Bryn Mawr
Joined: 26 Dec 18
Posts: 393
Credit: 12,110,248
RAC: 4,484
Message 100407 - Posted: 14 Jan 2021, 11:16:03 UTC - in response to Message 100401.  

I’m going to sulk, why can’t I have some of these errors - I never get any errors! it’s not fair!

I WANT MY SHARE OF ERRORS!

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100409 - Posted: 14 Jan 2021, 14:39:40 UTC - in response to Message 100407.  
Last modified: 14 Jan 2021, 14:40:51 UTC

Come to the light side…
Use Windows…
Learn to love those 25 year-old path length limitations…
Enjoy the obscure bugs…
;-)

Bryn Mawr
Joined: 26 Dec 18
Posts: 393
Credit: 12,110,248
RAC: 4,484
Message 100411 - Posted: 14 Jan 2021, 15:25:53 UTC - in response to Message 100409.  

Come to the light side…
Use Windows…
Learn to love those 25 year-old path length limitations…
Enjoy the obscure bugs…
;-)


Ugh - I don’t want them THAT badly.

I was setting up a Win10 machine for my granddaughter to use for on-line schooling and it was not a pleasant experience.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 7,594
Message 100412 - Posted: 14 Jan 2021, 17:12:58 UTC - in response to Message 100407.  

I’m going to sulk, why can’t I have some of these errors - I never get any errors! it’s not fair!

If I'm not wrong, the native app is for Linux and they compile it for Windows afterwards.
Maybe this is the problem...

Bryn Mawr
Joined: 26 Dec 18
Posts: 393
Credit: 12,110,248
RAC: 4,484
Message 100413 - Posted: 14 Jan 2021, 21:58:11 UTC - in response to Message 100412.  

I’m going to sulk, why can’t I have some of these errors - I never get any errors! it’s not fair!

If I'm not wrong, the native app is for Linux and they compile it for Windows afterwards.
Maybe this is the problem...


Windows is the problem???

Yes, Windows is always a problem :-)


More seriously, that should not be a problem if they have a good testbed and apply it to both versions.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 7,594
Message 100415 - Posted: 15 Jan 2021, 13:56:11 UTC - in response to Message 100413.  
Last modified: 15 Jan 2021, 14:01:18 UTC

More seriously, that should not be a problem if they have a good testbed and apply it to both versions.

They said, in a recent publication, that about half of the Rosetta codebase could be removed:
By 2019, the RosettaCommons has grown to laboratories at 71 institutions worldwide, overseeing a project consisting of over 3 million lines of code with contributions from over 800 scientists.....we estimate that the codebase could be reduced by half without a significant loss of functionality.


Phil McCrum
Joined: 17 Apr 10
Posts: 3
Credit: 244,691
RAC: 0
Message 100440 - Posted: 19 Jan 2021, 17:17:23 UTC

I am using BOINC 7.16.11 (x64).
Windows 10 Home

I am attempting to run four Rosetta 4.20 jobs. All four of them are counting up. I've exited and restarted BOINC. I've suspended a couple to see if the remaining two would straighten out. They are still counting up. Do I need to abort them and/or am I just not able to run Rosetta 4.20 jobs?

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100442 - Posted: 19 Jan 2021, 18:28:29 UTC - in response to Message 100440.  

What do you mean by “counting up”? Are you saying the time in the ‘Remaining’ column is continually increasing even though the tasks are running? Note that the displayed remaining time and percentage progress are only rough estimates; all tasks should run for 8 hours of CPU time (give or take a few minutes).
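
The 8-hour figure lines up with the `-cpu_run_time 28800` argument in the task command lines quoted earlier in this thread, if that value is read as seconds (an assumption; the thread does not state the unit). A small Python sketch for pulling the value out of such a command line is below. The helper name is made up, and the sample command is a shortened copy of the one shown in the stderr output.

```python
import shlex
from typing import Optional

# Shortened copy of a command line from earlier in the thread.
SAMPLE_COMMAND = (
    "rosetta_4.20_windows_x86_64.exe -nstruct 10000 -cpu_run_time 28800 "
    "-boinc:max_nstruct 20000 -checkpoint_interval 120 "
    "-boinc::cpu_run_timeout 36000"
)


def flag_value(command: str, flag: str) -> Optional[str]:
    """Return the token that follows `flag` in `command`, or None if absent."""
    tokens = shlex.split(command)
    for index, token in enumerate(tokens[:-1]):
        if token == flag:
            return tokens[index + 1]
    return None


if __name__ == "__main__":
    raw = flag_value(SAMPLE_COMMAND, "-cpu_run_time")
    if raw is not None:
        # Assumes the value is in seconds.
        print(f"Target CPU time: {int(raw) / 3600:.1f} hours")
```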

Phil McCrum
Joined: 17 Apr 10
Posts: 3
Credit: 244,691
RAC: 0
Message 100454 - Posted: 20 Jan 2021, 14:38:58 UTC - in response to Message 100442.  

The tasks came in with an estimate of a little over 7 hours. I ran them for well over an hour and the estimate remaining was still a little over 7 hours. Am I just being too impatient?

Bryn Mawr
Joined: 26 Dec 18
Posts: 393
Credit: 12,110,248
RAC: 4,484
Message 100455 - Posted: 20 Jan 2021, 15:50:24 UTC - in response to Message 100454.  

The tasks came in with an estimate of a little over 7 hours. I ran them for well over an hour and the estimate remaining was still a little over 7 hours. Am I just being too impatient?


Yes, especially if Rosetta is a new project for this machine. It takes a few days for the system to settle down and get used to the environment before the estimated times can be relied on.
©2024 University of Washington
https://www.bakerlab.org