Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Any more data on Python running on different CPUs? I only have this so far: https://www.dropbox.com/scl/fi/8gp41r6sh7ffkqupvglbp/Rosetta-Python-CPU-instruction-set.xlsx?dl=0&rlkey=4ubjc4jqyng1o9ivqyckl8hek |
JohnDK Joined: 6 Apr 20 Posts: 33 Credit: 2,390,240 RAC: 0 |
Problem: WUs often pause with the VM unmanageable error, no matter whether I run 9 or only 5 WUs at a time. Processor: 32 AuthenticAMD AMD Ryzen 9 5950X 16-Core Processor [Family 25 Model 33 Stepping 0] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Quote: "Problem: WUs often pause with the VM unmanageable error, no matter whether I run 9 or only 5 WUs at a time." It is a long-standing problem, much discussed here (you can search for it). I have seen it mainly on Linux. If you use Windows with VirtualBox 5.2.44 (not 6.1.x), you will not have the problem. But there is another problem, tasks that use very little CPU and run forever (the "0 CPU" problem), which is common to both operating systems. Just abort them as soon as you find them. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Linux hosts seem to report far more processor features than Windows hosts. Not sure what's going on here. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Quote: "Problem: WUs often pause with the VM unmanageable error, no matter whether I run 9 or only 5 WUs at a time." The biggest problem is computers which cannot run them AT ALL. EVERY SINGLE ONE uses no CPU time. I think this occurs on older CPUs and is a hardware incompatibility. I'm trying to work out which CPUs it happens on, and therefore which instruction a CPU needs for the tasks to run OK. Then perhaps Oracle can look into it. |
zxcvbob Joined: 4 Jan 06 Posts: 8 Credit: 830,878 RAC: 0 |
The 32-bit machine finally started getting work. That's the main point of this post. Another 64-bit machine w/o vbox (older CPU but has a graphics card) wasn't getting anything so I signed it up for SiDock@Home and that is crunching away. |
JohnDK Joined: 6 Apr 20 Posts: 33 Credit: 2,390,240 RAC: 0 |
Quote: "Problem: WUs often pause with the VM unmanageable error, no matter whether I run 9 or only 5 WUs at a time." Yes, I'm running Linux on that host. On my Windows host I've had no problem, even with VirtualBox 6.1. On the Linux host I did get some of those 0% CPU WUs and aborted them; again, I don't think I've had a single one on my Windows host. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Isn't the code written in Linux or other machine languages and then adapted to Windows? |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Quote: "Isn't the code written in Linux or other machine languages and then adapted to Windows?" Probably so, but there are different versions of Linux, so they use VirtualBox to let the tasks run on any Linux distribution, or on Windows. Chances are they put more effort into the Windows version, since that is what most people use. They should recompile the wrapper so it runs properly on Ubuntu, but they haven't put in the effort yet. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Quote: "Isn't the code written in Linux or other machine languages and then adapted to Windows?" They don't have any real tech people anymore. No one to write new code. You know how resistant they are to change and to updating their programs. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Quote: "Isn't the code written in Linux or other machine languages and then adapted to Windows?" Linux incompatible with Linux. [rolls eyes in disbelief] I'll stick to Windows... |
Jean-David Beyer Joined: 2 Nov 05 Posts: 187 Credit: 6,370,872 RAC: 5,700 |
Quote: "Isn't the code written in Linux or other machine languages and then adapted to Windows?" My main machine runs Red Hat Enterprise Linux release 8.5 (Ootpa). I was running these: rosetta_4.20_x86_64-pc-linux. I just completed five of those work units on my Linux machine. Each had failed previously on machines running Windows. So I guess the extra effort they put into the Windows version has not paid off. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Quote: "I just completed five of those work units on my Linux machine. Each had failed previously -- by machines running Windows." The "VM job unmanageable" ones don't fail; they just suspend for 24 hours or until you reboot. It is the "0 CPU" ones that fail. As I said, that happens on both operating systems. |
Bryn Mawr Joined: 26 Dec 18 Posts: 389 Credit: 12,073,013 RAC: 8,289 |
Quote: "I just completed five of those work units on my Linux machine. Each had failed previously -- by machines running Windows." But he specified that they were 4.20 tasks, not VM ones. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Watch out for aaam-mVAL_pp-NMPHE-ACBC-AMACBEN2_2_2537474_3 type tasks. The first user timed out, and I aborted mine: 7 hours on 4 hours CPU. It starts OK: 022-03-25 20:00:56 (16268): Status Report: Elapsed Time: '6000.105049' 2022-03-25 20:00:56 (16268): Status Report: CPU Time: '6012.031250' Then: 022-03-25 21:41:41 (16268): Status Report: Elapsed Time: '12000.204857' 2022-03-25 21:41:41 (16268): Status Report: CPU Time: '6789.093750' And: 2022-03-25 23:23:03 (16268): Status Report: Elapsed Time: '18000.497971' 2022-03-25 23:23:03 (16268): Status Report: CPU Time: '6859.546875' And: 2022-03-26 01:05:12 (16268): Status Report: Elapsed Time: '24001.304815' 2022-03-26 01:05:12 (16268): Status Report: CPU Time: '6925.156250' <aborted> |
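The status reports in that log show CPU time going almost flat while elapsed time keeps climbing: between the second and fourth reports the task gained only about 136 CPU seconds in about 12,000 wall-clock seconds, roughly 1% of one core. A minimal Python sketch of that check follows. It assumes the "Status Report" lines look exactly as quoted above, and the 0.1 threshold in `looks_stalled` is an arbitrary, hypothetical cut-off, not something BOINC or Rosetta defines.

```python
import re

# Matches the "Status Report" lines quoted above, e.g.
#   ... Status Report: Elapsed Time: '6000.105049'
#   ... Status Report: CPU Time: '6012.031250'
# The timestamp prefix is ignored, so a truncated date does not matter.
REPORT_RE = re.compile(r"Status Report: (Elapsed|CPU) Time: '([0-9.]+)'")


def parse_reports(log_text):
    """Pair each Elapsed report with the CPU report that follows it."""
    pairs = []
    elapsed = None
    for kind, value in REPORT_RE.findall(log_text):
        if kind == "Elapsed":
            elapsed = float(value)
        elif elapsed is not None:
            pairs.append((elapsed, float(value)))
            elapsed = None
    return pairs


def interval_efficiency(pairs):
    """CPU seconds gained per wall-clock second between successive reports."""
    ratios = []
    for (e0, c0), (e1, c1) in zip(pairs, pairs[1:]):
        if e1 > e0:  # skip duplicated or out-of-order reports
            ratios.append((c1 - c0) / (e1 - e0))
    return ratios


def looks_stalled(pairs, threshold=0.1):
    """Hypothetical heuristic: latest interval used less than `threshold` of one core."""
    ratios = interval_efficiency(pairs)
    return bool(ratios) and ratios[-1] < threshold


if __name__ == "__main__":
    # Two of the reports from the post, copied verbatim (including the truncated "022").
    sample = (
        "022-03-25 21:41:41 (16268): Status Report: Elapsed Time: '12000.204857' "
        "2022-03-25 21:41:41 (16268): Status Report: CPU Time: '6789.093750' "
        "2022-03-26 01:05:12 (16268): Status Report: Elapsed Time: '24001.304815' "
        "2022-03-26 01:05:12 (16268): Status Report: CPU Time: '6925.156250'"
    )
    pairs = parse_reports(sample)
    print(interval_efficiency(pairs), looks_stalled(pairs))
```

Looking at the change between reports, rather than total CPU over total elapsed, matters here: this task did real work for its first 6,000 seconds, so the cumulative ratio hides the later stall for a long time.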
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Quote: "But he specified that they were 4.20 tasks, not vm." OK, I was referring to the VirtualBox (Python) ones. They are the only ones that get the "VM job unmanageable" errors. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Quote: "Isn't the code written in Linux or other machine languages and then adapted to Windows?" There are no programmers in Rosetta that have a clue. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Quote: "I just completed five of those work units on my Linux machine. Each had failed previously -- by machines running Windows." You don't have to reboot, just restart the BOINC client. And I'm sure there must be a command we can send to the client using boinccmd that would make it try again. I have asked in the main BOINC forum... |
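BOINC's command-line tool `boinccmd` has a `--task <project_url> <task_name> <op>` option whose operations are `suspend`, `resume` and `abort`. Whether a suspend followed by a resume clears the back-off after a "VM job unmanageable" exit is untested here; the only workaround reported in this thread is restarting the client. The Python sketch below just wraps that option. The project URL and task name in the usage example are placeholders, and by default it only builds the commands without running them.

```python
import subprocess

# Operations accepted by `boinccmd --task <project_url> <task_name> <op>`.
TASK_OPS = ("suspend", "resume", "abort")


def task_command(project_url, task_name, op):
    """Build the argv for one `boinccmd --task` call."""
    if op not in TASK_OPS:
        raise ValueError(f"unsupported op {op!r}; expected one of {TASK_OPS}")
    return ["boinccmd", "--task", project_url, task_name, op]


def bounce_task(project_url, task_name, dry_run=True):
    """Suspend, then resume, one task.

    Untested assumption: this may or may not clear the back-off that follows a
    "VM job unmanageable" exit. With dry_run=True nothing is executed; the
    commands are only returned for inspection.
    """
    cmds = [task_command(project_url, task_name, op) for op in ("suspend", "resume")]
    if not dry_run:
        for cmd in cmds:
            # check=True raises CalledProcessError if boinccmd exits non-zero.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # Hypothetical values: substitute the project URL and task name that
    # `boinccmd --get_tasks` lists on your own host.
    for cmd in bounce_task("https://boinc.bakerlab.org/rosetta/", "EXAMPLE_TASK_NAME"):
        print(" ".join(cmd))
```

Passing the arguments as a list rather than a shell string keeps task names containing unusual characters from being reinterpreted by a shell.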
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
I noticed that the server had disabled Python tasks for my computer. After I enabled them, 21 of them completed, and all of them validated. Six more are still running. The computer info that looks relevant: 3/27/2022 8:37:31 PM | | Starting BOINC client version 7.16.20 for windows_x86_64 3/27/2022 8:37:31 PM | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz [Family 6 Model 158 Stepping 10] 3/27/2022 8:37:31 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2 3/27/2022 8:37:31 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.19044.00) 3/27/2022 8:37:31 PM | | VirtualBox version: 6.0.14 Maybe this will help determine which CPU features are needed to run the Python tasks correctly. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Thanks. My conclusion so far is that one or more of these instructions is causing the problem, since they're present on all the working hosts but on none of the failing ones: avx, avx2, f16c, fma. These are quite likely culprits, since TN-Grid, for example, makes good use of AVX and FMA if present and sends a different program if your CPU doesn't have them. Here's the spreadsheet so far, with the offending instructions in bold. https://www.dropbox.com/scl/fi/8gp41r6sh7ffkqupvglbp/Rosetta-Python-CPU-instruction-set.xlsx?dl=0&rlkey=4ubjc4jqyng1o9ivqyckl8hek Not sure where we go from here. If the program requires an AVX-capable machine, I doubt Rosetta will be willing to remove that requirement, since AVX is only missing on older machines. |
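To check a host against that shortlist, a minimal Python sketch can read the "Processor features:" line that the BOINC client prints at startup and report which of the four suspect instructions are missing. The four-flag set comes straight from the post; treating them as the actual requirement is still only the working hypothesis above. Both the Linux list (5950X) and the Windows list (i7-8700) quoted in this thread contain all four names, so the longer Linux list does not affect the check.

```python
# The four instructions that, according to the spreadsheet so far, are present on
# every host that runs the Python tasks and absent on every host that fails them.
SUSPECT_FLAGS = frozenset({"avx", "avx2", "f16c", "fma"})


def parse_features(line):
    """Return the flag set from a BOINC 'Processor features:' line (or a bare flag list)."""
    _, sep, rest = line.partition("Processor features:")
    return set((rest if sep else line).split())


def missing_suspects(features):
    """Sorted list of suspect instructions the CPU does not report."""
    return sorted(SUSPECT_FLAGS - features)


if __name__ == "__main__":
    hosts = {
        # Excerpt of the i7-8700 line quoted in this thread.
        "i7-8700": "Processor features: ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c avx avx2",
        # Hypothetical older CPU without AVX, for comparison.
        "older-cpu": "fpu mmx sse sse2 ssse3 sse4_1 sse4_2 popcnt",
    }
    for name, line in hosts.items():
        print(name, missing_suspects(parse_features(line)) or "all suspects present")
```

On Linux, the `flags` line from `/proc/cpuinfo` can be passed to `parse_features` as a bare flag list; the extra `flags` and `:` tokens it contains do not affect the check.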
©2024 University of Washington
https://www.bakerlab.org