Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Quote: VirtualBox comes in two major versions, vbox and vbox64. The Python tasks use only the newer of these, vbox64. Since vbox emulates a 32-bit instruction set and vbox64 emulates a 64-bit instruction set, they are not interchangeable.
From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now? |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
Quote: [snip] If some of you can identify specific emulated CPU instructions for which emulation fails and shuts down the emulation, you might give the details to Oracle and see if they will fix at least part of the problem, even if Rosetta@Home won't help.
Quote: From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now?
Time to ask Oracle to produce more meaningful error messages if any of the missing instructions are not present when vbox64 runs. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
It sounds like you know more about the way Oracle works than me, particularly whether Oracle or the program decides what instructions are available. Perhaps you should contact them? I would have thought Oracle just passes the available instruction set to the Python program, but maybe not.
Quote: Time to ask Oracle to produce more meaningful error messages if any of the missing instructions are not present when vbox64 runs.
Quote: If some of you can identify specific emulated CPU instructions for which emulation fails and shuts down the emulation, you might give the details to Oracle and see if they will fix at least part of the problem, even if Rosetta@Home won't help.
Quote: From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now? |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
Quote: [snip] It sounds like you know more about the way Oracle works than me, particularly whether Oracle or the program decides what instructions are available. Perhaps you should contact them? I would have thought Oracle just passes the available instruction set to the Python program, but maybe not.
Quote: Time to ask Oracle to produce more meaningful error messages if any of the missing instructions are not present when vbox64 runs.
One thing many of us might send them is a request that when the VM unmanageable error is given, vbox64 should give more details on why.
Quote: From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now?
I tried contacting Oracle. They made it rather difficult. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Quote: I tried contacting Oracle. They made it rather difficult.
Well we know Rosetta is impossible to speak to. Trouble is, are we sure who is to blame here? Does Oracle have a feature missing, or is Rosetta programmed badly? If you want to contact Oracle, there seem to be many ways to do so, here: https://www.virtualbox.org/wiki/Community |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Quote: I tried contacting Oracle. They made it rather difficult.
Quote: Well we know Rosetta is impossible to speak to. Trouble is, are we sure who is to blame here? Does Oracle have a feature missing, or is Rosetta programmed badly?
Why should Oracle care about a little problem with a specific program that does not affect thousands or tens of thousands of users of its product? That is probably why they ran you off. It's like me contacting a cold-wear testing lab about a specific product they tested and showed data for, but only for 2 out of 12 zones, and neither of those zones is critical to the more important areas that get cold the fastest. I am only an individual contacting a company that tests for million-dollar industrial footwear companies. My request got round-filed or back-burnered. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
A failing Python task.
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1317287356
The section of the vbox_trace.txt file that looks relevant:
2022-03-28 14:55:36 (14760): Command: VBoxManage -q showvminfo "boinc_35d83054a4475009" --machinereadable
Exit Code: -2135228415
Output:
VBoxManage.exe: error: Could not find a registered machine named 'boinc_35d83054a4475009'
VBoxManage.exe: error: Details: code VBOX_E_OBJECT_NOT_FOUND (0x80bb0001), component VirtualBoxWrap, interface IVirtualBox, callee IUnknown
VBoxManage.exe: error: Context: "FindMachine(Bstr(VMNameOrUuid).raw(), machine.asOutParam())" at line 2621 of file VBoxManageInfo.cpp
2022-03-28 14:55:36 (14760): Command: VBoxManage -q showhdinfo "C:\ProgramData\BOINC\slots\10/vm_image.vdi"
Exit Code: 0
Output:
UUID: ef35dff9-d482-48f8-9519-fef6c1b23a3b
Parent UUID: base
State: created
Type: normal (base)
Location: C:\ProgramData\BOINC\slots\10\vm_image.vdi
Storage format: VDI
Format variant: dynamic default
Capacity: 8192 MBytes
Size on disk: 7115 MBytes
Encryption: disabled
Elapsed time MUCH greater than simulated CPU time. I aborted it. |
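The negative exit code in that trace appears to be the same status as the VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) on the Details line, just printed as a signed 32-bit integer: -2135228415 + 2^32 = 2159738881 = 0x80BB0001. A minimal Python sketch of that conversion (the function name is made up for illustration):

```python
def exit_code_to_hex(exit_code: int) -> str:
    """Reinterpret a (possibly negative) 32-bit exit code as unsigned hex, HRESULT style."""
    # Masking with 0xFFFFFFFF keeps the low 32 bits, which turns a negative
    # two's-complement value into its unsigned equivalent.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    # Exit code copied from the showvminfo line of the vbox_trace.txt excerpt.
    print(exit_code_to_hex(-2135228415))  # 0x80bb0001
    # The showhdinfo call exited with 0.
    print(exit_code_to_hex(0))            # 0x00000000
```

Decoding the number only confirms that the two lines report the same error; it says nothing about why the task then ran so much longer than its CPU time.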
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,513,695 RAC: 9,561 |
Quote: VirtualBox comes in two major versions, vbox and vbox64. The Python tasks use only the newer of these, vbox64. Since vbox emulates a 32-bit instruction set and vbox64 emulates a 64-bit instruction set, they are not interchangeable.
Quote: From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now?
I think you guys have found the issue here. My machines that don't work are a 1st gen Nehalem Xeon (AVX was introduced in 2nd gen Sandy Bridge) and Pentiums which have AVX/AVX2 disabled. I'm not sure about F16C or FMA yet. It looks like the Intel MKL doesn't require AVX, but if VirtualBox is telling it that it's available when it's not, then it's going to crash.
CPUs that don't work:
https://www.cpu-world.com/CPUs/Xeon/Intel-Xeon%20L5640%20-%20AT80614005133AB%20(BX80614L5640).html
https://www.cpu-world.com/CPUs/Pentium_Dual-Core/Intel-Pentium%20G3220.html
https://www.cpu-world.com/CPUs/Pentium_Dual-Core/Intel-Pentium%20G4500.html |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Quote: I think you guys have found the issue here. My machines that don't work are a 1st gen Nehalem Xeon (AVX was introduced in 2nd gen Sandy Bridge) and Pentiums which have AVX/AVX2 disabled. I'm not sure about F16C or FMA yet.
If it's a case of "VirtualBox is telling it that it's available when it's not", then perhaps we ought to speak to VirtualBox? |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,513,695 RAC: 9,561 |
It might be VirtualBox, but might it also just be that the script is set up to assume AVX (or whichever extension is missing) is available without checking? |
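One way a script could check instead of assuming, sketched in Python under the assumption that it runs inside an x86 Linux guest: read /proc/cpuinfo, which inside a VM lists the flags the hypervisor exposes to the guest, and report which of the four suspect extensions are absent. The function names are hypothetical, and this is not a claim about how the Rosetta@home Python tasks actually behave.

```python
from pathlib import Path

# The four extensions suspected in this thread, spelled the way Linux lists them.
SUSPECT_FLAGS = ("avx", "avx2", "f16c", "fma")


def parse_cpuinfo_flags(cpuinfo_text: str) -> set[str]:
    """Collect every feature flag from the 'flags' lines of x86 /proc/cpuinfo text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the plain 'flags' key; lines such as 'vmx flags' or 'bugs' are skipped.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_flags(flags: set[str], required: tuple[str, ...] = SUSPECT_FLAGS) -> list[str]:
    """Return the required flags that are not present, in the order given."""
    return [flag for flag in required if flag not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if not cpuinfo.exists():
        print("/proc/cpuinfo not found; this sketch only handles Linux.")
    else:
        absent = missing_flags(parse_cpuinfo_flags(cpuinfo.read_text()))
        if absent:
            # A real task would need a fallback code path (or a clean exit) here.
            print("Not exposed to this (virtual) CPU:", ", ".join(absent))
        else:
            print("All four suspect extensions are exposed.")
```

Splitting each flags line into whole words matters: a substring test would wrongly treat an AMD "fma4" flag as "fma".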
Sid Celery Joined: 11 Feb 08 Posts: 2117 Credit: 41,147,428 RAC: 16,343 |
A new batch of Rosetta 4.20 tasks are out atm, named YIL10mer_YILstub*
I'm getting a lot of computation errors here: "Unhandled exception" errors all over the place after just a few seconds. After a couple of attempts, I appear to have all 4 cores running tasks right now, but it's been a struggle. Beware. |
MStenholm Joined: 18 Apr 20 Posts: 18 Credit: 25,821,080 RAC: 16,471 |
Quote: A new batch of Rosetta 4.20 tasks are out atm, named YIL10mer_YILstub*
It seems to be another batch that prefers Linux, as I can see from my team members. I've got two sets of 16 tasks running on Linux, about one hour in, and as far as I can see the Windows ones error out fast, seconds in. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Quote: A new batch of Rosetta 4.20 tasks are out atm, named YIL10mer_YILstub*
Errors after a few seconds don't bother me. Cosmology at home wasting the whole task time before deciding to crash is annoying. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,375 RAC: 7,553 |
Some errors on VirtualBox WUs: <message> |
.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
A bit late to the party with this, but here are some CPU specs:
Runs Python OK, with only `normal` zombies:
Processor: 16 AuthenticAMD AMD Opteron(TM) Processor 6276 [Family 21 Model 1 Stepping 2]
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm avx svm sse4a osvw ibs xop skinit wdt lwp fma4 topx page1gb rdtscp
OS: Microsoft Windows 7: Ultimate x64 Edition, Service Pack 1, (06.01.7600.00)
Also runs Python OK, with only `normal` zombies:
Processor: 48 GenuineIntel Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz [Family 6 Model 62 Stepping 4]
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes f16c rdrand syscall nx lm avx vmx smx tm2 dca pbe fsgsbase smep
OS: Microsoft Windows 7: Ultimate x64 Edition, Service Pack 1, (06.01.7600.00)
Will not run VB tasks for Rosetta@home or Cosmology@home; everything craps out after a few seconds:
Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz [Family 6 Model 23 Stepping 7]
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx smx tm2 pbe
Those are the only three systems I have infected with Virtual pox. |
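Lining the three posted feature lists up against the four suspect extensions can be done mechanically. A small Python sketch, where the host labels and function names are only illustrative and the feature strings are the ones posted (with `rdrand syscall` treated as two separate flags):

```python
# The four extensions suspected in this thread.
SUSPECTS = ("avx", "avx2", "f16c", "fma")

# "Processor features" strings as posted for the three hosts.
HOSTS = {
    "Opteron 6276": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm avx svm sse4a osvw ibs xop skinit wdt lwp fma4 topx page1gb rdtscp",
    "Xeon E5-2697 v2": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes f16c rdrand syscall nx lm avx vmx smx tm2 dca pbe fsgsbase smep",
    "Core 2 Quad Q9450": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx smx tm2 pbe",
}
WORKING = ("Opteron 6276", "Xeon E5-2697 v2")
FAILING = ("Core 2 Quad Q9450",)


def feature_set(features: str) -> set[str]:
    # Whole-word split, so "fma4" on the Opteron does not count as "fma".
    return set(features.split())


def consistent_suspects(hosts: dict[str, str], working, failing, suspects=SUSPECTS) -> list[str]:
    """Suspects present on every working host and absent from every failing host."""
    sets = {name: feature_set(text) for name, text in hosts.items()}
    return [
        s for s in suspects
        if all(s in sets[w] for w in working) and not any(s in sets[f] for f in failing)
    ]


if __name__ == "__main__":
    for name, text in HOSTS.items():
        have = feature_set(text)
        print(f"{name}: missing {[s for s in SUSPECTS if s not in have]}")
    print("Consistent with the failures:", consistent_suspects(HOSTS, WORKING, FAILING))
```

On these three hosts only avx fits the pattern (the Opteron lacks avx2, f16c and fma; the Xeon lacks avx2 and fma), but three machines narrow the list rather than prove the cause.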
.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Quote: From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now?
From this short list, the only one that I see is avx, and I have no idea why VB + Pythons + Cosmology would need it. So, is it a simple matter of the Rosetta admin blocking all work to systems that don't have avx... or removing its requirement, if possible? Hmm... I'll go back under my rock now :-) Well, actually, it's an old metal bin lid, like on `The Clangers` planet. OK, I admit to having three `Clangers` DVDs. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Cosmology doesn't need it. I can run Cosmology on all 7 of my machines, most are missing AVX. The only thing that annoys Cosmology is VB 6. VB 5 is ok.
Quote: From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now? |
.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Quote: Cosmology doesn't need it. I can run Cosmology on all 7 of my machines, most are missing AVX. The only thing that annoys Cosmology is VB 6. VB 5 is ok.
Had a look: the Q9450 is on BOINC 7.16.20, so it's got VB 6.1.2. I will finish all work and revert/uninstall/nuke back to BOINC 7.14.2, which uses VB 5.2.8, to see what happens. I have got versions of BOINC mangler back to 5.10.13. Oh! That's 45 altogether in Win/Lin, 32/64, VB or not. Sad case... Just in case. But sometimes they come in useful. |
Sid Celery Joined: 11 Feb 08 Posts: 2117 Credit: 41,147,428 RAC: 16,343 |
Quote: A new batch of Rosetta 4.20 tasks are out atm, named YIL10mer_YILstub*
I haven't been reporting anything recently, but I will send another message pointing out this Linux/Windows issue, because it's turned up in several separate batches of work now. One-off little issues I don't bother with, but this seems systemic to me. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
BOINC version and VB version are not linked. Just install the older VB from the Oracle site; it will install on top of a newer one. Be sure to get the correct extensions along with it. I've not found any project that needs 6.
Quote: Cosmology doesn't need it. I can run Cosmology on all 7 of my machines, most are missing AVX. The only thing that annoys Cosmology is VB 6. VB 5 is ok.
If you change BOINC version, you could break other things like SSL, and you won't be able to contact some projects. |
©2024 University of Washington
https://www.bakerlab.org