AlphaFold reveals the structure of protein universe

Message boards : Rosetta@home Science : AlphaFold reveals the structure of protein universe

[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1994
Credit: 9,633,537
RAC: 7,232
Message 106684 - Posted: 1 Aug 2022, 6:59:34 UTC

200 million proteins

In partnership with EMBL’s European Bioinformatics Institute (EMBL-EBI), we’re now releasing predicted structures for nearly all catalogued proteins known to science, which will expand the AlphaFold DB by over 200x - from nearly 1 million structures to over 200 million structures - with the potential to dramatically increase our understanding of biology.

Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 106685 - Posted: 1 Aug 2022, 9:34:18 UTC - in response to Message 106684.  

Remarkable stuff that will change the world. However, it does not appear that we can contribute to this work directly, since it is mainly done in-house.

But they do mention mining:
Structural search tools like Foldseek and Dali are allowing users to very quickly search for entries similar to a given protein. This could be a first step toward mining large sequence datasets for practically useful proteins, such as those that break down plastic, and it could provide clues about protein function. The update of the database to include over 200 million predicted structures will further amplify this impact.

I really have a very limited idea of how that works, but the LODA project seeks to discover new mining algorithms:
https://boinc.loda-lang.org/loda/

I have done a little of it, and think I will do more.
Stevie G

Joined: 15 Dec 18
Posts: 107
Credit: 838,868
RAC: 920
Message 106687 - Posted: 1 Aug 2022, 14:00:47 UTC - in response to Message 106685.  

"....mining large sequence datasets for practically useful proteins, such as those that break down plastic..."

We need to reduce microplastics, but don't want to produce Ice Nine. :>))

S. Gaber
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1994
Credit: 9,633,537
RAC: 7,232
Message 106690 - Posted: 1 Aug 2022, 14:58:57 UTC - in response to Message 106685.  
Last modified: 1 Aug 2022, 15:00:27 UTC

See my post about FoldSeek

P.S.
A BOINC project with Foldseek would be great!
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 106691 - Posted: 1 Aug 2022, 15:15:45 UTC - in response to Message 106690.  

Yes, it always depends on the size of the data that they have to shuffle around, and the latency.
We are ready when they are.
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1994
Credit: 9,633,537
RAC: 7,232
Message 106694 - Posted: 2 Aug 2022, 6:50:50 UTC - in response to Message 106691.  

The Foldseek database is a 700 GB download, 950 GB extracted.
There is a lot of work to do :-P
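
For a rough sense of why the data size matters for a volunteer project, here is a back-of-the-envelope sketch. The 700 GB and 950 GB figures are the ones quoted in this thread; the 100 Mbit/s link speed and the function name `transfer_hours` are illustrative assumptions, not measurements, and decimal gigabytes are assumed.

```python
# Sizes quoted in this thread (decimal gigabytes assumed).
DOWNLOAD_GB = 700
EXTRACTED_GB = 950


def transfer_hours(size_gb: float, link_mbit_per_s: float) -> float:
    """Hours needed to move size_gb over a link, ignoring protocol overhead."""
    size_mbit = size_gb * 1000 * 8  # GB -> MB -> Mbit
    return size_mbit / link_mbit_per_s / 3600


if __name__ == "__main__":
    # 100 Mbit/s is an assumed home connection speed, not a measured one.
    hours = transfer_hours(DOWNLOAD_GB, link_mbit_per_s=100)
    print(f"Full download at 100 Mbit/s: about {hours:.1f} hours")
    print(f"Extraction grows the data by {EXTRACTED_GB / DOWNLOAD_GB:.2f}x")
```

Under those assumptions a single full copy would take most of a day to fetch, so a BOINC-style project would presumably have to send each volunteer only a slice of the database rather than the whole thing.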



