Topic 2 - Clustering and machine learning
In this part of the course, you will be introduced to molecular fingerprints and how these are used in similarity searches and clustering algorithms. Following this, you will be introduced to QSAR and learn to build a number of machine learning models in Google Colab.
Prepare yourself:
To get the most out of this lesson, it is highly recommended that you prepare by first watching the following videos:
- Machine learning and metrics [YouTube - 24’48”]
- Supervised and unsupervised learning [YouTube - 2’47”]
- Molecular strings and fingerprints [YouTube - 17’25”]
Then have a thorough read of these papers:
- Molecular fingerprint similarity search in virtual screening [pdf]
- Applications of machine learning in drug discovery and development [pdf]
Course manuals:
Slides:
Exercises:
To get acquainted with all the concepts that have been covered, work through the exercises in these Google Colab notebooks:
- Clustering and machine learning with RDKit [Google Colab]
- Extra exercises [Google Colab]
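As a warm-up for the exercises, here is a minimal sketch of the idea behind a fingerprint similarity search. It assumes each fingerprint is represented as a plain Python set of "on" bit positions and uses the Tanimoto coefficient (shared bits divided by total distinct bits) to rank molecules against a query. The molecule names and bit positions are made up for illustration; the Colab notebooks work with RDKit, which this sketch deliberately does not use.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit positions."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Two empty fingerprints share nothing; return 0.0 by convention here.
        return 0.0
    return len(fp_a & fp_b) / union


# Hypothetical toy fingerprints (not derived from real molecules).
fingerprints = {
    "mol_A": {1, 4, 7, 9},
    "mol_B": {1, 4, 7, 12},
    "mol_C": {2, 3, 15},
}

# Rank every other molecule by its similarity to the query, most similar first.
query_name = "mol_A"
query_fp = fingerprints[query_name]
ranked = sorted(
    (
        (name, tanimoto(query_fp, fp))
        for name, fp in fingerprints.items()
        if name != query_name
    ),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

The same pairwise scores, converted to distances as 1 minus the similarity, are the usual input to a clustering step.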
Finished?
For those who are interested in machine learning for drug discovery, this paper provides an excellent review of the current state of the art:
When finished, it is time to move to the third topic of this course: