Scalable advanced sampling in molecular dynamics using standalone tools for data mining
As a computer scientist, the candidate will tackle longstanding problems in molecular dynamics, i.e., sampling efficiency and scalability.
The position is offered in the group of Prof. Amedeo Caflisch (biochemistry department at the University of Zurich).
Keywords: Molecular dynamics, advanced sampling, data mining, computer science, GPU computing, parallel programming, algorithm development, software engineering, scalability, master thesis, collaboration, semester project, internship, preliminary position for PhD application.
By integrating suitable equations of motion, molecular dynamics (MD) simulations have proven extremely useful in providing insights into atomistic details of biological processes.
Recently, the focus of MD software development has shifted from scientific algorithms with broad functionality to hardware and software engineering aimed at computing efficiency. This has led to a standardization of protocols. Unfortunately, the resulting equilibrium simulations suffer from sampling redundancy. Furthermore, MD codes offer limited scalability on general-purpose hardware, mainly because the computational load per integration step is light.
We are looking to strengthen our team that works toward overcoming these limitations by implementing a fully scalable version of an advanced sampling method we have recently developed, the so-called Progress Index Guided Sampling, or PIGS. The planned implementation will take advantage of modern architectures featuring accelerators.
PIGS evolves many replicas of the same system in parallel. The protocol uses two in-house data mining algorithms to identify the more interesting points reached at a given time. In a process called reseeding, simulations evolving in overlapping domains of phase space are killed and restarted from more interesting points. In this scheme, the evolution of a single replica is deployed to a single compute node. Node-local acceleration by GPUs will provide a sufficient number of steps per replica per second. The reseedings allow for a remarkable increase in the exploration rate of phase space with respect to conventional sampling. While communication between nodes is infrequent, the two data mining algorithms have not yet been parallelized, and this limits scalability to hundreds of nodes. Without GPU support, performance in terms of aggregated steps per second is also a limiting factor. Solving both of these issues is the main activity proposed in this project.
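As a rough illustration of the reseeding cycle described above, the following Python sketch evolves several replicas of a toy system and periodically restarts the most redundant replicas from the most isolated ones. It is not the PIGS implementation: a 1D random walk stands in for an MD segment, a nearest-neighbour distance stands in for the two data mining algorithms, and all function names and parameters are hypothetical.

```python
import random


def run_segment(x, steps, rng, step_size=0.1):
    """Evolve one replica. ASSUMPTION: a 1D random walk replaces real MD."""
    for _ in range(steps):
        x += rng.gauss(0.0, step_size)
    return x


def rank_by_isolation(positions):
    """Return replica indices sorted from most isolated to most redundant.

    ASSUMPTION: the distance to the nearest other replica is a toy
    substitute for the data mining step that scores "interesting" points.
    """
    scores = []
    for i, xi in enumerate(positions):
        nearest = min(abs(xi - xj) for j, xj in enumerate(positions) if j != i)
        scores.append(nearest)
    return sorted(range(len(positions)), key=lambda i: scores[i], reverse=True)


def reseed(positions, n_reseed):
    """Restart the n_reseed most redundant replicas from the most isolated ones."""
    if 2 * n_reseed > len(positions):
        raise ValueError("n_reseed must be at most half the number of replicas")
    order = rank_by_isolation(positions)
    sources = order[:n_reseed]      # most isolated replicas
    targets = order[-n_reseed:]     # most redundant replicas, to be restarted
    new_positions = list(positions)
    for src, dst in zip(sources, targets):
        new_positions[dst] = positions[src]
    return new_positions


def pigs_like(n_replicas=8, cycles=20, steps=50, n_reseed=2, seed=0):
    """Alternate independent propagation with a global reseeding step."""
    rng = random.Random(seed)
    positions = [0.0] * n_replicas
    for _ in range(cycles):
        # Independent per replica: in the scheme above, one replica per node.
        positions = [run_segment(x, steps, rng) for x in positions]
        # Needs all replicas at once: the infrequent communication step.
        positions = reseed(positions, n_reseed)
    return positions


if __name__ == "__main__":
    print(pigs_like())
```

In this sketch the propagation of each replica is independent, while the ranking step needs data from all replicas; in the real protocol that ranking corresponds to the data mining algorithms whose parallelization is a goal of the project.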
The candidate will contribute to the following goals:
**1**. Develop, implement, and test parallel implementations of the two data mining techniques that are used in PIGS: a tree-based clustering algorithm and an algorithm that arranges and annotates time series data to reveal metastable states.
**2**. Establish an existing MD platform to develop a fully scalable implementation of the PIGS protocol on HPC resources equipped with accelerators. Extend and/or modify such a platform to either incorporate the parallel versions of the algorithms developed in **1** or to provide GPU support.
**Desirable skills**
- Computer scientist with interest in algorithms.
- Experience in parallel programming (OpenMP and MPI), GPU computing, and software engineering.
- Working knowledge of Fortran.
**Additional skills**
- Familiarity with the numerical implementation of molecular dynamics simulations.
- Acquaintance with the MD software CHARMM, http://www.charmm.org.
Prof. Amedeo Caflisch, caflisch@bioc.uzh.ch, http://www.biochem-caflisch.uzh.ch
Dr. Andreas Vitalis, a.vitalis@bioc.uzh.ch
**Further information**
The goals are part of a funded project in the PASC program, http://www.pasc-ch.org/projects/projects/scalable-advanced-sampling-in-molecular-dynamics. If applicable, financial support may be provided to the candidate in compliance with Swiss regulations. We propose CHARMM as the platform to implement the scalable version of PIGS as it already offers GPU support. If the candidate has experience in GPU computing, we may add GPU support to CAMPARI http://campari.sourceforge.net instead (PIGS is currently implemented in CAMPARI).
Successful outcomes will be published in international journals. In agreement with the candidate, an extension of the project can be discussed, _e.g._ through an application for a PhD position, as we are already using PIGS on many systems with high biological impact, such as those related to Alzheimer's disease and cancer.