Machine Learning Intern (Network Analysis & Biology)

We’re launching a cutting-edge machine learning project to leverage gene networks in drug target discovery. Together, we’ll be developing supervised and unsupervised methods to use genomic data to predict the effects of perturbations on cellular functions.

Your project will be supervised by data scientists, but the role requires mastery of the inner workings of diverse models and the ability to diagnose biases, creatively benchmark unsupervised models, handle data limitations, and work with biologists to interpret results. The ability to search academic literature and codebases, even outside your domain of expertise, for existing methods and implementations, rather than reinventing the wheel, will be critical to rapid progress. The project will roughly follow a standard data science lifecycle: ideation/planning, implementation, validation, presentation, iteration, and deployment. Strong coding and documentation practices will be key to communicating with computational team members. An excellent candidate will be able to present to a diverse audience and translate feedback from biologists into tractable computational problems.

Hours and duration negotiable

Growth Opportunities:

Career transition – Move from boring data science to the fascinating world of biological data science
Advanced methods – Learn cutting-edge graph data science and generative deep learning
Deep ML intuition – Level up your data intuition, unsupervised modeling, and model interpretability
Ownership and autonomy – You’ll be encouraged to draft project plans and participate in decision making
Communication – Learn to present and translate findings to non-computational stakeholders

Key Responsibilities include but are not limited to:

Develop and validate supervised and unsupervised models
Connect with biologists to interrogate models and interpret results
Explore computer science literature to identify useful methods, caveats, data, and codebases
Quickly learn the basics of molecular biology and other domains if not already familiar

Required Qualifications (in order of importance):

Python data stack – warm welcome for pandas and numpy wizards
Machine learning – strong understanding of supervised and unsupervised methods and their caveats; ability to explore model interpretability and feature relationships
Python – strong scripting skills; OOP not required if code is neat and modular

Preferred Qualifications (in order of importance):

Graphs – experience analyzing graph data
Data visualization – plotly, dash, seaborn, etc.
Deep learning – ability to peek under the hood and tweak graph convolutional VAEs, losses, etc.
Computational optimization – strong Leetcode-like skills to optimize topological algorithms
RNA data – familiarity with the statistical properties and limitations of RNAseq and scRNAseq
Biology – basic understanding of the central dogma of biology and of different molecule types (e.g., transcription factors)
Immunology – T cell biology, autoimmunity
R – ability to wrangle the occasional R package that comes along, work with other teammates in R, or build Shiny modules
