Machine Learning Intern (Network Analysis & Biology)

We’re launching a cutting-edge machine learning project to leverage gene networks in drug target discovery. Together, we’ll develop supervised and unsupervised methods that use genomic data to predict the effects of perturbations on cellular functions.

Your project will be supervised by data scientists, but the role requires mastery of the inner workings of diverse models and the ability to diagnose biases, creatively benchmark unsupervised models, handle data limitations, and work with biologists to interpret results. To make rapid progress and avoid reinventing the wheel, you will need to search academic literature and codebases for methods and implementations, even outside your domain of expertise. The project will roughly follow a standard data science lifecycle: ideation/planning, implementation, validation, presentation, iteration, and deployment. Strong coding and documentation practices will be key to communicating with computational team members. An excellent candidate will be able to present to a diverse audience and translate feedback from biologists into tractable computational problems.

Hours and duration negotiable


Growth Opportunities:

  • Career transition – Move from boring data science to the fascinating world of biological data science
  • Advanced methods – Learn cutting-edge graph data science and generative deep learning
  • Deep ML intuition – Level up your data intuition, unsupervised modeling, and model interpretability skills
  • Ownership and autonomy – You’ll be encouraged to draft project plans and participate in decision making
  • Communication – Learn to present and translate findings to non-computational stakeholders

Key Responsibilities include but are not limited to:

  • Develop and validate supervised and unsupervised models
  • Connect with biologists to interrogate models and interpret results
  • Explore computer science literature to identify useful methods, caveats, data, and codebases
  • Quickly learn the basics of molecular biology and other domains if not already familiar

Required Qualifications (in order of importance):

  • Python data stack – warm welcome for pandas and numpy wizards
  • Machine learning – strong understanding of supervised and unsupervised methods and their caveats; ability to explore model interpretability and feature relationships
  • Python – strong scripting skills; OOP not required if code is neat and modular

Preferred Qualifications (in order of importance):

  • Graphs – experience analyzing graph data
  • Data visualization – plotly, dash, seaborn, etc.
  • Deep learning – ability to peek under the hood and tweak graph convolutional VAEs, losses, etc.
  • Computational optimization – strong Leetcode-like skills to optimize topological algorithms
  • RNA data – familiarity with the statistical properties and limitations of RNA-seq and scRNA-seq
  • Biology – basic understanding of the central dogma of biology and of different molecule types (e.g., transcription factors)
  • Immunology – T cell biology, autoimmunity
  • R – ability to wrangle the occasional R package that comes along, work with other teammates in R, or build Shiny modules
