Curriculum Vitae aka CV aka Resume
Egor Marin
Machine Learning Scientist @ ENPICOM B.V., working with protein language models, bioinformatics, diffusion, you name it; @marinegor on most platforms.
PDF version of this CV is here
Profile
I have a formal education in applied mathematics and physics (BSc + MSc), one year of full-time extracurricular education in computer science, a PhD in biophysics and structural biology, and 8 years of computational lab experience.
I enjoy writing code and building complex systems, especially ones that run many times and therefore need to be designed and written carefully. I know a lot about (computational) biology, mostly at the molecular level (structural biology, protein biochemistry). I can also communicate well with people, including mentoring and leading small teams, which is what my experience covers so far.
Career
I have spent roughly 8 years in science, working with membrane proteins and their structure-function relationships: GPCRs, (microbial) rhodopsins, membrane transporters, P450 enzymes, antibodies and nanobodies.
I am currently employed at ENPICOM B.V.; before that, I worked at the University of Groningen and the Moscow Institute of Physics and Technology. I have also worked at many synchrotrons and XFELs, and was a visiting research assistant at the University of Southern California.
2024-current: Machine Learning Scientist
Doing machine learning in a biotech-oriented SaaS company.
- full-cycle ML model development: from literature survey and data collection to reproducible training and deployment
- working with both generative and predictive models for various tasks in the antibody development field
2017-2023: Scientist
- conducted research, managed data, wrote publications, participated in conferences
- supervised students (BSc & MSc diploma), created a course on modern protein crystallography
2016-2017: Scientific Journalist
- analyzed the publication activity of MIPT
- wrote press releases on published papers
- communicated with scientists & media
Software skills & activities
Bag of words: python, numpy/sklearn/pytorch/lightning, polars🫶, huggingface🤗 datasets and tokenizers, uv/ruff❤️🔥, pytest, docker/docker-compose, bash, mlflow, Ubuntu/nixos, HPC/SLURM/dask, prefect/modal/airflow.
🤓 MDAnalysis Core Developer since February 2025. For MDAnalysis, wrote a parallel backend for all analysis classes (dask/multiprocessing), added a DSSP module for native secondary structure assignment, currently working on an mmCIF parser.
🧑💻 contributed to open source: reciprocalspaceship: wrote a parser for serial crystallography data into a binary dataframe-like class; ntfy-cryosparc: wrote a web server to parse CryoSPARC (tm) notifications and notify the appropriate users.
😎 participated in Google Summer of Code contributing to MDAnalysis: introduced process-based parallelization to the library using dask or multiprocessing (see main PR).
🏆 participated in data science competitions (top-10% in Kaggle “Predict Molecular Properties”, top-1 in the first round of “Learning How To Smell”, top-10% in Takeda Signate competition, 5th place in Tochka Bank graph ML competition).
💾 administered ~15 Linux workstations and servers with around 40 users, managing around 200 TB of research data.
🍝 performed large-scale calculations on SLURM and PBS, wrote bash scripts and pipelines for reliable and reproducible processing of serial crystallography data.
🤷♂️ self-host a bunch of things: *arr, Telegram bots, WebDAV, proxy & VPN servers, paperless, openwebui, you name it
🦀 wrote a Python (PyO3) + Rust (pest) parser for crystallographic data, contributed to polars-distance
Science skills & activities
Bag of words: structural biology, crystallography, cryoEM, cheminformatics, computer vision, data science, molecular docking, drug discovery, protein structure, GPCRs, membrane proteins, structure-based drug discovery, antibodies, protein language models, discrete diffusion, flow matching, AlphaOpenfold
🧬+🥩 structural biology: co-published papers in Science, Nature Communications, JACS, Science Advances, Journal of Chemical Information and Modeling, Scientific Data. Performed data analysis, wrote texts, created figures, managed the writing process: the normal stuff.
💊 structure-based drug discovery: performed a large-scale virtual screening campaign, created a robust accelerated virtual screening approach, communicated with CROs, oversaw functional tests.
🤖 machine learning: built many ad-hoc ML applications in computer vision (background removal with NMF decomposition), clustering, supervised learning. Always try linear regression first, and have a paper about it.
👾 deep learning: familiar with protein language models and their properties, AlphaOpenfold-like models and their applications; wrote toy discrete diffusion models, adapted open-source discrete diffusion models for other domains.
Education
University of Groningen, 2019-2023
PhD, thesis on “On the methods of studying protein-ligand interaction dynamics”
Computer Science Center, 2020-2022
Full-time extracurricular education in computer science: Python, C++, Algorithms and Data Structures, Data Science, Intro to Linux Systems, Rust
Moscow Institute of Physics and Technology, 2017-2019
MSc in applied mathematics and physics, summa cum laude, with specialization in biophysics and structural biology
Moscow Institute of Physics and Technology, 2013-2017
BSc in applied mathematics and physics, magna cum laude
Last updated: September 2025.