CV
General Information
| Full Name | Yu S. Huang/黄宇 |
| Languages | Chinese, English |
Experience
-
2022 - Senior Director of Bioinformatics
臻和 Genecast Biotechnology Corp Ltd, Shanghai/Beijing/Wuhan, China
- Define and execute long-term technical strategy for AI-driven precision oncology, aligned with corporate product pipelines and business goals.
- Lead the development of multimodal AI platforms integrating sequence, structure, and epigenomic data for non-invasive cancer detection.
- Built enterprise-grade AI computing infrastructure (K8s, PyTorch, distributed storage, high-speed interconnect) to support large-scale computing.
- Lead and mentor a high-performance team of algorithm scientists, bioinformaticians, and software engineers to deliver end-to-end solutions from in silico modeling to experimental validation.
- Manage cross-disciplinary teams and promote tight integration between computational models and experimental biology.
- Drive external scientific engagement, conference presentations, high-impact publications, IP strategy, and research-to-product translation.
- Developed an AI model for multi-cancer early detection (Nature Communications 2023).
- Optimize core bioinformatics algorithms using Deep/Machine/Statistical Learning techniques.
- AI models for MRD (Minimal Residual Disease) fixed-panel and WES custom-panel products.
- Teach Bayesian Statistics, Machine/Deep Learning, Julia/Rust Programming.
-
2015 - 2021 Professor/Principal Investigator
Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 中科院
- Led the establishment of an AI-driven drug discovery center and built a mature structure-based drug design and virtual screening system.
- Developed Fergie (VAE-based small molecule generation) and Deffini (structure-based virtual screening DNN) to enable structure-guided drug design at scale; designed one kinase inhibitor molecule that entered the PCC phase.
- Developed core algorithms for genomic variant calling, copy number analysis, and methylation sequencing to support early-stage innovative drug R&D.
- Directed national/provincial research projects, built academic-industry partnerships, and delivered high-impact publications.
- Built the AI computing infrastructure for CAS SIMM.
- Taught from Russell & Norvig, "Artificial Intelligence: A Modern Approach" (2020).
- Taught from Chris Bishop, "Pattern Recognition and Machine Learning" (2006).
- Taught Julia programming, Matrix Computations, Optimization.
-
2014-2015 Bioinformatics Scientist
Illumina Inc., San Diego, California, USA
- Developed algorithms and pipelines for high-throughput sequencing data analysis.
- Built MethylSeq analysis tool on Illumina BaseSpace for bisulfite sequencing data processing.
- Developed UFlow, a Directed-Acyclic-Graph workflow system that speeds up Illumina bioinformatics workflows by >50X.
- Wrote bioinformatics libraries in Go that speed up sequencing analysis by >100X.
Education
-
2010 PhD in Computational Biology and Bioinformatics
University of Southern California, Los Angeles, USA
-
2003 B.S. in Biology
Fudan University, Shanghai, China
Honors and Awards
-
2023 - Jiangsu Province Innovation and Entrepreneurship Talent Program (江苏省双创人才)
-
2016 - China Thousand-Talent Program
-
2015 - Hundred-Talent Program of Chinese Academy of Sciences
-
2003-2008 - Merit Award Fellowship, University of Southern California.
-
2002 - Third Award, Computer Programming Contest, Fudan University.
-
1999-2003 - People's Scholarship, Fudan University.
-
1998 - Third Award, National High School Mathematics Competition of China.
-
1996 - Third Award, Junior High School Physics Competition, Shanghai.
Expertise & Skills
-
AI & Deep Learning
- Transformers, VAE, Multimodal Fusion, Statistical Learning, Deep Neural Networks
-
AI for Protein Design
- Protein Language Models, Generative AI, Diffusion Models, Structure-aware Models, AlphaFold / Rosetta workflows
-
AI Drug Discovery
- Virtual Screening, De novo Design, Structure-Based Drug Design
-
AI Infra & HPC
- PyTorch, TensorFlow, K8s, Kubeflow, Lustre FS, InfiniBand, Open MPI
-
Programming
- Rust, Python, C++, Julia, Go, R, Java, SQL
Open Source Projects
-
2017 Accucopy
- A computational method that infers allele-specific copy number alterations from low-coverage, low-purity tumor sequencing data.
-
2021 eGADA
- Enhanced GADA: a fast segmentation algorithm based on Sparse Bayesian Learning (also known as the Relevance Vector Machine). It can be applied to array intensity data, NGS sequencing data, or any sequential data that behaves like a stepwise function. Enhancements include: 1) a customized Red-Black tree that significantly speeds up the final backward elimination step; 2) an implementation in C++, which is better structured than C; 3) an exported eGADA.so module that provides a Python API.
Hobbies
- Reading, surfing, snowboarding, swimming