HUBioDataLab

ChemPile Dataset Paper Accepted at NeurIPS 2025

Great news! The paper “ChemPile: A 250 GB Diverse and Curated Dataset for Chemical Foundation Models,” led by a research group in Germany and co-authored by Bünyamin Şen, has been accepted at NeurIPS 2025.

This is a significant milestone, as NeurIPS is one of the world’s most prestigious conferences in machine learning and artificial intelligence. The paper introduces ChemPile, a large open dataset of curated chemical data (250 GB, 75B tokens) designed to train and evaluate the next generation of chemical foundation models.
