ChemPile Dataset Paper Accepted at NeurIPS 2025
research-news
September 20, 2025
October 25, 2025
Great news! The paper “ChemPile: A 250 GB Diverse and Curated Dataset for Chemical Foundation Models,” co-authored by Bünyamin Şen and led by a research group in Germany, has been accepted at NeurIPS 2025.
The acceptance is significant because NeurIPS is one of the world’s most prestigious conferences in computer science and AI. The paper introduces ChemPile, an open dataset of curated chemical data (250 GB, 75B tokens) designed for training and evaluating the next generation of chemical foundation models.