For datasets associated with these projects, see Datasets.
Machine learning for materials and molecules
genheas — Generate High Entropy Alloys Structures
A neural evolution structures (NES) generation methodology combining artificial neural networks with evolutionary algorithms for inverse design of high entropy alloy (HEA) configurations. Trains on small unit cells using pair distribution functions and atomic properties, then generates much larger structures (40,000+ atoms) approximately 1000x faster than conventional Special Quasirandom Structures (SQS) methods. A single trained model can produce multiple distinct structures while maintaining identical fractional compositions.
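The core idea of searching for a fixed-composition structure that matches target pair statistics can be shown with a much simpler stand-in. The sketch below is not NES. It drops the neural network entirely and evolves site occupations directly on a hypothetical 1D periodic chain. Swap mutations keep the composition fixed, and the fitness is the squared deviation of nearest-neighbour pair fractions from ideal random-alloy values. The chain model, the fitness, and all function names are assumptions for illustration.

```python
import itertools
import random
from collections import Counter

# Hypothetical toy model: a 1D periodic chain instead of a 3D lattice, no neural network.

def pair_fractions(chain):
    """Fraction of each unordered nearest-neighbour species pair on a periodic chain."""
    n = len(chain)
    counts = Counter(tuple(sorted((chain[i], chain[(i + 1) % n]))) for i in range(n))
    return {pair: c / n for pair, c in counts.items()}

def random_alloy_targets(composition):
    """Uncorrelated pair fractions: x_a**2 for like pairs, 2*x_a*x_b for unlike pairs."""
    total = sum(composition.values())
    x = {s: c / total for s, c in composition.items()}
    return {
        (a, b): x[a] * x[b] * (1 if a == b else 2)
        for a, b in itertools.combinations_with_replacement(sorted(x), 2)
    }

def fitness(chain, targets):
    """Lower is better: squared deviation from the target pair fractions."""
    observed = pair_fractions(chain)
    return sum((observed.get(p, 0.0) - t) ** 2 for p, t in targets.items())

def evolve(composition, generations=200, population=20, seed=0):
    rng = random.Random(seed)
    base = [s for s, c in composition.items() for _ in range(c)]
    targets = random_alloy_targets(composition)
    pop = [rng.sample(base, len(base)) for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=lambda ch: fitness(ch, targets))
        survivors = pop[: population // 2]  # elitism: keep the best half
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.sample(range(len(child)), 2)
            child[i], child[j] = child[j], child[i]  # a swap never changes the composition
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda ch: fitness(ch, targets))

composition = {"Co": 8, "Cr": 8, "Fe": 8, "Mn": 8, "Ni": 8}
best = evolve(composition)
print(fitness(best, random_alloy_targets(composition)))
```

Running `evolve` with different seeds gives different chains with the same fractional composition. This loosely mirrors the property described above, though here it comes from the search itself and not from a trained model.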
Extensive deep neural networks (EDNN)
A physically motivated neural network topology designed for problems requiring extensivity (additivity across sub-domains). EDNNs use domain decomposition where each sub-domain (tile) has a non-overlapping focus region surrounded by an overlapping context region, enabling O(N) scaling. They can train on small structures and predict properties of arbitrarily large systems, which makes them well suited to massively parallel computing. EDNNs have been applied to replace or augment Kohn-Sham density functional theory, predicting total energies, band gaps, and charge densities.
- Paper: K. Mills, K. Ryczko, I. Luchak, A. Domurad, C. Beeler, I. Tamblyn, Extensive deep neural networks for transferring small scale learning to large scale systems, Chemical Science 10, 4129 (2019). DOI
- Language: Python (TensorFlow and PyTorch)
- Code:
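The tiling scheme can be illustrated with a hand-written local function standing in for a trained network. In the sketch below, a periodic 2D Ising configuration is split into non-overlapping f×f focus regions. Each region is padded with a context border of width c taken from its periodic neighbours, the same per-tile function is applied to every tile, and the outputs are summed. Because the stand-in "network" is local (it counts the nearest-neighbour bonds owned by each focus site), the sum reproduces the global energy exactly, which is the extensivity property EDNNs are built around. The Ising energy target, the function names, and the defaults f=4, c=1 are assumptions for illustration.

```python
import numpy as np

def extract_tiles(grid, focus, context):
    """Yield (focus + 2*context)-wide tiles whose focus regions tile the periodic grid."""
    padded = np.pad(grid, context, mode="wrap")  # periodic context from neighbouring tiles
    n_rows, n_cols = grid.shape
    for i in range(0, n_rows, focus):
        for j in range(0, n_cols, focus):
            yield padded[i : i + focus + 2 * context, j : j + focus + 2 * context]

def tile_model(tile, focus, context):
    """Stand-in for the shared per-tile network: energy of bonds owned by focus sites.

    Each focus site owns its bond to the right and its bond downward, so every
    bond of the periodic lattice is counted exactly once across all tiles.
    """
    f, c = focus, context
    core = tile[c : c + f, c : c + f]
    right = tile[c : c + f, c + 1 : c + f + 1]
    down = tile[c + 1 : c + f + 1, c : c + f]
    return -np.sum(core * right) - np.sum(core * down)

def ednn_predict(grid, focus=4, context=1):
    """Extensive prediction: sum of identical per-tile outputs (grid size must divide by focus)."""
    return sum(tile_model(t, focus, context) for t in extract_tiles(grid, focus, context))

def exact_energy(grid):
    """Reference nearest-neighbour Ising energy with J = 1 and periodic boundaries."""
    return -np.sum(grid * np.roll(grid, -1, axis=1)) - np.sum(grid * np.roll(grid, -1, axis=0))

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(16, 16))
print(ednn_predict(spins), exact_energy(spins))
```

The same `tile_model` runs unchanged on a 16×16 or a 1024×1024 grid, and each tile is independent of the others. That independence is what makes the decomposition easy to distribute across processors.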
RUGAN — Regressive Upscaling Generative Adversarial Network
A generative machine learning method that learns to produce synthetic configurations which are numerically accurate and statistically indistinguishable from real data. RUGAN's fully convolutional architecture allows it to generate arbitrarily large structures after training only on small-scale data, and it can produce configurations at requested energy values while respecting periodic boundary conditions. It has been applied to generating mesoscale surfaces from small-scale chemical motifs and to optical lattice experiments at unobserved conditions and scales.
- Papers: K. Mills, C. Casert, I. Tamblyn, Adversarial generation of mesoscale surfaces from small scale chemical motifs, Journal of Physical Chemistry C 124(42), 23158-23163 (2020). DOI
C. Casert, K. Mills, T. Vieijra, J. Ryckebusch, I. Tamblyn, Optical lattice experiments at unobserved conditions and scales through generative adversarial deep learning, Physical Review Research 3, 033267 (2021). DOI
- Language: Python (PyTorch)
- Code: GitHub
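Two of the properties above, size independence and periodic boundary conditions, follow from building the generator only from convolutions with circular (wrap-around) padding. The NumPy sketch below uses one fixed random 3×3 kernel and a tanh as a hypothetical stand-in for learned layers. It is not RUGAN's architecture or weights. It shows that the same kernel works on any grid size and that the output keeps the input's shape. It also shows that shifting the input periodically shifts the output by the same amount.

```python
import numpy as np

def circular_conv2d(image, kernel):
    """2D cross-correlation with periodic boundaries; the output shape equals the input shape."""
    kh, kw = kernel.shape
    out = np.zeros_like(image, dtype=float)
    for di in range(kh):
        for dj in range(kw):
            # np.roll brings the neighbour at offset (di - kh//2, dj - kw//2) onto each site
            shifted = np.roll(image, shift=(kh // 2 - di, kw // 2 - dj), axis=(0, 1))
            out += kernel[di, dj] * shifted
    return out

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))  # hypothetical stand-in for learned weights

def toy_generator(noise):
    """One convolution plus a pointwise nonlinearity: valid for any input size."""
    return np.tanh(circular_conv2d(noise, kernel))

small = toy_generator(rng.normal(size=(8, 8)))
large = toy_generator(rng.normal(size=(256, 256)))
print(small.shape, large.shape)
```

A real generator stacks many such layers with learned kernels, but the argument still holds: no layer depends on the grid size, so training and generation sizes can differ.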
Watch and Learn — Transferable Learning with Physical Laws
An unsupervised learning approach augmented with physical principles that achieves fully transferable learning for problems in statistical physics. By coupling a recurrent neural network (RNN) to an extensive deep neural network (EDNN), the method — called Distribution-Consistent Learning (DCL) — learns equilibrium probability distributions and inter-particle interaction models from a single set of observations, then extrapolates across all temperatures, thermodynamic phases, and length scales. Works for Ising, Potts, and spin-glass models without feature engineering.
- Paper: K. Sprague, J. Carrasquilla, S. Whitelam, I. Tamblyn, Watch and learn — a generalized approach for transferrable learning in deep neural networks via physical principles, Machine Learning: Science and Technology 2(2), 02LT02 (2021). DOI
- Language: Python (Jupyter Notebooks). GPL-3.0 license.
- Code: GitHub
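A few lines of code show part of why learning the interaction model enables transfer across temperatures. Once an energy function E(s) is known, the Boltzmann weight p_T(s) ∝ exp(-E(s)/T) gives the equilibrium distribution at any temperature T. The sketch below enumerates every state of a small periodic 1D Ising ring (J = 1, k_B = 1) and computes the exact average |magnetization| at several temperatures. It substitutes the known Ising energy for a learned EDNN energy and does not reproduce the DCL training procedure. The function names and parameters are assumptions.

```python
import itertools
import math

def ising_energy(spins, coupling=1.0):
    """Nearest-neighbour Ising energy on a periodic ring (stand-in for a learned energy model)."""
    n = len(spins)
    return -coupling * sum(spins[i] * spins[(i + 1) % n] for i in range(n))

def boltzmann_average(observable, n_spins, temperature, energy=ising_energy):
    """Exact thermal average of `observable` by enumerating all 2**n_spins states."""
    weighted_sum = 0.0
    partition = 0.0
    for state in itertools.product((-1, 1), repeat=n_spins):
        weight = math.exp(-energy(state) / temperature)
        partition += weight
        weighted_sum += weight * observable(state)
    return weighted_sum / partition

def abs_magnetization(state):
    return abs(sum(state)) / len(state)

# One energy model, evaluated at temperatures it was never "fit" to
for t in (0.5, 1.0, 2.0, 5.0):
    print(t, round(boltzmann_average(abs_magnetization, 10, t), 3))
```

Exact enumeration only works for tiny systems. The point of pairing a generative model with an extensive energy model is to reach large systems where enumeration is impossible.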
Spectroscopy and imaging
Hyperspectral Stimulated Raman Microscopy (SRS2021)
Unsupervised and supervised deep neural network models for enhancing hyperspectral stimulated Raman spectroscopy (SRS) microscopy images. Performs denoising and segmentation via one-shot deep learning — removing noise and identifying chemically distinct regions within a sample to produce chemical maps, all without requiring labelled training data. Demonstrated on lithium ore samples.
- Paper: P. Abdolghader, G. Resch, A. Ridsdale, T. Grammatikopoulos, F. Légaré, A. Stolow, A.F. Pegoraro, I. Tamblyn, Unsupervised Hyperspectral Stimulated Raman Microscopy Image Enhancement: De-Noising and Segmentation via One-Shot Deep Learning, Optics Express 29(21), 34205-34219 (2021). DOI
- Language: Python (Jupyter Notebooks)
- Code: GitHub
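"Segmentation into chemically distinct regions" can be made concrete with a small example. A hyperspectral image is a cube of shape (height, width, n_wavenumbers), and segmentation assigns the spectrum at each pixel to a class. The sketch below uses plain k-means on pixel spectra with farthest-point initialisation. This classical unsupervised technique is swapped in for illustration. It is not the one-shot deep learning method from the paper, and the synthetic two-material data and function names are assumptions.

```python
import numpy as np

def kmeans_segment(cube, n_classes, n_iter=50, seed=0):
    """Cluster the spectrum at each pixel; returns a (height, width) label map."""
    h, w, n_bands = cube.shape
    spectra = cube.reshape(-1, n_bands)
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: start at a random pixel, then add the most distant spectra
    centers = [spectra[rng.integers(len(spectra))]]
    for _ in range(1, n_classes):
        d = np.min([np.linalg.norm(spectra - c, axis=1) for c in centers], axis=0)
        centers.append(spectra[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Distance from every spectrum to every centre, shape (n_pixels, n_classes)
        dists = np.linalg.norm(spectra[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_classes):
            members = spectra[labels == k]
            if len(members):  # keep the old centre if a cluster empties
                centers[k] = members.mean(axis=0)
    return labels.reshape(h, w)

# Hypothetical synthetic sample: two materials with Raman peaks at different positions
bands = np.linspace(0.0, 1.0, 64)
peak_a = np.exp(-(((bands - 0.3) / 0.05) ** 2))
peak_b = np.exp(-(((bands - 0.7) / 0.05) ** 2))
cube = np.empty((20, 20, 64))
cube[:, :10] = peak_a
cube[:, 10:] = peak_b
cube += np.random.default_rng(1).normal(scale=0.2, size=cube.shape)  # measurement noise

labels = kmeans_segment(cube, n_classes=2)
print(labels[0])
```

The label map serves as the chemical map. Each cluster centre is an averaged, lower-noise spectrum that can be compared against reference spectra.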
Deep learning and high harmonic generation (HHG)
Deep neural networks applied to high harmonic generation, the process by which intense laser light interacting with atoms and molecules produces radiation at many multiples of the driving laser frequency. Models include CNNs, autoencoders, regression models, and surrogate models for forward prediction (molecular parameters to spectra), inverse problem solving (spectra to molecular parameters), transfer learning, and molecular classification. The code comes with 10 Jupyter notebooks demonstrating these applications.
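The forward direction (from laser and target parameters to spectral features) has a textbook physics counterpart. The semiclassical three-step model gives the cutoff law E_max ≈ I_p + 3.17 U_p, with ponderomotive energy U_p[eV] ≈ 9.33 × 10⁻¹⁴ · I[W/cm²] · λ²[μm²]. For a multi-cycle pulse in an isotropic gas, only odd harmonics appear. The sketch below lists the odd harmonic orders below the cutoff. It is a standard estimate, not one of the repository's neural network models, and the function names and the argon example parameters are assumptions.

```python
PHOTON_EV_UM = 1.23984  # photon energy [eV] multiplied by wavelength [um]

def ponderomotive_energy_ev(intensity_w_cm2, wavelength_um):
    """U_p in eV for a linearly polarized driving field."""
    return 9.33e-14 * intensity_w_cm2 * wavelength_um ** 2

def cutoff_energy_ev(ionization_potential_ev, intensity_w_cm2, wavelength_um):
    """Semiclassical three-step-model cutoff: I_p + 3.17 U_p."""
    return ionization_potential_ev + 3.17 * ponderomotive_energy_ev(intensity_w_cm2, wavelength_um)

def odd_harmonics_below_cutoff(ionization_potential_ev, intensity_w_cm2, wavelength_um):
    """Odd multiples of the driving photon energy that lie below the cutoff."""
    photon_ev = PHOTON_EV_UM / wavelength_um
    e_max = cutoff_energy_ev(ionization_potential_ev, intensity_w_cm2, wavelength_um)
    return list(range(1, int(e_max / photon_ev) + 1, 2))

# Example: argon (I_p ~ 15.76 eV) driven at 800 nm and 1e14 W/cm^2
orders = odd_harmonics_below_cutoff(15.76, 1e14, 0.8)
print(orders[-1], round(cutoff_energy_ev(15.76, 1e14, 0.8), 1))
```

A surrogate model replaces a relation like this with a network trained on full simulated spectra. That network can then capture molecule-specific structure that the simple cutoff law ignores.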
Leucippus
A plugin for the popular ImageJ framework that extracts positional and compositional information from nanoscale microscopy data using computer vision. Named after the ancient Greek philosopher credited with originating the theory of atomism.
- Language: Java (ImageJ plugin)
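A common first step in extracting positions from such images is locating intensity maxima, for example atom columns. The NumPy sketch below marks pixels that exceed a threshold and are strictly brighter than all 8 neighbours. It is an illustrative assumption written in Python, not Leucippus's Java implementation or its actual algorithm, and the synthetic image and function name are hypothetical.

```python
import numpy as np

def find_peaks_2d(image, threshold):
    """Return (row, col) of pixels above threshold that are strictly brighter than all 8 neighbours."""
    image = np.asarray(image, dtype=float)
    padded = np.pad(image, 1, mode="constant", constant_values=-np.inf)  # edges never block a peak
    h, w = image.shape
    is_peak = image > threshold
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            neighbour = padded[1 + di : 1 + di + h, 1 + dj : 1 + dj + w]
            is_peak &= image > neighbour
    return [tuple(int(v) for v in p) for p in np.argwhere(is_peak)]

# Hypothetical test image: four Gaussian "atoms" at known positions
yy, xx = np.mgrid[0:40, 0:40]
true_positions = [(10, 10), (10, 30), (30, 10), (30, 30)]
image = sum(np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / 8.0) for r, c in true_positions)
print(find_peaks_2d(image, threshold=0.5))
```

Real microscopy data usually needs smoothing before peak finding and sub-pixel refinement afterward, for example by fitting a 2D Gaussian around each detected maximum.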
Reinforcement learning environments
ChemGymRL — Reinforcement Learning for Digital Chemistry
An open-source framework providing multiple customizable virtual chemistry laboratory benches where RL agents learn to perform chemical synthesis and material discovery tasks. Built on the Gymnasium API for compatibility with standard RL libraries and algorithms (e.g., DQN, PPO). Includes reaction, extraction, distillation, and characterization benches, plus a Lab Manager for orchestrating multi-agent workflows.
- Paper: C. Beeler, S.G. Subramanian, K. Sprague, N. Chatti, C. Bellinger, M. Shahen, N. Paquin, M. Baula, A. Dawit, Z. Yang, X. Li, M. Crowley, I. Tamblyn, ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry, Digital Discovery (2024). DOI
- Language: Python (Gymnasium API)
- Website: ChemGymRL.com | Docs: docs.chemgymrl.com | Code: GitHub
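Because the benches follow the Gymnasium API, agents interact with them through the standard reset/step loop. The sketch below is a hypothetical, self-contained toy "reaction bench" in plain Python. It only mirrors Gymnasium's return conventions: `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`. It is not a ChemGymRL environment, and its dynamics, reward, and names are invented for illustration.

```python
import random

class ToyReactionBench:
    """Hypothetical bench: convert reactant A into product B by choosing to heat or wait.

    Mirrors Gymnasium's reset/step return conventions without depending on gymnasium.
    """

    def __init__(self, max_steps=20):
        self.max_steps = max_steps

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.amount_a, self.amount_b, self.steps = 1.0, 0.0, 0
        return (self.amount_a, self.amount_b), {}

    def step(self, action):
        """action 1 = heat (fast conversion), action 0 = wait (slow conversion)."""
        rate = 0.3 if action == 1 else 0.05
        converted = rate * self.amount_a * self.rng.uniform(0.8, 1.2)
        self.amount_a -= converted
        self.amount_b += converted
        self.steps += 1
        reward = converted - (0.01 if action == 1 else 0.0)  # small invented cost for heating
        terminated = self.amount_b > 0.95  # goal reached
        truncated = self.steps >= self.max_steps  # time limit hit
        return (self.amount_a, self.amount_b), reward, terminated, truncated, {}

env = ToyReactionBench()
obs, info = env.reset(seed=0)
total_reward, done = 0.0, False
while not done:
    obs, reward, terminated, truncated, info = env.step(1)  # fixed "always heat" policy
    total_reward += reward
    done = terminated or truncated
print(env.steps, round(obs[1], 3), round(total_reward, 3))
```

Because this loop shape matches Gymnasium's, off-the-shelf DQN or PPO implementations can be pointed at environments that follow the same API.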
Network simulation
HASHKAT — Agent-Based Social Network Simulation
A free, open-source, agent-based kinetic Monte Carlo simulation engine for modeling the growth of and information propagation within large-scale online social networks. Agents can speak different languages, live in different regions, have ideological preferences, send and rebroadcast messages, and connect to or disconnect from other agents. Incorporates multiple user profiles (standard users, organizations, celebrities, bots), trending topics, and advertising. Produces random, preferential attachment, and hybrid network topologies.
- Paper: K. Ryczko, A. Domurad, N. Buhagiar, I. Tamblyn, Hashkat: large-scale simulations of online social networks, Social Network Analysis and Mining 7, 4 (2017). DOI
- Language: C++ (with Bash and Python scripts)
- Website: hashkat.org
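The kinetic Monte Carlo idea fits in a few lines. Each possible event has a rate, the next event is chosen with probability proportional to its rate, and simulated time advances by an exponentially distributed waiting time. The toy below has only two events: a new agent joins, or an existing agent follows someone. It picks whom to follow by preferential attachment, with probability proportional to followers + 1. This is a simplified illustration under assumed rates and hypothetical names, not HASHKAT's event set or implementation.

```python
import math
import random

def simulate(n_events, add_rate=1.0, follow_rate_per_agent=0.1, seed=0):
    """Toy KMC growth of a follower network (hypothetical rates and events)."""
    rng = random.Random(seed)
    followers = [0]  # follower count per agent; start with a single agent
    edges = set()  # directed (follower, followee) pairs
    time = 0.0
    for _ in range(n_events):
        rates = {"add": add_rate, "follow": follow_rate_per_agent * len(followers)}
        total = sum(rates.values())
        time += -math.log(1.0 - rng.random()) / total  # exponential waiting time
        if rng.random() * total < rates["add"]:
            followers.append(0)
        elif len(followers) > 1:
            source = rng.randrange(len(followers))
            # Preferential attachment: weight each candidate target by followers + 1
            candidates = [t for t in range(len(followers)) if t != source]
            weights = [followers[t] + 1 for t in candidates]
            target = rng.choices(candidates, weights=weights)[0]
            if (source, target) not in edges:
                edges.add((source, target))
                followers[target] += 1
    return followers, edges, time

followers, edges, t = simulate(5000)
print(len(followers), len(edges), max(followers))
```

Replacing the weights with a constant gives a random topology. Mixing the two choices gives a hybrid, which mirrors the three topology families listed above.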
Analysis scripts
KIB, DOS, GW, MD
Open-source analysis scripts for various computational physics tasks. Free to use with proper citation (see the README in each repository).
- KIB — Kinetic Ising/Boltzmann model analysis
- DOS — Density of states calculations and visualization
- GW — Post-processing for GW quasiparticle calculations and energy level alignment
- MD — Molecular dynamics trajectory analysis
To clone: git clone https://github.com/itamblyn/XXX.git (where XXX is the repo name).
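As an example of the kind of task the DOS scripts address, a density of states is commonly obtained by broadening a list of eigenvalues with Gaussians, g(E) = Σᵢ exp(-(E - εᵢ)²/(2σ²)) / (σ√(2π)). The sketch below illustrates that general technique and is not code from the DOS repository. The function name, the broadening width, and the 1D tight-binding example eigenvalues are assumptions.

```python
import numpy as np

def gaussian_dos(eigenvalues, energy_grid, sigma=0.1):
    """Gaussian-broadened density of states; integrates to the number of eigenvalues."""
    eigenvalues = np.asarray(eigenvalues)[:, None]
    energy_grid = np.asarray(energy_grid)[None, :]
    prefactor = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return prefactor * np.exp(-((energy_grid - eigenvalues) ** 2) / (2.0 * sigma**2)).sum(axis=0)

# Example input: eigenvalues of a 1D tight-binding ring with hopping t = 1
n_sites = 200
k = 2.0 * np.pi * np.arange(n_sites) / n_sites
eigs = -2.0 * np.cos(k)
energies = np.linspace(-3.0, 3.0, 601)
dos = gaussian_dos(eigs, energies, sigma=0.05)
print(dos.sum() * (energies[1] - energies[0]))  # approximately the number of states
```

σ trades resolution for smoothness. It should be comparable to or larger than the typical level spacing, otherwise the curve shows individual eigenvalues, not a continuous DOS.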
Software and tools we use
Simulation and electronic structure
VASP, CPMD, ABINIT, BerkeleyGW, VMD, Polymatic
Machine learning
Development and deployment
Databases and data resources
OPTIMADE | CMR | MoleculeNet | NIST Computational Chemistry Databank | Harvard Clean Energy Project Database | Materials Project | Quantum Machine Database
Neural network potentials and tutorials
- CSI Princeton Workshop (July 2020) — Neural network potential training workshop
- ASE_ANI — ANI neural network potentials with ASE interface
- DeePMD-kit — Deep potential molecular dynamics


