Datasets from articles
For code repositories, see Codes and Computing.
Schrödinger equation dataset
Numerical solutions of the time-independent Schrödinger equation for an electron in 2D confining potentials, used to train deep convolutional neural networks. The dataset contains millions of solved instances across four classes of electrostatic potentials (simple harmonic oscillator, infinite well, double inverted Gaussians, and random potentials). Each entry includes the potential, ground-state wave function, and associated energies.
- Size: ~700 GB total (411 HDF5 files across 5 ZIP archives); sample dataset of 8.5 GB also available
- Format: HDF5 (.h5), compressed as ZIP
- Paper: K. Mills, M. Spanner, I. Tamblyn, Deep learning and the Schrödinger equation, Physical Review A 96, 042113 (2017). DOI
- Download: NRC Digital Repository
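Because the full dataset is spread over several large ZIP archives, it can help to look inside an archive and pull out a single HDF5 file before committing disk space to a full extraction. The following is a minimal sketch using only the Python standard library; the function names and any archive path passed to them are hypothetical, and the internal layout of the archives is not described here.

```python
import zipfile
from pathlib import Path


def list_h5_members(archive_path):
    """Return (name, uncompressed size in bytes) for each .h5 member of a ZIP archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.lower().endswith(".h5")
        ]


def extract_member(archive_path, member, dest_dir):
    """Extract a single member (not the whole archive) and return its path on disk."""
    with zipfile.ZipFile(archive_path) as archive:
        return Path(archive.extract(member, path=dest_dir))
```

Calling `list_h5_members` on a downloaded archive only reads the ZIP central directory, so it stays fast even for very large files; `extract_member` can then be used on one entry to check its contents.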
Deep learning and density functional theory
Self-consistent charge densities and energy components (correlation, exchange, external, kinetic, and total energies) for multielectron systems computed with three exchange-correlation functionals: LDA, PBE, and MGGA-Pittalis. Systems include 1, 2, 3, and 10 electrons in both simple harmonic oscillator (SHO) and random (RND) external potentials.
- Size: SHO_10e.h5 is ~100 GB; smaller variants for 1, 2, and 3 electrons also available
- Format: HDF5 (.h5)
- Files: SHO_10e, SHO_3e, SHO_2e, SHO_1e, RND_10e, RND_3e, RND_2e, RND_1e
- Paper: K. Ryczko, D.A. Strubbe, I. Tamblyn, Deep learning and density-functional theory, Physical Review A 100, 022512 (2019). DOI
- Download: Google Drive | Command-line script: cli_download_from_gdrive.sh
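With files as large as ~100 GB, it is usually worth inspecting the HDF5 structure before loading any arrays. The sketch below builds the eight file names from the naming pattern listed above and walks one file with h5py, recording only dataset shapes and dtypes. It assumes that every file carries the .h5 extension (only SHO_10e.h5 is written out in full above) and makes no assumption about the dataset names inside the files; `dataset_filenames` and `summarize_h5` are hypothetical helper names.

```python
POTENTIALS = ("SHO", "RND")       # simple harmonic oscillator, random
ELECTRON_COUNTS = (1, 2, 3, 10)


def dataset_filenames():
    """Build file names such as 'SHO_10e.h5' (the .h5 suffix is an assumption)."""
    return [f"{p}_{n}e.h5" for p in POTENTIALS for n in ELECTRON_COUNTS]


def summarize_h5(path):
    """Map each dataset name in an HDF5 file to (shape, dtype) without reading its data."""
    import h5py  # third-party; imported here so the name helper works without it

    summary = {}

    def record(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            summary[name] = (obj.shape, obj.dtype)

    with h5py.File(path, "r") as handle:
        handle.visititems(record)
    return summary
```

Since h5py reads data lazily, `summarize_h5` touches only metadata; individual arrays can then be sliced from disk in batches rather than loaded whole.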
Deep learning and high harmonic generation
Training data for deep neural networks applied to high harmonic generation (HHG). Contains time-dependent dipoles and spectra from reduced-dimensionality models of di- and triatomic systems, parameterized by laser pulse intensity, internuclear distance, and molecular orientation. Includes data for both the forward problem (molecular parameters to spectra) and the inverse problem (spectra to molecular parameters).
- Format: Numerical data files with accompanying Jupyter notebooks (TensorFlow/Keras)
- Paper: M. Lytova, M. Spanner, I. Tamblyn, Deep learning and high harmonic generation, Canadian Journal of Physics 101(3) (2022). DOI
- Download: Google Drive | Code: GitHub
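To illustrate how a time-dependent dipole relates to a harmonic spectrum, the sketch below takes a sampled dipole signal and returns its windowed power spectrum with NumPy. This is one common convention, shown on the assumption of a uniformly sampled, real-valued signal; it is not necessarily how the spectra in this dataset were computed, and `dipole_spectrum` is a hypothetical name.

```python
import numpy as np


def dipole_spectrum(dipole, dt):
    """Return (angular frequencies, power spectrum) of a real-valued dipole signal d(t)."""
    dipole = np.asarray(dipole, dtype=float)
    window = np.hanning(dipole.size)             # taper to reduce spectral leakage
    amplitude = np.fft.rfft(dipole * window)     # one-sided FFT of the real signal
    omega = 2.0 * np.pi * np.fft.rfftfreq(dipole.size, d=dt)
    return omega, np.abs(amplitude) ** 2
```

For a synthetic signal containing a fundamental and a weaker third harmonic, the returned spectrum shows peaks at both frequencies, which is the kind of structure a forward model maps molecular parameters onto.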
Hyperspectral stimulated Raman microscopy
Hyperspectral stimulated Raman scattering (SRS) microscopy image data of a lithium ore sample. Used to demonstrate UHRED (Unsupervised Hyperspectral Resolution Enhancement and Denoising), an unsupervised deep learning method for automatic denoising that requires only a single hyperspectral image (one-shot, no labelled training data needed). Combined with k-means clustering, the method produces automatic chemical species maps.
- Format: Hyperspectral image data (3D arrays: x, y, spectral channels)
- Paper: P. Abdolghader, G. Resch, A. Ridsdale, T. Grammatikopoulos, F. Légaré, A. Stolow, A.F. Pegoraro, I. Tamblyn, Unsupervised Hyperspectral Stimulated Raman Microscopy Image Enhancement: De-Noising and Segmentation via One-Shot Deep Learning, Optics Express 29(21), 34205-34219 (2021). DOI
- Download: Google Drive | Code: GitHub
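The clustering step mentioned above can be sketched as follows: a hyperspectral cube of shape (x, y, spectral channels) is flattened to one spectrum per pixel, clustered with k-means, and reshaped back into a 2D label map. This is not the UHRED method itself, only a plain NumPy k-means with a deterministic farthest-point initialisation chosen here for reproducibility; the function names are hypothetical.

```python
import numpy as np


def farthest_point_init(spectra, n_clusters):
    """Pick initial centroids: the first pixel, then repeatedly the pixel farthest from all chosen."""
    centroids = [spectra[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((spectra - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(spectra[dist.argmax()])
    return np.array(centroids)


def assign_labels(spectra, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for every spectrum."""
    dist = ((spectra[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def kmeans_species_map(cube, n_clusters, n_iter=50):
    """Cluster per-pixel spectra of an (x, y, channels) cube; return an (x, y) label map."""
    nx, ny, n_channels = cube.shape
    spectra = cube.reshape(nx * ny, n_channels).astype(float)
    centroids = farthest_point_init(spectra, n_clusters)
    for _ in range(n_iter):
        labels = assign_labels(spectra, centroids)
        updated = centroids.copy()
        for k in range(n_clusters):
            members = spectra[labels == k]
            if len(members):                     # keep the old centroid if a cluster empties
                updated[k] = members.mean(axis=0)
        if np.allclose(updated, centroids):      # converged
            break
        centroids = updated
    return assign_labels(spectra, centroids).reshape(nx, ny)
```

The pairwise distance array has size pixels × clusters × channels, so for large images the assignment step would need to be done in chunks or replaced by a library implementation.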
Big graphene dataset
Over 500,000 density functional theory calculations of graphene structures (3.5 nm x 3.5 nm unit cell, 60-atom systems) with random structural defects, computed using the PBE functional in VASP. Each entry contains carbon atom coordinates and total energy values. Used to demonstrate that extensive deep neural networks (EDNNs) trained on small systems can predict total energies of larger systems in ~57 ms with DFT-level accuracy.
- Size: ~3.7 GB compressed; 501,473 training files and 60,744 testing files
- Format: HDF5 (.h5), distributed as .tar.gz
- Paper: K. Mills, M. Spanner, I. Tamblyn, Extensive deep neural networks for transferring small scale learning to large scale systems, Chemical Science 10(15), 4129-4140 (2019). DOI
- License: Open Government Licence - Canada / CC BY 2.0
- Download: Google Drive | NRC Digital Repository


