🔬 Materials Science × Deep Learning

Classifying Crystal Structures from Diffraction Patterns

DeepBravais is a physics-informed 1D residual network that classifies powder X-ray diffraction (PXRD) patterns into all 14 Bravais lattice types. It is trained entirely on first-principles simulations; no experimental database is required.

96.78% test accuracy · 500 k synthetic patterns · 1.6 M model parameters · 14 Bravais classes

What is DeepBravais?

A physics-informed deep learning pipeline that synthesises realistic X-ray diffraction patterns from first principles, then trains a compact 1D ConvNeXt to distinguish all 14 crystal symmetry classes. Patterns are represented in Q-space rather than 2θ, which keeps them independent of the instrument wavelength.

🔩 Lattice Sampling — random unit-cell parameters drawn per Bravais class from physically valid ranges
📐 d-Spacing Calc — closed-form IUCr formulae applied via vectorised NumPy; no pymatgen overhead
〰️ Pseudo-Voigt — Scherrer broadening, mixed Lorentz–Gauss peak profiles, randomised FWHM
📡 Poisson Noise — photon-counting noise at 500–5 000 counts/bin to simulate real instruments (steps 1–4 are sketched below)
🧠 ConvNeXt-1D — depthwise-separable residual classifier trained end-to-end on synthetic patterns

500 000 Synthetic Patterns in 118 Minutes

Rather than relying on a multi-hundred-GB experimental database, DeepBravais generates all patterns entirely on CPU using a vectorised NumPy physics engine, achieving a 20–50× speedup over pymatgen.

⚡ NumPy Physics Engine

Bypasses pymatgen's Python-object overhead by pre-computing (h,k,l) grids and running fully vectorised matrix multiplications (see the sketch after the command below).

~4 200 patterns / min
Pure CPU — no GPU needed
1 hr 58 min total runtime
python data_generator.py \
  --n_samples 500000 \
  --numpy
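
The core trick, in sketch form: build the reciprocal metric tensor G* once per cell, then evaluate 1/d² = h·G*·hᵀ for every pre-computed (h, k, l) in a single vectorised einsum. Names and cell parameters below are hypothetical, not the repository's API.

import numpy as np

def reciprocal_metric(a, b, c, alpha, beta, gamma):
    """Direct metric tensor G of a general (triclinic) cell; G⁻¹ = G* gives 1/d²."""
    ca, cb, cg = np.cos([alpha, beta, gamma])
    G = np.array([[a * a,      a * b * cg, a * c * cb],
                  [a * b * cg, b * b,      b * c * ca],
                  [a * c * cb, b * c * ca, c * c     ]])
    return np.linalg.inv(G)

# pre-computed (h,k,l) grid, shared by every sampled cell
hkl = np.stack(np.meshgrid(*[np.arange(6)] * 3), axis=-1).reshape(-1, 3)[1:]

Gstar = reciprocal_metric(5.1, 6.4, 7.2, *np.radians([92.0, 101.5, 88.0]))
inv_d2 = np.einsum('ij,jk,ik->i', hkl, Gstar, hkl)   # 1/d² for all (h,k,l) at once
d = 1.0 / np.sqrt(inv_d2)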
📦 Dataset Specifications

Each pattern is a 1 024-point intensity array on a fixed Q-axis spanning 0.7–6.0 Å⁻¹, normalised to [0, 1] (see the loading sketch after the list below).

500 000 total samples
35 714 per class
1 024 bins / pattern
~1.4 GB .npz on disk
Q-space 0.7 – 6.0 Å⁻¹
Cu Kα λ = 1.5406 Å
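
A loading sketch; the file name and .npz keys ("X", "y") are assumptions, not the repository's documented schema. The 2θ conversion uses Bragg's law with the Cu Kα wavelength above.

import numpy as np

data = np.load("pxrd_dataset.npz")      # hypothetical file name
X, y = data["X"], data["y"]             # X: (500000, 1024) floats in [0, 1]; y: labels 0–13
Q = np.linspace(0.7, 6.0, 1024)         # the fixed Q-grid, Å⁻¹

# Q = 4π·sin(θ)/λ, so the equivalent diffractometer angle for Cu Kα is:
lam = 1.5406                            # Å
two_theta = 2.0 * np.degrees(np.arcsin(Q * lam / (4.0 * np.pi)))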
🔷 Triclinic — aP: most general symmetry; no angular or length constraints
🔶 Monoclinic — mP, mC: one 2-fold rotation axis; one mirror plane
🟦 Orthorhombic — oP, oI, oF, oC: three mutually perpendicular 2-fold axes
🟩 Cubic — cP, cI, cF: four 3-fold axes; highest possible symmetry
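
The centred lattices above (I, F, C) differ from their primitive counterparts chiefly through systematic absences: reflection conditions that delete whole families of peaks, and the fingerprints the deepest network stage must learn. The standard textbook extinction rules, as a NumPy sketch:

import numpy as np

hkl = np.stack(np.meshgrid(*[np.arange(-4, 5)] * 3), axis=-1).reshape(-1, 3)
h, k, l = hkl.T

allowed_I = (h + k + l) % 2 == 0                   # body-centred: h+k+l even
allowed_F = (h % 2 == k % 2) & (k % 2 == l % 2)    # face-centred: h,k,l all same parity
allowed_C = (h + k) % 2 == 0                       # C-centred: h+k even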

ConvNeXt-1D Small (~1.6 M parameters)

Depthwise-separable convolutions with an inverted-bottleneck FFN and stochastic depth regularisation — adapted from ConvNeXt-V1 to the 1-D signal domain and trained with mixed-precision FP16.

Layer                              Description                                                           Output shape
pxrd_input                         Raw pattern in Q-space, normalised to [0, 1]                          (N, 1024, 1)
stem_conv + stem_ln                Conv1D kernel=7, stride=2 → LayerNorm; encodes local peak shapes      (N, 512, 32)
Stage 1 — 2× ConvNeXt Block        DepthwiseConv k=7 → LN → MLP(128) → MLP(32) + skip connection         (N, 512, 32)
Stage 2 — 2× ConvNeXt Block        Downsampled 2× via strided Conv1D projection; channels doubled        (N, 256, 64)
Stage 3 — 3× ConvNeXt Block        Stochastic depth regularisation activated; channels doubled           (N, 128, 128)
Stage 4 — 2× ConvNeXt Block        Deepest stage; captures long-range systematic-absence fingerprints    (N, 64, 256)
gap → head_ln → Dropout → predictions   GlobalAveragePooling1D → LayerNorm → Dense(14, softmax)          (N, 14)
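
A minimal Keras sketch of the stem and one stage as described in the table. It illustrates the block structure only; it is not the repository's code, and the remaining stages repeat the same block behind strided Conv1D downsampling projections.

import tensorflow as tf
from tensorflow.keras import layers

def convnext_block_1d(x, dim, drop_path=0.0):
    """DepthwiseConv(k=7) → LayerNorm → MLP(4·dim) → MLP(dim), plus residual."""
    shortcut = x
    x = layers.DepthwiseConv1D(kernel_size=7, padding="same")(x)
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    x = layers.Dense(4 * dim, activation="gelu")(x)      # inverted bottleneck
    x = layers.Dense(dim)(x)
    if drop_path > 0.0:
        # per-sample dropout of the whole residual branch ≈ stochastic depth
        x = layers.Dropout(drop_path, noise_shape=(None, 1, 1))(x)
    return layers.Add()([shortcut, x])

inputs = layers.Input((1024, 1), name="pxrd_input")
x = layers.Conv1D(32, kernel_size=7, strides=2, padding="same", name="stem_conv")(inputs)
x = layers.LayerNormalization(epsilon=1e-6, name="stem_ln")(x)
for _ in range(2):                                       # Stage 1: 2× ConvNeXt block
    x = convnext_block_1d(x, 32)
# ... Stages 2–4 follow, each prefixed by a strides=2 Conv1D that doubles channels
x = layers.GlobalAveragePooling1D(name="gap")(x)
x = layers.LayerNormalization(epsilon=1e-6, name="head_ln")(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(14, activation="softmax", dtype="float32")(x)
model = tf.keras.Model(inputs, outputs)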

20 Epochs on a Tesla P100

End-to-end training from random initialisation to 96.78% test accuracy, completed in approximately one hour using mixed-precision FP16 on a single Kaggle GPU.

T + 0:00 — Epoch 1
Model warm-up
Train accuracy 30.1% → Val accuracy 82.7%. XLA kernel fusion adds ~218 s to the first epoch.
T + 0:07 — Epochs 2–5
Rapid ascent
Validation accuracy crosses 90% by epoch 3, reaches 93.7% at epoch 5. Each epoch stabilises at ~104 s.
T + 0:25 — Epoch 11
Plateau broken by CosineDecay
Learning rate annealing drives a jump from 95.2% → 95.97%. Confusion matrix patterns begin to crystallise.
T + 0:52 — Epoch 20
Final checkpoint saved
Best val accuracy 96.87%. Weights restored; model evaluated on the 75 000-sample test set.
⚙️ Hyperparameters
20 epochs
256 batch size
3e-4 initial LR
0.5 dropout
FP16 mixed precision
CosineDecay LR schedule
🕐 Wall-clock Timings
1 hr 58 min data generation (CPU)
1 hr 03 min training (P100 GPU)
~104 s per epoch
127 ms per step
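
A hedged sketch of this configuration in Keras, reusing `model` from the architecture sketch above. `X_train`, `y_train`, `X_val`, `y_val` and the exact split sizes are assumptions; only the hyperparameter values are stated by the project.

import tensorflow as tf

# FP16 mixed precision: set the policy before building the model, and keep the
# final Dense layer in float32 (as in the sketch above) for a stable softmax.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=3e-4,
    decay_steps=20 * (400_000 // 256))   # epochs × steps/epoch (illustrative split)
model.compile(optimizer=tf.keras.optimizers.Adam(schedule),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=256, epochs=20,
          validation_data=(X_val, y_val))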

Training Curves

Loss & Accuracy over 20 epochs — Train vs. Validation. Best val accuracy at Epoch 20: 96.87%.

96.78% on 75 000 Held-out Patterns

Macro F1-score of 0.9676 across all 14 classes. Cubic lattices reach near-perfect classification. The main source of confusion is between mP and mC, as expected from their similar systematic-absence rules.

Confusion Matrix

🗃️ 14 × 14 confusion matrix (image: website/assets/plots/confusion_matrix.png)
Confusion matrix on 75 000 test samples. Diagonal entries are nearly saturated; off-diagonal confusion is concentrated between adjacent monoclinic classes (mP ↔ mC).

Per-Class F1-Score

Classification Report

Symbol  Lattice system    Precision  Recall  F1-Score  Support
aP      Triclinic P          0.9448  0.9291    0.9369    5 358
mP      Monoclinic P         0.9404  0.8719    0.9049    5 357
mC      Monoclinic C         0.9525  0.9239    0.9379    5 358
oP      Orthorhombic P       0.9476  0.9558    0.9517    5 357
oI      Orthorhombic I       0.9925  0.9634    0.9777    5 357
oF      Orthorhombic F       0.9928  0.9731    0.9828    5 357
oC      Orthorhombic C       0.9353  0.9601    0.9475    5 357
tP      Tetragonal P         0.9549  0.9951    0.9746    5 357
tI      Tetragonal I         0.9419  0.9808    0.9610    5 357
hR      Trigonal R           0.9968  0.9996    0.9982    5 357
hP      Hexagonal P          0.9679  0.9976    0.9825    5 357
cP      Cubic P              1.0000  1.0000    1.0000    5 357
cI      Cubic I              0.9976  1.0000    0.9988    5 357
cF      Cubic F              0.9855  0.9994    0.9924    5 357
Macro average                0.9679  0.9678    0.9676   75 000
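
A report in this format can be reproduced with scikit-learn; `model`, `X_test` and `y_test` are assumed from the sketches above, with labels ordered as in the table.

from sklearn.metrics import classification_report

labels = ["aP", "mP", "mC", "oP", "oI", "oF", "oC",
          "tP", "tI", "hR", "hP", "cP", "cI", "cF"]
y_pred = model.predict(X_test, batch_size=256).argmax(axis=1)
print(classification_report(y_test, y_pred, target_names=labels, digits=4))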

Literature & Acknowledgements

  1. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Identity Mappings in Deep Residual Networks. ECCV 2016. arXiv:1603.05027
  2. Warren, B. E. (1990). X-Ray Diffraction. Dover Publications.
  3. Ong, S. P. et al. (2013). Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science, 68, 314–319. doi:10.1016/j.commatsci.2012.10.028
  4. Toby, B. H. & Von Dreele, R. B. (2013). GSAS-II: the genesis of a modern open-source all purpose crystallography software package. J. Appl. Cryst., 46, 544–549. doi:10.1107/S0021889813003531
  5. Cao, B., Dong, S., Liang, J., Luo, D., & Lookman, T. (2024). SimXRD-4M: Big Simulated X-ray Diffraction Data Accelerates the Crystalline Symmetry Classification. ICLR 2025. arXiv:2406.15469
  6. Larsen, A. H. et al. (2017). The Atomic Simulation Environment — A Python library for working with atoms. J. Phys.: Condens. Matter, 29, 273002. doi:10.1088/1361-648X/aa680e
  7. Andrejevic, N. et al. (2026). AlphaDiffract: Automated Crystallographic Analysis of Powder X-ray Diffraction Data. arXiv:2603.23367