🔬 Materials Science × Deep Learning

Classifying Crystal Structures from Diffraction Patterns

DeepBravais is a physics-informed 1D residual network that classifies powder X-ray diffraction (PXRD) patterns into all 14 Bravais lattice types. It is trained entirely on first-principles simulations; no experimental database is required.

96.78% test accuracy · 500 k synthetic patterns · 1.6 M model parameters · 14 Bravais classes

What is DeepBravais?

A physics-informed deep learning pipeline that synthesises realistic X-ray diffraction patterns from first principles, then trains a compact 1D ConvNeXt to distinguish all 14 crystal symmetry classes. Patterns are represented in Q-space rather than 2θ, which keeps them independent of the instrument wavelength.

🔩 Lattice Sampling — random unit-cell parameters drawn per Bravais class from physically valid ranges
📐 d-Spacing Calc — closed-form IUCr formulae applied via vectorised NumPy; no pymatgen overhead
〰️ Pseudo-Voigt — Scherrer broadening, mixed Lorentz–Gauss peak profiles, randomised FWHM
📡 Poisson Noise — photon-counting noise at 500–5 000 counts/bin to simulate real instruments (steps 1–4 are sketched below)
🧠 ConvNeXt-1D — depthwise-separable residual classifier trained end-to-end on synthetic patterns

500 000 Synthetic Patterns in 118 Minutes

Rather than relying on a multi-hundred-GB experimental database, DeepBravais generates all patterns entirely on CPU using a vectorised NumPy physics engine, achieving a 20–50× speedup over pymatgen.

⚡ NumPy Physics Engine

Bypasses pymatgen's Python-object overhead by pre-computing (h,k,l) grids and running fully vectorised matrix multiplications (see the sketch after the command below).

~4 200 patterns / min
Pure CPU — no GPU needed
1 hr 58 min total runtime
python data_generator.py \
  --n_samples 500000 \
  --numpy
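
The core trick, in sketch form: build the reciprocal metric tensor G* once per cell, then evaluate 1/d² = h·G*·hᵀ for every pre-computed (h, k, l) in a single vectorised einsum. Names and cell parameters below are hypothetical, not the repository's API.

import numpy as np

def reciprocal_metric(a, b, c, alpha, beta, gamma):
    """Direct metric tensor G of a general (triclinic) cell; G⁻¹ = G* gives 1/d²."""
    ca, cb, cg = np.cos([alpha, beta, gamma])
    G = np.array([[a * a,      a * b * cg, a * c * cb],
                  [a * b * cg, b * b,      b * c * ca],
                  [a * c * cb, b * c * ca, c * c     ]])
    return np.linalg.inv(G)

# pre-computed (h,k,l) grid, shared by every sampled cell
hkl = np.stack(np.meshgrid(*[np.arange(6)] * 3), axis=-1).reshape(-1, 3)[1:]

Gstar = reciprocal_metric(5.1, 6.4, 7.2, *np.radians([92.0, 101.5, 88.0]))
inv_d2 = np.einsum('ij,jk,ik->i', hkl, Gstar, hkl)   # 1/d² for all (h,k,l) at once
d = 1.0 / np.sqrt(inv_d2)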
📦 Dataset Specifications

Each pattern is a 1 024-point intensity array on a fixed Q-axis spanning 0.7–6.0 Å⁻¹, normalised to [0, 1] (see the loading sketch after the list below).

500 000 total samples
35 714 per class
1 024 bins / pattern
~1.4 GB .npz on disk
Q-space 0.7 – 6.0 Å⁻¹
Cu Kα λ = 1.5406 Å
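
A loading sketch; the file name and .npz keys ("X", "y") are assumptions, not the repository's documented schema. The 2θ conversion uses Bragg's law with the Cu Kα wavelength above.

import numpy as np

data = np.load("pxrd_dataset.npz")      # hypothetical file name
X, y = data["X"], data["y"]             # X: (500000, 1024) floats in [0, 1]; y: labels 0–13
Q = np.linspace(0.7, 6.0, 1024)         # the fixed Q-grid, Å⁻¹

# Q = 4π·sin(θ)/λ, so the equivalent diffractometer angle for Cu Kα is:
lam = 1.5406                            # Å
two_theta = 2.0 * np.degrees(np.arcsin(Q * lam / (4.0 * np.pi)))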
🔷 Triclinic — aP: most general symmetry; no angular or length constraints
🔶 Monoclinic — mP, mC: one 2-fold rotation axis; one mirror plane
🟦 Orthorhombic — oP, oI, oF, oC: three mutually perpendicular 2-fold axes
🟩 Cubic — cP, cI, cF: four 3-fold axes; highest possible symmetry
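
The centred lattices above (I, F, C) differ from their primitive counterparts chiefly through systematic absences: reflection conditions that delete whole families of peaks, and the fingerprints the deepest network stage must learn. The standard textbook extinction rules, as a NumPy sketch:

import numpy as np

hkl = np.stack(np.meshgrid(*[np.arange(-4, 5)] * 3), axis=-1).reshape(-1, 3)
h, k, l = hkl.T

allowed_I = (h + k + l) % 2 == 0                   # body-centred: h+k+l even
allowed_F = (h % 2 == k % 2) & (k % 2 == l % 2)    # face-centred: h,k,l all same parity
allowed_C = (h + k) % 2 == 0                       # C-centred: h+k even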

ConvNeXt-1D Small (~1.6 M parameters)

Depthwise-separable convolutions with an inverted-bottleneck FFN and stochastic depth regularisation — adapted from ConvNeXt-V1 to the 1-D signal domain and trained with mixed-precision FP16.

Layer                              Description                                                           Output shape
pxrd_input                         Raw pattern in Q-space, normalised to [0, 1]                          (N, 1024, 1)
stem_conv + stem_ln                Conv1D kernel=7, stride=2 → LayerNorm; encodes local peak shapes      (N, 512, 32)
Stage 1 — 2× ConvNeXt Block        DepthwiseConv k=7 → LN → MLP(128) → MLP(32) + skip connection         (N, 512, 32)
Stage 2 — 2× ConvNeXt Block        Downsampled 2× via strided Conv1D projection; channels doubled        (N, 256, 64)
Stage 3 — 3× ConvNeXt Block        Stochastic depth regularisation activated; channels doubled           (N, 128, 128)
Stage 4 — 2× ConvNeXt Block        Deepest stage; captures long-range systematic-absence fingerprints    (N, 64, 256)
gap → head_ln → Dropout → predictions   GlobalAveragePooling1D → LayerNorm → Dense(14, softmax)          (N, 14)
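
A minimal Keras sketch of the stem and one stage as described in the table. It illustrates the block structure only; it is not the repository's code, and the remaining stages repeat the same block behind strided Conv1D downsampling projections.

import tensorflow as tf
from tensorflow.keras import layers

def convnext_block_1d(x, dim, drop_path=0.0):
    """DepthwiseConv(k=7) → LayerNorm → MLP(4·dim) → MLP(dim), plus residual."""
    shortcut = x
    x = layers.DepthwiseConv1D(kernel_size=7, padding="same")(x)
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    x = layers.Dense(4 * dim, activation="gelu")(x)      # inverted bottleneck
    x = layers.Dense(dim)(x)
    if drop_path > 0.0:
        # per-sample dropout of the whole residual branch ≈ stochastic depth
        x = layers.Dropout(drop_path, noise_shape=(None, 1, 1))(x)
    return layers.Add()([shortcut, x])

inputs = layers.Input((1024, 1), name="pxrd_input")
x = layers.Conv1D(32, kernel_size=7, strides=2, padding="same", name="stem_conv")(inputs)
x = layers.LayerNormalization(epsilon=1e-6, name="stem_ln")(x)
for _ in range(2):                                       # Stage 1: 2× ConvNeXt block
    x = convnext_block_1d(x, 32)
# ... Stages 2–4 follow, each prefixed by a strides=2 Conv1D that doubles channels
x = layers.GlobalAveragePooling1D(name="gap")(x)
x = layers.LayerNormalization(epsilon=1e-6, name="head_ln")(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(14, activation="softmax", dtype="float32")(x)
model = tf.keras.Model(inputs, outputs)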

20 Epochs on a Tesla P100

End-to-end training from random initialisation to 96.78% test accuracy, completed in approximately one hour using mixed-precision FP16 on a single Kaggle GPU.

T + 0:00 — Epoch 1
Model warm-up
Train accuracy 30.1% → Val accuracy 82.7%. XLA kernel fusion adds ~218 s to the first epoch.
T + 0:07 — Epochs 2–5
Rapid ascent
Validation accuracy crosses 90% by epoch 3, reaches 93.7% at epoch 5. Each epoch stabilises at ~104 s.
T + 0:25 — Epoch 11
Plateau broken by CosineDecay
Learning rate annealing drives a jump from 95.2% → 95.97%. Confusion matrix patterns begin to crystallise.
T + 0:52 — Epoch 20
Final checkpoint saved
Best val accuracy 96.87%. Weights restored; model evaluated on the 75 000-sample test set.
⚙️ Hyperparameters
20 epochs
256 batch size
3e-4 initial LR
0.5 dropout
FP16 mixed precision
CosineDecay LR schedule
🕐 Wall-clock Timings
1 hr 58 min data generation (CPU)
1 hr 03 min training (P100 GPU)
~104 s per epoch
127 ms per step
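
A hedged sketch of this configuration in Keras, reusing `model` from the architecture sketch above. `X_train`, `y_train`, `X_val`, `y_val` and the exact split sizes are assumptions; only the hyperparameter values are stated by the project.

import tensorflow as tf

# FP16 mixed precision: set the policy before building the model, and keep the
# final Dense layer in float32 (as in the sketch above) for a stable softmax.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=3e-4,
    decay_steps=20 * (400_000 // 256))   # epochs × steps/epoch (illustrative split)
model.compile(optimizer=tf.keras.optimizers.Adam(schedule),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=256, epochs=20,
          validation_data=(X_val, y_val))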

Training Curves

Loss & Accuracy over 20 epochs — Train vs. Validation. Best val accuracy at Epoch 20: 96.87%.

96.78% on 75 000 Held-out Patterns

Macro F1-score of 0.9676 across all 14 classes. Cubic lattices reach near-perfect classification. The main source of confusion is between mP and mC, as expected from their similar systematic-absence rules.

Confusion Matrix

🗃️ 14 × 14 confusion matrix (image: website/assets/plots/confusion_matrix.png)
Confusion matrix on 75 000 test samples. Diagonal entries are nearly saturated; off-diagonal confusion is concentrated between adjacent monoclinic classes (mP ↔ mC).

Per-Class F1-Score

Classification Report

Symbol  Lattice system    Precision  Recall  F1-Score  Support
aP      Triclinic P          0.9448  0.9291    0.9369    5 358
mP      Monoclinic P         0.9404  0.8719    0.9049    5 357
mC      Monoclinic C         0.9525  0.9239    0.9379    5 358
oP      Orthorhombic P       0.9476  0.9558    0.9517    5 357
oI      Orthorhombic I       0.9925  0.9634    0.9777    5 357
oF      Orthorhombic F       0.9928  0.9731    0.9828    5 357
oC      Orthorhombic C       0.9353  0.9601    0.9475    5 357
tP      Tetragonal P         0.9549  0.9951    0.9746    5 357
tI      Tetragonal I         0.9419  0.9808    0.9610    5 357
hR      Trigonal R           0.9968  0.9996    0.9982    5 357
hP      Hexagonal P          0.9679  0.9976    0.9825    5 357
cP      Cubic P              1.0000  1.0000    1.0000    5 357
cI      Cubic I              0.9976  1.0000    0.9988    5 357
cF      Cubic F              0.9855  0.9994    0.9924    5 357
Macro average                0.9679  0.9678    0.9676   75 000
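
A report in this format can be reproduced with scikit-learn; `model`, `X_test` and `y_test` are assumed from the sketches above, with labels ordered as in the table.

from sklearn.metrics import classification_report

labels = ["aP", "mP", "mC", "oP", "oI", "oF", "oC",
          "tP", "tI", "hR", "hP", "cP", "cI", "cF"]
y_pred = model.predict(X_test, batch_size=256).argmax(axis=1)
print(classification_report(y_test, y_pred, target_names=labels, digits=4))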

Literature & Acknowledgements

  1. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Identity Mappings in Deep Residual Networks. ECCV 2016. arXiv:1603.05027
  2. Warren, B. E. (1990). X-Ray Diffraction. Dover Publications.
  3. Ong, S. P. et al. (2013). Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science, 68, 314–319. doi:10.1016/j.commatsci.2012.10.028
  4. Toby, B. H. & Von Dreele, R. B. (2013). GSAS-II: the genesis of a modern open-source all purpose crystallography software package. J. Appl. Cryst., 46, 544–549. doi:10.1107/S0021889813003531
  5. Cao, B., Dong, S., Liang, J., Luo, D., & Lookman, T. (2024). SimXRD-4M: Big Simulated X-ray Diffraction Data Accelerates the Crystalline Symmetry Classification. ICLR 2025. arXiv:2406.15469
  6. Larsen, A. H. et al. (2017). The Atomic Simulation Environment — A Python library for working with atoms. J. Phys.: Condens. Matter, 29, 273002. doi:10.1088/1361-648X/aa680e
  7. Andrejevic, N. et al. (2026). AlphaDiffract: Automated Crystallographic Analysis of Powder X-ray Diffraction Data. arXiv:2603.23367