SimXRD-4M: Big Simulated X-ray Diffraction Data Accelerate the Crystalline Symmetry Classification
arXiv (2024)
Abstract
Spectroscopic data, particularly diffraction data, contain detailed crystal
and microstructure information and thus are crucial for materials discovery.
Powder X-ray diffraction (XRD) patterns are highly effective for identifying
crystals. Although machine learning (ML) has significantly advanced the
analysis of powder XRD patterns, progress is hindered by a lack of training
data. To address this, we introduce SimXRD, the largest open-source simulated
XRD pattern dataset to date, to accelerate the development of crystallographic
informatics. SimXRD comprises 4,065,346 simulated powder X-ray diffraction
patterns, representing 119,569 distinct crystal structures under 33 simulated
conditions that mimic real-world variations. We find that crystal symmetry
classes inherently follow a long-tailed distribution, and we evaluate 21
sequence learning models on SimXRD. The results indicate that existing neural
networks struggle to classify low-frequency crystal classes. The present work highlights the
academic significance and the engineering novelty of simulated XRD patterns in
this interdisciplinary field.
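
To illustrate why a long-tailed class distribution makes rare classes hard to assess, the following minimal sketch compares overall accuracy with macro-averaged recall on synthetic labels. The class names, counts, and the always-predict-the-majority classifier are assumptions chosen for illustration only; they are not SimXRD statistics or any of the benchmarked models.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in total.items()}


def overall_accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly, regardless of class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, so rare classes count equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Synthetic long-tailed label set (made-up counts, for illustration only).
counts = {"head": 900, "mid": 90, "tail": 10}
y_true = [label for label, n in counts.items() for _ in range(n)]

# Hypothetical degenerate classifier: always predicts the most frequent class.
y_pred = ["head"] * len(y_true)

acc = overall_accuracy(y_true, y_pred)
mac = macro_recall(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)

print(f"overall accuracy: {acc:.3f}")
print(f"macro recall:     {mac:.3f}")
print(f"tail recall:      {recalls['tail']:.3f}")
```

In this toy setting the overall accuracy looks high while the rare classes are never recovered, which is why per-class or macro-averaged metrics are a common choice when the label distribution is long-tailed.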