Dataset Structure

Schema Overview

The recommended types below are chosen to suit analysis in a dataframe or SQL database.

Field name | Recommended type | Description | Sample values
Gender | Categorical (string) | Biological sex of the patient. | Male, Female
Age | String (requires cleaning; see Data fields & quality notes) | Age of the patient at the time of the study. | 018Y, 060Y, 069Y
Modality | String | Imaging method used. All visible entries are MR (magnetic resonance). | MR
Description | String | Descriptive label of the body part, imaging type, and study details. | MRI LUMBER SPINE(MRI SPINE), MRI CERVICAL SPINE(MRI SPINE)
Size_raw | String | File size as displayed in the UI. | 11.95 MB, 6.02 MB
Size_bytes | Integer (derived) | File size converted to bytes for numerical analysis (1 MB = 1,000,000 bytes, as the samples imply). | 11950000, 6020000
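As a starting point for the SQL route, the schema above can be expressed as a table definition. The following is a minimal sketch using Python's built-in sqlite3 module. The table name mri_studies is hypothetical, and the inserted row is assembled from the sample values for illustration only; it is not an actual record.

```python
import sqlite3

# Hypothetical table name; columns mirror the schema table above.
DDL = """
CREATE TABLE mri_studies (
    gender      TEXT,     -- categorical: 'Male', 'Female', ...
    age         TEXT,     -- raw value such as '018Y'; parse before numeric use
    modality    TEXT,     -- e.g. 'MR'
    description TEXT,     -- e.g. 'MRI LUMBER SPINE(MRI SPINE)'
    size_raw    TEXT,     -- display string such as '11.95 MB'
    size_bytes  INTEGER   -- derived numeric size in bytes
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# Illustrative row built from the sample values; not an actual record.
conn.execute(
    "INSERT INTO mri_studies VALUES (?, ?, ?, ?, ?, ?)",
    ("Male", "018Y", "MR", "MRI LUMBER SPINE(MRI SPINE)", "11.95 MB", 11950000),
)

# Aggregate by modality; numeric columns make this kind of summary trivial.
rows = conn.execute(
    "SELECT modality, COUNT(*), AVG(size_bytes) FROM mri_studies GROUP BY modality"
).fetchall()
print(rows)  # [('MR', 1, 11950000.0)]
```

Both the raw and derived size columns are kept, so any conversion can be re-checked against the original display string.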

Data fields & quality notes

A detailed breakdown of the fields in the dataset: 

Gender

  • Type: Categorical
  • Observations: Only two values (Male, Female) appear in the sample. Check the full dataset for other values (e.g., Other or Unknown) and for missing entries.
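A quick way to run that check is to count every distinct value and flag anything outside the expected set. The sketch below treats Male and Female as the expected values because they are the only ones seen in the sample. Counting blank or null entries as "Missing" is an assumption, and the input list is hypothetical.

```python
from collections import Counter

EXPECTED_GENDERS = {"Male", "Female"}  # values seen in the sample

def audit_gender(values):
    """Count gender values and return those outside the expected set.

    Whitespace is stripped; empty strings and None are counted as 'Missing'.
    """
    counts = Counter()
    for v in values:
        key = v.strip() if isinstance(v, str) and v.strip() else "Missing"
        counts[key] += 1
    unexpected = {k: n for k, n in counts.items() if k not in EXPECTED_GENDERS}
    return counts, unexpected

# Hypothetical input, not taken from the dataset.
counts, unexpected = audit_gender(["Male", "Female", "Female", "Other", "", None])
print(unexpected)  # {'Other': 1, 'Missing': 2}
```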

Age

  • Type: String (requires parsing)
  • Data Quality Issues:
    • Formatting: Values carry a unit suffix (e.g., 018Y, 060Y, 069Y). Strip the “Y” before numerical analysis.
    • Leading Zeros: Values are zero-padded to three digits (e.g., 018Y for 18 years, 060Y for 60 years). Converting the digits to integers removes the padding.
    • Age Range: The visible sample spans 18 to 69 years, which suggests an adult population; confirm this against the full dataset.
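The parsing steps above can be sketched with a regular expression. The three-digit-plus-unit layout matches the DICOM Age String format, which also allows D (days), W (weeks), and M (months). The sketch assumes the dataset may contain those units even though the sample shows only Y. It returns None for them rather than guessing a conversion to years.

```python
import re

# Three digits followed by a unit letter, e.g. "018Y".
AGE_PATTERN = re.compile(r"^(\d{3})([DWMY])$")

def parse_age_years(raw):
    """Return the age in whole years, or None if the value is missing,
    malformed, or expressed in a unit other than years."""
    if raw is None:
        return None
    match = AGE_PATTERN.match(raw.strip().upper())
    if match is None:
        return None
    number, unit = match.groups()
    return int(number) if unit == "Y" else None  # int() drops leading zeros

print([parse_age_years(a) for a in ["018Y", "060Y", "069Y", "006M", "abc"]])
# [18, 60, 69, None, None]
```

Returning None keeps non-year and malformed values visible as missing data, so they can be counted and reviewed instead of silently miscoded.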

Modality

  • Type: Categorical
  • Observations: The sample shows only MR (Magnetic Resonance). MRI uses strong magnetic fields and radio waves to generate detailed images of internal body structures without ionizing radiation.

Description

  • Type: Text
  • Observations: Contains anatomical detail, including the spinal region (lumbar, dorsal, or cervical spine) and the study type (MRI SPINE). The visible sample focuses on spinal imaging studies.
  • Standardization: Each entry combines the specific study name with a general category in parentheses (e.g., “MRI LUMBER SPINE(MRI SPINE)”). This hierarchical naming can be parsed to extract both the specific and the general anatomical region. Note the misspelling “LUMBER”, which should read “LUMBAR”.
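The hierarchical naming can be split with a regular expression. The sketch below assumes every description follows the pattern "<specific study>(<general category>)" with an optional "MRI " prefix on each part, as in the samples. Entries that do not match return (None, None) for manual review. The spelling table is a hypothetical starting point that holds only the LUMBER correction.

```python
import re

# Assumed pattern: "<specific>(<general>)", each part optionally prefixed by "MRI ".
DESC_PATTERN = re.compile(r"^\s*(?:MRI\s+)?(.+?)\s*\(\s*(?:MRI\s+)?(.+?)\s*\)\s*$")
SPELLING_FIXES = {"LUMBER": "LUMBAR"}  # hypothetical correction table

def parse_description(desc):
    """Split a description into (specific_region, general_category).

    Returns (None, None) if the text does not match the expected pattern.
    """
    match = DESC_PATTERN.match(desc.upper())
    if match is None:
        return None, None
    specific, general = match.groups()
    # Fix known misspellings word by word in the specific region.
    words = [SPELLING_FIXES.get(w, w) for w in specific.split()]
    return " ".join(words), general

print(parse_description("MRI LUMBER SPINE(MRI SPINE)"))    # ('LUMBAR SPINE', 'SPINE')
print(parse_description("MRI CERVICAL SPINE(MRI SPINE)"))  # ('CERVICAL SPINE', 'SPINE')
```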

Size

  • Type: String
  • Observations: File sizes in the sample range from about 6.02 MB to 12.57 MB, and each value includes its unit (e.g., “MB”). For analysis, split the string into a numerical value and a unit, or normalize it to a single unit such as bytes.
  • File Size Distribution: The MR set shows fairly consistent file sizes in the 6 to 13 MB range, which is plausible for multi-slice spine MRI studies. MRI files are generally larger than plain X-rays because each study contains many slices.
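The size normalization can be sketched as a small parser. It uses decimal multipliers (1 MB = 1,000,000 bytes) because the Size_bytes samples (11.95 MB → 11950000) imply them. If the UI actually reports binary units, swap in powers of 1024. The set of accepted units is an assumption; the sample shows only MB.

```python
import re

# Decimal multipliers, consistent with the Size_bytes sample values.
UNIT_MULTIPLIERS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9}
SIZE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*$", re.IGNORECASE)

def size_to_bytes(raw):
    """Convert a display string such as '11.95 MB' to an integer byte count."""
    match = SIZE_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"Unrecognized size string: {raw!r}")
    value, unit = match.groups()
    # round() absorbs floating-point error, e.g. 11.95 * 10**6.
    return round(float(value) * UNIT_MULTIPLIERS[unit.upper()])

print(size_to_bytes("11.95 MB"))  # 11950000
print(size_to_bytes("6.02 MB"))   # 6020000
```

Raising on unrecognized strings, rather than returning zero, keeps malformed sizes from silently skewing size statistics.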

Usage & considerations

Technical characteristics of magnetic resonance imaging (MRI)

Physical principles

MRI uses a strong static magnetic field (typically 1.5 T or 3.0 T for clinical imaging) to align the magnetic moments of hydrogen nuclei (protons) in the body. Radiofrequency pulses temporarily disturb this alignment, and the signals emitted as the protons return to equilibrium are recorded to form images. Different tissues have characteristic relaxation times (T1, T2), which gives MRI excellent soft tissue contrast.
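Two standard textbook relations make the paragraph above concrete. They are general MRI physics, not properties of this dataset. The Larmor equation gives the frequency the radiofrequency pulse must match at a given field strength, using the hydrogen gyromagnetic ratio of about 42.58 MHz/T. The exponential recovery and decay curves show how T1 and T2 govern the returning signal after a 90-degree excitation.

```latex
% Larmor (precession) frequency of hydrogen: \gamma / 2\pi \approx 42.58 MHz/T
f_0 = \frac{\gamma}{2\pi} B_0
\quad\Rightarrow\quad
f_0 \approx 63.9\ \text{MHz at } 1.5\ \text{T}, \qquad
f_0 \approx 127.7\ \text{MHz at } 3.0\ \text{T}

% Longitudinal (T1) recovery and transverse (T2) decay after a 90-degree pulse
M_z(t) = M_0\left(1 - e^{-t/T_1}\right), \qquad
M_{xy}(t) = M_0\, e^{-t/T_2}
```

Tissues with different T1 and T2 values follow different curves, so sampling the signal at chosen times produces the contrast described above.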

Image acquisition

Multi-planar imaging allows acquisition in the axial, sagittal, and coronal planes without repositioning the patient. Different pulse sequences (T1-weighted, T2-weighted, FLAIR, gradient echo, diffusion-weighted) highlight different tissue characteristics. A typical spine MRI study includes 100 to 200 individual slices across multiple sequences.
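To relate slice counts to the file sizes in this dataset, estimate the raw pixel data as slices × rows × columns × bytes per pixel. The sketch below assumes a hypothetical 256 × 256 matrix stored at 2 bytes per pixel; neither value comes from this dataset. It also ignores file headers and compression.

```python
def estimate_uncompressed_bytes(num_slices, rows, cols, bytes_per_pixel=2):
    """Rough pixel-data size for a stack of 2D slices (ignores headers and compression)."""
    return num_slices * rows * cols * bytes_per_pixel

# Hypothetical acquisition parameters, not taken from this dataset.
for slices in (100, 200):
    size = estimate_uncompressed_bytes(slices, rows=256, cols=256)
    print(slices, size, f"{size / 10**6:.1f} MB")
# 100 13107200 13.1 MB
# 200 26214400 26.2 MB
```

Under these assumptions, 100 to 200 slices come to roughly 13 to 26 MB of raw pixel data, above the 6 to 13 MB seen in the sample. Compression, smaller matrices, or fewer slices per file could each explain the gap; the dataset does not say which.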

Superior soft tissue contrast

MRI provides excellent visualization of neural tissue, intervertebral discs, ligaments, muscles, and spinal cord pathology. It can differentiate tissue types by water content, cellularity, and molecular environment, without exposing the patient to ionizing radiation.

Scanning time and workflow

A typical spinal MRI examination takes 20 to 45 minutes, depending on the sequences and anatomical coverage. Acquisition takes longer than X-ray or CT, but a single study provides comprehensive 3D anatomical information.

Clinical applications for spinal imaging

MRI is the primary modality for evaluating disc herniation, spinal cord compression, neural foraminal stenosis, degenerative changes, tumors, infections, and inflammatory conditions. It is essential for pre-surgical planning and for post-operative assessment of spinal interventions.

Primary use cases

  • Training deep learning models for spinal pathology detection, including disc herniation, stenosis, and degenerative disease classification.
  • Developing automated segmentation models for spinal cord, vertebrae, and intervertebral discs across different MRI sequences.
  • Multi-sequence MRI analysis requiring models that can leverage complementary information from T1, T2, and other pulse sequences.
  • Analysis of data storage requirements for PACS (picture archiving and communication systems) and optimization of compression strategies for MRI data.
  • Cross-modality studies comparing MRI with X-ray/CT for spinal assessment and validating AI model generalization across imaging types.

Privacy & ethics

  • While names are not visible, the combination of Age, Gender, specific anatomical study details, and timestamps (if added later) could be quasi-identifying. Ensure HIPAA/GDPR compliance before public release.
  • MR images may contain more identifiable anatomical features than plain X-rays and therefore need additional de-identification review. Consider removing or obscuring facial features in cervical spine studies, which may include portions of the head.
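One way to gauge the quasi-identifier risk described above is to generalize age into bands and then find the smallest group sharing the same (age band, gender, description). That smallest group size is the k in k-anonymity. A group of size 1 is a unique, potentially re-identifiable record. The 10-year band width and the records below are hypothetical. Collapsing ages over 89 into a single "90+" category follows the HIPAA Safe Harbor rule for ages, but this check alone does not establish compliance.

```python
from collections import Counter

def age_band(age_years, width=10):
    """Generalize an age into a band; ages over 89 collapse into '90+'."""
    if age_years > 89:
        return "90+"
    low = (age_years // width) * width
    return f"{low}-{low + width - 1}"

def smallest_group(records):
    """Return the size of the smallest group sharing the same
    (age band, gender, description) combination, i.e. k in k-anonymity."""
    groups = Counter((age_band(a), g, d) for a, g, d in records)
    return min(groups.values())

# Hypothetical records: (age in years, gender, description).
records = [
    (60, "Female", "MRI CERVICAL SPINE(MRI SPINE)"),
    (69, "Female", "MRI CERVICAL SPINE(MRI SPINE)"),
    (18, "Male", "MRI LUMBER SPINE(MRI SPINE)"),
]
print(smallest_group(records))  # 1: the 18-year-old male record is unique
```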

Preprocessing needs

  • Age Normalization: Strip the "Y" suffix and convert the remaining digits to integers, which also drops the leading zeros (018Y → 18).
  • Size Normalization: Parse size strings to separate numerical values from units (MB). Convert to consistent units (Bytes) for computational analysis.
  • Description Parsing: Extract the hierarchical information from the description field. For example, split "MRI LUMBER SPINE(MRI SPINE)" into the specific study ("LUMBER SPINE") and the general category ("SPINE"). Correct the misspelling LUMBER → LUMBAR, and, if a standard vocabulary is needed, map DORSAL to its synonym THORACIC.
  • Modality Consistency: Confirm all entries are MR/MRI. This field enables filtering and combination with other imaging modalities (X-ray, CT) in multi-modal diagnostic studies.
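The modality consistency check can be sketched as a normalize-then-flag pass. MR is the DICOM modality code for magnetic resonance. The alias table mapping the informal "MRI" to "MR" is an assumption about values that might appear in a larger or merged dataset, and the input list is hypothetical.

```python
MODALITY_ALIASES = {"MRI": "MR"}  # hypothetical alias table

def check_modality(values):
    """Normalize modality labels and return (normalized, indices of non-MR rows)."""
    normalized, non_mr = [], []
    for i, value in enumerate(values):
        key = str(value).strip().upper()
        code = MODALITY_ALIASES.get(key, key)
        normalized.append(code)
        if code != "MR":
            non_mr.append(i)
    return normalized, non_mr

# Hypothetical input, not taken from the dataset.
normalized, non_mr = check_modality(["MR", "mri", " MR ", "CT"])
print(normalized)  # ['MR', 'MR', 'MR', 'CT']
print(non_mr)      # [3]
```

Returning row indices rather than dropping rows lets you decide whether non-MR entries are errors or belong in a multi-modal analysis.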
