Dataset Structure

Schema Overview

The recommended data types are chosen to suit analysis in a dataframe or SQL database.

| Field name | Recommended type | Description | Sample values |
|---|---|---|---|
| Gender | Categorical / String | The biological sex of the patient. | Female (primarily) |
| Age | String (Mixed) | The age of the patient. Note: Requires cleaning. | 045Y, 052Y, 061Y |
| Modality | String | The imaging method. All entries are MG (Mammography). | MG |
| Description | String | Study type and breast imaging views. | Bilateral Screening Mammo, Diagnostic Mammo Left, Tomosynthesis |
| Size_raw | String | The file size as displayed. | 45 MB, 120 MB |
| Size_bytes | Float / Int (Derived) | Converted to bytes. | 45000000, 120000000 |
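
The schema notes that Age requires cleaning and that Size_bytes is derived from Size_raw. A minimal sketch of both conversions follows, using only the Python standard library. The function names `parse_age_years` and `parse_size_bytes` are hypothetical. The sketch assumes ages look like the samples above (zero-padded digits followed by `Y`) and that sizes use decimal units (1 MB = 1,000,000 bytes), which matches the sample pair 45 MB and 45000000. Values that do not fit these assumptions are returned as `None` rather than guessed.

```python
import re

# Assumed formats, based only on the sample values in the schema:
#   Age:      zero-padded digits plus a unit letter, e.g. "045Y"
#   Size_raw: a number, optional space, and a decimal unit, e.g. "45 MB"
_AGE_RE = re.compile(r"^\s*(\d{1,3})\s*([YMWD])\s*$", re.IGNORECASE)
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(B|KB|MB|GB|TB)\s*$", re.IGNORECASE)
_UNIT_BYTES = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_age_years(raw):
    """Return the age in whole years for values like '045Y', else None.

    Ages given in other units (months, weeks, days) are not converted
    here; they return None so they can be reviewed separately.
    """
    if raw is None:
        return None
    match = _AGE_RE.match(str(raw))
    if match is None or match.group(2).upper() != "Y":
        return None
    return int(match.group(1))


def parse_size_bytes(raw):
    """Return the size in bytes for values like '45 MB', else None."""
    if raw is None:
        return None
    match = _SIZE_RE.match(str(raw))
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).upper()
    return int(round(value * _UNIT_BYTES[unit]))


if __name__ == "__main__":
    for age in ["045Y", "052Y", "061Y", "unknown"]:
        print(age, "->", parse_age_years(age))
    for size in ["45 MB", "120 MB"]:
        print(size, "->", parse_size_bytes(size))
```

Returning `None` for unparseable values keeps bad rows visible in a later null check instead of silently coercing them to zero.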

Usage & considerations

Technical characteristics of mammography (MG)

Specialized X-Ray System

Mammography units use low-energy X-rays (25-35 kVp) optimized for soft tissue contrast. Breast compression reduces tissue thickness, minimizes motion blur, decreases radiation dose, and separates overlapping structures.

Digital Breast Tomosynthesis

Digital breast tomosynthesis (3D mammography) acquires multiple projections across an arc and reconstructs them into thin slices. The slices reduce tissue superimposition, improving lesion detection and reducing false positives. Files are typically 5-10x larger than 2D mammograms.
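
Because tomosynthesis files are so much larger, it can help to separate them from 2D studies before planning storage or batching. The sketch below does this with a keyword check on the Description field. It assumes tomosynthesis studies contain the word "Tomosynthesis" in Description, as in the sample values; real data may use other labels, so the keyword is an assumption to verify against the actual records. The function names are hypothetical.

```python
def is_tomosynthesis(description):
    """Return True if the description mentions tomosynthesis (case-insensitive)."""
    return "tomosynthesis" in (description or "").lower()


def split_by_acquisition(records):
    """Split records (dicts with a 'Description' key) into (two_d, tomo) lists."""
    two_d, tomo = [], []
    for record in records:
        if is_tomosynthesis(record.get("Description")):
            tomo.append(record)
        else:
            two_d.append(record)
    return two_d, tomo


if __name__ == "__main__":
    # Illustrative rows built from the schema's sample values.
    sample = [
        {"Description": "Bilateral Screening Mammo", "Size_raw": "45 MB"},
        {"Description": "Diagnostic Mammo Left", "Size_raw": "45 MB"},
        {"Description": "Tomosynthesis", "Size_raw": "120 MB"},
    ]
    two_d, tomo = split_by_acquisition(sample)
    print(len(two_d), "2D studies;", len(tomo), "tomosynthesis studies")
```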

Primary use cases

  • Training deep learning models for automated breast cancer detection and classification.
  • Developing AI systems for breast density assessment and BI-RADS categorization.
