The recommended data types below are chosen to suit analysis in a dataframe or SQL database.

| Field name | Recommended type | Description | Sample values |
|---|---|---|---|
| Gender | Categorical / String | The biological sex of the patient. | Female (primarily) |
| Age | String (Mixed) | The age of the patient. Note: Requires cleaning. | 045Y, 052Y, 061Y |
| Modality | String | The imaging method. All entries are MG (Mammography). | MG |
| Description | String | Study type and breast imaging views. | Bilateral Screening Mammo, Diagnostic Mammo Left, Tomosynthesis |
| Size_raw | String | The file size as displayed. | 45 MB, 120 MB |
| Size_bytes | Float / Int | (Derived) Converted to bytes. | 45000000, 120000000 |
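
The cleaning noted for `Age` and the derivation of `Size_bytes` from `Size_raw` could be sketched as follows. This is a minimal illustration, not the dataset's actual pipeline: the function names `parse_age_years` and `size_to_bytes` are hypothetical, ages are assumed to follow the `045Y` pattern from the sample values, and sizes are assumed to use decimal units (1 MB = 1,000,000 bytes), which matches the samples in the table.

```python
import re
from typing import Optional

# Assumption: ages are zero-padded digits followed by "Y", e.g. "045Y".
_AGE_RE = re.compile(r"^\s*(\d+)\s*Y\s*$", re.IGNORECASE)

# Assumption: decimal units, so "45 MB" maps to 45,000,000 bytes.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*$", re.IGNORECASE)


def parse_age_years(raw: object) -> Optional[int]:
    """Convert a string like '045Y' to 45; return None if it does not match."""
    if not isinstance(raw, str):
        return None
    match = _AGE_RE.match(raw)
    return int(match.group(1)) if match else None


def size_to_bytes(raw: object) -> Optional[int]:
    """Convert a string like '45 MB' to 45000000; return None if it does not match."""
    if not isinstance(raw, str):
        return None
    match = _SIZE_RE.match(raw)
    if not match:
        return None
    value = float(match.group(1))
    unit = match.group(2).upper()
    return int(round(value * _UNITS[unit]))


if __name__ == "__main__":
    print(parse_age_years("045Y"))
    print(size_to_bytes("120 MB"))
```

In a pandas dataframe, these helpers could be applied per column, for example `df["Age"].map(parse_age_years)`. Values that do not match the assumed patterns come back as `None`, so they can be reviewed rather than silently coerced.
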

Mammography units use low-energy X-rays (25-35 kVp) optimized for soft tissue contrast. Breast compression reduces tissue thickness, minimizes motion blur, decreases radiation dose, and separates overlapping structures.

3D mammography (tomosynthesis) acquires multiple projections from different angles and reconstructs them into slices. The slices reduce tissue superimposition, which improves lesion detection and reduces false positives. Tomosynthesis files are typically 5-10x larger than 2D mammography files.
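
As a rough illustration of the 5-10x size ratio, the sketch below estimates the size range of a 3D study from a 2D file size. The function name `tomo_size_range` is hypothetical, and the multipliers are simply the bounds quoted above, not measured values.

```python
from typing import Tuple


def tomo_size_range(size_2d_bytes: int, low: int = 5, high: int = 10) -> Tuple[int, int]:
    """Return the (minimum, maximum) estimated 3D size in bytes for a 2D file size."""
    return size_2d_bytes * low, size_2d_bytes * high


if __name__ == "__main__":
    # A 45 MB 2D study (45,000,000 bytes in decimal units).
    print(tomo_size_range(45_000_000))
```
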