Dataset Structure

Schema Overview

The recommended types below are chosen for analysis in a dataframe or SQL database.

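| Field name  | Recommended type      | Description                                                                                      | Sample values                                         |
|-------------|-----------------------|--------------------------------------------------------------------------------------------------|-------------------------------------------------------|
| Gender      | Categorical / String  | The biological sex of the patient.                                                               | Male, Female                                          |
| Age         | String (Mixed)        | The age of the patient at the time of the study. Note: inherited from the parent imaging study. | Varies                                                |
| Modality    | String                | The DICOM modality of the document. All entries are SR (Structured Report).                      | SR                                                    |
| Description | String                | Report type and content category.                                                                | Measurement Report, CAD Report, Key Object Selection |
| Size_raw    | String                | The file size as displayed in the UI.                                                            | 50 KB, 150 KB                                         |
| Size_bytes  | Float / Int (Derived) | The file size in bytes, derived from Size_raw (1 KB = 1,000 bytes, as in the sample values).     | 50000, 150000                                         |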
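The two derived or mixed-format fields usually need normalizing before analysis. The sketch below converts Size_raw to bytes using the decimal factor implied by the sample values (50 KB to 50000). It also parses Age, but only under an assumption: that values follow the DICOM Age String (AS) format, such as "045Y", which the table does not confirm. The B/MB/GB multipliers, the 365.25-day year, and the Age_years column are hypothetical choices for illustration.

```python
import re
from typing import Optional

# Decimal multipliers (1 KB = 1,000 bytes), matching the sample values (50 KB -> 50000).
# Units other than KB are assumptions; the dataset samples only show KB.
_SIZE_UNITS = {"B": 1, "KB": 1_000, "MB": 1_000_000, "GB": 1_000_000_000}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*$", re.IGNORECASE)


def parse_size(size_raw: str) -> Optional[int]:
    """Convert a Size_raw string such as '50 KB' to bytes, or None if unparseable."""
    match = _SIZE_RE.match(size_raw or "")
    if not match:
        return None
    number, unit = match.groups()
    return int(round(float(number) * _SIZE_UNITS[unit.upper()]))


# Assumed DICOM AS format: three digits plus D (days), W (weeks), M (months) or Y (years).
# Divisors convert each unit to years (365.25-day year is an approximation).
_AGE_DIVISORS = {"D": 365.25, "W": 365.25 / 7, "M": 12.0, "Y": 1.0}
_AGE_RE = re.compile(r"^(\d{3})([DWMY])$")


def parse_age_years(age_raw: str) -> Optional[float]:
    """Convert an AS-formatted age to years; return None for any other format."""
    match = _AGE_RE.match((age_raw or "").strip().upper())
    if not match:
        return None
    value, unit = match.groups()
    return int(value) / _AGE_DIVISORS[unit]


row = {"Gender": "Female", "Age": "045Y", "Size_raw": "150 KB"}
row["Size_bytes"] = parse_size(row["Size_raw"])
row["Age_years"] = parse_age_years(row["Age"])  # hypothetical derived column
print(row["Size_bytes"], row["Age_years"])  # 150000 45.0
```

Returning None rather than raising lets rows with unexpected formats flow into a dataframe as missing values. They can then be inspected separately instead of aborting the whole load.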

Usage & considerations

Technical characteristics of structured reports (SR)

DICOM SR standard

Structured Reports follow the DICOM SR standard, which defines a hierarchical tree structure for encoding clinical information. Content items include measurements, observations, codes from standard terminologies (SNOMED, RadLex), and relationships between findings. This structured format enables machine parsing and automated quality assurance.
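The hierarchical tree can be pictured with a simplified model. In this sketch, each content item holds a concept name, an optional value, and child items. A depth-first walk then turns the tree into flat "path = value" rows that suit a dataframe. The class, the function, and the example concept names are illustrative assumptions. Real SR content items carry coded concept names and typed values rather than plain strings.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class ContentItem:
    """Simplified SR content item: concept name, optional value, child items."""
    concept_name: str
    value: Optional[str] = None
    children: List["ContentItem"] = field(default_factory=list)


def walk(item: ContentItem, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, Optional[str]]]:
    """Depth-first traversal yielding (path, value) for every item in the tree."""
    current = path + (item.concept_name,)
    yield " > ".join(current), item.value
    for child in item.children:
        yield from walk(child, current)


# Hypothetical report tree for illustration.
report = ContentItem("Imaging Measurement Report", children=[
    ContentItem("Imaging Measurements", children=[
        ContentItem("Measurement Group", children=[
            ContentItem("Finding", "Nodule"),
            ContentItem("Long Axis", "12.5 mm"),
        ]),
    ]),
])

for path, value in walk(report):
    print(path, "=", value)
```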

SR document types

Common SR types include Measurement Reports (quantitative analysis results), CAD Reports (computer-aided detection findings), Key Object Selection documents (references to significant images), Dose Reports (radiation exposure documentation), and Comprehensive SR (full radiology reports with coded findings).
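The Description column gives a human-readable report type. If you also have the underlying DICOM files, the SOP Class UID (tag 0008,0016) identifies the SR document class. The sketch below maps the storage SOP Class UIDs for several common SR classes to readable names. That the dataset exposes this UID is an assumption. A Measurement Report (template TID 1500) has no dedicated SOP Class and is typically carried in an Enhanced or Comprehensive SR object, so it appears under those names here.

```python
from typing import Dict

# Storage SOP Class UIDs for common SR document classes (DICOM PS3.4).
SR_SOP_CLASSES: Dict[str, str] = {
    "1.2.840.10008.5.1.4.1.1.88.11": "Basic Text SR",
    "1.2.840.10008.5.1.4.1.1.88.22": "Enhanced SR",
    "1.2.840.10008.5.1.4.1.1.88.33": "Comprehensive SR",
    "1.2.840.10008.5.1.4.1.1.88.50": "Mammography CAD SR",
    "1.2.840.10008.5.1.4.1.1.88.59": "Key Object Selection Document",
    "1.2.840.10008.5.1.4.1.1.88.65": "Chest CAD SR",
    "1.2.840.10008.5.1.4.1.1.88.67": "X-Ray Radiation Dose SR",
}


def sr_class_name(sop_class_uid: str) -> str:
    """Return a readable SR class name, or a fallback label for unlisted UIDs."""
    return SR_SOP_CLASSES.get(sop_class_uid.strip(), "Other / unrecognized SR class")


print(sr_class_name("1.2.840.10008.5.1.4.1.1.88.59"))  # Key Object Selection Document
```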

Content organization

SR documents organize content as coded concepts linked by relationship types such as CONTAINS, HAS OBS CONTEXT, and INFERRED FROM. Each content item has a concept name (a code from a standard vocabulary), a value (numeric, text, or coded, for example), and optional qualifiers. Because findings are identified by codes rather than free text, reports can be queried by concept: for example, retrieving every measurement of a given type across all reports.
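Relationship types matter when extracting data. A measurement that a report CONTAINS is a reported result. An item reached through INFERRED FROM is supporting evidence, and HAS OBS CONTEXT carries context such as the observer. The sketch below follows only CONTAINS links and collects numeric (NUM) items as (concept, value, unit) tuples. The class layout, the example tree, and its concept names are simplified, hypothetical stand-ins for real SR content items.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class ContentItem:
    relationship: str            # relationship to parent, e.g. "CONTAINS"; "" for the root
    value_type: str              # e.g. "CONTAINER", "NUM", "TEXT", "CODE"
    concept_name: str            # code meaning of the concept name
    value: Union[float, str, None] = None
    unit: Optional[str] = None   # only meaningful for NUM items
    children: List["ContentItem"] = field(default_factory=list)


def contained_measurements(item: ContentItem) -> List[Tuple[str, float, Optional[str]]]:
    """Collect (concept, value, unit) for NUM items reached only through CONTAINS links."""
    results = []
    for child in item.children:
        if child.relationship != "CONTAINS":
            continue  # skip context and evidence branches such as HAS OBS CONTEXT, INFERRED FROM
        if child.value_type == "NUM":
            results.append((child.concept_name, float(child.value), child.unit))
        results.extend(contained_measurements(child))
    return results


# Hypothetical report fragment for illustration.
root = ContentItem("", "CONTAINER", "Imaging Measurement Report", children=[
    ContentItem("HAS OBS CONTEXT", "TEXT", "Observer", "Reader A"),
    ContentItem("CONTAINS", "CONTAINER", "Measurement Group", children=[
        ContentItem("CONTAINS", "CODE", "Finding", "Nodule"),
        ContentItem("CONTAINS", "NUM", "Long Axis", 12.5, "mm", children=[
            ContentItem("INFERRED FROM", "NUM", "Caliper Reading", 12.4, "mm"),
        ]),
    ]),
])

print(contained_measurements(root))  # [('Long Axis', 12.5, 'mm')]
```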

Integration with images

SRs maintain references to source images through DICOM UIDs, enabling correlation between findings and imaging data. Spatial coordinates can be encoded to mark lesion locations. This linkage is essential for training AI models with ground-truth annotations derived from clinical reports.
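To turn spatial coordinates into training annotations, a common step is to reduce each region to a bounding box and group the boxes by the image they reference. The sketch assumes each region has already been extracted as a pair. The first element is the referenced image's SOP Instance UID. The second is the 2D SCOORD graphic data, given as a flat list of (column, row) pairs. The function names and the example UIDs are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (col_min, row_min, col_max, row_max)


def scoord_bbox(graphic_data: Sequence[float]) -> BBox:
    """Bounding box of 2D SCOORD graphic data given as flat (column, row) pairs."""
    if len(graphic_data) < 2 or len(graphic_data) % 2:
        raise ValueError("graphic data must contain (column, row) pairs")
    cols = graphic_data[0::2]
    rows = graphic_data[1::2]
    return (min(cols), min(rows), max(cols), max(rows))


def annotations_by_image(regions: Iterable[Tuple[str, Sequence[float]]]) -> Dict[str, List[BBox]]:
    """Group bounding boxes by the SOP Instance UID of the referenced image."""
    out: Dict[str, List[BBox]] = {}
    for sop_instance_uid, graphic_data in regions:
        out.setdefault(sop_instance_uid, []).append(scoord_bbox(graphic_data))
    return out


# Hypothetical regions: a three-point polyline and a single point on the same image.
regions = [
    ("1.2.3.4.5.1", [100.0, 120.0, 140.0, 118.0, 138.0, 160.0]),
    ("1.2.3.4.5.1", [210.5, 75.0]),
]
print(annotations_by_image(regions))
```

A single point collapses to a zero-area box. Depending on the model, such points may be better kept as keypoints than as boxes.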

Clinical workflow

SRs are generated by PACS workstations, CAD systems, quantitative analysis tools, and voice recognition systems. They enable standardized reporting templates, automated data extraction for registries, quality metrics calculation, and clinical decision support through real-time rule evaluation.
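Rule evaluation for decision support can be sketched as matching extracted measurements against threshold rules. A rule fires only when both the concept name and the unit match, so that values recorded in cm are never compared against a threshold in mm. The Rule class, the evaluate function, and especially the threshold value are hypothetical placeholders, not clinical guidance.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Rule:
    concept_name: str
    unit: str
    compare: Callable[[float, float], bool]  # e.g. operator.ge
    threshold: float
    message: str


# Placeholder rule for illustration only; the threshold is NOT clinical guidance.
RULES = [Rule("Long Axis", "mm", operator.ge, 30.0, "Large lesion: flag for review")]


def evaluate(measurements: List[Tuple[str, float, str]], rules: List[Rule]) -> List[str]:
    """Return an alert message for every measurement that triggers a rule."""
    alerts = []
    for concept, value, unit in measurements:
        for rule in rules:
            if concept == rule.concept_name and unit == rule.unit and rule.compare(value, rule.threshold):
                alerts.append(f"{rule.message} ({concept} = {value} {unit})")
    return alerts


print(evaluate([("Long Axis", 34.0, "mm"), ("Long Axis", 12.5, "mm")], RULES))
```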

Primary use cases

  • Extracting ground-truth labels and annotations from clinical structured reports to train medical imaging AI models.
  • Building natural language processing (NLP) systems to convert unstructured radiology reports into structured data.
  • Developing automated quality assurance systems that validate measurement consistency and report completeness.
  • Creating clinical decision support tools that trigger alerts based on critical findings encoded in structured reports.
  • Analyzing reporting patterns and inter-reader variability using standardized terminology from SR documents.
  • Populating disease registries and research databases through automated extraction of coded diagnoses and measurements.
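The quality assurance use case, which validates measurement consistency and report completeness, can be sketched as a handful of checks over the (concept, value, unit) tuples extracted from one measurement group. The checks are: required concepts are present, units match what the template expects, and a short-axis measurement does not exceed the long axis. The REQUIRED template and the function name are hypothetical. The sketch also assumes a single measurement group, so a repeated concept keeps only its last value.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical template: concept name -> expected unit (None for non-numeric items).
REQUIRED: Dict[str, Optional[str]] = {"Finding": None, "Long Axis": "mm", "Short Axis": "mm"}


def qa_issues(items: Iterable[Tuple[str, object, Optional[str]]]) -> List[str]:
    """Report missing required concepts, unexpected units, and short axis > long axis."""
    found = {name: (value, unit) for name, value, unit in items}
    issues = [f"missing: {name}" for name in REQUIRED if name not in found]
    for name, expected_unit in REQUIRED.items():
        if name in found and expected_unit is not None and found[name][1] != expected_unit:
            issues.append(f"unit mismatch for {name}: {found[name][1]!r} != {expected_unit!r}")
    long_axis, short_axis = found.get("Long Axis"), found.get("Short Axis")
    # Compare magnitudes only when both axes share a unit.
    if long_axis and short_axis and long_axis[1] == short_axis[1]:
        if float(short_axis[0]) > float(long_axis[0]):
            issues.append("inconsistent: Short Axis exceeds Long Axis")
    return issues


print(qa_issues([("Long Axis", 8.0, "mm"), ("Short Axis", 9.0, "mm")]))
# ['missing: Finding', 'inconsistent: Short Axis exceeds Long Axis']
```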
