AI tasks


The ACOUSLIC-AI challenge is designed to evaluate and benchmark AI models that automate the measurement of fetal abdominal circumference using blind-sweep ultrasound data. Its ultimate aim is to broaden the accessibility of prenatal care in areas with limited resources. 

This challenge involves analyzing a series of 2D ultrasound frames extracted from blind-sweep sequences acquired by novice operators. Participants are tasked with identifying the most suitable frame for measuring the fetal abdominal circumference. Along with selecting this optimal frame, participants must also provide the binary segmentation mask of the abdomen on the ultrasound image corresponding to the selected frame. This segmentation mask should be suitable for the ellipse fitting tool that is used to measure its circumference during evaluation.

In particular, the algorithm should provide the following two outputs:

  • Fetal abdomen segmentation mask: A 2D numpy array of type np.uint8, matching the dimensions of the input images (744x562 pixels) with a pixel spacing of 0.28 mm. During evaluation, an ellipse fitting tool fits an ellipse to the segmentation mask, so participants should post-process their mask to make it suitable for this (e.g. remove disconnected components that are not part of the intended segmentation). The circumference of this ellipse is then calculated and compared to a reference measurement.
  • Fetal frame number: An integer representing the frame number where the segmentation was identified, or -1 if no relevant frame was identified. Please note that indexing in the evaluation software begins at 0, so valid frame numbers lie in the range [0, 840).
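
The post-processing mentioned for the segmentation mask (removing disconnected components) could look roughly like the sketch below. It is not taken from the challenge code: the helper name keep_largest_component, the choice of 4-connectivity, and the toy array are assumptions made for illustration. It keeps only the largest connected foreground region and returns an np.uint8 mask with the same shape as the input.

```python
from collections import deque

import numpy as np


def keep_largest_component(mask: np.ndarray) -> np.ndarray:
    """Return a uint8 mask holding only the largest 4-connected foreground region."""
    foreground = mask > 0
    visited = np.zeros(foreground.shape, dtype=bool)
    rows, cols = foreground.shape
    best = []
    for r, c in zip(*np.nonzero(foreground)):
        if visited[r, c]:
            continue
        # Breadth-first flood fill starting from an unvisited foreground pixel.
        component = []
        queue = deque([(int(r), int(c))])
        visited[r, c] = True
        while queue:
            y, x = queue.popleft()
            component.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                inside = 0 <= ny < rows and 0 <= nx < cols
                if inside and foreground[ny, nx] and not visited[ny, nx]:
                    visited[ny, nx] = True
                    queue.append((ny, nx))
        if len(component) > len(best):
            best = component
    cleaned = np.zeros(foreground.shape, dtype=np.uint8)
    if best:
        ys, xs = zip(*best)
        cleaned[list(ys), list(xs)] = 1
    return cleaned


# Toy example: a 3x3 blob plus one stray pixel that should be removed.
mask = np.zeros((6, 8), dtype=np.uint8)
mask[1:4, 1:4] = 1
mask[5, 7] = 1
cleaned = keep_largest_component(mask)
frame_number = 3  # hypothetical selected frame; -1 would mean "no relevant frame"
```

A pure-Python flood fill like this is slow on full-size frames; a library routine such as scipy.ndimage.label would be the more usual choice in practice.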

Evaluation


Performance metrics

The ACOUSLIC-AI challenge employs a comprehensive suite of metrics to evaluate the performance of the participating algorithms:

Dice Similarity Coefficient (DSC): This metric quantifies the spatial overlap accuracy of the algorithm's segmentation against the ground truth mask. A higher DSC indicates a closer match to the ground truth and thus a better segmentation performance. It's important to note that the ground truth mask, if available, corresponds to the annotation in the specified frame of the fetal abdomen stack (i.e., this metric is computed on the 2D ground truth and prediction masks corresponding to the fetal frame number). For this comparison, the ground truth mask is converted to a binary format (1 representing the fetal abdomen and 0 representing the background).
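
As an illustration of the DSC computation described above, a minimal sketch is given below. It is not the official evaluation code: the function name dice_coefficient, the binarization rule (any non-zero label counts as fetal abdomen), and the handling of two empty masks are assumptions.

```python
import numpy as np


def dice_coefficient(gt_mask: np.ndarray, pred_mask: np.ndarray) -> float:
    """Dice similarity coefficient between two 2D masks after binarization."""
    gt = gt_mask > 0      # assumption: any non-zero label is fetal abdomen
    pred = pred_mask > 0
    denominator = gt.sum() + pred.sum()
    if denominator == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return float(2.0 * np.logical_and(gt, pred).sum() / denominator)


gt = np.zeros((4, 4), dtype=np.uint8)
pred = np.zeros((4, 4), dtype=np.uint8)
gt[1:3, 1:3] = 2      # 4 pixels with a non-binary label value
pred[1:3, 1:4] = 1    # 6 pixels, 4 of which overlap the ground truth
dsc = dice_coefficient(gt, pred)  # 2 * 4 / (4 + 6) = 0.8
```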

Weighted Frame Selection Score (WFSS): WFSS evaluates the algorithm's frame selection accuracy, assigning higher scores when the chosen frame is clinically relevant. A score of 1 is awarded for correctly identifying an optimal plane, 0.6 for selecting a suboptimal plane when an optimal one is available, and 0 for selecting an irrelevant frame when optimal or suboptimal ones are present.
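
The scoring rule for a single case can be sketched as follows. The three scores stated above (1, 0.6, and 0) come from the text; the label strings, the function name frame_selection_score, and the two branches marked as assumptions cover situations the description does not spell out, so they may differ from the official evaluation software.

```python
def frame_selection_score(selected: str, available: set) -> float:
    """Score one case.

    `selected` is the type of the chosen frame and `available` the set of
    frame types present in the sweep; both use the hypothetical labels
    "optimal", "suboptimal", and "irrelevant".
    """
    if selected == "optimal":
        return 1.0
    if selected == "suboptimal":
        # 0.6 when an optimal frame existed in the sweep.
        # Assumption: full credit when no optimal frame was available.
        return 0.6 if "optimal" in available else 1.0
    # An irrelevant frame (or no frame) was chosen.
    if available & {"optimal", "suboptimal"}:
        return 0.0
    # Assumption: with nothing relevant to find, abstaining counts as correct.
    return 1.0


scores = [
    frame_selection_score("optimal", {"optimal", "suboptimal"}),
    frame_selection_score("suboptimal", {"optimal", "suboptimal"}),
    frame_selection_score("irrelevant", {"suboptimal"}),
]
```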

Hausdorff Distance (HD): This metric measures the maximum distance between the algorithm's predicted boundary and the ground truth boundary, indicating the largest error in the predicted segmentation boundary. As with the DSC, the 2D ground truth mask in the selected frame is converted to a binary format for evaluation against the 2D predicted mask. Additionally, only the pixels within the field of view of the ultrasound beam are considered during this process.
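
To make the definition concrete, the sketch below computes a symmetric Hausdorff distance between the boundaries of two binary masks by brute force. It is an illustration only: the function names, the 4-neighbour boundary definition, and the conversion to millimetres with an isotropic 0.28 mm spacing are assumptions, and the field-of-view restriction mentioned above is not modelled.

```python
import numpy as np


def boundary_points(mask: np.ndarray) -> np.ndarray:
    """(row, col) coordinates of foreground pixels with a background 4-neighbour."""
    fg = mask > 0
    padded = np.pad(fg, 1, constant_values=False)
    # A pixel is interior when all four direct neighbours are foreground.
    interior = (
        padded[:-2, 1:-1] & padded[2:, 1:-1] & padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    return np.argwhere(fg & ~interior)


def hausdorff_distance(gt_mask, pred_mask, spacing_mm=0.28):
    a = boundary_points(gt_mask).astype(float)
    b = boundary_points(pred_mask).astype(float)
    if len(a) == 0 or len(b) == 0:
        return float("nan")  # undefined when either mask is empty
    # Pairwise Euclidean distances between all boundary points.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Symmetric Hausdorff: worst nearest-neighbour distance in either direction.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()) * spacing_mm)


gt = np.zeros((8, 8), dtype=np.uint8)
pred = np.zeros((8, 8), dtype=np.uint8)
gt[2:5, 2:5] = 1
pred[2:5, 4:7] = 1    # same square shifted two pixels to the right
hd = hausdorff_distance(gt, pred)  # 2 pixels * 0.28 mm
```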

Normalized Absolute Error (NAE): The normalized absolute error for abdominal circumference measurements provides a scale-independent measure of the precision in abdominal circumference estimation. It is calculated by taking the absolute difference between the ground truth and the predicted circumference, normalized by the maximum of either value to account for the scale:


  $$NAE_{AC} = \frac{|AC_{gt} - AC_{pred}|}{\max(AC_{gt}, AC_{pred}) + \epsilon}$$

  Where:

  • $NAE_{AC}$ is the Normalized Absolute Error for Abdominal Circumference.
  • $AC_{gt}$ is the ground truth Abdominal Circumference measurement (if present) in the sweep corresponding to the algorithm's selected frame.
  • $AC_{pred}$ is the algorithm's predicted Abdominal Circumference measurement.
  • $\epsilon$ is a small constant to prevent division by zero, set to 1e-6.

A lower NAE indicates a higher accuracy in predicting the AC measurements from the segmented masks, which is crucial for clinical applicability.
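
A small worked example of the NAE calculation is sketched below. It follows the verbal description (absolute difference divided by the larger of the two values); adding the small constant to the denominator is an assumption about where it enters, and the function name and the example circumferences are hypothetical.

```python
EPSILON = 1e-6  # small constant to prevent division by zero


def normalized_absolute_error(ac_gt: float, ac_pred: float, eps: float = EPSILON) -> float:
    """Absolute AC error normalized by the larger of the two measurements."""
    return abs(ac_gt - ac_pred) / (max(ac_gt, ac_pred) + eps)


# Hypothetical measurements in millimetres: 10 mm error on a 250 mm reference.
nae = normalized_absolute_error(ac_gt=250.0, ac_pred=240.0)  # roughly 0.04
```

Because the error is divided by the larger measurement, the result stays between 0 and 1 for non-negative circumferences.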

Note: The predicted abdominal circumference used to compute this metric is measured using the fit_ellipses function in the ellipse fitting tool provided in this repository. Ellipses extending beyond the field of view of the ultrasound beam are extrapolated using the contour points contained within the FOV.
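
For intuition about how a fitted ellipse turns into a circumference in millimetres, the sketch below uses Ramanujan's approximation for the perimeter of an ellipse. This is one common approach and is not claimed to be what the fit_ellipses function does internally; the function name, the semi-axis inputs in pixels, and the isotropic 0.28 mm spacing are assumptions.

```python
import math


def ellipse_circumference_mm(semi_major_px: float, semi_minor_px: float,
                             spacing_mm: float = 0.28) -> float:
    """Approximate ellipse perimeter (Ramanujan) from semi-axes given in pixels."""
    a = semi_major_px * spacing_mm
    b = semi_minor_px * spacing_mm
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))


# Sanity check: for a circle the approximation reduces to 2 * pi * r.
circle = ellipse_circumference_mm(100.0, 100.0)
# Hypothetical fitted ellipse with semi-axes of 160 and 120 pixels.
ac_pred = ellipse_circumference_mm(160.0, 120.0)
```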

The combined use of these metrics allows for a balanced evaluation of the algorithms, not only in terms of their segmentation accuracy but also their practical utility in a clinical setting. The evaluation software for this challenge is publicly available in the ACOUSLIC-AI-evaluation-method GitHub repository.

Ranking method

The performance rank for all submitted algorithms is determined based on the following composite score:

The weight assignment prioritizes the accuracy of fetal abdominal circumference measurements (NAE) as the most critical factor, underscoring the importance of precise clinical measurement. Following this, equal importance is assigned to the clinical relevance of the frame selection (WFSS), which ensures the selection of the most appropriate planes for assessment, and the accuracy of the delineated fetal abdomen masks (DSC). These metrics mirror the steps that an expert would take to provide an abdominal circumference measurement for a specific case. While the geometric accuracy of boundary delineation (HD) is not included in the ranking, it is provided as an additional metric for comparing algorithm performance.
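
The structure of such a composite score can be sketched as a weighted sum. The official weights are not stated in this text, so the function below takes them as explicit arguments, and the values in the usage line are placeholders chosen only to respect the described ordering (the circumference term weighted highest, WFSS and DSC weighted equally). Using 1 - NAE so that a higher score is better is also an assumption.

```python
def composite_score(nae: float, wfss: float, dsc: float,
                    w_nae: float, w_wfss: float, w_dsc: float) -> float:
    """Weighted sum of the ranking metrics; NAE is inverted because lower is better."""
    return w_nae * (1.0 - nae) + w_wfss * wfss + w_dsc * dsc


# Placeholder weights for illustration only, not the official challenge weights.
score = composite_score(nae=0.04, wfss=0.6, dsc=0.8,
                        w_nae=0.5, w_wfss=0.25, w_dsc=0.25)
```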

The results of the ACOUSLIC-AI challenge will be published in a journal article. Teams whose algorithms rank in the top three on the Final Leaderboard (Final Test Phase) will be invited to co-author the paper, with a limit of three members per team.