Home

Awesome

Dataset release for "Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection"

The Cataract-1K dataset consists of 1000 videos of cataract surgeries conducted in the eye clinic of Klinikum Klagenfurt from 2021 to 2023. The videos are recorded using a MediLive Trio Eye device mounted on a ZEISS OPMI Vario microscope. The Cataract-1K dataset comprises videos conducted by surgeons with a cumulative count of completed surgeries ranging from 1,000 to over 40,000 procedures. On average, the videos have a duration of 7.12 minutes, with a standard duration of 200 seconds. In addition to this large-scale dataset, we provide surgical phase annotations for 56 regular videos and relevant anatomical plus instrument pixel-level annotations for 2256 frames out of 30 cataract surgery videos. Furthermore, we provide a small subset of surgeries with two major irregularities, including "pupil reaction" and "IOL rotation," to support further research on irregularity detection in cataract surgery. Except for the annotated videos and images, the remaining videos in the Cataract-1K dataset are encoded with a temporal resolution of 25 fps and a spatial resolution of $512 \times 324$.

Phase recognition dataset

A regular cataract surgery can include twelve action phases, including incision, viscoelastic, capsulorhexis, hydro-dissection, phacoemulsification, irrigation-aspiration, capsule polishing, lens implantation, lens positioning, viscoelastic-suction, anterior-chamber flushing, and tonifying/antibiotics. Besides, the idle phases refer to the time spans in the middle of a phase or between two phases when the surgeons mainly change the instruments and no instrument is visible inside the frames.

We provide a large annotated dataset to enable comprehensive studies on deep-learning-based phase recognition in cataract surgery videos. Table 1 visualizes the phase annotations corresponding to 56 regular cataract surgery videos, with a spatial resolution of $1024 \times 768$, a temporal resolution of 30 fps, and an average duration of 6.45 minutes with a standard deviation of 2.04 minutes. This dataset comprises patients with an average age of 75 years, ranging from 51 to 93 years, and a standard deviation of 8.69 years. The videos present in the phase recognition dataset correspond to surgeries executed by surgeons with an average experience of 8929 surgeries and a standard deviation of 6350 surgeries. Frame-level annotations for phase recognition are provided in CSV files, determining the first and the last frames for all action phases per video. The preprocessing codes to extract all action and idle phases from a video using the CSV files are provided in the GitHub repository of the paper. Furthermore, Figure 1 demonstrates the total duration of the annotations corresponding to each phase from 56 videos.

Table 1. Visualizations of phase annotations for 56 normal cataract surgeries. The durations of the videos are different and normalized for better visualization.

<img src="./Dataset_webpage/imgs/Table1.png" alt=" Visualizations of phase annotations for 56 normal cataract surgeries. The durations of the videos are different and normalized for better visualization." width="1000"> <img src="./Dataset_webpage/imgs/pie_chart.png" alt=" Total duration of the annotated phases in the 56 annotated cataract surgery videos (in seconds)." width="600">

Figure 1. Total duration of the annotated phases in the 56 annotated cataract surgery videos (in seconds).

Semantic segmentation dataset

Figure 2 visualizes pixel-level annotations for relevant anatomical objects and instruments.

<img src="./Dataset_webpage/imgs/Figure3.png" alt=" Visualization of pixel-based annotations corresponding to relevant anatomical structures and instruments in cataract surgery and the challenges associated with different objects." width="1000">

Figure 2. Visualization of pixel-based annotations corresponding to relevant anatomical structures and instruments in cataract surgery and the challenges associated with different objects.

The semantic segmentation dataset includes frames from 30 regular cataract surgery videos with a spatial resolution of $1024 \times 768$ and an average duration of 6.52 minutes with a standard deviation of two minutes. Frame extraction is performed at the rate of one frame per five seconds. Subsequently, the frames featuring very harsh motion blur or out-of-scene iris are excluded from the dataset. We provide pixel-level annotations for three relevant anatomical structures, including iris, pupil, and intraocular lens, as well as nine instruments used in regular cataract surgeries including slit/incision knife, gauge, spatula, capsulorhexis cystome, phacoemulsifier tip, irrigation-aspiration, lens injector, capsulorhexis forceps, and katana forceps. All annotations are performed using polygons in the Supervisely platform, and exported as JSON files. Within this dataset, the included individuals possess an average age of 74.5 years, spanning from 51 to 90 years, with a standard deviation of 8.43 years. Additionally, the videos contained in the semantic segmentation dataset depict surgeries conducted by surgeons whose collective experience averages 8033 surgeries, with a standard deviation of 3894 surgeries. The provided dataset enables a reliable study of segmentation performance for relevant anatomical structures, binary instruments, and multi-class instruments. Pixel-level annotations are provided in two formats: (1) Supervisely format, for which we provide Python codes for mask creation from JSON files, and (2) COCO format, which also provides bounding box annotations for all pixel-level annotated objects. The latter annotations can be used for object localization problems. The preprocessing codes to create training masks for "anatomy plus instrument segmentation", "binary instrument segmentation", and "multi-class instrument segmentation" are provided in the GitHub repository of the paper. We have formed five folds with patient-wise separation, meaning every fold consists of the frames corresponding to six distinct videos. Table 2 compares the number of instances and their appearance percentage in the frames. Besides, Table 3 lists the average number of pixels per frame corresponding to each label.

Table 2. Number of instances and presence in the frames (% of total number of frames in each fold).

<img src="./Dataset_webpage/imgs/Table2.png" alt=" Number of instances and presence in the frames (% of the total number of frames in each fold)." width="1000">

Table 3. Average pixels corresponding to different labels per frame.

<img src="./Dataset_webpage/imgs/Table3.png" alt="Average pixels corresponding to different labels per frame." width="1000">

Irregularity detection dataset

This dataset contains two small subsets of major intra-operative irregularities in cataract surgery, including pupil reaction and lens rotation.

<img src="./Dataset_webpage/imgs/Figure4.png" alt="Intra-operative irregularities in cataract surgery." width="1000">

Figure 3. Intra-operative irregularities in cataract surgery.


Disclaimer

A reference must be made to the following publication when this dataset is used in any academic and research reports:

Ghamsarian, N., El-Shabrawi, Y., Nasirihaghighi, S. Putzgruber-Adamitsch, D., Zinkernagel, M., Wolf, S., Schoeffmann, K., Sznitman, R.: Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection (to appear)

BibTeX:

@inproceedings{Cataract-1K,
    author    = {Negin Ghamsarian and
                Yosuf El-Shabrawi and
                Sahar Nasirihaghighi and
                Doris Putzgruber-Adamitsch and
                Martin Zinkernagel and
                Sebastian Wolf and
                Klaus Schoeffmann and
                Raphael Sznitman},
    title     = {Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection (to appear)},
    
}

The datasets are licensed under Creative Commons 4.0 International (CC BY, Creative Commons License).

This license allows users of this dataset to copy, distribute, and transmit the work under the following conditions:

For further legal details, please read the complete license terms.


Download

If you agree to the above conditions, you are free to download the following:


Acknowledgments

This work was supported in part by the Haag-Streit Foundation, Switzerland; and in part by the Austrian Science Fund (FWF) under Grant P 31486-N31 and P 32010-N38.