Second Annual Data Science Bowl

Data Files

Each archive has the same contents; they differ only in packaging method.

In this dataset, you are given hundreds of cardiac MRI images in DICOM format. These are 2D cine images that contain approximately 30 images across the cardiac cycle. Each slice is acquired on a separate breath hold. This is important since the registration from slice to slice is expected to be imperfect.

The competition task is to create an automated method capable of determining the left ventricle volume at two points in time: after systole, when the heart is contracted and the ventricles are at their minimum volume, and after diastole, when the heart is at its largest volume.

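Ejection fraction: when clinicians assess a patient's cardiac function, the ejection fraction (EF) is a key indicator. The heart pumps blood only by continuously contracting and relaxing. The ventricle is fullest at the end of diastole, and its volume at that point is called the end-diastolic volume. At the end of ventricular ejection the volume is smallest, and the ventricular volume at that point is called the end-systolic volume. The difference between the end-diastolic volume and the end-systolic volume is the stroke volume. In a healthy adult, the left ventricular end-diastolic volume is estimated at about 145 mL and the end-systolic volume at about 75 mL, giving a stroke volume of 70 mL. The ventricle therefore does not eject all of its blood with each beat. The stroke volume expressed as a percentage of the end-diastolic volume is called the ejection fraction.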
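The definitions above reduce to two lines of arithmetic. The following is a minimal sketch using the example volumes from the paragraph; the function names are illustrative and not part of any competition code.

```python
def stroke_volume(edv_ml, esv_ml):
    """Stroke volume: end-diastolic volume minus end-systolic volume (mL)."""
    return edv_ml - esv_ml


def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction: stroke volume as a fraction of end-diastolic volume."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and EDV")
    return stroke_volume(edv_ml, esv_ml) / edv_ml


# Example volumes quoted in the text: EDV of about 145 mL, ESV of about 75 mL.
sv = stroke_volume(145.0, 75.0)
ef = ejection_fraction(145.0, 75.0)
print(f"stroke volume = {sv:.0f} mL, EF = {ef:.1%}")
# prints: stroke volume = 70 mL, EF = 48.3%
```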

I see more than one series at the same slice location. How should we deal with those cases?

Generally, a slice location is repeated if there is an artifact on the images. You can use either slice but the odds are that the last slice at a given slice location is the best the technologist could acquire.
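One way to apply this advice is to keep only the last series at each slice location. The sketch below assumes that "last" means the highest series number and that two locations within 0.1 mm of each other count as the same slice. Both are assumptions for illustration, and the input is a plain list of `(series_number, slice_location)` pairs that you would extract from the DICOM headers yourself.

```python
def pick_last_series_per_location(series, ndigits=1):
    """Return {slice_location: series_number}, keeping the latest series per location.

    `series` is an iterable of (series_number, slice_location_mm) pairs.
    Locations are rounded to `ndigits` decimals before comparison (assumption).
    """
    best = {}
    for series_number, slice_location in series:
        key = round(slice_location, ndigits)
        # Keep the series acquired last, approximated here by the largest number.
        if key not in best or series_number > best[key]:
            best[key] = series_number
    return dict(sorted(best.items()))


# Series 9 and 10 share a slice location; series 10 is kept.
example = [(8, -10.0), (9, 0.0), (10, 0.0), (11, 10.0)]
print(pick_last_series_per_location(example))
# prints: {-10.0: 8, 0.0: 10, 10.0: 11}
```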

Some MRI images are not consistent (in size, shape, or structure). What should we do about these?

We have opted to include as many cases as possible in this dataset. As this is real data from many sources, it is bound to have some amount of unwanted variability. You should do your best to handle these files. Since this is a two stage competition and the test set may have unseen abnormalities, we recommend including some form of error catching as you write your code.
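The error catching recommended above could look like the following sketch. The `loader` argument is a placeholder for whatever DICOM reading function you use; it is not a specific library call. Files that fail to load are logged and skipped rather than aborting the whole run.

```python
import logging


def load_all(paths, loader):
    """Apply `loader` to every path, collecting failures instead of raising.

    Returns (loaded, failed): a list of (path, result) pairs and a list of
    the paths that raised an exception.
    """
    loaded, failed = [], []
    for path in paths:
        try:
            loaded.append((path, loader(path)))
        except Exception as exc:  # deliberately broad: unseen abnormalities
            logging.warning("skipping %s: %s", path, exc)
            failed.append(path)
    return loaded, failed


def demo_loader(path):
    """Hypothetical stand-in loader that rejects one malformed file."""
    if path.endswith("bad.dcm"):
        raise ValueError("unexpected image shape")
    return {"path": path}


loaded, failed = load_all(["a.dcm", "bad.dcm", "c.dcm"], demo_loader)
print(len(loaded), failed)
# prints: 2 ['bad.dcm']
```

Keeping the list of failed paths makes it possible to fall back to a default prediction for those cases instead of leaving rows out of the submission.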

Project Goal

Illustrated representation of the DSB challenge

We all have a heart. Although we often take it for granted, it's our heart that gives us the moments in life to imagine, create, and discover. Yet cardiovascular disease threatens to take away these moments. Each day, 1,500 people in the U.S. alone are diagnosed with heart failure—but together, we can help. We can use data science to transform how we diagnose heart disease. By putting data science to work in the cardiology field, we can empower doctors to help more people live longer lives and spend more time with those that they love.

Declining cardiac function is a key indicator of heart disease. Doctors determine cardiac function by measuring end-systolic and end-diastolic volumes (i.e., the size of one chamber of the heart at the beginning and middle of each heartbeat), which are then used to derive the ejection fraction (EF). EF is the percentage of blood ejected from the left ventricle with each heartbeat. Both the volumes and the ejection fraction are predictive of heart disease. While a number of technologies can measure volumes or EF, Magnetic Resonance Imaging (MRI) is considered the gold standard test to accurately assess the heart's squeezing ability. The challenge with using MRI to measure cardiac volumes and derive ejection fraction, however, is that the process is manual and slow. A skilled cardiologist must analyze MRI scans to determine EF. The process can take up to 20 minutes to complete—time the cardiologist could be spending with his or her patients. Making this measurement process more efficient will enhance doctors' ability to diagnose heart conditions early, and carries broad implications for advancing the science of heart disease treatment.

The 2015 Data Science Bowl challenges you to create an algorithm to automatically measure end-systolic and end-diastolic volumes in cardiac MRIs. You will examine MRI images from more than 1,000 patients. This data set was compiled by the National Institutes of Health and Children's National Medical Center and is an order of magnitude larger than any cardiac MRI data set released previously. With it comes the opportunity for the data science community to take action to transform how we diagnose heart disease.

This is not an easy task, but together we can push the limits of what's possible. We can give people the opportunity to spend more time with the ones they love, for longer than ever before.

Evaluation

Submissions will be evaluated on the Continuous Ranked Probability Score (CRPS). For each MRI, you must predict a cumulative probability distribution for both the systolic and diastolic volumes (two separate distributions per case). The CRPS is computed as follows:

$$C = \frac{1}{600N} \sum_{m=1}^{N} \sum_{n=0}^{599} \left( P(y \le n) - H(n - V_m) \right)^2$$

where P is the predicted distribution, N is the number of rows in the test set (equal to twice the number of cases), V is the actual volume (in mL) and H(x) is the Heaviside step function (H(x)=1 for x≥0 and zero otherwise). While it is not simple to visualize the CRPS, the shaded area in the accompanying figure may be a helpful guide for understanding the error term between the predicted distribution and the actual volume.
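As a rough illustration of the metric described above, the sketch below computes the mean squared difference between each predicted CDF and the Heaviside step at the actual volume, over the 600 integer volumes 0 to 599 mL. It is an unofficial reimplementation under that assumption, not the competition's scoring code, and the function names are illustrative.

```python
def heaviside(x):
    """H(x) = 1 for x >= 0 and 0 otherwise, as defined in the text."""
    return 1.0 if x >= 0 else 0.0


def crps(predicted_cdfs, actual_volumes, n_bins=600):
    """Average squared gap between predicted CDFs and step functions.

    predicted_cdfs: one list of n_bins values P(y <= n) per row.
    actual_volumes: the true volume in mL for each row.
    """
    if not actual_volumes or len(predicted_cdfs) != len(actual_volumes):
        raise ValueError("need one predicted CDF per actual volume")
    total = 0.0
    for cdf, volume in zip(predicted_cdfs, actual_volumes):
        if len(cdf) != n_bins:
            raise ValueError(f"each CDF must have {n_bins} values")
        for n, p in enumerate(cdf):
            total += (p - heaviside(n - volume)) ** 2
    return total / (n_bins * len(actual_volumes))


# A perfect step at the true volume scores 0; lower is better.
perfect = [heaviside(n - 100) for n in range(600)]
print(crps([perfect], [100]))
# prints: 0.0
```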

The entry will not score if any of the predicted values has P(y≤k) > P(y≤k+1) for any k (i.e., the CDF must be non-decreasing).
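Because a single decreasing pair invalidates the entry, it may be worth checking each row before submitting. The sketch below tests the non-decreasing condition and, as one possible repair, replaces each value with the running maximum. The repair strategy is an assumption of this example, not something the competition prescribes.

```python
from itertools import accumulate


def is_non_decreasing(cdf):
    """True if P(y <= k) <= P(y <= k+1) for every consecutive pair."""
    return all(a <= b for a, b in zip(cdf, cdf[1:]))


def make_non_decreasing(cdf):
    """Repair a CDF by taking the running maximum of its values."""
    return list(accumulate(cdf, max))


raw = [0.1, 0.3, 0.2, 0.4]
print(is_non_decreasing(raw))
# prints: False
print(make_non_decreasing(raw))
# prints: [0.1, 0.3, 0.3, 0.4]
```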

Submission File

For each Id, you must predict 600 values that represent its cumulative distribution from 0 to 599 mL. P0 represents the probability the volume is less than or equal to 0 mL, P1 represents the probability the volume is less than or equal to 1 mL, etc. The file must have a header and contain all 600 values in the following format:

Id,P0,P1,P2,P3,...,P599
1_systolic,0.1,0.3,0.33,0.4,...,1.0
1_diastolic,0.1,0.24,0.25,0.35,...,1.0
...
etc
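A file in this format can be produced with the standard `csv` module. In the sketch below, each row's distribution is a Gaussian CDF centred on a point estimate. That choice, the `sigma_ml` spread, and the example volumes are assumptions made for illustration only; any non-decreasing 600-value CDF can be written the same way.

```python
import csv
import io
import math

N_BINS = 600  # P0..P599, i.e. volumes 0 to 599 mL


def gaussian_cdf_row(mean_ml, sigma_ml):
    """P(y <= k) for k = 0..599 under a normal distribution (assumed model)."""
    if sigma_ml <= 0:
        raise ValueError("sigma_ml must be positive")
    scale = sigma_ml * math.sqrt(2.0)
    return [0.5 * (1.0 + math.erf((k - mean_ml) / scale)) for k in range(N_BINS)]


def write_submission(rows, out):
    """Write the header and one line per (row_id, cdf) pair to file object `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id"] + [f"P{k}" for k in range(N_BINS)])
    for row_id, cdf in rows:
        if len(cdf) != N_BINS:
            raise ValueError(f"{row_id}: expected {N_BINS} values")
        writer.writerow([row_id] + [f"{p:.6f}" for p in cdf])


# Hypothetical point estimates for case 1, written to an in-memory buffer.
buffer = io.StringIO()
write_submission(
    [
        ("1_systolic", gaussian_cdf_row(75.0, 15.0)),
        ("1_diastolic", gaussian_cdf_row(145.0, 20.0)),
    ],
    buffer,
)
lines = buffer.getvalue().splitlines()
print(len(lines), len(lines[0].split(",")))
# prints: 3 601
```

To write a real file, pass `open("submission.csv", "w", newline="")` in place of the buffer.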
