Statistics How To

C-Statistic: Definition, Examples, Weighting and Significance

ROC Curve > C-Statistic

You may want to read this article first: What is a Receiver Operating Characteristic (ROC) curve?.

What is a C-Statistic?

C-statistic

The concordance statistic is equal to the area under a ROC curve.

The C-statistic (sometimes called the “concordance” statistic or C-index) is a measure of goodness of fit for binary outcomes in a logistic regression model. In clinical studies, the C-statistic gives the probability a randomly selected patient who experienced an event (e.g. a disease or condition) had a higher risk score than a patient who had not experienced the event. It is equal to the area under the Receiver Operating Characteristic (ROC) curve and ranges from 0.5 to 1.

  • A value below 0.5 indicates a very poor model.
  • A value of 0.5 means that the model is no better than predicting an outcome than random chance.
  • Values over 0.7 indicate a good model.
  • Values over 0.8 indicate a strong model.
  • A value of 1 means that the model perfectly predicts those group members who will experience a certain outcome and those who will not.

The C-statistic isn’t used very often as it only gives you a general idea about a model; A ROC curve contains much more information about accuracy, sensitivity and specificity.

Weighting

A weighted c-index is used when the cost of failing to predict a positive outcome (like a test for cancer) is higher than benefit of correctly predicting a negative outcome. Weighting penalizes models that result in small probability differences for positive and negative outcomes, but doesn’t change the value of the C-statistic. It can also be used to adjust for stratified random sampling.

Statistical Significance

Like most statistics, the C-statistic is sometimes paired with a confidence interval. For example, you might have a result of 0.63 with a confidence interval ranging from 0.53 to 0.73). In general, any result is not significant if it includes 0.5, even if it includes the relevant C-statistic. For example, a result of 0.63 with a CI ranging from 0.43 to 0.83 would not be significant because it includes 0.5 in that range.

Reference:
Hosmer DW, Lemeshow S. Applied Logistic Regression (2nd Edition). New York, NY: John Wiley & Sons; 2000.

------------------------------------------------------------------------------

If you prefer an online interactive environment to learn R and statistics, this free R Tutorial by Datacamp is a great way to get started. If you're are somewhat comfortable with R and are interested in going deeper into Statistics, try this Statistics with R track.

Comments are now closed for this post. Need help or want to post a correction? Please post a comment on our Facebook page and I'll do my best to help!
C-Statistic: Definition, Examples, Weighting and Significance was last modified: October 15th, 2017 by Stephanie Glen