What is the consistency index?
C-index is short for concordance index, also translated as consistency index. It was put forward in 1996 by Frank Harrell Jr., a professor of biostatistics at Vanderbilt University. It is mainly used in survival analysis to measure the discrimination of a Cox model, that is, how well the predicted values agree with the observed outcomes, and in that sense it is similar to the familiar AUC. It is widely used to evaluate the predictive accuracy of prognostic models for tumor patients. A model is generally evaluated from two angles. One is the goodness of fit of the model, for which the commonly used measures are R-squared, -2logL, AIC, BIC, etc.
The other is the predictive accuracy of the model, which, as the name implies, concerns the gap between the observed values and the values predicted by the model, measured for example by the difference, the mean squared error, or the relative error. In clinical applications more attention is paid to predictive accuracy, because the main purpose of modeling is prediction, and the C-index belongs to the predictive-accuracy group of model evaluation measures.
The C-index is calculated by pairing up all the subjects in the study data. Take survival analysis as an example. For a pair of patients, if the patient who actually survived longer also has the longer predicted survival time, or the patient with the higher predicted survival probability actually survived longer, the prediction is said to agree with the actual result, and the pair is called concordant.
The C-index is then K/M, where M is the number of usable pairs and K is the number of those pairs that are concordant.
As can be seen from the calculation method above, the C-index of a useful model lies between 0.5 and 1 (when pairs are ordered at random, the probability of concordance and of discordance is exactly 0.5 each). A value of 0.5 means the model agrees with the actual results no better than chance, that is, it has no predictive ability, and a value of 1 means the model's predictions are completely consistent with the actual results. Generally speaking, a C-index of 0.50-0.70 indicates low accuracy, 0.71-0.90 indicates medium accuracy, and above 0.90 indicates high accuracy, which is somewhat similar to how a correlation coefficient is read.
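The pair counting described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used by any particular package: the function name `concordance_index` and its arguments are hypothetical, and the handling of censored patients and of ties (a pair is usable only when the patient with the shorter observed time had an observed event; tied predictions count as half) is an assumption that the text above does not spell out.

```python
from itertools import combinations


def concordance_index(times, events, predicted_times):
    """Fraction of usable pairs whose predicted order matches the observed order.

    times           -- observed survival or follow-up times
    events          -- 1 if the event was observed, 0 if censored (assumed encoding)
    predicted_times -- predicted survival times (longer = better prognosis)
    """
    concordant = 0.0  # K in the text
    usable = 0        # M in the text
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied observed times are skipped
        # "short" is the patient with the shorter observed time
        short, long_ = (i, j) if times[i] < times[j] else (j, i)
        if not events[short]:
            continue  # shorter time is censored, so the true order is unknown
        usable += 1
        if predicted_times[short] < predicted_times[long_]:
            concordant += 1.0
        elif predicted_times[short] == predicted_times[long_]:
            concordant += 0.5  # assumption: tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Four hypothetical patients; the third one is censored.
print(concordance_index([5, 10, 15, 20], [1, 1, 0, 1], [4, 18, 9, 16]))  # 0.6
```

In this toy example five pairs are usable and three of them are concordant, so the result is 3/5 = 0.6.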
It is difficult to judge accuracy from the C-index value alone, so people want a statistical test to back up the claimed level, just as it is arbitrary to judge differential expression only by the fold change when screening for differentially expressed genes. This is where the Bootstrap technique comes in, to test the accuracy of the prediction model. Bootstrap is an important method in nonparametric statistics, used to estimate the variance of a statistic and, from that, a confidence interval.
The core idea and basic steps of the Bootstrap method are as follows:
(1) Use resampling to draw a certain number of samples from the original sample, with replacement, so the same observation may be drawn more than once.
(2) Calculate the given statistic T from the drawn samples.
(3) Repeat the above N times (generally more than 1000) to obtain N values of the statistic T.
(4) Calculate the sample variance of the N values of T to obtain the variance of the statistic.
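The four steps above can be sketched as follows. This is a minimal sketch under stated assumptions rather than a definitive implementation: `bootstrap_variance` is a hypothetical helper name, each resample is assumed to have the same size as the original sample, and the statistic is passed in as any function of a list (for a C-index, each list element would be one patient's record).

```python
import random
import statistics


def bootstrap_variance(sample, statistic, n_resamples=1000, seed=0):
    """Estimate the variance of `statistic` by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        # Step (1): draw n observations with replacement
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        # Step (2): compute the statistic on the resample
        estimates.append(statistic(resample))
    # Steps (3) and (4): sample variance of the n_resamples estimates
    return statistics.variance(estimates), estimates


# Hypothetical data: bootstrap the variance of the sample mean.
data = list(range(1, 21))
variance, estimates = bootstrap_variance(data, statistics.mean)
print(variance)
```

A rough interval can then be formed from the estimate and the square root of this variance, or read off the percentiles of `estimates`.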
In addition, if the data set is large, it can be split in some proportion, with one part used for modeling and the other for validation. Cross-validation, such as 5-fold or 10-fold, is another option.
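As an illustration of the splitting idea, the sketch below generates the training and validation indices for k-fold cross-validation. The generator name `k_fold_indices` is hypothetical, and shuffling the indices before splitting is an assumption; the model fitting and C-index calculation on each split are left out.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: shuffle before splitting
    # Deal the shuffled indices into k folds of nearly equal size
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Each subject lands in the validation part exactly once across the 5 splits.
for training, validation in k_fold_indices(10, k=5):
    print(sorted(validation), len(training))
```

The model would be fitted on the training indices of each split and its C-index computed on the validation indices, and the k values are then averaged.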
Although this seems complicated, the work has already been done by others. There are R packages that calculate the concordance index directly: Hmisc and compareC can both compute the C-index.