The goal of this competition is to predict the probability that a student will drop out of a course in 10 days. Therefore, your results must fall in the interval [0, 1]. In the data sets, there is a unique ID for each student taking a given course. In other words, each ID corresponds to a combination of one student ID and one course ID.
The submission file should have two columns: the first is the unique ID, and the second is the predicted probability that the student will quit that particular course. A correctly formatted submission file contains only an 80362 × 2 matrix (80362 rows, 2 columns), with no header or other information. An error will be reported if the format is incorrect.
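As a rough illustration of the format described above, the following Python sketch writes ID and probability pairs as rows with no header and rejects any probability outside [0, 1]. The comma delimiter, the function name write_submission, and the sample IDs and probabilities are assumptions made for this example; they are not specified by the competition.

```python
import csv
import io


def write_submission(rows, out):
    """Write (unique_id, probability) pairs as two-column rows with no header.

    Assumption: columns are comma-separated; the rules above do not name a delimiter.
    Returns the number of rows written.
    """
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for uid, prob in rows:
        # Predictions are probabilities, so they must lie in [0, 1].
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability for ID {uid} is outside [0, 1]: {prob}")
        writer.writerow([uid, prob])
        count += 1
    return count


# Hypothetical predictions; a real submission would contain 80362 rows.
buf = io.StringIO()
n_rows = write_submission([(1, 0.93), (2, 0.07), (3, 0.5)], buf)
submission_text = buf.getvalue()
```

For a real file, the same function can be called with an open file handle in place of the in-memory buffer, and the returned row count can be compared against 80362 before uploading.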
Since the true value is either 1 (dropout) or 0 (no dropout), the receiver operating characteristic (ROC) curve will be used to evaluate your binary classifier. A submission is scored by the area under the ROC curve (AUC), which depends on how well it separates dropouts from non-dropouts.
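To make the metric concrete, here is a minimal sketch of one standard way to compute AUC: it equals the fraction of (dropout, non-dropout) pairs in which the dropout receives the higher predicted probability, with ties counted as half. This is not the organizers' scoring code; the function name auc and the sample labels and scores are made up for illustration, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: true values, 1 (dropout) or 0 (no dropout)
    scores: predicted dropout probabilities, in the same order
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one dropout and one non-dropout")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # dropout ranked above non-dropout
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 of the 4 (dropout, non-dropout) pairs are ordered correctly.
example_auc = auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])  # 0.75
```

Because only the ordering of the scores matters, a submission that ranks every dropout above every non-dropout scores 1.0, while one that assigns the same probability to every ID scores 0.5.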
KDD Cup 2015