Please note: a sample submission file can now be downloaded from the ‘data’ page.
1. Intermediate results on the validation set: each team can submit intermediate results on the validation set at any time before the test set is released (23:59 UTC, Sep. 14). Our system evaluates the results in real time. Please combine the results of the four tasks into one text file named “temp.txt” before submitting it. The detailed format of the text file is shown below (<tab> denotes a tab character):
22<tab>http://www.homepage2.com/bbb<tab>f<tab>distinguished professor;dean<tab>http://www.homepage2.com/cc.jpg<tab>somename at mit dot edu<tab>P.R.China
Jiawei Han<tab>data mining<tab>database<tab>information networks<tab>knowledge discovery<tab>machine learning
2. Final results on the test set: each team must submit the final results on the test set before the deadline (23:59 UTC, Sep. 15). Please combine the results of the four tasks into one text file named “temp.txt” before submitting it.
Please note: the submitted text file must be UTF-8 encoded without a BOM. In addition, the file must not contain extra spaces; spaces are allowed only as separators between words. The first line of each task's results is a header line, and the file ends with a blank line.
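The formatting rules above can be sketched as a small Python writer. This is a minimal sketch under stated assumptions. The header names and rows are placeholders supplied by the caller, because the real per-task headers are not listed here. The rule "the file ends with a blank line" is read as "the last line is terminated by a newline". If the checker expects an additional empty line, append one more "\n".

```python
from pathlib import Path
from typing import Sequence

def format_task(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Render one task's block: a header line followed by tab-separated rows."""
    lines = ["\t".join(header)]
    for row in rows:
        # Collapse whitespace runs inside each field to single spaces and trim
        # the ends, so tabs are the only separators between fields.
        lines.append("\t".join(" ".join(field.split()) for field in row))
    return "\n".join(lines)

def write_submission(
    path: Path,
    tasks: Sequence[tuple[Sequence[str], Sequence[Sequence[str]]]],
) -> None:
    """Concatenate all task blocks into one file (e.g. temp.txt)."""
    # Assumption: "ends with a blank line" = final line terminated by "\n".
    body = "\n".join(format_task(h, r) for h, r in tasks) + "\n"
    # The "utf-8" codec (unlike "utf-8-sig") never writes a BOM;
    # newline="\n" keeps LF line endings on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(body)
```

For example, `write_submission(Path("temp.txt"), [(task1_header, task1_rows), (task2_header, task2_rows)])` writes one block per task in order. The names `task1_header` and the others are hypothetical.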
The system first calculates a score for each task from the submitted results.
In Task 1, each researcher has k personal profile variables (k = 6 in this task). Every variable except “title/position” is scored by exact matching: the score is 1 if the prediction matches the labeled value exactly, and 0 otherwise. Because “title/position” is a set of values, it is scored with the Jaccard index (the size of the intersection divided by the size of the union of the two sets). This gives a similarity score between 0 and 1 for the set extracted by the participating team against the given labeled set. The prediction score for each researcher is the average of the scores of the k profile variables, and a team's score on this task is the average prediction score over all N researchers. The formula is as follows:
score_1 = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{k}\sum_{j=1}^{k} s_{ij}
where s_{ij} is the score (0 or 1, or the Jaccard index for “title/position”) of the j-th profile variable of the i-th researcher.
For the submission format of Task 1, please refer to Taskone.
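The Task 1 scoring rule can be sketched in Python as follows. This is not the organizers' scorer. The field names, including the default `set_field="title"`, are hypothetical. Multiple titles are assumed to be separated by ';', as in the sample line. Two empty title sets are assumed to count as a perfect match.

```python
from typing import Mapping

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical (assumption)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def split_titles(value: str) -> set[str]:
    # Assumption: multiple titles are separated by ';', as in the sample line.
    return {t.strip() for t in value.split(";") if t.strip()}

def researcher_score(pred: Mapping[str, str], gold: Mapping[str, str],
                     set_field: str = "title") -> float:
    """Average over the k labeled fields: Jaccard for the set field, exact match otherwise."""
    scores = []
    for field, truth in gold.items():
        guess = pred.get(field, "")  # missing predictions are treated as empty strings
        if field == set_field:
            scores.append(jaccard(split_titles(guess), split_titles(truth)))
        else:
            scores.append(1.0 if guess == truth else 0.0)
    return sum(scores) / len(scores)

def task1_score(preds: Mapping[str, Mapping[str, str]],
                golds: Mapping[str, Mapping[str, str]]) -> float:
    """score1: mean per-researcher score over the N labeled researchers."""
    return sum(researcher_score(preds.get(rid, {}), gold)
               for rid, gold in golds.items()) / len(golds)
```

Suppose the labeled title is "distinguished professor;dean" and a team predicts only "dean". That title variable scores 1/2, while every other variable still contributes either 0 or 1.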
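The score of Task 2, score2, is the proportion of the given research interests of each scholar that are matched by the research interests predicted by the team, averaged over all scholars. The formula is as follows:
score_2 = \frac{1}{N}\sum_{i=1}^{N}\frac{|T_i \cap T_i^*|}{|T_i^*|}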
where N is the number of scholars in the Task 2 test set, Ti is the set of research interests of the i-th scholar predicted by the team, and Ti* is the given (labeled) set of research interests of the i-th scholar. Here, |Ti*| = 3.
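A short Python sketch of the Task 2 score follows. It assumes score2 averages |Ti ∩ Ti*| / |Ti*| over the N labeled scholars; that exact form is inferred from the definitions of N, Ti and Ti* and is not confirmed here. The scholar keys are hypothetical identifiers.

```python
from typing import Mapping

def task2_score(preds: Mapping[str, set[str]], golds: Mapping[str, set[str]]) -> float:
    """Assumed form: score2 = (1/N) * sum_i |Ti ∩ Ti*| / |Ti*| over the N labeled scholars."""
    total = 0.0
    for scholar, truth in golds.items():
        guess = preds.get(scholar, set())         # Ti (empty if the team gave no answer)
        total += len(guess & truth) / len(truth)  # |Ti*| is 3 under the rules
    return total / len(golds)
```

Under this assumed form, extra predicted interests are not penalized. Check the official rules for any limit on the size of Ti.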
The score of Task 3, score3, is calculated from the relative error between the number of citations predicted by each team for each scholar and that scholar's true number of citations.
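The relative error used in Task 3 can be sketched in Python as below. This page does not say how relative error is turned into score3 (for example, whether errors are averaged, capped, or converted into an accuracy). The sketch therefore stops at the per-scholar error and a plain mean. Three further assumptions: a missing prediction counts as 0 citations, the true count is positive, and the scholar keys are hypothetical identifiers.

```python
from typing import Mapping

def relative_error(predicted: float, actual: float) -> float:
    """|predicted - actual| / actual for one scholar."""
    if actual <= 0:
        # Assumption: handling of zero-citation scholars is not specified here.
        raise ValueError("relative error is undefined when the true count is not positive")
    return abs(predicted - actual) / actual

def mean_relative_error(preds: Mapping[str, float], golds: Mapping[str, float]) -> float:
    """Average relative error over all labeled scholars; a missing prediction counts as 0."""
    return sum(relative_error(preds.get(s, 0.0), golds[s]) for s in golds) / len(golds)
```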
The formula for calculating the final score of each team is as follows:
Teams will be ranked by this final score.
The top 10 teams are required to submit their packaged source code and algorithm documentation. The organizers will verify the credibility of the algorithms and results. If a team is found to have misbehaved or cheated, its ranking will be canceled and the next-ranked team will take its place. The packaged source code must be runnable, and a description of the runtime environment and instructions for running the code are required. Source code and documentation should be submitted by email (the email address will be announced later).
Open Academic Data Challenge 2017
Sponsor: Microsoft, Tsinghua, CKCEST