
Background Introduction

The Conference on Natural Language Processing and Chinese Computing (NLPCC) is the annual conference of CCF TCCI (Technical Committee of Chinese Information, China Computer Federation). Previous NLPCC conferences were held in Beijing (2012), Chongqing (2013), Shenzhen (2014), Nanchang (2015) and Kunming (2016). This year’s NLPCC conference will be held in Dalian on November 8-12, 2017.

NLPCC 2017 will follow the NLPCC tradition of holding several shared tasks in natural language processing and Chinese computing. This year’s shared tasks focus on both classic problems and newly emerging problems, including Chinese Word Semantic Relation Classification, News Headline Categorization, Single Document Summarization, Emotional Conversation Generation, Open Domain Question Answering, and Social Media User Modeling.

Participants from both academia and industry are welcome. Each group can participate in one or more tasks, and members of each group can attend the NLPCC conference to present their techniques and results. Participants will be invited to submit papers to the main conference, and accepted papers will appear in the conference proceedings published by Springer LNCS.



Traditional news document summarization techniques have been widely explored at the DUC and TAC conferences. Existing datasets for document summarization mainly cover Western languages, and Chinese news summarization has seldom been explored. In this evaluation task, we aim to investigate single document summarization techniques for automatically generating short summaries of Chinese news articles. We will provide a large dataset for evaluating and comparing different document summarization techniques.


The Task

The single document summarization task is to automatically generate a short summary for a given Chinese news article. The short summary is used for news browsing and propagation on Toutiao.com, and it must be shorter than 60 Chinese characters. We will provide a sample/training dataset consisting of a large number of Chinese news articles with reference summaries, together with a large number of news articles without reference summaries (for semi-supervised methods). The test dataset will be provided to the participants later.
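
As an illustration of the length constraint above, the following is a minimal Python sketch of how a participant might check a generated summary before submission. The function names `within_length_limit` and `truncate_to_limit` are hypothetical and not part of any official tooling. The sketch also assumes that every Unicode character (Chinese characters, digits, and punctuation alike) counts toward the limit; the task description does not say how non-Chinese characters are counted.

```python
def within_length_limit(summary: str, limit: int = 60) -> bool:
    """Return True if the summary is shorter than `limit` characters.

    Assumption: each Unicode code point counts as one character,
    and leading/trailing whitespace is ignored.
    """
    return len(summary.strip()) < limit


def truncate_to_limit(summary: str, limit: int = 60) -> str:
    """Cut a summary so that it is strictly shorter than `limit` characters.

    This is a naive cut at a fixed position; a real system would more
    likely cut at a clause or sentence boundary.
    """
    return summary.strip()[: limit - 1]


if __name__ == "__main__":
    candidate = "中" * 100  # placeholder text, not real task data
    print(within_length_limit(candidate))        # too long for the limit
    shortened = truncate_to_limit(candidate)
    print(len(shortened), within_length_limit(shortened))
```

Hard truncation is used here only to keep the example short. It guarantees that the constraint is met but may cut a summary mid-phrase.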