yorko

Deciphering the dataset

posted in   2019 Datagrand Cup: Text Information Extraction Challenge

June 30, 2019, 4:57 p.m.

23  comments

  • hongzhi reply gululinbo

    July 1, 2019, 3:39 a.m.

    9
    <p>So, is it allowed to use the pre-trained Chinese BERT? (We could learn the word mapping ourselves.)</p>

gululinbo

July 1, 2019, 3:39 a.m.

2
<p>You can use an external model such as BERT or another one; however, pre-trained word vectors are not allowed. If you want to use pre-trained word vectors, you should train your own word embedding model on the <code>corpus_data.txt</code> dataset.</p> <p>There is no need to decipher anything. Each Chinese word or punctuation mark is mapped to one unique ID, which is a form of data desensitization. The mapping between Chinese words and IDs is the same in the training and test data. This also means that existing pre-trained word vectors are not usable here.</p>
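<p>As a rough illustration of training your own embeddings on the anonymized IDs, here is a minimal standard-library sketch. It builds count-based PPMI vectors as a simple stand-in for a word2vec-style model; it is not the organizers' method. The function names (<code>read_sentences</code>, <code>ppmi_vectors</code>, <code>cosine</code>) are hypothetical, and the <code>"_"</code> separator between IDs is an assumption about the file format that this thread does not confirm.</p>

```python
import math
from collections import Counter, defaultdict


def read_sentences(lines, sep="_"):
    """Split each line into anonymized token IDs.

    The separator is an assumption; change `sep` to match the real file.
    """
    for line in lines:
        tokens = [t for t in line.strip().split(sep) if t]
        if tokens:
            yield tokens


def ppmi_vectors(sentences, window=2):
    """Return {token: {context: PPMI}} from symmetric co-occurrence counts."""
    pair_counts = defaultdict(Counter)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[tok][tokens[j]] += 1

    # Counts are symmetric, so row totals double as context totals.
    row_totals = {t: sum(c.values()) for t, c in pair_counts.items()}
    total = sum(row_totals.values())

    vectors = {}
    for tok, contexts in pair_counts.items():
        vec = {}
        for ctx, n in contexts.items():
            pmi = math.log(n * total / (row_totals[tok] * row_totals[ctx]))
            if pmi > 0:  # keep only positive PMI values
                vec[ctx] = pmi
        vectors[tok] = vec
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

<p>To try it on the real corpus, open <code>corpus_data.txt</code> and pass the file object to <code>read_sentences</code>, then feed the result to <code>ppmi_vectors</code>. Because the IDs are consistent across the training and test data, vectors learned this way apply to both. For a full-size corpus, a dedicated embedding library would likely be a more practical choice than this in-memory sketch.</p>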