Awesome
ReMine
Take the NYT13_110K dataset as an example:
Pipeline
Preprocessing:
- Link entity seeds
- POS tagging
- Relation extractor (pattern-based)
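
To illustrate what a pattern-based relation extractor over POS tags can look like, here is a minimal sketch. It is not the extractor used in this repo: the function names (`coarse`, `extract_relations`), the coarse tag mapping, and the ReVerb-style pattern `V+[NJRD]*P?` are all assumptions made for illustration, and it expects Penn Treebank tags plus entity spans that have already been linked.

```python
import re

# Hypothetical pattern: one or more verbs, optional nouns/adjectives/
# adverbs/determiners, then an optional preposition or particle.
REL_SHAPE = re.compile(r"V+[NJRD]*P?")


def coarse(tag):
    """Map a Penn Treebank tag to a one-letter class (assumed mapping)."""
    if tag.startswith("VB"):
        return "V"
    if tag in ("IN", "TO", "RP"):
        return "P"
    if tag.startswith("NN"):
        return "N"
    if tag.startswith("JJ"):
        return "J"
    if tag.startswith("RB"):
        return "R"
    if tag == "DT":
        return "D"
    return "O"  # anything else, including punctuation


def extract_relations(tagged, entity_spans):
    """Return (entity, relation, entity) triples for adjacent entity pairs.

    tagged: list of (token, pos_tag) pairs for one sentence.
    entity_spans: sorted, non-overlapping (start, end) half-open spans.
    """
    triples = []
    for (s1, e1), (s2, e2) in zip(entity_spans, entity_spans[1:]):
        between = tagged[e1:s2]
        if not between:
            continue
        shape = "".join(coarse(tag) for _, tag in between)
        if REL_SHAPE.fullmatch(shape):
            left = " ".join(tok for tok, _ in tagged[s1:e1])
            rel = " ".join(tok for tok, _ in between)
            right = " ".join(tok for tok, _ in tagged[s2:e2])
            triples.append((left, rel, right))
    return triples


sentence = [("Barack", "NNP"), ("Obama", "NNP"), ("was", "VBD"),
            ("born", "VBN"), ("in", "IN"), ("Hawaii", "NNP")]
print(extract_relations(sentence, [(0, 2), (5, 6)]))
```

Only the tokens strictly between two adjacent entities are considered, so any punctuation between them makes the pattern fail rather than produce a noisy relation.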
Multi-Extraction:
1> Generate POS tag supervision
2> Generate phrase supervision by combining unigram/multi-gram entity/multi-gram relation. <b>Tune</b>
3> Preprocess the training/testing corpus using python src_py/Preprocessor.py in translation mode
4> Run ReMine with phrases, entity POS tags, and relation POS tags as supervision. Segmentation labels include <None>/<ENTITY>/<RELATION>
5> Run ReMine segmentation, then postprocess the output using Preprocessor.py in segmentation mode
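
As a rough illustration of step 2, the sketch below merges three phrase lists into one deduplicated dictionary that records where each phrase came from. It is not the repo's code: `combine_phrases`, the source names, and the lowercase/whitespace normalization are assumptions, and how a phrase that appears in several lists should be resolved is left open, since that is likely part of what needs tuning.

```python
from collections import OrderedDict


def combine_phrases(unigrams, entity_phrases, relation_phrases):
    """Merge phrase lists into {normalized phrase: set of source names}.

    Hypothetical helper: input order is preserved and duplicates are
    merged after lowercasing and collapsing whitespace.
    """
    sources = (
        ("unigram", unigrams),
        ("entity", entity_phrases),
        ("relation", relation_phrases),
    )
    combined = OrderedDict()
    for name, phrases in sources:
        for phrase in phrases:
            key = " ".join(phrase.lower().split())
            if not key:
                continue  # skip empty or whitespace-only entries
            combined.setdefault(key, set()).add(name)
    return combined


supervision = combine_phrases(
    ["obama"],
    ["Barack Obama", "obama"],
    ["was  born in"],
)
for phrase, origins in supervision.items():
    print(phrase, sorted(origins))
```

Keeping the set of sources per phrase, rather than picking one label up front, makes it easy to try different conflict rules without regenerating the lists.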
Todos
- An ideal generative process should be able to denoise and reduce the search space in order to compete with OpenIE systems: Ollie/Stanford IE/ReVerb
Bugs
- Check the punctuation boundary case