An Approach of Extracting Information for Maritime Unstructured Text Based on Rules

YU Chen; MAO Zhe; GAO Song

doi:10.3963/j.issn.1674-4861.2017.02.007

Issue 2

Apr. 2017

YU Chen, MAO Zhe, GAO Song. An Approach of Extracting Information for Maritime Unstructured Text Based on Rules[J]. Journal of Transport Information and Safety, 2017, 35(2): 40-47. doi: 10.3963/j.issn.1674-4861.2017.02.007

Publish Date: 2017-04-28

Abstract

Structured processing of maritime data plays an important role in maritime safety. A large amount of maritime-related information is available on the Internet, but most of it is unstructured data in varying formats. This paper proposes an approach for extracting maritime information and converting unstructured text into structured data. Web crawlers are used to obtain text data from maritime-related Web pages. According to the definitions of the texts, the content to be extracted is divided into four items: time, location, vessel name, and type of accident. Based on the extraction process and its common trigger words, a maritime lexicon for Chinese word segmentation and part-of-speech tagging is constructed. The extraction rules are summarized from an analysis of a large number of accident corpora, and the structured maritime data is then generated. To verify the feasibility of this rule-based extraction approach, data from the website of the Yangtze River Maritime Bureau is used as a case study. The results indicate that precision and recall are 100% and 91% for time, 94.52% and 69% for location, 97.75% and 86% for vessel name, and 96.6% and 87% for accident type.
Keywords:
- extracting information
- maritime text information
- user-defined words library
- rules for extraction
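
The abstract describes rule-based extraction of four items and evaluates it with precision and recall. The following is a minimal sketch of that general idea, not the authors' implementation. The trigger words, regular expressions, English sample sentence, and function names (`extract`, `precision_recall`) are all hypothetical; the paper's actual lexicon and rules for Chinese text are not given in the abstract.

```python
import re

# Hypothetical trigger words and patterns, chosen only for illustration.
# The paper's real maritime lexicon and extraction rules are not listed here.
ACCIDENT_TYPES = ["collision", "grounding", "fire", "sinking"]
TIME_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
VESSEL_RE = re.compile(r'vessel "([^"]+)"')
LOCATION_RE = re.compile(r"\bnear ([A-Z][\w ]*?)(?=[,.]|$)")


def extract(text):
    """Apply one rule per item and return a structured record."""
    time = TIME_RE.search(text)
    vessel = VESSEL_RE.search(text)
    location = LOCATION_RE.search(text)
    # First accident-type trigger word found in the text, if any.
    accident = next((a for a in ACCIDENT_TYPES if a in text.lower()), None)
    return {
        "time": time.group(1) if time else None,
        "location": location.group(1) if location else None,
        "vessel": vessel.group(1) if vessel else None,
        "accident_type": accident,
    }


def precision_recall(predicted, gold):
    """Compute precision and recall for one item across documents.

    predicted and gold are parallel lists; None means nothing was
    extracted (predicted) or nothing is present (gold).
    """
    extracted = sum(p is not None for p in predicted)
    correct = sum(p is not None and p == g for p, g in zip(predicted, gold))
    present = sum(g is not None for g in gold)
    precision = correct / extracted if extracted else 0.0
    recall = correct / present if present else 0.0
    return precision, recall


# Hypothetical sample sentence, not taken from the paper's corpus.
sample = ('On 2016-03-14, the vessel "Hai Xing 8" was involved in '
          'a collision near Wuhan Bridge, with no injuries.')
record = extract(sample)
```

Under this definition, precision is the share of extracted values that are correct and recall is the share of values present in the text that were correctly extracted, which is one common way a rule set can reach high precision with lower recall, as reported for the location item.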