1. ParsingPhrase: Parsing-based automated quality phrase mining.
- Author
- Wu, Yongliang; Zhao, Shuliang; Dou, Shimao; and Li, Jinghui
- Subjects
- *TERMS & phrases, *NATURAL languages
- Abstract
• Introduce the role of phrases in text representation and semantic disambiguation.
• Propose ParsingPhrase to extract combination phrases and improve phrase quality.
• Mine candidate phrases efficiently by traversing the parse tree hierarchically.
• Put forward a new phrase evaluation indicator to enhance phrase evaluation accuracy.
• Suggest a joint scheme to solve sentence disambiguation and phrase optimization.

Phrases represent independent semantics in natural language but usually have indeterminate lengths and different combinations. Extracting meaningful phrases from unstructured texts therefore substantially reduces semantic ambiguity and lays the foundation for downstream natural language tasks. Most existing research obtains candidate phrases from N-grams, which include meaningless word sequences and degrade algorithm performance. In this paper, we propose a novel phrase-mining algorithm, called ParsingPhrase, which effectively extracts combination phrases from text and improves phrase quality by using syntactic features. It consists of three stages. First, all sentences in the texts are represented as parsing trees by PCFG (Probabilistic Context-Free Grammar), and we propose PBMP (Parsing-Based Phrase mining) to obtain candidate phrases from those parsing trees. Then, we introduce a new phrase evaluation indicator, called Significance, that relies on the role of phrases to measure their importance. We integrate Significance with conventional evaluation indexes for a more reasonable phrase evaluation. Finally, we optimize the phrase quality again by exploiting the optimal phrase composition features of sentences. To the best of our knowledge, this is the first work to employ parsing for combination phrase mining and evaluation while also offering a solution for syntactic disambiguation. Experiments on three real corpora demonstrate that ParsingPhrase outperforms state-of-the-art baselines, with a 7% higher candidate phrase conversion rate and 6% higher Precision. [ABSTRACT FROM AUTHOR]
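
The idea of mining candidate phrases by traversing a parse tree level by level can be illustrated with a minimal sketch. This is not the authors' PBMP procedure: the tuple-based tree encoding, the `PHRASE_LABELS` set, the `min_len` threshold, and the hand-built example tree (standing in for PCFG parser output) are all assumptions made for illustration.

```python
from collections import deque

# Assumed encoding: a leaf is a word (str); an internal node is a tuple
# (label, child, child, ...). A real pipeline would get trees from a parser.
PHRASE_LABELS = {"NP", "VP", "ADJP"}  # hypothetical set of phrase-level labels


def leaves(tree):
    """Return the words covered by a (sub)tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def candidate_phrases(tree, labels=PHRASE_LABELS, min_len=2):
    """Collect multi-word constituents in level order (top of the tree first)."""
    found = []
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, str):
            continue  # a bare word is not a phrase candidate
        words = leaves(node)
        if node[0] in labels and len(words) >= min_len:
            found.append(" ".join(words))
        queue.extend(node[1:])  # visit the next level afterwards
    return found


# Hand-built parse of "natural language processing reduces semantic ambiguity".
example = (
    "S",
    ("NP", ("JJ", "natural"), ("NN", "language"), ("NN", "processing")),
    ("VP", ("VBZ", "reduces"),
           ("NP", ("JJ", "semantic"), ("NN", "ambiguity"))),
)

print(candidate_phrases(example))
```

Unlike N-gram enumeration, this only yields word sequences that form a syntactic constituent, so a span such as "processing reduces" is never proposed as a candidate.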
- Published
- 2023