Indirectly Supervised English Sentence Break Prediction Using Paragraph Break Probability Estimates

Authors :
Moore, Robert C.
Publication Year :
2021

Abstract

This report explores the use of paragraph break probability estimates to help predict the location of sentence breaks in English natural language text. We show that a sentence break predictor based almost solely on paragraph break probability estimates can achieve high accuracy on this task. This sentence break predictor is trained almost entirely on a large amount of naturally occurring text without sentence break annotations, with only a small amount of annotated data needed to tune two hyperparameters. We also show that even better results can be achieved across in-domain and out-of-domain test data if paragraph break probability signals are combined with a support vector machine classifier trained on a somewhat larger amount of sentence-break-annotated data. Numerous related issues are addressed along the way.

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2109.12023
Document Type :
Working Paper