1. Approach for extracting events from news stream
- Subjects
news flow, syntactic models, lexemes (tokens), proximity criterion, UDC 004.021
- Abstract
This article addresses the problem of handling duplicates and story chains when extracting unique events from a news stream. Metric criteria for assessing the degree of proximity between news items are proposed, and an algorithm for processing the news stream is formulated. Market price forecasting makes it possible to manage pricing policy effectively and to gain a competitive advantage. Most existing price forecasting approaches are based either on expert opinion or on models of raw price data. Neither approach achieves high forecasting accuracy, because price reflects events in the real world. A possible solution is to use news-based forecasting models, which require processing of news streams. This is a complex task because events are not reflected precisely in the news, so appropriate methods for processing news data need to be developed. The main problem in news data processing is filtering out duplicate news items and story chains (plots). One possible approach is to extract lexemes from the headline and first sentence of each news item and process them further. Forming three vectors from the lexemes extracted for a news item makes it possible to develop an efficient criterion for duplicate detection. The criterion itself does not rely on any expert opinion for similarity detection beyond the basic processing logic. The developed approach extracts event data from a news stream that is sufficient for price forecasting.
- Published
- 2013
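
The duplicate-filtering idea described in the abstract can be illustrated with a minimal sketch. The abstract does not define the three lexeme vectors or the metric criteria, so everything below is an assumption made for illustration: a single bag-of-lexemes vector built from the headline and first sentence, cosine similarity as the proximity measure, and the hypothetical names `lexemes`, `cosine`, `unique_events` and the threshold value `0.6`.

```python
import math
import re
from collections import Counter


def lexemes(headline: str, body: str) -> Counter:
    """Count lowercase word tokens from the headline and the first sentence.

    Assumption: plain word tokens stand in for the paper's lexemes.
    """
    # Take the text up to the first sentence-ending punctuation mark.
    first_sentence = re.split(r"(?<=[.!?])\s+", body.strip(), maxsplit=1)[0]
    tokens = re.findall(r"\w+", f"{headline} {first_sentence}".lower())
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def unique_events(stream, threshold: float = 0.6):
    """Keep a news item only if it is not close to any item already kept.

    `stream` is an iterable of (headline, body) pairs; `threshold` is a
    hypothetical cut-off, not a value taken from the paper.
    """
    kept = []  # list of ((headline, body), vector) pairs
    for headline, body in stream:
        vec = lexemes(headline, body)
        if all(cosine(vec, other) < threshold for _, other in kept):
            kept.append(((headline, body), vec))
    return [item for item, _ in kept]


stream = [
    ("Oil prices rise after OPEC meeting",
     "Oil prices rose on Monday after the OPEC meeting. Traders reacted quickly."),
    ("Oil prices rise after OPEC meeting",
     "Oil prices rose on Monday after the OPEC meeting. Analysts expect more."),
    ("Central bank cuts interest rate",
     "The central bank cut its key rate on Tuesday. Markets were surprised."),
]
events = unique_events(stream)
```

In this sketch the second item repeats the first item's headline and first sentence, so it is dropped as a duplicate, while the third item is kept as a separate event. Linking related but non-identical items into story chains would need additional logic that the abstract does not specify.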