Survey Paper on Web Content Extraction & Classification

Authors :: Sachin Bojewar
Ankit Sanghvi
Dipali Shete
Source :: 2021 6th International Conference for Convergence in Technology (I2CT).
Publication Year :: 2021
Publisher :: IEEE, 2021.
Abstract: Web data extraction has gained popularity over the last few years. Product information on e-commerce websites floods the internet with big data, and such sites have become one of the most significant sources of large amounts of relevant data. A wide range of software applications are designed to extract relevant data from web pages in order to draw in more business. The extracted data can be used for retail business and data analysis purposes. The web pages on such sites are built on different technologies, and the generated web documents are in structured or unstructured formats. Manually extracting relevant product data and multimedia information from these websites is complex and time-consuming. After extraction, the data needs to be classified because web content contains unwanted data, e.g. design information and advertising content. This paper describes different procedures for web document classification and extraction.

Subjects :: World Wide Web
Software
Data extraction
business.industry
Computer science
Web page
Feature extraction
Big data
The Internet
Web content
Product (category theory)
business

Database :: OpenAIRE
Journal :: 2021 6th International Conference for Convergence in Technology (I2CT)
Accession number :: edsair.doi...........956d419bac7869672a6e309c26e32d1a
Full Text :: https://doi.org/10.1109/i2ct51068.2021.9417947