Extracting Information from Template Based Web Pages using Attribute Matching Patterns and Position Details

B.Venkat Ramana
Prof A. Damodaram

Abstract

To extract structured data from web sites, we propose a new method for web information extraction that exploits content redundancy
on the web. We first extract records from an initial set of web sites and populate a seed database with them. Each newly extracted
record is then compared with the records already in the seed database. We define a new matching technique that matches records with
different representations across sites: it finds the matching pattern between the attribute values of the two sites and ignores the
unwanted portions of each attribute. We also developed an algorithm that finds the position details of attributes that have
sufficient matching values across pages. Finally, we report an experimental study on web data that assesses the effectiveness of our
extraction approach.
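
The abstract does not spell out the matching technique or the position-finding algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that an attribute value "matches" when the seed value's tokens appear contiguously inside the page value, with any surrounding text ignored, and that a record is accepted once a minimum number of attributes match. The names `tokens`, `attribute_matches`, `match_record` and the `min_matches` threshold are hypothetical.

```python
import re


def tokens(value):
    """Lowercase a raw attribute value and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", value.lower())


def attribute_matches(seed_value, page_value):
    """Return True if the seed tokens occur contiguously inside the page value.

    Text around the match (labels, edition notes, decorations) is ignored.
    This containment rule is an assumption made for illustration.
    """
    seed, page = tokens(seed_value), tokens(page_value)
    if not seed:
        return False
    n = len(seed)
    return any(page[i:i + n] == seed for i in range(len(page) - n + 1))


def match_record(seed_record, page_values, min_matches=2):
    """Map each seed attribute to the first page position whose value matches.

    Returns an attribute -> position dict, or None when fewer than
    min_matches attributes were located (hypothetical threshold).
    """
    positions = {}
    for attr, seed_value in seed_record.items():
        for pos, page_value in enumerate(page_values):
            if attribute_matches(seed_value, page_value):
                positions[attr] = pos
                break
    return positions if len(positions) >= min_matches else None


# Example: one seed record and the text fields of a record on a new site.
seed = {"title": "Data Mining Concepts", "author": "J. Han"}
page = ["Home > Books", "Title: Data Mining Concepts (3rd ed.)", "by J. Han", "$59.00"]
result = match_record(seed, page)
```

In this toy example the title is found at position 1 and the author at position 2, even though the new site wraps both values in extra text. Positions that recur across many matched records could then be treated as the template slots for those attributes on that site.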

Keywords: Information Extraction; Template WebPages; Pattern Matching; Data Position; Content Matching.
