Extração Automática de Conteúdo Semi-Estruturado na Web: Estudo de Caso do Futebol Brasileiro

Alexandre S. de Melo; Hendrik T. Macedo

Automatic extraction of semi-structured Web content: Case study of Brazilian football

Authors

Alexandre S. de Melo, Departamento de Ciência da Computação, Universidade Federal de Minas Gerais (UFMG)
Hendrik T. Macedo, Departamento de Computação, Universidade Federal de Sergipe (UFS)

Keywords

Information Extraction, Production Rules, JEOPS, Wrapper, Crawler

Abstract

Information extraction techniques automatically generate a structured representation from unstructured or semi-structured content. Structured information enables or facilitates further processing by third-party Web applications. This work describes the implementation of a domain-oriented information extraction system. The system automatically converts semi-structured Web content into structured content by means of object-oriented production rules that instantiate a provided set of domain-specific classes. These rules are implemented in JEOPS, a Java-based first-order forward-chaining inference engine. To show the feasibility of the proposal, we have fully specified classes that model the Brazilian Soccer Championship. Taking a Web site address as input, the system uses the facts and rules defined in its knowledge base to identify related links, find the championship classification table, and extract the table data. As a result, it automatically populates instances of the domain classes.
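
The last step described in the abstract, turning rows of a classification table into instances of a domain class, can be illustrated with a minimal sketch. It is not the authors' implementation and does not use JEOPS: a plain Java condition/action check stands in for a production rule, a regular expression stands in for a real HTML parser, and the names StandingsSketch, TeamStanding, and extract, as well as the three-column table layout, are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: plain Java standing in for rule-based extraction.
// None of these names come from the paper or from the JEOPS API.
class StandingsSketch {

    // Hypothetical domain class: one row of a championship classification table.
    static final class TeamStanding {
        final int position;
        final String team;
        final int points;

        TeamStanding(int position, String team, int points) {
            this.position = position;
            this.team = team;
            this.points = points;
        }

        @Override
        public String toString() {
            return position + ". " + team + " (" + points + " pts)";
        }
    }

    // Simplistic patterns for table rows and data cells (assumes well-formed markup).
    private static final Pattern ROW =
        Pattern.compile("<tr[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern CELL =
        Pattern.compile("<td[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Scans every table row and creates a TeamStanding for each row that
    // satisfies the condition below; all other rows are ignored.
    static List<TeamStanding> extract(String html) {
        List<TeamStanding> standings = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                // Drop nested tags (links, bold) and surrounding whitespace.
                cells.add(cell.group(1).replaceAll("<[^>]+>", "").trim());
            }
            // Condition: exactly three data cells, with a numeric position and points.
            boolean matches = cells.size() == 3
                && cells.get(0).matches("\\d{1,4}")
                && cells.get(2).matches("\\d{1,4}");
            // Action: instantiate the domain class from the matched row.
            if (matches) {
                standings.add(new TeamStanding(
                    Integer.parseInt(cells.get(0)),
                    cells.get(1),
                    Integer.parseInt(cells.get(2))));
            }
        }
        return standings;
    }

    public static void main(String[] args) {
        // Placeholder data, not taken from any real championship table.
        String html = "<table>"
            + "<tr><th>Pos</th><th>Team</th><th>Pts</th></tr>"
            + "<tr><td>1</td><td><a href=\"/a\">Team A</a></td><td>40</td></tr>"
            + "<tr><td>2</td><td>Team B</td><td>38</td></tr>"
            + "</table>";
        for (TeamStanding standing : extract(html)) {
            System.out.println(standing);
        }
    }
}
```

The header row is skipped because it contains no td cells, so the condition fails for it. In a rule engine such as the one the abstract describes, the condition and the action would presumably be declared as a rule and fired by the engine rather than written as an inline if statement.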

How to Cite

de Melo, A. S., & Macedo, H. T. (2011). Automatic extraction of semi-structured Web content: Case study of Brazilian football. Scientia Plena, 5(8). Retrieved from https://www.scientiaplena.org.br/sp/article/view/640

Issue

Vol. 5 No. 8 (2009): August/Agosto 2009

Section

Articles

License

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.