A statistical approach to knowledge discovery: Bootstrap analysis of language models for knowledge base population from unstructured text

Document Type: Article

Authors

1 Computer Engineering and Information Technology Department, Amirkabir University of Technology, Tehran, Iran

2 Saarland University, Saarbrücken, Germany

Abstract

In this paper, we propose a novel approach to knowledge discovery from textual data. The generated knowledge base can serve as one of the main components in the cognitive process of question answering systems. The proposed model automatically extracts relations between named entities in Persian. It is a bootstrapping approach that uses an n-gram model to find representative textual patterns of relations, expressed as n-grams, in order to extract new knowledge about given named entities. The main motivation for this work is the sentence structure of Persian, which, unlike English, follows a subject-object-verb order. The proposed approach is purely statistical and requires no background knowledge of the target language, which makes it applicable to any open-domain relation extraction task. As our test bed, however, we focus on the domain of biographical data about international poets and scientists and build a knowledge base about them. Qualitative evaluations based on human assessment provide evidence for the efficacy of our method.
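The abstract does not spell out the algorithm, so the following is only a minimal, hypothetical sketch of what a bootstrapping loop over n-gram patterns can look like. It assumes whitespace-tokenised sentences given as English glosses in Persian subject-object-verb order, a subject in sentence-initial position, and, purely for illustration, that the n-gram immediately following the object (the verb region in SOV order) acts as the relation pattern. The toy corpus and all function names (`learn_patterns`, `extract_pairs`, `bootstrap`) are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

# Hypothetical toy corpus: English glosses kept in Persian SOV word order.
CORPUS = [
    "Hafez in Shiraz was born",
    "Saadi in Shiraz was born",
    "Rumi in Balkh was born",
    "Khayyam in Nishapur was born",
    "Rumi in Konya died",
    "Attar in Nishapur died",
]


def learn_patterns(corpus, seeds, n=2, top_k=1):
    """Count the n-grams that follow a seed object in sentences mentioning both seed entities.

    Assumption: in SOV order the relation-bearing verb phrase comes right after the object.
    """
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for subj, obj in seeds:
            if subj in tokens and obj in tokens:
                start = tokens.index(obj) + 1
                gram = tuple(tokens[start:start + n])
                if len(gram) == n:
                    counts[gram] += 1
    # Keep only the most frequent patterns (a crude stand-in for pattern scoring).
    return [gram for gram, _ in counts.most_common(top_k)]


def extract_pairs(corpus, patterns, n=2):
    """Apply learned patterns: the token before a pattern is the object, token 0 the subject."""
    pairs = set()
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(1, len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) in patterns:
                pairs.add((tokens[0], tokens[i - 1]))
    return pairs


def bootstrap(corpus, seeds, iterations=2, n=2):
    """Alternate between learning patterns from known pairs and harvesting new pairs."""
    known = set(seeds)
    for _ in range(iterations):
        patterns = set(learn_patterns(corpus, known, n))
        known |= extract_pairs(corpus, patterns, n)
    return known


if __name__ == "__main__":
    print(sorted(bootstrap(CORPUS, {("Hafez", "Shiraz")})))
    # → [('Hafez', 'Shiraz'), ('Khayyam', 'Nishapur'), ('Rumi', 'Balkh'), ('Saadi', 'Shiraz')]
```

Starting from a single seed pair for a birthplace-like relation, the loop learns the pattern `("was", "born")` and harvests three new pairs, while sentences with a different verb ("died") are left out. A realistic system would need proper tokenisation, entity recognition, and a pattern-confidence threshold to limit semantic drift across iterations.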
