Acquisition of conceptual domain dictionaries via decision tree learning

Roberto Basili, Maria Teresa Pazienza, Fabio Massimo Zanzotto

Knowledge-based systems usually rely on large domain models to support reasoning and decision-making. Developing realistic models is a critical and labour-intensive phase. Automatic terminology acquisition (TA) has been proposed as the task of automatically extracting specialized dictionaries from raw texts for applications such as precise information retrieval and machine translation. In this paper we argue that TA contributes significantly to the development of the ontological components of a knowledge base. We therefore propose an automatic knowledge acquisition architecture for the TA process, based on robust text processing methods and on algorithms for learning decision trees. An incremental, semi-automatic approach is proposed to support the first steps in the development of a domain ontology. The novelty of the method lies in its use of syntagmatic and lexical properties of terms, combined with analogous (negative) evidence observable for non-terms. The underlying assumptions, as well as the different linguistic representations adopted, have been extensively investigated over a large test set. The scale of the test data provides empirical evidence of the superiority of the method over more quantitative approaches. The proposed architecture is thus a viable approach to the development of conceptual domain dictionaries.

Keywords: Natural Language Processing, Knowledge Acquisition, Machine Learning, Information Extraction

