Using Distributional Semantics for Automatic Taxonomy Creation

This paper explains the construction of taxonomies for specialized domains using a language-independent, statistical methodology. The methodology relies on the distributional semantics of terms: the algorithm captures term co-occurrence in large corpora. In the first step, the syntagmatic relations of terms are analyzed, which provides the seed terms that form the basis for taxonomy construction. The result is a list of hypernym candidates for each seed term. The second step analyzes the paradigmatic relations of the terms, that is, the relation between a hypernym term and the terms that co-occur with it. This step yields more refined and more appropriate hypernym lists. In the final step, the taxonomy is constructed from the resulting hypernym lists. Terms are attached asymmetrically to the taxonomy at a specific depth. The proposed approach is demonstrated on a sample corpus to assess its effectiveness. The recall and precision of the proposed algorithm are 78.6% and 79.8%, respectively. The proposed algorithm significantly improves the quality of the results.
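To make the first step more concrete, the following is a minimal Python sketch of how hypernym candidates for a seed term might be drawn from co-occurrence counts. It is an illustration under stated assumptions, not the algorithm evaluated in this paper: the sentence-level co-occurrence window, the conditional-probability asymmetry test, the threshold `min_cond_prob`, the function names, and the toy corpus are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count in how many sentences each term and each unordered term pair occurs.

    Assumption: a sentence is the co-occurrence window.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        terms = sorted(set(sentence))  # sorted so each pair has one canonical order
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    return term_counts, pair_counts


def hypernym_candidates(seed, term_counts, pair_counts, min_cond_prob=0.6):
    """Return (term, P(term | seed)) pairs that pass an asymmetry test.

    Assumed heuristic: a term is a hypernym candidate for the seed when it
    appears in most of the seed's sentences (P(term | seed) >= min_cond_prob)
    while the seed appears in a smaller share of the term's sentences.
    """
    candidates = []
    for term in term_counts:
        if term == seed:
            continue
        joint = pair_counts[tuple(sorted((seed, term)))]
        if joint == 0:
            continue
        p_term_given_seed = joint / term_counts[seed]
        p_seed_given_term = joint / term_counts[term]
        if p_term_given_seed >= min_cond_prob and p_term_given_seed > p_seed_given_term:
            candidates.append((term, p_term_given_seed))
    # Highest conditional probability first, ties broken alphabetically.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Toy corpus (hypothetical), already tokenized into terms.
toy_sentences = [
    ["dog", "animal"],
    ["cat", "animal"],
    ["dog", "animal", "bark"],
    ["cat", "animal", "purr"],
    ["animal", "zoo"],
]
toy_term_counts, toy_pair_counts = cooccurrence_counts(toy_sentences)
print(hypernym_candidates("dog", toy_term_counts, toy_pair_counts))
```

In this toy corpus, "animal" occurs in every sentence that contains "dog", whereas "dog" occurs in only two of the five sentences that contain "animal", so the test is directional: "animal" is proposed as a hypernym candidate for "dog" but not the reverse. This mirrors, in a simplified form, the asymmetric attachment of terms described above.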