TaxoEmbed

Sapienza University of Rome

Linguistic Computing Laboratory

Supervised Distributional Hypernym Discovery via Domain Adaptation

About

TaxoEmbed is a supervised distributional framework for hypernym discovery that operates at the sense level, enabling large-scale automatic acquisition of disambiguated taxonomies. TaxoEmbed exploits semantic regularities between hyponyms and hypernyms in embedding spaces to learn a hypernym transformation matrix, and integrates a domain clustering algorithm to produce domain-specific models that are sensitive to the target data. Experiments on ten different domains show that TaxoEmbed is flexible and robust enough to accommodate heterogeneous training pairs, drawn from manually curated knowledge bases as well as from resources derived from Open Information Extraction (OIE).
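The idea of a hypernym transformation matrix can be illustrated with a minimal sketch. This is not the TaxoEmbed Python API: the function names (learn_hypernym_matrix, rank_candidates), the least-squares fit, the cosine ranking and the random toy vectors are all assumptions made for illustration. In a real setting the vectors would be sense embeddings, and one matrix would presumably be trained per domain cluster.

```python
import numpy as np


def learn_hypernym_matrix(hyponym_vecs, hypernym_vecs):
    """Fit W so that hyponym_vecs @ W approximates hypernym_vecs.

    Illustrative assumption: an ordinary least-squares fit over
    (hyponym, hypernym) training pairs, one pair per row.
    """
    W, _, _, _ = np.linalg.lstsq(hyponym_vecs, hypernym_vecs, rcond=None)
    return W


def rank_candidates(query_vec, W, candidate_vecs, labels, top_k=3):
    """Project a hyponym vector with W and rank candidates by cosine similarity."""
    projected = query_vec @ W
    norms = np.linalg.norm(candidate_vecs, axis=1) * np.linalg.norm(projected)
    scores = (candidate_vecs @ projected) / np.maximum(norms, 1e-12)
    order = np.argsort(-scores)[:top_k]
    return [(labels[i], float(scores[i])) for i in order]


# Toy data: random vectors stand in for sense embeddings (hypothetical).
rng = np.random.default_rng(0)
dim = 5
true_map = rng.normal(size=(dim, dim))   # hidden hyponym-to-hypernym mapping
hyponyms = rng.normal(size=(50, dim))    # 50 training hyponym vectors
hypernyms = hyponyms @ true_map          # their paired hypernym vectors

W = learn_hypernym_matrix(hyponyms, hypernyms)

# Rank the first ten hypernym vectors as candidates for hyponym number 3.
labels = [f"hyper_{i}" for i in range(10)]
ranking = rank_candidates(hyponyms[3], W, hypernyms[:10], labels)
print(ranking[0][0])
```

Because the toy pairs are generated by an exact linear map, the fitted matrix recovers that map and the paired hypernym is ranked first; with real embeddings the fit is only approximate and the ranking yields candidate hypernyms rather than a single certain answer.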

Reference Paper

Luis Espinosa Anke, José Camacho Collados, Claudio Delli Bovi and Horacio Saggion.
Supervised Distributional Hypernym Discovery via Domain Adaptation. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 424–435, Austin, Texas, USA, 1-5 November 2016.

Contacts

Luis Espinosa-Anke
luis [dot] espinosa [at] upf [dot] edu

José Camacho Collados
collados [at] di.uniroma1 [dot] it
bn:17381131n @ BabelNet

Claudio Delli Bovi
dellibovi [at] di.uniroma1 [dot] it
bn:17381128n @ BabelNet

Horacio Saggion
horacio [dot] saggion [at] upf [dot] edu

Download

Training Data: Wikidata, KB-Unify [ zip: 26 MB ]

Nasari Domain Labels [ tsv: 46 MB ]

SensEmbed Sense Vectors [ bin: 3.0 GB ]

Python API

Java API [ Coming Soon! ]

Updates

Python API repository available on Bitbucket! - 2017, Apr 26th
First version of the Python API (v0.9) available! - 2016, Nov 7th
Uploaded training data, domain labels and vectors - 2016, Oct 31st
Website online! - 2016, Sep 26th

Last update: Apr 26th 2017 by Claudio Delli Bovi