About Me

Hi, I'm Tommaso. I was born on 7 September 1991 and I'm a Ph.D. student in computer science at La Sapienza University of Rome. I obtained my bachelor's degree in the standard three years with a final grade of 108/110, and my master's degree in the standard two years with a final grade of 110/110 cum laude. My master's thesis is in the field of Natural Language Processing and aimed at automatically building a taxonomy for both Wikipedia pages and categories, potentially in 250 different languages. This was joint work with other Ph.D. students of LCL @ Sapienza, supervised by Prof. Roberto Navigli (see the publications page for further information).

I have always liked technology, but I've also been interested in the humanities: in fact, I graduated from a classical high school and I also studied classical guitar for about 10 years.
I love travelling; in fact, I usually visit a few countries with my friends during the summer holidays. I've been to the most important European capitals, and London and Berlin are the ones I liked the most because of their multicultural atmosphere, beautiful architecture, organization and open-mindedness.
I usually do sport, especially parkour. I really like working out: it helps me relax and free my mind. I also play guitar in a duo with a friend of mine. For more information, you can contact me on the main social media using the links below, click on the "Contact Me" tab to send me an e-mail, visit my blog at computersciencestudentlife.wordpress.com, or just download my Curriculum Vitae.

Huge Automatically Extracted Training Sets for Multilingual Word Sense Disambiguation

We release to the community six large-scale sense-annotated datasets in multiple languages to pave the way for supervised multilingual Word Sense Disambiguation. Our datasets cover all the nouns in the English WordNet and their translations in other languages for a total of millions of sense-tagged sentences. Experiments prove that these corpora can be effectively used as training sets for supervised WSD systems, surpassing the state of the art for low-resourced languages and providing competitive results for English, where manually annotated training sets are accessible. The data is available at trainomatic.org.

Two Knowledge-based Methods for High-Performance Sense Distribution Learning

Knowing the correct distribution of senses within a corpus can potentially boost the performance of Word Sense Disambiguation (WSD) systems by many points. We present two fully automatic and language-independent methods for computing the distribution of senses given a raw corpus of sentences. Intrinsic and extrinsic evaluations show that our methods outperform the current state of the art in sense distribution learning and the strongest baselines for the most frequent sense in multiple languages and on domain-specific test sets. Our sense distributions are available at trainomatic.org.

Train-O-Matic: Large-Scale Supervised Word Sense Disambiguation in Multiple Languages without Manual Training Data

Annotating large numbers of sentences with senses is the heaviest requirement of current Word Sense Disambiguation. We present Train-O-Matic, a language-independent method for generating millions of sense-annotated training instances for virtually all meanings of words in a language’s vocabulary. The approach is fully automatic: no human intervention is required and the only type of human knowledge used is a WordNet-like resource. Train-O-Matic achieves consistently state-of-the-art performance across gold standard datasets and languages, while at the same time removing the burden of manual annotation. All the training data is available for research purposes at trainomatic.org.

Two Is Bigger (and Better) Than One: the Wikipedia Bitaxonomy Project

We present WiBi, an approach to the automatic creation of a bitaxonomy for Wikipedia, that is, an integrated taxonomy of Wikipedia pages and categories. We leverage the information available in either one of the taxonomies to reinforce the creation of the other taxonomy. Our experiments show higher quality and coverage than state-of-the-art resources like DBpedia, YAGO, MENTA, WikiNet and WikiTaxonomy. WiBi is available at http://wibitaxonomy.org.

MultiWiBi: The multilingual Wikipedia bitaxonomy project

We present MultiWiBi, an approach to the automatic creation of two integrated taxonomies for Wikipedia pages and categories written in different languages. In order to create both taxonomies in an arbitrary language, we first build them in English and then project the two taxonomies to other languages automatically, without the help of language-specific resources or tools. The process crucially leverages a novel algorithm which exploits the information available in either one of the taxonomies to reinforce the creation of the other taxonomy. Our experiments show that the taxonomical information in MultiWiBi is characterized by a higher quality and coverage than state-of-the-art resources like DBpedia, YAGO, MENTA, WikiNet, LHD and WikiTaxonomy, also across languages. MultiWiBi is available online at http://wibitaxonomy.org/multiwibi.