Jose Camacho Collados

I am a Google Doctoral Fellow and third-year PhD student at the Linguistic Computing Laboratory (LCL) of Sapienza University of Rome. Previously, I worked as a research engineer at ATILF-CNRS in France. My educational background includes an Erasmus Mundus Master in Natural Language Processing and Human Language Technology and a 5-year BSc degree in Mathematics.

I work on various topics in Natural Language Processing (NLP), mainly in the areas of lexical and distributional semantics. I am currently working on leveraging knowledge resources to improve NLP applications, with a special focus on multilinguality and ambiguity. To this end, I have been collaborating on the BabelNet project and developing knowledge-based sense vector representations (e.g. NASARI and SW2V) that serve as a bridge between lexical resources and text-based applications. On this topic, we organized a tutorial at ACL 2016 and a workshop at EACL 2017.

I strongly believe that well-curated datasets and resources, as well as shared tasks, are key to advancing science. This year I'm co-organizing two SemEval 2018 tasks, on Hypernym Discovery and Emoji Prediction. Check them out!

NLP aside, I love travelling and sports. I was raised in Granada, a wonderful city in the south of Spain where I spent the first 20 years of my life. Since then, I have lived in large European cities such as Paris, Barcelona and Rome, and have spent long periods of time in Seoul. I have also lived in smaller (but equally charming) cities: Nancy and Besançon (France) and Wolverhampton (UK). I like practising all kinds of sports: football, swimming, tennis, padel, ping pong... and chess (yes, it is also a sport!). I hold the International Master chess title and am currently the top-rated chess player in South Korea.


publications

Jose Camacho-Collados and Mohammad Taher Pilehvar.
On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis. [paper]
arXiv preprint arXiv:1707.01780 (2017).
Massimiliano Mancini*, Jose Camacho-Collados*, Ignacio Iacobacci and Roberto Navigli.
Embedding Words and Senses Together via Joint Knowledge-Enhanced Training. [paper] [data&code]
CoNLL 2017, Vancouver, Canada.
Mohammad Taher Pilehvar, Jose Camacho-Collados, Roberto Navigli and Nigel Collier.
Towards a Seamless Integration of Word Senses into Downstream NLP Applications. [paper]
ACL 2017, Vancouver, Canada.
Claudio Delli Bovi, Jose Camacho-Collados, Alessandro Raganato and Roberto Navigli.
EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text. [paper] [data]
ACL 2017 (short), Vancouver, Canada.


news

September 2017. To attend Google's Natural Language Processing Summit in Zurich (25-27 September).

August 2017. Co-organizing two SemEval 2018 tasks: Hypernym Discovery and Emoji Prediction.

August 2017. To attend ACL, CoNLL and SemEval in Vancouver, Canada.

June 2017. To attend Google's Machine Learning Summit in Zurich (12-14 June).

April 2017. I will give a talk on "Semantic Representations of Word Senses, Concepts and Entities and their Applications" at the University of Cambridge, UK. [slides]

April 2017. Two papers (one short and one long) accepted at ACL 2017. Joint work with Claudio Delli Bovi, Alessandro Raganato and Roberto Navigli, and with Taher Pilehvar and Nigel Collier.



resources

- NASARI vector representations for BabelNet synsets and Wikipedia pages (English, French, German, Italian, Spanish).

- SW2V (Senses and Words to Vectors): word and sense embeddings in the same vector space (code and pre-trained models).

- Unified Evaluation Framework for Word Sense Disambiguation.

- BabelNet, a very large multilingual encyclopedic dictionary and semantic network.

- BabelDomains: Lexical items (synsets, Wikipedia pages) annotated with domains of knowledge.

- EuroSense: Multilingual sense annotations for Europarl.

- Supervised distributional framework (including Python API) for hypernym discovery.

- Find the word that does not belong: an evaluation benchmark for the outlier detection task.

- Large multilingual corpus of sense-annotated textual definitions.

- Word similarity datasets in several languages (also cross-lingual!).


invited talks

Semantic (Vector) Representations of Word Senses, Concepts and Entities and their Applications, 20 April 2017, University of Cambridge, UK. [slides]

Semantic Representations of Word Senses, Concepts and Entities and their Applications, 19 October 2016, Pompeu Fabra University, Barcelona, Spain. [slides]

Computational Semantic Representation, 5 June 2015, Department of Applied Mathematics, University of California, Los Angeles (UCLA), USA.

Using NLP tools to identify people and places in a collection of 19th-century emigrant letters, 'Digitising Experience of Migration' Workshop, 15 March 2014, Mellon Centre for Migration Studies, Omagh, Northern Ireland.