MLODE 2014, September 1-2 in Leipzig, Germany

Content analysis and the Semantic Web, a LIDER Hackathon

T9: Converting the output of Babelfy into RDF-NIF

SUMMARY

Topic description
Objective
Requirements
Software requirements
Data requirements
Instructions
Useful links
Example

Topic description

Babelfy is a unified, multilingual, graph-based approach to Entity Linking and Word Sense Disambiguation. Based on a loose identification of candidate meanings, coupled with a densest-subgraph heuristic that selects high-coherence semantic interpretations, Babelfy is able to annotate free text with both concepts and named entities drawn from BabelNet’s sense inventory.

The task consists of converting text annotated by Babelfy into RDF. To accomplish this, participants start from free text, annotate it with Babelfy, and then use the NLP2RDF NIF module to produce the RDF output.

Objective

INPUT: free text, in any of the 50 languages available in BabelNet.
INTERMEDIATE OUTPUT: enriched text, semantically annotated via Babelfy.
OUTPUT: semantically annotated text converted into NIF-RDF format.

The following figure shows the workflow of the task.

Note: We released a demo package that shows what the input and the output should look like (see the Instructions section below for details). The objective of this hackathon is to reproduce the behaviour of the demo yourself.

Requirements

Basic Java programming skills
Basic RDF knowledge

Software requirements

A text editor (but an IDE is recommended: Eclipse, Netbeans, etc.)
Java 1.6 or greater

Data requirements

BabelNet standoff indexes [~ 5.3G]
Hackathon demo [~ 250M]

Instructions

Download the BabelNet standoff indexes from http://babelnet.org/download (direct link: http://babelnet.org/data/2.5/babelnet-2.5-index-bundle.tar.bz2)

Unpack it with

	
	bzip2 -d babelnet-2.5-index-bundle.tar.bz2
	tar xvf babelnet-2.5-index-bundle.tar

You should now have a directory called 'BabelNet-2.5'

Download the hackathon demo archive (leipzig_hackathon_t9_babelfy2nif.tar.gz)

Uncompress the hackathon demo file to your project directory

		
	tar xzvf leipzig_hackathon_t9_babelfy2nif.tar.gz -C YOUR_PROJECT_DIR

You should now have the following structure:

	
	.
	├── config/
	├── lib/
	├── models/
	├── resources/
	├── leipzig_hackathon-0.0.1-SNAPSHOT-jar-with-dependencies.jar
	└── run_babelfy2nif-demo.sh

Open config/babelnet.var.properties and set the variable babelnet.dir to the directory containing BabelNet's indexes. For example, if you have the indexes under the directory /home/flati/resources/BabelNet-2.5, set
```
	
	babelnet.dir=/home/flati/resources/BabelNet-2.5
	
```
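Before running the demo, it can help to confirm that the property is actually picked up. The following is a minimal sketch, not part of the demo package: the class name BabelNetDirCheck and its methods are hypothetical, and it only assumes that config/babelnet.var.properties is a standard Java properties file containing the babelnet.dir key described above.

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper (not shipped with the demo) that checks babelnet.dir.
final class BabelNetDirCheck {

    /** Returns the trimmed value of babelnet.dir, or null if the key is missing or blank. */
    static String parseBabelNetDir(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Reading from a String should not fail; rethrow unchecked just in case.
            throw new IllegalStateException(e);
        }
        String dir = props.getProperty("babelnet.dir");
        if (dir == null || dir.trim().isEmpty()) {
            return null;
        }
        return dir.trim();
    }

    /** Reads a whole text file into a String. */
    static String readFile(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        Reader in = new FileReader(path);
        try {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } finally {
            in.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the instructions; run this from the project directory.
        String dir = parseBabelNetDir(readFile("config/babelnet.var.properties"));
        if (dir == null) {
            System.err.println("babelnet.dir is not set");
        } else if (!new File(dir).isDirectory()) {
            System.err.println("babelnet.dir does not point to a directory: " + dir);
        } else {
            System.out.println("babelnet.dir points to an existing directory: " + dir);
        }
    }
}
```

The check only verifies that the directory exists; it does not validate the contents of the indexes.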
Run the demo
```
	sh run_babelfy2nif-demo.sh			
	
```
Take a look at the file config/babelfy2nif.properties. It has several parameters that allow you to customize the demo:
- The Babelfy key: you will be given one during the hackathon. The key allows you to freely query Babelfy and obtain disambiguated text.
- Language: the language you want to work with. This can be any of the 50 languages offered by BabelNet.
- Text: the text you want to annotate and convert into NIF.
- Algorithm: the way Babelfy's overlapping annotations are handled (you can select either FIRST_COME_FIRST_SERVED_ALGORITHM or LONGEST_ANNOTATION_GREEDY_ALGORITHM).
- RDF format: the format of your NIF output. This is quite standard and you can choose among TURTLE, RDF/XML and NTRIPLE representations.
- Output: where you want the result to be displayed. This can either be the standard output (stdout) or your favourite file (file).
- Output file: if the output was set to file, this parameter determines the file path in which the output will be written.
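
The Algorithm parameter is the least self-explanatory of the options above. The sketch below illustrates one plausible reading of the two strategies, inferred only from their names: it is an assumption, not the demo's actual implementation. The class OverlapResolver, the Span type, and the sense labels are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of two ways to resolve overlapping annotations.
final class OverlapResolver {

    /** An annotation covering the character offsets [begin, end). */
    static final class Span {
        final int begin;
        final int end;
        final String senseId; // placeholder label, not a real BabelNet id

        Span(int begin, int end, String senseId) {
            this.begin = begin;
            this.end = end;
            this.senseId = senseId;
        }

        int length() {
            return end - begin;
        }

        boolean overlaps(Span other) {
            return begin < other.end && other.begin < end;
        }

        @Override
        public String toString() {
            return "[" + begin + "," + end + ") " + senseId;
        }
    }

    /** Walks the spans in the given order and keeps each one that overlaps nothing kept so far. */
    private static List<Span> keepNonOverlapping(List<Span> ordered) {
        List<Span> kept = new ArrayList<Span>();
        for (Span candidate : ordered) {
            boolean clash = false;
            for (Span k : kept) {
                if (candidate.overlaps(k)) {
                    clash = true;
                    break;
                }
            }
            if (!clash) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    /** Assumed reading of FIRST_COME_FIRST_SERVED_ALGORITHM: earlier spans in input order win. */
    static List<Span> firstComeFirstServed(List<Span> spans) {
        return keepNonOverlapping(spans);
    }

    /** Assumed reading of LONGEST_ANNOTATION_GREEDY_ALGORITHM: longer spans win, ties keep input order. */
    static List<Span> longestGreedy(List<Span> spans) {
        List<Span> byLength = new ArrayList<Span>(spans);
        // Collections.sort is stable, so equally long spans keep their input order.
        Collections.sort(byLength, new Comparator<Span>() {
            public int compare(Span a, Span b) {
                return b.length() - a.length();
            }
        });
        List<Span> kept = keepNonOverlapping(byLength);
        // Return the survivors in document order.
        Collections.sort(kept, new Comparator<Span>() {
            public int compare(Span a, Span b) {
                return a.begin - b.begin;
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        // Offsets refer to the text "New York City".
        List<Span> spans = Arrays.asList(
                new Span(0, 8, "sense-for-New-York"),
                new Span(0, 13, "sense-for-New-York-City"),
                new Span(9, 13, "sense-for-City"));
        System.out.println(firstComeFirstServed(spans));
        System.out.println(longestGreedy(spans));
    }
}
```

For the three spans in main, the first strategy keeps "New York" and "City", while the second keeps only "New York City", because the longest span is chosen first and the other two overlap it.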

Useful links

Babelfy tutorial
NIF explanation
Brown 2 NIF tutorial, and an example of what a corpus converted into NIF looks like

An example

The demo feeds Babelfy with the following sentence


	Hello World!

and then converts it into NIF. The result is written to a newly created file called rdf_output.nif.txt, which looks like the following:

@prefix dc:    <http://purl.org/dc/elements/1.1/> .
@prefix bn:    <http://babelnet.org/2.0/> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix olia:  <http://purl.org/olia/olia.owl#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix babelfy: <http://lcl.uniroma1.it/babelfy2nif#> .

olia:CommonNoun  a  owl:Class .

<http://purl.org/olia/penn.owl#NN>
      a       owl:Thing .

<char=0,12>  a          nif:Context , nif:RFC5147String ;
	  nif:beginIndex  "0" ;
	  nif:endIndex    "12" ;
	  nif:isString    "hello world!" .

nif:word  a     owl:ObjectProperty .

nif:isString  a  owl:DatatypeProperty .

nif:beginIndex  a  owl:DatatypeProperty .

nif:sentence  a  owl:ObjectProperty .

nif:oliaLink  a  owl:ObjectProperty .

nif:anchorOf  a  owl:DatatypeProperty .

<http://lcl.uniroma1.it/babelfy2nif#char=0,12>
	  a                     nif:Sentence , nif:Word , nif:RFC5147String ;
	  nif:anchorOf          "hello world!" ;
	  nif:beginIndex        "0" ;
	  nif:endIndex          "12" ;
	  nif:oliaCategory      olia:Noun , olia:CommonNoun ;
	  nif:oliaLink          <http://purl.org/olia/penn.owl#NN> ;
	  nif:referenceContext  <char=0,12> ;
	  nif:sentence          <http://lcl.uniroma1.it/babelfy2nif#char=0,12> ;
	  nif:word              <http://lcl.uniroma1.it/babelfy2nif#char=0,12> ;
	  itsrdf:taIdentRef     bn:s00587925n .

nif:oliaCategory  a  owl:AnnotationProperty .

nif:Context  a  owl:Class .

nif:Sentence  a  owl:Class .

nif:referenceContext  a  owl:ObjectProperty .

olia:Noun  a    owl:Class .

nif:endIndex  a  owl:DatatypeProperty .

nif:RFC5147String  a  owl:Class .

nif:Word  a     owl:Class .
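
To see how the core triples of the listing above relate to the input text, the following sketch assembles a reduced version of them by plain string concatenation. It is not the demo's implementation and uses no RDF library: the class NifSketch and its methods are hypothetical, only the prefixes, properties, and synset identifier that appear in the listing are reused, and offsets are taken as Java char indices for simplicity.

```java
// Hypothetical, library-free sketch that emits a small subset of the NIF triples.
final class NifSketch {

    // Base URI as it appears in the demo output.
    static final String BASE = "http://lcl.uniroma1.it/babelfy2nif#";

    static final String PREFIXES =
            "@prefix bn: <http://babelnet.org/2.0/> .\n"
          + "@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .\n"
          + "@prefix itsrdf: <http://www.w3.org/2005/11/its#> .\n";

    /** Escapes a string for use inside a double-quoted Turtle literal. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds an offset-based URI such as <...#char=0,12>. */
    static String fragmentUri(int begin, int end) {
        return "<" + BASE + "char=" + begin + "," + end + ">";
    }

    /** Triples describing the whole input text as the reference context. */
    static String context(String text) {
        return fragmentUri(0, text.length()) + " a nif:Context , nif:RFC5147String ;\n"
             + "    nif:beginIndex \"0\" ;\n"
             + "    nif:endIndex \"" + text.length() + "\" ;\n"
             + "    nif:isString \"" + escape(text) + "\" .\n";
    }

    /** Triples linking the span [begin, end) of the text to a BabelNet synset. */
    static String annotation(String text, int begin, int end, String synsetId) {
        return fragmentUri(begin, end) + " a nif:Word , nif:RFC5147String ;\n"
             + "    nif:anchorOf \"" + escape(text.substring(begin, end)) + "\" ;\n"
             + "    nif:beginIndex \"" + begin + "\" ;\n"
             + "    nif:endIndex \"" + end + "\" ;\n"
             + "    nif:referenceContext " + fragmentUri(0, text.length()) + " ;\n"
             + "    itsrdf:taIdentRef bn:" + synsetId + " .\n";
    }

    public static void main(String[] args) {
        // Text, offsets, and synset id copied from the demo output above.
        String text = "hello world!";
        System.out.print(PREFIXES);
        System.out.println();
        System.out.print(context(text));
        System.out.println();
        System.out.print(annotation(text, 0, text.length(), "s00587925n"));
    }
}
```

Because the single annotation spans the whole text, the context and the annotation share one subject URI in this sketch, whereas the demo output uses a relative URI for the context. The OLIA part-of-speech triples and the ontology declarations of the demo output are left out.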