Commit 8affb9d3 authored by Gerhard Gossen

Document output format

parent 4620ecac
@@ -15,6 +15,20 @@ Run as:
where `*.csv` are the output files produced by TAGME. This will produce one new file per input file, e.g. for `0.html_annotations.csv` this will create `0.html_annotations-geo.csv`.
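
The naming rule above can be sketched in Python as follows. This only illustrates the convention taken from the example file name; the helper name `geo_output_path` is hypothetical and is not part of this tool.

```python
from pathlib import Path


def geo_output_path(input_path):
    """Return the output path for an input file by inserting "-geo" before the extension.

    Hypothetical helper: it mirrors the naming shown in the example above
    (`0.html_annotations.csv` -> `0.html_annotations-geo.csv`).
    """
    path = Path(input_path)
    # Path.stem drops only the last suffix, so "0.html_annotations" is preserved.
    return path.with_name(path.stem + "-geo" + path.suffix)


print(geo_output_path("0.html_annotations.csv"))
```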
## Data format
Each output file contains one line per entity mention. Each line has the following columns:

| Column        | Description |
|:--------------|:------------|
| `token`       | text that occurred in the document |
| `entity`      | normalized name of the entity |
| `offset`      | position of the token in the document (character offset) |
| `entity_url`  | Wikipedia URL of the entity |
| `confidence`  | TAGME confidence that the token really refers to the entity |
| `wikidata_id` | Wikidata ID of the entity (may be empty) |
| `coordinates` | geo-coordinates of the entity (may be empty) |
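
As a rough illustration of consuming this format, the sketch below reads mentions into dictionaries. It rests on several assumptions that this document does not state: the files are comma-delimited, have no header row, and store the columns in the order listed above. The function name `read_mentions` and the sample row are made up for the example, and `coordinates` is left as a raw string because its exact format is not specified here.

```python
import csv
import io

# Column order as listed in the table above (assumed to match the file layout).
COLUMNS = [
    "token", "entity", "offset", "entity_url",
    "confidence", "wikidata_id", "coordinates",
]


def read_mentions(stream, delimiter=","):
    """Yield one dict per entity mention from an open text stream.

    Assumes comma-delimited rows without a header line; adjust `delimiter`
    if the actual files differ.
    """
    for row in csv.reader(stream, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}")
        mention = dict(zip(COLUMNS, row))
        mention["offset"] = int(mention["offset"])
        mention["confidence"] = float(mention["confidence"])
        # These two columns may be empty; represent that as None.
        for key in ("wikidata_id", "coordinates"):
            mention[key] = mention[key] or None
        yield mention


# Made-up sample row for illustration only (not real tool output).
sample = "Berlin,Berlin,42,https://en.wikipedia.org/wiki/Berlin,0.87,Q64,\n"
mentions = list(read_mentions(io.StringIO(sample)))
print(mentions[0]["entity"], mentions[0]["wikidata_id"], mentions[0]["coordinates"])
```

For a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open("0.html_annotations-geo.csv", newline="", encoding="utf-8")`.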
## License
Copyright 2016 Gerhard Gossen. This program may be used under the Apache License 2.0.