Entity Enrichment

... TextRazor can enrich each entity instance it identifies with structured data from various linked data sources.

Sometimes your application needs more than just Entity IDs and labels from the mentions identified in your content.

We index billions of facts including place geolocation information, multilingual descriptions, birth dates and much more. The engine knows, for example, that London in England is the same concept as en.wikipedia.org/wiki/London, or /m/04jpl in Freebase, information we can use to look up extra relevant, targeted data for you to use in your application. We take care of the hefty dataset cleaning, indexing and update process for you, allowing you to effortlessly build on real-world data.

The TextRazor API allows you to add simple queries to your requests to help extract only the specific information you need.

Usage

TextRazor's official client SDKs make it easy to use Entity Enrichment in your application: they let you pass in multiple queries and handle parsing the response for you. Results from your queries appear in the "data" field of each entity.

The REST API expects an array of entities.freebaseEnrichmentQueries strings. For more integration details, head over to the documentation for each SDK.
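As a rough illustration of passing an array of query strings, the sketch below form-encodes a request body with one entities.freebaseEnrichmentQueries entry per query. Only that parameter name comes from this page. The "text" and "extractors" parameter names, the repeated-key encoding of the array, and the helper name build_enrichment_body are assumptions made for the example, so check the REST documentation before relying on them.

```python
from urllib.parse import urlencode


def build_enrichment_body(text, queries):
    """Form-encode a request body with one entry per enrichment query.

    Assumption: the array is sent as a repeated form key, and "text" and
    "extractors" are the names of the other request parameters.
    """
    params = [("text", text), ("extractors", "entities")]
    params += [("entities.freebaseEnrichmentQueries", q) for q in queries]
    return urlencode(params)


body = build_enrichment_body(
    "London is the capital of England.",
    [
        "fbase:/common/topic/alias",
        "fbase:/location/location/geolocation>/location/geocode/latitude",
    ],
)
print(body)
```

The example only builds the body and does not send anything, so it can be run without an API key.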

Queries use the same format as found on the Freebase website or API. Each Freebase object has a number of links to other objects - for example on http://www.freebase.com/m/04jpl?links= you can see the "London" entity is linked to thousands of other facts by "predicates". TextRazor's enrichment queries allow you to specify several of these predicates to extract the information you need in your application.

Each query consists of the source prefix fbase: followed by one or more predicates separated by a '>'. Where multiple predicates are specified, TextRazor follows the results of each subquery in turn to reach the final answer. You may need to follow several links to get the exact data you need. For example, Freebase stores geolocation information in a separate object for each entity, so the query fbase:/location/location/geolocation on its own returns only a Freebase MID (the ID of that geolocation object). To get the longitude itself, follow the link one step further with the query fbase:/location/location/geolocation>/location/geocode/longitude.
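To make the query format concrete, here is a minimal sketch that splits a query string into its ordered predicates. The function name parse_enrichment_query is hypothetical and this is client-side string handling only. It does not reflect how TextRazor parses queries internally.

```python
SOURCE_PREFIX = "fbase:"


def parse_enrichment_query(query):
    """Return the ordered list of predicates in an enrichment query."""
    if not query.startswith(SOURCE_PREFIX):
        raise ValueError(f"query must start with {SOURCE_PREFIX!r}: {query!r}")
    # Predicates are separated by '>' and are followed left to right.
    predicates = query[len(SOURCE_PREFIX):].split(">")
    if any(not p.startswith("/") for p in predicates):
        raise ValueError(f"each predicate must start with '/': {query!r}")
    return predicates


steps = parse_enrichment_query(
    "fbase:/location/location/geolocation>/location/geocode/longitude"
)
print(steps)
```

Here the first predicate reaches the geolocation object and the second reads the longitude from it.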

Each query returns an array of results, because more than one result may match your query (for example, in the case of multilingual descriptions).

Example Queries

  • Latitude and longitude for places: fbase:/location/location/geolocation>/location/geocode/latitude, fbase:/location/location/geolocation>/location/geocode/longitude
  • Multilingual descriptions: fbase:/common/topic/description
  • Entity synonyms: fbase:/common/topic/alias
  • Example images: fbase:/common/topic/image>/type/content/source>/type/content_import/uri
  • Official websites: fbase:/common/topic/official_website

Feel free to get in touch to discuss the best way to get the exact data you need in your application.

Sources

TextRazor currently indexes Freebase, specifically the complete set of data available at http://www.freebase.com without the "/base/" and "/user/" bases. We find this provides the most coverage for common use cases, but if there's another linked data source that would help, please get in touch and we'll do our best to get it included.

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

Licensing

Linked data returned by the TextRazor API is provided under the same license as its original source. Freebase data is licensed under the Creative Commons Attribution 2.5 license.

Limits & Performance

TextRazor's enrichment database runs on a redundant SSD-backed DB cluster, ensuring queries add minimal latency to your requests. To help maintain the performance of your requests we impose several default limits:

  • A total of 10 enrichment queries can be added to each request.
  • Each query can have a maximum length of 3 path predicates.
  • A maximum of 1000 results are returned with each request.
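The first two limits can be checked on the client before a request is sent. The sketch below is one way to do that. The function and constant names are hypothetical, and the values are the defaults listed above, which may differ if support has raised them for your account.

```python
MAX_QUERIES_PER_REQUEST = 10   # default limit from the list above
MAX_PREDICATES_PER_QUERY = 3   # default limit from the list above


def check_default_limits(queries):
    """Return a list of human-readable problems, empty if none are found."""
    problems = []
    if len(queries) > MAX_QUERIES_PER_REQUEST:
        problems.append(
            f"{len(queries)} queries exceeds the limit of {MAX_QUERIES_PER_REQUEST}"
        )
    for query in queries:
        # Predicates are separated by '>', so N separators mean N + 1 predicates.
        predicate_count = query.count(">") + 1
        if predicate_count > MAX_PREDICATES_PER_QUERY:
            problems.append(
                f"{query!r} has {predicate_count} predicates, "
                f"limit is {MAX_PREDICATES_PER_QUERY}"
            )
    return problems


image_query = "fbase:/common/topic/image>/type/content/source>/type/content_import/uri"
print(check_default_limits([image_query]))
```

The limit on returned results depends on the data itself, so it is not checked here.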

If you require higher limits for your application, please contact support. In most cases we will happily increase the limit for you. TextRazor's indexes are frequently updated, but may not contain the latest changes from the source dataset.