
Introducing Custom Entity Dictionaries

Thu 20 August 2015

One of our most popular features is Entity Recognition and Disambiguation, the ability to identify key people, places, organizations, and other things mentioned in your documents. The system works well out of the box, leveraging a number of data sources and advanced machine learning methods to ensure comprehensive Entity coverage across many different languages and styles of writing.

Sometimes your application might need to identify Entities that aren't common enough for TextRazor to know about, such as niche product names, drug names, or specific people. TextRazor's new Custom Entity Dictionaries let you upload large lists of your own entities, which TextRazor will identify in your documents and return alongside the other entities it finds.

The new feature is fully integrated with the rest of our analysis functionality. Simply create your Dictionary through the REST API or one of our SDKs, and add its ID to your analysis requests. Where the engine matches one of your Entities, it will be returned as part of the usual Entity response. Dictionaries have a number of options that control the matching process; see the documentation for more details.
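To make the idea concrete, here is a small local sketch of what dictionary matching means conceptually. It is not the TextRazor API or the engine's implementation: the `CustomDictionary` and `DictionaryEntry` classes, the `case_insensitive` option, and the example product and drug names are all hypothetical, chosen only to show a dictionary with an ID, a list of entries, and one matching option.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """One custom entity: an ID you choose plus the text to look for."""
    entry_id: str
    text: str


@dataclass
class CustomDictionary:
    """Hypothetical stand-in for a custom entity dictionary (illustration only)."""
    dictionary_id: str
    case_insensitive: bool = True  # assumed example of a matching option
    entries: list = field(default_factory=list)

    def add_entries(self, entries):
        self.entries.extend(entries)

    def match(self, document):
        """Return whole-word matches of every entry, ordered by position."""
        flags = re.IGNORECASE if self.case_insensitive else 0
        found = []
        for entry in self.entries:
            # Lookarounds stop "Zorvexa" from matching inside "Zorvexane".
            pattern = r"(?<!\w)" + re.escape(entry.text) + r"(?!\w)"
            for m in re.finditer(pattern, document, flags):
                found.append({
                    "dictionary_id": self.dictionary_id,
                    "entry_id": entry.entry_id,
                    "text": m.group(0),
                    "start": m.start(),
                    "end": m.end(),
                })
        return sorted(found, key=lambda item: item["start"])


# Made-up entries and document, purely for demonstration.
entries = [
    DictionaryEntry("PROD_1", "WidgetPro 3000"),
    DictionaryEntry("DRUG_1", "Zorvexa"),
]
dictionary = CustomDictionary("my_products")
dictionary.add_entries(entries)

doc = "The clinic trialled zorvexa after reading about the WidgetPro 3000."
matches = dictionary.match(doc)
for item in matches:
    print(item["entry_id"], item["text"], item["start"], item["end"])
```

In the real service the matching happens server-side and the results come back with the other entities; the sketch only shows why options such as case sensitivity change which mentions are found.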

The system can be used to augment the TextRazor response with a few entities that it misses in your domain, or to create large dictionaries containing millions of niche entities. Either way, the system takes care of scaling and indexing, so there's minimal increase in processing time.

Custom Entity Dictionaries are ready to use immediately. We'd love to hear any feedback you might have on this.