Topic Tagging

TextRazor provides automatic multilingual topic detection and tagging of your unstructured content. The system identifies the general themes and topics of the text, even if they aren't explicitly mentioned.

TextRazor's topic tagger uses millions of Wikipedia pages, together with our knowledgebase of entity and word category relationships, to assign high-level topics to your content with no additional training on your data. For example, a document that mentions "Chelsea", "Stamford Bridge", and "Arsenal" is tagged with "Soccer".

TextRazor has an automatic understanding of hundreds of thousands of different topics at various levels of abstraction (for example "Premier League Soccer", "Soccer", and "Sports"), a list that is constantly evolving to keep pace with changes in language. If you are interested in tagging to a finite set of categories, or need to customize the classification process, you might want to check out our classification system.

The tagger provides an easy way to add semantic metadata to your documents and boost their discoverability without any upfront customization effort.

Linked Data

TextRazor extracts topics as simple textual labels as well as normalized links to Wikipedia, DBpedia, and Wikidata.
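As a rough illustration of how these labels and links might be consumed, the sketch below pulls them out of an already-parsed JSON response. The field names "label", "score", "wikiLink" and "wikidataId", the surrounding "response" wrapper, and the sample values are assumptions made for this example, and the helper topic_links is hypothetical; check the REST Documentation for the exact schema.

```python
from typing import Dict, List


def topic_links(payload: dict, min_score: float = 0.0) -> List[Dict[str, str]]:
    """Collect the label and linked-data identifiers for each topic.

    Assumes topics sit under payload["response"]["topics"] and carry
    "label", "score", "wikiLink" and "wikidataId" keys (assumed names).
    """
    topics = payload.get("response", {}).get("topics", [])
    rows = []
    for topic in topics:
        # Skip topics below the caller's relevance threshold.
        if topic.get("score", 0.0) < min_score:
            continue
        rows.append({
            "label": topic.get("label", ""),
            "wikipedia": topic.get("wikiLink", ""),
            "wikidata": topic.get("wikidataId", ""),
        })
    return rows


# Illustrative payload only: the scores and the Wikidata ID are placeholders.
sample = {
    "response": {
        "topics": [
            {"label": "Soccer", "score": 1.0,
             "wikiLink": "http://en.wikipedia.org/wiki/Category:Soccer",
             "wikidataId": "Q0000"},
            {"label": "Sports", "score": 0.4,
             "wikiLink": "http://en.wikipedia.org/wiki/Category:Sports",
             "wikidataId": "Q0001"},
        ]
    }
}

high_confidence = topic_links(sample, min_score=0.5)
```

Filtering on the score is one simple way to keep only the most relevant topics; the threshold of 0.5 here is arbitrary.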

The Universal Tagger

Topic detection is supported in all of TextRazor's languages: English, Dutch, French, German, Italian, Polish, Portuguese, Russian, Spanish, and Swedish.

Topic labels and linked Wikipedia pages are returned in English regardless of the language of your content. This powerful concept dramatically reduces the effort required to internationalize your classification algorithms.

API Calls

To tag your text, simply add the "topics" extractor to your request. The TextRazor response will then be populated with a "topics" property.
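A minimal sketch of such a request, using only the Python standard library, might look like the following. The endpoint URL, the "X-TextRazor-Key" header, the form-encoded "text" and "extractors" parameters, and the shape of the JSON response are assumptions based on the public REST interface and should be checked against the REST Documentation. The helper names build_topics_request and fetch_topic_labels are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; confirm against the REST Documentation.
API_URL = "https://api.textrazor.com/"


def build_topics_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for the topics extractor."""
    body = urllib.parse.urlencode({
        "text": text,
        "extractors": "topics",  # the extractor named in the text above
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-TextRazor-Key": api_key},  # assumed auth header name
        method="POST",
    )


def fetch_topic_labels(text: str, api_key: str) -> list:
    """Send the request and return topic labels (needs network and a valid key)."""
    with urllib.request.urlopen(build_topics_request(text, api_key)) as resp:
        payload = json.load(resp)
    # Assumes topics are returned under response.topics, each with a "label".
    return [t["label"] for t in payload.get("response", {}).get("topics", [])]


# Building the request does not touch the network.
request = build_topics_request("Chelsea beat Arsenal", "YOUR_API_KEY")
```

With a real API key, calling fetch_topic_labels("Chelsea beat Arsenal at Stamford Bridge", key) would send the request and return whatever topic labels the service assigns.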

Read more in our Python Client or REST Documentation.

You can try out the Topic detection system with your own documents through our Online Demo.