
Freebase Type Deduction (with data!)

Wed 07 October 2015

A major use case for Freebase, Google's recently closed knowledgebase, was the type data it indexed for millions of real-world entities. For example, types of the entity "Netflix" include /organization/organization, /business/business_operation and /tv/tv_network. While similar type data is available elsewhere, Freebase remains popular because of its wide coverage and rich taxonomy. Now that updates have stopped, many of our users are naturally concerned that such a great resource is quickly going stale.

We didn't want to see the Freebase type system go to waste, so we've recently released new functionality to automatically assign types to new entities that Freebase doesn't know about. We do this with a machine learning process designed to derive the type of an entity given the facts we have collected in our knowledgebase. This process means that we can continue supporting the rich set of Freebase types as new entities come and go, and maintain backward compatibility for all our users currently relying on these types.

The new types are now returned for new entities as part of the existing "freebaseTypes" response, so you don't need to do anything to take advantage of this feature. We think this might be of interest to the wider community too, so we're releasing it as an open dataset of Wikipedia and Freebase type mappings: a simple bz2-compressed .tsv file with one mapping per line, in the format:


<Wikipedia ID> <Freebase type> <Is TextRazor AutoType>\n

A single Wikipedia item may be mapped to several types. The final column is a '1' where the type was automatically assigned by TextRazor, and '0' for existing Freebase types. You'll notice that it's mostly the recently indexed entities that have been auto-tagged - https://en.wikipedia.org/wiki/Alex_Kirsch for example, has been auto-tagged with /sports/cyclist.
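As a rough illustration of reading the format described above, here is a minimal Python sketch. It assumes the three columns are tab-separated (as the .tsv extension suggests) and that the file is UTF-8 encoded; the function names and the file name in the usage note are hypothetical and not part of any TextRazor tooling.

```python
import bz2
from collections import defaultdict


def parse_type_lines(lines):
    """Group (freebase_type, is_auto_type) pairs by Wikipedia ID.

    Assumes each line holds three tab-separated columns:
    <Wikipedia ID> <Freebase type> <Is TextRazor AutoType>
    """
    types_by_id = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip rows that do not match the assumed layout
        wiki_id, freebase_type, auto_flag = parts
        # The final column is '1' for auto-assigned types, '0' otherwise.
        types_by_id[wiki_id].append((freebase_type, auto_flag == "1"))
    return dict(types_by_id)


def load_dump(path):
    """Read a bz2-compressed dump from disk and parse it."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return parse_type_lines(handle)
```

Calling `load_dump("types.tsv.bz2")` (a placeholder file name) would return a dictionary from each Wikipedia ID to its list of types, which makes it easy to pick out only the auto-assigned ones. Because a single item may map to several types, the values are lists rather than single entries.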

You can download the latest English dump here, made available under the terms of the CC-BY 3.0 license. The dump is currently updated only infrequently, so it might not exactly match the types in the TextRazor response (which are generated as part of our regular index process). Please let us know if you find this useful, and we'll look into releasing support for our other languages too.

This completes our immediate work on the Freebase migration. We have fully replaced or augmented our Freebase dependencies and have been running happily for several months with no reduction in analysis quality. We've also released support for direct disambiguation of entities to the ever-growing Wikidata.