Freebase to Wikidata

Tue 17 February 2015

Google recently announced that it will close its Freebase knowledge base later this year, and has offered to migrate the data to Wikidata. Like many other startups working with Linked Data, we're big fans of Freebase, so I'd like to share a few thoughts on the migration and what it means for our users.

Firstly, we've released an update to our disambiguation engine that removes our dependency on Freebase, so we're confident there won't be any performance surprises later this year. The new engine also includes a number of other speed and accuracy enhancements, so you should notice that entity extraction and topic detection results are better than ever.

We'll continue to support Freebase MIDs/type data while there is still demand for it, or at least until the start of 2016. We will also start to provide Wikidata QIDs where it's possible to link our entity output.

If you rely on Freebase data in your own application, however, it might be wise to start investigating alternative data sources.

Why is this a big deal?

Freebase is a huge open database of real-world knowledge, containing a wealth of information on ~50m topics, released under a liberal license in a single (fairly) easy-to-parse download. An amazing range of innovative commercial and research projects build on top of this data. TextRazor uses the relationships between Freebase entities as an input to help disambiguate mentions that could refer to more than one entity. For example, Freebase has a link between Paris the city and France the country, information we can use to help determine that a mention of "Paris" near "France" is about the French city, not Paris, Texas or Paris Hilton.
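To make the "Paris" example concrete, here is a minimal sketch of link-based disambiguation. It is not TextRazor's actual engine: the entity labels, the LINKS and CANDIDATES tables, and the disambiguate function are all hypothetical, and a real system would combine a signal like this with many others.

```python
# Toy illustration only: a tiny, hand-written "knowledge graph".
# Every name and link below is a hypothetical stand-in, not real Freebase data.
LINKS = {
    "Paris (city in France)": {"France", "Seine"},
    "Paris (city in Texas)": {"Texas", "United States"},
    "Paris Hilton (person)": {"Hilton Hotels", "United States"},
}

# Surface form -> candidate entities it could refer to.
CANDIDATES = {
    "Paris": [
        "Paris (city in France)",
        "Paris (city in Texas)",
        "Paris Hilton (person)",
    ],
}


def disambiguate(mention, context_entities):
    """Pick the candidate with the most links to entities seen nearby.

    Returns None when the mention is unknown or when no candidate
    shares a link with the context, rather than guessing.
    """
    context = set(context_entities)
    best, best_score = None, 0
    for candidate in CANDIDATES.get(mention, []):
        score = len(LINKS.get(candidate, set()) & context)
        if score > best_score:
            best, best_score = candidate, score
    return best


if __name__ == "__main__":
    print(disambiguate("Paris", ["France"]))
    print(disambiguate("Paris", ["Texas"]))
```

Counting shared links is the simplest possible scoring rule. It is used here only to show why a link such as Paris-to-France is useful evidence.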

Many of our users also rely on Freebase data to help geocode places, translate titles, and extract images; TextRazor supports enriching its results with such data.

Can't we just use DBpedia/Wikidata?

DBpedia is another excellent resource, with somewhat similar aims to Freebase. TextRazor supports linking entities to both Freebase and DBpedia where possible.

DBpedia extracts its data from Wikipedia, so it can be extremely accurate, but it can also be noisy and it suffers from slow update cycles. Freebase, on the other hand, pulls data from a number of sources and encourages community contributions. In practice this means Freebase does a much better job of covering the "long tail" of data. For example, Freebase has used data from MusicBrainz to identify "Beardyman" as the artist of a new album, Distractions. Since Wikipedia doesn't currently have that information, the equivalent DBpedia page for Beardyman is missing that relationship. The DBpedia extraction process is only run occasionally, so it'll be a while before that fact makes it into DBpedia. This might seem like a pretty minor edge case, but many applications rely on this type of long tail data.

For these reasons Freebase has become hugely popular among those developing Linked Data applications, and the Freebase "mid" (a unique ID assigned to each Freebase topic) has become the de facto foreign key into the linked data world.
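As a rough illustration of what "foreign key" means here, the sketch below joins records from two independent datasets on a shared mid. The mids, field names, and the join_on_mid helper are made-up placeholders, not real Freebase identifiers or any particular API.

```python
# Two hypothetical datasets that both key their records by Freebase mid.
# The mids and values are placeholders invented for this sketch.
titles_by_mid = {
    "/m/placeholder_1": {"title_en": "Example City", "title_fr": "Ville Exemple"},
    "/m/placeholder_2": {"title_en": "Example Person"},
}
images_by_mid = {
    "/m/placeholder_1": {"image": "example_city.jpg"},
}


def join_on_mid(*sources):
    """Merge records from several mid-keyed mappings into one mapping.

    Later sources overwrite earlier ones if they share a field name.
    """
    merged = {}
    for source in sources:
        for mid, fields in source.items():
            merged.setdefault(mid, {}).update(fields)
    return merged


if __name__ == "__main__":
    for mid, record in join_on_mid(titles_by_mid, images_by_mid).items():
        print(mid, record)
```

Because both datasets agree on the identifier, no name matching or fuzzy lookup is needed. That convenience is what an application loses if the mid stops being maintained.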

What's next?

Google has announced that it will be integrating Freebase data with Wikidata. In an ideal world this will mean less fragmentation, with one less database to consider integrating into your project. At the time of writing there are some technical details that need to be worked out, specifically around licensing and "sourcing" of the facts for Wikidata.

Unfortunately this means that there's no single solution yet for projects currently using Freebase data. Those operating in certain niches may want to investigate domain-specific resources such as GeoNames or MusicBrainz. If timeliness or data coverage isn't so important to you, it might be a good idea to look into DBpedia or Wikidata. Otherwise, while it's prudent to start researching alternatives, it sounds like things are moving along in the Freebase migration and I'm personally optimistic about the future of Wikidata (you can be part of it!).

We will be carefully monitoring the migration and will let you know if anything changes. If you have any questions about what this means for you, we'd love to help. Please contact us.