I learned SPARQL recently, and would agree it's complicated to get info out of Wikidata.
However, having read the article, the author didn't have an easy time scraping Wikipedia either.
So I'd probably still recommend people look into Wikidata and SPARQL if they want to do this kind of thing.
There are a few tools that generate queries for you, and some CLI tools as well:
https://github.com/maxlath/wikibase-cli#readme
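For a flavor of what this looks like, here's a small example query that can be run against the public endpoint at https://query.wikidata.org — it lists items that are instances of house cat (Q146 is Wikidata's item for house cat, P31 is "instance of"):

```sparql
SELECT ?item ?itemLabel WHERE {
  # find everything whose "instance of" (P31) is house cat (Q146)
  ?item wdt:P31 wd:Q146 .
  # pull in human-readable English labels
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
```

The query-builder tools above can generate this kind of boilerplate for you, which helps a lot when you don't yet know the property and item IDs.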
It makes Wikipedia better too, in a virtuous cycle: some infoboxes, like the ones the author scraped, are being converted to be automatically populated from Wikidata.
I am doubtful. I tried for a long time to use it to get data for my taxonomic graph project (https://relatedhow.kodare.com/) and SPARQL was just not usable at all. The biggest problem was the 60s query time limit, which was totally unworkable for what I wanted. I also had issues with seemingly inconsistent results, but it was hard to tell.
I ended up downloading the full nightly DB dump and filtering it by streaming from the compressed archive instead. Faster, and it actually worked.
The code to do that is at https://github.com/boxed/relatedhow
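The streaming approach above can be sketched roughly like this, assuming a bz2-compressed JSON dump with one entity per line (the format Wikidata's `latest-all.json.bz2` uses); the `filter_dump` name and the `P171` ("parent taxon") default are my own illustration, not taken from the linked repo:

```python
import bz2
import json

def filter_dump(path, target_property="P171"):
    """Stream a Wikidata JSON dump and yield entities carrying a property.

    Hypothetical sketch: assumes a bz2-compressed dump where each line is
    one entity's JSON, wrapped in [ ... ] with trailing commas, as in
    Wikidata's latest-all.json.bz2. P171 ("parent taxon") is just an
    example property relevant to a taxonomy project.
    """
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            # strip the line terminator, trailing comma, and array brackets
            line = line.strip().rstrip(",")
            if not line or line in ("[", "]"):
                continue
            entity = json.loads(line)
            if target_property in entity.get("claims", {}):
                yield entity
```

Because it reads one line at a time straight out of the compressed file, this never needs the (very large) decompressed dump on disk or in memory, which is presumably why it beat the 60-second query limit approach.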
There's an alternate Wikidata query engine available here: https://qlever.cs.uni-freiburg.de/wikidata (from https://github.com/ad-freiburg/QLever)
Currently it doesn't support some SPARQL features, but I've found it to generally be quite a bit faster for most queries.