Updated: I've saved all of Wikipedia into a SQLite database!

This page summarizes the projects mentioned and recommended in the original post on /r/DataHoarder.

  • PlainTextWikipedia

    Discontinued: Convert Wikipedia database dumps into plaintext files

  • wikitextparser

    A Python library to parse MediaWiki WikiText

  • The use of regex seems inefficient. Is there any reason why you didn't start with lxml or a purpose-built parser like wikitextparser?

  • DrQA

    Reading Wikipedia to Answer Open-Domain Questions

  • Nice work! That AI "framework" of yours (to summarize the RAVEN acronym somehow) reminds me of an old project of mine from years ago, which used Prolog and first-order logic to build a QA engine that pulled data from Wikipedia. I eventually abandoned it due to changing philosophical views on human consciousness, yet it was still a fun learning exercise mixing compiler theory and logical inference. Facebook once open-sourced code for something similar, https://github.com/facebookresearch/DrQA, which also pulls raw data from Wikipedia.

  • python-libzim

    Libzim binding for Python: read/write ZIM files in Python

  • https://github.com/openzim/python-libzim is the official one
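The comment above about regex versus a real parser can be illustrated with a rough sketch. This is not the original poster's code; it only assumes the dump is MediaWiki-style XML with `<page>`, `<title>`, and `<text>` elements, and it uses the standard library's streaming parser in place of lxml. The sample dump, the `pages` table layout, and the helper names `local_name` and `load_pages` are all hypothetical.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Tiny stand-in for a MediaWiki XML dump (hypothetical sample data).
SAMPLE_DUMP = b"""<mediawiki>
  <page><title>SQLite</title><revision><text>'''SQLite''' is a database engine.</text></revision></page>
  <page><title>ZIM</title><revision><text>ZIM is an offline file format.</text></revision></page>
</mediawiki>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def load_pages(stream, conn):
    """Stream <page> elements into an SQLite table and return the row count."""
    # Assumed schema: one row per page, keyed by title.
    conn.execute("CREATE TABLE IF NOT EXISTS pages (title TEXT PRIMARY KEY, body TEXT)")
    count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, body = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                body = child.text or ""
        conn.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (title, body))
        count += 1
        elem.clear()  # drop the page's children so article text does not pile up in memory
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
n_pages = load_pages(io.BytesIO(SAMPLE_DUMP), conn)
titles = [row[0] for row in conn.execute("SELECT title FROM pages ORDER BY title")]
```

The stored body is still raw wikitext here; turning it into plain text is the step where a purpose-built parser such as wikitextparser, rather than regex, would come in.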

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Reverse-engineering an encrypted IoT protocol

    8 projects | news.ycombinator.com | 14 Feb 2024
  • Why my favourite API is a zipfile on the European Central Bank's website

    10 projects | news.ycombinator.com | 15 Sep 2023
  • Hyprland 0.29.* causing trouble after a few days of working fine (on archbtw)

    2 projects | /r/hyprland | 7 Sep 2023
  • Show HN: I wrote a RDBMS (SQLite clone) from scratch in pure Python

    8 projects | news.ycombinator.com | 13 Aug 2023
  • hyprland crashing on launch

    2 projects | /r/hyprland | 1 Jul 2023