Python Wikipedia

Open-source Python projects categorized as Wikipedia

Top 17 Python Wikipedia Projects

  • mwparserfromhell

    A Python parser for MediaWiki wikicode

    Project mention: [Python] How can I clean up Wikipedia's XML backup dump to create dictionaries of commonly used words for multiple languages? | reddit.com/r/learnprogramming | 2021-10-12

    In particular, what you're looking at is not XML but wikitext. I found a discussion on Stack Overflow about solving the same problem of getting text from wikitext. It seems the most promising solution in Python, since you already have the dump, is to run each page through mwparserfromhell. According to the top Stack Overflow answer you could use something like

  • wikiteam

    Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2020, WikiTeam has preserved more than 250,000 wikis.

    Project mention: Archiving Wiki (Fandom) Pages | reddit.com/r/DataHoarder | 2022-01-18

    Hi all - I'm trying to archive a number of fandom pages. Upon checking out this subreddit, I've found a few ways of doing so, and am currently working with the WikiTeam python tool (https://github.com/WikiTeam/wikiteam)

  • wikipedia_ql

    Query language for efficient data extraction from Wikipedia

    Project mention: WikipediaQL: Query language for efficient data extraction from Wikipedia (early | news.ycombinator.com | 2021-07-05
  • isbntools

    Python app/framework for 'all things ISBN' including metadata, descriptions, covers...

  • Mediawiker

    A plugin for the Sublime Text editor that lets you use it as a wiki editor on MediaWiki-based sites such as Wikipedia and many others.

  • codex

    CoDEx: A set of knowledge graph Completion Datasets Extracted from Wikidata and Wikipedia (by tsafavi)

    Project mention: [P] Knowledge Graph Completion With CoDEx | reddit.com/r/MachineLearning | 2021-09-21
  • kiwix-hotspot

    Kiwix Hotspot Image Creator (Desktop) for Windows/macOS/Linux

    Project mention: Hotspot installer 2.4 is out! | reddit.com/r/Kiwix | 2021-05-19

    This update is fairly important as it corrects a number of limitations that were on Raspberry Hotspots. The full changelog is here but here's what really matters:

  • danker

    Compute PageRank on >3 billion Wikipedia links on off-the-shelf hardware.

    Project mention: How to get the links of 15,000 Wiki-articles | reddit.com/r/wikipedia | 2021-02-23

    Oh cool, I had my students do PageRank when I taught that class. Implementing the actual PageRank algorithm should be pretty easy, gathering and processing the data into usable form is harder, especially in Matlab which does not excel at that kind of task. You might compare your program to danker for verification and validation. I think Wikipedia also makes their page view / article popularity data available, which might be of interest to you.

  • fetch

    Fetch is used to get information about anything from the shell using Wikipedia. (by yashsinghcodes)

    Project mention: Fetch Command Line Wikipedia | news.ycombinator.com | 2021-10-20
  • Wikipedia-Article-Scraper

    A complete Python text analytics package that lets users search for a Wikipedia article, scrape it, run basic text analytics, and integrate it into a data pipeline without writing excessive code.

  • taxopedia

    Taxonomic trees (cladograms) from Wikipedia-scraped data.

    Project mention: Taxopedia: Build taxonomic trees (cladograms) from Wikipedia-scraped data. | reddit.com/r/biology | 2021-03-30
  • witokit

    A Python toolkit to generate a tokenized dump of Wikipedia for NLP

    Project mention: Download Wikipedia Text Dump? | reddit.com/r/LanguageTechnology | 2021-10-01
  • NLP-Model-for-Corpus-Similarity

    An NLP model I developed to determine the similarity or relation between two documents/Wikipedia articles. Inspired by the cosine similarity algorithm and built from WordNet.

    Project mention: What's the coolest self-driven project you've worked on? | reddit.com/r/datascience | 2021-02-24
  • wikipedia_abuse_checker

    A repo for code that checks for abuse of Wikipedia's Indian pages

    Project mention: Which wikipedia pages in India were abused the most in 2021? | reddit.com/r/india | 2021-12-21

    The code for the project is available at https://github.com/shijithpk/wikipedia_abuse_checker.

  • RoboTito

    Bot based on discord.py

    Project mention: Summary of the 2021-2022 plan | reddit.com/r/argentina | 2021-05-15
  • wi-page

    Rank a Wikipedia article's contributors by byte count.

    Project mention: Show HN: Wi-Page – Rank Wikipedia Article's Contributors by Byte Counts | news.ycombinator.com | 2021-03-23
  • Wiki

    Wikipedia-style page. (by Abhishek-Rath)

    Project mention: Created my First Project!! | reddit.com/r/learnprogramming | 2021-09-27

    I am currently learning web development from the CS50 Web development with Python and JavaScript course, and I have completed project 1 of it. Here is the link to the project: https://github.com/Abhishek-Rath/Wiki

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-01-18.
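The mwparserfromhell entry above quotes a suggestion to run each page of a Wikipedia dump through mwparserfromhell before building word lists. A minimal sketch of that pipeline might look like this, assuming the third-party package is installed (`pip install mwparserfromhell`) and that splitting text on runs of Unicode letters is good enough for the target languages. `mwparserfromhell.parse()` and `Wikicode.strip_code()` are the library's own calls; `wikitext_to_text` and `top_words` are hypothetical helper names.

```python
import re
from collections import Counter


def wikitext_to_text(wikitext: str) -> str:
    """Strip templates, links, and other markup from one page's wikitext."""
    # Imported lazily so the counting helper below works without the package.
    import mwparserfromhell  # third-party: pip install mwparserfromhell

    return mwparserfromhell.parse(wikitext).strip_code()


# Runs of Unicode letters (no digits or underscores). This is a rough tokenizer:
# languages written without spaces, such as Chinese, come out as long runs.
WORD_RE = re.compile(r"[^\W\d_]+")


def top_words(plain_text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most common lowercase words in already-stripped text."""
    return Counter(WORD_RE.findall(plain_text.lower())).most_common(n)


if __name__ == "__main__":
    print(top_words("The cat sat on the mat. The end.", 2))
```

For a whole dump, one would call `wikitext_to_text` on each page's text and `update` a single running `Counter` per language, rather than keeping every page in memory.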
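The danker entry notes that implementing PageRank itself is the easy part and suggests comparing a home-grown version against danker. For small link graphs, a textbook power-iteration sketch might look like the following. It is not danker's code or interface; the damping factor of 0.85, the rule that pages without outgoing links spread their rank evenly over all pages, and the name `pagerank` are assumptions for illustration.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    # Every page that appears as a source or a target is a node.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        # Each page passes its damped rank equally to the pages it links to.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        # Stop once the total change between iterations is negligible.
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    scores = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
    print(sorted(scores.items(), key=lambda item: -item[1]))
```

A dict of lists held in memory only suits small samples such as 15,000 articles, not the billions of links danker is designed for.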
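The NLP-Model-for-Corpus-Similarity entry cites cosine similarity as its inspiration. As a reference point, plain cosine similarity between two articles' bag-of-words vectors can be sketched as below. This sketch does not use WordNet as that project does, and the helper names `term_vector` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies over lowercase runs of letters."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between two documents' term-frequency vectors."""
    vec_a, vec_b = term_vector(doc_a), term_vector(doc_b)
    # Only terms present in both documents contribute to the dot product.
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as unrelated
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("Python is a programming language",
                            "Python is a genus of snakes"))
```

Since raw term counts let common words like "is" and "a" dominate, practical comparisons usually drop stop words or weight terms (for example with TF-IDF) before taking the cosine.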
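The wi-page entry ranks an article's contributors by byte count. One plausible way to compute such a ranking, assuming a revision history with a user name and page size per revision (the MediaWiki API's `prop=revisions` with `rvprop=user|size` and `rvdir=newer` returns these, oldest first), is to credit each user with the bytes their edits added. wi-page's actual counting rules may differ, and `rank_contributors` is a hypothetical name.

```python
from collections import defaultdict


def rank_contributors(revisions: list[dict]) -> list[tuple[str, int]]:
    """Rank users by total bytes added across an article's revision history.

    `revisions` must be in chronological order (oldest first). Each entry is a
    dict with a "user" name and the page "size" in bytes after that revision.
    """
    added: defaultdict[str, int] = defaultdict(int)
    previous_size = 0  # the page is empty before its first revision
    for rev in revisions:
        delta = rev["size"] - previous_size
        if delta > 0:  # only growth is credited; removals are ignored here
            added[rev["user"]] += delta
        previous_size = rev["size"]
    # Largest contributors first; ties broken alphabetically for stable output.
    return sorted(added.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    history = [
        {"user": "Alice", "size": 1000},
        {"user": "Bob", "size": 1500},
        {"user": "Alice", "size": 1200},
        {"user": "Carol", "size": 1700},
    ]
    print(rank_contributors(history))
```

Counting only net growth per revision is a simplification: a user who rewrites a paragraph without changing its length gets no credit, and reverts of vandalism show up as growth.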


Index

What are some of the best open-source Wikipedia projects in Python? This list will help you:

#    Project                          Stars
1    mwparserfromhell                   496
2    wikiteam                           465
3    wikipedia_ql                       328
4    isbntools                          142
5    Mediawiker                         125
6    codex                               83
7    kiwix-hotspot                       49
8    danker                              38
9    fetch                               14
10   Wikipedia-Article-Scraper            8
11   taxopedia                            7
12   witokit                              7
13   NLP-Model-for-Corpus-Similarity      5
14   wikipedia_abuse_checker              2
15   RoboTito                             2
16   wi-page                              1
17   Wiki                                 0