wtf_wikipedia
duckling
 | wtf_wikipedia | duckling |
---|---|---|
Mentions | 1 | 13 |
Stars | 743 | 4,019 |
Growth | - | 0.7% |
Activity | 8.0 | 0.0 |
Last commit | 13 days ago | 2 months ago |
Language | JavaScript | Haskell |
License | MIT License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
wtf_wikipedia
-
Experimental library for scraping websites using OpenAI's GPT API
This may finally be a solution for scraping Wikipedia and turning it into structured data. (Or do we even need structured data in the post-AI age?)
MediaWiki markup is notorious for being hard to parse:
* https://github.com/spencermountain/wtf_wikipedia#ok-first- - why it's hard
* https://techblog.wikimedia.org/2022/04/26/what-it-takes-to-p... - an entire article about parsing page TITLES
* https://osr.cs.fau.de/wp-content/uploads/2017/09/wikitext-pa... - a paper published about a wikitext parser
duckling
-
Experimental library for scraping websites using OpenAI's GPT API
For the reasons others have given, I don't see it replacing 'traditional' scraping soon. But I am looking forward to it replacing current methods of extracting data from the scraped content.
I've been using Duckling [0] for extracting fuzzy dates and times from text. It does a good job, but I needed a custom build with extra rules to turn that into a great job. And that's just for dates, 1 of the 13 supported dimensions. Being able to use an AI that handles them with better accuracy will be fantastic.
Does a specialised model trained to extract times and dates already exist? It's entity tagging but a specialised form (especially when dealing with historical documents where you may need Gregorian and Julian calendars).
[0] https://github.com/facebook/duckling
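To make the calendar point concrete: a date extracted from a historical document may be written in the Julian calendar and has to be shifted before it can be compared with modern (Gregorian) dates. The following is a minimal sketch, not part of Duckling, that converts a Julian calendar date to the proleptic Gregorian calendar via the Julian Day Number. The function names are hypothetical and chosen for illustration only.

```python
from datetime import date


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12          # 1 for January/February, else 0
    y = year + 4800 - a             # shifted year so the count starts in March
    m = month + 12 * a - 3          # March = 0, ..., February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Same day expressed in the proleptic Gregorian calendar."""
    # Python ordinals count days from 0001-01-01 (Gregorian);
    # the offset between JDN and that ordinal is 1721425.
    return date.fromordinal(julian_to_jdn(year, month, day) - 1721425)


# 5 October 1582 (Julian) is the day the Gregorian calendar took effect.
print(julian_to_gregorian(1582, 10, 5))
# 25 October 1917 (Julian), the date of the October Revolution.
print(julian_to_gregorian(1917, 10, 25))
```

Which calendar a given document uses still has to be decided from context (country and period), which is the part a tagger or model would need to handle.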
-
Automatically creating calendar entries from emails with unformatted dates
Ah, sorry: https://github.com/facebook/duckling
-
Transforming free-form geospatial directions into addresses - SOTA?
To understand what relative distance and direction are indicated from the reference point, I'd look into something like Facebook & Wit.AI's Duckling, and a custom classifier to identify whether it's on the reference point ("corner of") or some distance from it ("200 meters southwest"). If you can parse out a distance and direction, then it's all logic to plot the point.
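The "all logic" step can be sketched as follows, assuming the distance (in meters) and a compass direction have already been extracted and the reference point has been geocoded. This is an illustrative spherical-Earth approximation; the function name, the direction table, and the sample coordinates are assumptions, not part of Duckling.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation

# Compass words mapped to bearings in degrees, clockwise from north.
BEARINGS = {
    "north": 0.0, "northeast": 45.0, "east": 90.0, "southeast": 135.0,
    "south": 180.0, "southwest": 225.0, "west": 270.0, "northwest": 315.0,
}


def offset_point(lat: float, lon: float, distance_m: float, direction: str):
    """Return (lat, lon) reached by moving distance_m from the reference point."""
    bearing = math.radians(BEARINGS[direction])
    lat1, lon1 = math.radians(lat), math.radians(lon)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians

    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    # Normalize longitude to the range [-180, 180).
    lon_deg = (math.degrees(lon2) + 540.0) % 360.0 - 180.0
    return math.degrees(lat2), lon_deg


# "200 meters southwest" of a hypothetical reference point.
print(offset_point(52.0, 13.0, 200.0, "southwest"))
```

For distances of a few hundred meters the spherical approximation is usually adequate; an ellipsoidal model would be the choice when higher accuracy matters.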
-
Programming languages endorsed for server-side use at Meta
It also powers the backend of Wit.ai which FB owns. Wit's open-source entity parser, duckling, is written entirely in Haskell. https://github.com/facebook/duckling
-
Data Cleaning using Machine Learning?
-
Unsplash chatbot for Discord, Pt. 2: more ways to bring pictures to Discord
Our RandomPicForLater intent will have one slot called reminderTime, of type @duckling.time. Duckling is a library that extracts entities from text, and it is one of the tools used in JAICP for this purpose. Entity types in Duckling are called dimensions, and a number of them are built in. Among them is Time, which suits us perfectly: we need to ask users when they want us to schedule a post, and then parse their text input into a datetime object.
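Outside JAICP, the same text-to-datetime step can be sketched against Duckling's example HTTP server. This is a hedged illustration: it assumes a Duckling server is listening at the URL shown and that it returns a JSON list of entities shaped like the hand-written sample below (modeled on the example in Duckling's README). The function names and the sample payload are illustrative, not taken from the original post.

```python
import json
from datetime import datetime
from urllib import parse, request


def build_parse_request(text, locale="en_GB", url="http://0.0.0.0:8000/parse"):
    """Build (but do not send) a form-encoded POST to a Duckling server."""
    data = parse.urlencode({"locale": locale, "text": text}).encode("utf-8")
    return request.Request(url, data=data, method="POST")


def extract_times(entities):
    """Collect datetimes from entities of the 'time' dimension.

    Only single-instant values are handled; intervals are skipped.
    """
    times = []
    for entity in entities:
        value = entity.get("value", {})
        if entity.get("dim") == "time" and value.get("type") == "value":
            times.append(datetime.fromisoformat(value["value"]))
    return times


# Hand-written sample response (assumed shape), used here instead of a live call.
SAMPLE_RESPONSE = json.loads("""
[{"body": "tomorrow at eight", "start": 0, "end": 17, "dim": "time",
  "latent": false,
  "value": {"type": "value", "grain": "hour",
            "value": "2023-05-02T08:00:00.000-07:00"}}]
""")

print(extract_times(SAMPLE_RESPONSE))
```

With a running server, the request would be sent with `request.urlopen(build_parse_request("tomorrow at eight"))` and the decoded JSON body passed to `extract_times`.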
-
Dependencies difference between cabal and stack
I'm working on a pretty interesting project right now and I'm having different results depending on the build tool used: with cabal, the test suite fails but it passes with stack.
-
Running Duckling on Windows
Try downloading the v0.2.0.0 release, extracting it somewhere, opening that location in powershell, and running these commands:
-
[ANN] Duckling v0.2.0.0 released
Duckling (https://github.com/facebook/duckling) is a library for parsing text into structured data.
-
Extract name:value relationships from plain text
If you really want high precision, Duckling is a good project to check out https://github.com/facebook/duckling
What are some alternatives?
sdow - Six Degrees of Wikipedia
spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python
anon - tweet about anonymous Wikipedia edits from particular IP address ranges
ctparse - Parse natural language time expressions in python
autoscraper - A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Giveme5W1H - Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
scrapeghost - 👻 Experimental library for scraping websites using OpenAI's GPT API.
syntaxdot - Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.
Kornia - Geometric Computer Vision Library for Spatial AI
BLINK - Entity Linker solution
semantic-source - Parsing, analyzing, and comparing source code across many languages
projects - 🪐 End-to-end NLP workflows from prototype to production