hextuples
webscraping-open
hextuples | webscraping-open
---|---
2 | 2
28 | -
- | -
2.2 | -
about 1 year ago | -
- | -
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
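The page does not give the formula behind the activity number, so the following is only a rough sketch of how a recency-weighted, relative score of this kind could be computed. The exponential decay, the 30-day half-life, and the function names `recency_weighted_score` and `relative_activity` are all assumptions made for illustration, not the site's actual method.

```python
from bisect import bisect_left


def recency_weighted_score(commit_ages_days, half_life_days=30.0):
    """Sum per-commit weights that halve every `half_life_days` (assumed decay)."""
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def relative_activity(raw_score, all_raw_scores):
    """Map a raw score to a 0-10 scale by its percentile rank among tracked projects."""
    ranked = sorted(all_raw_scores)
    below = bisect_left(ranked, raw_score)  # number of projects scoring strictly lower
    return round(10.0 * below / len(ranked), 1)


# Hypothetical raw scores for ten tracked projects.
tracked = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
top_project_activity = relative_activity(9, tracked)
```

Under these assumptions, a project that outscores 90% of the tracked projects lands at 9.0, which matches the "top 10%" reading given in the legend.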
hextuples
- Update of the RDF and SPARQL (RDF star) families of specifications
The problem is that they aren’t tabular, and the examples that make them look simple are incomplete. For example, they rarely show how to specify the language or data type. A truly tabular format is hextuples: https://github.com/ontola/hextuples
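To make the "tabular" point concrete, here is a minimal sketch of reading hextuples with Python's standard library. It assumes the layout described in the hextuples repository as I understand it: one JSON array per line, with six string fields in the order subject, predicate, value, datatype, language, graph, and empty strings for unused fields. The `HexTuple` class, the `parse_hextuples` helper, and the example.org data are hypothetical; check the repository for the authoritative field definitions.

```python
import json
from typing import NamedTuple


class HexTuple(NamedTuple):
    # Assumed field order; see the hextuples repository for the definitive spec.
    subject: str
    predicate: str
    value: str
    datatype: str
    language: str
    graph: str


def parse_hextuples(ndjson_text):
    """Parse newline-delimited JSON arrays into six-column rows."""
    rows = []
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = json.loads(line)
        if len(fields) != 6:
            raise ValueError(f"expected 6 fields, got {len(fields)}")
        rows.append(HexTuple(*fields))
    return rows


# Hypothetical data: one language-tagged string and one typed literal.
SAMPLE = "\n".join([
    json.dumps(["https://example.org/alice", "http://schema.org/name", "Alice",
                "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString", "en", ""]),
    json.dumps(["https://example.org/alice", "http://schema.org/birthDate", "1990-01-01",
                "http://www.w3.org/2001/XMLSchema#date", "", ""]),
])

rows = parse_hextuples(SAMPLE)
```

Because every row has the same six columns, the language and datatype are always present as explicit fields rather than optional syntax, which is the property the comment above is pointing at.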
- Web Scraping Open Knowledge
webscraping-open
- Ask HN: What are the best tools for web scraping in 2022?
I’m collecting my experience with these tools in the “web scraping open knowledge project” on GitHub (https://github.com/reanalytics-databoutique/webscraping-open...) and, for longer free content, on my Substack (http://thewebscraping.club/).
- Web Scraping Open Knowledge
On the page about canvas fingerprinting [0], it only mentions Cloudflare. From what I can tell, reCAPTCHA v3 also uses canvas fingerprinting [1].
[0] https://github.com/reanalytics-databoutique/webscraping-open...
[1] https://brianwjoe.com/2019/02/06/how-does-recaptcha-v3-work/
What are some alternatives?
Webscraping Open Project - The web scraping open project repository aims to share knowledge and experiences about web scraping with Python [Moved to: https://github.com/TheWebScrapingClub/webscraping-from-0-to-hero]
openstates-scrapers - source for Open States scrapers
linkedom - A triple-linked lists based DOM implementation.
docker-selenium-lambda - The simplest demo of chrome automation by python and selenium in AWS Lambda
undetected-chromedriver - Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / Datadome / CloudFlare IUAM)
lv2 - The LV2 audio plugin specification
data.gov - Main repository for the data.gov service
jq - Command-line JSON processor [Moved to: https://github.com/jqlang/jq]
pup - Parsing HTML at the command line