wi-page vs danker
Metric | wi-page | danker
---|---|---
Mentions | 2 | 1
Stars | 1 | 53
Growth | - | -
Activity | 0.0 | 8.0
Last commit | about 3 years ago | 26 days ago
Language | Python | Python
License | - | GNU General Public License v3.0 only
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
wi-page
- Ask HN: What are the best tools for web scraping in 2022?
[4] https://github.com/altilunium/wi-page (Scrapes Wikipedia to find the most active contributors to a given article)
- Show HN: Wi-Page – Rank Wikipedia Article's Contributors by Byte Counts
danker
- How to get the links of 15,000 Wiki-articles
Oh cool, I had my students do PageRank when I taught that class. Implementing the actual PageRank algorithm should be pretty easy; gathering and processing the data into a usable form is harder, especially in MATLAB, which does not excel at that kind of task. You might compare your program's output to danker's for verification and validation. I think Wikipedia also makes its page view / article popularity data available, which might be of interest to you.
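To illustrate the "pretty easy" part of that comment, here is a minimal sketch of PageRank by power iteration in Python. It is not danker's implementation. The function name `pagerank`, the `toy_links` graph, the damping factor of 0.85 and the fixed iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a graph given as {page: [linked pages]}."""
    # Collect every page that appears as a link source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Every page passes an equal share of its rank to each page it links to.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page graph, not real Wikipedia data.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

On a real link dump, the bulk of the work is building the `links` mapping, which is the harder part the comment refers to.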
What are some alternatives?
kiwix-hotspot - Kiwix Hotspot Image Creator (Desktop) for Windows/macOS/Linux
Github-Ranking - Github Ranking: GitHub stars and forks ranking list. GitHub Top 100 stars lists for different languages. Automatically updated daily.
estela - estela, an elastic web scraping cluster 🕸
wembedder - Wikidata embedding
scrapy-redis - Redis-based components for Scrapy.
polite - Be nice on the web
curl-impersonate - curl-impersonate: A special build of curl that can impersonate Chrome & Firefox
chrome-aws-lambda - Chromium Binary for AWS Lambda and Google Cloud Functions
pup - Parsing HTML at the command line
linkedom - A triple-linked-list-based DOM implementation.
browserless - Deploy headless browsers in Docker. Run on our cloud or bring your own. Free for non-commercial uses.
Playwright - Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.