URLExtract vs furl
| | URLExtract | furl |
|---|---|---|
| Mentions | 1 | 1 |
| Stars | 236 | 2,574 |
| Growth | - | - |
| Activity | 5.7 | 0.0 |
| Last commit | 3 months ago | about 1 year ago |
| Language | Python | Python |
| License | MIT License | The Unlicense |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
URLExtract
Famous HNers and Their Sites
That'd explain some of the holes mentioned in these comments. I think you just want to match any "word" containing ".[valid TLD]" and then exclude invalid URLs ("@" in first part indicating email, etc).
I've been using this[0] Python library which seemed good enough for my needs in some scraping project.
0: https://github.com/lipoja/URLExtract
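The heuristic described in the comment above (match any "word" containing a valid TLD, then exclude emails) can be sketched with the standard library alone. This is a minimal illustration, not how URLExtract itself works: the `VALID_TLDS` set, the `STRIP_CHARS` string, and the `extract_urls` function are all hypothetical names, and the TLD set is a tiny placeholder for a real list.

```python
import re

# Assumption: a tiny illustrative TLD set; a real list would be much longer.
VALID_TLDS = {"com", "org", "net", "io", "dev"}

# Assumption: punctuation that commonly wraps a URL in running text.
STRIP_CHARS = "()[]<>\"',.;:!?"


def extract_urls(text):
    """Return whitespace-delimited words whose host part ends in a known TLD."""
    found = []
    for word in text.split():
        candidate = word.strip(STRIP_CHARS)
        # Drop an optional scheme, then keep only the part before any path, query, or fragment.
        without_scheme = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", candidate)
        host = re.split(r"[/?#]", without_scheme, maxsplit=1)[0]
        # An "@" in the first part is treated as an email address and skipped.
        if "@" in host:
            continue
        host = host.split(":", 1)[0]  # ignore an optional port
        labels = host.lower().split(".")
        # Require at least "name.tld" with no empty labels.
        if len(labels) < 2 or not all(labels):
            continue
        if labels[-1] in VALID_TLDS:
            found.append(candidate)
    return found


sample = "See https://github.com/lipoja/URLExtract, mail me@example.com or visit example.org/path."
print(extract_urls(sample))
```

Note that this sketch also rejects URLs carrying userinfo (such as `https://user@host.com`), which is one of the trade-offs of the simple "@" rule.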
furl
What are some alternatives?
MPKExtractor - Simple extractor script for Diablo Immortal's .MPK files
purl - A simple, immutable URL class with a clean API for interrogation and manipulation.
proxy_web_crawler - Automates the process of repeatedly searching for a website via scraped proxy IP and search keywords
short_url - Python implementation for generating Tiny URL- and bit.ly-like URLs.
office365-audit-log-collector - Collect / retrieve Office365, AzureAD and DLP audit logs and output to PRTG, Azure Log Analytics Workspace, SQL, Graylog, Fluentd, and/or file output.
webargs - A friendly library for parsing HTTP request arguments, with built-in support for popular web frameworks, including Flask, Django, Bottle, Tornado, Pyramid, webapp2, Falcon, and aiohttp.
yarl - Yet another URL library
pyshorteners - Generating short URLs with Python has never been easier
courlan - Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters
url_cleaner - A package for removing tracing parameters from URLs. This package supports automatically updating filtering rules from Adguard.
OAuthLib - A generic, spec-compliant, thorough implementation of the OAuth request-signing logic
dottorrent - High-level Python 3 library for creating .torrent files