Cascadia.jl vs gazpacho
| | Cascadia.jl | gazpacho |
|---|---|---|
| Mentions | 2 | 1 |
| Stars | 116 | 730 |
| Growth | 0.0% | - |
| Activity | 3.2 | 3.2 |
| Latest commit | almost 2 years ago | 5 months ago |
| Language | Julia | Python |
| License | GNU General Public License v3.0 or later | MIT License |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Cascadia.jl
- I Need to Convert HTML Files to CSV
https://github.com/Algocircle/Cascadia.jl is a Julia library for CSS-style queries on HTML parsed by Gumbo.jl.
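Cascadia.jl handles the query half of the job: Gumbo.jl parses the HTML into a tree, and Cascadia matches CSS-style selectors against that tree. As a rough illustration of that idea (not Cascadia's API), here is a minimal Python sketch using only the standard library's `html.parser`. The names `Node`, `TreeBuilder`, and `select` are hypothetical, and only tag, class, and descendant selectors are supported.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so the builder never descends into them.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    """Element node; children holds child Nodes and text strings in order."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []

    @property
    def classes(self):
        return set((self.attrs.get("class") or "").split())

    def text(self):
        """Text of this node and all descendants, in document order."""
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML (a stand-in for the Gumbo.jl step)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse_compound(token):
    """'li.book.sale' -> ('li', {'book', 'sale'}); '.price' -> (None, {'price'})."""
    tag, *classes = token.split(".")
    return (tag or None), set(classes)


def matches(node, compound):
    tag, classes = compound
    return (tag is None or node.tag == tag) and classes <= node.classes


def matches_chain(node, compounds):
    """Descendant combinator: last compound on node, earlier ones on ancestors."""
    if not matches(node, compounds[-1]):
        return False
    ancestor = node.parent
    for compound in reversed(compounds[:-1]):
        while ancestor is not None and not matches(ancestor, compound):
            ancestor = ancestor.parent
        if ancestor is None:
            return False
        ancestor = ancestor.parent
    return True


def iter_elements(node):
    """Yield every element below node, depth-first in document order."""
    for child in node.children:
        if isinstance(child, Node):
            yield child
            yield from iter_elements(child)


def select(root, selector):
    """Return elements matching a selector such as 'li.book a' or '.sale .price'."""
    compounds = [parse_compound(t) for t in selector.split()]
    return [n for n in iter_elements(root) if matches_chain(n, compounds)]


html = """
<ul class="books">
  <li class="book"><a href="/b/1">Dune</a> <span class="price">$9.99</span></li>
  <li class="book sale"><a href="/b/2">Emma</a> <span class="price">$4.50</span></li>
</ul>
<a href="/about">About</a>
"""
builder = TreeBuilder()
builder.feed(html)
builder.close()
print([(a.attrs["href"], a.text()) for a in select(builder.root, "li.book a")])
# [('/b/1', 'Dune'), ('/b/2', 'Emma')]
print([s.text() for s in select(builder.root, ".sale .price")])
# ['$4.50']
```

Matching runs right to left: the last compound selector must match the element itself, and each earlier one must match some ancestor further up the tree, so the `/about` link outside the list is never selected.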
- Recommendations on how to start web scraping with Julia for price updates? (if possible)
I haven't seen that tutorial, but I agree that HTTP.jl, Gumbo.jl, and Cascadia.jl are the way. I used them to export public wishlists from Book Depository, which has neither an API nor a built-in export tool.
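The workflow described here is fetch (HTTP.jl), parse (Gumbo.jl), select (Cascadia.jl), then export. A rough Python analogue using only the standard library (`urllib.request`, `html.parser`, `csv`) is sketched below; it is not how those Julia packages work internally. The `/book/` link pattern, the example URL, and the helper names `fetch`, `LinkCollector`, and `links_to_csv` are hypothetical and do not reflect Book Depository's actual markup.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser


def fetch(url):
    """Download a page as text; some sites reject urllib's default User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class LinkCollector(HTMLParser):
    """Collect (href, text) for each <a> whose href contains `pattern`."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = pattern
        self.rows = []
        self._href = None  # href of the matching <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if self.pattern in href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())  # collapse whitespace
            self.rows.append((self._href, title))
            self._href = None


def links_to_csv(html, pattern):
    """Extract matching links from html and return them as CSV text."""
    parser = LinkCollector(pattern)
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url", "title"])
    writer.writerows(parser.rows)
    return buf.getvalue()


page = ('<a href="/book/1">Dune</a> <a href="/help">Help</a> '
        '<a href="/book/2">Emma,\n  a Novel</a>')
print(links_to_csv(page, "/book/"), end="")
# url,title
# /book/1,Dune
# /book/2,"Emma, a Novel"

# For a live page (hypothetical URL):
# links_to_csv(fetch("https://example.com/wishlist"), "/book/")
```

Parsing in a streaming fashion like this keeps no tree in memory, which is enough when the data is a flat list of links; nested records are easier to pull out of a parsed tree with selectors. The `csv` module quotes the title containing a comma automatically.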
gazpacho
- Ask HN: What are some tools / libraries you built yourself?
I've been working on gazpacho [1] for the last two years.
It's a general-purpose web scraping library for Python that replaces BeautifulSoup + requests for most projects.
It just passed ~2K downloads a week!
[1] https://github.com/maxhumber/gazpacho
What are some alternatives?
HTTP.jl - HTTP for Julia
selectolax - Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).
Gumbo.jl - Julia wrapper around Google's gumbo C library for parsing HTML
lxml - The lxml XML toolkit for Python
autoscraper - A Smart, Automatic, Fast and Lightweight Web Scraper for Python
html5lib - Standards-compliant library for parsing and serializing HTML documents and fragments in Python
dude - dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
xmltodict - Python module that makes working with XML feel like you are working with JSON
Huginn - Create agents that monitor and act on your behalf. Your agents are standing by!
xhtml2pdf - A library for converting HTML into PDFs using ReportLab
untangle - Converts XML to Python objects
cssutils