bebe
gazpacho
Stars - the number of stars that a project has on GitHub.
Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
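The exact formula behind the activity number is not given here, so the following is only a rough sketch of how such a score could be computed. It assumes an exponential decay for commit weights (the `half_life_days` parameter is hypothetical) and a simple percentile-rank mapping onto a 0-10 scale; the function names are illustrative, not the site's actual implementation.

```python
from bisect import bisect_left


def weighted_commit_score(commit_ages_days, half_life_days=30.0):
    """Sum commit weights; a commit's weight halves every half_life_days.

    The decay shape and the 30-day half-life are assumptions for illustration.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity(score, all_scores):
    """Map a raw score to a 0-10 scale by percentile rank among tracked projects.

    A result of 9.0 means 90% of tracked projects have a strictly lower score,
    i.e. the project is in the top 10%.
    """
    ranked = sorted(all_scores)
    below = bisect_left(ranked, score)  # projects with a strictly lower score
    return round(10.0 * below / len(ranked), 1)


if __name__ == "__main__":
    # Hypothetical raw scores for 100 tracked projects.
    tracked = [float(n) for n in range(1, 101)]
    print(activity(91.0, tracked))
```

Under these assumptions, a project whose raw score beats 90 of 100 tracked projects gets an activity of 9.0, and three commits from yesterday contribute more to the raw score than three commits from three months ago.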
bebe
-
Ask HN: What are some tools / libraries you built yourself?
I built daria (https://github.com/MrPowers/spark-daria) to make it easier to write Spark code and spark-fast-tests (https://github.com/MrPowers/spark-fast-tests) to provide a good testing workflow.
quinn (https://github.com/MrPowers/quinn) and chispa (https://github.com/MrPowers/chispa) are the PySpark equivalents.
I built bebe (https://github.com/MrPowers/bebe) to expose the Spark Catalyst expressions that aren't exposed in the Scala / Python APIs.
I also built spark-sbt.g8 to create a Spark project with a single command: https://github.com/MrPowers/spark-sbt.g8
-
Finished porting all the Spark SQL functions that aren't exposed via the Scala API to the bebe project
The bebe project fills all these gaps in the Scala API. See the project README for examples of how each function works.
-
Making the Spark DataFrame composition type safe(r)
See here for a more detailed discussion and let me know your thoughts!!
gazpacho
-
Ask HN: What are some tools / libraries you built yourself?
I've been working on gazpacho [1] for the last two years.
It's a general purpose web scraping library for Python that replaces BeautifulSoup + requests for most projects.
It just surpassed ~2K downloads per week!
[1] https://github.com/maxhumber/gazpacho
What are some alternatives?
frameless - Expressive types for Spark.
selectolax - Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).
kondo - Cleans dependencies and build artifacts from your projects.
lxml - The lxml XML toolkit for Python
sqldb-logger - A logger for Go SQL database driver without modifying existing *sql.DB stdlib usage.
html5lib - Standards-compliant library for parsing and serializing HTML documents and fragments in Python
gutenberg - A fast static site generator in a single binary with everything built-in. https://www.getzola.org
xmltodict - Python module that makes working with XML feel like you are working with JSON
yadm - Yet Another Dotfiles Manager
xhtml2pdf - A library for converting HTML into PDFs using ReportLab
Shynet - Modern, privacy-friendly, and detailed web analytics that works without cookies or JS.
untangle - Converts XML to Python objects