beneath
sayn
Our great sponsors
beneath | sayn | |
---|---|---|
2 | 2 | |
78 | 117 | |
- | 0.0% | |
0.0 | 7.1 | |
about 2 years ago | 14 days ago | |
Go | Python | |
GNU General Public License v3.0 or later | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
beneath
-
Analyzing the r/wallstreetbets hivemind — August 2021
If you’re interested, here’s the raw Reddit data, my data pipeline, the derived data, and my Jupyter notebook. I’m using Beneath, an open data platform I’m building, to stream and save the data.
-
[Self Promotion] Reddit r/wallstreetbets posts and comments in real-time
The scraper (which uses Async PRAW) is open source here: https://github.com/beneath-hq/beneath/tree/master/examples/reddit
sayn
-
Average reply times from some of my Facebook friends over the last few years [OC], full article here: https://medium.com/@timsugaipov/taking-your-facebook-messenger-data-further-f9da079b1409?source=friends_link&sk=3bd04bb35ad9a4b6f586300e52f96e4f
Data Processing: SAYN
-
Introducing SAYN: A Simple Yet Powerful Data Processing Framework.
We believe simplicity to be crucial when maintaining pipelines at scale. However, we also believe that simplicity should not come at the expense of flexibility. This is why we have built our own open source data processing framework: SAYN. SAYN is designed to empower analytics teams by being simple, flexible and centralised. It democratises the contribution to data processes within an analytics team, enables full flexibility and helps save a lot of time through automation.
What are some alternatives?
Activeloop Hub - Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai [Moved to: https://github.com/activeloopai/deeplake]
dbt-databricks - A dbt adapter for Databricks.
whylogs - An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
dataform - Dataform is a framework for managing SQL based data operations in BigQuery
optimus - Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
tinvois-parser - Extract receipt info
pachyderm - Data-Centric Pipelines and Data Versioning
data-engineering-wiki - The best place to learn data engineering. Built and maintained by the data engineering community.
flyte - Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
yaetos - Write data & AI pipelines in (SQL, Spark, Pandas) and deploy to the cloud, simplified
oomstore - Lightweight and Fast Feature Store Powered by Go (and Rust).
dbt - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. [Moved to: https://github.com/dbt-labs/dbt-core]