| | sayn | beneath |
|---|---|---|
| Mentions | 2 | 2 |
| Stars | 117 | 78 |
| Growth | 0.9% | - |
| Activity | 6.8 | 0.0 |
| Last commit | 4 days ago | about 2 years ago |
| Language | Python | Go |
| License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars: the number of stars that a project has on GitHub.
Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
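The page does not publish the formulas behind these numbers, so the following is only a rough sketch of how such metrics could be computed. It is not the site's actual method. Everything in it is an assumption made for illustration: the function names, the exponential decay, the 30-day half-life, and the percentile mapping. The only properties taken from the text above are that growth compares stars month over month, that recent commits weigh more than older ones, and that a score of 9.0 corresponds to being ahead of 90% of tracked projects.

```python
from datetime import date
from typing import Iterable, Sequence


def month_over_month_growth(stars_last_month: int, stars_now: int) -> float:
    """Percentage change in stars between two monthly snapshots."""
    if stars_last_month <= 0:
        raise ValueError("need a positive star count for the previous month")
    return 100.0 * (stars_now - stars_last_month) / stars_last_month


def recency_weighted_commits(
    commit_dates: Iterable[date],
    today: date,
    half_life_days: float = 30.0,  # assumed value, not from the source
) -> float:
    """Sum of per-commit weights; a commit's weight halves every half_life_days."""
    return sum(
        0.5 ** ((today - d).days / half_life_days) for d in commit_dates
    )


def activity_score(raw: float, all_raw: Sequence[float]) -> float:
    """Map a raw score to a 0-10 scale by percentile rank among tracked projects.

    A project with a higher raw score than 90% of the tracked projects
    gets 9.0, which matches the "top 10%" reading described above.
    """
    if not all_raw:
        raise ValueError("need at least one tracked project")
    below = sum(1 for r in all_raw if r < raw)
    return round(10.0 * below / len(all_raw), 1)
```

With these hypothetical definitions, a project with no recent commits has a raw score near zero, so it ranks below most other projects and gets an activity close to 0.0.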
sayn
- Average reply times from some of my Facebook friends over the last few years [OC], full article here: https://medium.com/@timsugaipov/taking-your-facebook-messenger-data-further-f9da079b1409?source=friends_link&sk=3bd04bb35ad9a4b6f586300e52f96e4f
Data Processing: SAYN
- Introducing SAYN: A Simple Yet Powerful Data Processing Framework.
We believe simplicity to be crucial when maintaining pipelines at scale. However, we also believe that simplicity should not come at the expense of flexibility. This is why we have built our own open source data processing framework: SAYN. SAYN is designed to empower analytics teams by being simple, flexible and centralised. It democratises the contribution to data processes within an analytics team, enables full flexibility and helps save a lot of time through automation.
beneath
- Analyzing the r/wallstreetbets hivemind - August 2021
If you’re interested, here’s the raw Reddit data, my data pipeline, the derived data, and my Jupyter notebook. I’m using Beneath, an open data platform I’m building, to stream and save the data.
- [Self Promotion] Reddit r/wallstreetbets posts and comments in real-time
The scraper (which uses Async PRAW) is open source here: https://github.com/beneath-hq/beneath/tree/master/examples/reddit
What are some alternatives?
dbt-databricks - A dbt adapter for Databricks.
Activeloop Hub - Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai [Moved to: https://github.com/activeloopai/deeplake]
dataform - Dataform is a framework for managing SQL based data operations in BigQuery
whylogs - An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
tinvois-parser - Extract receipt info
optimus - Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
data-engineering-wiki - The best place to learn data engineering. Built and maintained by the data engineering community.
pachyderm - Data-Centric Pipelines and Data Versioning
yaetos - Write data & AI pipelines in (SQL, Spark, Pandas) and deploy to the cloud, simplified
flyte - Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
dbt - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. [Moved to: https://github.com/dbt-labs/dbt-core]
oomstore - Lightweight and Fast Feature Store Powered by Go (and Rust).