Top 23 data-analytic Open-Source Projects
-
danfojs
Danfo.js is an open-source JavaScript library providing high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data.
-
lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.
-
diffgram
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
-
zui
Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.
-
Data-Analyst-Roadmap
I am sharing my Journey of 66DaysofData into Data Analytics by participating in Ken Jee's #66daysofdata challenge
-
ethereum-etl-airflow
Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee
-
bitcoin-etl
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
-
tellery
Tellery lets you build metrics using SQL and bring them to your team. As easy as using a document. As powerful as a data modeling tool.
-
desbordante-core
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also lets you run data-cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
Superset is absolutely phenomenal. I really hope Microsoft eventually releases all of the customizations it made to it internally to the open-source community.
https://www.youtube.com/watch?v=RY0SSvSUkMA
https://github.com/apache/superset/discussions/20094
> YAML, pivoting being done in the frontend, no symmetric aggregates
(one of the maintainers of Lightdash) You touched on some of our most interesting problems here! Would be especially interested to hear about what you liked / didn't like about symmetric aggregates in Looker and how you find dev with YAML. If you have an idea of how you'd like these to look in Lightdash, the team would be really open to making that a reality.
For pivoting in the backend, this is coming! Issue here: https://github.com/lightdash/lightdash/issues/2907
Project mention: Open-Source Observability for the Semantic Layer | news.ycombinator.com | 2024-01-16
Think of Datadrift as a simple & open-source Monte Carlo for the semantic layer era. The repo is at https://github.com/data-drift/data-drift
Datadrift started as an internal tool built at our former company, a large European B2B Fintech. We had data reliability challenges impacting key metrics used for financial and regulatory reporting.
However, when we tried existing data quality tools we were always frustrated. They provide row-level static testing (e.g. uniqueness or nullness), which does not address time-varying metrics like revenue. And commercial observability solutions cost $manyK a month and bring compliance and security overhead.
We designed Datadrift to solve these problems. Datadrift works by simply adding a monitor where your metric is computed. It then understands how your metric is computed and on which upstream tables it depends. When an issue occurs, it pinpoints exactly which rows were updated and introduced the change.
You can also set up alerting and customise it. For example, you can decide to open a GitHub issue and assign it to the analyst owning the revenue metric when a +10% change is detected. We tried to make it easy to customise and developer friendly.
We are thinking of adding features around root cause analysis automation and issue pattern analysis to help data teams improve metric quality over time. We’d love to hear your feature requests.
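To make the idea above concrete, here is a minimal sketch in plain Python of snapshot-based drift detection. It is not Datadrift's API: the `DriftReport` class, the `detect_drift` function, the snapshot layout (`{row_key: amount}`), and the symmetric 10% threshold are all assumptions chosen for illustration. It compares two snapshots of the same metric, lists the rows responsible for the difference, and flags the change when it crosses the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftReport:
    """Hypothetical result type; not part of Datadrift."""
    before: float
    after: float
    relative_change: float
    changed_keys: list
    alert: bool


def detect_drift(old_rows, new_rows, threshold=0.10):
    """Compare two snapshots ({row_key: amount}) of the same metric.

    Assumption: the metric is a plain sum over rows, and an alert fires
    when the absolute relative change reaches `threshold` (10% by default).
    """
    before = sum(old_rows.values())
    after = sum(new_rows.values())

    # Rows that were added, removed, or whose amount differs between snapshots.
    changed_keys = sorted(
        key
        for key in old_rows.keys() | new_rows.keys()
        if old_rows.get(key) != new_rows.get(key)
    )

    if before == 0:
        # Avoid dividing by zero: no change if both are zero, otherwise unbounded.
        relative_change = 0.0 if after == 0 else float("inf")
    else:
        relative_change = (after - before) / before

    return DriftReport(
        before=before,
        after=after,
        relative_change=relative_change,
        changed_keys=changed_keys,
        alert=abs(relative_change) >= threshold,
    )


# Two snapshots of the same (made-up) revenue metric, taken a day apart.
yesterday = {"inv-1": 100.0, "inv-2": 200.0, "inv-3": 50.0}
today = {"inv-1": 100.0, "inv-2": 250.0, "inv-3": 50.0, "inv-4": 10.0}

report = detect_drift(yesterday, today)
if report.alert:
    print(f"Metric moved {report.relative_change:+.1%}; rows involved: {report.changed_keys}")
```

In a real setup the `alert` flag would feed whatever notification you prefer (the post mentions opening a GitHub issue); that step is left out here because it depends on your tooling.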
Datadrift is built with Python and Go, and licensed under the GPL. Our docs are here: https://github.com/data-drift/data-drift?tab=readme-ov-file#...
Dev setup and demo: https://app.claap.io/sammyt/drift-db-demo-a18-c-ApwBh9kt4p-0...
We’re very eager to get your feedback!
data-analytics related posts
- Open-Source Observability for the Semantic Layer
- Show HN: Desbordante 1.0.0 Released
- Explainable (Structured) Machine Learning Algorithm
- Would learn Go to contribute to an OS project ? Or should I stick to python ?
- public-datasets: NEW Data - star count:181.0
-
A note from our sponsor - InfluxDB
www.influxdata.com | 27 Apr 2024
Index
What are some of the best open-source data-analytic projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | superset | 58,737 |
2 | awesome-bigdata | 12,792 |
3 | danfojs | 4,649 |
4 | lightdash | 3,399 |
5 | lance | 3,256 |
6 | diffgram | 1,796 |
7 | zui | 1,733 |
8 | dremio-oss | 1,298 |
9 | insights | 1,057 |
10 | data-science-with-ruby | 693 |
11 | isp-data-pollution | 566 |
12 | Data-Analyst-Roadmap | 507 |
13 | ethereum-etl-airflow | 387 |
14 | bitcoin-etl | 386 |
15 | ActivitySchema | 373 |
16 | tellery | 351 |
17 | traffic | 344 |
18 | desbordante-core | 321 |
19 | data-drift | 298 |
20 | SQL-for-Data-Analytics | 252 |
21 | Morpheus | 236 |
22 | snowpark-python | 229 |
23 | bloxs | 213 |