data-analytics

Top 23 data-analytics Open-Source Projects

  • superset

    Apache Superset is a Data Visualization and Data Exploration Platform

  • Project mention: Apache Superset | news.ycombinator.com | 2024-02-26

    Superset is absolutely phenomenal. I really hope Microsoft eventually releases all of its internal customizations to the open-source community.

    https://www.youtube.com/watch?v=RY0SSvSUkMA

    https://github.com/apache/superset/discussions/20094

  • awesome-bigdata

    A curated list of awesome big data frameworks, resources and other awesomeness.

  • Project mention: Good coding groups for black women? | news.ycombinator.com | 2024-01-13
  • danfojs

    Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.

  • lightdash

    Self-serve BI to 10x your data team ⚡️

  • Project mention: Apache Superset | news.ycombinator.com | 2024-02-26

    > YAML, pivoting being done in the frontend, no symmetric aggregates

    (one of the maintainers of Lightdash) You touched on some of our most interesting problems here! Would be especially interested to hear about what you liked / didn't like about symmetric aggregates in Looker and how you find dev with YAML. If you have an idea of how you'd like these to look in Lightdash, the team would be really open to making that a reality.

    For pivoting in the backend, this is coming! Issue here: https://github.com/lightdash/lightdash/issues/2907

  • lance

    Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.

  • Project mention: The Nimble File Format by Meta | news.ycombinator.com | 2024-04-25
  • diffgram

    The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.

  • zui

    Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.

  • dremio-oss

    Dremio - the missing link in modern data

  • insights

    Open Source Self-Hosted Business Intelligence Platform

  • data-science-with-ruby

    Practical Data Science with Ruby based tools.

  • isp-data-pollution

    ISP Data Pollution to Protect Private Browsing History with Obfuscation

  • Data-Analyst-Roadmap

    I am sharing my Journey of 66DaysofData into Data Analytics by participating in Ken Jee's #66daysofdata challenge

  • ethereum-etl-airflow

    Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee

  • Project mention: ethereum-etl-airflow: NEW Data - star count:358.0 | /r/algoprojects | 2023-07-10
  • bitcoin-etl

    ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ

  • ActivitySchema

    Repository for the ActivitySchema spec and supporting materials

  • tellery

    Tellery lets you build metrics using SQL and bring them to your team. As easy as using a document. As powerful as a data modeling tool.

  • traffic

    A toolbox for processing and analysing air traffic data (by xoolive)

  • Project mention: FLaNK AI for 11 March 2024 | dev.to | 2024-03-11
  • desbordante-core

    Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also lets you run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

  • Project mention: Show HN: Desbordante 1.0.0 Released | news.ycombinator.com | 2023-12-11
  • data-drift

    Metrics Observability & Troubleshooting

  • Project mention: Open-Source Observability for the Semantic Layer | news.ycombinator.com | 2024-01-16

    Think of Datadrift as a simple & open-source Monte Carlo for the semantic layer era. The repo is at https://github.com/data-drift/data-drift

    Datadrift started as an internal tool built at our former company, a large European B2B Fintech. We had data reliability challenges impacting key metrics used for financial and regulatory reporting.

    However, when we tried existing data quality tools, we were always frustrated. They provide row-level static testing (e.g. uniqueness or nullness), which does not address time-varying metrics like revenue. And commercial observability solutions cost $manyK a month and bring compliance and security overhead.

    We designed Datadrift to solve these problems. Datadrift works by simply adding a monitor where your metric is computed. It then understands how your metric is computed and on which upstream tables it depends. When an issue occurs, it pinpoints exactly which rows were updated and introduced the change.

    You can also set up alerting and customise it. For example, you can decide to open a GitHub issue and assign it to the analyst who owns the revenue metric when a +10% change is detected. We tried to make it easy to customise and developer-friendly.

    We are thinking of adding features around root cause analysis automation and issue pattern analysis to help data teams improve metric quality over time. We’d love to hear your feature requests.

    Datadrift is built with Python and Go, and licensed under GPL. Our docs are here: https://github.com/data-drift/data-drift?tab=readme-ov-file#...

    Dev setup and demo: https://app.claap.io/sammyt/drift-db-demo-a18-c-ApwBh9kt4p-0...

    We’re very eager to get your feedback!

  • SQL-for-Data-Analytics

    Perform fast and efficient data analysis with the power of SQL

  • Morpheus

    The foundational library of the Morpheus data science framework

  • snowpark-python

    Snowflake Snowpark Python API

  • bloxs

    Build dashboards in Jupyter Notebook with numeric and chart boxes

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source data-analytics projects? This list will help you:

Rank Project Stars
1 superset 58,737
2 awesome-bigdata 12,792
3 danfojs 4,649
4 lightdash 3,399
5 lance 3,256
6 diffgram 1,796
7 zui 1,733
8 dremio-oss 1,298
9 insights 1,057
10 data-science-with-ruby 693
11 isp-data-pollution 566
12 Data-Analyst-Roadmap 507
13 ethereum-etl-airflow 387
14 bitcoin-etl 386
15 ActivitySchema 373
16 tellery 351
17 traffic 344
18 desbordante-core 321
19 data-drift 298
20 SQL-for-Data-Analytics 252
21 Morpheus 236
22 snowpark-python 229
23 bloxs 213
