Top 23 Data Analysis Open-Source Projects
-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
-
Metabase
The simplest, fastest way to get business intelligence and analytics to everyone in your company
-
CyberChef
The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis
-
GoAccess
GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
-
ydata-profiling
One line of code for data quality profiling and exploratory data analysis of Pandas and Spark DataFrames.
-
OpenRefine
OpenRefine is a free, open source power tool for working with messy data and improving it
-
cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
-
gonum
Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more
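ydata-profiling, listed above, builds a full profiling report from a DataFrame in one call. To show the kind of per-column summary such a report starts from, here is a minimal stdlib sketch, assuming the data is a list of row dicts with `None` marking a missing value. The `profile` function is hypothetical and is not the ydata-profiling API, which produces far richer output (histograms, correlations, alerts).

```python
import statistics
from typing import Any


def profile(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarize each column of a list of row dicts (None counts as missing)."""
    # Collect column names in first-seen order across all rows.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    report: dict[str, dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        stats: dict[str, Any] = {
            "count": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Add numeric summaries only when every non-missing value is a number
        # (bool is excluded because it is a subclass of int).
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numeric and len(numeric) == len(present):
            stats.update(min=min(numeric), max=max(numeric),
                         mean=statistics.fmean(numeric))
        report[col] = stats
    return report
```

Missing values are counted separately from distinct values so that a sparse column does not look artificially varied.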
Superset is absolutely phenomenal. I really hope Microsoft eventually releases the customizations they made to it internally to the open-source community.
https://www.youtube.com/watch?v=RY0SSvSUkMA
https://github.com/apache/superset/discussions/20094
Project mention: AutoCodeRover resolves 22% of real-world GitHub in SWE-bench lite | news.ycombinator.com | 2024-04-09
Thank you for your interest. There are some interesting examples in the SWE-bench-lite benchmark that AutoCodeRover resolves:
- From sympy: https://github.com/sympy/sympy/issues/13643. AutoCodeRover's patch for it: https://github.com/nus-apr/auto-code-rover/blob/main/results...
- Another one from scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/13070. AutoCodeRover's patch (https://github.com/nus-apr/auto-code-rover/blob/main/results...) modified a few lines below (compared to the developer patch) and wrote a different comment.
There are more examples in the results directory (https://github.com/nus-apr/auto-code-rover/tree/main/results).
Dash is a Python framework that enables you to build interactive frontend applications without writing a single line of JavaScript. Internally and in projects, we like to use it to build quick proofs of concept for data-driven applications because of its nice integration with Plotly and pandas. For this post, I'm going to assume that you're already familiar with Dash and won't explain that part in detail. Instead, we'll focus on what's necessary to make it run serverless.
Remote Code Execution via H2
Project mention: Creating a Sales Analysis Application with Streamlit: A Practical Approach to Business Intelligence | dev.to | 2024-04-19
2. Go to https://streamlit.io, log in, and create a new app from your GitHub repository.
Project mention: Show HN: Dropbase – Build internal web apps with just Python | news.ycombinator.com | 2023-12-05
There's also that library all the AI models started using that gives you a public URL to share. After researching it: https://www.gradio.app/ is the link.
It's used specifically for making simple UIs for machine learning apps. But I guess technically you could use it for anything.
**[I.am.ai AI Expert Roadmap](https://i.am.ai/roadmap)**: This roadmap focuses more on AI and includes various aspects of machine learning and deep learning. It's suitable for those who want to delve deeper into AI, particularly in cutting-edge research and applications.
Get started with Data Science in the Data Science for Beginners curricula.
Then we take the encrypted text and use CyberChef to decrypt it.
If one wants server-side metrics with a little more info than the author's "hacky little script", there's always goaccess [1], which functions in broadly the same way. I even use it with Firebase Hosting-hosted sites via [2] (which I wrote).
[1] http://goaccess.io/
[2] https://github.com/Silicon-Ally/gcp-clf
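To make "server-side metrics" concrete, the sketch below shows the kind of aggregation a log analyzer performs: parse each access-log line, then count hits per path, responses per status code, unique client hosts, and bytes sent. It assumes the NCSA Common Log Format and is a minimal illustration only; GoAccess itself is a C program that supports many log formats and far more reports, and `summarize` is a hypothetical helper, not part of any tool mentioned here.

```python
import re
from collections import Counter

# NCSA Common Log Format: host ident authuser [time] "request" status bytes
CLF = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines):
    """Aggregate simple traffic metrics from Common Log Format lines."""
    hits = Counter()
    statuses = Counter()
    hosts = set()
    bytes_sent = 0
    for line in lines:
        m = CLF.match(line)
        if m is None:
            continue  # skip malformed lines instead of failing the whole run
        hits[m["path"]] += 1
        statuses[m["status"]] += 1
        hosts.add(m["host"])
        if m["size"] != "-":  # "-" means no body was sent (e.g. 304)
            bytes_sent += int(m["size"])
    return {"hits": hits, "statuses": statuses,
            "unique_hosts": len(hosts), "bytes": bytes_sent}
```

A real-time analyzer would apply the same per-line update to a file as it is appended to, rather than to a finished list.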
Project mention: Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres | news.ycombinator.com | 2023-12-12
I'll also give a shout-out to Airbyte (https://airbyte.com/), with which I've had some limited success integrating Salesforce with a local database. The particular pull of Airbyte is that we can self-host the open-source version rather than pay Fivetran a significant sum to do this for us.
It's an immature tool, so I don't yet know that I can claim we've spent _less_ than Fivetran on the additional engineering and ops time, but it feels like it has potential to do so once stabilized.
Project mention: Ask HN: What Underrated Open Source Project Deserves More Recognition? | news.ycombinator.com | 2024-03-07
"OpenRefine is a powerful free, open source tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data." https://openrefine.org/
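One way OpenRefine cleans messy data is by clustering near-duplicate values. Its "fingerprint" key-collision method, as described in OpenRefine's documentation, normalizes each value (trim, lowercase, strip punctuation and accents, split into tokens, deduplicate and sort the tokens) and groups values whose keys match. The stdlib sketch below approximates that idea; it is not OpenRefine's code, and `fingerprint` and `cluster` are hypothetical names chosen for illustration.

```python
import string
import unicodedata
from collections import defaultdict

_PUNCT = set(string.punctuation)


def fingerprint(value: str) -> str:
    """Approximate OpenRefine-style fingerprint key for one cell value."""
    s = value.strip().lower()
    # Strip accents: decompose characters, then drop the combining marks.
    s = unicodedata.normalize("NFKD", s)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    # Drop ASCII punctuation so "Acme, Inc." and "acme inc" agree.
    s = "".join(ch for ch in s if ch not in _PUNCT)
    # Split on whitespace, deduplicate, and sort so token order is ignored.
    return " ".join(sorted(set(s.split())))


def cluster(values: list[str]) -> list[list[str]]:
    """Group values whose fingerprints collide; keep only real clusters."""
    groups: dict[str, list[str]] = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [vs for vs in groups.values() if len(set(vs)) > 1]
```

For example, "Acme, Inc.", "acme inc", and "Inc. Acme" all reduce to the key "acme inc" and land in one cluster, which a user could then merge into a single canonical spelling.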
Project mention: Show HN: Use an "eraser" to clean data on flight without breaking your workflow | news.ycombinator.com | 2024-03-15
Project mention: [Research] Detecting Annotation Errors in Semantic Segmentation Data | /r/MachineLearning | 2023-11-05
We have freely open-sourced our new method for improving segmentation data, published a paper on the research behind it, and released a 5-minute code tutorial. You can also read more in the blog if you'd like.
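cleanlab builds on the idea of confident learning: use a model's out-of-sample predicted probabilities to find examples whose given label disagrees with what the model confidently predicts. The sketch below is a simplified version of that core idea for plain classification, assuming `probs` came from cross-validation. It is not cleanlab's API, and the segmentation method from the post above is more involved; `flag_label_issues` is a hypothetical name. Each class gets a threshold equal to the average predicted probability of that class among examples labeled with it, and an example is flagged when the most probable class clearing its threshold differs from the given label.

```python
from typing import Sequence


def flag_label_issues(probs: Sequence[Sequence[float]],
                      labels: Sequence[int]) -> list[int]:
    """Return indices whose confidently predicted class differs from the label."""
    n_classes = len(probs[0])
    # Per-class threshold: mean self-confidence of examples given that label.
    thresholds = []
    for j in range(n_classes):
        own = [p[j] for p, y in zip(probs, labels) if y == j]
        # With no examples of class j, require certainty before trusting it.
        thresholds.append(sum(own) / len(own) if own else 1.0)
    issues = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        # Classes the model is "confident" about for this example.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # no class clears its threshold: not enough evidence
        guess = max(confident, key=lambda k: p[k])
        if guess != y:
            issues.append(i)
    return issues
```

Per-class thresholds, rather than one global cutoff, keep the check fair when a model is systematically more or less confident on some classes.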
Project mention: A Comprehensive Guide for Building Rag-Based LLM Applications | news.ycombinator.com | 2023-09-13
This is a feature in many commercial products already, as well as open source libraries like PyOD. https://github.com/yzhao062/pyod
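PyOD collects many outlier-detection algorithms behind a common interface. As a much simpler illustration of what an outlier detector does, here is a stdlib sketch of the modified z-score method of Iglewicz and Hoaglin, which scores each value by its distance from the median in units of the median absolute deviation (MAD). This is a swapped-in classic technique, not one of PyOD's detectors or its API; `mad_outliers` is a hypothetical name, and 3.5 is the cutoff those authors suggest.

```python
import statistics


def mad_outliers(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices of values whose modified z-score exceeds `threshold`."""
    med = statistics.median(values)
    # Median absolute deviation: a spread measure that ignores extreme values.
    mad = statistics.median(abs(x - med) for x in values)
    if mad == 0:
        return []  # over half the values are identical; scores are undefined
    # 0.6745 rescales MAD so the score is comparable to a standard z-score
    # when the data are normally distributed.
    return [i for i, x in enumerate(values)
            if abs(0.6745 * (x - med) / mad) > threshold]
```

Using the median and MAD instead of the mean and standard deviation keeps a single extreme value from inflating the spread and hiding itself.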
The interesting thing about Polars is that it does not try to be a drop-in replacement for pandas, like Dask, cuDF, or Modin, and instead has its own expressive API. Despite being a young project, it quickly became popular thanks to its easy installation process and its “lightning fast” performance.
But if you want to see what can be done for numeric stuff, check out gonum. Personally, I still wouldn't use Go, and I rather suspect it's still pretty easy to reach for something like what you're trying to do and not find it because Go just can't write that type sensibly, but you can at least see what is available, written by people who disagree with me about Go not being a great language for this.
Data Analysis related posts
- Multiwoven Reverse ETL (0.2.0) – Open-Source Alternative to Hightouch and Census
- Show HN: Privacy-first analytics in natural language in the browser
- Sqlime: Online SQLite Playground
- Plotting Financial Data in Kotlin with Kandy
- PicoCTF 2024: packer
- The Design Philosophy of Great Tables (Software Package)
- Unbreakable 2024: secrets-of-winter
A note from our sponsor - InfluxDB
www.influxdata.com | 23 Apr 2024
Index
What are some of the best open-source Data Analysis projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | superset | 58,737 |
2 | scikit-learn | 58,046 |
3 | Pandas | 41,923 |
4 | Metabase | 36,417 |
5 | streamlit | 31,506 |
6 | gradio | 28,556 |
7 | AI-Expert-Roadmap | 28,388 |
8 | Data-Science-For-Beginners | 26,290 |
9 | CyberChef | 25,384 |
10 | GoAccess | 17,467 |
11 | best-of-ml-python | 15,302 |
12 | airbyte | 13,923 |
13 | ydata-profiling | 12,022 |
14 | OpenRefine | 10,448 |
15 | pandas_exercises | 10,159 |
16 | pygwalker | 9,759 |
17 | statsmodels | 9,534 |
18 | mlcourse.ai | 9,390 |
19 | cleanlab | 8,592 |
20 | akshare | 8,321 |
21 | pyod | 7,928 |
22 | cudf | 7,257 |
23 | gonum | 7,249 |