Top 23 Pandas Open-Source Projects

Pandas

393 41,923 10.0 Python

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

Project mention: Deploying a Serverless Dash App with AWS SAM and Lambda | dev.to | 2024-03-04

Dash is a Python framework that enables you to build interactive frontend applications without writing a single line of JavaScript. Internally and in projects, we like to use it to build quick proofs of concept for data-driven applications because of its nice integration with Plotly and pandas. For this post, I'm going to assume that you're already familiar with Dash and won't explain that part in detail. Instead, we'll focus on what's necessary to make it run serverless.

PythonDataScienceHandbook

98 41,407 1.0 Jupyter Notebook

Python Data Science Handbook: full text in Jupyter Notebooks

Project mention: About Data analyst, data scientist and data engineer, resources and experiences | dev.to | 2024-03-26

Python Data Science Handbook

30-Days-Of-Python

66 31,031 2.4 Python

The 30 Days of Python programming challenge is a step-by-step guide to learning the Python programming language in 30 days. This challenge may take more than 100 days; follow your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw

Project mention: Top GitHub Resources to Level Up Your Python game | dev.to | 2023-11-27

🎇 Repository Link: 30 Days of Python

tqdm

33 27,405 7.3 Python

⚡ A Fast, Extensible Progress Bar for Python and CLI

Project mention: Neat Parallel Output in Python | news.ycombinator.com | 2024-02-25

yeah my code needs to use multiprocessing, which does not play nice with tqdm. thanks for the tip about positions though, that helped me search more effectively and came up with two promising comments. unmerged / require some workarounds, but might just work:
https://github.com/tqdm/tqdm/issues/1000#issuecomment-184208...

data-science-ipython-notebooks

1 26,459 0.0 Python

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Data-Science-For-Beginners

15 26,290 6.5 Jupyter Notebook

10 Weeks, 20 Lessons, Data Science for All!

Project mention: Welcome to 14 days of Data Science! | dev.to | 2024-03-07

Get started with Data Science in the Data Science for Beginners curricula.

datasets

15 18,376 9.5 Python

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Project mention: 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑‍💻 🥇 | dev.to | 2023-10-19

ydata-profiling

43 12,022 8.5 Python

One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.

Project mention: FLaNK 25 December 2023 | dev.to | 2023-12-26

Dask

32 11,982 9.7 Python

Parallel computing with task scheduling

Project mention: The Distributed Tensor Algebra Compiler (2022) | news.ycombinator.com | 2023-06-15

seaborn

76 11,946 8.5 Python

Statistical data visualization in Python

Project mention: Apache Superset | news.ycombinator.com | 2024-02-26

If you are doing data analysis I don't think any of the 3 pieces of software you mentioned are going to be that helpful.
I see these products as tools for data visualization and reporting i.e. presenting prepared datasets to users in a visually appealing way. They aren't as well suited for serious analytics.
I can't comment on Superset or Tableau, but I am familiar with Power BI (it has been rolled out across my org). The statistics you can do with it are fairly rudimentary: if you need anything beyond summarizing (counts, averages, min, max, etc.), it is not particularly easy.
For data analysis I use SAS or R. This software allows you to do things like multivariate regression, time series forecasting, PCA, cluster analysis, etc. There is also plotting capability.
Both these products are kind of old school; I've been using them since the early 2000s. The "new school" seems to be Python. Pretty much all the recent data science people in my organization use Python, particularly Pandas and libraries like Seaborn (https://seaborn.pydata.org/).
The "power" users of Power BI in my organization tend to be finance/HR people, for use cases like drilling down into cost figures or interactively presenting KPIs and other headline figures to management.

yfinance

59 11,778 9.0 Python

Download market data from Yahoo! Finance's API

Project mention: How to catch exceptions in library? | /r/learnpython | 2023-07-06

If you check the file here - https://github.com/ranaroussi/yfinance/blob/main/yfinance/base.py - you can see this is communicated via the "raise Exception('%s: %s' % (self.ticker, err_msg))" line. I'm trying to use the following to catch the exception but no luck.
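The situation described in the post above can be sketched without yfinance itself. The snippet below is only an illustration, assuming the library raises a bare `Exception` formatted as in the quoted line; `fetch_history`, `safe_fetch`, and the error text are hypothetical stand-ins, not part of yfinance.

```python
def fetch_history(ticker, fail=False):
    """Hypothetical stand-in for a library call; NOT a yfinance function."""
    if fail:
        err_msg = "No data found"  # illustrative message, not taken from yfinance
        # Same string formatting as the raise statement quoted in the post.
        raise Exception('%s: %s' % (ticker, err_msg))
    return [101.2, 102.5]


def safe_fetch(ticker, fail=False):
    """Return (data, error). A bare Exception has no specific subclass,
    so the caller has to catch Exception itself."""
    try:
        return fetch_history(ticker, fail=fail), None
    except Exception as exc:
        return None, str(exc)


data, error = safe_fetch("XYZ", fail=True)
print(error)
```

A `try`/`except` like this only helps if the exception actually propagates to the caller. If a library catches the error internally and merely logs or prints it, nothing reaches the `except` clause.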

pandas_exercises

10 10,159 0.0 Jupyter Notebook

Practice your pandas skills!

pygwalker

22 9,759 9.6 Python

PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis

Project mention: Show HN: Use an "eraser" to clean data on flight without breaking your workflow | news.ycombinator.com | 2024-03-15

modin

11 9,465 9.6 Python

Modin: Scale your Pandas workflows by changing a single line of code

Project mention: The Distributed Tensor Algebra Compiler (2022) | news.ycombinator.com | 2023-06-15

mlcourse.ai

85 9,390 3.4 Python

Open Machine Learning Course

Project mention: Open Machine Learning Course | news.ycombinator.com | 2023-10-22

visidata

36 7,409 9.8 Python

A terminal spreadsheet multitool for discovering and arranging data

Project mention: Fx – Terminal JSON Viewer | news.ycombinator.com | 2023-09-19

[4] "Is it possible to "flatten" structured data (like JSON?)": https://github.com/saulpw/visidata/discussions/1605
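For readers wondering what "flattening" structured data means in practice, here is a minimal pure-Python sketch. It is not how VisiData does it; the `flatten` helper and the dotted-key naming scheme are assumptions made for illustration.

```python
def flatten(obj, parent_key="", sep="."):
    """Flatten nested dicts/lists into one dict with dotted keys.

    Hypothetical helper for illustration. Empty dicts and lists
    contribute no keys, so they disappear from the result.
    """
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten(value, new_key, sep))
    elif isinstance(obj, list):
        # List positions become part of the key: "a.0", "a.1", ...
        for index, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            items.update(flatten(value, new_key, sep))
    else:
        # Scalar leaf: store it under the accumulated key.
        items[parent_key] = obj
    return items


record = {"a": {"b": 1, "c": [2, 3]}, "d": "x"}
print(flatten(record))
```

Each flattened record can then be treated as one row of a table. For nested dictionaries, pandas users may prefer the built-in `pandas.json_normalize`.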

cudf

23 7,274 9.9 C++

cuDF - GPU DataFrame Library

Project mention: A Polars exploration into Kedro | dev.to | 2023-05-17

The interesting thing about Polars is that it does not try to be a drop-in replacement to pandas, like Dask, cuDF, or Modin, and instead has its own expressive API. Despite being a young project, it quickly got popular thanks to its easy installation process and its “lightning fast” performance.

py

5 6,626 0.0 Jupyter Notebook

Repository to store sample Python programs for learning Python

pixie

19 5,273 9.4 C++

Instant Kubernetes-Native Application Observability

Project mention: Grafana Beyla: OSS eBPF auto-instrumentation for application observability | news.ycombinator.com | 2023-09-13

lux

6 4,915 2.2 Python

Automatically visualize your pandas dataframe via a single print! 📊 💡 (by lux-org)

pandas-ta

17 4,732 0.0 Python

Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 150+ Indicators

Project mention: Help recreating ta-lib python MACDFIX in pure python | /r/algotrading | 2023-05-03

I do not know what the difference between MACD and MACDFIX is, but maybe you can take a look at how MACD is implemented in the pandas_ta library and modify it a bit to achieve the behavior you want.
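As a starting point for that kind of modification, the textbook MACD calculation can be sketched in pure Python. This is not the pandas_ta or TA-Lib implementation: the 12/26/9 periods are the conventional defaults, and seeding each EMA with the first value is a simplifying assumption, so early values will differ from libraries that use a warm-up period.

```python
def ema(values, period):
    """Exponential moving average, seeded with the first value.

    Assumes a non-empty list; seeding with values[0] is a simplification.
    """
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(close, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) as equal-length lists."""
    fast_ema = ema(close, fast)
    slow_ema = ema(close, slow)
    # MACD line: fast EMA minus slow EMA.
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    # Signal line: EMA of the MACD line.
    signal_line = ema(macd_line, signal)
    # Histogram: MACD line minus signal line.
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


prices = [100.0 + i for i in range(40)]  # made-up, steadily rising prices
macd_line, signal_line, histogram = macd(prices)
print(round(macd_line[-1], 4))
```

Keeping `ema` as a separate function makes it easy to swap in a different smoothing constant or seeding rule when trying to match another library's output.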

danfojs

2 4,649 0.6 TypeScript

Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.

orange

27 4,604 9.6 Python

🍊 📊 💡 Orange: Interactive data analysis

Project mention: Hierarchical Clustering | news.ycombinator.com | 2024-04-20

I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.
Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.
https://orangedatamining.com/
https://orange3.readthedocs.io/projects/orange-visual-progra...

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Pandas related posts

Show HN: Hashquery, a Python library for defining reusable analysis
1 project | news.ycombinator.com | 23 Apr 2024
The Design Philosophy of Great Tables (Software Package)
7 projects | news.ycombinator.com | 4 Apr 2024
Show HN: Use an "eraser" to clean data on flight without breaking your workflow
1 project | news.ycombinator.com | 15 Mar 2024
Ibis: The portable Python dataframe library
1 project | news.ycombinator.com | 13 Mar 2024
Ask HN: Problems worth solving with a low-code back end?
2 projects | news.ycombinator.com | 12 Mar 2024
Welcome to 14 days of Data Science!
1 project | dev.to | 7 Mar 2024
Excel Anonymizer-A Python script to anonymize data in Excel files
1 project | news.ycombinator.com | 5 Mar 2024

Index

What are some of the best open-source Pandas projects? This list will help you:

#	Project	Stars
1	Pandas	41,923
2	PythonDataScienceHandbook	41,407
3	30-Days-Of-Python	31,031
4	tqdm	27,405
5	data-science-ipython-notebooks	26,459
6	Data-Science-For-Beginners	26,290
7	datasets	18,376
8	ydata-profiling	12,022
9	Dask	11,982
10	seaborn	11,946
11	yfinance	11,778
12	pandas_exercises	10,159
13	pygwalker	9,759
14	modin	9,465
15	mlcourse.ai	9,390
16	visidata	7,409
17	cudf	7,274
18	py	6,626
19	pixie	5,273
20	lux	4,915
21	pandas-ta	4,732
22	danfojs	4,649
23	orange	4,604