Pandas

Top 23 Pandas Open-Source Projects

  • Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

    Project mention: Why isn't my CSV reader code working? | reddit.com/r/learnpython | 2023-05-25

  • PythonDataScienceHandbook

    Python Data Science Handbook: full text in Jupyter Notebooks

    Project mention: Book Recommendations | reddit.com/r/datascience | 2023-05-29

    I don't know what tools you will be using, but if you will be using Python, you can start with Python Data Science Handbook by Jake VanderPlas and Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, which gives a very good overview of data science and big data frameworks. PS: Jake's book is also available as Jupyter notebooks, so you can read and run the code at the same time.

  • data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  • tqdm

    A Fast, Extensible Progress Bar for Python and CLI

    Project mention: I have this function I have written that shows how much of a percentage is done given progress in a loop..so..if you are iterating through a loop that is 500 long, at 200 it says "40%",240 "48%", and so on, but, how do you just change the value on the screen, not print a new one on a new line? | reddit.com/r/learnpython | 2023-04-11

    I can recommend the tqdm package (https://github.com/tqdm/tqdm). You can replace the standard for statement with it, or use it with any other iterable. By default, it gives you a progress bar with a percentage and ETA, but you can also configure it to print only the percentage, if you want that. If you want to use print statements, adding \r at the beginning of the string and suppressing the line ending (print(..., end="", flush=True)) should also do the trick.

  • 30-Days-Of-Python

    30 days of Python programming challenge is a step-by-step guide to learning the Python programming language in 30 days. This challenge may take more than 100 days; go at your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw

    Project mention: 30 May 2023 - Daily Chat Thread | reddit.com/r/indonesia | 2023-05-29

  • Data-Science-For-Beginners

    10 Weeks, 20 Lessons, Data Science for All!

    Project mention: How do I reset my career after already getting my masters? | reddit.com/r/AskUK | 2022-08-20

  • datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: How to Train Large Models on Many GPUs? | news.ycombinator.com | 2023-02-11

    https://github.com/huggingface/datasets

    https://github.com/huggingface/transformers

  • Dask

    Parallel computing with task scheduling

    Project mention: A peek into Location Data Science at Ola | dev.to | 2022-09-26

    Data scientists work on phenomenally large datasets, and Dask is a handy tool for exploration within the confines of a single cloud VM or their local PCs. Location data visualization is an essential part of deciding further algorithm development and roadmap for projects. This lays the foundation for data engineering and science to work at scale, with petabytes of data.

  • seaborn

    Statistical data visualization in Python

    Project mention: Introducing seaborn-polars, a package allowing to use Polars DataFrames and LazyFrames with Seaborn | reddit.com/r/Python | 2023-05-15

    I'm sure that your package is great, but seaborn will soon support the DataFrame interchange protocol and will work relatively seamlessly with polars. https://github.com/mwaskom/seaborn/pull/3340

  • ydata-profiling

    Create HTML profiling reports from pandas DataFrame objects

    Project mention: Ydata-Profiling and Dask | news.ycombinator.com | 2023-05-19

    Hey guys,

    We were recently at the Dask Demo Day, and we're hoping to launch a new feature in ydata-profiling: support for Dask DataFrames!

    We're looking for Dask wizards to start collaborating on this feature, so if you're interested, please join us to define the project roadmap and help make it real.

    The current GitHub branch is here: https://github.com/ydataai/ydata-profiling/tree/feat/dask

    Dedicated Dask channel here: https://discord.gg/EHDBuSSDuy

  • yfinance

    Download market data from Yahoo! Finance's API

    Project mention: Pandas Datareader | reddit.com/r/Python | 2023-05-26

    You should give yfinance a try (GitHub).

  • pandas_exercises

    Practice your pandas skills!

    Project mention: Does CS 1301 cover topics like NumPy and pandas? | reddit.com/r/OMSA | 2023-02-09

    pandas notebook

  • mlcourse.ai

    Open Machine Learning Course

    Project mention: mlcourse.ai: NEW Courses - star count:8755.0 | reddit.com/r/algoprojects | 2023-05-06

  • modin

    Modin: Scale your Pandas workflows by changing a single line of code

    Project mention: A Polars exploration into Kedro | dev.to | 2023-05-17

    The interesting thing about Polars is that it does not try to be a drop-in replacement to pandas, like Dask, cuDF, or Modin, and instead has its own expressive API. Despite being a young project, it quickly got popular thanks to its easy installation process and its “lightning fast” performance.

  • visidata

    A terminal spreadsheet multitool for discovering and arranging data

    Project mention: Mapping LA's Soft-Story Building Earthquake Retrofit Program [OC] | reddit.com/r/dataisbeautiful | 2023-04-22

    Visidata - https://visidata.org

  • py

    Repository to store sample python programs for python learning

    Project mention: I'm in my 30's and never had a "real job" - I have depression and anxiety, how do I get my life in order? | reddit.com/r/findapath | 2022-10-02

  • cudf

    cuDF - GPU DataFrame Library

    Project mention: A Polars exploration into Kedro | dev.to | 2023-05-17

    The interesting thing about Polars is that it does not try to be a drop-in replacement to pandas, like Dask, cuDF, or Modin, and instead has its own expressive API. Despite being a young project, it quickly got popular thanks to its easy installation process and its “lightning fast” performance.

  • pygwalker

    PyGWalker: Turn your pandas dataframe into a Tableau-style User Interface for visual analysis

    Project mention: A blocky based CAD program | news.ycombinator.com | 2023-05-23

  • pixie

    Instant Kubernetes-Native Application Observability

    Project mention: Lens Dashboard for monitoring multiple AKS/EKS/... clusters | reddit.com/r/kubernetes | 2023-05-25

    Plenty of paid monitoring solutions out there. Instana is pretty slick. NewRelic has a new open source tool, https://github.com/pixie-io/pixie

  • lux

    Automatically visualize your pandas dataframe via a single print! 📊 💡 (by lux-org)

    Project mention: Name of library that creates multiple charts quickly | reddit.com/r/learnpython | 2023-02-04

  • danfojs

    Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.

  • orange

    🍊 📊 💡 Orange: Interactive data analysis

    Project mention: Why don't more people use Altair for python Visualizations instead of Plotly? | reddit.com/r/datascience | 2023-05-23

    You should also check out Orange Data Mining; it lets you create a lot of charts, filter data from one chart to another, build ML models, make predictions, and a lot more. And you can do it with zero code.

  • machine_learning_complete

    A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.
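
The tqdm entry above describes two ways to update a progress percentage in place instead of printing a new line each time: wrap the iterable you loop over (which is what tqdm does), or write a carriage return (\r) with no trailing newline. The sketch below combines both ideas using only the standard library. The `progress` function and its `total` and `stream` parameters are hypothetical names chosen for illustration; this is not tqdm's API or implementation.

```python
import sys
from typing import Iterable, Iterator, Optional, TextIO, TypeVar

T = TypeVar("T")


def progress(iterable: Iterable[T], total: Optional[int] = None,
             stream: Optional[TextIO] = None) -> Iterator[T]:
    """Yield items from `iterable`, redrawing a percentage on one line.

    Hypothetical helper for illustration only; tqdm's real API differs.
    """
    out = stream if stream is not None else sys.stdout
    if total is None:
        # Only works for sized iterables such as lists and ranges.
        total = len(iterable)  # type: ignore[arg-type]
    for done, item in enumerate(iterable, start=1):
        yield item  # the caller's loop body runs here
        pct = int(100 * done / total) if total else 100
        # "\r" returns the cursor to column 0, so this write overwrites
        # the previous percentage instead of starting a new line.
        out.write(f"\r{pct}%")
        out.flush()
    out.write("\n")  # end the line once the loop has run to completion
```

Used as `for row in progress(rows): ...`, a 500-item loop shows "40%" after item 200 and "48%" after item 240, overwriting the same line. With tqdm itself, the equivalent is `from tqdm import tqdm` followed by `for row in tqdm(rows): ...`, which also draws a bar and an ETA.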

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-05-29.

Index

What are some of the best open-source Pandas projects? This list will help you:

Rank Project Stars
1 Pandas 38,499
2 PythonDataScienceHandbook 38,486
3 data-science-ipython-notebooks 25,130
4 tqdm 24,881
5 30-Days-Of-Python 24,419
6 Data-Science-For-Beginners 19,173
7 datasets 16,271
8 Dask 11,051
9 seaborn 10,746
10 ydata-profiling 10,639
11 yfinance 9,540
12 pandas_exercises 9,009
13 mlcourse.ai 8,817
14 modin 8,662
15 visidata 6,587
16 py 6,184
17 cudf 5,540
18 pygwalker 5,474
19 pixie 4,596
20 lux 4,578
21 danfojs 4,212
22 orange 4,113
23 machine_learning_complete 4,105
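
The Dask entry (#8 above) describes the project as parallel computing with task scheduling, used for exploring very large datasets on a single VM or local PC. The core pattern behind that kind of tool is to split the data into partitions, compute a partial result for each partition independently (possibly in parallel), and then combine the partials. The standard-library sketch below illustrates that pattern for a mean. It is not Dask's API: `partition`, `sum_and_count`, and `partitioned_mean` are hypothetical names, and threads are used only to keep the example short (CPU-bound pure-Python work would need processes to run truly in parallel).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def partition(data: Sequence[float], n_parts: int) -> List[Sequence[float]]:
    """Split `data` into at most `n_parts` contiguous chunks."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_and_count(chunk: Sequence[float]) -> Tuple[float, int]:
    """Per-partition work: a partial result that can be combined later."""
    return sum(chunk), len(chunk)


def partitioned_mean(data: Sequence[float], n_parts: int = 4) -> float:
    """Mean computed partition by partition, then combined."""
    if not data:
        raise ValueError("mean of empty data")
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(sum_and_count, partition(data, n_parts)))
    # Averaging the per-chunk means would be wrong when chunks differ in
    # size, so combine the raw sums and counts instead.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count
```

For example, `partitioned_mean([1, 2, 3], n_parts=2)` splits the data into `[1, 2]` and `[3]` and returns 2.0, whereas averaging the two chunk means would give 2.25. Carrying (sum, count) pairs rather than means is what makes the per-partition results safe to combine.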