Sonar helps you commit clean code every time. With over 225 unique rules to find Python bugs, code smells & vulnerabilities, Sonar finds the issues while you focus on the work.
Top 23 Python Data Analysis Projects
-
Scikit-learn: A Python module for machine learning built on top of SciPy
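Every scikit-learn estimator follows the same fit/predict/score pattern. The sketch below shows that pattern using the iris dataset bundled with the library. LogisticRegression and the 75/25 split are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small built-in dataset: 150 iris flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows for evaluation; a fixed seed keeps the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative model choice; any estimator exposes the same fit/score methods.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# score() returns mean accuracy for classifiers.
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Swapping `LogisticRegression` for another classifier, such as a random forest, leaves the rest of the code unchanged. That shared interface is the main design idea behind the library.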
-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Pandas is a go-to tool for tabular data management, processing, and analysis in Python, but sometimes you may want to go from pandas to SQL.
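Here is a minimal sketch of going from pandas to SQL. It writes to an in-memory SQLite database through the standard-library `sqlite3` module, and the table name `weather` and its columns are hypothetical. Other databases would normally be reached through a SQLAlchemy engine instead of a raw connection.

```python
import sqlite3

import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Pune"], "temp_c": [4.5, 19.0, 31.2]})

# In-memory SQLite database; DataFrame.to_sql accepts a sqlite3 connection directly.
conn = sqlite3.connect(":memory:")
df.to_sql("weather", conn, index=False)

# Query the table back with plain SQL; the result is again a DataFrame.
warm = pd.read_sql_query(
    "SELECT city, temp_c FROM weather WHERE temp_c > 10 ORDER BY temp_c", conn
)
print(warm)
conn.close()
```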
-
streamlit
Project mention: What are you guys using for making GUIs nowadays? | reddit.com/r/Python | 2023-01-26
- For a PoC / localhost / web usage: https://streamlit.io/
-
best-of-ml-python
For Python, here's a nice compilation: https://github.com/ml-tooling/best-of-ml-python/blob/main/RE...
-
pandas-profiling
Project mention: pandas-profiling VS Rath - a user suggested alternative | libhunt.com/r/pandas-profiling | 2023-01-12
-
mlcourse.ai
Project mention: mlcourse.ai: NEW Courses - star count:8584.0 | reddit.com/r/algoprojects | 2023-02-01
-
statsmodels
Project mention: [P] statsmodels.tsa.holtwinters.ExponentialSmoothing results in NaN forecasts and parameters when fitting on entire dataset using known parameters from training model. | reddit.com/r/MachineLearning | 2022-11-19
I reckon you're more likely to get a good response on their GitHub page than here, unless a dev happens to see this post.
-
pyod
Project mention: Pyod – A Comprehensive and Scalable Python Library for Outlier Detection | news.ycombinator.com | 2022-08-10
-
knowledge-repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
While it's a start, I think that just being a Markdown editor is not enough; GitHub and GitLab already have this sort of wiki. I feel something like https://github.com/airbnb/knowledge-repo provides a better experience, since it gives data scientists an incentive to keep their source notebooks well documented and to make them a single source of truth (SSoT). With a wiki, if you change something in the original project, you need to remind yourself to update your reports. If your notebook is itself your report, that's not necessary. Plus, it would benefit from the semantic diffs that DagsHub has already implemented.
-
missingno
Project mention: #VisualizationTip: Using Seaborn(Heatmap) to visualize Missing data( Yellow- Representation of a column's missing data.) | reddit.com/r/datascience | 2022-10-04
Good job, but I would recommend missingno; it's a powerful module for visualizing missing values.
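The seaborn heatmap from the post and missingno's plots both draw the same underlying object: a boolean mask that marks which cells are missing. The sketch below computes that mask with pandas alone. The data frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps; both np.nan and None count as missing.
df = pd.DataFrame(
    {
        "age": [34, np.nan, 29, np.nan],
        "income": [52_000, 61_000, None, 48_000],
        "city": ["Oslo", "Lima", "Pune", "Rome"],
    }
)

# Boolean mask: True where a value is missing (the matrix a heatmap would draw).
mask = df.isna()

# Share of missing values per column, a quick numeric summary of the same picture.
missing_share = mask.mean()
print(missing_share)
```

To plot the result, `seaborn.heatmap(mask, cbar=False)` or `missingno.matrix(df)` should produce the kind of chart discussed in the post. Those calls are omitted here so the snippet runs without plotting dependencies.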
-
plotnine
Project mention: Is R or Python an EASIER option for non-CS/SE grads? | reddit.com/r/datascience | 2022-12-12
You could use plotnine if you like the grammar of graphics concept: https://plotnine.readthedocs.io/en/stable/
-
AWS Data Wrangler
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Project mention: I agree that Arrow Tables are great, but we decided to keep the library focused on the Pandas interface. [wont implement] | reddit.com/r/programmingcirclejerk | 2022-09-21
-
flyte
Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale. It has been battle-tested at Lyft, Spotify, Freenome, and others and is truly open-source.
Have you looked at flyte.org? It aims to bring "versioning", "compute" and "reproducibility" together in one package.
-
igel
a delightful machine learning tool that allows you to train, test, and use models without writing code
-
pandas-datareader
Project mention: pandas datareader get_data_yahoo(), Fix question, I have heard they changed something to make it secure | reddit.com/r/api | 2023-01-21
-
nannyml
Detecting silent model failure. NannyML estimates performance for regression and classification models using tabular data. It alerts you when and why it changed. It is the only open-source library capable of fully capturing the impact of data drift on performance.
Project mention: [HIRING][Full Time, Part Time, Temporary, Internship, Freelance] Data Science Intern (Remote) | reddit.com/r/jobbit | 2022-05-20
Description: NannyML, creators of an open-source Python library, is looking for multiple data science interns to help across research, prototyping, and product. GitHub: https://github.com/NannyML/nannyml
About Us: NannyML is an Open Source Python lib …
-
pycm
Project mention: PyCM 3.8 Released: Distance/Similarity Support | reddit.com/r/coolgithubprojects | 2023-02-01
-
Optimus
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark (by ironmussa)
-
DataProfiler
Project mention: Release 0.8.3 · capitalone/DataProfiler | reddit.com/r/LanguageTechnology | 2022-11-14
-
siuba
I don't know what's best for you, but I can recommend siuba, a tidy interface for Python that sends queries to pandas and SQL databases.
-
Python Data Analysis related posts
- mlcourse.ai: NEW Courses - star count:8584.0
- What are the best Python libraries to learn for beginners?
- Leveraging the pipe method to write beautiful and concise data transformations in pandas
- Pandas Illustrated. The Definitive Visual Guide to Pandas.
-
A note from our sponsor - Sonar
www.sonarsource.com | 1 Feb 2023
Index
What are some of the best open-source Data Analysis projects in Python? This list will help you:
Rank | Project | Stars
---|---|---
1 | scikit-learn | 52,699
2 | Pandas | 36,692
3 | streamlit | 22,333
4 | best-of-ml-python | 12,524
5 | pandas-profiling | 10,067
6 | mlcourse.ai | 8,586
7 | statsmodels | 8,135
8 | pyod | 6,677
9 | akshare | 5,861
10 | knowledge-repo | 5,260
11 | missingno | 3,440
12 | plotnine | 3,336
13 | AWS Data Wrangler | 3,297
14 | flyte | 3,039
15 | igel | 3,023
16 | pandas-datareader | 2,555
17 | sweetviz | 2,305
18 | Cubes | 1,481
19 | nannyml | 1,362
20 | pycm | 1,347
21 | Optimus | 1,337
22 | DataProfiler | 1,084
23 | siuba | 985