Top 23 Python Data Analysis Projects
scikit-learn: machine learning in Python
Project mention: scikit-learn test case results? | reddit.com/r/scikit_learn | 2022-01-05
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Project mention: Best Data Structure for this? | reddit.com/r/learnpython | 2022-01-17
If you really want to store it all (labels included) in one data structure, you should look up pandas.
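As a minimal sketch of what "labels included" means in practice, the snippet below stores a small table in a pandas DataFrame and reads values back by label. The names and numbers are made up for illustration and do not come from the original thread.

```python
import pandas as pd

# Hypothetical sample data: row labels are names, column labels are subjects.
scores = pd.DataFrame(
    {"math": [90, 72, 85], "physics": [88, 65, 91]},
    index=["alice", "bob", "carol"],
)

# Select by row label and column label rather than by integer position.
alice_math = scores.loc["alice", "math"]

# Column-wise statistics keep the column labels in the result.
column_means = scores.mean()

print(alice_math)
print(column_means)
```

Because both axes carry labels, lookups stay readable even after rows are reordered or filtered.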
Streamlit: The fastest way to build data apps in Python
Project mention: How to Build a Machine Learning Demo in 2022 | dev.to | 2022-01-16
So what if you want something almost as flexible as the full-stack approach, but without the development requirements? Well, you are in luck, because the past few years have seen the emergence of Python libraries that allow the creation of impressively interactive demos with only a few lines of code. In this article, we are going to focus on two of the most promising libraries: Gradio and Streamlit. There are notable differences between the two that will be explored below, but the high-level idea is the same: eliminate most of the painful back-end and front-end work outlined in the full-stack section, albeit at the cost of some flexibility.
Statsmodels: statistical modeling and econometrics in Python
Project mention: Advice required to choose appropriate software for an assignment | reddit.com/r/econometrics | 2021-04-26
Can't you get a student discount for Stata? R would definitely be able to handle everything. For Python, have a look through the statsmodels package: https://github.com/statsmodels/statsmodels
(JMLR'19) A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)
Project mention: [D] Unsupervised Outlier Detection - Advise Requested | reddit.com/r/MachineLearning | 2021-12-03
The source code and documentation of PyOD are the best survey of OOD. Besides, normalizing flows and VQ-VAE are also feasible.
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Project mention: How does everyone share their models etc. across teams for re-use effectively? | reddit.com/r/datascience | 2021-05-22
Create UIs for your machine learning model in Python in 3 minutes
Project mention: I automated my job over a year ago and haven't told anyone. | reddit.com/r/antiwork | 2022-01-12
Interesting, I had never heard of Tk or Qt. I've been using Streamlit and Gradio as GUIs for my Python scripts, which has been awesome, but something like Qt seems much more robust and customizable than what I'm using.
Missing data visualization module for Python.
Project mention: For all the python/pandas users out there I just released a bunch of UI updates to the free visualizer, D-Tale | reddit.com/r/algotrading | 2021-04-12
Analysis of "Missing" data using the missingno package is now available in a sliding side panel. Enlarge or download PNG files for matrix/bar/heatmap/dendrogram charts generated using missingno.
A delightful machine learning tool that allows you to train, test, and use models without writing code
Project mention: Train/fit, test, and use models without writing code | reddit.com/r/ArtificialInteligence | 2021-06-29
Link to the repo: https://github.com/nidhaloff/igel
Visualizer for pandas data structures
Project mention: Show HN: D-Tale, easy to use pandas GUI | news.ycombinator.com | 2021-11-01
A grammar of graphics for Python
Project mention: Should I learn matplotlib in 2022? | reddit.com/r/learnpython | 2022-01-09
If you are familiar with R or ggplot, I recommend using plotnine. It implements ggplot2 (the well-known graphics package for R) in Python. In fact, plotnine is just a wrapper of matplotlib. However, it is a little more convenient than pure matplotlib.
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Project mention: Automate some wrangling and data visualization in Python | reddit.com/r/aws | 2022-01-03
Extract data from a wide range of Internet sources into a pandas DataFrame.
Project mention: Best quantitative tools/repos/apis for Sentiment & Social Media analysis of individual Stock/Crypto tickers | reddit.com/r/algotrading | 2021-07-03
Also Yahoo continually takes steps to discourage programmatic access (the most recent attempt is happening right now: https://github.com/pydata/pandas-datareader/issues/868).
Visualize and compare datasets, target values and associations, with one line of code.
Project mention: Automated Data Profiling and Attribute Clustering using unsupervised ML techniques | reddit.com/r/datascience | 2021-07-03
Take a look at this package, which computes associations between variables, produces other visualizations, and can infer some types: https://github.com/fbdesignpro/sweetviz
Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale. It has been battle-tested at Lyft, Spotify, Freenome, and others and is truly open-source.
Project mention: Hacktoberfest: Flytesnacks Project "update tuple output examples" | dev.to | 2021-11-01
I chose the flytekit project, one of the component repos of Flyte, which provides the Python SDK and tools for the Flyte project.
Find your trading edge, using the fastest engine for backtesting, algorithmic trading, and research.
Project mention: Repost with explanation - OOS Testing cluster | reddit.com/r/algotrading | 2022-01-01
I second the idea of looking into software optimization, but there is no need to jump right to C. I would look at something like vectorbt. You get the speed of C running under the hood while staying in Python for your backtesting code.
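To illustrate the general idea of compiled code doing the work while the strategy stays in Python, here is a minimal sketch of a vectorized moving-average crossover backtest written with plain NumPy. It is not vectorbt's API; the function names `sma` and `crossover_returns`, the window sizes, and the price series are all hypothetical choices for this example.

```python
import numpy as np

def sma(prices, window):
    """Simple moving average; the first window-1 entries are NaN."""
    prices = np.asarray(prices, dtype=float)
    out = np.full(prices.shape, np.nan)
    # Prefix sums let each window total be computed with one subtraction.
    csum = np.cumsum(np.insert(prices, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def crossover_returns(prices, fast=2, slow=3):
    """Per-step returns of a long-or-flat crossover strategy."""
    prices = np.asarray(prices, dtype=float)
    # Long (1.0) when the fast average is above the slow one, else flat (0.0).
    # Comparisons involving NaN evaluate to False, so warm-up steps are flat.
    signal = (sma(prices, fast) > sma(prices, slow)).astype(float)
    asset_returns = prices[1:] / prices[:-1] - 1.0
    # Apply the previous step's signal to the current return to avoid look-ahead.
    return signal[:-1] * asset_returns

# Made-up price series for illustration only.
strategy_returns = crossover_returns([1.0, 2.0, 3.0, 4.0, 5.0])
print(strategy_returns)
```

No Python-level loop runs over the price series: every step is an array operation executed inside NumPy's compiled routines, which is the property the comment above is pointing at.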
Light-weight Python OLAP framework for multi-dimensional data analysis
Project mention: Building data analysis apps | reddit.com/r/Python | 2021-04-16
I'm looking for materials and tools to learn. I'm reading up on OLAP and cubes. I found cubes python package but it hasn't been updated in years. Could you give me some tips on what to learn in 2021?
Multi-class confusion matrix library in Python
Project mention: [P] PyCM 3.3 released: Comparison of Classifiers Based on Confusion Matrix | reddit.com/r/MachineLearning | 2021-10-27
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark (by ironmussa)
NFStream: a Flexible Network Data Analysis Framework.
Project mention: Open Source Deep Packet Inspection Using Python | news.ycombinator.com | 2021-07-02
GitHub project: https://github.com/nfstream/nfstream
Community feedback and contributions are welcome!
Python library for using dplyr-like syntax with pandas and SQL
Project mention: Going from R to Pandas: dplython vs dfply vs plydata | reddit.com/r/datascience | 2021-09-30
You should follow /u/the75th's advice. However, if you decide to buck that take, I'd look into siuba. I've never heard of those packages you've listed, and have doubts they'd be maintained.
What's in your data? Extract schema, statistics and entities from datasets
Project mention: Miller – tool for querying, shaping, reformatting data in CSV, TSV, and JSON | news.ycombinator.com | 2021-12-22
My team built a similar tool in Python to load any delimited file, json, parquet and Avro with one command:
Effectively loads anything into a dataframe
Python Data Analysis related posts
Best Data Structure for this?
1 project | reddit.com/r/learnpython | 17 Jan 2022
SEC Speed is a myth.
1 project | reddit.com/r/CFB | 15 Jan 2022
Open source projects that are good to read to learn best practices?
2 projects | reddit.com/r/cscareerquestions | 14 Jan 2022
5 Useful Pandas Methods You May Not Know Existed (Part 2)
1 project | reddit.com/r/Python | 9 Jan 2022
Career change - data analysis
1 project | reddit.com/r/AusFinance | 9 Jan 2022
Trading Algos - 5 Key Metrics and How to Implement Them in Python
4 projects | dev.to | 8 Jan 2022
scikit-learn test case results?
1 project | reddit.com/r/scikit_learn | 5 Jan 2022
What are some of the best open-source Data Analysis projects in Python? This list will help you:
|13||AWS Data Wrangler||2,445|