awesome-python-for-data-science
nist-crc-2023
Metric | awesome-python-for-data-science | nist-crc-2023
---|---|---
Mentions | 7 | 7
Stars | 68 | 27
Growth | - | -
Activity | 6.6 | 4.3
Last commit | 2 days ago | 10 months ago
Language | Jupyter Notebook | Jupyter Notebook
License | - | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
awesome-python-for-data-science
- [D] Best tools to learn data science nowadays?
  We're updating our awesome-python-for-data-science repository.
- Embarking on a Journey of 99 Data Science Projects - From Beginner to Expert
  Sounds like an amazing journey! Feel free to add your projects to our awesome-python-for-data-science repo as you go! And in case you need a hand or feedback on the projects, we'll be happy to help at the Data-Centric AI Community.
- [D] What is the best way to learn machine learning?
  We've started a nice repo on the DS roadmap: https://github.com/Data-Centric-AI-Community/awesome-python-for-data-science/tree/main
- Where can I find data science projects to gain more experience.
  Hey! You can find several resources online; check out this repo. Also, if you're up for it, we are running a project on synthetic data (instructions are given weekly) in the Data-Centric AI Community. You'll find the #ds-projects channel and the #nist-challenge project, which we're currently working on.
- Hands-on Data-Centric AI: Data Preparation tuning - Why and how?
  We made a tutorial following a fully Data-Centric AI pipeline for fraud detection! The material is freely available, so let us know what you think! :)
  Hands-On Data-Centric Preparation Tuning – Why and How?
- I'm new to data science. Where to start?
  You're very welcome to join the Data-Centric AI Community. Take a look at our awesome-python-for-data-science repo: https://github.com/Data-Centric-AI-Community/awesome-python-for-data-science
nist-crc-2023
- Assessing the Quality of Synthetic Data with Data-centric AI
  Data Quality is key for all applications and models, and LLMs are no exception :) I've been working on a small community project with synthetic data using ydata-synthetic, and it really shows! Underrepresentation (category imbalance) and missing data are two of the main issues!
- Is anyone willing to work with us on a Synthetic Data Project?
  Hey everyone! At the Data-Centric AI Community, we have started a project around synthetic data.
- Where can I find data science projects to gain more experience.
  Hey! You can find several resources online; check out this repo. Also, if you're up for it, we are running a project on synthetic data (instructions are given weekly) in the Data-Centric AI Community. You'll find the #ds-projects channel and the #nist-challenge project, which we're currently working on.
- Have a lot of free time in my DS work, feel guilty about it, is it normal?
  To atone for your sins, I think you should consider mentoring others in your "free" time 😂 We sure could use someone with your skills in teaching a group of young data scientists in the making. We actually have started a project on Synthetic Data and we could definitely use your expertise!
- How can a correlation coefficient be "invalid"?
  I've been working on this NIST challenge. We're at week 3 and we're supposed to profile the data, but with ydata-profiling I'm getting invalid coefficients: some values return 'NaN'. Do you have any idea why this is happening? Thank you, guys.
- Datacamp Bootcamp!
  Projects for sure. Can I invite you to the Data-Centric AI Community? We're starting a project on synthetic data this week with more to come next month! :) Here's the repo for that one: https://github.com/Data-Centric-AI-Community/nist-crc-2023
- Synthetic Data Community Project
  Here's our repository: 🚀 https://github.com/Data-Centric-AI-Community/nist-crc-2023
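One of the mentions above asks how a correlation coefficient can come back as 'NaN'. The sketch below is a hedged illustration in plain Python, not ydata-profiling's implementation: it shows two general situations in which a Pearson coefficient is mathematically undefined, namely a column with zero variance and a pair of columns with fewer than two jointly observed values. The function name `pearson` and the sample lists are hypothetical, and whether either situation applies to the poster's dataset is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; NaN when undefined."""
    # Keep only rows where both values are present (NaN marks a missing value).
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < 2:
        # Too few complete rows to measure co-variation.
        return math.nan

    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)

    if var_x == 0 or var_y == 0:
        # A constant column has zero variance, so the ratio would be 0 / 0.
        return math.nan
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample columns for illustration only.
varying = [1.0, 2.0, 3.0, 4.0]
linear = [2.0, 4.0, 6.0, 8.0]
constant = [5.0, 5.0, 5.0, 5.0]
mostly_missing = [math.nan, math.nan, math.nan, 2.0]

print("varying vs linear:        ", pearson(varying, linear))
print("varying vs constant:      ", pearson(varying, constant))
print("varying vs mostly_missing:", pearson(varying, mostly_missing))
```

In practice, checking each column's number of distinct values and its share of missing values before computing correlations is a quick way to see whether either case is present.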
What are some alternatives?
ydata-synthetic - Synthetic data generators for tabular and time-series data
tdk-demo - This is a collection of TDK demo projects that use different databases and options
rgb-to-hex - Python script to convert an RGB text sequence into HEX Code
genalog - Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
ultimate-python - Ultimate Python study guide for newcomers and professionals alike. :snake: :snake: :snake:
SDV - Synthetic data generation for tabular data
ml-earth-observation-101 - An introduction to applying machine learning to satellite imagery (remote sensing).
gan-vae-pretrained-pytorch - Pretrained GANs + VAEs + classifiers for MNIST/CIFAR in pytorch.
mud-pi - A simple MUD server in Python, for teaching purposes, which could be run on a Raspberry Pi
python-tutorial - A Python 3 programming tutorial for beginners.
Data-Science-Resources - Data Science related resources and cheatsheets
learn oops in python - 📚 Playground and cheatsheet for learning Python. Collection of Python scripts that are split by topics and contain code examples with explanations.