tidytuesday vs awesome-public-datasets
| | tidytuesday | awesome-public-datasets |
|---|---|---|
| Mentions | 79 | 77 |
| Stars | 6,387 | 58,391 |
| Growth | 1.8% | 0.8% |
| Activity | 8.4 | 5.1 |
| Latest commit | 5 days ago | 9 days ago |
| Language | HTML | |
| License | Creative Commons Zero v1.0 Universal | MIT License |
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
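The page does not state its exact growth formula. A minimal sketch, assuming "growth" is the simple percent change in star count from the previous month to the current one (the function name `mom_growth_pct` and the example counts are hypothetical):

```python
def mom_growth_pct(stars_prev: int, stars_now: int) -> float:
    """Percent change in stars from last month to this month.

    Assumption: growth is plain month-over-month percent change;
    the page does not publish its actual formula.
    """
    if stars_prev <= 0:
        raise ValueError("previous star count must be positive")
    return (stars_now - stars_prev) / stars_prev * 100.0


# Hypothetical counts: a repo going from 1,000 to 1,018 stars in a month.
print(f"{mom_growth_pct(1000, 1018):.1f}%")  # prints "1.8%"
```

Under this reading, a growth of 1.8% for a repo with 6,387 stars means it gained roughly 110 stars over the month.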
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
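The weighting behind the activity number is not published on the page. As a rough sketch only, assuming (1) recent commits are favored via exponential decay with a hypothetical 30-day half-life and (2) the 0 to 10 scale is a percentile rank among tracked projects, the idea could look like this (all names and parameters here are illustrative, not the site's method):

```python
from datetime import date


def recency_weighted_commits(commit_dates, today, half_life_days=30.0):
    """Sum of commit weights, where each weight halves every half_life_days.

    Assumption: exponential decay is one way to give recent commits
    higher weight; the real weighting scheme is not documented here.
    """
    total = 0.0
    for d in commit_dates:
        age_days = (today - d).days
        total += 0.5 ** (age_days / half_life_days)
    return total


def activity_score(project_score, all_scores):
    """Map a raw score to 0-10 by percentile rank among tracked projects.

    Assumption: 9.0 then means "at least as active as 90% of tracked
    projects", which matches the page's "top 10%" reading.
    """
    at_or_below = sum(1 for s in all_scores if s <= project_score)
    return 10.0 * at_or_below / len(all_scores)


today = date(2024, 1, 31)
commits = [date(2024, 1, 31), date(2024, 1, 1)]  # today and 30 days ago
print(recency_weighted_commits(commits, today))  # 1.0 + 0.5 = 1.5
print(activity_score(5, list(range(1, 11))))  # 5.0
```

Because the score is relative, a project's activity can drop even if its own commit rate is unchanged, simply because other tracked projects became more active.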
tidytuesday
-
Recommendation for interesting datasets to work with?
TidyTuesday is a weekly data cleaning project where a link to a new, interesting data source is posted each week: https://github.com/rfordatascience/tidytuesday
- rfordatascience/tidytuesday: Official repo for the tidytuesday project
- [OC] Tornados in the U.S. are becoming more frequent in off-peak months
-
Too old to continue my education? I'm lost.
For R, I don't have specific resources, but I remember I started out by doing the TidyTuesday challenges (https://github.com/rfordatascience/tidytuesday).
-
First Project
Tidy Tuesday has data and links to more data. The nice thing about those data sets is that you can search for what other people did with the data on social media (e.g. Twitter).
-
[OC] Popularity of Horror Movie Poster Color Schemes from 1970
Dataset: https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-11-01
-
Tips on getting experience in R on GitHub
What you're describing is contributing to open source. Some things I'd suggest doing: learn some git first; create a GitHub account and at least one practice repo; look at learning-community repos, like Tidy Tuesday; and follow R "power" users, people associated with RStudio, and similar folks on social media. Those folks will sometimes mention projects aimed at beginners.
-
[OC] 2021-22 EPL Home/Away Goal Differential
Data: TidyTuesday April 4
-
Publicly available datasets?
The Tidy Tuesday git repo has a lot of example datasets to work with.
-
[OC] Kyle Feldt and his Chevalier Sheriffs: An Infographic of Feldt's NRL Tries
I mostly use ggplot2 in R for visualisations, which means that The R Graph Gallery is my starting point for inspiration. The best thing to do is start with a simple idea that tells a story, and one of the best people out there who does this is Cedric Scherer. He is somewhat involved with the TidyTuesday project, which I wish I had more time to play around with; it is a great starting point for developing a library of vis techniques.
awesome-public-datasets
-
How to practice data analytics skills
Merry Christmas, buddy.
You'll find a ton of public datasets on GitHub [1].
Maven Analytics offers a monthly data analytics challenge [2] that you can enter for free. See their past competitions for some interesting datasets.
As I'm based in Ireland I'll also recommend the Irish Data Portal [3].
[1] https://github.com/awesomedata/awesome-public-datasets
- Are there people out there who still like Sam Altman - AI IS AT DANGER
-
The Data Engineering Docker-Compose Starter Kit
The “data.csv” file contains historical weather data from Tel Aviv, sourced from another article I wrote. If you wish, you can swap it with a public dataset, for example, from here.
-
suggestions for personal GitHub projects in economics/econometrics
If you want something personal and fun, you will need lots of different data. As such, you can take a look at these publicly available datasets. Maybe you can find out some interesting relationships.
-
Where do you get your data when you have an obscure idea for a dashboard?
Some others I use: https://data.world/search; this GitHub project with links: https://github.com/awesomedata/awesome-public-datasets; and data.fivethirtyeight.com. r/datasets and similar subreddits can also be of help.
-
Full Stack Data Science Project Ideas
There's a lot in here; it's a good start.
- Where to find big datasets?
- How do you keep track of useful datasets?
-
Complete: D214 - MSDA Capstone
GitHub: Awesome Public Datasets. I didn't find much of use here, as most of these were either very specialized or very large datasets. But maybe you'll find something of use here.
-
The internet has been “parametered”!
Of these thousands of open source data sets? https://github.com/awesomedata/awesome-public-datasets
What are some alternatives?
data - Data and code behind the articles and graphics at FiveThirtyEight
labelImg - LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source data labeling tool for images, text, hypertext, audio, video and time-series data.
gganimate - A Grammar of Animated Graphics
reddit-top-2.5-million - This is a dataset of the all-time top 1,000 posts, from the top 2,500 subreddits by subscribers, pulled from reddit between August 15–20, 2013.
cheatsheets - Posit Cheat Sheets - Can also be found at https://posit.co/resources/cheatsheets/.
tensorboard - TensorFlow's Visualization Toolkit
r4ds - R for data science: a book
big-mac-data - Data and methodology for the Big Mac index
zsv - zsv+lib: world's fastest (SIMD) CSV parser, bare metal or wasm, with an extensible CLI for SQL querying, format conversion and more
ggsunburst
quickdraw-dataset - Documentation on how to access and use the Quick, Draw! Dataset.