data
awesome-public-datasets
 | data | awesome-public-datasets |
---|---|---|
 | 116 | 77 |
Stars | 16,617 | 58,272 |
Growth | 0.3% | 0.6% |
Activity | 8.5 | 5.6 |
 | about 1 month ago | 4 months ago |
Language | Jupyter Notebook | |
License | Creative Commons Attribution 4.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
data
-
[USMNT] It only took 20 caps for Jesus Ferreira to get double-digit goals. The fastest in #USMNT history.
You of course already know this answer, but just to put it into more perspective. Here are the SPI ranking equivalents to what he did with these 11 goals in Scotland and Switzerland.
-
[Effortpost] Advanced stats on which players are contributing the most to the Heat's playoff run.
To answer these questions I decided to look at 538’s RAPTOR ratings. RAPTOR uses player tracking data to estimate how much each player contributes on the offensive and defensive ends. The total RAPTOR score should be something like the “number of points a player contributes to his team’s offense and defense per 100 possessions, relative to a league-average player.” Higher is better; the best during the regular season has been Nikola Jokic at +14. You can read more about it here or play with an interactive tool on their website here. I don’t really care about the details of why it’s a good statistic, but it seems pretty helpful and, most importantly for my purposes, you can download the data here for free.
-
Consanguineous marriage percentage per country
EDIT: I came to this data from this repository, which has a nice CSV collection for machine training.
-
USMNT is a European club. How did they do this season?
Looks like we may actually be collectively underrating our guys now. That's an interesting change. Based on SPI (rating = 72.4) we would be:
- Derrick White's WAR over the past season has been ~6.7 according to a composite of various metrics. Derrick White's WAR in the playoffs has been ~0.1 according to RAPTOR, the worst among the main Boston roster.
-
Nate Silver: Some personal news
Before Disney/ABC get any -ideas-, might be a good chance to get our hands on at least their data[0]!
-
In honor of Sexual Assault Awareness Month, make sure neither you nor friends harbor any misconceptions about consent
Most young women expect words to be involved when their partner seeks their consent. 43% of young men actually ask for verbal confirmation of consent. Overall, verbal indicators of consent or nonconsent are more common than nonverbal indicators. More open communication also increases the likelihood of orgasm for women.
- CMV: When selecting a movie to watch, the audience's rating is the only thing that matters and the critic's rating is entirely irrelevant.
-
Slight majority of people in WA want to leave state, poll finds
DHM does not use an equity sample. Of all polling operations, they rank 250 out of 517. I'd like to see another pollster: https://github.com/fivethirtyeight/data/blob/master/pollster-ratings/pollster-ratings.csv
-
Optimism is bad for your health. So lets just do some maths! How can Liverpool FC get top 4? part 2
LOL, my GitHub’s pretty sparse, but I’m pulling data from this API; 538 also provides the data they use for their club predictions here, if that interests you.
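The pollster comment above cites a rank ("250 out of 517") taken from a ratings CSV. As a rough sketch of how such a rank could be looked up, the snippet below sorts rows by a numeric score and reports a pollster's position. The sample rows, the column names `pollster` and `score`, and the helper `rank_pollster` are all hypothetical; the real pollster-ratings.csv may use a different schema and may already carry its own rank column.

```python
import csv
import io

# Hypothetical sample rows; the column names "pollster" and "score" are
# assumptions for illustration, not the real file's schema.
SAMPLE_CSV = """pollster,score
Alpha Polling,2.9
Beta Research,1.4
Gamma Surveys,2.1
Delta Insights,0.7
"""


def rank_pollster(csv_text, name, name_col="pollster", score_col="score"):
    """Return (rank, total) for `name`, where rank 1 is the highest score."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Sort descending by numeric score so the best-rated pollster comes first.
    ordered = sorted(rows, key=lambda r: float(r[score_col]), reverse=True)
    for position, row in enumerate(ordered, start=1):
        if row[name_col] == name:
            return position, len(ordered)
    raise KeyError(f"{name!r} not found in column {name_col!r}")


rank, total = rank_pollster(SAMPLE_CSV, "Gamma Surveys")
print(f"Gamma Surveys ranks {rank} out of {total}")
```

To try this against a downloaded copy of the file, read its text and pass the actual column names through `name_col` and `score_col` after checking the header row.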
awesome-public-datasets
-
How to practice data analytics skills
Merry Christmas buddy.
You'll find a ton of public datasets on GitHub [1].
Maven Analytics offers a monthly data analytics challenge [2] that you can enter for free. See their past competitions for some interesting datasets.
As I'm based in Ireland I'll also recommend the Irish Data Portal [3].
- Are there people out there who still like Sam atlman - AI IS AT DANGER
-
The Data Engineering Docker-Compose Starter Kit
The “data.csv” file contains historical weather data from Tel Aviv, sourced from another article I wrote. If you wish, you can swap it with a public dataset, for example, from here.
-
suggestions for personal GitHub projects in economics/econometrics
If you want something personal and fun, you will need lots of different data. As such, you can take a look at these publicly available datasets. Maybe you can find out some interesting relationships.
-
Where do you get your data when you have an obscure idea for a dashboard?
Some others I use: https://data.world/search, this GitHub project with links (https://github.com/awesomedata/awesome-public-datasets), and Data.fivethirtyeight.com. r/datasets and similar subreddits can also be of help.
-
Full Stack Data Science Project Ideas
There's a lot in here; it's a good start.
- Where to find big datasets?
- How do you keep track of useful datasets?
-
Complete: D214 - MSDA Capstone
GitHub: Awesome Public Datasets. I didn't find much of use here for me, as most of the datasets were either very specialized or very large. But maybe you'll find something of use here.
-
The internet has been “parametered”!
Of these thousands of open source data sets? https://github.com/awesomedata/awesome-public-datasets
What are some alternatives?
uawardata - The data behind uawardata.com
labelImg - LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source data labeling tool for images, text, hypertext, audio, video and time-series data.
tidytuesday - Official repo for the #tidytuesday project
ydata-quality - Data Quality assessment with one line of code
reddit-top-2.5-million - This is a dataset of the all-time top 1,000 posts, from the top 2,500 subreddits by subscribers, pulled from reddit between August 15–20, 2013.
quilt - Quilt is a data mesh for connecting people with actionable data
tensorboard - TensorFlow's Visualization Toolkit
CodeSearchNet - Datasets, tools, and benchmarks for representation learning of code.
zsv - zsv+lib: world's fastest (simd) CSV parser, bare metal or wasm, with an extensible CLI for SQL querying, format conversion and more
Video-Swin-Transformer - This is an official implementation for "Video Swin Transformers".
quickdraw-dataset - Documentation on how to access and use the Quick, Draw! Dataset.