5 Best Public Datasets to Practice Your Data Analysis Skills

collection

The Museum of Modern Art (MoMA) collection data

The collection includes two datasets, ‘Artist’ and ‘Artwork’, each available in both CSV and JSON formats. You can either fork the repository or download the files directly from its GitHub page. However, the data is incomplete and should only be used for research purposes. That incompleteness is exactly what makes it a good practice dataset: it resembles a real-world scenario, where data is often missing.
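Because missing values are the main appeal here, a natural first exercise is counting the empty cells in each column. The following is a minimal sketch using only Python's standard `csv` module. The column names and rows in `SAMPLE` are hypothetical stand-ins, not an excerpt from the MoMA files.

```python
import csv
import io
from typing import Dict, TextIO


def missing_value_report(f: TextIO) -> Dict[str, int]:
    """Count empty cells per column in a CSV file that has a header row."""
    reader = csv.DictReader(f)
    counts: Dict[str, int] = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            value = row.get(name)
            # Treat absent (short row) or whitespace-only cells as missing.
            if value is None or value.strip() == "":
                counts[name] += 1
    return counts


# Hypothetical sample rows; the real files have different and more columns.
SAMPLE = """Title,Artist,Date,Medium
Untitled,Artist A,1960,Oil on canvas
Study,,1923,
Sketch,Artist B,,Pencil on paper
"""

if __name__ == "__main__":
    print(missing_value_report(io.StringIO(SAMPLE)))
    # {'Title': 0, 'Artist': 1, 'Date': 1, 'Medium': 1}
```

To run it on a downloaded file instead, open the file with `open(path, newline="", encoding="utf-8")` and pass that file object in place of the `io.StringIO` wrapper. Here `path` is whatever name you saved the file under.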

covid-19-data

A repository of data on coronavirus cases and deaths in the U.S.

The COVID-19 dataset is a time series of the cases reported each day in the United States. It is sourced from files released by The New York Times. The collection contains both historical data and live data that is updated frequently. The data is further broken down into 57 states and territories and more than 3,000 counties.
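A common first step with a time series like this is turning running totals into daily new cases by differencing consecutive days. The sketch below assumes a state-level CSV with the header `date,state,fips,cases,deaths` and a cumulative `cases` column. Check both assumptions against the file you actually download. The rows in `SAMPLE` are hypothetical and the numbers are made up.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def daily_new_cases(f: TextIO) -> Dict[str, List[Tuple[str, int]]]:
    """Turn cumulative per-state case counts into (date, new cases) pairs."""
    cumulative_by_state: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for row in csv.DictReader(f):
        cumulative_by_state[row["state"]].append((row["date"], int(row["cases"])))

    result: Dict[str, List[Tuple[str, int]]] = {}
    for state, rows in cumulative_by_state.items():
        rows.sort()  # ISO-8601 dates sort chronologically as plain strings
        previous = 0  # the first day per state is measured against zero
        daily = []
        for date, cumulative in rows:
            # Data corrections can lower a cumulative total, giving a negative value.
            daily.append((date, cumulative - previous))
            previous = cumulative
        result[state] = daily
    return result


# Hypothetical rows in the assumed layout; the numbers are made up.
SAMPLE = """date,state,fips,cases,deaths
2020-03-01,Washington,53,10,2
2020-03-02,Washington,53,18,6
2020-03-01,New York,36,1,0
2020-03-03,Washington,53,27,10
2020-03-02,New York,36,1,0
"""

if __name__ == "__main__":
    for state, series in daily_new_cases(io.StringIO(SAMPLE)).items():
        print(state, series)
```

Rows are grouped per state before sorting, so the input does not need to be ordered. The same approach applies to the county-level data if you group by county instead.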

data-movies

Download IMDb movie data and parse it into a useful form

But why do you need these scripts if IMDb already makes all of its data available? Because the IMDb datasets are raw and split across several text files. The Ruby scripts combine all of this information into a single CSV file, which makes it easier to analyze. The approach also ensures that we have access to the latest data, with fields such as:
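The scripts themselves are written in Ruby. The Python sketch below is not a port of them; it only illustrates the general idea of merging several raw IMDb files into one CSV. It assumes IMDb's current tab-separated layout, where `title.basics.tsv` and `title.ratings.tsv` share a `tconst` key and `\N` marks a missing value. The function name `merge_title_tables` is hypothetical, the rows are made up, and the real files contain more columns than shown.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

MISSING = "\\N"  # IMDb's marker for a missing value in its TSV files


def merge_title_tables(basics: TextIO, ratings: TextIO, out: TextIO) -> int:
    """Join movie rows from title.basics with title.ratings on `tconst`, write one CSV.

    Returns the number of movie rows written.
    """
    # Index ratings by title ID so each basics row needs only a dict lookup.
    rating_by_id: Dict[str, Tuple[str, str]] = {}
    for row in csv.DictReader(ratings, delimiter="\t", quoting=csv.QUOTE_NONE):
        rating_by_id[row["tconst"]] = (row["averageRating"], row["numVotes"])

    writer = csv.writer(out)
    writer.writerow(["tconst", "title", "year", "rating", "votes"])
    written = 0
    # QUOTE_NONE: titles may contain bare quote characters, so never treat them as quoting.
    for row in csv.DictReader(basics, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row["titleType"] != "movie":
            continue  # skip shorts, TV episodes, and other title types
        rating, votes = rating_by_id.get(row["tconst"], ("", ""))
        year = "" if row["startYear"] == MISSING else row["startYear"]
        writer.writerow([row["tconst"], row["primaryTitle"], year, rating, votes])
        written += 1
    return written


# Hypothetical rows (IDs, titles, and numbers are made up).
BASICS = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\n"
    "tt9000001\tshort\tExample Short\t1990\n"
    "tt9000002\tmovie\tExample Movie\t1999\n"
    "tt9000003\tmovie\tAnother Movie\t\\N\n"
)
RATINGS = (
    "tconst\taverageRating\tnumVotes\n"
    "tt9000001\t6.1\t85\n"
    "tt9000002\t7.5\t1200\n"
)

if __name__ == "__main__":
    buffer = io.StringIO()
    count = merge_title_tables(io.StringIO(BASICS), io.StringIO(RATINGS), buffer)
    print(count)
    print(buffer.getvalue())
```

Indexing the smaller ratings table in memory and streaming the larger basics table keeps memory use modest. Titles without a rating are kept with empty rating fields rather than dropped.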

git-lfs

Git extension for versioning large files

Note that, because of its large size, the dataset is versioned with the Git Large File Storage (LFS) extension, so you need Git LFS installed before you can use the data.
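If the repository is cloned without Git LFS, the large files are replaced by small text pointer files, which can be confusing when a data file appears to contain only three lines. The sketch below checks for that case. It assumes the pointer format from the Git LFS specification: a `version https://git-lfs.github.com/spec/v1` line followed by `oid sha256:<hash>` and `size <bytes>` lines. The 1024-byte cutoff is a heuristic of this sketch, and the function names are hypothetical. If a file turns out to be a pointer, running `git lfs install` once and then `git lfs pull` inside the clone should download the real content.

```python
from typing import Dict, Optional

# First line of a pointer file in the current Git LFS spec.
LFS_VERSION_LINE = b"version https://git-lfs.github.com/spec/v1\n"
# Heuristic cutoff: pointer files are tiny, real data files are not.
MAX_POINTER_BYTES = 1024


def parse_lfs_pointer(data: bytes) -> Optional[Dict[str, str]]:
    """Return the pointer's key/value fields, or None if `data` looks like real content."""
    if len(data) >= MAX_POINTER_BYTES or not data.startswith(LFS_VERSION_LINE):
        return None
    fields: Dict[str, str] = {}
    for line in data.decode("utf-8").splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # A usable pointer names its object by SHA-256 and records its size in bytes.
    if not fields.get("oid", "").startswith("sha256:") or not fields.get("size", "").isdigit():
        return None
    return fields


def is_lfs_pointer_file(path: str) -> bool:
    """True if the file at `path` is an LFS pointer rather than downloaded data."""
    with open(path, "rb") as f:
        # Reading at most the cutoff is enough: anything longer is not a pointer.
        return parse_lfs_pointer(f.read(MAX_POINTER_BYTES)) is not None


if __name__ == "__main__":
    # Placeholder pointer; the all-zero hash is not a real object ID.
    pointer = LFS_VERSION_LINE + b"oid sha256:" + b"0" * 64 + b"\n" + b"size 104857600\n"
    print(parse_lfs_pointer(pointer))
    print(parse_lfs_pointer(b"Title,Artist\nUntitled,Artist A\n"))
```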
