Our great sponsors
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
The collection includes two datasets - ‘Artist’ and ‘Artwork’ available in both CSV and JSON formats. The data can either be forked or downloaded directly from the GitHub page. However, the dataset has incomplete information and should only be used for research purposes. That is why it’s the perfect candidate as it resembles a real-world scenario where data is often missing.
The Covid-19 Dataset is a time-series data based on the daily cases reported in the United States. It is sourced from the files released by the New York Times. The collection contains both the historical and live data which gets updated often. The data is again subdivided into 57 states and 3000+ counties.
But why do you need these scripts if IMDB already makes all the data available for customers? Well, the IMDB Datasets are raw and subdivided into several text files. The Ruby scripts store all this information into a single CSV file making it easier to analyze. The approach also ensures that we have access to the latest data with fields such as:
Note that because of its large size, the dataset is versioned using the Git Large File Storage (LFS) extension. To make use of the data, the LFS extension is a prerequisite.