fma
SKAB
Metric | fma | SKAB |
---|---|---|
Mentions | 1 | 9 |
Stars | 2,108 | 295 |
Growth | - | - |
Activity | 0.0 | 4.8 |
Last commit | over 1 year ago | 8 months ago |
Language | Jupyter Notebook | Jupyter Notebook |
License | MIT License | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
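The page does not publish how the activity number is computed. As a rough illustration only, the sketch below assumes two things that are not stated in the source: commits are weighted by an exponential decay with a hypothetical 30-day half-life, and the score is ten times the fraction of tracked projects that are less active. The function names `recency_weighted_commits` and `activity_score` are made up for this example.

```python
from bisect import bisect_left


def recency_weighted_commits(commit_ages_days, half_life_days=30.0):
    """Sum commits so that recent ones count more than older ones.

    Assumption: exponential decay with a hypothetical half-life; a commit
    made `half_life_days` ago counts half as much as one made today.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_score(project_total, all_project_totals):
    """Map a weighted commit total to a 0-10 score by percentile rank.

    Assumption: score = 10 * (fraction of tracked projects that are
    strictly less active), so 9.0 means 90% of projects rank below it.
    """
    ranked = sorted(all_project_totals)
    less_active = bisect_left(ranked, project_total)
    return round(10 * less_active / len(ranked), 1)


# Toy data: 100 tracked projects with weighted totals 0..99.
tracked = list(range(100))
score = activity_score(90, tracked)
```

Under these assumptions, a project that is more active than 90 of the 100 tracked projects gets a score of 9.0, which matches the "top 10%" reading in the example above. The real ranking may use different weights and cutoffs.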
fma
Analyzing music to determine subgenre?
This dataset seems worth looking into: https://github.com/mdeff/fma. I think you'll have a hard time identifying subgenres since even people don't know what subgenre a song belongs to. It's a very subjective classification compared to distinguishing between main genres, e.g. rock, rap, and country. Also, from my work with the Spotify API, there are a lot of seemingly synonymous subgenres, which will make this task even more tedious (what is the difference between "pop dance" and "dance pop"?).
SKAB
What are some alternatives?
mac-miller-lyrics-dataset - Dataset with lyrics from Mac Miller
Tegridy-MIDI-Dataset - Tegridy MIDI Dataset for precise and effective Music AI models creation.
toiletmap - API/UI server for the Great British Public Toilet Map
raccoon_dataset - The dataset is used to train my own raccoon detector and I blogged about it on Medium
essentia - C++ library for audio and music analysis, description and synthesis, including Python bindings
indonlu - The first-ever vast natural language processing benchmark for Indonesian Language. We provide multiple downstream tasks, pre-trained IndoBERT models, and a starter code! (AACL-IJCNLP 2020)
covid19za - Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
COVID-CT - COVID-CT-Dataset: A CT Scan Dataset about COVID-19
clusterdata - cluster data collected from production clusters in Alibaba for cluster management research
medmcqa - A large-scale (194k), Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions.
TheVault - [EMNLP 2023] The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
openfema-samples - Code, dataset, and analysis samples that utilize the OpenFEMA API.