Our great sponsors
-
TheVault
[EMNLP 2023] The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
I essentially want to fine-tune LLaMA on a dataset that's geared towards code generation. After a bit of research I found TheVault which seems good enough for the job (let me know if there are better datasets tho).
github https://github.com/artidoro/qlora
NOTE:
The number of mentions on this list indicates mentions on common posts plus user suggested alternatives.
Hence, a higher number means a more popular project.