stat4701 vs hate-speech-project

| | stat4701 | hate-speech-project |
|---|---|---|
| Mentions | 1 | 1 |
| Stars | 2 | 6 |
| Growth | - | - |
| Activity | 10.0 | 10.0 |
| Latest Commit | about 9 years ago | over 1 year ago |
| Language | R | Python |
| License | - | - |
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
stat4701
Replit's new Code LLM was trained in 1 week
My favorite line from the HumanEval paper:
> It is important for these tasks to be hand-written, since our models are trained on a large fraction of GitHub, which already contains solutions to problems from a variety of sources.
So to answer your question: yes, the evaluation dataset is spoiled. You can find such unique, never-before-seen docstrings as
> For a given list of input numbers calculate the Mean Absolute Deviation around the mean of this dataset. Mean Absolute Deviation is the absolute difference between each element and a centerpoint (mean in this case)[0]
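For context, this is what the quoted docstring asks for; a minimal Python sketch of a solution (function name and signature assumed from the quoted prompt, not copied from the benchmark file):

```python
from typing import List

def mean_absolute_deviation(numbers: List[float]) -> float:
    # Mean of the dataset
    mean = sum(numbers) / len(numbers)
    # Average absolute distance of each element from that mean
    return sum(abs(x - mean) for x in numbers) / len(numbers)

# MAD of [1, 2, 3, 4]: mean is 2.5, deviations are 1.5, 0.5, 0.5, 1.5 -> 1.0
assert mean_absolute_deviation([1.0, 2.0, 3.0, 4.0]) == 1.0
```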
And here's a repo I found that is 8 years old[1]. But how about a more recent one that is even closer?[2] There are plenty more examples[3] (does anyone know how to actually limit the date to before 2021? Neither `pushed:<2021` nor the `created` qualifier works; date searching doesn't seem to work well).
[0] https://github.com/openai/code-align-evals-data/blob/97446d9...
[1] https://github.com/bertomartin/stat4701/blob/ec2b64f629cbbf6...
[2] https://github.com/danielwatson6/hate-speech-project/blob/64...
[3] https://github.com/search?q=abs%28x+-+mean%29+for+language%3...
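On the date-search aside: GitHub's `created`/`pushed` qualifiers expect full ISO dates, so `pushed:<2021-01-01` is the documented form, and they apply to repository search rather than code search, which is likely why the queries above appear broken. A minimal sketch against the REST repository-search API (the query string is illustrative; unauthenticated requests are heavily rate-limited):

```python
import requests

# Repository search accepts date qualifiers, but only with full
# ISO dates: `pushed:<2021` is rejected, `pushed:<2021-01-01` works.
# The code-search endpoint does not accept created/pushed at all.
query = "mean absolute deviation language:Python pushed:<2021-01-01"
resp = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": query},
    headers={"Accept": "application/vnd.github+json"},
)
resp.raise_for_status()
for repo in resp.json()["items"][:5]:
    print(repo["full_name"], repo["pushed_at"])
```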
What are some alternatives?
IF
code-align-evals-data
ReplitLM - Inference code and configs for the ReplitLM model family
trax - Trax — Deep Learning with Clear Code and Speed
mation-spec
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.