code-align-evals-data

By openai


code-align-evals-data reviews and mentions

Posts with mentions or reviews of code-align-evals-data. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-03.
  • Replit's new Code LLM was trained in 1 week
    12 projects | news.ycombinator.com | 3 May 2023
    deduplication. We first split the files into words/tokens based on non-alphanumeric characters and remove files with fewer than 10 tokens. Next, we compute the MinHash with 256 permutations of all documents, and use Locality Sensitive Hashing to find clusters of duplicates. We further reduce these clusters by ensuring that each file in the original cluster is similar to at least one other file in the reduced cluster. We consider two files similar when their Jaccard similarity exceeds 0.85.

    Near-duplicates are still difficult to measure, so we should expect some duplication to remain, and the amount should be proportional to the number of samples we have (even if the variance stays the same, though I'd wager the variance is higher when there is more duplication).

    [0] https://github.com/openai/code-align-evals-data/tree/97446d9...

    [1] https://arxiv.org/abs/2211.15533
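
The deduplication steps quoted in the post above can be sketched roughly as follows. This is a minimal illustration, not the pipeline any of the cited projects actually ran. The token split, the 10-token minimum, the 256 permutations, and the 0.85 Jaccard threshold come from the quoted text. Everything else is an assumption made for the example: the seeded BLAKE2b hashes standing in for permutations, the 32 bands of 8 rows, the exact Jaccard check on candidate pairs, and the function names. The cluster-reduction step from the quote is left out.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations

NUM_PERM = 256     # number of MinHash permutations (from the quoted text)
MIN_TOKENS = 10    # files with fewer tokens are dropped (from the quoted text)
THRESHOLD = 0.85   # Jaccard similarity cutoff (from the quoted text)
BANDS = 32         # assumption: LSH banding of 32 bands x 8 rows
ROWS = NUM_PERM // BANDS


def tokenize(text):
    """Split a file into tokens on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def token_hash(token, seed):
    """Seeded 64-bit hash; one seed plays the role of one permutation."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(tokens):
    """MinHash signature: the minimum hash of the token set under each seed."""
    return tuple(min(token_hash(t, seed) for t in tokens) for seed in range(NUM_PERM))


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def near_duplicate_pairs(files):
    """Return the set of (name, name) pairs whose Jaccard similarity exceeds THRESHOLD.

    `files` maps a file name to its text.
    """
    # Drop short files and keep the unique tokens of the rest.
    token_sets = {}
    for name, text in files.items():
        tokens = tokenize(text)
        if len(tokens) >= MIN_TOKENS:
            token_sets[name] = set(tokens)

    # LSH: files that share every row of at least one band land in one bucket.
    buckets = defaultdict(set)
    for name, tokens in token_sets.items():
        signature = minhash(tokens)
        for band in range(BANDS):
            key = (band, signature[band * ROWS:(band + 1) * ROWS])
            buckets[key].add(name)

    # Candidate pairs come from shared buckets; confirm each with exact Jaccard.
    candidates = set()
    for names in buckets.values():
        candidates.update(combinations(sorted(names), 2))
    return {
        (a, b)
        for a, b in candidates
        if jaccard(token_sets[a], token_sets[b]) > THRESHOLD
    }
```

Calling `near_duplicate_pairs({"a.py": src, "b.py": src})` with the same text under both names returns the pair `("a.py", "b.py")`, since identical token sets give identical signatures. Pairs just above the threshold are found with high probability but not with certainty, which is consistent with the comment that near-duplicates remain hard to measure.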

Stats

Basic code-align-evals-data repo stats
2
24
10.0
almost 3 years ago
