Metric | toxicity | sourcegraph
---|---|---
Mentions | 11 | 69
Stars | 166 | 9,726
Growth | 0.0% | 1.0%
Activity | 0.0 | 10.0
Last commit | almost 2 years ago | 6 days ago
Language | - | Go
License | MIT License | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
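The exact formula behind the activity number is not given here. As a rough illustration only, the sketch below assumes an exponential decay for commit weights (a hypothetical 30-day half-life) and maps each project's weighted total to a 0-10 scale by the share of tracked projects it outranks, so that 9.0 corresponds to the top 10%. The function names and the half-life are assumptions, not the site's implementation.

```python
from datetime import date

# Hypothetical half-life: a commit's weight halves every 30 days (assumption).
HALF_LIFE_DAYS = 30.0


def weighted_commits(commit_dates, today):
    """Sum of per-commit weights; newer commits count more than older ones."""
    return sum(0.5 ** ((today - d).days / HALF_LIFE_DAYS) for d in commit_dates)


def activity_scores(raw_by_project):
    """Scale raw values to 0-10 by the share of projects ranked strictly below."""
    values = list(raw_by_project.values())
    total = len(values)
    return {
        name: round(10 * sum(v < raw for v in values) / total, 1)
        for name, raw in raw_by_project.items()
    }
```

With ten tracked projects whose raw values all differ, the most active one outranks nine of the ten and scores 9.0, which matches the "top 10%" reading above.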
toxicity
-
Perhaps It Is a Bad Thing That the Leading AI Companies Cannot Control Their AIs
I'm a PM at a human data company (https://www.surgehq.ai) that helps the large language model companies ensure their models are safe (we're the “clever prompt engineers” who helped Redwood assess their model performance).
We actually just published a blog today that includes our perspective on building “AI red teams” and best practices for AI alignment/safety: https://www.surgehq.ai/blog/ai-red-teams-for-adversarial-tra...
-
30% of Google's Emotions Dataset Is Mislabeled
I'd love to chat. Want to reach out to the email in my profile? I'm the founder of a much higher-quality data startup (https://www.surgehq.ai), and previously built the human computation platforms at a couple FAANGs.
We work with a lot of the top AI/NLP companies and research labs, and do both the "typical" data labeling work (sentiment analysis, text categorization, etc), but also a lot more advanced stuff (e.g., training coding assistants, evaluating the new wave of large language models, adversarial labeling, etc -- so not just distinguishing cats and dogs, but rather making full use of the power of the human mind!).
-
Building a No-Code Toxicity Classifier – By Talking to GitHub Copilot
> Rather than operating under a strict definition of toxicity, we asked our team to identify comments that they personally found toxic.
[0]: https://github.com/surge-ai/toxicity
-
Ask HN: Who is hiring? (January 2022)
Love language? So do we, and our mission is to infuse AI with that same love. At Surge, we're building the human infrastructure to power NLP — from detecting hate speech, to parsing complex documents, to injecting human values into the next wave of language models. Our first product is a platform that helps ML teams create amazing, human-powered datasets to train AI in the richness of language. We're a team of former Google, Facebook, and Airbnb engineering leads, and we work with top companies at the forefront of machine learning. Our tech stack is Ruby on Rails, React, and Python. We’re rapidly growing, and we're looking for full-stack engineers to join the team and develop our product. To apply, please email [email protected] with a resume and 2-3 sentences describing your interest in Surge. We love personal projects and writings too!
More information: https://www.surgehq.ai/about#careers
A blog post explaining the problems we are working to solve: https://www.surgehq.ai/blog/the-ai-bottleneck-high-quality-h...
- The Toxicity Dataset – building the largest free dataset of online toxicity
- [Free] The Toxicity Dataset — building the world's largest free dataset of online toxicity [Github]
- The Toxicity Dataset — building the world's largest free dataset of online toxicity
- The Toxicity Dataset (1000 social media comments) — any ideas for interesting visualizations? [github]
- The Toxicity Dataset - free dataset of online toxicity (Github) - could be used for interesting portfolio projects
- The Toxicity Dataset — free dataset of online toxicity (Github)
sourcegraph
-
Ask HN: Who is hiring? (March 2024)
Sourcegraph | REMOTE | Full-Time | Machine Learning Engineer, Developer Advocate, Enterprise Product Manager, Technical Advisor | https://sourcegraph.com
Sourcegraph is a code AI platform that makes it easy to read, write, and fix code–even in big, complex codebases.
We are building Cody, an AI coding assistant that uses code search and code intelligence to help devs quickly understand what's happening in code and generate new code that matches the best practices in your codebase. Cody supports AI-enabled autocompletion, fixing bugs, refactoring, test generation, code explanation, and answering high-level questions. You can read Steve Yegge's post on why Cody's code context engine differentiates it from the fast-moving field of AI dev tools: https://about.sourcegraph.com/blog/cheating-is-all-you-need.
Apply here: https://grnh.se/0572f98b4us
-
Architecture.md (2021)
That's pretty much what https://sourcegraph.com/ are selling, is it not?
-
Tell HN: GitHub is blocking search unless you are logged in
Despite their shitty rug-pull <https://github.com/sourcegraph/sourcegraph/pull/53345>, I do really like Sourcegraph and one doesn't (currently?!) need to be logged in to use it: https://sourcegraph.com/search and they have a handy rewrite pattern such that one can just plug the repo path into the URL for quick searching e.g. https://sourcegraph.com/github.com/JetBrains/intellij-commun...
-
My 2024 AI Predictions
- https://sourcegraph.com is pivoting and building a copilot application (named Cody). This is pretty good, since sourcegraph is great at understanding your code
-
The Curse of Docker
While a readable Dockerfile can work as documentation, there are a few caveats:
* the application needs to be designed to work outside containers (so, no hardcoded URLs, ports, or paths). Also, not directly related to containers, but it's nice if it can be easily compiled in most environments and not just on the base image.
* I still need a way to notify me of updates; if the Dockerfile just wgets a binary, this doesn't help me.
* The Dockerfiles need to be easy to find. Sourcegraph's don't seem to be referenced from the documentation, I had to look through their Github repos to find https://github.com/sourcegraph/sourcegraph/tree/main/docker-... (though most are bazel scripts instead of Dockerfiles, but serve the same purpose)
-
Building Reddit’s Design System on iOS
We use Sourcegraph, which is a tool that searches through code in repositories. We leverage this tool in order to understand the adoption curve of our components across all of Reddit. We have a dashboard for each of the platforms to compare the inclusion of RPL components over legacy components. These insights are helpful for us to make informed decisions on how we continue to drive RPL adoption. We love seeing the green line go up and the red line go down!
-
Launch HN: GitStart (YC S19) – Remote junior devs working on production PRs
SourceGraph: https://github.com/sourcegraph/sourcegraph/pulls?q=is%3Apr+a...
- Sourcegraph is no longer Open Source
What are some alternatives?
hate-speech-and-offensive-language - Repository for the paper "Automated Hate Speech Detection and the Problem of Offensive Language", ICWSM 2017
opengrok - OpenGrok is a fast and usable source code search and cross reference engine, written in Java
seldon-core - An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
tree-sitter - An incremental parsing system for programming tools
zotero - Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share your research sources.
Code-Server - VS Code in the browser
Fleet - Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
theia-apps - Theia applications examples - docker images, desktop apps, packagings
zenml - ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.
Vue Storefront - Alokai is a Frontend as a Service solution that simplifies composable commerce. It connects all the technologies needed to build and deploy fast & scalable ecommerce frontends. It guides merchants to deliver exceptional customer experiences quickly and easily.
datapane - Build and share data reports in 100% Python
Atheos - A self-hosted browser-based cloud IDE, updated from Codiad IDE