BlingFire
sourcegraph
Metric | BlingFire | sourcegraph
---|---|---
Mentions | 2 | 69
Stars | 1,781 | 9,726
Growth | 0.3% | 1.0%
Activity | 3.6 | 10.0
Last commit | 6 months ago | 4 days ago
Language | C++ | Go
License | MIT License | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
BlingFire
-
[D] SentencePiece, WordPiece, BPE... Which tokenizer is the best one?
SentencePiece -> implementation of some algorithms (there are several others, https://github.com/microsoft/BlingFire https://github.com/glample/fastBPE https://github.com/huggingface/tokenizers )
-
Ask HN: Who is hiring? (March 2021)
• Develop the best technology to bring deep learning solutions to unprecedented scale, for example we built the world's fastest tokenizer. [https://github.com/microsoft/BlingFire]
sourcegraph
-
Ask HN: Who is hiring? (March 2024)
Sourcegraph | REMOTE | Full-Time | Machine Learning Engineer, Developer Advocate, Enterprise Product Manager, Technical Advisor | https://sourcegraph.com
Sourcegraph is a code AI platform that makes it easy to read, write, and fix code–even in big, complex codebases.
We are building Cody, an AI coding assistant that uses code search and code intelligence to help devs quickly understand what's happening in code and generate new code that matches the best practices in your codebase. Cody supports AI-enabled autocompletion, fixing bugs, refactoring, test generation, code explanation, and answering high-level questions. You can read Steve Yegge's post on why Cody's code context engine differentiates it from the fast-moving field of AI dev tools: https://about.sourcegraph.com/blog/cheating-is-all-you-need.
Apply here: https://grnh.se/0572f98b4us
-
Architecture.md (2021)
That's pretty much what https://sourcegraph.com/ are selling, is it not?
-
Tell HN: GitHub is blocking search unless you are logged in
Despite their shitty rug-pull <https://github.com/sourcegraph/sourcegraph/pull/53345>, I do really like Sourcegraph and one doesn't (currently?!) need to be logged in to use it: https://sourcegraph.com/search and they have a handy rewrite pattern such that one can just plug the repo path into the URL for quick searching e.g. https://sourcegraph.com/github.com/JetBrains/intellij-commun...
-
My 2024 AI Predictions
- https://sourcegraph.com is pivoting and building a copilot application (named Cody). This is pretty good, since sourcegraph is great at understanding your code
-
The Curse of Docker
While a readable Dockerfile can work as documentation, there are a few caveats:
* the application needs to be designed to work outside containers (so, no hardcoded URLs, ports, or paths). Also, not directly related to containers, but it's nice if it can be easily compiled in most environments and not just on the base image.
* I still need a way to notify me of updates; if the Dockerfile just wgets a binary, this doesn't help me.
* The Dockerfiles need to be easy to find. Sourcegraph's don't seem to be referenced from the documentation; I had to look through their GitHub repos to find https://github.com/sourcegraph/sourcegraph/tree/main/docker-... (most are Bazel scripts rather than Dockerfiles, but they serve the same purpose).
-
Building Reddit’s Design System on iOS
We use Sourcegraph, which is a tool that searches through code in repositories. We leverage this tool in order to understand the adoption curve of our components across all of Reddit. We have a dashboard for each of the platforms to compare the inclusion of RPL components over legacy components. These insights are helpful for us to make informed decisions on how we continue to drive RPL adoption. We love seeing the green line go up and the red line go down!
-
Launch HN: GitStart (YC S19) – Remote junior devs working on production PRs
SourceGraph: https://github.com/sourcegraph/sourcegraph/pulls?q=is%3Apr+a...
-
Sourcegraph is no longer Open Source
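One of the comments above mentions that a repository path can be plugged straight into the Sourcegraph URL for quick searching. A minimal Python sketch of that rewrite follows. The helper name `sourcegraph_repo_url` is hypothetical, and the sketch only builds the string: it does not check whether the repository is actually indexed on sourcegraph.com.

```python
from urllib.parse import quote

SOURCEGRAPH_BASE = "https://sourcegraph.com"


def sourcegraph_repo_url(repo_path: str) -> str:
    """Build a Sourcegraph URL from a code-host path like 'github.com/owner/repo'.

    Hypothetical helper for illustration; it follows the URL pattern
    described in the comment (base URL + "/" + repository path).
    """
    cleaned = repo_path.strip()
    # Accept a full URL as input by dropping the scheme, if present.
    for prefix in ("https://", "http://"):
        if cleaned.startswith(prefix):
            cleaned = cleaned[len(prefix):]
    # Remove leading/trailing slashes so the parts join cleanly.
    cleaned = cleaned.strip("/")
    if not cleaned:
        raise ValueError("repo_path must not be empty")
    # quote() keeps "/" intact by default and escapes unsafe characters.
    return f"{SOURCEGRAPH_BASE}/{quote(cleaned)}"


if __name__ == "__main__":
    print(sourcegraph_repo_url("github.com/sourcegraph/sourcegraph"))
    print(sourcegraph_repo_url("https://github.com/microsoft/BlingFire/"))
```

The helper accepts either a bare path or a full repository URL, so a link copied from the browser can be passed in unchanged.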
What are some alternatives?
tokenizers - 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
opengrok - OpenGrok is a fast and usable source code search and cross reference engine, written in Java
Mattermost - Mattermost is an open source platform for secure collaboration across the entire software development lifecycle.
tree-sitter - An incremental parsing system for programming tools
OpenKP - Automatically extracting keyphrases that are salient to the document meanings is an essential step to semantic document understanding. An effective keyphrase extraction (KPE) system can benefit a wide range of natural language processing and information retrieval tasks. Recent neural methods formulate the task as a document-to-keyphrase sequence-to-sequence task. These seq2seq learning models have shown promising results compared to previous KPE systems. The recent progress in neural KPE is mostly observed in documents originating from the scientific domain. In real-world scenarios, most potential applications of KPE deal with diverse documents originating from sparse sources. These documents are unlikely to have the structure and prose of scientific papers or to be as well written. They often have a much more diverse document structure and reside in various domains whose contents target much wider audiences than scientists. To encourage the research community to develop a powerful neural m
Code-Server - VS Code in the browser
sgr - sgr (command line client for Splitgraph) and the splitgraph Python library
theia-apps - Theia applications examples - docker images, desktop apps, packagings
python-fake-data-producer-for-apache-kafka - The Python fake data producer for Apache Kafka® is a complete demo app allowing you to quickly produce JSON fake streaming datasets and push it to an Apache Kafka topic.
Vue Storefront - Alokai is a Frontend as a Service solution that simplifies composable commerce. It connects all the technologies needed to build and deploy fast & scalable ecommerce frontends. It guides merchants to deliver exceptional customer experiences quickly and easily.
fargate-game-servers - This repository contains an example solution on how to scale a fleet of game servers on AWS Fargate on Elastic Container Service and route players to game sessions using a Serverless backend. Game Server data is stored in ElastiCache Redis. All resources are deployed with Infrastructure as Code using CloudFormation, Serverless Application Model, Docker and bash/powershell scripts. By leveraging AWS Fargate for your game servers you don't need to manage the underlying virtual machines.
Atheos - A self-hosted browser-based cloud IDE, updated from Codiad IDE