OpenKP vs BlingFire
Metric | OpenKP | BlingFire |
---|---|---|
Mentions | 1 | 2 |
Stars | 149 | 1,781 |
Growth | 1.3% | 0.6% |
Activity | 1.9 | 3.6 |
Last commit | 11 months ago | 6 months ago |
Language | Python | C++ |
License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
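As a worked illustration of the Growth figure, month-over-month growth can be read as a plain percentage change between two monthly star counts. The sketch below assumes that reading; the site does not state its exact formula, and the function name and star counts are hypothetical, chosen only for illustration.

```python
def month_over_month_growth(previous_stars: int, current_stars: int) -> float:
    """Percentage change in star count from one month to the next.

    Assumes the standard percentage-change definition; the tracking
    site's actual calculation is not documented here.
    """
    if previous_stars <= 0:
        raise ValueError("previous_stars must be positive")
    return (current_stars - previous_stars) / previous_stars * 100.0


# Hypothetical counts: a project that goes from 1,000 to 1,013 stars in a month.
growth = month_over_month_growth(1000, 1013)
print(f"{growth:.1f}%")
```

Under this reading, a small project can show a higher growth percentage than a large one while gaining fewer stars in absolute terms, which is consistent with the two projects compared in the table.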
OpenKP
- Ask HN: Who is hiring? (March 2021)
• Establish new benchmarks for natural language understanding tasks such as key phrase extraction. [https://github.com/microsoft/OpenKP]
We are looking for a passionate Applied Scientist with demonstrable skills in information retrieval, deep learning, natural language processing and/or large-scale distributed computing.
https://careers.microsoft.com/us/en/job/990004/Sr-Data-Appli...
BlingFire
- [D] SentencePiece, WordPiece, BPE... Which tokenizer is the best one?
SentencePiece -> implementation of some algorithms (there are several others, https://github.com/microsoft/BlingFire https://github.com/glample/fastBPE https://github.com/huggingface/tokenizers )
- Ask HN: Who is hiring? (March 2021)
• Develop the best technology to bring deep learning solutions to unprecedented scale, for example we built the world's fastest tokenizer. [https://github.com/microsoft/BlingFire]
What are some alternatives?
python-fake-data-producer-for-apache-kafka - The Python fake data producer for Apache Kafka® is a complete demo app allowing you to quickly produce JSON fake streaming datasets and push it to an Apache Kafka topic.
tokenizers - 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
autocomplete - IDE-style autocomplete for your existing terminal & shell
Mattermost - Mattermost is an open source platform for secure collaboration across the entire software development lifecycle.
sgr - sgr (command line client for Splitgraph) and the splitgraph Python library
Baserow - Open source no-code database and Airtable alternative. Create your own online database without technical experience. Performant with high volumes of data, can be self hosted and supports plugins
Stream-Framework - Stream Framework is a Python library, which allows you to build news feed, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
fargate-game-servers - This repository contains an example solution on how to scale a fleet of game servers on AWS Fargate on Elastic Container Service and route players to game sessions using a Serverless backend. Game Server data is stored in ElastiCache Redis. All resources are deployed with Infrastructure as Code using CloudFormation, Serverless Application Model, Docker and bash/powershell scripts. By leveraging AWS Fargate for your game servers you don't need to manage the underlying virtual machines.
Nightmare - A high-level browser automation library.
parabol - Free online agile retrospective meeting tool