|15 days ago||5 days ago|
|BSD 3-clause "New" or "Revised" License||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
20+ Free Tools & Resources for Machine Learning
5 projects | dev.to | 31 Mar 2022
Compose: Compose targets the labeling of raw data, letting you define labeling functions for your data in Python to make the labeling process easier.
[P] We are building a curated list of open source tooling for data-centric AI workflows, looking for contributions.
12 projects | reddit.com/r/MachineLearning | 3 Mar 2023
You definitely forgot https://www.kern.ai/ :)
GPT and BERT: A Comparison of Transformer Architectures
2 projects | dev.to | 9 Feb 2023
Get it for free here: https://github.com/code-kern-ai/refinery
Drastically decrease the size of your Docker application
2 projects | dev.to | 3 Jan 2023
Containers are amazing for building applications because they allow you to pack up a program together with all its dependencies and execute it wherever you like. That is why our application consists of 20+ individual containers, forming our data-centric IDE for NLP, which you can check out here: https://github.com/code-kern-ai/refinery.
Introducing bricks, an open-source content-library for NLP
2 projects | dev.to | 8 Dec 2022
Today we launched bricks, an open-source library which provides enrichments for your natural language processing projects. Our main goal with bricks is to shorten the amount of time that you need from idea to implementation. Bricks also seamlessly integrates into our main tool, the Kern AI refinery.
How to fine-tune your embeddings for better similarity search
2 projects | dev.to | 15 Sep 2022
This blog post will share our experience with fine-tuning sentence embeddings on a commonly available dataset using similarity learning. We additionally explore how this could benefit the labeling workflow in the Kern AI refinery. To understand this post, you should know what embeddings are and how they are generated. A rough idea of what fine-tuning is also helps. All the code and data referenced in this post is available on GitHub.
Vector Databases for Data-Centric AI (Part 2)
3 projects | dev.to | 26 Aug 2022
Shout out to both Kern.AI (an excellent open-source NLP labelling tool) https://github.com/code-kern-ai/refinery and Voxel51 (an excellent open-source Computer Vision analysis tool) https://github.com/voxel51/fiftyone for being early adopters of the technology in their platforms, but I don't believe either has yet made use of all of the value it can provide.
Hacker News top posts: Jul 18, 2022
3 projects | reddit.com/r/hackerdigest | 18 Jul 2022
Show HN: If VS Code had a data-centric IDE sibling, what would that look like? (23 comments)
Show HN: If VS Code had a data-centric IDE sibling, what would that look like?
you can take a look at our architecture overview here: https://github.com/code-kern-ai/refinery#-architecture
A bit below it, you'll find a table with links to all the repositories. All of them are open-source. But thanks for the feedback, I'll try to make it a bit easier to understand! I appreciate that! :)
Hi Tom! Thanks, happy to hear that :)
We've focused on JSON as the user-specified data model. So you can upload anything fitting into a JSON. We're using pandas to process the uploaded data, so spreadsheets or CSV-ish also work.
We've got a public roadmap (https://github.com/code-kern-ai/refinery/projects/1), and we're looking forward to also integrating, e.g., native PDF labeling sometime soon.
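The "anything fitting into a JSON" idea from the comment above can be sketched roughly as follows. This is only an illustration, not refinery's actual upload code: the comment says refinery uses pandas, whereas this sketch sticks to the Python standard library, and the function names and the sample `id`/`text` fields are hypothetical.

```python
import csv
import io
import json


def records_from_json(text):
    """Parse JSON text into a list of record dicts."""
    data = json.loads(text)
    if isinstance(data, dict):
        data = [data]  # a single object becomes a one-record list
    return data


def records_from_csv(text):
    """Parse CSV text with a header row into the same record shape."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Hypothetical uploads: one JSON file and one CSV file with the same fields.
json_upload = '[{"id": "1", "text": "great product"}]'
csv_upload = "id,text\n2,slow delivery\n"

# Both sources end up as plain JSON-style records.
records = records_from_json(json_upload) + records_from_csv(csv_upload)
print(json.dumps(records, indent=2))
```

Note that the CSV reader returns every value as a string, so a real pipeline would likely need a type-conversion step that this sketch leaves out.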
What are some alternatives?
sqlx - 🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, SQLite, and MSSQL.
dbs-tools - Perl tools to transform account / transaction data from DBS Bank into proper CSV
lensm - Go assembly and source viewer
azuredatastudio - Azure Data Studio is a data management tool that enables working with SQL Server, Azure SQL DB and SQL DW from Windows, macOS and Linux.
mutate - A library to synthesize text datasets using Large Language Models (LLM)
refinery-sample-projects - Containing examples of projects you can use to test refinery. Please select the use case from the branches.
PostgreSQL - PostgreSQL client for node.js.
QDrant-NLP - QDrant-NLP
fiftyone - The open-source tool for building high-quality datasets and computer vision models
serde_postgres - Easily Deserialize Postgres rows.
nhost - The Open Source Firebase Alternative with GraphQL.
FreeDiscovery - Web Service for E-Discovery Analytics