skweak
jina
| | skweak | jina |
|---|---|---|
| | 8 | 126 |
| Stars | 909 | 20,041 |
| Growth | 0.2% | 1.7% |
| Activity | 6.2 | 9.1 |
| | 6 months ago | 9 days ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
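As a rough illustration of the growth figure, the sketch below converts a star count and a month-over-month percentage into an approximate number of stars gained in the last month. It assumes growth is defined as (current - previous) / previous; the site may compute it differently, and the function name is hypothetical.

```python
def stars_gained_last_month(stars_now: int, growth_pct: float) -> float:
    """Estimate stars gained in the last month.

    Assumption: growth_pct = 100 * (stars_now - stars_prev) / stars_prev.
    """
    stars_prev = stars_now / (1 + growth_pct / 100)
    return stars_now - stars_prev


# Figures taken from the comparison table above.
skweak_gain = stars_gained_last_month(909, 0.2)
jina_gain = stars_gained_last_month(20_041, 1.7)

print(f"skweak: about {skweak_gain:.0f} new stars")
print(f"jina:   about {jina_gain:.0f} new stars")
```

Under that assumption, the same percentage translates into very different absolute numbers for a small and a large repository.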
skweak
- Entity Extraction with Predefined List
Thanks for pointing me in the right direction. Seems like there are a few other approaches with weak supervision: https://github.com/NorskRegnesentral/skweak
- [P] Programmatic: Powerful Weak Labeling
Code for https://arxiv.org/abs/2104.09683 found: https://github.com/NorskRegnesentral/skweak
- Show HN: Programmatic – a REPL for creating labeled data
Hi, Raza here, one of the other co-founders.
I know that HN likes to nerd out over technical details, so I thought I’d share a bit more on how we aggregate the noisy labels to clean them up.
At the moment we use the great Skweak [1] open-source library to do this. Skweak uses an HMM to infer the most likely unobserved label given the evidence of the votes from each of the labelling functions.
This whole strategy of first training a label model and then training a neural net was pioneered by Snorkel. We’ve used this approach for now, but we actually think there are big opportunities for improvement.
We’re working on an end-to-end approach that de-noises the labelling functions and trains the model at the same time. So far we’ve seen improvements on the standard benchmarks [2] and are planning to submit to NeurIPS.
R
[1]: Skweak package: https://github.com/NorskRegnesentral/skweak
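The HMM-based aggregation described in the comment above can be sketched as a small Viterbi decoder over labelling-function votes. This is not skweak's API or its actual model: the states, start and transition probabilities, and per-function accuracies are hypothetical, hand-set values, whereas a real label model would estimate such parameters from the data.

```python
import math

STATES = ["O", "ENT"]

# Hypothetical, hand-set parameters (a real label model would learn these).
START = {"O": 0.8, "ENT": 0.2}
TRANS = {"O": {"O": 0.8, "ENT": 0.2}, "ENT": {"O": 0.4, "ENT": 0.6}}
ACCURACY = [0.9, 0.7, 0.6]  # assumed accuracy of each labelling function


def emission_logprob(state, votes):
    """Log-probability of the observed votes given the hidden label.

    Each function is assumed to vote independently; abstentions (None)
    carry no evidence and are skipped.
    """
    total = 0.0
    for acc, vote in zip(ACCURACY, votes):
        if vote is None:
            continue
        p = acc if vote == state else (1.0 - acc) / (len(STATES) - 1)
        total += math.log(p)
    return total


def viterbi(vote_sequence):
    """Return the most likely hidden label sequence for the votes."""
    best = {s: math.log(START[s]) + emission_logprob(s, vote_sequence[0])
            for s in STATES}
    backpointers = []
    for votes in vote_sequence[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS[p][s]))
            new_best[s] = (best[prev] + math.log(TRANS[prev][s])
                           + emission_logprob(s, votes))
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# One row per token, one vote per labelling function (None = abstain).
votes = [
    ["ENT", "ENT", None],
    ["ENT", None, "O"],
    ["O", "O", "O"],
    [None, "O", None],
]
labels = viterbi(votes)
print(labels)
```

On the second token the functions disagree, and the decoder sides with the more accurate function and the preceding entity label, which is the kind of de-noising the sequence model is meant to provide.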
- The hand-picked selection of the best Python libraries released in 2021
skweak.
- Skweak: Weak Supervision for NLP
- Inevitable Manual Work involved in NLP
For more advanced unsupervised labeling, you should check out skweak
- How to get Training data for NER?
I'm the main developer behind skweak by the way, happy to hear you're interested in our toolkit :-) We do already have a small list of products (see https://github.com/NorskRegnesentral/skweak/blob/main/data/products.json) extracted from DBPedia and Wikidata, but it may not be exactly the type of products you're looking for.
jina
- Jina.ai: Self-host Multimodal models
- FLaNK Stack Weekly for 30 Oct 2023
- Cross data type search that wasn’t supported well using Elasticsearch
Jina mainly because of their use of neural networks and AI.
- Recommend a Lightweight Launcher with Nested Folders
- I plan to build my own AI powered search engine for my portfolio. Do you know ones that are open-source?
Jina - It’s an open-source project for building search engines. It may not be no-code, but it claims you need only a few lines of code to create a project. The project supports semantic, text, image, audio, and video search. I’m also interested in their neural search and generative AI work, and in the number of GitHub repos they have. I have this on my radar since it is also something I was interested in.
- How can we match images in our database?
Do you guys have any ideas on how we can match images in our database? We’re working on a project about matching images in our database. We tried SIFT and some other similar methods, but for some reason nothing seems to work that well. Does anyone have suggestions for the most effective way to do this? Maybe some open-source solutions like HuggingFace or Jina AI? We just want to make sure our image matching is correct, and that part has been a bit of a struggle for us.
- Can AI 3D model search engines be a thing this year?
The tech lets you find 3D models without sifting through tons of text. An information retrieval framework does the heavy lifting and compares models to each other, with no descriptions or keywords needed.
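One common way to compare items to each other without keywords is to represent each item as an embedding vector and rank by cosine similarity. The sketch below assumes such vectors already exist; the item names and three-dimensional vectors are made up for illustration, and a real system would obtain much larger vectors from an encoder model and typically use an approximate nearest-neighbour index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query, index, k=2):
    """Return the k (name, score) pairs most similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings for three stored items (images or 3D models).
INDEX = {
    "chair_a": [0.9, 0.1, 0.0],
    "chair_b": [0.8, 0.2, 0.1],
    "teapot": [0.0, 0.2, 0.9],
}

query = [1.0, 0.1, 0.0]  # hypothetical embedding of the query item
results = top_matches(query, INDEX)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The brute-force scan is fine for a handful of items; the ranking logic stays the same when the scan is replaced by a vector index.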
- Any MLOps platform you use?
Jina AI - They offer a neural search solution that can help build smarter, more efficient search engines. They also have a list of cool GitHub repos that you can check out. Similar to Vertex AI, they have image classification tools, NLP tools, fine-tuners, etc.
- This week(s) in DocArray
Well, it's not exactly a new feature, but we've been working on early support for DocArray v2 in Jina.
- Multi-model serving options
Jina lets you serve all of your models through the same Gateway while deploying them as individual microservices. You can also tie your models together in a pipeline if needed. There are also some nice ML-focussed features such as dynamic batching.
What are some alternatives?
snorkel - A system for quickly generating training data with weak supervision
Weaviate - Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
argilla - Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.
haystack - LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
DearPy3D - Dear PyGui 3D Engine (prototyping)
dalle-flow - 🌊 A Human-in-the-Loop workflow for creating HD images from text
snorkel - A system for quickly generating training data with weak supervision [Moved to: https://github.com/snorkel-team/snorkel]
whoogle-search - A self-hosted, ad-free, privacy-respecting metasearch engine
AugLy - A data augmentations library for audio, image, text, and video.
es-clip-image-search - Sample implementation of natural language image search with OpenAI's CLIP and Elasticsearch or Opensearch.
Text-Summarization-using-NLP - Text Summarization using NLP to fetch BBC News Article and summarize its text and also it includes custom article Summarization
growthbook - Open Source Feature Flagging and A/B Testing Platform