xtreme1 vs markup
| | xtreme1 | markup |
|---|---|---|
| Mentions | 2 | 3 |
| Stars | 729 | 231 |
| Growth | 1.9% | - |
| Activity | 9.1 | 6.9 |
| Last commit | 11 days ago | 4 months ago |
| Language | TypeScript | TypeScript |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
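The site does not publish the formula behind the activity number, so the following is only an illustrative sketch of how a recency-weighted, percentile-based score of this kind could work. The exponential decay, the 30-day half-life, and both function names are assumptions, not the site's actual method.

```python
def recency_weighted_activity(commit_ages_days, half_life_days=30.0):
    """Sum one weight per commit; a commit's weight halves every
    `half_life_days` (assumed decay), so recent commits count more."""
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_score(raw, all_raw):
    """Map a raw activity value to a 0-10 scale using the share of tracked
    projects whose raw value is at or below it (hypothetical mapping)."""
    at_or_below = sum(1 for r in all_raw if r <= raw)
    return round(10.0 * at_or_below / len(all_raw), 1)


# Ten tracked projects with raw activity values 1..10 (made-up numbers).
tracked = list(range(1, 11))
score = activity_score(9, tracked)
```

Under this mapping, a project scoring 9.0 has a raw value at or above that of 90% of the tracked projects, which matches the "top 10%" reading above.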
xtreme1
- Open-source Image & LiDAR data annotation platform [Project]
  wget https://github.com/basicai/xtreme1/releases/download/v0.5/xtreme1-v0.5.zip
  unzip -d xtreme1-v0.5 xtreme1-v0.5.zip
- Data labeling & Data curation open-source project [Project][Discussion]
  Project page: https://github.com/basicai/xtreme1
markup
- Show HN: An annotation tool for ML and NLP
Hey HN! I'm super excited to share Markup with you, which is a totally free & open-source annotation tool that helps you transform unstructured text (e.g. news articles) into structured data that you can use for building, training, or fine-tuning ML models!
Check it out: https://github.com/samueldobbie/markup
Just to preface this summary, it's all a bit hacked together at the moment, and I'm in the process of rewriting the tool from scratch, so this description is subject to change.
To generate the suggestions, there's an active learner with an underlying random forest classifier that has been fed ~60 seed sentences [1] to classify sentences as positive (e.g. contains a prescription) or negative (e.g. doesn't contain a prescription).
All positive sentences are fed into a sequence-to-sequence RNN model that has been trained on ~50k synthetic rows of data [2]. The model maps unstructured sentences (e.g. patient is on pheneturide 250mg twice a day) to a structured output with the desired features (e.g. name: pheneturide; dose: 250; unit: mg; frequency: 2). These synthetic sentences were generated with the in-built data generator [3].
The outputs of the RNN are validated to ensure they meet the expected structure and are valid for the sentence (e.g. the predicted drug name must exist somewhere within the sentence).
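As a rough illustration of that validation step, here is a minimal sketch that parses a `key: value; ...` output string and checks it against the source sentence. The field set, the function names, and the exact checks are assumptions drawn from the example above, not Markup's actual code.

```python
# Assumed field set, taken from the example output in the post.
REQUIRED_FIELDS = ("name", "dose", "unit", "frequency")


def parse_prediction(raw: str) -> dict[str, str]:
    """Turn 'name: x; dose: 1' into {'name': 'x', 'dose': '1'}."""
    fields: dict[str, str] = {}
    for part in raw.split(";"):
        if ":" not in part:
            continue  # skip malformed fragments
        key, value = part.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


def is_valid_prediction(sentence: str, prediction: dict[str, str]) -> bool:
    """Structure check plus the grounding check described in the post."""
    # Every expected field must be present and non-empty.
    if any(not prediction.get(name) for name in REQUIRED_FIELDS):
        return False
    # The predicted drug name must appear somewhere in the sentence.
    return prediction["name"].lower() in sentence.lower()


sentence = "patient is on pheneturide 250mg twice a day"
good = parse_prediction("name: pheneturide; dose: 250; unit: mg; frequency: 2")
bad = parse_prediction("name: aspirin; dose: 250; unit: mg; frequency: 2")
```

Here `is_valid_prediction(sentence, good)` passes, while `bad` is rejected because "aspirin" never occurs in the sentence.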
All non-junk predictions are shown to the user, who can accept, edit, or reject each. Based on the user's response, the active learner is refined (currently nothing is fed back into the RNN).
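The accept/edit/reject step could be recorded along these lines. This is a hypothetical sketch: `ReviewLog`, `record_review`, and the choice to treat a rejection as a negative label for the sentence classifier are all assumptions, since the post does not say how responses are turned into training signal.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

Decision = Literal["accept", "edit", "reject"]


@dataclass
class ReviewLog:
    # (sentence, is_positive) pairs that could be used to refit the active learner.
    labelled: list[tuple[str, bool]] = field(default_factory=list)
    # Structured annotations the user kept, after any edits.
    annotations: list[dict[str, str]] = field(default_factory=list)


def record_review(
    log: ReviewLog,
    sentence: str,
    prediction: dict[str, str],
    decision: Decision,
    edited: Optional[dict[str, str]] = None,
) -> None:
    """Store the user's decision; nothing here feeds back into the RNN."""
    if decision == "reject":
        # Assumption: a rejected suggestion marks the sentence as negative.
        log.labelled.append((sentence, False))
        return
    if decision == "edit":
        if edited is None:
            raise ValueError("an 'edit' decision needs the edited annotation")
        prediction = edited
    log.labelled.append((sentence, True))
    log.annotations.append(prediction)


log = ReviewLog()
record_review(log, "s1", {"name": "a"}, "accept")
record_review(log, "s2", {"name": "b"}, "edit", {"name": "c"})
record_review(log, "s3", {"name": "d"}, "reject")
```

After these three calls, `log.labelled` holds two positive examples and one negative, and `log.annotations` holds the accepted annotation and the edited one.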
[1] https://github.com/samueldobbie/markup/blob/master/data/text...
[2] https://raw.githubusercontent.com/samueldobbie/markup/master...
[3] https://www.getmarkup.com/tools/data-generator/
What are some alternatives?
layerx-community - LayerX-AI is a comprehensive platform to annotate and manage your machine learning data.
pawls - Software that makes labeling PDFs easy.
Low-Cost-Mocap - Low cost motion capture system for room scale tracking
force-multiplier - Use AI to edit your documents in real-time. Provide feedback and let the AI do all the work.
obsidian-ava - Quickly format your notes with ChatGPT in Obsidian
datalabel - datalabel is a UI-based data editing tool that makes it easy to create labeled text data in a dataframe. With datalabel, you can quickly and effortlessly edit your data without having to write any code. Its intuitive interface makes it ideal for both experienced data professionals and those new to data editing.
spotlight - Interactively explore unstructured datasets from your dataframe.
langhuan - Light weight labeling engine
refinery - The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.