transformer-smaller-training-vocab
presidio
| | transformer-smaller-training-vocab | presidio |
|---|---|---|
| Mentions | 1 | 5 |
| Stars | 20 | 3,156 |
| Growth | - | 5.8% |
| Activity | 6.7 | 8.9 |
| Latest commit | about 2 months ago | 1 day ago |
| Language | Python | Python |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
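The tracker does not publish its activity formula. The sketch below is a hypothetical illustration of the two ideas described above: commits weighted by recency, and a score expressed as a percentile on a 0-10 scale. The exponential decay, the 30-day half-life, and the function names `recency_weighted_score` and `activity_scores` are assumptions, not the tracker's actual method.

```python
import math

# Hypothetical sketch: the tracker's real formula is not published.
# Assumes each commit's weight decays exponentially with its age in days.
def recency_weighted_score(commit_ages_days, half_life_days=30.0):
    """Sum commit weights; a commit's weight halves every half_life_days."""
    decay = math.log(2) / half_life_days
    return sum(math.exp(-decay * age) for age in commit_ages_days)


def activity_scores(projects, half_life_days=30.0):
    """Map each project's weighted score to a 0-10 scale by percentile rank.

    projects maps a project name to a list of commit ages in days.
    A result of 9.0 means 90% of the tracked projects have a lower
    weighted score, i.e. the project is in the top 10%.
    """
    raw = {
        name: recency_weighted_score(ages, half_life_days)
        for name, ages in projects.items()
    }
    n = len(raw)
    scores = {}
    for name, score in raw.items():
        # Fraction of tracked projects with a strictly lower weighted score.
        below = sum(1 for other in raw.values() if other < score)
        scores[name] = round(10 * below / n, 1)
    return scores


if __name__ == "__main__":
    tracked = {
        "recent": [1, 2, 3],  # three commits in the last few days
        "old": [200, 300],    # two commits from months ago
        "idle": [],           # no commits at all
    }
    print(activity_scores(tracked))
    # -> {'recent': 6.7, 'old': 3.3, 'idle': 0.0}
```

With a 30-day half-life, a commit made today counts 1.0 and one from a month ago counts 0.5. Ranking by percentile means the number only says how a project compares with the other tracked projects, not how many commits it has.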
transformer-smaller-training-vocab
presidio
- You can't build a moat with AI
  Perhaps de-identification before training could be helpful here. Microsoft does seem active in this, e.g. https://microsoft.github.io/presidio/
- Presidio – Data Protection and De-Identification SDK
- Show HN: Cape API – Keep your sensitive data private while using GPT-4
  Something like https://github.com/microsoft/presidio for stripping out PII might fill the role I expected https://github.com/capeprivacy/private-ai to do.
- Handling PII data in Azure
  Depending on your use case, you may want to check out Presidio as well. It's a Microsoft product for PII scrubbing. Perfect for ADF and Synapse pipelines.
- Data Anonymization
  There's an API from Microsoft named Presidio, which is used for anonymization. This is the GitHub link.
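The mentions above describe Presidio as a way to strip PII from text, for instance before it is used for training or sent to an external model. As a rough, stdlib-only illustration of the general detect-then-replace idea (not Presidio's API and not its detection logic), the sketch below finds spans with two hypothetical regular expressions and swaps each for a placeholder naming its entity type. The pattern set, the entity labels, and the functions `find_pii` and `redact` are assumptions for illustration.

```python
import re

# Hypothetical, deliberately small pattern set for illustration only.
# Real PII detectors (Presidio among them) also use NER models and context,
# so they catch things like person names that these regexes cannot.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(text):
    """Return non-overlapping (start, end, entity_type) spans, left to right."""
    spans = []
    for entity, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), entity))
    spans.sort()
    kept = []
    for span in spans:
        # Keep a span only if it starts after the previously kept one ends.
        if not kept or span[0] >= kept[-1][1]:
            kept.append(span)
    return kept


def redact(text):
    """Replace each detected span with a <ENTITY_TYPE> placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, entity in reversed(find_pii(text)):
        text = text[:start] + f"<{entity}>" + text[end:]
    return text


if __name__ == "__main__":
    print(redact("Call 212-555-0199 or mail jane.doe@example.com"))
    # -> Call <PHONE_NUMBER> or mail <EMAIL_ADDRESS>
```

Replacing a value with a typed placeholder, rather than deleting it, keeps the sentence readable and records what kind of value was there, which helps if the scrubbed text is later used for training.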
What are some alternatives?
refinery - The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
DataProfiler - What's in your data? Extract schema, statistics and entities from datasets
peft - 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
exodus - Platform to audit trackers used by Android application
ml-engineering - Machine Learning Engineering Open Book
databunker - A secure user directory built for developers to comply with the GDPR [Moved to: https://github.com/securitybunker/databunker]
whisper-timestamped - Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Databunker - Secure SDK/vault for personal records/PII built to comply with GDPR
techbench-json-dump - Dump Tech Bench metadata to a JSON file.
PrivacyEngCollabSpace - Privacy Engineering Collaboration Space
private-ai - Repo for Udacity's Secure & Private AI course
randomizer - Gluetun VPN Randomizer