PrivacyEngCollabSpace
gretel-synthetics
 | PrivacyEngCollabSpace | gretel-synthetics |
---|---|---|
Mentions | 1 | 4 |
Stars | 223 | 542 |
Growth | 3.6% | 4.4% |
Activity | 7.3 | 7.2 |
Last commit | 9 days ago | 3 days ago |
Language | Python | Python |
License | - | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
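The metrics described above can be illustrated with a minimal sketch. The page does not publish its formulas, so everything here is an assumption: growth is taken as the percentage change in stars over one month, recency weighting is modelled as exponential decay with a hypothetical 30-day half-life, and the activity score is taken as a percentile rank scaled to 0-10. The function names are illustrative and not part of any real API.

```python
from bisect import bisect_left
from datetime import date


def monthly_growth(stars_now: int, stars_month_ago: int) -> float:
    """Month-over-month star growth as a percentage (assumed definition)."""
    if stars_month_ago <= 0:
        raise ValueError("stars_month_ago must be positive")
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago


def recency_weighted_commits(commit_dates, today, half_life_days=30.0):
    """Sum commit weights; a commit's weight halves every half_life_days.

    The exponential decay and the 30-day half-life are assumptions used
    only to show how recent commits can outweigh older ones.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_score(raw: float, all_raw) -> float:
    """Map a raw score to 0-10 by percentile rank among tracked projects."""
    ranked = sorted(all_raw)
    below = bisect_left(ranked, raw)  # projects with a strictly lower raw score
    return round(10.0 * below / len(ranked), 1)


# Hypothetical numbers, not taken from either project on this page.
growth = monthly_growth(104, 100)
raw = recency_weighted_commits([date(2024, 1, 31), date(2024, 1, 1)], date(2024, 1, 31))
score = activity_score(90, range(100))
```

Under these assumptions, a project whose raw score exceeds that of 90% of tracked projects gets a score of 9.0, which matches the "top 10%" reading given above.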
PrivacyEngCollabSpace
- What format / templates do you (CISOs/ISOs) use for your risk assessments of the org?
I would look into some NIST-provided tools like this one: https://github.com/usnistgov/PrivacyEngCollabSpace/tree/master/tools/risk-assessment/NIST-Privacy-Risk-Assessment-Methodology-PRAM. Haven't used it myself, but it looks like it might fit your use case.
gretel-synthetics
- Ask HN: If we train an LLM with “data” instead of “language” tokens
Hey there! Co-founder of Gretel.ai here, and I think I can provide some insights on this topic.
Firstly, the concept you're hinting at is not purely traditional ML. In traditional machine learning, we often prioritize feature extraction and engineering specific to a given problem space before training.
What you're describing, and what we've been working on at Gretel.ai, is using models like Large Language Models (LLMs) to understand and extrapolate from vast amounts of diverse data without the need for time-consuming feature engineering. Here's a link to our open-source library for synthetic data generation (currently supporting GAN and RNN-based language models): https://github.com/gretelai/gretel-synthetics. We also recently announced a Tabular LLM we're training to help people build with data: https://gretel.ai/tabular-llm
A few areas where we've found tabular or Large Data Models to be really useful are:
- Libraries for synthetic data?
You can try QuantGAN: https://github.com/PakAndrey/QuantGANforRisk. Also try DoppelGANger: https://github.com/gretelai/gretel-synthetics/tree/master/src/gretel_synthetics/timeseries_dgan
- Which open source tool for generating synthetic data sets?
- Gretel-synthetics: open-source library to create synthetic datasets
What are some alternatives?
presidio - Context aware, pluggable and customizable data protection and de-identification SDK for text and images
Copulas - A library to model multivariate data using copulas.
differential-privacy-library - Diffprivlib: The IBM Differential Privacy Library
gretel-python-client - The Gretel Python Client allows you to interact with the Gretel REST API.
attack-control-framework-mappings - 🚨ATTENTION🚨 The NIST 800-53 mappings have migrated to the Center’s Mappings Explorer project. See README below. This repository is kept here as an archive.
rex-gym - OpenAI Gym environments for an open-source quadruped robot (SpotMicro)
PyDP - The Python Differential Privacy Library. Built on top of: https://github.com/google/differential-privacy
adversarial-robustness-toolbox - Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
tern - Tern is a software composition analysis tool and Python library that generates a Software Bill of Materials for container images and Dockerfiles. The SBOM that Tern generates will give you a layer-by-layer view of what's inside your container in a variety of formats including human-readable, JSON, HTML, SPDX and more.
CTGAN - Conditional GAN for generating synthetic tabular data.
AI-basketball-analysis - AI web app and API to analyze basketball shots and shooting pose.
RobustVideoMatting - Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!