Show HN: A Tool for Data Obfuscation

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • ClickHouse

    ClickHouse® is a free analytics DBMS for big data

  • A few years ago, I was challenged with a task: given a database table with production data, generate fake data with the same structure that preserves most of the probability distributions and inter-column dependencies and keeps the compression ratios.

    This task was very difficult to solve. The results were either too random, not anonymized enough, or too slow to generate.

    After experimenting with five different methods (explicit distributions, Markov models, Feistel Networks, LSTM, compressed data mutation), I've implemented it in a tool named `clickhouse-obfuscator`.

    It works directly on files and does not depend on any particular database: it can work with ClickHouse, Snowflake, Redshift, DuckDB, SQLite, or PostgreSQL...

    Source code: https://github.com/ClickHouse/ClickHouse/tree/master/program...

    Install:

  • ClickBench

    ClickBench: a Benchmark For Analytical Databases

  • You can also use this tool to amplify the data volume for tests.

    For example, based on a dataset of 100 million records from ClickBench, I created a dataset of 100 billion records. Here is a description of how to generate this dataset:

    https://github.com/ClickHouse/ClickBench/tree/main/clickhous...

    Basically, you train a model on the existing dataset, then run the generator multiple times in parallel with different seeds.
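The train-once, generate-many-times idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual `clickhouse-obfuscator` implementation: it uses a small character-level Markov model (one of the methods named in the post), and the names `train`, `generate`, `ORDER`, and the sample `domains` list are hypothetical. Threads are used for brevity; a real workload would more likely run separate processes.

```python
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

ORDER = 2    # context length in characters (arbitrary choice for this sketch)
END = "\0"   # padding / end-of-string marker; assumed absent from the data


def train(values, order=ORDER):
    """Count which character follows each context of `order` characters."""
    counts = defaultdict(lambda: defaultdict(int))
    for value in values:
        padded = END * order + value + END
        for i in range(order, len(padded)):
            counts[padded[i - order:i]][padded[i]] += 1
    # Freeze into plain dicts: context -> (next characters, their weights).
    return {ctx: (list(nxt), list(nxt.values())) for ctx, nxt in counts.items()}


def generate(model, seed, count, order=ORDER, max_len=64):
    """Generate `count` strings; the same seed always yields the same strings."""
    rng = random.Random(seed)
    result = []
    for _ in range(count):
        ctx, chars = END * order, []
        while len(chars) < max_len:
            symbols, weights = model[ctx]
            ch = rng.choices(symbols, weights)[0]
            if ch == END:
                break
            chars.append(ch)
            ctx = (ctx + ch)[-order:]
        result.append("".join(chars))
    return result


# Hypothetical sample data standing in for a column of the source dataset.
domains = ["example.com", "example.org", "clickhouse.com", "news.ycombinator.com"]
model = train(domains)  # train once

# Run the generator several times in parallel, one seed per run.
seeds = [1, 2, 3]
with ThreadPoolExecutor() as pool:
    chunks = list(pool.map(lambda s: generate(model, s, 3), seeds))
```

Because each run depends only on the shared model and its own seed, the runs are independent and reproducible, so their outputs can simply be concatenated to amplify the data volume.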
