DataProfiler Alternatives

Similar projects and alternatives to DataProfiler

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better DataProfiler alternative or a higher similarity.


Reviews and mentions

Posts with mentions or reviews of DataProfiler. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-10-07.
  • Show HN: Graphsignal – Production Model Monitoring
    news.ycombinator.com | 2021-10-07
    We built a very similar application internally with our open source library: https://github.com/capitalone/dataprofiler

    Effectively, you can monitor changes between profiles:

    # Load a CSV file

  • Miller CLI – Like Awk, sed, cut, join, and sort for CSV, TSV and JSON
    news.ycombinator.com | 2021-08-24
    Not exactly the same, but we wrote a library that easily loads any delimited file type and finds the header (even if it is not the first row). It also loads JSON, Parquet, and Avro files into a dataframe. Not exactly a CLI, but pretty easy:

    https://github.com/capitalone/dataprofiler

    Anyway, Miller CLI looks pretty interesting.

  • Launch HN: Lightly (YC S21): Label only the data which improves your ML model
    news.ycombinator.com | 2021-08-09
    Having built a model to identify sensitive data, I think a solid data labeling solution would be awesome.

    https://github.com/capitalone/DataProfiler

    In this space, Prodigy really dominates:

    https://prodi.gy/

    We actually built our own internal system, which integrates with it and can export the labels (it does predictive labeling, etc.). Of course, we have only focused on text data so far.

  • DataProfiler - What's in your data? Extract schema, stats and entities
    reddit.com/r/Python | 2021-07-30
    We made a library called DataProfiler - designed to replace pandas-profiling.
  • DataProfiler – What's in your data? Extract schema, stats and entities
  • Launch HN: Exams, tasks, K8, eCommerce, cell sites, health, travel, data quality
    news.ycombinator.com | 2021-07-23
    Telm.ai (YC S21) - Real-time data quality monitoring

    Looks interesting! I worked on https://github.com/capitalone/DataProfiler

    We are looking to monitor correlation changes over time, see if sensitive data gets entered, track schema changes, etc., and see the impact on downstream modeling.

    I'm curious how heavy the input is, because these systems usually take a lot of effort to set up. Any idea?

  • Modeling Libraries Don’t Matter
    news.ycombinator.com | 2021-07-22
    My team and I wrote an NLP application to detect sensitive data and detect / validate schemas, etc., as well as the other items provided by pandas-profiling.

    https://github.com/capitalone/DataProfiler

    That being said, we noted the same thing. It shouldn't matter what modeling you use. The data pipelining is where 99% of the work typically is. Modeling itself always needs the same basic input, a matrix of data, and outputs a matrix of data.

    Some libraries are good at specific components. Others offer improved speed-ups, etc. But it's all so new that it's effectively going to change month-to-month. So I always tell the team to build what you can as fast as you can, with the tools you have. We can always update it later, once the pipeline is in place.

  • PyWhat: Identify Anything
    news.ycombinator.com | 2021-06-16
    We built a similar tool, utilizing a CNN. It works on structured (and unstructured) data and provides additional info.

    https://github.com/capitalone/DataProfiler

    The cool part is that you can “extend” the internal named-entity recognition model by refitting it with new data.

    Out of the box, the DataProfiler detects something like 18 entities, including most PII data.

  • Ask HN: What is the best tool to infer data type of tabular data?
    news.ycombinator.com | 2021-06-08
    [1] - https://github.com/capitalone/DataProfiler
  • DataProfiler - What's in your data? Extract schema, statistics and entities from datasets
  • Show HN: DataProfiler – What's in your data? Extract schema, stats and entities
    news.ycombinator.com | 2021-05-10
  • Show HN: DataProfiler – A Replacement for Pandas-Profiling
    news.ycombinator.com | 2021-05-04
  • Show HN: DataProfiler – Sensitive Data Detection and Profiling
    news.ycombinator.com | 2021-05-03
  • Show HN: DataProfiler – Extract schema, statistics and PPI / NPI detection
    news.ycombinator.com | 2021-04-30
  • Show HN: The DataProfiler – What's in your data?
    news.ycombinator.com | 2021-04-26
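
The Graphsignal thread above mentions monitoring changes between profiles. The sketch below illustrates that general idea using only Python's standard library. It is not DataProfiler's API: the helper names profile_csv and diff_profiles are hypothetical, and the per-column statistics (inferred type, null count, mean) are deliberately simplistic stand-ins for what a real profiler would compute.

```python
import csv
import io
import statistics


def _as_floats(values):
    """Return the values parsed as floats, or None if any value is not numeric."""
    try:
        return [float(v) for v in values]
    except ValueError:
        return None


def profile_csv(text):
    """Build a tiny per-column profile from CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    profile = {}
    for column in reader.fieldnames or []:
        # Treat empty strings and missing cells as nulls.
        present = [row[column] for row in rows if row[column] not in ("", None)]
        numbers = _as_floats(present)
        is_numeric = bool(numbers)
        profile[column] = {
            "type": "numeric" if is_numeric else "text",
            "null_count": len(rows) - len(present),
            "mean": statistics.fmean(numbers) if is_numeric else None,
        }
    return profile


def diff_profiles(old, new):
    """Report added or removed columns and changed statistics (hypothetical helper)."""
    changes = {}
    for column in sorted(set(old) | set(new)):
        if column not in new:
            changes[column] = "removed"
        elif column not in old:
            changes[column] = "added"
        else:
            changed = {
                key: (old[column][key], new[column][key])
                for key in old[column]
                if old[column][key] != new[column][key]
            }
            if changed:
                changes[column] = changed
    return changes


# Made-up sample data: a baseline extract and a later extract of the same feed.
baseline = "id,amount,city\n1,10.0,Austin\n2,20.0,Boston\n"
current = "id,amount,email\n1,10.0,a@example.com\n2,,b@example.com\n"

print(diff_profiles(profile_csv(baseline), profile_csv(current)))
```

On the sample data, the diff flags city as removed and email as added, and reports the changed null count and mean for amount. A newly appearing column such as email is the kind of schema change the posts above describe wanting to catch.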

Stats

Basic DataProfiler repo stats
Mentions: 24
Stars: 658
Activity: 9.4
Last commit: 8 days ago

capitalone/DataProfiler is an open source project licensed under the Apache License 2.0, which is an OSI-approved license.
