One does not simply "create a visualization" from unstructured data!

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering.

  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

  • In the example given in the article, I can't just use SQL functions to extract the age and phone number. I guess the phone number could be regexed, but ideally I should use something like spaCy and also record some kind of confidence score. This is where Spark, Dask, etc. really shine. Does Airbyte support user-defined functions in a language like Python?

  • CoreNLP

    CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.

  • If you're looking at spaCy, have a look at Apache OpenNLP and CoreNLP as well.

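The spaCy comment above asks how to pull an age and a phone number out of free text while recording some kind of confidence score. As a rough baseline using only the Python standard library (not spaCy, and not anything Airbyte provides), a regex pass can attach a heuristic score to each match. The patterns, the `extract_fields` function, the `Extraction` record, the keyword list and the score values are all hypothetical illustrations: the scores are hand-picked rather than calibrated probabilities, and the phone pattern only covers North American 3-3-4 formats.

```python
import re
from dataclasses import dataclass


@dataclass
class Extraction:
    field: str              # "phone" or "age"
    value: str              # normalized value (digits only for phones)
    span: tuple[int, int]   # (start, end) character offsets in the input text
    confidence: float       # hand-picked heuristic in [0, 1], NOT a calibrated probability


# Hypothetical pattern: North American 3-3-4 numbers such as
# "(555) 123-4567", "555-123-4567" or "555.123.4567". Other formats are missed.
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[\s.-]?(\d{3})[\s.-](\d{4})\b")

# Hypothetical pattern: "age 34", "aged: 34", "34 years old", "34-year-old".
AGE_RE = re.compile(
    r"\b(?:aged?[:\s]+(\d{1,3})|(\d{1,3})[\s-]*(?:years?|yrs?)[\s-]*old)\b",
    re.IGNORECASE,
)

# Hypothetical keywords that make a nearby digit run more likely to be a phone number.
PHONE_CONTEXT_RE = re.compile(r"\b(?:phone|tel|call|mobile|cell)\b", re.IGNORECASE)


def extract_fields(text: str, context_chars: int = 25) -> list[Extraction]:
    """Return phone and age matches in `text`, ordered by position."""
    results = []

    for m in PHONE_RE.finditer(text):
        digits = "".join(m.groups())
        # Look a short distance to the left for a keyword such as "phone:".
        window = text[max(0, m.start() - context_chars):m.start()]
        confidence = 0.95 if PHONE_CONTEXT_RE.search(window) else 0.7
        results.append(Extraction("phone", digits, m.span(), confidence))

    for m in AGE_RE.finditer(text):
        # Exactly one of the two alternatives captured the number.
        age = int(m.group(1) or m.group(2))
        # Penalize values outside a plausible human age range.
        confidence = 0.9 if age <= 120 else 0.2
        results.append(Extraction("age", str(age), m.span(), confidence))

    return sorted(results, key=lambda e: e.span[0])


if __name__ == "__main__":
    sample = ("Jane Doe, aged 34, phone: (555) 123-4567. "
              "Her brother is 29 years old; reference no. 555-987-6543.")
    for e in extract_fields(sample):
        print(e)
```

Because `extract_fields` is a plain Python function from a string to a list of records, it is the kind of logic that could be mapped over a text column by an engine that accepts Python functions, such as Spark or Dask, which is the contrast the commenter draws with SQL-only transformations. A statistical NER model such as spaCy's would replace the regexes for entities that do not follow fixed patterns.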
NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
