Can Parquet file format index string columns?

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering

  1. annoy

    Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

    Yes, you can do this for equality predicates if your row groups are sorted. This blog post (that I didn't write) might add more color. You can't do this for any kind of text searching. If you need to do that with file-based storage, I'd recommend using vector-based text search with an ANN index library like Annoy.
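
The point about sorted row groups can be illustrated with a small simulation. This is a minimal sketch in plain Python, not the Parquet API: `RowGroup`, `build_row_groups`, and `equality_lookup` are hypothetical names invented for illustration, and the zero-padded `user_NNNN` values are made-up sample data. It assumes each row group carries min/max statistics for the string column, which is what lets a reader skip groups whose range cannot contain the target value.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a row group: values plus min/max statistics."""
    values: List[str]
    min_value: str
    max_value: str


def build_row_groups(values: List[str], group_size: int) -> List[RowGroup]:
    """Sort the column, then cut it into fixed-size groups with min/max stats."""
    ordered = sorted(values)
    groups = []
    for start in range(0, len(ordered), group_size):
        chunk = ordered[start:start + group_size]
        # Because the chunk is sorted, its first and last items are min and max.
        groups.append(RowGroup(chunk, chunk[0], chunk[-1]))
    return groups


def equality_lookup(groups: List[RowGroup], target: str) -> Tuple[List[str], int]:
    """Return the matching values and the number of groups actually scanned."""
    scanned = 0
    matches: List[str] = []
    for group in groups:
        # Skip any group whose [min, max] range cannot contain the target.
        if target < group.min_value or target > group.max_value:
            continue
        scanned += 1
        matches.extend(v for v in group.values if v == target)
    return matches, scanned


# Made-up sample data: 1000 zero-padded string keys in 10 groups of 100.
names = [f"user_{i:04d}" for i in range(1000)]
groups = build_row_groups(names, group_size=100)
matches, scanned = equality_lookup(groups, "user_0420")
print(matches, scanned, len(groups))
```

With sorted data the ranges of the groups do not overlap, so an equality lookup touches at most a group or two. The same statistics give no help for substring or free-text search, since a group's min and max say nothing about what appears inside each string, which is why the comment points to a vector-based approach for that case.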

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • Vector Databases 101

    3 projects | /r/datascience | 25 Jun 2023
  • I'm an undergraduate data science intern and trying to run kmodes clustering. Did this elbow method to figure out how many clusters to use, but I don't really see an "elbow". Tips on number of clusters?

    2 projects | /r/datascience | 21 Jun 2023
  • Calculating document similarity in a special domain

    1 project | /r/LanguageTechnology | 1 Jun 2023
  • Billion-Scale Approximate Nearest Neighbor Search [pdf]

    1 project | news.ycombinator.com | 6 May 2023
  • [R] Unlimiformer: Long-Range Transformers with Unlimited Length Input

    1 project | /r/MachineLearning | 5 May 2023
