Jaccard Index

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • MiniMath

    MiniMath is an experiment in writing different math functions as single lines of Python code, using only built-in functions.

    I've used this recently to do some fuzzy matching of column names in datasets. I also added it to a small Python one-liner library I've been making for practice. P.S. Don't give me flak; I know this isn't an efficient way to do things.

    jaccard = lambda A, B: len(set(A).intersection(set(B))) / len(set(A).union(set(B)))

    https://github.com/b-mc2/MiniMath
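
The one-liner above divides by zero when both inputs are empty, and the comment does not say how column names were turned into sets. The following is a minimal sketch, not MiniMath's code: the empty-set convention, the character-bigram tokenization, and the helper names `bigrams` and `best_match` are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two iterables, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    # Assumed convention: two empty sets count as identical (index 1.0).
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)


def bigrams(name):
    """Character bigrams of a name, lower-cased with non-alphanumerics dropped.

    Hypothetical tokenization; the original comment does not specify one.
    """
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum())
    return {cleaned[i:i + 2] for i in range(len(cleaned) - 1)}


def best_match(column, candidates):
    """Return the candidate whose bigram set is most similar to `column`."""
    return max(candidates, key=lambda c: jaccard(bigrams(column), bigrams(c)))
```

With this tokenization, `best_match("customer_id", ["CustomerID", "order_date", "amount"])` picks `"CustomerID"`, because both names reduce to the same bigram set once case and underscores are ignored.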

  • RoaringBitmap

    A better compressed bitset in Java: used by Apache Spark, Netflix Atlas, Apache Pinot, Tablesaw, and many others

    As an aside, if you find yourself having to compute them on the fly, know that the Roaring Bitmaps libraries are the way to go [1]. The bitmaps are compressed and can be streamed directly into SIMD computations (batching XORs and popcnts 256 bits wide!). The Jaccard index is just intersection_len / union_len [2] away.

    [1] https://roaringbitmap.org/

    [2] https://roaringbitmap.readthedocs.io/en/latest/#roaringbitma...
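
The bitset idea behind that comment can be sketched with nothing but the standard library. This is not Roaring: it uses a plain, uncompressed Python integer as the bitset, with no compression or SIMD, and the function names `to_bitset`, `popcount`, and `jaccard_bitset` are hypothetical. It only shows how the intersection and union sizes fall out of bitwise AND, bitwise OR, and a population count.

```python
def to_bitset(ids):
    """Pack non-negative integer IDs into one Python int, one bit per ID."""
    bits = 0
    for i in ids:
        bits |= 1 << i
    return bits


def popcount(bits):
    """Count the set bits; bin() keeps this compatible with older Pythons."""
    return bin(bits).count("1")


def jaccard_bitset(a, b):
    """Jaccard index from the popcounts of AND (intersection) and OR (union)."""
    union_len = popcount(a | b)
    if union_len == 0:
        return 1.0  # assumed convention for two empty sets
    intersection_len = popcount(a & b)
    return intersection_len / union_len
```

For example, the ID sets `[1, 2, 3, 100]` and `[2, 3, 4, 100]` share three IDs out of five distinct ones, so `jaccard_bitset` returns 3 / 5. A Roaring library performs the same two cardinality computations on compressed containers instead of one large integer.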

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
