Kullback–Leibler Divergence

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • pysurprisal

    Calculate surprisal for words in text.

  • I have used KL-divergence in authorship verification: https://github.com/capjamesg/pysurprisal/blob/main/pysurpris...

    My theory was: calculate the surprisal ("entropy") of the words used, then compute the KL-divergence between a given piece of prose and a collection of surprisal profiles for different authors. The author for whose profile the prose had the highest KL-divergence was assumed to be the author. I think it has been used in stylometry a bit. A rough sketch of the idea follows.
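
    A minimal sketch of that idea, assuming hypothetical helper names rather than pysurprisal's actual API, and using add-one smoothing so the divergence stays finite for unseen words:

      import math
      from collections import Counter

      def word_distribution(text, vocab, alpha=1.0):
          # Smoothed word frequencies over a shared vocabulary.
          counts = Counter(text.lower().split())
          total = sum(counts.values()) + alpha * len(vocab)
          return {w: (counts[w] + alpha) / total for w in vocab}

      def kl_divergence(p, q):
          # D_KL(p || q) = sum_x p(x) * log2(p(x) / q(x))
          return sum(px * math.log2(px / q[w]) for w, px in p.items() if px > 0)

      def score_authors(prose, author_samples):
          # author_samples: {author_name: sample_text}.
          # Returns KL(prose || author profile) for each author; the comment above
          # selects the author from these scores.
          vocab = set(prose.lower().split())
          for sample in author_samples.values():
              vocab |= set(sample.lower().split())
          p = word_distribution(prose, vocab)
          return {name: kl_divergence(p, word_distribution(sample, vocab))
                  for name, sample in author_samples.items()}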

  • WaveFunctionCollapse

    Bitmap & tilemap generation from a single example with the help of ideas from quantum mechanics

  • Intuitively, it measures the difference between two probability distributions. It's not symmetric, so it isn't a true distance, but in my opinion it's good intuition.

    As motivation, say you're an internet provider serving a business. You naturally want to save money, so you compress packets before they go over the wire. Suppose the business you're serving also compresses its data, but it has made a mistake and does so inefficiently.

    Let's say the business has, incorrectly, determined the probability distribution of its data to be $q(x)$; that is, it assigns probability $q(x)$ to seeing symbol $x$. Let's say you've determined the "true" distribution to be $p(x)$. The expected number of bits they transmit per packet/symbol (the cross-entropy) will be $-\sum_x p(x) \log_2 q(x)$: they compress their stream under the assumption that the distribution is $q(x)$, but the actual probability of seeing a packet $x$ is $p(x)$, which is why the term $p(x) \log_2 q(x)$ shows up.

    The number of bits you transmit per packet/symbol is just the entropy, $-\sum_x p(x) \log_2 p(x)$. Now we ask: how many bits per packet does your method save over the business's? The difference is $-\sum_x p(x) \log_2 \frac{q(x)}{p(x)} = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}$, which is exactly the Kullback-Leibler divergence $D_{\mathrm{KL}}(p \,\|\, q)$.

    In other words, given a "guess" at a distribution and the "true" distribution, how many extra bits per symbol does the guess cost you? That is the Kullback-Leibler divergence, and (I believe) why it shows up in machine learning loss and fitness functions. A small numeric check is below.
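
    A minimal numeric sketch of the bits-saved reading above; the distributions are toy values chosen only for illustration:

      import math

      # "True" packet distribution p and the business's incorrect estimate q.
      p = {"a": 0.5, "b": 0.25, "c": 0.25}
      q = {"a": 0.25, "b": 0.25, "c": 0.5}

      entropy = -sum(px * math.log2(px) for px in p.values())            # your bits/symbol: 1.5
      cross_entropy = -sum(px * math.log2(q[x]) for x, px in p.items())  # their bits/symbol: 1.75
      kl = sum(px * math.log2(px / q[x]) for x, px in p.items())         # extra bits/symbol: 0.25

      # The KL divergence is exactly the per-symbol cost of compressing with q instead of p.
      assert abs(kl - (cross_entropy - entropy)) < 1e-12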

    As a more concrete example, I just ran across a paper [0] that uses the K-L divergence to assess how well WFC [1] and other algorithms do when generating "Super Mario Bros."-style levels. Take a 2x2 or 3x3 window, build a library of tile patterns, use it to generate a random level, then use the K-L divergence to compare the generated level's tile-pattern distribution against the observed distribution from the example image (a sketch of this comparison follows the references).

    [0] https://arxiv.org/pdf/1905.05077.pdf

    [1] https://github.com/mxgmn/WaveFunctionCollapse
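
    A minimal sketch of that tile-pattern comparison, written from the description above rather than from the paper's code; 2x2 windows and add-one smoothing are assumptions of this sketch:

      import math
      from collections import Counter

      def tile_counts(level, k=2):
          # Count every k x k window in a level given as a list of equal-length strings.
          counts = Counter()
          for r in range(len(level) - k + 1):
              for c in range(len(level[0]) - k + 1):
                  counts[tuple(level[r + i][c:c + k] for i in range(k))] += 1
          return counts

      def tile_kl(example, generated, k=2, alpha=1.0):
          # KL(example || generated) over k x k tile patterns, with add-alpha smoothing
          # so patterns missing from the generated level don't make the divergence infinite.
          p_counts, q_counts = tile_counts(example, k), tile_counts(generated, k)
          tiles = set(p_counts) | set(q_counts)
          p_total = sum(p_counts.values()) + alpha * len(tiles)
          q_total = sum(q_counts.values()) + alpha * len(tiles)
          return sum(
              ((p_counts[t] + alpha) / p_total)
              * math.log2(((p_counts[t] + alpha) / p_total) / ((q_counts[t] + alpha) / q_total))
              for t in tiles
          )  # lower means the generated level's local statistics better match the example

      example = ["----", "-XX-", "-XX-", "----"]
      generated = ["X--X", "----", "----", "X--X"]
      print(tile_kl(example, generated))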

