AI Language Models Are Struggling to “Get” Math

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • diffgeo

    A formalization of synthetic differential geometry in Coq using infinitesimal analysis

  • > Of course computers can do arithmetic operations, but this is not the same as solving math problems, proving theorems, etc.

    Computers can solve math problems and prove theorems; this remains a significant subfield of computer science with lots of industrial use cases. However, purely machine-learning-based approaches to these problems remain subpar.

    > Even mathematical objects are approximated up to an approximation error in a computer (like a differentiable manifold or a real number).

    Only because that convention caught on (and, for applications that aren't computationally intensive, for purely historical reasons). For example, Mathematica has Reals, and even has functionality for Reals that is literally impossible to implement for integers [1,2]. There are also precise characterizations of objects in differential geometry [3]. You could imagine applying LLMs to these types of programs à la Copilot, but when you do this you will find yourself agreeing with Paul Houle's observation that math is harder to fake than, e.g., art, language, or even glue code for web apps (assuming you're not on a grant that economically incentivizes you to draw the opposite conclusion, ofc).

    [1] https://reference.wolfram.com/language/ref/Reduce.html

    [2] https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_...

    [3] https://github.com/bollu/diffgeo
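To make the point about approximation concrete, here is a minimal sketch (not how Mathematica or diffgeo work internally) of representing a real number exactly rather than as a float. It assumes a hypothetical `QSqrt2` class, invented here for illustration, that stores numbers of the form a + b√2 with rational a and b, using only the Python standard library.

```python
from dataclasses import dataclass
from fractions import Fraction
import math


@dataclass(frozen=True)
class QSqrt2:
    """Hypothetical exact number a + b*sqrt(2), with a and b rational."""
    a: Fraction
    b: Fraction

    def __add__(self, other: "QSqrt2") -> "QSqrt2":
        # Add the rational parts and the sqrt(2) coefficients separately.
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other: "QSqrt2") -> "QSqrt2":
        # (a + b*sqrt2)(c + d*sqrt2) = (ac + 2bd) + (ad + bc)*sqrt2
        return QSqrt2(
            self.a * other.a + 2 * self.b * other.b,
            self.a * other.b + self.b * other.a,
        )


SQRT2 = QSqrt2(Fraction(0), Fraction(1))  # exactly sqrt(2)
TWO = QSqrt2(Fraction(2), Fraction(0))    # exactly 2

exact_square = SQRT2 * SQRT2      # computed symbolically, no rounding
float_square = math.sqrt(2) ** 2  # computed in floating point

print(exact_square == TWO)   # True: the exact representation squares to 2
print(float_square == 2.0)   # False: the float result carries rounding error
```

The exact version only covers one small family of algebraic numbers, but it shows that the approximation error is a property of the chosen representation, not of computers as such.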
