Transforming free-form geospatial directions into addresses - SOTA?

This page summarizes the projects mentioned and recommended in the original post on /r/LanguageTechnology.

  • libpostal

    A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

  • I know of https://github.com/openvenues/libpostal, which handles typos and omissions in addresses, but I am dealing with fuzzier descriptions of a location.

  • duckling

    Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

  • To work out which distance and direction relative to the reference point a description indicates, I'd look into something like Facebook and Wit.ai's Duckling, plus a custom classifier that decides whether the location is at the reference point ("corner of") or some distance from it ("200 meters southwest"). If you can parse out a distance and a direction, plotting the point is just geometry.
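The "parse out a distance and direction, then plot the point" step can be sketched in Python. This is a minimal sketch under stated assumptions: a plain regular expression stands in for Duckling (it does not call Duckling's API), the compass and unit vocabularies are small hand-written assumptions, "corner of"/"intersection of" is treated as distance zero instead of using a trained classifier, and the plotting uses the standard spherical-Earth "destination point from distance and bearing" formula, which is adequate over short distances. The names `parse_offset` and `destination` and the reference coordinates are hypothetical.

```python
import math
import re
from typing import Optional, Tuple

# Assumed vocabulary: compass words -> bearing in degrees clockwise from north.
BEARINGS = {
    "north": 0.0, "northeast": 45.0, "east": 90.0, "southeast": 135.0,
    "south": 180.0, "southwest": 225.0, "west": 270.0, "northwest": 315.0,
}
# Assumed vocabulary: unit words -> metres.
UNITS = {
    "m": 1.0, "meter": 1.0, "meters": 1.0, "metre": 1.0, "metres": 1.0,
    "km": 1000.0, "kilometer": 1000.0, "kilometers": 1000.0,
}
# "<number> <unit> <direction>", e.g. "200 meters southwest" or "1.5 km north".
# Compound directions are listed first so they win over "north"/"south".
OFFSET_RE = re.compile(
    r"(?P<dist>\d+(?:\.\d+)?)\s*(?P<unit>[a-z]+)\s+"
    r"(?P<dir>northeast|northwest|southeast|southwest|north|south|east|west)\b",
    re.IGNORECASE,
)
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def parse_offset(text: str) -> Optional[Tuple[float, float]]:
    """Return (distance_m, bearing_deg), or None if no offset is found.

    "corner of" / "intersection of" phrases mean "at the reference point",
    reported as distance 0 (the bearing is then irrelevant).
    """
    lowered = text.lower()
    if "corner of" in lowered or "intersection of" in lowered:
        return 0.0, 0.0
    m = OFFSET_RE.search(text)
    if m is None or m.group("unit").lower() not in UNITS:
        return None
    distance = float(m.group("dist")) * UNITS[m.group("unit").lower()]
    return distance, BEARINGS[m.group("dir").lower()]


def destination(lat: float, lon: float, distance_m: float,
                bearing_deg: float) -> Tuple[float, float]:
    """Great-circle destination from (lat, lon) after distance_m on bearing_deg."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    # Normalise longitude into [-180, 180).
    return math.degrees(phi2), (math.degrees(lam2) + 540.0) % 360.0 - 180.0


if __name__ == "__main__":
    reference = (52.5200, 13.4050)  # hypothetical reference point (lat, lon)
    offset = parse_offset("200 meters southwest of the train station")
    if offset is not None:
        print(destination(*reference, *offset))
```

In practice the reference point itself would come from geocoding the named landmark or street (for example with libpostal plus a geocoder), and the regex would be replaced by Duckling's distance parsing or a trained model.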

  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

  • If you're working in a specific area and already have street data, you could:

      1. Follow the ArcGIS blog's directions to create intersection features.
      2. Train a classifier (or a dedicated NER entity type; spaCy would be a good package for that) on the kinds of cross-street references you find in your text. Some of the relevant tokens appear in the examples you provided, such as "Corner of" and "along", and presumably "intersection of" as well. Even simple string lookups could help you bootstrap the training data.
      3. Use some form of embedding similarity to compare the matched terms against candidate cross-streets.
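Steps 2 and 3 can be bootstrapped roughly as follows. This is a sketch under stated assumptions, not a trained pipeline: regular expressions on the cue phrases from the examples ("corner of", "intersection of", "along") produce weak labels as character-offset `(start, end, label)` tuples with a hypothetical `CROSS_STREET` label, which is the shape spaCy v3's `Example.from_dict(doc, {"entities": [...]})` accepts for NER training data. For step 3 it swaps in `difflib` character similarity in place of embedding similarity, because that needs no model download. The function names and the street list are hypothetical.

```python
import difflib
import re
from typing import List, Optional, Tuple

# Cue phrases taken from the examples in the thread. The street-name pattern and the
# terminators (",", ".", ";" or end of text) are simplifying assumptions.
NAME = r"[A-Za-z0-9' ]+?"
PATTERNS = [
    re.compile(rf"\b(?:corner|intersection) of (?P<a>{NAME}) (?:and|&) (?P<b>{NAME})(?=[,.;]|$)",
               re.IGNORECASE),
    re.compile(rf"\balong (?P<a>{NAME})(?=[,.;]|$)", re.IGNORECASE),
]
LABEL = "CROSS_STREET"  # hypothetical NER label


def bootstrap_annotations(text: str) -> List[Tuple[int, int, str]]:
    """Weak labels: (start, end, label) character spans for street names after cue phrases."""
    spans = []
    for pattern in PATTERNS:
        for m in pattern.finditer(text):
            for group in ("a", "b"):
                if group in pattern.groupindex:  # the "along" pattern has no group "b"
                    spans.append((m.start(group), m.end(group), LABEL))
    return sorted(spans)


def best_street(candidate: str, streets: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Closest known street by character similarity (a stand-in for embedding similarity)."""
    by_lower = {s.lower(): s for s in streets}
    hits = difflib.get_close_matches(candidate.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


if __name__ == "__main__":
    streets = ["Main Street", "5th Avenue", "Oak Road"]  # hypothetical street data
    text = "Corner of Main St and 5th Avenue, next to the bakery."
    for start, end, _ in bootstrap_annotations(text):
        name = text[start:end]
        print(name, "->", best_street(name, streets))
```

Character similarity has no notion of meaning and can score similarly spelled but different streets close together, so the `cutoff` (0.6 is `difflib`'s default) would need tuning against your own street data; that gap is where the embedding similarity from step 3 would take over.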
