Surfer is the first personal data scraper

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. Protocol

    Open-source framework for exporting your personal data. (by Surfer-Org)

  3. HPI

    Human Programming Interface 🧑👽🤖

    Take a look at https://github.com/karlicoss/HPI

    It builds an entire ecosystem around your data, exposing it programmatically rather than just dumping text files. The point of HPI is to build your own stuff on top of it.

    The next stop after karlicoss's HPI is https://github.com/seanbreckenridge/HPI_API, which creates a REST API on top of your HPI without any additional configuration.

    If you want to get really fancy, you can use https://github.com/hpi/authenticated_hpi_api or https://github.com/hpi/hpi-graph so you can theoretically expose it to the web (I am squatting the HPI org; I am not the creator of HPI). I made the authentication method JWTs, so you can create a JWT that gives access to only certain services' data.

  4. HPI_API

    An automatic JSON API for HPI

  5. authenticated_hpi_api

  6. hpi-graph

    Discontinued (the repository https://github.com/hpi/hpi-graph now returns 404 Not Found)

  7. Huginn

    Create agents that monitor and act on your behalf. Your agents are standing by!

    Myself, I'd probably prefer to use something like Huginn to build a customized approach to all of the online platforms I'm interested in, rather than rely on a curated list.

    https://github.com/huginn/huginn

  8. vector

    A high-performance observability data pipeline.

    I made something like this since I was tired of the asymmetric nature of data collection that happens on the Internet. It's still not where I would like it to be, but it's been really nice being able to treat my browsing history as any old log that I can query. Tools like dogsheep are nice, but they tend to rely on the platform allowing data to be taken off it. This bypasses those limits by just doing it on the client.

    This lets me create dashboards to see usage for certain topics. For example, I have a "Dev Browser" which tracks the latest sites I've visited that are related to development topics [1].

    I've talked about my first iteration before on here [2].

    My second iteration ended up with a userscript which sends the data on the sites I visit to a Vector instance (http://vector.dev, no affiliation). Vector is in there because for certain sites (i.e., those behind draconian Cloudflare configurations), I want to save a local copy of the site. So Vector can pop that field, save it to a local minio instance, and at the same time push the rest of the record to something like Grafana Loki and Postgres, all while being very fast.
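
    The "pop that field, store it separately, forward the rest" routing can be sketched in plain Python. This is not Vector configuration and not the commenter's setup: the `body` field name, the `body_key` reference, and the in-memory stand-ins for minio, Loki, and Postgres are all assumptions made for illustration.

```python
import hashlib
from typing import Optional


def split_record(record: dict, blob_field: str = "body") -> tuple:
    """Return (metadata, blob): the record without blob_field, plus the blob."""
    metadata = dict(record)  # shallow copy so the caller's record is untouched
    blob: Optional[bytes] = metadata.pop(blob_field, None)
    if isinstance(blob, str):
        blob = blob.encode("utf-8")
    return metadata, blob


def route(record: dict, blob_store: dict, log_sinks: list) -> None:
    """Send the page copy to the blob store and the remaining fields to each sink.

    blob_store is a dict standing in for an object store such as minio;
    each entry of log_sinks is a list standing in for Loki or Postgres.
    """
    metadata, blob = split_record(record)
    if blob is not None:
        # Content-addressed key, so identical page copies are stored once.
        key = hashlib.sha256(blob).hexdigest()
        blob_store[key] = blob
        metadata["body_key"] = key  # lets a log row point back at the saved copy
    for sink in log_sinks:
        sink.append(metadata)
```

    Keeping only a key in the log record means the queryable stores stay small while the saved page remains retrievable from the object store.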

    I similarly have a few for all the online reading I do. One for blogs, one for fanfiction, and one for webfiction in general.

    I've started looking into a third iteration utilizing mitmproxy. It helps a lot with saving local copies since it happens outside of the browser, so I don't feel the hitch when a page is inordinately heavy for whatever reason. It is also very nice that it would work with all browsers just by setting a proxy, which means I could set it up for my phone either as a normal proxy or as a WireGuard "transparent" proxy. I only need to set up certificates for it to work.
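
    Saving local copies from the proxy side might look roughly like the mitmproxy addon below. It is a sketch under assumptions, not the commenter's code: the `SaveLocalCopy` class name, the `saved_pages` directory, and the HTML-only filter are illustrative choices. It relies only on mitmproxy's `response` hook and the `flow.request.pretty_url`, `flow.response.headers`, and `flow.response.content` attributes.

```python
import hashlib
from pathlib import Path


class SaveLocalCopy:
    """mitmproxy addon that writes each HTML response body to disk."""

    def __init__(self, directory: str = "saved_pages") -> None:
        self.directory = Path(directory)

    def response(self, flow) -> None:
        # Called by mitmproxy once a full response has been received.
        content_type = flow.response.headers.get("content-type", "")
        if "text/html" not in content_type:
            return
        body = flow.response.content
        if not body:
            return
        # Hash the URL so the file name is always filesystem-safe.
        name = hashlib.sha256(flow.request.pretty_url.encode("utf-8")).hexdigest()
        self.directory.mkdir(parents=True, exist_ok=True)
        (self.directory / f"{name}.html").write_bytes(body)


# mitmproxy looks for a module-level "addons" list when loading a script.
addons = [SaveLocalCopy()]
```

    Assuming the file is saved as `save_copy.py` (a hypothetical name), it can be loaded with `mitmdump -s save_copy.py`. Because the write happens in the proxy process, the browser does no extra work for the copy.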

    ---

    [1] https://raw.githubusercontent.com/zamu-flowerpot/zamu-flower...


Related posts

  • Huginn is a system for building agents that perform automated tasks online

    1 project | news.ycombinator.com | 11 May 2025
  • Implement the Sovereign Individual (1997) + AI

    1 project | news.ycombinator.com | 12 Apr 2025
  • Show HN: Mashups – Resurrecting Yahoo Pipes, my side project

    3 projects | news.ycombinator.com | 6 Jan 2025
  • Obtainium: Get Android App Updates Directly from the Source

    4 projects | news.ycombinator.com | 2 Nov 2024
  • Imagining a Personal Data Pipeline

    5 projects | news.ycombinator.com | 10 Aug 2024
