Command-line Tools can be 235x Faster than your Hadoop Cluster

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • frawk

    an efficient awk-like language

  • fd

    A simple, fast and user-friendly alternative to 'find'

  • wget2

    Another tip: use wget2 instead of wget if you're mirroring a site (though this is more of an I/O tip than a fix for computationally heavy work). https://gitlab.com/gnuwget/wget2/-/wikis/home

    Sadly, wget2 didn't support WARC the last time I checked, but it comes with a `--max-threads` parameter that, together with `--mirror` and `--tries`, makes it trivial to mirror even the slowest websites out there.
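
As a rough illustration of the wget2 tip above, the three flags can be combined into a single invocation. The sketch below only assembles the argument list; the helper names (`build_mirror_command`, `mirror`), the thread count of 8, the retry count of 3, and the example URL are all assumptions chosen for illustration, not values from the original comment.

```python
import shutil
import subprocess


def build_mirror_command(url, max_threads=8, tries=3):
    """Return a wget2 argument list for mirroring `url`.

    The defaults (8 threads, 3 tries) are illustrative assumptions;
    tune them for the site being mirrored.
    """
    return [
        "wget2",
        "--mirror",                       # recursive download of the site
        f"--max-threads={max_threads}",   # number of parallel download threads
        f"--tries={tries}",               # retry attempts per file
        url,
    ]


def mirror(url, **options):
    """Run wget2 if it is on PATH; raise a clear error otherwise."""
    if shutil.which("wget2") is None:
        raise FileNotFoundError("wget2 not found on PATH")
    return subprocess.run(build_mirror_command(url, **options), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(build_mirror_command("https://example.com/")))
```

Calling `mirror("https://example.com/", max_threads=16)` would actually start the download, assuming wget2 is installed; a slow site may respond better to a lower thread count and a higher retry count.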

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts