Slack is experiencing a service disruption

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • PowerDNS

    PowerDNS Authoritative, PowerDNS Recursor, dnsdist

  • dotfiles

  • Great analysis, thanks!

    Slack support says that users should tell their ISPs to invalidate the DNS cache for slack.com https://status.slack.com/2021-09/06c1e17de93e7dc2 (access with 8.8.8.8 as resolver)

    Since the faulty DS record was in .com, everyone has a max wait-for-ttl-to-expire time of 24h.

    Google/Cloudflare etc. seem to also invalidate .com caching very quickly, 8.8.8.8 quickly was the first workaround.

    Meanwhile, 14 hours later, DTAG in Germany still does not resolve. The default resolvers have dnssec enabled.

    dig slack.com +cd

    tells the resolver to skip dnssec validation tests, and then it works again. Screenshots with the command output in https://twitter.com/dnsmichi/status/1443840645513293853?s=2

    Very interested in the post-mortem analysis. I think there were similar mistakes as with nasa.gov incident and the comcast analysis in 2012: https://www.internetsociety.org/blog/2012/01/comcast-release...

    Learnings for me:

    - dnstracer (https://gitlab.com/dnsmichi/dotfiles/-/blob/main/Brewfile#L5...) helps with detecting missing glue records, but not dnssec

    - dnstrace (https://github.com/rs/dnstrace) is a better alternative with dnssec

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • dnstrace

    DNS resolution tracing tool (by rs)

  • Great analysis, thanks!

    Slack support says that users should tell their ISPs to invalidate the DNS cache for slack.com https://status.slack.com/2021-09/06c1e17de93e7dc2 (access with 8.8.8.8 as resolver)

    Since the faulty DS record was in .com, everyone has a max wait-for-ttl-to-expire time of 24h.

    Google/Cloudflare etc. seem to also invalidate .com caching very quickly, 8.8.8.8 quickly was the first workaround.

    Meanwhile, 14 hours later, DTAG in Germany still does not resolve. The default resolvers have dnssec enabled.

    dig slack.com +cd

    tells the resolver to skip dnssec validation tests, and then it works again. Screenshots with the command output in https://twitter.com/dnsmichi/status/1443840645513293853?s=2

    Very interested in the post-mortem analysis. I think there were similar mistakes as with nasa.gov incident and the comcast analysis in 2012: https://www.internetsociety.org/blog/2012/01/comcast-release...

    Learnings for me:

    - dnstracer (https://gitlab.com/dnsmichi/dotfiles/-/blob/main/Brewfile#L5...) helps with detecting missing glue records, but not dnssec

    - dnstrace (https://github.com/rs/dnstrace) is a better alternative with dnssec

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts