Great analysis, thanks!
Slack support says that users should ask their ISPs to flush the DNS cache for slack.com: https://status.slack.com/2021-09/06c1e17de93e7dc2 (you may need 8.8.8.8 as your resolver to open the page)
Since the faulty DS record sits in the .com zone, whose DS records carry a 24h TTL, a resolver that cached it may keep using it for up to 24 hours.
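Some background on why a faulty DS record breaks resolution: the DS record in the parent (.com) zone holds a digest of the child zone's key-signing DNSKEY. Per RFC 4034, digest type 2 is SHA-256 over the owner name in canonical wire format followed by the DNSKEY RDATA. A validating resolver that finds a DS in .com but no DNSKEY in slack.com that hashes to it treats the answer as bogus and returns SERVFAIL. The Python sketch below models only that digest comparison. The helper names and the key bytes are hypothetical, and real validation additionally checks RRSIG signatures over the DNSKEY set.

```python
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Encode a domain name in canonical (lowercase) DNS wire format."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    wire = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return wire + b"\x00"  # the empty root label terminates the name


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA: 2-byte flags, 1-byte protocol, 1-byte algorithm, then the key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def ds_digest_sha256(owner: str, rdata: bytes) -> bytes:
    """DS digest type 2: SHA-256 over owner name wire format || DNSKEY RDATA."""
    return hashlib.sha256(name_to_wire(owner) + rdata).digest()


def ds_matches(ds_digest: bytes, owner: str, published_keys: list[bytes]) -> bool:
    """Simplified check: at least one published DNSKEY must hash to the DS digest."""
    return any(ds_digest_sha256(owner, key) == ds_digest for key in published_keys)


# Hypothetical key material, for illustration only (flags 257 = KSK, alg 13 = ECDSA P-256).
ksk = dnskey_rdata(257, 3, 13, b"\x01" * 64)
ds_in_parent = ds_digest_sha256("slack.com.", ksk)

print(ds_matches(ds_in_parent, "slack.com.", [ksk]))  # True
print(ds_matches(ds_in_parent, "slack.com.", []))     # False: the zone no longer publishes the key
```

In this model, a DS still cached from .com combined with a zone that has stopped publishing the matching key yields `False`, which is the situation a validating resolver reports as SERVFAIL.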
Google, Cloudflare, and others seem to flush their .com caches very quickly, so switching to 8.8.8.8 was the first workaround.
Meanwhile, 14 hours later, slack.com still does not resolve via DTAG in Germany. Its default resolvers have DNSSEC validation enabled.
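The 24h bound is plain TTL arithmetic. A rough Python sketch, assuming the resolver honors the full 86400-second DS TTL (some resolvers cap TTLs lower) and that elapsed time is counted from the moment the faulty DS was withdrawn from .com. That second point is an assumption: the "14 hours" above may instead be counted from the start of the outage. The function name is hypothetical.

```python
def worst_case_remaining_hours(ttl_seconds: int, hours_since_removal: float) -> float:
    """Upper bound on how long a TTL-honoring resolver may keep serving a stale record.

    Assumes the resolver fetched the record just before it was removed upstream,
    so its cached copy lives for the full TTL from that point.
    """
    remaining = ttl_seconds / 3600 - hours_since_removal
    return max(remaining, 0.0)


DS_TTL_SECONDS = 86_400  # 24 hours, the .com DS TTL mentioned above

print(worst_case_remaining_hours(DS_TTL_SECONDS, 14))  # 10.0
```

Under these assumptions, a resolver like DTAG's could legitimately keep failing for up to roughly 10 more hours unless its operator flushes the cache by hand.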
dig slack.com +cd
sets the CD (Checking Disabled) flag, which tells the resolver to skip DNSSEC validation, and then it resolves again. Screenshots of the command output: https://twitter.com/dnsmichi/status/1443840645513293853?s=2
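On the wire, `+cd` only flips one bit in the DNS header. A minimal Python sketch of a plain UDP query with and without the CD bit follows. `build_query` and `send_query` are hypothetical helpers, and the packet is simplified: dig's real query typically also carries an EDNS OPT record, which is omitted here.

```python
import random
import socket
import struct

# DNS header flag bits (RFC 1035; CD defined in RFC 4035).
FLAG_RD = 0x0100  # Recursion Desired
FLAG_CD = 0x0010  # Checking Disabled: ask the resolver not to perform DNSSEC validation


def build_query(name: str, qtype: int = 1, checking_disabled: bool = False) -> bytes:
    """Build a minimal DNS query (no EDNS) for `name`; qtype 1 is an A record."""
    flags = FLAG_RD | (FLAG_CD if checking_disabled else 0)
    query_id = random.randint(0, 0xFFFF)
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def send_query(packet: bytes, resolver: str = "8.8.8.8", timeout: float = 3.0) -> int:
    """Send the query over UDP and return the response RCODE (2 = SERVFAIL)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (resolver, 53))
        response, _ = sock.recvfrom(4096)
    if response[:2] != packet[:2]:
        raise ValueError("response ID does not match query ID")
    (resp_flags,) = struct.unpack("!H", response[2:4])
    return resp_flags & 0x000F  # low four bits hold the RCODE
```

Usage would be something like `send_query(build_query("slack.com"), resolver="<ISP resolver IP>")` versus the same call with `checking_disabled=True`. Against a validating resolver that still holds the stale DS, one would expect RCODE 2 (SERVFAIL) for the first and 0 (NOERROR) for the second, mirroring the `dig +cd` result.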
I'm very interested in the post-mortem. I suspect the mistakes were similar to those in the nasa.gov incident analyzed by Comcast in 2012: https://www.internetsociety.org/blog/2012/01/comcast-release...
Takeaways for me:
- dnstracer (https://gitlab.com/dnsmichi/dotfiles/-/blob/main/Brewfile#L5...) helps detect missing glue records, but not DNSSEC problems
- dnstrace (https://github.com/rs/dnstrace) is a better alternative because it also covers DNSSEC