Launch HN: Rootly (YC S21) – Manage Incidents in Slack

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • dispatch

    All of the ad-hoc things you're doing to manage incidents today, done for you, and much more!

  • The open source option from Netflix is quite popular too: https://github.com/Netflix/dispatch

  • incident-response-docs

    PagerDuty's Incident Response Documentation.

  • Cool, thanks for this view.

    I'm also intrigued by the text in this launch announcement:

    > Our focus in the early days was [to] build a hyper opinionated product to help them follow what we believe are the best practices. Now our product direction is focused on configuration and flexibility, how can we plug Rootly into your already existing way of working and automate it. This has helped our larger enterprise customers be successful with their current processes being automated.

    As I have gotten more experience managing complex incidents I've come around to the idea that having a standard process you follow for big issues is somewhat more important than what the process really is.

    I loved the PagerDuty response documentation (https://response.pagerduty.com/), not so much because of the specifics but because it suggests they have a culture where there is a well-understood protocol they always try to follow for big problems.

    I think about archery and "shot grouping" - once you learn to always land in the same place, you can move your aim to start landing somewhere else.

    Many of the things I see as valuable in incident management involve having responders with a shared set of priorities. Tooling can influence how easy or hard some of these things are, but it's really up to the people to do things like:

    * Actually finding and fixing the problem and being sure the fix worked

    * Clearly communicating the current user impact to the people who care

    * Figuring out who the right responders are, and getting them in the room quickly

    * Making one production change at a time with the incident coordinator's signoff, so you know which one helped and when it happened

    * Helping the rest of the organization learn from what happened (you may not know what there is to learn)

    Do you see room for the tooling company to also provide best-practices training, mentorship, or other kinds of support? That stuff scales less well than a web app but is arguably more important to changing a company's culture in a way that gets better user outcomes.
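The "one production change at a time with the incident coordinator's signoff" practice from the list above can be made concrete with a small sketch. This is a hypothetical illustration in Python, not part of Rootly, Dispatch, or PagerDuty's tooling: the `IncidentChangeLog` and `Change` names are invented here, and the sketch assumes a single coordinator per incident and that a change stays "in flight" until someone records its observed outcome.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Change:
    description: str
    approved_by: str
    applied_at: datetime
    outcome: Optional[str] = None  # None while the change's effect is still being evaluated


@dataclass
class IncidentChangeLog:
    """Hypothetical log enforcing one coordinator-approved change at a time."""
    coordinator: str
    changes: List[Change] = field(default_factory=list)

    def in_flight(self) -> Optional[Change]:
        """Return the most recent change if its outcome has not been recorded yet."""
        if self.changes and self.changes[-1].outcome is None:
            return self.changes[-1]
        return None

    def apply(self, description: str, approved_by: str,
              now: Optional[datetime] = None) -> Change:
        # Only the incident coordinator can sign off on a production change.
        if approved_by != self.coordinator:
            raise PermissionError(f"{approved_by} is not the incident coordinator")
        # One change at a time: the previous change must be evaluated first,
        # so it stays clear which change helped and when it happened.
        pending = self.in_flight()
        if pending is not None:
            raise RuntimeError(f"still evaluating: {pending.description}")
        change = Change(description, approved_by, now or datetime.now(timezone.utc))
        self.changes.append(change)
        return change

    def record_outcome(self, outcome: str) -> None:
        # Closing out the in-flight change unblocks the next one.
        pending = self.in_flight()
        if pending is None:
            raise RuntimeError("no change awaiting evaluation")
        pending.outcome = outcome
```

For example, `log = IncidentChangeLog(coordinator="alice")` followed by `log.apply("roll back latest deploy", approved_by="alice")` records a timestamped change; a second `apply` raises until `log.record_outcome(...)` is called. The timestamped list doubles as a timeline for the postmortem, which ties back to the last point in the list about helping the organization learn.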



Related posts

  • What's your incident response flow?

    2 projects | /r/sre | 27 May 2023
  • SRE - Process to handle incident management

    1 project | /r/devops | 7 Sep 2022
  • What happens if you cannot resolve the issue at hand?

    1 project | /r/sysadmin | 25 Aug 2022
  • Startup guide to incident management

    1 project | dev.to | 16 Mar 2022
  • PagerDuty Postmortem Handbook

    1 project | news.ycombinator.com | 7 Dec 2023