Ask HN: How do you develop internal tools for your organization?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • MkDocs

    Project documentation with Markdown.

  • TL;DR: users were added manually at first; later we added invite links and password-reset flows. Still invite-based.

    >Do you develop one single big application for everything in your organization, or few distinct applications, each with its own specific goal?

    TL;DR: plugin system with a core that loads extensions at run-time. Each extension is its own separate repository. We could compose different products.

    >Any specific thing to watch out for?

    Yes, here we go...

    Context: a consultancy specialized in building end-to-end products for large enterprises. Around 17 people in 2018, working on about 8 projects simultaneously. CEO (co-founder) in one country, CTO (co-founder) in another country with the team. The CTO quit a year and a half after I joined. Pretty much everyone else quit after that except literally a handful. A colleague and I took over. We killed projects that were going nowhere, unstuck some, and tightened the stack (it was Scala, Python, Spark, Kafka, Hadoop, Clojure, Vue, and what have you).

    Problem: building end-to-end products for several sectors (telcos, energy, banking, transportation, health) meant that, in addition to diving into different domains, acquiring data from peculiar data sources, and building the models, we also had to build bespoke "quick-no-time-for-reusability-one-off" applications that used these models and, in some cases, enabled domain experts at client organizations to train models themselves on new data. Each of these needed user authentication and administration, workspaces/groups, etc.

    This also created a frustrating situation: I bumped into a university classmate who was founding a startup, and the problem he described was exactly the same as one we had solved for a telco with a product we built, but we were unable to sell him that product because it was custom-made. It sucked. He needed the exact same features, and although they were well contained, the application containing them was too custom-made to reuse.

    On the next project we took on, we wanted to change the way we developed products and the way we worked. In the first few commits I made sure to set up a plugin architecture: a minimal core that practically did nothing but load extensions dynamically, with all the functionality living in those extensions, each in its own repository, so that we could add, remove, and compose them to create different products.
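    A minimal sketch of what such a core could look like in Python (the post doesn't show the actual implementation; the entry-point group name `myplatform.plugins` and the `register(app)` convention are assumptions for illustration):

    ```python
    # Bare core: discovers installed extensions and lets each one register itself.
    # Requires Python 3.10+ for entry_points(group=...).
    from importlib.metadata import entry_points


    class App:
        """The core holds only what extensions register into it."""
        def __init__(self):
            self.routes = {}

        def add_route(self, path, handler):
            self.routes[path] = handler


    def load_plugins(app, group="myplatform.plugins"):
        """Each installed package exposing this entry-point group is an extension."""
        for ep in entry_points(group=group):
            plugin = ep.load()      # imports the extension's registration object
            plugin.register(app)    # extension adds its routes, models, jobs, ...
            print(f"loaded plugin: {ep.name}")


    if __name__ == "__main__":
        app = App()
        load_plugins(app)  # composing a product = installing a different set of packages
    ```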

    Lesson: It should be as easy to remove things as it is to add things.

    Next, in 2019, we started experimenting with remote work. It was painful to watch colleagues look out the window to check if their train had arrived, or arrive at the office at 11AM because the commute sucked. Some were wasting up to 6 hours per day commuting. We experimented with remote work one day a week, saw what broke, and fixed it. Then two days a week. Then a whole week here and there. The rationale was that we might have to do it some day, and we might as well learn the lessons while the stakes were low rather than be taken by surprise when they were higher.

    In the meantime, my colleague (a data scientist) had an obligation that took him away for one year. He could read emails but, due to the poor internet connection, could not reply or open web pages. This became a forcing function for building asynchronous communication, tracking, and collaboration into the internal platform. We were tired of not knowing which approaches generated the best models, and of being unable to easily re-use work from previous projects.

    The reason people were coming to the office was to train models on a "beefy machine", and it was ridiculous. The next thing to do was to enable our people to do their jobs remotely on our internal platform. We built a plugin that offered a notebook experience so they could run training jobs that required a GPU and large amounts of memory from the comfort of their homes.

    Lesson: Just enough engineering and features to get the job done (that's the second important thing). No overbuilding. Get the job to be done straight from the target users, and implement a slightly more generalized form of the problem while avoiding the "General Problem" (https://xkcd.com/974). We're not DSL-creating architecture astronauts shooting off to far abstraction orbits.

    We were left with only one data scientist, who was already busy working on a client project, so we lacked feedback from the people we were building for (one person, given the other was absent). She taught a course at a university and had around thirty students in a town hit hard by COVID. When the university shut down, she asked me if the idea of letting students use the platform was still on the table: her students could no longer go to the university to use its machines, practically none of them had, or could afford, a workstation meeting the GPU/RAM requirements, and it put their final-year machine learning projects at risk. Here come our users, then. As I said, we needed users because we only had one, and she was busy.

    We onboarded her students: we created their accounts (yes, no account resetting/forgot password flows), we created a Slack workspace for them, helped them troubleshoot and optimize their code, and guided them through getting the work done. We stayed late. They did two things:

    - Discover issues we were not aware of.

    - Put weight on the issues we were aware of in terms of frequency/impact (most important).

    When the tenth user complained about something, we knew we really needed to fix it:

    - We added real-time collaboration in notebooks because users were taking pictures of their screens with their phones and sending them on Slack to get help from their peers or from us.

    - We added long-running and scheduled background execution of notebooks because there were resource conflicts: everyone trying to train a model at the same time, getting OOM errors, and staying late to get resources (one way to do this is sketched right after this list).
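    A minimal sketch of scheduled background execution of notebooks, assuming papermill as the executor (the post doesn't say which tool was actually used); a single worker drains a queue so training runs don't compete for the same GPU/RAM at once:

    ```python
    # Queue-based background execution of notebooks (sketch; papermill assumed).
    import queue
    import threading

    import papermill as pm  # pip install papermill

    jobs: queue.Queue = queue.Queue()


    def submit(notebook_in, notebook_out, parameters=None):
        """Called from the web UI or API when a user schedules a run."""
        jobs.put((notebook_in, notebook_out, parameters or {}))


    def worker():
        """Runs queued notebooks one at a time, headless, writing results back."""
        while True:
            nb_in, nb_out, params = jobs.get()
            try:
                pm.execute_notebook(nb_in, nb_out, parameters=params)
            except Exception as exc:  # surfaces in monitoring instead of a lost browser tab
                print(f"run failed: {nb_in}: {exc}")
            finally:
                jobs.task_done()


    threading.Thread(target=worker, daemon=True).start()
    submit("train.ipynb", "runs/train-001.ipynb", {"epochs": 10})
    jobs.join()  # block until the queued run finishes
    ```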

    Lesson: get initial users as soon as it's functionally possible. Ask questions. Ruthlessly prioritize issues according to frequency/gravity. Our issue templates on GitLab include an "Instances" section: how many times has this happened and how much does it suck? If it's an imaginary scenario, we won't do it now. If it's a minor inconvenience that happens rarely, we won't do it "now".

    Action: add documentation (MkDocs: https://www.mkdocs.org )

    Then we worked hard to fix a lot of bugs whose stack traces were buried in large logs.

    Action: add application monitoring (Sentry here: https://sentry.io ) to prioritize bugs and issues. Add incident-report templates to make them easier for people to fill out (in addition to feature/bug templates).
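    Wiring Sentry into a Python service takes a few lines; a minimal sketch (the DSN, sample rate, and environment below are placeholders, not our actual configuration):

    ```python
    # Report exceptions to Sentry so stack traces are grouped and prioritized
    # instead of being buried in large logs.
    import sentry_sdk

    sentry_sdk.init(
        dsn="https://<key>@<org>.ingest.sentry.io/<project>",  # placeholder DSN
        traces_sample_rate=0.1,   # optional: sample 10% of transactions for performance data
        environment="production",
    )

    # Unhandled exceptions are reported automatically; handled ones can be sent explicitly:
    try:
        1 / 0
    except ZeroDivisionError as exc:
        sentry_sdk.capture_exception(exc)
    ```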

    Then, to accelerate and to solve more generalized versions of the problems we were working on, we needed feedback from professionals working on projects and problems we were unfamiliar with. Revamp the landing page, add a "Request access" field to collect emails. Interact with people and try to get them to try the product. Also listen to why they wouldn't. Send invite links to people we chatted with (one way such links can be implemented is sketched below)...
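    A sketch of one way those invite links could work (the post doesn't describe the actual mechanism): a signed, expiring token embedding the invitee's email, generated after a chat and verified on signup, here using itsdangerous:

    ```python
    # Signed, expiring invite links (sketch; secret, URL, and expiry are placeholders).
    from itsdangerous import BadSignature, SignatureExpired, URLSafeTimedSerializer

    serializer = URLSafeTimedSerializer("SECRET_KEY")  # placeholder secret


    def make_invite_link(email, base_url="https://platform.example.com/signup"):
        token = serializer.dumps(email, salt="invite")
        return f"{base_url}?token={token}"


    def redeem_invite(token, max_age_days=14):
        """Return the invited email if the token is valid and not expired, else None."""
        try:
            return serializer.loads(token, salt="invite", max_age=max_age_days * 86400)
        except (BadSignature, SignatureExpired):
            return None
    ```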

    Action: add analytics to see sticking points (PostHog: https://posthog.com )
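    Seeing sticking points comes down to capturing events at each step of the funnel; a minimal sketch with PostHog's Python client (the event names and properties are made up for illustration):

    ```python
    # Capture funnel events so drop-off points show up in PostHog (sketch).
    from posthog import Posthog

    posthog = Posthog(project_api_key="phc_placeholder", host="https://app.posthog.com")

    posthog.capture("user-123", "signup_completed")
    posthog.capture("user-123", "notebook_created")
    posthog.capture("user-123", "training_job_submitted", {"gpu": True})
    posthog.capture("user-123", "training_job_failed", {"reason": "OOM"})
    ```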

    When still nothing happened (we still needed feedback and more data points because we were too tiny), we took a more hands-on approach and got into more conversations. We actively looked for people online on several platforms and forums. I manually curated around 500 contacts for whom I got a first name, last name, and email address (from Twitter, from Keybase links to GitHub, and with `git log | grep Author`; I manually cloned a lot of repos to do that). Then, instead of just sharing a link, I qualified early on: most of the professionals who are active in the field, engineers at companies, sometimes CTOs/CEOs (some of them in the middle of mergers and/or acquisitions), had the most useful feedback, got on calls, and wrote huge emails going deep into the product. AI enthusiasts/audience builders/Medium writers/people tweeting about AI were often uncouth. The only way to know who was who was to qualify in the initial emails.
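    A rough sketch of that manual curation step: clone a repository and pull the unique author names/emails out of its git history (the repository URL below is a placeholder):

    ```python
    # Extract unique commit authors from a repo, the scripted version of
    # `git log | grep Author` mentioned above.
    import subprocess


    def authors_of(repo_url, workdir="/tmp/repo.git"):
        subprocess.run(["git", "clone", "--bare", repo_url, workdir], check=True)
        log = subprocess.run(
            ["git", "--git-dir", workdir, "log", "--format=%an\t%ae"],
            check=True, capture_output=True, text=True,
        ).stdout
        return {tuple(line.split("\t", 1)) for line in log.splitlines() if "\t" in line}


    for name, email in sorted(authors_of("https://github.com/example/repo.git")):
        print(name, email)
    ```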

    Lesson: talk with the right people. The right people are more knowledgeable, and those who did something with their life or built a product or a company understand the effort and have been way more generous and courteous.

    Lesson: even though I started talking with people before and while we were building the product, and the product was for a problem and use cases we knew intimately, I wish I had started talking with people even earlier.

    Lesson: sell the product early on (integrated with Stripe to be able to send invoices). Buying is honest feedback.

    Product guidelines:

    - APIs: everything you can do with point-and-click on the web interface, you must be able to do with API calls (illustrated right after this list).

    - The service must be able to die without users being left scrambling to exfiltrate their data after an end-of-service email (derived from a personal guideline: I must be able to die without the team being impacted, except for nostalgia).

    - The site must be able to run on a Raspberry Pi, but do the real work through the right integrations with compute/storage/networking.
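    To illustrate the API-parity guideline: every action clickable in the web interface maps to a plain HTTP call. The endpoint, token, and payload below are hypothetical, not the product's actual API:

    ```python
    # Hypothetical API call equivalent to clicking "New training job" in the UI.
    import requests

    API = "https://platform.example.com/api/v1"
    headers = {"Authorization": "Bearer <token>"}

    resp = requests.post(
        f"{API}/jobs",
        json={"notebook": "train.ipynb", "gpu": True, "parameters": {"epochs": 10}},
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())
    ```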

  • Sentry

    Developer-first error tracking and performance monitoring


  • PostHog

    🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.


