Dreaming and Breaking Molds – Establishing Best Practices with Scott Haines

This page summarizes the projects mentioned and recommended in the original post on dev.to

  • Apache Spark

    Apache Spark - A unified analytics engine for large-scale data processing

  • For example, when I was at Yahoo, we did a lot of things where we had the ability to basically process data in stream. But we didn't have repeatable libraries we could easily use. So we had to invent everything. So it was like, oh, we want to create a session. So somebody starts a user journey, where do they go within a journey? And is it all within a 15 to 30-minute timeout from the last event? How do we understand how people are using something or interacting with it? And those types of things are a lot more difficult than when we're like oh, we could do it like X, Y, or Z. And that stuff was just for free when we started using Spark.

  • Apache Hadoop

  • So Yahoo bought that. I think it was 2013 or 2014. Timelines are hard. But I wanted to go join the Games team and start things back up. But that was also my first kind of experience in actually building recommendation engines or working with lots of data. And I think for me, like that was, I guess...at the time, we were using something called Apache Storm. We had Hadoop, which had been around for a while. And it was like one of the biggest user groups was out of the Yahoo campus. It was called the HUG group, like the Hadoop Users Group. So they met for basically pizza and stories on Wednesdays once a month, which was really fun.

  • Apache Avro

    Apache Avro is a data serialization system.

  • Scott: It's like a very large row of Avro data that had everything you could possibly ever need. It was like 115 columns. Most things were null, and it became every data type you'd ever want. It's like, is it mobile? Look for mobile_. It's like, this is really crappy. I didn't know about, I guess, the hardships of data engineering at that point. Because this was the first time where I was like, okay, you're on the ground basically pulling data now, and now we're going to do stuff with it. We're going to power our whole entire application with it. And I remember that just being exciting. The gears were turning. I was waking up super early. I wanted to go in just to work on it more. It was the first thing where it's like, man, that's just like the coolest thing in the whole entire world.
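
The Apache Spark quote above describes sessionization: grouping a user's events into one journey as long as each event arrives within a timeout of the previous one. The following is a minimal sketch of that idea in plain Python. It is not Spark's API and not the code used at Yahoo. The event shape `(user_id, timestamp)`, the function name `sessionize`, and the 30-minute default (the upper end of the range mentioned in the quote) are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical event shape: (user_id, timestamp). Not from the original post.
Event = Tuple[str, datetime]


def sessionize(
    events: List[Event],
    timeout: timedelta = timedelta(minutes=30),  # assumed default
) -> Dict[str, List[List[datetime]]]:
    """Group each user's events into sessions split on inactivity gaps."""
    sessions: Dict[str, List[List[datetime]]] = {}
    # Sort by user, then by time, so each user's events are visited in order.
    for user_id, ts in sorted(events):
        user_sessions = sessions.setdefault(user_id, [])
        # Extend the current session only if this event falls within
        # `timeout` of the previous event; otherwise start a new session.
        if user_sessions and ts - user_sessions[-1][-1] <= timeout:
            user_sessions[-1].append(ts)
        else:
            user_sessions.append([ts])
    return sessions


# Example input: alice's third event is 45 minutes after her second.
start = datetime(2024, 1, 1, 9, 0)
events = [
    ("alice", start),
    ("alice", start + timedelta(minutes=10)),
    ("alice", start + timedelta(minutes=55)),
    ("bob", start + timedelta(minutes=5)),
]
result = sessionize(events)
```

With the default timeout, alice's events split into two sessions because of the 45-minute gap, while bob has a single session. This batch version sorts all events up front. A streaming engine has to handle the same logic incrementally and deal with late or out-of-order events, which is the kind of work the quote says came "for free" with Spark.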
