disco reviews and mentions
-
DeWitt and Stonebraker's "MapReduce: A major step backwards" (2009)
I agree. I used Disco MR to do amazing things. Trivial to use, like anyone could be productive in under an hour.
Erasure codes are awesome, but so is just having 3 copies. When you have skin in the game, simplicity is the most important driver of good outcomes. Look at the dimensions that Netezza optimized, they saw a technological window and they took it. Right now we have workstations that can push 100GB/s from from flash. We are talking about being able to sort 1TB of data in 20 seconds (from flash) the same machine could do it from ram in 10.
https://github.com/discoproject/disco
I need to give Ray and Dask a try.
I don't know where to put this comment so I'll put it here. DeWitt and Stonebraker are right, but also wrong. Everyone is talking past each other there. Both are geniuses, this essay wasn't super strong.
If I was their editor, I would say, reframe it as MapReduce is an implementation detail, we also need these other things for this to be usable by the masses. Their point about indexes proves my point about talking past each other. If you are scanning the data basically once, building an index is a waste.
Stats
discoproject/disco is an open source project licensed under BSD 3-clause "New" or "Revised" License which is an OSI approved license.
The primary programming language of disco is Erlang.
Sponsored