Database

Scaling Distributed Joins

Most users of SQL databases have a good understanding of the join algorithms single-box databases employ. They understand the trade-offs and uses for nested loop joins, merge joins, and hash joins. Distributed join algorithms, on the other hand, tend not to be as well understood. Distributed databases need to make a different set of tradeoffs to account for table data that is spread around a cluster of machines instead of stored on a single machine, like in a traditional database. Because these...