Retrospective: A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

A Putnam, AM Caulfield, ES Chung, L Adams, K Constantinides, J Demme, D Firestone, … - sites.coecis.cornell.edu
Hardware specialization can improve performance and energy efficiency by several orders of magnitude over conventional CPUs [7]. However, the wide variety of cloud applications, their rapid rate of change, and the need to support multiple geographical regions and generations simultaneously make scalable custom hardware for the cloud challenging.

The Catapult program began in 2010, one year after the launch of Microsoft Bing and two years after the launch of Red Dog, the predecessor to Azure. Bing approached Microsoft Research (MSR) to find ways to give Bing a competitive edge. Many at Microsoft and in the architecture research community envisioned 1000-core "manycore" multiprocessors as the future for application acceleration, but our team looked to hardware specialization as a better approach for Bing. The short timeline of the request (as soon as possible) combined with the low budget quickly ruled out custom hardware solutions. We looked at both GPUs and FPGAs and decided that FPGAs could cover a broader range of workloads; moreover, the SIMD execution model of GPUs did not match the latency-sensitive Bing workload, which made batching requests impractical. Accordingly, MSR developed an accelerator platform based on FPGAs, with the vision of eventually creating a fully-custom solution. Under the codename Project Catapult, the 2014 paper focused on the architecture and deployment of FPGA hardware in Microsoft's cloud and the hardware/software co-design that doubled Bing's ranking throughput and reduced latency by 30% at true production scale.
It is worth noting that the 1,632 servers in the paper was not a random number. This was the smallest number of servers that could run one Bing instance. With workload performance measured in 99%+ tail latencies and heavily dependent on IO performance, engineering and deploying at-scale systems is the only way to do realistic evaluations. Supporting specialized hardware at scale was more challenging than initially envisioned. We went through three design iterations. First, we designed a "mega-board" with six large FPGAs, four of which were placed in a special server per rack. However, datacenters prefer homogeneous racks to simplify power/cooling and to limit the blast radius of failures. In addition, network designs that concentrate traffic at one node …