AliCloud's FuxiSort sets world sorting records

Nurdianah Md Nur

AliCloud announced that its distributed computation framework, FuxiSort, has set new world records for the Daytona and Indy categories of Sort Benchmark's GraySort and MinuteSort benchmarks.

The Daytona category of the GraySort and MinuteSort benchmarks has long been considered the gold standard for measuring the scalability and efficiency of general-purpose distributed computing systems. Sort Benchmark was first held in 1987 with single systems, and gradually accepted computing clusters as processing hardware from 1998. GraySort -- named after the pioneering computer scientist Jim Gray -- has evolved over the years into a benchmark for sorting at least 100TB of data, while MinuteSort focuses on sorting as much data as possible in one minute.

According to Sort Benchmark, AliCloud's FuxiSort took 377 seconds to sort 100TB of data, beating last year's record of 23.4 minutes set by Apache Spark. The AliCloud team employed a cluster of 3,377 commodity servers to set the Daytona GraySort record of 15.9TB/min and the Daytona MinuteSort record of 7.7TB, improvements of 3.6 times and 2.1 times over the previous records, respectively.
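
The 15.9TB/min figure follows directly from the reported sort time. The short Python sketch below reproduces that arithmetic and, as a rough illustration, derives an average per-server rate. It assumes decimal units (1TB = 1,000,000MB) and an even spread of work across all 3,377 servers; neither assumption is stated by AliCloud or Sort Benchmark, and the variable and function names are illustrative only.

```python
# Figures quoted in the article.
DATA_TB = 100            # size of the GraySort data set, in terabytes
FUXISORT_SECONDS = 377   # reported FuxiSort sort time
SERVERS = 3377           # reported cluster size

def throughput_tb_per_min(data_tb, seconds):
    """Convert a data size and an elapsed time into terabytes per minute."""
    return data_tb / seconds * 60

fuxi_rate = throughput_tb_per_min(DATA_TB, FUXISORT_SECONDS)

# Assumption: decimal units (1 TB = 1,000,000 MB) and work spread evenly
# across every server, which real clusters only approximate.
per_server_mb_s = DATA_TB * 1_000_000 / FUXISORT_SECONDS / SERVERS

print(f"Cluster throughput: {fuxi_rate:.1f} TB/min")
print(f"Average per server: {per_server_mb_s:.1f} MB/s")
```

The per-server number is only an average over the whole run, so it says nothing about peak disk or network rates on any individual machine.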

Chao Li, team leader of Fuxi, said: "As more mobile devices and sensors from the Internet of Things put data online, we will be capturing and analysing ever larger volumes of data in various formats. Gaining accurate, actionable insights affordably and quickly from increasingly large volumes of data will require smarter technologies. We will thus strive to process even higher volumes of data in shorter times going forward."

FuxiSort is built on top of Apsara, a general-purpose computing system developed in-house by AliCloud. Apsara manages cluster resources within a data centre, and schedules parallel execution for a wide range of distributed online and offline applications. According to AliCloud, a single Apsara cluster can be scaled up to 5,000 servers with hundreds of petabytes of storage and hundreds of thousands of CPU cores.

Apsara is the foundation for the majority of public cloud services offered by AliCloud, including Open Data Processing Service (ODPS), Open Storage Service (OSS) and Open Table Service (OTS). It also supports all data-processing workloads within Alibaba Group. Fuxi is the framework that handles cluster-resource management and job scheduling within Apsara.