TY - JOUR
T1 - Workload-Adaptive Configuration Tuning for Hierarchical Cloud Schedulers
AU - Han, Rui
AU - Liu, Chi Harold
AU - Zong, Zan
AU - Chen, Lydia Y.
AU - Liu, Wending
AU - Wang, Siyi
AU - Zhan, Jianfeng
PY - 2019/12/1
Y1 - 2019/12/1
N2 - Cluster schedulers provide a flexible resource-sharing mechanism for best-effort cloud jobs, which predominate in modern datacenters. Properly tuning a scheduler's configuration is key to these jobs' performance because the configuration determines how resources are allocated among them. Today's cloud scheduling systems usually rely on cluster operators to set the configuration and thus overlook the potential performance improvement from optimally configuring the scheduler according to heterogeneous and dynamic cloud workloads. In this paper, we introduce AdaptiveConfig, a run-time configurator for cluster schedulers that automatically adapts to the changing workload and resource status in two steps. First, a comparison approach estimates job performance under different configurations and diverse scheduling scenarios. The key idea here is to transform a scheduler's resource allocation mechanism and its variable influencing factors (configurations, scheduling constraints, available resources, and workload status) into business rules and facts in a rule engine, thereby reasoning about these correlated factors when comparing job performance. Second, a workload-adaptive optimizer transforms the cluster-level search of a huge configuration space into an equivalent dynamic programming problem that can be solved efficiently at scale. We implement AdaptiveConfig on the popular YARN Capacity and Fair schedulers and demonstrate its effectiveness using real-world Facebook and Google workloads: it finds the best configurations for most scheduling scenarios and reduces latencies by a factor of two with low optimization time.
AB - Cluster schedulers provide a flexible resource-sharing mechanism for best-effort cloud jobs, which predominate in modern datacenters. Properly tuning a scheduler's configuration is key to these jobs' performance because the configuration determines how resources are allocated among them. Today's cloud scheduling systems usually rely on cluster operators to set the configuration and thus overlook the potential performance improvement from optimally configuring the scheduler according to heterogeneous and dynamic cloud workloads. In this paper, we introduce AdaptiveConfig, a run-time configurator for cluster schedulers that automatically adapts to the changing workload and resource status in two steps. First, a comparison approach estimates job performance under different configurations and diverse scheduling scenarios. The key idea here is to transform a scheduler's resource allocation mechanism and its variable influencing factors (configurations, scheduling constraints, available resources, and workload status) into business rules and facts in a rule engine, thereby reasoning about these correlated factors when comparing job performance. Second, a workload-adaptive optimizer transforms the cluster-level search of a huge configuration space into an equivalent dynamic programming problem that can be solved efficiently at scale. We implement AdaptiveConfig on the popular YARN Capacity and Fair schedulers and demonstrate its effectiveness using real-world Facebook and Google workloads: it finds the best configurations for most scheduling scenarios and reduces latencies by a factor of two with low optimization time.
KW - Cloud datacenter
KW - cluster scheduler
KW - configuration
KW - job latency
KW - YARN
UR - http://www.scopus.com/inward/record.url?scp=85075108359&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2019.2923197
DO - 10.1109/TPDS.2019.2923197
M3 - Article
AN - SCOPUS:85075108359
SN - 1045-9219
VL - 30
SP - 2879
EP - 2895
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 12
M1 - 8741093
ER -