An Availability-on-Demand Mechanism for Datacenters

Siqi Shen, Alexandru Iosup, Assaf Israel, Walfredo Cirne, Danny Raz, Dick Epema

Data-Intensive Systems

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

13 Citations (Scopus)

60 Downloads (Pure)

Abstract

Datacenters are at the core of a wide variety of daily ICT utilities, ranging from scientific computing to online gaming. Due to the scale of today's datacenters, the failure of computing resources is a common occurrence that may disrupt the availability of ICT services, leading to revenue loss. Although many high availability (HA) techniques have been proposed to mask resource failures, datacenter users, who rent datacenter resources and use them to provide ICT utilities to a global population, still have limited management options for dynamically selecting and configuring HA techniques. In this work, we propose Availability-on-Demand (AoD), a mechanism consisting of an API that allows datacenter users to specify availability requirements which can dynamically change, and an availability-aware scheduler that dynamically manages computing resources based on user-specified requirements. The mechanism operates at the level of individual service instances, thus enabling fine-grained control of availability, for example during sudden requirement changes and periodic operations. Through realistic, trace-based simulations, we show that the AoD mechanism can achieve high availability at low cost. The AoD approach consumes about the same CPU hours as approaches that apply HA techniques randomly, but achieves higher availability. Moreover, compared to an ideal approach that has perfect predictions about failures, it consumes 13% to 31% more CPU hours but achieves similar availability for critical parts of applications.

Original language	English
Title of host publication	15th IEEE/ACM Symp. on Cluster, Cloud and Grid Computing
Publication status	Published - 2015

Access to Document

Shen_CCGrid-2015-Availability-on-Demand (Accepted author manuscript, 1.01 MB)

https://doi.org/10.1109/CCGrid.2015.58

Cite this

@inproceedings{dc109391606a4160aa255380e1505041,

title = "An Availability-on-Demand Mechanism for Datacenters",

abstract = "Datacenters are at the core of a wide variety of daily ICT utilities, ranging from scientific computing to online gaming. Due to the scale of today's datacenters, the failure of computing resources is a common occurrence that may disrupt the availability of ICT services, leading to revenue loss. Although many high availability (HA) techniques have been proposed to mask resource failures, datacenter users, who rent datacenter resources and use them to provide ICT utilities to a global population, still have limited management options for dynamically selecting and configuring HA techniques. In this work, we propose Availability-on-Demand (AoD), a mechanism consisting of an API that allows datacenter users to specify availability requirements which can dynamically change, and an availability-aware scheduler that dynamically manages computing resources based on user-specified requirements. The mechanism operates at the level of individual service instances, thus enabling fine-grained control of availability, for example during sudden requirement changes and periodic operations. Through realistic, trace-based simulations, we show that the AoD mechanism can achieve high availability at low cost. The AoD approach consumes about the same CPU hours as approaches that apply HA techniques randomly, but achieves higher availability. Moreover, compared to an ideal approach that has perfect predictions about failures, it consumes 13% to 31% more CPU hours but achieves similar availability for critical parts of applications.",

author = "Siqi Shen and Alexandru Iosup and Assaf Israel and Walfredo Cirne and Danny Raz and Dick Epema",

year = "2015",

language = "English",

booktitle = "15th IEEE/ACM Symp. on Cluster, Cloud and Grid Computing",

}

TY - GEN

T1 - An Availability-on-Demand Mechanism for Datacenters

AU - Shen, Siqi

AU - Iosup, Alexandru

AU - Israel, Assaf

AU - Cirne, Walfredo

AU - Raz, Danny

AU - Epema, Dick

PY - 2015

Y1 - 2015

N2 - Datacenters are at the core of a wide variety of daily ICT utilities, ranging from scientific computing to online gaming. Due to the scale of today's datacenters, the failure of computing resources is a common occurrence that may disrupt the availability of ICT services, leading to revenue loss. Although many high availability (HA) techniques have been proposed to mask resource failures, datacenter users, who rent datacenter resources and use them to provide ICT utilities to a global population, still have limited management options for dynamically selecting and configuring HA techniques. In this work, we propose Availability-on-Demand (AoD), a mechanism consisting of an API that allows datacenter users to specify availability requirements which can dynamically change, and an availability-aware scheduler that dynamically manages computing resources based on user-specified requirements. The mechanism operates at the level of individual service instances, thus enabling fine-grained control of availability, for example during sudden requirement changes and periodic operations. Through realistic, trace-based simulations, we show that the AoD mechanism can achieve high availability at low cost. The AoD approach consumes about the same CPU hours as approaches that apply HA techniques randomly, but achieves higher availability. Moreover, compared to an ideal approach that has perfect predictions about failures, it consumes 13% to 31% more CPU hours but achieves similar availability for critical parts of applications.

AB - Datacenters are at the core of a wide variety of daily ICT utilities, ranging from scientific computing to online gaming. Due to the scale of today's datacenters, the failure of computing resources is a common occurrence that may disrupt the availability of ICT services, leading to revenue loss. Although many high availability (HA) techniques have been proposed to mask resource failures, datacenter users, who rent datacenter resources and use them to provide ICT utilities to a global population, still have limited management options for dynamically selecting and configuring HA techniques. In this work, we propose Availability-on-Demand (AoD), a mechanism consisting of an API that allows datacenter users to specify availability requirements which can dynamically change, and an availability-aware scheduler that dynamically manages computing resources based on user-specified requirements. The mechanism operates at the level of individual service instances, thus enabling fine-grained control of availability, for example during sudden requirement changes and periodic operations. Through realistic, trace-based simulations, we show that the AoD mechanism can achieve high availability at low cost. The AoD approach consumes about the same CPU hours as approaches that apply HA techniques randomly, but achieves higher availability. Moreover, compared to an ideal approach that has perfect predictions about failures, it consumes 13% to 31% more CPU hours but achieves similar availability for critical parts of applications.

M3 - Conference contribution

BT - 15th IEEE/ACM Symp. on Cluster, Cloud and Grid Computing

ER -
