Jia Rao
Personal website: http://ranger.uta.edu/~jrao/
Jia Rao, Ph.D., is an Assistant Professor in the CSE Department. His research interests lie in the broad areas of operating systems, distributed and parallel computing, and machine learning. His research focuses on improving the quality of service (QoS) of multi-tenant systems, including cloud systems and Big Data analytics. The overarching goals are to guarantee fairness and service differentiation among users while achieving high hardware efficiency and delivering predictable performance. His research blends systematic profiling, statistical modeling, hardware-software interaction, and cross-layer coordination to identify fundamental challenges and limitations of existing systems, and to develop simple but elegant solutions. Dr. Rao's research has led to two NSF CNS core research awards and four Best Paper awards/nominations at prestigious conferences, including HPCA, HPDC, APSys, and ICAC.
With the advances in virtualization, datacenter computing, and the explosion of Big Data, IT paradigms have shifted from traditional client-server computing based on well-structured data to a cloud computing model working with exascale unstructured data. This paradigm shift has revolutionized the way people use Internet services and greatly advanced data-driven machine intelligence and learning. The rise of Amazon as a new IT giant and the surprising loss of a human world champion to Google's AlphaGo program in the game of Go are two examples of this revolution. The success of this computing trend relies on the effective consolidation of heterogeneous workloads on shared cloud infrastructure and efficient means to extract useful information from Big Data. Dr. Rao's research sits at the intersection of computer systems and data sciences. On the one hand, his research aims to develop a better understanding of computer systems via statistical machine learning and to build systems that are adaptive to changing workloads, scalable for platform growth, and capable of providing QoS guarantees. On the other hand, his research characterizes the needs of Big Data workloads and designs cloud systems to better support Big Data processing.
One thrust of Dr. Rao's research is to help migrate applications from traditional systems to cloud systems, which offer users virtually infinite resources and a pay-as-you-go billing model. The key challenge is how to exploit such resource scalability and elasticity to run applications in an efficient and cost-effective manner. Dr. Rao's work on autonomic cloud management dynamically provisions resources for applications with time-varying demands, avoiding both resource waste due to over-provisioning and QoS violations due to under-provisioning. The resource manager, implemented as a self-adaptive learning agent, interacts with the cloud to determine the best provisioning strategy for various applications. However, the cloud is a complex environment with dynamism, uncertainty, and heterogeneity, making it difficult to characterize application behavior with mathematical models and to derive good provisioning policies. To this end, Dr. Rao developed a model-free learning framework based on reinforcement learning (RL) and proposed a number of techniques to accelerate the convergence of the learning algorithm in a cloud environment. A similar RL approach was later employed by Google to build its AlphaGo program.
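The model-free RL idea can be illustrated with a minimal Q-learning loop for resource provisioning. The state space (number of provisioned VMs), the action set (add, hold, remove), and the reward function below are simplified assumptions for illustration only, not the formulation used in Dr. Rao's framework.

```python
import random

# Hypothetical simplified setting: state = number of provisioned VMs;
# actions adjust the VM count; reward penalizes unmet demand (QoS
# violations) more heavily than idle capacity (resource waste).
ACTIONS = [-1, 0, 1]           # remove a VM, hold, add a VM
MIN_VMS, MAX_VMS = 1, 10

def reward(vms, demand):
    """Negative cost: under-provisioning hurts more than over-provisioning."""
    if vms < demand:
        return -10 * (demand - vms)   # QoS violation penalty
    return -1 * (vms - demand)        # over-provisioning penalty

def q_learning(demand_trace, episodes=200, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning over (VM count, action) pairs."""
    q = {(s, a): 0.0 for s in range(MIN_VMS, MAX_VMS + 1) for a in ACTIONS}
    for _ in range(episodes):
        vms = MIN_VMS
        for demand in demand_trace:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(vms, x)])
            nxt = min(MAX_VMS, max(MIN_VMS, vms + a))
            r = reward(nxt, demand)
            best_next = max(q[(nxt, x)] for x in ACTIONS)
            q[(vms, a)] += alpha * (r + gamma * best_next - q[(vms, a)])
            vms = nxt
    return q

q = q_learning([3, 3, 5, 5, 4] * 4)
```

In practice, the state and action spaces of a real cloud are far larger, which is why convergence acceleration techniques are essential; this sketch only conveys the model-free structure of the approach.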
Modern computer systems are built with complex software and hardware stacks, including application runtimes, operating systems, hypervisors, and heterogeneous devices. Delivering metered cloud services, such as computing, storage, and software "as a service", from such complex systems is non-trivial. The support of multi-tenancy, the guarantee of service predictability, and the optimization of resource efficiency are the basic principles behind cloud computing. One critical issue in multi-tenant clouds is the enforcement of fairness among users. Dr. Rao has identified deficiencies in state-of-the-art multi-processor scheduling algorithms that cause unfairness in CPU allocation, and proposed a novel SMP scheduling algorithm that addresses the unfairness while preserving scheduling efficiency. Cloud predictability requires that an application achieve performance in proportion to the amount of resources it consumes, and that this performance be consistent over multiple runs irrespective of the status of co-located applications. However, contention at various levels of the cloud affects the effective capacity of the virtual resources leased by users and leads to wildly unpredictable performance. To address this unpredictability, Dr. Rao incorporated awareness of multi-tenant interference and real-time measurement of hardware capability into virtual resource scheduling to account for resource variability. This work won a Best Paper nomination at the 19th IEEE International Symposium on High-Performance Computer Architecture (HPCA), a prestigious conference in computer architecture.
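The idea of accounting for contention-induced capacity variation can be sketched as a weighted fair allocation computed over the measured effective capacity rather than the nominal one. The tenant weights, capacity numbers, and helper function below are illustrative assumptions, not the scheduling algorithm from the HPCA paper.

```python
def fair_shares(weights, effective_capacity):
    """Divide the measured effective capacity (not the nominal one)
    among tenants in proportion to their weights, so each tenant's
    share tracks what the hardware can actually deliver under
    multi-tenant interference."""
    total_weight = sum(weights.values())
    return {tenant: effective_capacity * w / total_weight
            for tenant, w in weights.items()}

# Nominal capacity is 8 cores, but cache and memory-bandwidth contention
# from co-located tenants reduces the measured effective capacity to
# the equivalent of 6 cores; shares are computed against that figure.
shares = fair_shares({"tenantA": 2, "tenantB": 1, "tenantC": 1}, 6.0)
# tenantA receives 3.0 core-equivalents; tenantB and tenantC receive 1.5 each
```

Allocating against measured capacity means a tenant's share shrinks and grows with actual hardware capability, which is what makes the resulting performance proportional and predictable.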
While the cloud has demonstrated its value in many application domains, it still lacks support for Big Data analytics, the most important type of computing driving recent advances in artificial intelligence and data sciences. Big Data frameworks implement their own complex runtimes for automatic parallelization and fault tolerance. These mechanisms are designed for dedicated systems and thus do not fit well in a shared cloud environment with varying resource availability. A naive migration of analytic workloads to the cloud incurs significant performance loss and can even cause job failures. Dr. Rao's research found that task scheduling, job configuration, and resource allocation in Big Data analytics should be made adaptive to dynamic cloud environments to attain high efficiency. Among his work in this direction, interference-aware MapReduce scheduling and flexible data shuffling in Big Data processing won a Best Paper nomination and a Best Paper award at the USENIX International Conference on Autonomic Computing (ICAC) and the ACM International Conference on High Performance Distributed Computing (HPDC), respectively.
Dr. Rao's current research focuses on developing a fundamental understanding of the semantic gaps between virtual and physical systems and exploring the performance-energy tradeoff for emerging Big Data machine learning workloads on heterogeneous hardware. The abstraction of resources in virtual systems is significantly different from that in physical systems, which causes two semantic gaps: resource discontinuity and semantic heterogeneity. Dr. Rao is working on redesigning the guest operating system's resource management policies to bridge these semantic gaps. The ultimate goal is to make workloads resilient to cloud interference. To reduce energy consumption in cloud datacenters, Dr. Rao is exploring aggressive workload consolidation to reduce the number of servers that need to be powered on, while minimizing performance degradation. He is actively collaborating with colleagues at UTA to identify workloads in interdisciplinary fields that are best suited for consolidation-based energy saving.
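Consolidation to minimize powered-on servers can be viewed as a bin-packing problem; the classic first-fit-decreasing heuristic below is a minimal sketch of this view, not the consolidation policy in Dr. Rao's work, and the VM demands and capacity are made-up values.

```python
def consolidate(vm_demands, server_capacity):
    """First-fit-decreasing heuristic: place each VM (largest demand
    first) on the first server with enough remaining capacity, opening
    a new server only when none fits. Fewer open servers means fewer
    machines that must be powered on."""
    servers = []      # remaining capacity of each open server
    placement = {}    # vm name -> server index
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(servers):
            if demand <= free:
                servers[i] -= demand
                placement[vm] = i
                break
        else:
            servers.append(server_capacity - demand)
            placement[vm] = len(servers) - 1
    return placement, len(servers)

# Four VMs with normalized CPU demands fit on two servers instead of four.
placement, n_servers = consolidate(
    {"vm1": 0.6, "vm2": 0.5, "vm3": 0.4, "vm4": 0.3}, server_capacity=1.0)
```

The hard part in practice, and the focus of the research, is packing aggressively without letting co-located workloads degrade each other's performance; a pure capacity-based heuristic like this one ignores that interference dimension.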
In the long term, Dr. Rao is interested in addressing the fundamental limitations of existing cloud systems posed by emerging hardware, such as high-speed SSDs, GPUs, and manycore processors, and by ever-evolving user needs, such as ultra-fine-grained resource metering, differentiated services, and multi-dimensional fairness. He will pursue cross-layer, adaptive, and elegant approaches with a focus on balanced system design.