RLEM Workshop 2022


Previous Edition: RLEM Workshop 2020

First ACM SIGEnergy Workshop on Reinforcement Learning for Energy Management in Buildings & Cities (RLEM)


RLEM brings together researchers and industry practitioners for the advancement of (deep) reinforcement learning (RL) in the built environment as it is applied for managing energy in civil infrastructure systems (energy, water, transportation).

RLEM’20 will be held in conjunction with ACM BuildSys’20. Following BuildSys’s directive, the conference will be held virtually on November 17th, 2020.

Watch RLEM’20 recording!


Register now via BuildSys 2020.

Important Dates

Call for Papers

Buildings account for 40% of the global energy consumption and 30% of the associated greenhouse gas emissions, while also offering a 50–90% CO2 mitigation potential. The transportation sector is responsible for an additional 30%. Optimal decarbonization requires electrification of end-uses and concomitant decarbonization of electricity supply, efficient use of electricity for lighting, space heating, cooling and ventilation (HVAC), and domestic hot water generation, and upgrade of the thermal properties of buildings. Major drivers for decarbonization are the integration of renewable energy systems (RES) into the grid, and of photovoltaics (PV) and solar-thermal collectors as well as thermal and electric storage into residential and commercial buildings. Electric vehicles (EVs), with their storage capacity and inherent connectivity, hold great potential for integration with buildings.

The integration of these technologies must be done carefully to unlock their full potential. Artificial intelligence is regarded as a possible pathway to orchestrate these complexities of Smart Cities. In particular, (deep) reinforcement learning algorithms have seen increased interest and have demonstrated human-expert-level performance in other domains, e.g., computer games. Research in the buildings and cities domain has been fragmented, focusing on different problems and using a variety of frameworks. The purpose of this Workshop is to build a growing community around this exciting topic, provide a platform for discussing future research directions, and share common frameworks.

Topics of Interest

Topics of interest include, but are not limited to:

Submission Instructions

Submitted papers must be unpublished and must not be currently under review for any other publication. Paper submissions must be at most 4 single-spaced US Letter (8.5”x11”) pages, including figures, tables, and appendices (excluding references). All submissions must use the LaTeX (preferred) or Word styles found at https://www.acm.org/publications/proceedings-template. Authors must make a good faith effort to anonymize their submissions by (1) using the “anonymous” option for the class and (2) using the “anonsuppress” section where appropriate. Papers that do not meet the size, formatting, and anonymization requirements will not be reviewed. Please note that ACM uses 9-pt fonts in all conference proceedings, and the styles (both LaTeX and Word) implicitly define the font size to be 9-pt.


All times are in Greenwich Mean Time (GMT).

Time Session Title Speaker Abstract
12:00-12:20 Opening remarks
Speaker: General Chair and TPC Chairs
12:20-13:00 Keynote
Title: Incorporating robust control guarantees within (deep) reinforcement learning
Speaker: Zico Kolter (Associate Professor, Carnegie Mellon University). Dr Kolter is an Associate Professor in the Computer Science Department with the School of Computer Science at Carnegie Mellon University. In addition, he also serves as Chief Scientist of AI Research for the Bosch Center for AI (BCAI), working in the Pittsburgh Office. His research group focuses on machine learning, optimization, and control. Specifically, much of the research aims at making deep learning algorithms safer, more robust, and more explainable; to these ends, the group has worked on methods for training provably robust deep learning systems, and including more complex “modules” (such as optimization solvers) within the loop of deep architectures. Further focus is on several application domains, with a particular focus on applications in smart energy and sustainability domains.
Abstract: Reinforcement learning methods have produced breakthrough results in recent years, but their application to safety-critical systems has been substantially limited by their lack of guarantees, such as those provided by modern robust control techniques. In this talk, I will discuss a technique we have recently developed that embeds robustness guarantees inside of arbitrary RL policy classes. Using this approach, we can build deep RL methods that attain much of the performance advantages of modern deep RL (namely, superior performance in "average case" scenarios), while still maintaining robustness in worst-case adversarial settings. I will highlight experimental results on several simple control systems highlighting the benefits of the method, in addition to a larger-scale smart grid setting, and end by discussing future directions in this line of work.
13:00-13:50 Session 1: Demanding Response
Title: Demand Response through Price-setting Multi-agent Reinforcement Learning
Speaker: M.H. Christensen, C. Ernewein, P. Pinson (Technical University of Denmark)
Abstract: Price-based demand response is a cost-effective way of obtaining the flexibility needed in power systems with high penetration of intermittent renewable energy sources. Model-free deep reinforcement learning is proposed as a way to train autonomous agents for enabling buildings to participate in demand response programs as well as coordinating such programs through price setting in a multiagent setup. First, we show price responsive control of buildings with electric heat pumps using deep deterministic policy gradient. Then a coordinating agent is trained to manage a population of buildings by adjusting the price in order to keep the total load from exceeding the available capacity, considering also the non-flexible base load.
Title: Electricity Pricing aware Deep Reinforcement Learning based Intelligent HVAC Control
Speaker: K. Kurte, J. Munk, K. Amasyali, O. Kotevska, R. Smith, H. Zandi (Oak Ridge National Laboratory)
Abstract: Recently, deep reinforcement learning (DRL) based intelligent control of Heating, Ventilation, and Air Conditioning (HVAC) has gained a lot of attention due to DRL's ability to optimally control HVAC for minimizing operational cost while maintaining residents' comfort. The success of such DRL-based techniques largely depends on the articulation of the problem in terms of states, actions, and reward function. Inclusion of the electricity pricing information in the problem formulation can play an important role in saving the cost of HVAC operation. However, little attention has been given in the literature to formulating well-crafted state features based on electricity pricing. In this work, we propose an approach for training the DRL model with a specific focus on feature engineering based on electricity pricing. During training, we generate random but sufficiently realistic electricity price signals so that the pre-trained DRL model is robust and adaptive to the dynamic and variable electricity prices. The validation results are encouraging and show the potential of ≈12%-15% savings in the one-day cost of HVAC operation, proving the usefulness of including electricity pricing related features as state features.
Title: A Centralised Soft Actor Critic Deep Reinforcement Learning Approach to District Demand Side Management through CityLearn
Speaker: A. Kathirgamanathan, K. Twardowski, E. Mangina, D. Finn (University College Dublin)
Abstract: Reinforcement learning is a promising model-free and adaptive controller for demand side management, as part of the future smart grid, at the district level. This paper presents the results of the algorithm that was submitted for the CityLearn Challenge, which was hosted in early 2020 with the aim of designing and tuning a reinforcement learning agent to flatten and smooth the aggregated curve of electrical demand of a district of diverse buildings. The proposed solution secured second place in the challenge using a centralised 'Soft Actor Critic' deep reinforcement learning agent that was able to handle continuous action spaces. The controller was able to achieve an averaged score of 0.967 on the challenge dataset comprising different buildings and climates. This highlights the potential application of deep reinforcement learning as a plug-and-play style controller that is capable of handling different climates and a heterogeneous building stock for district demand side management of buildings.
Title: Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms on a Building Energy Demand Coordination Task
Speaker: G. Dhamankar, J. Vazquez-Canteli, Z. Nagy (The University of Texas at Austin)
Abstract: Periods of high demand for electricity can raise electricity prices for building users. Flattening the electricity demand curve can reduce costs and increase resiliency. We formulate this task as a multi-agent reinforcement learning (MA-RL) problem, to be achieved through demand response and coordination of electricity consuming agents, i.e., buildings. Benchmarks for such MA-RL problems do not exist. Here, we contribute an empirical comparison of three classes of MA-RL algorithms: independent learners, centralized critics with decentralized execution, and value factorization learners. We evaluate these algorithms on an energy coordination task in CityLearn, an OpenAI Gym environment. We found independent learners with shaped rewards to be competitive with more complex algorithms. Agents with centralized critics aim to learn a rich joint critic, which may complicate the training process and cause scalability issues. Our findings indicate value factorization learners possess the coordination benefits of centralized critics and match independent learners without individualized reward shaping.
13:50-14:00 Break
14:00-14:50 Session 2: Clash of Algorithms
Title: Less is More: Simplified State-Action Space for Deep Reinforcement Learning based HVAC Control
Speaker: S. Murugensan, Z. Jiang, M. Risbeck, J. Amores, C. Zhang, V. Ramamurti, K. Drees, Y. Lee (Johnson Controls Inc)
Abstract: How do we optimize heating, ventilation and air-conditioning (HVAC) systems for energy cost and occupant comfort? How do we accomplish this in an automated fashion that adapts with time with minimal human intervention? Answers to these questions have tremendous impact on building occupant comfort, building operating costs and, importantly, environmental footprint. Understandably, this topic has received considerable attention from experts in both industry and academia. Among these works, deep learning, specifically deep reinforcement learning (DRL), is emerging as a data-driven control strategy that does not require an explicit dynamic model of the system. Another advantage of DRL, when successfully developed, is that it can continue to learn and adapt as the building/HVAC characteristics change with time. DRL agents, however, are challenging to train. On one hand, they may need months or years of training data (sample inefficiency), potentially inconveniencing building occupants and incurring high energy costs for a long time. On the other hand, they may converge to local optima or simply not converge. This paper highlights one strategy to mitigate some of these challenges. We show that the choice of state and action space is as important as the choice of DRL architectures and neural network training techniques. Specifically, we formulate the HVAC control problem as a Partially Observable Markov Decision Process (POMDP), build a DRL agent using Deep Q Networks (DQN) on a building simulator, and quantify gains over a widely adopted baseline heuristic method. Subsequently, we reformulate the original problem as a restricted POMDP by severely restricting the observation (state space) and action space, and build a DRL agent for the restricted POMDP. The performance gains from this DRL agent are double those of the original agent, implying that complex state and action spaces, while information rich, can lead to complex loss functions that could not be maneuvered well by a DRL agent. Our larger message is this: 'less' (state-action space) can be 'more', in the context of DRL training.
Title: Continual adaptation in deep reinforcement learning-based control applied to non-stationary building environments
Speaker: A. Naug, M. Quinones, G. Biswas (Vanderbilt University)
Abstract: This paper develops a continual deep reinforcement relearning (RL) controller for large buildings that exhibit non-stationary behaviors. The non-stationarity in building operations caused by unexpected changes in weather patterns, occupancy, and faults, makes it imperative to develop control algorithms that adapt to the changing conditions. Given the slow time constants in building operations, we assume that the non-stationarity can be modeled as discrete transitions between stationary models of system behavior. We address the challenge of detecting transitions between stationary processes using trend analysis, and relearn control policies that accommodate the tradeoff between energy savings and comfort after such transitions occur. We demonstrate our approach on a "smart building test-bed" for developing data-driven HVAC controllers that are deployed in large buildings on our university campus.
Title: A Comparison of Model-Free and Model Predictive Control for Price Responsive Water Heaters
Speaker: D. Biagioni, X. Zhang, P. Graf, D. Sigler, W. Jones (National Renewable Energy Laboratory)
Abstract: We present a careful comparison of two model-free control algorithms, Evolution Strategies (ES) and Proximal Policy Optimization (PPO), with receding horizon model predictive control (MPC) for operating simulated, price responsive water heaters. Four MPC variants are considered: a one-shot controller with perfect forecasting yielding optimal control; a limited-horizon controller with perfect forecasting; a mean forecasting-based controller; and a two-stage stochastic programming controller using historical scenarios. In all cases, the MPC models for water temperature and electricity price are exact; only water demand is uncertain. For comparison, both ES and PPO learn neural network-based policies by directly interacting with the simulated environment under the same scenarios used by MPC. All methods are then evaluated on a separate one-week continuation of the demand time series. We demonstrate that optimal control for this problem is challenging, requiring more than 8-hour lookahead for MPC with perfect forecasting to attain the minimum cost. Despite this challenge, both ES and PPO learn good general purpose policies that outperform mean forecast and two-stage stochastic MPC controllers in terms of average cost and are more than two orders of magnitude faster at computing actions. We show that ES in particular can leverage parallelism to learn a policy in under 90 seconds using 1150 CPU cores.
Title: Flexible Reinforcement Learning Framework for Building Control using EnergyPlus-Modelica Energy Models
Speaker: J. Lee, S. Huang, A. Rahman, A. Smith, S. Katipamula (Pacific Northwest National Laboratory)
Abstract: In recent years, reinforcement learning (RL) methods have been greatly enhanced by leveraging deep learning approaches. RL methods applied to building control have shown potential in many applications because of their ability to complement or replace conventional methods such as model-based or rule-based controls. However, RL-based building control software is likely tailored either to one target building system or to a specific RL method so that significant additional effort would be required to customize the RL-based controller for use in other building systems or with other RL approaches. Also, RL-based building controls usually depend on building energy simulations to train controllers, so emulating building dynamics (i.e., thermal dynamics and control dynamics) and capturing sub-hourly dynamic profiles are crucial to further the development of effective RL-based building control methods. To address these challenges, we present an RL-based control software employing a high-fidelity hybrid EnergyPlus-Modelica building energy model that emulates building dynamics at 1 minute resolution. This software consists of decoupled components (environment, building emulator, control agent, and RL algorithm), which allows for quick prototyping and benchmarking of standard RL algorithms in different systems; for example, a single component can be replaced without revising the software. To demonstrate this software framework, we conducted a benchmark study using an EnergyPlus-Modelica building energy model for a Chicago office building with an RL-based controller to dynamically control the chilled water temperature setpoint and the air handling unit supply air temperature setpoints.
14:50-15:00 Break
15:00-15:50 Session 3: Keeping it ReaL
Title: Augmenting Reinforcement Learning with a Planning Model for Optimizing Energy Demand Response in a Prospective Experiment
Speaker: L. Spangher, A. Gokul, M. Khattar, J. Palakapilly, A. Tawade, U. Agwan, C. Spanos (University of California, Berkeley)
Abstract: While reinforcement learning (RL) on humans has shown incredible promise, it often suffers from a scarcity of data and few steps. In instances like these, a planning model of human behavior may greatly help. We present an experimental setup for the development and testing of a Soft Actor Critic (SAC) V2 RL architecture for several different neural architectures for planning models: an autoML optimized LSTM, an OLS, and a baseline model. We present the effects of including a planning model in agent learning within a simulation of the office, currently reporting a limited success with the LSTM.
Title: Transferable Reinforcement Learning for Smart Homes
Speaker: X. Zhang, X. Jin, C. Tripp, D. Biagioni, P. Graf, H. Jiang (National Renewable Energy Laboratory)
Abstract: To harness the great amount of untapped resources on the demand side, smart home technology plays a vital role in solving the "last mile" problem in the smart grid. Reinforcement learning (RL), which has demonstrated outstanding performance in solving many sequential decision-making problems, can be a great candidate for use in smart home control. For instance, many studies have started investigating the appliance scheduling problem under dynamic pricing schemes. Based on those, this study aims at providing an affordable solution to encourage a higher smart home adoption rate. Specifically, we investigate combining transfer learning (TL) with RL to reduce the training cost of an optimal RL control policy. Given an optimal policy for a benchmark home, TL can jump-start the RL training of a policy for a new home, which has different appliances and user preferences. Simulation results show that by leveraging TL, RL training converges faster and requires much less computing time for new homes that are similar to the benchmark home. In all, this study proposes a cost-effective approach for training RL control policies for homes at scale, which ultimately reduces the controller's implementation costs, increases the adoption rate of RL controllers, and makes more homes grid-interactive.
Title: Deep Reinforcement Learning in Buildings: Implicit Assumptions and their Impact
Speaker: A. Prakash, S. Touzani, M. Kiran, S. Agarwal, M. Pritoni, J. Granderson (Lawrence Berkeley National Laboratory)
Abstract: As deep reinforcement learning (DRL) continues to gain interest in the smart building research community, there is a transition from simulation-based evaluations to deploying DRL control strategies in actual buildings. While the efficacy of a solution could depend on a particular implementation, there are common obstacles that developers have to overcome to deliver an effective controller. Additionally, a deployment in a physical building can invalidate some of the assumptions made during the controller development. Assumptions on the sensor placement or on the equipment behavior can quickly come undone. This paper presents some of the significant assumptions made during the development of DRL based controllers that could affect their operations in a physical building. Furthermore, a preliminary evaluation revealed that controllers developed with some of these assumptions can incur twice the expected costs when they are deployed in a building.
Title: Towards Off-policy Evaluation as a Prerequisite for Real-world Reinforcement Learning in Building Control
Speaker: B. Chen (Carnegie Mellon University), M. Jin (Virginia Tech), Z. Wang, T. Hong (Lawrence Berkeley National Laboratory), M. Berges (Carnegie Mellon University)
Abstract: We present an initial study of off-policy evaluation (OPE), a problem prerequisite to real-world reinforcement learning (RL), in the context of building control. OPE is the problem of estimating a policy's performance without running it on the actual system, using historical data from the existing controller. It enables the control engineers to ensure a new, pretrained policy satisfies the performance requirements and safety constraints of a real-world system, prior to interacting with it. While many methods have been developed for OPE, no study has evaluated which ones are suitable for building operational data, which are generated by deterministic policies and have limited coverage of the state-action space. After reviewing existing works and their assumptions, we adopted the approximate model (AM) method. Furthermore, we used bootstrapping to quantify uncertainty and correct for bias. In a simulation study, we evaluated the proposed approach on 10 policies pretrained with imitation learning. On average, the AM method estimated the energy and comfort costs with 1.84% and 14.1% error, respectively.
15:50-16:00 Closing remarks
Speaker: General Chair and TPC Chairs
Happy Hour


General Chairs
  1. Zoltan Nagy (University of Texas at Austin)
Technical Program Committee Co-Chairs
  1. Mario Berges (Carnegie Mellon University)
  2. Bingqing Chen (Carnegie Mellon University)
  3. June Young Park (University of Texas at Arlington)
Technical Program Committee
  1. Henning Lange (University of Washington)
  2. Helia Zandi (Oak Ridge National Laboratory)
  3. Jose Vazquez-Canteli (University of Texas at Austin)
  4. Zhe Wang (Lawrence Berkeley National Laboratory)
  5. Duc Van Le (NTU Singapore)
  6. Wan Du (University of California, Merced)
  7. Xin Jin (National Renewable Energy Laboratory)
  8. Ming Jin (University of California, Berkeley)
  9. Alex Vlachokostas (Pacific Northwest National Laboratory)
  10. Hari Prasanna Das (University of California, Berkeley)
  11. Lucas Spangher (University of California, Berkeley)
  12. Kuldeep Kurte (Oak Ridge National Laboratory)
  13. Ross May (Dalarna University)


RLEM Workshop 2020 will be held virtually, while ACM BuildSys’20 is located in Yokohama, Japan.