[Alt: A graphic of a nuclear microreactor titled "standard ISO container dimensions." A rectangular metal apparatus contains four long silver cylinders positioned parallel to the ground. Labels point to a motor below the cylinders, the coupled core at the center of the cylinders, and a generator to the right of the cylinders.]

Reinforcement learning for nuclear microreactor control

A machine learning approach outcompetes the industry standard for adjusting power generation to meet demand, especially in imperfect conditions.

Written by Patricia DeLacey

A machine learning approach leverages nuclear microreactor symmetry to reduce training time when modeling power output adjustments, according to a recent study led by University of Michigan researchers.

Improved training efficiency will help researchers model reactors faster, taking a step towards real-time automated nuclear microreactor control for operation in remote locations or eventually in space.

These compact reactors—able to generate up to 20 megawatts of thermal energy that can be used directly as heat or converted to electricity—could be easily transported, or could potentially power cargo ships on very long voyages without refueling. If incorporated into an electrical grid, nuclear microreactors could provide stable, carbon-free energy when renewables like solar or wind are not abundantly available.

Small reactors sidestep the huge capital costs that come with large reactors, and partial automation of microreactor power output control would help keep costs low. In potential space applications—such as directly propelling a spacecraft or providing electrical power to the spacecraft’s systems—nuclear microreactors would need to operate completely autonomously. 

As a first step towards automation, researchers are simulating load-following—when power plants increase or decrease output to match the electricity demand of the grid. This process is relatively simple to model compared to reactor start-up, which includes rapidly changing conditions that are harder to predict.

The Holos-Quad microreactor design modeled in this study adjusts power through the position of eight control drums that surround the reactor's central core, where neutrons split uranium atoms to produce energy. One side of each drum's circumference is lined with a neutron-absorbing material, boron carbide. When rotated inwards, the drums absorb neutrons from the core, causing the neutron population and the power to decrease. Rotating the drums outwards keeps more neutrons in the core, increasing power output.
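The relationship between drum angle and power can be pictured with a short sketch. A sinusoidal worth curve is a common textbook approximation for control drums, but every constant below is an illustrative assumption, not a value from the study or the Holos-Quad design:

```python
import numpy as np

# Toy model of control-drum worth. The sinusoidal curve is a common
# textbook approximation; all constants are illustrative assumptions,
# not values from the study or the Holos-Quad design.

DRUM_WORTH = 0.002     # assumed total reactivity worth of one drum (dk/k)
CRITICAL_DEG = 90.0    # assumed drum angle at which the core is critical

def drum_reactivity(theta_deg):
    """Reactivity inserted by one drum at rotation angle theta_deg.

    theta = 0 deg: boron carbide absorber facing the core (most negative);
    theta = 180 deg: absorber rotated fully outward (most positive).
    """
    theta = np.radians(theta_deg)
    ref = np.radians(CRITICAL_DEG)
    return DRUM_WORTH * (np.cos(ref) - np.cos(theta)) / 2.0

# Rotate all eight drums 10 degrees outward from critical: the net
# reactivity is positive, so neutron population and power rise.
angles = np.full(8, CRITICAL_DEG + 10.0)
print(f"net reactivity: {drum_reactivity(angles).sum():+.2e}")
```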

“Deep reinforcement learning builds a model of system dynamics, enabling real-time control—something traditional methods like model predictive control often struggle to achieve due to the repetitive optimization needs,” said Majdi Radaideh, an assistant professor of nuclear engineering and radiological sciences at U-M and senior author of the study. 

The research team simulated load-following by control drum rotation based on reactor feedback with reinforcement learning—a machine learning paradigm in which agents learn to make decisions through trial-and-error interactions with their environment. While deep reinforcement learning is highly effective, it requires extensive training, which drives up computational time and cost.
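In code terms, training pairs an agent with a simulated reactor environment and repeats an observation-action-reward cycle many thousands of times. The sketch below is a deliberately crude stand-in, with a first-order power response, a toy demand profile, and hypothetical names throughout; it is not the authors' simulator:

```python
import numpy as np

# Crude load-following environment. The class name, power response,
# demand profile, and reward are hypothetical stand-ins for the
# study's actual simulator.

class MicroreactorEnv:
    def __init__(self, dt=1.0):
        self.dt = dt
        self.power = 1.0     # normalized reactor power
        self.demand = 1.0    # normalized grid demand
        self.t = 0

    def _obs(self):
        # The agent observes current power and current demand.
        return np.array([self.power, self.demand])

    def reset(self):
        self.power, self.demand, self.t = 1.0, 1.0, 0
        return self._obs()

    def step(self, action):
        # action in [-1, 1]: net drum rotation rate. A first-order
        # response stands in for the real reactor kinetics.
        self.power += 0.05 * float(np.clip(action, -1, 1)) * self.dt
        self.t += 1
        self.demand = 1.0 + 0.2 * np.sin(0.01 * self.t)  # toy demand ramp
        # Reward penalizes tracking error and control effort.
        reward = -abs(self.power - self.demand) - 0.01 * abs(float(action))
        return self._obs(), reward, self.t >= 1000

# A random policy stands in here for the learning agent.
env = MicroreactorEnv()
obs = env.reset()
for _ in range(5):
    obs, reward, done = env.step(np.random.uniform(-1, 1))
```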

For the first time, the researchers tested a multi-agent reinforcement learning approach that trains eight independent agents, each controlling a specific drum while sharing information about the core as a whole. This framework exploits the microreactor's symmetry to help reduce training time by multiplying the learning experience.
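A rough way to see how symmetry multiplies experience: if the drum agents share one policy and each sees the same core state plus its own drum angle, then a single environment step yields eight training transitions instead of one. The linear policy and observation layout below are hypothetical stand-ins, not the study's network:

```python
import numpy as np

# Symmetry sketch: eight drum agents share one policy. Each agent sees
# the shared core state plus its own drum angle, so one environment
# step produces eight transitions for the same policy. The linear
# policy and observation layout are hypothetical.

rng = np.random.default_rng(0)
N_DRUMS = 8

core_obs = np.array([1.0, 1.05, 0.9])        # toy shared core state
angles = rng.uniform(0.0, 180.0, N_DRUMS)    # each drum's own angle
shared_weights = rng.normal(size=core_obs.size + 1)  # one policy for all

def agent_action(own_angle):
    """Map (shared core state, own drum angle) to a rotation command."""
    x = np.append(core_obs, own_angle / 180.0)  # normalize the angle
    return np.tanh(shared_weights @ x)

actions = np.array([agent_action(a) for a in angles])
print(actions)  # eight per-drum commands from one shared policy
```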

The study evaluated the multi-agent reinforcement learning approach against two others: a single-agent approach, where one agent observes core status and controls all eight drums, and the industry-standard proportional-integral-derivative (PID) control, which uses a feedback-based control loop.
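For context, the PID baseline is the classic feedback loop sketched below; the gains shown are illustrative, not the tuned values used in the paper:

```python
# Textbook PID controller for comparison; the gains are illustrative,
# not the tuned values used in the study.

class PID:
    def __init__(self, kp, ki, kd, dt=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)

pid = PID(kp=2.0, ki=0.1, kd=0.5)
drum_command = pid.update(setpoint=1.05, measurement=1.00)  # demand above power
```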

Reinforcement learning approaches achieved similar or superior load-following compared to PID. In scenarios where sensors provided imperfect readings or reactor conditions fluctuated, reinforcement learning maintained lower error rates than PID at up to 150% lower control cost—meaning it reached the solution with less effort.

The multi-agent approach trained at least twice as fast as the single-agent approach with only a slightly higher error rate.

The technique needs extensive validation in more complex, realistic conditions before real-world application, but the findings establish a more efficient path forward for reinforcement learning in autonomous nuclear microreactors. 

“This study is a step toward a forward digital twin where reinforcement learning drives system actions. Next, we aim to close the loop with inverse calibration and high-fidelity simulations to enhance control accuracy,” Radaideh said.

This research was funded by the Idaho National Laboratory (INL) Laboratory Directed Research & Development (LDRD) Program (award number 24A1081-116FP) and the Department of Energy (DOE) Office of Nuclear Energy’s Distinguished Early Career Program (award number DE-NE0009424). 

Header image: A new machine learning approach models adjusting the power output of the Holos-Quad microreactor design by HolosGen LLC. The multi-agent reinforcement learning approach trains more efficiently than previous approaches, taking a step towards more autonomous nuclear microreactors for operation in remote areas. Credit: HolosGen LLC.

Full citation: “Nuclear microreactor transient and load-following control with deep reinforcement learning,” Leo Tunkle, Kamal Abdulraheem, Linyu Lin, Majdi I. Radaideh, Energy Conversion and Management: X (2025). DOI: 10.1016/j.ecmx.2025.101090