Astrophysicists Show How to “Weigh” Galaxy Clusters with Artificial Intelligence

Press Contact: Lee Sandberg
Image: Massive galaxy cluster Abell S1063 as captured by NASA's Hubble Space Telescope. Credit: NASA, ESA, and M. Montes (University of New South Wales).

Scholars from the Institute for Advanced Study have used a machine learning algorithm known as “symbolic regression” to generate new equations that help solve a fundamental problem in astrophysics: inferring the mass of galaxy clusters.

Galaxy clusters are the most massive objects in the universe: a single cluster contains anything from a hundred to many thousands of galaxies, alongside collections of plasma, hot X-ray-emitting gas, and dark matter. These components are held together by the cluster’s own gravity. Understanding such galaxy clusters is crucial to pinning down the origin and continuing evolution of our universe.

Perhaps the most crucial quantity determining the properties of a galaxy cluster is its total mass. But measuring this quantity is difficult: galaxy clusters cannot be “weighed” by placing them on a scale. The problem is further complicated by the fact that the dark matter that makes up much of a cluster’s mass is invisible. Instead, scientists infer the mass of a cluster from other observable quantities.

Previously, scholars considered a cluster’s mass to be roughly proportional to another, more easily measurable quantity called the “integrated electron pressure” (or the Sunyaev-Zel’dovich flux, often abbreviated to YSZ). The theoretical foundations of the Sunyaev-Zel’dovich flux were laid in the early 1970s by Rashid Sunyaev, current Distinguished Visiting Professor in the Institute’s School of Natural Sciences, and his collaborator Yakov B. Zel’dovich.

However, the integrated electron pressure is not a reliable proxy for mass because it can behave inconsistently across different galaxy clusters. The outskirts of clusters tend to exhibit very similar YSZ, but their cores are much more variable. The YSZ/mass equivalence was problematic in that it gave equal weight to all parts of the cluster. As a result, a lot of “scatter” was observed, meaning that the error bars on the mass inferences were large.

Digvijay Wadekar, current Member in the Institute’s School of Natural Sciences, has worked with collaborators across ten different institutions to develop an AI program to improve the understanding of the relationship between the mass and the YSZ. Their work was recently published in Proceedings of the National Academy of Sciences.

Wadekar and his collaborators “fed” their AI program with state-of-the-art cosmological simulations that have been developed by groups at the Harvard & Smithsonian Center for Astrophysics, and at the Flatiron Institute's Center for Computational Astrophysics (CCA) in New York. Their program searched for and identified additional variables that might make inferring the mass from the YSZ more accurate.

Figure: Symbolic regression advantages. The trade-offs between different machine learning techniques: symbolic regression is much less powerful than deep neural networks on high-dimensional datasets, but it is much more interpretable because it provides mathematical equations as output. Credit: Digvijay Wadekar.

AI is useful for identifying new parameter combinations that human analysts could overlook. While it is easy for a human analyst to identify two significant parameters in a data set, AI is better able to parse large volumes of data, often revealing unexpected influencing factors.

More specifically, the AI method that Wadekar and his collaborators employed is known as symbolic regression. “Right now, a lot of the machine learning community focuses on deep neural networks,” Wadekar explained. “These are very powerful but the drawback is that they are almost like a black box. We cannot understand what goes on in them. In physics, if something is giving good results, we want to know why it is doing so. Symbolic regression is beneficial because it searches a given dataset and generates simple, mathematical expressions in the form of simple equations that you can understand. It provides an easily interpretable model.”
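
The idea of searching a dataset for a simple, readable equation can be illustrated with a toy sketch. It is not PySR and does not use PySR's API: PySR relies on an evolutionary search over a much larger expression space, whereas this example simply enumerates a tiny, hand-written set of candidate formulas and keeps the one with the lowest error. The variable names (`x1`, `x2`), the candidate building blocks, and the synthetic data are all assumptions made for illustration.

```python
import itertools
import math
import random

# Hypothetical building blocks for the toy expression space.
UNARY = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sqrt(x)": lambda x: math.sqrt(x),
    "log(x)": lambda x: math.log(x),
}
BINARY = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
}


def candidates():
    """Yield (formula string, function of (x1, x2)) for every combination."""
    for (n1, f1), (n2, f2) in itertools.product(UNARY.items(), repeat=2):
        for op, g in BINARY.items():
            name = f"({n1.replace('x', 'x1')}) {op} ({n2.replace('x', 'x2')})"
            # Default arguments freeze f1, f2, g for this particular candidate.
            yield name, (lambda a, b, f1=f1, f2=f2, g=g: g(f1(a), f2(b)))


def mean_squared_error(func, data):
    return sum((func(a, b) - y) ** 2 for a, b, y in data) / len(data)


def search(data):
    """Return the (name, function) pair with the lowest mean squared error."""
    return min(candidates(), key=lambda c: mean_squared_error(c[1], data))


# Synthetic data generated from a known law, y = sqrt(x1) * x2, plus noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x1 = rng.uniform(1.0, 5.0)
    x2 = rng.uniform(1.0, 5.0)
    y = math.sqrt(x1) * x2 + rng.gauss(0.0, 0.01)
    data.append((x1, x2, y))

best_name, best_func = search(data)
print("Best formula found:", best_name)
```

The output is a formula a person can read and then try to explain physically, which is the property the quote above contrasts with a black-box neural network.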

Their symbolic regression program (called PySR) handed them a new equation, which was able to better predict the mass of the galaxy cluster by augmenting YSZ with information about the cluster’s gas concentration. Wadekar and his collaborators then worked backward from this AI-generated equation and tried to find a physical explanation for it. They realized that gas concentration is in fact correlated with the noisy areas of clusters where mass inferences are less reliable. Their new equation therefore improved mass inferences by providing a way for these noisy areas of the cluster to be “down-weighted”. In a sense, the galaxy cluster can be compared to a spherical doughnut. The new equation extracts the jelly at the center of the doughnut (that introduces larger errors), and concentrates on the doughy outskirts for more reliable mass inferences.
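
The down-weighting idea can be sketched numerically. The article does not give the new equation, so the form used here, mass proportional to YSZ^(3/5) × (1 − A × c_gas), is an assumption chosen for illustration, as are the synthetic cluster population, the constant 0.8 linking concentration to excess YSZ, and the noise level. The sketch compares the scatter of the plain power-law mass estimate with that of the concentration-corrected estimate.

```python
import math
import random
import statistics

rng = random.Random(42)


def make_cluster():
    """Return (mass, y_sz, c_gas) for one synthetic cluster (arbitrary units)."""
    mass = 10 ** rng.uniform(0.0, 1.0)
    c_gas = rng.uniform(0.0, 0.5)  # hypothetical gas concentration
    noise = math.exp(rng.gauss(0.0, 0.02))
    # Assumption: a concentrated core boosts Y_SZ above the M^(5/3) scaling.
    y_sz = mass ** (5 / 3) * (1 + 0.8 * c_gas) * noise
    return mass, y_sz, c_gas


clusters = [make_cluster() for _ in range(2000)]


def scatter(a):
    """Std. dev. of ln(estimated mass / true mass) for correction strength a."""
    residuals = [
        math.log(y ** 0.6 * (1 - a * c) / m) for m, y, c in clusters
    ]
    return statistics.pstdev(residuals)


# a = 0 recovers the traditional estimate that uses Y_SZ alone.
scatter_standard = scatter(0.0)

# Pick the correction strength by a simple grid search over [0, 1].
grid = [i / 100 for i in range(101)]
best_a = min(grid, key=scatter)
scatter_augmented = scatter(best_a)

print(f"scatter with Y_SZ only:          {scatter_standard:.4f}")
print(f"scatter with concentration term: {scatter_augmented:.4f} (A = {best_a})")
```

In this toy population the corrected estimate has markedly lower scatter because the concentration term removes the part of YSZ that tracks the variable core rather than the total mass. Real clusters are far messier, so the numbers here say nothing about the size of the improvement reported in the paper.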

Figure: The performance of the new equation from symbolic regression is shown in the middle panel, whereas that of the traditional method is shown in the top panel. The lower panel explicitly quantifies the reduction in the scatter. Credit: Digvijay Wadekar.

The new equations can provide observational astronomers engaged in upcoming galaxy cluster surveys with better insights into the mass of the objects that they observe. “There are quite a few surveys targeting galaxy clusters which are planned in the near future,” Wadekar stated. “Examples include the Simons Observatory (SO), the Stage 4 CMB experiment (CMB-S4), and an X-ray survey called eROSITA. The new equations can help us in maximizing the scientific return from these surveys.”

He also hopes that this publication will be just the tip of the iceberg when it comes to using symbolic regression in astrophysics. “We think that symbolic regression is highly applicable to answering many astrophysical questions,” Wadekar added. “In a lot of cases in astronomy, people make a linear fit between two parameters and ignore everything else. But nowadays, with these tools, you can go further. Symbolic regression and other artificial intelligence tools can help us go beyond existing two parameter power laws in a variety of different ways, ranging from investigating small astrophysical systems like exoplanets to galaxy clusters, the biggest things in the universe.”

About the Institute

The Institute for Advanced Study has served the world as one of the leading independent centers for theoretical research and intellectual inquiry since its establishment in 1930, advancing the frontiers of knowledge across the sciences and humanities. From the work of founding IAS faculty such as Albert Einstein and John von Neumann to that of the foremost thinkers of the present, the IAS is dedicated to enabling curiosity-driven exploration and fundamental discovery.

Each year, the Institute welcomes more than 200 of the world’s most promising post-doctoral researchers and scholars, who are selected and mentored by a permanent Faculty, each of whom is a preeminent leader in their field. Among present and past Faculty and Members there have been 35 Nobel Laureates, 44 of the 62 Fields Medalists, and 23 of the 26 Abel Prize Laureates, as well as many MacArthur Fellows and Wolf Prize winners.