Astronomers have used machine learning to find new exoplanets and fast radio bursts. Can it also be used to help them find extraterrestrial life?
There is a strange forest in an otherwise unremarkable stretch of empty land north of San Francisco, where 42 steel trees sweep their parabolic canopies across the skies like industrial sunflowers. It’s quiet here, at the Allen Telescope Array, but the silence is deceiving. Just ask the trees, which are condemned to listen to a shrieking cosmos so that they might hear an extraterrestrial whisper—a whisper so faint that the sound of a snowflake falling to the earth is deafening by comparison. The astronomers who walk among these trees are preoccupied by the Big Question: What will the whisperer say? Will it be a greeting? A warning?
One can’t help but sympathize with the overwhelmed astronomers who must harvest the fruit from this mechanical forest in perennial bloom. The Allen Telescope Array spends 12 hours a night listening to the stars and yields an absolute torrent of ones and zeros—50 terabytes of data generated by the galaxy each and every night. Most of the data is noise pollution: artifacts produced by telescope instruments or terrestrial communication signals.
Hidden among the noise could be proof we’re not alone in the universe. But there’s a chance we might not even notice.
Arguably the biggest problem for astronomers working on the Search for Extraterrestrial Intelligence at the Allen Telescope Array is that their radio telescopes are pulling down far too much data to store. This means they need to do signal analysis in real time. Any signal that matches a set of predefined criteria for intelligence is saved for further analysis. The rest is tossed out.
But what if an intelligent signal looks different than expected? We could mistake our first extraterrestrial message for just more cosmic noise. The astronomers working at the ATA are acutely aware that they risk accidentally trashing a signal from ET. But faced with extremely limited resources, they don’t really have much of a choice.
Not all hope is lost, however. Researchers working on SETI have long been interested in wielding a narrow form of artificial intelligence known as deep learning to assist in the search for intelligent signals. Rather than relying on a set of predefined, human-selected criteria to identify signals of interest, deep learning algorithms could comb through radio data and identify signals of interest that might otherwise escape notice. Until recently, integrating deep learning methods and SETI was little more than a dream. The idea was there, but the resources and expertise were not.
Yet thanks to the work of Yunfan “Gerry” Zhang, a postdoctoral researcher at UC Berkeley’s SETI Research Center, using artificial intelligence to search for alien intelligence may soon be a reality. Zhang became involved with SETI on a whim, but his work on deep learning is already transforming the field of radio astronomy. He is at the forefront of the discovery of fast radio bursts, and soon enough the same techniques he uses to uncover these mysterious natural signals from space may be applied to SETI.
HOW TO FIND AN ALIEN
The problem of identifying an intelligent extraterrestrial signal among the cosmic noise has always haunted SETI. In the early days, processing radio data for signs of an intelligent signal was done on site by a human analyst. The planetary astronomer Frank Drake completed the first search for extraterrestrial intelligence in 1960 using a radio telescope at the Green Bank Observatory in West Virginia to observe two nearby sun-like stars. In order to make sure they didn’t miss a message, Drake and his colleagues kept vigil at the radio observatory for hours at a time, watching the telescope readout on a paper recorder similar to an early seismograph. As Drake recounted in Is Anyone Out There?, this was an excruciatingly dull way to pass the time.
“After about five days, we could no longer sustain our high level of eager anxiety,” Drake wrote. “We sat quietly in the control room as the loudspeaker hissed randomly. The tape recorders turned. The pen on the chart recorder drifted slowly up and down. The whole thing started to become, well, boring. People actually yawned.”
But as computer processing speeds and memory increased in the subsequent decades, the search for extraterrestrial intelligence became increasingly automated. This was good news for SETI, which could now scan millions of frequency channels and observe thousands of stars. But it also created its own problem: how to search through this huge influx of data for signs of intelligent life?
Jon Richards is the senior software engineer at the SETI Institute, and one of his many jobs is developing the computer programs that sift through ATA data in real time, searching for an intelligent signal. First, this software converts the voltage fluctuations registered by the telescope into radio frequencies. Because the telescope collects data across a wide range of frequencies, these are divvied up into small bundles that are sent to individual computers for analysis. Each bundle comes in the form of a spectrogram, a 2-D picture of signal power across frequency over time. Each computer runs an algorithm that checks the spectrogram to see if it has the characteristics the SETI Institute expects to see in an intelligent signal.
So what does the SETI Institute expect an alien signal to look like? Richards says he and his colleagues use a small set of criteria to identify "signals of interest." The first criterion is that the signal's frequency must shift over time. Since the transmitting planet and Earth are both moving through space, the wavelength of the signal will be stretched or compressed based on the relative motion of the two planets. If the planet is moving away from Earth, the wavelength will increase (shift to a lower frequency), and if it's moving toward Earth, the wavelength will decrease (shift to a higher frequency).
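As a rough illustration, the size of that Doppler shift follows directly from the relative velocity of the transmitter. The numbers below are purely illustrative, not the SETI Institute's actual figures or code:

```python
# Illustrative Doppler-shift calculation (not the SETI Institute's pipeline).
# A transmitter receding from Earth shifts its signal to a lower frequency.

C = 299_792_458.0  # speed of light, m/s

def observed_frequency(f_emitted_hz: float, radial_velocity_ms: float) -> float:
    """Non-relativistic Doppler shift.

    radial_velocity_ms > 0 means the source is receding from Earth,
    which stretches the wavelength and lowers the observed frequency.
    """
    return f_emitted_hz * (1 - radial_velocity_ms / C)

# A 1.42 GHz signal (near the hydrogen line) from a planet receding at
# 30 km/s arrives shifted down by roughly 142 kHz.
shift = 1.42e9 - observed_frequency(1.42e9, 30_000)
print(f"shifted down by {shift:.0f} Hz")
```

Because the transmitting planet is also rotating and orbiting its star, the real shift drifts continuously over time, which is exactly the signature the software looks for.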
“If there’s no frequency shift, we know that the signal is from something terrestrial,” Richards says. “Something like a radio station or a cell phone. The majority of the signals we get have no shift, so we shut those out and look for other things.”
Next, the software checks to see if a candidate signal is low power. Any extraterrestrial signal will have to travel across billions of miles of empty space. By the time it reaches Earth it will be very faint. If the SETI Institute receives a signal that is blasting their receivers, they can be sure it’s either terrestrial in origin or from a natural source, like the Sun.
Natural signals can be differentiated from artificial ones by looking at their bandwidth. An intelligent signal is likely to concentrate its power in a narrow range of frequencies, whereas natural radio sources in space smear their energy across a wide swathe of the electromagnetic spectrum.
Finally, the software checks to see if the signal is consistent. If it detects a blip in the data, known as a transient, Richards and his colleagues won’t bother following up. But if it sees a steady signal or a repeating on-off pattern, this makes it far more likely to be intelligent in origin. If the software doesn’t see anything in the data that matches these parameters, the data is tossed out and the algorithm starts processing the next set—a process that is repeated every 90 seconds throughout the night. At the end of the night, the software produces a summary of signals detected by the ATA, but almost all the raw data is discarded. Only signals that meet these criteria will have their raw data stored for further analysis.
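Taken together, the four checks amount to a simple filter. The sketch below is hypothetical (the field names and thresholds are invented for illustration, not taken from the institute's real software), but it captures the logic Richards describes:

```python
# Hypothetical sketch of the "signal of interest" checks described above.
# All thresholds and names are made up for illustration.

from dataclasses import dataclass

@dataclass
class CandidateSignal:
    drift_rate_hz_s: float   # frequency shift over time (Doppler drift)
    power_db: float          # received power
    bandwidth_hz: float      # spread of the signal in frequency
    is_transient: bool       # a one-off blip rather than a steady or repeating signal

def is_signal_of_interest(sig: CandidateSignal) -> bool:
    if sig.drift_rate_hz_s == 0:   # no Doppler drift: terrestrial interference
        return False
    if sig.power_db > -90:         # too strong to have crossed interstellar space (made-up threshold)
        return False
    if sig.bandwidth_hz > 1_000:   # broadband: likely a natural source (made-up threshold)
        return False
    if sig.is_transient:           # a lone blip isn't worth following up
        return False
    return True                    # save the raw data for further analysis

print(is_signal_of_interest(
    CandidateSignal(drift_rate_hz_s=0.1, power_db=-120, bandwidth_hz=5, is_transient=False)))
```

The weakness of a filter like this is the premise of the rest of the article: any signal that fails these hard-coded checks is gone forever.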
The situation at Breakthrough Listen, a SETI program funded by a $100 million donation from the Russian billionaire Yuri Milner and based at the UC Berkeley SETI Research Center, is somewhat better. An upgrade to the 100-meter Green Bank telescope funded by Breakthrough Listen allows astronomers to record up to 24 gigabytes of data per second. This data is then analyzed for signals by onsite computers and compressed for long-term storage. This compressed data—which still amounts to about 1 petabyte per year—is then uploaded to Breakthrough Listen’s Open Data Archive, which allows anyone to download the data for analysis.
While Breakthrough Listen has the resources to store more data, it still faces the same fundamental problem experienced by the SETI Institute. If the search for an intelligent signal is limited to a predefined set of characteristics, an extraterrestrial signal may go unnoticed. So how can SETI software be sure the extraterrestrial baby isn’t tossed out with the cosmic bathwater? Enter deep learning, a narrow form of artificial intelligence used to discover patterns in massive datasets.
AI, MEET SETI
A particularly powerful type of deep learning algorithm is the artificial neural network, which is loosely modeled on the computational process used by the human brain. Generally speaking, neural nets are trained to do a task by moving up a chain of abstraction. This means if you were training a neural net to identify pictures of people, for example, it might start by identifying simple shapes like boundaries and lines in photos, then learn to identify a face, and then the features of a particular face. Training can be supervised or unsupervised. In supervised training, researchers feed the algorithm a labeled dataset, so that it can match new data against the labeled examples in order to learn how to accomplish a specified goal, like identifying a cat in a photo. In unsupervised training, the algorithm learns how to identify something in the data through millions of iterations of trial and error rather than relying on human-labeled data.
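To make supervised training concrete, here is a toy example: a single artificial neuron (the basic building block of a neural net) learning to separate two clusters of labeled points. Real networks stack millions of these, but the nudge-the-weights-toward-the-labels loop is the same in spirit:

```python
# Toy supervised training: one artificial neuron learns to separate two
# labeled clusters of points. Illustrative only; real deep networks are
# vastly larger but follow the same basic idea.

import random

random.seed(0)

# Labeled dataset: points around (0, 0) are class 0, points around (2, 2) are class 1.
data = [((random.gauss(0, 0.3), random.gauss(0, 0.3)), 0) for _ in range(50)] + \
       [((random.gauss(2, 0.3), random.gauss(2, 0.3)), 1) for _ in range(50)]

w = [0.0, 0.0]  # weights
b = 0.0         # bias
lr = 0.1        # learning rate

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Perceptron rule: for each labeled example, nudge the weights in the
# direction that reduces the error on that example.
for _ in range(20):
    for (x, label) in data:
        err = label - predict(x)
        w[0] += lr * err * x[0]
        w[1] += lr * err * x[1]
        b += lr * err

accuracy = sum(predict(x) == label for (x, label) in data) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

The catch, as the article goes on to explain, is that supervised training requires labeled examples, and SETI has no confirmed alien signal to label.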
Neural nets are particularly good at making sense of large, unstructured data sets like the kind produced in radio astronomy.
This has to do with a neural net’s ability to generalize, says Zhang. Whereas traditional algorithms use human-selected parameters to find candidate signals, machine learning algorithms are more tolerant of unexpected variations. They can identify possible candidate signals that wouldn’t meet the signal criteria in a traditional algorithm.
The results of applying machine learning to astronomy speak for themselves. In 2018, Zhang and his colleagues used machine learning to identify new fast radio bursts in radio astronomy data for Breakthrough Listen. These mysterious broadband radio signals last only a few milliseconds. Explaining their origin is one of the hottest questions in astronomy today. The first fast radio burst was discovered in 2007, and since then almost 300 others have been detected—almost a third of them by Zhang’s neural nets. But machine learning isn’t limited to fast radio bursts. Earlier this year, a team of astronomers at the University of Texas used neural nets to find two new exoplanets more than 1,200 light-years from Earth.
Machine learning has ushered in an exciting new era for radio astronomy and has led some astronomers to wonder if neural networks might assist in the search for intelligent signals. The problem is that the neural nets used to discover exoplanets and non-intelligent radio signals were trained using examples of previously discovered fast radio bursts or planets. This is a luxury that is unavailable to SETI. We’ve never detected an intelligent signal, so we can’t teach a neural net what to look for by feeding it past examples.
So how do you train a neural net to look for signals from ET, if you don’t know what you’re looking for?
One approach, says Richards, is to train the algorithm to recognize all the types of signals that tend to show up in the data—TV signals, noise from the telescope instruments, and so on. If the neural net can reliably sort these known false positives from the data, any remaining signals in the data could be examined by humans to determine if they have the characteristics of an intelligent signal.
In 2017, the SETI Institute held its first hackathon, inviting citizen scientists to develop machine learning algorithms that could be used to do just that. The goal of the hackathon was to develop image-recognition machine learning algorithms that would reduce the number of incorrectly identified candidate signals in ATA spectrograms, as well as to identify signals that are of interest but that fall outside of the parameters defined in the analysis software.
Contestants trained their machine learning models on data that simulated signals typically observed at the Allen Telescope Array. The signals used to train these algorithms were not as complicated or diverse as the actual data collected by the ATA. The competition was only meant to be a pathfinder toward more robust machine learning architectures that could be used to flag candidate signals in real time.
At the time of the SETI hackathon, Zhang was just wrapping up a PhD in astronomy at UC Berkeley and entered the contest on a whim. His team ended up winning the two-day competition when their machine learning algorithm was ultimately able to identify signals of interest with an accuracy of 94.99 percent. “From then on I fully switched to doing SETI from what I was doing before,” Zhang says. This led him to a research position as part of Breakthrough Listen, where he led the effort to use machine learning to discover fast radio bursts.
Although Breakthrough Listen has toyed with the idea of using machine learning for SETI since it was founded in 2015, Zhang says that when he joined the project there were no concrete plans in place. He says Breakthrough Listen’s main goal is to collect massive amounts of observational data so the search for extraterrestrial life can be turned into an offline data mining problem. Zhang says he has developed algorithms that can be used to search through radio data collected by Breakthrough Listen, but so far these algorithms haven’t been applied to a search for candidate signals.
Earlier this year, Zhang and his colleagues published a paper detailing a self-supervised deep learning algorithm that was able to search through actual data collected by Breakthrough Listen at the Green Bank Telescope for anomalous signals. At Green Bank, the telescope typically observes a star for five minutes, then points away from that star for five minutes, and repeats this process three times per target. The idea is that if a signal is present both when the telescope is looking at the star and when it is pointed away, it must be terrestrial radio interference.
The algorithm created by Zhang is trained to predict what a signal should look like when the telescope is pointed away from a target based on what the signal looks like when the telescope is pointed at the target. If the signal seen when the telescope is pointed away from the star does not match the algorithm’s predictions, it marks the event as an anomaly and flags the data for further analysis. Although this method was only searching for a specific kind of signal in the Green Bank data, the success of Zhang and his colleagues’ initial tests suggests this approach could be generalized to other types of signals.
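The core of the approach, comparing a prediction against the actual off-target observation, can be sketched roughly as follows. The "model" here is a trivial stand-in for Zhang's deep network, and the data and threshold are invented for illustration:

```python
# Rough sketch of the ON/OFF anomaly check described above. The predictor
# here is a trivial stand-in for Zhang's deep network: it predicts that only
# a smooth baseline (no narrowband spike) should persist when the telescope
# slews off the star. Anything the prediction fails to explain is flagged.

def predict_off_spectrum(on_spectrum):
    """Stand-in predictor: a flat baseline at the median of the on-target power."""
    n = len(on_spectrum)
    return [sorted(on_spectrum)[n // 2]] * n

def flag_anomalies(on_spectrum, off_spectrum, threshold=5.0):
    """Flag frequency channels where the observed off-target power differs
    from the predicted off-target power by more than `threshold`."""
    predicted = predict_off_spectrum(on_spectrum)
    return [i for i, (p, o) in enumerate(zip(predicted, off_spectrum))
            if abs(o - p) > threshold]

# ON observation: baseline noise around 10 units plus a spike in channel 3.
on = [10, 11, 9, 60, 10, 10]
# OFF observations: if the spike matches the prediction's baseline, nothing
# is flagged; if the observation deviates from the prediction, it's marked
# as an anomaly and saved for further analysis.
off_expected = [10, 11, 9, 10, 10, 10]   # matches prediction: no anomaly
off_surprise = [10, 11, 9, 58, 10, 10]   # spike persists: channel flagged

print(flag_anomalies(on, off_expected))
print(flag_anomalies(on, off_surprise))
```

The appeal of framing the problem this way is that the network never needs a labeled example of an alien signal; it only needs to learn what ordinary data looks like, so that anything else stands out.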
The SETI Institute, which operates independently of UC Berkeley and Breakthrough Listen, is also exploring ways to incorporate machine learning into the search. Earlier this year, Richards and other researchers from the institute wrote a paper in which they described a new framework for using an image-recognition neural network to identify candidate intelligent signals in data from the Allen Telescope Array. The machine learning architecture described in the paper was the result of a month-long challenge that followed on the heels of the hackathon. The challenge required contestants to build machine learning algorithms that could identify candidate signals among a far noisier data set—similar to what would be encountered in real life.
But developing neural networks that can detect fake signals in fake data is a far cry from using them in a real search.
There is a much larger variety of signals encountered in real life, and the data is often messier. Although Hollywood would have us believe the first extraterrestrial message could be detected by Jodie Foster using a laptop plugged into a telescope, Richards says the future of SETI looks more like a data center filled with hundreds of specialized computer chips. These chips, known as graphics processing units, or GPUs, are tailored to run the same code over and over again at high speed, which is necessary to handle the rapid computational cycles that define machine learning algorithms.
Breakthrough Listen already employs a computing cluster of 64 high-end GPUs for signal analysis at the Green Bank telescope. Richards says the SETI Institute hasn’t made the jump to GPUs yet, although he expects that they’ll transition to these specialized chips within the next five years or so. Initially, he says, they will be used to hunt for fast radio bursts. But one of the biggest hurdles with implementing machine learning at the ATA is getting the expertise and material resources needed to make it happen. Unlike UC Berkeley’s SETI Research Center, which now has tens of millions of dollars in funding through Breakthrough Listen, the SETI Institute has a much smaller operating budget. Richards says that the Institute is looking to raise more funding to hire more people to help get its machine learning program off the ground.
Using AI to help us search for extraterrestrial intelligence has only just begun. But thanks to the pioneering work of Zhang and his colleagues, as well as a hefty dose of funding from Breakthrough Listen, when an ET finally calls it may very well be artificial intelligence that picks up the phone.
Daniel Oberhaus is the author of Extraterrestrial Languages, a forthcoming book from MIT Press.