Troy, N.Y. — At this very moment, tens of thousands of home computers around the world are quietly working together to solve the largest and most basic mysteries of our galaxy.
Enthusiastic and inquisitive volunteers from Africa to Australia are donating the computing power of everything from decade-old desktops to sleek new netbooks to help computer scientists and astronomers at Rensselaer Polytechnic Institute map the shape of our Milky Way galaxy. Now, just this month, the combined computing power of these humble home computers has surpassed one petaflop, a speed that exceeds that of the world’s second fastest supercomputer.
The project, MilkyWay@home, uses the Berkeley Open Infrastructure for Network Computing (BOINC) platform, which is widely known for the SETI@home project used to search for signs of extraterrestrial life. Today, MilkyWay@home has outgrown even this famous project in terms of speed, making it the fastest computing project on the BOINC platform and perhaps the second fastest public distributed computing program ever in operation (just behind Folding@home).
The interdisciplinary team behind MilkyWay@home, which ranges from professors to undergraduates, began formal development under the BOINC platform in July 2006 and worked tirelessly to build a volunteer base, and with it the project’s computational power, from the ground up.
Each user participating in the project signs up their computer and offers up a percentage of the machine’s operating power that will be dedicated to calculations related to the project. For the MilkyWay@home project, this means that each personal computer is using data gathered about a very small section of the galaxy to map its shape, density, and movement.
In particular, computers donating processing power to MilkyWay@home are looking at how the different dwarf galaxies that make up the larger Milky Way galaxy have been moved and stretched following their merger with the larger galaxy millions of years ago. This is done by studying each dwarf’s stellar stream. These calculations are providing new details on the overall shape and density of dark matter in the Milky Way galaxy, which is largely unknown.
The galactic computing project had very humble beginnings, according to Heidi Newberg, associate professor of physics, applied physics, and astronomy at Rensselaer. Her personal research to map the three-dimensional distribution of stars and matter in the Milky Way using data from the extensive Sloan Digital Sky Survey could not find the best model to map even a small section of a single galactic star stream in any reasonable amount of time.
“I was a researcher sitting in my office with a very big computational problem to solve and very little personal computational power or time at my fingertips,” Newberg said. “Working with the MilkyWay@home platform, I now have the opportunity to use a massive computational resource that I simply could not have as a single faculty researcher, working on a single research problem.”
Before taking the research to BOINC, Newberg worked with Malik Magdon-Ismail, associate professor of computer science, to create a stronger and faster algorithm for her project. Together they greatly increased the computational efficiency and laid the groundwork for what would become the much larger MilkyWay@home project.
“Scientists always need additional computing power,” Newberg said. “The massive amounts of data out there make it so that no amount of computational power is ever enough.” Thus, her work quickly exceeded the limits of laboratory computers and the collaboration to create MilkyWay@home formally began in 2006 with the assistance of the Claire and Roland Schmitt Distinguished Professor of Computer Science Boleslaw Szymanski; Associate Professor of Computer Science Carlos Varela; postdoctoral research assistant Travis Desell; as well as other graduate and undergraduate students at Rensselaer.
With this extensive collaboration, the project has made great strides toward its astrophysical goals. Important discoveries have also been made along the way in computational science, yielding algorithms that make the extremely distributed and diverse MilkyWay@home system work well even with volunteered computers that can be highly unreliable.
“When you use a supercomputer, all the processors are the same and in the same location, so they are producing the same results at the same time,” Varela said. “With an extremely distributed system, like we have with MilkyWay@home, we are working with many different operating systems that are located all over the globe. To work with such asynchronous results we developed entirely new algorithms to process work as it arrives in the system.” This makes data from even the slowest of computers still useful to the project, according to Varela. “Even the slowest computer can help if it is working on the correct problem in the search.”
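The idea of processing work as it arrives can be illustrated with a small simulation. The Python sketch below is not the project's algorithm, which the article does not describe in detail. It is a minimal, hypothetical asynchronous search in which simulated volunteers of very different speeds each evaluate one candidate at a time, and every result is folded into a shared population the moment it comes back. The objective function, worker count, and all names are assumptions made for illustration.

```python
import heapq
import random


def fitness(x):
    """Toy objective standing in for a model-fit score (lower is better)."""
    return sum((xi - 3.0) ** 2 for xi in x)


def async_search(num_workers=8, total_results=400, pop_size=20, dim=2, seed=0):
    """Simulate an asynchronous search; returns (initial_best, final_best)."""
    rng = random.Random(seed)

    # Shared population of (score, candidate) pairs, kept sorted best-first.
    population = []
    for _ in range(pop_size):
        x = [rng.uniform(-10.0, 10.0) for _ in range(dim)]
        population.append((fitness(x), x))
    population.sort(key=lambda p: p[0])
    initial_best = population[0][0]

    def new_candidate():
        # Mutate a randomly chosen member of the *current* population.
        _, parent = rng.choice(population)
        return [xi + rng.gauss(0.0, 1.0) for xi in parent]

    # Hypothetical per-task durations: some volunteers are ~100x slower.
    durations = [rng.uniform(0.1, 10.0) for _ in range(num_workers)]

    # Event queue ordered by simulated completion time.
    events = []
    for w in range(num_workers):
        heapq.heappush(events, (durations[w], w, new_candidate()))

    for _ in range(total_results):
        finish_time, w, x = heapq.heappop(events)
        score = fitness(x)  # the result this volunteer reports

        # Use the result immediately, without waiting for other workers.
        # A late result from a slow machine still counts if it beats the
        # worst member of the population.
        if score < population[-1][0]:
            population[-1] = (score, x)
            population.sort(key=lambda p: p[0])

        # Give the same worker fresh work based on the latest population.
        heapq.heappush(events, (finish_time + durations[w], w, new_candidate()))

    return initial_best, population[0][0]


if __name__ == "__main__":
    start, end = async_search()
    print(f"best score before: {start:.4f}, after: {end:.4f}")
```

Because no step waits for a full batch of results, a slow machine never stalls the search. Its candidate may be based on an older population, but it is still accepted whenever it improves on the current worst member.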
In total, nine articles have been published and multiple public talks have been given regarding the computer science discoveries made during the creation of the project, and many more are expected as the refined algorithms are applied to other scientific problems. Collaboration has already begun to develop a DNA@home platform to find gene regulation sites on human DNA. Collaborations have also started with biophysicists and chemists on two other BOINC projects at Rensselaer to understand protein folding and to design new drugs and materials.
In addition to important discoveries in computer science and astronomy, the researchers said the project is also making important strides in efforts to include the public in scientific discovery. Since the project began, more than 45,000 individual users from 169 countries have donated computational power to the effort. Currently, approximately 17,000 users are active in the system.
“This is truly public science,” said Desell, who began working on the project as a graduate student and has seen the project through its entire evolution. “This is a really unique opportunity to get people interested in science while also allowing us to create a strong computing resource for Rensselaer research.” All of the research, results, data, and even source code are made public and regularly updated for volunteers on the main MilkyWay@home Web site found at: http://MilkyWay.cs.rpi.edu/.
Desell cites the public nature and regular communication as important components of the project’s success. “They are not just sitting back and allowing the computer to do the work,” he says, noting that volunteers have made donations for equipment and have made their own improvements to the underlying algorithms that greatly increased computational speed. Varela jokes, “We may end up with a paper with 17,000 authors.”
In addition to the volunteers, others within Rensselaer and outside of the Institute have been involved in the project. Some of these collaborators include Rensselaer graduate students Matthew Newby, Anthony Waters, and Nathan Cole; and SETI@home creator David Anderson at Berkeley. The research was funded primarily by the National Science Foundation (NSF) with donations of equipment by IBM, ATI, and NVIDIA.