A CluE in the Search for Data-Intensive Computing

The Computer and Information Science and Engineering (CISE) directorate at the National Science Foundation (NSF) released a solicitation for proposals under the new Cluster Exploratory (CluE) initiative. The CluE program was announced in February as part of a partnership between Google, IBM and NSF. NSF hopes the initiative will spur innovation in data-intensive computing and serve as a model for future collaborations between the private sector and the academic computing research community.

CluE will give NSF-funded researchers access to software and services running on a Google-IBM cluster so they can explore innovative research ideas in data-intensive computing. NSF will allocate cluster computing resources to a broad range of proposals that explore how this technology can contribute to science and engineering research and that produce applications with the potential to benefit society as a whole.

“The software and services that run on these data clusters provide a brand new paradigm for highly parallel, highly reliable distributed computing, especially for processing massive amounts of data,” said Jeannette Wing, assistant director for CISE at NSF. Academic researchers have expressed a need for access to similar computing resources that would allow them to engage with and explore this emerging and pervasive model of computing.

In the last five years, private-sector companies have launched a number of highly effective Internet-scale applications powered by massively scaled, highly distributed computing resources known as data clusters. Sometimes referred to as data centers or server farms, these clusters contain as many as 90,000 servers, each co-located with hundreds of gigabytes of data. Growing network capacity and these fundamental changes in computer architecture are encouraging software developers to take new approaches to problem solving in computer science.

Until now, such resources have not been easily available or affordable for academic researchers. In October 2007, Google and IBM created a large-scale computer cluster of approximately 1,600 processors to give the academic community access to otherwise prohibitively expensive resources. Earlier this year, NSF joined the two companies in this effort, and the CluE initiative was born. “With the CluE initiative,” Wing said, “through the software and services provided by Google and IBM, the academic research community will now have access to such resources.”

This new partnership extends access to this research infrastructure to academic institutions across the nation. To raise awareness of research opportunities in data-intensive computing, NSF is now soliciting proposals from academic researchers and will select which of them receive access to the cluster. NSF will also support the selected researchers in conducting their work, while Google and IBM will cover the costs of operating the cluster and provide other support to the researchers.

Wing noted that the initiative is looking for proposals that focus on data-intensive applications and “not cluster computing per se. We are not looking for scientific applications that are based primarily on solving massive numbers of partial differential equations since high-end computing resources are available for such research already.”

Under this initial CluE solicitation, NSF expects to award up to $5 million spread across 10 to 15 awards, depending on the availability of funds. Selected projects will receive up to $500,000 each, for durations of up to two years.

http://www.nsf.gov
