Ernest Bowman-Cisneros, LROC Science Operations Center manager, said NASA expects about 30 GB of data per day to be sent through a satellite link to Arizona State University's (ASU) Fulton School of High Performance Computing. The lunar mission is scheduled to launch in May. Other spacecraft data will be sent to ASU from NASA's Goddard Space Flight Center, and NASA expects approximately 330 TB of data to accumulate on ASU's NetApp Inc. disk arrays during the processing phase.
After processing the data, NASA and ASU plan to distribute approximately 130 TB of images to the public. This critical data set will be replicated a second time, to a third storage site -- this one in the cloud.
"Data from the spacecraft is precious," said Dan Stanzione, director of the Fulton School. "It's nearly impossible to re-create if it's lost."
That's why the university decided it needs at least three viable copies of the LROC data, including one off-campus in case of a regional disaster.
Stanzione said he and his staff considered setting up and managing a third replication site using the same NetApp/GFS system the university already uses, but "leasing space and provisioning a storage system somewhere else would've involved a lot of administrative overhead and cost."
Because it considers tape "unwieldy," Stanzione's team wants the data accessible from disk.
"The beauty of disk is that you can see all the data online in the backup repository and retrieve it to make sure it's still valid," he said. "It's tough to find one file among hundreds of tapes."
For a hosted service, ASU considered Amazon's S3 but didn't like Amazon's monthly billing model.
"We didn't want recurring costs," Stanzione said. Instead ASU signed a year-long deal with Nirvanix about a month ago after testing the service for about three months. Nirvanix's pricing is normally subscription-based as well, but ASU worked out a deal to pay Nirvanix a fee for a fixed maximum capacity throughout the year.
"Our capital outlay to host our own third data center would've been more expensive than Nirvanix by a ratio of about 3-to-1," Stanzione said.
So far the university has sent only about 100 GB of test data to Nirvanix, though it will be sending hundreds of terabytes when all is said and done. Conventional wisdom says users with large volumes of data generally will steer clear of the cloud because of wide-area and Internet network bandwidth restrictions.
Stanzione doesn't disagree with that, but said the tertiary storage requirements for the LROC data won't demand high performance.
"It's a large volume of data, but it will build up slowly over time," he said. "With 30 gigabytes per day over 24 hours, you don't need that high a data rate. It's write once and read only if there's a failure -- perfect for the cloud. If we had to move it back and forth every day, there wouldn't be enough bandwidth."
Because Nirvanix is a startup in the relatively new field of cloud storage, Stanzione said he also carefully checked the vendor's financial records and backing to be sure it would remain a viable company for the length of the mission.
"In the worst case, we could switch vendors with relative ease, because Nirvanix provides data to us in a standard way" over NFS or CIFS, he said.