Everyone needs a computer geek connection, even scientists who have taught themselves to do amazing things on their own computers.
Tyson Swetnam made his geek connection during a conference in Washington, D.C.
He could have simply walked across Speedway to the Bio5 offices to meet Nirav Merchant, director of information technology at Arizona Research Labs and a principal investigator for CyVerse, the newly renamed National Science Foundation cyber-infrastructure project headquartered at the University of Arizona.
CyVerse began its life as the iPlant Collaborative in 2008 with a $50 million award from the National Science Foundation to create computational infrastructure for the plant sciences. It was renewed for another $50 million in 2013, with an expanded mission to support all the life sciences. It is expanding its reach once again.
It is headquartered at the UA, with partners at the Texas Advanced Computing Center at the University of Texas, Austin; Cold Spring Harbor Laboratory in New York; and the University of North Carolina, Wilmington.
Swetnam, a research scientist in the University of Arizona School of Natural Resources, is studying effective energy and mass transfer in Western forests with big sets of data gathered in LIDAR (laser radar) flyovers.
He layers them with other datasets that provide temperature and precipitation, working with another NSF-funded project at the UA, the Critical Zone Observatory.
He wants to predict the future of our forests, and one of the trickier parts is figuring out how much sunlight hits given areas to discover, among other things, where trees might survive under predicted higher temperatures. The data sets are rich enough to home in on individual trees and the shadows they cast.
He used to do his calculations “by hand” (actually on his computer), and recording the sun’s daily movement took him a whole day.
Now he can recreate a full year in a single day, courtesy of a CyVerse link to more computing power, built in a class in Applied Concepts in Cyberinfrastructure taught last year by Merchant and Eric Lyons, another principal investigator for CyVerse.
CyVerse was originally proposed as a means of answering “grand questions” in the plant science field with massive amounts of genomic data.
It still answers those grand questions, but it has expanded its reach in recent years — to climate studies such as Swetnam’s and even into the cosmos, where astronomer Jared Males is looking for exoplanets in the dust rings of distant stars from a mountaintop in Chile.
Males is no slouch at computing. He wrote much of the software that makes it possible for his team of Steward Observatory astronomers to erase the blurring effects of the atmosphere, using an adaptive-optics system that allows one to view distant objects in visible light from the Magellan Telescope at Las Campanas Observatory in Chile.
Merchant said he was at a symposium hosted by the UA Office of Research and Discovery when Steward Observatory Director Buell Jannuzi introduced him to Males, who said he had trouble processing large sets of image data with the limited computing ability available to him at the site.
“The amount of data was trivial,” said Merchant. “I got him an account and moved hundreds of gigabytes while we were having a coffee break.”
Merchant deals routinely in 1,000-multiples of giga. Terabytes and petabytes don’t really faze a man with access to clouds of processors and the Texas supercomputer.
He doesn’t even like the phrase ‘big data.’ “It’s just data,” he said.
He has simple analogies for what his team does in creating “middleware,” the software links between researchers’ office computers and the supercomputing capability of CyVerse.
“The way I see iPlant is we are the Lego building blocks of cyberinfrastructure. You can get a (pre-assembled) spaceship or a boat or a robot and you can use that. Or you can say, ‘I need something a little different and I need it to run on 100 machines at once.’”
The image-processing challenge became the subject of the fall semester class taught by Lyons and Merchant.
At its final meeting in December, the class presented its results, with Males attending by Skype from his Chilean observatory.
Pointing to a telescope camera (VisAO) image projected on a screen at the front of the class, student Asher Baltzell said: “This little white dot there. That’s the planet. So we found it. Or, rather, Jared found it and we helped.”
Males said only that he was “close to having some publishable results.”
He wasn’t yet ready to call it a planet. “I’m pretty sure there is a planet there and I really want there to be a planet there,” Males said.
In a later email, Males said the increase in speed provided by the class’s software connections gives him the ability to question his assumptions.
“In my specific application, it lets me analyze data in many more ways in a reasonable amount of time,” said Males. He said the students did a month’s worth of work in a single day.
“This makes a huge difference because now I can really test the robustness of my measurements, and be sure that I’m not biasing or just making up the results.”
He’s also working with CyVerse to transmit and store his data from Chile, rather than carrying hard drives on the plane ride back to Tucson.
When Males and his colleagues publish results from their recent observations, it will be the first astronomy paper to cite CyVerse, but far from the first scientific paper. CyVerse spokeswoman Shelley Littin said CyVerse is credited in 529 peer-reviewed scientific papers.
The major part of its work is still in the plant sciences, said Merchant.
CyVerse is ideally suited to storing, comparing and analyzing the proliferating amount of genomic data being generated by plant scientists, said Eric Lyons, who developed an online platform called CoGe to do just that.
Lyons, an assistant professor of plant sciences, said he grew up with a deep love of nature and a fascination with “fossils, insects and critters.”
He was also entranced by the “über-geek culture.” He mastered the Rubik’s cube, Dungeons & Dragons and early video games. Even today, he is known as something of a pinball wizard among his colleagues.
He said his “cruel, harsh parents” refused to buy him a home video-game player, but did give him a Commodore PET, one of the first home computers. He went to his friends’ homes, hacked their games and taught himself to write computer code, so he could play pirated versions on his computer.
Today, he keeps a venerable, non-working, Commodore computer on his desk at the Keating Research Building. It sits next to a fancy desk phone with hundreds of features he doesn’t know how to use. He did manage to record a greeting telling callers to send him an email.
The Commodore computer, with its 1-megahertz chip and 8K of memory, once was sufficient for most computing needs. Researchers today, he said, have much more powerful desktop computers and access to even more computing power.
“When iPlant began, most people had no clue why science needed this. Most of the data sets they could readily process on their laptops or on that one nice machine that the lab had.”
That changed quickly, especially in the field of plant genomics.
The iPlant Collaborative was formed just five years after the human genome was sequenced — a feat that took 10 years and billions of dollars. Today, organisms are sequenced in a matter of days for a couple thousand bucks.
There are now 26,000 genomes from 17,000 to 18,000 species, he said.
“The amount of data has just exploded and the costs for doing this just keep getting cheaper and cheaper. Our ability to get highly quantitative, massively-sourced data is just rapidly changing,” said Lyons.
That can be a logjam or “an unprecedented opportunity to solve all types of new problems,” he said.
CoGe, the online comparative genomics platform Lyons created for CyVerse, lets researchers layer on each newly sequenced genome. It hosts 24,000 genomes from 16,000 organisms that can now be comparatively analyzed.
Its ability to store and share comparative data has enabled a lot of research, including the simultaneous publication of 28 scientific papers in December 2014 by the Avian Phylogenomics Consortium, led by Erich Jarvis of the Howard Hughes Medical Institute at Duke University.
Researchers can choose to share data with other collaborators or with the scientific community. Those growing open data sets are beginning to be mined by researchers, said Merchant.
“The future is: We are going to enable a lot more data expeditions where people are trying to do interesting things with data they were not able to do before,” he said.
Expeditions are already underway in “virtual classrooms” where students are taught how to navigate the world of big data using tools and data from CyVerse.
Rachel Gallery, an assistant professor of microbial ecology at the UA, teaches a course called Ecoinformatics in conjunction with Kathryn Docherty, an assistant professor of biological sciences at Western Michigan University.
When the course was taught last spring, the students published a paper on variation in soil microbe populations in the peer-reviewed scientific journal PLOS ONE.
Gallery said publication was an expectation of the course.
“If we work towards a goal of a manuscript, then people become more invested and the likelihood of success goes up,” Gallery said.
The class used data that was open to the public and provided by the federally funded National Ecological Observatory Network. Both Gallery and Docherty are former NEON scientists and saw their archived work as an opportunity for students to learn how to work with data outside of their comfort zones.
“It’s incredibly challenging for students to work with data sets that they themselves haven’t been involved with collecting,” Gallery said.