Abstract: Scientific cyberinfrastructure uses scalable distributed systems, cloud computing, and Web technology approaches to connect and combine heterogeneous scientific software, data, and research computing resources into federated systems. The goals of these systems are to better enable scientists to conduct large-scale research and educators to bring state-of-the-art research systems into the classroom. Cyberinfrastructure systems, particularly science gateways, which focus on mapping the scientific research process to the federated resources, are also areas of research in their own right.

In this talk, we provide a technical survey of our team’s efforts in researching, developing, operating, and continuously innovating open-source science gateway cyberinfrastructure software, and we point toward new opportunities. Key research areas include developing pervasive middleware that seamlessly integrates scientists’ local computing and data resources with remote cloud and high-end supercomputing systems; designing and implementing flexibly managed, high-performance data transfer; creating “data lakes” that enable diverse, distributed data sets to be integrated seamlessly into research platforms; developing user interfaces and environments that are both flexible and usable; and developing end-to-end security approaches for managing users, groups, and access permissions across federated resources.

We discuss the integration of the above research areas into Apache Airavata, a comprehensive software system for science gateways, which we illustrate with examples from collaborations with partners in chemistry, biophysics, digital humanities, and other research areas.

We review the “virtuous cycle” benefits of complementing systems research with open-source engineering and “developer-operator” (DevOps) practices. We conclude with a discussion of new directions for science gateway research that can support scientists’ entire research processes more comprehensively by integrating all the resources needed for research more seamlessly.

Marlon Pierce is the director of the Cyberinfrastructure Integration Research Center (CIRC) at Indiana University, which develops and operates the open-source Apache Airavata software system for science gateways. Pierce’s research interests are in applying distributed systems concepts, open-source engineering practices, and science user-centered design to develop cyberinfrastructure that supports scientific research. Pierce received his Ph.D. in computational condensed matter physics from Florida State University in 1999.

Suresh Marru is the Principal Investigator of the NSF-funded Cybershuttle project and Deputy Director and Chief Architect of the Cyberinfrastructure Integration Research Center at Indiana University. His background is in computational science, and he has worked extensively with atmospheric science and geoscience projects, helping to design and develop service-oriented architectures that orchestrate data processing and large-scale simulations. He is an elected member of the Apache Software Foundation and serves as the foundation’s vice president for the Apache Airavata project.

Sudhakar Pamidighantam is a senior scientist at the Cyberinfrastructure Integration Research Center (CIRC) at Indiana University. He has a Ph.D. in computational physical organic chemistry and experience in protein sequence, structure, and function and in protein-small molecule interaction modeling, and he has deployed cyberinfrastructure for the chemistry, materials, and computational biology communities. His research involves ab initio and molecular dynamics modeling, reaction mechanisms, and cyberinfrastructure for complex workflows. He works collaboratively with teams to design and deploy cyberinfrastructure solutions for diverse communities. He has been an affiliate of the IEEE and a member of the American Chemical Society since 1990.