Web-based Visualization and Analytics of Petascale data: Equity as a Tide that Lifts All Boats
Aashish Panta - University of Utah, Salt Lake City, United States
Xuan Huang - Scientific Computing and Imaging Institute, Salt Lake City, United States
Nina McCurdy - NASA Ames Research Center, Mountain View, United States
David Ellsworth - NASA, Mountain View, United States
Amy Gooch - University of Utah, Salt Lake City, United States
Giorgio Scorzelli - University of Utah, Salt Lake City, United States
Hector Torres - NASA, Pasadena, United States
Patrice Klein - Caltech, Pasadena, United States
Gustavo Ovando-Montejo - Utah State University Blanding, Blanding, United States
Valerio Pascucci - University of Utah, Salt Lake City, United States
Room: Bayshore II
2024-10-13T16:00:00Z
Abstract
Scientists generate petabytes of data daily to help uncover environmental trends or behaviors that are hard to predict. For example, understanding climate simulations based on the long-term average of temperature, precipitation, and other environmental variables is essential to predicting and establishing root causes of future undesirable scenarios and assessing possible mitigation strategies. Unfortunately, bottlenecks in petascale workflows restrict scientists' ability to analyze and visualize the necessary information due to requirements for extensive computational resources, obstacles in data accessibility, and inefficient analysis algorithms. This paper presents an approach to managing, visualizing, and analyzing petabytes of data within a browser on equipment ranging from the top NASA supercomputer to commodity hardware like a laptop. Our approach is based on a novel data fabric abstraction layer that allows querying scientific information in a form that is user-friendly while hiding the complexities of dealing with file systems or cloud services. We also optimize network utilization while streaming from petascale repositories through state-of-the-art progressive compression algorithms. Based on this abstraction, we provide customizable dashboards that can be accessed from any device with an internet connection, offering straightforward access to vast amounts of data typically not available to those without access to uniquely expensive hardware resources. Our dashboards provide and improve the ability to access and, more importantly, use massive data for a wide range of users, from top scientists with access to leadership-class computing environments to undergraduate students of disadvantaged backgrounds from minority-serving institutions. We focus on NASA's use of petascale climate datasets as an example of particular societal impact and, therefore, a case where achieving equity in science participation is critical. 
In particular, we validate our approach by improving the ability of climate scientists to explore their data even on the top NASA supercomputer, introducing the ability to study their data in a fully interactive environment instead of being limited to pre-choreographed videos that can each take days to generate. We also successfully introduced the same dashboards and simplified training material in an undergraduate class on Geospatial Analysis at a minority-serving campus (Utah State University Blanding), where 69% of the students are Native American and 86% are low-income. The same dashboards are also released in simplified form to the general public, providing an unparalleled democratization of the access to and use of climate data that can be extended to most scientific domains.