IEEE VIS 2024 Content: Web-based Visualization and Analytics of Petascale data: Equity as a Tide that Lifts All Boats

Web-based Visualization and Analytics of Petascale data: Equity as a Tide that Lifts All Boats

Aashish Panta - University of Utah, Salt Lake City, United States

Xuan Huang - Scientific Computing and Imaging Institute, Salt Lake City, United States

Nina McCurdy - NASA Ames Research Center, Mountain View, United States

David Ellsworth - NASA, Mountain View, United States

Amy Gooch - University of Utah, Salt Lake City, United States

Giorgio Scorzelli - University of Utah, Salt Lake City, United States

Hector Torres - NASA, Pasadena, United States

Patrice Klein - Caltech, Pasadena, United States

Gustavo Ovando-Montejo - Utah State University Blanding, Blanding, United States

Valerio Pascucci - University of Utah, Salt Lake City, United States


Room: Bayshore II

2024-10-13T16:00:00Z
Exemplar figure, described by caption below
We provide unprecedented, equitable access to massive data through our novel data fabric abstraction, which powers dashboards that run on commodity desktop computers and open from a simple web link, serving everyone from top NASA scientists to students in disadvantaged communities to the general public. This image shows the Eastward Wind Velocity (U) field, assembled from a cubed-sphere grid.
Abstract

Scientists generate petabytes of data daily to help uncover environmental trends or behaviors that are hard to predict. For example, understanding climate simulations based on the long-term average of temperature, precipitation, and other environmental variables is essential to predicting and establishing root causes of future undesirable scenarios and assessing possible mitigation strategies. Unfortunately, bottlenecks in petascale workflows restrict scientists' ability to analyze and visualize the necessary information due to requirements for extensive computational resources, obstacles in data accessibility, and inefficient analysis algorithms. This paper presents an approach to managing, visualizing, and analyzing petabytes of data within a browser on equipment ranging from the top NASA supercomputer to commodity hardware like a laptop. Our approach is based on a novel data fabric abstraction layer that allows querying scientific information in a user-friendly form while hiding the complexities of dealing with file systems or cloud services. We also optimize network utilization while streaming from petascale repositories through state-of-the-art progressive compression algorithms. Based on this abstraction, we provide customizable dashboards that can be accessed from any device with an internet connection, offering straightforward access to vast amounts of data typically not available to those without access to uniquely expensive hardware resources. Our dashboards improve the ability to access and, more importantly, use massive data for a wide range of users, from top scientists with access to leadership-class computing environments to undergraduate students of disadvantaged backgrounds from minority-serving institutions. We focus on NASA's use of petascale climate datasets as an example of particular societal impact and, therefore, a case where achieving equity in science participation is critical.
In particular, we validate our approach by improving the ability of climate scientists to explore their data even on the top NASA supercomputer, introducing the ability to study their data in a fully interactive environment instead of being limited to pre-choreographed videos that can each take days to generate. We also successfully introduced the same dashboards and simplified training material in an undergraduate class on Geospatial Analysis at a minority-serving campus (Utah State University Blanding), where 69% of the students are Native American and 86% are low-income. The same dashboards are also released in simplified form to the general public, providing an unparalleled democratization of access to and use of climate data that can be extended to most scientific domains.
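To give a feel for the coarse-to-fine streaming idea mentioned in the abstract, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not describe the progressive compression scheme, so plain strided subsampling is used as a stand-in, and the function name `progressive_query`, the `levels` parameter, and the toy 8x8 field are all hypothetical.

```python
from typing import Iterator, List

Grid = List[List[float]]


def progressive_query(grid: Grid, levels: int) -> Iterator[Grid]:
    """Yield coarse-to-fine views of `grid` (hypothetical helper).

    Assumption: strided subsampling stands in for the progressive
    compression named in the abstract. The first view is the coarsest;
    the last view is the full-resolution grid.
    """
    for level in range(levels):
        # The stride halves at every level: 8, 4, 2, 1 for levels=4.
        stride = 2 ** (levels - 1 - level)
        yield [row[::stride] for row in grid[::stride]]


# Toy 8x8 scalar field standing in for one slice of a much larger dataset.
field: Grid = [[float(r * 8 + c) for c in range(8)] for r in range(8)]

# A client could draw each view as it arrives and refine the picture later.
views = list(progressive_query(field, levels=4))
shapes = [(len(view), len(view[0])) for view in views]
```

In this sketch a viewer could display the first small view almost immediately and replace it as finer views arrive, which is the interaction pattern a progressive stream is meant to support on slow connections or modest hardware.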