Scientific Data Management

 

Project 1:  DAMASC

Project 2: Automated Index Tuning

Project 1: DAMASC (DAta MAnagement in Scientific Computing)

Figure: DAMASC architecture

 

File systems are the backbone of large-scale data processing for scientific applications. Motivated by the need for an extensible and flexible framework that goes beyond the abstractions of file API libraries for managing and analyzing large-scale data, we are developing Damasc, an enhanced file system that provides rich data management services for scientific computing as a native part of the file system. The Damasc project explores the design of a performant file system that lets scientists, and even casual users, pose declarative queries and updates over views of underlying files stored in their native bytestream format. Damasc adds a configurable layer that exposes the contents of files in a logical data model, through which views can be defined and used for queries and updates. The logical data model and views are also leveraged to optimize file access through caching and self-organizing indexing. In addition, provenance capture and analysis of file access are built into Damasc.
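To make the idea of a view over a native bytestream file concrete, here is a minimal Python sketch. It is not Damasc's interface: the record layout, the names `scan` and `View`, and the whole-result cache are assumptions chosen purely for illustration. The sketch shows a layer that decodes raw bytes into logical records, a view defined by a predicate and a projection, and a cache that saves a second scan of the file.

```python
import os
import struct
import tempfile

# Hypothetical record layout (an assumption, not a Damasc format):
# little-endian int32 timestep followed by float64 temperature.
RECORD = struct.Struct("<id")
FIELDS = ("timestep", "temperature")


def write_native_file(path, rows):
    """Store rows as a raw byte stream, as a simulation code might."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(RECORD.pack(*row))


def scan(path):
    """Logical-model layer: expose the byte stream as a sequence of records."""
    with open(path, "rb") as f:
        while chunk := f.read(RECORD.size):
            yield dict(zip(FIELDS, RECORD.unpack(chunk)))


class View:
    """A selection and projection over a file, with a simple result cache."""

    def __init__(self, path, columns, predicate):
        self.path = path
        self.columns = columns
        self.predicate = predicate
        self._cache = None
        self.scans = 0  # counts how often the underlying file is read

    def query(self):
        # Read the file only on the first call; later calls reuse the cache.
        if self._cache is None:
            self.scans += 1
            self._cache = [
                tuple(rec[c] for c in self.columns)
                for rec in scan(self.path)
                if self.predicate(rec)
            ]
        return self._cache


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "run.bin")
        write_native_file(path, [(0, 280.5), (1, 301.25), (2, 310.0)])
        # Declarative-style view: timesteps where temperature exceeds 300.
        hot = View(path, ("timestep",), lambda r: r["temperature"] > 300.0)
        first = hot.query()
        second = hot.query()
        return first, second, hot.scans


if __name__ == "__main__":
    print(demo())  # ([(1,), (2,)], [(1,), (2,)], 1)
```

The file itself stays in its native format, and only the decoding layer knows the layout. A real system would also need cache invalidation on updates and finer-grained indexing, which this sketch omits.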

This work is funded by DOE grant DE-SC0005428 and partially funded by NSF grant #1018914 and ISSDM.

 

Project 2: Automated Index Tuning

Members: Karl Schnaitter

Information about Automated Index Tuning can be found on the Database Group's website.

 

Publications: 

On the role of NVRAM in data intensive HPC architectures

Essen BV, Pearce R, Ames S, Gokhale M. On the role of NVRAM in data intensive HPC architectures. In: Workshop on Emerging Supercomputing Technologies (WEST) at ICS 2011. Tucson, AZ; 2011.

Fusing data management services with file systems

Brandt S, Maltzahn C, Polyzotis N, Tan W-C. Fusing data management services with file systems. In: PDSW '09: Proceedings of the 4th Annual Workshop on Petascale Data Storage. New York, NY, USA: ACM; 2009. pp. 42–46.