This project builds infrastructure that encourages data-driven discovery from distributed, fragmented datasets without requiring movement of massive amounts of data and without exposing sensitive raw datasets to end users.
The project creates a Virtual Information-Fabric Infrastructure (VIFI) that allows scientists to search, access, manipulate, and evaluate fragmented, distributed data in the information ‘fabric’ (the infrastructure that facilitates data sharing) without directly accessing or moving large amounts of data. The system addresses the challenges of coordinating loosely federated infrastructure, distributed data management, security, and privacy. The architecture combines loosely coupled components that provide proven capabilities with several emerging components. VIFI includes a novel orchestration layer for on-site analytics and hybrid-infrastructure (GPU, CPU) management, a dynamic, secure, container-based infrastructure that enables online adaptive analytics on unshareable data at distributed locations, and enhanced data and code management tools. The layer also provides improved search, access, and query, built on persistent identifiers and on semantic descriptions (metadata) of raw data that are generated automatically using semantic data mining techniques.