jSpace - Scientific Dataspace Support Platform


Scientific data collected in various research domains are made accessible for in-depth analysis through portals by means of e-Infrastructures. Managing the outcomes of these analyses together with their corresponding input data, and enriching the relationships between them with semantics to facilitate the reuse of data and analytical methods, is more important today than ever. Systems that provide an advanced integrated view of large-scale, distributed scientific data are described extensively in the literature. However, the key dataspace feature of managing semantic relationships has received little attention and therefore remains an open research challenge, which jSpace addresses.

The initial ideas on managing dataspaces have begun to attract the interest of the data management community. However, most effort has concentrated on mainstream database research and applications and has so far not been applied to advanced scientific data management. Furthermore, most approaches to realizing a dataspace system presented at international conferences to date focus on personal information management. In the figure below we illustrate our extension of mainstream dataspace research toward advanced scientific data management. Mainstream dataspace research can be summarized as the research direction that focuses on realizing dataspace concepts for on-demand data integration, which the literature refers to as pay-as-you-go data integration.

Scientific dataspaces aim to provide mechanisms for three tasks. First, they manage semantically rich relationships among scientific data resources. Second, they keep track of the scientific studies conducted by members of a scientific community, independent of the e-Science application domain. Third, they link these studies with information about the scientist who conducted them, such as institutional affiliation, email address and field of work. jSpace focuses on scientific dataspaces, which, when applied in e-Science applications, can provide an efficient and powerful scientific data management solution for e-Infrastructures. Our approach is to semantically enrich the existing relationships between primary and derived datasets. It then preserves both the relationships and the datasets together within a dataspace, so that their owners and others can reuse them. This approach is shown to significantly improve the assisted publishing, discovery and reuse of primary and derived data used in scientific studies within e-Infrastructures. To enable reuse, data must be well preserved, which is best achieved by addressing the full data life cycle.

We present a novel OWL ontology for creating semantically rich relationships between primary and derived datasets in scientific studies. The major contributions of jSpace include:

  1. e-Science life cycle model: a model addressing the complete data life cycle to provide well-preserved scientific studies,

  2. Long-term preservation framework: a framework that preserves the complete life cycle of data in scientific studies,

  3. Large-scale scientific dataspace platform: jSpace, which integrates the achievements presented in this thesis and enables multiple dataspace instances from various domains to be interconnected, and

  4. jSpace Java API: an API that provides the methods needed to construct semantic data about scientific studies, together with a model for managing that data within a distributed data environment.
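The following sketch is hypothetical. It is not the jSpace Java API, whose method names are not given here. It illustrates the kind of semantic data the list above describes: a study linked to the scientist who conducted it, and primary and derived datasets linked to each other. Instead of the jSpace OWL ontology, it uses a plain in-memory list of subject-predicate-object statements in the spirit of RDF triples. The class name `DataspaceSketch`, the predicates (`conductedBy`, `usesInput`, `producesOutput`, `derivedFrom`, `affiliation`, `email`) and all identifiers are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch only: NOT the jSpace Java API. Stores
 * subject-predicate-object statements (in the spirit of RDF/OWL triples)
 * to show how a study, its scientist, and its primary and derived datasets
 * could be linked. All identifiers and predicate names are made up.
 */
public class DataspaceSketch {

    /** One subject-predicate-object statement. */
    record Triple(String subject, String predicate, String object) {}

    private final List<Triple> triples = new ArrayList<>();

    /** Records a relationship; null values are rejected. */
    public void add(String subject, String predicate, String object) {
        triples.add(new Triple(
                Objects.requireNonNull(subject),
                Objects.requireNonNull(predicate),
                Objects.requireNonNull(object)));
    }

    /** Forward lookup: everything the subject points to via the predicate. */
    public List<String> objects(String subject, String predicate) {
        return triples.stream()
                .filter(t -> t.subject().equals(subject) && t.predicate().equals(predicate))
                .map(Triple::object)
                .collect(Collectors.toList());
    }

    /** Reverse lookup: everything that points to the object via the predicate. */
    public List<String> subjects(String predicate, String object) {
        return triples.stream()
                .filter(t -> t.predicate().equals(predicate) && t.object().equals(object))
                .map(Triple::subject)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        DataspaceSketch ds = new DataspaceSketch();

        // A study and the scientist who conducted it, with user information.
        ds.add("study:42", "conductedBy", "person:alice");
        ds.add("person:alice", "affiliation", "Example University");
        ds.add("person:alice", "email", "alice@example.org");

        // Primary (input) data and the derived data produced from it.
        ds.add("study:42", "usesInput", "dataset:raw-1");
        ds.add("study:42", "producesOutput", "dataset:derived-1");
        ds.add("dataset:derived-1", "derivedFrom", "dataset:raw-1");

        // Discovery: which derived datasets were produced from the raw data?
        System.out.println(ds.subjects("derivedFrom", "dataset:raw-1")); // [dataset:derived-1]
        System.out.println(ds.objects("study:42", "conductedBy"));       // [person:alice]
    }
}
```

The reverse lookup is what supports discovery and reuse in this toy model. Starting from a primary dataset, it finds every derived dataset and, through `producesOutput`, the studies behind them. A linear scan keeps the sketch short. A real dataspace would presumably rely on an RDF store and the vocabulary of its OWL ontology instead.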

The jSpace prototype software can be downloaded from here.

A brief introduction to Scientific Dataspaces