Aim of the DARIAH-DE Repository
The aim of the DARIAH-DE Repository is to provide researchers in the arts, humanities and cultural sciences with a low-threshold tool to store their research data in a sustainable way, describe it with metadata and publish it. In accordance with the FAIR Principles and the Open Access Guidelines of Göttingen University, DARIAH-DE is committed to providing data under open licenses and recommends that all researchers use Creative Commons licenses for this purpose.
DARIAH-DE advocates a scientific reuse of the research data published in the repository according to the research data life cycle. It wants to promote scientific growth in a self-management system and thus remind users of their own responsibility. The repository and its applications are considered to be a living system in which users are encouraged to handle data responsibly and also confidently.
Research Data in the context of DARIAH-DE
DARIAH-DE has developed the following definition of research data in the context of project work:
DARIAH-DE understands all those sources / materials and results collected, written, described and/or evaluated in the context of a research and research question in the field of human and cultural sciences, and in machine-readable form for the purpose of archiving, citation and for further processing as research data.
As all data in the DARIAH-DE Repository are organized in collections, it was necessary to describe this term as well:
A scientific collection within DARIAH-DE is not defined solely by formal criteria, for example as a digital literature or full-text collection. Rather, within DARIAH-DE a scientific collection can also be formed according to content, research-related or professional criteria (see the report on collection concepts from DARIAH-DE II). Such collections may have been formed from formal collections, or may have been grouped from several of these formal collections. In this sense, formal collections are raw data and the basis for the creation of scientific collections. Scientific collections are also aggregations of research data and metadata, which are compiled by scientists to answer research questions within specific research contexts and are enriched with further information such as authority data or annotations in machine-readable form.
A collection of scientifically relevant research data (see Definition Oltersdorf/Schmunk 2016)
- is the subject of scientific questions and serves the validation of statements, methods, theses, hypotheses or theories in research and teaching,
- can be the origin and result of scientific work (research data life cycle),
- is documented in a regular form, ideally recorded according to international standards and enriched with authority data,
- provides information on its usage conditions (licenses), and
- serves to order the collection's objects and their archival storage.
Ultimately, from the point of view of DARIAH-DE, nearly all objects can be aggregated in collections. They do not have to be physical objects or their representations, but can also be, for example, metadata for objects, or graphical evaluations or visualizations of data sets. A collection can also be a set of research data, in practice a number of files.
This means that there are many possible use cases in terms of licences, usage and reuse, and very different guidelines have to be respected; for example, the licence guidelines for metadata can differ considerably from those for the data itself. DARIAH-DE is aware of the problems that accompany this broad definition and addresses them in an open discourse.
The research process in DARIAH-DE has to fulfill certain basic requirements. These usually comprise:
- Reliability / documentation of the context of creation and collection
- Machine readability (and thus processability)
- Referenceability, with information on the author and legal information regarding further use (by third parties)
Recommendations and List of Preferred Formats
DARIAH-DE provides guidelines with information about formats suitable for long-term storage and reuse: Empfehlungen für Forschungsdaten, Tools und Metadaten in der DARIAH-DE Infrastruktur (Recommendations for research data, tools and metadata in the DARIAH-DE infrastructure).
Legal and Regulatory Framework
“own[s] all necessary rights to publish this collection including all data and metadata and to allow re-use by third parties” (see Publikator screenshot).
“Data, collections, or metadata that allow conclusions to be drawn about individual persons may not be imported unless the author obtains explicit confirmation from the persons concerned or their legal representatives that they are in agreement with publication in the DARIAH-DE repository. This confirmation must be presented to DARIAH-DE in writing”.
As data can only be uploaded after the user has been authenticated (see DARIAH Authentication and Authorization Infrastructure), misuse can be traced back to the perpetrator. In the event of misuse, DARIAH-DE reserves the right to delete the affected data from the repository.
After publication, the data are stored securely in the DARIAH-DE Repository and are publicly accessible. Following the open access policy of DARIAH-DE, Creative Commons licences are recommended to the community.
Collection development and Data Policies
The DARIAH-DE Repository offers a unique solution, as it enables researchers to upload their research data themselves and publish them without having to clear many different hurdles. The DARIAH-DE Repository has a low threshold for its users with respect to both the technical resources and the prior knowledge necessary for describing their data appropriately. Each step of the process is carried out online via the DARIAH-DE Publikator in a user-friendly GUI. Furthermore, each step of the process is documented in detail in the DARIAH-DE Repository Documentation. In case of technical problems or further questions, the helpdesk connects users with DARIAH-DE experts in less than 48 hours. Articles from the user’s point of view (DHd Blog), FAQs, a user guide, tutorials and workshops offered by the DARIAH-DE partners complete the support. To illustrate the skills needed to store and publish data in the repository, every procedure is explained step by step.
The curation of the DARIAH-DE Repository involves a brief check of basic metadata, as uploading data requires completing a form based on Simple Dublin Core, which comprises 15 elements. Three fields are mandatory (title, author, license regulations). Otherwise the content is distributed as deposited. The main idea of the repository is that DARIAH-DE provides the tools and the counselling, while the users describe their data themselves, as they have the expert knowledge to describe the content and veracity of the data. Users may use the tools provided by the DARIAH-DE research data federation infrastructure, for example, to map the data, improve their findability and make them citable.
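The basic metadata check described above can be illustrated with a short sketch. It is not the Publikator's actual implementation; it assumes that the three mandatory form fields (title, author, license regulations) correspond to the Dublin Core elements title, creator and rights, and the function name check_basic_metadata and the sample record are hypothetical.

```python
# The 15 elements of the Simple Dublin Core element set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Assumption: the mandatory form fields (title, author, license regulations)
# map to the Dublin Core elements title, creator and rights.
MANDATORY = ("title", "creator", "rights")


def check_basic_metadata(record):
    """Return a list of problems found in a metadata record (empty if none)."""
    problems = []
    # Flag keys that are not part of the Simple Dublin Core element set.
    for name in record:
        if name not in DC_ELEMENTS:
            problems.append(f"unknown element: {name}")
    # Flag mandatory elements that are absent or contain only whitespace.
    for name in MANDATORY:
        value = record.get(name)
        if value is None or not str(value).strip():
            problems.append(f"missing mandatory element: {name}")
    return problems


# Hypothetical sample record, invented for illustration.
record = {
    "title": "Letters of a hypothetical correspondence",
    "creator": "Jane Doe",
    "rights": "CC BY 4.0",
    "language": "de",
}
print(check_basic_metadata(record))
```

A check of this kind only verifies that the mandatory fields are filled in; judging whether the description is accurate remains with the depositing researcher, as stated above.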
The DARIAH-DE policy for the development of the collection, data access, quality and re-use as well as preservation is strongly influenced by its community-driven approach. The demands of the various research communities of the Arts, Humanities and Cultural Sciences were crucial for the development of the data model for the collections stored in the DARIAH-DE Repository. Three different approaches were chosen in order to ensure that the data model is suited to the demands of the different research communities:
- Interaction with various researchers provided valuable information. To ensure a more systematic communication about scientific collections, a stakeholder committee with researchers who have experience with collections of the Arts, Humanities and Cultural Sciences was established.
- A detailed analysis of use cases also provided crucial information; see Modellierung und Dokumentation von Use-Cases für wissenschaftliche Sammlungen (Modelling and documentation of use cases for research collections; a short abstract is available in English) and Dokumentation theorie- und verfahrensgeleiteter Sammlungskonzepte (Documentation of theoretical and process-guided concepts of collection; a short abstract is available in English).
- Cooperation with various research projects was helpful to understand different practical approaches for managing research data and working with collections in the Arts, Humanities and Cultural Sciences:
Fig. 1: Cooperation in the context of DARIAH-DE and TextGrid (as Virtual Research Environment)
Based on the demands and feedback from the community on the one hand and established standards on the other, the DARIAH Collection Description Data Model (DCDDM) was developed. The DCDDM is a data model for collection descriptions that specifies a fixed number of classes and elements to assist institutions and individual scholars in creating descriptions of physical (or analogue) and digital collections that can be read by humans as well as by machines. It is based on the Dublin Core Collections Application Profile (DCCAP). The DCDDM was developed in close consultation with the community and has recently been revised. It is a dynamic model that can be further customized as needed.
In order to provide tools for working with the data and metadata of the collections stored in the DARIAH-DE Repository, for modelling and mapping the metadata to other schemes, and to make the data collections findable, the DARIAH-DE Data Federation Architecture (DFA) was developed. With the DARIAH-DE Repository as one of its central components, the DFA facilitates the reuse of the research data and metadata published in the DARIAH-DE Repository:
- DARIAH-DE Publikator is an easy-to-use tool for conveniently importing research data into the DARIAH-DE Repository via a graphical interface and adding metadata. An extensive user guide leads users through the whole process of publishing data and can be found in the documentation of the tool.
- The Data Modeling Environment (DME) is the place where data can be modelled and mappings between data models can be stored, managed on a long-term basis and combined as required. It thus provides conceptual support for researchers in the arts, humanities and social sciences to connect heterogeneous data and creates interoperability. Mappings allow automated translations of data from one model into another. The DME therefore forms the basis, for example, for the generic search across different collections.
- The Collection Registry serves as a catalogue of collections which originated within the scope of research projects or serve as a basis for them. It links data, their data models and the description of a collection for technical reuse by services such as search or analysis tools, and also serves to manage collection descriptions. In addition to digitally accessible collections, these can include analogue, protected or offline collections.
- The Generic Search provides a front end for the data stored in the Collection Registry and the DARIAH-DE Repository. It can be used to search the distributed metadata records, to save a search in a personalized way, and to adapt or refine it at a later date.
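The idea that mappings between data models enable a search across heterogeneous collections can be sketched as follows. This is a simplified illustration, not the DME or Generic Search API: all field names, both mappings, the sample records and the functions translate and search are hypothetical, and Dublin Core element names are merely assumed as the common target model.

```python
# Hypothetical mappings from two project-specific data models to a common
# target model (Dublin Core element names are assumed as the target here).
PROJECT_A_TO_DC = {"werktitel": "title", "verfasser": "creator", "lizenz": "rights"}
PROJECT_B_TO_DC = {"name": "title", "author": "creator"}


def translate(record, mapping):
    """Translate a record into the target model; unmapped fields are dropped."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def search(records, element, term):
    """Case-insensitive substring search over one element of translated records."""
    term = term.lower()
    return [r for r in records if term in str(r.get(element, "")).lower()]


# Two invented collections that describe their objects with different fields.
collection_a = [{"werktitel": "Briefe aus Göttingen", "verfasser": "A. Beispiel", "seiten": 12}]
collection_b = [{"name": "Göttingen city maps", "author": "B. Example"}]

# After translation, both collections can be searched through one element name.
unified = [translate(r, PROJECT_A_TO_DC) for r in collection_a]
unified += [translate(r, PROJECT_B_TO_DC) for r in collection_b]

hits = search(unified, "title", "göttingen")
print([hit["title"] for hit in hits])
```

In this sketch a mapping is a plain rename table; mappings between real data models may also need to split, merge or transform values.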
Reuse of research data and metadata is one of the main goals of all services provided by DARIAH-DE. This is reflected by the very definition DARIAH-DE provides for research data, which is considered
“all those sources / materials and results collected, written, described and/or evaluated in the context of a research and research question in the field of human and cultural sciences, and in machine-readable form for the purpose of archiving, citation and for further processing.” (https://de.dariah.eu/en/weiterfuhrende-informationen).
Since the DARIAH-DE Repository is part of the DARIAH-DE Data Federation Architecture (DFA), the data published in the DARIAH-DE Repository can be managed and reused according to different processes of the research data life cycle:
- Planning and creation
- Conservation measures
The concept of the research data life cycle for the DARIAH-DE services was described in the paper Diskussion und Definition eines Research Data LifeCycle für die digitalen Geisteswissenschaften (Discussion and definition of a research data life cycle for the digital humanities) and can be visualised in the following schema:
Fig. 2: The DARIAH-DE research data lifecycle
DARIAH-DE continues to engage with the different communities of the Arts, Humanities and Cultural Sciences in order to improve the DARIAH-DE Repository as well as the whole DARIAH-DE Data Federation Architecture (DFA). Within the framework of the project CLARIAH-DE, for example, the DFA is being evaluated and tested for the specific demands of research data in the field of linguistics. One aim of this work is to link the collections published in the DARIAH-DE Repository to collections of linguistic data. This will not only increase the data available for reuse via the DFA, but will also ensure that the DFA is suited for research data management in the field of linguistics.
The Preservation Policy of the DARIAH-DE Repository is in line with the open access strategy of the University of Göttingen and its research data policy. To ensure its long-term sustainability, the DARIAH-DE Repository is operated by the Humanities Data Centre. The technical measures for preservation are provided by one of the two organisations funding the Humanities Data Centre, the Gesellschaft für wissenschaftliche Datenverarbeitung Göttingen mbH (GWDG).