Introduction

DARIAH-DE is developing a repository as a digital long-term archive for research data from the humanities and cultural sciences, which is now available in an advanced version. The DARIAH-DE Repository is a central component of the DARIAH-DE Research Data Federation Infrastructure, which aggregates various services and applications so that DARIAH users can work with them conveniently.

The DARIAH-DE Repository allows you to save your research data in a sustainable and secure way, to provide it with metadata, and to publish it. Your collection as well as each individual file is available in the DARIAH-DE Repository in the long term and gets a unique and permanently valid persistent identifier (PID), with which your data can be permanently referenced and cited. In addition, you can register your collections in the Collection Registry, so that they can also be found in the Generic Search.

The DARIAH-DE Publikator

The entry point for importing collections is the DARIAH-DE Publikator, which allows you to prepare, manage, and finally import your collections into the DARIAH-DE Repository.

Collections

The term collection requires an explanation in the context of the DARIAH-DE Repository and the DARIAH-DE Research Data Federation Architecture: a collection here means a set of research data, in practice a set of files that belong together in some way.

If your files are already publicly accessible as a collection, are already provided with persistent identifiers, and someone (e.g. a data center) takes care of their safe storage, you can register and describe them as a collection in the DARIAH-DE Collection Registry. If you have a technical interface to your collection, you can also specify it in your collection description. The contents of your collection are then indexed in the Generic Search of DARIAH-DE and can be found there.

However, your research data may also be stored locally on a hard disk, on a CD, or in a non-publicly accessible location, either as a collection or as a single file. In that case it is not accessible to other researchers, cannot be searched for and found by other interested parties, and may be lost to science if not maintained. If you want to make your data available to other scientists and keep your research results safe and citable, you can import them into the DARIAH-DE Repository via the DARIAH-DE Publikator.

After that, your research data

  • will be stored safely in the repository,
  • will receive persistent identifiers (one for the collection itself and one for each file).

Your data then

  • can be permanently referred to and be cited,
  • is publicly accessible,
  • can be described as a collection in the Collection Registry, and
  • is searchable in the Generic Search.

Your research data are then included in the research data life cycle and are thus available for subsequent use.

Log in with the DARIAH-DE account or with the federation account

You can reach the DARIAH-DE Publikator in the DARIAH-DE Portal from the page of the DARIAH-DE Repository, or directly via this link to the DARIAH-DE Publikator.

To use the DARIAH-DE Publikator, please log in with your DARIAH or Federation account. If you do not have a DARIAH account, you can apply for it here.

Publishing with the DARIAH-DE Publikator

A collection created in the DARIAH-DE Publikator is initially only used to aggregate research data. In this way, you have a superordinate unit that groups your data under a topic and allows you to describe your data as a collection of related objects. The associated data can be assigned to this collection and uploaded for publication. Your files are also described with metadata. Dublin Core Simple is used as the metadata standard to keep the approach generic, so you have a small, well-defined set of metadata fields to describe your data. Only a few details are mandatory. After publication, the data of this collection is stored securely in the DARIAH-DE Repository and is publicly accessible.

The Collection Registry stores descriptions of collections. Only references to the data (or an access method for the data) are stored here, not the data itself. In the Collection Registry you describe your collection, including its technical interfaces, using a much more detailed description scheme (the DARIAH Collection Description Data Model, DCDDM) than is possible when publishing with Dublin Core Simple. The DARIAH-DE Publikator provides you with a draft collection description based on the metadata you entered, which can be added to the Collection Registry and then published there.


Warning

Please note that all data published and edited through this beta version are publicly accessible and will be deleted after the beta phase! For bug reports please contact funk@sub.uni-goettingen.de. You can find the bug tracker under projects.gwdg.de. We are happy to add you there as a user, so you can register issues for yourself. For this, please write to register@dariah.eu.

Info

First, the DARIAH-DE Publikator saves the files in the DARIAH-DE OwnStorage, an implementation of the DARIAH Storage API. During the publication process, the DARIAH-DE Publikator delivers the objects of a collection, including metadata, to the DARIAH-publish service, which in turn passes the data to the DARIAH-crud service. DARIAH-crud is responsible for basic operations such as CREATE and RETRIEVE on the DARIAH-DE OwnStorage. It now also obtains PIDs, performs some metadata conversions, and finally stores each individual file safely in the repository, along with descriptive, administrative, and technical metadata.

Two views of the DARIAH-DE Publikator

The user interface of the DARIAH-DE Publikator is divided into two views. The first includes an overview of your collections. Here you can create collections and see a list of all collections you have created so far. For each collection in this list, the title and the status of the publication process are displayed:

  • DRAFT  –  The collection has just been created or is currently being edited within the Publikator. Collections in draft status are visible only to you as a logged-in user. The content of draft collections can be changed.
  • RUNNING  –  A publishing operation has been started and is currently in progress.
  • ERROR  –  An error occurred during a publication process.
  • PUBLISHED  –  The collection and its data are published in the DARIAH-DE Repository.
  • REGISTERED  –  The collection is published and additionally registered in the Collection Registry and is indexed by the Generic Search.
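The status values above can be read as a small state machine. The following Python sketch is only an illustration of that reading: the transition table is inferred from this documentation (including the assumption that a collection in ERROR can be corrected and published again) and is not taken from the Publikator's code.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "DRAFT"
    RUNNING = "RUNNING"
    ERROR = "ERROR"
    PUBLISHED = "PUBLISHED"
    REGISTERED = "REGISTERED"


# Assumed transitions, inferred from the status descriptions in this
# documentation; the real Publikator may allow other paths.
ALLOWED_TRANSITIONS = {
    Status.DRAFT: {Status.RUNNING},
    Status.RUNNING: {Status.PUBLISHED, Status.ERROR},
    Status.ERROR: {Status.RUNNING},
    Status.PUBLISHED: {Status.REGISTERED},
    Status.REGISTERED: set(),
}


def can_transition(current: Status, target: Status) -> bool:
    """Return True if the assumed workflow allows current -> target."""
    return target in ALLOWED_TRANSITIONS[current]
```

For example, `can_transition(Status.DRAFT, Status.RUNNING)` is true, while a published collection can never return to DRAFT in this model.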

If you click on Create new collection or on one of the collections and click Edit collection, you will be taken to the second view: Edit Collection. Here you can edit contents of the collection and edit the metadata.

Start publishing

Creating a new collection

If you have not created a collection yet, you can create a new one by clicking on the create new collection button. A newly created collection is initially in the status DRAFT. You will now be taken directly to the Edit Collection view.

Tagging your collection with metadata

Any changes that you make in this view are saved automatically. If you click on the to main view button, all your data is stored securely.

First, you should fill out the displayed mandatory metadata fields to describe your collection directly. At the moment, three items are mandatory:

  • Title (dc:title)
  • Creator (dc:creator)
  • Rights management (dc:rights)

The required metadata fields are marked with an asterisk (*) and appear in red as long as they are not filled out. If you are not familiar with the Dublin Core metadata schema, you can click on the (i) to display a description of the metadata field, including examples. Dublin Core Simple has 15 metadata fields, and you can add the other twelve by clicking the button add optional metadata. All fields are repeatable: you can add them as often as you want by clicking on (+), and delete them again with (-). Each mandatory field must contain at least one value by the time the collection is published.
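The rule that each mandatory field needs at least one value can be sketched as a small check. This is a minimal illustration, assuming metadata is held as a dictionary that maps a field name to a list of values (fields are repeatable); the function name and data layout are hypothetical and not part of the Publikator.

```python
# The three fields named as mandatory in this documentation.
MANDATORY_FIELDS = ("dc:title", "dc:creator", "dc:rights")


def missing_mandatory_fields(metadata: dict) -> list:
    """Return the mandatory fields that have no non-empty value.

    `metadata` maps a field name to a list of string values.
    """
    missing = []
    for field in MANDATORY_FIELDS:
        values = metadata.get(field, [])
        # A field counts as filled only if some value is not blank.
        if not any(value.strip() for value in values):
            missing.append(field)
    return missing
```

A collection with a title but a blank creator and no rights statement would report `dc:creator` and `dc:rights` as missing.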

Integrating files (and more metadata)

You can now add your research data as files by clicking Add file(s) to your collection. These will then appear below the title of your collection and you will see a small collection tree, which may be familiar to you. Two metadata fields are automatically assigned: The filename is used as the title, and the format comes from the mimetype of the file, which is determined automatically. You are welcome to change or delete this data. The three metadata fields mentioned above are also mandatory for each file.
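As a rough illustration of the two automatically assigned fields, the sketch below derives a title from the file name and a format from a guessed MIME type. It relies on Python's extension-based `mimetypes` module as a stand-in; how the Publikator actually determines the MIME type is not described here, and the use of `dc:format` as the field name is an assumption.

```python
import mimetypes
from pathlib import Path


def initial_file_metadata(filename: str) -> dict:
    """Build the two pre-filled fields for a newly added file."""
    # guess_type only inspects the file extension, not the content.
    mime_type, _encoding = mimetypes.guess_type(filename)
    return {
        "dc:title": [Path(filename).name],
        # Fall back to a generic type when the extension is unknown.
        "dc:format": [mime_type or "application/octet-stream"],
    }
```

Both values are only starting points, which matches the note above that you may change or delete them.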

If the file contains additional metadata, some of them are also automatically copied, for example, creators (dc:creator) of PDF documents, time data (dc:date), or coverage (GPS coordinates) for image files (dc:coverage).

However, if you add many files to your collection, you do not have to enter all the metadata for each file individually. For any field, such as creator or rights management, you can select the title of your collection in the tree and then click (↓). The content of the selected field (e.g. rights management) is then copied to all files attached to the collection. If a file already has content in the rights management field, this information is not deleted, and a further field is added instead. Be careful not to inadvertently copy the title of the collection to all files. There is no back or undo function yet!
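The copy behaviour of (↓) can be sketched as follows: values are appended to each file, and existing values are kept. This is an illustrative model only, with hypothetical names and the same dictionary layout (field name to list of values) assumed for collection and file metadata.

```python
def copy_field_to_files(collection: dict, files: list, field: str) -> None:
    """Append the collection's values for `field` to every file.

    Existing file values are kept, so copying the same field twice
    produces duplicates, and there is no undo.
    """
    values = collection.get(field, [])
    for file_metadata in files:
        file_metadata.setdefault(field, []).extend(values)


collection = {"dc:title": ["Sample collection"], "dc:rights": ["CC BY 4.0"]}
files = [
    {"dc:title": ["a.txt"]},
    {"dc:title": ["b.txt"], "dc:rights": ["CC0"]},
]
copy_field_to_files(collection, files, "dc:rights")
```

After the call, the first file has one rights value and the second has two. Copying `dc:title` the same way would add the collection title to every file, which is the mistake the warning above refers to.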

In the following screenshot you see the edit collection view with optional metadata fields of the sample collection:


In the second screenshot below you can see the metadata view of the attached file. Each file, as well as the collection itself, has its own set of metadata, and you can edit them independently. If you have selected a file, you can view, remove, or update it. If the file is deleted, it is removed from the OwnStorage, including its metadata. File and metadata are then no longer available in the DARIAH-DE Publikator. Of course, the file remains on your hard disk. If you want to update the file, for example because you have made local changes to it, you can exchange it by updating it. Please check whether all metadata are correct after the update, because automatically generated metadata fields may have been added.

FIXME:  screenshot


Status: draft

You can edit your collection as often as you want. The data and metadata are stored safely in the Publikator until you publish them. Once you have finished editing, you can go back to the overview and resume work on your collection at any time. You will see a list of your collections in the overview, and if you come directly from the Edit mode, the last edited collection will already be opened.

You can now create additional collections or continue working with the existing ones. Since you are logged in to the DARIAH-DE Portal, the collections in this view are visible only to you as long as they are not published. These collections are in status DRAFT. In the table under the list title, some metadata of the collection are listed. The field below explains how you can proceed.

Status: Draft

Your collection is in the draft stage and is only visible to you. Click edit collection to add files to your collection and enrich the collection and its content with metadata. Please note that some metadata fields must be completed before you can publish the collection. You also have optional metadata fields that will increase the visibility of your collection after publication.

If you have finished editing your collection and you are happy with all your metadata, you can publish the collection: Your collection and all the files contained in it will get persistent identifiers (PIDs) during the publication process and can thus be permanently and unambiguously referenced.

You can also delete the collection and all contained files including metadata from the Publikator, leaving your source files on your hard drive. Published collections can be deleted from the Publikator, but not from the DARIAH-DE Repository.

Publish your collection

If you are now satisfied with your collection, which means that you have added all files and metadata (at least the mandatory fields), you can click the publish collection button.

Warning

Please be aware that all data and metadata are publicly accessible after the publication process and can no longer be deleted by you!

Status: running

Before the publishing process starts, you are asked to confirm a note stating that your collection and the related data can no longer be deleted from the DARIAH-DE Repository once publishing has begun. After confirming it, you will get a message that the publishing process has been started. After a short time the status of your collection changes to RUNNING.


During the publication process many things happen, which are described in the info boxes of this documentation below (for the work with the Publikator you can skip them). Data and metadata of your collection are passed on by the DARIAH-DE Publikator to the DARIAH-publish service and from there to the DARIAH-crud service. Information about the status of the publication process is displayed in the blue box. This information comes directly from the publish service. Some of it is very technical, and it is generally not translated.

About architecture...

The DARIAH-Publish Service ...

... is a workflow service that performs various steps within the publication.

Among other things, the metadata is validated, references to objects within the imported collection are converted to persistent identifiers (PIDs) and technical metadata is generated. Finally, after the creation of the collection file, all referenced data, including metadata are passed from the OwnStorage (by reference) on to the DARIAH-crud.

If the publish service finishes successfully, your collection has been published. This means initially that

  • all files were written to the PublicStorage, where they are publicly accessible,
  • all files have a persistent identifier,
  • the collection and its contents can be queried via the DARIAH OAI-PMH service, and
  • a draft collection description has been created for your published collection in the Collection Registry.

This collection description can then be supplemented and published, so that the collection can be indexed by the DARIAH Generic Search via OAI-PMH. Only after the data has been registered in the Generic Search can the collection data also be searched there.
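OAI-PMH requests are plain HTTP requests with a `verb` parameter, so a harvesting URL can be assembled with the standard library. The sketch below builds a `ListRecords` request for Dublin Core (`oai_dc`) records; the base URL is a placeholder, since the address of the DARIAH OAI-PMH service is not given in this documentation.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder endpoint, NOT the real DARIAH-DE OAI-PMH address.
EXAMPLE_BASE_URL = "https://repository.example.org/oaipmh"


def list_records_url(base_url: str, set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for oai_dc records."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec is not None:
        # Optional selective harvesting; set names are service-specific.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"
```

Calling `list_records_url(EXAMPLE_BASE_URL)` yields a URL ending in `?verb=ListRecords&metadataPrefix=oai_dc`, which a harvester such as the Generic Search would request to read the records.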

DARIAH-crud ...

... is the storage service of the DARIAH-DE Repository and provides basic storage operations.

Two instances of the DARIAH-crud service are in operation. One can only be reached internally (e.g. from the DARIAH-publish service). This instance is primarily responsible for the generation and administration of data. Here the metadata and data of all objects

  • are stored in DARIAH-DE PublicStorage,
  • are entered into the index database for later retrieval by OAI-PMH service, and
  • get a PID which uniquely identifies and references each object.

The second instance, which allows read-only access to the data, can be accessed externally. It returns data and metadata of the stored objects, as well as a compact index page for an overview of the collection and its contents.

Status: published

If the publication process has been successful, the status of your collection changes from RUNNING to PUBLISHED. The  overview looks like this (the published collection is expanded in this screenshot):


The generated persistent identifier of your collection is displayed in the table as the PID of the collection (11022/0000-0001-328E-7). The displayed link refers to the metadata of the PID. It shows you which metadata is stored at the handle service.

Status: published

Congratulations! Your collection is now published and thus publicly accessible and referenced via the displayed persistent identifier (PID). By clicking on the PID, you will see the administrative metadata of your collection as stored in the PID service. You can also display your collection through a basic index page of the repository. From there, you have direct access to your collection's data and metadata, and you can view descriptive, technical, and administrative metadata. Furthermore, you have access to all the related objects in your collection, which are stored safely as so-called BagIt bags in the repository. Each of these bags contains all of the data and metadata associated with an object, which you can also retrieve individually.

If you want to index your collection and its contents via the DARIAH-DE Generic Search, please add the collection to the Collection Registry using the button "add to Collection Registry". A draft of your collection description has already been submitted there. Please add all necessary information before publishing your description in the Collection Registry to make it available via the DARIAH-DE Generic Search.

If you want to delete your collection, the data and metadata are only deleted in the DARIAH-DE Publikator, but not in the DARIAH-DE Repository!

Via the index page you can get a quick overview of your collection and its data. You can see some core metadata of the respective collection or content file and you can download all data and metadata.

The index page of the publication

You can retrieve various generated and saved metadata for each file as well as for the collection itself. All metadata and files can also be found in the BagIt bag, which stores each of your files together with its metadata in a ZIP file. The collection itself is also stored in the repository as a single file, which refers to its content files via PID. For each file, the BagIt bag includes the file itself plus descriptive metadata (the Dublin Core Simple metadata you already provided, see above), administrative metadata (provided by the DARIAH-crud service), and automatically extracted technical metadata. These BagIt bags are stored in the DARIAH-DE PublicStorage.
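To give an idea of what such a ZIP file looks like, the sketch below builds a tiny BagIt-style archive in memory and separates payload entries (under `data/`) from tag files such as `bagit.txt` and the manifest. The file names inside the bag are hypothetical; the actual layout of the bags produced by the repository is not specified in this documentation.

```python
import hashlib
import io
import zipfile

# Hypothetical payload; a real bag holds your file plus its metadata.
PAYLOAD = {
    "data/report.pdf": b"%PDF-1.4 example payload",
    "data/metadata.xml": b"<metadata/>",
}


def build_example_bag() -> bytes:
    """Create a minimal BagIt-style ZIP archive in memory."""
    manifest = "".join(
        f"{hashlib.sha256(content).hexdigest()}  {path}\n"
        for path, content in PAYLOAD.items()
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bag:
        bag.writestr(
            "bag/bagit.txt",
            "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        )
        bag.writestr("bag/manifest-sha256.txt", manifest)
        for path, content in PAYLOAD.items():
            bag.writestr(f"bag/{path}", content)
    return buffer.getvalue()


def split_bag_entries(zip_bytes: bytes):
    """Return (payload entries, tag files) of a zipped bag."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as bag:
        names = bag.namelist()
    payload = [name for name in names if "/data/" in name]
    tag_files = [name for name in names if "/data/" not in name]
    return payload, tag_files
```

The same `split_bag_entries` function could be applied to the bytes of a bag downloaded from the repository, provided it follows the usual BagIt layout with a `data/` directory.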

If the PID of the collection (11022/0000-0001-328E-7), or a PID of one of the content files, is opened via a handle resolver (e.g. http://hdl.handle.net/11022/0000-0001-328E-7 ), you are automatically forwarded to the BagIt bag that is stored in the repository. In the future, the PID will most certainly refer to the current index page (as a landing page).

Status: failure

If errors occur during the publication process, the status of your collection changes to ERROR. You will first see a general error description concerning your collection.


You will get a detailed description of the problem by clicking on show error details. In some cases, you can also jump directly to the place in your collection that needs to be corrected, fix it, and then publish your collection again.


In our example, some mandatory metadata are missing.

Info

If you are experiencing any errors that you feel you can not solve, please report them to funk@sub.uni-goettingen.de. You can find the bugtracker at projects.gwdg.de. We would also be happy to add you as a reporter, please write to register@dariah.eu.

Register your collection in the DARIAH-DE Collection Registry

Your collection is now securely and permanently stored in the DARIAH-DE Repository and can be persistently referenced via PID. With the help of the PID or a URL including handle resolver and the PID, everyone can access your collection and its associated data. If you want to describe your collection in more detail and integrate your research data as a collection into the Generic Search, register your collection in the Collection Registry and expand and publish the already created draft of your description. To do so, please click add to Collection Registry.

You will probably be asked again for your federation account. If you are logged in to the Publikator with your DARIAH account, please select DARIAH as the institution and log in to the Collection Registry again with your DARIAH account.


You will now be taken to the Collection Registry. Select the item Collections at the top of the menu bar and then your drafts in the menu on the left, or click directly on the title of your collection on the right side under Latest Activities.


Now click on the link to the right of the list with the correct title and you will be taken to the collection description of your collection:


If you wish to add more information to your collection, please switch on "Show hints" (Editor options on the left).
If you want to index the collection in the generic search, you must not delete or modify the access data (OAI-PMH) under Collection Access.

After saving and publishing your collection, it will be indexed by the Generic Search and the contents of your collection can be found there.

Status: registered

Congratulations! Your collection and research data are now published in the DARIAH-DE Repository and the corresponding collection description is registered in the Collection Registry. The status of your collection now is REGISTERED!

Info

If the status of your collection is still PUBLISHED, please click on the link [update status] in the lower right part of your collection description – directly under the button delete collection. The status is then updated directly from the server and not from the cache.


In addition to the publication in the DARIAH-DE Repository and the referencing via the displayed identifier (PID), your collection is now also registered in the Collection Registry. The collection description now published also contains the link of your collection in the DARIAH-DE Repository and is therefore now also indexed by the Generic Search. It will only take a few minutes for your content to be publicly searchable.

(If you are wondering why there are now four objects (or three, or ten) in our test collection and why the PID has changed: For documentation of the error, we created and published another collection. If you really have noticed this, you are a very attentive beta tester! Hats off! :-))

Certification and subsequent use

Persistent identifiers (PIDs)

The verification of your collection and the data contained is mainly provided by the persistent identifier. The collection itself as well as each individual content file gets such a PID and it looks as follows:

Info

11022/0000-0001-328E-7

You can now use this PID as a reference to your collection and your research data. As a PID and identifier you should use it as follows:

Info

 hdl:11022/0000-0001-328E-7

If you want to use or share a URL at the same time, you can simply use a handle resolver, for example http://hdl.handle.net/11022/0000-0001-328E-7.
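Both notations can be derived from the bare PID. The helper functions below are a small sketch with hypothetical names. They only format strings and do not contact the handle service.

```python
HANDLE_RESOLVER = "http://hdl.handle.net/"


def as_identifier(pid: str) -> str:
    """Return the PID in the hdl: notation used for citation."""
    return f"hdl:{pid}"


def as_resolver_url(pid: str, resolver: str = HANDLE_RESOLVER) -> str:
    """Return a resolvable URL for the PID via a handle resolver."""
    return f"{resolver.rstrip('/')}/{pid}"


def bare_pid(value: str) -> str:
    """Strip an hdl: prefix or the resolver URL, leaving the bare PID."""
    for prefix in ("hdl:", HANDLE_RESOLVER):
        if value.startswith(prefix):
            return value[len(prefix):]
    return value
```

With the sample PID, `as_identifier("11022/0000-0001-328E-7")` gives `hdl:11022/0000-0001-328E-7`, and `as_resolver_url` gives the URL under `http://hdl.handle.net/`.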

Index pages, data and metadata

You can also refer to all metadata and to the files themselves (these URLs will also be persistent):

The collection description in the DARIAH-DE Collection Registry

The link to the collection description of your collection can be found in the Collection Registry of our Beta-Release. It looks like the one for our sample collection:

The DARIAH-DE Generic Search

The Generic Search (beta release) can be found here:
