| Cluster Analysis of Open Research Data: A Case for Replication Metadata |
| Journal Article | FZJ-2023-04211 |
2023
Digital Curation Centre
Bath
Please use a persistent id in citations: doi:10.2218/ijdc.v17i1.833 doi:10.34734/FZJ-2023-04211
Abstract: Research data are often released upon journal publication to enable result verification and reproducibility. For that reason, research dissemination infrastructures typically support diverse datasets coming from numerous disciplines, from tabular data and program code to audio-visual files. Metadata, or data about data, is critical to making research outputs adequately documented and FAIR. Aiming to contribute to the discussions on the development of metadata for research outputs, I conduct an exploratory analysis to determine how research datasets cluster based on what researchers organically deposit together. The content of over 40,000 datasets from the Harvard Dataverse research data repository is used as a sample for the cluster analysis. I find that the majority of the clusters are formed by single-type datasets, while in the rest of the sample no meaningful clusters can be identified. For the result interpretation, I use the metadata standard employed by DataCite, a leading organization for documenting a scholarly record, and map existing resource types to my results. About 65% of the sample can be described with a single-type metadata (such as Dataset, Software or Report), while the rest would require aggregate metadata types. Though DataCite supports an aggregate type such as a Collection, I argue that a significant number of datasets, in particular those containing both data and code files (about 20% of the sample), would be more accurately described as a Replication resource metadata type. Such resource type would be particularly useful in facilitating research reproducibility.