By filling in the supporting documents carefully and sending them to the AUSSDA team promptly, depositors help with the publication of the data.
Metadata
Metadata make it possible to find individual data records in our AUSSDA Dataverse. Some metadata describe the data at a general level (e.g. title, abstract, principal investigator), while others give more detailed insight into how the data were collected (e.g. sampling procedure, mode of data collection, type of instrument). The data are classified thematically through Topic Classification and Keywords. AUSSDA draws the keywords from the European Language Social Science Thesaurus (ELSST) so that the data can also be found in other data catalogues, such as the CESSDA Data Catalogue.
A carefully completed metadata sheet helps us to fill in the entry in the AUSSDA Dataverse with the important information about the data, and thus increases the probability that interested researchers and citizen scientists will find and use the archived data.
Documentation
Ideally, we archive not only the datasets but also additional documentation that provides more detailed insight into the archived data. One example is the methodological report, which, among other things, tells other researchers more about the sampling process and so helps them to assess the type of research. Codebooks, in turn, give a quick overview of the individual variables and their labels in the datasets. Even where long variable labels are displayed only in shortened form in the Stata format (see below), codebooks allow a better understanding of the variables. AUSSDA can also archive questionnaires, data management plans, field reports and much more alongside the actual data.
Datasets
The data producers are responsible for anonymising the data. Datasets with high reuse potential are additionally checked by our data processing specialists, both to identify critical variables and to ensure that the variables and labels are easy to understand. Critical variables are those that contain personal information (such as full names or e-mail addresses) and thus allow individuals to be uniquely identified. Identification is often also possible via so-called cross tables, that is, by combining several variables, especially where case numbers are small. For example, in the only city in Burgenland with more than 10,000 inhabitants (Eisenstadt), a male pharmacist (ISCO code, gender) aged 47 (age) could be identified. For this reason, we often recommend grouping categories with low case numbers if the data are to be published as open access.
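The cross-table risk described above can be illustrated with a minimal Python sketch. It is not AUSSDA's actual checking procedure: the function name small_cells, the threshold of 5 cases and the example records are all assumptions made up for illustration. The sketch counts how often each combination of potentially identifying variables occurs and reports the combinations that fall below the threshold.

```python
from collections import Counter


def small_cells(records, keys, min_count=5):
    """Return the combinations of `keys` values that occur fewer than
    `min_count` times. `min_count=5` is an illustrative assumption,
    not an AUSSDA rule."""
    counts = Counter(tuple(rec[k] for k in keys) for rec in records)
    return {combo: n for combo, n in counts.items() if n < min_count}


# Invented example data: one 47-year-old male pharmacist in Eisenstadt
# and six respondents who share another combination of values.
records = (
    [{"city": "Eisenstadt", "occupation": "pharmacist", "gender": "male", "age": 47}]
    + [{"city": "Eisenstadt", "occupation": "teacher", "gender": "female", "age": 35}] * 6
)

risky = small_cells(records, ["city", "occupation", "gender", "age"])
for combo, n in risky.items():
    print(combo, "occurs", n, "time(s)")
```

Combinations reported in this way are candidates for grouping, for example by recoding exact age into age bands or by using broader occupation categories.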
Short, concise labelling of the data (variable and value labels) leads to more clarity and better usability. In Stata, for example, only up to 80 characters of a label are displayed, whereas SPSS allows longer labels. Converting from SPSS to Stata therefore often results in variable labels that are hard to read.
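Depositors who want to spot overly long labels before submitting could use a check along the following lines. This is only a sketch: the 80-character limit is the one named in the text, while the function name long_labels, the variable names and the example labels are hypothetical.

```python
STATA_LABEL_LIMIT = 80  # display limit for labels in Stata, as stated in the text


def long_labels(variable_labels, limit=STATA_LABEL_LIMIT):
    """Return {variable name: label length} for labels longer than `limit`."""
    return {
        name: len(label)
        for name, label in variable_labels.items()
        if len(label) > limit
    }


# Hypothetical variable labels, e.g. as exported from an SPSS file.
labels = {
    "q1": "Age of respondent",
    "q2": (
        "How satisfied are you, all things considered, "
        "with the way democracy currently works in Austria?"
    ),
}

too_long = long_labels(labels)
for name, length in too_long.items():
    print(f"{name}: {length} characters, would be shortened in Stata")
```

Labels flagged in this way can be shortened by hand, with the full question wording kept in the codebook.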
What does a publication look like?
In Dataverse you will find the dataset itself, which is available in SPSS, Stata and tab formats, as well as research documentation such as the questionnaire, a codebook and a method report, all as downloadable files. The metadata are stored in the predefined fields and can be read by search engines. Each archived dataset has its own DOI, a persistent identifier, through which it can be found permanently.