Our bread and butter: data checks

06.03.2018

So far, our daily work probably looks like a series of mysterious steps to you. After reading this article you will want to deposit your data with us, we promise!

Here we will answer the question: “What happens after data is deposited at AUSSDA?” (You have asked yourself the very same thing already, right?) The short answer: a lot. But which step in the deposit process are we talking about exactly?

  • Data ingest – first contact, consultations, signing the deposit agreement, providing AUSSDA with your data
  • Processing – the (still) mysterious AUSSDA processing steps, which center on comparing dataset, codebook and documentation
  • Controlling & Publishing – feedback on changes from the depositor, final checks, publishing the dataset in the AUSSDA Dataverse

You guessed it: step 2! What we call “processing” includes a series of steps our data processing specialists take to ensure that your data is reusable and easy to understand. Data that is not accompanied by documentation, or that cannot be understood, cannot be reused in the future.

Saving the data

After receiving the data and documentation, a data processing specialist saves the documents and data. Since we currently archive quantitative data, we save datasets as STATA files.

First checks

Next come the first data checks. Are all documents in usable formats (e.g. data in SPSS, STATA, csv, tab)? Have all direct personal identifiers (such as the names or social security numbers of respondents) been deleted from the dataset? If the answer to all of these questions is yes, we continue with the next step.
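
To make these first checks more tangible, here is a minimal Python sketch of how they could be automated. It is an illustration only, not a description of the tools actually used at AUSSDA. The file extensions are an assumed mapping of the formats named above (SPSS to .sav, STATA to .dta), and the identifier column names are hypothetical.

```python
from pathlib import Path

# Assumed mapping of the formats named above to common file extensions.
USABLE_EXTENSIONS = {".sav", ".dta", ".csv", ".tab"}

# Hypothetical column names that would point to direct personal identifiers.
DIRECT_IDENTIFIERS = {"name", "first_name", "last_name", "social_security_number"}


def unusable_data_files(filenames):
    """Return the data files whose extension is not in the accepted set."""
    return [f for f in filenames if Path(f).suffix.lower() not in USABLE_EXTENSIONS]


def identifier_columns(columns):
    """Return the columns whose name matches a known direct identifier."""
    return [c for c in columns if c.strip().lower() in DIRECT_IDENTIFIERS]


def passes_first_checks(filenames, columns):
    """True only if every data file is usable and no identifier column remains."""
    return not unusable_data_files(filenames) and not identifier_columns(columns)
```

A name-based lookup like this only catches identifiers that are labeled as such, so it can support a human reviewer but not replace one.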

Comparing, comparing, comparing

This is the step in the deposit process where it pays off to have two computer screens. We open and view your dataset and documentation (codebook, methods report, questionnaire) as well as your metadata entry. We pay special attention to:

  • Metadata checks: Are all metadata entries filled out correctly? Can optional fields be filled by us?
  • Data checks: Is the number of cases and variables the same across files? Are value labels the same across all files? Were skip patterns implemented correctly? Are missing values defined consistently?
  • Anonymization: Depending on the license you chose, we check if changes to the data are needed to ensure the anonymity of your respondents. How are variables like age, place of residence and education coded?
  • Plausibility checks: If we find a retired person who is 12 years old, we will get in touch with you!
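
Some of the comparisons in this list can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions, not the procedure actually used: variables and value labels are assumed to be available as plain lists and dictionaries, the field names `id`, `age` and `employment` are hypothetical, and the age threshold is an arbitrary placeholder.

```python
def compare_variables(dataset_vars, codebook_vars):
    """Report variables that appear in only one of the two sources."""
    data_set, book_set = set(dataset_vars), set(codebook_vars)
    return {
        "only_in_dataset": sorted(data_set - book_set),
        "only_in_codebook": sorted(book_set - data_set),
    }


def compare_value_labels(dataset_labels, codebook_labels):
    """Return shared variables whose value labels differ between the sources."""
    shared = set(dataset_labels) & set(codebook_labels)
    return sorted(v for v in shared if dataset_labels[v] != codebook_labels[v])


def implausible_retirees(rows, min_age=40):
    """Return ids of respondents coded as retired but younger than min_age.

    The threshold is an arbitrary illustration, not an archive rule.
    """
    return [
        r["id"] for r in rows
        if r["employment"] == "retired" and r["age"] < min_age
    ]
```

Each function returns the items to look at rather than a simple pass or fail, because every finding still has to be judged by a person and, where needed, discussed with the depositor.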

While we test all that and more, we enter all our notes into a log.  

Log

In this document we try to record, thoroughly and comprehensively, all discrepancies and inconsistencies we find. Ideally we find very little, but in the long process of labeling, coding and formatting it is easy for errors to sneak in. We therefore distinguish between several categories of “inconsistencies” (let’s not call them errors):

  • Differences in data and documentation
  • Notes on variables and labels
  • Notes on values and labels
  • Anonymization
  • Spelling/grammatical errors

Issues in the last category are easy to resolve; those in the first four categories can be more challenging for us to fix.
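
For readers who think in data structures, the five categories could be modeled roughly as follows. This is a hypothetical sketch of what a log entry might look like, not the format of the actual log; the fields `variable` and `note` are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The five categories of inconsistencies listed above."""
    DATA_VS_DOCUMENTATION = "Differences in data and documentation"
    VARIABLES_AND_LABELS = "Notes on variables and labels"
    VALUES_AND_LABELS = "Notes on values and labels"
    ANONYMIZATION = "Anonymization"
    SPELLING = "Spelling/grammatical errors"


@dataclass
class LogEntry:
    category: Category
    variable: str  # hypothetical field: the variable the note refers to
    note: str      # hypothetical field: free-text description of the finding


def summarize(entries):
    """Count the logged inconsistencies per category."""
    return Counter(e.category for e in entries)
```

A per-category summary like this gives a quick impression of how much of the follow-up work is simple proofreading and how much needs the depositor's input.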

Internal transparency

Responsibility and accountability are important during the deposit process. You trust us with your data and we handle it with care. In our internal documentation we track who took which action, so that no step gets skipped.
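
The idea of tracking who took which action can be illustrated with a small Python sketch. The step names below are hypothetical and only loosely follow the sections of this article; the real internal documentation may look quite different.

```python
from dataclasses import dataclass

# Hypothetical step names, loosely following the sections of this article.
REQUIRED_STEPS = ["saving", "first_checks", "comparison", "log"]


@dataclass(frozen=True)
class Action:
    step: str
    person: str  # who took the action


def missing_steps(actions):
    """Return the required steps that nobody has recorded yet, in order."""
    done = {a.step for a in actions}
    return [s for s in REQUIRED_STEPS if s not in done]
```

An empty result from `missing_steps` would mean that every required step has a recorded owner.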

 

Thank you to all readers who stayed with me until the end. If you want to see this log of identified inconsistencies and you are ready to release your data into our archive, don’t hesitate to contact us! You can reach us by phone (+43 4277 15323) or by mail. We look forward to meeting you and receiving your dataset.

Hands typing on a computer keyboard. Above the keyboard is a folder with "AUSSDA Data Checks" written on it.
A Data Processing Specialist hard at work (photo: Lena Raffetseder)