Implementation and future work

This section answers the following questions:

  • What is this error framework missing?
  • How should this framework be implemented and how will a central repository work?
  • How do we see it being used, and who will be responsible for it?

Out-of-scope work

The error framework presented in this guide provides a general way to understand the quality of datasets and outputs. However, it is high-level, and we have not yet solved many of the difficult technical problems that would need to be addressed as the natural next steps in enhancing it.

At the start of this project, we identified key methodological gaps that were outside this guide’s scope. The main areas were:

  • time-series quality measurement
  • the effect of confidentiality rules on quality
  • measuring conceptual and validity errors
  • quality measures for linked data
  • combining different measures into a single overall measure
  • weighting in the presence of combined survey and administrative data
  • quality measures for statistical outputs that use administrative data for benchmarking or calibration
  • quality measures for apportionment (eg of GST and rolling mean employee counts on the business register)
  • quality measures for editing and imputation
  • combined assessment of costs, quality, and respondent burden.

Statistics NZ, along with other statistical agencies and statistics researchers, has investigated many of these areas. A full summary of these complex areas is out of the scope of this framework.

Central repository of quality assessments

To maximise the benefits from the error framework in this guide, we must make the outcomes of quality assessments available in a central repository. Statistics NZ uses Colectica as a standard corporate metadata tool, and it can naturally be extended for use as a quality assessment repository.

  • It already contains a lot of basic metadata, links to existing studies, and other information, so we avoid duplication.
  • Detailed documents, such as feasibility studies and in-depth quality assessments, can be linked into Colectica (many already are).
  • It is possible to add 'quality statement' templates to Colectica that can capture more detailed information in an organised way.
  • The standard metadata items can be modified over time so we capture and organise key information. Our template contains good questions that can be easily integrated into Colectica.
  • We can create phase 1 and 2 assessment templates in a format to be directly uploaded to Colectica.

Centralising quality assessments has the following benefits.

  • Eliminates duplication of work and lets new studies build on the old (especially for phase 1 assessments).
  • Ensures all relevant work and knowledge about a given dataset is easily found in one place for reuse.
  • Encourages certain basic information to be understood and recorded for all our collections in the same way.
  • Provides an easy way to find quality information for releases and to answer queries.
  • Assists analysts new to undertaking quality assessments (with examples).
  • Provides guidance for performing quality assessments of sample and census survey data.

Implementing the error framework

Using administrative data requires our statistical analysts to change the way they organise their work. Since providers produce administrative data primarily for their own use, we have to assess the data before we can use them in statistical outputs. This assessment replaces the controls we generally rely on during the initial phases of a survey. The error framework guides the assessment process and leads the user of the administrative data to decide on its potential uses.

We envision that more and more datasets and outputs will be assessed using this error framework. Although the framework and measures we’ve presented here are very detailed, the time we have available to undertake a quality assessment for an administrative data source may be limited. Initially, data quality assessments may focus only on the most important and useful aspects of the framework.

Tracking the changes

New users may need to update an existing quality assessment when considering their data use or when the administrative data changes. We recommend that analysts include an audit trail when updating the meta-information template or when computing additional phase 1 quality indicators. The audit trail should record who made the update and when. The current meta-information template includes an audit trail section, but now that we have loaded the template into our Colectica tool, we can more easily keep track of changes and share them. We’re continuing to develop our systems for recording and updating quality information about datasets and outputs.
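As an illustration only, the audit trail described above could be kept as a simple list of entries attached to each assessment. The sketch below is written in Python and is not how Colectica or the meta-information template stores this information; the class names, field names, indicator name, and values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AuditEntry:
    """One audit trail line: who made the update, when, and what changed."""
    updated_by: str
    updated_on: date
    change: str


@dataclass
class QualityAssessment:
    """Hypothetical stand-in for a meta-information template record."""
    dataset: str
    indicators: dict = field(default_factory=dict)
    audit_trail: list = field(default_factory=list)

    def update_indicator(self, name, value, updated_by, updated_on):
        """Set a phase 1 indicator and record the change in the audit trail."""
        previous = self.indicators.get(name)
        self.indicators[name] = value
        self.audit_trail.append(
            AuditEntry(updated_by, updated_on, f"{name}: {previous!r} -> {value!r}")
        )


# Illustrative usage with invented values (not real assessment results).
assessment = QualityAssessment("example administrative dataset")
assessment.update_indicator(
    "example_indicator", 0.04, "analyst_a", date(2016, 1, 15)
)
```

Routing every change through one method means an indicator cannot be updated without also recording who made the update and on what date, which is the minimum the audit trail is meant to capture.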

Future work

This guide is not intended to be the final word on administrative data quality, but it should provide a consistent language and structure for assessments. As we gain more experience applying it in different contexts, we will probably discover gaps or ambiguities in the types of error and the aspects of datasets we need to consider.

A major area for future research that we’ve found through recent Census Transformation work is ‘Phase 3’ for the framework and assessment process. The idea is to build knowledge of the sources of errors in the output microdata into a model that would attempt to correct for, and quantify, the uncertainty these errors introduce into our statistical estimates. Bryant and Graham (2015) describe this sort of model for use in population estimation from administrative datasets.

We hope that a ‘virtuous cycle’ can be created where we use the errors identified to help correct and improve the model, and use the model to measure and test the effect of the errors on the output. This would take us closer to the goal of using the framework to compute a general ‘total survey error’ quantity for the statistical outputs we produce from administrative data.
