Overview of making a quality assessment

This section summarises how to use the error framework and where to begin a quality assessment.

You can apply the error framework to almost any dataset or statistical output. Some of the thinking involved can be complex, so it’s important to consider the scope of your quality assessment – start with these questions:

  • What are your aims? Some possibilities include:
    o evaluating the quality of a new administrative dataset as it arrives at Statistics NZ
    o developing better measures of quality for an existing output
    o understanding the impact of design choices on quality when using administrative data
    o getting a better understanding of the trade-offs between more administrative data use and final output quality.
  • What are the relevant data sources for your output (whether planned or existing), including both survey and administrative data? Which are the most important for your purposes?
  • Has the dataset been used before within Statistics NZ, and is there any earlier work that could save you time? Look at what is available on Colectica and check relevant internal documents and databases to see if meta-information templates or other studies have already been completed for the dataset.
  • What are the variables in the different data sources? Which are the most important for your intended data use?
  • What population do the relevant datasets cover? Is the basic unit people, businesses, or something else?
  • How are the dataset’s raw variables combined or transformed to produce your final data?
  • How are the dataset’s basic units converted into the statistical units in the final data?
  • What are the main quality problems you know of, or suspect might be relevant to your purpose, based on your understanding of the original data?

Stages of quality assessment

This section lays out the main steps for carrying out a quality assessment with the error framework, so the time you spend is as effective as possible.

It starts with the most important and generally useful aspects of the framework and works down to the details. Your aim should be to produce a quality assessment that gives enough information to make design and other decisions confidently.

Metadata information template

The metadata information template encourages thinking about the key aspects of quality in an organised way. It is also a convenient way to record a standard set of information, which makes it easier to compare different datasets. See ‘Available files’ for this template.

The first step of a quality assessment is to briefly answer the main questions in the template. The most important are:

  • General information: Items 1.1–1.6 including source agency, purpose of collection, summary of variables, and time span of the data.
  • Population: The target population, admin population, and reporting units. The items relating to coverage might not be possible to answer in a quick assessment, but note anything you do know.
  • Variables: A short description of key variables. As work progresses, record the target concepts for the variables under investigation as they become known.
  • Collection: The timing/delay information and method of collection are important and should be easy to find out and record.

Note: Colectica may have much of this information for datasets already used at Statistics NZ. All items are crucial to a sound understanding of a dataset’s quality and the issues that might arise from using it for a different purpose. For example, understanding the original purpose of the data collection can guide you to which variables might be of higher quality than others, and to the likely coverage of the data.

Record any useful information for other questions, but ignore any boxes in the template that are not relevant. If you uncover relevant information later in the assessment, add it – ideally the meta-information template for a given dataset should be improved and expanded as different people in Statistics NZ find out more about it.
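Purely as an illustration of keeping this standard set of information in one structured place – not the official template, and with field names that are assumptions rather than the template’s item labels – the key items could be recorded in a simple record like the following Python sketch:

```python
from dataclasses import dataclass, field


@dataclass
class MetaInformationSummary:
    """Illustrative summary record; field names are assumptions, not the official template items."""
    dataset_name: str
    source_agency: str             # general information (items 1.1-1.6)
    purpose_of_collection: str
    time_span: str
    target_population: str         # population: target population vs admin population
    admin_population: str
    reporting_units: str           # eg people, businesses
    key_variables: dict = field(default_factory=dict)  # variable -> short description / target concept
    collection_method: str = ""
    timing_and_delay: str = ""
    known_coverage_issues: str = "unknown"              # note anything known, even if incomplete


# Example usage - all values are invented for illustration only
example = MetaInformationSummary(
    dataset_name="Example admin dataset",
    source_agency="Example agency",
    purpose_of_collection="Administering payments",
    time_span="Monthly, 2010 to present",
    target_population="All registered recipients",
    admin_population="Records held at the extraction date",
    reporting_units="people",
    key_variables={"income": "gross monthly income as reported to the agency"},
    collection_method="Electronic form",
    timing_and_delay="Available about 6 weeks after the reference month",
)
```

A record like this can be expanded as more becomes known about the dataset, in the same way the template itself should be.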

Phase 1 of the error framework summarised

The error framework, explained in detail in the next section, has two phases. Phase 1 deals with datasets in their raw state – as they look when originally produced. The key questions in this stage of the assessment are “what information did the creators of this dataset want to capture?” and “how well does the final dataset capture this ideal information?”

The framework is split into two sides:

  • ‘measurement’, which deals with the variables in the data
  • ‘representation’, which deals with the respondents or other reporting units, generically labelled ‘objects’.

The phase 1 assessment should give you a detailed understanding of the issues that arise during the original data creation processes, and how they affect data quality for the original purpose.

You need to define or describe each boxed term in figure 1 (the phase 1 error framework diagram) – for example, target concept and harmonised measure – for the datasets being assessed. Use the general information in the meta-information template to do this. Think about each step from the point of view of the original data producers and what their goals were when they created the dataset.

Once you’ve defined the terms, categorise known data quality issues or strengths according to the error source (the ovals on figure 1). This shows exactly where any quality issues arise.
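One hedged way to keep track of this categorisation is to tag each known issue or strength with the side of the framework it belongs to and the error source it falls under. In the Python sketch below, the error-source labels and issues are examples only – replace them with the actual ovals from figure 1 and the issues found for your dataset:

```python
from collections import defaultdict

# Illustrative only: each entry tags a known quality issue with the framework side
# (measurement or representation) and an error source (an oval in figure 1).
phase1_issues = [
    {
        "dataset": "Example admin dataset",
        "side": "measurement",             # variables side of the framework
        "error_source": "measurement error",   # example label only
        "issue": "Income is self-reported and rounded by respondents",
        "strength_or_weakness": "weakness",
    },
    {
        "dataset": "Example admin dataset",
        "side": "representation",          # objects / reporting-units side
        "error_source": "coverage",            # example label only
        "issue": "People who opt out of electronic reporting are missing",
        "strength_or_weakness": "weakness",
    },
]

# Group issues by error source to see where quality problems concentrate
by_source = defaultdict(list)
for item in phase1_issues:
    by_source[(item["side"], item["error_source"])].append(item["issue"])

for (side, source), issues in by_source.items():
    print(f"{side} / {source}: {len(issues)} issue(s)")
```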

Phase 2 of the error framework summarised

Phase 2 of the error framework aims to determine how well a given combination of datasets meets a statistical need.

First, describe or define the boxed terms in figure 2 (the phase 2 error framework). This requires knowledge of the output design and of the processes that transform the source data into the final statistical output. Apply the phase 2 framework to proposed designs to help decide between them, or to existing designs to understand an output’s current strengths and weaknesses and where improvements might be possible.

Throughout your phase 2 assessment, the target concept and population (see figure 2) are the ideal statistical information you would like to have. You must identify a clear statistical need to carry out an effective assessment.

Using the results

Record your phase 1 and phase 2 descriptions in a simple table.

See Case study: the Quarterly Building Activity Survey (tables 1–6) for a guide to the appropriate level of detail.

Once you’ve completed these first steps, investigate in more detail any specific error sources that are causing problems in the final statistical output. However, a comprehensive evaluation of every source of error in a complex output could be time-consuming. The time spent on these tasks should reflect what is needed to meet each project or assessment’s goals – focus on areas where you can make useful mitigations or improvements.

The lists of quality measures and indicators for phase 1 and phase 2 (see ‘Available files’ on this webpage) may be useful at this stage, to help you understand ways to measure various aspects of data quality. These lists are not intended to be applied universally; they are meant to give ideas and prompt thinking about more specialised measures that might be useful for a particular output. Use these measures to form an objective picture of a dataset’s quality at a particular point in time. They can also serve as ongoing monitoring checks for the output, to ensure its quality remains consistent.
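For instance, here is a minimal Python sketch of two generic indicators – a coverage rate for the target population and an item missing-value rate. The definitions are illustrative only and are not the specific measures in the Statistics NZ lists:

```python
def coverage_rate(target_ids: set, admin_ids: set) -> float:
    """Share of the target population found in the administrative data (illustrative definition)."""
    if not target_ids:
        raise ValueError("target population is empty")
    return len(target_ids & admin_ids) / len(target_ids)


def item_missing_rate(records: list, variable: str) -> float:
    """Share of records with a missing value for a given variable (illustrative definition)."""
    if not records:
        raise ValueError("no records supplied")
    missing = sum(1 for r in records if r.get(variable) in (None, ""))
    return missing / len(records)


# Example with invented data
target = {1, 2, 3, 4, 5}
admin = {2, 3, 4, 6}
print(f"coverage rate: {coverage_rate(target, admin):.0%}")                      # 60%

records = [{"income": 1200}, {"income": None}, {"income": 900}]
print(f"missing rate for income: {item_missing_rate(records, 'income'):.0%}")    # 33%
```

Calculated at regular intervals, simple indicators like these can also serve as the ongoing monitoring checks mentioned above.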

Record your completed quality assessment or template centrally, so others can reuse the results and analysis.

See the Central repository of quality assessments section for more details on doing this.
