
Case study: the Quarterly Building Activity Survey

To explain how to apply the error framework, we use the Quarterly Building Activity Survey (QBAS) as an example. The current QBAS design is a sample survey that uses an administrative dataset (Building Consents) as the sampling frame, as the source of some variables, and as an aid to editing and imputation. Figure 3 shows the QBAS structure.

Figure 3: Structure of QBAS (image).
The building consents output measures the number and value of all consents each month, with breakdowns by area and building type. It serves primarily as an economic indicator of likely activity in the construction sector and the wider economy. It’s a full-coverage dataset, and we spend considerable effort on processing and coding the data supplied by all territorial authorities.

Building consents data includes information on the location, consented value, building type, floor area, and some other variables, for each construction job above $5000. This information is published monthly.

QBAS aims to measure the actual value of construction work done in New Zealand each quarter, and is an important component of national accounts series. Based on the monthly building consent datasets, we select a postal sample for the quarterly QBAS. This sample is stratified by residential/non-residential consents and value. We estimate the low-value (under $45,000 for residential and $80,000 for non-residential) jobs with a simple model that assumes the job starts and finishes in the quarter it is consented. We measure the highest-value jobs with a full-coverage survey, and estimate the middle jobs from a sample survey. The survey asks for a single variable – the value of work put in place on the job up to the reference quarter.
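The stratification rules above can be sketched in code. This is an illustrative sketch only: the low-value cut-offs are the ones quoted above, but the full-coverage threshold and the function and field names are hypothetical, not the actual QBAS parameters.

```python
# Illustrative sketch of QBAS stratum assignment. The low-value
# cut-offs ($45,000 residential, $80,000 non-residential) come from
# the text; the full-coverage threshold is a made-up placeholder.

LOW_VALUE_CUTOFF = {"residential": 45_000, "non_residential": 80_000}

def assign_stratum(consent_value: float, building_class: str,
                   full_coverage_cutoff: float = 1_000_000) -> str:
    """Place a consented job into a QBAS design stratum.

    Low-value jobs are modelled (assumed to start and finish in the
    consent quarter); the highest-value jobs are surveyed with full
    coverage; the middle band is measured by a sample survey.
    """
    if consent_value < LOW_VALUE_CUTOFF[building_class]:
        return "modelled"          # simple model, no questionnaire sent
    if consent_value >= full_coverage_cutoff:
        return "full-coverage"     # always surveyed
    return "sample"                # selected with a sampling fraction

print(assign_stratum(30_000, "residential"))      # modelled
print(assign_stratum(2_500_000, "residential"))   # full-coverage
```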

QBAS has now moved to a new design, which makes more use of modelling based on building consents data and significantly reduces the number of construction jobs surveyed. The main change is to model the former sample survey strata from consents data and historic survey data.

See Methodology and classification changes to Value of Building Work Put in Place statistics for details of the changes and what they mean for the published output.

Datasets for phase 1 of the error framework

We combine two unit record datasets to produce the final QBAS output: building consents and the survey data. Each needs a separate assessment. We focus on two variables: work put in place to date (WPIP) from QBAS, and building consent value from the building consents dataset.

First we look at the measurement (variables) side of the phase 1 framework. In tables 1–4 we define the target concept and the measures for each dataset, and briefly note the most important sources of each type of error. Each table covers one dataset.

Table 1
Measurement side of phase 1 framework for building consents

 Measurement (variables)  Building consent value variable  Error types in measurement  Potential errors arising in building consent value variable
 Target concept  Value recorded on each building consent approved by the territorial authority (TA).  Not applicable  Not applicable
 Target measure  Measure takes the value from the consent form, as recorded by TA.  Validity error  Alignment between target concept and target measure is very good – the output simply reports building consents data.
 Obtained measure  Values that end up in the datasets supplied by each TA.  Measurement error  Values could be wrongly entered on forms (eg $20,000 not $200,000). Rounding could affect responses (eg consent for $285,000 is entered as $300,000).

For the building type variable, an example is consenting a building that could be either retail or office space. Consent may say 'retail/office', but the building’s true use can’t be determined at any finer level.

Missing values for consent value (item non-response) are also measurement errors.
 Edited measure  We check consent values supplied by TAs. Suspicious or missing values are followed up with the TA. The edited measure is the final value after checking and confirmation.  Processing error  Where more than one building type is in a consent, we assign the value of the consent to each building type using a predefined formula – some errors for the target measure will arise, because the exact value of each construction job by type can’t be determined.
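The predefined apportionment formula for mixed-type consents is not published here, so the sketch below uses a simple proportional split by floor area purely as a stand-in; the function name and the choice of floor area as the weighting variable are assumptions, not the actual Stats NZ method.

```python
# Hypothetical apportionment of a mixed-use consent's value across
# building types. The actual predefined formula is not given in the
# text; this sketch splits value in proportion to floor area.

def apportion_value(consent_value: float,
                    floor_areas: dict[str, float]) -> dict[str, float]:
    """Split a consented value across building types by floor area."""
    total_area = sum(floor_areas.values())
    return {btype: consent_value * area / total_area
            for btype, area in floor_areas.items()}

# A $600,000 'retail/office' consent with 200 m2 retail, 400 m2 office:
split = apportion_value(600_000, {"retail": 200.0, "office": 400.0})
# retail is assigned one third, office two thirds of the consented value
```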

Table 2
Measurement side of phase 1 framework for Quarterly Building Activity Survey respondent data

 Measurement (variables)  QBAS work put in place to date (WPIP) variable  Error types in measurement  Potential errors arising in QBAS WPIP variable
 Target concept  Work put in place: the actual dollar value of work done on a construction job in the reference quarter.  Not applicable  Not applicable
 Target measure  Measure used is the survey question, "What is the total cost of work put in place on this job, from the start of the job until now?" (the question also explains what costs to include and exclude).  Validity error  Alignment between question and target is good. WPIP is a well-defined dollar value, so it is relatively easy to create a practical question to measure it. If we know WPIP for each quarter we can easily subtract previous work and obtain the quarterly work put in place. No additional transformations should be necessary.
 Obtained measure  Actual responses we receive on questionnaire forms.  Measurement error  Respondents can make mistakes in their estimates, or round off values. They may also not understand the instructions and include costs (eg legal fees) that shouldn’t be counted towards WPIP.

This category includes item non-response, but because QBAS asks only one substantive question there is little practical difference between item and unit non-response. Non-response is around 10–15% by value for each category/building type and overall.
 Edited measure  From WPIP and previous responses, we derive the work put in place for the reference quarter. QBAS responses have edits (eg checking for magnitude errors), and we impute missing values. The edited measure is the final value after this processing is done.  Processing error  Errors in imputed values mostly result from the regression imputation assumption that the relationship between WPIP and consent value is the same for all jobs in the imputation cell. Errors in imputation flow to the next quarter, since WPIP depends on the previously reported/imputed values. If we impute too low a value for WPIP one quarter and get a true response next quarter, WPIP based on subtracting the imputed value from the response will be too high.

Note: although the WPIP and consent value variables differ considerably, from the phase 1 perspective they are both valid measures of the intended concept of each dataset.
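The imputation carry-over effect described in table 2 can be illustrated with a small sketch. The figures and function name are hypothetical; the point is that WPIP is reported cumulatively, so an under-imputed cumulative value in one quarter mechanically overstates the derived quarterly figure in the next.

```python
# Sketch of the quarterly derivation and the imputation carry-over
# effect from table 2. WPIP responses are cumulative, so the quarter's
# figure is the difference between successive reports; an imputed
# cumulative value that is too low inflates the next quarter.

def quarterly_wpip(cumulative_now: float, cumulative_prev: float) -> float:
    """Work put in place in the reference quarter."""
    return cumulative_now - cumulative_prev

# Quarter 1: no response, so cumulative WPIP is imputed too low.
imputed_q1 = 100_000          # suppose the true cumulative was 150_000
# Quarter 2: a genuine response arrives.
response_q2 = 250_000

q2 = quarterly_wpip(response_q2, imputed_q1)   # derived figure: 150_000
# The true quarterly figure would have been 250_000 - 150_000 = 100_000,
# so the under-imputation in Q1 appears as an over-statement in Q2.
```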
We also need to compare the representation (objects) side of the phase 1 framework.

Table 3
Representation side of phase 1 framework for building consents

Representation (objects)  Building consents units  Error types in representation  Potential errors arising in building consents
 Target set  All building consents issued in NZ with a value greater than $4,999 in the specified month.  Not applicable  Not applicable
 Accessible set  All building consents actually recorded by territorial authorities (TAs) and sent through to us.  Frame error  We assume we get information on all building consents each TA processes. Any consents that can't get into the system would cause frame error (eg manual errors in date of consent so it is not in the monthly data we receive, or missing records due to TA data supply problems). Could include the $5000+ restriction if we consider the target set to be all building consents.
 Accessed set  Building consents is a census of the consents that arrive, so the accessed set is the same as the accessible set: all consents that end up in the TA data sent to us.  Selection error  There should be no selection errors (eg sampling errors). Depending on exactly how we define the target population and where very small consents are removed (eg by TA or by Statistics NZ when loading the data), some errors mentioned under frame errors could be selection errors.
 Observed set  Includes all units we have data for, so is the same as the accessible set.  Missing/redundancy error  Once we 'select' a consent (we select 100% of consents) we always get a response – the consent both forms the target population and contains the responses we want.

Table 4
Representation side of phase 1 framework for Quarterly Building Activity Survey respondent data

 Representation (objects)  QBAS units  Error types in representation  Potential errors arising in the QBAS survey
 Target set  Active construction projects in NZ during the reference quarter.  Not applicable  Not applicable
 Accessible set  For a survey this is the sampling frame: construction jobs with building consents approved during the months of the reference quarter.  Frame error  Construction work is likely to happen outside the consent frame (eg people do small home renovations without getting a consent or realising they need one). Errors in the consent systems could mean a job doesn’t appear in our consents for the relevant months (eg a manual error puts it in the wrong month, or another mistake occurs when TAs prepare data for us).

The frame may also be in error for staged or split jobs. Sometimes stages are missed, so the corresponding job is not in the correct stratum. For split jobs, the building types could be difficult to determine or the value apportioned may be uncertain – jobs might not have a corresponding consent for the correct value/type.
 Accessed set  For a survey this is the sample. Includes units selected into the sample from the building consents each month, including the modelled, sample, and full-coverage strata.  Selection error  For full-coverage strata, selection errors should be minimal (but see staged or split jobs mentioned above). The same applies for the lowest strata, which can be treated as full-coverage – WPIP data for them comes from administrative data rather than the survey.

For sample strata there are sampling errors, which we calculate routinely for QBAS releases. Typical values are around 3% in the total WPIP across all buildings, and around 4% for residential/non-residential categories.
 Observed set  Final set of responding units in the dataset, which includes survey responses and modelled units. Non-response causes this set to be smaller than the accessed set.  Missing/redundancy error  QBAS treatment of unit non-response is very similar to that for item non-response – there is only really one target variable. According to the latest tech description, non-response is around 10%.
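The routinely calculated sampling errors mentioned in table 4 come from standard stratified-design formulas. The sketch below shows the usual textbook standard error of a stratified expansion estimator of a total; the stratum sizes and standard deviations are made up, and this is not necessarily the exact estimator QBAS uses.

```python
import math

# Illustrative sampling-error calculation for a stratified estimate of
# a total, of the kind reported with QBAS releases. Figures are made up.

def stratified_total_se(strata):
    """Standard error of a stratified expansion estimate of a total.

    Each stratum is (N, n, s): population size, sample size, and the
    sample standard deviation of WPIP. Full-coverage strata (n == N)
    contribute no sampling variance.
    """
    var = 0.0
    for N, n, s in strata:
        if n < N:  # finite population correction removes census strata
            var += N * N * (1 - n / N) * s * s / n
    return math.sqrt(var)

strata = [(500, 500, 0.0),       # full-coverage stratum: no error
          (2000, 200, 40_000.0),  # sampled stratum
          (8000, 400, 9_000.0)]   # sampled stratum
se = stratified_total_se(strata)  # dollars; divide by the estimated
                                  # total to get a relative error
```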

This example demonstrates how some types of error tend to affect survey data more than administrative data, and vice versa. For example, validity and measurement errors are often an issue in surveys, particularly in social surveys where concepts such as ethnicity or well-being are very difficult to define and measure, and respondents may not understand the questions in the way the designer intended. Administrative datasets, because they are created for an operational purpose, tend to aim to collect strictly defined information that matches their rules. They are often less affected by validity and measurement error.

Phase 1 assessments of administrative data sources are valuable because they are easy to pick up and reuse when we evaluate the dataset for a new statistical purpose. The building consent–QBAS example also demonstrates how to use the framework to assess a stand-alone survey that doesn’t use any administrative data. This is useful if our aim is to compare an existing survey design with a new administrative data-based design, where we want to know all the quality issues and trade-offs involved.

Phase 2 of the error framework

Phase 2 of the error framework is where we evaluate the combination of separate data sources against a specific statistical purpose. The first step is to define the target population and target concepts clearly. We also need to understand the processes by which the source datasets are transformed into the final dataset.

Because QBAS uses the building consents dataset as a frame, reconciling the units in the two datasets and creating statistical units is fairly simple, and the most important sources of error are in the measurement side. Table 5 explains how the framework matches up with the QBAS design and the structure of each dataset.

Table 5
Measurement side of phase 2 framework for final QBAS output

 Measurement (variables)  Output QBAS dataset combining building consents and survey responses  Phase 2 error types  Potential errors arising in final QBAS output
 Target concept  Work put in place (WPIP) in each job during the reference quarter. Other important variables are secondary: building type, floor area, location/region, and institutional sector. These come from consents data (the frame).  Not applicable  Not applicable
 Harmonised measures  The final measure on QBAS units is WPIP, as defined by the QBAS questionnaire. For other variables, we have Statistics NZ’s building type, institutional sector, and location classifications.  Relevance errors  These errors refer to concepts, definitions, and metadata, not actual data.

WPIP variable: We assume conceptual alignment between QBAS WPIP question and target concept is excellent because it is a direct survey collection designed with the target concept in mind.

For building consents and WPIP modelling we consider the conceptual alignment between consent value and WPIP. The major discrepancies at the conceptual level are:

  • Consent value is the estimated total value of the project, while WPIP is the actual work done in a given period (quarter). Consents indicate confidence/intentions at a point in time, while QBAS measures real economic activity over a certain period.
  • Consent value includes GST; WPIP excludes GST.

Other variables: QBAS questionnaire includes consent details for location, value, nature of job, and asks respondents to correct details if wrong. This results in differences between QBAS and consents for these variables (due to timing, updates, and fixing mistakes rather than conceptual misalignment). The underlying concepts for all variables other than WPIP are the same for building consents and QBAS. These 'harmonised measures' may differ from the target concepts for these variables in similar ways to WPIP, because the consent is only a plan/estimate.
 Reclassified measures  Not a lot of conversion is necessary for QBAS data, since we collect it with harmonised measures. The main conversion is for building type, where the job description on the consent has to be converted into our classification.

In the new design, WPIP is modelled from the consent value and the age of the consent, based on historic survey responses. The reclassified measure includes details of the modelling methodology, including the rules that determine which jobs will be modelled and how.
 Mapping error  For WPIP: for QBAS responses, the WPIP we collect already matches the target harmonised measure (apart from adjusting total WPIP to date into WPIP for the reference quarter using earlier responses, which could also be considered part of the adjusted measures below).

When we model WPIP from building consent values, this raises further mapping error possibilities. Although modelling units from administrative data is similar to imputation, we distinguish the two because modelling is designed to deal with the conceptual mismatch between the administrative consent value variable and the target statistical variable. In contrast, imputation corrects for non-response in otherwise well-aligned variables.

Building type is the most likely other variable to have mapping errors (eg free-text consent descriptions in administrative data are unclear or ambiguous – the analyst has to judge the best fit). Ideally we’d like to measure WPIP by building type; in practice the building type may be classified when the building consent comes in and left unchanged for the rest of the project, even if the project changes slightly.
 Adjusted measures  For QBAS data, our main adjustment is calculating quarterly WPIP by subtracting the previous WPIP from the latest survey response. We also edit and impute at this stage. Imputation uses a combination of auxiliary consents data, previous responses from the imputed unit, and responses from similar units.

Editing, imputation, and adjustment for other variables at this stage is relatively minor.
 Comparability error  WPIP imputation will never be perfectly accurate, which contributes to errors. The imputation method we use for QBAS respondents assumes all units in the imputation cell have a certain relationship between WPIP to date and the consent value. This means jobs that run slower or faster than average, or have complications during construction that increase costs, will be in error to some degree.
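One of the harmonisation steps in table 5 – consent value includes GST while WPIP excludes it – is simple to express in code. This sketch assumes the current 15% NZ GST rate; in practice the rate in force when the consent was issued would apply, and the function name is illustrative.

```python
# Sketch of the GST harmonisation step from table 5: consent values
# include GST while WPIP excludes it, so before consent values are used
# to model WPIP the GST component must be removed. The 15% rate is the
# current NZ GST rate; the rate at consent time is what matters.

GST_RATE = 0.15

def consent_value_excl_gst(consent_value_incl_gst: float) -> float:
    """Strip GST from a consented value to put it on the WPIP basis."""
    return consent_value_incl_gst / (1 + GST_RATE)

print(round(consent_value_excl_gst(230_000)))  # 200000
```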

Table 6
Representation side of phase 2 framework for final QBAS output

 Representation (units)  Output QBAS dataset combining building consents and survey responses  Phase 2 error types  Potential errors arising in final QBAS output
 Target population  Building projects in NZ that did work during the reference quarter.  Not applicable  Not applicable
 Linked sets  The unit record dataset that matches up data from consents with (if the consent was in the survey) QBAS responses.  Coverage error  Coverage of the two data sources for the target population. We expect the building consents frame to cover nearly all significant building projects, except any lost due to clerical or other errors (as mentioned in phase 1). We don’t expect coverage errors for QBAS to be an important contributor to overall errors in the output. Linkage of QBAS responses to their consents is a fairly trivial process.

Note: the design doesn’t cover consents < $5000.
 Aligned sets  Alignment sorts out the relationships between different sets of units in different datasets. QBAS has little distinction between linked and aligned sets, because the survey frame and statistical units all come directly from building consents. In most cases alignment is already achieved by the link between the QBAS form and the consent number. Split and staged consents are the main problem, where we want to find all the consents and updates applying to a single job.  Identification error  These could result from staged or split consents that aren’t identified. All consents relating to a single project should be linked to produce the aligned set of statistical units. Similarly, ideally we want to treat a split consent as two separate projects and count work done on the two different building types separately.
 Statistical units  Creating a statistical unit corresponding to a building job/consent is simple for QBAS, because of the relationship between consents, sampling frame, and target population.  Unit error  Because the fundamental statistical units are based on building consents, unit errors are minimal. Extra statistical units might be created if we miss staged or updated consents and treat them as new consents. However, they are more accurately identification errors, where we haven’t correctly connected later consents to the original consent/statistical unit.
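The alignment step in table 6 amounts to grouping every consent (including stages and splits) under one statistical unit per project. The sketch below assumes a project identifier has already been derived, which is a strong assumption – in practice, detecting that a later consent belongs to an earlier project is the hard part, and missing that link is exactly the identification error described above.

```python
from collections import defaultdict

# Sketch of the alignment step from table 6: grouping all consents
# that relate to one project into a single statistical unit. The
# 'project_id' field is hypothetical; deriving it (ie recognising
# staged or split consents) is where identification errors arise.

def align_units(consents):
    """Group consent records (dicts) into statistical units by project."""
    units = defaultdict(list)
    for consent in consents:
        units[consent["project_id"]].append(consent["consent_no"])
    return dict(units)

consents = [
    {"consent_no": "A1", "project_id": "P1"},  # original consent
    {"consent_no": "A2", "project_id": "P1"},  # later stage, same job
    {"consent_no": "B1", "project_id": "P2"},
]
units = align_units(consents)   # {"P1": ["A1", "A2"], "P2": ["B1"]}
```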