Non-Sampling Error and the Processing System (PS)

Processing system 

Data entry

The majority of our surveys still rely, partly or entirely, on manual data entry. Despite the best intentions, errors can be introduced when records are entered manually. If these errors go undetected, the potential impact is an overestimate or underestimate of an industry total or the overall total. Such errors are usually random rather than systematic, though, and they may partly cancel each other out, so the magnitude of the impact is often quite small.
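
As a rough, purely illustrative sketch of why random (rather than systematic) entry errors often have a small net effect on a total, the following simulation keys a set of invented unit values with occasional slips in either direction; the error rate and error sizes are assumptions made up for the example, not measured punch-error rates.

  # Illustrative simulation: symmetric random keying errors tend to partly
  # cancel in an aggregate, so the relative error in the total stays small.
  import random

  random.seed(1)
  true_values = [random.uniform(50, 150) for _ in range(500)]  # made-up unit values

  def keyed_total(values, error_rate=0.02):
      total = 0.0
      for v in values:
          if random.random() < error_rate:
              # a slip of 5-20 units, equally likely to be high or low
              v += random.choice([-1, 1]) * random.uniform(5, 20)
          total += v
      return total

  true_total = sum(true_values)
  relative_error = keyed_total(true_values) / true_total - 1
  print(round(relative_error, 4))  # typically a small fraction of one percent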

Currently, errors are detected (and rectified) through peer review and editing processes. For example, macro editing is used to examine aggregated records and can identify errors that occur during the processing period. The processing section prepares a handover report that details variations from the expected movements since the last period at the industry and sub-industry level. More resources (time) are spent on industries where results differ markedly from what was expected, as these industries are more likely to contain errors. We are, however, more likely to pick up errors that result in large movements from the previous period, or where a small value has been punched as a large value (relative to other units in the industry). It is difficult to pick up errors where a large value has been punched as a small value (relative to other units), or where a unit did not exist in the previous period.
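
As an illustration only, the sketch below shows one way a macro edit of this kind could be expressed: aggregated industry values are compared with the previous period and any movement outside a tolerance band is flagged for closer review. The field names, industry codes, and the 20 percent tolerance are hypothetical, not the thresholds actually used in our processing.

  # Illustrative macro edit: flag industries whose aggregate movement from the
  # previous period falls outside a (hypothetical) 20 percent tolerance band.
  def flag_unusual_movements(current, previous, tolerance=0.20):
      """current, previous: dicts mapping industry code -> aggregated value."""
      flagged = []
      for industry, value in current.items():
          prev = previous.get(industry)
          if prev is None or prev == 0:
              flagged.append((industry, None))  # no comparable history: review manually
              continue
          movement = (value - prev) / prev
          if abs(movement) > tolerance:
              flagged.append((industry, movement))
      return flagged

  # Made-up aggregates: C12 shows a large movement, C13 has no previous period.
  current = {"C11": 1050.0, "C12": 1900.0, "C13": 400.0}
  previous = {"C11": 1000.0, "C12": 1250.0}
  print(flag_unusual_movements(current, previous))  # [('C12', 0.52), ('C13', None)]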

Apportionment of data

A response can sometimes be received for an entire enterprise when it is required for each geographic unit separately, or only for certain geographic units. This usually occurs because the business is unable to separate its records down to a finer level.

Apportionment is the process of splitting up a reported value across a number of known parts of a business. For example, it is quite common in some surveys, such as the Quarterly Employment Survey, to apportion a value reported by the enterprise across a number of known geographic units. The impact is difficult to assess, but we can get some idea because the extent (the number of instances) of apportionment can be determined. However, some bias may be introduced because the adjustment is made under the assumption that the response values can be apportioned across the business units.

Apportionment is generally made in proportion to the relative size of each geographic unit within the enterprise, for example, by the number of employees. So if the geographic unit of interest has 50 percent of the employees in the enterprise, then the response for that geographic unit may be taken as 50 percent of the value reported by, and for, the entire enterprise. Often a combination of information is used to apportion a reported value. We work through each survey on a regular basis to determine the extent of apportionment, and then review the questions asked and the information sought in the questionnaire. We also review whether the level chosen for the reporting unit, for example the enterprise, is the most appropriate.
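
A minimal sketch of proportional apportionment, under the assumption that employee counts alone are the apportionment measure, is given below; the unit identifiers, employee counts, and reported value are invented for illustration, and in practice a combination of information is often used.

  # Illustrative proportional apportionment: split an enterprise-level response
  # across its geographic units according to each unit's share of employees.
  def apportion(enterprise_value, unit_employees):
      """unit_employees: dict mapping geographic unit id -> employee count."""
      total = sum(unit_employees.values())
      if total == 0:
          raise ValueError("cannot apportion: no employees recorded for any unit")
      return {unit: enterprise_value * count / total
              for unit, count in unit_employees.items()}

  # Hypothetical example: a single reported value split across three units.
  print(apportion(200_000.0, {"GEO-1": 50, "GEO-2": 30, "GEO-3": 20}))
  # {'GEO-1': 100000.0, 'GEO-2': 60000.0, 'GEO-3': 40000.0}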

Business classification

Any classification errors that remain in our survey datasets were probably introduced when units were classified on the Business Frame, where a business may have been given the wrong industrial classification. This source of error could potentially also happen at the survey processing stage, when new or additional information is obtained. These errors can lead to incorrect industry or sector level estimates due to misclassification of units, and can result in either overestimates or underestimates for an industry or total. At this phase, there are not many changes to classifications and therefore this source of error is not a significant problem.

Currently, checking is carried out through edits that search for and identify coding errors. For example, a business may have quite an unusual value, but further investigation reveals that it is unusual only because it is in the wrong industry. As always, providing adequate training for staff and developing better systems will help minimise this source of error.
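
One simple way such an edit could be expressed is sketched below: each unit's reported value is compared with the median of its assigned industry, and units far outside that range are flagged so an analyst can check whether it is the industry code, rather than the value, that is wrong. The cut-offs and the data are hypothetical.

  # Illustrative coding-error edit: flag units whose value lies well outside the
  # typical range for their assigned industry (hypothetical cut-offs of 0.25x-4x).
  from statistics import median

  def flag_possible_misclassifications(units, low=0.25, high=4.0):
      """units: list of (unit_id, industry_code, value) tuples."""
      by_industry = {}
      for _, industry, value in units:
          by_industry.setdefault(industry, []).append(value)
      medians = {ind: median(vals) for ind, vals in by_industry.items()}

      flagged = []
      for unit_id, industry, value in units:
          m = medians[industry]
          if m > 0 and not (low * m <= value <= high * m):
              flagged.append((unit_id, industry, value, m))
      return flagged

  # Made-up example: U4 looks extreme for industry "A", so its classification,
  # not just its value, would be investigated.
  units = [("U1", "A", 90), ("U2", "A", 110), ("U3", "A", 100), ("U4", "A", 5000)]
  print(flag_possible_misclassifications(units))  # [('U4', 'A', 5000, 105.0)]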

Electronic data capture

Although we are aware of, and have investigated, many of the new technologies available, applying alternative data capture methods to our situation is a fairly new area for Statistics New Zealand. Each data capture method is subject to its own sources of error; for example, scanning and electronic imaging introduce data errors, just as manual data entry does currently. Another consideration is that a respondent may not respond to a survey if they feel the capture method is not confidential; for example, there may be perceived risks in using fax or email to respond to the survey.

If these methods are introduced, some random errors in the data will still remain. However, any errors introduced will be easier to measure and therefore it will be easier to set quality standards for monitoring performance over time. Further, as we continue our strong emphasis on security and confidentiality, the new technology should not deter many respondents. The overall impact of introducing these new methods will vary from survey to survey, but after thorough investigations and sound implementation, the impact should be a positive one. However, note that although data can be captured electronically independently of the questionnaire, for example, by email, the introduction of new technologies for data capture is inherently linked with questionnaire design, so an integrated approach is required (see Incorporating new technology).

Currently most data capture is done manually, except for some data received from the IRD, which is automated. However, recent investigations and trial runs of scanning and imaging were very positive, confirming that a significant improvement in efficiency and data quality is possible in some cases. Any new potential errors should be controlled through the re-engineering of our processes, and ultimately an imaging centre could be far more sophisticated in the capture and checking of data than is currently possible manually. At the moment, respondents in the Quarterly Employment Survey have the option to provide data electronically as spreadsheets. These are loaded in bulk, achieving both processing and cost reductions.

In the future we plan further reviews of data capture methods, such as scanning and imaging, for a number of surveys. In these reviews, recognition error rates will be compared with punch error rates, and we will also investigate the timing and cost reductions we could make. We are currently setting up an internal imaging centre, where forms will be electronically scanned and recognised. Where investigations show that a significant improvement (including a reduction in non-sampling error) is achievable, the survey processing system will gradually be integrated with this system.
