*Measuring price change for consumer electronics using scanner data* describes the methodology we introduced for creating price indexes from consumer electronics scanner data.

From the September 2014 quarter, we incorporated retail transaction data, or ‘scanner data’, into the consumers price index (CPI), to measure price change for consumer electronics categories.

Read the related information release: Consumers Price Index: September 2014 quarter.

## Using scanner data in the CPI

### Consumer electronics categories that use scanner data

We use scanner data for 12 consumer electronics categories in the CPI:

- heat pumps
- desktop computers
- laptop computers
- tablet computers
- multi-function devices
- cellphone handsets
- digital cameras
- digital camera memory cards
- television sets
- set-top boxes for television sets
- DVD, Blu-ray players and player/recorders
- home theatre and stereo systems.

### Contribution to the CPI

In terms of expenditure weighting:

- **heat pumps** contribute a fifth of the ‘major household appliances’ class, which contributes 0.71 percent to the all groups CPI expenditure weight for the June 2014 quarter
- **cellphone handsets** contribute over 90 percent of the ‘telecommunication equipment’ class, which contributes 0.29 percent to the all groups CPI
- **the remaining 10 categories** contribute four-fifths of the ‘audio-visual and computing equipment’ subgroup, which contributes 1.16 percent to the all groups CPI.
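As a rough worked example, the three contributions above can be combined to approximate how much of the all groups CPI weight is measured with scanner data. This is only a sketch: the fractions are the rounded figures quoted in the text, and "over 90 percent" is treated as exactly 0.90, so the result is an approximate lower bound rather than a published weight.

```python
# (weight of the class or subgroup in percent of the all groups CPI,
#  fraction of that class or subgroup covered by scanner-data categories)
weights = {
    "heat pumps": (0.71, 1 / 5),
    "cellphone handsets": (0.29, 0.90),  # "over 90 percent", so a lower bound
    "remaining 10 categories": (1.16, 4 / 5),
}

# Approximate contribution of each group to the all groups CPI, in percent.
contributions = {name: w * share for name, (w, share) in weights.items()}
total = sum(contributions.values())

for name, value in contributions.items():
    print(f"{name}: about {value:.2f} percent of the CPI")
print(f"total: roughly {total:.2f} percent of the CPI")
```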

## Scanner data from market research company GfK

The market research company GfK supplies us with the scanner data. GfK adds information about product features, or characteristics, to the data collected from retailers.

Total monthly sales values and quantities sold are available, across retailers, for each product, along with extensive information about the characteristics of each product. The scanner data represents hundreds of thousands of transactions each month.

The number of characteristics in the data ranges from 10 for digital camera memory cards to 77 for digital cameras, as shown in Table 1 for August 2014, the latest month for which data is available.

The coverage of the data is between 80 and 95 percent of transactions for most of the categories, with lower coverage for some categories, such as heat pumps.

**Table 1
Number of characteristics in each category (August 2014)**

| Category | Characteristics |
| --- | --- |
| Heat pumps | 27 |
| Desktop computers | 56 |
| Laptop computers | 71 |
| Tablet computers | 73 |
| Multi-function devices | 53 |
| Cellphone handsets | 59 |
| Digital cameras | 77 |
| Digital camera memory cards | 10 |
| Television sets | 62 |
| Set-top boxes for television sets | 49 |
| DVD, Blu-ray players, and player/recorders | 50 |
| Home theatre and stereo systems | 62 |

The number of distinct products represented in the data for each category, in August 2014, is shown in Table 2.

**Table 2
Number of distinct products in each category (August 2014)**

| Category | Products |
| --- | --- |
| Heat pumps | 72 |
| Desktop computers | 107 |
| Laptop computers | 445 |
| Tablet computers | 148 |
| Multi-function devices | 102 |
| Cellphone handsets | 392 |
| Digital cameras | 228 |
| Digital camera memory cards | 254 |
| Television sets | 325 |
| Set-top boxes for television sets | 24 |
| DVD, Blu-ray players, and player/recorders | 129 |
| Home theatre and stereo systems | 224 |

## Benefits of using scanner data for consumer electronics

We have moved to using scanner data to measure price change for consumer electronics because it:

- allows more accurate price measurement
- allows us to re-use existing data
- accurately reflects seasonality in quantities
- reflects product substitution.

### Scanner data allows more accurate price measurement

Until now, we have relied on sampling consumer electronics prices across several dimensions – categories, products, outlets, and time. For each consumer electronics category, a product was priced at each of about 60 appliance retailers and department stores. When a product was no longer available, we replaced it with a similar product, based on discussion with retailers about market share and features.

We based quantities, or expenditure shares, on information we acquired during the Household Economic Survey reference period, and updated every three years.

In contrast, scanner data has the potential to give a more complete picture of both prices and quantities sold at any point in time. With information on the characteristics of each product, we are also able to use statistical methods to explicitly quality-adjust the price indexes.

Information on region of sale is not available in the data, so we will use national movements in each of the regions.

### Scanner data allows us to re-use existing data

Market research companies already collate consumer electronics scanner data for businesses, so it is good practice to re-use the data to generate official statistics. This reduces fieldwork, and the respondent load associated with collection, which involves observing products and prices in stores, and discussing product changes with store staff.

### Scanner data accurately reflects seasonality in quantities

Scanner data for consumer electronics has monthly information on both prices and quantities. Quantities can be seasonal, as shown in Figure 1 for digital cameras. Correspondingly, total expenditure on digital cameras is also highly seasonal. Note that average price, unadjusted for quality change, also shows seasonality, corresponding to cheaper cameras being bought around the Christmas period.

**Figure 1**

The current fixed-basket approach to price measurement, when applied to seasonal prices and quantities, has the potential to over- or under-state actual price movements when we combine seasonal prices with quantities that were fixed at an average annual level.

Figure 2 shows that when we appropriately incorporate the seasonal prices and quantities of digital cameras using the imputation Törnqvist rolling year GEKS (ITRYGEKS) index (discussed in the following section), the resulting quality-adjusted price movement is no longer seasonal.

**Figure 2**

### Scanner data reflects product substitution

Because scanner data provides prices and quantities for the most detailed level of product specification (ie the barcode level), we can use it to incorporate new products in the index at the time they are introduced, and to reflect their relative importance based on actual quantities sold.

We can also use scanner data to empirically test substitution effects across different product categories, to infer the appropriate level to fix expenditure shares between CPI basket and weight reviews.

## New methods to create price indexes

To produce price indexes from scanner data, we are using the imputation Törnqvist rolling year GEKS (ITRYGEKS) index. The ITRYGEKS is an extension of the rolling year GEKS (RYGEKS) index. Both indexes are covered in more detail below.

### The need for new methods

Consumer electronics is a rapidly changing product class, so it is particularly important that we use methods that will appropriately adjust for the change in quality of the products purchased.

Also, because consumer electronics products can have short life-cycles, it is important to introduce new products into the index in such a way that the implicit price movement associated with their introduction is appropriately reflected in the index. That is, if a new product has a low introductory price, relative to its set of features, then the price index should reflect this price decrease.

In addition to this requirement for appropriate quality adjustment, the CPI is non-revisable. This places another constraint on methods for creating price indexes from scanner data using statistical models, where the models are generally based on a past window of data – usually a year – ending with the most recent quarter.

The two key reasons why traditional index number methods do not work well in the case of scanner data are:

- the high level of ‘churn’ – or products appearing and disappearing from the market
- volatile prices and quantities due to discounting, which leads to a bias called ‘chain drift’ when chained superlative indexes such as a chained Törnqvist are used to continually update the basket in the presence of high churn.
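Chain drift can be illustrated with a small numerical sketch. The data below are hypothetical: product A is discounted in period 1 and bought in bulk, sells poorly in period 2 once the price returns to normal, and is back to its starting price and quantity in period 3. The `tornqvist` helper is an illustrative implementation, not production code.

```python
import math

def tornqvist(p_a, q_a, p_b, q_b):
    """Tornqvist price index from period a to period b over matched products."""
    matched = p_a.keys() & p_b.keys()
    exp_a = sum(p_a[i] * q_a[i] for i in matched)
    exp_b = sum(p_b[i] * q_b[i] for i in matched)
    # Weight each log price relative by the average expenditure share.
    log_index = sum(
        0.5 * (p_a[i] * q_a[i] / exp_a + p_b[i] * q_b[i] / exp_b)
        * math.log(p_b[i] / p_a[i])
        for i in matched
    )
    return math.exp(log_index)

# Hypothetical prices and quantities for periods 0 to 3.
prices = [
    {"A": 1.0, "B": 1.0},
    {"A": 0.5, "B": 1.0},  # A on sale
    {"A": 1.0, "B": 1.0},  # sale ends
    {"A": 1.0, "B": 1.0},
]
quantities = [
    {"A": 10, "B": 10},
    {"A": 50, "B": 10},    # shoppers stock up during the sale
    {"A": 2, "B": 10},     # and buy little straight afterwards
    {"A": 10, "B": 10},
]

# Chain the period-to-period Tornqvist movements.
chained = 1.0
for t in range(1, len(prices)):
    chained *= tornqvist(prices[t - 1], quantities[t - 1], prices[t], quantities[t])

# Compare with the direct index between the first and last periods.
direct = tornqvist(prices[0], quantities[0], prices[-1], quantities[-1])
print(f"direct: {direct:.3f}, chained: {chained:.3f}")
```

All prices and quantities end where they started, so the direct index shows no change, yet the chained index finishes roughly 11 percent lower. That gap is the drift.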

Over the past five years, we have collaborated on research to determine an index method that is appropriate for producing non-revisable, quality-adjusted price indexes from scanner data.

### Rolling year GEKS (RYGEKS) index

Ivancic, Diewert, and Fox (2011) proposed a method for producing price indexes from scanner data that uses all the prices and quantities in the data, and is free of chain drift. Called the rolling year GEKS (RYGEKS) index, it is based on the Gini, Eltetö and Köves, and Szulc (GEKS) index used for multilateral spatial price indexes such as purchasing power parities, which compare prices in different countries at a point in time.

Within a window of time (usually just over one year – ie five quarters for a quarterly index, or 13 months for a monthly index), the RYGEKS index between periods *t1* and *t2* is the geometric mean of all the superlative bilateral indexes (such as the Törnqvist index or, as used in Ivancic et al (2011), the Fisher index) between:

- *t1* and all the other periods in the window, and
- *t2* and all the other periods in the window.

The monthly RYGEKS, based on a 13-month rolling estimation window, is as follows:

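For the first window (ie *t* = 0 to 12), the RYGEKS index is equal to the GEKS index:

$$P_{RYGEKS}^{0,t} = P_{GEKS}^{0,t} = \prod_{l=0}^{12} \left[ P^{0,l} \times P^{l,t} \right]^{1/13}$$ (equation 1)

Where $P^{i,j}$ is any superlative index (eg a Törnqvist index) between periods *i* and *j*.

From *t* = 13 onwards, RYGEKS links on the most recent movement from the GEKS calculated on the next window (ie from *t* = 1 to 13, then from *t* = 2 to 14, and so on) as follows:

$$P_{RYGEKS}^{0,13} = P_{RYGEKS}^{0,12} \prod_{l=1}^{13} \left[ P^{12,l} \times P^{l,13} \right]^{1/13}$$

$$P_{RYGEKS}^{0,14} = P_{RYGEKS}^{0,13} \prod_{l=2}^{14} \left[ P^{13,l} \times P^{l,14} \right]^{1/13}$$ (equation 2)

and so on.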

However, a limitation of the RYGEKS method is that it does not reflect the implicit price movements associated with new products entering the market or old products leaving it.

For example, if the initial price of the latest model of a cellphone is high relative to its set of features, then this implicit price increase is not reflected in the RYGEKS index.
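The RYGEKS calculation described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production implementation: the bilateral index is a Törnqvist over matched products, each period is a hypothetical dictionary mapping a product to a `(price, quantity)` pair, and the function names `tornqvist`, `geks`, and `rygeks` are invented for the example. A three-period window is used only to keep the example data small; the text uses 13 months or five quarters.

```python
import math

def tornqvist(a, b):
    """Tornqvist index from period a to b; a and b map product -> (price, qty)."""
    matched = a.keys() & b.keys()
    exp_a = sum(a[i][0] * a[i][1] for i in matched)
    exp_b = sum(b[i][0] * b[i][1] for i in matched)
    log_index = sum(
        0.5 * (a[i][0] * a[i][1] / exp_a + b[i][0] * b[i][1] / exp_b)
        * math.log(b[i][0] / a[i][0])
        for i in matched
    )
    return math.exp(log_index)

def geks(window, start, end):
    """Geometric mean of the bilateral indexes linked through every period in the window."""
    log_sum = sum(
        math.log(tornqvist(window[start], link) * tornqvist(link, window[end]))
        for link in window
    )
    return math.exp(log_sum / len(window))

def rygeks(periods, window=13):
    """Index series with period 0 = 1.0, extended by rolling the window forward."""
    first = periods[:window]
    series = [geks(first, 0, t) for t in range(len(first))]
    for t in range(window, len(periods)):
        rolled = periods[t - window + 1 : t + 1]
        # Link on only the latest movement, so published values are never revised.
        series.append(series[-1] * geks(rolled, window - 2, window - 1))
    return series

# Hypothetical data: every price rises 10 percent per period, quantities vary.
base = {"A": 100.0, "B": 50.0}
quantities = [{"A": 5, "B": 9}, {"A": 7, "B": 4}, {"A": 3, "B": 8},
              {"A": 6, "B": 6}, {"A": 4, "B": 10}]
periods = [{i: (base[i] * 1.1 ** t, quantities[t][i]) for i in base} for t in range(5)]

series = rygeks(periods, window=3)
print([round(value, 4) for value in series])
```

Because every price in the example rises at the same rate, the index should simply compound at 10 percent per period whatever the quantities do, which gives a quick sanity check on the linking step.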

### Imputation Törnqvist rolling year GEKS (ITRYGEKS) index

Jan de Haan, of Statistics Netherlands, proposed an extension of the RYGEKS index, called the imputation Törnqvist rolling year GEKS (ITRYGEKS). A paper on the method by de Haan and Statistics NZ senior researcher Frances Krsinich was recently published in the *Journal of Business and Economic Statistics* (de Haan and Krsinich, 2014).

The ITRYGEKS method is an extension of the GEKS method. Like the GEKS, the ITRYGEKS can utilise all the information in the data, while remaining free of chain drift. In addition, it reflects the implicit price movements of new and disappearing products by imputing price movements based on statistical modelling of the relationship between price and product features.

Unlike the RYGEKS method described above, which is based on superlative indexes such as the Törnqvist or Fisher (ie the bilateral indexes in equations (1) and (2)), the ITRYGEKS index is based on ‘bilateral time-dummy hedonic’ indexes. The ‘Törnqvist’ in its name refers to the fact that it is algebraically equivalent to a Törnqvist index based on real and predicted prices, as shown in equation (4).

A bilateral time-dummy hedonic index between any periods *0* and *t* is derived from a statistical regression model based on the data from periods *0* and *t*.^{1}

The estimating equation for the bilateral time-dummy hedonic regression model is:

$$\ln p_i^t = \alpha + \delta^t D_i^t + \sum_{k=1}^{K} \beta_k z_{ik} + \varepsilon_i^t$$ (equation 3)

Where:

$\ln p_i^t$ is the log of the average monthly price for product *i* in period *t*.

$\alpha$ is the intercept term.

$D_i^t$ has the value 1 if the observation relates to period *t* (*t* ≠ 0) and the value 0 if the observation relates to period 0.

$z_{ik}$ is the quantity of the *k*th characteristic, or feature, for product *i*.^{2} By definition, this will be the same in both periods.^{3}

$\varepsilon_i^t$ is an error term with an expected value of 0.

$\delta^t$ and $\beta_k$ are the coefficients to be estimated.

Since the model described by equation (3) controls for changes in the product characteristics, $\exp(\hat{\delta}^t)$ is a measure of quality-adjusted price change between periods *0* and *t* (de Haan and Krsinich, 2014).
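To make the time-dummy hedonic regression concrete, here is a minimal sketch that fits the model by (optionally weighted) least squares using only the standard library. Everything in it is illustrative: the function names, the single numeric characteristic, and the prices are assumptions made for the example, and a production system would use a statistical package and expenditure-share weights.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def time_dummy_index(observations, weights=None):
    """observations: (price, period, characteristics), with period 0 (base) or 1 (period t)."""
    rows = [[1.0, float(period)] + list(z) for _, period, z in observations]
    y = [math.log(price) for price, _, _ in observations]
    w = weights or [1.0] * len(rows)
    k = len(rows[0])
    # Normal equations for weighted least squares: (X'WX) coef = X'Wy.
    xtx = [[sum(w[n] * rows[n][i] * rows[n][j] for n in range(len(rows)))
            for j in range(k)] for i in range(k)]
    xty = [sum(w[n] * rows[n][i] * y[n] for n in range(len(rows))) for i in range(k)]
    coef = solve(xtx, xty)
    return math.exp(coef[1])  # exponential of the time-dummy coefficient

# Hypothetical products: log price = 2 + 0.5 * quality, less 0.1 in the later period.
# The lowest-quality product disappears and a higher-quality one is introduced.
observations = [(math.exp(2 + 0.5 * z), 0, [z]) for z in (0, 1, 2)]
observations += [(math.exp(2 + 0.5 * z - 0.1), 1, [z]) for z in (1, 2, 3)]

index = time_dummy_index(observations)
mean_0 = sum(p for p, period, _ in observations if period == 0) / 3
mean_1 = sum(p for p, period, _ in observations if period == 1) / 3
print(f"quality-adjusted index: {index:.3f}, ratio of average prices: {mean_1 / mean_0:.3f}")
```

In this constructed example the average price rises because the product mix shifts towards higher quality, while the quality-adjusted index falls by about 10 percent, which is the distinction the regression is designed to capture.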

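And, when the average^{4} expenditure shares are used as weights for the matched products, and half of the expenditure shares for the unmatched products (in the periods they are available), the bilateral index from period *0* to *t* on which the ITRYGEKS is based can be expressed as follows:

$$P^{0,t} = \prod_{i \in U_M^{0,t}} \left( \frac{p_i^t}{p_i^0} \right)^{\frac{s_i^0 + s_i^t}{2}} \prod_{i \in U_D^{0,t}} \left( \frac{\hat{p}_i^t}{p_i^0} \right)^{\frac{s_i^0}{2}} \prod_{i \in U_N^{0,t}} \left( \frac{p_i^t}{\hat{p}_i^0} \right)^{\frac{s_i^t}{2}}$$ (equation 4)

Where:

$U_M^{0,t}$ is the set of matched products *i* with respect to periods *0* and *t*.

$U_D^{0,t}$ is the set of 'disappearing' products *i* with respect to periods *0* and *t* – that is, products that exist in period *0* but not in period *t*.

$U_N^{0,t}$ is the set of 'new' products *i* with respect to periods *0* and *t*.

$p_i^t$ is the observed price of product *i* in period *t*, and $\hat{p}_i^t$ is the price predicted by the hedonic regression for a period in which the product is not sold.

$s_i^t$ is the expenditure share of product *i* in period *t*.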
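The weighting described for equation (4) can be sketched as follows: matched products use the average of their two expenditure shares, while disappearing and new products use half of their share in the one period they are sold, with the missing price replaced by a predicted one. The function name, argument layout, and figures are hypothetical, and the predicted prices are simply supplied as inputs rather than estimated.

```python
import math

def imputation_tornqvist(p0, p1, s0, s1, predicted0, predicted1):
    """Tornqvist index from period 0 to period t using real and predicted prices.

    p0, p1: observed prices; s0, s1: expenditure shares (each summing to 1);
    predicted0: imputed period-0 prices for new products;
    predicted1: imputed period-t prices for disappearing products.
    """
    matched = p0.keys() & p1.keys()
    disappearing = p0.keys() - p1.keys()
    new = p1.keys() - p0.keys()
    log_index = sum(0.5 * (s0[i] + s1[i]) * math.log(p1[i] / p0[i]) for i in matched)
    log_index += sum(0.5 * s0[i] * math.log(predicted1[i] / p0[i]) for i in disappearing)
    log_index += sum(0.5 * s1[i] * math.log(p1[i] / predicted0[i]) for i in new)
    return math.exp(log_index)

# Hypothetical example: A is matched, B disappears, C is new.
p0 = {"A": 100.0, "B": 50.0}
p1 = {"A": 90.0, "C": 180.0}
s0 = {"A": 0.6, "B": 0.4}
s1 = {"A": 0.5, "C": 0.5}
predicted1 = {"B": 45.0}    # what B would have cost in period t
predicted0 = {"C": 200.0}   # what C would have cost in period 0

index = imputation_tornqvist(p0, p1, s0, s1, predicted0, predicted1)
print(f"imputation Tornqvist index: {index:.3f}")
```

The exponents sum to one across the three groups, so when every real or imputed price relative equals 0.9, as in this example, the index is also 0.9.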

**Notes:**

1. This can be generalised to any two periods *i* and *j*.

2. For categorical characteristics, such as those on the consumer electronics scanner data, each characteristic *k* will be represented by a set of dummy variables corresponding to all possible values of characteristic *k* – that is, variables which are set to 1 (or 0) in the presence (or absence) of that value of the characteristic.

3. A product corresponds to a distinct set of characteristics, or features. Therefore, any change in characteristic would result in a different product.

4. Across both time periods.

## Implementing the ITRYGEKS index in production

This section summarises key decisions we made before using the ITRYGEKS index to produce price indexes for consumer electronics categories in the CPI.

We receive monthly scanner data from GfK, but the New Zealand CPI is a quarterly index. We can either derive the quarterly index from a monthly index, or pre-aggregate the data to a quarterly level before deriving a quarterly index.

While it is useful to produce a monthly index as part of the monitoring and analysis process, it is conceptually more appropriate to derive the quarterly index from quarterly average prices and expenditure shares. This ensures the prices for products sold in each month of the quarter are appropriately weighted for price deflation, to produce quarterly volume indexes in the National Accounts.
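One way to pre-aggregate monthly data to a quarterly level is sketched below. It assumes, for illustration only, that the quarterly average price is a unit value (total sales value divided by total quantity sold over the quarter); the record layout, function name, and figures are hypothetical.

```python
from collections import defaultdict

def quarterly_unit_values(monthly_records):
    """Aggregate one quarter of (product, sales_value, quantity) records.

    Returns quarterly average prices (unit values) and expenditure shares.
    """
    value = defaultdict(float)
    quantity = defaultdict(float)
    for product, sales_value, qty in monthly_records:
        value[product] += sales_value
        quantity[product] += qty
    total_value = sum(value.values())
    prices = {product: value[product] / quantity[product] for product in value}
    shares = {product: value[product] / total_value for product in value}
    return prices, shares

# Hypothetical records for two cameras over the three months of a quarter.
records = [
    ("camera_a", 1000.0, 10), ("camera_a", 900.0, 10), ("camera_a", 1600.0, 20),
    ("camera_b", 500.0, 5), ("camera_b", 500.0, 5), ("camera_b", 500.0, 5),
]

prices, shares = quarterly_unit_values(records)
print(prices, shares)
```

Summing values and quantities before dividing means a month with heavy discounting and high sales counts for more in the quarterly average price than a quiet month, which is the weighting property the paragraph above describes.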

### Using two months of the quarter for production

Consumer electronics data for just the first two months of the quarter are available in time to incorporate into the CPI.

There were four options for dealing with this limitation in production. Using back-data, we assessed each of these options against the benchmark of the index we could calculate if all three months of the quarter were available.

The four options were to:

1. base the published index for the most recent quarter on the first two months of the quarter, with complete back-data feeding into the estimation (ie the third month of the quarter will be incorporated into the five-quarter estimation window for the following quarter’s index calculation)
2. base the published index on three months of data, lagged by one month
3. base the published index on only the middle month of each quarter
4. base the published index on only the first two months of each quarter.

Option 1 performed best of all four options, and sits very close to the benchmark index.

Therefore, we derived the ITRYGEKS index for the latest quarter from the first two months of that latest quarter, with the third month of the quarter then being incorporated into the five-quarter estimation window used to calculate the following quarter's CPI, as shown in Figure 3.

**Figure 3
Data incorporated into each quarterly index at time of production**

Figure 3 shows that, for example, to calculate the quarterly price movement for the third quarter of 2014 (ie the top row), we use full quarterly data for each of the quarters from quarter three of 2013 through to quarter two of 2014, and the first two months of data for the third quarter of 2014.
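The data window described for Figure 3 can be expressed as a small helper that lists the months feeding each quarterly calculation. This is a sketch only; the function name and its parameters (`n_quarters`, `months_available`) are hypothetical and simply encode the five-quarter window and the two-month limitation described above.

```python
def months_in_window(year, quarter, n_quarters=5, months_available=2):
    """Return the (year, month) pairs used to calculate the index for a quarter.

    Earlier quarters in the window contribute all three months; the latest
    quarter contributes only the months available at production time.
    """
    # Step back to the first quarter of the estimation window.
    y, q = year, quarter
    for _ in range(n_quarters - 1):
        q -= 1
        if q == 0:
            y, q = y - 1, 4

    months = []
    for position in range(n_quarters):
        is_latest = position == n_quarters - 1
        count = months_available if is_latest else 3
        first_month = 3 * (q - 1) + 1
        months.extend((y, first_month + offset) for offset in range(count))
        q += 1
        if q == 5:
            y, q = y + 1, 1
    return months

window = months_in_window(2014, 3)
print(window[0], window[-1], len(window))
```

For the third quarter of 2014 this gives the 12 months from July 2013 to June 2014 plus July and August 2014, matching the top row of Figure 3 as described in the text.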

### Integration with other changes arising from the 2014 CPI review

We implemented the ITRYGEKS index in production along with the CPI basket and weight changes arising from the 2014 CPI Review. The basket has been realigned to reflect the consumer electronics categories that we are measuring using scanner data. The price movement used in the CPI for the June to September 2014 quarters is a like-for-like movement based on the scanner data, linked in on the new price reference period – the June 2014 quarter.

Initially, we fixed the expenditure shares at the item level of the CPI basket for consumer electronics categories. However, we will review this decision in the future to determine whether we will use information in the scanner data to reflect substitution across consumer electronics categories, between the three-yearly CPI basket and weight reviews.

## References

de Haan, J, & Krsinich, F (2014). Scanner data and the treatment of quality change in nonrevisable price indexes. *Journal of Business and Economic Statistics, 32(3)*. Available from www.tandfonline.com

Ivancic, L, Diewert, WE, & Fox, KJ (2011). Scanner data, time aggregation and the construction of price indexes. *Journal of Econometrics, 161(1)*. Available from www.sciencedirect.com

Statistics New Zealand (2012). A fresh look at patterns in gadget sales. *Economic News*, April 2012. Available from www.stats.govt.nz

ISBN 978-0-478-42940-4 (online)

Published 6 November 2014