Data in the IDI

Find out what microdata is available in the IDI, including the data dictionaries for each collection.

The IDI contains de-identified microdata about people and households, which comes from a range of government agencies, Statistics NZ surveys including the 2013 Census, and non-government organisations.

The IDI is updated (or ‘refreshed’) regularly to keep the data up-to-date. Most datasets are updated quarterly, but some are updated less frequently.

The diagram below shows an overview of the datasets currently available in the IDI. Click on the image to access a downloadable PDF.

Image: Data in the IDI at October 2016.

The IDI contains de-identified data

The data held in the IDI is de-identified data. De-identified data has had personal identifiers removed or encrypted (ie replaced with another number) so that data records are not associated with named individuals. De-identification is one of the ways we keep the data safe – it supports analytical insights while maintaining privacy and confidentiality.
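As a rough illustration of what "replaced with another number" can mean, the sketch below drops a name field and swaps a source identifier for a random surrogate number. It is a minimal sketch only, not Statistics NZ's actual method; the field names `name`, `source_id`, and `uid` and the function `deidentify` are hypothetical.

```python
import secrets

def deidentify(records, id_field="source_id", drop_fields=("name",)):
    """Replace a personal identifier with a random surrogate and drop direct identifiers.

    Hypothetical illustration: field names and the surrogate scheme are assumptions.
    """
    surrogate_for = {}   # original identifier -> surrogate number
    used = set()         # surrogates already handed out, to keep them unique
    cleaned_records = []
    for rec in records:
        original = rec[id_field]
        if original not in surrogate_for:
            candidate = secrets.randbelow(10**9)
            while candidate in used:
                candidate = secrets.randbelow(10**9)
            used.add(candidate)
            surrogate_for[original] = candidate
        # Keep every field except the identifier and the direct identifiers.
        cleaned = {k: v for k, v in rec.items()
                   if k != id_field and k not in drop_fields}
        cleaned["uid"] = surrogate_for[original]
        cleaned_records.append(cleaned)
    return cleaned_records

records = [
    {"name": "A. Example", "source_id": "111", "service": "X"},
    {"name": "B. Example", "source_id": "222", "service": "Y"},
    {"name": "A. Example", "source_id": "111", "service": "Z"},
]
deidentified = deidentify(records)
```

Records belonging to the same person keep the same surrogate, so they can still be analysed together without carrying a name or the original identifier.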

For more information about the benefits, risks, and possible uses of de-identified data, see our de-identified data fact sheet.

New data

Upcoming datasets for the IDI and LBD
Find out which new datasets will be added to the IDI in future and which datasets have been added recently.

How to add a dataset to the IDI or LBD
Find out how to apply to get a new dataset added to the IDI and the different options for loading data.

Core data

These data tables are available to all IDI researchers with an approved research project.

Security concordance

What the data is about: An encrypted identifier for each identity in the IDI that is common across all datasets. Allows a researcher to link between the datasets they have access to, and shows where individuals do not link or are not present.
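The linking role of the concordance can be sketched as a lookup from one common identifier to each dataset's own identifier. This is an assumed, simplified layout for illustration: `snz_uid` is the identifier named elsewhere on this page, while `moe_uid`, `msd_uid`, the sample values, and the function `link` are hypothetical.

```python
# Hypothetical concordance: one row per identity, with that identity's
# identifier in each dataset (None where the identity does not link).
concordance = [
    {"snz_uid": 1, "moe_uid": 101, "msd_uid": 201},
    {"snz_uid": 2, "moe_uid": 102, "msd_uid": None},
    {"snz_uid": 3, "moe_uid": None, "msd_uid": 203},
]

# Hypothetical dataset extracts, keyed by their own identifiers.
education = {101: "enrolled", 102: "completed"}
benefits = {201: "on benefit", 203: "off benefit"}

def link(concordance, education, benefits):
    """Join two datasets through the concordance, keeping unlinked identities."""
    linked = []
    for row in concordance:
        linked.append({
            "snz_uid": row["snz_uid"],
            # .get returns None when the identity is not present in a dataset.
            "education": education.get(row["moe_uid"]),
            "benefit": benefits.get(row["msd_uid"]),
        })
    return linked

linked = link(concordance, education, benefits)
in_both = [r["snz_uid"] for r in linked
           if r["education"] is not None and r["benefit"] is not None]
```

Keeping the unlinked rows, rather than dropping them, is what lets a researcher see where individuals do not link or are not present.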

Personal details

What the data is about: Statistics NZ’s best estimate of demographic information, such as sex and birth month/year, derived from multiple collections in the IDI using a set of specific rules. Ethnicity variables are an ‘ever-indicator’ that shows all ethnicities an identity has recorded across data collections over time.
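One way to picture an ‘ever-indicator’ is as a set of flags that turn on if an ethnicity is recorded for an identity in any collection at any time. The sketch below assumes a simple list of (identity, ethnicity) reports; the group list, sample data, and the function `ever_indicator` are hypothetical and do not reproduce Statistics NZ's derivation rules.

```python
def ever_indicator(reports, groups):
    """Build one flag per ethnic group for each identity.

    A flag is True if that ethnicity was ever recorded for the identity.
    """
    ever = {}
    for snz_uid, ethnicity in reports:
        flags = ever.setdefault(snz_uid, {g: False for g in groups})
        if ethnicity in flags:
            flags[ethnicity] = True
    return ever

groups = ["European", "Māori", "Pacific", "Asian"]
# Hypothetical reports pooled from several collections over time.
reports = [(1, "European"), (1, "Māori"), (2, "Asian"), (1, "European")]
ever = ever_indicator(reports, groups)
```

Because a flag never switches back off, an identity can carry several ethnicities at once, which matches the "all ethnicities ... over time" wording above.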

Source ranked ethnicity

What the data is about: Ranking of ethnicity information from all data sources. The ranking is based on an investigation by Statistics NZ’s Statistical Methods team, which looked at the potential for administrative data to provide ethnicity information comparable to the current census and the official series of ethnic population estimates.

Restricted data

We grant access to the following data on a case-by-case basis where a research project meets access criteria. This ensures approved researchers only get access to the IDI data essential for their project. 

If you are applying to use the IDI, please quote the dataset name and application code in the ‘data requirements’ section of your Data Lab application form.

Email access2microdata@stats.govt.nz to view data dictionaries not yet available below.

Benefits and social services data

Auckland City Mission data

Source: Auckland City Mission
Time: From 1996
What the data is about: Income, expenses, housing status, and household composition of Auckland City Mission clients, and the services these clients use. Auckland City Mission is a social service provider in Auckland CBD that helps Aucklanders in need by providing effective integrated services and advocacy. Note: data dictionary available on the IDI Wiki in the Data Lab.
Application code: ACM

Benefit dynamics data

Source: Ministry of Social Development
Time: From 1990
What the data is about:  People who have received a working-age social welfare benefit. Provides basic information on demographic characteristics, and traces their changing benefit status and other circumstances. Also traces the benefit histories of partners and dependent children included in benefits. Note: data dictionary available on the IDI Wiki in the Data Lab.
Application code: MSD

Children's Action Plan

Source: Ministry of Social Development
Time: From 2013
What the data is about: Demographic and referral information about vulnerable children in the care of the Children's Action Plan (CAP).
Application code: CYF

Youth services data

Source: Ministry of Social Development
Time: From 2004
What the data is about: Young people who are not in education, employment, or training (NEET) or are at risk of becoming NEET. The service aims to reduce the likelihood these youth will become welfare dependent, and to contribute to better life outcomes for them. Note: data dictionary available on the IDI Wiki in the Data Lab.
See IDI Data Dictionary: MSD Youth Service Interventions for more information.
Application code: YST

Education and training data

Primary and secondary schools data

Source: Ministry of Education
Time: From 2007
What the data is about: Students enrolled at New Zealand schools, including student interventions, student NCEA qualifications attained, student qualifications and standards attained, university entrance attained and student Expected Percentile Score.
See IDI Data Dictionary: Primary and secondary schools data for more information.
Application code: MOE

Tertiary education data

Source: Ministry of Education
Time: From 1994
What the data is about: Students who are enrolled in, or have completed, formal/non-formal tertiary qualifications at government-funded tertiary education organisations.
See IDI Data Dictionary: Tertiary education data for more information.
Application code: MOE

Industry training education data

Source: Ministry of Education
Time: From 2001
What the data is about: Individuals in programmes administered by an Industry Training Organisation in each training fund in a calendar year.
See IDI Data Dictionary: Industry training education data for more information.
Application code: MOE

Targeted training data

Source: Ministry of Education
Time: From 2001
What the data is about: Targeted training programmes: Gateway, Skill Enhancement, Training Opportunities, Foundation-Focused Training Opportunities, and Youth Training.
See IDI Data Dictionary: Targeted training data for more information.
Application code: MOE

Family and household data

2013 Census

Source: Statistics NZ
Time: 5 March 2013
What the data is about: People and dwellings in New Zealand on census night. Provides a snapshot of our society at a point in time.
See IDI Data Dictionary: 2013 Census data for more information.
Application code: CENSUS

Life event data

Source: Department of Internal Affairs
Time: From 1840
What the data is about: Life events relating to births, deaths, marriages, and civil unions registered in New Zealand.
See IDI Data Dictionary: Life event data for more information.
Application code: DIA

Working for Families (WFF) research dataset

Source: Inland Revenue
Time: From 2003
What the data is about: Working for Families, a package designed to help make it easier to work and raise a family. It consists of WFF tax credits, Accommodation Supplement, and Childcare Assistance.
See IDI Data Dictionary: Working for Families research data for more information.
Application code: WFF

Child, Youth and Family data

Source: Ministry of Social Development
Time: From 1991
What the data is about: Children and young people (CYP). Information includes instances of neglect or abuse, concerns about behaviour or care, and offences committed by a CYP.
See IDI Data Dictionary: Child, Youth and Family data for more information.
Application code: CYF

Tenancy bond data

Source: Ministry of Business, Innovation and Employment (MBIE)
Time: From 2000
What the data is about: Bonds lodged for residential tenancy agreements. MBIE collects the data from bond lodgement forms completed by tenants and landlords for residential tenancy agreements.
See IDI Data Dictionary: Tenancy bond data for more information.
Application code: DBH

Social housing data

Source: Housing New Zealand Corporation
Time: From 1980
What the data is about: Individuals waiting for social housing, those in social housing, details about the tenancy, and the property details of government housing. Note: data dictionary available on the IDI Wiki in the Data Lab.
Application code: HNZ

Household Labour Force Survey (HLFS)

Source: Statistics NZ
Time: From 2006
What the data is about: Sample data about employment, unemployment, and people not in the labour force in New Zealand.
See IDI Data Dictionary: Household Labour Force Survey for more information.
Application code: HLFS

New Zealand Income Survey

Source: Statistics NZ
Time: From 2006
What the data is about: Sample data about sources of incomes, including wages and salaries, self-employment, government transfers, other private transfers, and investments. 
See IDI Data Dictionary: New Zealand Income Survey for more information.
Application code: HLFS 

Survey of Family, Income and Employment (SoFIE)

Source: Statistics NZ
Time: 2002–10
What the data is about: Sample data about respondents’ work, family, household circumstances, income and net worth, and studies, and how these change over time.
See Survey of Family, Income, and Employment (SoFIE) for more information.
Application code: SOFIE

Household Economic Survey (HES)

Source: Statistics NZ
Time: From 2006
What the data is about: Sample data about household expenditure on goods and services, sources of income, as well as a wide range of demographic information on individuals and households.
See Household Economic Survey and Household Economic Survey (Income) for more information.
Application code: HES

Geographic data

Address notification

Source: Multiple sources
What the data is about: Prioritised address history for all snz_uid where address information exists. Uses a simple set of business rules to limit the full address table to a best-guess list of residential addresses. Where possible, Statistics NZ has provided a corresponding meshblock, territorial authority, and regional council code for each snz_uid.
Application code: GEOGRAPHIC

Address notification – full

Source: Multiple sources
What the data is about: Address information from all sources, providing a full list of every geocoded address and collating all address change notifications. Researchers can apply their own rules to filter the table to best suit their needs. Where possible, Statistics NZ has provided a corresponding meshblock, territorial authority, and regional council code for each snz_uid.
Application code: GEOGRAPHIC

Person overseas spell

Source: Multiple sources
What the data is about: A summary of all border movements by individuals in MBIE’s immigration data.
Application code: GEOGRAPHIC

Note: geocoding allows us to make location information available to researchers as small geographic units called meshblocks, which keep specific addresses anonymous.

Health and safety data

ACC injury claims data

Source: Accident Compensation Corporation
Time: From 1994
What the data is about: All claims made due to work-related or non-work related injury, whether the injury occurred in New Zealand or not.
See IDI Data Dictionary: ACC injury claims data for more information.
Application code: ACC

B4 School Check

Source: Ministry of Health
Time: From 2011
What the data is about: Results of the B4 School Check for each individual who undergoes the checks, and information about those who have declined. The B4 School Check aims to promote health and well-being in four-year-olds, and identify and address concerns that could affect a child’s ability to get the most benefit from school. 
See IDI Data Dictionary: B4 School check data for more information.
Application code: MOH_B4SC

Cancer registrations

Source: Ministry of Health
Time: From 1995
What the data is about: A subset of fields from the NZ Cancer Registry. Specifically, information on malignant cancer registrations for healthcare users in the population cohort. 
See IDI Data Dictionary: Cancer registrations data for more information.
Application code: MOH_CANCER_REG

Chronic condition/significant health event cohort

Source: Ministry of Health
Time: From 2007
What the data is about: Healthcare users in the population cohort diagnosed with one or more of eight chronic conditions/significant health events identified using a variety of sources. 
See IDI Data Dictionary: Chronic condition/significant health event cohort for more information.
Application code: MOH_CHRONIC_CONDITION

General Medical Subsidy (GMS) claims data

Source: Ministry of Health
Time: From 2002
What the data is about: A subset of fields from GMS. Specifically, information from fee-for-service claims made by general practitioners.
See IDI Data Dictionary: General Medical Subsidy claims data for more information.
Application code: MOH_GMS_CLAIMS

Health Tracker

Source: Ministry of Health
Time: 2006–13
What the data is about: Data from Health Tracker, a health-focused census of people living in New Zealand, using data from national health data collections held by the Ministry of Health. Note: Located in the IDI sandpit database. The data dictionary is available on the IDI Wiki in the Data Lab.
Application code: MOH_HEALTH_TRACKER

Laboratory claims data

Source: Ministry of Health
Time: From 2003
What the data is about: A subset of fields from the Laboratory Claims Collection. Specifically, contains claim and payment information for laboratory tests processed by the General Transaction Processing System (GTPS). Also contains laboratory test information from Pegasus IPA and Medlab South. Note: Data dictionary available on the IDI Wiki in the Data Lab.
Application code: MOH_LAB_CLAIMS

Mortality data

Source: Ministry of Health
Time: From 1988
What the data is about: Data classifying the underlying cause of death for all deaths registered in New Zealand, including all registered fetal deaths (stillbirths), using the World Health Organization Rules and Guidelines for Mortality Coding. Note: Data dictionary available on the IDI Wiki in the Data Lab.
Application code: MOH_MORTALITY

National Immunisation Register (NIR)

Source: Ministry of Health
Time: From 2006
What the data is about: Data derived from unit record immunisation event information. A subset of all data available within the NIR. The NIR collection provides data for monitoring immunisation coverage and the progress of immunisation campaigns such as meningococcal B and HPV. Note: Data dictionary is available on the IDI Wiki in the Data Lab.
Application code: MOH_NIR

National Needs Assessment and Service Coordination Information (SOCRATES)

Source: Ministry of Health
What the data is about: The National Needs Assessment and Service Coordination Information (SOCRATES) is used by Ministry-funded Needs Assessment and Service Coordination (NASC) agencies to record information about clients who are eligible for Disability Support Services (DSS).
Application code: MOH_SOCRATES

National Non-Admitted Patient Collection (NNPAC) data

Source: Ministry of Health
Time: From 2007
What the data is about: Subset of fields from NNPAC. Specifically, contains information about all non-admitted (eg outpatient) events reported for the population cohort.
See IDI Data Dictionary: National non-admitted patient collection data for more information.
Application code: MOH_NNPAC

Pharmaceutical data

Source: Ministry of Health
Time: From 2005
What the data is about: Subset of fields from the Pharmaceutical Claims Collection. Specifically, contains information about subsidised dispensings processed by the General Transaction Processing System (GTPS), including demographic information about healthcare users to whom these prescriptions were dispensed.
See IDI Data Dictionary: Pharmaceutical data for more information.
Application code: MOH_PHARMACEUTICAL

PHO enrolment data

Source: Ministry of Health
Time: From 2003
What the data is about: Subset of fields from the PHO Enrolment collection. Specifically, contains information about the demographics of the population cohort enrolled in a PHO, along with when they were enrolled, when they were last seen, and the practice type.
Application code: MOH_PHO_ENROLMENT

Population cohort demographics

Source: Ministry of Health
Time: From 2004
What the data is about: Subset of fields from the NHI. Includes the most up-to-date data available at the time of extraction about each healthcare user in the population cohort.
See IDI Data Dictionary: Population cohort name and demographics data for more information.
Application code: MOH_DEMOGRAPHICS

Population cohort addresses

Source: Ministry of Health
Time: 2005 to 2013
What the data is about: Subset of fields from the NHI. Specifically, contains all address information (including domicile code) for each healthcare user in the population cohort. Note: data dictionary is available on the IDI Wiki in the Data Lab.
Application code: MOH_ADDRESS

Programme for the Integration of Mental Health Data (PRIMHD)

Source: Ministry of Health
Time: From 2008
What the data is about: Subset of fields from PRIMHD. Specifically, contains data about the referral, what services (activities) were provided, and demographic information. Excludes outcomes, diagnosis, and legal status data.
See IDI Data Dictionary: Programme for the integration of mental health data for more information.
Application code: MOH_PRIMHD

Publicly funded hospital discharges – event and diagnosis/procedure information

Source: Ministry of Health
Time: From 1988
What the data is about: Subset of fields from the National Minimum Dataset (NMDS). Includes discharge and event data about publicly funded hospital events and demographic data reported for the population cohort.
See IDI Data Dictionary: Publicly funded hospital discharges - event and diagnosis/procedure information for more information.
Application code: MOH_HOSPITAL_DISCHARGES

Justice data

Recorded crime offenders data

Source: New Zealand Police
Time: From 2009
What the data is about: Alleged offenders recorded by NZ Police as prescribed in the NZ Police National Recording Standard, who have been proceeded against by police.
See IDI Data Dictionary: Recorded crime offenders data for more information.
Application code: POL

Recorded crime victims data

Source: New Zealand Police
Time: From 2014
What the data is about: Victims of crime recorded by NZ Police as prescribed in the NZ Police National Recording Standard, where the matter came to Police attention.
See IDI Data Dictionary: Recorded crime victims data for more information.
Application code: POL

Court charges data

Source: Ministry of Justice
Time: 1992 to 2013
What the data is about: All charges processed by the courts pertaining to individuals tried as an adult (excludes Youth Court).
See IDI Data Dictionary: Court charges data for more information.
Application code: MOJ

Sentencing and remand data

Source: Department of Corrections
Time: From 1998
What the data is about: Data about Corrections’ management of convicted adult offenders who have received a community sentence or imprisonment, and people remanded until their trial is completed.
See IDI Data Dictionary: Sentencing and remand data for more information.
Application code: COR

Student loans and allowances data

Student loans and allowances data from StudyLink

Source: Ministry of Social Development
Time: From 1992
What the data is about: Data provided by applicants, their education provider(s), and related parties, that is used by StudyLink to assess eligibility and entitlement to student allowances, student loans, and scholarships.
See IDI Data Dictionary: Student loans and allowances data from StudyLink for more information.
Application code: SLA

Student loans and allowances data from Inland Revenue

Source: Inland Revenue
Time: From 1992
What the data is about: Consolidated data extract (CDE) made of a series of student-loan-specific variables and unit-record monthly data (URMD), which is used for several purposes including contributing to the valuation of the student loans scheme.
See IDI Data Dictionary: Student loans and allowances data from Inland Revenue for more information.
Application code: SLA

Tax and income data

Tax data (for non-government researchers)

Source: Inland Revenue
Time: From 1999
What the data is about: A subset of the employee monthly schedule (EMS) table, containing tax information relating to individuals. No access to business data.
See IDI Data Dictionary: IR tax data for more information.
Application code: IR_RESTRICT 

Tax data (for government researchers)

Source: Inland Revenue
Time: From 1999
What the data is about: Full tax information relating to individuals and businesses. Includes information on location, encrypted employer IRD numbers, and encrypted enterprise numbers.
See IDI Data Dictionary: IR tax data for more information.
Application code: IR (Note: access restricted to government researchers)

Tax derived tables

Source: Multiple sources
Time: From 1999
What the data is about: In March 2014, Statistics NZ introduced six derived tables after ceasing the use of Linked Employer-Employee Dataset (LEED) research tables. Release of the tables followed a period of consultation with various agencies, where they were asked which LEED tables they used and how these were used. Note: Data dictionary available on the IDI Wiki in the Data Lab.
Application code: IR (Note: access restricted to government researchers)

Business Register

Source: Multiple sources
Time: From 1999
What the data is about: Enterprise-level data, Permanent Business Number (PBN)-level data, and the relationship between business IRD numbers and enterprises from the Statistics NZ Business Register. Note: Data dictionary available on the IDI Wiki in the Data Lab.
Application code: BR (Note: access restricted to government researchers)

Business-centred data is usually accessed via Statistics NZ’s Longitudinal Business Database (LBD). The LBD is a linked longitudinal database that includes a range of business information. It can be used in conjunction with the person-centred data in the IDI. Access is restricted to government researchers. See Longitudinal Business Database.

Travel and migration data

Driver licence and motor vehicle registers

Source: New Zealand Transport Agency
What the data is about: Driver licence information for people who hold, or have held, a New Zealand driver licence, and motor vehicle information for people with a vehicle currently registered in their name.
See IDI Data Dictionary: Transport data for more information.
Application code: NZTA

Immigration data

Source: Ministry of Business, Innovation and Employment
Time: From 1997
What the data is about: Immigration administrative data on migrants and international visitors applying for a visa to enter New Zealand. Includes all resident visa applications and those applying for a temporary stay (work, study, and visitor).
See IDI Data Dictionary: Immigration data for more information.
Application code: DOL

International travel and migration data

Source: Statistics NZ
Time: From 1997
What the data is about: International passenger arrivals to, and departures from, New Zealand. The New Zealand Customs Service has supplied Statistics NZ with electronic passport and flight records, and these are combined with information from arrival and departure cards.
See IDI Data Dictionary: International travel and migration data for more information.
Application code: CUS

Migrant Survey data

Source: Ministry of Business, Innovation and Employment
Time: From 2012
What the data is about: Data from the 2012 Migrant Survey, which is part of the Immigration Survey Monitoring Programme (ISMP). ISMP compiles information about migrants’ settlement and labour market outcomes, employers’ experience with migrants, and community attitudes towards immigration.
See IDI Data Dictionary: Migrant Survey data for more information.
Application code: MS

Longitudinal Immigration Survey of NZ

Source: Statistics NZ
Time: 2005–09
What the data is about: Data from the Longitudinal Immigration Survey of NZ, which is designed to trace the pathways of migrants and produce a detailed, ongoing information base of their experiences and settlement outcomes.
See Longitudinal Immigration Survey of NZ for more information. Note: data dictionary available on the IDI Wiki in the Data Lab.
Application code: LISNZ

Updated 7 June 2017
