AHRI Data Repository

AHRI:Health and Demographic Surveillance System: Deaths and Causes of Death from Verbal Autopsies (2000-2024)

South Africa, 2000 - 2025
Reference ID
AHRI.VerbalAutopsy.Harmonisation
Producer(s)
Ariane Sessego, Siyabonga Nxumalo, Dickman Gareta, Dr. Alison Castle, Dr. Guy Harling, Prof. Mark Siedner, Prof. Janet Seeley, Prof. Collins Iwuji, Prof. Willem Hanekom, Dr. Kobus Herbst
Collections
AdHoc Datasets
Metadata
Documentation in PDF DDI/XML JSON
Created on
Mar 31, 2026
Last modified
Apr 01, 2026
  • Identification
  • Version
  • Scope
  • Coverage
  • Producers and sponsors
  • Sampling
  • Data collection
  • Data processing
  • Data Access
  • Metadata production
    Identification

    Survey ID number

    AHRI.VerbalAutopsy.Harmonisation

    Title

    AHRI:Health and Demographic Surveillance System: Deaths and Causes of Death from Verbal Autopsies (2000-2024)

    Country
    Name | Country code
    South Africa | ZA
    Abstract

    This dataset documents deaths and causes of death captured through verbal autopsies among participants of the Africa Health Research Institute's Health and Demographic Surveillance System (HDSS) from 2000 to 2024. Verbal autopsy (VA) is a standardized method for estimating causes of death, based on a structured post-mortem interview with the final caretaker of the deceased about the medical history and symptoms preceding death (Chandramohan et al. 2021). A total of 27,431 deaths were recorded over this period, 26,290 (95.8%) of which have a completed verbal autopsy.

    The datasets include demographic information (age, sex), probable causes of death, detailed verbal autopsy symptoms, and information on the setting of death and health service use before death. Probable causes of death were estimated using two automated algorithms, InterVA5 (Byass et al. 2019) and InSilicoVA (McCormick et al. 2016), which assign cause-of-death probabilities based on reported symptoms and demographic characteristics. Both algorithms use the same set of prior probabilities, but their estimation methods differ. InterVA5 applies a deterministic formula to each death and selects up to three of the most probable causes (Byass et al. 2019). These probabilities can be summed across deaths to analyze cause-specific mortality fractions at the population level. By contrast, InSilicoVA uses Bayesian methods to estimate causes of death at both the individual and the population level, with credible intervals (McCormick et al. 2016). For InSilicoVA, we include the top causes of death at the individual level, selected using the criteria described below. To analyze cause-specific mortality trends at the population level with InSilicoVA, we advise users to run the algorithm on the detailed symptom data to obtain cause-specific mortality fractions and credible intervals at the level of aggregation suited to their analysis (for guidance, see Li et al. 2023).
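    As a rough illustration of the population-level summation described above, the sketch below adds up per-death InterVA5 likelihoods into cause-specific mortality fractions (CSMFs). It is a hypothetical Python helper, not part of openVA or VAConvert. It assumes likelihoods are stored as proportions (0-1) and that any probability mass not assigned to one of the (up to three) reported causes is attributed to an "Undetermined" category; the cause names in the example are illustrative only.

```python
from collections import defaultdict


def interva_csmf(deaths):
    """Sum per-death InterVA5-style likelihoods into population-level CSMFs.

    `deaths` is a list with one dict per death, mapping up to three cause
    names to their likelihoods (proportions between 0 and 1).
    Assumption: mass not assigned to a named cause counts as "Undetermined".
    """
    if not deaths:
        raise ValueError("at least one death is required")
    totals = defaultdict(float)
    for causes in deaths:
        assigned = 0.0
        for cause, likelihood in causes.items():
            totals[cause] += likelihood
            assigned += likelihood
        # Residual probability mass for this death (never negative).
        totals["Undetermined"] += max(0.0, 1.0 - assigned)
    n_deaths = len(deaths)
    return {cause: total / n_deaths for cause, total in totals.items()}


# Illustrative input: three deaths, the last with no cause reported.
example = [
    {"HIV/AIDS related death": 0.7, "Pulmonary tuberculosis": 0.2},
    {"Road traffic accident": 0.9},
    {},
]
csmf = interva_csmf(example)
```

    As long as each death's likelihoods sum to at most 1, every death contributes exactly one unit of probability, so the fractions sum to 1.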

    Unlike InterVA5, InSilicoVA has no predefined rules for selecting the top causes of death (Byass et al. 2012). For this dataset, InSilicoVA's top causes of death were selected according to the following criteria (similar to InterVA's rules): the cause of death is classified as "Undetermined" if the most probable cause has a mean posterior probability <= 40%; otherwise, the most probable cause is selected as the first cause. A second and a third cause are selected if their mean posterior probabilities are greater than or equal to 20%. We include the 95% posterior credible interval (CI) for each selected cause. Note that a small minority of deaths may have a mean posterior probability outside their CI because of skewed probability distributions.
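    The selection rule for InSilicoVA's top causes can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the code used to produce the dataset (that code is in src/CoD_attribution.R in the VAConvert repository). The function name is hypothetical; it takes a dict mapping each cause to its mean posterior probability for one death, and it assumes that no second or third cause is reported when the death is classified as "Undetermined".

```python
def select_top_causes(mean_posteriors, undetermined_max=0.40, secondary_min=0.20):
    """Return up to three causes for one death following the stated rule.

    `mean_posteriors` maps cause name -> mean posterior probability (0-1).
    - If the most probable cause has a mean <= 40%, return ["Undetermined"]
      (assumed: no secondary causes are reported in that case).
    - Otherwise keep it, plus the 2nd and 3rd causes if each is >= 20%.
    """
    ranked = sorted(mean_posteriors.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] <= undetermined_max:
        return ["Undetermined"]
    selected = [ranked[0][0]]
    for cause, mean in ranked[1:3]:
        if mean >= secondary_min:
            selected.append(cause)
    return selected


print(select_top_causes({"A": 0.55, "B": 0.25, "C": 0.12}))  # ['A', 'B']
```

    Because the causes are sorted, a third cause can only pass the 20% threshold if the second one does too, so a single loop over positions two and three is enough.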

    There are two datasets:

    1. AHRI.HDSS.DEATHS-Cause-of-death-2000-2024.v1, a dataset documenting all deaths registered in the HDSS during this period, with information on whether a verbal autopsy was completed and on the probable causes of death estimated by InterVA5 and InSilicoVA. Probable stillbirths are included among the deaths because they are subject to verbal autopsies and their diagnosis may remain uncertain. Users may wish to exclude them from mortality analyses.

    2. AHRI.HDSS.VA-detailed-input-data-2000-2024.v1, a dataset containing detailed verbal autopsy symptoms and circumstances-of-death data (e.g. healthcare access, cost of care; D'Ambruoso et al. 2016, 2021) for deaths with completed verbal autopsies. It can be used to run the algorithms. It also includes five additional COVID-related symptoms, which are not used for cause-of-death estimation. Because its format corresponds to the input required by the algorithms, these variables are not labelled. Variable values are coded as follows: 'y' = 'yes', 'n' = 'no' and '-' = 'I don't know or not applicable'.
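    For analyses outside the algorithms, the single-character codes may be easier to work with as booleans. A minimal sketch, assuming each row has been read as a Python dict of strings (for example with csv.DictReader) and that the identifier column is named "ID"; that column name, the function name and the variable names in the example row are hypothetical, so check them against the actual file. Mapping '-' to None keeps "I don't know or not applicable" distinct from "no".

```python
# Coding used in the detailed input dataset, as stated in this documentation.
VA_CODES = {"y": True, "n": False, "-": None}


def decode_va_record(record, id_column="ID"):
    """Convert one row of y/n/- codes to True/False/None.

    `id_column` is an assumed name for the identifier column, which is
    passed through unchanged. Any other unexpected value raises ValueError
    so that coding problems are not silently hidden.
    """
    decoded = {}
    for variable, value in record.items():
        if variable == id_column:
            decoded[variable] = value
            continue
        try:
            decoded[variable] = VA_CODES[value]
        except KeyError:
            raise ValueError(f"{variable}: unexpected code {value!r}") from None
    return decoded


# Hypothetical row; real variable names follow the algorithms' input format.
row = {"ID": "d0001", "fever": "y", "cough": "n", "rash": "-"}
decoded = decode_va_record(row)
```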

    Some question labels are too long to appear in full in this document. Please refer to the "Variable documentation" in the resource files for the full question corresponding to each variable.

    These datasets span 25 years of data collection, during which the VA questionnaires evolved. The data have been harmonized across questionnaire versions to create a consistent dataset suitable for longitudinal analyses of mortality and cause-specific mortality patterns. We evaluated the impact of questionnaire changes on cause-of-death estimation and found limited but noticeable cause-dependent discrepancies between 2000-2016 and 2017-2024. For more information, see the harmonization report (soon to be made available in the resource files).

    Resource files include:
    · VA Questionnaires: all versions of the VA questionnaires (four main versions: 2000-2009, 2010-2016, 2017-2021, 2022-2024)

    · Variable documentation: documents the availability of each variable depending on the VA questionnaire version

    · Causetext_VA_causes_and_ICD10_equivalent.csv: a spreadsheet with the classification of causes of death used by InterVA and InSilicoVA and their ICD-10 equivalents.

    · Mapping files: correspondence tables documenting which variables from the historical data are used to produce the harmonized dataset

    · Harmonization report (to be made available soon): describes the harmonization process and the consistency analysis carried out to assess the comparability of cause-of-death estimation across questionnaire versions.

    The code used to harmonize the data and estimate causes of death is also available on GitHub:
    https://github.com/AHRIORG/VAConvert.jl/releases/tag/release_v.1

    References

    · Byass, Peter, Daniel Chandramohan, Samuel J. Clark, et al. 2012. 'Strengthening Standardised Interpretation of Verbal Autopsy Data: The New InterVA-4 Tool'. Global Health Action 5 (1): 19281. https://doi.org/10.3402/gha.v5i0.19281.

    · Byass, Peter, Laith Hussain-Alkhateeb, Lucia D'Ambruoso, et al. 2019. 'An Integrated Approach to Processing WHO-2016 Verbal Autopsy Data: The InterVA-5 Model'. BMC Medicine 17 (1): 102. https://doi.org/10.1186/s12916-019-1333-6.

    · Chandramohan, Daniel, Edward Fottrell, Jordana Leitao, et al. 2021. 'Estimating Causes of Death Where There Is No Medical Certification: Evolution and State of the Art of Verbal Autopsy'. Global Health Action 14 (sup1): 1982486. https://doi.org/10.1080/16549716.2021.1982486.

    · D'Ambruoso, Lucia, Kathleen Kahn, Ryan G. Wagner, et al. 2016. 'Moving from Medical to Health Systems Classifications of Deaths: Extending Verbal Autopsy to Collect Information on the Circumstances of Mortality'. Global Health Research and Policy 1 (1): 2. https://doi.org/10.1186/s41256-016-0002-y.

    · D'Ambruoso, Lucia, Jessica Price, Eilidh Cowan, et al. 2021. 'Refining Circumstances of Mortality Categories (COMCAT): A Verbal Autopsy Model Connecting Circumstances of Deaths with Outcomes for Public Health Decision-Making'. Global Health Action 14 (sup1): 2000091. https://doi.org/10.1080/16549716.2021.2000091.

    · Gareta, Dickman, Kathy Baisley, Thobeka Mngomezulu, et al. 2021. 'Cohort Profile Update: Africa Centre Demographic Information System (ACDIS) and Population-Based HIV Survey'. International Journal of Epidemiology 50 (1): 33-34. https://doi.org/10.1093/ije/dyaa264.

    · Li, Zehang Richard, Jason Thomas, Eungang Choi, Tyler H. McCormick, and Samuel J. Clark. 2023. 'The openVA Toolkit for Verbal Autopsies'. The R Journal, February 25, 1.

    · McCormick, Tyler H., Zehang Richard Li, Clara Calvert, Amelia C. Crampin, Kathleen Kahn, and Samuel J. Clark. 2016. 'Probabilistic Cause-of-Death Assignment Using Verbal Autopsies'. Journal of the American Statistical Association 111 (515): 1036-49. https://doi.org/10.1080/01621459.2016.1152191.

    Kind of Data

    Population surveillance (death registration) and survey data (verbal autopsies)

    Unit of Analysis

    Deceased individuals

    Version

    Version Description

    v1.0.0

    Scope

    Topics
    Topic | Vocabulary | URI
    Population Surveillance, Deaths, Causes of death, Verbal Autopsy, Circumstances of death, Harmonised data | Africa Health Research Institute | www.ahri.org
    Keywords
    South Africa, mortality, deaths, causes of death, verbal autopsy, surveillance

    Coverage

    Geographic Coverage

    AHRI's Health and Demographic Surveillance System (HDSS) in uMkhanyakude District, KwaZulu-Natal, South Africa. From 2000 to 2016, the HDSS covered an area of 438 km² with a population of approximately 85,000 people. In 2017, it expanded to cover an area of 845 km² and approximately 150,000 participants (Gareta et al. 2021).

    Universe

    AHRI.HDSS.DEATHS-Cause-of-death-2000-2024.v1: all participants of AHRI's HDSS who died between 01-01-2000 and 31-12-2024
    AHRI.HDSS.VA-detailed-input-data-2000-2024.v1: all deaths in AHRI.HDSS.DEATHS-Cause-of-death-2000-2024.v1 with a completed verbal autopsy

    Producers and sponsors

    Primary investigators
    Name | Affiliation
    Ariane Sessego | Africa Health Research Institute; French Institute of Demographic Studies (INED); Ecole des Hautes Etudes en Sciences Sociales (Centre Maurice Halbwachs)
    Siyabonga Nxumalo | Africa Health Research Institute
    Dickman Gareta | Africa Health Research Institute
    Dr. Alison Castle | Africa Health Research Institute
    Dr. Guy Harling | Africa Health Research Institute
    Prof. Mark Siedner | Africa Health Research Institute
    Prof. Janet Seeley | Africa Health Research Institute
    Prof. Collins Iwuji | Africa Health Research Institute
    Prof. Willem Hanekom | Africa Health Research Institute
    Dr. Kobus Herbst | Africa Health Research Institute
    Producers
    Name
    Africa Health Research Institute
    Funding Agency/Sponsor
    Name | Abbreviation | Role
    Wellcome Trust | WT | Core funding
    South African Population Research Infrastructure Network | SAPRIN
    Ecole des Hautes Etudes en Sciences Sociales | EHESS
    French Institute of Demographic Studies | INED
    Other Identifications/Acknowledgments
    Name | Affiliation | Role
    Sweetness Dube | Africa Health Research Institute | Data Documentation
    AHRI Data Collection Team | Africa Health Research Institute | Data Collection
    AHRI Community Engagement Team | Africa Health Research Institute | Community Engagement
    Nompumelelo Mkwanazi | Africa Health Research Institute | Community Engagement
    We thank the household members of the HDSS. We acknowledge the support of the study and field staff at the Africa Health Research Institute, particularly the verbal autopsy nurses, who made this study possible. | Africa Health Research Institute | Data Documentation
    We would also like to thank the openVA team, who developed and implemented the tools used here to estimate causes of death and provided support during part of the production process. In particular, we thank Jason Thomas, Sam Clark, Zehang Richard Li and Tyler McCormick. | Africa Health Research Institute | Data Documentation

    Sampling

    Sampling Procedure

    This dataset is not based on a sample; it contains information from the entire demographic surveillance area.

    Data collection

    Dates of Data Collection
    Start | End | Cycle
    2000-01-15 | 2025-07-28 | Data collection dates
    2000-01-01 | 2024-12-31 | Time period: deaths that occurred

    Data processing

    Data Editing

    Data collection mode

    Face-to-face between January 2000 and April 2021.
    From May 2021 to 2024, the first three interview attempts were made by telephone; if these were unsuccessful, interviews were attempted face-to-face.
    The mode of data collection is documented with the variable VAInterview for each death.

    Notes on Data Collection

    AHRI's HDSS monitors vital events (births, deaths, in- and out-migration) in the rural district of uMkhanyakude, KwaZulu-Natal, South Africa (Gareta et al. 2021). When a death is recorded, a trained nurse conducts a VA with the caregiver of the deceased after a bereavement period of at least 3 months.
    From 2000 to 2016, verbal autopsies were collected on paper questionnaires; in 2017, data collection was digitized. The questionnaire changed over the period, giving four main versions (2000-2009, 2010-2016, 2017-2021, 2022-2024). For more details, see the harmonization report and the VA questionnaires in the resource files.
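    When stratifying results or running sensitivity analyses by questionnaire version, a small lookup helper can be handy. The sketch below maps a calendar year to one of the four main questionnaire versions and to the paper-based (2000-2016) or digital (2017-2024) collection period. The function names are hypothetical, and the choice of year is an assumption: the VA interview takes place at least three months after death, and this documentation does not state whether the date of death or the interview date determines which questionnaire was used, so check the harmonization report and Variable documentation before relying on it.

```python
# The four main VA questionnaire versions listed in this documentation.
QUESTIONNAIRE_VERSIONS = [
    (2000, 2009, "2000-2009"),
    (2010, 2016, "2010-2016"),
    (2017, 2021, "2017-2021"),
    (2022, 2024, "2022-2024"),
]


def questionnaire_version(year):
    """Return the questionnaire version label whose period covers `year`.

    Assumption: the caller supplies the relevant year (death or interview).
    """
    for start, end, label in QUESTIONNAIRE_VERSIONS:
        if start <= year <= end:
            return label
    raise ValueError(f"{year} is outside the 2000-2024 coverage")


def collection_period(year):
    """Paper-based questionnaires until 2016, digital collection from 2017."""
    questionnaire_version(year)  # validates that the year is covered
    return "paper (2000-2016)" if year <= 2016 else "digital (2017-2024)"
```

    Grouping by collection period matches the 2000-2016 versus 2017-2024 split in which the harmonization analysis found cause-dependent discrepancies.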

    Reference: Gareta, Dickman, Kathy Baisley, Thobeka Mngomezulu, et al. 2021. 'Cohort Profile Update: Africa Centre Demographic Information System (ACDIS) and Population-Based HIV Survey'. International Journal of Epidemiology 50 (1): 33-34. https://doi.org/10.1093/ije/dyaa264.

    Harmonization Process

    These datasets result from the harmonization of data collected over 25 years with four different verbal autopsy questionnaires. To ensure transparency and reproducibility, harmonization was carried out with a structured data processing framework implemented in a Julia package, VAConvert. This framework relies on correspondence tables that systematically document which variables from the historical data are used to produce the harmonized input format required by InterVA5 and InSilicoVA (see the harmonization report for more details).

    These correspondence tables are available in the resource files (Mapping files), and the code used for harmonization is available on GitHub (https://github.com/AHRIORG/VAConvert.jl/releases/tag/release_v.1).
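    Conceptually, a correspondence table tells the converter, for each questionnaire version, which historical variable feeds each harmonized variable and how its raw answers translate into the 'y'/'n'/'-' coding. The Python sketch below illustrates that idea only. VAConvert itself is a Julia package, and the table layout, function name, variable names and raw answer codes used here are all hypothetical rather than taken from the actual Mapping files.

```python
# Hypothetical correspondence table:
# version -> harmonized variable -> (source variable, raw answer -> y/n/-)
MAPPING = {
    "2000-2009": {
        "fever": ("Q12_FEVER", {"1": "y", "2": "n", "9": "-"}),
    },
    "2017-2021": {
        "fever": ("had_fever", {"yes": "y", "no": "n", "dk": "-"}),
        "cough": ("had_cough", {"yes": "y", "no": "n", "dk": "-"}),
    },
}

# Every harmonized variable that appears in any version.
TARGETS = sorted({target for table in MAPPING.values() for target in table})


def harmonize(record, version, mapping=MAPPING, targets=TARGETS):
    """Convert one historical record into the harmonized y/n/- format.

    Variables not collected in `version`, and raw answers missing from the
    table, are coded '-' (don't know / not applicable) in this sketch.
    """
    table = mapping[version]
    harmonized = {}
    for target in targets:
        if target not in table:
            harmonized[target] = "-"
            continue
        source, value_map = table[target]
        harmonized[target] = value_map.get(record.get(source), "-")
    return harmonized
```

    Coding unmapped answers as '-' is a conservative choice for this sketch; a production pipeline might instead flag them for review, since a silent fallback can hide errors in the mapping.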

    Cause-of-death Estimation

    Causes of death were estimated using the R package openVA (Li et al. 2023), with default parameters for both InterVA5 and InSilicoVA. For InSilicoVA, the random seed was 19800508 (8 May 1980, the date on which smallpox eradication was declared). The detailed code is available on GitHub (https://github.com/AHRIORG/VAConvert.jl/releases/tag/release_v.1, src/CoD_attribution.R).

    Unlike InterVA5, InSilicoVA has no predefined rules for selecting the top causes of death (Byass et al. 2012). For this dataset, InSilicoVA's top causes of death were selected according to the following criteria (similar to InterVA's rules): the cause of death is classified as “Undetermined” if the most probable cause has a mean posterior probability <= 40%; otherwise, the most probable cause is selected as the first cause. A second and a third cause are selected if their mean posterior probabilities are greater than or equal to 20%. We include the 95% posterior credible interval (CI) for each selected cause. Note that a small minority of deaths may have a mean posterior probability outside their CI because of skewed probability distributions.
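    A mean lying outside its own 95% credible interval can be surprising, so here is a small synthetic illustration. When the posterior draws for a cause are strongly right-skewed, the mean can exceed the 97.5th percentile and therefore fall outside the interval. The draws below are invented for illustration, and the percentiles use Python's statistics.quantiles (default "exclusive" method), which may differ slightly from the quantile definition used in openVA; with these draws any common definition gives the same interval.

```python
import statistics


def mean_and_95ci(draws):
    """Return (mean, 2.5th percentile, 97.5th percentile) of posterior draws."""
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return statistics.fmean(draws), cuts[0], cuts[-1]


# Synthetic, strongly right-skewed draws for one cause of one death:
# 98% of draws are close to zero and 2% are very high.
draws = [0.01] * 980 + [0.90] * 20
mean, lower, upper = mean_and_95ci(draws)
outside = not (lower <= mean <= upper)
print(f"mean={mean:.4f}, 95% CI=({lower:.2f}, {upper:.2f}), outside={outside}")
```

    Here the rare high draws pull the mean up to about 0.028, while both interval bounds stay at 0.01 because fewer than 2.5% of the draws are high.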

    Reference:

    · Byass, Peter, Daniel Chandramohan, Samuel J. Clark, et al. 2012. 'Strengthening Standardised Interpretation of Verbal Autopsy Data: The New InterVA-4 Tool'. Global Health Action 5 (1): 19281. https://doi.org/10.3402/gha.v5i0.19281.
    · Li, Zehang Richard, Jason Thomas, Eungang Choi, Tyler H. McCormick, and Samuel J. Clark. 2023. 'The openVA Toolkit for Verbal Autopsies'. The R Journal, February 25, 1.

    Data Access

    Access conditions

    Access to the data requires accurate completion of the online data access application form available on the AHRI Data Repository (https://data.ahri.org/). Data users must abide by the data use conditions stipulated in the application for access to the data; failure to do so may result in their data access privileges being revoked by the Data Custodian. To recognise the effort and intellectual contributions of AHRI investigators in producing and curating the data, users of AHRI data must acknowledge the source of the data, abide by the terms and conditions under which the data are accessed, and cite the dataset in publications using the citation provided in this documentation. All analytical datasets published on the AHRI Data Repository are assigned digital object identifiers (DOIs), which can be found on the Data Repository under the Study Description tab (Access policy). AHRI data users must always cite the dataset using the relevant DOI.

    Citation requirements

    Sessego, A., Nxumalo, S., Gareta, D., Castle, A., Harling, G., Siedner, M., Seeley, J., Iwuji, C., Hanekom, W., & Herbst, K. (2026). AHRI:Health and Demographic Surveillance System: Deaths and Causes of Death from Verbal Autopsies (2000-2024) [Data set]. Africa Health Research Institute.

    DOI: https://doi.org/10.23664/AHRI.VERBALAUTOPSY.HARMONISATION

    Metadata production

    DDI Document ID

    DDI.AHRI.VerbalAutopsy.Harmonisation

    Producers
    Name | Abbreviation
    Africa Health Research Institute | AHRI

    © AHRI Data Repository, All Rights Reserved.