AHRI.VerbalAutopsy.Harmonisation
AHRI:Health and Demographic Surveillance System: Deaths and Causes of Death from Verbal Autopsies (2000-2024)
| Name | Country code |
|---|---|
| South Africa | ZA |
This dataset documents deaths and causes of death captured through verbal autopsies among participants of the Africa Health Research Institute's Health and Demographic Surveillance System from 2000 to 2024. Verbal autopsy (VA) is a standardized method of cause-of-death estimation based on a structured post-mortem survey, administered to the final caretaker of the deceased, about the medical history and symptoms preceding death (Chandramohan et al. 2021). A total of 27,431 deaths were recorded over this period, 26,290 (95.8%) of which have completed verbal autopsies.
The datasets include demographic information (age, sex), probable causes of death, detailed verbal autopsy symptoms, and information on death setting and health services utilization prior to death. Probable causes of death were estimated using two automated algorithms, InterVA5 (Byass et al. 2019) and InSilicoVA (McCormick et al. 2016), which assign cause-of-death probabilities based on reported symptoms and demographic characteristics. Both algorithms are based on the same set of prior probabilities, but their estimation methods differ. InterVA5 uses a deterministic formula to estimate causes of death for each death and selects up to three most probable causes (Byass et al. 2019). These probabilities can be summed to analyze cause-specific mortality fractions at the population level. By contrast, InSilicoVA uses Bayesian estimation methods to estimate causes of death at the individual level and at the population level, with credible intervals (McCormick et al. 2016). For InSilicoVA, we include here the top causes of death at the individual level. To analyze cause-specific mortality trends at the population level with InSilicoVA, we advise users to run the algorithm on the detailed symptoms to obtain cause-specific mortality fractions and credible intervals at the level suited to their analysis (for guidance see Li et al. 2023).
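As an illustration of how InterVA5 probabilities can be summed into cause-specific mortality fractions (CSMFs), the following is a minimal sketch rather than the code used to produce this dataset. The input layout (a list of `(cause, probability)` pairs per death) and the cause labels are hypothetical, and the choice to count any unassigned probability mass as "Undetermined" is an assumption that users should check against their own analytical conventions.

```python
from collections import defaultdict


def cause_specific_mortality_fractions(deaths):
    """Sum per-death cause probabilities into population-level fractions.

    `deaths` is a list with one entry per death; each entry is a list of
    (cause, probability) pairs holding up to three selected causes.
    This layout is hypothetical and does not mirror the dataset's columns.
    """
    if not deaths:
        raise ValueError("at least one death is required")
    totals = defaultdict(float)
    for causes in deaths:
        assigned = 0.0
        for cause, probability in causes:
            totals[cause] += probability
            assigned += probability
        # Assumption: probability mass not assigned to a selected cause
        # is counted as "Undetermined".
        totals["Undetermined"] += max(0.0, 1.0 - assigned)
    return {cause: total / len(deaths) for cause, total in totals.items()}


example_deaths = [
    [("Cause A", 0.8), ("Cause B", 0.2)],
    [("Cause A", 0.6)],
    [],  # no cause selected for this death
]
fractions = cause_specific_mortality_fractions(example_deaths)
```

Under this convention the fractions sum to one across all causes, which makes them directly comparable between sub-populations or periods.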
Unlike InterVA5, InSilicoVA does not have pre-defined rules to select the top causes of death (Byass et al. 2012). For this dataset, InSilicoVA's top causes of death were selected according to the following criteria (similar to InterVA's rules): the cause is considered "Undetermined" if the most probable cause has a mean posterior probability <= 40%; otherwise the most probable cause is selected as the first cause. A second and a third cause are selected if their mean posterior probability is greater than or equal to 20%. We included the 95% posterior credible interval (CI) for each selected cause. Note that, for a small minority of deaths, the mean posterior probability may lie outside its CI because of skewed probability distributions.
There are two datasets:
AHRI.HDSS.DEATHS-Cause-of-death-2000-2024.v1, a dataset documenting all deaths registered in the HDSS during this period, with information on whether a verbal autopsy was completed and probable causes of death estimated through InterVA5 and InSilicoVA. Probable stillbirths are included among deaths, as they are subject to verbal autopsies and diagnosis may remain uncertain. Users might consider removing them for mortality analyses.
AHRI.HDSS.VA-detailed-input-data-2000-2024.v1, a dataset that includes detailed verbal autopsy symptoms and circumstances of death data (healthcare access, cost of care… - D'Ambruoso et al. 2016, 2021) for deaths with completed verbal autopsies. It can be used to run the algorithms. It also includes five additional COVID-related symptoms, which are not used for cause-of-death estimation. Its format corresponds to the input required to run the algorithms, so these variables are not labelled. The variable values are as follows: 'y' = 'yes', 'n' = 'no', and '-' = 'I don't know or not applicable'.
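For analyses outside the algorithms, the 'y' / 'n' / '-' coding described above can be decoded into explicit values. The sketch below is one possible approach; the field names (`ID`, `symptom_a`, and so on) are hypothetical placeholders, and the real variable names should be taken from the "Variables documentation" resource file.

```python
# Coding documented for the detailed input dataset.
VA_VALUE_MAP = {"y": True, "n": False, "-": None}


def decode_va_record(record, symptom_fields):
    """Return a copy of `record` with the listed symptom fields decoded.

    'y' becomes True, 'n' becomes False, and '-' (I don't know or not
    applicable) becomes None. Any other code raises a KeyError so that
    unexpected values are noticed rather than silently passed through.
    """
    decoded = dict(record)
    for name in symptom_fields:
        decoded[name] = VA_VALUE_MAP[record[name]]
    return decoded


# Hypothetical record; the field names are placeholders only.
raw_record = {"ID": "d001", "symptom_a": "y", "symptom_b": "n", "symptom_c": "-"}
decoded_record = decode_va_record(raw_record, ["symptom_a", "symptom_b", "symptom_c"])
```

Keeping '-' as a distinct missing value, rather than folding it into 'no', preserves the difference between a symptom reported as absent and a symptom that was unknown or not asked.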
Some question labels are too long to appear in full on this document. Please refer to the "Variables documentation" in the resource files for the full question corresponding to the variable.
These datasets encompass 25 years of data collection, during which the VA questionnaires evolved. The data have been harmonized across questionnaire versions to create a consistent dataset suitable for longitudinal analyses of mortality and cause-specific mortality patterns. We evaluated the impact of questionnaire changes on cause-of-death estimation and found limited but noticeable cause-dependent discrepancies between 2000-2016 and 2017-2024. For more information, see the harmonization report (soon to be made available in the resource files).
Resource files include:
· VA Questionnaires: all versions of VA questionnaires (four main versions: 2000-2009, 2010-2016, 2017-2021, 2022-2024)
· Variable documentation: documents the availability of each variable depending on the VA questionnaire version
· Causetext_VA_causes_and_ICD10_equivalent.csv: a spreadsheet with the classification of causes of death used by InterVA and InSilicoVA and their ICD-10 equivalent.
· Mapping files: correspondence tables documenting which variables from the historical data are used to produce the harmonized dataset
· Harmonization report (that will be made available soon): describing the harmonization process and the consistency analysis carried out to assess comparability of cause-of-death estimation across questionnaire versions.
The code used to harmonize the data and estimate causes of death is also available on GitHub:
https://github.com/AHRIORG/VAConvert.jl/releases/tag/release_v.1
References
· Byass, Peter, Daniel Chandramohan, Samuel J. Clark, et al. 2012. 'Strengthening Standardised Interpretation of Verbal Autopsy Data: The New InterVA-4 Tool'. Global Health Action 5 (1): 19281. https://doi.org/10.3402/gha.v5i0.19281.
· Byass, Peter, Laith Hussain-Alkhateeb, Lucia D'Ambruoso, et al. 2019. 'An Integrated Approach to Processing WHO-2016 Verbal Autopsy Data: The InterVA-5 Model'. BMC Medicine 17 (1): 102. https://doi.org/10.1186/s12916-019-1333-6.
· Chandramohan, Daniel, Edward Fottrell, Jordana Leitao, et al. 2021. 'Estimating Causes of Death Where There Is No Medical Certification: Evolution and State of the Art of Verbal Autopsy'. Global Health Action 14 (sup1): 1982486. https://doi.org/10.1080/16549716.2021.1982486.
· D'Ambruoso, Lucia, Kathleen Kahn, Ryan G. Wagner, et al. 2016. 'Moving from Medical to Health Systems Classifications of Deaths: Extending Verbal Autopsy to Collect Information on the Circumstances of Mortality'. Global Health Research and Policy 1 (1): 2. https://doi.org/10.1186/s41256-016-0002-y.
· D'Ambruoso, Lucia, Jessica Price, Eilidh Cowan, et al. 2021. 'Refining Circumstances of Mortality Categories (COMCAT): A Verbal Autopsy Model Connecting Circumstances of Deaths with Outcomes for Public Health Decision-Making'. Global Health Action 14 (sup1): 2000091. https://doi.org/10.1080/16549716.2021.2000091.
· Gareta, Dickman, Kathy Baisley, Thobeka Mngomezulu, et al. 2021. 'Cohort Profile Update: Africa Centre Demographic Information System (ACDIS) and Population-Based HIV Survey'. International Journal of Epidemiology 50 (1): 33-34. https://doi.org/10.1093/ije/dyaa264.
· Li, Zehang Richard, Jason Thomas, Eungang Choi, Tyler H. McCormick, and Samuel J. Clark. 2023. 'The openVA Toolkit for Verbal Autopsies'. The R Journal, February 25, 1.
· McCormick, Tyler H., Zehang Richard Li, Clara Calvert, Amelia C. Crampin, Kathleen Kahn, and Samuel J. Clark. 2016. 'Probabilistic Cause-of-Death Assignment Using Verbal Autopsies'. Journal of the American Statistical Association 111 (515): 1036-49. https://doi.org/10.1080/01621459.2016.1152191.
Population surveillance (death registration) and survey data (verbal autopsies)
Deceased individuals
v1.0.0
| Topic | Vocabulary | URI |
|---|---|---|
| Population Surveillance, Deaths, Causes of death, Verbal Autopsy, Circumstances of death, Harmonised data | Africa Health Research Institute | www.ahri.org |
AHRI's Health and Demographic Surveillance System (HDSS) in uMkhanyakude District, KwaZulu-Natal, South Africa. From 2000 to 2016, the HDSS covered an area of 438 km² with a population of approximately 85,000 people. It expanded in 2017 to cover an area of 845 km² and approximately 150,000 participants (Gareta et al. 2021).
AHRI.HDSS.DEATHS-Cause-of-death-2000-2024.v1: All participants of AHRI's HDSS deceased between 01-01-2000 and 31-12-2024
AHRI.HDSS.VA-detailed-input-data-2000-2024.v1: All deceased in AHRI.HDSS.DEATHS-Cause-of-death-2000-2024.v1 with a complete verbal autopsy
| Name | Affiliation |
|---|---|
| Ariane Sessego | Africa Health Research Institute, French Institute of Demographic Studies (INED), Ecole des Hautes Etudes en Sciences Sociales (Centre Maurice Halbwachs) |
| Siyabonga Nxumalo | Africa Health Research Institute |
| Dickman Gareta | Africa Health Research Institute |
| Dr. Alison Castle | Africa Health Research Institute |
| Dr. Guy Harling | Africa Health Research Institute |
| Prof. Mark Siedner | Africa Health Research Institute |
| Prof. Janet Seeley | Africa Health Research Institute |
| Prof. Collins Iwuji | Africa Health Research Institute |
| Prof. Willem Hanekom | Africa Health Research Institute |
| Dr. Kobus Herbst | Africa Health Research Institute |
| Name |
|---|
| Africa Health Research Institute |
| Name | Abbreviation | Role |
|---|---|---|
| Wellcome Trust | WT | Core funding |
| SAPRIN | SAPRIN | |
| Ecole des Hautes Etudes en Sciences Sociales | EHESS | |
| French Institute of Demographic Studies | INED | |
| Name | Affiliation | Role |
|---|---|---|
| Sweetness Dube | Africa Health Research Institute | Data Documentation |
| AHRI Data Collection Team | Africa Health Research Institute | Data Collection |
| AHRI Community Engagement Team | Africa Health Research Institute | Community Engagement |
| Nompumelelo Mkwanazi | Africa Health Research Institute | Community Engagement |
| We thank the household members of the HDSS. We acknowledge the support of the study and field staff at the Africa Health Research Institute, particularly the verbal autopsy nurses, who made this study possible | Africa Health Research Institute | Data Documentation |
| We would also like to thank the openVA team who developed and implemented the tools used here to estimate causes of death and provided support during part of the production process. In particular, we would like to thank Jason Thomas, Sam Clark, Zehang Richard Li and Tyler McCormick | Africa Health Research Institute | Data Documentation |
This dataset is not based on a sample, it contains information from the complete demographic surveillance area.
| Start | End | Cycle |
|---|---|---|
| 2000-01-15 | 2025-07-28 | Data collection dates |
| 2000-01-01 | 2024-12-31 | Time period: deaths that occurred |
Data collection mode
Face-to-face between January 2000 and April 2021.
From May 2021 to 2024, the first three interview attempts are made by telephone. If they are unsuccessful, further attempts are made face to face.
The mode of data collection is documented with the variable VAInterview for each death.
Notes on Data Collection
AHRI's HDSS monitors vital events (births, deaths, in- and out-migration) in the rural district of uMkhanyakude, KwaZulu-Natal, South Africa (Gareta et al. 2021). When a death is recorded, a trained nurse conducts a VA with the caregiver of the deceased, after a bereavement period of at least 3 months.
From 2000 to 2016, verbal autopsies were collected through paper-based questionnaires. In 2017, the data collection process was digitalized. Different questionnaires were used during the period, with four significant changes (2000-2009, 2010-2016, 2017-2021, 2022-2024). For more details, see the harmonization report and Questionnaires in the resource files.
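Because comparability differs across questionnaire versions, analysts may want to tag each record with its questionnaire period. The sketch below maps a year to the four periods listed above and to the paper or digital collection format. It assumes the periods apply to the year in which the VA was collected, which this description does not state explicitly; the function names are illustrative only.

```python
# The four main questionnaire periods listed in this documentation.
QUESTIONNAIRE_PERIODS = [(2000, 2009), (2010, 2016), (2017, 2021), (2022, 2024)]


def questionnaire_period(year):
    """Return the questionnaire period (e.g. '2010-2016') containing `year`."""
    for start, end in QUESTIONNAIRE_PERIODS:
        if start <= year <= end:
            return f"{start}-{end}"
    raise ValueError(f"year {year} is outside the documented periods")


def collection_format(year):
    """Return 'paper' for 2000-2016 and 'digital' from 2017 onwards."""
    questionnaire_period(year)  # validates that the year is documented
    return "paper" if year <= 2016 else "digital"


periods = [questionnaire_period(y) for y in (2000, 2016, 2017, 2024)]
formats = [collection_format(y) for y in (2016, 2017)]
```

A stratification variable of this kind can be used in sensitivity analyses that compare cause-of-death patterns before and after the 2017 transition.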
Reference: Gareta, Dickman, Kathy Baisley, Thobeka Mngomezulu, et al. 2021. 'Cohort Profile Update: Africa Centre Demographic Information System (ACDIS) and Population-Based HIV Survey'. International Journal of Epidemiology 50 (1): 33-34. https://doi.org/10.1093/ije/dyaa264.
Harmonization Process
These datasets are the result of the harmonization of data collected across 25 years and four different verbal autopsy questionnaires. To ensure transparency and reproducibility, this was carried out using a structured data processing framework implemented in a Julia package, VAConvert. This framework is based on correspondence tables that systematically document which variables from the historical data are used to produce the harmonized input format required by InterVA5 and InSilicoVA (see the harmonization report for more details).
These correspondence tables are available in the resource files (Mapping files) and the code used for harmonization is available on GitHub (https://github.com/AHRIORG/VAConvert.jl/releases/tag/release_v.1).
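The general idea of a correspondence-table approach can be sketched as follows. This is not the VAConvert implementation (which is written in Julia) and does not reproduce the layout of the published mapping files: the table structure, the source and target variable names, and the numeric codes are all hypothetical and serve only to show the principle.

```python
# Hypothetical correspondence table: one row per harmonized target variable,
# naming the historical source variable and how its codes are translated.
CORRESPONDENCE = [
    {"target": "fever", "source": "q12_fever", "recode": {"1": "y", "2": "n", "9": "-"}},
    {"target": "cough", "source": "q15_cough", "recode": {"1": "y", "2": "n", "9": "-"}},
]


def harmonise_record(record, correspondence):
    """Build a harmonized record from a historical one using the table.

    Assumption: a source value that is missing or has no recode entry is
    mapped to '-' (I don't know or not applicable).
    """
    harmonised = {}
    for row in correspondence:
        raw_value = record.get(row["source"])
        harmonised[row["target"]] = row["recode"].get(raw_value, "-")
    return harmonised


historical_record = {"q12_fever": "1", "q15_cough": "9"}
harmonised_record = harmonise_record(historical_record, CORRESPONDENCE)
```

Keeping the mapping in a table rather than in code means each questionnaire version can have its own table while the conversion logic stays the same, which is what makes the process auditable.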
Cause-of-death Estimation
Causes of death were estimated using the R package openVA (Li et al. 2023). Default parameters were used for InterVA5 and InSilicoVA. For InSilicoVA, the seed used was 19800508 (date of the declaration of smallpox eradication!). The detailed code is available on GitHub (https://github.com/AHRIORG/VAConvert.jl/releases/tag/release_v.1, src/CoD_attribution.R).
Unlike InterVA5, InSilicoVA does not have pre-defined rules to select the top causes of death (Byass et al. 2012). For this dataset, InSilicoVA's top causes of death were selected according to the following criteria (similar to InterVA's rules): the cause is considered “Undetermined” if the most probable cause has a mean posterior probability <= 40%; otherwise the most probable cause is selected as the first cause. A second and a third cause are selected if their mean posterior probability is greater than or equal to 20%. We included the 95% posterior credible interval (CI) for each selected cause. Note that, for a small minority of deaths, the mean posterior probability may lie outside its CI because of skewed probability distributions.
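The selection criteria above can be expressed as a short function. This is a sketch of the stated rule, not the project's own code (which is in src/CoD_attribution.R). The input is a hypothetical mapping from cause label to mean posterior probability for one death, and it assumes that no second or third cause is reported when the death is classified as "Undetermined".

```python
def select_top_causes(mean_probabilities, first_threshold=0.40, other_threshold=0.20):
    """Select up to three top causes for one death.

    `mean_probabilities` maps each cause label to its mean posterior
    probability. The result is ['Undetermined'] when the most probable
    cause does not exceed `first_threshold`.
    """
    ranked = sorted(mean_probabilities.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] <= first_threshold:
        # Assumption: no further causes are listed for undetermined deaths.
        return ["Undetermined"]
    selected = [ranked[0][0]]
    # Second and third causes are kept only if they reach the 20% threshold.
    for cause, probability in ranked[1:3]:
        if probability >= other_threshold:
            selected.append(cause)
    return selected


two_causes = select_top_causes({"Cause A": 0.55, "Cause B": 0.25, "Cause C": 0.15, "Cause D": 0.05})
undetermined = select_top_causes({"Cause A": 0.40, "Cause B": 0.35, "Cause C": 0.25})
three_causes = select_top_causes({"Cause A": 0.45, "Cause B": 0.30, "Cause C": 0.25})
```

Note that the first threshold is exclusive (a probability of exactly 40% yields "Undetermined") while the second is inclusive (exactly 20% is selected), following the wording of the criteria.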
Reference:
· Byass, Peter, Daniel Chandramohan, Samuel J. Clark, et al. 2012. 'Strengthening Standardised Interpretation of Verbal Autopsy Data: The New InterVA-4 Tool'. Global Health Action 5 (1): 19281. https://doi.org/10.3402/gha.v5i0.19281.
· Li, Zehang Richard, Jason Thomas, Eungang Choi, Tyler H. McCormick, and Samuel J. Clark. 2023. 'The openVA Toolkit for Verbal Autopsies'. The R Journal, February 25, 1.
Access to the data requires accurate completion of the online data access application form accessible on the AHRI Data Repository (https://data.ahri.org/). Data users are required to abide by the data use conditions stipulated on the application for access to the data. Failure to do so may result in their data access privileges being revoked by the Data Custodian. In order to recognise the effort and intellectual contributions of AHRI investigators in producing and curating the data, users of AHRI data must acknowledge the source of the data, abide by the terms and conditions under which the data is accessed, and cite the dataset in publications using the citation provided as part of this documentation. All analytical datasets published on the AHRI Data Repository are assigned digital object identifiers (DOIs), which can be found on the Data Repository under Study Description tab - Access policy. AHRI data users are required to always cite the dataset using the relevant DOI.
Sessego, A., Nxumalo, S., Gareta, D., Castle, A., Harling, G., Siedner, M., Seeley, J., Iwuji, C., Hanekom, W., & Herbst, K. (2026). AHRI:Health and Demographic Surveillance System: Deaths and Causes of Death from Verbal Autopsies (2000-2024) [Data set]. Africa Health Research Institute.
DOI: https://doi.org/10.23664/AHRI.VERBALAUTOPSY.HARMONISATION
DDI.AHRI.VerbalAutopsy.Harmonisation
| Name | Abbreviation |
|---|---|
| Africa Health Research Institute | AHRI |