Administrative Supplement to Support Collaborations to Improve AI/ML-Readiness
Funded Grant
PROJECT SUMMARY The goal of this project is to create a unique and comprehensive research repository of aging trajectory datasets, related resources, and analytic methods that can be used to answer new and important questions in aging and related sciences. Specifically, by harmonizing and merging multiple datasets, this project will generate the data infrastructure needed to understand change over time in care settings, geriatric syndromes, physical functioning, and shared risk factors. These risk factors operate at multiple levels (patient, provider, community, healthcare system, and society) and across multiple domains (biological, behavioral, sociocultural, and physical/built environments), and include chronic conditions, history of acute illness such as COVID-19, exposure to air pollution, and neighborhood socioeconomic and healthcare system factors (Aim 1). Analytic strategies will be developed for user-defined cohorts and their propensity score-matched controls, e.g., older adults living with chronic conditions such as Alzheimer's disease and related dementias (ADRD), diabetes, heart failure, end-stage renal disease, metastatic cancer, and HIV. State-of-the-art analytic methods will be used to identify patterns of aging trajectories (care setting, geriatric syndromes, physical functioning) experienced by older adults during the final years of life and their association with shared risk factors and distal outcomes (Aim 2). From the trajectory file assembled in Aim 1, cohorts are derived by aligning each individual's records to an originating index time, such as an age cutoff or the time of diagnosis (e.g., ADRD, stroke, chronic kidney disease). Both a model-based approach and machine learning algorithms are then used to discover multilevel and potentially interactive predictors of trajectories (e.g., rapid functional decline in beneficiaries living independently) and of specific outcomes (e.g., respiratory ventilator use among Medicare beneficiaries diagnosed with COVID-19) (Aim 3).
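Aligning records to an originating index time can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Observation` record, the `align_to_index` helper, and the single numeric `value` field are hypothetical, and dropping all pre-index observations is an assumption made here for simplicity (a real analysis might keep a look-back window).

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class Observation:
    beneficiary_id: str
    obs_date: date
    value: float  # hypothetical measure, e.g. a physical-functioning score


def align_to_index(
    observations: List[Observation],
    index_dates: Dict[str, date],
) -> Dict[str, List[Tuple[int, float]]]:
    """Re-express each cohort member's observations as (days since index, value)."""
    aligned: Dict[str, List[Tuple[int, float]]] = {}
    for obs in observations:
        index = index_dates.get(obs.beneficiary_id)
        if index is None:
            continue  # no index event (e.g. never diagnosed): not in this cohort
        offset = (obs.obs_date - index).days
        if offset < 0:
            continue  # this sketch drops pre-index history
        aligned.setdefault(obs.beneficiary_id, []).append((offset, obs.value))
    for trajectory in aligned.values():
        trajectory.sort()  # chronological order relative to the index time
    return aligned


if __name__ == "__main__":
    records = [
        Observation("A", date(2020, 3, 1), 2.0),
        Observation("A", date(2020, 1, 15), 1.0),
        Observation("A", date(2019, 12, 1), 0.0),  # before index: dropped
        Observation("B", date(2021, 6, 1), 3.0),   # no index date: excluded
    ]
    # Hypothetical index date, e.g. date of ADRD diagnosis for beneficiary A
    print(align_to_index(records, {"A": date(2020, 1, 1)}))
    # {'A': [(14, 1.0), (60, 2.0)]}
```

Using an age cutoff instead of a diagnosis date only changes how `index_dates` is built (e.g., each beneficiary's date of reaching that age); the alignment step itself stays the same.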
These unique resources, including datasets, documentation, source code, and methodology, are then disseminated (Aim 4). By the end of this project, the research infrastructure for investigating the relationship between shared risk factors and aging trajectories will be ready to use and replicate, giving investigators unprecedented ability to address new challenges in aging science. It will allow researchers to understand the underlying processes and systems associated with reversible periods of disability across care settings, and to identify interventions that may support recovery of function and reduce geriatric syndromes, including cognitive decline, with the goals of reducing burdensome care transitions and maintaining functional independence. This project will also create the resources and methods needed to evaluate the impact of innovations and interventions implemented at the patient, provider, community, healthcare system, and society/policy levels to improve care quality and outcomes for older adults.
SUMMARY This application is for an administrative supplement (revision) to an existing award, R33AG068931, "Advanced Development and Utilization of Assembled Aging Trajectory Files from Multiple Datasets." The goal of the parent study is to create a comprehensive research repository of aging trajectory datasets and to demonstrate its utility for aging research at Rutgers University through four specific aims: 1) harmonizing and merging multiple datasets to generate the data infrastructure needed to understand change over time in care settings, geriatric syndromes, physical functioning, and shared risk factors at multiple levels and across multiple domains; 2) developing state-of-the-art analytic methods to identify patterns of aging trajectories experienced by older adults during the final years of life and their association with shared risk factors and distal outcomes; 3) discovering multilevel and potentially interactive predictors of trajectories and of specific outcomes using both model-based approaches and machine learning algorithms; and 4) disseminating the resources generated, including datasets, documentation, source code, and methodology. For the supplement, new work in the CMS Virtual Research Data Center (VRDC) will create AI/ML-ready datasets, workflows, and source code for data cleaning and pre-processing, breaking down the silos between researchers working in the VRDC and those working in institutional data enclaves at universities. Because data harmonization procedures must be customized to the server architecture and resources of each data warehouse, VRDC-specific workflows and code are needed to ensure timely access and reproducibility.
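As a rough illustration of the harmonize-and-merge step, the sketch below renames columns from two sources into one shared schema and then joins them on beneficiary and year. Everything named here is a hypothetical placeholder: the source labels, the column names in `SOURCE_MAPS`, and the helpers `harmonize` and `merge_on_keys` are not actual Medicare, VRDC, or project identifiers, and plain lists of dicts stand in for whatever warehouse tooling a VRDC-specific workflow would actually use.

```python
from typing import Dict, List, Sequence

# Hypothetical per-source mappings from source column names to a shared schema.
SOURCE_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"bene_id": "id", "yr": "year", "adl_total": "adl"},
    "source_b": {"patient": "id", "year": "year", "zip5": "zip"},
}


def harmonize(source: str, rows: List[dict]) -> List[dict]:
    """Rename columns to the shared schema and drop columns with no mapping."""
    mapping = SOURCE_MAPS[source]
    return [{mapping[k]: v for k, v in row.items() if k in mapping} for row in rows]


def merge_on_keys(
    left: List[dict], right: List[dict], keys: Sequence[str] = ("id", "year")
) -> List[dict]:
    """Inner-join two harmonized record lists on the shared key columns.

    Assumes each key combination appears at most once in `right`.
    """
    lookup = {tuple(r[k] for k in keys): r for r in right}
    merged = []
    for row in left:
        match = lookup.get(tuple(row[k] for k in keys))
        if match is not None:
            merged.append({**row, **match})
    return merged


if __name__ == "__main__":
    a = harmonize("source_a", [{"bene_id": "A", "yr": 2020, "adl_total": 3, "extra": 1}])
    b = harmonize("source_b", [
        {"patient": "A", "year": 2020, "zip5": "08901"},
        {"patient": "B", "year": 2020, "zip5": "08902"},
    ])
    print(merge_on_keys(a, b))
    # [{'id': 'A', 'year': 2020, 'adl': 3, 'zip': '08901'}]
```

Keeping the source-to-schema mapping in a table, rather than in the merge logic, is one way to isolate the parts that must change per data warehouse from the parts that can be shared.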
In this project, data are made AI/ML-ready in four stages: 1) the cohort of patients to be studied is defined, and key variables for the inclusion and exclusion criteria are selected; 2) data pre-processing steps include data cleaning, data annotation, formatting, taxonomy standardization, variable transformation, data rescaling/normalization, variable aggregation, variable decomposition, and variable selection, with a focus on variables relevant to measuring and reducing health disparities and improving minority health; 3) feature extraction and engineering include generating derived variables (e.g., intercept, slope, average) from irregularly spaced individual trajectories; and 4) Medicare datasets are merged with publicly available data to add socioeconomic and environmental context, and relationships among data variables are mapped to produce a final AI/ML-ready dataset. Supplement Aim. Develop and implement code for data pre-processing, data fine-tuning and precision, missing data imputation, data connectivity, and fully established hierarchical relationships for the AI/ML framework to interactively model late-life aging trajectories and selected outcomes in a cohort of Medicare beneficiaries. Completion of this work will contribute to the NIH vision of a modernized and integrated biomedical data ecosystem that adopts the latest data science technologies and best-practice guidelines, including the FAIR (findable, accessible, interoperable, reusable) principles and open-source development.
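Derived trajectory variables such as intercept, slope, and average can be sketched with an ordinary least-squares line fitted to each individual's (time, value) points. Because the fit uses the actual observation times, unequal gaps between visits are handled directly. This is an assumed approach for illustration, not necessarily the method the project uses; the `trajectory_features` helper is hypothetical, and times are assumed to already be expressed relative to an index time (e.g., days since diagnosis), so the intercept is the fitted value at that index.

```python
from statistics import fmean
from typing import Dict, List, Tuple


def trajectory_features(points: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize one irregularly spaced trajectory of (time, value) points.

    Fits value = intercept + slope * time by ordinary least squares.
    """
    if len(points) < 2:
        raise ValueError("need at least two observations to estimate a slope")
    times = [t for t, _ in points]
    values = [v for _, v in points]
    t_bar, v_bar = fmean(times), fmean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("observations must occur at two or more distinct times")
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in points)
    slope = sxy / sxx
    return {
        "intercept": v_bar - slope * t_bar,  # fitted value at time 0 (the index time)
        "slope": slope,                      # fitted change in value per unit of time
        "average": v_bar,                    # mean of the observed values
    }


if __name__ == "__main__":
    # Visits at 0, 30, and 90 days after the index time (unequal spacing)
    print(trajectory_features([(0, 1.0), (30, 2.0), (90, 4.0)]))
    # approximately: intercept 1.0, slope 0.0333 per day, average 2.333
```

Each beneficiary's trajectory collapses to a fixed-length row of features, which is the shape most AI/ML libraries expect as input.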