About Our Research
Our research explores the intricate interplay between biological, environmental, and social factors in determining health outcomes. We aim to bridge gaps in traditional biomedical research by integrating clinical, molecular, and social determinants of health (SDoH) data into comprehensive knowledge graphs. These efforts span multiple projects, each focused on addressing specific challenges within the healthcare domain, with the overarching goal of improving both individual and population-level health outcomes.
Integrating Biomedical and SDoH Knowledge
Our primary research initiative focuses on creating a holistic view of health by integrating biomedical data with social determinants of health (SDoH) information. Traditional biomedical knowledge graphs often neglect the critical influence of factors such as socioeconomic status, education, and employment. By incorporating these elements, we aim to uncover novel interactions and mechanisms underlying diseases, particularly those with complex etiologies like suicidality and PTSD.
This project leverages extensive datasets, including MIMIC-III medical records and PubMed abstracts, to construct knowledge graphs that represent the multifaceted pathways in human health. These graphs give researchers and healthcare professionals deeper insight into how biological, clinical, and social factors interact to influence health outcomes.
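To make the idea of a combined biomedical and SDoH graph concrete, here is a minimal sketch of one way such a graph could be stored as subject-relation-object triples. The `Triple` and `KnowledgeGraph` names, the `source` tag, and the example edges are all hypothetical placeholders chosen for illustration; they are not the project's schema or findings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of the graph. Field names are illustrative assumptions."""
    head: str
    relation: str
    tail: str
    source: str  # hypothetical provenance tag, e.g. "literature" or "ehr"


class KnowledgeGraph:
    """A tiny adjacency-set store for triples, indexed by head node."""

    def __init__(self):
        self._out = defaultdict(set)

    def add(self, triple):
        self._out[triple.head].add(triple)

    def neighbors(self, node):
        # Sorted tails of all edges leaving `node`; empty list if unknown.
        return sorted(t.tail for t in self._out.get(node, ()))


# Placeholder edges for illustration only, not results from the project.
kg = KnowledgeGraph()
kg.add(Triple("unemployment", "associated_with", "depression", "literature"))
kg.add(Triple("depression", "associated_with", "suicidality", "literature"))
kg.add(Triple("PTSD", "associated_with", "suicidality", "ehr"))

print(kg.neighbors("depression"))
```

Keeping a provenance tag on every edge is one possible design choice: it lets a reader of the graph tell literature-derived links apart from links derived from clinical records.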
PubMed Dataset
PubMed/PubMed Central is a large collection of abstracts and full-text articles covering the life sciences and biomedicine. The dataset contains 32 million documents spanning a diverse range of study types:
- 27.54% case reports
- 23.61% series of randomized clinical trials (RCTs)
- 21.05% cohort studies
- 17.49% cross-sectional studies
- 9.15% case-control studies
- 1.01% non-RCTs
- 0.15% pragmatic clinical trials (PCTs)
Each article entry in PubMed includes essential data elements such as the title, abstract, author names, affiliations, publication date, journal information, and citation details.
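As a rough illustration of the data elements listed above, an article entry could be held in a small record type like the one below. This is a sketch only: the class name `ArticleRecord`, its field names, and the `text_for_extraction` helper are assumptions made for this example and do not correspond to any official PubMed schema or API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ArticleRecord:
    """Hypothetical container for the fields of one article entry."""
    title: str
    abstract: str
    authors: List[str] = field(default_factory=list)
    affiliations: List[str] = field(default_factory=list)
    publication_date: Optional[date] = None
    journal: str = ""
    citation: str = ""

    def text_for_extraction(self) -> str:
        # Title and abstract joined into one string, e.g. as input to a
        # downstream entity or relation extraction step (an assumption here).
        return f"{self.title}. {self.abstract}".strip()


# Placeholder values, not a real article.
example = ArticleRecord(title="Example title", abstract="Example abstract text")
print(example.text_for_extraction())
```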
MIMIC Dataset
MIMIC (Medical Information Mart for Intensive Care) is a comprehensive, widely used, publicly available dataset that has contributed significantly to healthcare research and innovation. It comprises de-identified electronic health records (EHRs) of patients admitted to critical care units, providing a rich and diverse collection of clinical data. This includes:
- Vital signs
- Laboratory measurements
- Medications
- Procedures
- Clinical notes
- Demographic information
The MIMIC dataset has fostered numerous studies and advancements in critical care, clinical decision support systems, machine learning, and artificial intelligence in healthcare.
Collaboration with Veterans Health Administration (VHA)
In collaboration with the Veterans Health Administration (VHA), our research examines how social determinants of health (SDoH) influence mental health outcomes, particularly the risks of suicide and PTSD among veterans. Using VHA electronic health records (EHRs), we develop knowledge graphs that elucidate the contributions of SDoH to these critical health outcomes.
These knowledge graphs have significant applications at both the patient and population levels:
- At the patient level, they help predict individual risk, informing treatment decisions and improving care quality.
- At the population level, they identify risk factors that can guide the allocation of resources and the development of targeted interventions.
The ultimate goal of this collaboration is to develop clinical decision support tools that can be integrated into the EHR system, providing both patient-specific insights and aggregated data summaries to mental health providers and VA leadership.
VA Synthetic Data
A critical component of our work with the VHA involves the use of VA synthetic data, created by MDClone under a VA contract. This synthetic data is designed to mimic real-world patterns found in VA healthcare data while preserving patient privacy, allowing for comprehensive research while reducing the ethical and legal challenges associated with real patient data.
The use of synthetic data enables us to:
- Conduct privacy-preserving research that does not expose actual patient information.
- Access and analyze data that may not be readily available due to privacy concerns.
- Enhance datasets with underrepresented groups or outcomes, allowing for more inclusive research.
- Perform temporal analysis to study trends and patterns in healthcare data over time.
VA Data Use - Risks, Mitigations & Ethics Issues
While synthetic data offers numerous advantages, it also comes with certain risks and ethical considerations. For instance, combining VA synthetic data with external data requires explicit permission under the VA Synthetic Data License Agreement. To mitigate this risk, we plan to seek approval from VA legal counsel and adhere to strict transparency and safety protocols.
Our commitment to ethical research is further demonstrated through our use of literature reviews and the sharing of resources on data transparency. Key references guiding our approach include Gebru et al.'s "Datasheets for Datasets" and recent discussions on data and model transparency at NIH workshops.
Use Cases of the Knowledge Graph
The following use cases illustrate how our knowledge graph can be applied in healthcare. Each subgraph shows a set of interventions and the relationships among them, helping healthcare professionals make informed decisions and improve patient outcomes.
Subgraph Representing Methods of Suicide Prevention
Subgraph Representing PTSD Prevention
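A subgraph like those named above could, in principle, be pulled out of a larger graph by collecting every edge that leads to a target outcome within a few hops. The sketch below does this with a breadth-first search over incoming edges. The function name `subgraph_around`, the `max_hops` parameter, and the example edges are hypothetical placeholders, not the project's method or data.

```python
from collections import deque


def subgraph_around(edges, target, max_hops=2):
    """Return edges on paths that reach `target` in at most `max_hops` steps.

    `edges` is an iterable of (head, relation, tail) tuples.
    """
    # Index edges by the node they point to, so we can walk backwards.
    incoming = {}
    for head, relation, tail in edges:
        incoming.setdefault(tail, []).append((head, relation, tail))

    kept = []
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for edge in incoming.get(node, []):
            kept.append(edge)
            head = edge[0]
            if head not in seen:
                seen.add(head)
                queue.append((head, depth + 1))
    return kept


# Placeholder edges for illustration only.
edges = [
    ("intervention_A", "reduces_risk_of", "suicidality"),
    ("social_support", "reduces_risk_of", "suicidality"),
    ("housing_assistance", "improves", "social_support"),
    ("intervention_B", "reduces_risk_of", "PTSD"),
]

suicide_prevention = subgraph_around(edges, "suicidality", max_hops=2)
for edge in suicide_prevention:
    print(edge)
```

With `max_hops=1` only the direct interventions are returned; raising the limit also brings in upstream factors such as the placeholder `housing_assistance` node.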
Continual Learning
Continual learning is an essential part of our research. By continuously updating the knowledge graph with new data from sources such as PubMed and MIMIC-III, we keep our insights current and relevant. This ongoing process lets the system adapt to new discoveries and emerging trends in healthcare, with the aim of supporting better patient care and more effective interventions.
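One simple way to picture this updating process is an incremental merge: each new batch of extracted relations is folded into the existing graph, duplicates are skipped, and a per-edge evidence count grows as more records support it. The sketch below illustrates only that merge step. The `EvidenceGraph` class, the row layout, and the record IDs are assumptions made for this example, and the project's actual update procedure may differ.

```python
from collections import Counter


class EvidenceGraph:
    """Hypothetical store that tracks how many records support each edge."""

    def __init__(self):
        self.support = Counter()   # (head, relation, tail) -> supporting records
        self.seen_rows = set()     # rows already ingested, to avoid double counting

    def update(self, batch):
        """Merge rows of (record_id, head, relation, tail).

        Returns the number of edges that were not in the graph before.
        """
        new_edges = 0
        for record_id, head, relation, tail in batch:
            row = (record_id, head, relation, tail)
            if row in self.seen_rows:
                continue  # same record re-ingested; ignore
            self.seen_rows.add(row)
            edge = (head, relation, tail)
            if self.support[edge] == 0:
                new_edges += 1
            self.support[edge] += 1
        return new_edges


# Placeholder rows for illustration only.
graph = EvidenceGraph()
first = graph.update([("doc1", "X", "associated_with", "Y")])
second = graph.update([
    ("doc1", "X", "associated_with", "Y"),  # duplicate row, skipped
    ("doc2", "X", "associated_with", "Y"),  # new evidence for an existing edge
    ("doc3", "Y", "associated_with", "Z"),  # new edge
])
print(first, second, graph.support[("X", "associated_with", "Y")])
```

Tracking rows that have already been ingested makes the update idempotent, so re-running a batch does not inflate the evidence counts.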