Harnessing the power of machine learning in stroke prediction

Xueting (Mimi) Ding is a PhD candidate in public health whose research focuses on advanced statistical and machine learning techniques in medical data analysis.

Breaking New Ground in Healthcare Analytics 

As a public health researcher focusing on stroke and related outcomes, I’ve always been fascinated by the complexity of medical data, which hold extensive information about health conditions, symptoms, and treatments across various settings. What often struck me was how traditional analysis methods struggle to handle this wealth of information, especially when they must account for different types of disease risk factors and their numerous subcategories. I noticed that these methods oversimplified detailed patient histories into basic binary indicators, missing patterns that are crucial for predicting disease outcomes such as stroke recurrence.

From Challenge to Innovation  

This limitation became more apparent as I worked with electronic health records from healthcare organizations across the United States. To document diagnoses, healthcare providers use standardized ICD codes that capture the detailed aspects of medical conditions. For example, hypertension alone has multiple ICD-10 codes representing different types and stages of the condition, and each code has its own distinct implications for patient care. The same complexity exists for other conditions like diabetes and heart disease. However, traditional data aggregation methods often reduce these rich medical records to simple yes/no flags for broad disease categories, like “hypertension: yes”. This reduction can overlook critical information about disease progression and severity.
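To make the contrast concrete, here is a minimal Python sketch of the two styles of aggregation. It is only an illustration, not the aggregation method proposed in our study: the patient records are invented, the helper names are hypothetical, and grouping by three-character ICD-10 category is just one simple way to keep more detail than a yes/no flag.

```python
from collections import Counter

# Hypothetical patient records (illustrative only): lists of ICD-10 diagnosis codes.
patients = {
    "patient_a": ["I10", "E11.9"],           # essential hypertension, type 2 diabetes
    "patient_b": ["I11.0", "I12.9", "I10"],  # several distinct hypertensive diagnoses
}

# Three-character ICD-10 categories in the hypertensive diseases block (assumed scope).
HYPERTENSION_PREFIXES = ("I10", "I11", "I12", "I13", "I15", "I16")


def binary_flag(codes):
    """Conventional aggregation: 1 if any hypertension code is present, else 0."""
    return int(any(code.startswith(HYPERTENSION_PREFIXES) for code in codes))


def category_counts(codes):
    """Finer aggregation: count hypertension codes per three-character category."""
    return Counter(
        code[:3] for code in codes if code.startswith(HYPERTENSION_PREFIXES)
    )


for name, codes in patients.items():
    print(name, binary_flag(codes), dict(category_counts(codes)))
```

Both hypothetical patients receive the same flag, “hypertension: yes”, even though their records differ; the category counts preserve that difference for a downstream model.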

This gap between the rich data available and our limited ability to use it effectively motivated me and my colleagues to seek a better solution. I recently served as co-corresponding author on a study published in the journal Discover Public Health. Under the guidance of Dr. Bernadette Boden-Albala, and with expertise from co-authors Yang Meng and Liner Xiang, this study explored how innovative machine learning techniques can revolutionize the way we process and analyze medical data.

Our results showed that adopting new data analysis techniques improved our ability to predict stroke recurrence. By combining our proposed data aggregation method with a random forest classifier, we achieved 84.2% accuracy in predicting whether a patient might experience another stroke within 30 days of the initial stroke diagnosis. This approach significantly outperformed conventional methods. Beyond accuracy, it also demonstrated an excellent balance between sensitivity and specificity.
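For readers less familiar with these metrics, the short Python sketch below shows how accuracy, sensitivity, and specificity are computed from a set of predictions. The labels are made up for illustration and are not data or results from our study; the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels (1 = recurrence)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # recurrences correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-recurrences correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed recurrences
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical labels for eight patients (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity is the share of true recurrences the model catches, and specificity is the share of non-recurrences it correctly clears. A model can score high accuracy while doing poorly on one of the two, which is why the balance between them matters.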

Over the course of this study, I concluded that the way we process health information directly affects our ability to predict health outcomes. Beyond organizing information, advanced aggregation methods have the potential to reveal hidden patterns in patient histories, capture condition severity, and account for complex interactions among risk factors. This is particularly critical in cardiovascular health research, where multiple comorbidities interact in intricate ways to influence stroke risk among diverse populations.

Looking to the Future 

Working on this project opened new directions for me in public health research. The success of machine learning in handling complex medical data has shown me that emerging analytical techniques could revolutionize how we approach disease prediction and prevention. I am more convinced than ever that we need to be proactive in exploring innovative approaches to public health challenges, because better prediction tools are not just technical achievements. They represent real opportunities to improve patient care, especially for communities with less access to advanced healthcare.

This experience has also inspired me to think bigger about my own research goals. I aspire to push the boundaries of what’s possible with health data analysis. Beyond improving prediction models, I want to develop stroke prevention strategies that work effectively for people from all backgrounds and communities. This means creating analytical tools that can account for the complex ways that health disparities, access to care, and social determinants of health interact with medical conditions. My research will continue to focus on leveraging advanced analytical techniques for health data to understand these relationships better, ultimately working toward more equitable health outcomes.

Acknowledgments

We express our sincere gratitude to the late Dr. Bruce Albala for his valuable insights and contributions to our research question and analysis design. We also deeply appreciate Dr. Annie Qu for her substantial constructive feedback on the initial draft of this manuscript. Their expertise and guidance were instrumental in shaping this study.