News analysis for the detection of cyber security issues in digital healthcare

A text mining approach to uncover actors, attack methods and technologies for cyber defense


  • Markus Bertl



Digital Healthcare, Cyber Crime, Text Mining, IBM Watson, OSINT, OSInfo, GDELT, Media Mining, Cyber Defense, Information Retrieval


Objectives: This research reviews the possibilities of text mining in the area of cybercrime in digital healthcare, showing how advanced information retrieval and natural language processing can be used to gain insights. The aim is to mine news data to find out what is reported about digital healthcare, which security-critical events have occurred, and which actors, attack methods, and technologies play a role in them.
Methods: Several projects already apply text mining successfully in the cyber domain. However, none of them is specifically tailored to threats in the digital healthcare sector or uses a comparably large data foundation for analysis. To close that gap, text mining methods such as fact extraction and semantic fields were combined with statistical methods such as frequency, correlation, and trend calculations. The news data for the analysis was provided by the DocCenter of the National Defense Academy (DocCenter/NDA) of the Austrian Armed Forces. About 300,000 news articles were processed and analyzed. Additionally, the open source GDELT dataset was investigated.
Results & Conclusion: The data indicates that cyber threats are present in digital health technologies and that cyberattacks increasingly threaten organizations, governments, and individuals. Not only hacker groups, firms, and governments are involved in these attacks; terrorist organizations also use cyberwarfare. This, together with the amount of technology in digital healthcare (such as pacemakers, IoT devices, and wearables), the importance of healthcare as critical infrastructure, and the dependence on electronic health records, makes our society vulnerable.
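
As an illustration of the frequency and trend calculations named under Methods, the following minimal Python sketch counts how often a term appears per time period and fits a least-squares slope to those counts. The toy articles, the function names, and the choice of a linear slope as the trend measure are assumptions made for illustration only; they are not the data or the pipeline used in this research.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical toy corpus of (period index, headline) pairs.
# These are made-up examples, not articles from the DocCenter/NDA or GDELT data.
ARTICLES = [
    (0, "Hospital hit by ransomware attack"),
    (0, "New wearable tracks heart rate"),
    (1, "Ransomware locks electronic health records"),
    (1, "Ransomware group targets pacemaker vendor"),
    (2, "Ransomware and phishing hit clinics; ransomware payments rise"),
    (2, "Ransomware campaign targets hospital staff"),
]


def term_frequency_by_period(articles, term):
    """Return the count of `term` per period, ordered by period index."""
    counts = Counter()
    for period, text in articles:
        tokens = re.findall(r"[a-z]+", text.lower())  # naive word tokenizer
        counts[period] += tokens.count(term)
    return [counts[p] for p in sorted(counts)]


def trend_slope(values):
    """Least-squares slope of values against their position 0, 1, 2, ..."""
    if len(values) < 2:
        return 0.0  # no trend can be estimated from fewer than two points
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


freq = term_frequency_by_period(ARTICLES, "ransomware")
slope = trend_slope(freq)
print(freq, slope)
```

A positive slope suggests that mentions of the term are increasing over the observed periods. On a real corpus, the counts would usually be normalized by the number of articles per period before a trend is estimated.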