Abstrak/Abstract |
Background:
Parsing free-text medical notes
into structured data such as disease names, drugs,
procedures, and other important medical information
first requires detecting medical entities. It is
important for an Electronic Medical Record (EMR) to
have structured data with semantic interoperability to
serve as a seamless communication platform whenever a
patient migrates from one physician to another. However,
in free-text notes, medical entities are often expressed
using informal abbreviations. An informal abbreviation
is a non-standard or undetermined abbreviation, written
in diverse styles, which may hinder semantic
interoperability between EMR systems. Therefore,
detecting informal abbreviations is required to tackle
this issue.
Objectives:
We attempt to achieve highly reliable
detection of informal abbreviations made in diverse
writing styles.
Methods:
In this study, we apply the Long Short-Term Memory (LSTM) model to detect informal
abbreviations in free-text medical notes. Additionally,
we use sliding windows to tackle the limited data issue
and a sample generator for the class imbalance issue,
while introducing additional pre-trained features (bag
of words and word2vec vectors) to the model.
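
The sliding-window and sample-generation steps can be illustrated with a minimal sketch. It is not the implementation used in this study: the window size of 5, the `<PAD>` token, the binary labels (1 = informal abbreviation), the use of random oversampling as the sample generator, and the example note are all assumptions made for illustration, and the function names `sliding_windows` and `oversample_minority` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# A training sample: the context window around a token and that token's label.
Window = Tuple[List[str], int]


def sliding_windows(tokens: Sequence[str], labels: Sequence[int],
                    size: int = 5, pad: str = "<PAD>") -> List[Window]:
    """Build one fixed-size window per token, centred on that token.

    Assumption: an odd window size and a padding token at the note edges.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the window has a centre token")
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    half = size // 2
    padded = [pad] * half + list(tokens) + [pad] * half
    return [(padded[i:i + size], labels[i]) for i in range(len(tokens))]


def oversample_minority(samples: List[Window], seed: int = 0) -> List[Window]:
    """Duplicate randomly chosen minority-class windows until classes balance.

    Assumption: random oversampling stands in for the sample generator.
    """
    rng = random.Random(seed)
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(samples)  # nothing to duplicate
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(samples) + extra


# Hypothetical tokenised note; "hdche" plays the role of an informal abbreviation.
tokens = ["pt", "c/o", "hdche", "x3", "days"]
labels = [0, 0, 1, 0, 0]

windows = sliding_windows(tokens, labels)
balanced = oversample_minority(windows)
```

Each token yields its own training sample, so a short note produces as many samples as it has tokens, and the balanced list could then be encoded and passed to a sequence model such as an LSTM.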
Results:
The LSTM model was able to detect informal
abbreviations with a precision of 93.6%, a recall of 57.6%,
and an F1-score of 68.9%.
Conclusion:
Our method was able to recognize
informal abbreviations with high precision using a small
data set. The detection can be used to recognize
informal abbreviations in real time while the physician
is typing them and to raise appropriate indicators for
confirming the meaning of the informal abbreviation, thus
increasing semantic interoperability. |