Arabic (Arabic: العربية, al-ʻarabīyah, or عربي, ʻarabī) is a Semitic language and the liturgical language of some 1.8 billion Muslims. All varieties of Arabic combined are spoken by perhaps as many as 422 million speakers (native and non-native). The largest differences between classical/standard Arabic and colloquial Arabic are the loss of morphological markings of grammatical case; changes in word order; an overall shift towards a more analytic morphosyntax; the loss of the previous system of grammatical mood, along with the evolution of a new system; the loss of the inflected passive voice, except in a few relic varieties; restriction in the use of the dual number; and (for most varieties) the loss of the feminine plural.
Medical and health services continue to develop in an intelligent direction, and applying machine learning technology in the medical industry has recently become one of the research hotspots in medical artificial intelligence. As an important part of medical documentation, medical imaging reports record the radiologist's summary of the imaging findings and contain a large number of descriptions of the lesions seen in the images. Extracting this information from the Arabic text of medical imaging reports and establishing the association between reports and images can better serve clinical information systems such as clinical decision-making and clinical data mining, while also reducing the workload of professional physicians.
However, most current medical imaging reports are stored as unstructured or semi-structured text, which makes it difficult for computers to extract valuable information directly from this irregular text data and hard for machine learning algorithms to further analyze and mine it. Accordingly, automatically and efficiently extracting the required information from medical imaging reports, forming structured data, and establishing the association between text descriptions and imaging features of lesions remains a great challenge in current intelligent medical services.
Considering the characteristics of the Arabic language and the lack of artificial intelligence techniques, especially natural language processing, for Arabic content in the medical area, and given that Arabic is becoming one of the most widely spoken languages in the world, this thesis focuses on extracting text feature tags from the Arabic content of medical imaging reports by proposing the Bi-directional CNN-LSTM-CRF machine learning model and designing mapping rules for structuralization. The main contributions of this thesis are as follows:
1. Analyzing the imaging reports and designing a structured report template. Drawing on relevant domestic and international research and on the guidance of professional physicians, this thesis summarizes the content and structure of mammography reports and designs a structured report template.
2. Designing the text feature tags of imaging reports. Clinical free text is characterized by diverse terminology and expression. This thesis designs unified Arabic text feature tags for the descriptions of lesions in imaging reports, which serve as the input of the machine learning model.
3. Using the proposed machine learning models to extract the Arabic text feature tags, all based on data provided by the top three hospitals in Riyadh, Saudi Arabia. This thesis iteratively trains the Bi-CNN-LSTM+CRF model, which achieved an accuracy of approximately 97%, higher than that of the Bi-RNN-LSTM+CRF model.
4. Comparing and integrating the proposed imaging report models based on their frameworks and parameter differences. The proposed machine learning models and structuralization mapping rules are encapsulated as functional modules, and an automated and scalable imaging report structuralization system is designed and implemented.
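To make the structuralization step more concrete, the following is a minimal Python sketch of how the tags predicted by a sequence labeling model could be mapped onto the fields of a structured report template. It is an illustration under stated assumptions, not the thesis's implementation: the BIO tagging scheme, the tag names (LESION, LOC, SIZE), the template field names, and the helper functions bio_to_spans and spans_to_record are all hypothetical, since the abstract does not list the actual tag set or mapping rules.

```python
from collections import defaultdict

# Hypothetical mapping rules: tag label -> field of the structured template.
# The real tag set and template fields are not given in the abstract.
FIELD_FOR_TAG = {
    "LESION": "lesion",
    "LOC": "location",
    "SIZE": "size",
}


def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (label, text) spans.

    A "B-X" tag opens a span, a matching "I-X" tag extends it, and any
    other tag ("O" or an I- tag that does not match the open span)
    closes the open span and is itself discarded.
    """
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)
        else:
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


def spans_to_record(spans, field_for_tag=FIELD_FOR_TAG):
    """Apply the mapping rules: collect span texts under template fields."""
    record = defaultdict(list)
    for label, text in spans:
        field = field_for_tag.get(label)
        if field is not None:  # labels without a rule are ignored
            record[field].append(text)
    return dict(record)


# Illustrative sentence: "a mass in the left breast, of size 2 cm".
tokens = ["كتلة", "في", "الثدي", "الأيسر", "بحجم", "2", "سم"]
tags = ["B-LESION", "O", "B-LOC", "I-LOC", "O", "B-SIZE", "I-SIZE"]

record = spans_to_record(bio_to_spans(tokens, tags))
print(record)
```

In this sketch the tagger and the mapping rules are kept separate, so the same rules could in principle be applied to the output of either of the compared models (Bi-CNN-LSTM+CRF or Bi-RNN-LSTM+CRF).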