Original Article

Selecting a Machine Learning Model Based on Natural Language Processing for Shanghanlun (傷寒論) Diagnostic System Classification

2022 Vol. 14 No. 1
Young-Nam Kim

Department of Biomedical Life Science, Graduate School of Public Health Science, Yonsei University

Abstract

Objectives : The purpose of this study is to identify the machine learning algorithm best suited to classifying the Shanghanlun diagnostic system using natural language processing (NLP).

Methods : A total of 201 records were collected from 『Shanghanlun』 and 『Clinical Shanghanlun』. The ‘Taeyangbyeong-gyeolhyung’ and ‘Eumyangyeokchahunobokbyeong’ categories were excluded to avoid oversampling and undersampling. The data were preprocessed with the Twitter Korean tokenizer and used to train logistic regression, ridge regression, lasso regression, naive Bayes classifier, decision tree, and random forest models. Each model was evaluated by accuracy.
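
The tokenize, train, and score pipeline described in the Methods can be illustrated with a minimal sketch of one of the six models, a multinomial naive Bayes classifier, written in plain Python. This is not the study's implementation. The whitespace `tokenize` function is a stand-in for the Twitter Korean tokenizer, the class labels `A` and `B` and the short symptom strings are hypothetical placeholders, and add-one (Laplace) smoothing is an assumed choice.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Stand-in for the Twitter Korean tokenizer: plain whitespace split.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(tokenize(doc))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(doc):
                if tok in self.vocab:              # skip tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, docs, labels):
    # Fraction of documents whose predicted label matches the true label.
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy data, not taken from the study's 201 records.
train_docs = [
    "fever chills headache",
    "headache stiff neck chills",
    "thirst sweating constipation",
    "constipation thirst high fever",
]
train_labels = ["A", "A", "B", "B"]
test_docs = ["chills headache", "thirst constipation"]
test_labels = ["A", "B"]

model = NaiveBayes().fit(train_docs, train_labels)
test_accuracy = accuracy(model, test_docs, test_labels)
print(test_accuracy)
```

The same `accuracy` function could score any of the other five models, provided each exposes a `predict` method with the same shape.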

Results : Ridge regression and the naive Bayes classifier each achieved an accuracy of 0.843, logistic regression and random forest each achieved 0.804, the decision tree achieved 0.745, and lasso regression achieved 0.608.
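
Accuracy is the number of correct predictions divided by the number of test samples, so the four reported values constrain the possible size of the test set. The abstract does not state the train/test split. The sketch below is therefore only a consistency check, assuming all six models were scored on the same held-out set and the accuracies were rounded to three decimals. The function name `smallest_test_size` is illustrative.

```python
def smallest_test_size(accuracies, max_n=1000, tol=0.0005):
    # Return the smallest n for which every reported accuracy equals
    # (some integer count) / n to within three-decimal rounding.
    for n in range(1, max_n + 1):
        if all(abs(round(a * n) / n - a) < tol for a in accuracies):
            return n
    return None


reported = [0.843, 0.804, 0.745, 0.608]
n = smallest_test_size(reported)
correct_counts = [round(a * n) for a in reported]
print(n, correct_counts)  # 51 [43, 41, 38, 31]
```

Under these assumptions, the smallest consistent test set has 51 samples, roughly a quarter of the 201 records, with 43, 41, 38, and 31 correct predictions respectively.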

Conclusions : Of the six models tested, ridge regression and the naive Bayes classifier are the most suitable NLP machine learning models for Shanghanlun diagnostic system classification.

Key words : Artificial intelligence, Machine learning, Natural language processing, Shanghanlun, Diagnostic system
