Evaluations of Health Attitude-related Twitter Messages on Covid-19 Widespread Situation for Sentiment Analysis Using Supervised Machine Learning Algorithms

Please use this identifier to cite or link to this item: http://ithesis-ir.su.ac.th/dspace/handle/123456789/4478

Title:	Evaluations of Health Attitude-related Twitter Messages on Covid-19 Widespread Situation for Sentiment Analysis Using Supervised Machine Learning Algorithms
Authors:	Thewan THIANWAN; VERAYUTH LERTNATTEE (LERTNATTEE_V@su.ac.th); Silpakorn University
Keywords:	Sentiment analysis; Supervised machine learning; Covid-19 vaccine; Twitter
Issue Date:	4
Publisher:	Silpakorn University
Abstract:	This research investigated supervised machine learning models for classifying Twitter messages that express opinions about Covid-19 vaccines, in terms of effectiveness, adverse events, and sentiment towards each vaccine available in Thailand. We collected and labeled 1,843 Covid-19 vaccine-related messages and split them into a 90% training dataset and a 10% testing dataset. We then removed unnecessary words and symbols and tokenized the messages. Features were extracted with the TF-IDF method and selected with the k-best method. The learning models were developed in Python using the support vector machine (SVM) algorithm, with a One-vs-Rest classifier for multi-label classification and a One-vs-One classifier for multi-class classification. We trained the models on the training dataset with 10-fold cross-validation and measured their performance with the micro-averaged F1 (micro-F1) score. We then tuned the model parameters to improve performance and selected the best-performing model to predict an unlabeled dataset. The best multi-label model, covering vaccine type, vaccine effectiveness, and adverse events, achieved a micro-F1 score of 91.31% on the training dataset and 90.80% on the testing dataset. The best multi-class model, covering sentiment, achieved a micro-F1 score of 82.20% on the training dataset and 81.11% on the testing dataset. Applied to an unlabeled dataset of 8,070 messages, the models classified 5,643 messages as concerning vaccine effectiveness and 3,255 messages as concerning vaccine-related adverse events. By sentiment, 3,229 messages were positive, 2,228 were negative, and 2,613 were neutral.
In conclusion, supervised machine learning could be applied to classify Covid-19 vaccine-related messages, and the resulting classification could be applied to proposing public health policies.
URI:	http://ithesis-ir.su.ac.th/dspace/handle/123456789/4478
Appears in Collections:	Pharmacy
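
Illustrative sketch (not taken from the thesis): the pipeline described in the abstract, TF-IDF features, k-best selection, an SVM wrapped in One-vs-Rest for multi-label targets and in One-vs-One for multi-class targets, 10-fold cross-validation, and micro-F1 scoring, could look roughly like the following in Python with scikit-learn. Everything specific here is an assumption: the choice of scikit-learn, the chi-squared scoring function for k-best, the linear kernel, k=10, and the synthetic messages with the made-up vaccine names "vaxa" and "vaxb". The real study used labeled tweets, its own tokenization, and tuned parameters that the record does not list.

```python
from itertools import product

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC

# Synthetic stand-in data: every name and phrase below is made up.
VACCINES = ["vaxa", "vaxb"]
ASPECTS = {
    "effectiveness": ["protects well", "prevents infection"],
    "adverse_event": ["caused fever", "gave me a headache"],
}
SENTIMENTS = {
    "positive": ["happy", "grateful"],
    "negative": ["angry", "worried"],
    "neutral": ["reported", "noted"],
}

texts, topic_sets, sentiments = [], [], []
for vaccine, (aspect, phrases), (sentiment, words) in product(
    VACCINES, ASPECTS.items(), SENTIMENTS.items()
):
    for phrase, word in product(phrases, words):
        texts.append(f"{word} that {vaccine} {phrase}")
        topic_sets.append({vaccine, aspect})  # several labels per message
        sentiments.append(sentiment)          # exactly one class per message

# Multi-label targets become a binary indicator matrix, one column per label.
binarizer = MultiLabelBinarizer()
topics = binarizer.fit_transform(topic_sets)


def build_model(wrapper, k=10):
    """TF-IDF features -> chi-squared k-best selection -> wrapped linear SVM."""
    return make_pipeline(
        TfidfVectorizer(),
        SelectKBest(chi2, k=k),  # chi2 is an assumed scoring function
        wrapper(SVC(kernel="linear")),  # the kernel is an assumption
    )


def evaluate(model, targets):
    """Return (mean 10-fold CV micro-F1 on the 90% split, micro-F1 on the 10% split)."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, targets, test_size=0.1, random_state=0
    )
    cv_scores = cross_val_score(model, x_train, y_train, cv=10, scoring="f1_micro")
    model.fit(x_train, y_train)
    test_score = f1_score(y_test, model.predict(x_test), average="micro")
    return cv_scores.mean(), test_score


multi_label_scores = evaluate(build_model(OneVsRestClassifier), topics)
multi_class_scores = evaluate(build_model(OneVsOneClassifier), sentiments)
print("multi-label (One-vs-Rest):", multi_label_scores)
print("multi-class (One-vs-One):", multi_class_scores)
```

Micro-averaging pools true positives, false positives, and false negatives over all labels before computing F1, so frequent labels weigh more than rare ones. Scores on this synthetic data say nothing about the figures reported in the abstract.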

Files in This Item:

File	Description	Size	Format
620820019.pdf		3.23 MB	Adobe PDF