Using Machine Learning for Detection of Illegal Food Advertising Text

Please use this identifier to cite or link to this item: http://ithesis-ir.su.ac.th/dspace/handle/123456789/3362

Title:	Using Machine Learning for Detection of Illegal Food Advertising Text การใช้การเรียนรู้ของเครื่องสำหรับตรวจจับข้อความโฆษณาอาหารที่ผิดกฎหมาย
Authors:	Wannakan NITIROTSUPHAPHAK (วรรณกัญญ์ นิธิโรจน์ศุภภัค); VERAYUTH LERTNATTEE (วีรยุทธ์ เลิศนที); Silpakorn University. Pharmacy
Keywords:	ข้อความโฆษณาอาหาร; การเรียนรู้ของเครื่อง; กฎหมาย; การจัดประเภทเอกสาร; food advertising text; machine learning; law; text classification
Issue Date:	10
Publisher:	Silpakorn University
Abstract:	The objective of this research is to find an appropriate machine learning model for classifying food advertising texts as legal or illegal. The dataset of 400 food advertising texts comprised 200 legal texts and 200 illegal texts. In preprocessing, irrelevant information that could be linked to product owners, such as advertising license numbers, trade names, and company names, was removed from the original texts. Thai word segmentation with the longest matching algorithm was then used to separate words in sentences and phrases, and a list of Thai stopwords was applied to remove unimportant words. Unigrams and bigrams were used as features in document vectors. Both the full feature set and a subset selected with the select-k-best method were used for creating and testing models. A set of programs was written in PHP with the PHP-ML machine learning library. Three supervised learning techniques were applied to create the models: support vector machine, k-nearest neighbors, and naïve Bayes. Using stratified random sampling, 80% of the collection, with equal proportions of legal and illegal texts, was used for creating models and the remaining 20% for testing them. Each test was performed 10 times, and the average F1 score was used as the performance indicator. The models with the highest average F1 for each learning technique were then used to create a web application for detecting illegal food advertising text, and the performance of each model was tested with 40 food advertising texts. The results showed that the support vector machine was the most effective classifier for food advertising text, with the highest F1 score of 0.990 when the model was built with the full set of unigram features after removing stopwords. In conclusion, machine learning techniques can be applied to classify food advertising texts as legal or illegal.
The model can be used to develop an application. Entrepreneurs and government inspectors can use the application to inspect food advertising texts, and illegal texts should be corrected before a request for permission is submitted. Consumers can verify advertising text from social media. Moreover, this method can also be applied to other types of advertisements.
Thai abstract (English translation): This research aims to find an appropriate model among machine learning techniques for classifying food advertising texts as legal or illegal. The researchers prepared a set of 400 food advertising texts, comprising 200 legal and 200 illegal samples. In data preparation, irrelevant information that could be linked to product owners, such as advertising license numbers, trade names, and company names, was removed from the advertising texts. Thai word segmentation with the longest matching algorithm was then used to split words in sentences and phrases. Next, a list of Thai stopwords was applied to remove unimportant words. Unigrams and bigrams were then used to build the features in document vectors. Both the full feature set and a subset were used to build and test models; the subset was selected with the select-k-best method. The PHP language and the PHP-ML machine learning library were used to build the set of programs. Three supervised learning techniques were used to build the models: support vector machine, k-nearest neighbors, and naïve Bayes. Using stratified sampling, 80% of the data, with equal proportions of legal and illegal texts, was used to build the models, and the remaining 20% was used to test them. Each test was run 10 times, and the average F1 score was used to indicate model performance. The models with the highest average F1 score for each learning technique were then used to build a program for detecting illegal advertising text, and the performance of each model was tested with 40 advertising texts. The results showed that the support vector machine was the most effective classifier of food advertising text, with an F1 score of 0.990 when using the full set of unigram features after stopword removal. In conclusion, machine learning techniques can be used to classify food advertising texts as legal or illegal. The model can be developed into an application. Entrepreneurs and government inspectors can use the application to check food advertising texts. Illegal texts should be corrected before a request for permission is submitted. Consumers can check advertising text from social media. In addition, this method can be applied to other types of advertising.
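The preprocessing steps named in the abstract (longest matching segmentation, stopword removal, then unigram and bigram features) can be illustrated with a minimal sketch. This is not the thesis's PHP/PHP-ML code: it is written in Python for illustration only, and the function names, the toy lexicon, and the one-word stopword list are all hypothetical. It assumes a greedy left-to-right match and that bigrams are formed after stopwords are removed.

```python
from collections import Counter


def segment_longest_match(text, lexicon):
    """Greedy left-to-right segmentation: at each position take the longest lexicon word."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                break
        else:
            j = i + 1  # no lexicon word starts here: emit a single character
        tokens.append(text[i:j])
        i = j
    return tokens


def ngram_features(tokens, stopwords, use_bigrams=True):
    """Count unigrams (and optionally bigrams) after dropping stopwords and whitespace."""
    kept = [t for t in tokens if t.strip() and t not in stopwords]
    features = Counter(kept)
    if use_bigrams:
        features.update(" ".join(pair) for pair in zip(kept, kept[1:]))
    return features


if __name__ == "__main__":
    # Hypothetical toy lexicon and stopword list, for illustration only.
    lexicon = {"อาหาร", "เสริม", "อาหารเสริม", "ลด", "น้ำหนัก", "ได้", "จริง"}
    stopwords = {"ได้"}
    tokens = segment_longest_match("อาหารเสริมลดน้ำหนักได้จริง", lexicon)
    print(tokens)
    print(ngram_features(tokens, stopwords))
```

Greedy longest matching depends entirely on the word list, so a production segmenter would need a full Thai dictionary and a policy for unknown words; the single-character fallback above is only one simple choice.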
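The evaluation protocol in the abstract (a stratified 80/20 split, repeated 10 times, with the F1 scores averaged) can be sketched as follows. Again this is a Python illustration under stated assumptions, not the thesis's implementation: the helper names and the `fit_predict` callback are hypothetical, and F1 is computed here for a single positive class, which may differ from the averaging the authors used.

```python
import random
from statistics import mean


def stratified_split(labels, train_frac=0.8, rng=random):
    """Return (train_idx, test_idx), taking train_frac of every class for training."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        indices = indices[:]
        rng.shuffle(indices)
        cut = round(len(indices) * train_frac)
        train += indices[:cut]
        test += indices[cut:]
    return train, test


def f1_score(y_true, y_pred, positive):
    """F1 for one positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def average_f1(docs, labels, fit_predict, positive, runs=10, seed=0):
    """Repeat the stratified split `runs` times and average the F1 scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        train, test = stratified_split(labels, 0.8, rng)
        predictions = fit_predict(
            [docs[i] for i in train],
            [labels[i] for i in train],
            [docs[i] for i in test],
        )
        scores.append(f1_score([labels[i] for i in test], predictions, positive))
    return mean(scores)
```

Here `fit_predict(train_docs, train_labels, test_docs)` stands for any classifier that trains on the first two arguments and returns predicted labels for the third. With 200 legal and 200 illegal texts, each split yields 320 training and 80 test texts, 40 of each class in the test set.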
Description:	Master of Pharmacy (M.Pharm) เภสัชศาสตรมหาบัณฑิต (ภ.ม.)
URI:	http://ithesis-ir.su.ac.th/dspace/handle/123456789/3362
Appears in Collections:	Pharmacy

Files in This Item:

File	Description	Size	Format
61363301.pdf		3.8 MB	Adobe PDF
