Developing Framework for Consistency Evaluation of Article Writing Style using Machine Learning Techniques

Please use this identifier to cite or link to this item: http://ithesis-ir.su.ac.th/dspace/handle/123456789/5891

Full metadata record

DC Field	Value	Language
dc.contributor	Pagon GATCHALEE	en
dc.contributor	ภากร กัทชลี	th
dc.contributor.advisor	Sajjaporn Waijanya	en
dc.contributor.advisor	สัจจาภรณ์ ไวจรรยา	th
dc.contributor.other	Silpakorn University	en
dc.date.accessioned	2025-08-14T06:49:25Z	-
dc.date.available	2025-08-14T06:49:25Z	-
dc.date.created	2025
dc.date.issued	4/7/2025
dc.identifier.uri	http://ithesis-ir.su.ac.th/dspace/handle/123456789/5891	-
dc.description.abstract	This dissertation develops and presents a framework for evaluating the consistency of articles against content marketing principles, focusing on Thai-language content about China. The analysis covers four aspects: Timeliness, Intention, Emotion, and Storytelling vs. Translation style. For classifying content as Timely or Timeless, the WangchanBERTa model performed best, with 93.00% accuracy on the test set and an F1-score of 92.00%. Its tokenizer, trained on large-scale Thai data, helped improve the model’s ability to analyze complex content. For classifying content intention under the PIE framework (Persuade, Inform, Entertain), PhayaThaiBERT gave the best results, with a Micro F1-score of 88.74%, a Macro F1-score of 84.14%, and the lowest Hamming Loss at 12.14%. Although there was minor overfitting, the model still performed well when tested on new data. Adding emotional context features did not clearly affect PhayaThaiBERT’s performance but did improve WangchanBERTa’s, which suggests a link between emotional context and content intention. For emotion classification across 8 emotion categories, WangchanBERTa performed best, with a Micro F1-score of 78.35%, a Macro F1-score of 55.74%, and the lowest Hamming Loss at 10.27%, indicating strong performance with imbalanced data. For Storytelling vs. Translation classification, the Random Forest model, an ensemble learning model using feature engineering, performed best, with 92.55% accuracy and F1-scores of 92.77% for the Translation class and 92.31% for the Storytelling class. All models were integrated into a web application. In real-world usage testing, experts and content creators rated the system at an average of 4.0 out of 5.0 for both satisfaction and feasibility of practical use. Experts also recommended applying the tool according to four specific types of writing: general writing, translated content, news articles, and real-time content. Future research should expand the size and variety of the dataset and build domain-specific corpora to further improve classification and content analysis performance.	en
dc.description.abstract	วิทยานิพนธ์นี้มีวัตถุประสงค์เพื่อพัฒนาและนำเสนอแนวทางการสร้างเครื่องมือประเมินความสอดคล้องของบทความตามหลักการตลาดเชิงเนื้อหา (Content Marketing) โดยเน้นเนื้อหาภาษาไทยที่เกี่ยวข้องกับประเทศจีน ซึ่งครอบคลุมการวิเคราะห์ใน 4 ด้าน ได้แก่ ความทันต่อเวลา (Timeliness) จุดประสงค์ในการสื่อสาร (Intention) การสื่อสารอารมณ์ (Emotion) และรูปแบบการเขียนการเล่าเรื่องหรืองานแปล (Storytelling vs. Translation) จากผลการวิจัยพบว่า โมเดล WangchanBERTa มีประสิทธิภาพสูงสุดในการจำแนกเนื้อหาตามกระแส (Timely–Timeless) โดยมี Accuracy บนชุดทดสอบ 93.00% และ F1-score อยู่ที่ 92.00% ด้วยความสามารถของ tokenizer ที่ผ่านการฝึกอบรมด้วยข้อมูลภาษาไทยขนาดใหญ่ช่วยเพิ่มความแม่นยำในการวิเคราะห์เนื้อหาที่ซับซ้อนส่วนการจำแนกจุดประสงค์ของเนื้อหาตามกรอบ PIE (Persuade, Inform, Entertain) โมเดล PhayaThaiBERT ให้ผลลัพธ์ที่ดีที่สุด โดยมีค่า Micro F1-score เท่ากับ 88.74%, Macro F1-score เท่ากับ 84.14% และ Hamming Loss ต่ำที่สุดที่ 12.14% แม้พบปัญหาการเกิด Overfitting แต่ยังคงทำงานได้ดีในการทดสอบกับข้อมูลใหม่ นอกจากนี้ยังพบว่า การเพิ่มฟีเจอร์อารมณ์ ในโมเดล PhayaThaiBERT ไม่มีผลต่อประสิทธิภาพอย่างชัดเจน แต่กลับส่งผลเชิงบวกต่อโมเดล WangchanBERTa แสดงถึงความสัมพันธ์ระหว่างบริบทอารมณ์และจุดประสงค์เนื้อหา โดยสำหรับ การจำแนกอารมณ์ พบว่า โมเดล WangchanBERTa มีศักยภาพสูงสุด ในการประมวลผลอารมณ์ 8 กลุ่ม โดยมีค่า Micro F1-score เท่ากับ 78.35%, Macro F1-score เท่ากับ 55.74% และ Hamming Loss ต่ำที่สุดที่ 10.27% ซึ่งแสดงถึงความแม่นยำสูงในการจัดการกับข้อมูลที่ไม่สมดุลทางอารมณ์ ส่วน การจำแนกรูปแบบงานแปลและการเล่าเรื่อง พบว่า โมเดล Random Forest ที่ใช้เทคนิค Ensemble และ Feature Engineering มีประสิทธิภาพสูงสุด โดยมีค่า Accuracy เท่ากับ 92.55% F1-score สำหรับคลาส Translation เท่ากับ 92.77% และสำหรับ Storytelling เท่ากับ 92.31% เมื่อนำโมเดลทั้งหมดมาพัฒนาเป็นเว็บแอปพลิเคชัน โดยทดสอบการใข้งานจริงได้รับการประเมินจากผู้เชี่ยวชาญและผู้มีส่วนร่วมในการผลิตสื่อเนื้อหา พบว่าระบบมีความเพึงพอใจและความเป็นไปได้ในการนำไปใช้จริง ได้คะแนนเฉลี่ย 4.0 จาก 5.0 คะแนน ทั้งสองด้าน 
ผู้เชี่ยวชาญยังแนะนำให้ประยุกต์ใช้ตามลักษณะเฉพาะของงานเขียน 4 ประเภท ได้แก่ งานเขียนทั่วไป งานเขียนแบบแปล งานเขียนข่าวสาร และงานเขียนที่ต้องการความทันต่อเหตุการณ์แบบเรียลไทม์ ทั้งนี้ งานวิจัยในอนาคตควรขยายขนาดและความหลากหลายของชุดข้อมูล รวมถึงการสร้างคลังข้อมูลเฉพาะทางเพื่อเพิ่มประสิทธิภาพในการจำแนกและวิเคราะห์เนื้อหาต่อไป	th
dc.language.iso	th
dc.publisher	Silpakorn University
dc.rights	Silpakorn University
dc.subject	การตลาดเชิงเนื้อหา	th
dc.subject	การเรียนรู้ของเครื่อง	th
dc.subject	การเรียนรู้เชิงลึก	th
dc.subject	สไตล์บทความ	th
dc.subject	เนื้อหาจีน	th
dc.subject	Content Marketing	en
dc.subject	Machine Learning	en
dc.subject	Deep Learning	en
dc.subject	Content Style	en
dc.subject	Chinese Content	en
dc.subject.classification	Computer Science	en
dc.subject.classification	Information and communication	en
dc.subject.classification	Computer science	en
dc.title	Developing Framework for Consistency Evaluation of Article Writing Style using Machine Learning Techniques	en
dc.title	การพัฒนาเฟรมเวิร์กเพื่อประเมินการเขียนบทความที่สอดคล้องกับต้นแบบ ด้วยเทคนิคการเรียนรู้ของเครื่อง	th
dc.type	Thesis	en
dc.type	วิทยานิพนธ์	th
dc.contributor.coadvisor	Sajjaporn Waijanya	en
dc.contributor.coadvisor	สัจจาภรณ์ ไวจรรยา	th
dc.contributor.emailadvisor	waijanya_s@silpakorn.edu
dc.contributor.emailcoadvisor	waijanya_s@silpakorn.edu
dc.description.degreename	Doctor of Philosophy (Ph.D.)	en
dc.description.degreename	ปรัชญาดุษฎีบัณฑิต (ปร.ด.)	th
dc.description.degreelevel	Doctoral Degree	en
dc.description.degreelevel	ปริญญาเอก	th
dc.description.degreediscipline	COMPUTER SCIENCE	en
dc.description.degreediscipline	คอมพิวเตอร์	th
Appears in Collections:	Science

Files in This Item:

File	Description	Size	Format
620730008.pdf		10.15 MB	Adobe PDF
