Development of Thai Image Captioning Method Using Deep Learning

Please use this identifier to cite or link to this item: http://ithesis-ir.su.ac.th/dspace/handle/123456789/5321

Title:	Development of Thai Image Captioning Method Using Deep Learning การพัฒนาวิธีการสร้างคำบรรยายภาพภาษาไทยโดยใช้การเรียนรู้เชิงลึก
Authors:	Witchaphon TIEANCHO (วิชญ์พล เทียนชอ); SOPON PHUMEECHANYA (โสภณ ผู้มีจรรยา), Silpakorn University, phumeechanya_s@su.ac.th
Keywords:	Thai Captions; Traffic Dataset; Flickr8k Dataset; Convolutional Neural Networks (CNN); Bidirectional LSTM; BLEU Metric
Issue Date:	28
Publisher:	Silpakorn University
Abstract:	This thesis designs and develops a deep learning model that generates Thai image captions. A Convolutional Neural Network (CNN), such as VGG16 and others, extracts image features, and a Bidirectional LSTM generates the captions; the CNN acts as the encoder and the Bidirectional LSTM as the decoder. A Bidirectional LSTM is a variant of LSTM that lets the model learn in two directions, forward and backward, which helps the model learn and distinguish similar words and increases its memory capacity. Two datasets are used for training and testing. The first is Flickr8k, a public dataset of 8091 images, each with 5 English captions, which are first translated into Thai using Google Translate. Most of its images and captions relate to everyday life. The second is a custom-made traffic dataset of 429 images, each with 5 Thai captions. Its images and captions relate to road traffic, for example "A girl is walking across the road" and "A red light warns all cars and motorcycles to stop." This dataset was created in the hope that the research can later support a warning system for drivers and for other road users, not only drivers. Such a system would issue an audio alert once the model receives an input image, but building it is outside the scope of this thesis. The experiments combine the two datasets, both to observe results on general image description as well as on traffic scenes and because combining the datasets strengthens the model's learning. Finally, the captions generated by the model are evaluated against the reference captions using the BLEU metric.
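The BLEU evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's evaluation code: the function names (`ngram_counts`, `modified_precision`, `bleu`) are hypothetical, captions are assumed to be already split into tokens (Thai text has no spaces between words, so a separate word segmenter would be needed first), and no smoothing is applied. It computes clipped n-gram precision against several reference captions and applies the brevity penalty.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references."""
    cand_counts = ngram_counts(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing."""
    if not candidate:
        return 0.0
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    # Reference length closest to the candidate length (ties: the shorter one).
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - len(candidate)), length))
    cand_len = len(candidate)
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty * math.exp(log_mean)


# Illustrative tokens only, based on an example caption from the abstract.
reference = ["a", "girl", "was", "walking", "across", "the", "road"]
candidate = ["a", "girl", "walks", "across", "the", "road"]
score_bigram = bleu(candidate, [reference], max_n=2)
```

With five reference captions per image, as in both datasets, `references` would simply hold five token lists. An established implementation such as NLTK's `sentence_bleu` is usually preferable in practice; this sketch only shows what the metric measures.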
URI:	http://ithesis-ir.su.ac.th/dspace/handle/123456789/5321
Appears in Collections:	Engineering and Industrial Technology

Files in This Item:

File	Description	Size	Format
640920027.pdf		14.74 MB	Adobe PDF
