Development of Lip Reading Method From Video Using Deep Learning

Please use this identifier to cite or link to this item: http://ithesis-ir.su.ac.th/dspace/handle/123456789/5322

Full metadata record

DC Field	Value	Language
dc.contributor	Aekapob JITTAKOTI	en
dc.contributor	เอกภพ จิตตโคติ	th
dc.contributor.advisor	SOPON PHUMEECHANYA	en
dc.contributor.advisor	โสภณ ผู้มีจรรยา	th
dc.contributor.other	Silpakorn University	en
dc.date.accessioned	2024-08-13T06:44:52Z	-
dc.date.available	2024-08-13T06:44:52Z	-
dc.date.created	2024
dc.date.issued	28/6/2024
dc.identifier.uri	http://ithesis-ir.su.ac.th/dspace/handle/123456789/5322	-
dc.description.abstract	This thesis presents a method for improving lip reading performance by analyzing keyframes with a CNN and an LSTM working together, combining image-based feature learning with sequential learning. Training on the entire raw dataset does not yield satisfactory results, so both the number of frames and the choice of which frames to learn from directly affect the model's performance. A frame selection method is proposed that uses the Mediapipe face detection library in Python. The experiments are divided into three main groups: selecting 3, 5, and 10 frames. Each group is run with full-lip image frames and with half-lip image frames; the half-lip option is based on the hypothesis that the human body is symmetric between left and right, and it halves the input size so that the performance of the two options can be compared. This lip reading approach has not been attempted in previous research. Lip reading can aid speech recovery from heavily corrupted audio-video files and can facilitate communication for hearing-impaired individuals. The experiments use the AVDigits database, an English-language database whose participants are native and non-native English speakers from 16 nationalities. The results show that the proposed models, together with the keyframe selection process, significantly improve lip reading performance for both full-lip and half-lip images, achieving high and comparable results.	en
dc.description.abstract	วิทยานิพนธ์ฉบับนี้ได้นำเสนอวิธีการพัฒนาประสิทธิภาพของการอ่านริมฝีปากผ่านการวิเคราะห์เฟรมสำคัญโดยใช้ CNN และ LSTM ที่ทำงานร่วมกันซึ่งเป็นการใช้คุณลักษณะของการเรียนรู้แบบรูปภาพร่วมกับคุณลักษณะการเรียนรู้แบบลำดับขั้น หากต้องการเพิ่มประสิทธิของการอ่านริมฝีปากการใช้ชุดข้อมูลดิบทั้งหมดไม่สามารถให้ผลลัพธ์ที่ดีได้ ดังนั้นการเลือกจำนวนเฟรมและเฟรมที่เหมาะสมต่อการเรียนรู้จะส่งผลต่อประสิทธิภาพของแบบจำลองโดยตรง โดยวิธีการเลือกเฟรมได้ถูกนำเสนอผ่านไลบรารี่การตรวจจับใบหน้าของ Mediapipe บนโปรแกรมภาษา Python โดยการศึกษาได้มีการแบ่งการทดลองออกเป็น 3 กลุ่มหลัก นั่นคือ การเลือกจำนวนเฟรมที่ 3 5 และ 10 เฟรม อีกทั้งการเลือกเฟรมดังกล่าวยังแบ่งออกเป็นการเลือกแบบเฟรมเต็มปากและการเลือกแบบเฟรมครึ่งปาก โดยมีที่มาจากสมมติฐานเรื่องของความสมมาตรทางด้านร่างกายซ้ายและขวาของมนุษย์ อีกทั้งยังแสดงถึงการลดขนาดของอินพุตลงครึ่งนึงและเปรียบเทียบประสิทธิภาพของผลลัพธ์ที่ได้ ซึ่งเป็นการนำเสนอวิธีการวิธีการอ่านริมฝีปากที่ไม่มีงานวิจัยใดเคยทำมาก่อน โดยวัตถุประสงค์ของการอ่านริมฝีปากนั้น สามารถช่วยด้านการกู้ข้อมูลคำพูดจากไฟล์วิดีโอที่มีเสียงรบกวนจำนวนมาก รวมถึงการสื่อสารของผู้พิการทางการได้ยินด้วยเช่นกัน ในส่วนของฐานข้อมูลใช้ฐานข้อมูลที่ชื่อ AVDigits ซึ่งเป็นฐานข้อมูลภาษาอังกฤษที่มีการรวบรวมอาสาสมัครที่เป็นเจ้าของภาษาและไม่ใช่เจ้าของภาษากว่า 16 สัญชาติ โดยผลลัพธ์ทีได้จากการศึกษานี้พบว่า แบบจำลองที่ได้นำเสนอรวมถึงขั้นตอนของการเลือกเฟรมสำคัญทำให้ประสิทธิภาพของการอ่านริมฝีปากทั้งแบบเต็มปากและครึ่งปากให้ผลลัพธ์อยู่ในระดับที่สูงและมีความใกล้เคียงกัน	th
dc.language.iso	th
dc.publisher	Silpakorn University
dc.rights	Silpakorn University
dc.subject	การอ่านริมฝีปาก	th
dc.subject	โครงข่ายประสาทเทียมแบบคอนโวลูชัน	th
dc.subject	หน่วยความจำสั้นยาว	th
dc.subject	Lip Reading	en
dc.subject	Convolutional Neural Network	en
dc.subject	Long Short-Term Memory	en
dc.subject.classification	Engineering	en
dc.subject.classification	Information and communication	en
dc.subject.classification	Electronics and automation	en
dc.title	Development of Lip Reading Method From Video Using Deep Learning	en
dc.title	การพัฒนาวิธีการอ่านริมฝีปากจากภาพเคลื่อนไหวโดยใช้การเรียนรู้เชิงลึก	th
dc.type	Thesis	en
dc.type	วิทยานิพนธ์	th
dc.contributor.coadvisor	SOPON PHUMEECHANYA	en
dc.contributor.coadvisor	โสภณ ผู้มีจรรยา	th
dc.contributor.emailadvisor	phumeechanya_s@su.ac.th
dc.contributor.emailcoadvisor	phumeechanya_s@su.ac.th
dc.description.degreename	Master of Engineering (M.Eng.)	en
dc.description.degreename	วิศวกรรมศาสตรมหาบัณฑิต (วศ.ม)	th
dc.description.degreelevel	Master's Degree	en
dc.description.degreelevel	ปริญญาโท	th
dc.description.degreediscipline	ELECTRICAL ENGINEERING	en
dc.description.degreediscipline	วิศวกรรมไฟฟ้า	th
Appears in Collections:	Engineering and Industrial Technology
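
The abstract describes two input-reduction steps: keeping only 3, 5, or 10 frames per clip, and optionally keeping half of each lip image. The sketch below illustrates the shape bookkeeping only. It assumes the lip region has already been cropped (the thesis uses Mediapipe for this; that step is not reproduced here), and it uses evenly spaced sampling as a stand-in, because the record does not state the actual keyframe selection criterion. All function names are hypothetical.

```python
from typing import List

import numpy as np


def select_frame_indices(num_frames: int, k: int) -> List[int]:
    """Pick k evenly spaced frame indices (a stand-in for keyframe selection)."""
    if num_frames <= 0 or k <= 0:
        raise ValueError("num_frames and k must be positive")
    if k >= num_frames:
        # Clip is shorter than the requested count: keep every frame.
        return list(range(num_frames))
    return np.linspace(0, num_frames - 1, num=k).round().astype(int).tolist()


def half_lip(frames: np.ndarray) -> np.ndarray:
    """Keep the left half of each lip crop; frames has shape (T, H, W[, C])."""
    width = frames.shape[2]
    return frames[:, :, : width // 2]


def build_input(clip: np.ndarray, k: int, use_half: bool) -> np.ndarray:
    """Select k frames from a clip of lip crops, optionally halving the width."""
    selected = clip[select_frame_indices(clip.shape[0], k)]
    return half_lip(selected) if use_half else selected


if __name__ == "__main__":
    # Hypothetical clip: 30 grayscale lip crops of 32 x 64 pixels.
    clip = np.zeros((30, 32, 64), dtype=np.float32)
    for k in (3, 5, 10):
        full = build_input(clip, k, use_half=False)
        half = build_input(clip, k, use_half=True)
        print(k, full.shape, half.shape)
```

A tensor of this shape (frames, height, width) is the kind of sequence a per-frame CNN followed by an LSTM would consume; the half-lip variant carries half as many pixels per frame.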

Files in This Item:

File	Description	Size	Format
640920030.pdf		11.93 MB	Adobe PDF
