Classifying Incomplete Data with Distinct Diagnostic Time Sequence of Temporal Medical Data

ศิลาจันทร์, เกล้ากัลยา; Silachan, Klaokanlaya

Please use this identifier to cite or link to this item: http://ithesis-ir.su.ac.th/dspace/handle/123456789/210

Title:	Classifying Incomplete Data with Distinct Diagnostic Time Sequence of Temporal Medical Data
Other Titles:	การจำแนกข้อมูลที่ไม่สมบูรณ์ด้วยลำดับเวลาของการตรวจโรคที่แตกต่างกันจากชุดข้อมูลลำดับเวลา
Authors:	ศิลาจันทร์, เกล้ากัลยา; Silachan, Klaokanlaya
Keywords:	Temporal data; temporal classifier; imputation; transform
Issue Date:	5-Aug-2016 (B.E. 2559)
Publisher:	Silpakorn University
Abstract:	This research had two objectives: (1) to develop methods for imputing missing values in temporal medical data sets, based on the idea that a patient's indicator values are specific to that individual or resemble those of similar individuals; and (2) to develop a method for transforming the representation of temporal medical data while preserving classification performance. Two data sets were used: 1,215 records of 458 patients examined for obesity at the Cardiovascular and Metabolic Center, Ramathibodi Hospital, and 3,010 records of 93 ischemic stroke patients from PKDD'02. For the first objective, the researcher developed six imputation methods for temporal data: NFDCs-DPimpute, NFDCsSlideW-DPimpute, CB-Extra-DPimpute, S-knn-DPimpute, SLLS-DPimpute, and DPimpute. Their imputation accuracy was evaluated with the normalized root mean square error (NRMSE), and NFDCs-DPimpute gave better imputation results than the other methods. For the second objective, the researcher proposed the Inner Distance Combination Transform (IDCT), which reduces data dimensionality by representing each temporal sequence as a single value. The resulting representations were used as classifier input and assessed by classification accuracy, then compared with representations obtained from statistical summaries: mean, median, standard deviation, and variance. IDCT gave better accuracy than these alternatives when classifying with a support vector machine. The selected methods were then combined into an algorithm for imputing missing values and reducing the dimensionality of temporal medical data, called NFDCs-DPimpute-IDTC.
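The two evaluation ingredients named in the abstract, the normalized root mean square error and the statistical single-value baselines, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the error is normalized, so the sketch assumes division by the variance of the true values, a convention often used for imputation; the function names `nrmse` and `summarize_sequence` are hypothetical; and the thesis's own imputation methods and IDCT transform are not reproduced here because the record does not describe them.

```python
import math
import statistics


def nrmse(true_values, imputed_values):
    """Normalized RMSE over the entries that were masked and then imputed.

    Assumption: the squared error is normalized by the population variance
    of the true values. The thesis may use a different normalization.
    """
    if len(true_values) != len(imputed_values):
        raise ValueError("both sequences must have the same length")
    if len(true_values) < 2:
        raise ValueError("need at least two values to compute a variance")
    variance = statistics.pvariance(true_values)
    if variance == 0:
        raise ValueError("true values are constant; NRMSE is undefined")
    mse = sum((t - p) ** 2 for t, p in zip(true_values, imputed_values))
    mse /= len(true_values)
    return math.sqrt(mse / variance)


def summarize_sequence(sequence):
    """Collapse one patient's sequence of visits into single-value baselines.

    These are the statistical representations the abstract compares against:
    mean, median, standard deviation, and variance (population versions).
    """
    return {
        "mean": statistics.fmean(sequence),
        "median": statistics.median(sequence),
        "std": statistics.pstdev(sequence),
        "variance": statistics.pvariance(sequence),
    }


if __name__ == "__main__":
    # Hypothetical indicator values for one patient across four visits.
    truth = [1, 2, 3, 4]
    print(nrmse(truth, [2.5, 2.5, 2.5, 2.5]))  # mean imputation
    print(summarize_sequence(truth))
```

Under this normalization, imputing every masked entry with the mean of the true values yields an NRMSE of 1, so values below 1 indicate an imputation that improves on that trivial baseline. The summary function shows why such baselines suit this data: patients have different numbers of visits, and each statistic maps a variable-length sequence to one fixed-size feature that a classifier can consume.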
Description:	53307803; Computer and Information Science program -- Klaokanlaya Silachan
URI:	http://ithesis-ir.su.ac.th/dspace/handle/123456789/210
Appears in Collections:	Science

Files in This Item:

File	Description	Size	Format
53307803.pdf		5.53 MB	Adobe PDF
