Regularization in High Dimensional Logistic Regression Model by using Adaptive LASSO Method

Please use this identifier to cite or link to this item: http://ithesis-ir.su.ac.th/dspace/handle/123456789/3898

Title:	Regularization in High Dimensional Logistic Regression Model by using Adaptive LASSO Method
Authors:	Wasurat KHUMPASEE (วสุรัตน์ ขำภาษี); Kannigarh Hirunkasi (กรรณิกาณ์ หิรัญกสิ); Silpakorn University. Science
Keywords:	High-dimensional data; Sparse Logistic Regression Model; Regularization; Adaptive LASSO; Initial Weights
Issue Date:	1
Publisher:	Silpakorn University
Abstract:	Regularized (penalized) logistic regression is widely used to estimate parameters in high-dimensional data. This research compared the coefficient-estimation and variable-selection performance of three Adaptive LASSO (least absolute shrinkage and selection operator) variants for logistic regression on high-dimensional sparse data: Adaptive LASSO with ridge initial weights, Adaptive LASSO with LASSO initial weights, and Adaptive LASSO with Stein-Ridge initial weights. The three variants were also compared with the standard LASSO under various conditions. The simulation used two sample sizes, n = 100 and n = 200; 2n, 3n, or 4n quantitative predictors plus 4 binary predictors; two relationship structures among the predictors; and 5, 10, or 15 quantitative predictors with nonzero regression coefficients. Each condition was simulated 500 times. Prediction accuracy was measured by the mean sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve. Estimation accuracy was measured by the mean squared error of the estimated logistic regression coefficients. Variable selection was assessed by the percentage of predictors with no effect that were included in the model and the percentage of predictors with an effect that were excluded from it. Adaptive LASSO with ridge initial weights performed best on all criteria when the sparse model had 5 quantitative predictors with nonzero coefficients, and Adaptive LASSO with Stein-Ridge initial weights performed best when it had 10 or 15, across all settings of the other parameters.
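The two-stage idea behind the ridge-weighted Adaptive LASSO named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes weights of the form w_j = 1 / (|b_j| + eps)^gamma taken from a ridge-penalized fit, and it swaps in a plain proximal gradient solver so that it runs with the Python standard library only. All function names, the penalty values (`lam`, `ridge`, `gamma`), and the toy design (n = 150, 8 predictors, 2 nonzero coefficients, so not the p > n setting of the study) are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_penalized_logistic(X, y, lam=0.0, l1_weights=None, ridge=0.0,
                           steps=400, lr=0.5):
    """Proximal gradient descent on
    mean log-loss + (ridge / 2) * sum(b_j^2) + lam * sum(w_j * |b_j|).
    The intercept is not penalized. Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    w = l1_weights if l1_weights is not None else [1.0] * p
    b0, b = 0.0, [0.0] * p
    for _ in range(steps):
        # Residuals p_i - y_i drive the gradient of the log-loss.
        resid = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n + ridge * b[j]
                for j in range(p)]
        b0 -= lr * sum(resid) / n
        # Gradient step on the smooth part, then weighted soft-thresholding.
        b = [soft_threshold(b[j] - lr * grad[j], lr * lam * w[j])
             for j in range(p)]
    return b0, b


def adaptive_weights(initial_coefs, gamma=1.0, eps=1e-8):
    """Assumed weight form: w_j = 1 / (|b_j| + eps)^gamma."""
    return [1.0 / (abs(bj) + eps) ** gamma for bj in initial_coefs]


def adaptive_lasso_logistic(X, y, lam, ridge=0.1, gamma=1.0):
    """Stage 1: ridge fit for initial weights. Stage 2: weighted L1 fit."""
    _, b_ridge = fit_penalized_logistic(X, y, ridge=ridge)
    weights = adaptive_weights(b_ridge, gamma)
    return fit_penalized_logistic(X, y, lam=lam, l1_weights=weights)


def simulate(n=150, p=8, seed=0):
    """Toy data: only the first two predictors have nonzero coefficients."""
    rng = random.Random(seed)
    beta = [2.0, -2.0] + [0.0] * (p - 2)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [1.0 if rng.random() < sigmoid(sum(bj * xj for bj, xj in zip(beta, row)))
         else 0.0 for row in X]
    return X, y


X, y = simulate()
intercept, coefs = adaptive_lasso_logistic(X, y, lam=0.02)
print([round(c, 3) for c in coefs])
```

Predictors with small ridge coefficients receive large weights and are therefore penalized more heavily in the second stage, which is what lets the method shrink them to exactly zero. Replacing the ridge fit in stage 1 with a LASSO fit would give the LASSO-weighted variant under the same assumptions; the Stein-Ridge weights are not sketched here because the abstract does not define them.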
Description:	Master of Science (M.Sc.)
URI:	http://ithesis-ir.su.ac.th/dspace/handle/123456789/3898
Appears in Collections:	Science

Files in This Item:

File	Description	Size	Format
61304202.pdf		5.27 MB	Adobe PDF
