Improving Student Registration Data Quality Using Data Cleansing Techniques Incomplete and Incorrect Data: A PROTOTYPE-BASED EVALUATION

Please use this identifier to cite or link to this item: http://ithesis-ir.su.ac.th/dspace/handle/123456789/5949

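Title:	Improving Student Registration Data Quality Using Data Cleansing Techniques Incomplete and Incorrect Data: A PROTOTYPE-BASED EVALUATION; Thai title: การใช้เทคนิค Data Cleansing เพื่อปรับปรุงคุณภาพข้อมูลทะเบียนนักศึกษาและประเมินผลโดยระบบต้นแบบ ("Using Data Cleansing Techniques to Improve Student Registration Data Quality, with Evaluation through a Prototype System")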
Authors:	Anek RUNGNARAI (เอนก รุ่งนาไร่); Orawan Chaowalit (อรวรรณ เชาวลิต), Silpakorn University, ochaowalit@hotmail.com
Issue Date:	28
Publisher:	Silpakorn University
Abstract:	This research addresses data quality problems in the student registration system of a private university: duplicate records, missing values, invalid formats, and inconsistencies across tables. These issues directly affect administrative processes, policy decision-making, and academic reporting to regulatory authorities, putting the accuracy and reliability of academic information at risk. The study therefore designs and develops a systematic data cleansing process that raises the quality of student registration data to practical and institutional standards. The methodology is organized into four levels: (1) Rule-Based Cleaning, in which explicit logical rules detect and correct erroneous values; (2) Software-Assisted Cleaning, in which tools such as OpenRefine perform text clustering, format validation, and semi-automated corrections; (3) Machine Learning-Based Cleaning, in particular anomaly detection with algorithms such as One-Class SVM to identify abnormal or outlying records; and (4) Manual Cleaning, reserved for small-scale or complex cases that automated methods cannot fully handle. The overall process follows an ETL (Extract–Transform–Load) approach. Data were extracted from 21 tables in the legacy registration database and profiled to assess completeness, accuracy, and consistency; the selected cleansing techniques were then applied, and the data were re-evaluated after cleansing. Completeness rose from 66.4% to 100%, accuracy and validity were brought fully up to the required standards, and duplicates (uniqueness) and inconsistencies (consistency) were substantially reduced.
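The first level, rule-based cleaning, together with the completeness measure used during profiling, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the field names (`student_id`, `name`, `email`, `birth_date`), the 9-digit ID rule, the legacy DD/MM/YYYY date format, and the sample rows are all hypothetical, and completeness is assumed to mean the percentage of required cells that are non-empty.

```python
import re

# Hypothetical schema and rules: the abstract does not list the real columns
# of the 21 legacy tables or the exact rules the thesis applied.
REQUIRED = ("student_id", "name", "email", "birth_date")
ID_RULE = re.compile(r"^\d{9}$")                            # assumed 9-digit student ID
EMAIL_RULE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")
LEGACY_DATE = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{4})$")  # assumed DD/MM/YYYY


def completeness(rows, fields=REQUIRED):
    """Percentage of required cells that are non-empty after trimming."""
    cells = [(row.get(f) or "").strip() for row in rows for f in fields]
    if not cells:
        return 100.0
    return 100.0 * sum(1 for c in cells if c) / len(cells)


def clean_row(row):
    """Apply explicit correction rules, then report any remaining violations."""
    out = {k: (v or "").strip() for k, v in row.items()}
    out["name"] = " ".join(out.get("name", "").split())  # collapse repeated spaces
    out["email"] = out.get("email", "").lower()
    match = LEGACY_DATE.match(out.get("birth_date", ""))
    if match:  # rewrite DD/MM/YYYY as ISO 8601 (YYYY-MM-DD)
        day, month, year = match.groups()
        out["birth_date"] = f"{year}-{int(month):02d}-{int(day):02d}"
    issues = [f for f in REQUIRED if not out.get(f)]  # missing required values
    if out.get("student_id") and not ID_RULE.match(out["student_id"]):
        issues.append("student_id:format")
    if out.get("email") and not EMAIL_RULE.match(out["email"]):
        issues.append("email:format")
    return out, issues


def rule_based_clean(rows):
    """Return (clean_rows, review_queue); duplicate IDs keep the first row."""
    seen, clean, review = set(), [], []
    for row in rows:
        out, issues = clean_row(row)
        key = out.get("student_id")
        if key and key in seen:
            continue  # duplicate student_id: drop the later record
        if key:
            seen.add(key)
        if issues:
            review.append((out, issues))  # left for manual cleaning
        else:
            clean.append(out)
    return clean, review


SAMPLE = [  # hypothetical rows
    {"student_id": "660720001", "name": "  Somchai   Jaidee ",
     "email": "Somchai@Example.AC.TH", "birth_date": "5/3/2004"},
    {"student_id": "660720001", "name": "Somchai Jaidee",
     "email": "somchai@example.ac.th", "birth_date": "2004-03-05"},
    {"student_id": "66072", "name": "Suda", "email": "", "birth_date": ""},
]

if __name__ == "__main__":
    clean, review = rule_based_clean(SAMPLE)
    print(f"before: {completeness(SAMPLE):.1f}%")  # 10 of 12 cells filled
    print(f"after:  {completeness(clean):.1f}%")
    print("needs review:", [issues for _, issues in review])
```

On the sample, completeness is about 83.3% before cleaning and 100% for the rows that pass every rule. Rows that still break a rule after automatic correction go to a review queue instead of being silently dropped, in line with the abstract's idea that the remaining complex cases are handled by manual cleaning.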
The results indicate that the designed data cleansing process improves the quality of student registration data systematically and could be developed into a prototype for data management at other universities in the future. Its main benefits go beyond reducing errors in data handling: it also makes services to students and staff more efficient and strengthens confidence in using the data for executives' planning and strategic decision-making.
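For the software-assisted level, OpenRefine's text clustering groups spellings that probably refer to the same value. OpenRefine documents a key-collision method called "fingerprint": trim, lowercase, remove punctuation and control characters, fold accented Latin characters to ASCII, then split into tokens, sort them, and drop duplicates. The Python sketch below approximates that method for illustration only; it is neither OpenRefine's code nor the thesis's workflow, and the department names are made up. Because it strips punctuation and combining marks, it is intended for Latin-script columns; Thai-script fields would likely need a different keyer.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Key-collision fingerprint approximating OpenRefine's 'fingerprint' keyer."""
    s = value.strip().lower()
    # Fold accented Latin letters to ASCII by dropping combining marks
    # (this would also damage Thai vowel and tone marks).
    s = "".join(c for c in unicodedata.normalize("NFKD", s)
                if not unicodedata.combining(c))
    s = re.sub(r"[^\w\s]", "", s)    # remove punctuation and control characters
    tokens = sorted(set(s.split()))  # split on whitespace, dedupe, sort
    return " ".join(tokens)


def cluster(values):
    """Group distinct spellings that share a fingerprint (clusters of size > 1)."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

For example, `cluster(["Computer Science", "computer science.", "Science, Computer", "Mathematics"])` returns a single cluster holding the first three spellings. As in OpenRefine, a cluster is only a suggestion: a person still chooses the canonical value before the cells are merged.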
Appears in Collections:	Science

Files in This Item:

File	Size	Format
660720067.pdf	6.53 MB	Adobe PDF
