Build Process: Student Dropout Analysis

In late 2022, my team and I participated in the SSIP-22 State Level Hackathon, a challenge that brought together innovators to solve real-world problems. Our project, the "Student Dropout Analysis" platform, won the competition for our problem statement and later evolved into a published research paper.

This post breaks down our entire build process, from initial concept to final deployment and beyond.

The Problem Statement

The challenge was to address the critical issue of student dropouts in the education system. The official problem statement was:

To design a streamlined architecture and a dashboard for student dropout analysis, providing analytics based on age group, caste, school, demographic location, and gender.

The existing data, while available through government sources like UDISE+, was presented in raw tabular formats, lacking the visualization and predictive power needed for effective intervention by administrators.

Our Approach: From Data to Prediction

We structured our solution into three core phases: sourcing and preparing the data, building an interactive analytics platform, and implementing a predictive model to forecast future trends.

Phase 1: Data Sourcing and Preparation

We started with the data. We used the official government data from the UDISE+ (Unified District Information System for Education Plus) portal. The data is comprehensive, but it needed significant cleaning and preprocessing before it was usable for machine learning.

This process led to the creation of our custom dataset, which we named EduDropX.

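# A conceptual overview of our data pipeline.
# The helper bodies below are simplified placeholders, not the full logic.
import pandas as pd


def preprocess_data(df: pd.DataFrame) -> pd.DataFrame:
    # Handle missing values (here: drop rows that are entirely empty).
    # Standardizing demographic categories (caste, gender, location) also belongs here.
    return df.dropna(how="all")


def create_features(df: pd.DataFrame) -> pd.DataFrame:
    # Feature engineering to create relevant metrics for analysis goes here.
    return df.copy()


# 1. Load raw data from UDISE+ source
raw_data = pd.read_csv('udise_plus_data.csv')

# 2. Clean and preprocess the data
cleaned_data = preprocess_data(raw_data)

# 3. Structure the data for modeling and visualization
# This became our EduDropX dataset
edudropx_dataset = create_features(cleaned_data)
edudropx_dataset.to_csv('EduDropX.csv', index=False)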

Phase 2: Building the Analytics Dashboard

With a clean dataset, we developed a full-stack solution. This included a data pipeline API to ingest and process information and a web-based dashboard. The dashboard was designed for school administrators and policymakers, allowing them to:

Visualize dropout rates across different demographics.
Filter data by school, district, age group, and more.
Identify hotspots and trends that were previously hidden in raw data sheets.
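
The filtering and aggregation behind views like these can be sketched with pandas. This is a minimal illustration, not our dashboard code: the column names (district, gender, enrolled, dropped_out), the helper dropout_rate_by, and the sample rows are all made up and do not reflect the real EduDropX schema.

```python
import pandas as pd

# Hypothetical sample rows; the real EduDropX columns and values differ.
records = pd.DataFrame(
    {
        "district": ["A", "A", "B", "B"],
        "gender": ["female", "male", "female", "male"],
        "enrolled": [200, 250, 100, 150],
        "dropped_out": [10, 20, 15, 15],
    }
)


def dropout_rate_by(df, group_cols):
    """Sum counts per group, then compute the dropout rate in percent."""
    grouped = df.groupby(group_cols, as_index=False)[["enrolled", "dropped_out"]].sum()
    grouped["dropout_rate_pct"] = 100 * grouped["dropped_out"] / grouped["enrolled"]
    return grouped


# Filter to one district, then break the rate down by gender.
district_a = records[records["district"] == "A"]
by_gender = dropout_rate_by(district_a, ["gender"])

# Rank districts by overall rate to surface hotspots.
hotspots = dropout_rate_by(records, ["district"]).sort_values(
    "dropout_rate_pct", ascending=False
)
print(hotspots)
```

Summing the counts before dividing keeps the rate correct for groups of different sizes, which averaging per-row rates would not.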

Phase 3: Predictive Modeling for Proactive Intervention

We wanted to go beyond just retrospective analysis. To provide a forward-looking tool, we implemented machine learning models to predict future dropout rates. We experimented with several regression techniques and found that Multiple Linear Regression and Polynomial Regression yielded the best results on our EduDropX dataset.

These models could forecast dropout rates based on the input demographic factors, enabling administrators to take proactive measures.
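
For readers unfamiliar with the two techniques, here is a minimal sketch of each on synthetic data. It uses plain NumPy least squares as a stand-in and is not taken from our codebase; the features, coefficients, and data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Multiple linear regression: y = b0 + b1*x1 + b2*x2 ---
# Synthetic, noise-free data generated from known coefficients.
X = rng.uniform(0, 10, size=(50, 2))
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1]

# Prepend a column of ones so the intercept is fitted as well.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# --- Polynomial regression: fit a degree-2 curve to a single feature ---
x = np.linspace(0, 5, 30)
y_curve = 3.0 - 1.0 * x + 0.25 * x**2
poly_coef = np.polyfit(x, y_curve, deg=2)  # coefficients, highest power first

# Predict for a new input with each fitted model.
linear_pred = np.array([1.0, 4.0, 2.0]) @ coef  # [intercept term, x1, x2]
poly_pred = np.polyval(poly_coef, 6.0)
```

Polynomial regression is still linear in its coefficients; it fits the same kind of least-squares model on powers of the input, which lets it capture curved trends that a straight-line fit misses.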

Results and Key Achievements

The project delivered results on two fronts: the hackathon outcome and the measured performance of our models.

🏆

State-Level Hackathon Winners! Our end-to-end solution was recognized as the winning project for this problem statement (Team ID: TM000112). You can view the official result on the SSIP Gujarat Hackathon 2022 Winners page.

📊

Exceptional Model Performance: Our regression models achieved an R² value of 0.9976 and a Mean Absolute Error (MAE) of just 0.2311 on our custom EduDropX dataset. An R² this close to 1 means the models explained nearly all of the variance in the target, and the low MAE means predictions stayed close to the observed values.
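
For reference, both metrics can be computed directly from observed and predicted values. The sketch below uses the standard definitions (R² as one minus the ratio of residual to total sum of squares, MAE as the mean of absolute errors). The numbers are toy values chosen for easy arithmetic, not our results.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return float(1.0 - ss_res / ss_tot)


def mean_absolute_error(y_true, y_pred):
    """Average magnitude of the prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Toy values for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

r2 = r_squared(observed, predicted)
mae = mean_absolute_error(observed, predicted)
print(f"R^2 = {r2:.4f}, MAE = {mae:.4f}")
```

MAE is expressed in the same units as the target, so it is best read alongside the scale of the quantity being predicted.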

From Hackathon to Publication

The project's journey didn't end at the hackathon. Recognizing the value and novelty of our work, we extended our research, refined our methodology, and documented our findings in a formal research paper.

We are proud that this work was accepted and published in the IEEE International Conference for Convergence in Technology (I2CT) 2024.

📄

You can read the full research paper here: A Novel Approach to Predict the Student Dropout Rate Using Regression

This project was a testament to how a well-defined problem, combined with a structured data-driven approach, can lead to impactful solutions that are recognized by both industry and academia.