Posted: September 13th, 2023

BUA 6315: Business Analytics for Decision Making

Final Project Data Mining Handout: Dataset 1

If you are using the college admission data, follow the instructions below to complete the Data Mining
prompts in Sections 3 and 4 of the final project. Use the entire dataset, which has 17,339 data points.
Transform the College GPA to a Transfer dummy using the following function: =IF(College_GPA="", 0, 1).

To complete the data mining prompts in Sections 3 and 4 of your final project, you must subset the

colleges and select the following two colleges: Business & Economics and Mathematics & Science to

apply the data mining techniques that you learned in Chapter 9, Chapter 10, and Chapter 11 of your

textbook.
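
For readers who want to check the two preparation steps above outside a spreadsheet, here is a minimal Python sketch. The field names `College` and `College_GPA`, the third college name, and the sample rows are assumptions made for illustration; match them to the column headers in your copy of the dataset.

```python
# Hypothetical sample rows; the real dataset has 17,339 rows.
# An empty string stands for a blank College_GPA cell.
applicants = [
    {"College": "Business & Economics", "College_GPA": "3.4"},
    {"College": "Mathematics & Science", "College_GPA": ""},
    {"College": "Arts & Letters", "College_GPA": "2.9"},  # made-up college
]

SELECTED_COLLEGES = ("Business & Economics", "Mathematics & Science")


def add_transfer_dummy(rows):
    """Mirror =IF(College_GPA="", 0, 1): 0 for a blank GPA, 1 otherwise."""
    for row in rows:
        row["Transfer"] = 0 if row["College_GPA"] == "" else 1
    return rows


def subset_by_college(rows, colleges=SELECTED_COLLEGES):
    """Keep only the selected colleges, grouped by college name."""
    subsets = {name: [] for name in colleges}
    for row in rows:
        if row["College"] in subsets:
            subsets[row["College"]].append(row)
    return subsets


add_transfer_dummy(applicants)
by_college = subset_by_college(applicants)
```

The dummy is created on the full dataset first and the subsetting is done afterwards, which follows the order of the instructions above.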

Part I: Supervised Data Mining

This part will help you prepare your data for the prompts that involve supervised data mining in Sections 3

and 4 of your final project submission. For more information about what must be included in your final

report, see the Final Project document, available in Blackboard.

Step 1: Methodology

Choose either the KNN algorithm or the Decision Tree model, based on the insights you want to gain from the data.

You will need to be able to explain the motivation for using the model you have chosen in Section 3 of your

final project submission.

Step 2: Analysis and Results

Choose ONE of the following sets of instructions to prepare your data, depending on the model you selected

in Step 1, in order to address the prompts related to data mining in Section 4 of your final project

submission.

KNN Algorithm

Note that you will perform the KNN algorithm for each college.

1. For each college, perform KNN analysis on the data set to predict whether an applicant will eventually decide to enroll at the college using predictor variables such as gender, race, SAT/ACT, HSGPA, and parent’s education level. Note: You need to transform all categorical variables to numerical variables by creating dummy variables, if there are any. For example, gender needs to be transformed.

3. For each college, partition the data with 50% for the training set, 30% for the validation set, and 20% for the test set.

4. For each college, report the accuracy, specificity, sensitivity, and precision rates for the test data set in a table. Interpret your results.

5. For each college, inspect performance charts and report the area under the ROC curve (AUC). Comment on the performance of the KNN classification model based on AUC.

Decision Tree

Note that you will construct a decision tree model for each college.

1. For each college, create a classification tree model to predict which college is most likely to accept a given university applicant based on the applicant’s gender, race, high school GPA, SAT/ACT score, and parent’s education level. Note: You need to transform all categorical variables to numerical variables by creating dummy variables, if there are any. For example, gender needs to be transformed.

2. For each college, partition the data with 50% for the training set, 30% for the validation set, and 20% for the test set.

3. For each college, report the accuracy, specificity, sensitivity, and precision rates for the test data set in a table. Interpret your results.

4. For each college, inspect performance charts and report the area under the ROC curve (AUC). Comment on the performance of the classification tree based on AUC.
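
The partitioning and reporting steps above apply to both models. As a rough illustration of what the reported figures mean, the Python sketch below splits rows 50/30/20 and computes accuracy, sensitivity, specificity, precision, and AUC from test-set labels. The function names, the sample labels and scores, the random seed, and the 0.5 cutoff are all assumptions made for illustration; they are not part of the assignment or of any particular software package.

```python
import random


def partition(rows, seed=1):
    """Shuffle a copy of rows, then split it 50% / 30% / 20%."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 50 // 100
    n_valid = len(shuffled) * 30 // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains, about 20%
    return train, valid, test


def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is 0."""
    return numerator / denominator if denominator else float("nan")


def classification_rates(actual, predicted):
    """Rates for 0/1 labels, where 1 is the class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),
    }


def auc(actual, scores):
    """AUC as the share of (class 1, class 0) pairs in which the class 1
    case has the higher score; ties count as half."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return ratio(wins, len(pos) * len(neg))


# Hypothetical test-set labels and predicted probabilities of class 1.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.45, 0.05]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

train, valid, test = partition(range(20))
rates = classification_rates(actual, predicted)
area = auc(actual, scores)
```

An AUC near 0.5 suggests the model ranks applicants no better than chance, while a value near 1 suggests it separates the two classes well.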

Part II: Unsupervised Data Mining

This part will help you prepare your data for the prompts that involve unsupervised data mining in Sections 3

and 4 of your final project submission. For more information about what must be included in your final

report, see the Final Project document, available in Blackboard.

Step 1: Methodology

Choose either the Hierarchical or K-Means clustering model, based on the insights you want to gain from

the data. You will need to be able to explain the motivation for using the model you have chosen in Section

3 of your final project submission.

Step 2: Analysis and Results

Choose ONE of the following sets of instructions to prepare your data, depending on the model you selected

in Step 1, in order to address the prompts related to data mining in Section 4 of your final project

submission.

Hierarchical Clustering

Note that you will perform clustering for each college.

1. Perform agglomerative hierarchical clustering to group college applicants who are admitted and enrolled in the Business & Economics college according to numerical values such as high school GPA, SAT/ACT score, college GPA, and parents’ education. You need to subset the data first to include only college applicants in the Business & Economics college who are both admitted and enrolled.

2. Use Euclidean distance and average linkage to cluster the data into three clusters.

3. Do you need to standardize the data? Explain your reasoning.

4. Describe each cluster and write a report based on the clustering results. Here you can take the average of the numerical variables and summarize your results in a table as explained in the video “Using Analytic Solver to Perform Agglomerative Clustering”.

5. Include a table that summarizes your results for each cluster. You can find a sample summary table below.

K-Means Clustering

Note that you will perform clustering for each college.

1. Perform k-means clustering to group college applicants who are admitted and enrolled in the Business & Economics college according to numerical values such as high school GPA, SAT/ACT score, college GPA, and parents’ education, using k=3. You need to subset the data first to include only college applicants in the Business & Economics college who are both admitted and enrolled.

2. Do you need to standardize the data? Explain your reasoning.

3. Describe each cluster and write a report based on the clustering results. Here you can take the average of the numerical variables and summarize your results in a table as explained in the video “Using Analytic Solver to Perform Agglomerative Clustering”.

4. Include a table that summarizes your results for each cluster. You can find a sample summary table below.

Clusters    HSGPA    SAT/ACT  College GPA  Mother’s Education  Father’s Education
Cluster 1   Average  Average  Average      Average             Average
Cluster 2   Average  Average  Average      Average             Average
Cluster 3   Average  Average  Average      Average             Average
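
The averages in the sample table can be computed once each applicant has a cluster label. The sketch below is one possible way to do this in Python. The `Cluster` field, the column names, and the sample records are assumptions made for illustration; the two parents’ education columns can be added to the column list in the same way once they are coded numerically.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; "Cluster" is the label assigned by the clustering step.
records = [
    {"Cluster": 1, "HSGPA": 3.8, "SAT/ACT": 1300, "College GPA": 3.5},
    {"Cluster": 1, "HSGPA": 3.6, "SAT/ACT": 1200, "College GPA": 3.1},
    {"Cluster": 2, "HSGPA": 3.0, "SAT/ACT": 1000, "College GPA": 2.6},
    {"Cluster": 3, "HSGPA": 2.5, "SAT/ACT": 900, "College GPA": 2.0},
    {"Cluster": 3, "HSGPA": 2.7, "SAT/ACT": 960, "College GPA": 2.4},
]

NUMERIC_COLUMNS = ["HSGPA", "SAT/ACT", "College GPA"]


def cluster_averages(rows, columns, cluster_key="Cluster"):
    """Average each numeric column within each cluster."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[cluster_key]].append(row)
    return {
        cluster: {col: mean(r[col] for r in members) for col in columns}
        for cluster, members in sorted(groups.items())
    }


summary = cluster_averages(records, NUMERIC_COLUMNS)

# Print one line per cluster, in the layout of the sample table.
for cluster, averages in summary.items():
    cells = "  ".join(f"{col}: {value:.2f}" for col, value in averages.items())
    print(f"Cluster {cluster}  {cells}")
```

Comparing the rows of this summary shows which variables differ most between clusters, which is a useful starting point for describing each cluster in the report.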
