Posted: September 13th, 2023
BUA 6315: Business Analytics for Decision Making
Final Project Data Mining Handout: Dataset 1
If you are using the college admission data, follow the instructions below to complete the Data Mining
prompts in Sections 3 and 4 of the final project. Use the entire dataset, which has 17,339 data points.
Transform the College GPA to a Transfer dummy using the following function: =IF(College_GPA="", 0, 1).
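For readers who prepare the data outside Excel, the same transformation can be sketched in Python. This is only an illustration: the rows are made up, the helper name `transfer_dummy` is hypothetical, and treating a missing value (`None`) like a blank cell is an assumption that the handout does not state.

```python
def transfer_dummy(college_gpa):
    """Mirror =IF(College_GPA="", 0, 1): 0 for a blank cell, otherwise 1.

    Assumption: a missing value (None) is treated the same as a blank cell.
    """
    if college_gpa is None or str(college_gpa) == "":
        return 0
    return 1


# Hypothetical rows; the real dataset has 17,339 data points.
applicants = [
    {"id": 1, "College_GPA": "3.2"},  # has a college GPA -> Transfer = 1
    {"id": 2, "College_GPA": ""},     # blank cell        -> Transfer = 0
    {"id": 3, "College_GPA": None},   # missing value     -> Transfer = 0
]

for row in applicants:
    row["Transfer"] = transfer_dummy(row["College_GPA"])

print([row["Transfer"] for row in applicants])
```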
To complete the data mining prompts in Sections 3 and 4 of your final project, you must subset the
colleges and select the following two colleges: Business & Economics and Mathematics & Science to
apply the data mining techniques that you learned in Chapter 9, Chapter 10, and Chapter 11 of your
textbook.
Part I: Supervised Data Mining
This part will help you prepare your data for the prompts that involve supervised data mining in Sections 3
and 4 of your final project submission. For more information about what must be included in your final
report, see the Final Project document, available in Blackboard.
Step 1: Methodology
Choose either the KNN algorithm or the Decision Tree model, based on the insights you want to gain from the data.
You will need to be able to explain the motivation for using the model you have chosen in Section 3 of your
final project submission.
Step 2: Analysis and Results
Choose ONE of the following sets of instructions to prepare your data, depending on the model you selected
in Step 1, in order to address the prompts related to data mining in Section 4 of your final project
submission.
KNN Algorithm
Note that you will perform the KNN algorithm for each college.
1. For each college, perform KNN analysis on the data set to predict whether an applicant will
eventually decide to enroll at the college, using predictor variables such as gender, race,
SAT/ACT, HSGPA, and parent’s education level.
Note: You need to transform all categorical variables to numerical variables by creating
dummy variables, if there are any. For example, gender needs to be transformed.
3. For each college, partition the data with 50% for the training set, 30% for the validation set,
and 20% for the test set.
4. For each college, report the accuracy, specificity, sensitivity, and precision rates for the
test data set in a table. Interpret your results.
5. For each college, inspect performance charts and report the area under the ROC curve
(AUC). Comment on the performance of the KNN classification model based on AUC.

Decision Tree
Note that you will construct a decision tree model for each college.
1. For each college, create a classification tree model to predict which college is most likely to
accept a given university applicant based on the applicant’s gender, race, high school GPA,
SAT/ACT score, and parent’s education level.
Note: You need to transform all categorical variables to numerical variables by creating
dummy variables, if there are any. For example, gender needs to be transformed.
2. For each college, partition the data with 50% for the training set, 30% for the validation set,
and 20% for the test set.
3. For each college, report the accuracy, specificity, sensitivity, and precision rates for the
test data set in a table. Interpret your results.
4. For each college, inspect performance charts and report the area under the ROC curve
(AUC). Comment on the performance of the classification tree based on AUC.
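The partitioning and reporting steps above can be sketched in Python to show what each number means. This is a minimal illustration, not the output of Analytic Solver: the function names are hypothetical, the labels and scores are made up, the target is assumed to be coded 1 for the positive class and 0 otherwise, and both classes are assumed to be present in the test set.

```python
import random


def partition(rows, seed=0):
    """Shuffle rows and split them 50% / 30% / 20% (training, validation, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 50 // 100
    n_valid = len(shuffled) * 30 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


def classification_rates(actual, predicted):
    """Compute the four rates from the confusion matrix (1 = positive class)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of actual positives caught
        "specificity": tn / (tn + fp),  # share of actual negatives caught
        "precision": tp / (tp + fp),    # share of predicted positives that are right
    }


def auc(actual, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive case scores higher than a random negative case (ties count half)."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels, predicted classes, and predicted probabilities.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.6, 0.1, 0.7]

train, valid, test = partition(range(10))
rates = classification_rates(actual, predicted)
area = auc(actual, scores)
print(len(train), len(valid), len(test), rates, area)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which gives a scale for the comment the last step asks for.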
Part II: Unsupervised Data Mining
This part will help you prepare your data for the prompts that involve unsupervised data mining in Sections 3
and 4 of your final project submission. For more information about what must be included in your final
report, see the Final Project document, available in Blackboard.
Step 1: Methodology
Choose either the Hierarchical or K-Means clustering model, based on the insights you want to gain from
the data. You will need to be able to explain the motivation for using the model you have chosen in Section
3 of your final project submission.
Step 2: Analysis and Results
Choose ONE of the following sets of instructions to prepare your data, depending on the model you selected
in Step 1, in order to address the prompts related to data mining in Section 4 of your final project
submission.
Hierarchical Clustering
Note that you will perform clustering for each college.
1. Perform agglomerative hierarchical clustering to group college applicants who are admitted and
enrolled in the Business & Economics college according to numerical values such as high school GPA,
SAT/ACT score, college GPA, and parents’ education. You need to subset the data first to include only
college applicants in the Business & Economics college who are both admitted and enrolled.
2. Use the Euclidean distance and average linkage to cluster the data into three clusters.
3. Do you need to standardize the data? Explain your reasoning.
4. Describe each cluster and write a report based on the clustering results. Here you can take
the average of the numerical variables and summarize your results in a table, as explained
in the video “Using Analytic Solver to Perform Agglomerative Clustering”.
5. Include a table that summarizes your results for each cluster. You can find a sample
summary table below.

K-Means Clustering
Note that you will perform clustering for each college.
1. Perform k-means clustering to group college applicants who are admitted and enrolled in
the Business & Economics college according to numerical values such as high school GPA, SAT/ACT
score, college GPA, and parents’ education, using k = 3. You need to subset the data first to include
only college applicants in the Business & Economics college who are both admitted and enrolled.
2. Do you need to standardize the data? Explain your reasoning.
3. Describe each cluster and write a report based on the clustering results. Here you can take
the average of the numerical variables and summarize your results in a table, as explained
in the video “Using Analytic Solver to Perform Agglomerative Clustering”.
4. Include a table that summarizes your results for each cluster. You can find a sample
summary table below.
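When thinking about the standardization question, it can help to see how Euclidean distance behaves when variables are on very different scales. The sketch below uses three made-up applicants with only two variables (high school GPA and SAT score). The helper names are hypothetical, and using the population standard deviation for the z-scores is an assumption; your software may use the sample standard deviation instead.

```python
import math
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical applicants: 0 and 1 differ mostly in GPA, 0 and 2 mostly in SAT.
hsgpa = [2.0, 4.0, 2.1]
sat = [1200, 1210, 1300]

raw = list(zip(hsgpa, sat))
standardized = list(zip(z_scores(hsgpa), z_scores(sat)))

# Ratio of distance(0, 2) to distance(0, 1), before and after standardizing.
raw_ratio = euclidean(raw[0], raw[2]) / euclidean(raw[0], raw[1])
std_ratio = (euclidean(standardized[0], standardized[2])
             / euclidean(standardized[0], standardized[1]))

print(round(raw_ratio, 2), round(std_ratio, 2))
```

On the raw data, the 100-point SAT gap makes the second distance roughly ten times the first even though the GPA gap spans two full grade points. After standardization, the two distances are of similar size, so both variables contribute on a comparable scale.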
Sample summary table:
Clusters    HSGPA     SAT/ACT   College GPA   Mother’s Education   Father’s Education
Cluster 1   Average   Average   Average       Average              Average
Cluster 2   Average   Average   Average       Average              Average
Cluster 3   Average   Average   Average       Average              Average
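If you export the cluster assignments, the averages for a table like this can be computed with a short script. The sketch below is illustrative only: the records, the `Cluster` column name, and the function `cluster_averages` are hypothetical, and only two of the five variables are shown.

```python
from collections import defaultdict
from statistics import mean


def cluster_averages(records, variables, cluster_key="Cluster"):
    """Return {cluster: {variable: average}} for the given numeric variables."""
    groups = defaultdict(list)
    for record in records:
        groups[record[cluster_key]].append(record)
    return {
        cluster: {v: mean(row[v] for row in rows) for v in variables}
        for cluster, rows in sorted(groups.items())
    }


# Hypothetical clustered records (cluster labels would come from your software).
records = [
    {"Cluster": 1, "HSGPA": 3.0, "SAT/ACT": 1200},
    {"Cluster": 1, "HSGPA": 3.5, "SAT/ACT": 1300},
    {"Cluster": 2, "HSGPA": 4.0, "SAT/ACT": 1500},
]

summary = cluster_averages(records, ["HSGPA", "SAT/ACT"])
for cluster, averages in summary.items():
    print("Cluster", cluster, averages)
```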