Module 23: Risk Scoring in AccuCampus
Topic: Risk Scoring
- Creating a New Model
- Analysis Results
- Processing
- Viewing Risk Scores
- Reports Available
Risk Scoring
What is machine learning? A statistical engine that processes the information in your account to try to predict whether a student is likely to return for the next term. To do so, it uses historical information and multiple algorithms to determine which ones work best for your institution’s particular case. The historical information covers current and former students in previous semesters, which allows the prediction models to recognize patterns in how students behaved, taking into account their particular context inside and outside the institution.
What data is required to run machine learning? There are no required attributes for the algorithm to run. We have a list of recommended attributes which we can provide as a separate document, and institutions may choose to utilize other attributes as well. We also recommend including student sign-ins to various locations as part of the algorithm. This does not require an extra import.
What can machine learning predict? We generate an algorithm that is unique to the institution, identifies potential risk factors that could lead to students not returning the next term, and recalculates every week as new information is added to the system. This algorithm produces a risk score that indicates the student's risk of not returning for the next term; the score is viewable by staff with appropriate permissions and can be used for outreach efforts.
What algorithms are used? We run different models based on well-known algorithms (specifically Random Forest, SVM, Neural Networks, and Logistic Regression). The system pre-processes the data to improve accuracy (for example, applying Box-Cox transformations to correct skewed data and creating dummy variables for categorical data) and executes the aforementioned models several times with several different parameters to achieve better results. The best-performing model is picked and used for future estimations.
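The two pre-processing steps named above can be illustrated in a few lines. The sketch below is not AccuCampus's actual code; it simply shows what a Box-Cox transform and dummy-variable encoding do to raw attribute data, with illustrative column names.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform to reduce skew; requires strictly positive inputs.
    lam = 0 falls back to the natural log, per the standard definition."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def dummy_encode(records, column):
    """Replace one categorical column with 0/1 indicator columns."""
    categories = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(row)
    return encoded

# Hypothetical student records with one categorical attribute
students = [
    {"gpa": 3.1, "housing": "on_campus"},
    {"gpa": 2.4, "housing": "off_campus"},
    {"gpa": 3.8, "housing": "on_campus"},
]
print(dummy_encode(students, "housing")[0])
# {'gpa': 3.1, 'housing_off_campus': 0, 'housing_on_campus': 1}
```

Algorithms such as Logistic Regression work on numbers, not category labels, which is why categorical attributes are expanded into indicator columns before modeling.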
What metrics are used to evaluate the results? The resulting models are evaluated using AUC-ROC. AUC-ROC (sometimes called AUROC or just AUC, for "area under the receiver operating characteristic curve") measures how well the model separates the two outcomes, here students who return and students who do not. It is equivalent to the probability that a randomly chosen student who did not return is assigned a higher risk score than a randomly chosen student who did. A result of 1 would mean the model ranks every such pair correctly (not achievable in social models), while a result of 0.5 means the model's rankings are no better than random (like flipping a coin) and is considered a bad result. AccuCampus translates this 0.5-to-1 scale into a friendlier text-based scale, from “Bad” to “Excellent,” and displays the result once the risk model is analyzed and processed.
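The pairwise-ranking definition of AUC-ROC can be computed directly, which makes the metric concrete. The function below is an illustrative reference implementation (not AccuCampus's), using made-up labels and scores; label 1 means the student did not return, and scores are predicted risk.

```python
def auc_roc(labels, scores):
    """AUC-ROC via its ranking definition: the probability that a randomly
    chosen positive case (student who did not return) receives a higher
    risk score than a randomly chosen negative case. Ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 0, 0, 0]            # 1 = did not return
scores = [0.9, 0.4, 0.5, 0.2, 0.1]  # predicted risk
print(auc_roc(labels, scores))      # 5 of 6 pairs ranked correctly: 0.8333...
```

Here one pair is ranked incorrectly (the 0.4-risk non-returner scores below the 0.5-risk returner), so the AUC is 5/6, which a text scale might render as "Good" rather than "Excellent."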
How is weighting accomplished? The predictive model determines which attributes are most strongly associated with student persistence based on historical data. As the algorithm runs over multiple terms and more information is brought into AccuCampus, the risk scores become more accurate.
Who would be the ideal team to manage the Machine Learning component? We recommend the Project Manager enlist a representative from Institutional Research to determine which attributes should be included, and whether one algorithm will work for all students, or if multiple algorithms should be used to account for a unique student population. The account’s IT contact should also be involved to set up the imports of the attributes from other data silos into AccuCampus. The Project Manager will need to set up the profile before the attributes are imported, or they will need to designate someone to do that.
How will I know if my data is good? Our algorithms deal with missing data by substituting the mean value when the number of missing values is relatively small. If there are a large number of missing values, the entire field/column is dropped and not used. The ML algorithm cannot identify or correct incorrect data. In general, if the incorrect data is random and unbiased, it won’t impact the results. If, however, the data is biased, the results will also be biased. This latter case should not be common, though, and if present it must be fixed before importing the data. We recommend reviewing all data for errors or inconsistencies prior to importing it into AccuCampus.
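The missing-data policy described above (impute small gaps with the mean, drop columns that are mostly empty) can be sketched as follows. This is an illustration, not AccuCampus's implementation, and the 20% threshold is an assumption chosen for the example, not the product's actual cutoff.

```python
def impute_or_drop(columns, max_missing_ratio=0.2):
    """For each column (a list of values, None = missing): fill a small
    number of gaps with the column mean; drop columns where too large a
    share of values is missing. The 20% cutoff is illustrative only."""
    cleaned = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        missing_ratio = 1 - len(present) / len(values)
        if missing_ratio > max_missing_ratio:
            continue  # too sparse: drop the whole column
        mean = sum(present) / len(present)
        cleaned[name] = [v if v is not None else mean for v in values]
    return cleaned

# Hypothetical imported attributes
data = {
    "gpa":     [3.0, None, 4.0, 3.5, 3.5],  # 1 of 5 missing -> impute mean
    "credits": [None, None, 12, None, 15],  # 3 of 5 missing -> drop column
}
print(impute_or_drop(data))
# {'gpa': [3.0, 3.5, 4.0, 3.5, 3.5]}
```

Note that mean imputation only patches gaps; it cannot detect values that are present but wrong, which is why the data should be reviewed before import.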
When should I review the model? After creating the risk model, but before processing it, view the model’s analysis. This screen shows each feature that was included in the model as a graph with a visual breakdown of the student population for that factor. These graphs can be used to 1) identify any issues with the data that might have been missed prior to importing and 2) help you determine whether the model should be adjusted before processing. There is also a graph showing which features the system believes are most important based on the data. Again, this graph allows the model to be tweaked based on your knowledge of the institution; tweaking would include excluding features or creating new combinations of features to include. Once satisfied, click Process to run the model and start the machine learning system.
Creating a New Model
Your risk assessment model can be based on any demographic or tracking information available within the system. Before creating a model, verify that your data is complete and correct.
Create a User Profile
The first step in creating a risk assessment model is to import your demographic data. To do so, you will need to create User Profile(s). For more information on how to create a user profile, see Module 1. For more information on how to set up imports, see Module 2.
Create a Risk Assessment Model
The next step is to create your risk assessment model. You can create one model for your entire student population, or create several models to cover specific populations. The models are differentiated by what you choose to include and exclude when creating them.
From the Main Sidebar, hover over Institutional Research.
Click Risk Scoring.
Click Create New.
Enter a Name for the model. If you will have more than one model, we recommend using a unique and specific name so that other users will understand what the model is for.
Choose which demographic features to exclude when making predictions. These will be the unique keys used in your user profiles.
Choose to exclude user types from your model.
If desired, create new features to include when making predictions. These are typically created by combining features.
Add semester precedences. These tell the system how semesters proceed from one to the next. We recommend including at least Spring and Fall semesters, but you can include all semesters if desired. Keep in mind that not all students will proceed through the semesters the same way.
If you are ready to analyze your model’s results, click Save and Analyze. Otherwise, click Save in order to return to the model for further modifications.
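To make the semester-precedence step above concrete, the sketch below shows one way such precedences could be represented and used. The term names and data structure are hypothetical, chosen for illustration; they are not AccuCampus's actual import format.

```python
# Hypothetical precedence map: each entry tells the system which term
# follows which, so "returning for the next term" is well defined.
precedences = {
    "Fall 2023": "Spring 2024",
    "Spring 2024": "Fall 2024",
    "Fall 2024": "Spring 2025",
}

def returned_next_term(enrolled_terms, term):
    """Did a student enrolled in `term` also enroll in the term after it?"""
    next_term = precedences.get(term)
    return next_term is not None and next_term in enrolled_terms

student_terms = {"Fall 2023", "Spring 2024"}
print(returned_next_term(student_terms, "Fall 2023"))    # True
print(returned_next_term(student_terms, "Spring 2024"))  # False
```

This is why the precedences matter: without them, the system cannot label historical students as "returned" or "did not return," and those labels are what the prediction models learn from.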
Analysis Results
Once you click Save and Analyze, the system will begin analyzing the data that will be included in your model to come up with the best algorithmic model to use. The Analysis screen will show you each feature that was included in the model as a graph with a visual breakdown of the student population for that factor as well as a graph that shows you the weighting that the system has applied for each feature. These graphs can be used to 1) identify any issues with the data that might have been missed prior to importing and 2) help you determine if the model should be adjusted before processing. Tweaking the model would include excluding features or creating new combinations of features to include. Once satisfied, you can then run the model by clicking Process. This will start the machine learning system.
From the Main Sidebar, hover over Institutional Research.
Click Risk Scoring.
Next to the model that you wish to use, click View Analysis Results.
A graphical representation of each feature that is included in the model will appear. Use these graphs to check your data for accuracy.
At the bottom of the page, a graph will show you the expected weighting of each feature with regard to predicting risk. Use this graph to check your data for accuracy or to determine if the model should be modified.
If you are satisfied with the graphs and with the model, click Process.
Processing
Clicking Process will begin the machine learning algorithm. You will start seeing risk scores for your users within a week of starting the model.
Viewing Risk Scores
User risk scores are found on the individual user’s profile page. The risk score will be updated weekly as new information is added to the system through imports and through the user’s behaviors within the system.
There are two ways to navigate to a user profile – either from the main sidebar, or by using the search function at the top of the page.
First Option
From the Main Sidebar, hover over General.
Click Users.
Scroll through the list of users to find the user you need, or use the magnifying glass to search for the user.
Second Option
Use the search box at the top of the page to search for the user. To do so, type in the name of the user you want to see followed by in:users. This option is only available in the browser.
Once you’ve found the user you want to edit, click on their name.
You will see the user’s profile page. The information that you are able to see on this page is based on your permissions within the system. The risk score is located in the middle of the page.
Risk scores are color-coded as follows:
0-40: Low Risk
41-65: Intermediate Risk
66-100: High Risk
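The score-to-band mapping above is simple enough to express as a small function, which can be useful when exporting scores and grouping users outside of AccuCampus. This sketch assumes the 0-40 / 41-65 / 66-100 bands listed above and whole-number scores.

```python
def risk_band(score):
    """Map a 0-100 risk score to its color-coded band."""
    if not 0 <= score <= 100:
        raise ValueError("risk score must be between 0 and 100")
    if score <= 40:
        return "Low Risk"
    if score <= 65:
        return "Intermediate Risk"
    return "High Risk"

print(risk_band(35))  # Low Risk
print(risk_band(41))  # Intermediate Risk
print(risk_band(80))  # High Risk
```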
Reports Available
To get to the reports related to risk scoring, from the Main Sidebar navigate to Institutional Research > Reports. All reports can be filtered and most can be memorized, scheduled, and downloaded as CSV, Excel or PDF. From the individual report, you may also be able to create a User Group or assign/unassign tags to users.
User Risk List
This report lists the current risk of every user in the system. It can be filtered by group, role, specific user, tag, risk assessment model, minimum risk and/or maximum risk.
User Risk History
This report lists the historical risk of every user in the system. It can be filtered by date range, group, role, specific user, tag and/or risk assessment model.