
Pathogen Spread Tutorial

Iterative Design

@ Imbellus

Project Objective

Refine the tutorialization of the game-based, problem-solving hiring assessment, Pathogen Spread, so that users are fully equipped to complete the assessment without confusion on the first try.

Methodology

Evaluative - Qualitative

Remote Moderated Usability Studies

User Interviews

A/B Testing

Study Overview

Pathogen Spread is a problem-solving assessment in which users must engage in inductive reasoning to determine the rules governing the spread of an epidemic in an animal population. Because the assessment was used in high-stakes hiring environments, proper tutorialization was critical: scores had to reflect performance on the task itself, not confusion about the task, tools, or objectives.


Using an iterative research and design approach, I collaborated with a team of learning scientists, data scientists, game designers, and project managers to refine the tutorial on a week-by-week basis.


Every week, the team discussed user pain points and developed specific research questions addressing theorized roadblocks. A series of 5-15 moderated usability tests were then conducted, each followed by a semi-structured user interview; all sessions were loosely structured around the week's research questions. When the team proposed two competing design solutions, A/B tests were occasionally employed to compare their efficacy.


At the end of the week, we synthesized and discussed the findings of our research, and design changes were implemented based on these findings. Testing of these design changes was then included in the research questions of the following week.

Tutorial prompts from Pathogen Spread

Implemented Findings


  • Contextualizing instructions by providing the broadest user goals first, then explaining the more detailed rules of the assessment, was found to greatly increase user understanding of the scenario.


  • Information sticks better when users are prompted to perform actions rather than simply read about them. Instruction during a practice scenario provided the most clarity for users.


  • Forcing a highly-structured user journey by locking certain functionalities during tutorialization provided the best context for focused and effective learning.


  • Placing emphasis on where users could later refer to tutorial information provided more self-sufficiency during the assessment.


  • Various minor word choices and small details within example scenarios often created surprisingly outsized differences in understanding.

Pathogen Spread

UI

Iterative Design

@ Imbellus

Project Objective

Refine the user interface of Pathogen Spread so that users can complete the assessment with minimal confusion, while using the tools necessary for scoring, and while engaging in the cognition being measured by the assessment.


Methodology

Evaluative - Qualitative

Remote Moderated Usability Studies

User Interviews

A/B Testing

Study Overview

This study utilized the same iterative research and design approach and methods described in the above project: several usability studies, A/B tests, and semi-structured interviews were conducted each week to address predefined research questions centered around user pain points and recent design changes.


Because this product aimed to assess users' thought processes as they solved problems, it was imperative that, in addition to finding pain points, we gained very specific insights into users' reasoning as they completed their assessments. Special care was taken to observe whether users were using heuristics, shortcuts, or alternate forms of reasoning to solve the problems.


Additionally, interactions with specific user interface elements were recorded for scoring purposes, so it was essential that the assessment was designed to incentivize their use.

Tool for users to filter data points using AND and OR logic

Tool for users to display geographical information

Implemented Findings


  • Some users were able to use a specific heuristic to quickly solve part of the problem based on the way the data in the assessment was categorized; this data was revisited and reorganized to make solutions less obvious and gameable.


  • One of the UI elements – a data filtering tool – was going unused. Because some of the assessment scoring was dependent on users’ interactions with this tool, many design iterations and A/B tests were conducted to optimize this tool to make it both more intuitive and crucial for problem-solving.


  • Users were finding correct solutions to certain assessment forms that differed from the designed "correct" solution. This discovery led to a large-scale reconfiguration of how assessment data sets were generated.
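The AND/OR filtering tool mentioned above can be sketched as a simple predicate combinator. This is a hypothetical illustration of that kind of logic, not the actual Imbellus implementation; the field names and data points below are invented placeholders.

```python
# Hypothetical sketch of AND/OR data filtering, as in the filtering tool
# described above. Data and field names are illustrative only.
from typing import Callable

DataPoint = dict

def matches(point: DataPoint,
            conditions: list[Callable[[DataPoint], bool]],
            mode: str = "AND") -> bool:
    """Combine per-field conditions with AND (all) or OR (any) logic."""
    results = (cond(point) for cond in conditions)
    return all(results) if mode == "AND" else any(results)

animals = [
    {"species": "fox",  "infected": True,  "region": "north"},
    {"species": "fox",  "infected": False, "region": "south"},
    {"species": "deer", "infected": True,  "region": "south"},
]

conditions = [
    lambda p: p["species"] == "fox",
    lambda p: p["infected"],
]

and_hits = [p for p in animals if matches(p, conditions, "AND")]
or_hits = [p for p in animals if matches(p, conditions, "OR")]
```

With the sample data above, the AND mode keeps only infected foxes, while the OR mode keeps any point matching either condition.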

Environmental Placement

Coachability RCT

@ Imbellus

Project Objective

Determine whether or not the Imbellus stealth assessment, Environmental Placement, is susceptible to cheating and/or coaching guides.

Methodology

Evaluative - Quantitative

Surveys

Randomized Controlled Trial

Study Overview

This assessment functions by scoring users on a variety of metrics that are not immediately obvious to users based on the objectives they are given. However, given the high-stakes nature of these assessments, the proliferation of public-facing assessment preparation guides revealing the "secrets" to scoring well was inevitable, and this project aimed to analyze the potential for such coaching and cheating.


In this study, I compared three groups in a randomized controlled trial: one that received the assessment without any advice, one that received it with test-specific advice, and one that received generic test-taking advice (e.g. "keep an eye on your remaining time," "re-read your submission before submitting it").


However, to generate advice that users without insider scoring knowledge would realistically create and disseminate, the first step of the study was to distribute a post-assessment survey to an initial batch of test-takers, probing their opinions on how to most effectively complete different parts of the assessment and score well on them. After I synthesized and compiled the most common responses, two other learning scientists and I adjudicated which pieces of advice were strongest and predicted which specific scoring metrics they would impact.
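The three-arm design described above can be sketched as a balanced randomization. This is a minimal illustration with invented participant IDs and condition names; the actual assignment infrastructure is assumed, not documented.

```python
# Illustrative three-arm random assignment for the coachability RCT.
# Condition names and participant IDs are placeholders.
import random

CONDITIONS = ["no_advice", "test_specific_advice", "generic_advice"]

def assign(participant_ids, seed=0):
    """Randomly assign participants so arm sizes differ by at most one."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Deal the shuffled participants round-robin across the three arms.
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(ids)}

groups = assign(range(90))
```

Shuffling before the round-robin deal gives each participant an equal chance of landing in any arm while keeping the arms balanced.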

Findings

  • While several pieces of user-generated advice were indicative of user insight into the goals of the stealth assessment, no advice provided any legitimate means of "gaming" the assessment with heuristics or shortcuts that had not already been accounted for in the design of the assessment.


  • The statistical analysis* demonstrated that there was no significant difference between the groups in the outcome measures we predicted, indicating that the assessment was resistant to user-generated advice.


  • After running exploratory analyses on all scoring metrics, it was found that the experimental group scored better on one metric, but because of multiple comparisons, a lack of a priori predictions, and no obvious connection between the metric and the advice, the finding was simply noted as an area for future investigation.


*Data cannot be displayed as all scoring metrics and methodologies are proprietary information.
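The exploratory result above is a textbook multiple-comparisons situation: with many metrics tested and no a priori prediction, one nominally significant result is expected by chance. A minimal sketch of a Bonferroni correction, using illustrative numbers only since the real metrics and data are proprietary:

```python
# Why one "significant" metric among many exploratory comparisons is
# treated with caution. All p-values here are invented for illustration.

def bonferroni(p_values, alpha=0.05):
    """Flag which p-values survive a Bonferroni-corrected threshold."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# Suppose 12 scoring metrics were compared and one came out at p = 0.03.
p_values = [0.03] + [0.4] * 11
surviving = bonferroni(p_values)
```

At a corrected threshold of 0.05 / 12 ≈ 0.004, a lone p = 0.03 does not survive, which is consistent with noting the result only as an area for future investigation.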


Pathogen Spread

Coachability RCT

@ Imbellus


Project Objective

Determine whether or not the Imbellus stealth assessment, Pathogen Spread, is susceptible to cheating and/or coaching guides.

Methodology

Evaluative - Quantitative

Randomized Controlled Trial

Study Overview

This project closely mirrored the aims and methods of the coachability study above, but applied them to the Pathogen Spread assessment.


I again used a randomized controlled trial, but instead of sourcing the experimental group's coaching advice from users, the advice in this study was created by the test developers themselves (myself included). Because the prior study showed no effect of advice, I felt comfortable examining the effects of the best possible advice for this assessment (short of sharing proprietary scoring information), especially since in-house advice required fewer resources than collecting, analyzing, and selecting user-generated advice.

Findings


  • As with the previous study on Environmental Placement, statistical analysis* demonstrated that there was no significant difference between the groups in the scores we predicted, indicating that the assessment was resistant to coaching.


*Data cannot be displayed as all scoring metrics and methodologies are proprietary information.


Pathogen Spread

Equivalency RCT

@ Imbellus

Project Objective

Ensure that the variants of the Pathogen Spread assessment that use different animals and datasets do not elicit different scores.

Methodology

Evaluative - Quantitative

Randomized Controlled Trial


Study Overview

In order to reduce cheating and information sharing between users in this high-stakes cognitive assessment, it was critical that several forms of each test be developed to mask the similarities between the assessments completed by different users.


While superficial components of the assessment, such as the type of animal involved in the scenario, could easily be swapped out without affecting the underlying cognition required to perform the task, extra care was needed to ensure that new changes did not impact user scores in unexpected ways. One version of the assessment being slightly more difficult than another would undermine any claims of fairness we could make about our product.


Two variables were simultaneously manipulated in a randomized controlled trial to produce new variants of Pathogen Spread. Because new sets of numbers were likely to provide the most resistance against answer sharing, three new data sets were created that followed the same logic as the data sets in the originally tested assessment. Additionally, two new animal names, a factor with no bearing on the assessment logic, were introduced. I therefore used a 2x3 design to create six groups crossing these two factors at every level. The scores of these groups were analyzed against each other and against our wealth of scoring data from the existing versions of the assessment.
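The 2x3 crossing can be sketched as a Cartesian product of the two factors. The variant and data-set names below are hypothetical placeholders, not the actual assessment content.

```python
# Illustrative construction of the 2x3 factorial design: two animal-name
# variants crossed with three new data sets yields six assessment variants.
# All names are invented placeholders.
from itertools import product

animal_variants = ["animal_a", "animal_b"]           # 2 levels
data_sets = ["dataset_1", "dataset_2", "dataset_3"]  # 3 levels

variants = [
    {"animal": a, "data_set": d}
    for a, d in product(animal_variants, data_sets)
]
```

Each of the six resulting variants defines one experimental group, whose scores can then be compared against the other groups and against historical data from the existing versions.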


Findings

  • The statistical analysis revealed no differences in scores between versions of the test across either the animals used or the datasets.


  • The results of this study indicated that the six new versions of the assessment could be added to the catalogue of assessments for distribution to applicants.