Machine learning is changing every industry, from making cars autonomous to predicting stock market behaviors, and there’s no reason why the recruiting industry can’t leverage the benefits as well.
HackerRank’s new product, Tech Talent Matrix (TTM), is the first tool that uses machine learning to measure technical recruiting performance as well as provide actionable ways to deliver high-quality assessments and an excellent developer experience. Since our launch, a lot of people have expressed interest in learning more about how exactly we used machine learning in TTM. So, this post will delve deeper into the science behind the product.
Developer candidates currently take one assessment every 8 seconds on HackerRank’s platform. Every assessment journey has hundreds of time-stamped events attached to it, from the moment an invitation is sent to a candidate to the final submission.
When we looked at the hundreds of events per assessment, we knew that we had to narrow them down to the most relevant dimensions. For instance, email deliverability is an event that isn’t meaningful when measuring assessment performance.
My data science team and I worked with our customer success teams and conducted preliminary data analysis to extract only the events that are performance indicators and belong in TTM. As a result, we looked at, on average, just over 20 data points per assessment, which gave us a database of over 150 million assessment and candidate data points.
Below we describe six of the key assessment data points we looked at to determine two scores: the Assessment Quality Score and the Candidate Response Score.
A company’s position in the matrix is the graphical representation of these two scores and determines the strength of their overall tech recruiting performance.
We have many metrics that go into the Assessment Quality Score. Below are three of the key metrics.
The test completion rate provides insight into whether the assessments engineering managers have created ask relevant questions. Asking developers to build a weather app when the role they’re interviewing for requires them to build banking apps is an example of an irrelevant test question.
Test completion rates also reveal whether or not the assessment is the right level of difficulty. Junior developers are unable to complete tests that are for more experienced developers. On the other hand, senior developers feel as though their time is being wasted when they’re given easy tests.
The score distribution metric indicates whether or not engineering managers are administering the right tests to the right developers. If everyone does really well or really poorly on an assessment, that suggests the test is either too easy or too difficult. If everyone taking the test receives similar scores, then this would suggest that the assessment is not designed well enough to identify developers who are the best match for the role from the entire candidate pool.
Candidate feedback is an integral part of understanding tech recruiting performance. At the end of the day, the tech hiring market is a candidate-centric market, given that the demand for developers in every industry is continuing to grow exponentially. Thus, it’s crucial to ensure that candidates have the best interview experience possible.
Whenever developers take a HackerRank assessment, they’re prompted to rate their overall test-taking experience and provide granular feedback on specific questions as well as the environment in which they were assessed.
We have many metrics that go into the Candidate Response Score. Below are three of the key metrics.
The invite email open rate tells us how well an email outreach is received. If candidates aren’t even opening the email, this is a signal that recruiters aren’t sending invitations to relevant candidates, that the company doesn’t have strong tech talent brand recognition, or both.
Click rate provides insight into how well recruiters are communicating with candidates. If the message in the email is not aligned with expectations, then candidates are less likely to click to get to the test.
The test attempt rate provides deeper insight into how well recruiters align with candidates. If the assessment does not match how recruiters described it, candidates are more likely to give up before even attempting it.
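To make the three rates concrete, here is a minimal sketch of how they could be computed from per-test event counts. The field names (`invited`, `opened`, `clicked`, `attempted`) and the choice to measure each rate against the previous funnel stage are assumptions for illustration; the post does not state the exact denominators TTM uses.

```python
from dataclasses import dataclass


@dataclass
class InviteFunnel:
    """Hypothetical per-test counts; field names are illustrative, not TTM's schema."""
    invited: int
    opened: int
    clicked: int
    attempted: int


def _rate(numerator: int, denominator: int) -> float:
    # Guard against empty stages so a test with no activity yields 0.0.
    return numerator / denominator if denominator else 0.0


def funnel_rates(f: InviteFunnel) -> dict:
    # Assumption: each rate is relative to the previous funnel stage.
    return {
        "open_rate": _rate(f.opened, f.invited),
        "click_rate": _rate(f.clicked, f.opened),
        "attempt_rate": _rate(f.attempted, f.clicked),
    }


# Made-up counts for a single test.
example = funnel_rates(InviteFunnel(invited=200, opened=150, clicked=90, attempted=72))
print(example)
```

Measuring each stage against the one before it keeps the three signals separate: a low click rate then points at the email message itself rather than at who was invited.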
For the Assessment Quality Score and the Candidate Response Score, we looked at test completion rates, invite email open rates, test click rates, and test attempt rates across our entire customer base and across all tests they administered. We worked with our customer success team to establish what a healthy rate looks like. Afterwards, we applied clustering techniques to cluster similar distributions, followed by a variety of machine learning techniques such as XGBoost to classify distributions.
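As a rough illustration of the clustering step only, the sketch below groups tests by their rate vectors using plain k-means. This is a stand-in: the post does not say which clustering algorithm was used, the data is made up, and the follow-on classification step (for example with XGBoost) is not shown.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic initialisation (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical per-test vectors: (completion, open, click, attempt) rates.
tests = [
    (0.85, 0.80, 0.70, 0.90),
    (0.30, 0.40, 0.25, 0.35),
    (0.80, 0.75, 0.65, 0.85),
    (0.35, 0.45, 0.30, 0.40),
    (0.90, 0.85, 0.75, 0.95),
]

labels, centroids = kmeans(tests, k=2)
print(labels)
```

In this toy data the tests with high rates end up in one cluster and those with low rates in the other. Labelling a cluster as healthy or unhealthy would still need the kind of input from customer success described above.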
With the test score distribution for the Assessment Quality Score, we modeled score distribution by considering, for a given test, how well all candidates scored on that test. We looked for whether they were all clumped together, whether there was a good spread, and many other aspects of how a score distribution might behave. The team identified a number of good and bad tests, based on how well they performed at selecting the best candidates, and then used machine learning to classify whether a score distribution was more likely to be good or bad.
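One way to turn "clumped together" versus "a good spread" into numbers a classifier can use is to summarise each test's scores as a handful of features. The features, thresholds, and sample scores below are assumptions chosen for illustration, not the ones used in TTM.

```python
import statistics


def score_distribution_features(scores, max_score=100.0):
    """Summarise one test's candidate scores as features (all hypothetical)."""
    normalised = sorted(s / max_score for s in scores)
    q1, _, q3 = statistics.quantiles(normalised, n=4)
    n = len(normalised)
    return {
        "mean": statistics.fmean(normalised),
        "stdev": statistics.pstdev(normalised),   # overall spread
        "iqr": q3 - q1,                           # spread of the middle half
        "share_near_top": sum(s >= 0.9 for s in normalised) / n,
        "share_near_bottom": sum(s <= 0.1 for s in normalised) / n,
    }


# Made-up scores: one test where everyone scores near the top, one with a wide spread.
clumped = score_distribution_features([95, 96, 97, 98, 99, 100])
spread = score_distribution_features([12, 30, 45, 60, 75, 88])
print(clumped)
print(spread)
```

A test whose scores are all near the top shows a high `share_near_top` and a small `stdev`, which matches the "too easy" pattern described earlier. Feature vectors like these, paired with good and bad labels, could then be passed to a classifier.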
In order to compute the candidate feedback rating for the Assessment Quality Score, we looked at whether we needed to take into account how well a developer performed. For example, would candidates who performed poorly be more likely to give a low rating? This turned out not to be the case. We used clustering and regression models to understand performance bias, of which there was very little, and also to compute what a healthy score distribution should look like.
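A simple check for performance bias is to measure how strongly a candidate's score is associated with the rating they give. The sketch below uses a Pearson correlation and a least-squares slope on synthetic data. It illustrates the idea only: these are not HackerRank's results, and the post does not say which regression models were used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ols_slope(xs, ys):
    """Slope of the least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x


# Synthetic example: test scores (0-100) and feedback ratings (1-5).
scores = [20, 35, 50, 65, 80, 95]
ratings = [4, 5, 3, 4, 5, 4]

r = pearson(scores, ratings)
slope = ols_slope(scores, ratings)
print(f"correlation={r:.3f}, rating change per score point={slope:.4f}")
```

A correlation near zero, as in this synthetic sample, would mean ratings do not need to be adjusted for how well the candidate performed.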
With the help of machine learning, we were able to sift through millions of data points and find the most important indicators of excellent tech recruiting performance as well as provide actionable insights on how to improve performance. Of course, machine learning requires human evaluation as well. We did a rigorous evaluation of our model, collaborated with customer success, and spoke with our customers to ensure that our results made sense. The product of this collaboration is what you see today.
Sofus Macskassy is the VP of Data Science at HackerRank, where he is on a mission to match every developer to the right job. He has spent the last 3 decades building expertise in machine learning, artificial intelligence, data science, and user modeling. Before HackerRank, he led teams at Branch Metrics and Facebook. He holds a Ph.D. in Machine Learning.