How Plagiarism Detection Works at HackerRank
Last year, as many as 20% of computer science students at Stanford were flagged for possible cheating. Likewise, half of all the code violations that occurred at Brown last year were related to plagiarism in computer science.
As take-home coding challenges are increasingly becoming the standard for how companies evaluate developer candidates, we often hear the question: How do you make sure developer candidates aren’t plagiarizing?
After all, it’s no secret that some websites post the answers to common coding challenges.
At HackerRank, we take a dual approach to plagiarism: proactive and reactive.
Proactive Plagiarism Detection
To help mitigate plagiarism, we have a large team of content challenge curators who are continuously building out our library of 300+ coding challenges, 100+ role-based (or tech-specific) challenges, and 1,000+ multiple choice question types.
These questions range in difficulty and technical skill set, which lets the user screen candidates on the specific skills they need. The library includes questions that test problem-solving skills, role-specific challenges, and domain-specific challenges (like Databases and DevOps).
We regularly scrape the web using an internal tool to check for duplicate questions and for test answers that have been leaked or shared. If a possible match is found, the question is automatically flagged, reviewed by an internal team, and archived. A warning flag is placed on the question, and all test posters are notified so that the question is not used as part of an interview again.
Reactive Plagiarism Detection
We use two algorithms to detect possible plagiarism: Moss (Measure of Software Similarity) and string comparison. Moss is an automatic system that determines the similarity of programs. Every coding challenge answer goes through both string comparison and Moss to find similarities with other submissions.
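To make the two kinds of comparison concrete, here is a minimal sketch in Python. It is not HackerRank's implementation and it is not Moss itself. The string comparison uses the standard library's difflib, and the second function is a simplified version of k-gram fingerprinting with winnowing, the technique Moss is generally described as being built on. The function names, the character-level normalization, and the values of k and window are all assumptions chosen for illustration; a real system would tokenize each language properly.

```python
import difflib
import hashlib
import re


def normalize(code: str) -> str:
    """Lowercase and drop all whitespace so formatting changes are ignored."""
    return re.sub(r"\s+", "", code.lower())


def string_similarity(a: str, b: str) -> float:
    """Plain string comparison: difflib's similarity ratio in [0, 1]."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fingerprints(code: str, k: int = 5, window: int = 4) -> set:
    """Simplified winnowing: hash every k-gram, keep each window's minimum."""
    text = normalize(code)
    hashes = [
        int.from_bytes(hashlib.md5(text[i:i + k].encode()).digest()[:4], "big")
        for i in range(len(text) - k + 1)
    ]
    if not hashes:
        return set()
    if len(hashes) <= window:
        return {min(hashes)}
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def fingerprint_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two fingerprint sets, in [0, 1]."""
    fa, fb = fingerprints(a), fingerprints(b)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


original = "def add(x, y):\n    return x + y\n"
reformatted = "def add(x,y):\n\treturn x+y"
unrelated = "print('hello world')"

print(string_similarity(original, reformatted))
print(fingerprint_similarity(original, reformatted))
print(fingerprint_similarity(original, unrelated))
```

Both scores only measure how much two submissions overlap. Neither says anything about how the overlap came about.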
Since it was first developed in 1994, Moss has been very effective in detecting plagiarism in programming classes and tests. However, it’s important to note that Moss alone is not a completely accurate tool for detecting plagiarism.
According to the Merriam-Webster dictionary, to plagiarize is “to steal and pass off (the ideas or words of another) as one’s own : use (another’s production) without crediting the source.” So while Moss will determine how similar two pieces of code are, it will not know why they are similar. It is therefore important for a person to look at the parts of the code that Moss highlights and decide whether or not the code was plagiarized.
Cheating and plagiarism can’t be fully prevented, but we are taking proactive and reactive steps to stay ahead. These measures will, more often than not, discourage test takers from cheating.
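As an illustration of what "the parts of the code that are highlighted" might look like for a reviewer, the sketch below lists runs of matching lines between two submissions. It uses Python's difflib rather than Moss, and the function name shared_regions and the min_lines threshold are hypothetical. The output is a starting point for a person's judgment, not a verdict.

```python
import difflib


def shared_regions(a: str, b: str, min_lines: int = 2) -> list:
    """Return (line_in_a, line_in_b, matching_lines) for each shared run.

    Line numbers are 1-based. Lines are compared after stripping leading
    and trailing whitespace, so indentation changes do not hide a match.
    """
    lines_a = [line.strip() for line in a.splitlines()]
    lines_b = [line.strip() for line in b.splitlines()]
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    regions = []
    for block in matcher.get_matching_blocks():
        # The final block is always a zero-length sentinel; the size
        # check skips it along with runs that are too short to matter.
        if block.size >= min_lines:
            matched = lines_a[block.a:block.a + block.size]
            regions.append((block.a + 1, block.b + 1, matched))
    return regions


submission_a = (
    "import sys\n"
    "def solve(n):\n"
    "    total = 0\n"
    "    for i in range(n):\n"
    "        total += i\n"
    "    return total\n"
    "print(solve(10))"
)
submission_b = (
    "def answer(n):\n"
    "    total = 0\n"
    "    for i in range(n):\n"
    "        total += i\n"
    "    return total\n"
)

regions = shared_regions(submission_a, submission_b)
for start_a, start_b, lines in regions:
    print(f"A line {start_a} / B line {start_b}: {len(lines)} matching lines")
```

A short loop like the one matched here is also what many independent solutions to the same problem would contain, which is why a person still has to read the highlighted region.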
What our plagiarism flag is NOT meant to be
Our plagiarism flag is an indicator that someone may have copied code, not proof that they did. While we can flag duplication, it’s hard for us to explain the exact reason for the similarity. Our plagiarism detection should be viewed as a way to save time and point out cases that are worth more detailed examination.
We recommend that hiring managers and technical recruiters review the highlighted code and decide whether it is an actual case of plagiarism. We do not recommend auto-rejecting a candidate based on the plagiarism flag!