Ways to spot plagiarized code

Copying code for Computer Science assignments remains a recurring problem, even though it’s very easy to get caught. To demonstrate this point in practice, a real-world example recently surfaced on the forums: “i need help!please fix my battleship game!!!i dont know what to do!!” [sic].

I previously mentioned that programming has a handwriting-like signature that can be analyzed. That signature could be matched against a student’s previously submitted programs, but even better, when a program is mashed together out of copied parts, the inconsistencies really show through. It is those inconsistencies in coding style that easily give away the plagiarism offence.

[Image: different signatures in function names]

Function naming conventions. Functions can be named in different styles: the choice of descriptive words, capitalization, use of underscores, use of parameters. Good use of helper methods in some places but repetition of code in others also signals an inconsistency in how functions are used.

[Image: different signatures in variable names]

Variable naming conventions. Notice the degeneration in variable naming quality. From PicBackGround to picback, there’s a loss of CamelCase format and descriptiveness, even though both variables are used for the same purpose. game/gam shows inconsistency in spelling. Really obscure variables, such as d, are a long way from very descriptive ones, such as the first PicBackGround. Declarations of variables that are never used in the program should also prompt questions.
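The naming inconsistencies described above can be checked roughly by machine. The following is a minimal sketch, assuming the identifiers have already been pulled out of a submission; `naming_style` and `style_profile` are hypothetical helper names, and the style categories are a simplification rather than any standard.

```python
import re
from collections import Counter


def naming_style(name: str) -> str:
    """Classify one identifier into a rough, assumed naming style."""
    if "_" in name.strip("_"):
        return "snake_case"
    if re.fullmatch(r"[A-Z][a-z0-9]*(?:[A-Z][a-z0-9]*)+", name):
        return "UpperCamelCase"
    if re.fullmatch(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+", name):
        return "lowerCamelCase"
    if len(name) <= 2:
        return "terse"
    if name.islower():
        return "lowercase"
    return "other"


def style_profile(names):
    """Count how many identifiers fall into each naming style."""
    return Counter(naming_style(n) for n in names)


# Identifiers quoted in the article's example.
profile = style_profile(["PicBackGround", "picback", "game", "gam", "d"])
print(profile)
```

A profile spread across several styles does not prove anything by itself, but it is the kind of mix that should prompt a closer look at where each style appears in the file.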

[Image: different signatures in comments of code]

Comments describe more than code. As before, there are obvious formatting styles to comments: capitalization of words, indentation, use of comment characters. The distribution of comments is also telling: some sections of code may lack necessary comments, while others have more description than needed. The vocabulary, spelling, and grammar used in the comments also reflect the author. “user is playing” vs. “the player lose” should raise flags.

[Image: different signatures in strings of code]

String literals, similar to comments found in code, bring the plagiarism fight back into the realm of the English class. All of the words used in a program form a sort of essay. Vocabulary, spelling, grammar, punctuation: all of it reflects the author, on a level independent of code, logic, and algorithms.
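To read a program’s words as an essay, the comments and string literals first have to be separated from the code. Here is a minimal sketch using Python’s standard `tokenize` module. It assumes the submission is Python source, which the forum example probably was not, and `comments_and_strings` is a hypothetical helper name.

```python
import ast
import io
import tokenize


def comments_and_strings(source: str):
    """Return (comments, string literal values) found in Python source."""
    comments, strings = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # Keep the raw text: spacing after '#' is itself a style signal.
            comments.append(tok.string)
        elif tok.type == tokenize.STRING:
            # Turn the quoted token into the text the user would see.
            strings.append(ast.literal_eval(tok.string))
    return comments, strings


sample = (
    "# user is playing\n"
    "print(\"You win!\")\n"
    "#the player lose\n"
    "print('you loose')\n"
)
comments, strings = comments_and_strings(sample)
print(comments)
print(strings)
```

The two lists can then be read side by side, or passed to an ordinary spelling and grammar check, without the surrounding code getting in the way.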

And more! Outside the scope of this example, there’s also math, preferences toward some functions over others, structure of logic, algorithms, and other technical details. The point here is simple: don’t plagiarize code, it’s too easy to get caught.
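Structure of logic can be compared even when every name has been changed. As a minimal sketch, assuming the submissions are Python: parse each program with the standard `ast` module, replace every identifier with a placeholder, and compare what is left. `structure` and `_Normalizer` are hypothetical names, and this is an illustration of the idea rather than how any particular detection tool works.

```python
import ast


class _Normalizer(ast.NodeTransformer):
    """Replace variable, argument, and function names with a placeholder."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = "_"
        return node

    def visit_FunctionDef(self, node):
        node.name = "_"
        self.generic_visit(node)
        return node


def structure(source: str) -> str:
    """Return a textual dump of the parse tree with names erased."""
    tree = _Normalizer().visit(ast.parse(source))
    return ast.dump(tree)


original = "def total(items):\n    s = 0\n    for x in items:\n        s += x\n    return s\n"
renamed = "def sum_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"
print(structure(original) == structure(renamed))
```

This sketch only catches pure renaming. Reordered statements, changed literals, or renamed attributes would need a similarity measure over the trees rather than an equality test.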



  1. Posted by cbright | November 12, 2007, 3:41 pm

    And of course, code can be parsed, so for example if someone tries the old copy-paste trick and renames all the variables, although the code might superficially look different, it will actually generate the exact same parse tree. Even if the ordering of a few statements were changed, the trees would still be very similar. I suspect most universities have automated tools to parse submitted assignments and check for these similarities.


  2. Posted by agnivohneb | November 14, 2007, 12:45 pm

    Another way is US and Canadian spelling differences, like color and colour. In some programming languages both spellings are an option (like Turing). Someone writing code from scratch will only use one spelling (for me, color), and if part of the code was copied from a source that used the other spelling, then you’ve found it.
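The spelling check suggested in the comment above can be sketched in a few lines. This is a rough illustration only: the `VARIANTS` word list is a tiny assumed sample, `spelling_variants` and `mixes_spellings` are hypothetical names, and the substring matching is deliberately crude so that spellings inside identifiers are caught too.

```python
# Assumed, deliberately tiny list of US -> Canadian/British spelling pairs.
VARIANTS = {"color": "colour", "center": "centre"}


def spelling_variants(text: str):
    """Return (US spellings found, Canadian/British spellings found)."""
    lowered = text.lower()
    us = sorted(w for w in VARIANTS if w in lowered)
    other = sorted(w for w in VARIANTS.values() if w in lowered)
    return us, other


def mixes_spellings(text: str) -> bool:
    """True when the text uses spellings from both conventions."""
    us, other = spelling_variants(text)
    return bool(us) and bool(other)


print(mixes_spellings("setcolor(red)\ndrawfill(x, y, colour)"))
print(mixes_spellings("color(1)\nbgcolor = 2"))
```

A mix of both conventions in one file is only a hint, since a student may legitimately call a library routine that is spelled the other way.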

