Programming C, C++, Java, PHP, Ruby, Turing, VB
Computer Science Canada 
 Code "Handwriting" analysis
mirhagk

Posted: Wed Jan 25, 2012 8:47 pm    Post subject: Code "Handwriting" analysis

This came up on a show I watched (Numbers), and I'm curious whether it would work. The basic concept is that a programmer tends to name variables and functions the same way across all of their programs. For instance, I name variables using lower camel case and functions using upper camel case, while other programmers use different conventions. Beyond casing, there are the variable names themselves: are lists pluralized? Do loops use i, j, k or x, y, z?
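As a rough illustration of the naming features described above, here is a minimal Python sketch, assuming C-like source text. The function name `naming_profile`, the three casing categories, and the regexes are hypothetical choices for illustration, not an established tool; a real analyzer would use a proper tokenizer and skip comments and string literals.

```python
import re
from collections import Counter

# Hypothetical casing categories; the names and regexes are illustrative only.
CASE_PATTERNS = {
    "lower_camel": re.compile(r"^[a-z]+(?:[A-Z][a-z0-9]*)+$"),  # e.g. totalScore
    "upper_camel": re.compile(r"^(?:[A-Z][a-z0-9]+)+$"),        # e.g. SumScores
    "snake": re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)+$"),         # e.g. total_score
}

# Any C-like identifier. Keywords such as "int" also match here, but they fall
# into none of the casing categories above, so they are never counted.
IDENTIFIER = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")

# Captures the counter name in loops like "for (int i = 0; ...)" or "for (x = 0; ...)".
LOOP_VAR = re.compile(r"\bfor\s*\(\s*(?:int\s+)?([A-Za-z_]\w*)\s*=")


def naming_profile(source: str) -> Counter:
    """Count distinct identifiers per casing style, plus loop-counter names."""
    profile = Counter()
    for name in set(IDENTIFIER.findall(source)):
        for label, pattern in CASE_PATTERNS.items():
            if pattern.match(name):
                profile[label] += 1
    for var in LOOP_VAR.findall(source):
        profile["loop_var:" + var] += 1
    return profile


sample = """
int SumScores(int scores[], int count) {
    int totalScore = 0;
    for (int i = 0; i < count; i++) {
        totalScore += scores[i];
    }
    return totalScore;
}
"""
print(naming_profile(sample))
```

Counting distinct names (via `set`) rather than every occurrence keeps one heavily reused variable from swamping the profile; whether that is the right weighting is an open design question.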

The code itself may carry enough signal as well: there are many ways to approach a given problem, people generally reach for familiar methods, and each person tends to solve similar problems with similar code.

I wonder if there's enough information to actually match a program to the programmer who wrote it. A program that did this could be invaluable in many circumstances: was a particular virus written by the same person as another virus? Did a student write a particular function, or steal the code? Does anyone think this is possible? If so, it might be an interesting idea for a thesis project, and you could study classic handwriting analysis as well, because most of its methods probably carry over.
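One way to frame the matching question is nearest-neighbour attribution: build a style profile for each known author and pick the author whose profile is closest to the unknown code. The sketch below swaps in character n-gram frequency profiles compared with cosine similarity, a common baseline in authorship attribution rather than anything proposed in this thread; the names `ngram_profile`, `cosine`, and `most_similar_author`, the trigram size, and the toy samples are all assumptions for illustration. A high score only suggests stylistic similarity, not proof of authorship.

```python
import math
from collections import Counter
from typing import Dict, Tuple


def ngram_profile(source: str, n: int = 3) -> Counter:
    """Frequencies of overlapping character n-grams, with whitespace runs collapsed."""
    text = " ".join(source.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_author(unknown: str, samples: Dict[str, str]) -> Tuple[str, float]:
    """Return the known author whose sample profile is closest to the unknown code."""
    target = ngram_profile(unknown)
    scores = {author: cosine(target, ngram_profile(code)) for author, code in samples.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy, made-up samples: one short snippet per hypothetical author.
samples = {
    "alice": "for (int i = 0; i < n; i++) { totalScore += scores[i]; }",
    "bob": "for (x = 0; x < n; x++) { total_score += score_list[x]; }",
}
unknown = "for (int i = 0; i < count; i++) { totalValue += values[i]; }"
print(most_similar_author(unknown, samples))
```

Collapsing whitespace is a design choice: it keeps indentation width from dominating the profile, at the cost of discarding a feature (tabs vs. spaces) that could itself be a stylistic signal. Real attribution would need far larger samples per author than this toy example.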
DemonWasp

Posted: Wed Jan 25, 2012 9:24 pm    Post subject: RE:Code "Handwriting" analysis

That might work better if most projects / languages didn't have well-defined coding standards. You might be able to identify people out of a very small set if they were permitted to use their own style, but it may be much harder if they have to follow a common style.

A counterpoint would be that I can identify some of my coworkers' code by their spelling mistakes (things like "porject" and "likned"). I'm not sure if that guy is dyslexic or if he just doesn't care, though based on his code quality, I'm guessing the latter.

For a practical software team, I recommend "svn blame" or the equivalent. Figure out who wrote that crappy code the reliable way.

In any case, not useful for identifying virus writers the way you describe, since viruses usually have symbols stripped. However, virus writers are often identified by signatures deliberately left in the virus binaries.

Determining whether a student plagiarized a function can usually be done with Google.
mirhagk

Posted: Wed Jan 25, 2012 10:44 pm    Post subject: RE:Code "Handwriting" analysis

Well, even without symbols it could still be done by looking at other aspects of style. It'd be much harder, but I think it's theoretically possible.