Computer Science Canada

PHP Diff Algorithm

Author:  PaulButler [ Thu May 17, 2007 5:56 pm ]
Post subject:  PHP Diff Algorithm

I wrote a simple diff algorithm in PHP. The functions are described in the comments; basically the "diff" function works directly on an array and the htmlDiff function is a wrapper that takes two normal strings and returns HTML markup representing the difference.

I assume that the "largest common substring" algorithm is the one I used, but I did not have any code or description to base it on so I had to figure most of it out myself. Considering diff algorithms are fairly useful and fun but challenging to code, I am surprised that there is so little information on them on the web.

php:

<?php

/*
        Paul's Simple Diff Algorithm v 0.1
        (C) Paul Butler 2007 <http://www.paulbutler.org/>
        May be used and distributed under the zlib/libpng license.
       
        This code is intended for learning purposes; it was written with short
        code taking priority over performance. It could be used in a practical
        application, but there are a few ways it could be optimized.
       
        Given two arrays, the function diff will return an array of the changes.
        I won't describe the format of the array, but it will be obvious
        if you use print_r() on the result of a diff on some test data.
       
        htmlDiff is a wrapper for the diff command, it takes two strings and
        returns the differences in HTML. The tags used are <ins> and <del>,
        which can easily be styled with CSS. 
*/


function diff($old, $new){
        foreach($old as $oindex => $ovalue){
                $nkeys = array_keys($new, $ovalue);
                foreach($nkeys as $nindex){
                        $matrix[$oindex][$nindex] = isset($matrix[$oindex - 1][$nindex - 1]) ?
                                $matrix[$oindex - 1][$nindex - 1] + 1 : 1;
                        if($matrix[$oindex][$nindex] > $maxlen){
                                $maxlen = $matrix[$oindex][$nindex];
                                $omax = $oindex + 1 - $maxlen;
                                $nmax = $nindex + 1 - $maxlen;
                        }
                }       
        }
        if($maxlen == 0) return array(array('d'=>$old, 'i'=>$new));
        return array_merge(
                diff(array_slice($old, 0, $omax), array_slice($new, 0, $nmax)),
                array_slice($new, $nmax, $maxlen),
                diff(array_slice($old, $omax + $maxlen), array_slice($new, $nmax + $maxlen)));
}

function htmlDiff($old, $new){
        $diff = diff(explode(' ', $old), explode(' ', $new));
        foreach($diff as $k){
                if(is_array($k))
                        $ret .= (!empty($k['d'])?"<del>".implode(' ',$k['d'])."</del> ":'').
                                (!empty($k['i'])?"<ins>".implode(' ',$k['i'])."</ins> ":'');
                else $ret .= $k . ' ';
        }
        return $ret;
}

?>

Author:  Somsakjohn [ Wed Jun 08, 2016 6:33 am ]
Post subject:  RE:PHP Diff Algorithm

I was very impressed that I had a message on it. Size and promised to use it.


: