Computer Science Canada

Help to construct Suffix Arrays in O(nlogn) or better

Author:  xander7b [ Fri Aug 26, 2011 10:14 am ]
Post subject:  Help to construct Suffix Arrays in O(nlogn) or better

I understood what Suffix Arrays are, but the best implementation I was able to come up with (which took only few minutes of coding) was O(n^2 log n) [Attached].

Can someone please explain in simple words how to construct Suffix Arrays in O(n log^2 (n) ) or better ?

Please it would be really helpful if it was in simple words with some small examples, I tried googling but what I found were many PDFs\PPTs which were either too short, or were written in somewhat difficult technical language without any examples or simple pseudo-code, please atleast give me some simple explanation which would allow me to be able to understand them.

Thanks a lot Smile

Author:  DemonWasp [ Fri Aug 26, 2011 4:30 pm ]
Post subject:  RE:Help to construct Suffix Arrays in O(nlogn) or better

I'm not sure how you're getting O ( n log ^2 ( n ) ) from your code. I'm seeing O ( n + n log n ) = O ( n log n ), which is what I would have expected.

Author:  xander7b [ Fri Aug 26, 2011 4:50 pm ]
Post subject:  Re: RE:Help to construct Suffix Arrays in O(nlogn) or better

DemonWasp @ Fri Aug 26, 2011 11:30 pm wrote:
I'm not sure how you're getting O ( n log ^2 ( n ) ) from your code. I'm seeing O ( n + n log n ) = O ( n log n ), which is what I would have expected.


I am getting O( (n^2) log(n) ) [which is what I mentioned before] because I am using the STL sort which is n log n but at each stage the comparison takes n steps at most (length of the String) instead of O(1) as it would be with primitives, so its n^2 log n as I see it, which is pretty slow compariung to the O(n log n) or even O(n) algorithms which exist.

Author:  Brightguy [ Sat Aug 27, 2011 12:23 pm ]
Post subject:  Re: Help to construct Suffix Arrays in O(nlogn) or better

There's a really clever way to do this which is like a radix sort with only log(n) passes. Basically, you set up your n suffix strings, sort them by the 1st character, then by the first 2 characters, 4 characters, 8 characters, etc. The trick is that each pass can be done efficiently because you've actually already done most of the work.

For example, when you have the array sorted by the first k characters you know the proper ordering of all strings except for those with the first k characters equal. Now, on the next pass you want to order these strings by their first 2k characters. Since their first k characters are the same, this is equivalent to sorting the next k characters after the first k. But in fact you've already done this: if you remove characters from the beginning of a suffix, you get another suffix. So those next k characters that you want to sort were already sorted by the previous pass, and you can use this info to "bootstrap" the next pass.

Author:  DemonWasp [ Sat Aug 27, 2011 10:28 pm ]
Post subject:  Re: RE:Help to construct Suffix Arrays in O(nlogn) or better

xander7b @ Fri Aug 26, 2011 4:50 pm wrote:
...at each stage the comparison takes n steps at most (length of the String) instead of O(1) as it would be with primitives...


While that's true, it doesn't necessarily make your compare method into O(n), because you can quit the loop (or tail recursion) early. If your string includes a typical distribution of characters for English, you can figure that most strings will be differentiated by at most a constant number of characters. Your worst case is actually going to be O ( min ( sizeA, sizeB ) ), but you can probably just cross off that term entirely as being a "constant factor". Without knowing what your string will look like ahead of time, it's difficult to do a more accurate analysis.

Author:  mirhagk [ Sat Aug 27, 2011 11:18 pm ]
Post subject:  RE:Help to construct Suffix Arrays in O(nlogn) or better

@brightguy I'm pretty sure I implemented this in real life last week. At a camp we were playing some dumb game where you run an obstacle course, then look at 6 pylons that have 6 pieces of tape on them, then run back, and organize a copy of the pylons into the same order.

The awesome part was that we know which terms the pylons differed by (for instance with the top row of tape there was 3 white, 2 blue and 1 red). This let my team solve the event with only 2 people running, the first person memorizing 5 things, and the second person memorizing 3, instead of the normal 6 people memorizing 6 colours.

Author:  Brightguy [ Sat Aug 27, 2011 11:45 pm ]
Post subject:  Re: RE:Help to construct Suffix Arrays in O(nlogn) or better

DemonWasp @ Sat Aug 27, 2011 10:28 pm wrote:
but you can probably just cross off that term entirely as being a "constant factor".

Au contraire, a general runtime should hold in all cases. Though you could specify an average-case analysis over some distribution of inputs.

Author:  xander7b [ Sun Aug 28, 2011 1:14 am ]
Post subject:  RE:Help to construct Suffix Arrays in O(nlogn) or better

@DemonWasp The input I am forced to deal with here is made by an adversary, its made to test the algorithm to the extremes, its this spoj problem: https://www.spoj.pl/problems/SARRAY/
(which ain't itself the main challenge, but other problems which involve suffix arrays too, and they are all over the place).

@Brightguy I finally implemented the O(n log^2 n) method, which is the O(n log n) method with using O(n log n) sort instead of the O(n) radix sort, thats not a problem at all now, I scored a 60 in the spoj problem which was what I wanted (6 of the 10 cases ACed, the rest TLEd but thats okay, I'll optimize it soon, but need to make sure I got some stuff first).

I still feel the algorithm is sort of counter intuitive tho, specially the doubling part, but am happy it really worked finally.

Thanks a lot for all the support guys Smile

Edit: I attached my current code.
(the log2 call in the beginning I did it to make sure the array size is a power of two, without that I got RE (Runtime Error) in some of the test cases).


: