Computer Science Canada

Sorting Contest

Author:  md [ Wed Nov 07, 2007 8:37 pm ]
Post subject:  Sorting Contest

As inspired by rdrake's crappy assignment I am going to run a simple contest.

The Rules
* You can use any language you like, provided I can compile the source on linux. This unfortunately limits Turing unless someone wishes to make an effort to get OpenT running for me.
* You must provide source code
* My workstation will be used for judging.
* I shall post my solution at the end of the contest, as well as the winning entry.
* The contest shall start at 9:00 tonight and run until 12:00 tomorrow night. After that I shall test all programs and announce the winner (most likely friday night/saturday morning).
* The winner(s) shall be the person who can sort each data set fastest. There could be two winners.

The Goal
* You must provide one program capable of sorting single-byte values and one capable of sorting 2-byte integers (shorts, words, whatever).
* Your program must take a filename as the first argument. That file will contain bytes/shorts that are to be sorted. Your program should overwrite this file with sorted data.
* Your program must not use excessive amounts of memory. Reading the entire file into memory to sort it is cheating (and not practical).
* Your program must sort 256MB (give or take) worth of bytes/shorts, and then 1GB worth of bytes/shorts.


The Data
* Data will be generated for the final test with dd and /dev/urandom. Each program will be run with the same file; and the resulting file will be tested to ensure it contains the same numbers as the original. I will not provide such files for download as it's faster for people to generate their own test files.

The Prize
The winner will receive bits equal to the number of seconds faster then the next fastest program they are. This shall be limited to 360 bits. If anyone manages to beat my time they shall receive double bits.

Go Nuts!

Clarifications
* Data files are in binary, not text. One byte of the file is one byte to be sorted.
* There are four data sets to be sorted, a 256mb file containing single-byte values (unsigned char), a 256mb file containing 2-byte values(unsigned short), a 1gb file containing single-byte values (unsigned char) and a 1gb file containing 2-byte values (unsigned short).
* They are to be sorted from least to greatest.
* They are unsigned. So no negative numbers.

Author:  Nick [ Wed Nov 07, 2007 8:49 pm ]
Post subject:  RE:Sorting Contest

lmao i like the limitation on turing due to linux doesnt matter anyways because turing's max intake is 256 chars (even with multiple varibles asigned it would take longer to do making it harder to win)

Author:  CodeMonkey2000 [ Wed Nov 07, 2007 8:52 pm ]
Post subject:  RE:Sorting Contest

momop, do you even know what you are talking about?

Author:  Nick [ Wed Nov 07, 2007 8:55 pm ]
Post subject:  RE:Sorting Contest

Quote:
* Your program must sort 256MB (give or take) worth of bytes/shorts, and then 1GB worth of bytes/shorts.


that means that the intake must be longer than 256 chars

and iunno do u know what im talking about?

Author:  CodeMonkey2000 [ Wed Nov 07, 2007 9:00 pm ]
Post subject:  RE:Sorting Contest

You are sorting an array. Turing can definitely handle more than 256 elements.

Author:  Nick [ Wed Nov 07, 2007 9:31 pm ]
Post subject:  RE:Sorting Contest

OH!!!!!!!!!!!!!
i thought it gave u numbers and u sorted them :p geuss i didnt know what i was talking about

Author:  md [ Wed Nov 07, 2007 9:59 pm ]
Post subject:  RE:Sorting Contest

Technically you are sorting a file (or an array stored in a file if you prefer). Sorting a file is harder then sorting an array in memory; especially since loading the entire file into memory is both infeasible and against the rules.

I may also extend this to include Saturday if there are enough people who express an interest.

Author:  CodeMonkey2000 [ Wed Nov 07, 2007 10:12 pm ]
Post subject:  RE:Sorting Contest

So you would be continuously opening and closing the file for every iteration you go through?

Author:  Saad [ Wed Nov 07, 2007 10:16 pm ]
Post subject:  RE:Sorting Contest

Can you please extend the deadline to saturday? as it gives me more time to think

Author:  md [ Wed Nov 07, 2007 11:29 pm ]
Post subject:  Re: RE:Sorting Contest

CodeMonkey2000 @ 2007-11-07, 10:12 pm wrote:
So you would be continuously opening and closing the file for every iteration you go through?


Sure, how you do it doesn't really matter to me.

I give you a file containing 1073741824 bytes. You must sort that file so it's in order. You can do it in any way you want; so long as it ends up sorted from smallest to largest. Likewise for half as many shorts.

Author:  Clayton [ Wed Nov 07, 2007 11:45 pm ]
Post subject:  RE:Sorting Contest

Please, for the love of god, don't tell me you're going to use 5GB files too.

Author:  rdrake [ Wed Nov 07, 2007 11:50 pm ]
Post subject:  Re: RE:Sorting Contest

Clayton @ Wed Nov 07, 2007 11:45 pm wrote:
Please, for the love of god, don't tell me you're going to use 5GB files too.
He said 256 MB and 1 GB in the original post. I'd be surprised if he bothered running all the submissions on 5 GB files.

Author:  Clayton [ Thu Nov 08, 2007 12:00 am ]
Post subject:  RE:Sorting Contest

Yeah, well...

Clayton wrote:

(11:53:28 PM) Clayton{}: I'm going to use "I'm tired" as an excuse

Author:  md [ Thu Nov 08, 2007 8:38 am ]
Post subject:  RE:Sorting Contest

I'll extend this to Saturday at midnight; now you got no excuse clayton Wink

Author:  Saad [ Thu Nov 08, 2007 6:03 pm ]
Post subject:  Re: Sorting Contest

md @ Wed Nov 07, 2007 8:37 pm wrote:

The Goal
* You must provide one program capable of sorting bytes and one capable of sorting 2-byte integers (shorts, words, whatever).

The part underlined i don't understand, how many bytes in each value do you want?

Author:  md [ Thu Nov 08, 2007 6:13 pm ]
Post subject:  RE:Sorting Contest

Sorry, I have clarified the original post.

There are four data sets to be sorted, a 256mb file containing single-byte values (unsigned char), a 256mb file containing 2-byte values(unsigned short), a 1gb file containing single-byte values (unsigned char) and a 1gb file containing 2-byte values (unsigned short).

They are to be sorted from least to greatest.

Author:  zylum [ Thu Nov 08, 2007 10:54 pm ]
Post subject:  Re: Sorting Contest

http://compsci.ca/v3/viewtopic.php?t=10521 Laughing

Author:  md [ Fri Nov 09, 2007 8:32 am ]
Post subject:  RE:Sorting Contest

I should point out that the files to be sorted are binary, not text. And there has been one submission already.

Author:  bugzpodder [ Fri Nov 09, 2007 10:18 am ]
Post subject:  Re: Sorting Contest

[deletia]

md says: What part of "Contest" did you miss? Try not to give out solutions for others to copy; it makes the whole thing more fun for all involved.

Author:  klopyrev [ Fri Nov 09, 2007 1:28 pm ]
Post subject:  Re: Sorting Contest

I bet Knuth has a faster way to sort without reading the tapes more than once:P

Author:  md [ Fri Nov 09, 2007 4:14 pm ]
Post subject:  RE:Sorting Contest

Thanks bugz... where before it was left to people to figure that out on their own, now there is a solution they can just copy.

Helping people is good and all, but giving away most of one of the better solutions is just ass-hat-ery.

Author:  zylum [ Sat Nov 10, 2007 4:28 am ]
Post subject:  RE:Sorting Contest

well its the most obvious solution Razz

and i dont think my solution will work as it reads text not binary :S

Author:  md [ Sat Nov 10, 2007 9:14 am ]
Post subject:  RE:Sorting Contest

feel free to fix that part Wink

Also, it's only the obvious solution if you think of it. There are more then a few people on IRC that missed it.

Author:  md [ Sun Nov 11, 2007 1:04 pm ]
Post subject:  Re: Sorting Contest

Results

There were two entries, one from saad and one from zylum. I of course have written my own sorts to compare against.

results for sorting bytes:
code:

                        256MB                        1GB
md                      5.0s                         32.0s
saad                    10.1s                        48.1s
zylum                   SF                           SF



results for sorting shorts:
code:

                        256MB                        1GB
md                        -                           -
saad                      -                           -
zylum                    SF                           SF


I didn't run the sorts for shorts, since in prevous testing I've shown that my sort is faster then saads, and because zylums segfaulted again. Therefor... the winner is...

Saad

+360bits

Author:  wtd [ Sun Nov 11, 2007 1:11 pm ]
Post subject:  RE:Sorting Contest

The winning submission should be posted in source form.

Author:  Saad [ Sun Nov 11, 2007 1:20 pm ]
Post subject:  Re: Sorting Contest

Done in C++, compiled via g++ compiler

c++:
/* For this sort we will be calculating the number of each
byte occurs and then output each byte in the number of times it occurs
Data type changes this is for short int
*/


#include <fstream>
using std::ifstream;
using std::ofstream;
using std::ios;
int main(int argc, char **args){
    unsigned int count[65536];
    ifstream in;
    in.open(*(args + 1), ios::binary);
    // Initialise all the counters to 0
    for (int i = 0; i <= 65535; i++){
        count[i] = 0;
    }

    in.seekg(0, ios::end);
    register int max = in.tellg();    // Find the number of bytes in the file
    in.seekg(0,ios::beg);
   
    register unsigned long long int num;
    for (register unsigned int i = 0; i < max; i+= 8){
        // Read in bytes 8 at a time and then add one to that array location
        in.read((char *) &num, 8);
        (*(count + (num & 65535)))++;
        (*(count + ((num >> 16) & 65535)))++;
        (*(count + ((num >> 32) & 65535)))++;
        (*(count + ((num >> 48) & 65535)))++;
    }
    // Output the data
    ofstream out;
    out.open (*(args + 1), ios::binary);
    for (unsigned int i = 0; i <= 65535; i++){
        register unsigned int c = *(count + i);
        // Save constantly indexing the array
        if (c > 0){
            // Generate an array containing the number of bytes occured so we output that array at once instead of cosntantly calling write
            unsigned short int *a = new unsigned short int[c];
            for (unsigned int j = 0; j < c; j++)
                *(a + j) = i;
            out.write((char *) a, c << 1);
            delete [] a;
        }
    }
    // Close the streams
    in.close();
    out.close();
    return 0;
}


I dont know which source forum to post it to.

md asked me to post his so here it is

Author:  A.J [ Thu Jan 31, 2008 1:53 am ]
Post subject:  Re: Sorting Contest

erm.....................saad, is this similar to what i did in turing (at school)? cause i sorted it using your algorithm but i did not know what you mean by sorting bytes (and I never heard about compsci.ca until about a week ago)

plus I started turing recently so I needed to ask you if this is sort of what your algorithm was ?
code:

var stream, x : int
open : stream, "ENTER TEXT FILE NAME HERE.txt", get
var sort : array 1 .. (max value being tested) of int
% initialize the 'sort' array to 0 for all values
loop
    exit when eof (stream)
    get : stream, x
    sort (x) += 1
end loop
for i : 1 .. (max value being tested)
    for j : 1 .. sort (i)
        put i, " " ..
    end for
end for

Author:  Saad [ Thu Jan 31, 2008 5:38 pm ]
Post subject:  RE:Sorting Contest

Yes it is, although you should change the code so that it supports binary files. Its read/write in Turing.

Author:  A.J [ Thu Jan 31, 2008 7:15 pm ]
Post subject:  Re: Sorting Contest

could you give man example saad?

Author:  Saad [ Thu Jan 31, 2008 7:35 pm ]
Post subject:  RE:Sorting Contest

Turing:
var file : int
open : file, "foo.dat", write
for i : 1 .. 10
    write : file, i
end for
close : file
open : file, "foo.dat", read
for i : 1 .. 10
    var number : int
    read : file, number
    put number
end for
close : file


One advantage is to binary files is that binary files can be accessed at random location vs an ASCII (text) file which you need to access sequentially.


: