Programming C, C++, Java, PHP, Ruby, Turing, VB
Computer Science Canada 
Programming C, C++, Java, PHP, Ruby, Turing, VB  

Username:   Password: 
 RegisterRegister   
 interesting problem
Index -> General Programming
Goto page 1, 2  Next
View previous topic Printable versionDownload TopicSubscribe to this topicPrivate MessagesRefresh page View next topic
Author Message
zylum




PostPosted: Fri Aug 12, 2005 11:16 pm   Post subject: interesting problem

You are part of a team that is working on a piece of software to handle text compression. Your team has designed the compression algorithm as follows:

To compress a given string of text, two strings, each 3 characters in length, will be chosen as compression keys. The strings may contain any combination of capital letters and/or spaces. Then, a compressed string will be generated from the original such that replacing "*1" in the compressed string with the first string and replacing "*2" with the second string will recreate the original text.

For example, if the two compression keys are "FOO" and "BAR", then the compressed string "*2X*1" would decompress to "BARXFOO", and "*1*1 *2" would decompress to "FOOFOO BAR".

You have been tasked with writing a proof of concept implementation to test the effectiveness of this compression scheme. You will be given a String original. Your goal is to optimally select the two compression keys, and generate the compressed text to be as short as possible. You are to return the length of the shortest possible text.


Examples:

0)

"BARXFOO"
Returns: 5
The first example from the problem statement.


1)

"FOOFOO BAR"
Returns: 7
The second example from the problem statement.


2)

"ABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGHIJKLMNOPQRSTUVWXY"
Returns: 46
It's a long string, but no 3-character substring appears more than twice.


3)

"QWERTYUIOPASDFGHJKLZXCVBNM"
Returns: 24
Here, no substring repeats itself at all. The best we can do is to pick any two substrings to replace.


4)

"BBAAAABBBBAAABABABBBABABAAABABABAAABBAABABBBABAAAB"
Returns: 40



hint: there are 26^7 possible combinations of two keys which is too many to test.


note: this problem if from topcoder.com
Sponsor
Sponsor
Sponsor
sponsor
bugzpodder




PostPosted: Sat Aug 13, 2005 4:53 pm   Post subject: (No subject)

hint to hint: it would be stupid to do all 26^7 combinations, (i dont even know where the 26^7 comes from, isnt it 26^6)... all you need to do is pick two keys that exists at least once in the string. so thats only about 200^2 (if the length of string is 200). and 40000 is reasonable. so all that is left to do is grunt work... what is the limit on the string length anyways?
wtd




PostPosted: Sat Aug 13, 2005 9:38 pm   Post subject: (No subject)

This seems to work reasonably well.

code:
module Compression where

import List
import Char

allPossibleKeys len str
  | length str < len = []
  | valid candidateKey = candidateKey : allPossibleKeys len (drop 3 str)
  | otherwise = allPossibleKeys len (tail str)
  where
    valid = all validChar
    validChar ch = isUpper ch || isSpace ch
    candidateKey = take len str
   
firstKey len = head . allPossibleKeys len

count _ [] = 0
count v (x:xs)
  | v == x = 1 + countRest
  | otherwise = countRest
  where
    countRest = count v xs
   
rankedKeys len str =
  sortBy (\k1 k2 -> snd k2 `compare` snd k1) $ map (\k -> (k, count k keys)) $ nub keys
  where
    keys = allPossibleKeys len str

strReplace _ _ "" = ""
strReplace "" _ str = str
strReplace s1 s2 str@(x:xs)
  | lenS1 > length str = str
  | s1 `isPrefixOf` str = s2 ++ strReplace s1 s2 (drop lenS1 str)
  | otherwise = x : strReplace s1 s2 xs
  where
    lenS1 = length s1
    lenS2 = length s2
 
compress len str =
  (key1, key2, secondTransform)
  where
    (key1, _) = head $ rankedKeys len str
    firstTransform = strReplace key1 "*1" str
    (key2, _) = head $ rankedKeys len firstTransform
    secondTransform = strReplace key2 "*2" firstTransform
   
decompress key1 key2 compressedStr =
  strReplace "*2" key2 $ strReplace "*1" key1 compressedStr
zylum




PostPosted: Sat Aug 13, 2005 11:14 pm   Post subject: (No subject)

bugzpodder wrote:
hint to hint: it would be stupid to do all 26^7 combinations, (i dont even know where the 26^7 comes from, isnt it 26^6)... all you need to do is pick two keys that exists at least once in the string. so thats only about 200^2 (if the length of string is 200). and 40000 is reasonable. so all that is left to do is grunt work... what is the limit on the string length anyways?


strings are 1-50 characters long and consist of upper case letters and space characters...

wtd >> if you give a solution in java / c++ / c# / vb.net then i could varify your answer...

btw it's not as easy as it seems...

hint2: even if you do have the correct keys, you still have to figure out what the shortest possible string the original string can be compressed into using these keys... the examples do not have any cases which will display this...
wtd




PostPosted: Sat Aug 13, 2005 11:50 pm   Post subject: (No subject)

code:
C:\> ghci Compression
   ___         ___ _
  / _ \ /\  /\/ __(_)
 / /_\// /_/ / /  | |      GHC Interactive, version 6.4, for Haskell 98.
/ /_\\/ __  / /___| |      http://www.haskell.org/ghc/
\____/\/ /_/\____/|_|      Type :? for help.

Loading package base-1.0 ... linking ... done.
Compiling Compression      ( Compression.hs, interpreted )
Ok, modules loaded: Compression.
*Compression> let s = "BBAAAABBBBAAABABABBBABABAAABABABAAABBAABABBBABAAAB"
*Compression> length s
50
*Compression> let ((key1,_),(key2,_),str) = compress 3 s
Loading package haskell98-1.0 ... linking ... done.
*Compression> str
"BB*2ABBBB*2*1ABB*1AB*2*1AB*2BBAA*1B*1*2B"
*Compression> length str
40
*Compression> let dstr = decompress key1 key2 str
*Compression> dstr
"BBAAAABBBBAAABABABBBABABAAABABABAAABBAABABBBABAAAB"
*Compression> length dstr
50
*Compression> s == dstr
True
*Compression> key1
"BAB"
*Compression> key2
"AAA"
*Compression>
wtd




PostPosted: Sat Aug 13, 2005 11:56 pm   Post subject: (No subject)

zylum wrote:
wtd >> if you give a solution in java / c++ / c# / vb.net then i could varify your answer...


Yes, but it's so much easier to write it in Haskell. Smile
zylum




PostPosted: Sun Aug 14, 2005 12:35 am   Post subject: (No subject)

i suppose... but i don't even understand the code you wrote so even i couldnt tell you whether it might pass all test cases. maybe you could explain the algorithm you used?
wtd




PostPosted: Sun Aug 14, 2005 1:38 am   Post subject: (No subject)

Create the Compression module.

code:
module Compression where


Import functions from the Char and List modules.

code:
import List
import Char


The allPossibleKeys function finds, well... all possible keys.

code:
allPossibleKeys len str
  | length str < len = []
  | valid candidateKey = candidateKey : allPossibleKeys len (drop 3 str)
  | otherwise = allPossibleKeys len (tail str)
  where
    valid = all validChar
    validChar ch = isUpper ch || isSpace ch
    candidateKey = take len str


This function takes two arguments: the length of the keys, and the original string.

It uses guard conditions, local functions, and recursion.

Obviously, if the string isn't long enough to represent a key, there are no valid keys. Return an empty list.

Otherwise, if the candidate key is valid, return that candidate, tacked onto the result of evaluating the function on the rest of the string.

The candidate key is the first three characters of the input string.

The validChar function tests to see if a character is an uppercase letter or space.

The valid function tests to see if all characters in a string conform to the rules set in the validChar function. This uses partial function application and foregoes an explicit function parameter.

Otherwise, if there are enough characters, but they don't form a valid key, drop a single character from the beginning of the input string and search it for keys.

The firstKey function is an example of function composition. The string as the second parameter is inferred. It gets the first key from the list of all possible keys using the standard "head" function. The "." is the Haskell function composition operator.

code:
firstKey len = head . allPossibleKeys len


The count function is a straightforward recursive function.

code:
count _ [] = 0


Regardless of what you're counting, there aren't any in an empty list.

code:
count v (x:xs)
  | v == x = 1 + countRest
  | otherwise = countRest
  where
    countRest = count v xs


For a non-empty list, if the value you're looking for equals the first element in the list, add one to the result of counting the rest of the list. Otherwise, just count the rest of the list.

The "rankedKeys" function first finds all possible keys.

code:
rankedKeys len str =
  sortBy (\k1 k2 -> snd k2 `compare` snd k1) $ map (\k -> (k, count k keys)) $ nub keys
  where
    keys = allPossibleKeys len str


Then it uses the standard "nub" function to get unique keys, and for each of those maps its count in the original list of keys.

It then uses these counts to sort the list, such that keys which appear most frequently are first.

The "strReplace" function is a pretty straightforward way to replace one string with another in a larger string.

Now, since we can get a list of possible keys, sorted by the number of times they occur in a string, we can find the most common key just by taking the first one using the "head" function.

code:
compress len str =
  (key1, key2, secondTransform)
  where
    (key1, _) = head $ rankedKeys len str
    firstTransform = strReplace key1 "*1" str
    (key2, _) = head $ rankedKeys len firstTransform
    secondTransform = strReplace key2 "*2" firstTransform


That's exactly what I've done with:

code:
    (key1, _) = head $ rankedKeys len str
    firstTransform = strReplace key1 "*1" str


Now, since I have to assume that I've changed the string, I search the string again, and find the new best candidate, then do another string replacement to get the final compression. The function returns both keys and the compressed string in a tuple.
Sponsor
Sponsor
Sponsor
sponsor
zylum




PostPosted: Sun Aug 14, 2005 2:26 am   Post subject: (No subject)

wow, that was really confusing...

by explaining your algorithm i meant something more like:

"First we generate all possible 3 letter keys. Then try every combination of two keys that exist in the string and see which combination reduces the original string the most"

anyhoo, what does your solution return for 'AABBAABBBAABBA' ?
wtd




PostPosted: Sun Aug 14, 2005 4:48 am   Post subject: (No subject)

It actually breaks my solution, since my solution doesn't test to make sure there are two valid keys. The first key it goes for is "AAB". Once that's replaced with "*1", though, there are no further 3 character keys left.
wtd




PostPosted: Sun Aug 14, 2005 5:24 am   Post subject: (No subject)

And about my explanation... well, I thought that if the code itself were explained a bit, the algorithm would be fairly apparent.
bugzpodder




PostPosted: Sun Aug 14, 2005 9:04 am   Post subject: (No subject)

hey zylum, why not try a more interesting problem? like those three in the IOI practice set! They look fantastically good!
zylum




PostPosted: Sun Aug 14, 2005 9:35 pm   Post subject: (No subject)

bugzpodder wrote:
hey zylum, why not try a more interesting problem? like those three in the IOI practice set! They look fantastically good!


sounds good! can you post a link?

btw here's my solution to who ever is interested..

code:
import java.util.*;
public class CompressionText {
        public int shortestLength(String original) {
                int s = original.length();
                Vector keys = new Vector();
                for (int i = 0; i <= original.length()-3; i++)
                        if (!keys.contains(original.substring(i, i + 3)))
                                keys.add(original.substring(i, i + 3));
                for (int i = 0; i < keys.size(); i++)
                        for (int j = 0; j < keys.size(); j++)
                                s = Math.min(s, Search(original, (String) keys.get(i), (String) keys.get(j)));   
                return s;
        }
        int Search (String s, String k1, String k2) {
                return Math.min(s.indexOf(k1) < 0 ? s.length() : Search(s.replaceFirst(k1, "xx"), k1, k2),
                                s.indexOf(k2) < 0 ? s.length() : Search(s.replaceFirst(k2, "xx"), k1, k2));
        }
}


first i generate all keys by spliting the string into 3 letter substrings... i add each new key to a key vector making sure i dont add repeating keys. then i loop through all two key combinations. for each combination, i recursively replace the keys from the original string try all the ways i can replace the keys (some orders of replacing are better than others). the recursive function returns the length of the sortest possible string that can be generated by the two keys.
bugzpodder




PostPosted: Mon Aug 15, 2005 10:45 am   Post subject: (No subject)

zylum, see the link posted by prof cormack (gvc) in the contest forum
MysticVegeta




PostPosted: Tue Aug 16, 2005 1:24 pm   Post subject: (No subject)

I dont get the question... please use simpler language??
Display posts from previous:   
   Index -> General Programming
View previous topic Tell A FriendPrintable versionDownload TopicSubscribe to this topicPrivate MessagesRefresh page View next topic

Page 1 of 2  [ 21 Posts ]
Goto page 1, 2  Next
Jump to:   


Style:  
Search: