Posted: Fri Aug 12, 2005 11:16 pm Post subject: interesting problem
You are part of a team that is working on a piece of software to handle text compression. Your team has designed the compression algorithm as follows:
To compress a given string of text, two strings, each 3 characters in length, will be chosen as compression keys. The strings may contain any combination of capital letters and/or spaces. Then, a compressed string will be generated from the original such that replacing "*1" in the compressed string with the first string and replacing "*2" with the second string will recreate the original text.
For example, if the two compression keys are "FOO" and "BAR", then the compressed string "*2X*1" would decompress to "BARXFOO", and "*1*1 *2" would decompress to "FOOFOO BAR".
You have been tasked with writing a proof of concept implementation to test the effectiveness of this compression scheme. You will be given a String original. Your goal is to optimally select the two compression keys, and generate the compressed text to be as short as possible. You are to return the length of the shortest possible text.
Examples:
0)
"BARXFOO"
Returns: 5
The first example from the problem statement.
1)
"FOOFOO BAR"
Returns: 7
The second example from the problem statement.
2)
"ABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGHIJKLMNOPQRSTUVWXY"
Returns: 46
It's a long string, but no 3-character substring appears more than twice.
3)
"QWERTYUIOPASDFGHJKLZXCVBNM"
Returns: 24
Here, no substring repeats itself at all. The best we can do is to pick any two substrings to replace.
hint: there are 26^7 possible combinations of two keys which is too many to test.
note: this problem if from topcoder.com
Sponsor Sponsor
bugzpodder
Posted: Sat Aug 13, 2005 4:53 pm Post subject: (No subject)
hint to hint: it would be stupid to do all 26^7 combinations, (i dont even know where the 26^7 comes from, isnt it 26^6)... all you need to do is pick two keys that exists at least once in the string. so thats only about 200^2 (if the length of string is 200). and 40000 is reasonable. so all that is left to do is grunt work... what is the limit on the string length anyways?
wtd
Posted: Sat Aug 13, 2005 9:38 pm Post subject: (No subject)
This seems to work reasonably well.
code:
module Compression where
import List
import Char
allPossibleKeys len str
| length str < len = []
| valid candidateKey = candidateKey : allPossibleKeys len (drop 3 str)
| otherwise = allPossibleKeys len (tail str)
where
valid = all validChar
validChar ch = isUpper ch || isSpace ch
candidateKey = take len str
firstKey len = head . allPossibleKeys len
count _ [] = 0
count v (x:xs)
| v == x = 1 + countRest
| otherwise = countRest
where
countRest = count v xs
rankedKeys len str =
sortBy (\k1 k2 -> snd k2 `compare` snd k1) $ map (\k -> (k, count k keys)) $ nub keys
where
keys = allPossibleKeys len str
Posted: Sat Aug 13, 2005 11:14 pm Post subject: (No subject)
bugzpodder wrote:
hint to hint: it would be stupid to do all 26^7 combinations, (i dont even know where the 26^7 comes from, isnt it 26^6)... all you need to do is pick two keys that exists at least once in the string. so thats only about 200^2 (if the length of string is 200). and 40000 is reasonable. so all that is left to do is grunt work... what is the limit on the string length anyways?
strings are 1-50 characters long and consist of upper case letters and space characters...
wtd >> if you give a solution in java / c++ / c# / vb.net then i could varify your answer...
btw it's not as easy as it seems...
hint2: even if you do have the correct keys, you still have to figure out what the shortest possible string the original string can be compressed into using these keys... the examples do not have any cases which will display this...
wtd
Posted: Sat Aug 13, 2005 11:50 pm Post subject: (No subject)
code:
C:\> ghci Compression
___ ___ _
/ _ \ /\ /\/ __(_)
/ /_\// /_/ / / | | GHC Interactive, version 6.4, for Haskell 98.
/ /_\\/ __ / /___| | http://www.haskell.org/ghc/
\____/\/ /_/\____/|_| Type :? for help.
Posted: Sat Aug 13, 2005 11:56 pm Post subject: (No subject)
zylum wrote:
wtd >> if you give a solution in java / c++ / c# / vb.net then i could varify your answer...
Yes, but it's so much easier to write it in Haskell.
zylum
Posted: Sun Aug 14, 2005 12:35 am Post subject: (No subject)
i suppose... but i don't even understand the code you wrote so even i couldnt tell you whether it might pass all test cases. maybe you could explain the algorithm you used?
wtd
Posted: Sun Aug 14, 2005 1:38 am Post subject: (No subject)
Create the Compression module.
code:
module Compression where
Import functions from the Char and List modules.
code:
import List
import Char
The allPossibleKeys function finds, well... all possible keys.
code:
allPossibleKeys len str
| length str < len = []
| valid candidateKey = candidateKey : allPossibleKeys len (drop 3 str)
| otherwise = allPossibleKeys len (tail str)
where
valid = all validChar
validChar ch = isUpper ch || isSpace ch
candidateKey = take len str
This function takes two arguments: the length of the keys, and the original string.
It uses guard conditions, local functions, and recursion.
Obviously, if the string isn't long enough to represent a key, there are no valid keys. Return an empty list.
Otherwise, if the candidate key is valid, return that candidate, tacked onto the result of evaluating the function on the rest of the string.
The candidate key is the first three characters of the input string.
The validChar function tests to see if a character is an uppercase letter or space.
The valid function tests to see if all characters in a string conform to the rules set in the validChar function. This uses partial function application and foregoes an explicit function parameter.
Otherwise, if there are enough characters, but they don't form a valid key, drop a single character from the beginning of the input string and search it for keys.
The firstKey function is an example of function composition. The string as the second parameter is inferred. It gets the first key from the list of all possible keys using the standard "head" function. The "." is the Haskell function composition operator.
code:
firstKey len = head . allPossibleKeys len
The count function is a straightforward recursive function.
code:
count _ [] = 0
Regardless of what you're counting, there aren't any in an empty list.
code:
count v (x:xs)
| v == x = 1 + countRest
| otherwise = countRest
where
countRest = count v xs
For a non-empty list, if the value you're looking for equals the first element in the list, add one to the result of counting the rest of the list. Otherwise, just count the rest of the list.
The "rankedKeys" function first finds all possible keys.
code:
rankedKeys len str =
sortBy (\k1 k2 -> snd k2 `compare` snd k1) $ map (\k -> (k, count k keys)) $ nub keys
where
keys = allPossibleKeys len str
Then it uses the standard "nub" function to get unique keys, and for each of those maps its count in the original list of keys.
It then uses these counts to sort the list, such that keys which appear most frequently are first.
The "strReplace" function is a pretty straightforward way to replace one string with another in a larger string.
Now, since we can get a list of possible keys, sorted by the number of times they occur in a string, we can find the most common key just by taking the first one using the "head" function.
code:
compress len str =
(key1, key2, secondTransform)
where
(key1, _) = head $ rankedKeys len str
firstTransform = strReplace key1 "*1" str
(key2, _) = head $ rankedKeys len firstTransform
secondTransform = strReplace key2 "*2" firstTransform
That's exactly what I've done with:
code:
(key1, _) = head $ rankedKeys len str
firstTransform = strReplace key1 "*1" str
Now, since I have to assume that I've changed the string, I search the string again, and find the new best candidate, then do another string replacement to get the final compression. The function returns both keys and the compressed string in a tuple.
Sponsor Sponsor
zylum
Posted: Sun Aug 14, 2005 2:26 am Post subject: (No subject)
wow, that was really confusing...
by explaining your algorithm i meant something more like:
"First we generate all possible 3 letter keys. Then try every combination of two keys that exist in the string and see which combination reduces the original string the most"
anyhoo, what does your solution return for 'AABBAABBBAABBA' ?
wtd
Posted: Sun Aug 14, 2005 4:48 am Post subject: (No subject)
It actually breaks my solution, since my solution doesn't test to make sure there are two valid keys. The first key it goes for is "AAB". Once that's replaced with "*1", though, there are no further 3 character keys left.
wtd
Posted: Sun Aug 14, 2005 5:24 am Post subject: (No subject)
And about my explanation... well, I thought that if the code itself were explained a bit, the algorithm would be fairly apparent.
bugzpodder
Posted: Sun Aug 14, 2005 9:04 am Post subject: (No subject)
hey zylum, why not try a more interesting problem? like those three in the IOI practice set! They look fantastically good!
zylum
Posted: Sun Aug 14, 2005 9:35 pm Post subject: (No subject)
bugzpodder wrote:
hey zylum, why not try a more interesting problem? like those three in the IOI practice set! They look fantastically good!
sounds good! can you post a link?
btw here's my solution to who ever is interested..
code:
import java.util.*;
public class CompressionText {
public int shortestLength(String original) {
int s = original.length();
Vector keys = new Vector();
for (int i = 0; i <= original.length()-3; i++)
if (!keys.contains(original.substring(i, i + 3)))
keys.add(original.substring(i, i + 3));
for (int i = 0; i < keys.size(); i++)
for (int j = 0; j < keys.size(); j++)
s = Math.min(s, Search(original, (String) keys.get(i), (String) keys.get(j)));
return s;
}
int Search (String s, String k1, String k2) {
return Math.min(s.indexOf(k1) < 0 ? s.length() : Search(s.replaceFirst(k1, "xx"), k1, k2),
s.indexOf(k2) < 0 ? s.length() : Search(s.replaceFirst(k2, "xx"), k1, k2));
}
}
first i generate all keys by spliting the string into 3 letter substrings... i add each new key to a key vector making sure i dont add repeating keys. then i loop through all two key combinations. for each combination, i recursively replace the keys from the original string try all the ways i can replace the keys (some orders of replacing are better than others). the recursive function returns the length of the sortest possible string that can be generated by the two keys.
bugzpodder
Posted: Mon Aug 15, 2005 10:45 am Post subject: (No subject)
zylum, see the link posted by prof cormack (gvc) in the contest forum
MysticVegeta
Posted: Tue Aug 16, 2005 1:24 pm Post subject: (No subject)
I dont get the question... please use simpler language??