Splitting a string accordingly
SNIPERDUDE
Posted: Wed Nov 16, 2011 8:29 am Post subject: Splitting a string accordingly
I'm trying to get something to work here, but my knowledge of Python is limited. It's a simple question, mind you, but I find Google's search results getting dumber than they once were.
Back on topic.
I am reading a line from a file as follows:
code:
001 "blue key" "blue key" True 1.0 o o 0 0.0 2 False
I've been using the split function to separate each value it reads; however, I would like to keep the strings within quotation marks together. Is there a simple way of doing this?
Note: The file stores multiple lines similar to this one; changing the format of the lines is possible but quite tedious.
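One standard-library option for exactly this problem is Python's shlex module. The sketch below compares it with plain str.split on the sample line. It assumes the file never contains backslashes or single quotes that should be taken literally, because shlex.split treats those as escape and quote characters as well.

```python
import shlex

line = '001 "blue key" "blue key" True 1.0 o o 0 0.0 2 False'

# str.split() cuts at every run of whitespace, so "blue key" is broken in two
# and the quote characters stay attached to the pieces.
print(line.split())

# shlex.split() follows shell-like quoting rules: a double-quoted run stays a
# single field and the surrounding quotes are removed.
fields = shlex.split(line)
print(fields)
```

Every field comes back as a string, so values such as "1.0" or "True" still need converting (for example with float()) after splitting.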
2goto1
Posted: Wed Nov 16, 2011 8:49 am Post subject: RE:Splitting a string accordingly
Not a Python guru, but wouldn't it be possible to parse each item sequentially, one at a time? For example, parse 001 by looking for the first occurrence of whitespace. Next, parse "blue key" by looking for the first occurrence of a double quote, then the second. And so on.
Another approach that should work would be to use regular expressions.
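The sequential approach described above can be sketched as a small scanner. This is a minimal sketch, assuming quotes only ever surround a whole field and never contain an escaped quote; the function name split_fields is hypothetical.

```python
def split_fields(line):
    """Split a line on whitespace, keeping double-quoted text as one field.

    Hypothetical helper: assumes quotes wrap whole fields and are never escaped.
    """
    fields = []
    i = 0
    n = len(line)
    while i < n:
        # Skip the whitespace between fields.
        if line[i].isspace():
            i += 1
            continue
        if line[i] == '"':
            # Quoted field: it ends at the next double quote.
            end = line.find('"', i + 1)
            if end == -1:
                raise ValueError(f"unterminated quote at position {i}")
            fields.append(line[i + 1:end])
            i = end + 1
        else:
            # Bare field: it ends at the next whitespace or the end of the line.
            end = i
            while end < n and not line[end].isspace():
                end += 1
            fields.append(line[i:end])
            i = end
    return fields


print(split_fields('001 "blue key" "blue key" True 1.0 o o 0 0.0 2 False'))
```

Because the function skips any run of whitespace, a trailing newline from a line read out of a file is ignored rather than producing an empty field.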
DemonWasp
Posted: Wed Nov 16, 2011 10:49 am Post subject: RE:Splitting a string accordingly
The most straightforward way to do it is probably a regular expression with "capturing groups", which will let you match lines like this one relatively easily. I don't know how regular expressions are written in Python, but the syntax of the expression itself is fairly universal, so I've worked up the following based on Java (and your very limited input sample). I'm led to believe that Python's regular expression (regex) support is pretty easy to use.
code:
([0-9]+) "(.+?)" "(.+?)" (True|False) ([0-9]+\.[0-9]+) ([a-zA-Z]) ([a-zA-Z]) ([0-9]) ([0-9]+\.[0-9]+) ([0-9]) (True|False)
The capturing groups are numbered 1 to 11 from left to right:
1. Matches one or more digits.
2. Matches one or more characters inside quotes (the quotes are required).
3. Same as #2.
4. True or False, nothing else.
5. A decimal number, with some digits before a dot (.) and some after. It will not accept .5, 5. or 5, however.
6. Any single letter, either upper- or lower-case.
7. Same as #6.
8. Any single digit.
9. Same as #5.
10. Same as #8.
11. Same as #4.
With your input, this gives groups:
code:
Group 1: '001'
Group 2: 'blue key'
Group 3: 'blue key'
Group 4: 'True'
Group 5: '1.0'
Group 6: 'o'
Group 7: 'o'
Group 8: '0'
Group 9: '0.0'
Group 10: '2'
Group 11: 'False'
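In Python, the same pattern can be used through the standard re module. This is a minimal sketch, assuming every line has exactly the sample's layout (single spaces between fields) and that the trailing newline is stripped before matching; the name ITEM_RE is hypothetical.

```python
import re

# The pattern above, unchanged; raw strings keep the backslashes intact, and
# the two adjacent literals are joined into one pattern by Python.
ITEM_RE = re.compile(
    r'([0-9]+) "(.+?)" "(.+?)" (True|False) ([0-9]+\.[0-9]+) '
    r'([a-zA-Z]) ([a-zA-Z]) ([0-9]) ([0-9]+\.[0-9]+) ([0-9]) (True|False)'
)

line = '001 "blue key" "blue key" True 1.0 o o 0 0.0 2 False\n'
# fullmatch() requires the whole string to match, so drop the newline first.
match = ITEM_RE.fullmatch(line.rstrip("\n"))
if match is None:
    raise ValueError(f"line does not match the expected format: {line!r}")

# match.groups() returns the 11 captured strings in group order.
for number, text in enumerate(match.groups(), start=1):
    print(f"Group {number}: {text!r}")
```

Using fullmatch() rather than match() means a line with extra trailing fields is rejected instead of silently matching its first part.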
Depending on what the actual specification of the format is, you can tweak the regular expression to accept different strings.
Alternatively, you could scan the lines yourself. I've done that before too, and in my experience it can be difficult to get right.
Zren
Posted: Wed Nov 16, 2011 11:06 am Post subject: RE:Splitting a string accordingly
http://docs.python.org/dev/library/argparse.html
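You can use it on strings other than sys.argv. Splitting with .split(' ') would break the quoted arguments apart, so split with shlex.split instead, which keeps them together:
code:
import shlex
args = parser.parse_args(shlex.split('/blah manga "rockets and bubblegum" "dude what"'))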
If you look at the source of that module (or its legacy predecessor, optparse), you'll probably find a good example of how they did it.
The advantage of using argparse is that you can specify parameter names (and retrieve them with args.id, args.flag), and even specify what type they are.
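A minimal sketch of that idea applied to the sample line from the first post. The field names (id, name, label, flag, and so on) are hypothetical, since the thread never says what each column means. parse_bool is also a hypothetical helper: argparse's type=bool would turn the string "False" into True, because any non-empty string is truthy.

```python
import argparse
import shlex


def parse_bool(text):
    # Hypothetical converter: accept only the literal words True and False.
    if text == "True":
        return True
    if text == "False":
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {text!r}")


# Hypothetical field names; one positional argument per column of the line.
parser = argparse.ArgumentParser(prog="item")
parser.add_argument("id")
parser.add_argument("name")
parser.add_argument("label")
parser.add_argument("flag", type=parse_bool)
parser.add_argument("weight", type=float)
parser.add_argument("char_a")
parser.add_argument("char_b")
parser.add_argument("count", type=int)
parser.add_argument("value", type=float)
parser.add_argument("level", type=int)
parser.add_argument("enabled", type=parse_bool)

line = '001 "blue key" "blue key" True 1.0 o o 0 0.0 2 False'
# shlex.split keeps "blue key" together; argparse then converts each field.
args = parser.parse_args(shlex.split(line))
print(args.id, args.name, args.flag, args.weight, args.enabled)
```

Note that on a malformed line parse_args prints an error and exits the program, so for reading a data file you may prefer to catch SystemExit or validate the line first.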
SNIPERDUDE
Posted: Wed Nov 16, 2011 12:26 pm Post subject: RE:Splitting a string accordingly
Thanks guys, got it working. Getting this thing done is painfully slow; I think I need a break.