Programming C, C++, Java, PHP, Ruby, Turing, VB
Computer Science Canada 
 Python MemoryError when trying to split an element of a very large list
102jon




PostPosted: Wed Feb 23, 2011 11:33 am   Post subject: Python MemoryError when trying to split an element of a very large list

Hi, I'm writing a bit of code that requires me to read a very large text file (407 MB) and process the data. I read the file line by line to avoid a memory issue and stored the lines in a list. I am now iterating over that list and calling string.split() on each line, but I am getting a MemoryError at that call. Does anyone know what the problem is?
DemonWasp




PostPosted: Wed Feb 23, 2011 12:23 pm   Post subject: RE:Python MemoryError when trying to split an element of a very large list

Without your code, any answer is guesswork. At a complete guess, you're trying to store something like 400-500 MB in memory and Python is unhappy because it can't get that much memory. Check the listed memory consumption of your program, at the time when that error is thrown, through your operating system's task manager: you may find that it's closer to 1 GB or so, depending on how Python stores strings internally (ASCII, UTF-8, UTF-16, etc.).

Beyond that, post your source.
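To get a rough feel for the per-line overhead mentioned above, something like the following could be run on a single representative line. This is only a sketch: `split_overhead` is a made-up helper name, the sample line is invented, and `sys.getsizeof` reports sizes for the interpreter you run it on (it counts each object by itself, so the list and its fields are summed by hand).

```python
import sys

def split_overhead(line, sep=","):
    """Return (size of the raw line, size of the list of fields after splitting)."""
    fields = line.split(sep)
    raw_size = sys.getsizeof(line)
    # getsizeof on a list does not include its elements, so add them explicitly
    split_size = sys.getsizeof(fields) + sum(sys.getsizeof(f) for f in fields)
    return raw_size, split_size

# Invented sample line; substitute a real line from your file
raw, split = split_overhead("$PGHP,1,2011,2,23,11,33,0,0,abc\n")
print("raw line: %d bytes, split fields: %d bytes" % (raw, split))
```

If the split version comes out several times larger than the raw line, keeping both the raw lines and the split lists for the whole file would need a multiple of the file size in memory.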
102jon




PostPosted: Wed Feb 23, 2011 12:31 pm   Post subject: Re: Python MemoryError when trying to split an element of a very large list

Here is the code:

code:
import string
import sys
import filter_interface

s = []

def sort(file_name, interface): 
    """
    This function takes a gnm file and sorts it in order of timestamp, from
    earliest to latest
    """
    global s
    lines = []
    with open(file_name) as file_in:
        for line in file_in:
            lines.append(line)
    x = len(lines)
    y = []                                                                 
    new_list = []                                               

    for i in range(len(lines)):
        interface.update()
        #Convert each $PGHP line into a list, and convert time values into integer values that can be sorted
        if lines[i].startswith("$PGHP"):
            plist = string.split(lines[i], ',')
            plist2 = plist[:2] + [int(item) for item in plist[2:9]] + plist[9:]
            y.append(plist2)
102jon




PostPosted: Wed Feb 23, 2011 12:33 pm   Post subject: Re: Python MemoryError when trying to split an element of a very large list

For further clarification, the error occurs at:

code:
plist = string.split(lines[i], ',')
DemonWasp




PostPosted: Wed Feb 23, 2011 12:58 pm   Post subject: RE:Python MemoryError when trying to split an element of a very large list

Does the error occur on the first time it gets to that line, or after some number of iterations? If you take a small portion of this file (say, a few lines, rather than megabytes) and run this program against it, does it work (either "run without errors" or "does what is expected")?
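One way to produce such a small test file, as a sketch: `write_sample` and the file names in the usage comment are hypothetical, and the line count is arbitrary.

```python
from itertools import islice

def write_sample(src_path, dst_path, n_lines=1000):
    """Copy the first n_lines of src_path into dst_path and return how many were written."""
    count = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        # islice stops after n_lines without reading the rest of the file
        for line in islice(src, n_lines):
            dst.write(line)
            count += 1
    return count

# Hypothetical usage:
# write_sample("big_file.gnm", "sample.gnm", 1000)
```

Running the original program against the sample would show whether the failure depends on the size of the input or on the content of a particular line.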
102jon




PostPosted: Wed Feb 23, 2011 1:22 pm   Post subject: Re: Python MemoryError when trying to split an element of a very large list

Not sure, but I tried this:

code:

plist = lines[i]
plist = string.split(plist, ',')



And that didn't work either.
DemonWasp




PostPosted: Wed Feb 23, 2011 2:26 pm   Post subject: RE:Python MemoryError when trying to split an element of a very large list

A possible solution to this problem, assuming you don't really need to store the whole file in memory, is to handle each line as you receive it. Meaning:

Python:

def sort(file_name, interface): 
    """
    This function takes a gnm file and sorts it in order of timestamp, from
    earliest to latest
    """

    y = []

    with open(file_name) as file_in:
        for line in file_in:
            if line.startswith("$PGHP"):  # same "$PGHP" prefix as the original code
                plist = line.split(',')  # str.split needs no "import string"
                plist2 = plist[:2] + [int(item) for item in plist[2:9]] + plist[9:]
                y.append(plist2)
102jon




PostPosted: Wed Feb 23, 2011 2:35 pm   Post subject: Re: Python MemoryError when trying to split an element of a very large list

That would be a nice solution. The only problem is that immediately after the code I posted, in an elif statement, I have to do comparisons between "line" and the previous "line". I'm not sure if there's a way to do that without storing the whole file in a list.
DemonWasp




PostPosted: Wed Feb 23, 2011 4:03 pm   Post subject: RE:Python MemoryError when trying to split an element of a very large list

If you only need one previous line, then that's trivial (keep it in a variable, "previousLine" or similar). If you may need any prior line in the file, then you will have to store them all (or, read them from the file again, though I can't recommend that...).
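A minimal sketch of the one-previous-line case, assuming the comparison only ever needs the line immediately before the current one. `adjacent_pairs` is a made-up name; it accepts any iterable of lines, including an open file object.

```python
def adjacent_pairs(lines):
    """Yield (previous_line, current_line) pairs while holding only one old line in memory."""
    previous_line = None
    for line in lines:
        if previous_line is not None:
            yield previous_line, line
        previous_line = line

# Hypothetical usage with a file:
# with open(file_name) as file_in:
#     for previous_line, line in adjacent_pairs(file_in):
#         ...compare the two lines here...
```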
102jon




PostPosted: Wed Feb 23, 2011 4:10 pm   Post subject: Re: Python MemoryError when trying to split an element of a very large list

Sorry, I should have made it more clear. I have to do comparisons with all of the lines up to that line. So as far as I know, I think I pretty much have to store the entire thing.
DemonWasp




PostPosted: Wed Feb 23, 2011 4:19 pm   Post subject: RE:Python MemoryError when trying to split an element of a very large list

Then yes, you're hosed. But, you're hosed in two different ways. First, you need to keep the entire file around in some capacity, rather than processing it as a stream (without storage). Second, your runtime is going to be abysmal: if you have a 400 MB file and must compare each line to each previous line, you're looking at a runtime of O(N^2), which is going to be...a while.

Is this something where you could give a problem statement to work at (so that maybe we could determine a better algorithm) or is this proprietary?
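To put a rough number on the O(N^2) estimate above, here is a back-of-the-envelope sketch. The average line length and the comparison rate are pure assumptions (neither has been posted), so treat the result as an order of magnitude at best.

```python
def pairwise_comparisons(n_lines):
    """Number of comparisons if every line is compared with every earlier line."""
    return n_lines * (n_lines - 1) // 2

file_bytes = 407 * 1024 * 1024        # file size from the first post
assumed_bytes_per_line = 80           # assumption: average line length
assumed_rate = 10 ** 7                # assumption: comparisons per second

n_lines = file_bytes // assumed_bytes_per_line
comparisons = pairwise_comparisons(n_lines)
days = comparisons / float(assumed_rate) / 86400

print("about %d lines, %.2e comparisons, %.1f days" % (n_lines, comparisons, days))
```

Under these assumptions the file has a few million lines and the comparison count lands above 10^13, which is why a better algorithm matters more here than freeing up memory.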