Python / IDLE CPU usage for no reason

Here is a strange problem I have with IDLE (version 2.6.5, with the same Python version) on Windows.
I try to run the following three commands:
fid= open('file.txt', 'r')
lines=fid.readlines()
print lines
When the print lines command is executed, the pythonw.exe process goes CPU crazy, consuming 100% of the CPU, and IDLE seems to stop responding. The file.txt is only around 130 KB - I don't consider that a very large file!
When the lines finally print (after some minutes), if I try to scroll up to see them, I once again experience the same very large CPU usage.
The memory usage of pythonw.exe is around 15-16 MB all the time.
Can anybody explain this behaviour to me? Obviously this can't be a bug in IDLE, since it would have been discovered ... Also, what can I do to suppress that behaviour? I like using IDLE for script-like tasks involving data transformations on files.

Try reading it line by line:
fid = open('file.txt', 'r')
for line in fid:
    print line
From the documentation on Input Output, there seem to be two ways to read files:
print f.read() # This reads the *whole* file. Might be bad to do this for large files.
for l in f: # This reads it line by line
    print l # and prints it. Might be better for big files.
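If the goal is a scripted data transformation rather than actually reading every line in the shell, another workaround (a sketch of my own, not from the answer above) is to write the results to a file and print only a short summary, so IDLE's output window never has to render the full contents:
# Sketch: write the transformed lines to an output file instead of printing
# them into the IDLE shell. 'output.txt' is just an example name.
fid = open('file.txt', 'r')
out = open('output.txt', 'w')
count = 0
for line in fid:
    out.write(line)        # or write a transformed version of the line here
    count += 1
fid.close()
out.close()
print "processed", count, "lines"   # only a short summary goes to the shell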

Related

Python, running into memory error when parsing a 30MB file (already downloaded onto my local computer)

Here is my download address; the file name is 'kosarak':
http://fimi.uantwerpen.be/data/
My parsing code is:
parsedDat = [line.split() for line in open('kosarak.dat').readlines()]
I need this data as a whole to run some method on it, so reading it line by line and doing the operation on each line separately does not fit my use case here.
The file is only 30 MB and my computer has at least 10 GB of memory left and 30+ GB of hard drive space, so I guess there shouldn't be any resource problem.
FYI: My Python version is 2.7 and I am running my Python inside Spyder. My OS is Windows 10.
PS: You don't need to use my parsing code/method to do the job; as long as you can get the data from the file into my Python environment, that would be perfect.
Perhaps this may help.
with open('kosarak.dat', 'r') as f:  # Or 'rb' for binary data.
    parsed_data = [line.split() for line in f]
The difference is that your approach reads all of the lines in the file at once and then processes each one, effectively requiring twice the memory (once for the raw file data and once more for the parsed data, all of which must be held in memory at the same time), whereas this approach reads the file line by line and only needs enough memory for the resulting parsed_data.
In addition, your method does not explicitly close the file (although you may just not have shown that portion of your code). This method uses a context manager (with expression [as variable]:), which closes the file object automatically once the with block terminates, even after an error. See PEP 343.
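The question says the whole dataset is needed at once, but if the downstream method can in fact consume one transaction at a time, a generator (a sketch of an alternative, not part of the answer above) avoids building the list at all:
# Sketch: yield one parsed transaction at a time instead of building the full list.
def iter_transactions(path):
    with open(path, 'r') as f:
        for line in f:
            yield line.split()

for transaction in iter_transactions('kosarak.dat'):
    pass  # feed each transaction to the processing method here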

Python text file read/write optimization

I have been working on this file I/O and have made some progress reading through the site, and I am wondering what other ways this can be optimized. I am parsing a test input file of 10 GB / 30MM lines and writing the selected fields to an output file, which results in an approximately 1.4 GB clean file. Initially it took 40 minutes to run this process, and I have reduced it to around 30 minutes. Does anyone have any other ideas to reduce this in Python? Long term I will be looking to write this in C++; I just have to learn the language first. Thanks in advance.
with open(fdir+"input.txt",'rb',(50*(1024*1024))) as r:
w=open(fdir+"output0.txt",'wb',50*(1024*1024)))
for i,l in enumerate(r):
if l[42:44]=='25':
# takes fixed width line into csv line of only a few cols
wbun.append(','.join([
l[7:15],
l[26:35],
l[44:52],
l[53:57],
format(int(l[76:89])/100.0,'.02f'),
l[89:90],
format(int(l[90:103])/100.0,'.02f'),
l[193:201],
l[271:278]+'\n'
]))
# write about every 5MM lines
if len(wbun)==wsize:
w.writelines(wbun)
wbun=[]
print "i_count:",i
# splits about every 4GB
if (i+1)%fsplit==0:
w.close()
w=open(fdir+"output%d.txt"%(i/fsplit+1),'wb',50*(1024*1024)))
w.writelines(wbun)
w.close()
Try running it in PyPy (https://pypy.org); it will run without changes to your code, and probably faster.
Also, C++ might be overkill, especially if you don't know it yet. Consider learning Go or D instead.
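As a rough way to check whether PyPy actually helps before rewriting anything in another language, you could time a pared-down version of the hot loop under both interpreters. The sketch below is illustrative; 'input_sample.txt' is a hypothetical smaller sample of the real input file, not a name from the question.
# Minimal timing harness (Python 2) for the line-filter loop from the question.
import time

start = time.time()
matched = 0
with open('input_sample.txt', 'rb') as r:
    for l in r:
        if l[42:44] == '25':
            matched += 1
print "matched %d lines in %.2f seconds" % (matched, time.time() - start)

# Run the same script under both interpreters and compare
# (assuming this sketch is saved as bench.py):
#   python bench.py
#   pypy bench.py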

Writing to a text file does not occur in real-time. How to fix this

I have a python script that takes a long time to run.
I placed print-outs throughout the script to observe its progress.
As this script runs several different programs, some of which print many messages, it is unfeasible to print directly to the screen.
Therefore, I am using a report file:
f_report = open(os.path.join("//shared_directory/projects/work_area/", 'report.txt'), 'w')
To which I print my messages:
f_report.write(" "+current_image+"\n")
However, when I look at the file while the script is running, I do not see the messages. They appear only when the program finishes and closes the file, making my approach useless for monitoring on-going progress.
What should I do in order to make python output the messages to the report file in real time?
Many thanks.
You should use the flush() function to write to the file immediately.
f_report.write(" "+current_image+"\n")
f_report.flush()
Try this:
newbuffer = 0  # a buffer size of 0 means the file is unbuffered
f_report = open(os.path.join("//shared_directory/projects/work_area/", 'report.txt'), 'w', newbuffer)
This sets up a zero-size buffer, which pushes the OS to write the content to the file "immediately". Different operating systems may behave differently, but in general the content will be flushed out right away.
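Since the report lives on a shared directory, it can also help to combine flush() with os.fsync(), which asks the OS to commit its own buffers as well. This is a sketch building on the answers above; the current_image value is a placeholder for illustration:
import os

current_image = "image_001.png"  # placeholder value for illustration

f_report = open(os.path.join("//shared_directory/projects/work_area/", 'report.txt'), 'w')
f_report.write(" " + current_image + "\n")
f_report.flush()                 # push Python's buffer to the OS
os.fsync(f_report.fileno())      # ask the OS to commit its buffers to disk
f_report.close()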

Extracting data from file performance wise (subprocess vs file read) Python

I am wondering what the most efficient method is for reading data from a locally hosted file using Python.
Either using a subprocess to just cat the contents of the file:
import subprocess

ssh = subprocess.Popen(['cat', dir_to_file],
                       stdout=subprocess.PIPE)
for line in ssh.stdout:
    print line
Or simply read the contents of the file:
f = open(dir_to_file)
data = f.readlines()
f.close()
for line in data:
    print line
I am creating a script that has to read the contents of many files, and I was wondering which method is the most efficient in terms of CPU usage and also which is the fastest in terms of runtime.
This is my first post here at Stack Overflow; apologies for the formatting.
Thanks
@chrisd1100 is correct that printing line by line is the bottleneck. After a quick experiment, here is what I found.
I ran and timed the two methods above repeatedly (A - subprocess, B - readline) on two different file sizes (~100KB and ~10MB).
Trial 1: ~100KB
subprocess: 0.05 - 0.1 seconds
readline: 0.02 - 0.026 seconds
Trial 2: ~10MB
subprocess: ~7 seconds
readline: ~7 seconds
At the larger file size, printing line by line becomes by far the most expensive operation. On the smaller file size, readline seems to be about 2x faster. Tentatively, I'd say that readline is faster.
These were all run on Python 2.7.10, OS X 10.11.13, 2.8 GHz i7.
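For reference, a timing harness along the lines of the experiment described above might look like the sketch below. It measures only the read cost (printing is left out, since that dominates either way); 'testfile.txt' is a made-up path.
# Rough timing sketch (Python 2) comparing the two approaches.
import subprocess
import time

dir_to_file = 'testfile.txt'  # point this at a real local file

start = time.time()
proc = subprocess.Popen(['cat', dir_to_file], stdout=subprocess.PIPE)
lines_a = proc.stdout.readlines()
proc.wait()
print "subprocess:", time.time() - start, "seconds"

start = time.time()
f = open(dir_to_file)
lines_b = f.readlines()
f.close()
print "readlines: ", time.time() - start, "seconds"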

Python: Read huge number of lines from stdin

I'm trying to read a huge number of lines from standard input with Python.
more hugefile.txt | python readstdin.py
The problem is that the program freezes as soon as I've read just a single line.
print sys.stdin.read(8)
exit(1)
This prints the first 8 bytes, and then I expect it to terminate, but it never does. I think it's not really just reading the first bytes but is trying to read the whole file into memory.
Same problem with sys.stdin.readline().
What I really want to do, of course, is to read all the lines, but with a buffer so I don't run out of memory.
I'm using Python 2.6.
This should work efficiently in a modern Python:
import sys
for line in sys.stdin:
    # do something...
    print line,
You can then run the script like this:
python readstdin.py < hugefile.txt
Back in the day, you had to use xreadlines to get efficient huge line-at-a-time I/O -- and the docs now ask that you use for line in file.
Of course, this is of assistance only if you're actually working on the lines one at a time. If you're just reading big binary blobs to pass on to something else, then your other mechanism might be as efficient.
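If the input really is a big binary blob rather than lines, a fixed-size chunked read (a sketch, not part of the answer above) keeps memory bounded in the same way:
import sys

# Read stdin in fixed-size chunks instead of line by line; 64 KB per chunk
# is an arbitrary choice.
while True:
    chunk = sys.stdin.read(64 * 1024)
    if not chunk:
        break
    sys.stdout.write(chunk)  # pass the chunk on to whatever consumes it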
