os.path.getsize() returns "0" - python

I get 0 as output when I call os.path.getsize().
I'm not sure what's wrong. I'm using PyCharm, and I can see that the file was created and the comment was written to it, but PyCharm still shows the output 0 :(
Here is the code:
import os
def create_python_script(filename):
    comments = "# Start of a new Python program"
    with open(filename, "w") as file:
        file.write(comments)
        filesize = os.path.getsize(filename)
    return(filesize)
print(create_python_script("program.py"))
Please point out the error I'm not seeing.

You're getting a size of 0 because of how the write function behaves.
When you call write, the content first goes into an internal buffer. The buffer is kept for performance reasons (it limits overly frequent I/O calls).
So when you call getsize while the file is still open, there is no guarantee that the content has actually been written to the file on disk yet.
with open(filename, "w") as file:
    file.write(comments)
    # still inside the with block: the data may not have been flushed yet
    filesize = os.path.getsize(filename)
To make sure the content is written to the file before you call getsize, you can call the flush method.
The flush method empties the internal buffer by writing all of its content out to the file.
with open(filename, "w") as file:
    file.write(comments)
    file.flush()
    filesize = os.path.getsize(filename)
Or, a better way would be to first close the file and then call the getsize method.
with open(filename, "w") as file:
    file.write(comments)
# the with block has ended here, so the file is closed (and flushed)
filesize = os.path.getsize(filename)
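To see the buffering for yourself, here is a minimal sketch that records the file size at three points. It writes into a temporary directory; the helper name sizes_during_write is made up for this illustration, and seeing 0 before the flush assumes CPython's default buffering with a write this small.

```python
import os
import tempfile


def sizes_during_write(text):
    """Return the on-disk size before flush, after flush, and after close."""
    sizes = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "program.py")
        with open(path, "w") as f:
            f.write(text)
            # The text is typically still sitting in Python's buffer here.
            sizes["before_flush"] = os.path.getsize(path)
            f.flush()
            # flush() hands the buffered data over to the operating system.
            sizes["after_flush"] = os.path.getsize(path)
        # Leaving the with block closes the file, which also flushes it.
        sizes["after_close"] = os.path.getsize(path)
    return sizes


if __name__ == "__main__":
    print(sizes_during_write("# Start of a new Python program"))
```

The values for after_flush and after_close should both equal the length of the text that was written.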

Related

Python doesn't release file after it is closed

What I need to do is write some messages to a .txt file, close it, and send it to a server. This happens in an infinite loop, so the code should look more or less like this:
import time
import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder
num = 0
while True:
    num += 1
    filename = f"example{num}.txt"
    with open(filename, "w") as f:
        f.write("Hello")
        f.close()
    mp_encoder = MultipartEncoder(
        fields={
            'file': ("file", open(filename, 'rb'), 'text/plain')
        }
    )
    r = requests.post("my_url/save_file", data=mp_encoder, headers=my_headers)
    time.sleep(10)
The post works if the file is created manually inside my working directory, but if I create it and write to it through code, I receive this response message:
500 - Internal Server Error
System.IO.IOException: Unexpected end of Stream, the content may have already been read by another component.
I don't see the file appearing in the project window of PyCharm. I even used time.sleep(10) because at first I thought it could be a timing problem, but that didn't solve it. In fact, the file appears in my working directory only when I stop the code, so it seems the file is held by the program even after I explicitly call f.close(). I know the with statement should take care of closing files, but it didn't look like it did, so I added a close() to check whether that was the problem (spoiler: it was not).
I solved the problem by using another file:
with open(filename, "r") as firstfile, open("new.txt", "a+") as secondfile:
    secondfile.write(firstfile.read())
with open(filename, 'w'):
    pass
r = requests.post("my_url/save_file", data=mp_encoder, headers=my_headers)
if r.status_code == requests.codes.ok:
    os.remove("new.txt")
else:
    print("File not saved")
I make a copy of the file, empty the original file to save space, and send the copy to the server (and then delete the copy). It looks like the problem was that the original file was held open by the Python logging module.
First, can you change open(f, 'rb') to open("example.txt", 'rb')? You should pass open a file name, not a closed file object.
Also, you can use os.path.abspath to see where the file is being written.
import os
os.path.abspath('.')
Third point: when you use the with context manager to open a file, you don't close the file yourself. The context manager is supposed to do it.
with open("example.txt", "w") as f:
    f.write("Hello")
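Putting those points together, a rough sketch might look like the following. It only covers the write-then-reopen part and leaves the upload out; write_then_reopen is a hypothetical helper for illustration, not something from requests or requests_toolbelt.

```python
import os
import tempfile


def write_then_reopen(directory, name, text):
    """Write text to a file, then reopen it by name and return (path, bytes)."""
    path = os.path.abspath(os.path.join(directory, name))
    with open(path, "w") as f:
        f.write(text)
    # No f.close() needed: the with block has already closed the file.
    with open(path, "rb") as f:
        # An open handle like this one (not a closed one) is what an
        # uploader would need to read from.
        data = f.read()
    return path, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        path, data = write_then_reopen(tmpdir, "example1.txt", "Hello")
        print(path)  # absolute path, so you can see where the file landed
        print(data)
```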

GZip and output file

I'm having difficulty with the following code (which is simplified from a larger application I'm working on in Python).
from io import StringIO
import gzip
jsonString = 'JSON encoded string here created by a previous process in the application'
out = StringIO()
with gzip.GzipFile(fileobj=out, mode="w") as f:
    f.write(str.encode(jsonString))
# Write the file once finished rather than streaming it - uncomment the next line to see file locally.
with open("out_" + currenttimestamp + ".json.gz", "a", encoding="utf-8") as f:
    f.write(out.getvalue())
When this runs I get the following error:
  File "d:\Development\AWS\TwitterCompetitionsStreaming.py", line 61, in on_status
    with gzip.GzipFile(fileobj=out, mode="w") as f:
  File "C:\Python38\lib\gzip.py", line 204, in __init__
    self._write_gzip_header(compresslevel)
  File "C:\Python38\lib\gzip.py", line 232, in _write_gzip_header
    self.fileobj.write(b'\037\213') # magic header
TypeError: string argument expected, got 'bytes'
PS ignore the rubbish indenting here...I know it doesn't look right.
What I'm wanting to do is to create a json file and gzip it in place in memory before saving the gzipped file to the filesystem (windows). I know I've gone about this the wrong way and could do with a pointer. Many thanks in advance.
You have to use bytes everywhere when working with gzip, not strings and text. First, use BytesIO instead of StringIO. Second, when you save the result with open(), the mode should be 'wb' instead of 'w' (the latter is for text), and likewise 'ab' instead of 'a' when appending; the 'b' character means binary, i.e. bytes. Full corrected code below:
from io import BytesIO
import gzip
jsonString = 'JSON encoded string here created by a previous process in the application'
out = BytesIO()
with gzip.GzipFile(fileobj = out, mode = 'wb') as f:
    f.write(str.encode(jsonString))
currenttimestamp = '2021-01-29'
# Write the file once finished rather than streaming it - uncomment the next line to see file locally.
with open("out_" + currenttimestamp + ".json.gz", "wb") as f:
    f.write(out.getvalue())
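If you want to check the in-memory result before saving it, a small round trip like this can help. This is only a sketch: gzip_in_memory is a made-up helper name, and the JSON string is a placeholder.

```python
import gzip
from io import BytesIO


def gzip_in_memory(text):
    """Compress text into gzip-format bytes without touching the disk."""
    out = BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb") as f:
        f.write(text.encode("utf-8"))
    # Read the buffer only after the with block: closing the GzipFile
    # writes the gzip trailer (CRC and length) into `out`.
    return out.getvalue()


if __name__ == "__main__":
    payload = gzip_in_memory('{"example": true}')
    # Gzip data starts with the magic bytes 0x1f 0x8b.
    print(payload[:2] == b"\x1f\x8b")
    # Decompressing should give back the original text.
    print(gzip.decompress(payload).decode("utf-8"))
```

The returned bytes can then be written to disk with open(..., "wb") exactly as in the corrected code.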

File content not reading without seek in python

In my case I write some content to a file as a bytearray and then try to read back the content I have written. The problem is that if I don't call seek, the content read is empty. What I understood is that by default the position is at the beginning of the file, which would be the same as seek(0). Please help me understand this problem. Both scenarios are shown below as examples.
Without seek command
filename = "my_file"
Arr = [0x1, 0x2]
file_handle = open(filename, "wb+")
binary_format = bytearray(Arr)
file_handle.write(binary_format)
#file_handle.seek(0) #Here commenting the seek(0) part
print("file_handle-",file_handle.read())
file_handle.close()
Output in the console
file_handle- b''
With seek command
filename = "my_file"
Arr = [0x1, 0x2]
file_handle = open(filename, "wb+")
binary_format = bytearray(Arr)
file_handle.write(binary_format)
file_handle.seek(0)
print("file_handle-",file_handle.read())
file_handle.close()
Output in the console is
file_handle- b'\x01\x02'
Is seek(0) mandatory here, even if the position points to the beginning of the file by default?
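One way to look at this is to print the file position with tell(). In "wb+" mode the position does start at 0 when the file is opened, but write moves it forward, so a read straight after the write starts at the end of the data. A small sketch (the temporary directory and the helper name are just for illustration):

```python
import os
import tempfile


def positions_around_write(data):
    """Return (pos at open, pos after write, read without seek, read after seek)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "my_file")
        with open(path, "wb+") as fh:
            at_open = fh.tell()        # position right after opening
            fh.write(bytearray(data))
            after_write = fh.tell()    # write advanced the position
            without_seek = fh.read()   # reads from the current position
            fh.seek(0)                 # rewind to the beginning
            with_seek = fh.read()
    return at_open, after_write, without_seek, with_seek


if __name__ == "__main__":
    print(positions_around_write([0x1, 0x2]))
    # (0, 2, b'', b'\x01\x02')
```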

Python read file into memory for repeated FTP copy

I need to read a local file and copy it to a remote location with FTP. I copy the same file, file.txt, to the remote location hundreds of times under different names such as f1.txt, f2.txt, ... f1000.txt. Is it necessary to open, read, and close my local file.txt for every single FTP copy, or is there a way to store it in a variable, use that every time, and avoid the file open and close calls? file.txt is a small file of 6 KB. Below is the code I am using:
for i in range(1,101):
    fname = 'file'+ str(i) +'.txt'
    fp = open('file.txt', 'rb')
    ftp.storbinary('STOR ' + fname, fp)
    fp.close()
I tried reading the file into a string variable and passing that instead of fp, but ftp.storbinary requires its second argument to have a read() method. Please suggest a better way to avoid the repeated open and close, or let me know if it gives no performance improvement at all. I am using Python 2.7.10 on Windows 7.
Simply open it before the loop, and close it after:
fp = open('file.txt', 'rb')
for i in range(1,101):
    fname = 'file'+ str(i) +'.txt'
    fp.seek(0)
    ftp.storbinary('STOR ' + fname, fp)
fp.close()
Update: Make sure you add fp.seek(0) before the call to ftp.storbinary, otherwise the read call will exhaust the file in the first iteration, as noted by @eryksun.
Update 2: Depending on the size of the file, it will probably be faster to use BytesIO. This way the file content is kept in memory but is still a file-like object (i.e. it has a read method).
from io import BytesIO
with open('file.txt', 'rb') as f:
    output = BytesIO()
    output.write(f.read())
for i in range(1, 101):
    fname = 'file' + str(i) + '.txt'
    output.seek(0)
    # pass the in-memory buffer, not the already closed file
    ftp.storbinary('STOR ' + fname, output)
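The need for seek(0) can be checked without an FTP server. In the sketch below, fake_stor is a hypothetical stand-in that reads a file-like object in blocks until it is exhausted, which is roughly what storbinary does with its second argument; it is not part of ftplib.

```python
from io import BytesIO


def fake_stor(fileobj, blocksize=8192):
    """Hypothetical stand-in for an upload: read blocks until none are left."""
    total = 0
    while True:
        block = fileobj.read(blocksize)
        if not block:
            break
        total += len(block)
    return total


def upload_sizes(data, copies, rewind):
    """Return the number of bytes each simulated upload would send."""
    output = BytesIO(data)  # BytesIO(data) starts at position 0
    sizes = []
    for _ in range(copies):
        if rewind:
            output.seek(0)  # go back to the start before every "upload"
        sizes.append(fake_stor(output))
    return sizes


if __name__ == "__main__":
    print(upload_sizes(b"x" * 6000, 3, rewind=True))   # [6000, 6000, 6000]
    print(upload_sizes(b"x" * 6000, 3, rewind=False))  # [6000, 0, 0]
```

Without the rewind, only the first copy gets any data, because the first read leaves the position at the end of the buffer.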

Slow Python file I/O; Ruby runs better than this; Got the wrong language?

Please advise - I'm going to use this as a learning point. I'm a beginner.
I'm splitting a 25 MB file into several smaller files.
A kindly guru here gave me a Ruby script. It works beautifully fast. So, in order to learn, I mimicked it with a Python script. This runs like a three-legged cat (slow). I wonder if anyone can tell me why?
My python script
##split a file into smaller files
###########################################
def splitlines (file) :
    fileNo=0001
    outFile=open("C:\\Users\\dunner7\\Desktop\\Textomics\\Media\\LexisNexus\\ele\\newdocs\%s.txt" % fileNo, 'a') ## open file to append
    fh = open(file, "r") ## open the file for reading
    mylines = fh.readlines() ### read in lines
    for line in mylines: ## for each line
        if re.search("Copyright ", line): # if the line is equal to the regex
            outFile.close() ## close the file
            fileNo +=1 #and add one to the filename, starting to read lines in again
        else: # otherwise
            outFile=open("C:\\Users\\dunner7\\Desktop\\Textomics\\Media\\LexisNexus\\ele\\newdocs\%s.txt" % fileNo, 'a') ## open file to append
            outFile.write(line) ## then append it to the open outFile
    fh.close()
The guru's Ruby 1.9 script
g=0001
f=File.open(g.to_s + ".txt","w")
open("corpus1.txt").each do |line|
  if line[/\d+ of \d+ DOCUMENTS/]
    f.close
    f=File.open(g.to_s + ".txt","w")
    g+=1
  end
  f.print line
end
There are many reasons why your script is slow -- the main reason being that you reopen the output file for almost every line you write. Since the old file gets implicitly closed on opening a new one (due to Python garbage collection), the write buffer is flushed for every single line you write, which is quite expensive.
A cleaned up and corrected version of your script would be
def file_generator():
    file_no = 1
    while True:
        f = open(r"C:\Users\dunner7\Desktop\Textomics\Media"
                 r"\LexisNexus\ele\newdocs\%s.txt" % file_no, 'a')
        yield f
        f.close()
        file_no += 1
def splitlines(filename):
    files = file_generator()
    out_file = next(files)
    with open(filename) as in_file:
        for line in in_file:
            if "Copyright " in line:
                out_file = next(files)
            out_file.write(line)
    out_file.close()
I guess the reason your script is so slow is that you open a new file descriptor for each line. If you look at your guru's ruby script, it closes and opens the output file only if your separator matches.
In contrast to that, your python script opens a new file descriptor for every line you read (and btw, does not close them). Opening a file requires talking to the kernel, so this is relatively slow.
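If you want to measure the difference yourself, here is a rough sketch that writes the same lines twice, once reopening the output file for every line and once keeping it open. The function names and temporary files are invented for this comparison, and the timings will vary with your machine and file system.

```python
import os
import tempfile
import timeit


def write_reopening(path, lines):
    """Open the output file in append mode again for every single line."""
    for line in lines:
        with open(path, "a") as f:
            f.write(line)


def write_keeping_open(path, lines):
    """Open the output file once and write all lines through one handle."""
    with open(path, "a") as f:
        for line in lines:
            f.write(line)


def compare(lines):
    """Return (seconds reopening, seconds keeping open, outputs identical?)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        a = os.path.join(tmpdir, "a.txt")
        b = os.path.join(tmpdir, "b.txt")
        t_reopen = timeit.timeit(lambda: write_reopening(a, lines), number=1)
        t_keep = timeit.timeit(lambda: write_keeping_open(b, lines), number=1)
        with open(a) as fa, open(b) as fb:
            same = fa.read() == fb.read()
    return t_reopen, t_keep, same


if __name__ == "__main__":
    sample = ["line %d\n" % i for i in range(20000)]
    print(compare(sample))
```

Both variants produce the same file; only the number of open and close calls differs.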
Another change I would suggest is to change
fh = open(file, "r") ## open the file for reading
mylines = fh.readlines() ### read in lines
for line in mylines: ## for each line
to
fh = open(file, "r")
for line in fh:
With this change, you do not read the whole file into memory, but only block after block. Although it should not matter with a 25MiB file, it will hurt you with big files and is good practice (and less code ;)).
The Python code might be slow due to regex and not IO. Try
import re
def splitlines (file) :
    fileNo=1
    outFile=open("newdocs/%s.txt" % fileNo, 'a') ## open file to append
    reg = re.compile("Copyright ") ## compile the pattern once
    for line in open(file, "r"):
        if reg.search(line): # if the line contains the pattern
            outFile.close() ## close the file
            fileNo +=1 # move on to the next file number before reopening
            outFile=open("newdocs/%s.txt" % fileNo, 'a') ## open file to append
        outFile.write(line) ## then append it to the open outFile
    outFile.close()
Several notes
Always use / instead of \ for path name
If regex is used repeatedly, compile it
Do you need re.search? or re.match?
UPDATE:
@Ed. S: point taken
@Winston Ewert: code updated to be closer to the original Ruby code
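On the search versus match question from the notes, a quick sketch of the difference (classify is just an illustrative helper, and the sample lines are invented): match only looks at the start of the string, search scans the whole line, and for a fixed string like this a plain substring test gives the same answer as search.

```python
import re

# Compile once, reuse for every line.
pattern = re.compile("Copyright ")


def classify(line):
    """Return which of match / search / plain 'in' find the marker."""
    return {
        "match": pattern.match(line) is not None,    # only at the start
        "search": pattern.search(line) is not None,  # anywhere in the line
        "in": "Copyright " in line,                  # plain substring test
    }


if __name__ == "__main__":
    print(classify("Copyright 2010 Example Press\n"))
    print(classify("All rights reserved. Copyright 2010\n"))
```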
rosser,
Don't use names of built-in objects as identifiers in your code (file is a built-in type in Python 2, and splitlines is the name of a string method)
The following code preserves the effect of your own code: an out_file is closed without the line containing 'Copyright ', which is the signal for closing
The function writelines() is used to get faster execution than repeated calls to out_file.write(line)
The if li: block is there to write out the remaining lines in case the last line of the file being read doesn't contain 'Copyright '
def splitfile(filename, wordstop, destrep, file_no = 1, li = None):
    # use None instead of a mutable default list, which would be
    # shared between calls
    if li is None:
        li = []
    with open(filename) as in_file:
        for line in in_file:
            if wordstop in line:
                with open(destrep+str(file_no)+'.txt','w') as f:
                    f.writelines(li)
                file_no += 1
                li = []
            else:
                li.append(line)
    if li:
        with open(destrep+str(file_no)+'.txt','w') as f:
            f.writelines(li)
