Read file using urllib and write adding extra characters - python

I have a script that regularly reads a text file on a server and overwrites a local copy of that file with its contents. The process adds extra carriage returns and an extra invisible character after the last character. How do I make an identical copy of the server file?
I use the following to read the file:
for link in links:
    try:
        f = urllib.urlopen(link)
        myfile = f.read()
    except IOError:
        pass
and this to write it to the local file:
f = open("C:\\localfile.txt", "w")
try:
    f.write(myfile)
except NameError:
    pass
finally:
    f.close()
This is how the file looks on the server
!http://i.imgur.com/rAnUqmJ.jpg
and this is how the file looks locally. There is also an additional invisible character after the last 75.
!http://i.imgur.com/xfs3E8D.jpg
I have seen quite a few similar questions, but I am not sure how to make urllib read in binary.
Is there a solution?

If you want to copy a remote file denoted by a URL to a local file, I would use urllib.urlretrieve:
import urllib
urllib.urlretrieve("http://anysite.co/foo.gz", "foo.gz")

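I think urllib is already reading binary: f.read() returns the raw bytes of the response. The extra carriage returns most likely come from the write. On Windows, a file opened with "w" is in text mode, which turns every \n into \r\n, so a file that already contains \r\n ends up with \r\r\n.
Try changing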
f = open("C:\\localfile.txt", "w")
to
f = open("C:\\localfile.txt", "wb")
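For readers on Python 3, the same idea might look like the sketch below. It is only an illustration: download_exact is a made-up helper name, and the demo copies from a temporary file through a file:// URL so that it runs without a network connection. With a real server you would pass the http:// link instead.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_exact(url, local_path):
    """Copy the bytes behind `url` to `local_path` with no newline translation."""
    # urlopen() yields bytes; "wb" writes them unchanged.
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)


# Demo with a stand-in for the server file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "server.txt"
    source.write_bytes(b"10\r\n20\r\n75")
    target = Path(tmp) / "localfile.txt"
    download_exact(source.as_uri(), target)
    identical = target.read_bytes() == source.read_bytes()
```

Because both ends are handled as bytes, the existing \r\n sequences pass through untouched.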

Related

Python doesn't release file after it is closed

What I need to do is write some messages to a .txt file, close it, and send it to a server. This happens in an infinite loop, so the code should look more or less like this:
import time

import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder

num = 0
while True:
    num += 1
    filename = f"example{num}.txt"
    with open(filename, "w") as f:
        f.write("Hello")
        f.close()
    mp_encoder = MultipartEncoder(
        fields={
            'file': ("file", open(filename, 'rb'), 'text/plain')
        }
    )
    r = requests.post("my_url/save_file", data=mp_encoder, headers=my_headers)
    time.sleep(10)
The post works if the file is created manually inside my working directory, but if I try to create it and write on it through code, I receive this response message:
500 - Internal Server Error
System.IO.IOException: Unexpected end of Stream, the content may have already been read by another component.
I don't see the file appearing in the project window of PyCharm. I even added time.sleep(10) because at first I thought it could be a timing problem, but that didn't solve it. In fact, the file appears in my working directory only when I stop the code, so it seems the file is held by the program even after I explicitly call f.close(). I know the with statement should take care of closing files, but it didn't look like it did, so I added a close() to see whether that was the problem (spoiler: it was not).
I solved the problem by using another file:
with open(filename, "r") as firstfile, open("new.txt", "a+") as secondfile:
    secondfile.write(firstfile.read())
with open(filename, 'w'):
    pass
r = requests.post("my_url/save_file", data=mp_encoder, headers=my_headers)
if r.status_code == requests.codes.ok:
    os.remove("new.txt")
else:
    print("File not saved")
I make a copy of the file, empty the original file to save space, and send the copy to the server (and then delete the copy). It looks like the problem was that the original file was held open by the Python logging module.
First, can you change open(f, 'rb') to open("example.txt", 'rb')? open expects a file name, not a closed file object.
Second, you can use os.path.abspath to find out where the file is being written.
import os
os.path.abspath('.')
Third, when you use a with context manager to open a file, you don't need to close the file yourself. The context manager does it for you.
with open("example.txt", "w") as f:
    f.write("Hello")
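Putting those three points together, a minimal sketch could look like this. The names write_and_send and send are hypothetical: send stands in for whatever performs the upload (for example a function wrapping requests.post), and here it simply reads the handle so the example runs on its own.

```python
import os
import tempfile


def write_and_send(path, message, send):
    """Write `message` to `path`, then hand an open binary handle to `send`."""
    # Leaving the with block flushes and closes the file; no explicit close().
    with open(path, "w") as f:
        f.write(message)
    # Reopen by file name, inside a with block, so the read handle is released too.
    with open(path, "rb") as payload:
        return send(payload)


# Demo: the stand-in "upload" just returns the bytes it was given.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "example1.txt")
    result = write_and_send(demo_path, "Hello", lambda handle: handle.read())
    location = os.path.abspath(demo_path)  # where the file was actually written
```

Opening the upload handle in its own with block avoids leaving one open file per loop iteration, which the open(filename, 'rb') inside the fields dictionary would do.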

random/empty characters while re-editing a json file

I apologize for the vague definition of my problem in the title, but I really can't figure out what sort of problem I'm dealing with. So, here it goes.
I have a Python file, edit-json.py:
import os, json

def add_rooms(data):
    if(not os.path.exists('rooms.json')):
        with open('rooms.json', 'w'): pass
    with open('rooms.json', 'r+') as f:
        d = f.read() # take existing data from file
        f.truncate(0) # empty the json file
        if(d == ''): rooms = [] # check if data is empty i.e the file was just created
        else: rooms = json.loads(d)['rooms']
        rooms.append({'name': data['roomname'], 'active': 1})
        f.write(json.dumps({"rooms": rooms})) # write new data(rooms list) to the json file

add_rooms({'roomname': 'friends'})
This Python script creates a file rooms.json (if it doesn't exist), grabs the data (an array) from the JSON file, empties the file, and finally writes the new data into it. All this is done in the function add_rooms(), which is called at the end of the script. Pretty simple stuff.
Here's the problem: when I run the file once, nothing weird happens, i.e. the file is created and the data inside it is:
{"rooms": [{"name": "friends"}]}
But the weird stuff happens when I run the script again.
What I should see:
{"rooms": [{"name": "friends"}, {"name": "friends"}]}
What I see instead:
I apologize I had to post the image because for some reason I couldn't copy the text I got.
and obviously I can't run the script a third time, because the JSON parser raises an error due to those characters.
I obtained this result in an online compiler. On my local Windows system, I get extra whitespace instead of those extra symbols.
I can't figure out what causes it. Maybe I'm doing the file handling incorrectly? Or is it due to the json module? Or am I the only one getting this result?
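Truncating the file does not move the file pointer: after f.read() it is still at the old end of the file, so the next write lands at that offset and the gap before it is filled with null bytes. Use f.seek(0) to move back to the start of the file: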
import os, json

def add_rooms(data):
    if(not os.path.exists('rooms.json')):
        with open('rooms.json', 'w'): pass
    with open('rooms.json', 'r+') as f:
        d = f.read() # take existing data from file
        f.truncate(0) # empty the json file
        f.seek(0) # <<<<<<<<< add this line
        if(d == ''): rooms = [] # check if data is empty i.e the file was just created
        else: rooms = json.loads(d)['rooms']
        rooms.append({'name': data['roomname'], 'active': 1})
        f.write(json.dumps({"rooms": rooms})) # write new data(rooms list) to the json file

add_rooms({'roomname': 'friends'})
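To see the effect in isolation, here is a small experiment, offered as a sketch rather than part of the original answer. The helper rewrite and the temporary file are made up for the demonstration; it performs the same read, truncate, write sequence with and without the seek(0) and returns the raw bytes left in the file.

```python
import os
import tempfile


def rewrite(path, new_text, rewind):
    """Read, truncate, optionally seek(0), then write; return the file's bytes."""
    with open(path, "r+") as f:
        f.read()           # position is now at the end of the old content
        f.truncate(0)      # the file is empty, but the position has not moved
        if rewind:
            f.seek(0)      # go back to the start before writing
        f.write(new_text)
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rooms.json")

    with open(path, "w") as f:
        f.write('{"rooms": []}')
    without_seek = rewrite(path, "new", rewind=False)

    with open(path, "w") as f:
        f.write('{"rooms": []}')
    with_seek = rewrite(path, "new", rewind=True)
```

Without the seek, the new text ends up after a run of padding bytes, which is what the JSON parser later chokes on. With the seek, the file contains only the new text.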

python - file was loaded in the wrong encoding utf-8

I'm quite new to programming and I don't understand this message I get: "file was loaded in the wrong encoding utf-8". It's not really an error message from the code; I get it in the new .txt file that I write all found keywords to. The .txt file grows to 4000+ rows of information, which I sort into Excel in another program and later send to Access. What does the message mean, and is there a way to fix it? Thanks.
I'm using PyCharm with anaconda36.
import glob

def LogFile(filename, tester):
    data = []
    with open(filename) as filesearch:  # open search file
        filesearch = filesearch.readlines()  # read file
        file = filename[37:]
        for line in filesearch:
            if tester in line:  # extract "Create Time"
                short = line[30:]
                data.append(short)  # store all found words in array
        print (file)
    with open('Msg.txt', 'a') as handler:  # create .txt file
        for i in range(len(data)):
            handler.write(f"{file}|{data[i]}")

# open with 'w' to "reset" the file.
with open('LogFile.txt', 'w') as file_handler:
    pass
# ---------------------------------------------------------------------------------
for filename in glob.glob(r'C:\Users\Documents\Access\\GTX797\*.log'):
    LogFile(filename, 'Sending Request: Tester')
I just had the same error in PyCharm and fixed it by specifying UTF-8 when creating the file. You will need to import codecs to do this.
import codecs
with codecs.open('name.txt', 'a', 'utf-8-sig') as f:
    f.write(text)  # text stands for whatever you write to the file
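In Python 3 the built-in open accepts an encoding argument, so the same idea can probably be expressed without codecs. The sketch below assumes the data really is UTF-8. The sample line and the temporary Msg.txt are invented for illustration; if the source .log files use another encoding, that encoding's name would have to be passed when reading them.

```python
import os
import tempfile

# Hypothetical log line containing non-ASCII characters.
text = "GTX797.log|Tester 42: Größe überschritten\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Msg.txt")

    # State the encoding explicitly instead of relying on the platform default.
    with open(path, "a", encoding="utf-8") as handler:
        handler.write(text)

    # Read it back with the same encoding.
    with open(path, encoding="utf-8") as handler:
        read_back = handler.read()
```

Without the explicit argument, open uses the platform's default encoding, which on Windows is often not UTF-8. That mismatch is one plausible reason an editor would complain about the encoding.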

Setting HTML source for QtWebKit to a string value vs. file.read(), encoding issue?

I have a script that reads a bunch of JavaScript files into a variable, and then places the contents of those files into placeholders in a Python template. This results in the value of the variable src (described below) being a valid HTML document including scripts.
import re
from string import Template

# Open the source HTML file to get the paths to the JavaScript files
f = open('srcfile.html', 'rU')
src = f.read()
f.close()
js_scripts = re.findall(r'script\ssrc="(.*)"', src)
# Put all of the scripts in a variable
js = ''
for script in js_scripts:
    f = open(script, 'rU')
    js = js + f.read() + '\n'
    f.close()
# Open/read the template
template = open('template.html')
templateSrc = Template(template.read())
# Substitute the scripts for the placeholder variable
src = str(templateSrc.safe_substitute(javascript_content=js))
# Write a Python file containing the string
with open('htmlSource.py', 'w') as f:
    f.write('#-*- coding: utf-8 -*-\n\nhtmlSrc = """' + src + '"""')
If I try to open it up via PyQt5/QtWebKit in Python...
from htmlSource import htmlSrc
webWidget.setHtml(htmlSrc)
...it doesn't load the JS files in the web widget. I just end up with a blank page.
But if I get rid of everything else, and just write to file '"""src"""', when I open the file up in Chrome, it loads everything as expected. Likewise, it'll also load correctly in the web widget if I read from the file itself:
f = open('htmlSource.py', 'r')
htmlSrc = f.read()
webWidget.setHtml(htmlSrc)
In other words, when I run this script, it produces the Python output file with the variable; then I try to import that variable and pass it to webWidget.setHtml(); but the page doesn't render. But if I use open() and read it as a file, it does.
I suspect there's an encoding issue going on here. But I've tried several variations of encode and decode without any luck. The scripts are all UTF-8.
Any suggestions? Many thanks!
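One possible explanation, which nothing in the question confirms, is that this is not an encoding problem at all. Pasting the text between triple quotes turns it into a Python string literal, so any backslash sequence in the JavaScript (for example \n inside a JS string) is interpreted by Python on import, whereas f.read() returns the characters untouched. The sketch below illustrates that difference with a made-up one-line script and uses exec in place of a real import.

```python
# A JavaScript snippet that contains the two characters "\" and "n".
js = 'var s = "line1\\nline2";'

# The approach from the question: wrap the text in triple quotes.
naive_source = 'htmlSrc = """' + js + '"""'
# An alternative: let repr() produce a correctly escaped literal.
safe_source = 'htmlSrc = ' + repr(js)

naive_ns, safe_ns = {}, {}
exec(naive_source, naive_ns)   # stands in for "from htmlSource import htmlSrc"
exec(safe_source, safe_ns)

naive_roundtrip = naive_ns["htmlSrc"] == js
safe_roundtrip = safe_ns["htmlSrc"] == js
```

If this is indeed the cause, writing 'htmlSrc = ' + repr(src) to htmlSource.py, or simply reading the HTML from a data file at run time, would avoid the altered escapes.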

Serving binary file from web server to client

Usually, when I want to transfer a text file from the web server to the client, this is what I do:
import cgi
print "Content-Type: text/plain"
print "Content-Disposition: attachment; filename=TEST.txt"
print
filename = "C:\\TEST.TXT"
f = open(filename, 'r')
for line in f:
    print line
This works fine for an ANSI file. However, say I have a binary file a.exe (this file is in a secret path on the web server, and users shall not have direct access to that directory). I wish to use a similar method to transfer it. How can I do so?
What Content-Type should I use?
Using print seems to corrupt the content received on the client side. What is the correct method?
I use the following code.
#!c:/Python27/python.exe -u
import cgi
print "Content-Type: application/octet-stream"
print "Content-Disposition: attachment; filename=jstock.exe"
print
filename = "C:\\jstock.exe"
f = open(filename, 'rb')
for line in f:
    print line
However, when I compare the downloaded file with the original file, there seems to be an extra whitespace character (or more) after every single line.
Agree with the above posters about 'rb' and Content-Type headers.
Additionally:
for line in f:
    print line
This might be a problem when encountering \n or \r\n bytes in the binary file: the loop splits the data at every \n byte, and print then appends a newline of its own after each piece. It might be better to do something like this:
import sys
while True:
    data = f.read(4096)
    sys.stdout.write(data)
    if not data:
        break
Assuming this is running on Windows in a CGI environment, you will want to start the Python process with the -u argument. This ensures stdout isn't in text mode.
When opening a file, you can use open(filename, 'rb') - the 'b' flag marks it as binary. For a general handler, you could use some form of mime magic (I'm not familiar with using it from Python, I've only ever used it from PHP a couple of years ago). For the specific case, .exe is application/octet-stream.
The Content-Type of an .exe is typically application/octet-stream.
You might want to read your file using open(filename, 'rb') where b means binary.
To avoid the whitespace problem, you could try with:
sys.stdout.write(open(filename,"rb").read())
sys.stdout.flush()
or even better, depending on the size of your file, use the Knio approach:
fo = open(filename, "rb")
while True:
    buffer = fo.read(4096)
    if buffer:
        sys.stdout.write(buffer)
    else:
        break
fo.close()
For anyone using Windows Server 2008 or 2012 and Python 3, here's an update...
After many hours of experimentation I have found the following to work reliably:
import io
import sys

with io.open(sys.stdout.fileno(),"wb") as fout:
    with open(filename,"rb") as fin:
        while True:
            data = fin.read(4096)
            fout.write(data)
            if not data:
                break
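As a rough Python 3 sketch of the approaches above, the helper below writes the headers and the file body to any binary stream. The name send_binary and its parameters are invented for this example. In a CGI script the stream would presumably be sys.stdout.buffer; the demo writes into an in-memory buffer so that the result can be inspected.

```python
import io
import os
import tempfile


def send_binary(path, out, download_name, chunk_size=4096):
    """Write CGI headers and the raw bytes of `path` to the binary stream `out`."""
    headers = (
        "Content-Type: application/octet-stream\r\n"
        f'Content-Disposition: attachment; filename="{download_name}"\r\n'
        "\r\n"
    )
    out.write(headers.encode("ascii"))
    with open(path, "rb") as src:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:      # empty bytes object means end of file
                break
            out.write(chunk)   # bytes go out unchanged, no newline handling
    out.flush()


# Demo with a small made-up binary payload containing \r\n and \n bytes.
payload = b"MZ\x00\r\n\x1a\n\xff"
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "a.bin")
    with open(demo_path, "wb") as f:
        f.write(payload)
    sink = io.BytesIO()
    send_binary(demo_path, sink, "a.bin")
received = sink.getvalue()
```

Writing the headers as bytes to the same stream keeps headers and body in order, and reading in fixed-size chunks avoids loading a large file into memory at once.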
