How to open a file with a Chinese name in Python

I am trying to open a file in "w" mode with the open() function in Python.
The filename is: 仿宋人笔意.jpg.
The open function fails with this filename but succeeds with normal files.
How can I open a file with names which are not in English in python?
My code is as follows:
try:
    filename = urllib.quote(filename.encode('utf-8'))
    destination = open(filename, 'w')
    yield("<br>Obtained the file reference")
except:
    yield("<br>Error while opening the file")
I always get "Error while opening the file" for non-English filenames.
Thanks in advance.

#!/usr/bin/python
# -*- coding: UTF-8 -*-
import codecs
f=codecs.open(u'仿宋人笔意.txt','r','utf-8')
print f.read()
f.close()
worked just fine here

If you're having a problem it seems more likely to do with your operating system or terminal configuration than Python itself; it worked ok for me even without using the codecs module.
Here's a shell log of a test that opened an image file and copied it into a new file with the Chinese name you provided:
$ ls
create_file_with_chinese_name.py some_image.png
$ cat create_file_with_chinese_name.py
#!/usr/bin/python
# -*- coding: UTF-8 -*-
chinese_name_file = open(u'仿宋人笔意.png','wb')
image_data = open('some_image.png', 'rb').read()
chinese_name_file.write(image_data)
chinese_name_file.close()
$ python create_file_with_chinese_name.py
$ ls
create_file_with_chinese_name.py some_image.png 仿宋人笔意.png
$ diff some_image.png 仿宋人笔意.png
$
Worked for me, the images are the same.

If you yield your filename, does it look right? I'm not sure whether the filename has been mangled before reaching this code segment.

I tried modifying my code and rewrote it as:
destination = open(filename.encode("utf-8"), 'wb+')
try:
    for chunk in f.chunks():
        destination.write(chunk)
    destination.close()
except os.error:
    yield("Error in Writing the File ", f.name)
And it solved my error.
Thank you everyone for taking the time to answer. I haven't tried the options mentioned above since I was able to fix it this way, but thanks everybody.

This:
filename.encode('utf-8')
fails when filename is already a byte string: before re-encoding to UTF-8, Python 2 implicitly decodes it as ASCII, and that implicit decode raises UnicodeDecodeError on non-ASCII bytes.

Related

python - file was loaded in the wrong encoding utf-8

I'm quite new to programming and I don't understand this message I get: "file was loaded in the wrong encoding: utf-8". It's not really an error message in the code; I get it in the new .txt file where I write all found keywords. The .txt file gets up to 4000+ rows of information that I sort into Excel in another program and later send to Access. What does the message mean, and is there a way to fix it? Thanks.
I'm using PyCharm with anaconda36.
import glob

def LogFile(filename, tester):
    data = []
    with open(filename) as filesearch:  # open search file
        filesearch = filesearch.readlines()  # read file
    file = filename[37:]
    for line in filesearch:
        if tester in line:  # extract "Create Time"
            short = line[30:]
            data.append(short)  # store all found words in array
    print(file)
    with open('Msg.txt', 'a') as handler:  # create .txt file
        for i in range(len(data)):
            handler.write(f"{file}|{data[i]}")

# open with 'w' to "reset" the file.
with open('LogFile.txt', 'w') as file_handler:
    pass
# ---------------------------------------------------------------------------------
for filename in glob.glob(r'C:\Users\Documents\Access\\GTX797\*.log'):
    LogFile(filename, 'Sending Request: Tester')
I just had the same error in PyCharm and fixed it by specifying UTF-8 when creating the file. You will need to import codecs to do this.
import codecs
with codecs.open('name.txt', 'a', 'utf-8-sig') as f:

How to open .bashrc in Python

I have a problem with opening the .bashrc file in python. I have written the following code:
mydata = open ('/home/lpp/.bashrc', 'r')
and I get this:
$ /usr/bin/python2.7 /home/lpp/lpp2016/Handin3.py
Process finished with exit code 0
Python does not open the file .bashrc. What did I do wrong?
This is an old question. I went through the same problem and this is my solution to read the content.
with open('.bashrc', 'r') as content:
    fileContent = content.read()
print(fileContent)

import os
fp = open(os.path.join(os.path.expanduser('~'), '.bashrc'))
fp.read()
This opens the file and reads all of its content.

How to make this Python (2.7) script work with Unicode filenames?

I have the following script to process filenames with non-latin characters:
import os

filelst = []
allfile = os.listdir(os.getcwd())
for file in allfile:
    if os.path.isfile(file):
        filelst.append(file)
w = open(os.getcwd()+'\\_filelist.txt','w+')
for file in allfile:
    w.write(file)
    w.write("\n")
w.close()
filelist in my folder:
new 1.py
ああっ女神さまっ 小っちゃいって事は便利だねっ.1998.Ep0108.x264.AC3CalChi.avi
ああっ女神さまっ 小っちゃいって事は便利だねっ.1998.Ep0108.x264.AC3CalChi.srt
output in _filelist.txt:
new 1.py
???????? ??????????????.1998.Ep01-08.x264.AC3-CalChi.avi
???????? ??????????????.1998.Ep01-08.x264.AC3-CalChi.srt
You should get the list of files as Unicode strings instead, by passing a Unicode file path to listdir. As you're using getcwd, use os.getcwdu().
Then open your output file with a text-encoding wrapper. The io module is the new way to do this (io handles universal newlines correctly).
Putting it all together:
import os
import io

filelst = []
allfile = os.listdir(os.getcwdu())
for file in allfile:
    if os.path.isfile(file):
        filelst.append(file)
w = io.open(os.getcwdu()+'\\_filelist.txt','w+', encoding="utf-8")
for file in allfile:
    w.write(file)
    w.write("\n")
w.close()
In Windows and OS X, this will just work, as filename translation is enforced. In Linux, a filename can be in any encoding (or none at all!). Therefore, ensure that whatever is creating your files (avi + srt) is using UTF-8, your terminal is set to UTF-8, and your locale is UTF-8.
You need to open your file with a proper encoding to write Unicode into it. You can use the codecs module for opening the file:
import codecs

with codecs.open(os.getcwd()+'\\_filelist.txt','w+',encoding='your-encoding') as w:
    for file in allfile:
        w.write(file + '\n')
You can use UTF-8, which is a universal encoding, or another encoding appropriate to your data. Also note that instead of opening the file and closing it manually, you can use a with statement, which closes the file automatically at the end of the block.

How to deploy zip files (or other binaries) through CGI in Python?

I'm coding a small website with Python and CGI where users can upload zip files and download files uploaded by other users.
Currently I'm able to upload the zips correctly, but I'm having some trouble sending files to the user correctly. My first approach was:
file = open('../../data/code/' + filename + '.zip','rb')
print("Content-type: application/octet-stream")
print("Content-Disposition: filename=%s.zip" %(filename))
print(file.read())
file.close()
But soon I realized that I had to send the file as binary, so I tried:
print("Content-type: application/octet-stream")
print("Content-Disposition: filename=%s.zip" %(filename))
print('Content-transfer-encoding: base64\r')
print( base64.b64encode(file.read()).decode(encoding='UTF-8') )
And different variants of it. It just doesn't work; Apache raises a "malformed header from script" error, so I guess I should encode the file in some other way.
You need to print an empty line after the headers, and your Content-Disposition header is missing the type (attachment):
print("Content-type: application/octet-stream")
print("Content-Disposition: attachment; filename=%s.zip" %(filename))
print()
You may also want to use a more efficient method of sending the resulting file; use shutil.copyfileobj() to copy the data to sys.stdout.buffer:
from shutil import copyfileobj
import sys
print("Content-type: application/octet-stream")
print("Content-Disposition: attachment; filename=%s.zip" %(filename))
print()
with open('../../data/code/' + filename + '.zip','rb') as zipfile:
    copyfileobj(zipfile, sys.stdout.buffer)
You should not use print() for binary data in any case; all you get is b'...' byte literal syntax. The sys.stdout.buffer object is the underlying binary I/O buffer, copy binary data directly to that.
The header is malformed because stdout is buffered, which can cause Python to send the file data before the headers.
What you need to do is flush stdout right after printing the headers:
sys.stdout.flush()
Then write the file copy.
This is what worked for me, I am running Apache2 and loading this script via cgi. Python 3 is my language.
You may have to replace first line with your python 3 bin path.
#!/usr/bin/python3
import cgitb
import cgi
from zipfile import ZipFile
import sys

# Files to include in archive
source_file = ["/tmp/file1.txt", "/tmp/file2.txt"]

# Name and path to our zip file.
zip_name = "zipfiles.zip"
zip_path = "/tmp/{}".format(zip_name)

with ZipFile(zip_path, 'w') as zipFile:
    for f in source_file:
        zipFile.write(f)

# Setting proper headers.
print('Content-Type:application/octet-stream; name="{}"'.format(zip_name))
print('Content-Disposition:attachment; filename="{}"\r\n'.format(zip_name))

# Flushing out stdout.
sys.stdout.flush()

bstdout = open(sys.stdout.fileno(), 'wb', closefd=False)
file = open(zip_path, 'rb')
bstdout.write(file.read())
bstdout.flush()

Python encoding conversion

I wrote a Python script that processes CSV files with non-ASCII characters, encoded in UTF-8. However, the encoding of the output is broken. So, from this in the input:
"d\xc4\x9bjin hornictv\xc3\xad"
I get this in the output:
"d\xe2\x99\xafjin hornictv\xc2\xa9\xc6\xaf"
Can you suggest where the encoding error might come from? Have you seen similar behaviour previously?
EDIT: I'm using csv standard library with the UnicodeWriter class featured in the docs. I use Python version 2.6.6.
EDIT 2: The code to reproduce the behaviour:
#!/usr/bin/env python
#-*- coding:utf-8 -*-
import csv
from pymarc import MARCReader  # The pymarc package, available on PyPI: http://pypi.python.org/pypi/pymarc/2.71
from UnicodeWriter import UnicodeWriter  # The UnicodeWriter from: http://docs.python.org/library/csv.html

def getRow(tag, record):
    if record[tag].is_control_field():
        row = [tag, record[tag].value()]
    else:
        row = [tag] + record[tag].subfields
    return row

inputFile = open("input.mrc", "r")
outputFile = open("output.csv", "wb")
reader = MARCReader(inputFile, to_unicode = True)
writer = UnicodeWriter(outputFile, delimiter = ",", quoting = csv.QUOTE_MINIMAL)
for record in reader:
    if bool(record["001"]):
        tags = [field.tag for field in record.get_fields()]
        tags.sort()
        for tag in tags:
            writer.writerow(getRow(tag, record))
inputFile.close()
outputFile.close()
The input data is available here (large file).
It seems adding force_utf8 = True argument to the MARCReader constructor solved the problem:
reader = MARCReader(inputFile, to_unicode = True, force_utf8 = True)
Inspecting its source code (via inspect) shows it does something like:
string.decode("utf-8", "strict")
You can try to open the file with UTF-8 encoding:
import codecs
codecs.open('myfile.txt', encoding='utf8')
