This question already has an answer here:
Downloading Mp3 using Python in Windows mangles the song however in Linux it doesn't
(1 answer)
Closed 8 years ago.
I am trying to download a .mp3 file, although, when I download it, it is downloaded but sounds like sqeaks and other strange noises through my speakers (similar to when you have a .mp3 file that has errors in the coding).
Here is my current code:
# sets the song url to mp3link variable.
mp3link = dLink[0]
# opens the .mp3 file - using the same procedure as above.
openmp3 = open('testing.mp3', 'w')
dl = urllib2.urlopen(mp3link)
dl2 = dl.read()
# writes the .mp3 file to the file 'testing.mp3' which is in the variable openmp3.
openmp3.write(dl2)
openmp3.close()
print 'done'
I am aware that I could use this code as a faster method:
dlmp3 = urllib2.urlopen(url)
with open('testing2.mp3', 'wb') as filee:
filee.write(dlmp3.read())
Is there somebody that can tell me what I am doing wrong and how I would be able to fix it?
Thanks.
I found the fix at: Downloading Mp3 using Python in Windows mangles the song however in Linux it doesn't
"Try binary file mode. open(mp3Name, "wb") You're probably getting line ending translations.
The file is binary, yes. It's the mode that wasn't. When a file is opened, it can be set to read as a text file (this is default). When it does this, it will convert line endings to match the platform. On Windows, line ends are \r\n In most other places it's either \r or \n. This change messes up the data stream. "
Related
This question already has answers here:
Convert UTF-8 with BOM to UTF-8 with no BOM in Python
(7 answers)
Closed last year.
I'm opening a plain text file, parsing it, and adding different lines to existing, empty string variables. I add these variables into a new variable that is a multi-line fstring. Trying to write the data to a new text file is not behaving as expected.
Reading the original file works fine. Text is properly parsed, variables populated.
The multi-line fstring variable seems fine. Prints normally. Even tried formatting it different ways which I show below.
When writing to a new file, that's where the strangeness starts. I've tried 2 ways:
Straight coding the open function with w or w+
Adding the above to a function and using that inside main()
The file is saved to disk with the correct name. Trying to double-click open in Finder produces nothing. Right-click to open produces nothing. Trying to move to trash with command+delete gives an error:
It sounds like the file goes to trash, but as the file disappears from the folder a new one is created with the same name in its place.
If I try to open in TextMate via File > Open, it opens as a blank file with no errors.
Since I can't get rid of the file, I have to delete the directory and create the directory again with the same name, or force delete in Terminal using rm. Restarting the system does not help. Relaunching Finder does nothing. Saving text files from other apps works fine. Directory is chmod 755.
If I copy an existing text file into the output directory, rename it to what the file is expected to be named, and let python overwrite the contents, it doesn't work either. The file modification date changes (and I see the file "blink" in Finder) but the contents remain the same. However, the file is not corrupted and opens normally.
If I do the same but delete the text inside of the copied file first, then run the script, python writes no data to the file, I can't open it by double-clicking on it, and I get error -43 again with the odd non-trashing behavior.
The strangest thing is this: if I add another with open() at the end of the script, and open the file that was just created and supposedly written to, and print its contents, the contents print. It's like when the script ends the file contents are being removed or its being corrupted somehow. Tried to close the file inside the script even though it's not needed, but same behavior persists.
Code:
Here's the code for writing:
FORMAT='utf-8'
OUTPUT_DIR = '/Path/To/SaveFolder'
# as a function
def write_to_file(content, fpath, name):
the_file = os.path.join(fpath, name)
with open(the_file, 'w+', encoding=FORMAT) as t:
t.write(content)
def main():
print(f" Writing File...\n")
filename = f"{pcode}_{author}_{title}_text.txt"
write_to_file(multiline_var, OUTPUT_DIR, filename)
# or hard coded in main()
def main():
print(f" Writing File...\n")
filename = f"{pcode}_{author}_{title}_text.txt"
the_file = os.path.join(OUTPUT_DIR, filename)
with open(the_file, 'w+', encoding=FORMAT) as t:
t.write(multiline_var)
I have tried using w w+ wt and wt+ and with and without encoding='utf-8'
Here is an example of multi-line fstring variable:
# using triple quotes
multiline_var = f"""
[PROJ-{pcode}] {full_title} by {author}
{description}
{URL}
{DIVIDER_1}
{TEXT_BLURB}
Some text here and then {SOME_MORE_TEXT}"
{DIVIDER_1}
{SOME_LINK}
"""
# or inside parens
multiline_var = (
f"[PROJ-{pcode}] {full_title} by {author}\n"
f"{description}\n\n"
f"{URL}\n"
f"{DIVIDER_1}\n"
f"{TEXT_BLURB}\n\n"
f"Some text here and then {SOME_MORE_TEXT}\n"
f"{DIVIDER_1}\n\n"
f"{SOME_LINK}"
)
Using exiftool on the text file shows the following, so it looks the data is there but must be corrupted:
File Size : 1797 bytes
File Modification Date/Time : 2021:12:31 15:55:39-05:00
File Access Date/Time : 2021:12:31 15:58:13-05:00
File Inode Change Date/Time : 2021:12:31 15:55:39-05:00
File Permissions : -rw-r--r--
File Type : TXT
File Type Extension : txt
MIME Type : text/plain
MIME Encoding : utf-8
Byte Order Mark : No
Newlines : Unix LF
Line Count : 55
Word Count : 181
Not sure what I'm doing wrong. VScode shows no syntax errors in the script. There are no errors in Terminal when running the script. Have I made some simple mistake in the above code? Maybe the fstring variable is causing a problem?
Thanks to #bnaecker for leading me to the solution to this problem.
It appeared that when creating/writing to a text file with a long name, Python can corrupt it. Not sure why, as I save long names for images with Python image libraries all the time. Using a short name like "MyFile.txt" it worked just fine, but that was a red herring.
I have updated this post with my journey to the final solution for using the long names that are needed for my project, though I'm not sure why the problem exists.
First Attempts:
So far creating using a short name and then renaming to a long one.... attempts have failed. I did notice that python is locking the file it creates and never unlocks it. Not sure if this is the problem. Setting chflags with os.system('chflags nouchg') command does not work, not even with sudo, and not even in the Terminal doing it manually.
Using os.rename() in Python corrupts the file
Using os.system('mv oldFile.txt newFile.txt') corrupts the file
Manually using mv command in Terminal corrupts the file
Manually changing the filename in the Finder does not (wtf?)
I kept looking for workarounds but nothing did the job.
Round 2:
Progress!
After much tinkering, I discovered a hidden character inside the file. I ran cat /path/longfilename.txt in Terminal, selected and copied the output and pasted into VScode. Here is what I saw:
Somehow a hidden character is getting into the project code number.
Pasting it into a Unicode search engine it came up as a ZERO WIDTH NO-BREAK SPACE also known in Unicode as EF BB BF. However, when pasting this symbol into TextMate it shows up as <U+FEFF> which is?...
The Byte Order Mark!
Opening a normal utf-8 text file in a hex editor also shows the files starting with EFBBBF for the BOM.
Now, the text file being read and parsed at first has no blank lines to start the file, so I added a line break, and also tried adding some spaces. This time when writing the file I could open it, however, after sending it to the trash, the same behavior occurred and the file was broken again. It seems that because other corrupted versions were in the trash, it added the symbol back to the file name for some reason.
So what appears to be happening, for whatever reason, when Python opens the text file I'm parsing that has no line break at the top, it seems to be grabbing the BOM from the file and adding that to the first variable which is grabbing the first line of the text file. Since that text is a number code that starts the file name, the BOM symbol is being added to the file name as well as the code inside the text file.
Just... wow
The Current Solution:
I have to leave a blank line at the start of the text file that I'm opening and parsing and a simple line break won't do it. I have no idea why this is. I added some spaces for good measure because randomly the BOM would be added to the variable and filename again. So far (knock on wood) as long as the first line of that initial file has some spaces and then a line break, and previous corrupted files have been deleted from the trash, a long file name can be used for all the files I'm creating and writing to without any problems.
This corruption even persists if I remove the encoding flag from both of the open functions I'm using (one to read and parse, the other to create and write).
If anyone knows why this is happening, please share. I've never seen it mentioned before. I'm not sure if it's a python 3.8 bug, a mac OS bug, the way TextMate wrote the original file, or a combination of these.
Correct Solution:
Thanks to #tripleee for the proper way to handle this, as I don't remember seeing this before, though I haven't been using python for very long.
In order to ignore the BOM, reading in the text file to be parsed with an encoding='utf-8-sig' does the job. Seems to be why it exists. :)
Problem solved.
EDIT:
I found a fix, must use wb+ option for save binary files and dont use str()
Code in PHP that i cant change:
file_get_contents('data.pak');
$s_st = unserialize(zlib_decode($s_st));
I try this Python code:
import zlib
from phpserialize import serialize
file = zlib.compress(serialize(data), 9)
with open('data.pak', 'w+') as f_out:
f_out.write(str(file))
f_out.close()
But PHP cant open this file.
I also install PHP to check what file create zlib_encode (zlib_encode(serialize($data),15)) with same data, and these files are different.
Python file starts from:
b'\xa5\xbd\xcb\xb2,)\xaeh\xdb\x8e\xf3\x19\xd5O3\xe7\rY_s\x9b\xbb\xbd\x9b\xd7\xce\xbf\
Looks like file content in bytes
PHP file starts from:
њҐќ[Ом(і¦ЇіҐћDЯ—dО°{X=©‰ њ пЗRХ_»цЄ§Н1€гяыw]!‡ящ?ЖеЏsсъ„l?—фЧяю_ќuз!Фz¤СЙ`\ши
So my question is: How create file in Python, that I can then open in PHP, using code at the beginning.
I have this Python script here that opens a random video file in a directory when run:
import glob,random,os
files = glob.glob("*.mkv")
files.extend(glob.glob("*.mp4"))
files.extend(glob.glob("*.tp"))
files.extend(glob.glob("*.avi"))
files.extend(glob.glob("*.ts"))
files.extend(glob.glob("*.flv"))
files.extend(glob.glob("*.mov"))
file = random.choice(files)
print "Opening file %s..." % file
cmd = "rundll32 url.dll,FileProtocolHandler \"" + file + "\""
os.system(cmd)
Source: An answer in my Super User post, 'How do I open a random file in a folder, and set that only files with the specified filename extension(s) should be opened?'
This is called by a BAT file, with this as its script:
C:\Python27\python.exe "C:\Programs\Scripts\open-random-video.py" cd
I put this BAT file in the directory I want to open random videos of.
In most cases it works fine. However, I can't make it open files with Unicode characters (like Japanese or Korean characters in my case) in their filenames.
This is the error message when the BAT file and Python script is run on a directory and opens a file with Unicode characters in its filename:
C:\TestDir>openrandomvideo.BAT
C:\TestDir>C:\Python27\python.exe "C:\Programs\Scripts\open-random-video.py" cd
The filename, directory name, or volume label syntax is incorrect.
Note that the filename of the .FLV video file in that log is changed from its original filename (소시.flv) to '∩╗┐' in the command line log.
EDIT: I learned that the above command line error message is due to saving the BAT file as 'UTF-8 with BOM'. Saving it as 'ANSI or UTF-16' shows the following message instead, but still does not open the file:
C:\TestDir>openrandomvideo.BAT
C:\TestDir>C:\Python27\python.exe "C:\Programs\Scripts\open-random-video.py" cd
Opening file ??.flv...
Now, the filename of the .FLV video file in that log is changed from its original filename (소시.flv) to '??.flv.' in the command line log.
I'm using Python 2.7 on Windows 7, 64-bit.
How do I allow opening of files that have Unicode characters in their filenames?
Just use Unicode literals e.g., u".mp4" everywhere. IO functions in Python will return Unicode filenames back if you give them Unicode input (internally they might use Unicode-aware Windows API):
import os
import random
videodir = u"." # get videos from current directory
extensions = tuple(u".mkv .mp4 .tp .avi .ts .flv .mov".split())
files = [file for file in os.listdir(videodir) if file.endswith(extensions)]
if files: # at least one video file exists
random_file = random.choice(files)
os.startfile(os.path.join(videodir, random_file)) # start the video
else:
print('No %s files found in "%s"' % ("|".join(extensions), videodir,))
If you want to emulate how your web browser would open video files then you could use webbrowser.open() instead of os.startfile() though the former might use the latter internally on Windows anyway.
The error when running the BAT file is because the BAT file itself is saved as "UTF-8 with BOM". The "" bytes are not a corrupted filename, they are the literal first bytes stored in the BAT file. Re-save the BAT file as ANSI or UTF-16, which are the only encodings supported for BAT files.
Either use Unicode literals as described by J. F. Sebastian, or use Python 3, which always uses Unicode.
(For Python 3, your script will need a minor modification: print is a function now, so you have to put parentheses around the parameter list.)
please familiarize yourself to add # -*- coding: utf-8 -*- in your source code,
so python understanding about your unicode.
I am trying to return a zip file in django http response, the code goes something like...
archive = shutil.make_archive('testfolder', 'zip', MEDIA_ROOT, 'testfolder')
response = HttpResponse(FileWrapper(open(archive)),
content_type=mimetypes.guess_type(archive)[0])
response['Content-Length'] = getsize(archive)
response['Content-Disposition'] = "attachment; filename=test %s.zip" % datetime.now()
return response
Now when this code is executed on ubuntu the resulting downloaded file opens without any issue, but when its executed on windows the file created does not open in winzip (gives error 'Unsupported Zip Format').
Is there something very obvious I am missing here? Isn't python code supposed to be portable?
EDIT:
Thanks to J.F. Sebastian for his comment...
There was no problem in creating the archive, it was reading it back into the request. So, the solution is to change second line of my code from,
response = HttpResponse(FileWrapper(open(archive)),
content_type=mimetypes.guess_type(archive)[0])
to,
response = HttpResponse(FileWrapper(open(archive, 'rb')), # notice extra 'rb'
content_type=mimetypes.guess_type(archive)[0])
checkout, my answer to this question for more details...
The code you have written should work correctly. I've just run the following line from your snippet to generate a zip file and was able to extract on Linux and Windows.
archive = shutil.make_archive('testfolder', 'zip', MEDIA_ROOT, 'testfolder')
There is something funny and specific going on. I recommend you check the following:
Generate the zip file outside of Django with a script that just has that one liner. Then try and extract it on a Windows machine. This will help you rule out anything going on relating to Django, web server or browser
If that works then look at exactly what is in the folder you compressed. Do the files have any funny characters in their names, are there strange file types, or super long filenames.
Run a md5 checksum on the zip file in Windows and Linux just to make absolutely sure that the two files are byte by byte identical. To rule out any file corruption that might have occured.
Thanks to J.F. Sebastian for his comment...
I'll still write the solution here in detail...
There was no problem in creating the archive, it was reading it back into the request. So, the solution is to change second line of my code from,
response = HttpResponse(FileWrapper(open(archive)),
content_type=mimetypes.guess_type(archive)[0])
to,
response = HttpResponse(FileWrapper(open(archive, 'rb')), # notice extra 'rb'
content_type=mimetypes.guess_type(archive)[0])
because apparently, hidden somewhere in python 2.3 documentation on open:
The most commonly-used values of mode are 'r' for reading, 'w' for
writing (truncating the file if it already exists), and 'a' for
appending (which on some Unix systems means that all writes append to
the end of the file regardless of the current seek position). If mode
is omitted, it defaults to 'r'. The default is to use text mode, which
may convert '\n' characters to a platform-specific representation on
writing and back on reading. Thus, when opening a binary file, you
should append 'b' to the mode value to open the file in binary mode,
which will improve portability. (Appending 'b' is useful even on
systems that don’t treat binary and text files differently, where it
serves as documentation.) See below for more possible values of mode.
So, in simple terms while reading binary files, using open(file, 'rb') increases portability of your code (it certainly did in this case)
Now, it extracts without troubles, on windows...
I am trying to use the appdailysales.py module to download daily our iPhone apps. I am a .NET developer, so I tried running this using IronPython in a C# solution using the following code:
using IronPython.Hosting;
var ipy = Python.CreateRuntime();
dynamic appSales = ipy.UseFile("appdailysales.py");
appSales.main();
Because I didn't have gzip, I took out the references to that module. I was going to use the GZipStream C# class to decompress the file (Apple, provides their downloads as .gz files). So, I commented out lines 75 and 429-435.
I have tried executing appdailysales.py in my C# solution, directly from IronPython and using Python 2.7 (installed ActivePython last night); all with the same results: When I try to open the .gz file using 7zip, I get the following error:
CRC Failed ... file is broken
When I try using the GZipStream class I get:
The CRC in GZip footer does not match the CRC calculated from the decompressed data
If I download the .gz file manually, I can decompress the file just fine using 7Zip or GZipStream.
I am fluent in C#, but new to Python. Any help you can provide would be much appreciated.
Thanks for your time.
Looks like line 444 is the problem. Here are lines 444-446:
downloadFile = open(filename, 'w')
downloadFile.write(filebuffer)
downloadFile.close()
At this stage, IF you have deleted lines 429-435 OR selected not to unzip, then filebuffer refers to the raw gzipped stream that you got from the web. The output file is opened in TEXT mode, and you are on Windows, so every \n in the BINARY gzipped stream will be converted to \r\n -- CORRUPTION, like the error message said.
So: for the module to be used portably on both Windows and other platforms, the open mode must be "wb" (b for binary). If the gunzipped result file is also a binary file, "wb" can be hardcoded in the open call. However if the gunzipped file is a text file (meant to be capable of being opened in a text editor), then you need just "w" for that purpose, and you should set a variable mode to either "wb" or "w" as appropriate, and use mode in the open call.
Big question: I understand why you removed the gzip references for IronPython usage. Did you remove those lines for Python 2.7? Or did you run it under Python 2.7 with those lines still in, but set options.unzipFile to False?