I am trying to decode a Base64 string from Power Apps into an audio file. I can decode it and play the result, but as soon as I try to convert it with ffmpeg or any website, all kinds of errors are thrown. I have also tried other formats (aac, weba, m4a, wav, mp3, ogg, 3gp, caf), but none of them could be converted to another format.
PS: If I decode the string (which is too big to post here) directly using a website, then the audio file can finally be converted, indicating that the issue is in the code or even in the Python library.
=============== CODE ===============
import os
import base64
mainDir = os.path.dirname(__file__)
audioFileOGG = os.path.join(mainDir, "myAudio.ogg")
audioFile3GP = os.path.join(mainDir, "myAudio.3gp")
audioFileAAC = os.path.join(mainDir, "myAudio.aac")
binaryFileTXT = os.path.join(mainDir, 'binaryData.txt')
with open(binaryFileTXT, 'rb') as f:
    audioData = f.readlines()
audioData = audioData[0]
with open(audioFileAAC, "wb") as f:
    f.write(base64.b64decode(audioData))
Result: the audio file is playable, but it cannot be converted to any other format (I need *.wav).
What am I missing here?
I found the issue myself: to decode the Base64 string, you must first remove the header (e.g. "data:audio/webm;base64,"). Then it works!
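A minimal sketch of that fix, assuming the Power Apps text is a data URI whose header ends at the first comma. The helper name decode_data_uri and the sample bytes are made up for illustration. As far as I know, base64.b64decode discards characters outside the Base64 alphabet by default, which may be why the header's letters ended up decoded as extra bytes at the start of the file instead of raising an error.

```python
import base64


def decode_data_uri(payload: str) -> bytes:
    """Return raw bytes from Base64 text, with or without a 'data:...;base64,' header."""
    # Hypothetical helper: when the text starts with a data-URI header,
    # keep only what follows the first comma.
    if payload.startswith("data:"):
        _header, _, payload = payload.partition(",")
    return base64.b64decode(payload)


# Stand-in bytes for the audio data (not a real audio file).
sample = b"\x1aE\xdf\xa3 fake audio"
with_header = "data:audio/webm;base64," + base64.b64encode(sample).decode("ascii")

decoded = decode_data_uri(with_header)
```

The same helper also accepts text that has no header, so it can be used whether or not the source adds one.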
I am trying to encode a docx file and then decode it and pass it to the frontend/UI in Streamlit. So far I know how to encode and decode strings with base64, but not a docx file.
If anyone has code that does this, please share it here.
import base64
import streamlit as st
data = open('/home/lungsang/Desktop/streamlit-practice/content/A0/A0.02-vocab.docx', 'rb').read()
encoded = base64.b64encode(data)
decoded = base64.b64decode(encoded)
st.download_button('Download Here', decoded)
I used the above code but did not get the desired result.
Instead I got a collection of .xml files, as shown in the screenshot below.
The decoded document is supposed to look like this:
If you need the docx file that I am trying to encode/decode, here is the link: https://docs.google.com/document/d/10zkg1HLDHhZNh83i2tbJqBVMfIsdqW-3/edit
You need to pass a file name (the file_name argument) to the download_button function:
import base64
import streamlit as st
data = open("test.docx", "rb").read()
encoded = base64.b64encode(data)
decoded = base64.b64decode(encoded)
st.download_button('Download Here', decoded, "decoded_file.docx")
That is only the encoding step; you also have to decode.
import base64

with open('YOUR DATA FILE', 'rb') as binary_file:
    binary_file_data = binary_file.read()
    base64_encoded_data = base64.b64encode(binary_file_data)
    base64_message = base64_encoded_data.decode('utf-8')
print(base64_message)
Open the file using open('Your Data file', 'rb'). Note the 'rb' argument passed along with the file path: it tells Python that we are reading a binary file. Without 'rb', Python would assume we are reading a text file.
Use the read() method to get all the data in the file into the binary_file_data variable. As with strings, we Base64-encode the bytes with base64.b64encode and then call decode('utf-8') on base64_encoded_data to get the Base64 data as human-readable characters.
Executing the code will produce similar output to:
python3 encoding_binary.py
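For the decoding half, here is a small round-trip sketch that mirrors the snippet above. The file name restored.bin and the sample bytes are placeholders, and a temporary directory is used only so the example leaves nothing behind.

```python
import base64
import os
import tempfile

# Stand-in binary content; any file's bytes would do.
original = bytes(range(256))

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "restored.bin")  # hypothetical file name

    # Encode to Base64 text, as in the snippet above.
    base64_message = base64.b64encode(original).decode("utf-8")

    # Decode back to bytes and write them in binary mode.
    with open(out_path, "wb") as restored_file:
        restored_file.write(base64.b64decode(base64_message))

    # Read the file back to confirm nothing changed on the way.
    with open(out_path, "rb") as restored_file:
        restored = restored_file.read()
```

Writing with 'wb' matters here for the same reason 'rb' matters when reading: the decoded result is bytes, not text.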
I am accessing a dataset that lives on an FTP server. After downloading the data, I used pandas to read it as CSV, but I got an encoding error. The file has a .csv extension, but when I opened it with MS Excel, the data was in Unicode Text format. I want to convert datasets stored in Unicode Text format. How can I do this?
My attempt:
from ftplib import FTP
import os

def mydef():
    defaultIP=''
    username='cat'
    password='cat'
    ftp = FTP(defaultIP,user=username, passwd=password)
    ftp.dir()
    filenames=ftp.nlst()
    # download every listed file in binary mode
    for filename in filenames:
        local_filename = os.path.join('C:\\Users\\me', filename)
        file = open(local_filename, 'wb')
        ftp.retrbinary('RETR '+ filename, file.write)
        file.close()
    ftp.quit()
Then I tried this to get the correct encoding:
mydef.encode('utf-8').splitlines()
but it does not work for me. I used this solution.
Here is an output snippet of the above code:
b'\xff\xfeF\x00L\x00O\x00W\x00\t\x00C\x00T\x00Y\x00_\x00R\x00P\x00T\x00\t\x00R\x00E\x00P\x00O\x00R\x00T\x00E\x00R\x00\t\x00C\x00T\x00Y\x00_\x00P\x00T\x00N\x00\t\x00P\x00A\x00R\x00T\x00N\x00E\x00R\x00\t\x00C\x00O\x00M\x00M\x00O\x00D\x00I\x00T\x00Y\x00\t\x00D\x00E\x00S\x00C\x00R\x00I\x00P\x00T\x00I\x00O\x00N\x00\t'
Expected output:
The expected output should be normal CSV data, such as common trade data, but the encoding doesn't work for me.
I tried different encodings to get correctly converted CSV data, but none of them worked. How can I make this work? Thanks.
EDIT: I had to change my answer. I now remove 2 bytes at the beginning (the BOM) and one byte at the end, because the data is incomplete (every char needs 2 bytes).
It seems the data is not UTF-8 but UTF-16 with a BOM.
If I remove the first two bytes (the BOM, Byte Order Mark) and the last byte, which is incomplete (every char needs two bytes), and use decode('utf-16-le'):
b'F\x00L\x00O\x00W\x00\t\x00C\x00T\x00Y\x00_\x00R\x00P\x00T\x00\t\x00R\x00E\x00P\x00O\x00R\x00T\x00E\x00R\x00\t\x00C\x00T\x00Y\x00_\x00P\x00T\x00N\x00\t\x00P\x00A\x00R\x00T\x00N\x00E\x00R\x00\t\x00C\x00O\x00M\x00M\x00O\x00D\x00I\x00T\x00Y\x00\t\x00D\x00E\x00S\x00C\x00R\x00I\x00P\x00T\x00I\x00O\x00N\x00'.decode('utf-16-le')
then I get
'FLOW\tCTY_RPT\tREPORTER\tCTY_PTN\tPARTNER\tCOMMODITY\tDESCRIPTION'
EDIT: Meanwhile, I also found "Python - Decode UTF-16 file with BOM".
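As an alternative to stripping the BOM by hand, the plain 'utf-16' codec reads the BOM and chooses the byte order itself. The sketch below uses only the standard library; the second data row is made up, and the sample is complete (no dangling byte at the end, unlike the truncated snippet in the question).

```python
import csv
import io

# Sample bytes shaped like the output above: a little-endian BOM followed by
# UTF-16-LE text. The second row is made-up illustration data.
text = "FLOW\tCTY_RPT\tREPORTER\nE\t001\tExampleland\n"
raw = b"\xff\xfe" + text.encode("utf-16-le")

# The 'utf-16' codec reads the BOM, picks the byte order, and drops the BOM.
decoded = raw.decode("utf-16")

# The columns are tab-separated, so parse with a tab delimiter.
rows = list(csv.reader(io.StringIO(decoded), delimiter="\t"))
```

With pandas, passing encoding="utf-16" and sep="\t" to read_csv should have the same effect on the downloaded file, though I have not tried it on this dataset.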
I am currently experimenting with how Python 3 handles bytes when reading and writing data, and I have come across a troubling problem whose source I can't find. I am reading bytes out of a JPEG file, converting them to integers using ord(), then turning them back into characters with chr(character).encode('utf-8') and writing the result into a new JPEG file. No issue, right? Well, when I try to open the new JPEG file, I get a Windows 8.1 notification saying it cannot open the photo. When I compare the two files, one is 5.04MB and the other is 7.63MB, which has me awfully confused.
def __main__():
    operating_file = open('photo.jpg', 'rb')
    while True:
        data_chunk = operating_file.read(64*1024)
        if len(data_chunk) == 0:
            print('COMPLETE')
            break
        else:
            new_operation = open('newFile.txt', 'ab')
            for character in list(data_chunk):
                new_operation.write(chr(character).encode('utf-8'))
if __name__ == '__main__':
    __main__()
This is the exact code I am using, any ideas on what is happening and how I can fix it?
NOTE: I am assuming that the list of numbers that list(data_chunk) provides is the equivalent to ord().
Here is a simple example you might wish to play with:
import sys
f = open('gash.txt', 'rb')
stuff=f.read() # stuff refers to a bytes object
f.close()
print(stuff)
f2 = open('gash2.txt', 'wb')
for i in stuff:
    f2.write(i.to_bytes(1, sys.byteorder))
f2.close()
As you can see, the bytes object is iterable, but in the for loop we get back an int in i. To convert that to a byte I use int.to_bytes() method.
When you have a code point and you encode it in UTF-8, it is possible for the result to contain more bytes than the original.
For a specific example, refer to the Wikipedia page and consider the hexadecimal value 0xA2.
This is a single byte value (it fits in the range 0 to 255), but when encoded to UTF-8 it becomes the two bytes 0xC2, 0xA2.
Given that you are pulling bytes out of your source file, my first recommendation would be to simply pass the bytes directly to the writer of your target file.
If you are trying to understand how file I/O works, be wary of encode() when using a binary file mode. Binary files don't need to be encoded or decoded: they are raw data.
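The 0xA2 case can be checked directly. This sketch also runs a made-up 256-byte chunk through the chr().encode() approach to show how it grows; it is an illustration, not the original photo data.

```python
# 0xA2 is one byte in the source file, but encoding the code point U+00A2
# as UTF-8 yields two bytes.
value = 0xA2
as_utf8 = chr(value).encode("utf-8")   # two bytes
as_raw = bytes([value])                # one byte, unchanged

# Round-tripping a whole chunk through chr().encode() inflates it, because
# every byte value from 0x80 upward becomes two bytes.
chunk = bytes(range(256))
inflated = b"".join(chr(b).encode("utf-8") for b in chunk)

print(len(chunk), len(inflated))
```

Here half of the byte values are 0x80 or above, so the chunk grows by a factor of 1.5, which is close to the 5.04MB to 7.63MB jump reported in the question. Writing data_chunk itself to the output file avoids the problem entirely.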
I'm trying to upload files from a JavaScript webpage to a Python-based server over websockets.
In the JS, this is how I'm transmitting the package of data over the websocket:
var json = JSON.stringify({
'name': name,
'iData': image
});
In the Python, I'm decoding it like this:
noJson = json.loads(message)
fName = noJson["name"]
fData = noJson["iData"]
I know fData is in unicode format, but when I try to save the file locally is when the problems begin. Say, I'm trying to upload/save a JPG file. Looking at that file after upload I see at the beginning:
ÿØÿà^#^PJFIF
the original code should be:
<FF><D8><FF><E0>^#^PJFIF
So how do I get it to save with the codes, instead of the interpreted unicode characters?
fd = codecs.open( fName, encoding='utf-8', mode='wb' ) ## On Unix, so the 'b' might be ignored
fd.write( fData)
fd.close()
(if I don't use the "encoding=" bit, it throws a UnicodeDecodeError exception)
Use 'latin-1' encoding to save the file.
The fData that you are getting already holds the byte values as characters, i.e. you get the string u'\xff\xd8\xff\xe0^#^PJFIF'. The latin-1 encoding converts every code point between U+0000 and U+00FF to the single byte with the same value, and fails on any code point above U+00FF.
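A quick way to see the difference, assuming the received text starts with the usual JPEG/JFIF header bytes carried as characters (the sample string below is an assumption, not the asker's actual data):

```python
# The text as it might arrive after json.loads(): each character's code
# point equals the byte value it stands for (assumed sample header).
f_data = "\xff\xd8\xff\xe0\x00\x10JFIF"

# latin-1 maps code points 0..255 straight to the same byte values.
raw = f_data.encode("latin-1")

# UTF-8, by contrast, expands every code point above 0x7F to two bytes,
# which is what corrupts the saved file.
as_utf8 = f_data.encode("utf-8")

print(raw)
print(as_utf8)
```

The bytes in raw can then be written with a plain open(fName, 'wb'), without going through codecs.open at all.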
I took a dump of RAM data using a freeware tool called DumpIt (http://www.downloadcrew.com/article/23854-dumpit). The software saved the RAM data as a raw file, which can be read using a hex editor (http://www.downloadcrew.com/article/10814-hxd).
How do I get the string data, as visible in the hex editor (see image), in Python?
For example, I want to get the string "http://www.downloadcrew.com/article/23854-dumpit" (in the red box in the image) in Python by reading the raw file generated by DumpIt.
EDIT
I tried using this code, but it just stalls and nothing happens:
#!/usr/bin/python
import binascii
filename = "LEMARC-20140401-181003.raw"
g = open("out","w")
str=""
with open(filename,"rb") as f:
    for lines in f:
        str+=lines
str = binascii.unhexlify(str)
f.close()
g.write(str)
g.close
In Python 2:
"437c2123".decode('hex')
'C|!#'
In Python 3 (this also works in Python 2; for versions before 2.6 you can't have the b prefix on the string):
import binascii
binascii.unhexlify(b"437c2123")
b'C|!#'
So in your case, decode the entire hex string to get the ASCII, and then you can extract the URL with a regex or your own parsing function.
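A rough sketch of the unhexlify-then-regex idea. The hex text is built from made-up bytes, and the pattern is only one simple way to spot URLs. Note that a raw dump opened in 'rb' mode is already bytes, so in that case the unhexlify step would probably not be needed.

```python
import binascii
import re

# Made-up hex text standing in for a hex dump of some memory region.
hex_text = binascii.hexlify(
    b"\x00\x01http://www.downloadcrew.com/article/23854-dumpit\x00\xff"
)

# Turn the hex digits back into the bytes they describe.
raw = binascii.unhexlify(hex_text)

# Match 'http://' or 'https://' followed by printable, non-space ASCII bytes.
url_pattern = re.compile(rb"https?://[\x21-\x7e]+")
urls = url_pattern.findall(raw)

print(urls)
```

For a large dump it would likely be better to read the file in fixed-size chunks rather than concatenating lines into one string, since repeated concatenation gets slow as the data grows.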