Error loading an entire video file into memory - Python

I am trying to load an entire video file into RAM for quick access. I want the file as it is, without any decoding. I just want to point to a location in RAM instead of a remote drive. The file is only 2 GB and I have 128 GB of RAM. I need to do frame-by-frame analysis, and reading from a server takes forever.
I thought I would do something like this:
with open('my_file.txt', 'r') as f:
    file_content = f.read()  # Read whole file in the file_content string
print(file_content)
But I get an error. Is there another way to do it, such as using the io library?
In [11]: u = open("/net/server/raw/2020.04.02/IMG_9261.MOV",'r')
In [12]: data = u.read()
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-12-eecbc439fbf0> in <module>
----> 1 data = u.read()
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/codecs.py in decode(self, input, final)
320 # decode input (taking the buffer into account)
321 data = self.buffer + input
--> 322 (result, consumed) = self._buffer_decode(data, self.errors, final)
323 # keep undecoded input until the next call
324 self.buffer = data[consumed:]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc9 in position 31: invalid continuation byte
This example uses requests.get, but that works only over HTTP. I have a local server that I can mount with NFS:
import requests
from pygame import mixer
import io
r = requests.get("http://example.com/somesmallmp3file.mp3")
inmemoryfile = io.BytesIO(r.content)
mixer.music.init()
mixer.music.load(inmemoryfile)
mixer.music.play()

Adding a 'b' for binary mode should make it work.
u = open("/net/server/raw/2020.04.02/IMG_9261.MOV",'rb')
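To keep the whole file in RAM and still hand it to code that expects a file object, the raw bytes can be wrapped in io.BytesIO. This is a minimal sketch, assuming the file fits in memory. `load_into_memory` is a hypothetical helper name, and the demo uses a small temporary file with arbitrary bytes in place of the real .MOV path:

```python
import io
import os
import tempfile


def load_into_memory(path):
    """Read a file as raw bytes and return a seekable in-memory file object."""
    # 'rb' skips text decoding entirely, so the bytes are stored unchanged.
    with open(path, "rb") as f:
        return io.BytesIO(f.read())


# Demo with a small stand-in file; the bytes below are arbitrary non-UTF-8 data.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"\x00\x00\x00\x14ftypqt  \xc9\xff")
    demo_path = tmp.name

buffer = load_into_memory(demo_path)
os.remove(demo_path)

header = buffer.read(8)  # served from RAM, not from the server
buffer.seek(0)           # rewind like a normal file
```

Whether a given video library accepts a file-like object instead of a path depends on that library, so this only covers the loading step.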

Related

How to turn a string into a binary object in python

I'm using this library to download and decode MMS PDUs:
https://github.com/pmarti/python-messaging
The sample code almost works, except that this method:
mms = MMSMessage.from_data(response)
Is throwing an exception:
TypeError: unsupported operand type(s) for &: 'str' and 'int'
Which seems to obviously be some sort of binary formatting problem.
In the sample code, the HTTP response is passed directly into the from_data method, however in my case it comes through with HTTP headers on it so I'm splitting the response by double CRLF and then passing in just the PDU data:
data = buf.getvalue()
split = data.split("\r\n\r\n")
mms = MMSMessage.from_data(split[1].strip())
This throws an error BUT if I first write the exact same data to a file then use the from_file method it works:
data = buf.getvalue()
split = data.split("\r\n\r\n")
f = open('dump','w+')
f.write(split[1])
f.close()
path = 'dump'
mms = MMSMessage.from_file(path)
I looked in the from_file method, and all it does is load the contents and then pass it into the same method as the from_data method, so the first way should Just Work™.
What I did notice is that the file is opened in binary format, and the content is loaded like this:
data = array.array('B')
with open(filename, 'rb') as f:
    data.fromfile(f, num_bytes)
return self.decode_data(data)
So it seems obvious that somehow what I'm passing into the first function is actually a "string representation of binary data" and what's being read from the file is "actual binary data".
I tried using bytearray like this to "binaryfy" the string:
mms = MMSMessage.from_data(bytearray(split[1].strip(), "utf8"))
but that throws the error:
Traceback (most recent call last):
File "decodepdu.py", line 41, in <module>
mms = MMSMessage.from_data(bytearray(split[1].strip(), "utf8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8c in position 0: ordinal not in range(128)
which seems weird because it's using an 'ascii' codec but I specified utf8 encoding.
Anyway at this point I'm in over my head because I'm not really all that familiar with python, so for now I'm just writing the content to a temporary file but I would really rather not.
Any help would be most appreciated!
Okay thanks to Paul M. in the comments, this works:
data = buf.getvalue()
split = data.split("\r\n\r\n")
pdu = array.array('B')
pdu.fromstring(split[1])
mms = MMSMessage.from_data(pdu)
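The code above is Python 2. In Python 3 the same idea needs bytes throughout, and `array.fromstring` no longer exists (it was removed in Python 3.9 in favour of `frombytes`). A sketch under assumptions: `raw_response` is made-up sample data standing in for `buf.getvalue()`, and the final call to `MMSMessage.from_data` is left as a comment because it has not been checked here against Python 3:

```python
import array

# Made-up stand-in for buf.getvalue(): HTTP headers, a blank line, then binary PDU bytes.
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/vnd.wap.mms-message\r\n"
    b"\r\n"
    b"\x8c\x84\x98abc"
)

# Split only on the first blank line so a CRLF pair inside the payload is left alone.
headers, _, payload = raw_response.partition(b"\r\n\r\n")

# Build the same kind of unsigned-byte array that from_file() constructs.
pdu = array.array("B")
pdu.frombytes(payload)

# mms = MMSMessage.from_data(pdu)  # same call as in the snippet above
```

Calling `.strip()` on the payload is best avoided, since it would drop leading or trailing bytes that happen to have whitespace values.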

UTF-8 decoding an ANSI encoded file throws an error

Here's something I'm trying to understand. I was under the impression that UTF-8 was backwards compatible, so that I can always decode a text file with UTF-8, even if it's an ANSI file. But that doesn't seem to be the case:
In [1]: ansi_str = 'éµaØc'
In [2]: with open('test.txt', 'w', encoding='ansi') as f:
   ...:     f.write(ansi_str)
   ...:
In [3]: with open('test.txt', 'r', encoding='utf-8') as f:
   ...:     print(f.read())
   ...:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-3-b0711b7b947e> in <module>
1 with open('test.txt', 'r', encoding='utf-8') as f:
----> 2 print(f.read())
3
c:\program files\python37\lib\codecs.py in decode(self, input, final)
320 # decode input (taking the buffer into account)
321 data = self.buffer + input
--> 322 (result, consumed) = self._buffer_decode(data, self.errors, final)
323 # keep undecoded input until the next call
324 self.buffer = data[consumed:]
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte
So it looks like if my code expects UTF-8, and is likely to encounter an ANSI-encoded file, I need to handle the UnicodeDecodeError. That's fine, but I would appreciate if anyone could throw some light on my initial misunderstanding.
Thanks!
UTF-8 is backwards compatible with ASCII. Not ANSI. "ANSI" doesn't even describe any one particular encoding. And those characters you're testing with are well outside the ASCII range, so unless you actually encode them with UTF-8, you can't read them as UTF-8.
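The difference can be checked directly on byte strings. This sketch assumes that "ANSI" on the asker's machine means cp1252 (Windows-1252), which is common on Western European Windows installations but is not stated in the question:

```python
text = "éµaØc"

# Pure ASCII text produces identical bytes in ASCII and UTF-8.
ascii_bytes = "abc".encode("ascii")
same_as_utf8 = ascii_bytes == "abc".encode("utf-8")

# cp1252 stores each non-ASCII character here as a single byte above 0x7F.
cp1252_bytes = text.encode("cp1252")

try:
    cp1252_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    reason = exc.reason

# Decoding with the codec that wrote the bytes round-trips cleanly.
round_trip = cp1252_bytes.decode("cp1252")
```

Only the ASCII subset is shared between the two encodings; any byte above 0x7F has to be decoded with the codec that produced it.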

How to convert cp1252 to UTF-8 when exporting a CSV file using Python

I get a Unicode error when I try to export a CSV file (web scraping; I'm using BeautifulSoup and have imported both csv and BeautifulSoup). The code was written on a Mac/Linux system, where UTF-8 works well, but I'm using Windows. The error is:
> UnicodeEncodeError Traceback (most recent call last) in ()
> 71 'ranking_title': ranking_title,
> ---> 72 'ranking_category': ranking_category})
> 73
>
> ~\Anaconda3\lib\csv.py in writerow(self, rowdict)
> 154 def writerow(self, rowdict):
> --> 155 return self.writer.writerow(self._dict_to_list(rowdict))
> 156
>
> ~\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
> 18 def encode(self, input, final=False):
> ---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
> 20
>
> UnicodeEncodeError: 'charmap' codec can't encode characters in position 299-309: character maps to <undefined>
The original code that works for Mac is:
def get_page(url):
    request = urllib.request.Request(url)
    response = urllib.request.urlopen(request)
    mainpage = response.read().decode('utf8')
    return mainpage
I tried decoding as cp1252 and encoding as UTF-8 at the beginning of the worksheet:
def get_page(url):
    request = urllib.request.Request(url)
    response = urllib.request.urlopen(request)
    mainpage = response.read().decode('cp1252').encode('utf8')
    return mainpage
But it doesn't work. Please help.
The UnicodeEncodeError you are facing occurs when you write the data to the CSV output file.
As the error message tells us, Python uses a "charmap" codec which doesn't support the characters contained in your data.
This usually happens when you open a file without specifying the encoding parameter on a Windows machine.
In the attached code document (comment link), snippet no. 10, we can see that this is the case.
You wrote:
with open('wongnai.csv', 'w', newline='') as record:
    fieldnames = ...
In this case, Python uses a platform-dependent default encoding, which is usually some 8-bit encoding on Windows machines.
Specify a codec that supports all of Unicode, and writing the file should succeed:
with open('wongnai.csv', 'w', newline='', encoding='utf16') as record:
    fieldnames = ...
You can also use "utf8" or "utf32" instead of "utf16", of course.
UTF-8 is very popular for saving files in Unix environments and on the Internet, but if you are planning to open the CSV file with Excel later on, you might have trouble getting the application to display the data properly.
A more Windows-proof (but technically non-standard) solution is to use "utf-8-sig", which writes a byte order mark (BOM) at the beginning of the file to help Windows programs recognize it as UTF-8.
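A short sketch of the "utf-8-sig" variant follows. The rows are invented sample data, the two field names are taken from the traceback in the question (the full `fieldnames` list is not shown there), and the file is written to a temporary directory rather than the asker's working directory:

```python
import csv
import os
import tempfile

# Invented rows; the non-Latin text stands in for scraped data that cp1252 cannot encode.
rows = [
    {"ranking_title": "ร้านอาหาร", "ranking_category": "Café"},
    {"ranking_title": "Sushi 寿司", "ranking_category": "Japanese"},
]

out_path = os.path.join(tempfile.mkdtemp(), "wongnai.csv")

# utf-8-sig writes a byte order mark first, then plain UTF-8.
with open(out_path, "w", newline="", encoding="utf-8-sig") as record:
    writer = csv.DictWriter(record, fieldnames=["ranking_title", "ranking_category"])
    writer.writeheader()
    writer.writerows(rows)

# Reading back with the same codec strips the BOM again.
with open(out_path, "r", newline="", encoding="utf-8-sig") as record:
    read_back = list(csv.DictReader(record))

# Inspect the raw file to confirm the three BOM bytes are present.
with open(out_path, "rb") as raw:
    first_bytes = raw.read(3)
```

Reading such a file with plain "utf-8" also works, but the first header name would then start with the invisible U+FEFF character.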

Why does Python3 get a UnicodeDecodeError reading a text file where Python2 does not?

I'm reading in a text file. I've been doing it just fine with python2, but I decided to run my code with python3 instead.
My code for reading the text file is:
neg_words = []
with open('negative-words.txt', 'r') as f:
    for word in f:
        neg_words.append(word)
When I run this code on python 3 I get the following error:
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-14-1e2ff142b4c1> in <module>()
3 pos_words = []
4 with open('negative-words.txt', 'r') as f:
----> 5 for word in f:
6 neg_words.append(word)
7 with open('positive-words.txt', 'r') as f:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py in decode(self, input, final)
319 # decode input (taking the buffer into account)
320 data = self.buffer + input
--> 321 (result, consumed) = self._buffer_decode(data, self.errors, final)
322 # keep undecoded input until the next call
323 self.buffer = data[consumed:]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 3988: invalid continuation byte
It seems to me that there is a certain form of text that python2 decodes without any issue, which python3 can't.
Could someone please explain what the difference is between python2 and python3 with respect to this error. Why does it occur in one version but not the other? How can I stop it?
Your file is not UTF-8 encoded. Figure out what encoding is used and specify that explicitly when opening the file:
with open('negative-words.txt', 'r', encoding="<correct codec>") as f:
In Python 2, str is a byte string containing encoded data, not Unicode text, so nothing is decoded when you read the file. You would get the same error in Python 2 if you opened the file with io.open() (after import io), or if you tried to decode the data you read with word.decode('utf8').
You probably want to read up on Unicode and Python. I strongly recommend Ned Batchelder's Pragmatic Unicode.
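If the correct codec is not known in advance, one pragmatic approach is to try UTF-8 first and fall back to a single-byte codec. This is only a sketch: `read_words` is a hypothetical helper, and falling back to latin-1 is an assumption. latin-1 never raises, but it yields wrong characters if the file was actually written in some other encoding.

```python
import os
import tempfile


def read_words(path, encodings=("utf-8", "latin-1")):
    """Return (lines without trailing newlines, codec used), trying each codec in order."""
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as f:
                return [line.rstrip("\n") for line in f], encoding
        except UnicodeDecodeError:
            continue  # wrong codec, try the next one
    raise ValueError("none of the given encodings could decode " + path)


# Demo file containing a latin-1 encoded word, which is not valid UTF-8.
demo_path = os.path.join(tempfile.mkdtemp(), "negative-words.txt")
with open(demo_path, "wb") as f:
    f.write(b"abnormal\nna\xefve\n")

neg_words, used_encoding = read_words(demo_path)
```

Knowing the real encoding of the file is still preferable to guessing, as the answer above says.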
Or you can simply read the file in binary mode:
with open(filename, 'rb') as f:
    pass
'r' open for reading (default)
'b' binary mode

Except Python codec errors?

File "/usr/lib/python3.1/codecs.py", line 300, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 805: invalid start byte
Hi, I get this exception. How do I catch it and continue reading my files when it occurs?
My program has a loop that reads a text file line by line and tries to do some processing. However, some files I encounter may not be text files, or may have lines that are not properly formatted (foreign language, etc.). I want to ignore those lines.
The following is not working:
for line in sys.stdin:
    if line != "":
        try:
            matched = re.match(searchstuff, line, re.IGNORECASE)
            print (matched)
        except UnicodeDecodeError, UnicodeEncodeError:
            continue
Look at http://docs.python.org/py3k/library/codecs.html. When you open the codecs stream, you probably want to pass the additional argument errors='ignore'.
In Python 3, sys.stdin is by default opened as a text stream (see http://docs.python.org/py3k/library/sys.html), and has strict error checking.
You need to reopen it as an error-tolerant utf-8 stream. Something like this will work:
sys.stdin = codecs.getreader('utf8')(sys.stdin.detach(), errors='ignore')
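The effect of errors='ignore' can be seen without touching the real stdin. This sketch uses io.BytesIO as a stand-in for `sys.stdin.detach()`, and the sample bytes are made up:

```python
import codecs
import io

# Stand-in for sys.stdin.detach(): a binary stream whose second line has a stray 0x92 byte.
raw_stdin = io.BytesIO(b"first line\nit\x92s broken\nlast line\n")

# Same wrapper as above; undecodable bytes are dropped instead of raising.
reader = codecs.getreader("utf8")(raw_stdin, errors="ignore")

# Iterating the reader yields decoded lines, so the loop body never sees bad bytes.
decoded_lines = [line.rstrip("\n") for line in reader]
```

Note that this drops only the offending bytes, not the whole line. Passing errors='replace' instead would put U+FFFD in place of each bad byte, which makes damaged lines easier to spot and skip.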
