UnicodeDecodeError: UTF-8 decode error while parsing a MIDI file (Python)

I was trying to parse a few music files using music21, but the code raises a UTF-8 decoding error:
'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte
Here is my code:
import glob
from music21 import converter, note, chord

notes = []
for file in glob.glob("midi_songs/*.mid"):
    print("parsing %s" % file)
    midi = converter.parse(file)
    elements_to_parse = midi.flat.notes
    for ele in elements_to_parse:
        # Note: store the pitch
        if isinstance(ele, note.Note):
            notes.append(str(ele.pitch))
        # Chord: join the normal-order values with "+"
        elif isinstance(ele, chord.Chord):
            notes.append("+".join(str(n) for n in ele.normalOrder))
Traceback
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-17-ca5986c7b6d6> in <module>
2 for file in glob.glob("midi_songs/*.mid"):
3 print("parsing %s"%file)
----> 4 midi = converter.parse(file)
5
6 elements_to_parse = midi.flat.notes

This is a regression in music21 6.1.0 that was fixed in 6.3.0.
Every problematic file we found had a copyright symbol in its track name message and was written by www.piano-midi.de.
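
If upgrading to music21 6.3.0 or later is not possible right away, it may help to find out which files trigger the error and carry on with the rest. What follows is only a minimal sketch: parse_all is a hypothetical helper (not a music21 function) that accepts any parsing callable, such as converter.parse, and records the files that raise UnicodeDecodeError.

```python
def parse_all(paths, parse):
    """Parse every path with `parse`; collect files that fail to decode.

    `parse_all` is a hypothetical helper, not part of music21.
    `parse` is any callable taking a path, e.g. music21's converter.parse.
    """
    parsed = {}
    failed = []
    for path in paths:
        try:
            parsed[path] = parse(path)
        except UnicodeDecodeError as exc:
            # Remember the file and the message, then continue with the rest.
            failed.append((path, str(exc)))
    return parsed, failed


# Assumed usage with the question's setup:
#   import glob
#   from music21 import converter
#   parsed, failed = parse_all(glob.glob("midi_songs/*.mid"), converter.parse)
#   for path, message in failed:
#       print("skipped %s: %s" % (path, message))
```

The files listed in failed can then be re-parsed after the upgrade.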

Related

Q: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 0: unexpected end of data

I am using FastText.load_fasttext_format() to load the official fastText Japanese pretrained model (300 dimensions) in Google Colab.
Here is my code.
model_path = "/content/drive/MyDrive/IDR/rakuten/wikipedia_fastText/cc.ja.300.bin"
model = FastText.load_fasttext_format(model_path)
And here is the encoding error.
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-7-61d7c85f09b2> in <module>()
2
3 model_path = "/content/drive/MyDrive/IDR/rakuten/wikipedia_fastText/cc.ja.300.bin"
----> 4 model = FastText.load_fasttext_format(model_path)
2 frames
/usr/local/lib/python3.7/dist-packages/gensim/models/fasttext.py in _load_dict(self, file_handle, encoding)
818 word_bytes += char_byte
819 char_byte = file_handle.read(1)
--> 820 word = word_bytes.decode(encoding)
821 count, _ = self.struct_unpack(file_handle, '#qb')
822
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 0: unexpected end of data
The specific error seems to be "unexpected end of data".
Are you sure the cc.ja.300.bin file you downloaded is complete (not truncated) and uncorrupted, with a size and checksum matching whatever the download source declares?
Separately, the load_fasttext_format() class method is deprecated in current versions of Gensim; load_facebook_model() is now the preferred form (though this wouldn't account for your error).
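
One way to check for truncation or corruption is to compare the file's size and digest with the values given by the download source. The sketch below uses only the standard library; file_fingerprint is a hypothetical helper name, and whether the source publishes a checksum (and with which algorithm) is an assumption to verify.

```python
import hashlib
import os


def file_fingerprint(path, algorithm="sha256", chunk_size=1 << 20):
    """Return (size_in_bytes, hex_digest) for the file at `path`.

    Hypothetical helper: pick the `algorithm` that matches whatever
    checksum the download source actually publishes, if any.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # Read in chunks so a multi-gigabyte model does not fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


# Assumed usage with the path from the question:
#   size, digest = file_fingerprint(model_path)
#   print(size, digest)
```

If the size is smaller than the source states, or the digest differs, downloading the file again is the likely next step.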

How can I import this file? I am getting an error

I am importing a CSV file in order to clean it, but PyCharm shows me this error.
I have tried specifying an encoding, but it didn't work.
import csv
txt1 = ""
txt2 = ""
i = 0
with open('data.csv', encoding='cp1252') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        i += 10
        print(i)
        txt1 = str(row['posts'])
        print(txt1)
        # print(row['type'], row['posts'])
My Traceback:
> Traceback (most recent call last):
> File "C:/Users/Administrator/PycharmProjects/mosh/clean.py", line 7, in <module>
> for row in reader:
> File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\csv.py",
> line 112, in __next__
> row = next(self.reader)
> File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py",
> line 23, in decode
> return codecs.charmap_decode(input,self.errors,decoding_table)[0]
> UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2409: character maps to <undefined>
>
> Process finished with exit code 1

Trouble scanning list for duplicates

Hey, so I want to scan this text file of emails. If the same email appears twice, I want it printed; if an email appears only once, I don't want it printed.
It worked for a different text file, but now it raises a traceback error.
# note: make sure found.txt and list.txt are in the 'include' for pycharm
from collections import Counter
print("Welcome DADDY")
with open('myheritage-1-million.txt') as f:
    c = Counter(c.strip().lower() for c in f if c.strip())  # for case-insensitive search
for line in c:
    if c[line] > 1:
        print(line)
ERROR:
rs/dcaputo/PycharmProjects/searchtoolforrhys/venv/include/search.py
Welcome DADDY
Traceback (most recent call last):
File "/Users/dcaputo/PycharmProjects/searchtoolforrhys/venv/include/search.py", line 5, in <module>
c = Counter(c.strip().lower() for c in f if c.strip()) #for case-insensitive search
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/collections/__init__.py", line 566, in __init__
self.update(*args, **kwds)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/collections/__init__.py", line 653, in update
_count_elements(self, iterable)
File "/Users/dcaputo/PycharmProjects/searchtoolforrhys/venv/include/search.py", line 5, in <genexpr>
c = Counter(c.strip().lower() for c in f if c.strip()) #for case-insensitive search
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 2668: invalid continuation byte
Process finished with exit code 1
Expected output: a list of all emails that show up twice in the whole text file.
The key is the error message at the end:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 2668: invalid continuation byte
This error can occur when you try to read a non-text file, or a text file saved in a different encoding, as UTF-8 text. Your file may be corrupted, or it may contain data (at position 2668) that is not valid UTF-8.
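
To see what is actually sitting at the failing position, the file can be read as raw bytes and the decode attempted by hand. The sketch below is one possible approach, not the only one: find_bad_bytes and duplicate_lines are hypothetical helper names, and errors="replace" is an assumed workaround that swaps each undecodable byte for the U+FFFD replacement character so the duplicate scan can finish.

```python
from collections import Counter


def find_bad_bytes(path, encoding="utf-8", context=20):
    """Return (offset, surrounding_bytes) for the first undecodable byte,
    or None if the whole file decodes cleanly. Hypothetical helper."""
    with open(path, "rb") as fh:
        raw = fh.read()
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        start = max(exc.start - context, 0)
        return exc.start, raw[start:exc.end + context]
    return None


def duplicate_lines(path, encoding="utf-8"):
    """Return lines (case-insensitive, stripped) that occur more than once.

    Undecodable bytes are replaced rather than raising, which is an
    assumption about what is acceptable for this data.
    """
    with open(path, encoding=encoding, errors="replace") as fh:
        counts = Counter(line.strip().lower() for line in fh if line.strip())
    return [line for line, n in counts.items() if n > 1]
```

If the surrounding bytes look like ordinary text in another encoding, passing that encoding to open() is usually a cleaner fix than replacing characters.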

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x99 in position 0: invalid start byte

I'm trying to decode the string but getting an error, and here is part of the code:
#!/usr/bin/env python
import rsa

def constLenBin(s):
    binary = "0"*(8-(len(bin(s))-2))+bin(s).replace('0b','')
    return binary

data = 'apple'
(pubkey, privkey) = rsa.newkeys(1024)
crypto = rsa.encrypt(data.encode(), pubkey)
crypto = crypto.decode()
binary = ''.join(map(constLenBin,bytearray(crypto, 'utf-8')))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x99 in position 0: invalid start byte
As Remco notes, \x99 is not a valid UTF-8 start byte. You need to specify an encoding name, for example:
a = b'\x99'
a = a.decode('latin-1')
print(a)
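
Since the ciphertext is arbitrary bytes rather than text, another option is to skip decoding entirely and build the binary string straight from the bytes. The sketch below is an illustration under that assumption: bytes_to_bits is a hypothetical helper, and the sample ciphertext is a stand-in for the output of rsa.encrypt(). Note that decoding with latin-1 and then re-encoding as UTF-8 would change every byte at or above 0x80 into two bytes, so the round trip should use latin-1 in both directions.

```python
def bytes_to_bits(data):
    """Return the bits of `data` as a string, 8 bits per byte."""
    return "".join(format(byte, "08b") for byte in data)


# Stand-in for the bytes returned by rsa.encrypt(); an assumption for illustration.
ciphertext = b"\x99\x01"

# Works directly on bytes, no decoding needed.
bits = bytes_to_bits(ciphertext)

# latin-1 maps each of the 256 byte values to one character, so this
# decode cannot fail and encoding with latin-1 restores the original bytes.
as_text = ciphertext.decode("latin-1")
restored = as_text.encode("latin-1")
```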

How to write my terminal output to a text file using Python

Part of my code is below. I want to write the terminal output to a text file, but I get the error below:
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-2-c7d647fa741c> in <module>()
34 text_file = open("Output.txt", "w")
35
---> 36 text_file.write(data)
37 #print (data)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 150-151: ordinal not in range(128)
# data is multi line text
data = ''.join(soup1.findAll('p', text=True))
text_file = open("Output.txt", "w")
text_file.write(data)
# print (data)
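Encode your text before you write it to the file. On Python 2, where a file opened in "w" mode accepts byte strings:
text_file.write(data.encode("utf-8"))
On Python 3, a text-mode file rejects bytes with a TypeError, so pass encoding="utf-8" to open() and write the string itself.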
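
A minimal sketch of the Python 3 form, assuming data is already a str: the file is opened with an explicit encoding so that it no longer depends on the platform default (ASCII in the traceback above). save_text is a hypothetical helper name.

```python
def save_text(path, text):
    """Write `text` to `path` as UTF-8, regardless of the locale default."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)


# Assumed usage with the names from the question:
#   save_text("Output.txt", data)
```

The with block also closes the file, which the original snippet never does.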
