Receiving a Windows message in Python - UnicodeEncodeError ...? - python

I have a little Python 3.3 script that successfully sends (SendMessage) a WM_COPYDATA message (inspired from here , works with XYplorer):
import win32api
import win32gui
import struct
import array
def sendScript(window, message):
CopyDataStruct = "IIP"
dwData = 0x00400001 #value required by XYplorer
buffer = array.array("u", message)
cds = struct.pack(CopyDataStruct, dwData, buffer.buffer_info()[1] * 2 + 1, buffer.buffer_info()[0])
win32api.SendMessage(window, 0x004A, 0, cds) #0x004A is the WM_COPYDATA id
message = "helloworld"
sendScript(window, message) #I write manually the hwnd during debug
Now I need to write a receiver script, still in Python. The script in this answer seems to work (after correcting all the print statements in a print() form). Seems because it prints out all properties of the received message (hwnd, wparam, lparam, etc) except the content of the message.
I get an error instead, UnicodeEncodeError. More specifically,
Python WNDPROC handler failed
Traceback (most recent call last):
File "C:\Python\xxx.py", line 45, in OnCopyData
print(ctypes.wstring_at(pCDS.contents.lpData))
File "C:\Python\python-3.3.2\lib\encodings\cp850.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 10-13: character maps to <undefined>
I don't know how to fix it, also because I'm not using "fancy" characters in the message so I really can't see why I get this error. I also tried setting a different length of message in print(ctypes.wstring_at(pCDS.contents.lpData)) as well as using simply string_at, but without success (in the latter case I obtain a binary string).

ctypes.wstring (in the line print (ctypes.wstring_at(pCDS.contents.lpData))) may not be the string type that the sender sent. Try changing it to:
print (ctypes.string_at(pCDS.contents.lpData))

Related

Decoding Mail Subject Thunderbird in Python 3.x

For a workaround, see below
/Original Question:
Sorry, I am simply too dumb to solve this on my own. I am trying to read the "subjects" from several emails stored in a .mbox folder from Thunderbird. Now, I am trying to decode the header with decode_header(), but I am still getting UnicodeErrors.
I am using the following function (I am sure there is a smarter way to do this, but this is not the point of this post)
import mailbox
from email.header import decode_header
mflder = mailbox.mbox("mailfolder")
for message in mflder:
print(header_to_string(message["subject"]))
def header_to_string(header):
try:
header, encoding = decode_header(header)[0]
except:
return "something went wrong {}".format(header)
if encoding == None:
return header
else:
return header.decode(encoding)
The first 100 outputs or so are perfectly fine, but then this error message appears:
---------------------------------------------------------------------------
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-97-e252df04c215> in <module>
----> 1 for message in mflder:
2 try:
3 print(header_to_string(message["subject"]))
4 except:
5 print("0")
~\anaconda3\lib\mailbox.py in itervalues(self)
107 for key in self.iterkeys():
108 try:
--> 109 value = self[key]
110 except KeyError:
111 continue
~\anaconda3\lib\mailbox.py in __getitem__(self, key)
71 """Return the keyed message; raise KeyError if it doesn't exist."""
72 if not self._factory:
---> 73 return self.get_message(key)
74 else:
75 with contextlib.closing(self.get_file(key)) as file:
~\anaconda3\lib\mailbox.py in get_message(self, key)
779 string = self._file.read(stop - self._file.tell())
780 msg = self._message_factory(string.replace(linesep, b'\n'))
--> 781 msg.set_from(from_line[5:].decode('ascii'))
782 return msg
783
UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 4: ordinal not in range(128)
How can I force mailbox.py to decode a different encoding? Or is the header simply broken? And if I understood this correctly, headers are supposed to be "ASCII", right? I mean, this is the point of this entire MIME thing, no?
Thanks for your help!
/Workaround
I found a workaround by just avoiding to directly iterate over the .mbox mailfolder representation. Instead of using ...
for message in mflder:
# do something
... simply use:
for x in range(len(mflder)):
try:
message = mflder[x]
print(header_to_string(message["subject"]))
except:
print("Failed loading message!")
This skips the broken messages in the .mbox folder. Yet, I stumbled upon several other issues while working with the .mbox folder subjects. For instance, the headers are sometimes split into several tuples when using the decode_header() function. So, in order to receive the full subjects, one needs to add more stuff to the header_to_string() function as well. But this is not related to this question anymore. I am a noob and a hobby prgrammer, but I remember working with the Outlook API and Python, which was MUCH easier...
Solution
It looks like either you have corrupted "mailfolder" mbox file or there is a bug in Python's mailbox module triggered by something in your file. I can't tell what is going on without having the mbox input file or a minimal example input file that reproduces the issue.
You could do some debugging yourself. Each message in the file starts with a "From" line that should look like:
From - Mon Mar 30 18:18:04 2020
From the stack trace you posted, it looks like that line is malformed in one of the messages. Personally, I would use an IDE debugger (PyCharm) track down what the malformed line was, but it can be done with Python's built-in pdb. Wrap your loop like this:
import pdb
try:
for message in mflder:
print(header_to_string(message["subject"]))
except:
pdb.post_mortem()
When you run the code now, it will drop into the debugger when the exception occurs. At that prompt, you can enter l to list the code where the debugger stopped; this should match the last frame printed in your stack trace you originally posted. Once you are there, there are two commands that will tell you what is going on:
p from_line
will show you the malformed "From" line.
p start
will show you at what offset in the file the mailbox code thinks the message was supposed to be.
Previous answer that didn't solve the original problem but still applies
In the real world, there will be messages that don't comply with the standards. You can try to make the code more tolerant if you don't want to reject the bad messages. Decoding with "latin-1" is one way to handle these headers with bytes outside ASCII. This cannot fail because the all possible byte values map to valid Unicode characters (one-to-one mapping of the first 256 codes of Unicode vs. ISO/IEC 8859-1, a.k.a. "latin-1"). This may or may not give you the text the sender intended.
import mailbox
from email.header import decode_header
mflder = mailbox.mbox("mailfolder")
def get_subject(message):
header = message["subject"]
if not header:
return ''
header, encoding = decode_header(header)[0]
if encoding is not None:
try:
header = header.decode(encoding)
except:
header = header.decode('latin-1')
return header
for message in mflder:
print(get_subject(message))

PyCryptodome Error: MAC Check Failed

I am working on an encryption program with Pycryptodome in Python 3. I am trying to encrypt a (byte) string and then decrypt it and verify the MAC tag. When I get to verify it, an error is thrown.
This is the code:
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
aes_key = get_random_bytes(24)
aes_cipher = AES.new(aes_key, AES.MODE_GCM)
encrypted, MACtag = aes_cipher.encrypt_and_digest(b"A random thirty two byte string.")
# Imagine this is happening somewhere else
new_aes_cipher = AES.new(aes_key, AES.MODE_GCM, nonce=aes_cipher.nonce)
new_aes_cipher.verify(MACtag)
decrypted = new_aes_cipher.decrypt(encrypted)
And this is the error:
Traceback (most recent call last):
File "aespractice.py", line 10, in <module>
new_aes_cipher.verify(tag)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-
packages/Crypto/Cipher/_mode_gcm.py", line 441, in verify
raise ValueError("MAC check failed")
ValueError: MAC check failed
I've looked at the documentation, and I it looks to me like everything is all right. Why do you think the program is acting this way? Any help would be appreciated.
If you look at the state diagram for authenticated modes:
You see that verify() should be called at the very end, after any decrypt() has taken place.
So, either you invert the calls or you replace them with a combined decrypt_and_verify().

Strange UnicodeEncodeError/AttributeError in my script

Currently I am writing a script in Python 2.7 that works fine except for after running it for a few seconds it runs into an error:
Enter Shopify website URL (without HTTP): store.highsnobiety.com
Scraping! Check log file # z:\shopify_output.txt to see output.
!!! Also make sure to clear file every hour or so !!!
Copper Bracelet - 3mm - Polished ['3723603267']
Traceback (most recent call last):
File "shopify_sitemap_scraper.py", line 38, in <module>
print(prod, variants).encode('utf-8')
AttributeError: 'NoneType' object has no attribute 'encode'
The script is to get data from a Shopify website and then print it to console. Code here:
# -*- coding: utf-8 -*-
from __future__ import print_function
from lxml.html import fromstring
import requests
import time
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
# Log file location, change "z://shopify_output.txt" to your location.
logFileLocation = "z:\shopify_output.txt"
log = open(logFileLocation, "w")
# URL of Shopify website from user input (for testing, just use store.highsnobiety.com during input)
url = 'http://' + raw_input("Enter Shopify website URL (without HTTP): ") + '/sitemap_products_1.xml'
print ('Scraping! Check log file # ' + logFileLocation + ' to see output.')
print ("!!! Also make sure to clear file every hour or so !!!")
while True :
page = requests.get(url)
tree = fromstring(page.content)
# skip first url tag with no image:title
url_tags = tree.xpath("//url[position() > 1]")
data = [(e.xpath("./image/title//text()")[0],e.xpath("./loc/text()")[0]) for e in url_tags]
for prod, url in data:
# add xml extension to url
page = requests.get(url + ".xml")
tree = fromstring(page.content)
variants = tree.xpath("//variants[#type='array']//id[#type='integer']//text()")
print(prod, variants).encode('utf-8')
The most crazy part about it is that when I take out the .encode('utf-8') it gives me a UnicodeEncodeError seen here:
Enter Shopify website URL (without HTTP): store.highsnobiety.com
Scraping! Check log file # z:\shopify_output.txt to see output.
!!! Also make sure to clear file every hour or so !!!
Copper Bracelet - 3mm - Polished ['3723603267']
Copper Bracelet - 5mm - Brushed ['3726247811']
Copper Bracelet - 7mm - Polished ['3726253635']
Highsnobiety x EARLY - Leather Pouch ['14541472963', '14541473027', '14541473091']
Traceback (most recent call last):
File "shopify_sitemap_scraper.py", line 38, in <module>
print(prod, variants)
File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xae' in position 13: character maps to <undefined>'
Any ideas? Have no idea what else to try after hours of googling.
snakecharmerb almost got it, but missed the cause of your first error. Your code
print(prod, variants).encode('utf-8')
means you print the values of the prod and variants variables, then try to run the encode() function on the output of print. Unfortunately, print() (as a function in Python 2 and always in Python 3) returns None. To fix it, use the following instead:
print(prod.encode("utf-8"), variants)
Your console has a default encoding of cp437, and cp437 is unable to represent the character u'\xae'.
>>> print (u'\xae')
®
>>> print (u'\xae'.encode('utf-8'))
b'\xc2\xae'
>>> print (u'\xae'.encode('cp437'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/encodings/cp437.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\xae' in position 0: character maps to <undefined>
You can see that it's trying to convert to cp437 in the traceback:
File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
(I reproduced the problem in Python3.5, but it's the same issue in both versions of Python)

Python3 and hmac . How to handle string not being binary

I had a script in Python2 that was working great.
def _generate_signature(data):
return hmac.new('key', data, hashlib.sha256).hexdigest()
Where data was the output of json.dumps.
Now, if I try to run the same kind of code in Python 3, I get the following:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/hmac.py", line 144, in new
return HMAC(key, msg, digestmod)
File "/usr/lib/python3.4/hmac.py", line 42, in __init__
raise TypeError("key: expected bytes or bytearray, but got %r" %type(key).__name__)
TypeError: key: expected bytes or bytearray, but got 'str'
If I try something like transforming the key to bytes like so:
bytes('key')
I get
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
I'm still struggling to understand the encodings in Python 3.
You can use bytes literal: b'key'
def _generate_signature(data):
return hmac.new(b'key', data, hashlib.sha256).hexdigest()
In addition to that, make sure data is also bytes. For example, if it is read from file, you need to use binary mode (rb) when opening the file.
Not to resurrect an old question but I did want to add something I feel is missing from this answer, to which I had trouble finding an appropriate explanation/example of anywhere else:
Aquiles Carattino was pretty close with his attempt at converting the string to bytes, but was missing the second argument, the encoding of the string to be converted to bytes.
If someone would like to convert a string to bytes through some other means than static assignment (such as reading from a config file or a DB), the following should work:
(Python 3+ only, not compatible with Python 2)
import hmac, hashlib
def _generate_signature(data):
key = 'key' # Defined as a simple string.
key_bytes= bytes(key , 'latin-1') # Commonly 'latin-1' or 'ascii'
data_bytes = bytes(data, 'latin-1') # Assumes `data` is also an ascii string.
return hmac.new(key_bytes, data_bytes , hashlib.sha256).hexdigest()
print(
_generate_signature('this is my string of data')
)
try
codecs.encode()
which can be used both in python2.7.12 and 3.5.2
import hashlib
import codecs
import hmac
a = "aaaaaaa"
b = "bbbbbbb"
hmac.new(codecs.encode(a), msg=codecs.encode(b), digestmod=hashlib.sha256).hexdigest()
for python3 this is how i solved it.
import codecs
import hmac
def _generate_signature(data):
return hmac.new(codecs.encode(key), codecs.encode(data), codecs.encode(hashlib.sha256)).hexdigest()

python odfpy AttributeError: Text instance has no attribute encode

I'm trying to read from an ods (Opendocument spreadsheet) document with the odfpy modules. So far I've been able to extract some data but whenever a cell contains non-standard input the script errors out with:
Traceback (most recent call last):
File "python/test.py", line 26, in <module>
print x.firstChild
File "/usr/lib/python2.7/site-packages/odf/element.py", line 247, in __str__
return self.data.encode()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0105' in position 4: ordinal not in range(128)
I tried to force an encoding on the output but apparently it does not go well with print:
Traceback (most recent call last):
File "python/test.py", line 27, in <module>
print x.firstChild.encode('utf-8', 'ignore')
AttributeError: Text instance has no attribute 'encode'
What is the problem here and how could it be solved without editing the module code (which I'd like to avoid at all cost)? Is there an alternative to running encode on output that could work?
Here is my code:
from odf.opendocument import Spreadsheet
from odf.opendocument import load
from odf.table import Table,TableRow,TableCell
from odf.text import P
import sys,codecs
doc = load(sys.argv[1])
d = doc.spreadsheet
tables = d.getElementsByType(Table)
for table in tables:
tName = table.attributes[(u'urn:oasis:names:tc:opendocument:xmlns:table:1.0', u'name')]
print tName
rows = table.getElementsByType(TableRow)
for row in rows[:2]:
cells = row.getElementsByType(TableCell)
for cell in cells:
tps = cell.getElementsByType(P)
if len(tps)>0:
for x in tps:
#print x.firstChild
print x.firstChild.encode('utf-8', 'ignore')
Maybe you are not using the latest odfpy, in the latest verion, the __str__ method of Text is implemented as:
def __str__(self):
return self.data
Update odfpy to the latest version, and modify your code as:
print x.firstChild.__str__().encode('utf-8', 'ignore')
UPDATE
This is another method for getting the raw unicode data for Text: __unicode__. So if you don't want to update odfpy, modify your code as:
print x.firstChild.__unicode__().encode('utf-8', 'ignore')
Seems like the library itself is calling encode() -
return self.data.encode()
This uses the system default encoding , which in your case seems to be ascii. you can check that by using -
import sys
sys.getdefaultencoding()
From the traceback, seems like the actual data exists in a variable called data.
Try doing the below instead -
print x.firstChild.data

Categories