String operation issue while switching from Python 2.x to Python 3 - python

I am facing some problems with strings when switching from Python 2.x to Python 3.
Issue 1:
from ctypes import *

charBuffer = create_string_buffer(1000)
var = charBuffer.value  # var contains something like "abc:def:ghi:1234"
a, b, c, d = var.split(':')
It works fine in Python 2.x but not in 3.x, where it throws an error like this:
a,b,c,d= var.split(':')
TypeError: 'str' does not support the buffer interface
I found these links after doing some research on Stack Overflow: link link2
If I print them, the desired output would be:
a = abc
b = def
c = ghi
d = 1234
Issue 2:
from ctypes import *

cdll = "Windll"
var = 0x1fffffffffffffffffffffff  # I want to send this long value to a character pointer in the DLL
charBuf = create_string_buffer(var.to_bytes(32, 'little'))
cdll.createBuff(charBuf)
The C function in the DLL:
int createBuff(char *charBuff) {
    printf("%s\n", charBuff);
    return 0;
}
I want to pass this long value to the character pointer in the DLL, but since it is a character pointer it throws errors.
I would appreciate your input on how I could achieve this. Thanks in advance.

In Python 3.x, .value on the object returned by create_string_buffer() gives a byte string.
In your example you are trying to split that byte string using a Unicode string (which is the normal string type in Python 3.x). This is what is causing your issue.
You would need to either split with a byte string. Example -
a,b,c,d = var.split(b':')
Or you can decode the byte string to a Unicode string using the .decode() method on it.
Example -
var = var.decode('<encoding>')

Split using b":" and you will be fine in both versions of python.
In py2 str is a bytestring, in py3 str is a unicode object. The object returned by the ctypes string buffer is a bytestring (str on py2 and bytes on py3). By writing the string literal as b"... you force it to be a bytestring in both version of python.
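For illustration, here is a minimal sketch of both approaches, assuming the buffer value is b"abc:def:ghi:1234":
# assumed buffer contents, standing in for charBuffer.value
var = b"abc:def:ghi:1234"
# Option 1: split with a bytes separator (works on Python 2 and 3)
a, b, c, d = var.split(b":")
# Option 2: decode to a text string first, then split with a str separator
a, b, c, d = var.decode("ascii").split(":")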

Related

Calling C functions on Python prints only the first character

I tried to call a C function from Python; here is my code:
string.c
#include <stdio.h>

int print(const char *str)
{
    printf("%s", str);
    return 0;
}
string.py
from ctypes import *
so_print = "/home/ubuntu/string.so"
my_functions = CDLL(so_print)
print(my_functions.print("hello"))
When I run the Python script it prints only the first character of the string,
for example "h".
How can I pass any string so that my C code will read and display it?
Your function accepts a const char*, which corresponds to a Python bytes object (which coerces to c_char_p), not a str (which coerces to c_wchar_p). You didn't tell Python what the underlying C function's prototype was, so it just converted your str to a c_wchar_p, and a UTF-16 or UTF-32 encoded string containing solely ASCII characters looks like either an empty or a single-character C-style char * string (depending on platform endianness).
Two things to improve:
Define the prototype for print so Python can warn you when you misuse it, adding:
my_functions.print.argtypes = [c_char_p]
before using the function.
Encode str arguments to bytes so they can be converted to valid C-style char* strings:
# For arbitrary string, just encode:
print(my_functions.print(mystr.encode()))
# For a literal, you can pass a bytes literal
print(my_functions.print(b"hello"))
# ^ b makes it a bytes, not str
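Putting both pieces together, a minimal sketch (using the library path from the question and assuming the C print function shown above):
from ctypes import CDLL, c_char_p, c_int

so_print = "/home/ubuntu/string.so"
my_functions = CDLL(so_print)

# Declare the prototype so ctypes can check the arguments
my_functions.print.argtypes = [c_char_p]
my_functions.print.restype = c_int

my_functions.print(b"hello")           # bytes literal
my_functions.print("hello".encode())   # encode an arbitrary str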
You have to make the following change: pass a bytes object instead of a string.
This is because, per the ctypes Fundamental data types table, c_char_p in C corresponds to Python's bytes object.
from ctypes import *
so_print = "string.so"
my_functions = CDLL(so_print)
print(my_functions.print(b"hello"))

Python 3 Concat Single Byte with String Bytes

I need to concatenate a single byte with the bytes I get from a string parameter.
byte_command = 0x01
socket.send(byte_command + bytes(message, 'UTF-8'))
but I get this error:
socket.send(byte_command + bytes(message, 'UTF-8'))
TypeError: str() takes at most 1 argument (2 given)
I assume this happens because I am using the string concat operator - how do I resolve that?
From the error message, I gather that you are running Python 2 (the code works in Python 3). Assuming that message is a string:
Python 3 ([Python 3.Docs]: class bytes([source[, encoding[, errors]]])):
byte_command = b"\x01"
sock.send(byte_command + bytes(message, 'UTF-8'))
Python 2 (where bytes and str are the same):
byte_command = "\x01"
sock.send(byte_command + message)
I also renamed the socket to sock so it doesn't clash with the socket module itself.
As everyone suggested, it's recommended / common to do the transformation using message.encode("utf8") (in Python 3 the argument is not even necessary, as utf8 is the default encoding).
More on the differences (although question is in different area): [SO]: Passing utf-16 string to a Windows function (#CristiFati's answer).
From that error message, it looks like you are using Python 2, not Python 3. In Python 2, bytes is just an alias for str, and str only takes one argument.
To make something that works in both Python 2 and Python 3, use str.encode rather than bytes:
byte_command = b'\x01'
socket.send(byte_command + message.encode('UTF-8'))
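A small runnable sketch of the resulting payload (the message text here is just an example; sock stands for your already-connected socket):
message = "hello"                                 # example text to send
byte_command = b"\x01"                            # the single command byte as a bytes literal
payload = byte_command + message.encode("utf-8")
print(repr(payload))                              # b'\x01hello'
# then: sock.send(payload)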

Why does Python 2's raw_input output unicode strings?

I tried the following on Codecademy's Python lesson
hobbies = []
# Add your code below!
for i in range(3):
    Hobby = str(raw_input("Enter a hobby:"))
    hobbies.append(Hobby)
print hobbies
With this, it works fine but if instead I try
Hobby = raw_input("Enter a hobby:")
I get [u'Hobby1', u'Hobby2', u'Hobby3']. Where are the extra us coming from?
The question's subject line might be a bit misleading: Python 2's raw_input() normally returns a byte string, NOT a Unicode string.
However, it could return a Unicode string if it or sys.stdin has been altered or replaced (by an application, or as part of an alternative implementation of Python).
Therefore, I believe @ByteCommander is on the right track with his comment:
Maybe this has something to do with the console it's running in?
The Python used by Codecademy is ostensibly 2.7, but (a) it was implemented by compiling the Python interpreter to JavaScript using Emscripten and (b) it's running in the browser; so between those factors, there could very well be some string encoding and decoding injected by Codecademy that isn't present in plain-vanilla CPython.
Note: I have not used Codecademy myself nor do I have any inside knowledge of its inner workings.
'u' means it is a Unicode string. You can also call raw_input().encode('utf8') to convert it to a byte string.
Edited:
I checked in Python 2.7: it returns a byte string, not a Unicode string. So the problem is something else here.
Edited:
raw_input() returns Unicode if sys.stdin.encoding is a Unicode encoding.
In the Codecademy Python environment, sys.stdin.encoding and sys.stdout.encoding are both None and the default encoding scheme is ascii.
Python falls back to this default encoding only if it cannot determine a proper encoding from the environment.
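A quick Python 2 sketch to check what your own environment reports (the values depend on the console you run in):
import sys

# None here means Python falls back to sys.getdefaultencoding() (usually 'ascii')
print sys.stdin.encoding, sys.stdout.encoding
print sys.getdefaultencoding()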
Where are the extra us coming from?
raw_input() returns Unicode strings in your environment
repr() is called for each item of a list when you print the list (to convert it to a string)
the text representation (repr()) of a Unicode string is the same as a Unicode literal in Python: u'abc'
that is why print [raw_input()] may produce: [u'abc'].
You don't see u'' in the first code example because str(unicode_string) calls the equivalent of unicode_string.encode(sys.getdefaultencoding()), i.e., it converts Unicode strings to bytestrings. Don't do it unless you mean it.
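A small Python 2 sketch of those points, assuming raw_input() returns Unicode in this environment:
s = u'abc'        # what raw_input() would return here
print [s]         # [u'abc'] -- list items are rendered with repr()
print s           # abc      -- print uses the string itself
print str(s)      # abc      -- implicitly s.encode(sys.getdefaultencoding())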
Can raw_input() return unicode?
Yes:
#!/usr/bin/env python2
"""Demonstrate that raw_input() can return Unicode."""
import sys
class UnicodeFile:
    def readline(self, n=-1):
        return u'\N{SNOWMAN}'
sys.stdin = UnicodeFile()
s = raw_input()
print type(s)
print s
Output:
<type 'unicode'>
☃
A practical example is the win-unicode-console package, which can replace raw_input() to support entering Unicode characters outside the range of the console codepage on Windows. Related: here's why sys.stdout should be replaced.
May raw_input() return unicode?
Yes.
raw_input() is documented to return a string:
The function then reads a line from input, converts it to a string
(stripping a trailing newline), and returns that.
A string in Python 2 is either a bytestring or a Unicode string: isinstance(s, basestring).
The CPython implementation of raw_input() supports Unicode strings explicitly: builtin_raw_input() can call PyFile_GetLine(), and PyFile_GetLine() considers both bytestrings and Unicode strings to be strings; it raises TypeError("object.readline() returned non-string") otherwise.
You could encode the strings before appending them to your list:
hobbies = []
# Add your code below!
for i in range(3):
    Hobby = raw_input("Enter a hobby:")
    hobbies.append(Hobby.encode('utf-8'))
print hobbies

How can ctypes be used to parse unicode strings?

Sending a string from Python to C++ using Python's ctypes module requires you to pass it as a c_char_p (a char *). I've found I need to use a plain Python string and not a Python unicode string. If I use the unicode string, the variables just get overwritten instead of being passed properly. Here's an example:
C++
void test_function(char * a, char * b, char * c) {
    printf("Result: %s %s %s", a, b, c);
}
Python
... load c++ using ctypes ...
lib.test_function.argtypes = [c_char_p, c_char_p, c_char_p]
lib.test_function(u'x', u'y', u'z')
lib.test_function('x', 'y', 'z')
Running the above Python code gives the following in stdout:
Result: z z z
Result: x y z
Why is this? Is it a quirk of ctypes? What is an elegant way to avoid this quirk if I am getting Unicode strings?
Thanks!
C/C++ has no real support for Unicode, so there really isn't anything you can do about that. You must encode your strings in order to pass them into the C/C++ world: you could use UTF-8, UTF-16, or UTF-32 depending on your use case.
For example, you can encode them as UTF-8 and pass in an array of bytes (bytes in Python and char * in C/C++):
lib.test_function(u'x'.encode('utf8'),
                  u'y'.encode('utf8'),
                  u'z'.encode('utf8'))
Exactly which encoding you pick is another story, but it will depend on what your C++ library is willing to accept.
Try c_wchar instead of c_char:
https://docs.python.org/2/library/ctypes.html
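For illustration, a sketch of that approach; it assumes the C side is changed to take wchar_t * (which ctypes maps to c_wchar_p), and the library path and function name here are hypothetical:
# The C side would need a wide-character signature, e.g.:
#   void test_function_w(wchar_t *a, wchar_t *b, wchar_t *c);
from ctypes import CDLL, c_wchar_p

lib = CDLL("./libtest.so")                      # hypothetical library path
lib.test_function_w.argtypes = [c_wchar_p, c_wchar_p, c_wchar_p]
lib.test_function_w(u'x', u'y', u'z')           # unicode strings can then be passed directly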

Why do my Python and Objective-C code get different HMAC-SHA1 results?

I am writing a client/server project that needs a signature. I use base64(hmac-sha1(key, data)) to generate it, but I get different signatures from the Python code and the Objective-C code:
get_signature('KEY', 'TEXT') // Python gets 'dAOnR2oXWP9xa4vUBdDvVXTpzQo='
[self hmacsha1:@"KEY" @"TEXT"] // Obj-C gets '7FH0NG0Ou4nb5luKUyjfrdWunos='
Not only are the base64 values different, the hmac-sha1 digest values are different too.
My friend and I have been trying to work it out for a few hours and still don't get it. Where is the problem in my code?
My python code:
import hmac
import hashlib
import base64
def get_signature(key, msg):
    return base64.b64encode(hmac.new(key, msg, hashlib.sha1).digest())
My friend's Objective-C code (copied from an Objective-C sample for HMAC-SHA1):
- (NSString *)hmac_sha1:(NSString *)key text:(NSString *)text {
    const char *cKey = [key cStringUsingEncoding:NSASCIIStringEncoding];
    const char *cData = [text cStringUsingEncoding:NSASCIIStringEncoding];
    unsigned char cHMAC[CC_SHA1_DIGEST_LENGTH];
    CCHmac(kCCHmacAlgSHA1, cKey, strlen(cKey), cData, strlen(cData), cHMAC);
    NSData *HMAC = [[NSData alloc] initWithBytes:cHMAC length:sizeof(cHMAC)];
    NSString *hash = [GTMBase64 stringByEncodingData:HMAC];
    return hash;
}
SOLVED: Thanks to everyone below. But I gotta tell you that the real reason is that I typed "TE S T" in my Python IDE while I typed "TE X T" in this post :P
So as not to waste your time, I ran some tests and got a nicer solution, based on your answers:
print get_signature('KEY', 'TEXT')
# 7FH0NG0Ou4nb5luKUyjfrdWunos=
print get_signature(bytearray('KEY'), bytearray('TEXT'))
# 7FH0NG0Ou4nb5luKUyjfrdWunos=
print get_signature('KEY', u'你好'.encode('utf-8')) # best solution, i think!
# PxEm7Oibj7ijZ55ko7V3isSkD1Q=
print get_signature('KEY', bytearray(u'你好'))
# TypeError: unicode argument without an encoding
print get_signature('KEY', u'你好')
# UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
print get_signature(u'KEY', 'TEXT')
# TypeError: character mapping must return integer, None or unicode
print get_signature(b'KEY', b'TEXT')
# 7FH0NG0Ou4nb5luKUyjfrdWunos=
Conclusion:
The message to be signed should be encoded to a UTF-8 string on both sides. (Thanks to DJV)
In Python 3, strings are all Unicode, so they should be passed with a 'b' prefix, or as bytearray (thanks to Burhan Khalid), or encoded to a UTF-8 string.
Your friend is completely right, but so are you (sorta). Your function is completely right in both Python 2 and Python 3. However, your call is a little erroneous in Python 3. You see, in Python 3, strings are unicode, so in order to pass an ASCII string (as your Objective C friend does and as you would do in Python 2), you need to call your function with:
get_signature(b'KEY', b'TEXT')
in order to specify that those strings are bytes a.k.a. ASCII strings.
EDIT: As Burhan Khalid noted, the flexible way of doing this in Python 3 is to either call your function like this:
get_signature(key.encode('ascii'), test.encode('ascii'))
or define it as:
def get_signature(key, msg):
    if isinstance(key, str):
        key = key.encode('ascii')
    if isinstance(msg, str):
        msg = msg.encode('ascii')
    return base64.b64encode(hmac.new(key, msg, hashlib.sha1).digest())
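A quick usage sketch of that flexible version under Python 3 (it reuses the imports and function from the question; both calls should now match the Objective-C result):
sig1 = get_signature('KEY', 'TEXT')    # str arguments get encoded inside the function
sig2 = get_signature(b'KEY', b'TEXT')  # bytes arguments are used as-is
print(sig1 == sig2)                    # True
print(sig1.decode())                   # 7FH0NG0Ou4nb5luKUyjfrdWunos=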
