Unicode int to char, leading zero - python

I have an integer representing a unicode character which I want to transform to the actual character so I can print it out.
However, the function unichr() gives me different behaviour depending on whether or not there is a leading zero (see the screenshot below for a better explanation).
When the integer is stored in a variable, though, I always get the first behaviour, whereas I want the second. How can I do this?

Related

Does the inchstr() curses-function work in python?

I want to retrieve multiple strings in one row of my terminal. Right now I'm using instr(), but that only extracts the string at that exact position. The function that should actually do this is inchstr(), but that doesn't seem to work in Python. Or does it?
No. Python's curses binding does not extend the underlying curses library (much). There is more than one related curses function that Python might use, depending on what you are looking at, but none of them reads more than a single line of text:
int instr(char *str);
int inwstr(wchar_t *wstr);
int inchstr(chtype *chstr);
int in_wchstr(cchar_t *wchstr);
The first (instr) and third (inchstr) both read from the screen, but the latter also returns the attributes (color, underline, etc.) along with the text.
Python's instr appears to use the former, since its documentation states:
Return a bytes object of characters, extracted from the window starting at the current cursor position, or at y, x if specified. Attributes are stripped from the characters. If n is specified, instr() returns a string at most n characters long (exclusive of the trailing NUL).
The second (inwstr) and fourth (in_wchstr) differ from the other two in that they read wide characters directly. Python really should let you use either set (the narrow or the wide-character interface), since ncurses' wide-character interface is better suited to returning Unicode strings, but it uses the narrow interface in either case, returning a bytes object and leaving the application to work out how to convert the data into a string.
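A minimal sketch of working within those limits, assuming a Python build whose curses module exposes the A_CHARTEXT and A_ATTRIBUTES masks: decode what instr() returns, and roughly approximate inchstr() by reading one cell at a time with window.inch(), which does return attributes. The helper names (read_row_text, read_row_cells, demo) and the choice of the locale's preferred encoding are assumptions for illustration, not part of the curses API.

```python
import curses
import locale


def read_row_text(win, y, width, encoding=None):
    """Read `width` cells of row `y` via window.instr() and decode the bytes.

    Hypothetical helper: assumes the screen text is in the locale's
    preferred encoding unless another encoding is passed in.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    raw = win.instr(y, 0, width)  # bytes; attributes already stripped
    # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
    return raw.decode(encoding, errors="replace").rstrip()


def read_row_cells(win, y, x, n):
    """Roughly emulate C inchstr(): (character, attributes) for n cells.

    Hypothetical helper built on window.inch(); narrow characters only,
    because A_CHARTEXT keeps just the low character bits.
    """
    cells = []
    for col in range(x, x + n):
        value = win.inch(y, col)               # character and attributes in one int
        char = chr(value & curses.A_CHARTEXT)  # the character part
        attrs = value & curses.A_ATTRIBUTES    # bold, underline, color pair, ...
        cells.append((char, attrs))
    return cells


def demo(stdscr):
    """Hypothetical demo; run it with curses.wrapper(demo) in a real terminal."""
    stdscr.addstr(0, 0, "hello", curses.A_BOLD)
    stdscr.addstr(0, 6, "world")
    _, width = stdscr.getmaxyx()
    return read_row_text(stdscr, 0, width), read_row_cells(stdscr, 0, 0, 11)
```

In a terminal, curses.wrapper(demo) returns both views of the first row: the plain decoded text and the per-cell (character, attributes) pairs. Reading cell by cell costs one call per column, which should be acceptable for a single row.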

Crafted hex string correct in string format, malforms once passed to unhexlify()

from binascii import unhexlify
def craft_integration(xintegration_time):
    integration_time = xintegration_time
    integration_time_str = str(integration_time)
    integration_time_str = integration_time_str.encode('utf-8')
    integration_time_hex = integration_time_str.hex()
    return integration_time_hex
def send_set_integration(xtime):
    int_time_hex = decoder_crafter.craft_integration(xtime)
    set_hex = "c1c000000000000010001100000000000000000000000004"+int_time_hex+"1400000000000000000000000000000000000000c5c4c3c2"
    set_hex = str(set_hex)
    print(set_hex)
    set_hex = unhexlify(set_hex)
For example, the input is '1000'.
craft_integration() turns that into 31303030.
That is then inserted into the default hex string.
The printed output is:
c1c000000000000010001100000000000000000000000004313030301400000000000000000000000000000000000000c5c4c3c2
When unhexlify() is applied, the output is:
b'\xc1\xc0\x00\x00\x00\x00\x00\x00\x10\x00\x11\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x041000\x14\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc5\xc4\xc3\xc2'
\x041000 is a concatenation of \x04 and 1000, which was the original input value, not the converted value.
Why does this happen?
What you have is in fact exactly the value you want; it is simply being rendered by the default implementation of bytes.__repr__ in a form you did not expect, which makes it look wrong.
To start from a more basic level: in Python, each element of a bytes object (a "byte", i.e. a group of 8 bits) is stored in the machine as raw binary. To "print" it to a console for human consumption, it must be turned into a form the console can display with suitable glyphs. For many values, such as 0 (00000000 in binary), Python uses \x00. The \ is the escape character that starts an escape sequence, the x that follows signals that exactly 2 hexadecimal digits come next, and together those four characters represent that single byte. Likewise 255, which is 11111111 in binary, is shown as \xff inside a bytes object.
There are exceptions: if a value falls inside the ASCII range and is a printable character, the representation is that ASCII character instead. So hexadecimal 30 (decimal 48) is shown as 0 rather than \x30 inside a bytes object, because 0 is the corresponding printable character.
So in your case, the bytes shown in the console as b'\x041000' are not one big \x value: the \x escape sequence applies to exactly the two characters that follow it. All the remaining characters (i.e. 1000) are printable characters standing for bytes that would otherwise be written as \x31\x30\x30\x30.
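A short check of that reading, rebuilding just the interesting slice of the question's payload (the 04 byte followed by the hex of '1000'); comparing the bytes directly and printing them with .hex() shows that nothing was lost:

```python
from binascii import unhexlify

# The four ASCII digits of "1000" hex-encode to "31303030".
payload_hex = "04" + "1000".encode("utf-8").hex()
data = unhexlify(payload_hex)

print(repr(data))  # b'\x041000'   (repr shows printable bytes as ASCII)
print(data.hex())  # 0431303030    (every byte as two hex digits)
print(len(data))   # 5

# The escaped and the "printable" spellings are the same five bytes.
assert data == b"\x04\x31\x30\x30\x30" == b"\x041000"
```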
There is another method for those who don't mind working with the decimal representation of bytes: convert the bytes to a bytearray and then to a list (in Python 3, calling list() on the bytes object directly gives the same result). Take two NUL bytes (b'\x00\x00') as an example:
>>> list(bytearray(b'\x00\x00'))
[0, 0]
Clearly those two NUL bytes correspond to two zero values. Now try the confusing b'\x04\x31\x30\x30\x30', which was rendered as b'\x041000':
>>> list(bytearray(b'\x041000'))
[4, 49, 48, 48, 48]
This shows that it is in fact 5 bytes, shown as the corresponding decimal numbers in a list of 5 elements.
It is easy to confuse the actual value with what is displayed on the computer console. Unfortunately, the tools we use sometimes amplify that confusion, but as programmers we should understand this and look for ways to minimize it for the users of our work, since, as this example shows, not everyone has the intuition that some bytes may be displayed as printable ASCII.
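One way to spare readers that confusion is to print bytes in a form that never switches to ASCII. A minimal sketch; escape_all is a hypothetical helper, not a standard-library function:

```python
def escape_all(data: bytes) -> str:
    """Render every byte as a \\xNN escape, never as a printable character."""
    return "".join(f"\\x{byte:02x}" for byte in data)


print(escape_all(b"\x041000"))  # \x04\x31\x30\x30\x30
```

The trade-off is readability for genuinely textual data, so this suits debugging output for binary protocols rather than general display.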

Python: different byte values for the same character?

The program I'm writing captures individual keypresses with the function msvcrt.getch(), which works much like the C function of the same name, but instead of returning a char it returns a bytes object, which I have to decode afterwards.
However, it has trouble decoding non-ASCII characters such as accented letters (it raises a UnicodeDecodeError), so I handle this exception with a function that compares the returned byte value with a list of byte values of the special characters I want, and if it matches one of them, the function returns the equivalent character.
The problem is that the byte value differs between the two machines I use (probably because the systems are set to different languages, and/or because I use keyboards with different layouts).
For example, if I input the character à, the byte value returned will be b'\x85' on one machine, and b'\xe0' on the other.
Why does this happen? How can I make a "universal" (preferably elegant) solution that works as I want on any machine?
Use msvcrt.getwch().
It returns a str (rather than bytes) containing the character, and works with Unicode rather than ASCII.
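The answer does not say why the bytes differ. One plausible explanation, which is an assumption here rather than something confirmed for these two machines, is that getch() returns the character in the console's active code page: à is 0x85 in the DOS code pages 437 and 850 but 0xE0 in Windows-1252 (and Latin-1). Python's codecs show the difference on any platform:

```python
# The same character encodes to different bytes in different code pages.
for codec in ("cp437", "cp850", "cp1252", "latin-1"):
    print(codec, "à".encode(codec))
# cp437 b'\x85'
# cp850 b'\x85'
# cp1252 b'\xe0'
# latin-1 b'\xe0'

# Decoding with the wrong code page silently yields a different character.
print(b"\x85".decode("cp850"))   # à
print(b"\x85".decode("cp1252"))  # … (U+2026 HORIZONTAL ELLIPSIS)
```

If that is the cause, a hard-coded table of byte values can only ever match one code page, which is why a str-returning call such as getwch() is the more portable route.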

Python's newly featured numeric literal (e.g. 234_432) doesn't work with .isdigit()?

I always understood that if something can be converted to an integer (i.e. it is the string representation of a number), isdigit() returns True. This is not the case with the new feature. Here is a sample:
Code Sample
But why?
To answer your question, look at the Python 3.6 documentation for the isdigit method:
Return true if all characters in the string are digits and there is at least one character, false otherwise.
Since an underscore isn't a digit, the new format does not work with the current implementation of isdigit. As I commented before, the immediate workaround would be s.replace("_", "").isdigit(), where s is the string containing the newly formatted number; this avoids a try-except block around int().
You also need to strip the minus sign so that negative integers work as well: s.replace("_", "").lstrip("-").isdigit().
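The replace()/lstrip() workaround is looser than int() itself: it accepts strings such as "1__2", "_12" or "--5", which int() rejects, and it rejects a leading "+", which int() accepts. If that matters, a regular expression can mirror the underscore rule more closely. A sketch, assuming "valid" means an optional sign followed by digits with single underscores only between digits; unlike int(), it does not tolerate surrounding whitespace, and looks_like_int and loose_check are hypothetical names:

```python
import re

# Optional sign, then digit groups separated by single underscores (PEP 515).
_INT_LITERAL = re.compile(r"[+-]?\d+(?:_\d+)*")


def looks_like_int(s: str) -> bool:
    return _INT_LITERAL.fullmatch(s) is not None


def loose_check(s: str) -> bool:
    # The replace()/lstrip() workaround, for comparison.
    return s.replace("_", "").lstrip("-").isdigit()


for s in ["234_432", "-1_000", "+7", "1__2", "_12", "--5"]:
    print(f"{s!r:10} regex={looks_like_int(s)!s:5} loose={loose_check(s)}")
```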

Python - Convert negative decimals from string to float

I need to read in a large number of .txt files, each of which contains a decimal number (some positive, some negative), and append these to 2 arrays (genotypes and phenotypes). I then want to perform some mathematical operations on these arrays in scipy, but the negative ('-') sign is causing problems. Specifically, I cannot convert the arrays to float, because the '-' is being read as a string, causing the following error:
ValueError: could not convert string to float:
Here is my code as it's currently written:
import linecache
gene_array=[]
phen_array=[]
for i in genotype:
    for j in phenotype:
        genotype='/path/g.txt'
        phenotype='/path/p.txt'
        g=linecache.getline(genotype,1)
        p=linecache.getline(phenotype,1)
        p=p.strip()
        g=g.strip()
        gene_array.append(g)
        phen_array.append(p)
gene_array=map(float,gene_array)
phen_array=map(float,phen_array)
I am fairly certain at this point that the negative sign is causing the problem, but it is not clear to me why. Is my use of linecache the problem here? Is there a better alternative?
The result of
print gene_array
is
['-0.0448022516321286', '-0.0236187263814157', '-0.150505384829925', '-0.00338459268479522', '0.0142429109897682', '0.0286253352284279', '-0.0462358095345649', '0.0286232317578776', '-0.00747425206137217', '0.0231790239373428', '-0.00266935581919541', '0.00825077426011094', '0.0272744527203547', '0.0394829854063242', '0.0233109171715023', '0.165841084392078', '0.00259693465334536', '-0.0342590874424289', '0.0124600520095644', '0.0713627590092807', '-0.0189374898081401', '-0.00112750710611284', '-0.0161387333242288', '0.0227226505624106', '0.0382173405035751', '0.0455518646388402', '-0.0453048799717046', '0.0168570746329513']
The issue seems to be an empty or whitespace-only string, as is evident from your error message:
ValueError: could not convert string to float:
To make it work, replace map with a list comprehension that skips empty entries:
gene_array=[float(e) for e in gene_array if e]
phen_array=[float(e) for e in phen_array if e]
By "empty string" I mean that float(" ") and float("") both raise ValueError, so if any item in gene_array or phen_array is empty or only whitespace, converting it to float will fail.
There could be several reasons for an empty string, such as an empty or blank line, or a blank line at the beginning or end of the file.
The issue is definitely not the negative sign; Python converts strings with a negative sign without any problem. I suggest you check each of your entries against a regex for floats and see whether they all pass.
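As a more direct alternative to a regex (a swapped-in technique, not the one this answer names), you can simply try float() on each entry and report the ones that fail, so the check is exactly what float() accepts. find_bad_floats is a hypothetical helper name, and the sample values are taken from the question's output plus two made-up empty entries:

```python
def find_bad_floats(values):
    """Return (index, repr) for every entry that float() cannot convert."""
    bad = []
    for i, v in enumerate(values):
        try:
            float(v)
        except ValueError:
            bad.append((i, repr(v)))
    return bad


# A leading minus sign on its own is not a problem...
print(float("-0.0448022516321286"))
# ...but an empty or whitespace-only entry is.
print(find_bad_floats(["-0.0236187263814157", "", "0.0142429109897682", " "]))
# [(1, "''"), (3, "' '")]
```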
There is nothing in the error message to suggest that the - is the problem. The most likely reason is that gene_array and/or phen_array contain an empty string ('').
As stated in the documentation, linecache.getline()
will return '' on errors (the terminating newline character will be included for lines that are found).
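Since getline() signals a missing file or line with '' rather than an exception, one option is to check for that right at the read and fail with a message that names the file. A minimal sketch with a hypothetical helper, read_first_value:

```python
import linecache


def read_first_value(path):
    """Read line 1 of `path` as a float, failing loudly instead of on ''.

    Hypothetical helper: linecache.getline() returns '' for a missing file
    or line, so that case becomes an explicit error naming the path.
    """
    line = linecache.getline(path, 1).strip()
    if not line:
        raise ValueError(f"no value on line 1 of {path!r} (missing file or blank line?)")
    return float(line)
```

Used in place of the bare getline()/strip() calls, a bad path or blank file then shows up by name instead of as an anonymous float conversion error later on.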
