Numpy: Creating a Vector through Array Comparison is NOT working - python

As shown in the IPython (Python 3) snapshot below, I expect an array of Boolean values to be printed at the end. However, only a single Boolean value is returned, and I cannot identify why.
What does the character 'b' before every value in the first print statement denote? Am I using the wrong dtype=numpy.string_ in my numpy.genfromtxt() command?

Python distinguishes between Unicode strings and bytes. In Python 3, string literals are Unicode by default.
The b prefix on each value indicates that the interpreter treats these values as bytes, not as strings.
For the comparison, you need to compare against bytes as well, i.e.,
... == b"1984"
and then numpy will understand that it should broadcast the comparison over elements of the same type.
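The type mismatch can be reproduced without NumPy. The following is a minimal sketch using made-up sample values (the list `years` is hypothetical and stands in for the values read from the file). It shows that in Python 3 a bytes value never equals a str, and that either comparing against a bytes literal or decoding first gives the expected element-wise result.

```python
# Hypothetical sample values standing in for the column read with dtype=numpy.string_.
years = [b"1984", b"1985", b"1984"]

# bytes and str never compare equal in Python 3, even when they look alike.
mismatch = [y == "1984" for y in years]

# Comparing bytes against bytes gives the expected element-wise result.
match_bytes = [y == b"1984" for y in years]

# Alternatively, decode the bytes to str before comparing.
match_text = [y.decode("ascii") == "1984" for y in years]

print(mismatch)     # [False, False, False]
print(match_bytes)  # [True, False, True]
print(match_text)   # [True, False, True]
```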

Related

how to check python bytes for dataframe string field?

I have a string that I would like to check for comparison.
The value df['ACCOUNTMANAGER'][0] contains a special character that I cannot match using string comparison. I tried comparing against bytes, but that failed. I would like to check how the data is stored so that I can compare it. Is there a way to do this?
I figured it out. The value was stored in UTF-8 encoded form, and the comparison now works. I compared against the byte b'\x8a' in the if statement.
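One way to see how such a value is stored is to look at its type, its escaped representation, and its encoded bytes. The sketch below is for Python 3 and uses a hypothetical sample value (`value` is an assumption, not the poster's data).

```python
# Hypothetical value containing a non-ASCII character (not the poster's actual data).
value = "Mu\u00f1oz"

print(type(value))   # <class 'str'>
print(ascii(value))  # 'Mu\xf1oz'

# Encoding shows the bytes that would be stored for this text as UTF-8.
encoded = value.encode("utf-8")
print(encoded)       # b'Mu\xc3\xb1oz'

# Byte-level checks work on the encoded form; decoding restores the text.
print(b"\xc3\xb1" in encoded)            # True
print(encoded.decode("utf-8") == value)  # True
```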

Python - How can I convert a special character to the unicode representation?

In a dictionary, I have the following value containing equals signs:
{"appVersion":"o0u5jeWA6TwlJacNFnjiTA=="}
To be explicit, I need to replace each = with its unicode escape '\u003d' (basically the reverse of what json.loads() does). How can I assign that value to a variable without it being stored with two escapes (\\u003d)?
I've tried different approaches, including encode/decode, repr(), unichr(61), etc., and even after a lot of searching I couldn't find anything that does this. Every approach gives me the following final result (or the original value):
'o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d'
Thanks in advance for your attention.
EDIT
When I debug the code, it shows the value of the variable with 2 escapes. The program takes this value and uses it in the subsequent actions, including the extra escape. I'm using this code to construct JSON with json.dumps(), and the result returned is a unicode string with 2 escapes.
Below is a print of the final result after the JSON construction. I need to find a way to store the value in the variable with just one escape.
I don't know if it makes a difference, but I'm doing this for a custom Burp plugin that manipulates selected requests.
Here is an image of my POC, getting the value of the var.
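The extra backslash is not actually added. When the interpreter echoes a value, it shows the string's repr(), which doubles the backslash to indicate a literal backslash and not the start of an escape such as \t or \n. When the string is printed, only one backslash appears. (The session below is Python 2, where '\u003d' in a plain str literal is six literal characters. In Python 3 the same literal is just '=', so '\\u003d' or r'\u003d' would be needed.)
I hope this helps: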
>>> t['appVersion'] = t["appVersion"].replace('=', '\u003d')
>>> t['appVersion']
'o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d'
>>> print(t['appVersion'])
o0u5jeWA6TwlJacNFnjiTA\u003d\u003d
>>> t['appVersion'] == 'o0u5jeWA6TwlJacNFnjiTA\u003d\u003d'
True
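For Python 3, a rough equivalent of the session above might look like the following. It is a sketch, not the poster's code: the escape is written as '\\u003d' so that the backslash is literal, and the json.dumps() call is included only because the question mentions it. Note that json.dumps() escapes a literal backslash itself, so its output really does contain two backslashes per escape.

```python
import json

t = {"appVersion": "o0u5jeWA6TwlJacNFnjiTA=="}

# '\\u003d' is six characters: one backslash followed by u003d.
escaped = t["appVersion"].replace("=", "\\u003d")

print(repr(escaped))        # 'o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d'
print(escaped)              # o0u5jeWA6TwlJacNFnjiTA\u003d\u003d
print(escaped.count("\\"))  # 2

# json.dumps() escapes each backslash, so the JSON text has two per escape.
dumped = json.dumps(escaped)
print(dumped)                         # "o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d"
print(json.loads(dumped) == escaped)  # True
```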

Unicode int to char, leading zero

I have an integer representing a unicode character which I want to transform to the actual character so I can print it out.
However, the function unichr() gives me different behaviour depending on whether there is a leading zero or not. (See the screenshot below for a better explanation.)
When the integer is stored in a variable, though, I always get the first behaviour, whereas I want the second. How can I do this?
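The screenshot is not available here, so the following is only a guess at the cause. In Python 2, an integer literal written with a leading zero is read as octal, which changes its value, while a number already held in a variable carries no such notation. If the digits are available as text, int(text, base) lets the base be chosen explicitly. The sketch uses Python 3, where chr() replaces unichr(), and the digit string is a made-up example.

```python
# Made-up digit string; the original values are not shown in the question.
digits = "0101"

as_decimal = int(digits, 10)  # leading zero ignored: 101
as_octal = int(digits, 8)     # what the Python 2 literal 0101 means: 65
as_hex = int(digits, 16)      # same as 0x0101: 257

print(as_decimal, chr(as_decimal))  # 101 e
print(as_octal, chr(as_octal))      # 65 A
print(as_hex, ascii(chr(as_hex)))   # 257 '\u0101'
```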

How to always end a slice at a certain value

I am dealing with a number of byte objects of different lengths. However, they will all contain a certain byte that I want to end on (i.e. I always want the values up to that byte but not past it). The problem is that it is not always the last byte, so I can't just use [:-1]. I am never interested in what comes after it, so it is fine to ignore the rest, but I do need to capture what comes before it.
Is there a way in Python to slice up to a certain value as opposed to a certain index?
i.e.
[2:'\xf0']
to slice from the third byte to the \xf0 byte?
In Python 2.x, you can use the index function and slicing, like this
a = bytearray(b"abcd\xf0asda")
print a[2:a.index('\xf0')]
# cd
In Python 3.x, you just need to search with the bytes object, like this
a = b"abcd\xf0asda"
print(a[2:a.index(b'\xf0')])
# b'cd'
The index function returns the position of the first occurrence of the item you are looking for in the object. Beware: it raises a ValueError if the item is not found.
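If the terminating byte might be absent, the exception can be avoided with find(), which returns -1 when nothing is found. A minimal sketch for Python 3, where the helper name slice_until and the choice to return the rest of the data when the byte is missing are assumptions made for illustration:

```python
def slice_until(data, start, stop):
    """Return data from `start` up to, but not including, the first `stop` at or after `start`."""
    end = data.find(stop, start)
    if end == -1:
        # Assumed fallback: no terminator found, so keep everything from `start` on.
        return data[start:]
    return data[start:end]


a = b"abcd\xf0asda"
print(slice_until(a, 2, b"\xf0"))  # b'cd'
print(slice_until(a, 2, b"\xff"))  # b'cd\xf0asda'
```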

Converting 2.5 byte comparisons to 3

I'm trying to convert a 2.5 program to 3.
Is there a way in python 3 to change a byte string, such as b'\x01\x02' to a python 2.5 style string, such as '\x01\x02', so that string and byte-by-byte comparisons work similarly to 2.5? I'm reading the string from a binary file.
I have a 2.5 program that reads bytes from a file, then compares or processes each byte or combination of bytes with specified constants. To run the program under 3, I'd like to avoid changing all my constants to bytes and byte strings ('\x01' to b'\x01'), then dealing with issues in 3 such as:
a = b'\x01'
b = b'\x02'
results in
(a+b)[0] != a
even though similar operations work in 2.5. I have to do (a+b)[0] == ord(a), while a+b == b'\x01\x02' works fine. (By the way, what do I do to (a+b)[0] so it equals a?)
Unpacking structures is also an issue.
Am I missing something simple?
Bytes is an immutable sequence of integers (each in the range 0 <= x < 256), so when you access (a+b)[0] you get back an integer, exactly the same one you would get from a[0]. When you compare the sequence a to the integer (a+b)[0], they are naturally different.
Using slice notation, however, you get a sequence back:
>>> (a+b)[:1] == a # 1 == len(a) ;)
True
because slicing returns a bytes object.
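The difference between indexing and slicing, along with a couple of ways to get a one-byte bytes object back, can be sketched as follows (Python 3; the names a and b follow the question, while ab and chunks are introduced here for illustration):

```python
a = b"\x01"
b = b"\x02"
ab = a + b

print(ab[0])                # 1 (indexing yields an int)
print(ab[0] == a)           # False (int compared with bytes)
print(ab[0] == a[0])        # True (int compared with int)
print(ab[:1] == a)          # True (slicing yields bytes)
print(bytes([ab[0]]) == a)  # True (wrap the int back into bytes)

# Walk the data as one-byte bytes objects, closer to Python 2 string iteration.
chunks = [ab[i:i + 1] for i in range(len(ab))]
print(chunks)               # [b'\x01', b'\x02']
```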
I would also advise running the 2to3 utility (bundled with Python 2.6 and later, and with Python 3 up to 3.12) to convert some of the code automatically. It won't solve all your problems, but it will help a lot.
