Pandas bitwise comparison throws an exception when using multiple conditions

I am working with a large dataset, and I want to extract a subset.
In SQL, this is what I want to achieve. I would like to do the same using pandas/numpy.
select * from Data where cpty_type = 'INTERBRANCH' and (settlementDate >= '2017-04-18 00:00:00.000' or settlementDate = '1899-12-30 00:00:00.000')
These two statements on their own work:
#1. unionX1 = data[data.cpty_type == 'INTERBRANCH']
#2. unionX1 = data[data.settlementDate >= '2017-04-18 00:00:00.000']
My version, which combines both, does not work:
unionX1 = data[data.cpty_type == 'INTERBRANCH' & (data.settlementDate >= '2017-04-18' | data.settlementDate == '2017-04-18')]
I get the exception shown below when I run it.
I think it is caused by the bitwise comparison.
Any suggestions on what I am doing wrong here?
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 877, in na_op
result = op(x, y)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 127, in <lambda>
ror_=bool_method(lambda x, y: operator.or_(y, x),
TypeError: ufunc 'bitwise_or' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 895, in na_op
result = lib.scalar_binop(x, y, op)
File "pandas\lib.pyx", line 912, in pandas.lib.scalar_binop (pandas\lib.c:16177)
ValueError: cannot include dtype 'M' in a buffer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/Karunyan/PycharmProjects/RECON/criteria/distinct_matched_trades.py", line 18, in <module>
unionX1 = data[data.cpty_type == 'INTERBRANCH' & (data.settlementDate >= '2017-04-18' | data.settlementDate == '1899-12-30')]
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 929, in wrapper
na_op(self.values, other),
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 899, in na_op
x.dtype, type(y).__name__))
TypeError: cannot compare a dtyped [datetime64[ns]] array with a scalar of type [bool]

In Python, bitwise operations like |, &, and ^ have higher precedence than comparison operations like <, >, ==, etc. You need to use parentheses in your expressions to force the correct evaluation order.
For example, if you write A < B & C < D, it will be evaluated as A < (B & C) < D which will produce an error in the case of Pandas series. You need to explicitly write (A < B) & (C < D) to make it work as you expect.
In your case, you can do this:
unionX1 = data[(data.cpty_type == 'INTERBRANCH') & ((data.settlementDate >= '2017-04-18') | (data.settlementDate == '2017-04-18'))]
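The precedence rule described above can be checked without pandas. The following is only a minimal sketch: the integers A, B, C and D are made-up stand-ins for the Series, chosen so that the two groupings give different answers.

```python
# Plain integers stand in for the Series so the grouping is easy to see.
A, B, C, D = 1, 2, 4, 8

# & binds tighter than <, so this is the chained comparison A < (B & C) < D.
unparenthesized = A < B & C < D
implicit_grouping = A < (B & C) < D

# Parenthesizing each comparison gives the intended "both are true" test.
parenthesized = (A < B) & (C < D)

print(unparenthesized, implicit_grouping, parenthesized)  # False False True
```

With integers the unparenthesized form silently gives a different answer. With a pandas Series it fails instead, because the chained comparison needs a single truth value from a whole Series.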

You need to enclose each condition in parentheses due to operator precedence, and use the bitwise and (&) and or (|) operators:
unionX1 = data[(data.cpty_type == 'INTERBRANCH') &
((data.settlementDate >='2017-04-18') | (data.settlementDate =='2017-04-18'))]
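Note that the SQL in the question tests for the sentinel date 1899-12-30 in the second branch, so the filter probably needs that date rather than 2017-04-18 twice. Below is a sketch of one way to write it with named boolean masks. The sample rows are invented for illustration, and it assumes settlementDate is a datetime64 column.

```python
import pandas as pd

# Invented sample data; the real frame comes from the asker's source.
data = pd.DataFrame({
    "cpty_type": ["INTERBRANCH", "INTERBRANCH", "EXTERNAL", "INTERBRANCH"],
    "settlementDate": pd.to_datetime(
        ["2017-04-20", "1899-12-30", "2017-05-01", "2016-01-01"]
    ),
})

# Each mask is a complete comparison, so precedence cannot bite.
is_interbranch = data["cpty_type"] == "INTERBRANCH"
settles_on_or_after = data["settlementDate"] >= pd.Timestamp("2017-04-18")
is_sentinel_date = data["settlementDate"] == pd.Timestamp("1899-12-30")

unionX1 = data[is_interbranch & (settles_on_or_after | is_sentinel_date)]
print(unionX1.index.tolist())  # [0, 1]
```

Naming the masks costs a few extra lines but keeps each comparison on its own line, which also makes the individual conditions easy to inspect while debugging.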

If you want to sidestep the operator precedence oddity, you can use numpy's logical_or and logical_and functions.
import numpy as np

unionX1 = data[np.logical_and(
    data.cpty_type == 'INTERBRANCH',
    np.logical_or(
        data.settlementDate >= '2017-04-18',
        data.settlementDate == '2017-04-18'
    )
)]
The grouping is explicit, so you get the behavior you intend without having to remember the precedence of binary operators.

Related

Add binary operator to z3

I'm trying to parse and translate a string to its equivalent z3 form.
import z3
expr = 'x + y = 10'
p = my_parse_expr_to_z3(expr) # results in: ([x, '+', y], '==', [10])
p = my_flatten(p) # after flatten: [x, '+', y, '==', 10]
Type-checking of parsed string:
for e in p:
    print(type(e), e)
# -->
<class 'z3.z3.ArithRef'> x
<class 'str'> +
<class 'z3.z3.ArithRef'> y
<class 'str'> ==
<class 'int'> 10
When I now try:
s = z3.Solver()
s.add(*p)
I get:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "...\venv\lib\site-packages\z3\z3.py", line 6938, in add
self.assert_exprs(*args)
File "..\venv\lib\site-packages\z3\z3.py", line 6926, in assert_exprs
arg = s.cast(arg)
File "..\venv\lib\site-packages\z3\z3.py", line 1505, in cast
_z3_assert(self.eq(val.sort()), "Value cannot be converted into a Z3 Boolean value")
File "..\venv\lib\site-packages\z3\z3.py", line 112, in _z3_assert
raise Z3Exception(msg)
z3.z3types.Z3Exception: Value cannot be converted into a Z3 Boolean value
The equals and plus signs seem to be of the wrong type or used incorrectly. How can I translate the expression correctly?

Where's the definition of parse_expr_to_z3 coming from? It's definitely not something that comes with z3 itself, so you must be getting it from some other third party, or perhaps you wrote it yourself. Without knowing how it's defined, it's impossible for anyone on Stack Overflow to give you any guidance.
In any case, as you suspected its results are not something you can feed back to z3. It fails precisely because what you can add to the solver must be constraints, i.e., expressions of type Bool in z3. Clearly, none of those constituents have that type.
So, long story short, this parse_expr_to_z3 doesn't seem to be designed to do what you intended. Contact its developer for further details on what the intended use case is.
If you're trying to load assertions from a string into z3, then you can do that using the so-called SMTLib format. Something like:
from z3 import *
expr = """
(declare-const x Int)
(declare-const y Int)
(assert (= (+ x y) 10))
"""
p = parse_smt2_string(expr)
s = Solver()
s.add(p)
print(s.check())
print(s.model())
This prints:
sat
[y = 0, x = 10]
You can find more about SMTLib syntax in https://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.6-r2021-05-12.pdf
Note that trying to do this using any other syntax (like your proposed 'x + y = 10') is going to require knowledge of the variables in the string (x and y in this case, but they can of course be arbitrary), of which symbols are allowed (+ and = in your case, but again there can be any number of different symbols), and of their precise meanings. Without knowing your exact needs, it's hard to opine, but using anything other than the existing support for SMTLib syntax itself will require a non-trivial amount of work.
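To give a feel for the work involved, here is a rough sketch that turns a very small infix subset into SMTLib text using only Python's ast module. Everything in it is an assumption for illustration: to_smt2 and _OPS are hypothetical names (not part of z3), every variable is assumed to be an Int, and equality must be written == so that the string is valid Python syntax.

```python
import ast

# Hypothetical mapping from Python operator nodes to SMTLib symbols.
_OPS = {
    ast.Add: "+", ast.Sub: "-", ast.Mult: "*",
    ast.Eq: "=", ast.Lt: "<", ast.LtE: "<=", ast.Gt: ">", ast.GtE: ">=",
}

def to_smt2(expr):
    """Translate e.g. 'x + y == 10' into SMTLib text with Int declarations."""
    tree = ast.parse(expr, mode="eval").body
    names = set()

    def walk(node):
        if isinstance(node, ast.Name):
            names.add(node.id)
            return node.id
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return "({} {} {})".format(
                _OPS[type(node.op)], walk(node.left), walk(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in _OPS):
            return "({} {} {})".format(
                _OPS[type(node.ops[0])], walk(node.left),
                walk(node.comparators[0]))
        # Anything outside the tiny supported subset is rejected.
        raise ValueError("unsupported syntax: " + ast.dump(node))

    body = walk(tree)
    decls = "".join("(declare-const {} Int)\n".format(n) for n in sorted(names))
    return decls + "(assert {})".format(body)

smt2_text = to_smt2("x + y == 10")
print(smt2_text)
```

This prints the same three SMTLib lines as the hand-written string above, so the result could then be handed to parse_smt2_string. Negative literals, chained comparisons, other sorts and Boolean connectives are all left out, which is exactly the kind of extra work the paragraph above refers to.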

Why do divisions not work in jsonpath-ng?

Jsonpath-ng offers basic arithmetic as in:
from jsonpath_ng import jsonpath
from jsonpath_ng.ext import parse
jsonpath_expr = parse('$.foo * 2')
target = {'foo': 2}
result = jsonpath_expr.find(target)
result = [match.value for match in result]
print(result)
result: [4]
However, if I change the expression to $.foo / 2, then I get a Parse Error:
Traceback (most recent call last):
File "test.py", line 4, in <module>
jsonpath_expr = parse('$.foo / 2')
File "C:\Users\micha\AppData\Roaming\Python\Python38\site-packages\jsonpath_ng\ext\parser.py", line 172, in parse
return ExtentedJsonPathParser(debug=debug).parse(path)
File "C:\Users\micha\AppData\Roaming\Python\Python38\site-packages\jsonpath_ng\parser.py", line 32, in parse
return self.parse_token_stream(lexer.tokenize(string))
File "C:\Users\micha\AppData\Roaming\Python\Python38\site-packages\jsonpath_ng\parser.py", line 55, in parse_token_stream
return new_parser.parse(lexer = IteratorToTokenStream(token_iterator))
File "C:\Users\micha\AppData\Roaming\Python\Python38\site-packages\ply\yacc.py", line 333, in parse
return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
File "C:\Users\micha\AppData\Roaming\Python\Python38\site-packages\ply\yacc.py", line 1201, in parseopt_notrack
tok = call_errorfunc(self.errorfunc, errtoken, self)
File "C:\Users\micha\AppData\Roaming\Python\Python38\site-packages\ply\yacc.py", line 192, in call_errorfunc
r = errorfunc(token)
File "C:\Users\micha\AppData\Roaming\Python\Python38\site-packages\jsonpath_ng\parser.py", line 69, in p_error
raise Exception('Parse error at %s:%s near token %s (%s)' % (t.lineno, t.col, t.value, t.type))
Exception: Parse error at 1:6 near token / (SORT_DIRECTION)
I can sometimes work around this issue by dividing by the inverse value, so I would do $.foo * 0.5 to get the result [1.0]. But this doesn't work if both sides of the equation are numeric values of different types (int or float). So 2 * 0.5 and 0.5 * 2 will result in a Parse error, but 2.0 * 0.5 will not.
How do I get divisions to work? And why can I not multiply a float by an integer?

Those are both grammar bugs.
Extended JSON paths are allowed to be suffixed with a bracketed expression containing a list of "sorts"; each sort starts with / or \. To make that work, the extended lexer recognises those two symbols as the token SORT_DIRECTION, which takes precedence over the recognition of / as an arithmetic operator. Consequently the use of / as an arithmetic operator is not allowed by the parser. (In fact, the problem goes deeper than that, but that's the essence.)
For some reason, the grammar author chose to separate NUMBER (actually, integer) and FLOAT in arithmetic expressions, which means that they had to enumerate the possible combinations. What they chose was:
jsonpath : NUMBER operator NUMBER
         | FLOAT operator FLOAT
         | ID operator ID
         | NUMBER operator jsonpath
         | FLOAT operator jsonpath
         | jsonpath operator NUMBER
         | jsonpath operator FLOAT
         | jsonpath operator jsonpath
There are other problems with this grammar, but the essence here is that it permits NUMBER operator NUMBER and FLOAT operator FLOAT but does not permit NUMBER operator FLOAT or FLOAT operator NUMBER, which is what you observe. However, path expressions can work with either NUMBER or FLOAT.

Z3python XOR sum?

I'm currently trying to solve some equation with z3python, and I am coming across a situation I could not deal with.
I need to xor certain BitVecs with specific non ascii char values, and sum them up to check a checksum.
Here is an example :
pbInput = [BitVec("{}".format(i), 8) for i in range(KEY_LEN)]
password = "\xff\xff\xde\x8e\xae"
solver.add(Xor(pbInput[0], password[0]) + Xor(pbInput[3], password[3]) == 300)
It results in a z3 type exception :
z3.z3types.Z3Exception: Value cannot be converted into a Z3 Boolean value.
I found this post and tried to apply a function to my password string, adding this line to my script :
password = Function(password, StringSort(), IntSort(), BitVecSort(8))
But of course it fails as the string isn't an ASCII string.
I don't care about it being a string, I tried to just do Xor(pbInput[x] ^ 0xff), but this doesn't work either. I could not find any documentation on this particular situation.
EDIT :
Here is the full traceback.
Traceback (most recent call last):
File "solve.py", line 18, in <module>
(Xor(pbInput[0], password[0])
File "/usr/local/lib/python2.7/dist-packages/z3/z3.py", line 1555, in Xor
a = s.cast(a)
File "/usr/local/lib/python2.7/dist-packages/z3/z3.py", line 1310, in cast
_z3_assert(self.eq(val.sort()), "Value cannot be converted into a Z3 Boolean value")
File "/usr/local/lib/python2.7/dist-packages/z3/z3.py", line 91, in _z3_assert
raise Z3Exception(msg)
z3.z3types.Z3Exception: Value cannot be converted into a Z3 Boolean value
Thanks in advance if you have any idea about how I could do this operation!

There are two problems in your code:
1. Xor is for Bool values only; for bit-vectors, simply use ^.
2. Use the function ord to convert characters to integers before XOR-ing them.
You didn't give your full program (which is always helpful!), but here's how you'd write that section in z3py as a full program:
from z3 import *
solver = Solver()
KEY_LEN = 10
pbInput = [BitVec("c_{}".format(i), 8) for i in range(KEY_LEN)]
password = "\xff\xff\xde\x8e\xae"
solver.add((pbInput[0] ^ ord(password[0])) + (pbInput[3] ^ ord(password[3])) == 300)
print(solver.check())
print(solver.model())
This prints:
sat
[c_3 = 0, c_0 = 97]
(I gave the variables better names to distinguish more properly.) So, it's telling us the solution is:
>>> (0xff ^ 97) + (0x8e ^ 0)
300
Which is indeed what you asked for.
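One caveat worth checking: 8-bit bit-vector arithmetic wraps modulo 2**8, so the constant 300 is effectively 44 inside the constraint. The sketch below uses plain Python integers (no z3) to verify the reported model and to count how many 8-bit pairs satisfy the wrapped constraint versus the literal sum of 300.

```python
password = "\xff\xff\xde\x8e\xae"
k0, k3 = ord(password[0]), ord(password[3])  # 255 and 142

# Model reported above: c_0 = 97, c_3 = 0.
plain_sum = (97 ^ k0) + (0 ^ k3)
print(plain_sum, plain_sum % 256)  # 300 44

# Every 8-bit pair that satisfies the constraint modulo 2**8.
solutions = [(a, b) for a in range(256) for b in range(256)
             if ((a ^ k0) + (b ^ k3)) % 256 == 300 % 256]

# The subset whose unwrapped integer sum really is 300.
exact = [(a, b) for a, b in solutions if (a ^ k0) + (b ^ k3) == 300]
print(len(solutions), len(exact))  # 256 211
```

So the solver could just as well return a model whose plain-integer sum is 44. If the checksum must be exactly 300, one option is to widen the operands before adding them, for example with z3py's ZeroExt.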

Is there any way to make Python .format() throw an exception if the data won't fit the field?

I want to normalize floating-point numbers to nn.nn strings, and to do some special handling if the number is out of range.
try:
    norm = '{:5.2f}'.format(f)
except ValueError:
    norm = 'BadData' # actually a bit more complex than this
However, this doesn't work: .format silently overflows the 5-character width. Obviously I could length-check norm and raise my own ValueError, but have I missed any way to force format (or the older % formatting) to raise an exception on field-width overflow?

You cannot achieve this with format() alone. You have to write a custom formatter that raises the exception. For example:
def format_float(num, max_int=5, decimal=2):
    if len(str(num).split('.')[0]) > max_int:
        raise ValueError('Integer part of float can have maximum {} digits'.format(max_int))
    return "{:.{}f}".format(num, decimal)
Sample run:
>>> format_float(123.456)
'123.46'
>>> format_float(123.4)
'123.40'
>>> format_float(123789.456) # Error since integer part is having length more than 5
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in format_float
ValueError: Integer part of float can have maximum 5 digits
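If the limit is on the total field width, as in the question's '{:5.2f}', another option is the length check the asker mentioned: format first, then compare the result against the width. This is only a sketch; format_fixed_width is a hypothetical helper name, and the defaults mirror the nn.nn layout from the question.

```python
def format_fixed_width(value, width=5, precision=2):
    """Format value as fixed-point text, raising if it exceeds width chars."""
    text = format(value, "{}.{}f".format(width, precision))
    if len(text) > width:
        raise ValueError(
            "{!r} does not fit in {} characters".format(text, width))
    return text

print(repr(format_fixed_width(12.5)))     # '12.50'
print(repr(format_fixed_width(3.14159)))  # ' 3.14'

try:
    format_fixed_width(99.999)  # rounds up to '100.00', which is 6 characters
except ValueError as exc:
    print("rejected:", exc)
```

Checking the formatted text rather than the input also catches values that only overflow after rounding, as well as negative numbers whose sign takes up one of the five characters.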

Python3 Infinity/NaN: Decimal vs. float

Given (Python3):
>>> float('inf') == Decimal('inf')
True
>>> float('-inf') <= float('nan') <= float('inf')
False
>>> float('-inf') <= Decimal(1) <= float('inf')
True
Why are the following invalid? I have read Special values.
Invalid
>>> Decimal('-inf') <= Decimal('nan') <= Decimal('inf')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
decimal.InvalidOperation: [<class 'decimal.InvalidOperation'>]
>>> Decimal('-inf') <= float('nan') <= Decimal('inf')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
decimal.InvalidOperation: [<class 'decimal.InvalidOperation'>]
>>> float('-inf') <= Decimal('nan') <= float('inf')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
decimal.InvalidOperation: [<class 'decimal.InvalidOperation'>]

From the decimal.py source code:
# Note: The Decimal standard doesn't cover rich comparisons for
# Decimals. In particular, the specification is silent on the
# subject of what should happen for a comparison involving a NaN.
# We take the following approach:
#
# == comparisons involving a quiet NaN always return False
# != comparisons involving a quiet NaN always return True
# == or != comparisons involving a signaling NaN signal
# InvalidOperation, and return False or True as above if the
# InvalidOperation is not trapped.
# <, >, <= and >= comparisons involving a (quiet or signaling)
# NaN signal InvalidOperation, and return False if the
# InvalidOperation is not trapped.
#
# This behavior is designed to conform as closely as possible to
# that specified by IEEE 754.
And from the Special values section you say you read:
An attempt to compare two Decimals using any of the <, <=, > or >= operators will raise the InvalidOperation signal if either operand is a NaN, and return False if this signal is not trapped.
Note that IEEE 754 uses NaN as a floating point exception value; e.g. you did something that cannot be computed and you got an exception instead. It is a signal value and should be seen as an error, not something to compare other floats against, which is why in the IEEE 754 standard it is unequal to anything else.
Moreover, the Special values section mentions:
Note that the General Decimal Arithmetic specification does not specify the behavior of direct comparisons; these rules for comparisons involving a NaN were taken from the IEEE 854 standard (see Table 3 in section 5.7).
and looking at IEEE 854 section 5.7 we find:
In addition to the true/false response, an invalid operation exception (see 7.1) shall be signaled when, as indicated in the last column of Table 3, “unordered” operands are compared using one of the predicates involving “<” or “>” but not “?.” (Here the symbol “?” signifies “unordered.”)
with comparisons with NaN classified as unordered.
By default InvalidOperation is trapped, so a Python exception is raised when using <= and >= against Decimal('NaN'). This is a logical extension; Python has actual exceptions so if you compare against the NaN exception value, you can expect an exception being raised.
You could disable trapping by using decimal.localcontext():
>>> from decimal import localcontext, Decimal, InvalidOperation
>>> with localcontext() as ctx:
...     ctx.traps[InvalidOperation] = 0
...     Decimal('-inf') <= Decimal('nan') <= Decimal('inf')
...
False
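If you would rather leave the trap enabled, another option is to test for NaN explicitly before doing any ordered comparison. A minimal sketch, assuming the default context (where InvalidOperation is trapped); in_closed_range is a hypothetical helper name:

```python
from decimal import Decimal, InvalidOperation

def in_closed_range(value, low, high):
    """Return low <= value <= high, treating a NaN value as out of range."""
    if value.is_nan():
        return False
    return low <= value <= high

low, high = Decimal("-inf"), Decimal("inf")
nan_result = in_closed_range(Decimal("nan"), low, high)
one_result = in_closed_range(Decimal(1), low, high)
print(nan_result, one_result)  # False True

# Equality with a quiet NaN does not signal, but ordering does.
nan_equals_nan = Decimal("nan") == Decimal("nan")
try:
    Decimal("nan") <= high
    ordering_raised = False
except InvalidOperation:
    ordering_raised = True
print(nan_equals_nan, ordering_raised)  # False True
```

The helper only guards the value being tested; a NaN passed as low or high would still raise under the default context.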
