We're trying to figure out a way to easily pull values from what I guess I would describe as a grid of conditional statements. We've got two variables, x and y, and depending on those values, we want to pull one of (something1, ..., another1, ... again1...). We could definitely do this using if statements, but we were wondering if there was a better way. Some caveats: we would like to be able to easily change the bounds on the x and y conditionals. The problem with a bunch of if statements is that it's not very easy to compare the values of those bounds with the values in the example table below.
Example:
So if x = 4% and y = 30%, we would get back another1. Whereas if x = 50% and y = 10%, we would get something3.
Overall two questions:
Is there a general name for this kind of problem?
Is there an easy framework or library that could do this for us without if statements?
Even though pandas is not really made for this kind of usage, function aggregation and boolean indexing allow for an elegant-ish solution to your problem. Alternatively, constraint-based programming might be an option (see python-constraint on PyPI).
Define the constraints as functions.
x_constraints = [lambda x: 0 <= x < 5,
                 lambda x: 5 <= x < 10,
                 lambda x: 10 <= x < 15,
                 lambda x: x >= 15]
y_constraints = [lambda y: 0 <= y < 20,
                 lambda y: 20 <= y < 50,
                 lambda y: y >= 50]
x = 15
y = 30
Now we want to make two dataframes: One that only holds the x-values,
and another that only holds the y-values where number of columns = number of x-constraints and number of rows = number of y-constraints.
import pandas as pd
def make_dataframe(value):
    return pd.DataFrame(data=value,
                        index=range(len(y_constraints)),
                        columns=range(len(x_constraints)))
x_df = make_dataframe(x)
y_df = make_dataframe(y)
The dataframes look like this:
>>> x_df
0 1 2 3
0 15 15 15 15
1 15 15 15 15
2 15 15 15 15
>>> y_df
0 1 2 3
0 30 30 30 30
1 30 30 30 30
2 30 30 30 30
Next, we need the dataframe label_df that holds the possible outcomes. The shape must match the dimension of x_df and y_df above. (What's cool about this is that you can store the data in a
CSV-file and directly read it into a dataframe with pd.read_csv if you wish.)
label_df = pd.DataFrame([[f"{w}{i+1}" for i in range(len(x_constraints))] for w in "something another again".split()])
>>> label_df
0 1 2 3
0 something1 something2 something3 something4
1 another1 another2 another3 another4
2 again1 again2 again3 again4
Next, we want to apply the x_constraints to the columns of x_df, and the y_constraints to the rows of y_df. .aggregate takes
a dictionary that maps column or row names to functions {colname: func},
which we construct inline using dict(zip(...)). axis=1 means "apply the functions row-wise".
x_mask = x_df.aggregate(dict(zip(x_df.columns, x_constraints)))
y_mask = y_df.aggregate(dict(zip(y_df.index, y_constraints)), axis=1)
The result is two dataframes holding boolean values. Ideally,
there should be exactly one column in x_mask and one row in y_mask that is all True, e.g.
>>> x_mask
0 1 2 3
0 False False False True
1 False False False True
2 False False False True
>>> y_mask
0 1 2 3
0 False False False False
1 True True True True
2 False False False False
If we combine them with bit-wise and &, we get a boolean mask with exactly
one True value.
>>> m = x_mask & y_mask
>>> m
0 1 2 3
0 False False False False
1 False False False True
2 False False False False
Use m to select the target value from label_df. The result df is all NaN except one value, which we extract with df.stack().iloc[0]:
>>> df = label_df[m]
>>> df
0 1 2 3
0 NaN NaN NaN NaN
1 NaN NaN NaN another4
2 NaN NaN NaN NaN
>>> df.stack().iloc[0]
'another4'
And that's it! It should be very easy to maintain, by just changing the list of constraints and adapting the possible outcomes in label_df.
I haven't heard of a name for this.
If (ha-ha) it feels conceptually closer to you, you could create two mapper functions that map the x and y values to the categories of your contingency table:
map_x = lambda x: 0 if x < 0.05 else 1 if x < 0.1 else 2
map_y = lambda y: 0 if y < 0.2 else 1 if y < 0.5 else 2
df.iloc[map_x(x), map_y(y)]
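Here is a minimal sketch of how the two mappers could be wired to a table without pandas. It assumes rows follow y and columns follow x, which matches the two examples in the question; the `labels` grid and the `lookup` helper are hypothetical names, and x and y are given as fractions (0.04 for 4%).

```python
# Hypothetical label grid: rows follow y, columns follow x (an assumption).
labels = [
    ["something1", "something2", "something3"],
    ["another1", "another2", "another3"],
    ["again1", "again2", "again3"],
]


def map_x(x):
    """Column index for x, with x given as a fraction (0.04 == 4%)."""
    return 0 if x < 0.05 else 1 if x < 0.1 else 2


def map_y(y):
    """Row index for y, with y given as a fraction."""
    return 0 if y < 0.2 else 1 if y < 0.5 else 2


def lookup(x, y):
    # Row first (y), then column (x).
    return labels[map_y(y)][map_x(x)]


print(lookup(0.04, 0.30))  # another1
print(lookup(0.50, 0.10))  # something3
```

If your table is laid out the other way round (x on the rows), swap the two indices in `lookup`.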
If you have just a handful of conditionals then you may define two lists with the upper bounds, and use a simple linear search:
x_bounds = [0.05, 0.1, 1.0]
y_bounds = [0.2, 0.5, 1.0]
def linear(x_bounds, y_bounds, x, y):
    for i, xb in enumerate(x_bounds):
        if x <= xb:
            break
    for j, yb in enumerate(y_bounds):
        if y <= yb:
            break
    return i, j
linear(x_bounds, y_bounds, 0.4, 3.0)  # (2, 2): y exceeds every bound, so j stays at the last index
If there are many conditionals a binary search will be better:
def first_at_least(bounds, value):
    # Index of the first bound >= value; falls back to the last index, like linear() above.
    lower = 0
    upper = len(bounds) - 1
    while lower < upper:
        mid = (lower + upper) // 2
        if bounds[mid] < value:
            lower = mid + 1
        else:
            upper = mid
    return lower
def binary(x_bounds, y_bounds, x, y):
    return first_at_least(x_bounds, x), first_at_least(y_bounds, y)
binary(x_bounds, y_bounds, 0.4, 3.0)  # (2, 2), same as linear()
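If you would rather not hand-roll the search, the standard library's `bisect` module does the same job. This is only a sketch under the same assumptions as above (sorted upper bounds, values past the last bound fall into the last bucket); `bucket` and `lookup_indices` are hypothetical helper names.

```python
from bisect import bisect_left

x_bounds = [0.05, 0.1, 1.0]
y_bounds = [0.2, 0.5, 1.0]


def bucket(bounds, value):
    """Index of the first upper bound >= value, clamped to the last bucket."""
    return min(bisect_left(bounds, value), len(bounds) - 1)


def lookup_indices(x, y):
    return bucket(x_bounds, x), bucket(y_bounds, y)


print(lookup_indices(0.04, 0.3))  # (0, 1)
print(lookup_indices(0.4, 3.0))   # (2, 2)
```

`bisect_left` treats a value equal to a bound as belonging to that bound's bucket, which matches the `x <= xb` test in the linear search.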
I have a pandas dataframe (100,000 observations) with 11 columns.
I'm trying to assign df['trade_sign'] values based on df['diff'] (a pd.Series of integer values):
If diff is positive, then trade_sign = 1
if diff is negative, then trade_sign = -1
if diff is 0, then trade_sign = 0
What I've tried so far:
pos['trade_sign'] = (pos['trade_sign']>0)
pos['trade_sign'].replace({False: -1, True: 1}, inplace=True)
But this obviously doesn't take into account 0 values.
I also tried for loops with if conditions but that didn't work.
Essentially, how do I fix my .replace call to take account of diff values of 0?
Ideally, I'd prefer a solution that uses numpy over for loops with if conditions.
There's a sign function in numpy:
df["trade_sign"] = np.sign(df["diff"])
If you want integers,
df["trade_sign"] = np.sign(df["diff"]).astype(int)
a = [-1 if d < 0 else 1 if d > 0 else 0 for d in df['diff'].values]
df['trade_sign'] = a
You could do it this way:
pos['trade_sign'] = (pos['diff'] > 0) * 1 + (pos['diff'] < 0) * -1
The boolean results of the element-wise > and < comparisons automatically get converted to int in order to allow multiplication with 1 and -1, respectively.
This sample input and test code:
import pandas as pd
pos = pd.DataFrame({'diff':[-9,0,9,-8,0,8,-7-6-5,4,3,2,0]})
pos['trade_sign'] = (pos['diff'] > 0) * 1 + (pos['diff'] < 0) * -1
print(pos)
... gives this output:
diff trade_sign
0 -9 -1
1 0 0
2 9 1
3 -8 -1
4 0 0
5 8 1
6 -18 -1
7 4 1
8 3 1
9 2 1
10 0 0
UPDATE: In addition to the solution above, as well as some of the other excellent ideas in other answers, you can use numpy where:
pos['trade_sign'] = np.where(pos['diff'] > 0, 1, np.where(pos['diff'] < 0, -1, 0))
I am trying to check whether a 9x9 array contains duplicate numbers in its rows and columns. Zeros must be excluded from the check, since they are the cells I will solve later; only the numbers 1 to 9 count. An example input array:
[[1 0 0 7 0 0 0 0 0]
[0 3 2 0 0 0 0 0 0]
[0 0 0 6 0 0 0 0 0]
[0 8 0 0 0 2 0 7 0]
[5 0 7 0 0 1 0 0 0]
[0 0 0 0 0 3 6 1 0]
[7 0 0 0 0 0 2 0 9]
[0 0 0 0 5 0 0 0 0]
[3 0 0 0 0 4 0 0 5]]
Here is where I am currently with my code for this:
#Checking Columns
for c in range(9):
    line = (test[:,c])
    print(np.unique(line).shape == line.shape)
#Checking Rows
for r in range(9):
    line = (test[r,:])
    print(np.unique(line).shape == line.shape)
Then I would like to do the exact same for the 3x3 sub arrays in the 9x9 array. Again I need to somehow exclude the 0 from the check. Here is the code I currently have:
for r0 in range(3,9,3):
    for c0 in range(3,9,3):
        test1 = test[:r0,:c0]
        for r in range(3):
            line = (test1[r,:])
            print(np.unique(line).shape == line.shape)
        for c in range(3):
            line = (test1[:,c])
            print(np.unique(line).shape == line.shape)
I would truly appreciate assistance in this regard.
It sure sounds like you're trying to verify the input of a Sudoku board.
You can extract a box as:
for r0 in range(0, 9, 3):
    for c0 in range(0, 9, 3):
        box = test[r0:r0+3, c0:c0+3]
        # ... test that np.unique(box) has 9 elements ...
Note that this is only about how to extract the elements of the box. You still haven't done anything about removing the zeros, here or on the rows and columns.
Given a box/row/column, you then want something like:
nonzeros = [x for x in box.flatten() if x != 0]
assert len(nonzeros) == len(set(nonzeros))
There may be a more numpy-friendly way to do this, but this should be fast enough.
Excluding zeros is fairly straightforward by masking the array:
test = np.array(test)
non_zero_mask = (test != 0)
At this point you can either check the whole matrix for uniqueness
np.unique(test[non_zero_mask])
or you can do it for individual rows/columns
non_zero_row_0 = test[0, non_zero_mask[0]]
unique_0 = np.unique(non_zero_row_0)
You can add the logic above into a loop to get the behavior you want.
As for the 3x3 subarrays, you can loop through them as you did in your example.
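Putting the masking idea into a loop could look roughly like the sketch below. The helper names `no_duplicates` and `board_is_consistent` are made up for illustration, and `test` is the example board from the question.

```python
import numpy as np

test = np.array([
    [1, 0, 0, 7, 0, 0, 0, 0, 0],
    [0, 3, 2, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 6, 0, 0, 0, 0, 0],
    [0, 8, 0, 0, 0, 2, 0, 7, 0],
    [5, 0, 7, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 3, 6, 1, 0],
    [7, 0, 0, 0, 0, 0, 2, 0, 9],
    [0, 0, 0, 0, 5, 0, 0, 0, 0],
    [3, 0, 0, 0, 0, 4, 0, 0, 5],
])


def no_duplicates(values):
    """True if the non-zero entries of `values` are all distinct."""
    nonzero = values[values != 0]  # boolean mask drops the zeros (and flattens)
    return np.unique(nonzero).size == nonzero.size


def board_is_consistent(board):
    board = np.asarray(board)
    rows = all(no_duplicates(board[r, :]) for r in range(9))
    cols = all(no_duplicates(board[:, c]) for c in range(9))
    boxes = all(
        no_duplicates(board[r0:r0 + 3, c0:c0 + 3])
        for r0 in range(0, 9, 3)
        for c0 in range(0, 9, 3)
    )
    return rows and cols and boxes


print(board_is_consistent(test))  # True
```

Writing a second 1 into the first row (for example `test[0, 1] = 1`) makes the check return False.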
When you have a small collection of things (small being <=64 or 128, depending on architecture), you can turn it into a set using bits. So for example:
bits = ((2**board) >> 1).astype(np.uint16)
Notice that you have to use right shift after the fact rather than pre-subtracting 1 from board to cleanly handle zeros.
You can now compute three types of sets. Each set is the bitwise OR of bits in a particular arrangement. For this example, you can use sum just the same:
rows = bits.sum(axis=1)
cols = bits.sum(axis=0)
blocks = bits.reshape(3, 3, 3, 3).sum(axis=(1, 3))
Now all you have to do is compare the bit counts of each number to the number of non-zero elements. They will be equal if and only if there are no duplicates. Duplicates will cause the bit count to be smaller.
There are pretty efficient algorithms for counting bits, especially for something as small as a uint16. Here is an example: How to count the number of set bits in a 32-bit integer?. I've adapted it for the smaller size and numpy here:
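def count_bits16(arr):
    count = arr - ((arr >> 1) & 0x5555)                   # bit counts per 2-bit pair
    count = (count & 0x3333) + ((count >> 2) & 0x3333)    # per 4-bit nibble
    count = (count + (count >> 4)) & 0x0F0F               # per byte
    return (count + (count >> 8)) & 0x00FF                # total over 16 bits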
This is the count of unique elements for each of the configurations. You need to compare it to the number of non-zero elements. The following boolean will tell you if the board is valid:
valid = (
    (count_bits16(rows) == np.count_nonzero(board, axis=1)).all()
    and (count_bits16(cols) == np.count_nonzero(board, axis=0)).all()
    and (count_bits16(blocks) == np.count_nonzero(board.reshape(3, 3, 3, 3), axis=(1, 3))).all()
)
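For completeness, here is a self-contained sketch of the same idea that uses an actual bitwise OR reduction instead of the sum shortcut. The `popcount` and `is_valid` helpers are hypothetical names, and the bit count is done with a plain loop over the nine possible bits rather than the bit-twiddling version above, just to keep it easy to check by hand.

```python
import numpy as np


def popcount(arr, width=9):
    """Number of set bits among the lowest `width` bits of each element."""
    return sum((arr >> k) & 1 for k in range(width))


def is_valid(board):
    board = np.asarray(board)
    # Bit (v - 1) is set for value v; zeros map to the empty set.
    bits = (1 << board) >> 1
    grid = bits.reshape(3, 3, 3, 3)  # (block row, row in block, block col, col in block)
    rows = np.bitwise_or.reduce(bits, axis=1)
    cols = np.bitwise_or.reduce(bits, axis=0)
    blocks = np.bitwise_or.reduce(np.bitwise_or.reduce(grid, axis=3), axis=1)
    return bool(
        (popcount(rows) == np.count_nonzero(board, axis=1)).all()
        and (popcount(cols) == np.count_nonzero(board, axis=0)).all()
        and (popcount(blocks) == np.count_nonzero(grid, axis=(1, 3))).all()
    )


board = np.zeros((9, 9), dtype=int)
board[0, 0], board[0, 3], board[4, 4] = 1, 7, 5
print(is_valid(board))  # True
board[1, 1] = 1         # second 1 in the top-left block
print(is_valid(board))  # False
```

With OR, a duplicate simply fails to add a new bit, so the bit count falls below the number of non-zero cells.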
I'm in a situation where I have to rewrite some Julia code into Python code and I cannot reproduce this line.
if 1 in array1[array2] || 1 in array1[array3]
As I understand it, this line compares array1 with array2 and array1 with array3, checking whether array1 at index array2 is 1 or array1 at index array3 is 1.
Based on that understanding, I wrote the following Python code:
for i, j in zip(array2, array3):
    if array1[i] == 1 or array1[j] == 1:
But this does not behave like the Julia line above, and it raises a ValueError:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I'm not sure whether I have misunderstood the Julia line or whether my Python code is wrong.
Could someone tell me what is wrong?
[edit]:
Here is the Julia code for this problem. I am using the karate club network as the input matrix.
dir = "to/your/path"
ln = "soc-karate.mtx"
mtx = MatrixMarketRead(string(dir,strip(ln)));
A = mtx - spdiagm(diag(mtx))
n = size(A,1);
A = speye(n) - A * spdiagm(1./vec(sum(A,1)));
println(A)
function findDegrees(Ac::SparseMatrixCSC)
    degrees = zeros(Int,length(Ac.colptr)-1)
    for i = 1:length(degrees)
        degrees[i] = Ac.colptr[i+1]-Ac.colptr[i]-1
    end
    return degrees
end
function lowDegreeNodes(A::SparseMatrixCSC,At::SparseMatrixCSC,d::Int64,dout::Vector,din::Vector)
    # 1: find low degree nodes
    n = size(A,1)
    U = collect(dout.==1)
    println(U)
    V = collect(din.==1)
    Z = min((dout+din) .>= 1 , (dout+din) .<= 8 )
    # 2: visited = 0 ==> NotVisited
    #            = 1 ==> FNode
    #            = 2 ==> NotEliminated
    visited = zeros(length(U))
    for u = 1:n
        if Z[u]
            if visited[u] == 0
                Au = A.rowval[ A.colptr[u]:A.colptr[u+1]-1 ]
                Av = At.rowval[ At.colptr[u]:At.colptr[u+1]-1 ]
                if 1 in visited[Au] || 1 in visited[Av]
                    visited[u] = 2
                else
                    visited[Au] = 2
                    visited[u] = 1
                end
            end
        end
        if V[u]
            if visited[u] == 0
                Au = A.rowval[ A.colptr[u]:A.colptr[u+1]-1 ]
                Av = At.rowval[ At.colptr[u]:At.colptr[u+1]-1 ]
                if 1 in visited[Au] || 1 in visited[Av]
                    visited[u] = 2
                else
                    visited[Av] = 2
                    visited[u] = 1
                end
            end
        end
        if U[u]
            if visited[u] == 0
                Au = A.rowval[ A.colptr[u]:A.colptr[u+1]-1 ]
                Av = At.rowval[ At.colptr[u]:At.colptr[u+1]-1 ]
                if 1 in visited[Au] || 1 in visited[Av]
                    visited[u] = 2
                else
                    visited[Av] = 2
                    visited[u] = 1
                end
            end
        end
    end
    return visited .== 1
end
dout = findDegrees(A)
din = findDegrees(A')
z = lowDegreeNodes(A, A', 3, dout, din)
if 1 in array1[array2] || 1 in array1[array3]
In my understanding, this line comparing arrays array1 to array2 and array1 to array3, to see if the index array2 of array1 is 1 or the index array3 of array1 is 1.
I don't think that's correct. I think this line tests if 1 is in the values of array1 at indices array2 or indices array3. Let me make a MWE:
julia> array2 = [2, 3]
2-element Array{Int64,1}:
2
3
julia> array3 = [4, 5]
2-element Array{Int64,1}:
4
5
julia> array1 = [1, 9, 1, 2, 3]
5-element Array{Int64,1}:
1
9
1
2
3
julia> 1 in array1[array2] || 1 in array1[array3]
true
julia> array1 = [1, 9, 4, 2, 3] # now only at the 1st position is there a 1
5-element Array{Int64,1}:
1
9
4
2
3
julia> 1 in array1[array2] || 1 in array1[array3]
false
Your understanding of this line is correct:
Edit: Ok, after reading your description again, I am not sure the understanding is correct, but hopefully this explanation will clarify it.
if 1 in array1[array2] || 1 in array1[array3]
You choose the elements in array1 at the indexes given by array2 and array3 and check whether any of these elements is a 1.
So if your array1 is [0, 1, 2, 3, 4, 5, 6] and array2 is [1, 3, 4], array1[array2] would be [0, 2, 3] (remember, arrays are indexed from 1 in Julia!), and thus 1 in array1[array2] would evaluate to false.
You can achieve something similar with numpy, but remember that since Python indexes from 0, you have to subtract 1 from the indexes if you want the code to stay equivalent for the same input data:
array1 = np.array([...])  # Fill your arrays with the data
array2 = np.array([...])
array3 = np.array([...])
if 1 in array1[array2 - 1] or 1 in array1[array3 - 1]:
    # Rest of code
The syntax 1 in <array> is the same as in Julia; it evaluates to True if the value 1 is contained in <array>.
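To make the index shift concrete, here is a small runnable sketch using the arrays from the explanation above. `array3` is a made-up second index set, and both index arrays are assumed to hold 1-based indices as they would in Julia.

```python
import numpy as np

array1 = np.array([0, 1, 2, 3, 4, 5, 6])
array2 = np.array([1, 3, 4])  # 1-based indices, as they would appear in Julia
array3 = np.array([2, 5])     # hypothetical second index set, also 1-based

# Shift to 0-based before indexing.
print(array1[array2 - 1])       # [0 2 3]
print(1 in array1[array2 - 1])  # False
print(1 in array1[array3 - 1])  # True

found = 1 in array1[array2 - 1] or 1 in array1[array3 - 1]
print(found)                    # True
```

If your index arrays were built in Python and are already 0-based, drop the `- 1`.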
If these are regular python lists (and built assuming 0 indexed arrays) you could do
array1 = [1,1,2,2,3,3]
array2 = [0,2]
array3 = [3,5]
print(any(array1[i]==1 for i in array2), any(array1[i]==1 for i in array3))
if any(array1[i]==1 for i in array2) or any(array1[i]==1 for i in array3):
    print("Yes")
This would fail if indices in the second two arrays are out of range for the first. In that case you could scrub them first.
array2 = [i for i in array2 if i < len(array1)]
array3 = [i for i in array3 if i < len(array1)]
I have written a program that eliminates the items in a list and outputs the other list.
Program :
r = [5,7,2]
for i in range(10):
    if i != r:
        print(i)
It outputs
0
1
2
3
4
5
6
7
8
9
But I want the desired output to be
0
1
3
4
6
8
9
What is the method to do so?
When you write i != r, the result is always True, because an int and a list are never equal. You want to use the not in operator:
r = [5,7,2]
for i in range(10):
    if i not in r:
        print(i)
From python documentation -
The operators in and not in test for collection membership. x in s evaluates to true if x is a member of the collection s, and false otherwise. x not in s returns the negation of x in s .
You are checking whether an integer is not equal to a list, which is always true, so every value is printed.
What you really want is to check that the value is not in the list, so you need the not in operator:
r = [5,7,2]
for i in range(10):
    if i not in r:
        print(i)
You can try it like this:
>>> r = [5, 7, 2]
>>> for ix in [item for item in range(10) if item not in r]:
...     print(ix)
...
0
1
3
4
6
8
9
Using set
>>> r = [5,7,2]
>>> for i in set(range(10))-set(r):
...     print(i)
...
0
1
3
4
6
8
9
>>>
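One caveat with the set version: a set has no guaranteed iteration order, so the sorted output above is not something to rely on. A small sketch of two ways to keep the order explicit (the names `excluded` and `kept` are just for illustration):

```python
r = [5, 7, 2]
excluded = set(r)  # set membership tests are O(1) on average

# A list comprehension keeps the original order of range(10).
kept = [i for i in range(10) if i not in excluded]
print(kept)  # [0, 1, 3, 4, 6, 8, 9]

# Set difference works too; sorted() makes the order explicit.
print(sorted(set(range(10)) - excluded))  # [0, 1, 3, 4, 6, 8, 9]
```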