How to delete DF rows based on multiple column conditions? - python

Here's an example DF:
EC1 EC2 CDC L1 L2 L3 L4 L5 L6 VNF
0 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [1, 0]
1 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1]
2 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [-1, 0]
3 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, -1]
4 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [1, 0]
5 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [0, 1]
6 [1, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [-1, 0]
How do I delete those rows where df['VNF'] is [-1, 0] or [0, -1], and df['EC1'], df['EC2'] and df['CDC'] have a value of 0 in the same index position as the -1 in df['VNF']?
The expected result would be:
EC1 EC2 CDC L1 L2 L3 L4 L5 L6 VNF
0 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [1, 0]
1 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1]
2 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [1, 0]
3 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [0, 1]
4 [1, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [-1, 0]
Here's the constructor for the DataFrame:
import pandas as pd

data = {'EC1': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [1, 0]],
        'EC2': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
        'CDC': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]],
        'L1': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
        'L2': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
        'L3': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
        'L4': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
        'L5': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]],
        'L6': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]],
        'VNF': [[1, 0], [0, 1], [-1, 0], [0, -1], [1, 0], [0, 1], [-1, 0]]}
df = pd.DataFrame(data)

You can explode every column of df, then identify the exploded elements satisfying the first condition (the sum of the row's "VNF" values must be -1) and the second condition (the "VNF" element is -1 while the corresponding "EC1", "EC2" and "CDC" elements are 0), and filter out the elements that satisfy both to create temp. Since each original cell must have two elements, a row whose offending element was removed now has only one element left; count the elements per original index with a groupby transform, keep only the indices that still have two, then groupby the index and aggregate back to lists:
exploded = df.explode(df.columns.tolist())
first_cond = exploded.groupby(level=0)['VNF'].transform('sum').eq(-1)
second_cond = exploded['VNF'].eq(-1) & exploded['EC1'].eq(0) & exploded['EC2'].eq(0) & exploded['CDC'].eq(0)
temp = exploded[~(first_cond & second_cond)]
out = temp[temp.groupby(level=0)['VNF'].transform('count').gt(1)].groupby(level=0).agg(list).reset_index(drop=True)
Output:
EC1 EC2 CDC L1 L2 L3 L4 L5 L6 \
0 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0]
1 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0]
2 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1]
3 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1]
4 [1, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1]
VNF
0 [1, 0]
1 [0, 1]
2 [1, 0]
3 [0, 1]
4 [-1, 0]

A list comprehension that finds which indexes to drop might make the conditions easier to see:
columns = df.EC1, df.EC2, df.CDC, df.VNF
inds_to_drop = [iloc
                for iloc, (ec1, ec2, cdc, vnf) in enumerate(zip(*columns))
                if vnf == [-1, 0] or vnf == [0, -1]
                if all(val[idx] == 0
                       for idx in (vnf.index(-1),) for val in (ec1, ec2, cdc))]
new_df = df.drop(df.index[inds_to_drop])
to get
>>> new_df
EC1 EC2 CDC L1 L2 L3 L4 L5 L6 VNF
0 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [1, 0]
1 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1]
4 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [1, 0]
5 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [0, 1]
6 [1, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [-1, 0]
The list comprehension starts with the outer for loop over the column values and their integer indexes:
for iloc, (ec1, ec2, cdc, vnf) in enumerate(zip(*columns))
Then the first condition to drop kicks in:
df['VNF'] is [-1, 0] or [0, -1]
if vnf == [-1, 0] or vnf == [0, -1]
And the second condition:
df['EC1'], df['EC2'] and df['CDC'] have a value of 0 in the same index position as the -1 in df['VNF']
if all(val[idx] == 0 for idx in (vnf.index(-1),) for val in (ec1, ec2, cdc))
Here, we check whether the values of all three columns satisfy the criterion. A trick here is the one-turn loop for idx in (vnf.index(-1),), which evaluates the index of -1 only once (compare with val[vnf.index(-1)] for val in (ec1, ec2, cdc), which recomputes the index for every value and is less efficient).
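To see the trick in isolation, with hypothetical values for a single row, the two expressions below are equivalent; the first looks up vnf.index(-1) once, while the second repeats the lookup for every column value:
vnf, ec1, ec2, cdc = [0, -1], [0, 0], [0, 0], [1, 0]   # hypothetical values for one row
all(val[idx] == 0 for idx in (vnf.index(-1),) for val in (ec1, ec2, cdc))   # index computed once
all(val[vnf.index(-1)] == 0 for val in (ec1, ec2, cdc))                     # index recomputed per value
Both evaluate to True here, so such a row would be dropped.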
Then the list is comprehended with the integer index locations of rows to drop:
>>> inds_to_drop
[2, 3]
If you have a RangeIndex, i.e. the default 0..N-1 integer index, you can directly write new_df = df.drop(inds_to_drop). But with a custom index (e.g. string labels such as "a", "d", "e", "f", ...), the integer positions are not labels, so we look up the actual index labels with df.index[inds_to_drop] (here positions 2 and 3 would map to "e" and "f") and then drop those; this covers both cases.
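For illustration, here is a minimal sketch with a hypothetical frame that has a string index, showing how the integer positions are translated into labels before dropping:
import pandas as pd

small = pd.DataFrame({'x': [10, 20, 30, 40]}, index=['a', 'd', 'e', 'f'])
inds_to_drop = [2, 3]                  # integer positions, not labels
small.drop(small.index[inds_to_drop])  # small.index[[2, 3]] is Index(['e', 'f']), so rows 'e' and 'f' are dropped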

Related

how to create or expand an identity-like matrix in python

I'm trying to create a matrix like this:
[[A 0 0],
[0 B 0],
[0 0 C]]
in which A, B, C could each be either a submatrix or a constant.
Suppose I have one of the submatrices first:
[[1 2],
[3 4]]
and then the next one:
[[5 0 0],
[0 6 0],
[0 0 7]]
How can I concatenate them into the format below?
[[1 2 0 0 0],
[3 4 0 0 0],
[0 0 5 0 0],
[0 0 0 6 0],
[0 0 0 0 7]]
You can simply use scipy.linalg.block_diag, as follows:
from scipy.linalg import block_diag
A = [[1, 2], [3, 4]]
B = [[5, 0, 0],
     [0, 6, 0],
     [0, 0, 7]]
block_diag(A, B)
Output:
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0],
[0, 0, 5, 0, 0],
[0, 0, 0, 6, 0],
[0, 0, 0, 0, 7]])
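Since the question says A, B, C could also be constants: block_diag accepts scalars and treats each one as a 1x1 block, so (continuing with A and B from above and a hypothetical constant 8 as the third block):
block_diag(A, B, 8)  # 8 becomes a 1x1 block in the bottom-right corner of the 6x6 result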

How to change a specific element in a matrix to match another matrix?

I am writing a program in which I create a matrix 'B' derived from another matrix 'A'. Both matrices have the same size, and I simply want that for every position where matrix 'A' contains a 1, matrix 'B' also contains a 1 in that position. For example:
if __name__ == '__main__':
    mat_A = [[0, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 0]]
    R = len(mat_A)
    C = len(mat_A[1])
    mat_B = [[0]*C]*R  # Initialise matrix B to be the same size as A
    for i in range(R):
        for j in range(C):
            if mat_A[i][j] == 1:
                mat_B[i][j] = 1
            print(mat_B)
However, in this case, it prints me an output like such:
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
[[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
[[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
[[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
[[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
[[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
[[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
[[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
[[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
[[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
[[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
[[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
Process finished with exit code 0
This tells me that the code is finding an instance where mat_A[i][j] is 1 and then changing the entire mat_B at once. Shouldn't it only affect the specific position in 'B' rather than all of them?
Thank you for your help.
p.s. The above is only a very simple example I wrote to try and debug. There are multiple complex steps in the if block.
The line
mat_B = [[0]*C]*R
creates a list of length R where each element is the same list consisting of zeros. If you change one of the sublists of mat_B, you change them all, since they are all the same list. You can fix this, for example, as follows:
mat_B = [[0]*C for i in range(R)]
After that, your code should work fine.
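For instance, a quick check with the example matrix from the question (a minimal sketch of the corrected code, not the full program):
mat_A = [[0, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 0]]
R, C = len(mat_A), len(mat_A[0])
mat_B = [[0]*C for i in range(R)]  # R distinct row lists, no shared references
for i in range(R):
    for j in range(C):
        if mat_A[i][j] == 1:
            mat_B[i][j] = 1
print(mat_B)  # [[0, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 0]]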
As a side note, it is easier to accomplish such operations using numpy arrays:
import numpy as np
mat_A = np.array([[1, 2, 3], [0, 1, 7], [3, 1, 0], [0, 1, 0], [0, 2, 4]])
mat_B = np.zeros_like(mat_A)
mat_B[mat_A == 1] = 1

Python Numpy stack 2d arrays in vector

So, I would like to stack a couple of 2D arrays into a 3D array, so it would look like this:
[[[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]]
[[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]]
[[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]]]
I can make something like this:
import numpy as np
a = np.zeros((5, 5), dtype=int)
b = np.zeros((5, 5), dtype=int)
c = np.stack((a, b), 0)
print(c)
To get this:
[[[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]]
[[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]]]
But I can't figure out how to add a third 2D array to such a stack, or how to create such a stack of 2D arrays iteratively in a loop. Append, stack, and concat just don't keep the needed shape.
So, any suggestions?
Thank you!
Conclusion:
Thanks to Tom and Mozway, we've got two answers.
Tom's:
data_x_train = x_train[np.where((y_train==0) | (y_train==1))]
Mozway's:
out = np.empty((0, 5, 5))
while condition:
    # get new array
    a = XXX
    out = np.r_[out, a[None]]
out
Assuming the following arrays:
a = np.ones((5, 5), dtype=int)
b = np.ones((5, 5), dtype=int)*2
c = np.ones((5, 5), dtype=int)*3
You can stack all at once using:
np.stack((a, b, c), 0)
If you really need to add the arrays iteratively, you can use np.r_:
out = a[None]
for i in (b, c):
    out = np.r_[out, i[None]]
Output:
array([[[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]],
[[2, 2, 2, 2, 2],
[2, 2, 2, 2, 2],
[2, 2, 2, 2, 2],
[2, 2, 2, 2, 2],
[2, 2, 2, 2, 2]],
[[3, 3, 3, 3, 3],
[3, 3, 3, 3, 3],
[3, 3, 3, 3, 3],
[3, 3, 3, 3, 3],
[3, 3, 3, 3, 3]]])
Edit: if you do not know the arrays in advance:
out = np.empty((0, 5, 5))
while condition:
    # get new array
    a = XXX
    out = np.r_[out, a[None]]
out
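If the loop runs many times, note that np.r_ copies the whole accumulated array on every iteration. A common alternative (a sketch, under the same assumption that the arrays only become known inside the loop) is to collect them in a Python list and stack once at the end:
import numpy as np

chunks = []                          # collect the 2D arrays as they arrive
for _ in range(3):                   # stand-in for the unknown loop condition
    a = np.zeros((5, 5), dtype=int)  # stand-in for the new array
    chunks.append(a)
out = np.stack(chunks, axis=0)       # single allocation at the end, shape (3, 5, 5)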
Do you mean something like:
np.tile(a, (3, 1, 1))
array([[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]])
Edit:
Do you mean something like:
test = np.tile(a, (3000, 1, 1))
filtered_subset = test[[1, 10, 100], :, :]
array([[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]])

Unexpected behaviour in list value change

I defined this function:
def newMap(dim, n):
    tc = [0 for i in range(n)]
    return [[tc for _ in range(dim)] for _ in range(dim)]
Which creates a list of lists of zeroes. For example
m = newMap(2,2)
print(m)
returns
[[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
I want to change one of the zeroes to obtain [[[1, 0], [0, 0]], [[0, 0], [0, 0]]] and tried doing so by
m[0][0][0] = 1
which unexpectedly returns [[[1, 0], [1, 0]], [[1, 0], [1, 0]]] instead of [[[1, 0], [0, 0]], [[0, 0], [0, 0]]].
However, if I defined a = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]], and then do
a[0][0][0] = 1
print(a)
it returns [[[1, 0], [0, 0]], [[0, 0], [0, 0]]], which is what I want.
Why does this happen? Shouldn't the two definitions be equivalent? How can I prevent it from happening in the first case?
Use tc.copy(); this should fix it. I tried it and it works:
def newMap(dim, n):
    tc = [0 for i in range(n)]
    return [[tc.copy() for _ in range(dim)] for _ in range(dim)]
a = newMap(2, 2)
a
# [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
a[0][0][0] = 1
a
# [[[1, 0], [0, 0]], [[0, 0], [0, 0]]]
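The underlying reason is that the inner comprehension in the original newMap reuses the single list tc for every cell, so all dim*dim cells refer to the same object; tc.copy() gives each cell its own list. A minimal check of the original, uncopied version (newMap_nocopy is just a hypothetical name for it):
def newMap_nocopy(dim, n):
    tc = [0 for i in range(n)]
    return [[tc for _ in range(dim)] for _ in range(dim)]

m = newMap_nocopy(2, 2)
print(m[0][0] is m[1][1])  # True: every cell is the same list object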

Enumeration of balls in basket with a specific order

I'd like to enumerate the solutions in a specific order. Currently, with the code below:
def balls_in_baskets(balls=1, baskets=1):
    if baskets == 1:
        yield [balls]
    elif balls == 0:
        yield [0]*baskets
    else:
        for i in range(balls+1):
            for j in balls_in_baskets(balls-i, 1):
                for k in balls_in_baskets(i, baskets-1):
                    yield j+k

x = [t for t in balls_in_baskets(3, 3)][::-1]
for i in x:
    print(i)
I get this:
[0, 0, 3]
[0, 1, 2]
[0, 2, 1]
[0, 3, 0]
[1, 0, 2]
[1, 1, 1]
[1, 2, 0]
[2, 0, 1]
[2, 1, 0]
[3, 0, 0]
However, I would like this order:
[0, 0, 3]
[0, 1, 2]
[1, 0, 2]
[0, 2, 1]
[1, 1, 1]
[2, 0, 1]
[0, 3, 0]
[1, 2, 0]
[2, 1, 0]
[3, 0, 0]
How can I achieve this correct order?
You already lose the memory efficiency of your generator by consuming it in a list comprehension, so you could just as well sort the result:
x = sorted(balls_in_baskets(3,3), key=lambda x: x[::-1], reverse=True)
which, printed row by row, gives the expected output:
[0, 0, 3]
[0, 1, 2]
[1, 0, 2]
[0, 2, 1]
[1, 1, 1]
[2, 0, 1]
[0, 3, 0]
[1, 2, 0]
[2, 1, 0]
[3, 0, 0]
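Putting it together, a minimal usage sketch with the balls_in_baskets generator defined in the question:
x = sorted(balls_in_baskets(3, 3), key=lambda t: t[::-1], reverse=True)
for row in x:
    print(row)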
