Set column based on another column np - python

I want to make sure column 2 is smaller than column 1, and wherever it is not, set it to 0.
x = np.array([[0,1],[1,0]])
x = np.where((x[1] > (x[0])), 0, x)
print(x)  # desired output: [[0,0],[1,0]]

Maybe this helps:
arr = np.array([[0,1],[1,0]])
arr[arr[:,1] > arr[:,0], 1] = 0
print(arr)
Output:
array([[0, 0],
       [1, 0]])
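The same update can also be written with np.where, which is closer to the attempt in the question. This is only a minimal sketch, assuming a 2-D integer array with the two columns shown above:

```python
import numpy as np

arr = np.array([[0, 1], [1, 0]])

# Keep column 1 where it is not larger than column 0, otherwise write 0.
arr[:, 1] = np.where(arr[:, 1] > arr[:, 0], 0, arr[:, 1])

print(arr.tolist())  # [[0, 0], [1, 0]]
```

The attempt in the question compares x[1] with x[0], which are rows; x[:, 1] and x[:, 0] select the columns.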

You started with a list (of lists), so I'll give you a list answer.
First define a simple helper function:
def foo(row):
    # Zero the second element when it is larger than the first
    if row[1] > row[0]:
        row[1] = 0
    return row
And apply it to x row by row:
In [37]: x = [[0,1],[1,0]]
In [38]: [foo(row) for row in x]
Out[38]: [[0, 0], [1, 0]]


Pandas dataframe to 3D array

I have a dataframe like this
group b c d e label
A 0.577535 0.299304 0.617103 0.378887 1
0.167907 0.244972 0.615077 0.311497 0
B 0.640575 0.768187 0.652760 0.822311 0
0.424744 0.958405 0.659617 0.998765 1
0.077048 0.407182 0.758903 0.273737 0
I want to reshape it into a 3D array which an LSTM could use as input, using padding. So group A should feed in a sequence of length 3 (after padding) and group B of length 3. Desired output something like
array1 = [[[0.577535, 0.299304, 0.617103, 0.378887],
[0.167907, 0.244972, 0.615077, 0.311497],
[0, 0, 0, 0]],
[[0.640575, 0.768187, 0.652760, 0.822311],
[0.424744, 0.958405, 0.659617, 0.998765],
[0.077048, 0.407182, 0.758903, 0.273737]]]
and then the labels have to be reshaped accordingly too
array2 = [[1, 0, 0],
          [0, 1, 0]]
How can I put in the padding and reshape my data?
You can first use cumcount to create a count for each group, reindex by MultiIndex.from_product and fill with 0, and finally export to list:
df["count"] = df.groupby("group")["label"].cumcount()
mux = pd.MultiIndex.from_product([df["group"].unique(), range(max(df["count"]+1))], names=["group","count"])
df = df.set_index(["group","count"]).reindex(mux, fill_value=0)
print (df.iloc[:,:4].groupby(level=0).apply(pd.Series.tolist).values.tolist())
[[[0.577535, 0.299304, 0.617103, 0.378887],
[0.167907, 0.24497199999999997, 0.6150770000000001, 0.31149699999999997],
[0.0, 0.0, 0.0, 0.0]],
[[0.640575, 0.768187, 0.65276, 0.822311],
[0.42474399999999995, 0.958405, 0.659617, 0.998765],
[0.077048, 0.40718200000000004, 0.758903, 0.273737]]]
print (df.groupby(level=0)["label"].apply(list).tolist())
[[1, 0, 0], [0, 1, 0]]
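If NumPy arrays are wanted directly, the padding can also be done by pre-allocating zero-filled arrays and copying each group into them. This is a rough sketch rather than the method above; the small frame, feature_cols, features and labels are made-up names used only for illustration:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with the same kind of layout as the question.
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B"],
    "b": [0.1, 0.2, 0.3, 0.4, 0.5],
    "c": [1.1, 1.2, 1.3, 1.4, 1.5],
    "label": [1, 0, 0, 1, 0],
})
feature_cols = ["b", "c"]  # assumed feature columns

groups = [g for _, g in df.groupby("group", sort=False)]
max_len = max(len(g) for g in groups)

# Pre-allocate zero-filled outputs, then copy each group into the front.
features = np.zeros((len(groups), max_len, len(feature_cols)))
labels = np.zeros((len(groups), max_len), dtype=int)
for k, g in enumerate(groups):
    features[k, :len(g)] = g[feature_cols].to_numpy()
    labels[k, :len(g)] = g["label"].to_numpy()

print(features.shape)   # (2, 3, 2)
print(labels.tolist())  # [[1, 0, 0], [0, 1, 0]]
```

Rows beyond a group's real length stay at zero, which gives the padding.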
I'm assuming your group column consists of many values and not just one 'A' and one 'B'. This code worked for me (note that it does not pad the groups), so you can give it a try as well:
import pandas as pd
df = pd.read_csv('file2.csv')
vals = df['group'].unique()
# Feature columns: everything except the group key and the label
feature_cols = [c for c in df.columns if c not in ('group', 'label')]
array1 = []
array2 = []
for val in vals:
    val_df = df[df.group == val]
    val_label = val_df.label
    smaller_array = []
    label_small_array = []
    # Collect the labels of this group
    for label in val_label:
        label_small_array.append(label)
    array2.append(label_small_array)
    # Collect one list of feature values per row of this group
    for i in range(val_df.shape[0]):
        smallest_array = []
        for j in feature_cols:
            smallest_array.append(val_df.iloc[i][j])
        smaller_array.append(smallest_array)
    array1.append(smaller_array)

Dynamic way to compute linear constraints with multiple operators

Imagine a matrix A whose first column holds inequality/equality operators (≥, =, ≤), and a vector b, where the number of rows in A equals the number of elements in b. Then one row, in my setting, would be evaluated as, e.g.
dot(A[0, 1:], x) ≥ b[0]
where x is some vector, column A[,0] holds all the operators, and we know that row 0 is supposed to be evaluated with the ≥ operator (i.e. A[0,0] == "≥" is true). Now, is there a way to dynamically evaluate all rows in the following, so far imaginary, way?
dot(A[, 1:], x) A[, 0] b
My hope was for a dynamic evaluation of each row where we evaluate which operator is used for each row.
Example, let
A = [
[">=", -2, 1, 1],
[">=", 0, 1, 0],
["==", 0, 1, 1]
]
b = [0, 1, 1]
and let x be some given vector, e.g. x = [1,1,0]. We wish to compute the following:
A[,1:] x A[,0] b
dot([-2, 1, 1], [1, 1, 0]) >= 0
dot([0, 1, 0], [1, 1, 0]) >= 1
dot([0, 1, 1], [1, 1, 0]) == 1
The output would be [False, True, True]
If I understand correctly, this is a way to do that operation:
import numpy as np
# Input data
a = [
[">=", -2, 1, 1],
[">=", 0, 1, 0],
["==", 0, 1, 1]
]
b = np.array([0, 1, 1])
x = np.array([1, 1, 0])
# Split in comparison and data
a0 = np.array([lst[0] for lst in a])
a1 = np.array([lst[1:] for lst in a])
# Compute dot product
c = a1 @ x
# Compute comparisons
leq = c <= b
eq = c == b
geq = c >= b
# Find comparison index for each row
cmps = np.array(["<=", "==", ">="]) # This array is lex sorted
cmp_idx = np.searchsorted(cmps, a0)
# Select the right result for each row
result = np.choose(cmp_idx, [leq, eq, geq])
# Convert to numeric type if preferred
result = result.astype(np.int32)
print(result)
# [0 1 1]
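For a small number of rows, a plain-Python variant that looks each operator string up in a dict may be easier to read. This is a sketch only; OPS and check_rows are hypothetical names, and it assumes every row uses one of the three operators listed:

```python
import operator

# Map each operator string to the matching comparison function.
OPS = {"<=": operator.le, "==": operator.eq, ">=": operator.ge}

A = [
    [">=", -2, 1, 1],
    [">=", 0, 1, 0],
    ["==", 0, 1, 1],
]
b = [0, 1, 1]
x = [1, 1, 0]

def check_rows(A, x, b):
    """Evaluate dot(row[1:], x) <op> b_i for every row of A."""
    results = []
    for row, rhs in zip(A, b):
        lhs = sum(coef * xi for coef, xi in zip(row[1:], x))
        results.append(OPS[row[0]](lhs, rhs))
    return results

print(check_rows(A, x, b))  # [False, True, True]
```

An unknown operator string raises a KeyError here, which makes bad input visible early.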

I want to take the XOR of all the elements of 1 list with another. How do I do it? [duplicate]

I have a bunch of lists in the form of say [0,0,1,0,1...], and I want to take the XOR of 2 lists and give the output as a list.
Like:
[ 0, 0, 1 ] XOR [ 0, 1, 0 ] -> [ 0, 1, 1 ]
res = []
tmp = []
for i in Employee_Specific_Vocabulary_Dict['Binary Vector']:
    for j in Course_Specific_Vocabulary_Dict['Binary Vector']:
        tmp = [i[index] ^ j[index] for index in range(len(i))]
        res.append(tmp)
The size of each of my lists / vectors is around 3500 elements, so I need something to save time, since this piece of code is taking more than 20 mins to run.
I have 3085 lists, each of which need an XOR operation with 4089 other lists.
How do I do this without iterating through each list explicitly?
Use map:
import operator
answer = list(map(operator.xor, lst1, lst2))
or zip:
answer = [x ^ y for x,y in zip(lst1, lst2)]
If you need something faster, consider using NumPy arrays instead of Python lists to hold your data.
Assuming a and b are the same size you can use the xor operation (i.e. ^) with simple list indexing:
a = [0, 0, 1]
b = [0, 1, 1]
c = [a[index] ^ b[index] for index in range(len(a))]
print(c) # [0, 1, 0]
or you can use zip with the xor:
a = [0, 0, 1]
b = [0, 1, 1]
c = [x ^ y for x, y in zip(a, b)]
print(c) # [0, 1, 0]
zip will only go to the shortest list (if they are not the same size). If they are not the same size and you want to go to the longer list you can use zip_longest:
from itertools import zip_longest
a = [0, 0, 1, 1]
b = [0, 1, 1]
c = [x ^ y for x, y in zip_longest(a, b, fillvalue=0)]
print(c) # [0, 1, 0, 1]
Using numpy you should have some performance gains, the function you need is bitwise_xor, like so:
import numpy as np
results = []
for i in Employee_Specific_Vocabulary_Dict['Binary Vector']:
    for j in Course_Specific_Vocabulary_Dict['Binary Vector']:
        results.append(np.bitwise_xor(i, j))
A proof of concept:
a = [1,0,0,1,1]
b = [1,1,0,0,1]
x = np.bitwise_xor(a,b)
print("a\tb\tres")
for i in range(len(a)):
    print("{}\t{}\t{}".format(a[i], b[i], x[i]))
output:
a b res
1 1 0
0 1 1
0 0 0
1 0 1
1 1 0
Edit
Note that if your arrays have the same size, you can simply do one operation and the bitwise_xor will still work, so:
a = [[1,1,0], [0,0,1]]
b = [[0,1,0], [1,0,1]]
res = np.bitwise_xor(a, b)
will still work, and you'll have:
res: [[1, 0, 0], [1, 0, 0]]
In your case, a possible workaround would be:
results = []
n = len(Course_Specific_Vocabulary_Dict['Binary Vector'])
for a in Employee_Specific_Vocabulary_Dict['Binary Vector']:
    # Repeat a so it has the same shape as Course_Specific_Vocabulary_Dict['Binary Vector']
    repeated_a = np.repeat([a], n, axis=0)
    results.append(np.bitwise_xor(repeated_a, Course_Specific_Vocabulary_Dict['Binary Vector']))
However, I don't know whether that would actually improve performance; it needs to be checked. It will certainly require more memory.
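NumPy broadcasting can pair every vector of one collection with every vector of the other without an explicit repeat. The sketch below uses small made-up arrays (employees and courses are hypothetical stand-ins for the two 'Binary Vector' collections) and assumes both hold 0/1 integers of equal length:

```python
import numpy as np

# Hypothetical small stand-ins for the two collections of binary vectors.
employees = np.array([[0, 0, 1], [1, 1, 0]], dtype=np.uint8)            # shape (2, 3)
courses = np.array([[0, 1, 0], [1, 1, 1], [0, 0, 0]], dtype=np.uint8)  # shape (3, 3)

# Insert a length-1 axis so every employee row is paired with every course row.
# Result shape: (n_employees, n_courses, vector_length).
all_pairs = employees[:, None, :] ^ courses[None, :, :]

print(all_pairs.shape)        # (2, 3, 3)
print(all_pairs[0].tolist())  # [[0, 1, 1], [1, 1, 0], [0, 0, 1]]
```

At the sizes in the question (3085 x 4089 vectors of about 3500 elements) the full result would hold roughly 44 billion elements, so it would likely have to be computed one batch of employee rows at a time rather than in a single array.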

creating a function that makes a new universum with all elements [0] PYTHON

I want to create a function named create_empty_universum in Python. This function should make a new universum with all elements set to zero. The universum must have the same n x m dimensions as a given matrix (n is the length of a row, m is the length of a column).
for example.
I have a matrix m given.
I = len(m) # I is the number of rows
J = len(m[0]) # J is the number of columns
New_matrix = []
row = I*[0]
index = 0
def create_empty_universum():
    while index < J :
        New_matrix.append(row)
        index +=1
    return New_matrix
But my new matrix remains []. Why is that?
You want to use the multiply operator on a list:
>>> cols = 4
>>> rows = 3
>>> [0] * cols
[0, 0, 0, 0]
>>> [[0] * cols] * rows
[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
If you really want to use a helper function:
def create_empty_universum(cols, rows, cell=0):
    return [[cell] * cols] * rows
Update:
See @tobias_k's comment: [[0] * cols] * rows repeats the same inner list, so changing one row changes all of them. You should use [[0]*cols for i in range(rows)] to have unrelated rows.
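The difference between the two constructions shows up as soon as a single cell is modified. A short illustration (the names shared and independent are only for this example):

```python
cols, rows = 4, 3

# Multiplying the outer list repeats references to the SAME inner list.
shared = [[0] * cols] * rows
shared[0][0] = 1
print(shared)       # [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

# A comprehension builds a fresh inner list for every row.
independent = [[0] * cols for _ in range(rows)]
independent[0][0] = 1
print(independent)  # [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```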

Python - best way to set a column in a 2d array to a specific value

I have a 2d array, I would like to set a column to a particular value, my code is below. Is this the best way in python?
rows = 5
cols = 10
data = (rows * cols) *[0]
val = 10
set_col = 5
for row in range(rows):
    data[row * cols + set_col - 1] = val
If I want to set a number of columns to a particular value, how could I extend this?
I would like to use the Python standard library only.
Thanks
The NumPy package provides a powerful N-dimensional array object. If data is a NumPy array, you can set column set_col to val with:
data[:, set_col] = val
Complete Example:
>>> import numpy as np
>>> a = np.arange(10)
>>> a.shape = (5,2)
>>> a
array([[0, 1],
[2, 3],
[4, 5],
[6, 7],
[8, 9]])
>>> a[:,1] = -1
>>> a
array([[ 0, -1],
[ 2, -1],
[ 4, -1],
[ 6, -1],
[ 8, -1]])
A better solution would be:
data = [[0] * cols for i in range(rows)]
For the values of cols = 2, rows = 3 we'd get:
data = [[0, 0],
[0, 0],
[0, 0]]
Then you can access it as:
v = data[row][col]
Which leads to:
val = 10
set_col = 5
for row in range(rows):
    data[row][set_col] = val
Or the more Pythonic (thanks J.F. Sebastian):
for row in data:
    row[set_col] = val
There's nothing inherently wrong with the way you're doing it, except that it would be clearer to name the variable set_col rather than set_row, since you're setting a column.
To set a number of columns, just wrap it in another loop:
for set_col in [...columns that have to be set...]
One concern, though: your 2D array is unusual in that it's packed in a 1D array (Python can support 2D arrays via lists of lists as well), so I would wrap it all with methods or functions.
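One way such a wrapper could look for the flat, row-major layout from the question, using only the standard library. This is a sketch; set_column is a hypothetical helper name, and unlike the question's code it takes a 0-based column index:

```python
def set_column(data, cols, col, val):
    """Set column `col` (0-based) of a row-major flat list to `val`, in place."""
    rows = len(data) // cols
    # Extended slice: every `cols`-th cell starting at `col` is one column.
    data[col::cols] = [val] * rows

rows, cols = 5, 10
data = [0] * (rows * cols)

# Set several columns by looping over their indices.
for col in (2, 4):
    set_column(data, cols, col, 10)

print(data[:cols])  # [0, 0, 10, 0, 10, 0, 0, 0, 0, 0]
```

Assigning to an extended slice requires the right-hand side to have exactly as many items as the slice selects, which is why the helper builds a list of length rows.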
In your case rows and columns are probably interchangeable, i.e. it's a matter of semantics which is which. If so, you could lay the data out so that each column occupies a contiguous run of cells in the data list, and then zero a column with just:
data[column_start:column_start+rows] = rows * [0]
An earlier answer left out a range, so you could try the following:
cols = 7
rows = 8
data = [[0] * cols for i in range(rows)]
val = 10
set_col = 5
for row in data:
    row[set_col] = val
To extend this to a number of columns, you could store each column number and its value in a dict. So to set column 5 to 10 and column 2 to 7:
cols = 7
rows = 8
data = [[0] * cols for i in range(rows)]
valdict = {5:10, 2:7}
for col, val in valdict.items():
    for row in data:
        row[col] = val
Swapping the rows and columns, as suggested in another answer, makes this slightly simpler:
cols = 7
rows = 8
data = [[0] * rows for i in range(cols)]
valdict = {5:10, 2:7}
for col, val in valdict.items():
    data[col] = [val] * rows
