I am trying to find a way to replace every duplicate 1 in a row (every 1 after the first) with 0. As an example:
[[0,1,0,1,0],
[1,0,0,1,0],
[1,1,1,0,1]]
Should become:
[[0,1,0,0,0],
[1,0,0,0,0],
[1,0,0,0,0]]
I found a similar question, "numpy: setting duplicate values in a row to 0", but its solution does not seem to work here.
Assuming the array contains only zeros and ones, you can find the index of the maximum value in each row with numpy.argmax (which returns the first occurrence when there are ties), then use advanced indexing to copy the values at those positions into an array of zeros.
import numpy as np
arr = np.array([[0,1,0,1,0],
                [1,0,0,1,0],
                [1,1,1,0,1]])
res = np.zeros_like(arr)
idx = (np.arange(len(res)), np.argmax(arr, axis=1))
res[idx] = arr[idx]
res
array([[0, 1, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0]])
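The argmax approach above can be wrapped in a small helper. This is only a sketch: keep_first_one is a name made up for illustration, and it assumes the input holds nothing but zeros and ones. The extra all-zero row shows that such rows stay zero, because argmax returns column 0 and the value copied from there is 0.

```python
import numpy as np

def keep_first_one(arr):
    """Return a copy of a 0/1 array in which only the first 1 of each row survives."""
    arr = np.asarray(arr)
    res = np.zeros_like(arr)
    rows = np.arange(arr.shape[0])
    cols = arr.argmax(axis=1)          # first index of the row maximum
    res[rows, cols] = arr[rows, cols]  # copies 1, or 0 for an all-zero row
    return res

result = keep_first_one([[0, 1, 0, 1, 0],
                         [1, 0, 0, 1, 0],
                         [1, 1, 1, 0, 1],
                         [0, 0, 0, 0, 0]])
print(result)
```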
Try looping through each row of the grid
In each row, find all the 1s. In particular you want their indices (positions within the row). You can do this with a list comprehension and enumerate, which automatically gives an index for each element.
Then, still within that row, go through every 1 except for the first, and set it to zero.
grid = [[0, 1, 0, 1, 0], [1, 0, 0, 1, 0], [1, 1, 1, 0, 1]]
for row in grid:
    ones = [i for i, element in enumerate(row) if element == 1]
    for i in ones[1:]:
        row[i] = 0
print(grid)
Gives: [[0, 1, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0]]
You can use cumsum:
(arr.cumsum(axis=1).cumsum(axis=1) == 1) * 1
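This takes the cumulative sum along each row twice. The first pass counts the 1s seen so far; after the second pass, the only position whose value equals 1 is the first 1 in the row, so comparing with 1 marks exactly those positions.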
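To see why the double cumsum works, it may help to look at the intermediate arrays. The sketch below reuses the example array from the question; the variable names are chosen here for illustration. It also checks an equivalent single-cumsum formulation, which is an alternative and not part of the original answer.

```python
import numpy as np

arr = np.array([[0, 1, 0, 1, 0],
                [1, 0, 0, 1, 0],
                [1, 1, 1, 0, 1]])

# First pass: how many 1s have been seen so far in each row.
running_count = arr.cumsum(axis=1)

# Second pass: stays 0 before the first 1, equals 1 at the first 1,
# and is at least 2 everywhere after it.
double_sum = running_count.cumsum(axis=1)

first_ones = (double_sum == 1).astype(arr.dtype)

# Alternative: a cell is the first 1 if it is a 1 and the running count is 1.
alternative = ((running_count == 1) & (arr == 1)).astype(arr.dtype)

print(running_count)
print(double_sum)
print(first_ones)
```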
Hey, so I basically have a problem like this:
I have a NumPy array that contains a matrix of values, for example:
Data = np.array([
[3, 0, 1, 5],
[0, 0, 0, 7],
[0, 3, 0, 0],
[0, 0, 0, 6],
[5, 1, 0, 0]])
Using another array, I want to extract specific values and sum them together. This is a bit hard to explain, so I'll just show an example:
values = np.array([3,1,3,4,2])
So this means we want the first 3 values of the first row, the first value of the second row, the first 3 values of the 3rd row, the first 4 values of the 4th row and the first 2 values of the last row, so we only want this data:
final_data = np.array([
[3, 0, 1],
[0],
[0, 3, 0],
[0, 0, 0, 6],
[5, 1]])
Then we want the sum of those values; in this case the sum is 19.
Is there an easy way to do this? Also, the data isn't always the same size, so I can't have any fixed variables.
An even better answer:
Data[np.arange(Data.shape[1])<values[:,None]].sum()
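The one-liner above relies on broadcasting: a row vector of column indices is compared with a column vector of per-row counts, which gives a boolean mask of the same shape as Data. A step-by-step sketch (col_idx, mask and total are names introduced here):

```python
import numpy as np

Data = np.array([
    [3, 0, 1, 5],
    [0, 0, 0, 7],
    [0, 3, 0, 0],
    [0, 0, 0, 6],
    [5, 1, 0, 0]])
values = np.array([3, 1, 3, 4, 2])

col_idx = np.arange(Data.shape[1])   # shape (4,): [0, 1, 2, 3]
mask = col_idx < values[:, None]     # (4,) against (5, 1) broadcasts to (5, 4)
total = Data[mask].sum()             # sum only the cells where mask is True

print(mask)
print(total)
```

Because the mask is built from Data.shape and values, nothing depends on a fixed array size.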
You can try:
sum([Data[i, :j].sum() for i, j in enumerate(values)])
You can accomplish this with advanced indexing. The advanced coordinates can be calculated separately before pulling them from the array.
Explicitly:
Data = np.array([
[3, 0, 1, 5],
[0, 0, 0, 7],
[0, 3, 0, 0],
[0, 0, 0, 6],
[5, 1, 0, 0]])
values = np.array([3,1,3,4,2])
X = [0,0,0,1,2,2,2,3,3,3,3,4,4]
Y = [0,1,2,0,0,1,2,0,1,2,3,0,1]
Data[X,Y]
Notice that X repeats each row index once for every element taken from that row, and Y is the column to pair with each entry of X. Both can be calculated from values directly:
X = np.concatenate([[n]*i for n,i in enumerate(values)])
Y = np.concatenate([np.arange(i) for i in values])
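As a possible variation on the index construction above, np.repeat builds the row coordinates without a Python-level list comprehension. This is a sketch reusing the question's Data and values; total is a name introduced here.

```python
import numpy as np

Data = np.array([
    [3, 0, 1, 5],
    [0, 0, 0, 7],
    [0, 3, 0, 0],
    [0, 0, 0, 6],
    [5, 1, 0, 0]])
values = np.array([3, 1, 3, 4, 2])

# Row i appears values[i] times.
X = np.repeat(np.arange(len(values)), values)
# For each row, the columns 0 .. values[i]-1.
Y = np.concatenate([np.arange(n) for n in values])

total = Data[X, Y].sum()
print(X, Y, total)
```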
Create a two-dimensional array named A with ROWS rows and COLS columns. ROWS and COLS are specified by the user at run time. Fill A with randomly chosen integers from the range [-10, 99], then repeatedly perform the following steps until end-of-file:
(1) input an integer x
(2) search for x in A
(3) when x is found in A, output the coordinate (row, col) where x is found; otherwise output the message "x not found!"
I need help. How can I define a two-dimensional array named A with ROWS rows and COLS columns, where ROWS and COLS are specified by the user at run time, in the latest version of Python?
#--------------------------------------
#Hw 7
#E80
#---------------------------------------
A = [[Rows],[ColSS]] # I really don't know how to define this part
for i in range (-10,99): # don't worry about this, it's just the logic, not the actual code
    x = int(input("Enter a number : "))
    if x is found in A
        coordinate row and column
    otherwise output "x is not found"
The idiomatic way to create a 2D array in Python is:
rows,cols = 5,10
A = [[0]*cols for _ in range(rows)]
Explanation:
>>> A = [0] * 5 # Multiplication on a list creates a new list with duplicated entries.
>>> A
[0, 0, 0, 0, 0]
>>> A = [[0] * 5 for _ in range(2)] # Create multiple lists, in a list, using a comprehension.
>>> A
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
>>> A[0][0] = 1
>>> A
[[1, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
Note that you do not want to build the outer list by multiplying a list of lists. That copies references to the same inner list rather than the list itself, so every row is the same object:
>>> A = [[0] * 5] * 2
>>> A
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
>>> A[0][0] = 1
>>> A
[[1, 0, 0, 0, 0], [1, 0, 0, 0, 0]] # both rows changed!
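Putting the list-comprehension idiom together with the rest of the assignment, one possible shape for the program is sketched below. The function names (make_grid, find_first, answers, main) and the prompt strings are assumptions made for this example; the search reports only the first match, which is one reading of the task.

```python
import random

def make_grid(rows, cols, low=-10, high=99):
    """Return a rows x cols list of lists of random integers in [low, high]."""
    return [[random.randint(low, high) for _ in range(cols)] for _ in range(rows)]

def find_first(grid, x):
    """Return (row, col) of the first occurrence of x, or None if absent."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == x:
                return (r, c)
    return None

def answers(grid, lines):
    """Yield one message per non-blank input line."""
    for line in lines:
        if not line.strip():
            continue
        x = int(line)
        pos = find_first(grid, x)
        yield f"{x} found at {pos}" if pos is not None else f"{x} not found!"

def main():
    rows = int(input("ROWS: "))
    cols = int(input("COLS: "))
    A = make_grid(rows, cols)
    while True:
        try:
            line = input("Enter a number: ")
        except EOFError:  # end-of-file (Ctrl-D, or Ctrl-Z on Windows) ends the loop
            break
        for message in answers(A, [line]):
            print(message)
```

Call main() to run it interactively. Keeping the search in its own function makes it easy to check against a small fixed grid without typing any input.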
In my actual problem I need to encode the strings in a data frame, as I do in the following step:
import pandas as pd
df = pd.DataFrame({"cool": list("ABC"), "not_cool": list("CBA")})
encoding = {"A": [0, 0, 1], "B": [0, 1, 0], "C": [1, 0, 0]}
The encoding is applied with:
df.applymap(encoding.get)
Now, what I have is a data frame where the elements are lists:
cool not_cool
[0, 0, 1] [1, 0, 0]
[0, 1, 0] [0, 1, 0]
[1, 0, 0] [0, 0, 1]
I need to expand this into a matrix. How can I do that? My first thought was to iterate through the rows, join each row's lists with numpy.hstack, store the results, and then numpy.vstack the stored rows, but it doesn't work as intended.
Another way would be to use this data frame to create a new one in which every column holds the n-th element of the lists. If I had that data frame, pandas.DataFrame.values would give me what I need:
1, 2, 3, 4, 5, 6 # Column names
0, 0, 1, 1, 0, 0
0, 1, 0, 0, 1, 0
1, 0, 0, 0, 0, 1
quick answer:
x = df.applymap(encoding.get)
(x.cool+x.not_cool).values # gives you matrix without the headers
# should be elementary to get labels you need in there
This adds the two columns together (adding lists concatenates them), and .values then returns an array of those concatenated lists.
Update in response to @mithrado's comment:
pd.DataFrame(np.vstack((x.cool+x.not_cool).values), columns=range(6))
# will give you a dataframe with the required values
You seem to be asking for the column names as another row in the DataFrame? Why would you want it that way?
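For data frames with more than two columns, a more general sketch is to look up every cell's code and reshape the resulting 3-D array. The names codes, matrix and expanded are introduced here, and the sketch assumes every code has the same length. It also avoids applymap, which, as far as I know, recent pandas releases deprecate in favour of DataFrame.map.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"cool": list("ABC"), "not_cool": list("CBA")})
encoding = {"A": [0, 0, 1], "B": [0, 1, 0], "C": [1, 0, 0]}

# Shape (n_rows, n_columns, code_length): one code per original cell.
codes = np.array([[encoding[v] for v in row] for row in df.to_numpy()])

# Flatten the last two axes so each row's codes sit side by side.
matrix = codes.reshape(len(df), -1)

# Optional: wrap it back into a data frame with columns numbered from 1.
expanded = pd.DataFrame(matrix, columns=range(1, matrix.shape[1] + 1))
print(expanded)
```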
I wonder what the best way is to replace rows that do not satisfy a certain condition with zeros in sparse matrices. For example (I use plain arrays for illustration):
I want to replace every row whose sum is greater than 10 with a row of zeros.
a = np.array([[0,0,0,1,1],
[1,2,0,0,0],
[6,7,4,1,0], # sum > 10
[0,1,1,0,1],
[7,3,2,2,8], # sum > 10
[0,1,0,1,2]])
I want to replace a[2] and a[4] with zeros, so my output should look like this:
array([[0, 0, 0, 1, 1],
[1, 2, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 1, 1, 0, 1],
[0, 0, 0, 0, 0],
[0, 1, 0, 1, 2]])
This is fairly straightforward for dense matrices:
row_sum = a.sum(axis=1)
to_zero = row_sum > 10  # rows whose sum exceeds 10
a[to_zero] = np.zeros(a.shape[1])
However, when I try:
s = sparse.csr_matrix(a)
s[to_zero, :] = np.zeros(a.shape[1])
I get this error:
raise NotImplementedError("Fancy indexing in assignment not "
NotImplementedError: Fancy indexing in assignment not supported for csr matrices.
Hence, I need a different solution for sparse matrices. I came up with this:
def zero_out_unfit_rows(s_mat, limit_row_sum):
    row_sum = s_mat.sum(axis=1).T.A[0]
    to_keep = row_sum <= limit_row_sum
    to_keep = to_keep.astype('int8')
    temp_diag = get_sparse_diag_mat(to_keep)
    return temp_diag * s_mat

def get_sparse_diag_mat(my_diag):
    N = len(my_diag)
    my_diags = my_diag[np.newaxis, :]
    return sparse.dia_matrix((my_diags, [0]), shape=(N,N))
This relies on the fact that if we set the diagonal entries at indices 2 and 4 of the identity matrix to zero, then the corresponding rows of the pre-multiplied matrix are set to zero.
However, I feel that there is a better, more scipynic, solution. Is there a better solution?
Not sure if it is very scithonic, but a lot of the operations on sparse matrices are better done by accessing the guts directly. For your case, I personally would do:
import numpy as np
import scipy.sparse as sps
a = np.array([[0,0,0,1,1],
[1,2,0,0,0],
[6,7,4,1,0], # sum > 10
[0,1,1,0,1],
[7,3,2,2,8], # sum > 10
[0,1,0,1,2]])
sps_a = sps.csr_matrix(a)
# get sum of each row:
row_sum = np.add.reduceat(sps_a.data, sps_a.indptr[:-1])
# set values to zero
row_mask = row_sum > 10
nnz_per_row = np.diff(sps_a.indptr)
sps_a.data[np.repeat(row_mask, nnz_per_row)] = 0
# ask scipy.sparse to remove the zeroed entries
sps_a.eliminate_zeros()
>>> sps_a.toarray()
array([[0, 0, 0, 1, 1],
[1, 2, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 1, 1, 0, 1],
[0, 0, 0, 0, 0],
[0, 1, 0, 1, 2]])
>>> sps_a.nnz # it does remove the entries, not simply set them to zero
10
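One caveat with computing row sums through np.add.reduceat on indptr: for a row with no stored entries it does not yield 0 (it returns the element at that offset, and can raise an IndexError when a trailing empty row puts the offset past the end of data). The sketch below keeps the same masking idea but takes the row sums from the matrix's own sum method. zero_rows_above is a hypothetical helper name, and the function works on a copy instead of modifying its argument.

```python
import numpy as np
import scipy.sparse as sps

def zero_rows_above(s_mat, limit):
    """Return a CSR copy of s_mat with every row whose sum exceeds limit emptied."""
    out = sps.csr_matrix(s_mat, copy=True)
    row_sum = np.asarray(out.sum(axis=1)).ravel()   # 0 for empty rows
    row_mask = row_sum > limit
    nnz_per_row = np.diff(out.indptr)               # stored entries per row
    # Expand the per-row mask to one flag per stored entry, then zero those entries.
    out.data[np.repeat(row_mask, nnz_per_row)] = 0
    out.eliminate_zeros()                           # drop the zeroed entries from storage
    return out

a = np.array([[0, 0, 0],
              [6, 7, 0],    # sum > 10
              [1, 2, 0],
              [0, 0, 0]])   # trailing empty row
cleaned = zero_rows_above(sps.csr_matrix(a), 10)
print(cleaned.toarray())
print(cleaned.nnz)
```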