As in the title, if I have a matrix a
a = np.diag(np.arange(5))
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 2, 0, 0],
       [0, 0, 0, 3, 0],
       [0, 0, 0, 0, 4]])
How can I assign a new 4x4 (or even 3x4) matrix to a with the i-th row and i-th column left out? Let's say
b = array([[1, 1, 1, 1],
           [1, 1, 1, 1],
           [1, 1, 1, 1]])
I want to slice a, skipping the first and second rows and the second column, and assign b there. In R this would be something like
a[c(-1,-2), -2] = b
a =
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 1, 1, 1],
       [1, 0, 1, 1, 1],
       [1, 0, 1, 1, 1]])
But in Python, I tried something like
a[[2,3,4],:][:,[0,2,3,4]]
output:
array([[0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])
This operation won't allow me to assign a new matrix to slices of a.
How can I do that? I really appreciate any help you can provide.
P.S. I found that in this special case I can assign values by blocks. But what I actually want to ask is this: when we slice like a[2:5, [0,2,3,4]], we get a 3x4 matrix and can assign a new matrix to that position. What I want instead is to slice like a[[0,2,3,4], [0,2,3,4]] to get a 4x4 matrix (or another shape; the row and column indices may even be arbitrary) and assign a new matrix to that position, but NumPy gives me a 1d array.
newmatrix = a[[0, 1, 3, 4], :][:, [0, 1, 3, 4]]
Regarding setting the values of a matrix that forms part of a larger matrix, I think there is no direct option. But you can rebuild the original matrix around the one to be inserted:
before = np.array([[0, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0],
                   [0, 0, 2, 0, 0],
                   [0, 0, 0, 3, 0],
                   [0, 0, 0, 0, 4]])
insert_array = np.array([[1, 1, 1, 1],
                         [1, 1, 1, 1],
                         [1, 1, 1, 1]])
First two rows without the second column:
first_step = np.delete(before[:2, :], 1, 1)
or
first_step = before[:2, [0, 2, 3, 4]]
Appended to the insert matrix:
second_step = np.insert(insert_array, 0, first_step, axis=0)
Second column appended:
third_step = np.insert(second_step, 1, before[:, 1], axis=1)
Final matrix:
third_step =
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 1, 1, 1],
       [1, 0, 1, 1, 1],
       [1, 0, 1, 1, 1]])
I can't find a one-step solution to do that, but I think we can assign the matrix in blocks.
a[2:5, 0] = 1
a[2:5, 2:5] = 1
Then I can get what I want.
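A minimal sketch of a one-step alternative, assuming the row and column index lists from the question: NumPy's np.ix_ builds an open mesh from the index lists, so fancy indexing selects the whole rectangular block and the assignment can be done directly.
import numpy as np

a = np.diag(np.arange(5))
b = np.ones((3, 4), dtype=int)

# np.ix_ turns the two index lists into an open mesh, so this targets the
# full 3x4 block (rows 2-4, every column except column 1) instead of a 1d
# array of element pairs.
a[np.ix_([2, 3, 4], [0, 2, 3, 4])] = b
After this, a matches the expected result from the question. Because it is a single indexing operation rather than chained indexing, the assignment modifies a itself instead of a temporary copy.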
matrix = np.array([[0,0,0,0],[0,0,0,0],[0,0,0,0],[0,0,0,0]])
vector = np.array([0,0,0,0])
For vectors, you can edit every other element like so
vector[1::2] = 1
This gives
np.array([0,1,0,1])
However,
matrix[1::2] = 1
yields
np.array([[0,0,0,0],[1,1,1,1],[0,0,0,0],[1,1,1,1]])
I would like the output
np.array([[0,1,0,1],[0,1,0,1],[0,1,0,1],[0,1,0,1]])
There is a brute-force approach: take the shape of the array, flatten it, use [1::2], and reshape, but I'm sure there is a more elegant solution I am missing.
Any help would be appreciated.
You can do something similar with multidimensional indexing
>>> matrix
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
>>> matrix[:,1::2] = 1
>>> matrix
array([[0, 1, 0, 1],
       [0, 1, 0, 1],
       [0, 1, 0, 1],
       [0, 1, 0, 1]])
In what I am working on, I have two NumPy matrices of the same size, filled with 0s and 1s for simplicity (but they could be filled with any numbers). What I would like to know is how to extract, from these two matrices, the positions where a 1 appears in the same place in both.
For example, if I have the following two matrices and value
a = np.array([[0, 0, 0, 1, 0, 1],
              [1, 1, 0, 1, 1, 1],
              [1, 0, 1, 1, 0, 1],
              [1, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 0, 0]])
b = np.array([[0, 0, 0, 0, 0, 1],
              [0, 1, 0, 0, 0, 0],
              [0, 1, 0, 1, 0, 1],
              [0, 0, 0, 0, 0, 1],
              [1, 1, 1, 1, 1, 0]])
value = 1
then I would like a way to somehow get the information of all the locations where the value "1" exists in both matrices, i.e.:
result = [(0,5),(1,1),(2,3),(2,5),(4,2)]
I guess the result could be thought of as an intersection, but in my case the position matters, which is why I don't think np.intersect1d() would be much help. The actual matrices I am working with are on the order of 10,000 by 10,000, so this list would probably be a lot longer.
Thanks in advance for any help!
You could use numpy.argwhere:
import numpy as np
a = np.array([[0, 0, 0, 1, 0, 1],
              [1, 1, 0, 1, 1, 1],
              [1, 0, 1, 1, 0, 1],
              [1, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 0, 0]])
b = np.array([[0, 0, 0, 0, 0, 1],
              [0, 1, 0, 0, 0, 0],
              [0, 1, 0, 1, 0, 1],
              [0, 0, 0, 0, 0, 1],
              [1, 1, 1, 1, 1, 0]])
result = np.argwhere(a & b)
print(result)
Output
[[0 5]
[1 1]
[2 3]
[2 5]
[4 2]]
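If the matrices can hold arbitrary numbers rather than just 0s and 1s, a small variation (a sketch using the value variable from the question) compares each matrix against that value first:
# Positions where both matrices equal `value`; this works for arbitrary
# contents, whereas `a & b` only happens to work for 0/1 matrices.
result = np.argwhere((a == value) & (b == value))
Each comparison builds a boolean mask of the same shape as the inputs, so for 10,000 by 10,000 matrices this allocates a couple of temporary boolean arrays of that size before argwhere collects the matching positions.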
I am trying to make a special diagonal matrix that looks like this:
[[1, 1, 0, 0, 0, 0],
 [0, 0, 1, 1, 0, 0],
 [0, 0, 0, 0, 1, 1]]
It is slightly different from the question here: Make special diagonal matrix in Numpy
I tried tweaking the solution but couldn't quite get it.
Appreciate any advice on how to achieve this efficiently.
Not as elegant as the solutions in the comments, but:
a = 4      # number of rows
b = a * 2  # number of columns
np.array((([1] * 2 + [0] * b) * a)[:-b]).reshape(a, b)
array([[1, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 1]])
This works for any a.
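For comparison, a more compact sketch of the same pattern (one of the usual idioms for this kind of block structure, not taken from the linked question) uses np.kron to widen each diagonal 1 into a pair of ones:
import numpy as np

n = 3  # number of rows
# Kronecker product of the identity with a 1x2 row of ones: every 1 on the
# diagonal becomes the block [1, 1] and every 0 becomes [0, 0].
np.kron(np.eye(n, dtype=int), np.ones((1, 2), dtype=int))
array([[1, 1, 0, 0, 0, 0],
       [0, 0, 1, 1, 0, 0],
       [0, 0, 0, 0, 1, 1]])
np.repeat(np.eye(n, dtype=int), 2, axis=1) should produce the same matrix.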
I have a long nested list in which each embedded list can have a different length and different elements. I would like to flatten it in order to use each variable as a predictor in my model. The nested list looks like this:
[[u'Burgers',u'Bars'],[u'Local Services', u'Dry Cleaning & Laundry'],[u'Shopping', u'Eyewear & Opticians'],[u'Restaurants'],...]
What I would like to achieve is something I can use as a predictor in a model, especially with scikit-learn machine learning. The elements in the lists should be used to predict the variable of interest, which is the score. The desired result of the conversion would be something like
[[1, 1, 0, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0, 0], [0, 0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 0, 0, 1], ...]
Could someone give me a hand with this transformation? I am pretty stuck here. Thank you.
A pandas approach would be:
import pandas as pd
L = [['Burgers', 'Bars'], ['Local Services', 'Dry Cleaning & Laundry'], ['Shopping', 'Eyewear & Opticians'], ['Restaurants']]
ser = pd.Series([';'.join(i) for i in L]).str.get_dummies(';')
You can get the array with .values:
ser.values
array([[1, 1, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 1, 0, 0],
       [0, 0, 0, 1, 0, 0, 1],
       [0, 0, 0, 0, 0, 1, 0]], dtype=int64)
This assumes you don't have ; in those strings; if you do, pick another separator. For scikit-learn you'd normally use OneHotEncoder, but that also requires some preprocessing (label encoding first), so it seems easier with pandas.
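If you'd rather stay entirely inside scikit-learn, a sketch using MultiLabelBinarizer, which accepts a list of label lists directly, could look like this:
from sklearn.preprocessing import MultiLabelBinarizer

L = [['Burgers', 'Bars'],
     ['Local Services', 'Dry Cleaning & Laundry'],
     ['Shopping', 'Eyewear & Opticians'],
     ['Restaurants']]

mlb = MultiLabelBinarizer()
X = mlb.fit_transform(L)  # binary indicator matrix, one column per class
print(mlb.classes_)       # column order (sorted alphabetically)
print(X)
The columns come out in alphabetical order, so they should line up with the get_dummies output above.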
You can first flatten the list, then build the scores for the various classes from the flattened list with a nested list comprehension, assigning 1 to classes found in a given sublist (called category in the code below) and 0 otherwise.
Y is the original list of classes to be predicted:
from itertools import chain
Y = [[u'Burgers',u'Bars'],[u'Local Services', u'Dry Cleaning & Laundry'],[u'Shopping', u'Eyewear & Opticians'],[u'Restaurants']]
classes = list(chain.from_iterable(Y))
scores = [[1 if c in category else 0 for c in classes] for category in Y]
print(scores)
# [[1, 1, 0, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0, 0], [0, 0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 0, 0, 1]]