I want to make a 34x34 Matrix consisting of entirely zeroes and ones. I have an array that lists the coordinates where all of the ones should go but don't know how to use it. The array looks like this:
0 1 1
0 2 1
0 3 1
1 1 1
where the first number in each row is the x coordinate, the second number in each row is the y coordinate, and the third number is the desired value (always 1).
I tried to create a blank matrix using Matrix=numpy.zeros((34, 34)), but I don't know how to change the desired values all at once.
Any idea how to take a matrix and change multiple values at once?
This works:
import numpy as np

a = np.array([[0, 1, 1], [0, 2, 1], [0, 3, 1], [1, 1, 1]])
m = np.zeros((5, 5))
for i in range(len(a)):
    m[a[i][0], a[i][1]] = a[i][2]  # or = 1, if that's always the case
And the m matrix is:
array([[ 0., 1., 1., 1., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
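For a larger coordinate list you can also skip the loop entirely and use fancy indexing, which assigns all positions in one vectorized step. A minimal sketch, using the same array a as above (the result is identical to the loop version):

import numpy as np

a = np.array([[0, 1, 1], [0, 2, 1], [0, 3, 1], [1, 1, 1]])
m = np.zeros((5, 5))

# First column -> row indices, second column -> column indices,
# third column -> the values to write, all assigned at once.
m[a[:, 0], a[:, 1]] = a[:, 2]

For the original question this would be np.zeros((34, 34)) with the full coordinate array.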
I want to create m matrices, each of which is an n x 1 numpy array. Each matrix should have only two nonzero entries, in two consecutive rows; all other rows should be 0. That is, matrix number m=1 should have entries m[0,:]=m[1,:]=1 with the rest 0, and similarly the last matrix m=m should have entries m[n-1,:]=m[n,:]=1 with the rest 0. So between consecutive matrices, the nonzero elements shift down by two rows. Finally, I would like to store them in a dictionary or in a file.
What would be a neat way to do this?
Is this what you're looking for?
In [2]: num_rows = 10 # should be divisible by 2
In [3]: np.repeat(np.eye(num_rows // 2), 2, axis=0)
Out[3]:
array([[1., 0., 0., 0., 0.],
[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1.]])
In terms of storage in a file, you can use np.save and np.load.
Note that the default data type for np.eye will be float64. If you expect your values to be small when you begin integrating or whatever you're planning on doing with your state vectors, I'd recommend setting the data type appropriately (like np.uint8 for positive integers < 256 for example).
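As a rough sketch combining both suggestions (the file name here is just an illustration, not from the original post):

import numpy as np

num_rows = 10  # should be divisible by 2

# Build the matrix with a small integer dtype instead of the default float64.
mats = np.repeat(np.eye(num_rows // 2, dtype=np.uint8), 2, axis=0)

# Round-trip through disk with np.save / np.load.
np.save('state_vectors.npy', mats)
loaded = np.load('state_vectors.npy')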
I have a 2-dimensional array with np.shape(input)=(a,b) that looks like
input=array[array_1[0,0,0,1,0,1,2,0,3,3,2,...,entry_b],...array_a[1,0,0,1,2,2,0,3,1,3,3,...,entry_b]]
Now I want to create an array with np.shape(output)=(a,b,b) in which every pair of entries that had the same value in the input gets the value 1, and 0 otherwise.
For example:
input=[[1,0,0,0,1,2]]
output=[array([[1., 0., 0., 0., 1., 0.],
[0., 1., 1., 1., 0., 0.],
[0., 1., 1., 1., 0., 0.],
[0., 1., 1., 1., 0., 0.],
[1., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 1.]])]
My code so far looks like:
def get_matrix(svdata, padding_size):
    List = []
    for k in svdata:
        matrix = np.zeros((padding_size, padding_size))
        for l in range(padding_size):
            for m in range(padding_size):
                if k[l] == k[m]:
                    matrix[l][m] = 1
        List.append(matrix)
    return List
But it takes 2:30 min for an input array of shape (2000, 256). How can I make this more efficient using built-in numpy operations?
res = input[:, :, None] == input[:, None, :]
should give a boolean (a, b, b) array, and
res = res.astype(int)
converts it to a 0/1 array.
You're trying to create the array y where y[i,j,k] is 1 if input[i,j] == input[i, k]. At least that's what I think you're trying to do.
So y = input[:,:,None] == input[:,None,:] will give you a boolean array. You can then convert that to np.dtype('float64') using astype(...) if you want.
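As a quick sanity check against the example in the question (a small sketch; input is renamed x here to avoid shadowing the Python builtin):

import numpy as np

x = np.array([[1, 0, 0, 0, 1, 2]])  # shape (1, 6)

# Compare every entry of each row with every other entry of the same row.
out = (x[:, :, None] == x[:, None, :]).astype(float)  # shape (1, 6, 6)

out[0] then matches the 6 x 6 block shown in the question, and the same one-liner replaces the double loop in get_matrix for the (2000, 256) input.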
I'm having trouble understanding how np.fill_diagonal is implemented here.
I found a post here explaining a way to fill the sub and super diagonals with certain values but I don't really understand the arguments of the function. Here is the code:
a = np.zeros((4, 4))
b = np.ones(3)
np.fill_diagonal(a[1:], b)
np.fill_diagonal(a[:,1:], -b)
I don't understand how fill_diagonal is used here. I thought that the second argument had to be a scalar. Also, I don't understand what is happening with the slices of 'a'.
"For an array a with a.ndim >= 2, the diagonal is the list of locations with indices a[i, ..., i] all identical. This function modifies the input array in-place, it does not return a value." (Source) The documentation for this method says b should be a scalar, however if b is an array of length equal to the length of the diagonal of the input array, then it will fill the values of b in for the diagonal.
The key is that the number of elements in b equals the number of elements along the diagonal of each sub-array of a: the nth diagonal value of the sub-array is filled in with the nth value of b.
The first sub-array of a that is modified is all but the first row of a (3 rows, 4 columns), so it has 3 diagonal elements.
The second sub-array is the last three columns of a (a 4 x 3 matrix), which also has only 3 diagonal elements.
Edit: thanks to G. Anderson for the comment. I'm editing it into the post to draw attention to it:
"It's worth noting that b doesn't have to have the same length as the diagonal it's filling. if b is longer, then n elements of the diagonal will be filled with the first n elements of b. If n is shorter than the diagonal, then b will be repeated to fill the diagonal"
Your examples involve filling slices, views, of the original array.
In [79]: a = np.zeros((4, 4))
...: b = np.arange(1,5)
In [80]:
The simple case - filling the whole array:
In [80]: np.fill_diagonal(a,b)
In [81]: a
Out[81]:
array([[1., 0., 0., 0.],
[0., 2., 0., 0.],
[0., 0., 3., 0.],
[0., 0., 0., 4.]])
fill_diagonal takes an array to be filled and the values to put in the diagonal. The docs do say scalar, but that's overly restrictive; as shown here, it can be a 1d array of the right size.
In [82]: a = np.zeros((4, 4))
...: b = np.arange(1,4)
filling the last 3 rows:
In [83]: a[1:]
Out[83]:
array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
In [84]: np.fill_diagonal(a[1:],b)
In [85]: a
Out[85]:
array([[0., 0., 0., 0.],
[1., 0., 0., 0.],
[0., 2., 0., 0.],
[0., 0., 3., 0.]])
In [86]: a = np.zeros((4, 4))
...: b = np.arange(1,4)
filling the last 3 columns:
In [87]: a[:,1:]
Out[87]:
array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
In [88]: np.fill_diagonal(a[:,1:],b)
In [89]: a
Out[89]:
array([[0., 1., 0., 0.],
[0., 0., 2., 0.],
[0., 0., 0., 3.],
[0., 0., 0., 0.]])
The key is that fill_diagonal works in-place, and the a[1:] and a[:,1:] produce views of a.
Look at the slice of a after filling:
In [90]: a[:,1:]
Out[90]:
array([[1., 0., 0.],
[0., 2., 0.],
[0., 0., 3.],
[0., 0., 0.]])
The docs demonstrate the use with np.fliplr(a). That too, creates a view which can be modified in place.
The actual write is done with:
a.flat[:end:step] = val
where end and step have been calculated from the dimensions. For example, to fill the diagonal of a 3x3 array, we write to every 4th element of the flattened array.
In [96]: a[:,1:].ravel()
Out[96]: array([1., 0., 0., 0., 2., 0., 0., 0., 3., 0., 0., 0.])
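To make the flat-index write concrete, here is a small sketch (my own, following the description above) for a 3x3 array:

import numpy as np

a = np.zeros((3, 3))

# For a 3x3 array the diagonal sits at flat positions 0, 4, 8,
# i.e. every (n_cols + 1)-th element of the flattened array.
step = a.shape[1] + 1
a.flat[::step] = [1, 2, 3]

print(a)
# [[1. 0. 0.]
#  [0. 2. 0.]
#  [0. 0. 3.]]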
I have the following list of indices, [2 4 3 4], which correspond to my target indices. I'm creating a matrix of zeros with the following line of code: targets = np.zeros((features.shape[0], 5)). I'm wondering if it's possible to index in such a way that I could update those specific positions all at once and set them to 1, without a for loop. Ideally the matrix would look like:
([0,0,1,0,0], [0,0,0,0,1], [0,0,0,1,0], [0,0,0,0,1])
I believe you can do something like this:
targets = np.zeros((4, 5))
ind = [2, 4, 3, 4]
targets[np.arange(0, 4), ind] = 1
Here is the result:
array([[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 1.]])
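If the number of rows isn't hard-coded, the same fancy indexing generalizes; a small sketch along the lines of the snippet in the question:

import numpy as np

ind = np.array([2, 4, 3, 4])

# One row per index, five columns as in the question.
targets = np.zeros((len(ind), 5))

# Pair each row number with its target column and set those cells to 1.
targets[np.arange(targets.shape[0]), ind] = 1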
Say I have a dataframe like the following:
A B
0 bar one
1 bar three
2 flux six
3 bar three
4 foo five
5 flux one
6 foo two
I would like to apply dummy-coding contrasting on it so that I get:
A B
0 0 0
1 0 2
2 1 1
3 0 2
4 2 3
5 1 0
6 2 4
(i.e. mapping every unique value to a different integer, per column).
I have tried using scikit-learn's DictVectorizer, but I get:
> from sklearn.feature_extraction import DictVectorizer as DV
> vectorizer = DV(sparse=False)
> dict_to_vectorize = df.T.to_dict().values()
> df_vec = vectorizer.fit_transform(dict_to_vectorize)
> df_vec
array([[ 1., 0., 0., 0., 1., 0., 0., 0.],
[ 1., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 1., 0., 0., 0., 1., 0., 0.],
[ 1., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 0., 1., 1., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0., 1.]])
This is because scikit-learn's DictVectorizer is designed to output one-of-K encoding. What I want is a simple-encoding instead (one column per variable).
How can I do this with scikit-learn and/or pandas? Aside from that, are there any other Python packages that help with general contrasting methods?
You could use pd.factorize:
In [124]: df.apply(lambda x: pd.factorize(x)[0])
Out[124]:
A B
0 0 0
1 0 1
2 1 2
3 0 1
4 2 3
5 1 0
6 2 4
The patsy package provides all the contrasts you'd need (and the ability to make more). [1] AFAIK, statsmodels is the only stats package that currently uses patsy's formula framework. [2, 3].
[1] https://patsy.readthedocs.org/en/latest/API-reference.html#handling-categorical-data
[2] http://statsmodels.sourceforge.net/devel/contrasts.html
[3] http://statsmodels.sourceforge.net/devel/example_formulas.html
Dummy encoding is what you get when you call DictVectorizer. The integer encoding you want is actually different:
sklearn.preprocessing.LabelBinarizer or DictVectorizer gives dummy encoding (as pandas.get_dummies does)
sklearn.preprocessing.LabelEncoder gives integer categorical encoding (as pandas.factorize does)
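A brief sketch of that mapping, using the example dataframe from the question (the exact integers may differ from the desired output, since LabelEncoder sorts labels while factorize uses order of appearance):

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'A': ['bar', 'bar', 'flux', 'bar', 'foo', 'flux', 'foo'],
                   'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two']})

# Integer categorical encoding, one column at a time.
encoded = df.apply(lambda col: LabelEncoder().fit_transform(col))

# The pandas equivalent mentioned above.
factorized = df.apply(lambda col: pd.factorize(col)[0])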