How to convert values to their index - python

I have a numpy array containing 1's and 0's:
a = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 1, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 1, 1, 0, 0, 1, 1, 0, 0, 0]])
I'd like to convert each 1 to the index in the subarray that it's occuring at, to get this:
e = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 5, 0, 7, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 2, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 0, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 6, 0, 8, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
[0, 1, 2, 0, 0, 5, 6, 0, 0, 0]])
So far what I've done is multiply the array by a range:
a * np.arange(a.shape[0])
which is good, but I'm wondering if there's a better, simpler way to do it, like a single function call?

This modifies a in place:
In [4]: i, j = np.nonzero(a)
In [5]: a[i, j] = j
In [6]: a
Out[6]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 5, 0, 7, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 2, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 0, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 6, 0, 8, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
[0, 1, 2, 0, 0, 5, 6, 0, 0, 0]])
Make a copy if you don't want modify a in place.
Or, this creates a new array (in one line):
In [8]: np.arange(a.shape[1])[a]
Out[8]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 5, 0, 7, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 2, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 0, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 6, 0, 8, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
[0, 1, 2, 0, 0, 5, 6, 0, 0, 0]])

Your approach is a fast as it gets but it uses the wrong dimension for the multiplication (it would fait if the matrix wasn't square).
Multiply the matrix by a range of column indexes:
import numpy as np
a = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0]])
e = a * np.arange(a.shape[1])
print(e)
[[ 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 5 0 7 0 0 0]
[ 0 0 2 0 0 0 0 0 0 0 0]
[ 0 1 2 0 0 0 6 0 0 0 0]
[ 0 0 2 0 0 5 0 0 0 9 0]
[ 0 0 0 0 0 5 0 0 0 9 0]
[ 0 0 2 0 0 0 6 0 0 0 10]
[ 0 0 0 3 0 0 6 0 8 0 0]
[ 0 0 0 0 0 0 0 0 0 9 0]
[ 0 1 2 0 0 5 6 0 0 0 0]]

I benchmarked the obligatory np.einsum approach, which was ~1.29x slower for larger arrays (100_000, 1000) than the corrected original solution. The inplace solution was ~8x slower than np.einsum.
np.einsum('ij,j->ij', a, np.arange(a.shape[1]))

Related

Store python array in each entry of a column

I have got an array 'mutlilabel' which looks like this:
[[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0, 0],
...
[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0]]
and want to store each of those arrays in my target variable as I am facing a multi-label classification task. How can I achieve that? My code:
pd.DataFrame(multilabel)
Outputs multiple columns:
0 1 2 3 4 5 6 7
0 0 0 0 0 1 0 0 0
1 1 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0
Thanks in advance!
df = pd.DataFrame(list(multilabel))
list_column = df.apply(lambda row: row.values, axis=1)
pd.DataFrame(list_column, columns=['list_column'])
Result df:
Have you consider using the following trick?
import pandas as pd
arr = [[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0]]
pd.DataFrame([arr]).T
Output
0
0 [0, 0, 0, 1, 0, 0, 0, 0]
1 [1, 0, 0, 0, 0, 0, 0, 0]
2 [1, 0, 0, 1, 0, 0, 0, 0]
3 [0, 0, 0, 1, 0, 0, 0, 0]
4 [1, 0, 0, 0, 0, 0, 0, 0]
EDIT
In case you are using numpy arrays you can use the following
import numpy as np
pd.DataFrame(np.array(arr))\
.apply(lambda x: np.array(x), axis=1)
So, the real question is why... it doesn't seem like the most useful data structure.
That said, the one-dimensional data type in pandas is the Series:
>>> pd.Series(multilabel)
0 [0, 0, 0, 1, 0, 0, 0, 0]
1 [1, 0, 0, 0, 0, 0, 0, 0]
2 [1, 0, 0, 1, 0, 0, 0, 0]
3 [0, 0, 0, 1, 0, 0, 0, 0]
4 [1, 0, 0, 0, 0, 0, 0, 0]
dtype: object
You can then convert it further into a DataFrame:
>>> pd.DataFrame(pd.Series(multilabel))
0
0 [0, 0, 0, 1, 0, 0, 0, 0]
1 [1, 0, 0, 0, 0, 0, 0, 0]
2 [1, 0, 0, 1, 0, 0, 0, 0]
3 [0, 0, 0, 1, 0, 0, 0, 0]
4 [1, 0, 0, 0, 0, 0, 0, 0]
Edit: Per further discussion, this works if multilabel is a nested Python list, but not if it's a NumPy array.

Looping and counting python 2d arrays

I have array like this:
[
[1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0]
]
I want to count value every 3 array, so the result i expected is:
[
[3, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 3, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 2, 1, 0, 0, 0, 0, 0]
]
I have no idea to loop it.
UPDATE..
this problem was solved. Thank you. I try Shijith's code this
if len(arr)%3==0:
print([[sum(y) for y in zip(arr[x],arr[x+1],arr[x+2])] for x in range(0, len(arr),3)])
Try:
result = []
for i in range(int(len(a)/3)):
result.append(np.sum(a[i*3:i*3+3], axis=0))
[array([3, 0, 0, 0, 0, 0, 0, 0, 0]),
array([0, 3, 0, 0, 0, 0, 0, 0, 0]),
array([0, 0, 2, 1, 0, 0, 0, 0, 0])]
you can use numpy.sum() along axis=0 , for every three rows in your array.
import numpy as np
if len(arr)%3==0:
print(np.array([np.sum(arr[x:x+3], axis = 0) for x in range(0, len(arr),3) ]))
[[3 0 0 0 0 0 0 0 0]
[0 3 0 0 0 0 0 0 0]
[0 0 2 1 0 0 0 0 0]]
or use simple list comprehension,
if len(arr)%3==0:
print([[sum(y) for y in zip(arr[x],arr[x+1],arr[x+2])] for x in range(0, len(arr),3)])
[[3, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 3, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 2, 1, 0, 0, 0, 0, 0]]
You can do this in pandas like this ( It splits the rows into 3, then takes the sum of each set of rows) :
import pandas as pd
import numpy as np
df=pd.DataFrame([
[1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0]
])
df.groupby(np.arange(len(df))//3).sum().to_numpy().tolist()
output:
[[3, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 3, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 2, 1, 0, 0, 0, 0, 0]]
For a pure non import way:
combine=[]
for x in range(3):
combine.append(list(sum((a[x*3:x*3+3]))))
list(combine)
output:
[[3, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 3, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 2, 1, 0, 0, 0, 0, 0]]
result = [[sum(three) for three in
zip(arr[first], arr[first + 1], arr[first + 2])]
for first in range(0, len(array)-len(array)%3, 3)]
print(result)
Output
[[3, 0, 0, 0, 0, 0, 0, 0, 0], [0, 3, 0, 0, 0, 0, 0, 0, 0], [0, 0, 2, 1, 0, 0, 0, 0, 0]]

Flatten arrays into a list of lists

I have this data structure:
[array([[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1]]), array([[0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
[1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0]]), array([[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], etc....
I want to flatten this into a list of lists something like:
[[0 1 0 1 1 1 0 5 1 0 2 1]
[1 6 1 0 0 1 1 1 2 0 2 0]
[2 0 5 0 5 2 2 0 6 3 2 2]
[1 0 1 1 1 1 0 2 0 0 0 1]]
How do we do this in python?
I believe you're looking for vstack:
>>> np.vstack(l)
array([[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1],
[0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
[1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Note that this is equivalent to:
>>> np.concatenate(x, axis=0)
array([[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1],
[0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
[1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
To convert to list, use tolist:
>>> np.vstack(l).tolist()
[[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1], [0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0], [1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
# or
>>> np.concatenate(x, axis=0).tolist()
[[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1], [0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0], [1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
Use flatten:
print([i.flatten() for i in l])
Or:
print(list(map(lambda x: x.flatten(),l)))
Both output:
[array([0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1]), array([0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1,
0]), array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0])]

Tensorflow Initialize list with 0s and 1s similar to tf.one_hot

I currently have a list of values over a time sequence say the values are [1, 3, 5, 7, 3]. Currently I am using tf.one_hot to get a one hot vector/tensor representative for each value within the list.
1 = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
3 = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
5 = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
7 = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
3 = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
In Tensorflow is there a way/function that would allow me to do something similar but initialize all values from 0 to the value with 1s?
Desired Result:
1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
3 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
5 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
7 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
3 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
You are looking for tf.sequence_mask.
TensorFlow API

Github-API: How do I plot the number of commits to a repo?

I am a Python newbie, and I am trying to plot the number of commits, per day of the week.
I have the output as a pandas dataframe. However, I am unable to figure out how to plot this data using matplotlib.
Any help is greatly appreciated!
import requests
import json
import pandas as pd
r = requests.get('https://api.github.com/repos/thisismetis/dsp/stats/commit_activity')
raw = r.text
results = json.loads(raw)
print pd.DataFrame(results)
Results:
days total week
0 [0, 0, 0, 0, 0, 0, 0] 0 1431216000
1 [0, 0, 0, 0, 0, 8, 0] 8 1431820800
2 [0, 19, 1, 4, 18, 8, 0] 50 1432425600
3 [0, 3, 23, 1, 0, 0, 0] 27 1433030400
4 [1, 0, 0, 0, 1, 0, 0] 2 1433635200
5 [0, 0, 0, 0, 0, 0, 0] 0 1434240000
6 [0, 2, 0, 0, 0, 0, 0] 2 1434844800
7 [0, 0, 0, 0, 0, 0, 0] 0 1435449600
8 [0, 0, 0, 0, 0, 0, 0] 0 1436054400
9 [0, 0, 0, 0, 0, 0, 0] 0 1436659200
10 [0, 0, 8, 0, 3, 0, 0] 11 1437264000
11 [0, 3, 36, 0, 1, 9, 0] 49 1437868800
12 [0, 2, 2, 2, 5, 1, 0] 12 1438473600
13 [0, 0, 0, 0, 0, 0, 0] 0 1439078400
14 [0, 2, 0, 0, 0, 0, 0] 2 1439683200
15 [0, 0, 0, 0, 0, 0, 0] 0 1440288000
16 [0, 0, 0, 0, 0, 0, 0] 0 1440892800
17 [0, 0, 0, 0, 0, 0, 0] 0 1441497600
18 [0, 0, 0, 0, 0, 3, 0] 3 1442102400
19 [0, 0, 0, 0, 0, 0, 0] 0 1442707200
20 [0, 0, 0, 0, 0, 0, 0] 0 1443312000
21 [0, 0, 0, 0, 0, 0, 0] 0 1443916800
22 [0, 0, 0, 0, 0, 0, 0] 0 1444521600
23 [0, 0, 10, 0, 0, 0, 0] 10 1445126400
24 [0, 0, 0, 0, 0, 0, 0] 0 1445731200
25 [1, 0, 0, 0, 0, 0, 0] 1 1446336000
26 [0, 0, 0, 0, 4, 3, 0] 7 1446940800
27 [0, 0, 0, 0, 0, 0, 0] 0 1447545600
28 [0, 0, 0, 0, 0, 0, 0] 0 1448150400
29 [0, 0, 0, 0, 0, 0, 0] 0 1448755200
30 [0, 0, 0, 0, 0, 0, 0] 0 1449360000
31 [0, 0, 0, 0, 0, 0, 0] 0 1449964800
32 [0, 0, 0, 0, 0, 0, 0] 0 1450569600
33 [0, 0, 0, 0, 0, 0, 1] 1 1451174400
34 [0, 0, 0, 0, 0, 0, 0] 0 1451779200
35 [0, 0, 0, 0, 0, 0, 0] 0 1452384000
36 [0, 0, 0, 0, 0, 0, 0] 0 1452988800
37 [0, 0, 0, 0, 0, 0, 0] 0 1453593600
38 [0, 0, 0, 0, 0, 0, 0] 0 1454198400
39 [0, 0, 5, 2, 0, 0, 0] 7 1454803200
40 [0, 0, 25, 2, 0, 0, 0] 27 1455408000
41 [1, 10, 0, 0, 3, 0, 0] 14 1456012800
42 [0, 0, 0, 0, 0, 0, 0] 0 1456617600
43 [0, 0, 0, 0, 0, 0, 0] 0 1457222400
44 [0, 0, 0, 2, 1, 0, 0] 3 1457827200
45 [0, 0, 0, 0, 0, 0, 0] 0 1458432000
46 [0, 0, 0, 0, 0, 0, 0] 0 1459036800
47 [0, 0, 0, 0, 0, 0, 0] 0 1459641600
48 [0, 0, 0, 0, 0, 0, 0] 0 1460246400
49 [0, 0, 0, 0, 0, 0, 0] 0 1460851200
50 [0, 0, 0, 0, 0, 0, 0] 0 1461456000
51 [0, 0, 0, 0, 0, 0, 0] 0 1462060800
you can do it this way:
df['date'] = pd.to_datetime(df.week, unit='s')
df['week_no'] = df.apply(lambda x: '{:d}-{:02d}'.format(x['date'].year, x['date'].weekofyear), axis=1)
df.set_index('week_no')['total'].plot.bar()

Categories