I have a list of numpy arrays (one-hot representation) like the example below, and I want to count the number of occurrences of each one-hot code.
[0 0 1 0 0 0 0 0 0 0]
[0 0 1 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
[0 0 0 0 1 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
Edit:
Expected output:
[1 0 0 0 0 0 0 0 0 0] ==> 1 occurrence
[0 0 1 0 0 0 0 0 0 0] ==> 2 occurrences
[0 1 0 0 0 0 0 0 0 0] ==> 3 occurrences
[0 0 0 0 0 1 0 0 0 0] ==> 1 occurrence
[0 0 0 0 1 0 0 0 0 0] ==> 2 occurrences
[0 0 0 0 0 0 0 0 0 1] ==> 2 occurrences
You can get a positional count of occurrences:
[1 3 2 1 2 1 0 0 0 2]
where each entry is the number of rows that are hot at that position, via a simple column-wise sum using ndarray.sum():
import numpy
data = numpy.array([
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
])
print(numpy.ndarray.sum(data, axis=0))
or more compactly as just:
print(data.sum(axis=0))
both should give you:
[1 3 2 1 2 1 0 0 0 2]
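If you instead want output in the asker's "row ==> count" form, one option (my suggestion, not part of the original answer) is np.unique with axis=0, which groups identical rows and counts them:

```python
import numpy as np

data = np.array([
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
])

# axis=0 treats each row as one item; return_counts gives how many
# times each distinct row appears (rows come back lexicographically sorted).
rows, counts = np.unique(data, axis=0, return_counts=True)
for row, count in zip(rows, counts):
    print(row, '==>', count, 'occurrence' if count == 1 else 'occurrences')
```

Note this only lists rows that actually occur, unlike the column sum, which also reports the zero-count positions.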
Using the fact that each row is one-hot, you can do the following:
import numpy as np

temp = np.array([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])
Converting the one-hot rows to indices can be done as follows:
temp2 = np.argmax(temp, axis=1) # array([2, 2, 1, 5, 1, 4, 9, 4, 0, 3, 1, 9])
and then the counting of occurrences can be done using np.histogram. We know that you have 10 possible values, so we use 10 bins as follows:
temp3 = np.histogram(temp2, bins=10, range=(-0.5,9.5))
np.histogram returns a tuple where index [0] holds the histogram values and index [1] holds the bin edges. In your case:
(array([1, 3, 2, 1, 2, 1, 0, 0, 0, 2]),
array([-0.5, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]))
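A simpler alternative to np.histogram here (my suggestion, not from the original answer) is np.bincount, which counts non-negative integers directly; minlength=10 guarantees all 10 positions appear even if some never occur:

```python
import numpy as np

# The argmax indices computed above from the one-hot rows.
temp2 = np.array([2, 2, 1, 5, 1, 4, 9, 4, 0, 3, 1, 9])

# bincount counts how many times each integer 0..9 appears;
# minlength pads positions that never occur with zeros.
counts = np.bincount(temp2, minlength=10)
print(counts)  # [1 3 2 1 2 1 0 0 0 2]
```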
I have a numpy array containing 1's and 0's:
a = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 1, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 1, 1, 0, 0, 1, 1, 0, 0, 0]])
I'd like to convert each 1 to the index in the subarray at which it occurs, to get this:
e = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 5, 0, 7, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 2, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 0, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 6, 0, 8, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
[0, 1, 2, 0, 0, 5, 6, 0, 0, 0]])
So far what I've done is multiply the array by a range:
a * np.arange(a.shape[0])
which is good, but I'm wondering if there's a better, simpler way to do it, like a single function call?
This modifies a in place:
In [4]: i, j = np.nonzero(a)
In [5]: a[i, j] = j
In [6]: a
Out[6]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 5, 0, 7, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 2, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 0, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 6, 0, 8, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
[0, 1, 2, 0, 0, 5, 6, 0, 0, 0]])
Make a copy if you don't want to modify a in place.
Or, this creates a new array (in one line):
In [8]: np.where(a, np.arange(a.shape[1]), 0)
Out[8]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 5, 0, 7, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 2, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 0, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 6, 0, 8, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
[0, 1, 2, 0, 0, 5, 6, 0, 0, 0]])
Your approach is as fast as it gets, but it uses the wrong dimension for the multiplication (it would fail if the matrix weren't square).
Multiply the matrix by a range of column indexes:
import numpy as np
a = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0]])
e = a * np.arange(a.shape[1])
print(e)
[[ 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 5 0 7 0 0 0]
[ 0 0 2 0 0 0 0 0 0 0 0]
[ 0 1 2 0 0 0 6 0 0 0 0]
[ 0 0 2 0 0 5 0 0 0 9 0]
[ 0 0 0 0 0 5 0 0 0 9 0]
[ 0 0 2 0 0 0 6 0 0 0 10]
[ 0 0 0 3 0 0 6 0 8 0 0]
[ 0 0 0 0 0 0 0 0 0 9 0]
[ 0 1 2 0 0 5 6 0 0 0 0]]
I benchmarked the obligatory np.einsum approach, which was ~1.29x slower for larger arrays (100_000, 1000) than the corrected original solution. The in-place solution was ~8x slower than np.einsum.
np.einsum('ij,j->ij', a, np.arange(a.shape[1]))
I have an array 'multilabel' which looks like this:
[[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0, 0],
...
[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0]]
and want to store each of those arrays in my target variable as I am facing a multi-label classification task. How can I achieve that? My code:
pd.DataFrame(multilabel)
Outputs multiple columns:
0 1 2 3 4 5 6 7
0 0 0 0 0 1 0 0 0
1 1 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0
Thanks in advance!
df = pd.DataFrame(list(multilabel))
list_column = df.apply(lambda row: row.values, axis=1)
pd.DataFrame(list_column, columns=['list_column'])
Result: a DataFrame with a single list_column column, where each cell holds one row's array.
Have you considered using the following trick?
import pandas as pd
arr = [[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0]]
pd.DataFrame([arr]).T
Output
0
0 [0, 0, 0, 1, 0, 0, 0, 0]
1 [1, 0, 0, 0, 0, 0, 0, 0]
2 [1, 0, 0, 1, 0, 0, 0, 0]
3 [0, 0, 0, 1, 0, 0, 0, 0]
4 [1, 0, 0, 0, 0, 0, 0, 0]
EDIT
In case you are using NumPy arrays, you can use the following:
import numpy as np
pd.DataFrame(np.array(arr))\
.apply(lambda x: np.array(x), axis=1)
So, the real question is why... it doesn't seem like the most useful data structure.
That said, the one-dimensional data type in pandas is the Series:
>>> pd.Series(multilabel)
0 [0, 0, 0, 1, 0, 0, 0, 0]
1 [1, 0, 0, 0, 0, 0, 0, 0]
2 [1, 0, 0, 1, 0, 0, 0, 0]
3 [0, 0, 0, 1, 0, 0, 0, 0]
4 [1, 0, 0, 0, 0, 0, 0, 0]
dtype: object
You can then convert it further into a DataFrame:
>>> pd.DataFrame(pd.Series(multilabel))
0
0 [0, 0, 0, 1, 0, 0, 0, 0]
1 [1, 0, 0, 0, 0, 0, 0, 0]
2 [1, 0, 0, 1, 0, 0, 0, 0]
3 [0, 0, 0, 1, 0, 0, 0, 0]
4 [1, 0, 0, 0, 0, 0, 0, 0]
Edit: Per further discussion, this works if multilabel is a nested Python list, but not if it's a NumPy array.
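For the NumPy-array case, one workaround (a sketch, under the assumption that each row should become one cell) is to wrap the array in list() first, which splits it along the first axis into a list of row arrays:

```python
import numpy as np
import pandas as pd

multilabel = np.array([[0, 0, 0, 1, 0, 0, 0, 0],
                       [1, 0, 0, 0, 0, 0, 0, 0],
                       [1, 0, 0, 1, 0, 0, 0, 0]])

# list() iterates over the first axis, yielding one 1-D array per row,
# so the Series holds whole arrays instead of being expanded into columns.
s = pd.Series(list(multilabel))
print(s)
```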
I would like to convert a sentence to an array of one-hot vectors.
These vectors would be the one-hot representation of the alphabet.
It would look like the following:
"hello" # h=7, e=4, l=11, o=14
would become
[[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
Unfortunately, OneHotEncoder from sklearn does not accept strings as input.
Just compare the letters in your passed string to a given alphabet:
import string

def string_vectorizer(strng, alphabet=string.ascii_lowercase):
    vector = [[0 if char != letter else 1 for char in alphabet]
              for letter in strng]
    return vector
Note that with a custom alphabet (e.g. "defbcazk"), the columns will be ordered as the elements appear in that alphabet.
The output of string_vectorizer('hello'):
[[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
This is a common task in Recurrent Neural Networks and there's a specific function just for this purpose in tensorflow, if you'd like to use it.
alphabets = {'a' : 0, 'b': 1, 'c':2, 'd':3, 'e':4, 'f':5, 'g':6, 'h':7, 'i':8, 'j':9, 'k':10, 'l':11, 'm':12, 'n':13, 'o':14}
idxs = [alphabets[ch] for ch in 'hello']
print(idxs)
# [7, 4, 11, 11, 14]
# @divakar's approach
idxs = np.fromstring("hello",dtype=np.uint8)-97
# or for more clear understanding, use:
idxs = np.fromstring('hello', dtype=np.uint8) - ord('a')
one_hot = tf.one_hot(idxs, 26, dtype=tf.uint8)
sess = tf.InteractiveSession()
In [15]: one_hot.eval()
Out[15]:
array([[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
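Without TensorFlow, the same matrix can be built by indexing an identity matrix, a common NumPy idiom. This sketch also uses np.frombuffer, the modern replacement for the deprecated np.fromstring used above:

```python
import numpy as np

# Byte values of 'hello' minus ord('a') give alphabet indices 7, 4, 11, 11, 14.
idxs = np.frombuffer(b"hello", dtype=np.uint8) - ord('a')

# Row i of the 26x26 identity matrix is the one-hot vector for letter i,
# so fancy-indexing with idxs stacks the matching rows.
one_hot = np.eye(26, dtype=int)[idxs]
print(one_hot.shape)  # (5, 26)
```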
With pandas, you can use pd.get_dummies by passing a categorical Series:
import pandas as pd
import string

s = 'hello'
low = string.ascii_lowercase
pd.get_dummies(pd.Series(pd.Categorical(list(s), categories=list(low))))
Out:
a b c d e f g h i j ... q r s t u v w x y z
0 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
[5 rows x 26 columns]
Here's a vectorized approach using NumPy broadcasting to give us a (N,26) shaped array -
ints = np.fromstring("hello",dtype=np.uint8)-97
out = (ints[:,None] == np.arange(26)).astype(int)
If you are looking for performance, I would suggest using an initialized array and then assign -
out = np.zeros((len(ints),26),dtype=int)
out[np.arange(len(ints)), ints] = 1
Sample run -
In [153]: ints = np.fromstring("hello",dtype=np.uint8)-97
In [154]: ints
Out[154]: array([ 7, 4, 11, 11, 14], dtype=uint8)
In [155]: out = (ints[:,None] == np.arange(26)).astype(int)
In [156]: print out
[[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]]
You asked about "sentences" but your example provided only a single word, so I'm not sure what you wanted to do about spaces. But as far as single words are concerned, your example could be implemented with:
def onehot(ltr):
return [1 if i==ord(ltr) else 0 for i in range(97,123)]
def onehotvec(s):
return [onehot(c) for c in list(s.lower())]
onehotvec("hello")
[[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
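For actual sentences, one option (a sketch only, since the question left space-handling unspecified) is to drop non-alphabetic characters before encoding:

```python
def onehot(ltr):
    # One-hot over a..z (codes 97..122).
    return [1 if i == ord(ltr) else 0 for i in range(97, 123)]

def onehotvec_sentence(s):
    # Encode only a-z characters; spaces and punctuation are skipped.
    return [onehot(c) for c in s.lower() if c.isalpha()]

vecs = onehotvec_sentence("Hi there!")
print(len(vecs))  # 7 letters: h, i, t, h, e, r, e
```

Whether spaces should instead get their own 27th column depends on the downstream model.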
I am a Python newbie, and I am trying to plot the number of commits, per day of the week.
I have the output as a pandas dataframe. However, I am unable to figure out how to plot this data using matplotlib.
Any help is greatly appreciated!
import requests
import json
import pandas as pd
r = requests.get('https://api.github.com/repos/thisismetis/dsp/stats/commit_activity')
raw = r.text
results = json.loads(raw)
print(pd.DataFrame(results))
Results:
days total week
0 [0, 0, 0, 0, 0, 0, 0] 0 1431216000
1 [0, 0, 0, 0, 0, 8, 0] 8 1431820800
2 [0, 19, 1, 4, 18, 8, 0] 50 1432425600
3 [0, 3, 23, 1, 0, 0, 0] 27 1433030400
4 [1, 0, 0, 0, 1, 0, 0] 2 1433635200
5 [0, 0, 0, 0, 0, 0, 0] 0 1434240000
6 [0, 2, 0, 0, 0, 0, 0] 2 1434844800
7 [0, 0, 0, 0, 0, 0, 0] 0 1435449600
8 [0, 0, 0, 0, 0, 0, 0] 0 1436054400
9 [0, 0, 0, 0, 0, 0, 0] 0 1436659200
10 [0, 0, 8, 0, 3, 0, 0] 11 1437264000
11 [0, 3, 36, 0, 1, 9, 0] 49 1437868800
12 [0, 2, 2, 2, 5, 1, 0] 12 1438473600
13 [0, 0, 0, 0, 0, 0, 0] 0 1439078400
14 [0, 2, 0, 0, 0, 0, 0] 2 1439683200
15 [0, 0, 0, 0, 0, 0, 0] 0 1440288000
16 [0, 0, 0, 0, 0, 0, 0] 0 1440892800
17 [0, 0, 0, 0, 0, 0, 0] 0 1441497600
18 [0, 0, 0, 0, 0, 3, 0] 3 1442102400
19 [0, 0, 0, 0, 0, 0, 0] 0 1442707200
20 [0, 0, 0, 0, 0, 0, 0] 0 1443312000
21 [0, 0, 0, 0, 0, 0, 0] 0 1443916800
22 [0, 0, 0, 0, 0, 0, 0] 0 1444521600
23 [0, 0, 10, 0, 0, 0, 0] 10 1445126400
24 [0, 0, 0, 0, 0, 0, 0] 0 1445731200
25 [1, 0, 0, 0, 0, 0, 0] 1 1446336000
26 [0, 0, 0, 0, 4, 3, 0] 7 1446940800
27 [0, 0, 0, 0, 0, 0, 0] 0 1447545600
28 [0, 0, 0, 0, 0, 0, 0] 0 1448150400
29 [0, 0, 0, 0, 0, 0, 0] 0 1448755200
30 [0, 0, 0, 0, 0, 0, 0] 0 1449360000
31 [0, 0, 0, 0, 0, 0, 0] 0 1449964800
32 [0, 0, 0, 0, 0, 0, 0] 0 1450569600
33 [0, 0, 0, 0, 0, 0, 1] 1 1451174400
34 [0, 0, 0, 0, 0, 0, 0] 0 1451779200
35 [0, 0, 0, 0, 0, 0, 0] 0 1452384000
36 [0, 0, 0, 0, 0, 0, 0] 0 1452988800
37 [0, 0, 0, 0, 0, 0, 0] 0 1453593600
38 [0, 0, 0, 0, 0, 0, 0] 0 1454198400
39 [0, 0, 5, 2, 0, 0, 0] 7 1454803200
40 [0, 0, 25, 2, 0, 0, 0] 27 1455408000
41 [1, 10, 0, 0, 3, 0, 0] 14 1456012800
42 [0, 0, 0, 0, 0, 0, 0] 0 1456617600
43 [0, 0, 0, 0, 0, 0, 0] 0 1457222400
44 [0, 0, 0, 2, 1, 0, 0] 3 1457827200
45 [0, 0, 0, 0, 0, 0, 0] 0 1458432000
46 [0, 0, 0, 0, 0, 0, 0] 0 1459036800
47 [0, 0, 0, 0, 0, 0, 0] 0 1459641600
48 [0, 0, 0, 0, 0, 0, 0] 0 1460246400
49 [0, 0, 0, 0, 0, 0, 0] 0 1460851200
50 [0, 0, 0, 0, 0, 0, 0] 0 1461456000
51 [0, 0, 0, 0, 0, 0, 0] 0 1462060800
You can do it this way:
df['date'] = pd.to_datetime(df.week, unit='s')
df['week_no'] = df.apply(lambda x: '{:d}-{:02d}'.format(x['date'].year, x['date'].weekofyear), axis=1)
df.set_index('week_no')['total'].plot.bar()
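Since the question actually asks for commits per day of the week, a further sketch (assuming, as the GitHub API documents, that each days list runs Sunday through Saturday; the frame below is a small made-up subset of the data) sums the weekly lists element-wise:

```python
import pandas as pd

# Stand-in for the API result; each 'days' list is Sun..Sat.
df = pd.DataFrame({'days': [[0, 0, 0, 0, 0, 8, 0],
                            [0, 19, 1, 4, 18, 8, 0],
                            [0, 3, 23, 1, 0, 0, 0]]})

# Expand the weekly lists into a (weeks, 7) frame and sum each column.
per_day = pd.DataFrame(df['days'].tolist(),
                       columns=['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']).sum()
print(per_day.tolist())  # [0, 22, 24, 5, 18, 16, 0]
```

per_day.plot.bar() then draws the commits-per-weekday chart.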