I have a list of numpy arrays (one-hot representation) like the example below, and I want to count the number of occurrences of each one-hot code.
[0 0 1 0 0 0 0 0 0 0]
[0 0 1 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
[0 0 0 0 1 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
Edit:
Expected output:
[1 0 0 0 0 0 0 0 0 0] ==> 1 occurrence
[0 0 1 0 0 0 0 0 0 0] ==> 2 occurrences
[0 1 0 0 0 0 0 0 0 0] ==> 3 occurrences
[0 0 0 0 0 1 0 0 0 0] ==> 1 occurrence
[0 0 0 0 1 0 0 0 0 0] ==> 2 occurrences
[0 0 0 0 0 0 0 0 0 1] ==> 2 occurrences
You can get a positional count of occurrences:
[1 3 2 1 2 1 0 0 0 2]
where each entry is the number of rows that are hot at that position, via a simple column-wise sum using ndarray.sum():
import numpy
data = numpy.array([
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
])
print(numpy.ndarray.sum(data, axis=0))
or more compactly as just:
print(data.sum(axis=0))
both should give you:
[1 3 2 1 2 1 0 0 0 2]
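If you instead want output in the asker's "row ==> count" form, one option (my suggestion, not part of the original answer) is np.unique with axis=0, which groups identical rows and counts them:

```python
import numpy as np

data = np.array([
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
])

# axis=0 treats each row as one item; return_counts gives how many
# times each distinct row appears (rows come back lexicographically sorted).
rows, counts = np.unique(data, axis=0, return_counts=True)
for row, count in zip(rows, counts):
    print(row, '==>', count, 'occurrence' if count == 1 else 'occurrences')
```

Note this only lists rows that actually occur, unlike the column sum, which also reports the zero-count positions.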
Using the fact that each row is one-hot, you can do the following:
import numpy as np

temp = np.array([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])
Converting the one-hot rows to indices can be done as follows:
temp2 = np.argmax(temp, axis=1) # array([2, 2, 1, 5, 1, 4, 9, 4, 0, 3, 1, 9])
and then the counting of occurrences can be done using np.histogram. We know that you have 10 possible values, so we use 10 bins as follows:
temp3 = np.histogram(temp2, bins=10, range=(-0.5,9.5))
np.histogram returns a tuple where index [0] holds the histogram values and index [1] holds the bin edges. In your case:
(array([1, 3, 2, 1, 2, 1, 0, 0, 0, 2]),
array([-0.5, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]))
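A simpler alternative to np.histogram here (my suggestion, not from the original answer) is np.bincount, which counts non-negative integers directly; minlength=10 guarantees all 10 positions appear even if some never occur:

```python
import numpy as np

# The argmax indices computed above from the one-hot rows.
temp2 = np.array([2, 2, 1, 5, 1, 4, 9, 4, 0, 3, 1, 9])

# bincount counts how many times each integer 0..9 appears;
# minlength pads positions that never occur with zeros.
counts = np.bincount(temp2, minlength=10)
print(counts)  # [1 3 2 1 2 1 0 0 0 2]
```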
I have a numpy array containing 1's and 0's:
a = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 1, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 1, 1, 0, 0, 1, 1, 0, 0, 0]])
I'd like to convert each 1 to the index in the subarray at which it occurs, to get this:
e = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 5, 0, 7, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 2, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 0, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 6, 0, 8, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
[0, 1, 2, 0, 0, 5, 6, 0, 0, 0]])
So far what I've done is multiply the array by a range:
a * np.arange(a.shape[0])
which is good, but I'm wondering if there's a better, simpler way to do it, like a single function call?
This modifies a in place:
In [4]: i, j = np.nonzero(a)
In [5]: a[i, j] = j
In [6]: a
Out[6]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 5, 0, 7, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 2, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 0, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 6, 0, 8, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
[0, 1, 2, 0, 0, 5, 6, 0, 0, 0]])
Make a copy if you don't want to modify a in place.
Or, this creates a new array (in one line):
In [8]: np.where(a, np.arange(a.shape[1]), 0)
Out[8]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 5, 0, 7, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 2, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 0, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 6, 0, 8, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
[0, 1, 2, 0, 0, 5, 6, 0, 0, 0]])
Your approach is as fast as it gets, but it uses the wrong dimension for the multiplication (it would fail if the matrix weren't square).
Multiply the matrix by a range of column indexes:
import numpy as np
a = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0]])
e = a * np.arange(a.shape[1])
print(e)
[[ 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 5 0 7 0 0 0]
[ 0 0 2 0 0 0 0 0 0 0 0]
[ 0 1 2 0 0 0 6 0 0 0 0]
[ 0 0 2 0 0 5 0 0 0 9 0]
[ 0 0 0 0 0 5 0 0 0 9 0]
[ 0 0 2 0 0 0 6 0 0 0 10]
[ 0 0 0 3 0 0 6 0 8 0 0]
[ 0 0 0 0 0 0 0 0 0 9 0]
[ 0 1 2 0 0 5 6 0 0 0 0]]
I benchmarked the obligatory np.einsum approach, which was ~1.29x slower for larger arrays (100_000, 1000) than the corrected original solution. The in-place solution was ~8x slower than np.einsum.
np.einsum('ij,j->ij', a, np.arange(a.shape[1]))
I have an array 'multilabel' which looks like this:
[[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0, 0],
...
[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0]]
and want to store each of those arrays in my target variable as I am facing a multi-label classification task. How can I achieve that? My code:
pd.DataFrame(multilabel)
Outputs multiple columns:
0 1 2 3 4 5 6 7
0 0 0 0 0 1 0 0 0
1 1 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0
Thanks in advance!
df = pd.DataFrame(list(multilabel))
list_column = df.apply(lambda row: row.values, axis=1)
pd.DataFrame(list_column, columns=['list_column'])
Result: a DataFrame with a single list_column column, where each cell holds one row's array.
Have you considered using the following trick?
import pandas as pd
arr = [[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0]]
pd.DataFrame([arr]).T
Output
0
0 [0, 0, 0, 1, 0, 0, 0, 0]
1 [1, 0, 0, 0, 0, 0, 0, 0]
2 [1, 0, 0, 1, 0, 0, 0, 0]
3 [0, 0, 0, 1, 0, 0, 0, 0]
4 [1, 0, 0, 0, 0, 0, 0, 0]
EDIT
In case you are using NumPy arrays, you can use the following:
import numpy as np
pd.DataFrame(np.array(arr))\
.apply(lambda x: np.array(x), axis=1)
So, the real question is why... it doesn't seem like the most useful data structure.
That said, the one-dimensional data type in pandas is the Series:
>>> pd.Series(multilabel)
0 [0, 0, 0, 1, 0, 0, 0, 0]
1 [1, 0, 0, 0, 0, 0, 0, 0]
2 [1, 0, 0, 1, 0, 0, 0, 0]
3 [0, 0, 0, 1, 0, 0, 0, 0]
4 [1, 0, 0, 0, 0, 0, 0, 0]
dtype: object
You can then convert it further into a DataFrame:
>>> pd.DataFrame(pd.Series(multilabel))
0
0 [0, 0, 0, 1, 0, 0, 0, 0]
1 [1, 0, 0, 0, 0, 0, 0, 0]
2 [1, 0, 0, 1, 0, 0, 0, 0]
3 [0, 0, 0, 1, 0, 0, 0, 0]
4 [1, 0, 0, 0, 0, 0, 0, 0]
Edit: Per further discussion, this works if multilabel is a nested Python list, but not if it's a NumPy array.
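For the NumPy-array case, one workaround (a sketch, under the assumption that each row should become one cell) is to wrap the array in list() first, which splits it along the first axis into a list of row arrays:

```python
import numpy as np
import pandas as pd

multilabel = np.array([[0, 0, 0, 1, 0, 0, 0, 0],
                       [1, 0, 0, 0, 0, 0, 0, 0],
                       [1, 0, 0, 1, 0, 0, 0, 0]])

# list() iterates over the first axis, yielding one 1-D array per row,
# so the Series holds whole arrays instead of being expanded into columns.
s = pd.Series(list(multilabel))
print(s)
```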
I would like to convert a sentence to an array of one-hot vectors.
These vectors would be the one-hot representation of the alphabet.
It would look like the following:
"hello" # h=7, e=4, l=11, o=14
would become
[[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
Unfortunately, OneHotEncoder from sklearn does not accept strings as input.
Just compare the letters in your passed string to a given alphabet:
import string

def string_vectorizer(strng, alphabet=string.ascii_lowercase):
    vector = [[0 if char != letter else 1 for char in alphabet]
              for letter in strng]
    return vector
Note that with a custom alphabet (e.g. "defbcazk"), the columns will be ordered as the elements appear in that alphabet.
The output of string_vectorizer('hello'):
[[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
This is a common task in Recurrent Neural Networks and there's a specific function just for this purpose in tensorflow, if you'd like to use it.
alphabets = {'a' : 0, 'b': 1, 'c':2, 'd':3, 'e':4, 'f':5, 'g':6, 'h':7, 'i':8, 'j':9, 'k':10, 'l':11, 'm':12, 'n':13, 'o':14}
idxs = [alphabets[ch] for ch in 'hello']
print(idxs)
# [7, 4, 11, 11, 14]
# @divakar's approach
idxs = np.fromstring("hello",dtype=np.uint8)-97
# or for more clear understanding, use:
idxs = np.fromstring('hello', dtype=np.uint8) - ord('a')
one_hot = tf.one_hot(idxs, 26, dtype=tf.uint8)
sess = tf.InteractiveSession()
In [15]: one_hot.eval()
Out[15]:
array([[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
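Without TensorFlow, the same matrix can be built by indexing an identity matrix, a common NumPy idiom. This sketch also uses np.frombuffer, the modern replacement for the deprecated np.fromstring used above:

```python
import numpy as np

# Byte values of 'hello' minus ord('a') give alphabet indices 7, 4, 11, 11, 14.
idxs = np.frombuffer(b"hello", dtype=np.uint8) - ord('a')

# Row i of the 26x26 identity matrix is the one-hot vector for letter i,
# so fancy-indexing with idxs stacks the matching rows.
one_hot = np.eye(26, dtype=int)[idxs]
print(one_hot.shape)  # (5, 26)
```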
With pandas, you can use pd.get_dummies by passing a categorical Series:
import pandas as pd
import string

s = 'hello'
low = string.ascii_lowercase
pd.get_dummies(pd.Series(pd.Categorical(list(s), categories=list(low))))
Out:
a b c d e f g h i j ... q r s t u v w x y z
0 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
[5 rows x 26 columns]
Here's a vectorized approach using NumPy broadcasting to give us a (N,26) shaped array -
ints = np.fromstring("hello",dtype=np.uint8)-97
out = (ints[:,None] == np.arange(26)).astype(int)
If you are looking for performance, I would suggest using an initialized array and then assign -
out = np.zeros((len(ints),26),dtype=int)
out[np.arange(len(ints)), ints] = 1
Sample run -
In [153]: ints = np.fromstring("hello",dtype=np.uint8)-97
In [154]: ints
Out[154]: array([ 7, 4, 11, 11, 14], dtype=uint8)
In [155]: out = (ints[:,None] == np.arange(26)).astype(int)
In [156]: print out
[[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]]
You asked about "sentences" but your example provided only a single word, so I'm not sure what you wanted to do about spaces. But as far as single words are concerned, your example could be implemented with:
def onehot(ltr):
return [1 if i==ord(ltr) else 0 for i in range(97,123)]
def onehotvec(s):
return [onehot(c) for c in list(s.lower())]
onehotvec("hello")
[[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
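For actual sentences, one option (a sketch only, since the question left space-handling unspecified) is to drop non-alphabetic characters before encoding:

```python
def onehot(ltr):
    # One-hot over a..z (codes 97..122).
    return [1 if i == ord(ltr) else 0 for i in range(97, 123)]

def onehotvec_sentence(s):
    # Encode only a-z characters; spaces and punctuation are skipped.
    return [onehot(c) for c in s.lower() if c.isalpha()]

vecs = onehotvec_sentence("Hi there!")
print(len(vecs))  # 7 letters: h, i, t, h, e, r, e
```

Whether spaces should instead get their own 27th column depends on the downstream model.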
I am a Python newbie, and I am trying to plot the number of commits, per day of the week.
I have the output as a pandas dataframe. However, I am unable to figure out how to plot this data using matplotlib.
Any help is greatly appreciated!
import requests
import json
import pandas as pd
r = requests.get('https://api.github.com/repos/thisismetis/dsp/stats/commit_activity')
raw = r.text
results = json.loads(raw)
print(pd.DataFrame(results))
Results:
days total week
0 [0, 0, 0, 0, 0, 0, 0] 0 1431216000
1 [0, 0, 0, 0, 0, 8, 0] 8 1431820800
2 [0, 19, 1, 4, 18, 8, 0] 50 1432425600
3 [0, 3, 23, 1, 0, 0, 0] 27 1433030400
4 [1, 0, 0, 0, 1, 0, 0] 2 1433635200
5 [0, 0, 0, 0, 0, 0, 0] 0 1434240000
6 [0, 2, 0, 0, 0, 0, 0] 2 1434844800
7 [0, 0, 0, 0, 0, 0, 0] 0 1435449600
8 [0, 0, 0, 0, 0, 0, 0] 0 1436054400
9 [0, 0, 0, 0, 0, 0, 0] 0 1436659200
10 [0, 0, 8, 0, 3, 0, 0] 11 1437264000
11 [0, 3, 36, 0, 1, 9, 0] 49 1437868800
12 [0, 2, 2, 2, 5, 1, 0] 12 1438473600
13 [0, 0, 0, 0, 0, 0, 0] 0 1439078400
14 [0, 2, 0, 0, 0, 0, 0] 2 1439683200
15 [0, 0, 0, 0, 0, 0, 0] 0 1440288000
16 [0, 0, 0, 0, 0, 0, 0] 0 1440892800
17 [0, 0, 0, 0, 0, 0, 0] 0 1441497600
18 [0, 0, 0, 0, 0, 3, 0] 3 1442102400
19 [0, 0, 0, 0, 0, 0, 0] 0 1442707200
20 [0, 0, 0, 0, 0, 0, 0] 0 1443312000
21 [0, 0, 0, 0, 0, 0, 0] 0 1443916800
22 [0, 0, 0, 0, 0, 0, 0] 0 1444521600
23 [0, 0, 10, 0, 0, 0, 0] 10 1445126400
24 [0, 0, 0, 0, 0, 0, 0] 0 1445731200
25 [1, 0, 0, 0, 0, 0, 0] 1 1446336000
26 [0, 0, 0, 0, 4, 3, 0] 7 1446940800
27 [0, 0, 0, 0, 0, 0, 0] 0 1447545600
28 [0, 0, 0, 0, 0, 0, 0] 0 1448150400
29 [0, 0, 0, 0, 0, 0, 0] 0 1448755200
30 [0, 0, 0, 0, 0, 0, 0] 0 1449360000
31 [0, 0, 0, 0, 0, 0, 0] 0 1449964800
32 [0, 0, 0, 0, 0, 0, 0] 0 1450569600
33 [0, 0, 0, 0, 0, 0, 1] 1 1451174400
34 [0, 0, 0, 0, 0, 0, 0] 0 1451779200
35 [0, 0, 0, 0, 0, 0, 0] 0 1452384000
36 [0, 0, 0, 0, 0, 0, 0] 0 1452988800
37 [0, 0, 0, 0, 0, 0, 0] 0 1453593600
38 [0, 0, 0, 0, 0, 0, 0] 0 1454198400
39 [0, 0, 5, 2, 0, 0, 0] 7 1454803200
40 [0, 0, 25, 2, 0, 0, 0] 27 1455408000
41 [1, 10, 0, 0, 3, 0, 0] 14 1456012800
42 [0, 0, 0, 0, 0, 0, 0] 0 1456617600
43 [0, 0, 0, 0, 0, 0, 0] 0 1457222400
44 [0, 0, 0, 2, 1, 0, 0] 3 1457827200
45 [0, 0, 0, 0, 0, 0, 0] 0 1458432000
46 [0, 0, 0, 0, 0, 0, 0] 0 1459036800
47 [0, 0, 0, 0, 0, 0, 0] 0 1459641600
48 [0, 0, 0, 0, 0, 0, 0] 0 1460246400
49 [0, 0, 0, 0, 0, 0, 0] 0 1460851200
50 [0, 0, 0, 0, 0, 0, 0] 0 1461456000
51 [0, 0, 0, 0, 0, 0, 0] 0 1462060800
You can do it this way:
df['date'] = pd.to_datetime(df.week, unit='s')
df['week_no'] = df.apply(lambda x: '{:d}-{:02d}'.format(x['date'].year, x['date'].weekofyear), axis=1)
df.set_index('week_no')['total'].plot.bar()
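Since the question actually asks for commits per day of the week, a further sketch (assuming, as the GitHub API documents, that each days list runs Sunday through Saturday; the frame below is a small made-up subset of the data) sums the weekly lists element-wise:

```python
import pandas as pd

# Stand-in for the API result; each 'days' list is Sun..Sat.
df = pd.DataFrame({'days': [[0, 0, 0, 0, 0, 8, 0],
                            [0, 19, 1, 4, 18, 8, 0],
                            [0, 3, 23, 1, 0, 0, 0]]})

# Expand the weekly lists into a (weeks, 7) frame and sum each column.
per_day = pd.DataFrame(df['days'].tolist(),
                       columns=['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']).sum()
print(per_day.tolist())  # [0, 22, 24, 5, 18, 16, 0]
```

per_day.plot.bar() then draws the commits-per-weekday chart.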