pandas column shift with day 0 value as 0 - python

I've got a pandas dataframe (pivoted) with columns customer, current_date, current_day_count:
+----------+--------------+-------------------+
| customer | current_date | current_day_count |
+----------+--------------+-------------------+
| Mark | 2018_02_06 | 15 |
| | 2018_02_09 | 42 |
| | 2018_02_12 | 33 |
| | 2018_02_21 | 82 |
| | 2018_02_27 | 72 |
| Bob | 2018_02_02 | 76 |
| | 2018_02_23 | 11 |
| | 2018_03_04 | 59 |
| | 2018_03_13 | 68 |
| Shawn | 2018_02_11 | 71 |
| | 2018_02_15 | 39 |
| | 2018_02_18 | 65 |
| | 2018_02_24 | 38 |
+----------+--------------+-------------------+
Now I want another new column, previous_day_count, for each customer, but the previous-day value for each customer's first day should be 0. Something like this: customer, current_date, current_day_count, previous_day_count (with the first day's value as 0).
+----------+--------------+-------------------+--------------------+
| customer | current_date | current_day_count | previous_day_count |
+----------+--------------+-------------------+--------------------+
| Mark | 2018_02_06 | 15 | 0 |
| | 2018_02_09 | 42 | 33 |
| | 2018_02_12 | 33 | 82 |
| | 2018_02_21 | 82 | 72 |
| | 2018_02_27 | 72 | 0 |
| Bob | 2018_02_02 | 76 | 0 |
| | 2018_02_23 | 11 | 59 |
| | 2018_03_04 | 59 | 68 |
| | 2018_03_13 | 68 | 0 |
| Shawn | 2018_02_11 | 71 | 0 |
| | 2018_02_15 | 39 | 65 |
| | 2018_02_18 | 65 | 38 |
| | 2018_02_24 | 38 | 0 |
+----------+--------------+-------------------+--------------------+

Try this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'name': ['Mark','Mark','Mark','Mark','Bob','Bob','Bob','Bob'], 'current_day_count': [18,28,29,10,19,92,7,43]})
df['previous_day_count'] = df.groupby('name')['current_day_count'].shift(-1)             # following row's value within each group
df.loc[df.groupby('name', as_index=False).head(1).index, 'previous_day_count'] = np.nan  # blank out each group's first row
df['previous_day_count'] = df['previous_day_count'].fillna(0)                            # fill the NaNs (first and last row of each group) with 0
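Applied to the frame in the question (column names taken from the question's table; a sketch, assuming 'customer' is filled on every row):
# df['customer'] = df['customer'].ffill()  # only needed if customer appears on just the first row of each block
g = df.groupby('customer')['current_day_count']
df['previous_day_count'] = g.shift(-1)                           # next row's value within each customer
df.loc[g.head(1).index, 'previous_day_count'] = 0                # each customer's first day -> 0
df['previous_day_count'] = df['previous_day_count'].fillna(0)    # each customer's last day (NaN from shift) -> 0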

Related

Binning Pandas value_counts

I have a Pandas Series produced by df.column.value_counts().sort_index().
| N Months | Count |
|------|------|
| 0 | 15 |
| 1 | 9 |
| 2 | 78 |
| 3 | 151 |
| 4 | 412 |
| 5 | 181 |
| 6 | 543 |
| 7 | 175 |
| 8 | 409 |
| 9 | 594 |
| 10 | 137 |
| 11 | 202 |
| 12 | 170 |
| 13 | 446 |
| 14 | 29 |
| 15 | 39 |
| 16 | 44 |
| 17 | 253 |
| 18 | 17 |
| 19 | 34 |
| 20 | 18 |
| 21 | 37 |
| 22 | 147 |
| 23 | 12 |
| 24 | 31 |
| 25 | 15 |
| 26 | 117 |
| 27 | 8 |
| 28 | 38 |
| 29 | 23 |
| 30 | 198 |
| 31 | 29 |
| 32 | 122 |
| 33 | 50 |
| 34 | 60 |
| 35 | 357 |
| 36 | 329 |
| 37 | 457 |
| 38 | 609 |
| 39 | 4744 |
| 40 | 1120 |
| 41 | 591 |
| 42 | 328 |
| 43 | 148 |
| 44 | 46 |
| 45 | 10 |
| 46 | 1 |
| 47 | 1 |
| 48 | 7 |
| 50 | 2 |
My desired output is:
| bin | Total |
|-------|--------|
| 0-13 | 3522 |
| 14-26 | 793 |
| 27-50 | 9278 |
I tried df.column.value_counts(bins=3).sort_index() but got
| bin | Total |
|---------------------------------|-------|
| (-0.051000000000000004, 16.667] | 3634 |
| (16.667, 33.333] | 1149 |
| (33.333, 50.0] | 8810 |
I can get the correct result with
a = df.column.value_counts().sort_index()[:14].sum()    # months 0-13
b = df.column.value_counts().sort_index()[14:27].sum()  # months 14-26
c = df.column.value_counts().sort_index()[27:].sum()    # months 27-50
print(a, b, c)
Output: 3522 793 9278
But I am wondering if there is a pandas method that can do what I want. Any advice is very welcome. :-)
You can use pd.cut:
pd.cut(df['N Months'], [0, 13, 26, 50], include_lowest=True).value_counts()
Update: you can also pass custom bins directly to value_counts:
df['N Months'].value_counts(bins=[0, 13, 26, 50])
Output:
N Months
(-0.001, 13.0] 3522
(13.0, 26.0] 793
(26.0, 50.0] 9278
Name: Count, dtype: int64
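If you want the bins labelled 0-13 / 14-26 / 27-50 as in the desired output, pd.cut also accepts a labels argument; a small sketch (the label strings are just illustrative):
bins = [0, 13, 26, 50]
labels = ['0-13', '14-26', '27-50']
pd.cut(df['N Months'], bins=bins, labels=labels, include_lowest=True).value_counts().sort_index()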

Filter all rows from groupby object

I have a dataframe like below
+-----------+------------+---------------+------+-----+-------+
| InvoiceNo | CategoryNo | Invoice Value | Item | Qty | Price |
+-----------+------------+---------------+------+-----+-------+
| 1 | 1 | 77 | 128 | 1 | 10 |
| 1 | 1 | 77 | 101 | 1 | 11 |
| 1 | 2 | 77 | 105 | 3 | 12 |
| 1 | 3 | 77 | 129 | 2 | 10 |
| 2 | 1 | 21 | 145 | 1 | 9 |
| 2 | 2 | 21 | 130 | 1 | 12 |
+-----------+------------+---------------+------+-----+-------+
I want to keep the entire group if any of the items in the list item_list = [128,129,130] is present in that group, after grouping by 'InvoiceNo' and 'CategoryNo'.
My desired output is below:
+-----------+------------+---------------+------+-----+-------+
| InvoiceNo | CategoryNo | Invoice Value | Item | Qty | Price |
+-----------+------------+---------------+------+-----+-------+
| 1 | 1 | 77 | 128 | 1 | 10 |
| 1 | 1 | 77 | 101 | 1 | 11 |
| 1 | 3 | 77 | 129 | 2 | 10 |
| 2 | 2 | 21 | 130 | 1 | 12 |
+-----------+------------+---------------+------+-----+-------+
I know how to filter a dataframe using isin(), but I am not sure how to do it with groupby().
So far I have tried the following:
import pandas as pd
df = pd.read_csv('data.csv')
item_list = [128,129,130]
df.groupby(['InvoiceNo','CategoryNo'])['Item'].isin(item_list)
but nothing happens. Please guide me on how to solve this issue.
You can do something like this:
s = (df['Item'].isin(item_list)                        # True where the row's Item is in item_list
       .groupby([df['InvoiceNo'], df['CategoryNo']])   # group that boolean by the two keys
       .transform('any'))                              # broadcast "any match in the group" to every row
df[s]
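An alternative sketch uses groupby().filter(), which keeps whole groups that satisfy a predicate; it reads more directly but is usually slower than the transform approach when there are many groups:
df.groupby(['InvoiceNo', 'CategoryNo']).filter(lambda g: g['Item'].isin(item_list).any())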

Numpy version of finding the highest and lowest value locations within an interval of another column?

Given the following NumPy array, how can I find the locations of the highest and lowest values of column 0 within each interval marked in column 1, using NumPy?
import numpy as np
data = np.array([
[1879.289,np.nan],[1879.281,np.nan],[1879.292,1],[1879.295,1],[1879.481,1],[1879.294,1],[1879.268,1],
[1879.293,1],[1879.277,1],[1879.285,1],[1879.464,1],[1879.475,1],[1879.971,1],[1879.779,1],
[1879.986,1],[1880.791,1],[1880.29,1],[1879.253,np.nan],[1878.268,np.nan],[1875.73,1],[1876.792,1],
[1875.977,1],[1876.408,1],[1877.159,1],[1877.187,1],[1883.164,1],[1883.171,1],[1883.495,1],
[1883.962,1],[1885.158,1],[1885.974,1],[1886.479,np.nan],[1885.969,np.nan],[1884.693,1],[1884.977,1],
[1884.967,1],[1884.691,1],[1886.171,1],[1886.166,np.nan],[1884.476,np.nan],[1884.66,1],[1882.962,1],
[1881.496,1],[1871.163,1],[1874.985,1],[1874.979,1],[1871.173,np.nan],[1871.973,np.nan],[1871.682,np.nan],
[1872.476,np.nan],[1882.361,1],[1880.869,1],[1882.165,1],[1881.857,1],[1880.375,1],[1880.66,1],
[1880.891,1],[1880.377,1],[1881.663,1],[1881.66,1],[1877.888,1],[1875.69,1],[1875.161,1],
[1876.697,np.nan],[1876.671,np.nan],[1879.666,np.nan],[1877.182,np.nan],[1878.898,1],[1878.668,1],[1878.871,1],
[1878.882,1],[1879.173,1],[1878.887,1],[1878.68,1],[1878.872,1],[1878.677,1],[1877.877,1],
[1877.669,1],[1877.69,1],[1877.684,1],[1877.68,1],[1877.885,1],[1877.863,1],[1877.674,1],
[1877.676,1],[1877.687,1],[1878.367,1],[1878.179,1],[1877.696,1],[1877.665,1],[1877.667,np.nan],
[1878.678,np.nan],[1878.661,1],[1878.171,1],[1877.371,1],[1877.359,1],[1878.381,1],[1875.185,1],
[1875.367,np.nan],[1865.492,np.nan],[1865.495,1],[1866.995,1],[1866.672,1],[1867.465,1],[1867.663,1],
[1867.186,1],[1867.687,1],[1867.459,1],[1867.168,1],[1869.689,1],[1869.693,1],[1871.676,1],
[1873.174,1],[1873.691,np.nan],[1873.685,np.nan]
])
In the Min/Max column below you can see where the max and min fall for each interval.
+-------+----------+-----------+---------+
| index | Value | Intervals | Min/Max |
+-------+----------+-----------+---------+
| 0 | 1879.289 | np.nan | |
| 1 | 1879.281 | np.nan | |
| 2 | 1879.292 | 1 | |
| 3 | 1879.295 | 1 | |
| 4 | 1879.481 | 1 | |
| 5 | 1879.294 | 1 | |
| 6 | 1879.268 | 1 | -1 | min
| 7 | 1879.293 | 1 | |
| 8 | 1879.277 | 1 | |
| 9 | 1879.285 | 1 | |
| 10 | 1879.464 | 1 | |
| 11 | 1879.475 | 1 | |
| 12 | 1879.971 | 1 | |
| 13 | 1879.779 | 1 | |
| 17 | 1879.986 | 1 | |
| 18 | 1880.791 | 1 | 1 | max
| 19 | 1880.29 | 1 | |
| 55 | 1879.253 | np.nan | |
| 56 | 1878.268 | np.nan | |
| 57 | 1875.73 | 1 | -1 |min
| 58 | 1876.792 | 1 | |
| 59 | 1875.977 | 1 | |
| 60 | 1876.408 | 1 | |
| 61 | 1877.159 | 1 | |
| 62 | 1877.187 | 1 | |
| 63 | 1883.164 | 1 | |
| 64 | 1883.171 | 1 | |
| 65 | 1883.495 | 1 | |
| 66 | 1883.962 | 1 | |
| 67 | 1885.158 | 1 | |
| 68 | 1885.974 | 1 | 1 | max
| 69 | 1886.479 | np.nan | |
| 70 | 1885.969 | np.nan | |
| 71 | 1884.693 | 1 | |
| 72 | 1884.977 | 1 | |
| 73 | 1884.967 | 1 | |
| 74 | 1884.691 | 1 | -1 | min
| 75 | 1886.171 | 1 | 1 | max
| 76 | 1886.166 | np.nan | |
| 77 | 1884.476 | np.nan | |
| 78 | 1884.66 | 1 | 1 | max
| 79 | 1882.962 | 1 | |
| 80 | 1881.496 | 1 | |
| 81 | 1871.163 | 1 | -1 | min
| 82 | 1874.985 | 1 | |
| 83 | 1874.979 | 1 | |
| 84 | 1871.173 | np.nan | |
| 85 | 1871.973 | np.nan | |
| 86 | 1871.682 | np.nan | |
| 87 | 1872.476 | np.nan | |
| 88 | 1882.361 | 1 | 1 | max
| 89 | 1880.869 | 1 | |
| 90 | 1882.165 | 1 | |
| 91 | 1881.857 | 1 | |
| 92 | 1880.375 | 1 | |
| 93 | 1880.66 | 1 | |
| 94 | 1880.891 | 1 | |
| 95 | 1880.377 | 1 | |
| 96 | 1881.663 | 1 | |
| 97 | 1881.66 | 1 | |
| 98 | 1877.888 | 1 | |
| 99 | 1875.69 | 1 | |
| 100 | 1875.161 | 1 | -1 | min
| 101 | 1876.697 | np.nan | |
| 102 | 1876.671 | np.nan | |
| 103 | 1879.666 | np.nan | |
| 111 | 1877.182 | np.nan | |
| 112 | 1878.898 | 1 | |
| 113 | 1878.668 | 1 | |
| 114 | 1878.871 | 1 | |
| 115 | 1878.882 | 1 | |
| 116 | 1879.173 | 1 | 1 | max
| 117 | 1878.887 | 1 | |
| 118 | 1878.68 | 1 | |
| 119 | 1878.872 | 1 | |
| 120 | 1878.677 | 1 | |
| 121 | 1877.877 | 1 | |
| 122 | 1877.669 | 1 | |
| 123 | 1877.69 | 1 | |
| 124 | 1877.684 | 1 | |
| 125 | 1877.68 | 1 | |
| 126 | 1877.885 | 1 | |
| 127 | 1877.863 | 1 | |
| 128 | 1877.674 | 1 | |
| 129 | 1877.676 | 1 | |
| 130 | 1877.687 | 1 | |
| 131 | 1878.367 | 1 | |
| 132 | 1878.179 | 1 | |
| 133 | 1877.696 | 1 | |
| 134 | 1877.665 | 1 | -1 | min
| 135 | 1877.667 | np.nan | |
| 136 | 1878.678 | np.nan | |
| 137 | 1878.661 | 1 | 1 | max
| 138 | 1878.171 | 1 | |
| 139 | 1877.371 | 1 | |
| 140 | 1877.359 | 1 | |
| 141 | 1878.381 | 1 | |
| 142 | 1875.185 | 1 | -1 | min
| 143 | 1875.367 | np.nan | |
| 144 | 1865.492 | np.nan | |
| 145 | 1865.495 | 1 | -1 | min
| 146 | 1866.995 | 1 | |
| 147 | 1866.672 | 1 | |
| 148 | 1867.465 | 1 | |
| 149 | 1867.663 | 1 | |
| 150 | 1867.186 | 1 | |
| 151 | 1867.687 | 1 | |
| 152 | 1867.459 | 1 | |
| 153 | 1867.168 | 1 | |
| 154 | 1869.689 | 1 | |
| 155 | 1869.693 | 1 | |
| 156 | 1871.676 | 1 | |
| 157 | 1873.174 | 1 | 1 | max
| 158 | 1873.691 | np.nan | |
| 159 | 1873.685 | np.nan | |
+-------+----------+-----------+---------+
I must specify upfront that this question has already been answered here with a pandas solution. That solution performs reasonably, at about 300 seconds for a table of around 1 million rows. But after some more testing, I found that if the table has over 3 million rows, the execution time increases dramatically to over 2500 seconds or more. This is obviously too long for such a simple task. How would the same problem be solved with NumPy?
Here's one NumPy approach -
mask = ~np.isnan(data[:,1])                    # rows that belong to an interval
s0 = np.flatnonzero(mask[1:] > mask[:-1])+1    # interval start indices
s1 = np.flatnonzero(mask[1:] < mask[:-1])+1    # interval stop indices (exclusive)
lens = s1 - s0                                 # length of each interval
tags = np.repeat(np.arange(len(lens)), lens)   # interval id for every in-interval row
idx = np.lexsort((data[mask,0], tags))         # within each interval, order rows by value
starts = np.r_[0,lens.cumsum()]                # interval boundaries in the compacted (masked) array
offsets = np.r_[s0[0], s0[1:] - s1[:-1]]       # gap lengths before each interval
offsets_cumsum = offsets.cumsum()              # total gap before each interval: compacted -> original index
min_ids = idx[starts[:-1]] + offsets_cumsum    # original index of each interval's minimum
max_ids = idx[starts[1:]-1] + offsets_cumsum   # original index of each interval's maximum
out = np.full(data.shape[0], np.nan)
out[min_ids] = -1                              # mark minima
out[max_ids] = 1                               # mark maxima
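As a quick sanity check against the Min/Max column in the table above, the flagged positions can be printed (just a convenience, not part of the approach):
print(np.flatnonzero(out == -1))   # row indices flagged as interval minima
print(np.flatnonzero(out == 1))    # row indices flagged as interval maxima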
So this is a bit of a cheat since it uses scipy:
import numpy as np
from scipy import ndimage
markers = np.isnan(data[:, 1])   # NaN rows act as separators between the intervals
groups = np.cumsum(markers)      # running count of separators labels each stretch of rows
# extrema is exposed directly under scipy.ndimage (measurements.extrema in older SciPy releases)
mins, maxs, min_idx, max_idx = ndimage.extrema(
    data[:, 0], labels=groups, index=range(2, groups.max(), 2))
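To turn the extrema positions into the same -1/1 marker column that the NumPy approach produces, they can be scattered back into a full-length array; a sketch, assuming min_idx and max_idx from the call above:
out = np.full(data.shape[0], np.nan)
out[np.asarray(min_idx).ravel()] = -1   # mark each selected interval's minimum
out[np.asarray(max_idx).ravel()] = 1    # mark each selected interval's maximum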

How to add the first values in cascading columns?

Can anyone help me with a better solution than looping? Let's say that we have the following pandas dataframe made out of 4 columns. I am looking for a way to get the same values as in the "Result" column through methods other than looping.
Here is the logic:
If priority1=1 then result=1
If priority1=1 and priority2=1 then result=2 (ignore all other if priority1 !=1)
If priority1=1 and priority2=1 and priority3=1 then result=3 (ignore all other if priority1 and priority2 != 1)
If priority1=1 and priority2=1 and priority3=1 and priority4=1 then result=4 (ignore all other if priority1 and priority2 and priority3 != 1)
The opposite happens if priority1 is negative.
Here is my final result after a very ugly and inefficient looping:
+-----+-----------+-----------+-----------+-----------+--------+
| | priority1 | priority2 | priority3 | priority4 | Result |
+-----+-----------+-----------+-----------+-----------+--------+
| 0 | | | | | |
| 1 | | 1 | -1 | -1 | |
| 2 | | | | | |
| 3 | | | | 1 | |
| 4 | | | | 1 | |
| 5 | | | | | |
| 6 | | | | -1 | |
| 7 | | | | | |
| 8 | | | | | |
| 9 | 1 | 1 | 1 | 1 | 1 |
| 10 | | | | | |
| 11 | | | | 1 | |
| 12 | | | 1 | | |
| 13 | | | | | |
| 14 | | | -1 | -1 | |
| 15 | | | | | |
| 16 | | | | | |
| 17 | | | | -1 | |
| 18 | | | | | |
| 19 | | | | | |
| 20 | | 1 | 1 | 1 | 2 |
| 21 | | | | | |
| 22 | | | -1 | -1 | |
| 23 | | | | | |
| 24 | | | | | |
| 25 | | | | -1 | |
| 26 | | | | | |
| 27 | | | 1 | 1 | 3 |
| 28 | | | | | |
| 29 | | | | | |
| 30 | | | | -1 | |
| 31 | | | | | |
| 32 | | | | | |
| 33 | | | -1 | -1 | |
| 34 | | | | | |
| 35 | | | 1 | 1 | 4 |
| 36 | | | | | |
| 37 | | | | | |
| 38 | | | | | |
| 39 | | | -1 | -1 | |
| 40 | | | | | |
| 41 | | | | | |
| 42 | | 1 | 1 | 1 | 2 |
| 43 | | | | | |
| 44 | | | | | |
| 45 | | | | -1 | |
| 46 | | | | | |
| 47 | | | | | |
| 48 | | | | | |
| 49 | | | | | |
| 50 | | -1 | -1 | -1 | |
| 51 | | | | | |
| 52 | | | | | |
| 53 | | 1 | 1 | 1 | 2 |
| 54 | | | | | |
| 55 | | | | | |
| 56 | | | | -1 | |
| 57 | | | | | |
| 58 | | | | | |
| 59 | | | -1 | -1 | |
| 60 | | | | | |
| 61 | | | | | |
| 62 | | | 1 | 1 | 3 |
| 63 | | | | | |
| 64 | -1 | -1 | -1 | -1 | -1 |
| 65 | | | | | |
| 66 | | | 1 | 1 | |
| 67 | | | | | |
| 68 | | | | | |
| 69 | | | | | |
| 70 | | | | -1 | |
| 71 | | | | | |
| 72 | | | | | |
| 73 | | | 1 | 1 | |
| 74 | | | | -1 | |
| 75 | | | | | |
| 76 | | | 1 | 1 | |
| 77 | | | | | |
| 78 | | -1 | -1 | -1 | -2 |
| 79 | | | 1 | | |
| 80 | | | 1 | 1 | |
| 81 | | | | | |
| 82 | | | -1 | -1 | -3 |
| 83 | | | 1 | 1 | |
| 84 | | | | | |
| 85 | | | | | |
| 86 | | | | | |
| 87 | | | -1 | -1 | -4 |
| 88 | | | | | |
| 89 | | -1 | -1 | -1 | -2 |
| 90 | | | | | |
| 91 | | | | | |
| 92 | | | | -1 | |
| 93 | | | | | |
| 94 | | | | | |
| 95 | | | 1 | 1 | |
| 96 | | | | | |
| 97 | | | | | |
| 98 | | | | -1 | |
| 99 | | | | 1 | |
| 100 | | | | | |
| 101 | | | -1 | -1 | -3 |
| 102 | | | | | |
| 103 | | | | | |
| 104 | | 1 | 1 | 1 | |
| 105 | | | | | |
| 106 | | | | 1 | |
| 107 | | | | | |
| 108 | | | -1 | -1 | |
| 109 | | | | | |
| 110 | | | | | |
| 111 | | | 1 | 1 | |
| 112 | | | | | |
| 113 | | | | | |
| 114 | | | -1 | -1 | |
| 115 | | | | | |
| 116 | | | 1 | 1 | |
| 117 | | | | | |
| 118 | | | | | |
| 119 | | -1 | -1 | -1 | -2 |
| 120 | | | | | |
| 121 | | | | | |
| 122 | | | | 1 | |
| 123 | | | | | |
| 124 | | | | 1 | |
| 125 | | | | | |
| 126 | | | | | |
| 127 | | | 1 | 1 | |
| 128 | | | | | |
| 129 | | | | | |
| 130 | | | -1 | -1 | -3 |
| 131 | | | | | |
| 132 | | | | | |
| 133 | | | | | |
| 134 | 1 | 1 | 1 | 1 | 1 |
| 135 | | | | -1 | |
| 136 | | | | | |
| 137 | | -1 | -1 | -1 | |
| 138 | | | 1 | | |
| 139 | | | | 1 | |
| 140 | | 1 | 1 | 1 | 2 |
| 141 | | | 1 | 1 | 3 |
| 142 | | | | | |
| 143 | | | | -1 | |
| 144 | | | | | |
| 145 | | | | 1 | 4 |
+-----+-----------+-----------+-----------+-----------+--------+
setup
df = pd.DataFrame([
[ 1, 0, 0, 0],
[ 1, 1, 0, 0],
[ 1, 1, 1, 0],
[ 1, 1, 1, 1],
[ 0, 1, 1, 1],
[ 0, 0, 1, 1],
[ 0, 0, 0, 1],
[ 1, 0, 1, 1], # this should end up 1
[ 0, 0, 0, 0],
[-1, 0, 0, 0],
[-1, -1, 0, 0],
[-1, -1, -1, 0],
[-1, -1, -1, -1],
[ 0, -1, -1, -1],
[ 0, 0, -1, -1],
[ 0, 0, 0, -1],
], columns=['priority{}'.format(i) for i in range(1, 5)])
solution
v = df.values
df.assign(Results=(v * v.cumprod(1).astype(bool)).sum(1))
priority1 priority2 priority3 priority4 Results
0 1 0 0 0 1
1 1 1 0 0 2
2 1 1 1 0 3
3 1 1 1 1 4
4 0 1 1 1 0
5 0 0 1 1 0
6 0 0 0 1 0
7 1 0 1 1 1
8 0 0 0 0 0
9 -1 0 0 0 -1
10 -1 -1 0 0 -2
11 -1 -1 -1 0 -3
12 -1 -1 -1 -1 -4
13 0 -1 -1 -1 0
14 0 0 -1 -1 0
15 0 0 0 -1 0
how it works
Grab the numpy array with
v = df.values
non-zeros become True with
v.astype(bool)
each entry stays True only while every column so far in the row has been non-zero:
v.astype(bool).cumprod(1)
multiply by v to keep just that leading run, then sum across each row:
(v * v.astype(bool).cumprod(1)).sum(1)
naive time test: (timing plots for small data and big data omitted)
Using piRSquared's example frame (hat tip!), I might do something like
match = (df.abs() == 1) & (df.eq(df.iloc[:, 0], axis=0))   # cells that are +/-1 and agree with priority1
out = match.cumprod(axis=1).sum(axis=1) * df.iloc[:, 0]    # length of the leading run of matches, signed by priority1
which gives me
In [107]: df["Result"] = out
In [108]: df
Out[108]:
priority1 priority2 priority3 priority4 Result
0 1 0 0 0 1
1 1 1 0 0 2
2 1 1 1 0 3
3 1 1 1 1 4
4 0 1 1 1 0
5 0 0 1 1 0
6 0 0 0 1 0
7 1 0 1 1 1
8 0 0 0 0 0
9 -1 0 0 0 -1
10 -1 -1 0 0 -2
11 -1 -1 -1 0 -3
12 -1 -1 -1 -1 -4
13 0 -1 -1 -1 0
14 0 0 -1 -1 0
15 0 0 0 -1 0

Parsing out indices and values from a pandas multi-index dataframe

I have a dataframe in a similar format to this:
+--------+--------+----------+------+------+------+------+
| | | | | day1 | day2 | day3 |
+--------+--------+----------+------+------+------+------+
| id_one | id_two | id_three | date | | | |
| 18273 | 50 | 1 | 3 | 9 | 11 | 3 |
| | | | 4 | 26 | 27 | 68 |
| | | | 5 | 92 | 25 | 4 |
| | | | 6 | 60 | 72 | 83 |
| | 60 | 2 | 5 | 69 | 93 | 84 |
| | | | 6 | 69 | 30 | 12 |
| | | | 7 | 65 | 65 | 59 |
| | | | 8 | 57 | 88 | 59 |
| | 70 | 3 | 5 | 22 | 95 | 7 |
| | | | 6 | 40 | 24 | 20 |
| | | | 7 | 73 | 81 | 57 |
| | | | 8 | 43 | 8 | 66 |
+--------+--------+----------+------+------+------+------+
I am trying to create a tuple that contains id_one, id_two and the values that each grouping contains.
To test this, I am simply trying to print the ids and values like this:
for id_two, data in df.head(100).groupby(level='id_two'):
print id_two, data.values.ravel()
Which gives me the id_two and the data exactly as it should.
I am running into problems when I try to incorporate id_one. I tried this, but was met with the error ValueError: need more than 2 values to unpack:
for id_one, id_two, data in df.head(100).groupby(level='id_two'):
print id_one, id_two, data.values.ravel()
How can I print id_one, id_two and the data?
You can pass a list of index levels to the level parameter:
df.head(100).groupby(level=['id_one', 'id_two'])
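With two levels the group key comes back as a tuple, so the loop can unpack it directly; a sketch of the loop the question is after (shown with the print function):
for (id_one, id_two), data in df.head(100).groupby(level=['id_one', 'id_two']):
    print(id_one, id_two, data.values.ravel())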
