There is a DataFrame with this SAMPLE (not the original data) of records:
import pandas as pd
# 'dikt' holds the raw records for the sample below (its definition is not shown)
df = pd.DataFrame(dikt, columns=['id', 'price', 'day'])
df:
+-------+-----+-------+-----+
| index | id | price | day |
+-------+-----+-------+-----+
| 0 | 34 | 12 | 3 |
+-------+-----+-------+-----+
| 1 | 34 | 6 | 5 |
+-------+-----+-------+-----+
| 2 | 56 | 23 | 8 |
+-------+-----+-------+-----+
| 3 | 56 | 21 | 9 |
+-------+-----+-------+-----+
| 4 | 56 | 67 | 22 |
+-------+-----+-------+-----+
| ... | ... | ... | ... |
+-------+-----+-------+-----+
I want to group the prices by week, like this:
+-------+-----+---------------------+
| index | id | price |
+-------+-----+---------------------+
| 0 | 34 | [12, 6] |
+-------+-----+---------------------+
| 1 | 56 | [23, 21], [67] |
+-------+-----+---------------------+
| ... | ... | ... |
+-------+-----+---------------------+
In the above table, the prices are grouped by their day. For example, 12 and 6 fall on days 3 and 5, which both belong to the first week, so they end up together, and so on.
Divide the day by 7 to get a week number, add it as a column, and group on that unit. Then combine the per-week groups with a second grouping that drops the week number.
df['weeknum'] = df['day'] // 7                                      # integer week index
df2 = df.groupby(['id', 'weeknum'])['price'].agg(list).to_frame()   # one price list per (id, week)
df2['price'] = df2['price'].astype(str)                             # render each weekly list as text
df2.groupby('id')['price'].agg(','.join).to_frame()                 # join the weekly lists per id
price
id
34 [12, 6]
56 [23, 21],[67]
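If you want real nested lists instead of their string representation, a variant of the same idea (a minimal sketch, reconstructing the sample records above since the original dikt is not shown) aggregates the weekly lists into a list of lists:

import pandas as pd

# sample records matching the table above (assumed, since dikt is not shown)
df = pd.DataFrame({'id': [34, 34, 56, 56, 56],
                   'price': [12, 6, 23, 21, 67],
                   'day': [3, 5, 8, 9, 22]})

df['weeknum'] = df['day'] // 7                              # week index
weekly = df.groupby(['id', 'weeknum'])['price'].agg(list)   # one list per (id, week)
nested = weekly.groupby('id').agg(list).to_frame()          # list of weekly lists per id
print(nested)
#                  price
# id
# 34           [[12, 6]]
# 56    [[23, 21], [67]]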
I'm trying to figure out a way to match a given value in one dataframe column to another dataframe's column, and then store an AGE from df1 in df2.
e.g. matching VAL in df1 to VAL in df2: if the two are equal, store AGE from df1 in AGE in df2.
| df1 | VAL | AGE |
|:--- |:---:|----:|
| 0 | 20 | 25 |
| 1 | 10 | 29 |
| 2 | 50 | 21 |
| 4 | 20 | 32 |
| 5 | 00 | 19 |
| df2 | VAL | AGE |
|:--- |:---:|----:|
| 0 | 00 | [] |
| 1 | 10 | [] |
| 2 | 20 | [] |
| 4 | 30 | [] |
| 5 | 40 | [] |
| 6 | 50 | [] |
edit: AGE in df2 stores an array of values rather than a single value
Try:
x = df1.groupby("VAL").agg(list)   # collect every AGE per VAL as a list
df2["AGE"] = df2["VAL"].map(x["AGE"]).fillna({i: [] for i in df2.index})  # unmatched VALs get []
print(df2)
Prints:
VAL AGE
0 0 [19]
1 10 [29]
2 20 [25, 32]
4 30 []
5 40 []
6 50 [21]
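For reference, a self-contained run (a sketch that rebuilds the sample frames shown above). The dict passed to fillna is a known workaround for filling missing entries with empty lists, since fillna rejects a bare list as its value:

import pandas as pd

df1 = pd.DataFrame({"VAL": [20, 10, 50, 20, 0], "AGE": [25, 29, 21, 32, 19]})
df2 = pd.DataFrame({"VAL": [0, 10, 20, 30, 40, 50]})

x = df1.groupby("VAL").agg(list)   # AGE lists keyed by VAL
df2["AGE"] = df2["VAL"].map(x["AGE"]).fillna({i: [] for i in df2.index})
print(df2)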
Is it possible to fetch a column containing values corresponding to an ID column?
Example:
df1
| ID | Value | Salary |
|:--:|:-----:|:------:|
| 1 | amr | 34 |
| 1 | ith | 67 |
| 2 | oaa | 45 |
| 1 | eea | 78 |
| 3 | anik | 56 |
| 4 | mmkk | 99 |
| 5 | sh_s | 98 |
| 5 | ahhi | 77 |
df2
| ID | Dept |
|:--:|:----:|
| 1 | hrs |
| 1 | cse |
| 2 | me |
| 1 | ece |
| 3 | eee |
Expected Output
| ID | Dept | Value |
|:--:|:----:|:-----:|
| 1 | hrs | amr |
| 1 | cse | ith |
| 2 | me | oaa |
| 1 | ece | eea |
| 3 | eee | anik |
I want to fetch each value in the 'Value' column corresponding to the values in df2's ID column, and create a 'Value' column in df2. The number of rows in the two dfs is not the same. I have tried this, but it did not work.
IIUC, you can try df.merge after assigning a helper column via groupby + cumcount on ID:
out = (df1.assign(k=df1.groupby("ID").cumcount())         # occurrence counter within each ID
          .merge(df2.assign(k=df2.groupby("ID").cumcount()),
                 on=['ID', 'k'])                          # pair the nth row of each ID
          .drop(columns="k"))                             # drop the helper column
print(out)
ID Value Dept
0 1 amr hrs
1 1 ith cse
2 2 oaa me
3 1 eea ece
4 3 anik eee
Is this what you want to do?
df1.merge(df2, how='inner', on='ID')
Since you have duplicated IDs in both dfs, but these are ordered, try:
df1 = df1.drop(columns="ID")                               # avoid a duplicated ID column
df3 = df2.merge(df1, left_index=True, right_index=True)    # align rows purely by index
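Under the same assumption that the rows are in matching order, join is an equivalent index-aligned one-liner (a sketch):

df3 = df2.join(df1[["Value"]])   # keeps df2's rows; aligns purely by index position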
The df has the following columns,
col1 | col2 | col3 | Jan-19 | Feb-19 | Mar-19 | Apr-19 | May-19 | Jun-19 | Jul-19 | Aug-19 | Sep-19 | Oct-19 | Nov-19 | Dec-19 | Jan-20 | Feb-20 | Mar-20 | Apr-20 | May-20 | Jun-20 | Jul-20 | Aug-20 | Sep-20 | Oct-20 | Nov-20 | Dec-20
ab | cd | | 10 | 12 | 14 | 15 | 16 | 12 | 13 | 7 | 82 | 76 | 100 | 98 | 10 | 12 | 14 | 15 | 16 | 12 | 13 | 7 | 82 | 76 | 100 | 98
The month columns hold numbers. I want to sum the month columns subject to the following condition:
If datetime.now().strftime('%b-%Y') falls anywhere from Jun-19 (for example) to Oct-19, I want to sum the month columns from Oct-19 to Feb-20. If it falls anywhere from Jun-20 to Oct-20, then sum the columns from Oct-20 to Feb-21, and so on.
If datetime.now().strftime('%b-%Y') falls anywhere from Nov-19 to May-20, I want to sum the month columns from Mar-20 to Sep-20. If it falls anywhere from Nov-20 to May-21, then sum the columns from Mar-21 to Sep-21, and so on.
There should be a Total column at the end.
col1 | col2 | col3 | Jan-19 | Feb-19 | Mar-19 | Apr-19 | May-19 | Jun-19 | Jul-19 | Aug-19 | Sep-19 | Oct-19 | Nov-19 | Dec-19 | Jan-20 | Feb-20 | Mar-20 | Apr-20 | May-20 | Jun-20 | Jul-20 | Aug-20 | Sep-20 | Oct-20 | Nov-20 | Dec-20 | Total
ab | cd | | 10 | 12 | 14 | 15 | 16 | 12 | 13 | 7 | 82 | 76 | 100 | 98 | 10 | 12 | 14 | 15 | 16 | 12 | 13 | 7 | 82 | 76 | 100 | 98 | 296
Is there a way to create a generic condition for this so that it works for any month x and year y?
It is still not entirely clear what you actually want to do.
But for your case, my suggestion is to select the columns by their names and transpose the table.
Then you can sum the values along the row axis.
This is not very costly on a DataFrame.
In my opinion, operating across the column axis of a DataFrame is always harder than across the row axis, since for row operations you can use the .query() function to easily filter the entries you want, but there is no equivalent in the column direction.
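A minimal sketch of that idea, assuming the month columns are labelled like 'Oct-19' and implementing the windows described in the question (Jun through Oct of year y sums Oct-y through Feb of y+1; Nov through May sums Mar through Sep of the following cycle). Selecting the window columns and summing with axis=1 is equivalent to transposing and summing along rows; everything beyond the question's own rule is an assumption:

from datetime import datetime
import pandas as pd

def window_columns(now=None):
    # Return the month-column labels to sum for the given date.
    now = now or datetime.now()
    if 6 <= now.month <= 10:                       # Jun..Oct -> Oct(y) .. Feb(y+1)
        start, months = datetime(now.year, 10, 1), 5
    else:                                          # Nov..May -> Mar .. Sep of the next cycle
        year = now.year if now.month >= 11 else now.year - 1
        start, months = datetime(year + 1, 3, 1), 7
    return list(pd.date_range(start, periods=months, freq='MS').strftime('%b-%y'))

cols = [c for c in window_columns() if c in df.columns]   # ignore months not in the table
df['Total'] = df[cols].sum(axis=1)                        # row-wise sum over the window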
My DataFrame looks like this:
+---------------+------+--------+
| Date | Type | Number |
+---------------+------+--------+
| 14-March-2020 | A | 10 |
| 14-March-2020 | B | 20 |
| 14-March-2020 | C | 30 |
| 15-March-2020 | A | 40 |
| 15-March-2020 | B | 50 |
| 15-March-2020 | C | 60 |
+---------------+------+--------+
I want to transform it to :
+---------------+----+----+----+
| Date | A | B | C |
+---------------+----+----+----+
| 14-March-2020 | 10 | 20 | 30 |
| 15-March-2020 | 40 | 50 | 60 |
+---------------+----+----+----+
I have tried using df.groupby('Date') for an initial condensation, but that doesn't seem to work. Any help would be great.
A solution that also removes the 'Type' axis name left behind after pivoting involves rename_axis after resetting the index.
import pandas as pd
df.pivot(index='Date', columns='Type', values='Number').reset_index().rename_axis(columns={'Type': ''})
# Date A B C
# 0 14-March-2020 10 20 30
# 1 15-March-2020 40 50 60
If we omit rename_axis, we instead obtain
df.pivot(index='Date', columns='Type', values='Number').reset_index()
# Type Date A B C
# 0 14-March-2020 10 20 30
# 1 15-March-2020 40 50 60
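One caveat: pivot raises a ValueError if a (Date, Type) pair occurs more than once. In that situation, pivot_table with an explicit aggregation is the usual fallback (a sketch, assuming summing duplicates is acceptable):

(df.pivot_table(index='Date', columns='Type', values='Number', aggfunc='sum')
   .reset_index()
   .rename_axis(columns=None))   # drop the leftover 'Type' axis name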
I have a dataframe like below
+-----------+------------+---------------+------+-----+-------+
| InvoiceNo | CategoryNo | Invoice Value | Item | Qty | Price |
+-----------+------------+---------------+------+-----+-------+
| 1 | 1 | 77 | 128 | 1 | 10 |
| 1 | 1 | 77 | 101 | 1 | 11 |
| 1 | 2 | 77 | 105 | 3 | 12 |
| 1 | 3 | 77 | 129 | 2 | 10 |
| 2 | 1 | 21 | 145 | 1 | 9 |
| 2 | 2 | 21 | 130 | 1 | 12 |
+-----------+------------+---------------+------+-----+-------+
After grouping by 'InvoiceNo' and 'CategoryNo', I want to keep an entire group if any of the items in the list item_list = [128, 129, 130] is present in that group.
My desired output is below:
+-----------+------------+---------------+------+-----+-------+
| InvoiceNo | CategoryNo | Invoice Value | Item | Qty | Price |
+-----------+------------+---------------+------+-----+-------+
| 1 | 1 | 77 | 128 | 1 | 10 |
| 1 | 1 | 77 | 101 | 1 | 11 |
| 1 | 3 | 77 | 129 | 2 | 10 |
| 2 | 2 | 21 | 130 | 1 | 12 |
+-----------+------------+---------------+------+-----+-------+
I know how to filter a dataframe using isin(), but I am not sure how to do it with groupby().
So far I have tried the below:
import pandas as pd
df = pd.read_csv('data.csv')
item_list = [128,129,130]
df.groupby(['InvoiceNo','CategoryNo'])['Item'].isin(item_list)
but nothing happens. Please guide me on how to solve this issue.
You can do something like this:
s = (df['Item'].isin(item_list)                       # True where the row's Item is in the list
       .groupby([df['InvoiceNo'], df['CategoryNo']])
       .transform('any'))                             # broadcast True to every row of a matching group
df[s]
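An equivalent, though typically slower, alternative (a sketch) is groupby().filter(), which keeps every group whose rows satisfy the predicate:

out = df.groupby(['InvoiceNo', 'CategoryNo']).filter(
    lambda g: g['Item'].isin(item_list).any())   # keep groups containing any listed item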