I have a dataframe like the following:
+-------+-------+
| Group | Price |
+-------+-------+
| A | 2 |
| B | 3 |
| A | 1 |
| C | 4 |
| B | 2 |
+-------+-------+
I would like to create a column, that would give me the in which range (if I divided each group into 4 intervals) my price value is within each group.
+-------+-------+--------------------------+
| Group | Price | Range |
+-------+-------+--------------------------+
| A | 2 | [1-2] |
| B | 3 | [2-3] |
| A | 1 | [0-1] |
| C | 4 | [0-4] |
| B | 2 | [0-2] |
+-------+-------+--------------------------+
Anyone has any idea by using pandas pd.cut and groupby operations?
Thanks
You can pass pd.cut to groupby():
df['Range'] = df.groupby('Group')['Price'].transform(pd.cut, bins=4)
I have two dataframes.
df1 = pd.DataFrame({
'id':[1,1,1,1,1,1,2,2,2,2,2,2],
'pp':[3,'',2,'',1,0,4, 3, 2, 1, '', 0],
'pc':[6,5,4,3,2,1,6,5,4,3,2,1]
})
| | id | pp | pc |
|---:|-----:|:-----|-----:|
| 0 | 1 | 3 | 6 |
| 1 | 1 | | 5 |
| 2 | 1 | 2 | 4 |
| 3 | 1 | | 3 |
| 4 | 1 | 1 | 2 |
| 5 | 1 | 0 | 1 |
| 6 | 2 | 4 | 6 |
| 7 | 2 | 3 | 5 |
| 8 | 2 | 2 | 4 |
| 9 | 2 | 1 | 3 |
| 10 | 2 | | 2 |
| 11 | 2 | 0 | 1 |
df2 = pd.DataFrame({
'id':[1,1,1,2,2,2],
'pp':['', 3, 4, 1, 2, ''],
'yu':[1,2,3,4,5,6]
})
| | id | pp | yu |
|---:|-----:|:-----|-----:|
| 0 | 1 | | 1 |
| 1 | 1 | 3 | 2 |
| 2 | 1 | 4 | 3 |
| 3 | 2 | 1 | 4 |
| 4 | 2 | 2 | 5 |
| 5 | 2 | | 6 |
I'd like to merge the two so that final results look like this.
| | id | pp | pc | yu |
|---:|-----:|:-----|:-----|-----:|
| 0 | 1 | | | 1 |
| 1 | 1 | 0 | 1 | 2 |
| 2 | 1 | 3 | 6 | 3 |
| 3 | 2 | 1 | 3 | 4 |
| 4 | 2 | 2 | 4 | 5 |
| 5 | 2 | | | 6 |
Basically, the df1 has the value that I need to lookup from.
df2 is the has id and pp column that are used to lookup.
However when I do
pd.merge(df2, df1, on=['id', 'pp'], how='left') results in
| | id | pp | pc | yu |
|---:|-----:|:-----|-----:|-----:|
| 0 | 1 | | 5 | 1 |
| 1 | 1 | | 3 | 1 |
| 2 | 1 | 3 | 6 | 2 |
| 3 | 1 | 4 | nan | 3 |
| 4 | 2 | 1 | 3 | 4 |
| 5 | 2 | 2 | 4 | 5 |
| 6 | 2 | | 2 | 6 |
This is not correct because it looks at empty rows as well.
If the value in df2 is empty, there should be no mapping.
I do want to keep the empty rows in df2 as it showed so can't use inner join
We can dropna for empty row in df1
out = pd.merge(df2, df1.replace({'':np.nan}).dropna(), on=['id', 'pp'], how='left')
Out[121]:
id pp yu pc
0 1 1 NaN
1 1 3 2 6.0
2 1 4 3 NaN
3 2 1 4 3.0
4 2 2 5 4.0
5 2 6 NaN
Is is possible to fetch column containing values corresponding to an id column?
Example:-
df1
| ID | Value | Salary |
|:---------:--------:|:------:|
| 1 | amr | 34 |
| 1 | ith | 67 |
| 2 | oaa | 45 |
| 1 | eea | 78 |
| 3 | anik | 56 |
| 4 | mmkk | 99 |
| 5 | sh_s | 98 |
| 5 | ahhi | 77 |
df2
| ID | Dept |
|:---------:--------:|
| 1 | hrs |
| 1 | cse |
| 2 | me |
| 1 | ece |
| 3 | eee |
Expected Output
| ID | Dept | Value |
|:---------:--------:|----------:|
| 1 | hrs | amr |
| 1 | cse | ith |
| 2 | me | oaa |
| 1 | ece | eea |
| 3 | eee | anik |
I want to fetch each values in the 'Value' column corresponding to values in df2's ID column. And create column containing 'Values' in df2. The number of rows in the two dfs are not the same. I have tried
this
Not worked
IIUC , you can try df.merge after assigning a helper column by doing groupby+cumcount on ID:
out = (df1.assign(k=df1.groupby("ID").cumcount())
.merge(df2.assign(k=df2.groupby("ID").cumcount()),on=['ID','k'])
.drop("k",1))
print(out)
ID Value Dept
0 1 Amr hrs
1 1 ith cse
2 2 oaa me
3 1 eea ece
4 3 anik eee
is this what you want to do?
df1.merge(df2, how='inner',on ='ID')
Since you have duplicated IDs in both dfs, but these are ordered, try:
df1 = df1.drop(columns="ID")
df3 = df2.merge(df1, left_index=True, right_index=True)
Assuming I have the following table:
+----+---+---+
| A | B | C |
+----+---+---+
| 1 | 1 | 3 |
| 2 | 2 | 7 |
| 6 | 3 | 2 |
| -1 | 9 | 0 |
| 2 | 1 | 3 |
| -8 | 8 | 2 |
| 2 | 1 | 9 |
+----+---+---+
if column A's value is Negative, update column B's value by the value of column C. if not do nothing
This is the desired output:
+----+---+---+
| A | B | C |
+----+---+---+
| 1 | 1 | 3 |
| 2 | 2 | 7 |
| 6 | 3 | 2 |
| -1 | 0 | 0 |
| 2 | 1 | 3 |
| -8 | 2 | 2 |
| 2 | 1 | 9 |
+----+---+---+
I've been trying the following code but it's not working
#not working
result.loc(result["A"] < 0,result['B'] = result['C'].iloc[0])
result.B[result.A < 0] = result.C
Try this:
df.loc[df['A'] < 0, 'B'] = df['C']
So I have a dataframe similar to this:
timestamp | name
------------+------------
1 | a
1 | b
2 | c
2 | d
2 | e
3 | f
4 | g
Essentially I want to get min and max value of each timestamp session(defined by unique timestamp value, there are 4 sessions in this example), the expected result would something like this:
timestamp | name | start | end
------------+----------+--------+------
1 | a | 1 | 2
1 | b | 1 | 2
2 | c | 2 | 3
2 | d | 2 | 3
2 | e | 2 | 3
3 | f | 3 | 4
4 | g | 4 | 4
I am thinking index on timestamp column, then "move up" the index by 1, yet this approach didn't work on the forth bucket in the example above.
Any help is greatly appreciated!
try numpy.clip(), such as df['end']=numpy.clip(df['timestamp']+1, 0, 4)