I have a dataframe like this:
name time session1 session2 session3
Alex 135 10 3 5
Lee 136 2 6 4
I want to make multiple dataframes based on each session. for example, i want to make dataframe one that has name, time, and session1. and dataframe 2 has name, time, and session2. and dataframe 3 has name, time, and session3. I want to use for loop or any other way is better but don't know how to choose column 1,2,3 at one time but column 1,2, 4 and etc. Any one has idea about that. The data is saved in pandas dataframe. I just don't know how to type it in Stackoverflow here. Thank you
I don't think you need to create a new dictionary for that.
Just directly slice your data frame whenever needed.
df[['name', 'time', 'session 1']]
If you think the following design can help you, you can set the name and time to be indexes (df.set_index(['name', 'time'])) and just simply
df['session 1']
Organize it into a dictionary of dataframes:
dict_of_dfs = {f'df {i}':df[['name','time', i]] for i in df.columns[2:]}
Then you can access each dataframe as you would any other dictionary values:
>>> dict_of_dfs['df session1']
name time session1
0 Alex 135 10
1 Lee 136 2
>>> dict_of_dfs['df session2']
name time session2
0 Alex 135 3
1 Lee 136 6
Related
I have a python pandas dataframe of stock data, and I'm trying to filter some of those tickers.
There are companies that have 2 or more tickers (different types of shares when a share is preferred and the other not).
I want to drop the lines of those additional share values, and let just the share with the higher volume. In the dataframe I also have the company name, so maybe there is a way of using it to make some condition and then drop it when comparing the volume of the same company? How can I do this?
Use groupby and idxmax:
Suppose this dataframe:
>>> df
ticker volume
0 CEBR3 123
1 CEBR5 456
2 CEBR6 789 # <- keep for group CEBR
3 GOAU3 23 # <- keep for group GOAU
4 GOAU4 12
5 CMIN3 135 # <- keep for group CMIN3
>>> df.loc[df.groupby(df['ticker'].str.extract(r'^(.*)\d', expand=False),
sort=False)['volume'].idxmax().tolist()]
ticker volume
2 CEBR6 789
3 GOAU3 23
5 CMIN3 135
I have two pandas dataframes with some columns in common. These columns are of type category but unfortunately the category codes don't match for the two dataframes. For example I have:
>>> df1
artist song
0 The Killers Mr Brightside
1 David Guetta Memories
2 Estelle Come Over
3 The Killers Human
>>> df2
artist date
0 The Killers 2010
1 David Guetta 2012
2 Estelle 2005
3 The Killers 2006
But:
>>> df1['artist'].cat.codes
0 55
1 78
2 93
3 55
Whereas:
>>> df2['artist'].cat.codes
0 99
1 12
2 23
3 99
What I would like is for my second dataframe df2 to take the same category codes as the first one df1 without changing the category values. Is there any way to do this?
(Edit)
Here is a screenshot of my two dataframes. Essentially I want the song_tags to have the same cat codes for artist_name and track_name as the songs dataframe. Also song_tags is created from a merge between songs and another tag dataframe (which contains song data and their tags, without the user information) and then saved and loaded through pickle. Also it might be relevant to add that I had to cast artist_name and track_name in song_tags to type category from type object.
I think essentially my question is: how to modify category codes of an existing dataframe column?
I have problem where I need to update a value if people were at the same table.
import pandas as pd
data = {"p1":['Jen','Mark','Carrie'],
"p2":['John','Jason','Rob'],
"value":[10,20,40]}
df = pd.DataFrame(data,columns=["p1",'p2','value'])
meeting = {'person':['Jen','Mark','Carrie','John','Jason','Rob'],
'table':[1,2,3,1,2,3]}
meeting = pd.DataFrame(meeting,columns=['person','table'])
df is a relationship table and value is the field i need to update. So if two people were at the same table in the meeting dataframe then update the df row accordingly.
for example: Jen and John were both at table 1, so I need to update the row in df that has Jen and John and set their value to value + 100 so 110.
I thought about maybe doing a self join on meeting to get the format to match that of df but not sure if this is the easiest or fastest approach
IIUC you could set the person as index in the meeting dataframe, and use its table values to replace the names in df. Then if both mappings have the same value (table), replace with df.value+100:
m = df[['p1','p2']].replace(meeting.set_index('person').table).eval('p1==p2')
df['value'] = df.value.mask(m, df.value+100)
print(df)
p1 p2 value
0 Jen John 110
1 Mark Jason 120
2 Carrie Rob 140
This could be an approach, using df.to_records():
groups=meeting.groupby('table').agg(set)['person'].to_list()
df['value']=[row[-1]+100 if set(list(row)[1:3]) in groups else row[-1] for row in df.to_records()]
Output:
df
p1 p2 value
0 Jen John 110
1 Mark Jason 120
2 Carrie Rob 140
I'm trying to get the total number of books that an author wrote and put it in a column called book number with my dataframe that has 15 other columns.
I checked online and people use groupby with count(), however it doesn't create the column that I want, it only gives a column of numbers without a name and I can't put it together with the original dataframe.
author_count_df = (df_author["Name"]).groupby(df_author["Name"]).count()
print(author_count_df)
Result:
Name
A D 3
A Gill 4
A GOO 3
ALL SHOT 10
AMIT PATEL 5
..
vishal raina 7
walt walter 6
waqas alhafidh 3
yogesh koshal 8
zainab m.jawad 9
Name: Name, Length: 696, dtype: int64
Expected: A dataframe with
Name other 14 columns from author_df Book Number
A D ... 3
A Gill ... 4
A GOO ... 3
ALL SHOT ... 10
AMIT PATEL ... 5
... ..
vishal raina ... 7
walt walter ... 6
waqas alhafidh ... 3
yogesh koshal ... 8
zainab m.jawad ... 9
Use transform with the groupby and assign it back:
df_author['Book Number']=df_author.groupby("Name")['Name'].transform('count')
For a new df, use:
author_count_df = df_author.assign(BookNum=df_author.groupby("Name")['Name']
.transform('count'))
Use reset_index()
author_count_df = (df_author["Name"]).groupby(df_author["Name"]).count().reset_index()
This basically tells the pandas groupby to reset back to the original index
You have done the good Job except you need to check how to populate or assign the values back into a new column which you have got, Which you can achieve with DataFrame.assign method which does the Job quite elegantly.
Straight from the Docs:
Assign new columns to a DataFrame.
Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.
I have a pandas data-frame that looks like this:
ID Hobbby Name
1 Travel Kevin
2 Photo Andrew
3 Travel Kevin
4 Cars NaN
5 Photo Andrew
6 Football NaN
.............. 1303 rows.
The number of Names filled in might be large then 2 as well. I would like to end up the entire Names column filled n equally into the names ( or+1 in the case of even number of rows). I already store into a variable number of names the total number of names. In the above case it's 2. I tried filtering and counting by each name but I don't know how to make this when the number of name is dynamic.
Expected Dataframe:
ID Hobbby Name
1 Travel Kevin
2 Photo Andrew
3 Travel Kevin
4 Cars Kevin
5 Photo Andrew
6 Football Andrew
I tried: replace NaN with 0 in Column Name using fillna. Filter the column and end up with a dataframe that has only the na fields and afterwards len(df) to get the number of nan and from here created 2 databases each containing half of the df. Bu I think this approach is completely wrong as I do not always have 2 Names. There could be2,3,4 etc. ( this is given by a dictionary)
Any help highly appreciated
Thanks.
It's difficult to tell but I think you need ffill
df['Name'] = df['Name'].ffill()