I run into this problem frequently, and it isn't clear to me why the Python code below runs:
groups = session['time'].dt.total_seconds().groupby(session['user'])
but this Python code does not:
groups = session['time'].dt.total_seconds().groupby(session[['user','date']])
or
groups = session['time'].dt.total_seconds().groupby(session['user','date'])
Why can't I tack on another column to groupby in this way? How can I write this statement better?
Thank you for any guidance; I'm a newbie with Python.
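With session['time'] you are creating a Series, and therefore a SeriesGroupBy object, but you seem to want to group by other columns of the DataFrame. session[['user','date']] is a two-dimensional DataFrame, which pandas rejects as a grouping key for a Series, and session['user','date'] looks up a single column labelled ('user', 'date'), which does not exist. To group the Series by several columns, pass a list of Series instead: .groupby([session['user'], session['date']]).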
The more common syntax is grouped = df.groupby(columns_to_group_by)[columns_to_keep]. I wouldn't name the variable groups because that is also the name of a property of the GroupBy object.
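A minimal sketch of two ways to get per-(user, date) totals: grouping the Series by a list of Series, and grouping the DataFrame by column names before selecting a column. The session data and the seconds column name are made up for illustration.

```python
import pandas as pd

# Hypothetical data standing in for the question's `session` frame
session = pd.DataFrame({
    "user": ["a", "a", "b", "b"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02"],
    "time": pd.to_timedelta([30, 90, 60, 120], unit="s"),
})

secs = session["time"].dt.total_seconds()

# Option 1: keep grouping a Series, but pass a list of Series as the keys
by_list = secs.groupby([session["user"], session["date"]]).sum()

# Option 2: group the DataFrame by column names, then pick the column to keep
grouped = session.assign(seconds=secs).groupby(["user", "date"])["seconds"]
by_frame = grouped.sum()
```

Option 2 follows the df.groupby(columns_to_group_by)[columns_to_keep] pattern, and the variable is called grouped rather than groups.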
I am trying to create multiple dataframes inside a for loop using the code below:
for i in range(len(columns)):
    f'df_v{i+1}' = df.pivot(index="no", columns=list1[i], values=list2[i])
But I get the error "Cannot assign to literal". Is there a way to create the dataframes dynamically in pandas?
This syntax
f'df_v{i+1}' = df.pivot(index="no", columns=list1[i], values=list2[i])
means that you are trying to assign a DataFrame to a string literal, which is not possible. You might try using a dictionary instead:
my_dfs = {}
for i in range(len(columns)):
    my_dfs[f'df_v{i+1}'] = df.pivot(index="no", columns=list1[i], values=list2[i])
A dictionary lets you use named keys, which seems to be what you want. You can then access each DataFrame by its key, for example my_dfs['df_v1'].
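A runnable sketch of the same dictionary approach. Here df, list1 and list2 are made-up stand-ins for the question's data, and zip with enumerate replaces the index arithmetic so each pivot column is paired with its value column.

```python
import pandas as pd

# Hypothetical data; "no" is the pivot index as in the question
df = pd.DataFrame({
    "no": [1, 1, 2, 2],
    "kind": ["x", "y", "x", "y"],
    "color": ["red", "blue", "red", "blue"],
    "v1": [10, 20, 30, 40],
    "v2": [1, 2, 3, 4],
})
list1 = ["kind", "color"]  # hypothetical pivot columns
list2 = ["v1", "v2"]       # hypothetical value columns

my_dfs = {}
for i, (col, val) in enumerate(zip(list1, list2), start=1):
    # Keys "df_v1", "df_v2", ... play the role of the dynamic variable names
    my_dfs[f"df_v{i}"] = df.pivot(index="no", columns=col, values=val)
```

Each pivoted frame is then reached as my_dfs["df_v1"], my_dfs["df_v2"], and so on, and the dictionary can be looped over like any other.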
I'm trying to organise data using a pandas dataframe.
Given the structure of the data, it seems logical to use a composite index of 'league_id' and 'fixture_id'. I believe I have implemented this according to the examples in the docs; however, I am unable to access the data using the index.
My code can be found here:
https://repl.it/repls/OldCorruptRadius
I am very new to Pandas and programming in general, so any advice would be much appreciated! Thanks!
For multi-indexing, you would need to use the pandas MultiIndex API, which brings its own learning curve, so I would not recommend it for beginners. Link: https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html
I only use multi-indexing to display a final product to others (i.e., to make it easy to read). Before setting up the multi-index, filter on fixture_id and league_id as ordinary columns:
import pandas as pd
# fixture and features come from the question's code
df = pd.DataFrame(fixture, columns=features)
df[(df['fixture_id'] == 592) & (df['league_id'] == 524)]
This way, you are still selecting on the same values that would have become the index levels had you multi-indexed the two columns.
If you have to use multi-indexing, try transposing the DataFrame with df.T. This turns the index into columns and vice versa. For example, you can do something like this:
df = pd.DataFrame(fixture, columns=features).set_index(['league_id', 'fixture_id'])
df.T[524][592].loc['event_date'] # gets you the row of `event_dates`
df.T[524][592].loc['event_date'].iloc[0] # gets you the first instance of event_dates
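For comparison, a sketch that skips the transpose: with a sorted MultiIndex, .loc accepts a (league_id, fixture_id) tuple, and .xs selects every row for one value of a single level. The fixture records and the home column below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the question's fixture data
fixture = [
    {"league_id": 524, "fixture_id": 592, "event_date": "2019-08-09", "home": "A"},
    {"league_id": 524, "fixture_id": 593, "event_date": "2019-08-10", "home": "B"},
    {"league_id": 525, "fixture_id": 700, "event_date": "2019-08-11", "home": "C"},
]
df = pd.DataFrame(fixture)

# 1) Filter on ordinary columns, as in the answer
row = df[(df["fixture_id"] == 592) & (df["league_id"] == 524)]

# 2) Same lookup through a MultiIndex, using a tuple key with .loc
mi = df.set_index(["league_id", "fixture_id"]).sort_index()
date = mi.loc[(524, 592), "event_date"]

# 3) All fixtures of one league: cross-section on the first level
league_524 = mi.xs(524, level="league_id")
```

Sorting the index first is not always required, but it avoids performance warnings when slicing a MultiIndex.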
I have a dataframe in pandas that I need to use to create other dataframes from.
The dataframe contains NAICS codes along with related data. In essence, I am trying to create a new dataframe per code, but I am getting stuck on an error.
fdf is a dataframe with two-digit numbers, i.e. 10, 11, 12, 13.
I want to loop through this dataframe to query and build many others. Here is what I have so far:
for x in fdf:
    'Sdf' + str(x) = df[df['naics'].astype(str).str[2:4]==str(x)]
if I run this by itself:
df[df['naics'].astype(str).str[2:4]==str(57)]
It returns the dataframe I want, but I am not sure how to build this into a function.
The error I get is 'SyntaxError: can't assign to function call'. I think the issue is how I am trying to build the dataframe name dynamically.
Any help is greatly appreciated.
Do it with a dictionary:
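df_list = {}
# Iterating over a DataFrame directly yields its column labels, so loop over
# the values instead (assumes the codes are in fdf's first column)
for x in fdf.iloc[:, 0]:
    df_list[str(x)] = df[df['naics'].astype(str).str[2:4] == str(x)]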
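A self-contained sketch with made-up NAICS values, assuming fdf keeps the codes in a column named code (a hypothetical name). Because `for x in fdf` iterates over column labels rather than values, the loop reads the column explicitly. The last line shows an alternative that builds a frame for every two-digit value in one groupby pass.

```python
import pandas as pd

# Made-up NAICS data; characters 3-4 of each code are what the question filters on
df = pd.DataFrame({
    "naics": [311111, 311230, 425710, 425720, 115710],
    "sales": [10, 20, 30, 40, 50],
})
fdf = pd.DataFrame({"code": [11, 57]})  # hypothetical column name "code"

digits = df["naics"].astype(str).str[2:4]  # e.g. 425710 -> "57"

# Loop over the column's values, not over fdf itself
dfs = {}
for x in fdf["code"]:
    dfs[str(x)] = df[digits == str(x)]

# Alternative: one groupby pass yields {two-digit string: sub-frame}
by_code = dict(tuple(df.groupby(digits)))
```

The groupby version also creates entries for values that are not listed in fdf (here "12"), so the loop is the better fit when only selected codes are wanted.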
I'm new to pandas, so this is a basic question. I created a DataFrame by concatenating two previous DataFrames. I used
todo_pd = pd.concat([rabia_pd, capitan_pd], keys=['Rabia','Capitan'])
thinking that later I could easily separate them again and save each one to a different location. Right now I'm unable to do this separation using the keys I defined in the concat call.
I've tried simple things like
half_dataframe = todo_pd['Rabia']
but it throws an error saying that there is a problem with the key.
I've also tried other options I found on SO, like _get_values('Rabia') or .index._get_level_values('Rabia'), but they all throw different errors: either a string is not recognized as a way to access the data, or a required positional argument 'level' is missing.
The whole DataFrame contains about 22 columns, and I just want to retrieve from the "big dataframe" the part indexed as 'Rabia' and the part indexed as 'Capitan'.
I'm sure there is a simple solution that I'm missing due to my lack of practice with pandas.
Thanks a lot,
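Use DataFrame.xs, which selects rows by a label on the index (the concat keys form its outer level); todo_pd['Rabia'] fails because [] on a DataFrame selects columns: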
df1 = todo_pd.xs('Rabia')
df2 = todo_pd.xs('Capitan')
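A small sketch with made-up frames in place of rabia_pd and capitan_pd. The keys passed to concat become the outer index level, so either xs or .loc with a key recovers each part, and looping over the unique outer keys splits the whole frame at once.

```python
import pandas as pd

# Made-up frames standing in for rabia_pd and capitan_pd
rabia_pd = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
capitan_pd = pd.DataFrame({"a": [5], "b": [6]})

# keys= adds an outer index level holding "Rabia" / "Capitan"
todo_pd = pd.concat([rabia_pd, capitan_pd], keys=["Rabia", "Capitan"])

df1 = todo_pd.xs("Rabia")     # cross-section on the outer level
df2 = todo_pd.loc["Capitan"]  # .loc with the outer key works as well

# Split everything at once, e.g. before saving each part separately
parts = {key: todo_pd.xs(key) for key in todo_pd.index.unique(level=0)}
```

Each value in parts can then be written out on its own, for example with to_csv and a path of your choosing.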
When using IPython for interactive work, the autocomplete feature helps to expand column names quickly.
But given the column object, I'd like to get its name, and I haven't found a simple way to do it. Is there one?
I'm trying to avoid typing the full "ALongVariableName":
x = "ALongVariableName"
relevantColumn = df[x]
Instead, I type "df.AL<Tab>" to get my series. So I have:
relevantColumn = df.ALongVariableName #now how can I get x?
But that series object doesn't carry its name or index in the dataframe. Did I miss it?
Thanks!
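For what it's worth, a column pulled out of a DataFrame does keep its label in the Series .name attribute, and the label can be turned back into a position with df.columns.get_loc. A minimal sketch with a made-up frame:

```python
import pandas as pd

# Made-up frame using the column name from the question
df = pd.DataFrame({"ALongVariableName": [1, 2, 3], "other": [4, 5, 6]})

relevantColumn = df.ALongVariableName  # attribute access, as in the question
x = relevantColumn.name                # the column label is stored on the Series
position = df.columns.get_loc(x)       # integer position of that column in df
```

Attribute access only works for labels that are valid Python identifiers and do not clash with DataFrame methods; df[x] works for any label.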