I am trying to create multiple dataframes inside a for loop using the below code:
for i in range(len(columns)):
    f'df_v{i+1}' = df.pivot(index="no", columns=list1[i], values=list2[i])
But I get the error "Cannot assign to literal". Is there a way to create the DataFrames dynamically in pandas?
This syntax
f'df_v{i+1}' = df.pivot(index="no", columns=list1[i], values=list2[i])
means that you are trying to assign DataFrames to a string, which is not possible. You might try using a dictionary, instead:
my_dfs = {}
for i in range(len(columns)):
    my_dfs[f'df_v{i+1}'] = df.pivot(index="no", columns=list1[i], values=list2[i])
A dictionary allows named keys, which seems to be what you want. You can then access each DataFrame by its key, for example my_dfs['df_v1'].
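As a rough, self-contained sketch of the dictionary approach: the column names and values below are made up for illustration, and zip is used in place of indexing by position, on the assumption that list1 and list2 have the same length.

```python
import pandas as pd

# Made-up sample data; the column names are assumptions for illustration only.
df = pd.DataFrame({
    "no": [1, 1, 2, 2],
    "kind": ["a", "b", "a", "b"],
    "region": ["n", "s", "n", "s"],
    "price": [10, 20, 30, 40],
    "qty": [1, 2, 3, 4],
})
list1 = ["kind", "region"]  # columns to spread into headers
list2 = ["price", "qty"]    # values to fill each pivot with

# One pivoted DataFrame per (columns, values) pair, stored under a string key.
my_dfs = {
    f"df_v{i}": df.pivot(index="no", columns=cols, values=vals)
    for i, (cols, vals) in enumerate(zip(list1, list2), start=1)
}

print(my_dfs["df_v1"])
```

Iterating over my_dfs.items() then gives each name together with its DataFrame, which is usually easier to work with than a set of separately named variables.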
If I have a dataframe df and want to access the unique values of ID, I can do something like this.
UniqueFactor = df.ID.unique()
But how can I turn this into a function so that I can access different variables? I tried this, but it doesn't work:
def separate_by_factor(df, factor):
    # Separating the readings by given factor
    UniqueFactor = df.factor.unique()
separate_by_factor(df, 'ID')
And it shouldn't, because I'm passing a string as a variable name. How can I get around this?
I don't know how I can better word the question, sorry for being too vague.
When you create a DataFrame, every column whose name is a valid Python identifier is also exposed as an attribute, so df.factor looks for a column literally named "factor". To access a column whose name is held in a variable (as in your example), use bracket indexing: df[factor].unique().
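A minimal sketch of the function rewritten with bracket indexing; the sample data is made up for illustration.

```python
import pandas as pd

def separate_by_factor(df, factor):
    """Return the unique values of the column named by `factor`."""
    # Bracket indexing looks the column up by the string stored in `factor`;
    # df.factor would instead look for a column literally called "factor".
    return df[factor].unique()

# Made-up data for illustration
df = pd.DataFrame({"ID": [1, 2, 2, 3], "site": ["a", "a", "b", "b"]})
unique_ids = separate_by_factor(df, "ID")
unique_sites = separate_by_factor(df, "site")
print(unique_ids, unique_sites)
```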
I run into this problem frequently, and it isn't clear to me why the Python code below runs:
groups = session['time'].dt.total_seconds().groupby(session['user'])
but this Python code does not:
groups = session['time'].dt.total_seconds().groupby(session[['user','date']])
or
groups = session['time'].dt.total_seconds().groupby(session['user','date'])
Why can't I tack on another column to groupby in this way? How can I write this statement better?
Thank you for any guidance; I'm new to Python.
You are creating a series with session['time'], and thus a SeriesGroupBy object with this code, but you seem to want to access other columns in the dataframe.
The more common syntax is grouped = df.groupby(columns_to_group_by)[columns_to_keep]. I wouldn't name the variable groups because that is also the name of a property of the GroupBy object.
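Two ways this could look for the question's case, as a sketch on made-up data. The "seconds" column is a helper name introduced here for illustration, and the sum aggregation is an assumption, since the question does not say what is computed per group.

```python
import pandas as pd

# Made-up sample data; "time" holds timedeltas as in the question.
session = pd.DataFrame({
    "user": ["ann", "ann", "bob", "bob"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02"],
    "time": pd.to_timedelta([30, 45, 10, 20], unit="s"),
})

# Option 1: group the DataFrame by a list of column names, then select a column.
session["seconds"] = session["time"].dt.total_seconds()
by_frame = session.groupby(["user", "date"])["seconds"].sum()

# Option 2: keep the Series and pass a list of key Series (not a DataFrame).
by_series = (
    session["time"].dt.total_seconds()
    .groupby([session["user"], session["date"]])
    .sum()
)

print(by_frame)
```

Both options produce the same grouping; the first is generally easier to read once more than one key is involved.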
I have a DataFrame in pandas that I need to use to create other DataFrames.
The DataFrame contains NAICS codes along with related data. In essence, I am trying to create a new DataFrame per code, and I am stuck on an error.
fdf is a DataFrame of 2-digit numbers, e.g. 10, 11, 12, 13.
I want to loop through this DataFrame to query and build many others. Here is what I have so far:
for x in fdf:
    'Sdf' + str(x) = df[df['naics'].astype(str).str[2:4]==str(x)]
If I run this by itself:
df[df['naics'].astype(str).str[2:4]==str(57)]
it returns the DataFrame I want, but I am not sure how to build this into a function.
'SyntaxError: can't assign to function call' is the error I get. I think the issue is how I am trying to build the DataFrame name dynamically.
Any help is greatly appreciated.
Use a dictionary instead:
df_list = {}
for x in fdf:
    df_list[str(x)] = df[df['naics'].astype(str).str[2:4]==str(x)]
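A self-contained sketch of the same idea on made-up data. One thing to watch: iterating directly over a DataFrame yields its column labels rather than its values, so if fdf really is a DataFrame, the loop probably needs to go over one of its columns. The column name "code" below is hypothetical.

```python
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({
    "naics": [115710, 225711, 335810],
    "value": [1.0, 2.0, 3.0],
})
# Hypothetical layout for fdf: a single column named "code"
fdf = pd.DataFrame({"code": [57, 58]})

# Compute the two-digit slice once instead of on every iteration.
digits = df["naics"].astype(str).str[2:4]

# One filtered DataFrame per code, keyed by the code as a string.
frames = {str(code): df[digits == str(code)] for code in fdf["code"]}

print(frames["57"])
```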
I'm new to pandas, so this is a basic question. I created a DataFrame by concatenating two previous DataFrames. I used
todo_pd = pd.concat([rabia_pd, capitan_pd], keys=['Rabia','Capitan'])
thinking that in the future I could separate them easily and save each one to a different location. Right now I'm unable to do this separation using the keys I defined with the concat function.
I've tried simple things like
half_dataframe = todo_pd['Rabia']
but it throws an error saying that there is a problem with the key.
I've also tried other options I found on SO, like the _get_values('Rabia') or the .index._get_level_values('Rabia') features, but they all throw different errors, saying either that a string is not recognized as a way to access the information or that a positional argument is required: 'level'.
The whole DataFrame contains about 22 columns, and I just want to retrieve from the "big dataframe" the part indexed as 'Rabia' and the part indexed as 'Capitan'.
I'm sure it has a simple solution that I'm missing because of my lack of practice with pandas.
Thanks a lot,
Use DataFrame.xs:
df1 = todo_pd.xs('Rabia')
df2 = todo_pd.xs('Capitan')
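For context, the keys passed to pd.concat become the outer level of the row index, while todo_pd['Rabia'] looks for a column with that name, which is why it fails. A small sketch on made-up data (the column names "a" and "b" are placeholders); .loc with the key is shown as an alternative to xs:

```python
import pandas as pd

# Made-up stand-ins for the two original DataFrames
rabia_pd = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
capitan_pd = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# The keys become the outer level of a MultiIndex on the rows.
todo_pd = pd.concat([rabia_pd, capitan_pd], keys=["Rabia", "Capitan"])

df1 = todo_pd.xs("Rabia")      # cross-section on the outer index level
df2 = todo_pd.loc["Capitan"]   # row-label selection gives the same kind of slice

print(df1)
print(df2)
```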
When using IPython for interactive manipulation, the autocomplete feature helps expand column names quickly.
But given the column object, I'd like to get its name, and I haven't found a simple way to do it. Is there one?
I'm trying to avoid typing the full "ALongVariableName":
x = "ALongVariableName"
relevantColumn = df[x]
Instead, I type "df.AL<Tab>" to get my series. So I have:
relevantColumn = df.ALongVariableName #now how can I get x?
But that series object doesn't carry its name or index in the dataframe. Did I miss it?
Thanks!
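For what it's worth, a Series selected from a DataFrame does keep its column label in its name attribute. A short sketch on made-up data:

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({"ALongVariableName": [1, 2, 3], "other": [4, 5, 6]})

relevantColumn = df.ALongVariableName  # selected via attribute access / tab completion
x = relevantColumn.name                # the column label travels with the Series

print(x)
```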