find column name in dataframe - python

When using IPython for interactive work, autocomplete helps expand column names quickly.
But given the column object, I'd like to get its name, and I haven't found a simple way to do it. Is there one?
I'm trying to avoid typing the full "ALongVariableName":
x = "ALongVariableName"
relevantColumn = df[x]
Instead, I type "df.AL<Tab>" to get my Series. So I have:
relevantColumn = df.ALongVariableName #now how can I get x?
But that series object doesn't carry its name or index in the dataframe. Did I miss it?
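A column selected from a DataFrame keeps its label in the Series `name` attribute. This holds whether you select it with attribute access or with indexing. Here is a minimal sketch, with a hypothetical DataFrame standing in for `df`:

```python
import pandas as pd

# Hypothetical data standing in for the real df
df = pd.DataFrame({"ALongVariableName": [1, 2, 3], "Other": [4, 5, 6]})

relevantColumn = df.ALongVariableName  # attribute access, as with tab completion
x = relevantColumn.name  # the Series carries the column label it came from
print(x)  # ALongVariableName
```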
Thanks!

Related

how to pass user input as a method?

I'm trying to prompt a user to input a column name in a pandas dataframe and then use that input to display information about the column.
The code I've tried:
df = #initializing dataframe
user_input = input('enter column name')
print(df.user_input.describe())
but I got the error:
AttributeError: 'DataFrame' object has no attribute 'user_input'
Assuming the user input is actually a valid column name, how can I use it this way?
You can also access a column with square-bracket indexing, df[...]. Try:
df[user_input].describe()
Another way is to use getattr():
getattr(df, user_input).describe()
which I think is quite "unnatural".
pandas lets you look up a column as an attribute if its name meets Python's identifier rules and doesn't clash with an existing attribute of the object. In your case, pandas looks for a column literally named "user_input".
The more general way to look up a column is with indexing, which has none of those constraints. So:
df = ...  # initialize the DataFrame here
user_input = input('enter column name')
print(df[user_input].describe())
Now pandas will use the string entered by the user to look up the column.
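A small sketch shows the two limits of attribute access described above: valid identifiers only, and no clash with existing attributes. The column names are hypothetical. `user_input` is hard-coded in place of the `input()` call so the example runs non-interactively:

```python
import pandas as pd

# Hypothetical columns: one name contains a space, one clashes with DataFrame.count
df = pd.DataFrame({"total sales": [10, 20, 30], "count": [1, 2, 3]})

user_input = "count"  # stands in for input('enter column name')

# Attribute access finds the DataFrame.count method, not the column
print(callable(df.count))  # True

# Indexing with the string always looks up the column
print(df[user_input].describe())

# A name with a space is only reachable through indexing
print(df["total sales"].sum())  # 60

# Checking membership first avoids a KeyError for mistyped names
name = "no such column"
if name in df.columns:
    print(df[name].describe())
else:
    print(f"{name!r} is not a column")
```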
One rule[1] of programming is that there should only be one "right way" of doing things. That's obviously not the case for pandas or python in general. But organizations may define what they consider "right". Since attribute lookup of columns only works sometimes, should it be used at all? Debatable!
[1]: The code is more what you'd call 'guidelines' than actual rules. -Hector Barbossa, Captain of the Black Pearl.

Convert string to already existing variable (pandas)

If I have a DataFrame df and want to access the unique values of the ID column, I can do something like this:
UniqueFactor = df.ID.unique()
But how can I turn this into a function so I can access different columns? I tried this, but it doesn't work:
def separate_by_factor(df, factor):
    # Separating the readings by given factor
    UniqueFactor = df.factor.unique()
separate_by_factor('ID')
And it shouldn't, because I'm passing a string as a variable name. How can I get around this?
I don't know how to word the question better; sorry if it's too vague.
pandas exposes every column whose name is a valid Python identifier (and doesn't clash with an existing DataFrame attribute) as an attribute. That means df.factor looks for a column literally named "factor". To access a column whose name is stored in a variable, as in your example, use indexing instead: df[factor].unique().
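A corrected version of the function might look like the minimal sketch below. Besides switching to `df[factor]`, it passes the DataFrame as the first argument, which the original call `separate_by_factor('ID')` omitted. It also returns the result. The `Site` column and the sample data are hypothetical:

```python
import pandas as pd


def separate_by_factor(df, factor):
    """Return the unique values of the column whose name is the string `factor`."""
    # df[factor] uses the string's value; df.factor would look for a column named "factor"
    return df[factor].unique()


# Hypothetical sample data
df = pd.DataFrame({"ID": [1, 1, 2, 3], "Site": ["a", "b", "b", "a"]})
print(separate_by_factor(df, "ID"))  # [1 2 3]
print(separate_by_factor(df, "Site"))
```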

Python pandas clarity on syntax for groupby

I run into this problem frequently, and it isn't clear to me why the Python code below runs
groups = session['time'].dt.total_seconds().groupby(session['user'])
but this code does not
groups = session['time'].dt.total_seconds().groupby(session[['user','date']])
or
groups = session['time'].dt.total_seconds().groupby(session['user','date'])
Why can't I tack on another column to groupby in this way? How can I write this statement better?
Thanks for any guidance; I'm new to Python.
You are creating a Series with session['time'], and therefore a SeriesGroupBy object, but you seem to want to use other columns of the DataFrame as grouping keys.
The more common syntax is grouped = df.groupby(columns_to_group_by)[columns_to_keep]. I wouldn't name the variable groups because that is also the name of a property of the GroupBy object.
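Here are two ways to group the seconds by both `user` and `date`, sketched on a hypothetical `session` frame. `session['user','date']` fails because it looks for a single column labeled by the tuple `('user', 'date')`. In the pandas versions I know of, `Series.groupby` also rejects a two-column DataFrame such as `session[['user','date']]` as a single key. A list of Series works, and so does grouping the DataFrame itself by column names:

```python
import pandas as pd

# Hypothetical session data
session = pd.DataFrame({
    "user": ["a", "a", "b", "a"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02"],
    "time": pd.to_timedelta(["10s", "20s", "30s", "40s"]),
})

secs = session["time"].dt.total_seconds()

# Option 1: Series.groupby accepts a list of Series (aligned on the index) as keys
by_list = secs.groupby([session["user"], session["date"]]).sum()

# Option 2: add the seconds as a column, group the DataFrame by column names,
# then select the column to aggregate
totals = session.assign(secs=secs).groupby(["user", "date"])["secs"].sum()

print(by_list)
print(totals)
```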

How to build a Python function to build Pandas DataFrames dynamically?

I have a pandas DataFrame from which I need to create other DataFrames.
The DataFrame contains NAICS codes along with related data. In essence, I am trying to create a new DataFrame per code, and I'm getting stuck on an error.
fdf is a DataFrame with two-digit numbers, i.e. 10, 11, 12, 13.
I want to loop through this dataframe to query and build many others. Here is what I have so far:
for x in fdf:
    'Sdf' + str(x) = df[df['naics'].astype(str).str[2:4]==str(x)]
If I run this by itself:
df[df['naics'].astype(str).str[2:4]==str(57)]
it returns the DataFrame I want, but I am not sure how to build this into a function.
The error I get is 'SyntaxError: can't assign to function call'. I think the problem is how I am trying to build the DataFrame name dynamically?
Any help is greatly appreciated.
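Use a dictionary instead of trying to build variable names dynamically:
dfs = {}
for x in fdf:  # assumes fdf yields the codes; looping over a DataFrame yields its column labels
    dfs[str(x)] = df[df['naics'].astype(str).str[2:4] == str(x)]
Each DataFrame is then available by its code, e.g. dfs['57'].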
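Below is a runnable sketch of the dictionary approach on hypothetical data. It assumes the codes to split on are a plain Series. If `fdf` is a DataFrame, iterate over one of its columns instead, because looping over a DataFrame yields column labels. A `groupby` on the same two-digit key builds the equivalent dictionary for every code present, with no explicit loop:

```python
import pandas as pd

# Hypothetical data: six-digit NAICS codes plus a value column
df = pd.DataFrame({
    "naics": [311111, 311119, 425710, 425720, 115710],
    "value": [1, 2, 3, 4, 5],
})

# Hypothetical codes to split on, as a plain Series of ints
codes = pd.Series([11, 57])

# Third and fourth digits of each code, as strings
key = df["naics"].astype(str).str[2:4]

# One DataFrame per requested code, keyed by the code as a string
dfs = {str(x): df[key == str(x)] for x in codes}

# Alternative: one DataFrame per key value that actually occurs in the data
by_code = {code: group for code, group in df.groupby(key)}

print(dfs["57"])
```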

Python Pandas Dataframe Pulling cell value of Column B based on Column A

Struggling here. I'm probably missing something incredibly easy, but I'm beating my head on my desk while trying to learn Python, and I realize that's probably not going to solve this for me.
I have a dataframe df and need to pull the value of column B based on the value of column A.
Here's what I can tell you about my dataset that should make this easier. Column A (FiscalYear) is unique. Although it holds years, it was converted with to_numeric. Column B (Sales) is not necessarily unique and, like column A, was converted with to_numeric. Below is what I have been trying, since a similar approach worked when I looked up sales using idxmax. However, when I look up a specific value, it returns an error:
v = df.at[df.FiscalYear == 2007.0, 'Sales']
I am getting ValueError: At based indexing on an integer index can only have integer indexers. I'm sure I am doing something wrong, but I can't quite put my finger on it.
And here's the code that is working for me:
v = df.at[df.FiscalYear.idxmax(), 'Sales']
No issues there, returning the proper value, etc.
Any help is appreciated. I saw a bunch of similar threads, but for some reason searching and blindly writing lines of code is failing me tonight.
You can use the .loc indexer:
df.Sales.loc[df.FiscalYear==2007.0]
This returns a pandas Series.
If you want it as a list, you can do:
df.Sales.loc[df.FiscalYear==2007.0].tolist()
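Or, to keep using .at, first reduce the boolean mask to a single row label:
v = df.at[df.FiscalYear.eq(2007.0).idxmax(), 'Sales']  # idxmax() gives the label of the first True; assumes a match exists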
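This small sketch on hypothetical data shows why `.at` rejects the boolean mask and how to get a scalar instead. `.at` takes exactly one row label and one column label, so the mask has to be reduced to a label first. `.loc`, by contrast, accepts the mask directly:

```python
import pandas as pd

# Hypothetical data with the default integer (RangeIndex) row labels
df = pd.DataFrame({"FiscalYear": [2005.0, 2006.0, 2007.0], "Sales": [100.0, 150.0, 175.0]})

mask = df.FiscalYear == 2007.0  # boolean Series, one entry per row

# .loc accepts the mask; the result is a Series that could hold several rows
sales_2007 = df.loc[mask, "Sales"]

# If exactly one match is expected, take its first element as a scalar
v = sales_2007.iloc[0]

# .at needs one row label, so turn the mask into a label first
if mask.any():
    label = mask.idxmax()  # label of the first True row
    v_at = df.at[label, "Sales"]

print(v, v_at)
```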
