I'm trying to prompt a user to input a column name in a pandas dataframe and then use that input to display information about the column.
The code I've tried:
df = ...  # initializing dataframe
user_input = input('enter column name')
print(df.user_input.describe())
But I got the error:
df has no attribute user_input
Assuming the user input is actually a valid column name, how can I use the input this way?
You can also access a column with df[]. Try:
df[user_input].describe()
Another way is to use getattr():
getattr(df, user_input).describe()
which I think is quite "unnatural".
pandas lets you look up a column as an attribute reference if its name meets Python's syntax rules for identifiers and doesn't collide with an existing attribute on the object. In your case, pandas looks for a column literally named "user_input".
The more general way to look up a column is with indexing, which does not have these constraints. So,
df = ...  # initializing dataframe
user_input = input('enter column name')
print(df[user_input].describe())
Now pandas will use the string entered by the user to look up the column.
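Here is a minimal sketch of the indexing approach, with a membership check so that an invalid name produces a readable message. The sample data and the helper name describe_column are made up for illustration; note that a column literally named "user_input" does not get in the way.

```python
import pandas as pd

def describe_column(df, column_name):
    """Return summary statistics for the column named by a string."""
    # Hypothetical helper: validate the name before indexing.
    if column_name not in df.columns:
        raise KeyError(f"no column named {column_name!r}")
    return df[column_name].describe()

# Made-up sample data standing in for the real DataFrame.
df = pd.DataFrame({"height": [1.0, 2.0, 3.0], "user_input": [10, 20, 30]})

# In an interactive script this string would come from input('enter column name').
stats = describe_column(df, "height")
print(stats)
```

Indexing with a string works for any column label, including names with spaces or names that clash with DataFrame methods.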
One rule[1] of programming is that there should only be one "right way" of doing things. That's obviously not the case for pandas or Python in general. But organizations may define what they consider "right". Since attribute lookup of columns only works sometimes, should it be used at all? Debatable!
[1]: The code is more what you'd call 'guidelines' than actual rules. -Hector Barbossa, Captain of the Black Perl.
Related
I have a column of object type that holds ratings like "4.1/5". I want to remove the slash (/) and convert the values to float, so I'm trying to write a function to do that.
Please tell me what I'm doing wrong. I'm trying something like
def remove_slash_from_rating(ratings):
    for i in ratings:
        df[rate] = df[rate].str.replace(r'/','')
But when I apply it (df["rate"] = df["rate"].apply(remove_slash_from_rating)), I get an error
NameError: name 'rate' is not defined
Please check the above post
No loop and no apply are necessary; use Series.str.replace on the rate column:
df["rate"] = df["rate"].str.replace(r'/','')
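One caveat: simply deleting the slash turns "4.1/5" into "4.15", which is probably not the number wanted. A possible sketch for getting a float, assuming every value has the form "score/5" (the sample values and the rate_value column name are made up):

```python
import pandas as pd

# Made-up sample ratings in the "score/max" form from the question.
df = pd.DataFrame({"rate": ["4.1/5", "3.8/5", "4.6/5"]})

# Keep only the part before the slash, then convert it to float.
df["rate_value"] = df["rate"].str.split("/").str[0].astype(float)
print(df["rate_value"].tolist())
```

If some rows hold values that are not in this form, astype(float) will raise, so those rows would need to be cleaned first.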
I have a specific question: I need to create a column called "Plane type" that contains the first 4 characters of the "TAIL_NUM" column.
How can I do this? I already imported the data and I can see it.
Creating new columns with Pandas (assuming that's what you're talking about) is very simple. Pandas also provides common string methods. Pandas Docs, Similar SO Question
You will use 'string slicing', which is worth reading about. A new column is created by assigning to it:
df['new_col'] = 'X'
or in your case:
df['Plane type'] = df['TAIL_NUM'].str[:4]
After viewing your code, and assuming that the column "TAIL_NUM" holds string values, you can do it like this:
df['Plane type'] = df["TAIL_NUM"].str[0:4]
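A small self-contained sketch of the slicing, using made-up tail numbers:

```python
import pandas as pd

# Made-up tail numbers; the last one is shorter than 4 characters.
df = pd.DataFrame({"TAIL_NUM": ["N12345", "N678AB", "N9"]})

# .str[:4] slices each string element-wise, like ordinary Python slicing.
df["Plane type"] = df["TAIL_NUM"].str[:4]
print(df)
```

As with ordinary Python slicing, a value shorter than 4 characters is returned unchanged rather than raising an error.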
If I have a dataframe df and want to access the unique values of ID, I can do something like this.
UniqueFactor = df.ID.unique()
But how can I convert this into a function in order to access different variables? I tried this, but it doesn't work
def separate_by_factor(df, factor):
    # Separating the readings by given factor
    UniqueFactor = df.factor.unique()

separate_by_factor('ID')
And it shouldn't, because I'm passing a string as a variable name. How can I get around this?
I don't know how I can better word the question, sorry for being too vague.
When you create a DataFrame, every column whose name is a valid identifier can also be accessed as an attribute. To access a column by a name stored in a variable (as in your example), you need to use df[factor].unique().
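A sketch of the function rewritten this way, with made-up sample data. The DataFrame is passed as the first argument, and the result is returned so the caller can use it:

```python
import pandas as pd

def separate_by_factor(df, factor):
    """Return the unique values of the column whose name is `factor`."""
    # `factor` is a string, so it has to go through indexing, not df.factor.
    return df[factor].unique()

# Made-up sample data for illustration.
df = pd.DataFrame({"ID": [1, 2, 2, 3], "site": ["a", "a", "b", "b"]})

unique_ids = separate_by_factor(df, "ID")
unique_sites = separate_by_factor(df, "site")
print(unique_ids, unique_sites)
```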
When using IPython for interactive manipulation, the autocomplete feature helps expand column names quickly.
But given the column object, I'd like to get its name, and I haven't found a simple way to do it. Is there one?
I'm trying to avoid typing the full "ALongVariableName"
x = "ALongVariableName"
relevantColumn = df[x]
Instead I type "df.AL<Tab>" to get my series. So I have:
relevantColumn = df.ALongVariableName #now how can I get x?
But that series object doesn't carry its name or index in the dataframe. Did I miss it?
Thanks!
My PANDAS data has columns that were read as objects. I want to change these into floats. Following the post linked below (1), I tried:
pdos[cols] = pdos[cols].astype(float)
But PANDAS gives me an error saying that an object can't be recast as float.
ValueError: invalid literal for float(): 17_d
But when I search for 17_d in my data set, it tells me it's not there.
>>> '17_d' in pdos
False
I can look at the raw data to see what's happening outside of python, but feel if I'm going to take python seriously, I should know how to deal with this sort of issue. Why doesn't this search work? How could I do a search over objects for strings in PANDAS? Any advice?
Pandas: change data type of columns
Of course it returns False, because you're only looking in the column list!
'17_d' in pdos
checks whether '17_d' is in pdos.columns.
So what you want is pdos[cols] == '17_d', which gives you a boolean table of matches. To find which rows contain it, use (pdos[cols] == '17_d').any(axis=1).
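A minimal sketch of the difference, using a made-up frame named pdos with one bad value:

```python
import pandas as pd

# Made-up data: column "a" contains one string that cannot be parsed as a float.
pdos = pd.DataFrame({"a": ["1.5", "17_d", "3.0"], "b": ["2", "4", "6"]})
cols = ["a", "b"]

# `in` on a DataFrame tests the column labels, not the cell values.
in_columns = "17_d" in pdos

# Element-wise comparison, then reduce across columns to flag matching rows.
mask = (pdos[cols] == "17_d").any(axis=1)
bad_rows = pdos[mask]
print(in_columns)
print(bad_rows)
```

Indexing with the boolean mask returns only the rows that contain the offending value, which makes it easy to inspect or fix them before calling astype(float).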