I have a column with ratings like "4.1/5". It has object dtype, and I want to remove the slash (/) and convert it to float, so I'm trying to write a function to do that.
Please tell me what I'm doing wrong. I'm trying something like
def remove_slash_from_rating(ratings):
    for i in ratings:
        df[rate] = df[rate].str.replace(r'/','')
But when I apply it (df["rate"] = df["rate"].apply(remove_slash_from_rating)), I get an error:
NameError: name 'rate' is not defined
No loop or apply is necessary; use Series.str.replace on the rate column:
df["rate"] = df["rate"].str.replace(r'/','')
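One thing to watch: removing the slash from "4.1/5" leaves "4.15", which would convert to 4.15 rather than 4.1. If the goal is the numeric rating itself, a minimal sketch is to keep only the part before the slash. This assumes every value has the form "x/5"; the sample data and the rate_value column name are made up for illustration.

```python
import pandas as pd

# Made-up sample data in the "x/5" format described in the question
df = pd.DataFrame({"rate": ["4.1/5", "3.8/5", "4.5/5"]})

# Plain removal of the slash keeps the trailing 5 as part of the string
stripped = df["rate"].str.replace("/", "", regex=False)
print(stripped.tolist())

# Keep only the text before the slash, then convert to float
df["rate_value"] = df["rate"].str.split("/").str[0].astype(float)
print(df["rate_value"].tolist())
```

Real data may also contain missing or placeholder values, which would need handling before the astype(float) call.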
I'm trying to prompt a user to input a column name in a pandas dataframe and then use that input to display information about the column.
The code I've tried:
df = ...  # initialize the dataframe
user_input = input('enter column name')
print(df.user_input.describe())
but I got the error:
df has no attribute user_input
Assuming the user input is actually a valid column name, how can I use the input this way?
You can also access a column with df[]. Try:
df[user_input].describe()
Another way is to use getattr():
getattr(df, user_input).describe()
which I think is quite "unnatural".
pandas lets you look up a column as an attribute if its name is a valid Python identifier and doesn't clash with an existing attribute of the DataFrame. In your case, pandas looks for a column literally named "user_input".
The more general way to look up a column is indexing, which does not have these constraints. So,
df = ...  # initialize the dataframe
user_input = input('enter column name')
print(df[user_input].describe())
Now pandas will use the string entered by the user to look up the column.
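The question assumes the input is a valid column name. As a rough sketch of what happens when that assumption is dropped, the helper below (describe_column is a hypothetical name, and the sample data is invented) checks the name against df.columns before indexing. It also shows a column called "max", which attribute access cannot reach because DataFrame.max is a method.

```python
import pandas as pd

def describe_column(df, column_name):
    """Return describe() for the named column, or None if no such column exists."""
    if column_name not in df.columns:
        return None
    return df[column_name].describe()

# Invented sample data; "max" collides with the DataFrame.max method
df = pd.DataFrame({"price": [1.0, 2.0, 3.0], "max": [10, 20, 30]})

summary = describe_column(df, "price")
print(summary)

# df.max is the method, but indexing with the string still finds the column
print(describe_column(df, "max"))
```

In an interactive script, the string would come from input('enter column name') instead of being hard-coded.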
One rule[1] of programming is that there should be only one "right way" of doing things. That's obviously not the case for pandas, or for Python in general. But organizations may define what they consider "right". Since attribute lookup of columns only works sometimes, should it be used at all? Debatable!
[1]: The code is more what you'd call 'guidelines' than actual rules. -Hector Barbossa, Captain of the Black Perl.
If I have a dataframe df and want to access the unique values of ID, I can do something like this.
UniqueFactor = df.ID.unique()
But how can I convert this into a function in order to access different variables? I tried this, but it doesn't work:
def separate_by_factor(df, factor):
    # Separating the readings by given factor
    UniqueFactor = df.factor.unique()
separate_by_factor('ID')
And it shouldn't, because I'm passing a string as a variable name. How can I get around this?
I don't know how I can better word the question, sorry for being too vague.
When you create a DataFrame, every column whose name is a valid identifier can also be accessed as an attribute, so df.factor looks for a column literally named "factor". To access a column by a name stored in a variable (as in your example), use df[factor].unique().
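Put together, a minimal version of the function from the question might look like this. The sample DataFrame and the "site" column are made up for illustration; note that the function also needs the DataFrame passed as its first argument and a return statement.

```python
import pandas as pd

def separate_by_factor(df, factor):
    """Return the unique values of the column whose name is stored in `factor`."""
    return df[factor].unique()

# Made-up sample data
df = pd.DataFrame({"ID": [1, 2, 2, 3], "site": ["a", "a", "b", "b"]})

unique_ids = separate_by_factor(df, "ID")
unique_sites = separate_by_factor(df, "site")
print(unique_ids, unique_sites)
```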
I have a dataframe called "modified_df". I am trying to aggregate a variable, 'Age' (calculating things like the mean). Currently its dtype shows as "object", which is why I am not able to aggregate it. I have cleaned it, and everything seems to be an integer, but there is a chance I missed something.
I tried running this code
modified_df['Age'] = modified_df['Age'].astype('int')
I have attached the error message along with what "Age" looks like
You can try two different things.
Option 1: (converts to float instead. This might not work either, but it will tell you whether some ages can't be an int yet can be a float, such as missing values.)
modified_df['Age'] = modified_df['Age'].astype('float')
Option 2: (suppresses the error; if the conversion fails, the original column is returned unchanged, so it stays object)
modified_df['Age'] = modified_df['Age'].astype('int',errors = 'ignore')
Obviously there are some values in the "Age" column that cannot be converted to int, as mentioned. Try using value_counts() to explore the column, or drop the non-integer rows. Try doing:
modified_df['Age'] = modified_df['Age'].astype('int',errors='ignore')
See the astype() documentation here.
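Another way to find the offending values, not mentioned in the answers above, is pd.to_numeric with errors="coerce", which turns anything unparseable into NaN so it can be inspected. A small sketch with invented data (the real "Age" column is not shown in the post):

```python
import pandas as pd

# Invented sample: an object column with one value that is not a number
modified_df = pd.DataFrame({"Age": ["25", "31", "unknown", "47"]})

# Unparseable entries become NaN instead of raising an error
converted = pd.to_numeric(modified_df["Age"], errors="coerce")

# Rows that were present originally but failed to convert
failed = converted.isna() & modified_df["Age"].notna()
bad_values = modified_df.loc[failed, "Age"]
print(bad_values.tolist())

# Aggregations work on the converted series and skip NaN by default
print(converted.mean())
```

Once the bad values are fixed or dropped, the converted series can be assigned back to the column.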
If I need to check whether a value exists in a pandas DataFrame column whose name has no spaces, I simply do something like this:
if value in df.Timestamp.values
This works if the column name is Timestamp. However, I have plenty of data with column names like 'Date Time'. How do I use the "if ... in" statement in that case?
If there is no easy way to check for this using the if in statement, can I search for the existence of the value in some other way? Note that I just need to search for the existence of the value. Also, this is not an index column.
Thank you for any inputs
It's better practice to use the square bracket notation:
df["Date Time"].values
which does exactly the same thing.
There are two ways of selecting a column in pandas. One is the dot notation you are using, and the other is square brackets. Both return the same column, but only square brackets work for names that contain spaces:
if value in df["Date Time"].values
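As a small runnable illustration of that check, here is a sketch with made-up timestamps. It also shows an equivalent test using Series.eq(...).any(). Note that writing value in df["Date Time"] without .values tests the index labels rather than the column's values.

```python
import pandas as pd

# Made-up sample data with a space in the column name
df = pd.DataFrame({"Date Time": ["2020-01-01 00:00", "2020-01-02 00:00"]})
value = "2020-01-02 00:00"

# Membership test against the underlying array of values
found = value in df["Date Time"].values

# Equivalent element-wise comparison, reduced with any()
found_alt = bool(df["Date Time"].eq(value).any())

print(found, found_alt)
```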
If you want to work with a column whose header contains spaces, but you don't want the name changed because you may have to forward the file, one way is to rename it, do whatever you want with the new space-free name, then rename it back.
# e.g. to drop the rows with the value "DUMMY" in the column 'Recipient Fullname'
df.rename(columns={'Recipient Fullname':'Recipient_Fullname'}, inplace=True)
df = df[(df.Recipient_Fullname != "DUMMY")]
df.rename(columns={'Recipient_Fullname':'Recipient Fullname'}, inplace=True)
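For comparison, the same filter can be written without renaming at all, since square brackets accept names with spaces. The sketch below uses invented rows; the second form relies on DataFrame.query, which accepts backtick-quoted column names.

```python
import pandas as pd

# Invented sample rows
df = pd.DataFrame({"Recipient Fullname": ["Ann Lee", "DUMMY", "Bo Chen"]})

# Square brackets work directly with the spaced column name
filtered = df[df["Recipient Fullname"] != "DUMMY"]

# query() needs backticks around a column name that contains a space
filtered_q = df.query("`Recipient Fullname` != 'DUMMY'")

print(filtered)
```

Either way, the original column name is left untouched, so there is nothing to rename back.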
When using IPython for interactive manipulation, the autocomplete feature helps expand column names quickly.
But given the column object, I'd like to get its name, and I haven't found a simple way to do it. Is there one?
I'm trying to avoid typing the full "ALongVariableName"
x = "ALongVariableName"
relevantColumn = df[x]
Instead, I type "df.AL<\Tab>" to get my series. So I have:
relevantColumn = df.ALongVariableName #now how can I get x?
But that series object doesn't carry its name or index in the dataframe. Did I miss it?
Thanks!