Convert objects to numeric values [duplicate] - python

This question already has answers here:
Change Pandas String Column with commas into Float
(2 answers)
Closed 6 months ago.
I have a CSV file and it has a column full of numbers. These numbers can be formatted as 45.11 , 1,234.33, 122.33, 10,222.22 etc.
Right now they are showing up as objects in my data frame, and I need to convert them to numeric. I have tried:
df['Value'].astype(str).astype(float)
But am getting errors like this:
ValueError: could not convert string to float: '1,054.43'
Does anyone know how to solve this for the weirdly formatted numbers?

This should do the job:
import pandas as pd
vals = {'Value': ["45.11", "1,234.33", "122.33", "10,222.22"]}
df = pd.DataFrame(vals)
# Remove the thousands separators, then convert the strings to float
df['Value'] = df['Value'].apply(lambda x: x.replace(",", "")).astype(float)
print(df['Value'])
Output:
0 45.11
1 1234.33
2 122.33
3 10222.22
Name: Value, dtype: float64
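Since the data comes from a CSV file, another option may be to let the reader handle the separators at load time. The following is a minimal sketch, assuming the values with commas are quoted in the file; the inline csv_text is made up for illustration and stands in for the real file. It uses the thousands parameter of pd.read_csv.

```python
import io

import pandas as pd

# Hypothetical file contents; values containing commas are assumed to be quoted
csv_text = 'Value\n45.11\n"1,234.33"\n122.33\n"10,222.22"\n'

# thousands="," tells the parser to treat commas inside numbers as separators
df = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(df["Value"].dtype)
print(df["Value"].tolist())
```

With this approach the column is numeric as soon as it is loaded, so no later conversion step is needed.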

Related

How to select rows from a pandas dataframe by a feature's data type when the feature contains more than one type of value [duplicate]

This question already has answers here:
Select row from a DataFrame based on the type of the object(i.e. str)
(3 answers)
Closed 3 months ago.
I have a dataframe with 3 features: id, name and point. I need to select the rows where the type of the 'point' value is string.
id name point
0 x 5
1 y 6
2 z ten
3 t nine
4 q two
How can I split the dataframe just by looking at the type of one feature's value?
I tried to modify the select_dtypes method but got lost. I also tried to divide the dataset using
df[df[point].dtype == str] or df[df[point].dtype is str]
but it didn't work.
Technically, the answer would be:
out = df[df['point'].apply(lambda x: isinstance(x, str))]
But this would also select rows containing a string representation of a number ('5').
If you want to select "strings" as opposed to "numbers" whether those are real numbers or string representations, you could use:
m = pd.to_numeric(df['point'], errors='coerce').isna()
out = df[df['point'].notna() & m]
The remaining question is how you want to treat values such as '1A' or 'AB123'.
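To make the difference between the two selections concrete, here is a small sketch. The first five rows follow the table in the question; the sixth row (name "w", point "7") is an assumption added only to show a string that represents a number.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5],
    "name": ["x", "y", "z", "t", "q", "w"],
    # The last value "7" is a hypothetical numeric string, not from the question
    "point": [5, 6, "ten", "nine", "two", "7"],
})

# Selection 1: rows whose 'point' is a Python str object (includes "7")
is_str = df["point"].apply(lambda x: isinstance(x, str))
str_rows = df[is_str]

# Selection 2: rows whose 'point' cannot be parsed as a number (excludes "7")
not_numeric = pd.to_numeric(df["point"], errors="coerce").isna()
text_rows = df[df["point"].notna() & not_numeric]

print(str_rows["name"].tolist())
print(text_rows["name"].tolist())
```

Values such as '1A' or 'AB123' cannot be parsed by pd.to_numeric, so with errors='coerce' they become NaN and land in the second selection as well.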

Need to plot Pairplot for a dataframe that has duplicate indices [duplicate]

This question already has answers here:
dataframe to long format
(2 answers)
Reshape wide to long in pandas
(2 answers)
Closed 9 months ago.
I have a dataframe 'df' with shape (310, 7) and need to plot a pairplot for it. But I get the error "ValueError: cannot reindex from a duplicate axis" when I do it the usual way:
sns.pairplot(df,hue='Class')
ValueError: cannot reindex from a duplicate axis
The data is of this form:
[data]
P_incidence P_tilt L_angle S_slope P_radius S_Degree Class
0 38.505273 16.964297 35.112814 21.540976 127.632875 7.986683 Normal
1 54.920858 18.968430 51.601455 35.952428 125.846646 2.001642 Normal
2 44.362490 8.945435 46.902096 35.417055 129.220682 4.994195 Normal
3 48.318931 17.452121 48.000000 30.866809 128.980308 -0.910941 Normal
4 45.701789 10.659859 42.577846 35.041929 130.178314 -3.388910 Normal
I tried removing the duplicates using:
df.loc[df['L_angle'].duplicated(), 'L_angle'] = ''
But this converts the column to object dtype, and I'm not able to undo it.
The expected output plot is as follows:
[expected]

TypeError: must be str, not float when combining multiple columns [duplicate]

This question already has answers here:
How to concatenate multiple column values into a single column in Pandas dataframe
(15 answers)
Closed 2 years ago.
I have a dataset like this:
customer_id offer_id time
0 78afa995795e4d85b5d9ceeca43f5fef 9b98b8c7a33c4b65b9aebfe6a799e6d9 0.0
1 a03223e636434f42ac4c3df47e8bac43 0b1e1539f2cc45b7b9fa7c272da2e1d7 0.0
I wanted to combine those three columns. Their data types are:
customer_id object
offer_id object
time float64
When I use the code below it works fine:
check_1 = transcript['customer_id'] + '--' +transcript['offer_id']
check_1.value_counts()
This returns:
6d2db3aad94648259e539920fc2cf2a6--f19421c1d4aa40978ebb69ca19b0e20d 10
2ea50de315514ccaa5079db4c1ecbc0b--fafdcd668e3743c1bb461111dcafc2a4 10
23d67a23296a485781e69c109a10a1cf--5a8bc65990b245e5a138643cd4eb9837 10
........
But when I tried to include the time column (because I want to check whether any customer received multiple offers with the same timestamp), it gave me the error TypeError: must be str, not float
check_2 = transcript['customer_id'] + '--' +transcript['offer_id'] + '--' + transcript['time']
check_2.value_counts()
I tried to convert float to str:
check_2 = transcript['customer_id'] + '--' +transcript['offer_id'] + '--' + str(transcript['time'])
check_2.value_counts()
This returns some odd results:
eece6a9a7bdd4ea1b0f812f34fc619d6--5a8bc65990b245e5a138643cd4eb9837--0 0.00\n1 0.00\n2 0.00\n3 0.00\n4 0.00\n ... \n306529 29.75\n306530 29.75\n306531 29.75\n306532 29.75\n306533 29.75\nName: time, Length: 306534, dtype: float64 10
6d2db3aad94648259e539920fc2cf2a6--f19421c1d4aa40978ebb69ca19b0e20d--0 0.00\n1 0.00\n2 0.00\n3 0.00\n4 0.00\n ... \n306529 29.75\n306530 29.75\n306531 29.75\n306532 29.75\n306533 29.75\nName: time, Length: 306534, dtype: float64 10
.....
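Just wondering what I've done wrong, and whether there is a better way to do this. Thanks.
str(transcript['time']) turns the whole Series into one string (its printed representation), which is why the results look odd. Convert the values element by element instead: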
transcript['time'].astype(str)
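Put together, the concatenation might look like the sketch below. The two rows are copied from the sample shown in the question; the rest of the dataset is not reproduced here.

```python
import pandas as pd

transcript = pd.DataFrame({
    "customer_id": ["78afa995795e4d85b5d9ceeca43f5fef",
                    "a03223e636434f42ac4c3df47e8bac43"],
    "offer_id": ["9b98b8c7a33c4b65b9aebfe6a799e6d9",
                 "0b1e1539f2cc45b7b9fa7c272da2e1d7"],
    "time": [0.0, 0.0],
})

# astype(str) converts each float to its own string, so the
# concatenation stays aligned row by row
check_2 = (
    transcript["customer_id"] + "--"
    + transcript["offer_id"] + "--"
    + transcript["time"].astype(str)
)

print(check_2.value_counts())
```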

Multiply by a float in pandas -> numbers with comma disappearing [duplicate]

This question already has answers here:
TypeError: can't multiply sequence by non-int of type 'float' (python 2.7)
(1 answer)
Finding non-numeric rows in dataframe in pandas?
(7 answers)
Change column type in pandas
(16 answers)
Closed 4 years ago.
I'm having an issue applying a currency rate in pandas.
Some numbers are converted to 'nan' whenever they contain a comma, e.g. 1,789 is treated as nan.
I started with this code:
import pandas as pd
usd_rate = 0.77
salary = pd.read_csv("salary.csv")
#create revenue clean (convert usd to gbp)
salary['revenue_gbp'] = salary.usd_revenue * usd_rate
I was getting this error:
TypeError: can't multiply sequence by non-int of type 'float'
I've read you can't multiply the column by a float. So I converted my column to numeric :
salary.net_revenue = pd.to_numeric(salary.usd_revenue, errors='coerce')
salary['revenue_gbp'] = salary.usd_revenue * usd_rate
Now I don't get any errors, yet when I looked at my file, all of the numbers above 999.99 (the ones containing a comma) are set to 'nan'.
I thought it could be a translation issue, but I'm getting confused here.
Any ideas?
Thanks a lot.
usd_revenue is probably not already a numeric type. Try this:
salary['usd_revenue'] = salary['usd_revenue'].map(float)
before your actual line:
salary['revenue_gbp'] = salary.usd_revenue * usd_rate
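Note that float() cannot parse a string such as '1,789', so if the column contains thousands separators they likely need to be removed before the conversion. The sketch below assumes the column holds strings with commas as thousands separators; the three sample values are made up for illustration and stand in for salary.csv.

```python
import pandas as pd

usd_rate = 0.77

# Hypothetical sample standing in for the contents of salary.csv
salary = pd.DataFrame({"usd_revenue": ["1,789", "250.50", "12,000.00"]})

# Drop the thousands separators first, then convert to a numeric dtype
cleaned = salary["usd_revenue"].str.replace(",", "", regex=False)
salary["usd_revenue"] = pd.to_numeric(cleaned, errors="coerce")

# Convert USD to GBP; the column is numeric now, so this is plain arithmetic
salary["revenue_gbp"] = salary["usd_revenue"] * usd_rate

print(salary)
```

With errors="coerce", any value that is still not numeric after cleaning becomes NaN instead of raising, which makes leftover bad rows easy to find with isna().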

Pandas - ValueError: Error parsing datetime string "17-Jan-23" at position 3 [duplicate]

This question already has an answer here:
Error parsing datetime string "09-11-2017 00:02:00" at position 8
(1 answer)
Closed 4 years ago.
I have the following code, which reads a date column:
data = pd.DataFrame(array, columns=names)
data[['D_DATE']] = data[['D_DATE']].astype('datetime64')
But this gives me the error:
ValueError: Error parsing datetime string "17-Jan-23" at position 3
Can someone help me resolve this?
Try this:
data['D_DATE'] = pd.to_datetime(data['D_DATE'])
Indexing a single column with double brackets (df[['D_DATE']]) returns a DataFrame with one column named 'D_DATE'. Indexing with a single set of brackets (df['D_DATE']) returns a Series named 'D_DATE'. To create a new column in a DataFrame using the form df[new_col], use single brackets.
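As a small illustration, the sketch below passes an explicit format to pd.to_datetime and prints the types returned by the two indexing styles. It assumes "17-Jan-23" means day 17, January, year 2023 (format "%d-%b-%y"); the question does not say which part is the day and which is the year, and the second sample date is made up.

```python
import pandas as pd

# Hypothetical sample; only "17-Jan-23" appears in the question
data = pd.DataFrame({"D_DATE": ["17-Jan-23", "03-Feb-23"]})

# Assumed layout: day-month abbreviation-two-digit year
data["D_DATE"] = pd.to_datetime(data["D_DATE"], format="%d-%b-%y")

# Single brackets give a Series, double brackets give a DataFrame
print(type(data["D_DATE"]).__name__)
print(type(data[["D_DATE"]]).__name__)
print(data["D_DATE"].iloc[0])
```

If the strings are actually year-first, the format would be "%y-%b-%d" instead.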
