I have the following data frame.
What I am trying to do is
Convert this object to a string and then to a numeric
I have looked at using the astype function (string) and then again to int. What I would like to get is the data type to be
df['a'] = df['a'].astype(string).astype(int).
I have tried other variations. What I am trying to do is get the column values to become numbers (obviously without the commas). It is just that the data type is initially object.
Thanks so much!
You need to remove all the commas first:
df['a'] = df['a'].str.replace(',', '').astype(int)
With both columns, you can do:
df[['a','b']] = df[['a','b']].replace(',', '', regex=True).astype('int')
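Since the question's data frame is not shown, here is a minimal sketch with made-up values (the sample strings and the names a_int, both, messy and cleaned are assumptions for illustration). It shows the single-column and multi-column forms, plus pd.to_numeric with errors="coerce" in case some cells are missing or not numeric at all:

```python
import pandas as pd

# Made-up sample data: numbers stored as strings with thousands separators
df = pd.DataFrame({"a": ["1,200", "3,400,000"], "b": ["5", "6,789"]})

# Single column: strip the commas literally, then cast
a_int = df["a"].str.replace(",", "", regex=False).astype("int64")

# Several columns at once: DataFrame.replace with regex=True replaces substrings
both = df[["a", "b"]].replace(",", "", regex=True).astype("int64")

# If some cells are missing or non-numeric, astype would raise;
# pd.to_numeric with errors="coerce" turns those cells into NaN instead
messy = pd.Series(["1,000", "n/a", None])
cleaned = pd.to_numeric(messy.str.replace(",", "", regex=False), errors="coerce")
```

Note that the coerced result is float, because NaN cannot be stored in a plain int64 column.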
I have a dataframe:
df1 = pd.DataFrame({'GL': [2311000200.0, 2312000600.0, 2330800100.0]})
The dtype of df1.GL is float, so first I convert it to int64 to remove the trailing .0:
df1.GL = df1.GL.astype('int64')
Then I try to convert it to str but instead I receive object dtype.
Does anyone know what can be the reason?
You can force it to use the string dtype by using:
>>> df1.GL.astype("string")
df1.GL
0 2311000200.0
1 2312000600.0
2 2330800100.0
Name: GL, dtype: string
However, object dtypes are fine for most string operations. As per the docs:
For backwards-compatibility, object dtype remains the default type we infer a list of strings to
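A small sketch of the difference, assuming pandas 1.0 or later (where the "string" extension dtype exists). The variable names are only for illustration. Both results hold ordinary Python strings and both support the .str accessor; only the reported dtype differs:

```python
import pandas as pd

df1 = pd.DataFrame({"GL": [2311000200.0, 2312000600.0, 2330800100.0]})

# Drop the fractional part first, as in the question
as_int = df1["GL"].astype("int64")

# astype(str) produces Python strings (traditionally reported as dtype object)
as_str = as_int.astype(str)

# astype("string") asks explicitly for the dedicated string extension dtype
as_string = as_int.astype("string")

# String methods work on either result
starts = as_string.str.startswith("231")
first_value = as_str.iloc[0]
```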
In a pandas DataFrame, a column of strings normally has dtype object, so object here effectively means string.
If you would like to retain the data as strings, use df.to_excel() instead of df.to_csv(). This is because Excel automatically converts numeric-looking text to numbers when it opens a CSV file.
df1 = pd.DataFrame({'GL': [2311000200.0, 2312000600.0, 2330800100.0]})
df1.GL = df1.GL.astype('int64').astype('string')
df1.to_excel('test.xlsx', index=False)
How can I check if a column is a string, or another type (e.g. int or float), even though the dtype is object?
(Ideally I want this operation vectorised, and not applymap checking every row...)
import io
import pandas as pd
# American post code
df1_str = """id,postal
1,12345
2,90210
3,"""
df1 = pd.read_csv(io.StringIO(df1_str))
df1["postal"] = df1["postal"].astype("O") # is an object (of type float due to the null row 3)
# British post codes
df2_str = """id,postal
1,EC1
2,SE1
3,W2"""
df2 = pd.read_csv(io.StringIO(df2_str))
df2["postal"] = df2["postal"].astype("O") # is an object (of type string)
Both df1 and df2 return object when doing df["postal"].dtype
However, df2 has .str methods, e.g. df2["postal"].str.lower(), but df1 doesn't.
Similarly, df1 can have mathematical operations done to it, e.g. df1 * 2
This is different from other SO questions, which ask whether there are strings inside a column (and not whether the WHOLE column is strings), e.g.:
Python: Check if dataframe column contain string type
Check if string is in a pandas dataframe
Check pandas dataframe column for string type
You can use pandas.api.types.infer_dtype:
>>> pd.api.types.infer_dtype(df2["postal"])
'string'
>>> pd.api.types.infer_dtype(df1["postal"])
'floating'
From the docs:
Efficiently infer the type of a passed val, or list-like array of values. Return a string describing the type.
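For what it's worth, this can be wrapped in a small check. A minimal sketch follows, where is_string_column is a hypothetical helper name (not part of pandas) and the frames mirror the ones in the question:

```python
import io

import pandas as pd
from pandas.api.types import infer_dtype


def is_string_column(series):
    """Hypothetical helper: True if every non-null value in the column is a str."""
    return infer_dtype(series, skipna=True) == "string"


us = pd.read_csv(io.StringIO("id,postal\n1,12345\n2,90210\n3,"))
uk = pd.read_csv(io.StringIO("id,postal\n1,EC1\n2,SE1\n3,W2"))

# Force both columns to object, as in the question
us["postal"] = us["postal"].astype("O")
uk["postal"] = uk["postal"].astype("O")

# A column mixing strings and numbers is not reported as "string"
mixed = pd.Series(["EC1", 12345], dtype="object")

uk_is_string = is_string_column(uk["postal"])
us_is_string = is_string_column(us["postal"])
mixed_is_string = is_string_column(mixed)
```

Here uk_is_string is True while the other two are False, so the check applies to the whole column rather than to individual values.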
I am using pandas data frames. The data contains 3032 columns. All the columns are 'object' datatype. How do I convert all the columns to 'float' datatype?
If you need to convert columns holding integers and floats, use to_numeric with DataFrame.apply to process all columns:
df = df.apply(pd.to_numeric)
which works the same as:
df = df.apply(lambda x: pd.to_numeric(x))
If some columns contain non-numeric strings (so the conversion fails), you can add errors='coerce' to replace those values with missing values (NaN):
df = df.apply(pd.to_numeric, errors='coerce')
If you are reading the df from a file you can do the same when reading using converters in case you want to apply a customized function, or using dtype to specify the data type you want.
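A rough sketch of both options at read time. The CSV text, the column names and the to_float_or_nan helper are all made up for illustration:

```python
import io

import pandas as pd

# Made-up data: column y contains one value that is not a number
csv_text = "x,y,label\n1,2.5,a\n3,oops,b\n"

# dtype= works when the column is known to be clean
df = pd.read_csv(io.StringIO(csv_text), dtype={"x": "float64"})


def to_float_or_nan(value):
    """Hypothetical converter: parse a cell as float, fall back to NaN."""
    try:
        return float(value)
    except ValueError:
        return float("nan")


# converters= runs a custom function on every cell of the named column
df2 = pd.read_csv(io.StringIO(csv_text), converters={"y": to_float_or_nan})
```

With dtype= a bad cell raises an error while reading, whereas a converter lets you decide per cell what to do with it.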
Data is below
no,store_id,revenue,profit,state,country
'0','101','779183','281257','WD','India'
'1','101','144829','838451','WD','India'
'2','101','766465','757565','AL','Japan'
'3','102','766465','757565','AL','Japan'
Code is below
import pandas as pd
data = pd.read_csv("1.csv")
dummies = pd.get_dummies(data)
dummies.head(10)
data.info() shows object for all the columns.
How can new object columns be converted to dummies automatically? For example, here TEAM is object and needs to be converted with get_dummies. If someone adds a names column tomorrow, it also needs to be converted to dummies.
How can int be assigned automatically to numeric columns and object to non-numeric columns?
Tomorrow someone might add a new column, which may be numeric or non-numeric.
How can get_dummies be applied after that?
When reading the CSV file with pd.read_csv, set the quotechar parameter to ' (the default is ").
From the pd.read_csv docs, under quotechar:
quotechar : str (length 1), optional
The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.
import pandas as pd
from io import StringIO
text = """no,store_id,revenue,profit,state,country
'0','101','779183','281257','WD','India'
'1','101','144829','838451','WD','India'
'2','101','766465','757565','AL','Japan'
'3','102','766465','757565','AL','Japan'"""
df = pd.read_csv(StringIO(text),quotechar='\'') # or quotechar = "'"
print(df.dtypes)
no int64
store_id int64
revenue int64
profit int64
state object
country object
dtype: object
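To cover the get_dummies part of the question, here is a minimal sketch that builds on the same sample text. It relies on pd.get_dummies encoding only the non-numeric columns by default and leaving numeric ones untouched, so a column added later should be handled according to whatever dtype read_csv infers for it:

```python
from io import StringIO

import pandas as pd

text = """no,store_id,revenue,profit,state,country
'0','101','779183','281257','WD','India'
'1','101','144829','838451','WD','India'
'2','101','766465','757565','AL','Japan'
'3','102','766465','757565','AL','Japan'"""

# With quotechar="'" the quoted digits are parsed as integers
df = pd.read_csv(StringIO(text), quotechar="'")

# Numeric columns pass through; state and country become indicator columns
dummies = pd.get_dummies(df)
dummy_columns = list(dummies.columns)
```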
@Ch3steR's solution is perfect.
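Just to extend it, you can use converters together with quotechar if you want to control how individual columns are converted. Here D is the standard-library decimal module (import decimal as D) and io is the standard-library io module:
df = pd.read_csv(io.StringIO(text), quotechar="'", converters={'no': D.Decimal, 'store_id': D.Decimal})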
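A self-contained sketch of the converters idea, with the imports spelled out. It assumes quotechar="'" is passed as well, so that each converter receives the bare digits rather than the quoted text:

```python
import decimal as D
import io

import pandas as pd

text = """no,store_id,revenue,profit,state,country
'0','101','779183','281257','WD','India'
'1','101','144829','838451','WD','India'
'2','101','766465','757565','AL','Japan'
'3','102','766465','757565','AL','Japan'"""

# Each converter is called with the cell's text and returns the stored value
df = pd.read_csv(
    io.StringIO(text),
    quotechar="'",
    converters={"no": D.Decimal, "store_id": D.Decimal},
)

first_no = df["no"].iloc[0]
first_store = df["store_id"].iloc[0]
```

The columns listed in converters hold Decimal objects, while the remaining columns are still inferred by read_csv as usual.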
I'm trying to convert object to string in my dataframe using pandas.
Having following data:
particulars
NWCLG 545627 ASDASD KJKJKJ ASDASD
TGS/ASDWWR42045645010009 2897/SDFSDFGHGWEWER
dtype:object
When I try to convert the particulars column from object to string using astype() (with the str, |S, |S32 and |S80 types), or by using the str functions directly, it is not converted to string (it remains object). For the str methods (replacing '/' with ' ') it raises AttributeError: 'DataFrame' object has no attribute 'str'.
I am using pandas 0.23.4.
I also referred to: https://github.com/pandas-dev/pandas/issues/18796
Use astype('string') instead of astype(str) (the 'string' dtype requires pandas 1.0 or later):
df['column'] = df['column'].astype('string')
You could read the excel specifying the dtype as str:
df = pd.read_excel("Excelfile.xlsx", dtype=str)
then use string replace on the particulars column as below:
df['particulars'] = df['particulars'].str.replace('/', ' ')
Notice that .str is an accessor of a Series, so it has to be called on a single column such as df['particulars'].
The AttributeError in the question means that the object .str was called on was a DataFrame (for example df itself, or df[['particulars']] with double brackets), not a single column.
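To see where the AttributeError comes from, here is a small sketch using the two sample values from the question (the variable names are only illustrative). Single brackets give a Series, which has .str; double brackets give a DataFrame, which does not:

```python
import pandas as pd

df = pd.DataFrame({
    "particulars": [
        "NWCLG 545627 ASDASD KJKJKJ ASDASD",
        "TGS/ASDWWR42045645010009 2897/SDFSDFGHGWEWER",
    ]
})

series = df["particulars"]    # single brackets: a Series
frame = df[["particulars"]]   # double brackets: a one-column DataFrame

has_str_on_series = hasattr(series, "str")
has_str_on_frame = hasattr(frame, "str")

# Replace the slashes literally on the Series and assign the result back
df["particulars"] = df["particulars"].str.replace("/", " ", regex=False)
```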