I am using pandas in Python 3.
I have a DataFrame whose index values look like '20160727', but the index dtype is 'object'.
I am trying to convert it to string type.
I tried:
data.index.astype(str, copy=False)
and data.index = data.index.map(str)
But even after these two operations, data.index.dtype is still dtype('O').
I want to sort after converting the index to string. How can I convert the index to a string dtype so that I can process it like a string?
In pandas, strings are stored with the object dtype by default.
dtype('O') means the values are Python objects; in your case they are already Python str values.
You can see more info about this here
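A quick way to confirm this is to look at the elements themselves. The sketch below uses a small hypothetical DataFrame whose index is built explicitly with dtype=object to mirror the question; the values are plain Python str objects, so string operations already work through the .str accessor.

```python
import pandas as pd

# Hypothetical frame whose index mirrors the question: 'YYYYMMDD' strings, object dtype
df = pd.DataFrame({'value': [1, 2, 3]},
                  index=pd.Index(['20160727', '20160103', '20160415'], dtype=object))

print(df.index.dtype)            # object
print(type(df.index[0]))         # <class 'str'>: each element is already a Python string

# String methods are available through the .str accessor, e.g. take the year part
years = df.index.str[:4]
print(list(years))               # ['2016', '2016', '2016']
```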
As an example of what you want to achieve (convert the index to datetimes, then sort on it):
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=[20160103, 20160102, 20160104, 20160101])
df.index = pd.to_datetime(df.index, format='%Y%m%d')
df = df.sort_index()
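Sorting also works without any conversion. A minimal sketch, assuming the index holds fixed-width 'YYYYMMDD' strings like the ones in the question (the names and dates here are hypothetical): lexicographic order on such strings is the same as chronological order. If a dtype-sensitive step really needs a dedicated string dtype, pandas 1.0 and later also offer astype('string').

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jack', 'Steve', 'Ricky']},
                  index=pd.Index(['20160103', '20160102', '20160104', '20160101'],
                                 dtype=object))

# Option 1: sort the object-dtype string index directly.
# For fixed-width 'YYYYMMDD' strings, lexicographic order equals date order.
by_string = df.sort_index()

# Option 2: parse the strings into real dates first, then sort.
by_date = df.copy()
by_date.index = pd.to_datetime(by_date.index, format='%Y%m%d')
by_date = by_date.sort_index()

# Option 3 (pandas >= 1.0): request the dedicated string dtype explicitly.
as_string_dtype = df.index.astype('string')
print(as_string_dtype.dtype)
```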
I have the following data frame.
What I am trying to do is convert this object column to a string and then to a numeric type.
I have looked at using the astype function to convert to string and then again to int, i.e. something like:
df['a'] = df['a'].astype(str).astype(int)
I have tried other variations. What I have been trying to do is get the column values to become numbers (obviously without the commas). The data type is just object initially.
Thanks so much!
You need to remove all the commas before converting:
df['a'] = df['a'].str.replace(',', '').astype(int)
To convert both columns at once, you can do:
df[['a','b']] = df[['a','b']].replace(',', '', regex=True).astype('int')
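A small self-contained sketch of both variants, using hypothetical sample values with thousands separators (the question's actual data is not shown). Each variant starts from a fresh copy so they do not interfere with each other.

```python
import pandas as pd

# Hypothetical data: numbers stored as strings with thousands separators
raw = pd.DataFrame({'a': ['1,000', '25', '3,400,000'],
                    'b': ['7', '12,345', '0']})

# Single column: strip the commas with the .str accessor, then cast
df1 = raw.copy()
df1['a'] = df1['a'].str.replace(',', '').astype(int)

# Several columns: DataFrame.replace with regex=True does substring replacement
df2 = raw.copy()
df2[['a', 'b']] = df2[['a', 'b']].replace(',', '', regex=True).astype('int')

print(df1.dtypes)
print(df2)
```

Without the comma removal, astype(int) raises a ValueError on values such as '1,000'.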
I am using pandas data frames. The data contains 3032 columns. All the columns are 'object' datatype. How do I convert all the columns to 'float' datatype?
If you need to convert the columns to integers and floats, use to_numeric with DataFrame.apply to apply it to all columns:
df = df.apply(pd.to_numeric)
This works the same as:
df = df.apply(lambda x: pd.to_numeric(x))
If some columns contain strings that cannot be parsed (so the conversion fails), you can add errors='coerce' to replace them with missing values (NaN):
df = df.apply(pd.to_numeric, errors='coerce')
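A minimal sketch of the coerce behaviour on a hypothetical two-column frame where every value is a string: the unparseable 'oops' becomes NaN, and a final astype(float) gives the all-float result the question asks for.

```python
import pandas as pd

# Hypothetical data: every column holds strings (object dtype)
df = pd.DataFrame({'x': ['1', '2.5', '3'],
                   'y': ['4', 'oops', '6']}, dtype=object)

# errors='coerce' is forwarded by apply to pd.to_numeric;
# entries that cannot be parsed become NaN instead of raising
converted = df.apply(pd.to_numeric, errors='coerce')

# If every column should end up as float specifically, cast afterwards
as_float = converted.astype(float)
print(as_float.dtypes)
```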
If you are reading the DataFrame from a file (e.g. with read_csv), you can do the same while reading: use converters if you want to apply a customized function, or dtype to specify the data type you want.
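For illustration, a sketch with read_csv on in-memory text (io.StringIO stands in for a real file; the column names and values are made up). dtype=float forces every column to float, and converters runs a custom function on the raw string of one column.

```python
import io

import pandas as pd

# Stand-in for a file on disk (hypothetical contents)
csv_text = "a,b,c\n1,2,3\n4.5,5,6\n"

# dtype: a single type for all columns (a dict would map column -> dtype instead)
df = pd.read_csv(io.StringIO(csv_text), dtype=float)

# converters: apply a custom function to each raw string value of 'price',
# here stripping thousands separators before converting to float
csv_commas = 'price,qty\n"1,250",3\n"980",5\n'
df2 = pd.read_csv(io.StringIO(csv_commas),
                  converters={'price': lambda s: float(s.replace(',', ''))})

print(df.dtypes)
print(df2)
```

For the thousands-separator case specifically, read_csv also has a thousands parameter (e.g. thousands=',') that avoids a custom converter.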
I have a simple pandas dataframe with a column:
import pandas as pd
col = ['A']
data = [[1.0],[2.3],[3.4]]
df = pd.DataFrame.from_records(data, columns=col)
This creates a dataframe with one column of type np.float64, which is what I want.
Later in the process, I want to add another column of type string.
The dtype of this column comes through as object, but I need it to be a string type. So I do the following:
df['SOMETEXT'] = df['SOMETEXT'].astype(str)
If I look at the dtype again, I get the same dtype for that column: object.
I have a process further down my workflow that is dtype-sensitive, and I need the column to be a string.
Any ideas?
array = df.to_records(index=False) # convert to numpy array
The SOMETEXT field of the resulting array still has the object dtype, but it should be a string.
In pandas, string columns have traditionally used the object dtype; the dedicated 'string' dtype has to be requested explicitly. It confused me too when I first started.
Once in NumPy, you can cast the field to a NumPy string dtype (this returns a new array):
array['SOMETEXT'].astype(str)
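If the goal is a record array whose text field is already a NumPy string type, DataFrame.to_records accepts a column_dtypes mapping in recent pandas versions. A sketch with a hypothetical frame; the width 'U10' is an arbitrary choice and must be at least as long as the longest string, otherwise values are silently truncated.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.3, 3.4]})
df['SOMETEXT'] = ['foo', 'bar', 'bazz']   # hypothetical text column

# Default conversion: the text column becomes an object ('O') field
rec_default = df.to_records(index=False)

# column_dtypes picks the field type per column; 'U10' = unicode, max 10 chars
rec_text = df.to_records(index=False, column_dtypes={'SOMETEXT': 'U10'})

# The cast from the answer: returns a separate '<U...' array for that one field
text_field = rec_default['SOMETEXT'].astype(str)

print(rec_default.dtype)
print(rec_text.dtype)
```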
I have data in a python dictionary like:
data = {u'01-01-2017 22:34:43:871': [u'88.49197', u'valid'],
u'01-01-2017 11:23:43:803': [u'88.49486', u'valid'],
u'02-01-2017 03:11:43:898': [u'88.49773', u'valid'],
u'01-01-2017 13:54:43:819': [u'88.50205', u'valid']}
I can convert it to a pandas Dataframe with:
data = pandas.DataFrame.from_dict(data, orient='index')
but I am not able to use the dtype parameter of from_dict. I would like to convert the index to datetime, column 0 to float and column 1 to string.
I have tried:
pandas.DataFrame.from_dict((data.values()[0]), orient='index',
dtype={0: 'float', 1:'str'})
but it doesn't work.
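This appears to be an ongoing limitation of some of the pandas constructor methods: their dtype argument accepts a single dtype for the whole DataFrame, not one per column (see the question "How to set dtypes by column in pandas DataFrame").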
Instead of using the dtype argument, chaining .astype may do the trick:
pandas.DataFrame.from_dict(data, orient='index').astype({0: float, 1:str})
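The index is still made of strings after that call. A sketch that also converts it to datetimes, assuming the keys are day-month-year with milliseconds after the last colon (the question does not say which, so swap %d and %m if they are month-day-year):

```python
import pandas as pd

data = {'01-01-2017 22:34:43:871': ['88.49197', 'valid'],
        '01-01-2017 11:23:43:803': ['88.49486', 'valid'],
        '02-01-2017 03:11:43:898': ['88.49773', 'valid'],
        '01-01-2017 13:54:43:819': ['88.50205', 'valid']}

# Per-column types via astype with a dict (column labels are 0 and 1)
df = pd.DataFrame.from_dict(data, orient='index').astype({0: float, 1: str})

# Assumption: day-month-year keys; %f reads '871' as 0.871 seconds
df.index = pd.to_datetime(df.index, format='%d-%m-%Y %H:%M:%S:%f')
df = df.sort_index()
print(df)
```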
I want to convert an existing Python list into a pandas DataFrame. How do I specify the data type of each column and define the index column?
Here is a sample of my code:
import pandas as pd
data = [[1444990457000286208, 0, 286],
[1435233159000067840, 0, 68],
[1431544002000055040, 1, 55]]
df = pd.DataFrame(data, columns=['time', 'value1', 'value2'])
In the example above, I need the existing columns to have the following types:
time: datetime64[ns]
value1: bool
value2: int
Additionally, the time column should be used as the index column.
By default all three columns are int64, and I can't find how to specify column types when creating the DataFrame.
value2 is already of the correct dtype.
For time you can convert to datetimes with to_datetime and then set the index with set_index.
For value1 you can cast to bool with astype.
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')
df['value1'] = df['value1'].astype(bool)
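The same three conversions can also be written as one method chain; this is only an alternative style, not something the answer requires. astype with a dict casts only the listed columns, and assign replaces the time column (integers read as nanoseconds since the Unix epoch) before it becomes the index.

```python
import pandas as pd

data = [[1444990457000286208, 0, 286],
        [1435233159000067840, 0, 68],
        [1431544002000055040, 1, 55]]

df = (pd.DataFrame(data, columns=['time', 'value1', 'value2'])
        .astype({'value1': bool})                                       # only value1 changes
        .assign(time=lambda d: pd.to_datetime(d['time'], unit='ns'))   # int ns -> datetime64
        .set_index('time'))                                             # time becomes the index

print(df.dtypes)
```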
You can use the dtype keyword of the pd.DataFrame constructor (see the docs), but it accepts only a single dtype for all columns, so per-column types still need astype. Please see alex's answer.
To use a specific column as the index, you can use the set_index method of the DataFrame.
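To illustrate the limitation of the dtype keyword, a small sketch with hypothetical integer data: the constructor applies one dtype to every column, so per-column types still come from astype, followed by set_index for the index column.

```python
import pandas as pd

data = [[1, 0, 286], [2, 0, 68], [3, 1, 55]]   # hypothetical rows
cols = ['id', 'value1', 'value2']

# The constructor's dtype keyword applies ONE dtype to every column
df = pd.DataFrame(data, columns=cols, dtype='float64')
print(df.dtypes)   # all float64

# Per-column types: cast after construction with a dict, then pick the index
df2 = (pd.DataFrame(data, columns=cols)
         .astype({'value1': bool})
         .set_index('id'))
print(df2.dtypes)
```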