I have data in a python dictionary like:
data = {u'01-01-2017 22:34:43:871': [u'88.49197', u'valid'],
u'01-01-2017 11:23:43:803': [u'88.49486', u'valid'],
u'02-01-2017 03:11:43:898': [u'88.49773', u'valid'],
u'01-01-2017 13:54:43:819': [u'88.50205', u'valid']}
I can convert it to a pandas Dataframe with:
data = pandas.DataFrame.from_dict(data, orient='index')
but I am not able to use the dtype parameter of from_dict. I would like to convert the index to a datetime (or similar), the first column to float, and the second to string.
I have tried:
pandas.DataFrame.from_dict((data.values()[0]), orient='index',
dtype={0: 'float', 1:'str'})
but it doesn't work.
This appears to be an ongoing issue with some of the pandas constructor methods: How to set dtypes by column in pandas DataFrame
Instead of using the dtype argument, chaining .astype may do the trick:
pandas.DataFrame.from_dict(data, orient='index').astype({0: float, 1:str})
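For the index, pd.to_datetime can be applied after the frame is built. A minimal sketch, assuming the timestamps are day-month-year with the fractional seconds after the last colon (the format string is my guess; the question does not state it):

```python
import pandas as pd

data = {
    '01-01-2017 22:34:43:871': ['88.49197', 'valid'],
    '01-01-2017 11:23:43:803': ['88.49486', 'valid'],
    '02-01-2017 03:11:43:898': ['88.49773', 'valid'],
    '01-01-2017 13:54:43:819': ['88.50205', 'valid'],
}

# Build the frame first, then cast each column by its integer label.
df = pd.DataFrame.from_dict(data, orient='index').astype({0: float, 1: str})

# Assumed format: day-month-year, fractional seconds after the final colon.
df.index = pd.to_datetime(df.index, format='%d-%m-%Y %H:%M:%S:%f')
df = df.sort_index()
```

If the dates are actually month-first, swap %d and %m in the format string.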
Related
I have the following data frame.
What I am trying to do is
Convert this object to a string and then to a numeric
I have looked at using the astype function (string) and then again to int. What I would like to get is the data type to be
df['a'] = df['a'].astype(string).astype(int).
I have tried other variations. What I am trying to do is get the column values to become numbers (obviously without the commas). It is just that the data type is object initially.
Thanks so much!
You need to remove all the commas first:
df['a'] = df['a'].str.replace(',', '').astype(int)
With both columns, you can do:
df[['a','b']] = df[['a','b']].replace(',', '', regex=True).astype('int')
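Since the original frame was not shown, here is a small self-contained sketch with made-up values (the data is hypothetical; only the column names a and b come from the question):

```python
import pandas as pd

# Hypothetical data: numbers stored as strings with thousands separators.
df = pd.DataFrame({'a': ['1,000', '2,500'], 'b': ['10', '3,200,000']})

# regex=True makes replace match the comma inside each string,
# not only cells that are exactly ','. Then cast to integers.
df[['a', 'b']] = df[['a', 'b']].replace(',', '', regex=True).astype('int64')
```

Without regex=True, DataFrame.replace only swaps cells whose whole value equals ',', so the commas inside the numbers would be left in place and the cast would fail.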
I am using pandas data frames. The data contains 3032 columns. All the columns are 'object' datatype. How do I convert all the columns to 'float' datatype?
If you need to convert columns holding integers and floats, use to_numeric with DataFrame.apply to apply it to all columns:
df = df.apply(pd.to_numeric)
This works the same as:
df = df.apply(lambda x: pd.to_numeric(x))
If some columns contain strings that cannot be parsed (so the conversion fails), you can add errors='coerce' to replace them with missing values (NaN):
df = df.apply(pd.to_numeric, errors='coerce')
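Note that to_numeric picks int64 for columns that hold only whole numbers, so if every column really has to be float, a final astype(float) may be needed. A small sketch with hypothetical column names and values:

```python
import pandas as pd

# Hypothetical object columns: 'x' contains one unparseable value.
df = pd.DataFrame({'x': ['1', '2.5', 'abc'], 'y': ['3', '4', '5']})

# Unparseable strings become NaN; astype(float) then forces float64
# everywhere, since to_numeric alone would leave 'y' as int64.
converted = df.apply(pd.to_numeric, errors='coerce').astype(float)
```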
If you are reading the DataFrame from a file, you can do the conversion at read time: pass converters to apply a custom function to a column, or dtype to specify the data type you want.
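A rough illustration of both read_csv options, using an in-memory CSV so it runs as-is (the column names, the values, and the parse_qty helper are all made up for the example):

```python
import io
import pandas as pd

# Hypothetical file contents; 'qty' uses thousands separators.
csv_text = 'price,qty\n1.5,"1,200"\n2.0,"3,400"\n'

def parse_qty(text):
    """Hypothetical converter: drop commas, then parse as int."""
    return int(text.replace(',', ''))

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={'price': float},         # plain cast for this column
    converters={'qty': parse_qty},  # custom function for this one
)
```

A converter receives each cell as a string, so it suits cleanup that a plain dtype cast cannot do.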
I am using Pandas in Python 3,
I have a dataframe whose index is like '20160727', but the datatype is 'object'.
I am trying to convert it into string type.
I tried:
data.index.astype(str, copy=False)
and data.index = data.index.map(str)
But even after these two operations,
I get:
data.index.dtype is dtype('O')
I want to use sort after converting the index to string. How can I convert the index to string datatype so that I can process it like a string?
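In pandas, object is the dtype used for strings (newer versions also offer a dedicated string dtype).
dtype('O') means the values are stored as Python objects, so your index already holds strings.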
You can see more info about this here
As an example of what you want to achieve:
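import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=[20160103, 20160102, 20160104, 20160101])
# Parse the YYYYMMDD labels as dates so that sorting is chronological
df.index = pd.to_datetime(df.index, format='%Y%m%d')
df = df.sort_index()  # sort_index returns a new frame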
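If you would rather keep the index as strings than convert it to datetimes, sorting still works, because zero-padded YYYYMMDD strings sort lexicographically in date order. A minimal sketch reusing the Age values from above:

```python
import pandas as pd

df = pd.DataFrame({'Age': [28, 34, 29, 42]},
                  index=[20160103, 20160102, 20160104, 20160101])

# Cast the integer labels to strings, then sort them as strings.
df.index = df.index.astype(str)
df = df.sort_index()

# String methods are available on the index through .str
years = df.index.str[:4]
```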
Basically I am trying to delete rows of a pandas dataframe where values in a certain column are not instances of datetime. I have tried:
df = df[df['date'] == isinstance(datetime)]
I know isinstance takes two arguments (I am missing the value to be checked) but I’m not sure what to put there.
For efficiency, you should convert your series to datetime and then mask for non-null values:
df['date'] = pd.to_datetime(df['date'], errors='coerce')
df = df[df['date'].notnull()]
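To see the effect on a small frame (the values below are made up for illustration):

```python
import pandas as pd

# Hypothetical column mixing parseable dates with junk and a missing value.
df = pd.DataFrame({'date': ['2021-01-05', 'not a date', None, '2020-03-01'],
                   'value': [1, 2, 3, 4]})

# Anything that cannot be parsed becomes NaT instead of raising.
df['date'] = pd.to_datetime(df['date'], errors='coerce')

# Keep only the rows where parsing succeeded.
df = df[df['date'].notnull()]
```

Keep in mind this also converts date-like strings to datetimes, which is slightly different from keeping only values that were already datetime objects.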
As the docs say, isinstance takes the object as the first argument and classinfo as the second argument.
The correct way is as follows:
import datetime
df.loc[df['date'].apply(lambda x: isinstance(x, datetime.datetime))]
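With a small change to the similar answer, you can try the following (note that the class is datetime.datetime when the module is imported with import datetime):
df = df[df['date'].apply(isinstance, args=(datetime.datetime,))]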
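A quick self-contained check of the isinstance approach on an object column with mixed values (the sample values are hypothetical):

```python
import datetime
import pandas as pd

# Hypothetical object column: two real datetimes, a string and an int.
df = pd.DataFrame({'date': [datetime.datetime(2020, 1, 1), 'foo', 42,
                            datetime.datetime(2021, 6, 30)]})

# True only where the cell is a datetime.datetime instance.
mask = df['date'].apply(lambda x: isinstance(x, datetime.datetime))
filtered = df.loc[mask]
```

Plain datetime.date values would be dropped by this check, since date is not a subclass of datetime.datetime.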
I want to convert an existing Python list into a pandas DataFrame. How do I specify the data type for each column and define the index column?
Here is a sample of my code:
import pandas as pd
data = [[1444990457000286208, 0, 286],
[1435233159000067840, 0, 68],
[1431544002000055040, 1, 55]]
df = pd.DataFrame(data, columns=['time', 'value1', 'value2'])
In the above example I need to have the following types for the existing columns:
time: datetime64[ns]
value1: bool
value2: int
Additionally, the time column should be used as the index.
By default all three columns are int64, and I can't find how to specify column types when creating the DataFrame.
Thanks!
value2 is already of the correct dtype.
For time you can convert to datetimes with to_datetime and then set the index with set_index.
For value1 you can cast to bool with astype.
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')
df['value1'] = df['value1'].astype(bool)
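Put together with the data from the question, this runs end to end. One assumption on my part: the time integers are nanoseconds since the epoch, which is what to_datetime uses for integers by default and what unit='ns' spells out:

```python
import pandas as pd

data = [[1444990457000286208, 0, 286],
        [1435233159000067840, 0, 68],
        [1431544002000055040, 1, 55]]
df = pd.DataFrame(data, columns=['time', 'value1', 'value2'])

# Assumption: the integers are epoch nanoseconds.
df['time'] = pd.to_datetime(df['time'], unit='ns')
df = df.set_index('time')

# 0/1 integers map to False/True.
df['value1'] = df['value1'].astype(bool)
```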
You can use the dtype keyword in the pd.DataFrame constructor, but it accepts only a single dtype that is applied to all columns. For per-column types, please see alex's answer.
To use a specific column as index you can use the set_index method of the dataframe instance.
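As far as I know, the constructor's dtype keyword takes one dtype for the whole frame, so per-column types have to be set afterwards. A short sketch contrasting the two, reusing two rows from the question (int32 for value2 is an arbitrary choice for the example):

```python
import pandas as pd

data = [[1444990457000286208, 0, 286],
        [1435233159000067840, 0, 68]]
columns = ['time', 'value1', 'value2']

# A single dtype in the constructor is applied to every column.
all_float = pd.DataFrame(data, columns=columns, dtype='float64')

# Per-column types: cast with a dict after construction, then set the index.
typed = (pd.DataFrame(data, columns=columns)
         .astype({'value1': bool, 'value2': 'int32'})
         .set_index('time'))
```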