Pandas Read_Excel Datetime Converter - python

Using Python 3.6 and pandas 0.19.2: how do you read in an Excel file and convert a column to datetime directly from read_excel? This is similar to this question about converters and dtypes, but I want to read a specific column in as datetime.
I want to change this:
import pandas as pd
import datetime
import numpy as np
file = 'PATH_HERE'
df1 = pd.read_excel(file)
df1['COLUMN'] = pd.to_datetime(df1['COLUMN']) # <--- Line to get rid of
into something like:
df1 = pd.read_excel(file, dtypes= {'COLUMN': datetime})
The code does not error, but in my example COLUMN still has a dtype of int64 after calling print(df1['COLUMN'].dtype)
I have tried using np.datetime64 instead of datetime. I have also tried using converters= instead of dtypes=, but to no avail. This may be nitpicky, but it would be a nice feature to implement in my code.

Typically, reading Excel sheets will use the dtypes inferred from the sheet itself; you cannot specify the dtypes the way you can in read_csv, for example. You can, however, provide a converters argument, passing a dict that maps a column to a function called to convert that column:
df1 = pd.read_excel(file, converters={'COLUMN': pd.to_datetime})
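A minimal sketch of the conversion pd.to_datetime performs, using an in-memory frame with hypothetical values (since no workbook is at hand here, the Excel read itself is not reproduced):

```python
import pandas as pd

# Hypothetical data standing in for the Excel column
df1 = pd.DataFrame({'COLUMN': ['2017-01-05', '2017-02-14']})
df1['COLUMN'] = pd.to_datetime(df1['COLUMN'])
print(df1['COLUMN'].dtype)  # datetime64[ns]
```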

Another way to read in an Excel file and convert a column to datetime directly from read_excel is as follows:
import pandas as pd
file = 'PATH_HERE'
df1 = pd.read_excel(file, parse_dates=['COLUMN'])
For reference, I am using Python 3.8.3.

read_excel supports dtype, just as read_csv, as of this writing:
import datetime
import pandas as pd
xlsx = pd.ExcelFile('path...')
df = pd.read_excel(xlsx, dtype={'column_name': datetime.datetime})
https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html

Related

Pandas Dataframe is not pretty displayed

I am trying to create a DataFrame in pandas by reading a CSV file and print it, but it is not displayed properly.
This is the code:
import pandas as pd
df = pd.read_csv("weather.csv")
print(df)
And this is my output:
What can I do?
A sample of weather.csv would help, but I believe this will solve the issue:
import pandas as pd
df = pd.read_csv("weather.csv", sep=';')
print(df)
Next time, try to provide your data as text. You need to change the separator; the default is ','. So try this:
df = pd.read_csv('weather.csv', sep=';')
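A small self-contained sketch of why sep=';' matters, using hypothetical semicolon-separated content in place of weather.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for weather.csv, with ';' as the delimiter
raw = "day;temperature;windspeed\n1/1/2017;32;6\n1/2/2017;35;7\n"
df = pd.read_csv(io.StringIO(raw), sep=';')
print(df.columns.tolist())  # ['day', 'temperature', 'windspeed']
```

Without sep=';', the whole header would be parsed as a single column named 'day;temperature;windspeed'.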

Deleting an unnamed column from a csv file Pandas Python

I am trying to write code that deletes the unnamed column that comes right before Unix Timestamp. After deleting it, I will save the modified dataframe into data.csv. How would I be able to get the expected output below?
import pandas as pd
data = pd.read_csv('data.csv')
data.drop('')
data.to_csv('data.csv')
data.csv file
,Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume
0,1635686220,2021-10-31 13:17:00,BTCUSD,60638.0,60640.0,60636.0,60638.0,0.4357009185659157
1,1635686160,2021-10-31 13:16:00,BTCUSD,60568.0,60640.0,60568.0,60638.0,3.9771881707839967
2,1635686100,2021-10-31 13:15:00,BTCUSD,60620.0,60633.0,60565.0,60568.0,1.3977284440628714
Updated csv (Expected Output):
Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume
1635686220,2021-10-31 13:17:00,BTCUSD,60638.0,60640.0,60636.0,60638.0,0.4357009185659157
1635686160,2021-10-31 13:16:00,BTCUSD,60568.0,60640.0,60568.0,60638.0,3.9771881707839967
1635686100,2021-10-31 13:15:00,BTCUSD,60620.0,60633.0,60565.0,60568.0,1.3977284440628714
This is the index. Use index=False in to_csv.
data.to_csv('data.csv', index=False)
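A quick sketch showing that index=False keeps the unnamed index column out of the output (written to an in-memory buffer with a couple of hypothetical columns rather than to data.csv):

```python
import io
import pandas as pd

# Rebuild a small frame like the one in the question
data = pd.DataFrame({'Unix Timestamp': [1635686220, 1635686160],
                     'Symbol': ['BTCUSD', 'BTCUSD']})
buf = io.StringIO()
data.to_csv(buf, index=False)  # index=False drops the unnamed index column
print(buf.getvalue().splitlines()[0])  # Unix Timestamp,Symbol
```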
Set the first column as the index with df = pd.read_csv('data.csv', index_col=0), and pass index=False when writing the results.
You can follow the code below. It takes the columns from the first position onward, and then you can save that df to CSV without index values.
df = df.iloc[:,1:]
df.to_csv("data.csv",index=False)

How to export cleaned data from a jupyter notebook, not the original data

I have just started to learn to use Jupyter notebook. I have a data file called 'Diseases'.
Opening data file
import pandas as pd
df = pd.read_csv('Diseases.csv')
Choosing data from a column named 'DIABETES', i.e choosing subject IDs that have diabetes, yes is 1 and no is 0.
df[df.DIABETES >1]
Now I want to export this cleaned data (that has fewer rows)
df.to_csv('diabetes-filtered.csv')
This exports the original data file, not the filtered df with fewer rows.
I saw in another question that the inplace argument needs to be used. But I don't know how.
You forgot to assign the filtered DataFrame back, here to df1:
import pandas as pd
df = pd.read_csv('Diseases.csv')
df1 = df[df.DIABETES >1]
df1.to_csv('diabetes-filtered.csv')
Or you can chain filtering and exporting to file:
import pandas as pd
df = pd.read_csv('Diseases.csv')
df[df.DIABETES >1].to_csv('diabetes-filtered.csv')
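An end-to-end sketch of the assign-then-export pattern, using a hypothetical in-memory stand-in for Diseases.csv (and == 1 to select the flagged rows in this toy data):

```python
import io
import pandas as pd

# Hypothetical stand-in for Diseases.csv
raw = "ID,DIABETES\n1,1\n2,0\n3,1\n"
df = pd.read_csv(io.StringIO(raw))
df1 = df[df.DIABETES == 1]  # keep only subjects flagged with diabetes
out = io.StringIO()
df1.to_csv(out, index=False)  # export the filtered frame, not the original
print(out.getvalue())
```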

index_col parameter in datatable fread function

I want to accomplish the same result as:
import pandas as pd
pd.read_csv("data.csv", index_col=0)
But I want to do it in a faster way, using the datatable library in Python and then converting the result to a pandas DataFrame. This is what I am currently doing:
import datatable as dt
datatable = dt.fread("data.csv")
dataframe = datatable.to_pandas().set_index('C0')
Is there any faster way to do it?
I would like to have a parameter that allows me to use a column as the row labels of the DataTable, like index_col=0 in pandas.read_csv(). Also, why does datatable.fread create a 'C0' column?

pandas to_csv: suppress scientific notion for data frame with mixed types

I have a pandas data frame with multiple columns whose types are either float64 or strings. I'm trying to use to_csv to write the data frame to an output file. However, it outputs big numbers in scientific notation. For example, if the number is 1344154454156.992676, it's saved in the file as 1.344154e+12.
How do I suppress scientific notation for to_csv and keep the numbers as they are in the output file? I have tried to use the float_format parameter in the to_csv function, but it broke since there are also columns with strings in the data frame.
Here are some example codes:
import pandas as pd
import numpy as np
import os
df = pd.DataFrame({'names': ['a','b','c'],
                   'values': np.random.rand(3)*100000000000000})
df.to_csv('example.csv')
os.system("cat example.csv")
,names,values
0,a,9.41843213808e+13
1,b,2.23837359193e+13
2,c,9.91801198906e+13
# if i set up float_format:
df.to_csv('example.csv', float_format='{:f}'.format)
ValueError: Unknown format code 'f' for object of type 'str'
How can I get the data saved in the CSV without scientific notation, like below?
names values
0 a 94184321380806.796875
1 b 22383735919307.046875
2 c 99180119890642.859375
The float_format argument should be a printf-style format string, not a callable; use this instead:
df.to_csv('example.csv', float_format='%f')
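A quick check of this answer with the number from the question, written to an in-memory buffer instead of example.csv:

```python
import io
import pandas as pd

df = pd.DataFrame({'names': ['a'], 'values': [1344154454156.992676]})
buf = io.StringIO()
df.to_csv(buf, float_format='%f')  # '%f' is a printf-style format string
print(buf.getvalue())  # the value row keeps the full number, no e+12
```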
Try setting the option like this:
pd.set_option('float_format', '{:f}'.format)
For Python 3.x (tested on 3.6.5 and 3.7):
Options and Settings
For visualization of the dataframe, use pandas.set_option:
import pandas as pd  # import pandas package
import numpy as np   # needed for the random data below
# for visualisation of the float data once we read it:
pd.set_option('display.html.table_schema', True)  # to see the dataframe/table as HTML
pd.set_option('display.precision', 5)  # set the display precision, here 5 digits
df = pd.DataFrame({'names': ['a','b','c'],
                   'values': np.random.rand(3)*100000000000000})  # generate random dataframe
Output of the data:
df.dtypes # check datatype for columns
[output]:
names object
values float64
dtype: object
Dataframe:
df # output of the dataframe
[output]:
names values
0 a 6.56726e+13
1 b 1.63821e+13
2 c 7.63814e+13
And now write to_csv using the float_format='%.13f' parameter
df.to_csv('estc.csv',sep=',', float_format='%.13f') # write with precision .13
file output:
,names,values
0,a,65672589530749.0703125000000
1,b,16382088158236.9062500000000
2,c,76381375369817.2968750000000
And now write to_csv using the float_format='%f' parameter
df.to_csv('estc.csv',sep=',', float_format='%f') # this will remove the extra zeros after the '.'
file output:
,names,values
0,a,65672589530749.070312
1,b,16382088158236.906250
2,c,76381375369817.296875
For more details check pandas.DataFrame.to_csv