Pandas DatetimeIndex string format conversion from American to European - python

Ok, I have read some data from a CSV file using:
df = pd.read_csv(path, index_col='Date', parse_dates=True, dayfirst=True)
The data are in the European date convention format dd/mm/yyyy, which is why I am using dayfirst=True.
However, what I want to do is change the string format appearance of my dataframe index from the American format (yyyy/mm/dd) to the European format (dd/mm/yyyy), just to be visually consistent with how I read the dates.
I couldn't find any relevant argument in the pd.read_csv method.
In the output I want a dataframe whose index is a datetime index visually consistent with the European date format.
Could anyone propose a solution? It should be straightforward, since I guess there is a pandas method to handle that, but I am currently stuck.

Try something like the following once it's loaded from the CSV. I don't believe it's possible to perform the conversion as part of the reading process.
import pandas as pd
df = pd.DataFrame({'date': pd.date_range(start='11/24/2016', periods=4)})
df['date_eu'] = df['date'].dt.strftime('%d/%m/%Y')
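Since the question is specifically about the index, the same idea applies to a DatetimeIndex; note that strftime returns plain strings, so the result is no longer a DatetimeIndex and date-based slicing will stop working:
df2 = pd.DataFrame({'val': range(4)}, index=pd.date_range(start='2016-11-24', periods=4))
df2.index = df2.index.strftime('%d/%m/%Y')
print(df2.index)  # Index(['24/11/2016', '25/11/2016', ...], dtype='object')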

Related

Interpolating data for missing values pandas python

I am having trouble interpolating my missing values. I am using the following code to interpolate:
df = pd.read_csv(filename, delimiter=',')
# Interpolating the NaN values
df.set_index(df['Date'], inplace=True)
df2 = df.interpolate(method='time')
Water = df2['Water']
Oil = df2['Oil']
Gas = df2['Gas']
Whenever I run my code I get the following message: "time-weighted interpolation only works on Series or DataFrames with a DatetimeIndex"
My data consists of several columns with a header. The first column is named Date and all of its rows look similar to this: 12/31/2009. I am new to Python and time series in general. Any tips will help.
Sample of CSV file
Try this, assuming the first column of your csv is the one with date strings:
df = pd.read_csv(filename, index_col=0, parse_dates=[0], infer_datetime_format=True)
df2 = df.interpolate(method='time', limit_direction='both')
This should 1) convert your first column into actual datetime objects, and 2) set the index of the dataframe to that datetime column, all in one step. The infer_datetime_format=True argument is optional; if your datetime format is a standard one, it can speed up parsing by quite a bit.
The limit_direction='both' should backfill any NaNs in the first row, but because you haven't provided a copy-paste-able sample of your data, I cannot confirm on my end.
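For reference, a tiny self-contained demo (with made-up numbers) of what method='time' does once the index is a DatetimeIndex:
import numpy as np
import pandas as pd
# Three monthly observations with a gap in the middle
idx = pd.to_datetime(['2009-12-31', '2010-01-31', '2010-02-28'])
s = pd.Series([10.0, np.nan, 30.0], index=idx)
print(s.interpolate(method='time'))  # the gap is filled proportionally to elapsed days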
Reading the documentation can be incredibly helpful and can usually answer questions faster than you'll get answers from Stack Overflow!

Pandas to CSV column datatype [duplicate]

This question already has answers here:
datetime dtypes in pandas read_csv
(6 answers)
Closed 2 years ago.
I'm using Pandas and SQLAlchemy to import data from SQL. One of the SQL columns is datetime. I then convert the SQL data into a Pandas dataframe; the datetime column is "datetime64", which is fine. I am able to use Matplotlib to plot any of my other columns against datetime.
I then convert my pandas dataframe to a csv using:
df.to_csv('filename')
This is to save me having to keep running a rather large SQL query each time I log on. If I then try to read the csv back into Python and work from that, the datetime column is now of datatype "object" rather than "datetime64". This means Matplotlib won't let me plot other columns against datetime, because the datetime column is the wrong datatype.
How do I ensure that it stays the correct datatype during the dataframe-to-csv round trip?
EDIT:
The comments/solutions to my original post did work in getting the column to the correct dtype. However, I now have a different problem. When I plot against the "datetime" column, it looks like this:
When it should look like this (this is how it looks when I'm working directly with the SQL data).
I assume the datetime column is still not quite the correct dtype (even though it states it is datetime64[ns]).
CSV is a plain text format and does not specify the data type of any column. If you are using pandas to read the csv back into python, pd.read_csv() provides a few ways to specify that a column represents a date.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Try pd.read_csv('file.csv', parse_dates=[colnum]), where colnum is the integer index of your date column.
read_csv() provides additional options for parsing dates. Alternatively, you could use the dtype argument.
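A round-trip sketch of that suggestion, assuming your datetime column is the first one (position 0):
df.to_csv('filename.csv', index=False)             # datetimes are written as plain text
df = pd.read_csv('filename.csv', parse_dates=[0])  # re-parse that column on the way back
print(df.dtypes)                                   # the column is datetime64[ns] again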
Unfortunately, you cannot store datatypes in the CSV format.
One thing you can do, if you are only reading the file back in Python, is to use pickle.
You can do that like:
import pickle
# Serialize the DataFrame object itself, so its dtypes are preserved
with open('filename.pkl', 'wb') as pickle_file:
    pickle.dump(df, pickle_file)
and you can load it using:
with open('filename.pkl', 'rb') as pkl_file:
    df = pickle.load(pkl_file)
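pandas also provides built-in wrappers for this, which preserve dtypes the same way:
df.to_pickle('filename.pkl')
df = pd.read_pickle('filename.pkl')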

dtype does not provide enough data type precision. Alternatives?

I am trying to check the formats of columns in a number of excel files (.xlsx) to see if they match.
To do so, I am using the dtype attribute from pandas.
The problem is that it returns the same data type (datetime64[ns]) for two different date formats within 'Date'.
What alternatives to this attribute would give more precision?
# Import pandas
import pandas as pd
# Read MyFile and store it in dataframe df1
df1 = pd.read_excel(MyFile, sheetname=0, header=0, index_col=False, keep_default_na=False)
# Print the data type of the column MyColumnName
print(df1[str(MyColumnName)].dtype)
I would like more precise data type information so that I can flag differences between spreadsheets.

pandas.read_csv() can apply different date formats within the same column! Is it a known bug? How to fix it?

I have realised that, unless the format of a date column is declared explicitly or semi-explicitly (with dayfirst), pandas can apply different date formats to the same column, when reading a csv file! One row could be dd/mm/yyyy and another row in the same column mm/dd/yyyy!
Insane doesn't even come close to describing it! Is it a known bug?
To demonstrate: the script below creates a very simple table with the dates from January 1st to the 31st, in the dd/mm/yyyy format, saves it to a csv file, then reads back the csv.
I then use pandas.DatetimeIndex to extract the day.
Well, the day is 1 for the first 12 days (when month and day were both < 13), and 13, 14, etc. afterwards. How on earth is this possible?
The only way I have found to fix this is to declare the date format, either explicitly or just with dayfirst=True. But it's a pain because it means I must declare the date format even when I import csv with the best-formatted dates ever! Is there a simpler way?
This happens to me with pandas 0.23.4 and Python 3.7.1 on Windows 10
import numpy as np
import pandas as pd

df = pd.DataFrame()
df['day'] = np.arange(1, 32)
df['day'] = df['day'].apply(lambda x: "{:0>2d}".format(x))  # zero-pad to two digits
df['month'] = '01'
df['year'] = '2018'
df['date'] = df['day'] + '/' + df['month'] + '/' + df['year']
df.to_csv('mydates.csv', index=False)

# Same results whether you use parse_dates or not
imp = pd.read_csv('mydates.csv', parse_dates=['date'])
imp['day extracted'] = pd.DatetimeIndex(imp['date']).day
print(imp['day extracted'])
By default it assumes the American date format and, if that fails, switches mid-column without raising an error. Though this breaks the Zen of Python by letting an error pass silently, "Explicit is better than implicit" also applies: if you know your data has an international format, you can use dayfirst
imp = pd.read_csv('mydates.csv', parse_dates=['date'], dayfirst=True)
With files you produce, be unambiguous by using an ISO 8601 format with a timezone designator.
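For example, a sketch of writing the parsed dates back out in ISO 8601 (localizing to UTC here is just an assumption, to get an explicit offset):
imp['date'] = pd.to_datetime(imp['date'], dayfirst=True).dt.tz_localize('UTC')
imp.to_csv('mydates_iso.csv', index=False, date_format='%Y-%m-%dT%H:%M:%S%z')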

Changing many variables to datetime in Pandas - Python

I have a dataset with around 1 million rows and I'd like to convert 12 columns to datetime. Currently they are "object" type. I previously read that I could do this with:
data.iloc[:,7:19] = data.iloc[:,7:19].apply(pd.to_datetime, errors='coerce')
This does work, but the performance is extremely poor. Someone else mentioned performance could be sped up by doing:
def lookup(s):
    """
    This is an extremely fast approach to datetime parsing.
    For large data, the same dates are often repeated. Rather than
    re-parse these, we store all unique dates, parse them, and
    use a lookup to convert all dates.
    """
    dates = {date: pd.to_datetime(date) for date in s.unique()}
    return s.apply(lambda v: dates[v])
However, I'm not sure how to apply this code to my data (I'm a beginner). Does anyone know how to speed up changing many columns to datetime using this code or any other method? Thanks!
If all your dates have the same format, you can define a date-parsing function and pass it as an argument when you import. First import datetime, then build the parser around datetime.strptime with your format string.
Once that function is defined, you tell pandas which columns to parse via the parse_dates option, and supply your function as date_parser=yourfunction.
I would look up the pandas API docs to get the specific syntax, but a sketch follows below.
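A minimal sketch of that approach; the filename and format string are placeholders, and the column positions 7-18 come from the question:
import pandas as pd
from datetime import datetime

def my_parser(value):
    # Assumes every date column shares one format; adjust to your data
    return datetime.strptime(value, '%d/%m/%Y')

data = pd.read_csv('myfile.csv',
                   parse_dates=list(range(7, 19)),  # the 12 date columns
                   date_parser=my_parser)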
