Pandas read CSV datetime column plotting - python

I’m using Pandas to import data from a csv. One of the columns is datetime. I'm using...
datetime = ['datetime']
df= pd.read_csv(File, parse_dates=datetime)
...to import the datetime column as "datetime64[ns]", which according to...
df.info()
...is in the correct format. However, when I plot against the "datetime" column it looks like this:
When it should look like this (this is how it looks when I'm working with the same data extracted directly from SQL).
I assume the datetime column is still not quite in the correct dtype (even though it states it is datetime64[ns]).
What should I do to make the first plot look like the second?

Related

Interpolating data for missing values pandas python

I am having trouble interpolating my missing values. I am using the following code to interpolate:
df=pd.read_csv(filename, delimiter=',')
#Interpolating the nan values
df.set_index(df['Date'],inplace=True)
df2=df.interpolate(method='time')
Water=(df2['Water'])
Oil=(df2['Oil'])
Gas=(df2['Gas'])
Whenever I run my code I get the following message: "time-weighted interpolation only works on Series or DataFrames with a DatetimeIndex"
My data consists of several columns with a header row. The first column is named Date and the rows look similar to this: 12/31/2009. I am new to Python and time series in general. Any tips will help.
Sample of CSV file
Try this, assuming the first column of your csv is the one with date strings:
df = pd.read_csv(filename, index_col=0, parse_dates=[0], infer_datetime_format=True)
df2 = df.interpolate(method='time', limit_direction='both')
It should 1) convert your first column into actual datetime objects, and 2) set that datetime column as the index of the dataframe, all in one step. The infer_datetime_format=True argument is optional; if your datetime format is a standard one, it can speed up parsing by quite a bit.
The limit_direction='both' should back fill any NaNs in the first row, but because you haven't provided a copy-paste-able sample of your data, I cannot confirm on my end.
Reading the documentation can be incredibly helpful and can usually answer questions faster than you'll get answers from Stack Overflow!
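For a self-contained illustration (with made-up dates and values), here is a minimal sketch of why method='time' needs a DatetimeIndex and what it produces:

```python
import numpy as np
import pandas as pd

# Made-up daily data with a gap, indexed by datetime as interpolate(method='time') requires
idx = pd.to_datetime(['2009-12-29', '2009-12-30', '2009-12-31', '2010-01-01'])
df = pd.DataFrame({'Water': [10.0, np.nan, np.nan, 40.0]}, index=idx)

# Time-weighted interpolation fills the gap proportionally to the elapsed time
df2 = df.interpolate(method='time')
print(df2['Water'].tolist())  # approximately [10.0, 20.0, 30.0, 40.0]
```

Calling the same interpolate on a frame whose index is still the default RangeIndex raises exactly the "time-weighted interpolation only works on Series or DataFrames with a DatetimeIndex" error from the question.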

Pandas automatically converts string to date

I need to work with a csv file in which one column contains values like these: 1/2, 2/1, 3/1, etc.
When I load the csv into a pandas data frame object, the values automatically look like: 01-Feb, 02-Jan, 03-Jan, etc.
How can I load this csv into a dataframe object in which the values of this column are kept as strings?
I have tried this
df = pd.read_csv("/Users/Name/Desktop/QM/data.csv", encoding='latin-1',dtype=str)
But the dates remain converted.
It sounds like the column is being formatted as a date somewhere along the way (often by the spreadsheet program itself before the CSV is saved, rather than by pandas).
In any case, you can convert the values back to strings with Pandas Series.dt.strftime.
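As a minimal sketch of that conversion (the format strings here are assumptions about how the values were parsed, so adjust them to your data):

```python
import pandas as pd

# Hypothetical column already auto-parsed into datetimes (the year defaults to 1900)
s = pd.to_datetime(pd.Series(['01-Feb', '02-Jan', '03-Jan']), format='%d-%b')

# Convert back to month/day strings; note %m and %d are zero-padded
restored = s.dt.strftime('%m/%d')
print(restored.tolist())  # ['02/01', '01/02', '01/03']
```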

Pandas to CSV column datatype [duplicate]

This question already has answers here:
datetime dtypes in pandas read_csv
(6 answers)
Closed 2 years ago.
I’m using Pandas and SQLAlchemy to import data from SQL. One of the SQL columns is datetime. I then convert the SQL data into a Pandas dataframe; the datetime column is “datetime64”, which is fine. I am able to use Matplotlib to plot any of my other columns against datetime.
I then convert my pandas dataframe to a csv using:
df.to_csv('filename')
This is to save me having to keep running a rather large SQL query each time I log on. If I then try to read the csv back into Python and work from that, the datetime column is now of datatype “object” rather than “datetime64”. This means Matplotlib won't let me plot other columns against datetime because the datetime column is the wrong datatype.
How do I ensure that it stays as the correct datatype during the df to csv process?
EDIT:
The comments/solutions to my original post did work in getting the column to the correct dtype. However, I now have a different problem. When I plot against the "datetime" column it looks like this:
When it should look like this (this is how it looks when I'm working directly with the SQL data).
I assume the datetime column is still not quite in the correct dtype (even though it states it is datetime64[ns]).
CSV is a plain text format and does not specify the data type of any column. If you are using pandas to read the csv back into python, pd.read_csv() provides a few ways to specify that a column represents a date.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Try pd.read_csv('file.csv', parse_dates=[<colnum>]), where <colnum> is an integer index to your date column.
read_csv() provides additional options for parsing dates. Alternatively, you could use the dtype argument.
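A minimal round-trip sketch (using an in-memory buffer in place of an actual file) showing that parse_dates restores the dtype that to_csv discards:

```python
import io
import pandas as pd

# Hypothetical frame with a datetime column, round-tripped through CSV in memory
df = pd.DataFrame({'datetime': pd.date_range('2020-01-01', periods=3),
                   'value': [1, 2, 3]})
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)

# Without parse_dates the column would come back as plain strings (dtype object)
df2 = pd.read_csv(buf, parse_dates=['datetime'])
print(df2['datetime'].dtype)  # datetime64[ns]
```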
Unfortunately, you cannot store data types in the CSV format.
One thing you can do, if you are only reading the file in Python, is to use pickle.
You can do that like this:
import pickle
with open('filename.pkl', 'wb') as pickle_file:
    pickle.dump(your_csv_file, pickle_file)
and you can load it back using:
with open('filename.pkl', 'rb') as pkl_file:
    csv_file = pickle.load(pkl_file)
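As a side note, pandas also ships its own pickle helpers, to_pickle and read_pickle, which preserve dtypes without the manual open/dump calls. A small sketch (writing to a temporary directory):

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'datetime': pd.date_range('2020-01-01', periods=3)})

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'filename.pkl')
    df.to_pickle(path)            # dtypes survive, unlike to_csv
    df2 = pd.read_pickle(path)

print(df2['datetime'].dtype)  # datetime64[ns]
```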

dtype does not provide enough data type precision. Alternatives?

I am trying to check the formats of columns in a number of excel files (.xlsx) to see if they match.
To do so, I am using the function dtype of pandas.
The problem is that it returns the same data type (datetime64[ns]) for two different date formats within 'Date'.
What are the alternatives of this function to have more precision?
#Import pandas
import pandas as pd
#Read MyFile and store in dataframe df1
df1 = pd.read_excel(MyFile, sheet_name=0, header=0, index_col=False, keep_default_na=False)
#Print the data type of the column MyColumnName
print(df1[str(MyColumnName)].dtype)
I would like to have more accuracy on the data type information to be able to flag differences between spreadsheets.
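One way to see the differing representations before pandas normalizes everything to datetime64[ns] is to read the column as plain strings. (For .xlsx files specifically, the display format lives in the cell style rather than the value, so comparing it exactly would require inspecting the file with the underlying engine, e.g. openpyxl's cell.number_format.) The sketch below uses a CSV stand-in with hypothetical data:

```python
import io
import pandas as pd

# Hypothetical column containing two different textual date formats
csv = "Date\n2020-01-31\n31/01/2020\n"

# dtype=str keeps the raw text, so the differing formats stay visible
raw = pd.read_csv(io.StringIO(csv), dtype=str)
print(raw['Date'].tolist())  # ['2020-01-31', '31/01/2020']
```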

Pandas DatetimeIndex string format conversion from American to European

Ok I have read some data from a CSV file using:
df=pd.read_csv(path,index_col='Date',parse_dates=True,dayfirst=True)
The data are in the European date convention format dd/mm/yyyy, which is why I am using dayfirst=True.
However, what I want to do is change the string format of my dataframe index from the American (yyyy/mm/dd) to the European format (dd/mm/yyyy), just to be visually consistent with how I am used to looking at the dates.
I couldn't find any relevant argument in the pd.read_csv method.
In the output I want a dataframe in which the index is simply a datetime index visually consistent with the European date format.
Could anyone propose a solution? It should be straightforward, since I guess there should be a pandas method to handle that, but I am currently stuck.
Try something like the following once it's loaded from the CSV. I don't believe it's possible to perform the conversion as part of the reading process.
import pandas as pd
df = pd.DataFrame({'date': pd.date_range(start='11/24/2016', periods=4)})
df['date_eu'] = df['date'].dt.strftime('%d/%m/%Y')
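If you specifically want to apply this to the index (as in the question), a similar sketch works on the DatetimeIndex directly, with the caveat that the result is a plain string index rather than a DatetimeIndex:

```python
import pandas as pd

df = pd.DataFrame({'val': [1, 2]},
                  index=pd.to_datetime(['2016-11-24', '2016-11-25']))

# strftime on the index returns strings, so the index is no longer a
# DatetimeIndex afterwards (time-based operations will stop working on it)
df.index = df.index.strftime('%d/%m/%Y')
print(list(df.index))  # ['24/11/2016', '25/11/2016']
```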
