pandas read_csv not converting string to date - python

I've looked for help on this one and didn't find the answer (I'm sure I'm asking the wrong question).
I have a CSV file with dates in it, but when I read it in, the date conversion doesn't happen.
import pandas as pd
df = pd.read_csv('file', index_col='Sequence', parse_dates='Date')
CSV file
Sequence,Date,Unit,Name,Indexed,Arbitrated,Redo
1,2013-01-01,Aloha,first last,831,0,0
df.Date is a column of strings, not datetime values.

You need to pass the column to parse as a list, not a string:
df = pd.read_csv('file', index_col='Sequence', parse_dates=['Date'])
The docstring for parse_dates says "list of ints or names", because that form lets you specify multiple columns to parse. But I agree that for a single column it is a bit surprising.
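A minimal, self-contained check of the fix, with the question's one-row sample inlined via io.StringIO instead of a file (the inlining is only there so the sketch runs as-is):

```python
import io

import pandas as pd

# The one-row sample from the question, inlined so no file is needed.
csv_text = """Sequence,Date,Unit,Name,Indexed,Arbitrated,Redo
1,2013-01-01,Aloha,first last,831,0,0
"""

# parse_dates takes a list, even when only one column should be parsed.
df = pd.read_csv(io.StringIO(csv_text), index_col='Sequence', parse_dates=['Date'])

print(df.dtypes)
print(df.loc[1, 'Date'].year)  # 2013
```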

Related

Interpolating data for missing values pandas python

I am having trouble interpolating my missing values. I am using the following code to interpolate:
df=pd.read_csv(filename, delimiter=',')
#Interpolating the nan values
df.set_index(df['Date'],inplace=True)
df2=df.interpolate(method='time')
Water=(df2['Water'])
Oil=(df2['Oil'])
Gas=(df2['Gas'])
Whenever I run my code I get the following message: "time-weighted interpolation only works on Series or DataFrames with a DatetimeIndex"
My data consists of several columns with a header row. The first column is named Date, and its values look like 12/31/2009. I am new to Python and to time series in general, so any tips will help.
[Sample of CSV file: image not available]
Try this, assuming the first column of your CSV is the one with the date strings:
df = pd.read_csv(filename, index_col=0, parse_dates=[0], infer_datetime_format=True)
df2 = df.interpolate(method='time', limit_direction='both')
This should (1) convert your first column into actual datetime objects and (2) set the DataFrame's index to that datetime column, all in one step. The infer_datetime_format=True argument is optional; if your dates are in a standard format, it can speed up parsing considerably. (In pandas 2.0 and later it is deprecated, because strict format inference is now the default.)
The limit_direction='both' argument should also fill any NaNs in the leading rows, but because you haven't provided a copy-pasteable sample of your data, I can't confirm that on my end.
Reading the documentation helps a lot, and it usually answers questions faster than Stack Overflow can!
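Since the original sample was only an image, here is a sketch with made-up data (the Water/Oil values below are hypothetical) that shows the same idea end to end. I've left out infer_datetime_format because it is deprecated from pandas 2.0 on:

```python
import io

import pandas as pd

# Hypothetical stand-in for the asker's file: month-first dates, some gaps.
csv_text = """Date,Water,Oil
12/29/2009,,10
12/30/2009,2.0,
12/31/2009,,30
01/02/2010,8.0,40
"""

# index_col=0 plus parse_dates=[0]: parse the first column as datetimes
# and make it the index in one step, giving a DatetimeIndex.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=[0])
print(type(df.index).__name__)  # DatetimeIndex

# method='time' weights each gap by the elapsed time between timestamps;
# limit_direction='both' also fills the leading NaN in Water.
df2 = df.interpolate(method='time', limit_direction='both')
print(df2)
```

With method='time', the Water value for 12/31 is weighted by the actual day gaps (one day after 12/30, two days before 01/02), so it lands at about 4.0 rather than the 5.0 a purely positional linear fill would give.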

Pandas DatetimeIndex string format conversion from American to European

OK, I have read some data from a CSV file using:
df=pd.read_csv(path,index_col='Date',parse_dates=True,dayfirst=True)
The dates are in the European dd/mm/yyyy convention, which is why I am using dayfirst=True.
However, I want to change how my DataFrame's index is displayed, from the yyyy-mm-dd format pandas shows to the European dd/mm/yyyy format, just to be visually consistent with how I read dates.
I couldn't find any relevant argument in pd.read_csv.
In the output I want a DataFrame whose index is a datetime index displayed in the European date format.
Could anyone propose a solution? I guess there should be a pandas method to handle this, so it should be straightforward, but I am currently stuck.
Try something like the following once the data is loaded from the CSV. I don't believe it's possible to do this conversion as part of the reading process. Note that strftime returns strings, so the formatted values are no longer datetimes.
import pandas as pd
df = pd.DataFrame({'date': pd.date_range(start='11/24/2016', periods=4)})
df['date_eu'] = df['date'].dt.strftime('%d/%m/%Y')
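Applied to the asker's case, the same idea works directly on the index via DatetimeIndex.strftime. A rough sketch with a hypothetical three-row CSV; keeping the original DatetimeIndex around (here in df) is my suggestion, since the formatted index is plain strings:

```python
import io

import pandas as pd

# Hypothetical dd/mm/yyyy data, read the same way as in the question.
csv_text = """Date,Value
24/11/2016,1
25/11/2016,2
13/12/2016,3
"""
df = pd.read_csv(io.StringIO(csv_text), index_col='Date',
                 parse_dates=True, dayfirst=True)

# df.index is a DatetimeIndex, displayed here as yyyy-mm-dd.
# For display, copy the frame and swap in dd/mm/yyyy strings.
df_display = df.copy()
df_display.index = df.index.strftime('%d/%m/%Y')
print(df_display)
```

If the European format is only needed in an output file, to_csv(date_format='%d/%m/%Y') should be another option that leaves the in-memory index untouched.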

Python Pandas: Overwriting an Index with a list of datetime objects

I have an input CSV with timestamps in the header like this (there are several thousand timestamp columns):
header1;header2;header3;header4;header5;2013-12-30CET00:00:00;2013-12-30CET00:01:00;...;2014-00-01CET00:00:00
In pandas 0.12 I was able to do the following to convert the string timestamps into datetime objects. The code replaces the timezone letters (CET/CEST) in each timestamp string with spaces (translate()), parses the result as a datetime (strptime()), and then localizes it to the correct timezone (localize()). I took this approach because, at least with the versions I had, CEST wasn't recognised as a timezone.
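# Python 2 code: string.maketrans and string.uppercase do not exist in Python 3
import datetime
import string

import pandas as pd
import pytz

DF = pd.read_csv('some_csv.csv',sep=';')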
transtable = string.maketrans(string.uppercase,' '*len(string.uppercase))
tz = pytz.country_timezones('nl')[0]
timestamps = DF.columns[5:]
timestamps = map(lambda x:x.translate(transtable), timestamps)
timestamps = map(lambda x:datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'), timestamps)
timestamps = map(lambda x: pytz.timezone(tz).localize(x), timestamps)
DF.columns[5:] = timestamps
However, my downstream code requires pandas 0.16.
On 0.16, the last line of the snippet above raises this error:
*** TypeError: Indexes does not support mutable operations
I'm looking for a way to overwrite my index with the datetime objects. Calling to_datetime() doesn't work for me either; it raises:
*** ValueError: Unknown string format
I have some subsequent code that copies, then drops, the first few columns of data in this DataFrame (header1, header2, header3, ...), leaving just the timestamps. The purpose is to then transpose and index by the timestamp.
So, my question:
Either:
How can I overwrite a set of column names with datetimes, so that I can pass in a pre-arranged set of timestamps that pandas will recognise as timestamps in subsequent code (in pandas 0.16)?
Or:
Any other suggestions that achieve the same effect.
I've explored set_index(), replace(), to_datetime(), reindex() and possibly some others, but none seem to achieve this overwrite. Hopefully this is simple to do and I'm just missing something.
TIA
I ended up solving this by the following:
The issue was that I had several thousand column headers with timestamps, that I couldn't directly parse into datetime objects.
So, to get these timestamp objects in, I added a new column called 'Time' containing the datetime objects and then set the index to that column. (I'm omitting the code where I purged the rows of other header data with drop().)
DF = DF.transpose()
DF['Time'] = timestamps
DF = DF.set_index('Time')
Summary: if your CSV has timestamps in its headers that you cannot parse directly, one workaround is to parse them separately, put the resulting datetime objects in a new Time column, and then call set_index() on that column.
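For readers on a current pandas, a sketch of the same fix with hypothetical sample data. The TypeError happens because Index objects are immutable, so the idea is to build the datetime labels separately and then either assign a whole new columns Index or transpose and set the index. Stripping the letters and localizing to Europe/Amsterdam mirrors the asker's approach (pytz.country_timezones('nl')[0]):

```python
import io

import pandas as pd

# Hypothetical sample with the question's layout: five plain headers
# followed by timestamp headers such as 2013-12-30CET00:00:00.
csv_text = (
    "header1;header2;header3;header4;header5;"
    "2013-12-30CET00:00:00;2013-12-30CET00:01:00\n"
    "a;b;c;d;e;1;2\n"
)
DF = pd.read_csv(io.StringIO(csv_text), sep=';')

# Replace the timezone letters (CET/CEST) with a space, parse with an explicit
# format, then localize, mirroring the translate/strptime/localize steps.
# Times inside a DST transition may need tz_localize's ambiguous=/nonexistent=.
stamps = pd.to_datetime(
    DF.columns[5:].str.replace(r'[A-Z]+', ' ', regex=True),
    format='%Y-%m-%d %H:%M:%S',
).tz_localize('Europe/Amsterdam')

# Option 1: assign a complete new columns Index instead of assigning
# into a slice of the existing (immutable) one.
renamed = DF.copy()
renamed.columns = list(DF.columns[:5]) + list(stamps)

# Option 2 (the approach in the answer): transpose the timestamp columns
# and use the parsed datetimes as the index.
ts = DF.iloc[:, 5:].T
ts.index = stamps.rename('Time')
print(ts)
```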

Pandas read_csv silently converting and messing up dates and strings?

I am reading a csv file that has two adjacent columns containing dates like this:
29/11/2004 00:00,29/11/2005 00:00,2,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL
When I read this using read_csv and then write it back out using the to_csv method, it becomes:
29/11/2004 00:00,00:00.0,2.0,,,,,,,,
I have two questions about this: why does it read the first date correctly but turn the second, which has exactly the same format, into 0? And why do the NULLs get converted to empty strings?
Here is the code I am using:
df = pandas.read_csv(filepath, sep = ",")
df.to_csv("C:\\tmp\\test.csv")
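I'm not sure why the second date goes missing; I suspect other rows in the file influence it.
For the NULL problem: by default pandas treats strings such as NULL as missing values (NaN), and to_csv writes NaN as an empty field. Passing keep_default_na=False avoids that: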
df = pd.read_csv('test.csv', sep=',', keep_default_na=False)
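A small sketch of both behaviours, with a made-up header row (start, end, count, note), since the question's line has none. It also shows that, without parse_dates, read_csv keeps the date fields as plain strings. The na_rep option of to_csv is an alternative if you'd rather keep NaN in memory and write NULL back out:

```python
import io

import pandas as pd

# Hypothetical header plus the question's first four fields.
csv_text = "start,end,count,note\n29/11/2004 00:00,29/11/2005 00:00,2,NULL\n"

# Default: 'NULL' is in pandas' built-in list of NA strings, so it becomes NaN,
# and to_csv writes NaN as an empty field (na_rep defaults to '').
default = pd.read_csv(io.StringIO(csv_text))
print(default['note'].isna().tolist())  # [True]

# keep_default_na=False: no built-in NA strings, so 'NULL' stays a string.
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(kept['note'].tolist())  # ['NULL']

# Without parse_dates, the date fields are read as plain strings.
print(kept.loc[0, 'end'])  # 29/11/2005 00:00

# Alternative: keep NaN in memory but write NULL back out.
out = default.to_csv(index=False, na_rep='NULL')
print(out)
```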

How do I tell pandas to parse a particular column as a datetime object, but not make it an index?

I have a csv file where one of the columns is a date/time string. How do I parse it correctly with pandas? I don't want to make that column the index. Thanks!
Uri
Pass dateutil.parser.parse (or another datetime conversion function) for that column in the converters argument of read_csv.
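A sketch comparing that approach with parse_dates given a column list and no index_col, which also leaves the index alone. The column names are hypothetical; python-dateutil is installed as a pandas dependency, so the import should be available:

```python
import io

import pandas as pd
from dateutil import parser

# Hypothetical CSV with a date/time column that should not become the index.
csv_text = "id,when,value\n1,2013-01-01 10:30,5\n2,2013-01-02 11:45,7\n"

# converters: call dateutil's parser on each cell of the 'when' column.
via_converter = pd.read_csv(io.StringIO(csv_text), converters={'when': parser.parse})

# parse_dates with a list and no index_col: pandas parses the column itself
# and the DataFrame keeps its default RangeIndex.
via_parse_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=['when'])

print(via_converter.dtypes)
print(via_parse_dates.dtypes)
```

parse_dates uses pandas' own datetime parsing and is typically faster on large files than calling dateutil once per cell.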
