Parsing dates in Python using Pandas

When I ran this code the first time, it gave me the result in the format I expected, i.e. 2013-01-23.
But when I ran it again, I did not get the same result (the output was 23/01/2013).
Why is it different the second time?
from pandas import *

fec1 = read_csv("/user_home/w_andalib_dvpy/sample_data/sample.csv")

def convert_date(val):
    d, m, y = val.split('/')
    return datetime(int(y), int(m), int(d))

# FECHA is the date column name in the raw file. Format: 23/01/2013
fec1.FECHA.map(convert_date)
fec1.FECHA

Parsing dates with pandas can be done at the time you read the CSV, by passing parse_dates=['yourdatecolumn'] and date_parser=convert_date to the pandas.read_csv method.
Doing it this way is much faster than loading the data first and then parsing the dates.
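As a minimal sketch of that approach (reusing the column name FECHA, the file path from the question, and the convert_date helper above):

import pandas as pd
from datetime import datetime

def convert_date(val):
    d, m, y = val.split('/')
    return datetime(int(y), int(m), int(d))

# parse_dates + date_parser applies convert_date to every FECHA value while
# reading, so the column comes back as datetime64[ns] rather than strings.
# (date_parser is deprecated in pandas 2.x; dayfirst=True or
# date_format='%d/%m/%Y' does the same job there.)
fec1 = pd.read_csv(
    "/user_home/w_andalib_dvpy/sample_data/sample.csv",
    parse_dates=["FECHA"],
    date_parser=convert_date,
)
print(fec1.FECHA.dtype)  # datetime64[ns]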
The reason you get different outputs when you do the same operation twice is probably that the parsing takes D/M/Y as input but produces Y/M/D as output, which effectively swaps the day and year each time.

Related

Problem with recursion in Python: Kernel Crash

I am trying to read data from a list, but there are some inconsistencies, so I want to store only the data that shows the expected periodicity, assuming all the inconsistencies have the same format.
We expect a datetime object every 12 items, but for some days there is less data, and I am not interested in those dates (for simplicity's sake). When a date has missing data, I think it only has 6 elements instead of 11. The days and all the data are items of one list, so I'm trying to store the index of the dates that don't follow the described pattern (for such a date, the element at the next expected position is not a date).
I'm trying to do this using recursion, but every time I run the function I have created, the kernel restarts.
I cannot link the data of clean_values because AEMET opendata deletes the requested data after about five minutes.
import datetime as dt

tbe = []

def recursive(x, clean_values):
    if x < len(clean_values) and x >= 0:
        for i in range(0, len(clean_values), 12):
            if type(clean_values[i]) == dt.datetime:  # it is a datetime, so this block looks fine
                pass
            else:
                tbe.append(i - 12)  # we store the date before (that should be the one with the problem)
                break
        recursive(i - 6, clean_values)  # and restart the function at the position where we think the next date is
    else:
        return

recursive(0, clean_values)
Sorry, I cannot provide more information.
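For illustration only, here is a non-recursive sketch of the same scan. It assumes, as described above, that a well-formed block is one datetime followed by 11 values and that a block with missing data holds only 6 items; it is not based on the (unavailable) clean_values data.

import datetime as dt

def find_broken_dates(clean_values, block=12, short_block=6):
    """Return the indices of dates whose block does not follow the
    expected [datetime, 11 values] pattern."""
    tbe = []
    seen = set()  # guard against revisiting the same index forever
    i = 0
    while 0 <= i < len(clean_values) and i not in seen:
        seen.add(i)
        if isinstance(clean_values[i], dt.datetime):
            i += block                    # well-formed block: jump to the next expected date
        else:
            tbe.append(i - block)         # the previous date is the one with missing data
            i = i - block + short_block   # resync: assume that date's block had only short_block items
    return tbe

# e.g. tbe = find_broken_dates(clean_values) in place of recursive(0, clean_values)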

pandas.to_datetime() does not filter when used with loc[] and comparison operator

I downloaded a .csv file to do some practice; a column named "year_month" is a string with the format "YYYY-MM".
By doing:
df = pd.read_csv('C:/..../migration_flows.csv',parse_dates=["year_month"])
"year_month" is Dtype=object. So far so good.
By doing:
df["year_month"] = pd.to_datetime(df["year_month"],format='%Y-%m-%d')
it is converted to datetime64[ns]. So far so good.
I try to filter certain dates by doing:
filtered_df = df.loc[(df["year_month"]>= pd.Timestamp(2018-1-1))]
The program returns the whole column as if nothing happened. For instance, the result starts displaying from the date "2001-01-01".
Any thoughts on how to filter properly? Many thanks.
How about this:
df.loc[(df["year_month"]>= pd.to_datetime('2018-01-01'))]
or
df.loc[(df["year_month"]>= pd.Timestamp('2018-01-01'))]

Get date format code from a string/datetime using python

Is there a way in Python to find out the date format code of a string?
My input would be, e.g.:
2020-09-11T17:42:33.040Z
What I am looking for in this example is to get this:
'%Y-%m-%dT%H:%M:%S.%fZ'
The point is that I have different time formats in different files, so I don't know in advance what the datetime format code will look like.
For processing my data I need Unix time, but to calculate that I need a solution to this problem.
data["time_unix"] = data.time.apply(lambda row: (datetime.datetime.strptime(row, '%Y-%m-%dT%H:%M:%S.%fZ').timestamp()*100))
Thank you for the support!
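As a sketch of one common workaround (dateutil is not mentioned in the question itself), you can skip recovering the format string and let dateutil infer the format while parsing, which is enough to get Unix time without knowing the format code in advance:

from dateutil import parser

ts = parser.parse("2020-09-11T17:42:33.040Z")  # format is inferred automatically
print(ts.timestamp())  # Unix time in seconds, e.g. 1599846153.04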

The time difference in Python

I have a data set containing many columns and a datetime column in the format '%y/%m/%d %H:%M:%S' (the input was shown as a screenshot).
I'm trying to find the difference between the date times of consecutive rows. I tried this code:
df['difference_time'] = (df['timezone'] - df['timezone'].shift()).fillna(0)
but the output (also shown as a screenshot) is not right, and I'm not sure where the problem in my code is.
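A sketch of the usual pattern for this, assuming the column (named 'timezone' in the question) first needs converting from strings to real datetimes with the quoted format; the sample values below are invented for illustration:

import pandas as pd

# A tiny stand-in for the data set in the question.
df = pd.DataFrame({'timezone': ['21/03/01 10:00:00', '21/03/01 10:05:30', '21/03/01 10:07:00']})

# Convert the strings to datetimes before subtracting; the format string
# is the one quoted in the question.
df['timezone'] = pd.to_datetime(df['timezone'], format='%y/%m/%d %H:%M:%S')

# diff() subtracts each row from the previous one; fill the first row
# with a zero timedelta rather than the integer 0.
df['difference_time'] = df['timezone'].diff().fillna(pd.Timedelta(0))
print(df)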

Preventing csvkit from modifying dates/times?

I'm just trying out csvkit for converting Excel to csv. However, it does not take the formatting of dates and times into account, and it produces different results from Excel's own save-as-csv. For example, for one row of a spreadsheet (shown as a screenshot), this is what Excel's save-as produces:
22/04/1959,Bar,F,01:32.00,01:23.00,00:59.00,00:47.23
The date has no special formatting, and the time is formatted as [mm].ss.00. However, this is in2csv's version of the csv:
1959-04-22,Bar,F,0.00106481481481,0.000960648148148,0.00068287037037,0.000546643518519
which is of course of no use at all. Any ideas? There don't seem to be any command-line options for this - no-inference doesn't help. Thanks.
EDIT
Both csvkit and xlrd do seem to take the formatting into account, but they're not smart about it. A date of 21/02/1066 is passed through as the text string '21/02/1066' in both cases, but a date of '22/04/1959' is turned into '21662.0' by xlrd and into 1959-04-22 by csvkit. Both of them just give up on small elapsed times and pass through the float representation. This is OK if you know that the cell should contain an elapsed time, because you can just multiply by 24*60*60 to get the right answer.
I don't think xlrd would be much help here since its date tuple functions only handle seconds, and not centiseconds.
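As a quick check of the conversions described above (a sketch only; it assumes xlrd's 1900 date system, datemode 0, and uses the serial and float values quoted in this edit):

from xlrd.xldate import xldate_as_datetime

print(xldate_as_datetime(21662.0, 0))     # 1959-04-22 00:00:00
elapsed = 0.00106481481481                # in2csv's float for the 01:32.00 cell
print(round(elapsed * 24 * 60 * 60, 2))   # 92.0 seconds, i.e. 1m 32s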
EDIT 2
Found out something interesting. I started with a base spreadsheet containing times and made two copies: in one I formatted the times as [m:]ss.00, and in the other as [mm:]ss.00. I then saved each as a .xls and a .xlsx, giving a total of 4 spreadsheets. Excel could convert all 4 to csv, and all the time text in the csvs appeared as originally written (i.e. 0:21.0, for example, for 0m 21.0s).
in2csv can't handle the two .xls versions at all; that time appears as 00:00:21. It also can't handle the [m:]ss.00 version of the .xlsx - conversion gives the catch-all 'index out of range' error. The only one of the 4 spreadsheets that in2csv can handle is the .xlsx one with [mm:]ss.00 formatting.
The optional -I argument should avoid this issue. When I test your sample data, I get what Excel's save-as produces.
Command:
in2csv sample.csv -I > sample-output-i.csv
Output:
22/04/1959,Bar,F,01:32.00,01:23.00,00:59.00,00:47.23
-I, --no-inference Disable type inference when parsing CSV input.
https://csvkit.readthedocs.io/en/latest/scripts/in2csv.html
