Python - Read file, rearrange date and change year from yy to yyyy

I'm a Python newb. I've been learning a bit and am on files. So I have a program that I'm trying to write where it reads data from a file. Each line in the file is a date in the format dd-mmm-yy for example, 13-SEP-20. For simplicity, each year is assumed to be in the year 2000 or later. I need to replace SEP with 09 and rearrange the output to be mm-dd-yyyy. So the end result would be:
13-SEP-20 to 09-13-2020
So far, I have the below code. It's replacing values on each line using the dictionary for the month and rearranging to the correct order. Next, I need to change the year to 20nn but I'm not sure how to do that part.
import datetime

months = {'JAN': '01', 'FEB': '02', 'MAR': '03', 'APR': '04', 'MAY': '05', 'JUN': '06',
          'JUL': '07', 'AUG': '08', 'SEP': '09', 'OCT': '10', 'NOV': '11', 'DEC': '12'}
result_dic = {}
# replace mmm with mm int
with open('date_file.txt') as fh:
    giv_date = fh.read().splitlines()
for line in giv_date:
    line = line.rstrip()
    for mon_alph, mon_num in months.items():
        if mon_alph in line:
            line = line.replace(mon_alph, mon_num)
    line = datetime.datetime.strptime(line, '%d-%m-%y')
    line = datetime.datetime.strftime(line, '%m-%d-%y')
    print(line)
The output from the above for the first few lines is:
09-13-20
09-11-20
09-10-20
08-27-19
08-24-20
Can someone assist me with how I can change the yy to yyyy? For simplicity I'd say possibly just adding 2000 to each year but I think that would possibly be too complex and there may be a simpler way? Thank you in advance for your assistance.

Try %Y instead of %y in the strftime call. The Python documentation on strftime lists all the format codes. In short, %y gives only the last two digits of the year (20), while %Y gives all four (2020).

To get a four-digit year, replace the strftime line with:
line = datetime.datetime.strftime(line,'%m-%d-%Y')
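Putting the question's month dictionary and the %Y fix together, a minimal sketch could look like this. The function name convert_date is made up for illustration, and the sketch assumes every line is a well-formed dd-MMM-yy date. Keep in mind that %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999, so the "year 2000 or later" assumption only holds for two-digit years up to 68.

```python
import datetime

MONTHS = {'JAN': '01', 'FEB': '02', 'MAR': '03', 'APR': '04',
          'MAY': '05', 'JUN': '06', 'JUL': '07', 'AUG': '08',
          'SEP': '09', 'OCT': '10', 'NOV': '11', 'DEC': '12'}


def convert_date(line):
    """Convert 'dd-MMM-yy' (e.g. '13-SEP-20') to 'mm-dd-yyyy'."""
    day, mon, year = line.strip().split('-')
    # Swap the month abbreviation for its number, then let datetime
    # validate the date and expand the two-digit year.
    numeric = '-'.join([day, MONTHS[mon.upper()], year])
    parsed = datetime.datetime.strptime(numeric, '%d-%m-%y')
    return parsed.strftime('%m-%d-%Y')


for sample in ['13-SEP-20', '27-AUG-19']:
    print(sample, 'to', convert_date(sample))
```

Splitting on '-' instead of calling replace on the whole line avoids touching anything other than the month field.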

Related

Iteration over two lists using index

I am trying to create a list of dates and times at specific intervals. The times and dates are in a time-series CSV, and I want to write code that extracts data at specific time intervals. I made two lists, one for day and one for hour, and I am creating a new variable that stores the date and time of interest. I have tried the following code, but I get an error:
day = ['01', '02', '03', '04', "05", '06', '07', '08', '09', '10', '11', '12','13','14','15','16','17','18'
       '19','20','21','22','23','24','25','26','27','28','29','30','31']
hour = ['0', '3', '6', '9', '12', '15','18','21']
year, month, day, hour = year, month, day, hour # 2016-01-01 #01:00 am
day_time = []
for i in day.index:
    for j in hour.index:
        day_time = int("".join(day[i], hour[j], "00",))
print(day_time)
TypeError Traceback (most recent call last)
<ipython-input-72-15de17abf279> in <module>
6 year, month, day, hour = year, month, day, hour # 2016-01-01 #01:00 am
7 day_time = []
----> 8 for i in day.index:
9 for j in hour.index:
10 day_time = int("".join(day[i], hour[j], "00",))
TypeError: 'builtin_function_or_method' object is not iterable
can someone suggest a solution?

index is a method of list, not an attribute you can iterate over, so "for i in day.index" fails. Please refer to Data Structures in the Python tutorial.
Also, the join method of str takes a single iterable of strings, not several separate arguments.
Also, as @Lecdi pointed out, you should use append to add to a list instead of rebinding the variable with =.
To do what you want:
day = ['01', '02', '03', '04', "05", '06', '07', '08', '09', '10', '11', '12','13','14','15','16','17','18',  # comma added: '18' '19' would otherwise merge into '1819'
       '19','20','21','22','23','24','25','26','27','28','29','30','31']
hour = ['0', '3', '6', '9', '12', '15','18','21']
# year and month are assumed to be defined elsewhere, e.g. 2016-01-01 01:00 am
day_time = []
for day_i in day:
    for hour_i in hour:
        day_time.append(int("".join([day_i, hour_i, "00"])))
print(day_time)

I think enumerate() would work better for you:
for indexDay, valueDay in enumerate(day):
    for indexHour, valueHour in enumerate(hour):
        day_time.append(int("".join([valueDay, valueHour, "00"])))
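If the indices are not needed at all, the nested loops can also be written with itertools.product. This is only a sketch: the names days and hours are chosen here for illustration, and the day list is generated rather than typed out, which sidesteps missing-comma mistakes.

```python
from itertools import product

# '01' .. '31', generated instead of typed by hand
days = ['{:02d}'.format(d) for d in range(1, 32)]
hours = ['0', '3', '6', '9', '12', '15', '18', '21']

# product() yields every (day, hour) pair, day varying slowest
day_time = [int(day + hour + '00') for day, hour in product(days, hours)]

print(len(day_time))
print(day_time[:3])
```

Note that converting to int drops the leading zero of the day, exactly as in the loop version above.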

Looping through a list of tuples and removing them

I'm having some difficulty understanding why my loop is not deleting invalid dates from a list of date tuples in the format dd/mm/yyyy. Here's what I have so far:
dates = [('12','10','1987'),('13','09','2010'), ('34','02','2002'), ('02','15','2005'),('37','10','2016'),('39','11','2001')]
print(dates)
for date in dates :
    day = int(date[0])
    month = int(date[1])
    year = int(date[2])
    if day > 31 :
        dates.remove(date)
    if month > 12 :
        dates.remove(date)
print(dates)
And here's the result:
[('12', '10', '1987'), ('13', '09', '2010'), ('34', '02', '2002'), ('02', '15', '2005'), ('37', '10', '2016'), ('39', '11', '2001')]
[('12', '10', '1987'), ('13', '09', '2010'), ('02', '15', '2005'), ('39', '11', '2001')]
I'm a total beginner and any help would be much appreciated.

Never modify the (length of the) list you are looping over. Instead, use for example a temporary list:
dates = [('12','10','1987'),('13','09','2010'), ('34','02','2002'), ('02','15','2005'),('37','10','2016'),('39','11','2001')]
print(dates)
out = []
for date in dates :
    day = int(date[0])
    month = int(date[1])
    year = int(date[2])
    if day > 31 or month > 12:
        continue
    out.append(date)
dates = out
print(dates)
The continue statement skips the rest of the loop body and moves on to the next item, so the unwanted dates are never appended.
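Why the original loop misses some dates can be seen on a smaller list. This is a minimal sketch with made-up values, not the question's data:

```python
values = [1, 40, 50, 2]
for v in values:
    if v > 31:
        # Removing shifts the remaining items left, so the loop's
        # internal index now points past the item that followed v.
        values.remove(v)
print(values)  # 50 was never examined, so it is still in the list

# Building a new list avoids the problem entirely.
kept = [v for v in [1, 40, 50, 2] if v <= 31]
print(kept)
```

The same thing happens in the question: each removed tuple causes the one right after it to be skipped.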
Better alternative concerning dates
Commenting on the "date checking" functionality of the program: it can be really hard to determine with your own rules which dates are acceptable and which are not. Consider for example Feb 29th, which is only valid in leap years (every fourth year, except century years not divisible by 400).
What you could do instead is to use the datetime library to try to parse the strings to datetime objects, and if the parsing fails, you know the date is illegal.
import datetime as dt

dates = [('12','10','1987'),('13','09','2010'), ('34','02','2002'), ('02','15','2005'),('37','10','2016'),('39','11','2001')]

def filter_bad_dates(dates):
    out = []
    for date in dates:
        try:
            dt.datetime.strptime('-'.join(date), '%d-%m-%Y')
        except ValueError:
            continue
        out.append(date)
    return out

dates = filter_bad_dates(dates)
print(dates)
This try/except pattern is an example of the EAFP style ("easier to ask forgiveness than permission"), which is closely related to duck typing:
If it looks like a date and gets parsed like a proper date, then it is a proper date.
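To illustrate the leap-year point, here is a small sketch that applies the same parsing check to Feb 29th in three different years. The helper name is_valid_date is hypothetical and only wraps the strptime call used above.

```python
import datetime as dt


def is_valid_date(day, month, year):
    """Return True if the three strings form a real calendar date."""
    try:
        dt.datetime.strptime('-'.join((day, month, year)), '%d-%m-%Y')
    except ValueError:
        return False
    return True


print(is_valid_date('29', '02', '2004'))  # leap year
print(is_valid_date('29', '02', '2001'))  # not a leap year
print(is_valid_date('29', '02', '1900'))  # century year not divisible by 400
```

A hand-written "day > 31 or month > 12" check would accept all three of these.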

You can easily accomplish that with this list comprehension (note <= rather than <, so that day 31 and month 12 are kept):
dates = [('12','10','1987'),('13','09','2010'), ('34','02','2002'), ('02','15','2005'),('37','10','2016'),('39','11','2001')]
dates = [date for date in dates if int(date[1]) <= 12 and int(date[0]) <= 31]
print(dates)
Output:
[('12', '10', '1987'), ('13', '09', '2010')]

I like @AnnZen's comprehension approach (+1), though my tendency would be to go more symbolic at the cost of some time and space:
dates = [
    ('12', '10', '1987'),
    ('13', '09', '2010'),
    ('34', '02', '2002'),
    ('02', '15', '2005'),
    ('37', '10', '2016'),
    ('39', '11', '2001'),
]
# string comparison is safe here only because day and month are zero-padded to two digits
dates = [date for (day, month, _), date in zip(dates, dates) if day <= '31' and month <= '12']
print(dates)
OUTPUT
> python3 test.py
[('12', '10', '1987'), ('13', '09', '2010')]
>
As for @np8's "Never modify the list you are looping over.", that's excellent advice. Though, again, I might waste some space making the copy up front to keep my code simpler:
for date in list(dates): # iterate over a copy
    day, month, _ = date
    if int(day) > 31 or int(month) > 12:
        dates.remove(date)
Though in the end, @np8's filtering through datetime seems the most reliable solution. (+1)

create separate new list from one mother list

I am trying to write a script that reads a USGS seismic bulletin and extracts some data to build a new txt file, to be used as input for another program called Zmap that does seismic statistics.
So I have the following USGS bulletin format:
time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
2016-03-31T07:53:28.830Z,-22.6577,-68.5345,95.74,4.8,mww,,33,0.35,0.97,us,us20005dm3,2016-05-07T05:09:39.040Z,"43km NW of San Pedro de Atacama, Chile",earthquake,6.5,4.3,,,reviewed,us,us
2016-03-31T07:17:19.300Z,-18.779,-67.3104,242.42,4.5,mb,,65,1.987,0.85,us,us20005dlx,2016-04-24T07:21:05.358Z,"55km WSW of Totoral, Bolivia",earthquake,10.2,12.6,0.204,7,reviewed,us,us
The file has many seismic events, so I wrote the following code, which basically tries to read, split and save some variables in lists so I can put them all together in a final *.txt file.
import os, sys
import csv
import string
from itertools import (takewhile,repeat)

os.chdir('D:\\Seismic_Inves\\b-value_osc\\try_tonino')
archi=raw_input('BULLETIN NAME---> ')
ff=open(archi,'rb')
bufgen=takewhile(lambda x: x, (ff.read(1024*1024) for _ in repeat(None)))
numdelins= sum(buf.count(b'\n') for buf in bufgen if buf) - 1
with open(archi,'rb') as f:
    next(f)
    tiempo=[]
    lat=[]
    lon=[]
    prof=[]
    mag=[]
    t_mag=[]
    leo=csv.reader(f,delimiter=',')
    for line in leo:
        tiempo.append(line[0])
        lat.append(line[1])
        lon.append(line[2])
        prof.append(line[3])
        mag.append(line[4])
        t_mag.append(line[5])
tiempo=[s.replace('T', ' ') for s in tiempo] # replace the T with a space
tiempo=[s.replace('Z','') for s in tiempo] # remove the Z
tiempo=[s.replace(':',' ') for s in tiempo] # remove the colons
tiempo=[s.replace('-',' ') for s in tiempo] # remove the hyphens
From the USGS catalog I'd like to take the latitude (lat), longitude (lon), time (tiempo), depth (prof), magnitude (mag) and type of magnitude (t_mag). With this part of the code I took the variables I needed:
    next(f)
    tiempo=[]
    lat=[]
    lon=[]
    prof=[]
    mag=[]
    t_mag=[]
    leo=csv.reader(f,delimiter=',')
    for line in leo:
        tiempo.append(line[0])
        lat.append(line[1])
        lon.append(line[2])
        prof.append(line[3])
        mag.append(line[4])
        t_mag.append(line[5])
But I had some trouble with the time, so I applied my newbie knowledge to split the time from 2016-03-31T07:53:28.830Z to 2016 03 31 07 53 28.830.
Now I am struggling to get the years in one list ([2016,2016,2016,...]), the months in another ([01,01,...03,03,...12]), the days in another ([12,14,...03,11]), the hours in another ([13,22,14,17...]), and the minutes and seconds merged with a point (.), like ([minute.seconds]) or ([12.234,14.443,...]). So I tried this (to split on the spaces), with no success:
tiempo2=[]
for element in tiempo:
    tiempo2.append(element.split(' '))
print tiempo2
No success, because I got this result:
[['2016', '03', '31', '07', '53', '28.830'], ['2016', '03', '31', '07', '17', '19.300'].
Can you give me a hand with this part? Or is there a pythonic way to split the date like I described?
Thank you for the time you spent reading this.
Best regards,
Tonino

Suppose our tiempo2 holds the following value extracted from the CSV:
>>> tiempo2 = [['2016', '03', '31', '07', '53', '28.830'], ['2016', '03', '31', '07', '17', '19.300']]
>>> list (map (list, (map (float, items) if index == 5 else map (int, items) for index, items in enumerate (zip (*tiempo2)))))
[[2016, 2016], [3, 3], [31, 31], [7, 7], [53, 17], [28.83, 19.3]]
Here we used the zip function to group the years, months, days, etc. into one tuple per field.
Each group is then mapped to int, except the last one (index 5, the seconds), which is mapped to float.
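The same transposition can be spelled out in several steps, which may be easier to follow than the one-liner. This is a sketch; the list names years, months and so on are chosen here for illustration.

```python
tiempo2 = [['2016', '03', '31', '07', '53', '28.830'],
           ['2016', '03', '31', '07', '17', '19.300']]

# zip(*rows) turns a list of rows into one tuple per column
years, months, days, hours, minutes, seconds = zip(*tiempo2)

years = [int(y) for y in years]
months = [int(m) for m in months]
days = [int(d) for d in days]
hours = [int(h) for h in hours]
minutes = [int(m) for m in minutes]
seconds = [float(s) for s in seconds]  # seconds keep their fraction

print(years, months, days, hours, minutes, seconds)
```

The unpacking assumes every row has exactly six fields; a row with a different length would raise a ValueError.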

I would suggest using the time.strptime() function to parse the time string into a Python time.struct_time which is a namedtuple. That means you can access any attributes you want using . notation.
Here's what I mean:
import time
time_string = '2016-03-31T07:53:28.830Z'
timestamp = time.strptime(time_string, '%Y-%m-%dT%H:%M:%S.%fZ')
print(type(timestamp))
print(timestamp.tm_year) # -> 2016
print(timestamp.tm_mon) # -> 3
print(timestamp.tm_mday) # -> 31
print(timestamp.tm_hour) # -> 7
print(timestamp.tm_min) # -> 53
print(timestamp.tm_sec) # -> 28
print(timestamp.tm_wday) # -> 3
print(timestamp.tm_yday) # -> 91
print(timestamp.tm_isdst) # -> -1
You could process a list of time strings by using a for loop as shown below:
import time

tiempo = ['2016-03-31T07:53:28.830Z', '2016-03-31T07:17:19.300Z']
for time_string in tiempo:
    timestamp = time.strptime(time_string, '%Y-%m-%dT%H:%M:%S.%fZ')
    print('year: {}, mon: {}, day: {}, hour: {}, min: {}, sec: {}'.format(
        timestamp.tm_year, timestamp.tm_mon, timestamp.tm_mday,
        timestamp.tm_hour, timestamp.tm_min, timestamp.tm_sec))
Output:
year: 2016, mon: 3, day: 31, hour: 7, min: 53, sec: 28
year: 2016, mon: 3, day: 31, hour: 7, min: 17, sec: 19
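One thing to be aware of: time.struct_time has no field for fractional seconds, so the .830 and .300 parts are lost above. If they matter, datetime.datetime.strptime accepts the same format string and keeps them in the microsecond attribute. A sketch, with list names chosen for illustration:

```python
from datetime import datetime

tiempo = ['2016-03-31T07:53:28.830Z', '2016-03-31T07:17:19.300Z']

# The trailing 'Z' is matched as a literal character, so the
# resulting datetime objects are naive (no tzinfo attached).
parsed = [datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%fZ') for s in tiempo]

years = [d.year for d in parsed]
months = [d.month for d in parsed]
days = [d.day for d in parsed]
hours = [d.hour for d in parsed]
minutes = [d.minute for d in parsed]
# whole seconds plus the fractional part
seconds = [d.second + d.microsecond / 1e6 for d in parsed]

print(years, months, days, hours, minutes, seconds)
```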

Another solution with the iso8601 add-on (pip install iso8601)
>>> import iso8601
>>> dt = iso8601.parse_date('2016-03-31T07:17:19.300Z')
>>> dt.year
2016
>>> dt.month
3
>>> dt.day
31
>>> dt.hour
7
>>> dt.minute
17
>>> dt.second
19
>>> dt.microsecond
300000
>>> dt.tzname()
'UTC'
Edited 2017/8/6 12h55
IMHO, it is a bad idea to split the datetime timestamp objects into components (year, month, ...) in individual lists. Keeping the datetime timestamp objects as provided by iso8601.parse_date(...) could help to compute time deltas between events, check the chronological order, ... See the doc of the datetime module for more https://docs.python.org/3/library/datetime.html
Having distinct lists for year, month, (...) would make such operations difficult. Anyway, if you prefer this solution, here are the changes
import csv
import iso8601

# Start as former solution
with open(archi,'rb') as f:
    next(f)
    # tiempo=[]
    dt_years = []
    dt_months = []
    dt_days = []
    dt_hours = []
    dt_minutes = []
    dt_timezones = []
    lat=[]
    lon=[]
    prof=[]
    mag=[]
    t_mag=[]
    leo=csv.reader(f,delimiter=',')
    for line in leo:
        # tiempo.append(line[0])
        dt = iso8601.parse_date(line[0])
        dt_years.append(dt.year)
        dt_months.append(dt.month)
        dt_days.append(dt.day)
        dt_hours.append(dt.hour)
        # decimal minutes: 60 s per minute, 60,000,000 microseconds per minute
        dec_minutes = dt.minute + (dt.second / 60.0) + (dt.microsecond / 60000000.0)
        dt_minutes.append(dec_minutes)
        dt_timezones.append(dt.tzname())
        lat.append(line[1])
        lon.append(line[2])
        prof.append(line[3])
        mag.append(line[4])
        t_mag.append(line[5])
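To make the earlier point about keeping whole timestamp objects concrete, here is a sketch that sorts two events and computes the gap between them. It swaps in the standard library's datetime.strptime for iso8601.parse_date so that it runs without a third-party package; the names FMT, events and stamps are made up for illustration.

```python
from datetime import datetime

FMT = '%Y-%m-%dT%H:%M:%S.%fZ'  # 'Z' is matched literally
events = ['2016-03-31T07:53:28.830Z', '2016-03-31T07:17:19.300Z']

# datetime objects compare chronologically, so sorting just works
stamps = sorted(datetime.strptime(s, FMT) for s in events)

# subtracting two datetimes gives a timedelta
gap = stamps[1] - stamps[0]
print(gap)
print(gap.total_seconds())
```

With the components spread over separate year, month and day lists, both the sort and the subtraction would have to be reimplemented by hand.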

How to split dates from weather data file

I am new to Python and would greatly appreciate some help.
I have data generated by a weather station (rawdate) with dates in the format 2015-04-26 00:00:48, like this:
Date,Ambient Temperature (C),Wind Speed (m/s)
2015-04-26 00:00:48,10.75,0.00
2015-04-26 00:01:48,10.81,0.43
2015-04-26 00:02:48,10.81,0.32
and I would like to split the dates into year, month, day, hour and minute. My attempt so far is this:
for i in range(len(rawdate)):
    x=rawdate[1].split()
    date.append(x)
but it gives me a list full of empty lists. My goal is to convert this into a list of lists (using split), where the new data is stored in x in the form [date, time]. Then I want to split further using split with "-" and ":". Can someone offer some advice?

>>> from datetime import datetime
>>> str_date = '2015-04-26 00:00:48'
>>> datte = datetime.strptime(str_date, '%Y-%m-%d %H:%M:%S')
>>> t = datte.timetuple()
>>> y, m, d, h, min, sec, wd, yd, i = t
>>> y
2015
>>> m
4
>>> min
0
>>> sec
48

Your code is clearly broken, because the loop does nothing except repeat the same operation on rawdate[1], len(rawdate) times.
It's possible that you meant i where you have 1.
For this to make sense, your rawdate would have to be a list of strings (as suggested by @SuperBiasedMan).
Maybe something close to what you were after is this:
>>> dates = []
>>> rawdates = ['2015-04-26 00:00:48', '2015-04-26 00:00:49']
>>> for i in range(len(rawdates)):
...     the_date = rawdates[i].split()
...     dates.append(the_date)
...
>>> dates
[['2015-04-26', '00:00:48'], ['2015-04-26', '00:00:49']]
>>>
Always use meaningful names.

If rawdate is a single string, rawdate[1] will always return '0', because '2015...'[1] is '0'. Split the string on the space first, then split each half:
>>> a = '2015-04-26 00:00:48'
>>> date_part, time_part = a.split(' ')
>>> print(date_part.split('-') + time_part.split(':'))
['2015', '04', '26', '00', '00', '48']
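For the whole file, a sketch along these lines may be closer to the goal. It assumes the sample rows from the question, held in a string here so the example is self-contained; with a real file you would pass the open file object to csv.reader instead.

```python
import csv
import io
from datetime import datetime

raw = """Date,Ambient Temperature (C),Wind Speed (m/s)
2015-04-26 00:00:48,10.75,0.00
2015-04-26 00:01:48,10.81,0.43
2015-04-26 00:02:48,10.81,0.32
"""

rows = []
reader = csv.reader(io.StringIO(raw))
next(reader)  # skip the header line
for date_text, temperature, wind in reader:
    stamp = datetime.strptime(date_text, '%Y-%m-%d %H:%M:%S')
    # one [year, month, day, hour, minute] list per reading
    rows.append([stamp.year, stamp.month, stamp.day, stamp.hour, stamp.minute])

print(rows)
```

Parsing with strptime gives integers directly and rejects malformed dates, which plain split would let through.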

python parse java calendar to isodate

I have data like this:
startDateTime: {'timeZoneID': 'America/New_York', 'date': {'year': '2014', 'day': '29', 'month': '1'}, 'second': '0', 'hour': '12', 'minute': '0'}
This is just one attribute. I have 5 other attributes like it: LastModified, created, etc.
I want to convert this to the ISO date format yyyy-mm-dd hh:mi:ss. Is this the right way to do it?
def parse_date(datecol):
    x=datecol;
    y=str(x.get('date').get('year'))+'-'+str(x.get('date').get('month')).zfill(2)+'-'+str(x.get('date').get('day')).zfill(2)+' '+str(x.get('hour')).zfill(2)+':'+str(x.get('minute')).zfill(2)+':'+str(x.get('second')).zfill(2)
    print y;
    return;

That works, but I'd say it's cleaner to use the string formatting operator here:
def parse_date(c):
    d = c["date"]
    # the fields are strings, so convert them to int for the %d codes
    print("%04d-%02d-%02d %02d:%02d:%02d" % tuple(map(int, (d["year"], d["month"], d["day"], c["hour"], c["minute"], c["second"]))))
Alternatively, you can use the time module to convert your fields into a Python time value, and then format that using strftime. Remember the time zone, though.
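A variant of that second suggestion, using the datetime module rather than time, might look like the sketch below. The function name to_iso_string is hypothetical. The sketch assumes the month field is 1-based, as the question's own code does, and it ignores timeZoneID, so the result is a naive local time.

```python
from datetime import datetime


def to_iso_string(col):
    """Format the nested date dict as 'yyyy-mm-dd hh:mm:ss'."""
    d = col['date']
    # datetime() validates the values, e.g. it rejects month 13
    stamp = datetime(int(d['year']), int(d['month']), int(d['day']),
                     int(col['hour']), int(col['minute']), int(col['second']))
    return stamp.strftime('%Y-%m-%d %H:%M:%S')


start = {'timeZoneID': 'America/New_York',
         'date': {'year': '2014', 'day': '29', 'month': '1'},
         'second': '0', 'hour': '12', 'minute': '0'}
print(to_iso_string(start))
```

Returning the string instead of printing it makes the function reusable for the other attributes such as LastModified and created.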
