How to split dates from weather data file


I am new to Python and would greatly appreciate some help.
I have data generated by a weather station (rawdate), with timestamps in the format 2015-04-26 00:00:48, like this:
Date,Ambient Temperature (C),Wind Speed (m/s)
2015-04-26 00:00:48,10.75,0.00
2015-04-26 00:01:48,10.81,0.43
2015-04-26 00:02:48,10.81,0.32
and I would like to split the timestamps into year, month, day, hour and minute. My attempt so far is this:
for i in range(len(rawdate)):
    x=rawdate[1].split()
    date.append(x)
but it gives me a list full of empty lists. My goal is a list of lists (built with split) in which each entry has the form [date, time]. Then I want to split further using split with "-" and ":". Can someone offer some advice?

>>> from datetime import datetime
>>> str_date = '2015-04-26 00:00:48'
>>> datte = datetime.strptime(str_date, '%Y-%m-%d %H:%M:%S')
>>> t = datte.timetuple()
>>> y, m, d, h, min, sec, wd, yd, i = t
>>> y
2015
>>> m
4
>>> min
0
>>> sec
48

Your code is clearly broken: the loop does nothing but repeat the same operation on rawdate[1], len(rawdate) times.
It's possible that you meant i where you have 1.
For this to make sense, your rawdate would have to be a list of strings (as suggested by @SuperBiasedMan).
Maybe something close to what you were after is this:
>>> dates = []
>>> rawdates = ['2015-04-26 00:00:48', '2015-04-26 00:00:49']
>>> for i in range(len(rawdates)):
...     the_date = rawdates[i].split()
...     dates.append(the_date)
...
>>> dates
[['2015-04-26', '00:00:48'], ['2015-04-26', '00:00:49']]
Always use meaningful names.

If rawdate is a single string, rawdate[1] will always return '0', because '2015...'[1] is '0'.
>>> a = '2015-04-26 00:00:48'
>>> date_part, time_part = a.split(' ')
>>> print(date_part.split('-') + time_part.split(':'))
['2015', '04', '26', '00', '00', '48']
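
Putting the answers above together for the sample file in the question, here is a minimal sketch that reads the CSV rows and splits each timestamp into numeric components. The inline `sample` text stands in for the real file, and the helper name `split_timestamp` is made up for illustration.

```python
import csv
import io
from datetime import datetime

# Sample rows copied from the question; a real script would open the file instead.
sample = """Date,Ambient Temperature (C),Wind Speed (m/s)
2015-04-26 00:00:48,10.75,0.00
2015-04-26 00:01:48,10.81,0.43
2015-04-26 00:02:48,10.81,0.32
"""

def split_timestamp(text):
    """Return [year, month, day, hour, minute] for a 'YYYY-MM-DD HH:MM:SS' string."""
    moment = datetime.strptime(text, '%Y-%m-%d %H:%M:%S')
    return [moment.year, moment.month, moment.day, moment.hour, moment.minute]

# DictReader uses the header row, so the timestamp column can be read by name.
reader = csv.DictReader(io.StringIO(sample))
dates = [split_timestamp(row['Date']) for row in reader]
print(dates)
```

Parsing with strptime gives integers directly, so no second pass with split("-") and split(":") is needed.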

Related

How to distinguish between date, month and year in the string?

I have a question: how can I use the "strip" function to slice a date like "24.02.1999"?
The output should look like this: '24', '02', '1999'.
Can you help me solve this?

You can do it like this:
>>> stri="24.02.1999"
>>> stri.split('.')
['24', '02', '1999']

strip is used to remove characters from the ends of a string. What you meant is split. For your code,
date = input('Enter date in the format (DD.MM.YYYY) : ')
dd, mm, yyyy = date.strip().split('.')
print('day =', dd)
print('month =', mm)
print('year =', yyyy)
Output:
Enter date in the format (DD.MM.YYYY) : 24.02.1999
day = 24
month = 02
year = 1999

You need to use split(), not strip().
strip() removes the specified characters from both ends of a string.
split() splits the string into a list, using the given separator.
date = input() # reading input date in dd.mm.yyyy format
splitted_date = date.split('.') # splitting date
day = splitted_date[0] # storing day
month = splitted_date[1] # storing month
year = splitted_date[2] # storing year
# Display the values
print('Day : ',day)
print('Month : ',month)
print('Year : ',year)
This is how you can split a date given in DD.MM.YYYY format.
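
To make the difference between the two methods concrete, here is a small sketch. The padded input string is invented for illustration.

```python
# Hypothetical input with stray whitespace around the date.
padded = '  24.02.1999\n'

# strip() removes leading/trailing characters (whitespace by default);
# the result is still a single string.
cleaned = padded.strip()

# split() cuts the string at every separator and returns a list of pieces.
parts = cleaned.split('.')

print(repr(cleaned))
print(parts)
```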

Instead of splitting the string, you should use datetime.strptime(..) to convert the string to a datetime object:
>>> from datetime import datetime
>>> my_date_str = "24.02.1999"
>>> my_date = datetime.strptime(my_date_str, '%d.%m.%Y')
Then you can access the values you desire as:
>>> my_date.day # For day
24
>>> my_date.month # For month
2
>>> my_date.year # For year
1999
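
One practical benefit of strptime over split is that it rejects strings that are not real dates. A minimal sketch, assuming you want None for bad input (the helper name `parse_date` is hypothetical):

```python
from datetime import datetime

def parse_date(text):
    """Return a datetime for 'DD.MM.YYYY' text, or None if it is not a real date."""
    try:
        return datetime.strptime(text, '%d.%m.%Y')
    except ValueError:
        # Raised for a wrong format as well as for impossible dates.
        return None

print(parse_date('24.02.1999'))
print(parse_date('31.02.1999'))   # February has no 31st
print('31.02.1999'.split('.'))    # split() accepts it without complaint
```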

Here you go
date="24.02.1999"
[dd,mm,yyyy] = date.split('.')
output=(("'%s','%s','%s'") %(dd,mm,yyyy))
print(output)
alternate way
date="24.02.1999"
dd=date[0:2]
mm=date[3:5]
yyyy=date[6:10]
newdate=(("'%s','%s','%s'") %(dd,mm,yyyy))
print(newdate)
one more alternate way
from datetime import datetime
date="24.02.1999"
date=datetime.strptime(date, '%d.%m.%Y')
date=(("'%s','%s','%s'") %(date.day,date.month,date.year))
print(date)
Enjoy

How to convert a string date in mongodb to ISODate(), or DATE() and delete if it is less than 30 days using python?

Using Python, I saved dates into MongoDB in the format below:
completed_time : "2017:08:20 02:30:02"
Now I want to delete all entries which are older than 30 days.
How can I implement that logic?

You can do it more simply. The code below is considerably longer than needed, in order to be explanatory.
I first create a collection of MongoDB records with dates that start about a month and a half ago and end about two weeks ago.
>>> from pymongo import MongoClient
>>> client = MongoClient()
>>> db = client.test_database
>>> from datetime import datetime, timedelta
>>> some_dates = [datetime(2017, 7, d).strftime('%Y:%m:%d %H:%M:%S') for d in range(15,31)]+[datetime(2017, 8, d).strftime('%Y:%m:%d %H:%M:%S') for d in range(1,16)]
>>> posts = db.posts
>>> for some_date in some_dates:
...     post = {'completed_time': some_date, 'stuff': 'more stuff'}
...     post_id = posts.insert_one(post).inserted_id
...
This calculates the date and time 30 days earlier than 'now' (at the moment I ran it) and puts it in the format used in your MongoDB database.
>>> boundary = (datetime.now()-timedelta(30)).strftime('%Y:%m:%d %H:%M:%S')
This counts the number of records in the database whose dates and times precede the value in boundary just calculated, for later reference.
>>> count = 0
>>> for post in posts.find({'completed_time': {'$lt': boundary}}):
...     count+=1
...
>>> count
19
This is the one line that, with the calculation of boundary, does what you want.
>>> r = posts.delete_many({'completed_time': {'$lt': boundary}})
Now we can check that the correct number of records has been deleted.
>>> count = 0
>>> for post in posts.find({'completed_time': {'$lt': boundary}}):
...     count+=1
...
>>> count
0
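
The '$lt' comparison above operates on strings. It gives the right answer only because the stored format is zero-padded and ordered from year down to second, so alphabetical order matches chronological order. The sketch below shows the same boundary test in plain Python, without MongoDB. The `records` list and the fixed `now` value are invented for illustration.

```python
from datetime import datetime, timedelta

FORMAT = '%Y:%m:%d %H:%M:%S'

# Hypothetical records; only the 'completed_time' format comes from the question.
records = [
    {'completed_time': '2017:07:01 09:00:00'},
    {'completed_time': '2017:08:01 09:00:00'},
    {'completed_time': '2017:08:20 02:30:02'},
]

# A fixed "now" keeps the example reproducible; real code would use datetime.now().
now = datetime(2017, 8, 25, 12, 0, 0)
boundary = (now - timedelta(days=30)).strftime(FORMAT)

# Plain string comparison, the same kind of test that {'$lt': boundary} applies.
old = [r for r in records if r['completed_time'] < boundary]
kept = [r for r in records if r['completed_time'] >= boundary]

print(boundary)
print(len(old), len(kept))
```

If the strings were stored as, say, day-first or without zero padding, this shortcut would no longer be safe and the values would have to be parsed first.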

You can use the datetime module to convert your date/time string into a datetime object, and then convert it to an ordinal day (just a single number), and compare it to the day that was thirty days ago.
Hopefully this will do what you want:
import datetime
completed_time = "2017:07:20 02:30:02"
timeFormat = '%Y:%m:%d %H:%M:%S'
thisDate = datetime.datetime.strptime(completed_time, timeFormat).toordinal()
today = datetime.date.today()
thirtyDaysAgo = today.toordinal() - 30
if thisDate < thirtyDaysAgo:
    print("That needs deleting!")
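
Note that toordinal() keeps only the day and drops the time of day. If the cutoff should be exactly 30 days back to the second, the datetime objects can be compared directly. A minimal sketch, where the function name `is_older_than` and the fixed `now` are assumptions made for the example:

```python
from datetime import datetime, timedelta

TIME_FORMAT = '%Y:%m:%d %H:%M:%S'

def is_older_than(completed_time, days, now):
    """True if the 'YYYY:MM:DD HH:MM:SS' string lies more than `days` days before `now`."""
    completed = datetime.strptime(completed_time, TIME_FORMAT)
    return completed < now - timedelta(days=days)

# Fixed reference time for reproducibility; real code would pass datetime.now().
now = datetime(2017, 8, 25, 12, 0, 0)
print(is_older_than('2017:07:20 02:30:02', 30, now))
print(is_older_than('2017:08:20 02:30:02', 30, now))
```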

create separate new list from one mother list

I am trying to write a script that reads a USGS seismic bulletin and extracts some data to build a new txt file, to be used as input for another program called Zmap that does seismic statistics.
So I have the following USGS bulletin format:
time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
2016-03-31T07:53:28.830Z,-22.6577,-68.5345,95.74,4.8,mww,,33,0.35,0.97,us,us20005dm3,2016-05-07T05:09:39.040Z,"43km NW of San Pedro de Atacama, Chile",earthquake,6.5,4.3,,,reviewed,us,us
2016-03-31T07:17:19.300Z,-18.779,-67.3104,242.42,4.5,mb,,65,1.987,0.85,us,us20005dlx,2016-04-24T07:21:05.358Z,"55km WSW of Totoral, Bolivia",earthquake,10.2,12.6,0.204,7,reviewed,us,us
The bulletin has many seismic events, so I wrote the following code, which basically tries to read, split and save some variables in lists in order to put them all together in a final *.txt file.
import os, sys
import csv
import string
from itertools import (takewhile,repeat)
os.chdir('D:\\Seismic_Inves\\b-value_osc\\try_tonino')
archi=raw_input('BULLETIN NAME---> ')
ff=open(archi,'rb')
bufgen=takewhile(lambda x: x, (ff.read(1024*1024) for _ in repeat(None)))
numdelins= sum(buf.count(b'\n') for buf in bufgen if buf) - 1
with open(archi,'rb') as f:
    next(f)
    tiempo=[]
    lat=[]
    lon=[]
    prof=[]
    mag=[]
    t_mag=[]
    leo=csv.reader(f,delimiter=',')
    for line in leo:
        tiempo.append(line[0])
        lat.append(line[1])
        lon.append(line[2])
        prof.append(line[3])
        mag.append(line[4])
        t_mag.append(line[5])
tiempo=[s.replace('T', ' ') for s in tiempo] # replace the T with a space
tiempo=[s.replace('Z','') for s in tiempo] # remove the Z
tiempo=[s.replace(':',' ') for s in tiempo] # replace each : with a space
tiempo=[s.replace('-',' ') for s in tiempo] # replace each - with a space
From the USGS catalog I'd like to take the latitude (lat), longitude (lon), time (tiempo), depth (prof), magnitude (mag) and magnitude type (t_mag). With this part of the code I took the variables I needed:
next(f)
tiempo=[]
lat=[]
lon=[]
prof=[]
mag=[]
t_mag=[]
leo=csv.reader(f,delimiter=',')
for line in leo:
    tiempo.append(line[0])
    lat.append(line[1])
    lon.append(line[2])
    prof.append(line[3])
    mag.append(line[4])
    t_mag.append(line[5])
but I had some trouble with the time, so I applied my newbie knowledge to turn the time from 2016-03-31T07:53:28.830Z into 2016 03 31 07 53 28.830.
Now I am struggling to get the years in one list ([2016,2016,2016,...]), the months in another ([01,01,...03,03,...12]), the days in another ([12,14,...03,11]), the hours in another ([13,22,14,17...]), and the minutes and seconds merged with a point (.), like ([minute.seconds]) or ([12.234,14.443,...]). So I tried this (to split on the spaces), without success:
tiempo2=[]
for element in tiempo:
    tiempo2.append(element.split(' '))
print tiempo2
No success, because I got this result:
[['2016', '03', '31', '07', '53', '28.830'], ['2016', '03', '31', '07', '17', '19.300'].
Can you give me a hand with this part? Or is there a Pythonic way to split the date as I described?
Thank you for the time you spent reading this.
Best regards.
Tonino

Suppose tiempo2 holds the following values extracted from the CSV:
>>> tiempo2 = [['2016', '03', '31', '07', '53', '28.830'], ['2016', '03', '31', '07', '17', '19.300']]
>>> list(map(list, (map(float, items) if index == 5 else map(int, items) for index, items in enumerate(zip(*tiempo2)))))
[[2016, 2016], [3, 3], [31, 31], [7, 7], [53, 17], [28.83, 19.3]]
Here we use the zip function to group the years, months, days, etc. into separate sequences.
Each group is then mapped to int, except the last one (index 5, the seconds), which is mapped to float.
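
The one-liner above is dense, so here is the same transposition in separate steps. This is only a sketch of the idea; the variable names are chosen for readability and are not from the original answer.

```python
rows = [['2016', '03', '31', '07', '53', '28.830'],
        ['2016', '03', '31', '07', '17', '19.300']]

# zip(*rows) pairs up the first items of every row, then the second items, and so on,
# turning a list of rows into a list of columns.
columns = list(zip(*rows))
print(columns[0])
print(columns[5])

# Name the columns, then convert them; only the seconds need float.
years, months, days, hours, minutes, seconds = columns
years = [int(y) for y in years]
seconds = [float(s) for s in seconds]
print(years, seconds)
```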

I would suggest using the time.strptime() function to parse the time string into a Python time.struct_time which is a namedtuple. That means you can access any attributes you want using . notation.
Here's what I mean:
import time
time_string = '2016-03-31T07:53:28.830Z'
timestamp = time.strptime(time_string, '%Y-%m-%dT%H:%M:%S.%fZ')
print(type(timestamp))
print(timestamp.tm_year) # -> 2016
print(timestamp.tm_mon) # -> 3
print(timestamp.tm_mday) # -> 31
print(timestamp.tm_hour) # -> 7
print(timestamp.tm_min) # -> 53
print(timestamp.tm_sec) # -> 28
print(timestamp.tm_wday) # -> 3
print(timestamp.tm_yday) # -> 91
print(timestamp.tm_isdst) # -> -1
You could process a list of time strings by using a for loop as shown below:
import time
tiempo = ['2016-03-31T07:53:28.830Z', '2016-03-31T07:17:19.300Z']
for time_string in tiempo:
    timestamp = time.strptime(time_string, '%Y-%m-%dT%H:%M:%S.%fZ')
    print('year: {}, mon: {}, day: {}, hour: {}, min: {}, sec: {}'.format(
        timestamp.tm_year, timestamp.tm_mon, timestamp.tm_mday,
        timestamp.tm_hour, timestamp.tm_min, timestamp.tm_sec))
Output:
year: 2016, mon: 3, day: 31, hour: 7, min: 53, sec: 28
year: 2016, mon: 3, day: 31, hour: 7, min: 17, sec: 19
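
time.struct_time has no field for fractional seconds, so the .830 is lost in the approach above. If the fraction matters, datetime.strptime with the same format string keeps it in the microsecond attribute. A minimal sketch (note that the result is a naive datetime; the trailing Z is matched as a literal character, not interpreted as UTC):

```python
from datetime import datetime

time_string = '2016-03-31T07:53:28.830Z'
moment = datetime.strptime(time_string, '%Y-%m-%dT%H:%M:%S.%fZ')

print(moment.second)        # whole seconds
print(moment.microsecond)   # fractional part, expressed in microseconds

# Recombine into seconds with a decimal fraction.
seconds = moment.second + moment.microsecond / 1e6
print(seconds)
```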

Another solution with the iso8601 add-on (pip install iso8601)
>>> import iso8601
>>> dt = iso8601.parse_date('2016-03-31T07:17:19.300Z')
>>> dt.year
2016
>>> dt.month
3
>>> dt.day
31
>>> dt.hour
7
>>> dt.minute
17
>>> dt.second
19
>>> dt.microsecond
300000
>>> dt.tzname()
'UTC'
Edited 2017/8/6 12h55
IMHO, it is a bad idea to split the datetime timestamp objects into components (year, month, ...) in individual lists. Keeping the datetime timestamp objects as provided by iso8601.parse_date(...) could help to compute time deltas between events, check the chronological order, ... See the doc of the datetime module for more https://docs.python.org/3/library/datetime.html
Having distinct lists for year, month, (...) would make such operations difficult. Anyway, if you prefer this solution, here are the changes
import iso8601
# Start as in the former solution
with open(archi,'rb') as f:
    next(f)
    # tiempo=[]
    dt_years = []
    dt_months = []
    dt_days = []
    dt_hours = []
    dt_minutes = []
    dt_timezones = []
    lat=[]
    lon=[]
    prof=[]
    mag=[]
    t_mag=[]
    leo=csv.reader(f,delimiter=',')
    for line in leo:
        # tiempo.append(line[0])
        dt = iso8601.parse_date(line[0])
        dt_years.append(dt.year)
        dt_months.append(dt.month)
        dt_days.append(dt.day)
        dt_hours.append(dt.hour)
        # decimal minutes: minutes + seconds/60 + microseconds/60,000,000
        dec_minutes = dt.minute + (dt.second / 60.0) + (dt.microsecond / 60000000.0)
        dt_minutes.append(dec_minutes)
        dt_timezones.append(dt.tzname())
        lat.append(line[1])
        lon.append(line[2])
        prof.append(line[3])
        mag.append(line[4])
        t_mag.append(line[5])
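
The point made above about keeping whole timestamp objects can be illustrated with a short sketch. It uses the standard library's datetime.strptime rather than iso8601, purely so that it runs without extra packages; the two timestamps are the ones from the question.

```python
from datetime import datetime

FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'
raw = ['2016-03-31T07:53:28.830Z', '2016-03-31T07:17:19.300Z']

# Sorting datetime objects puts the events in chronological order.
events = sorted(datetime.strptime(text, FORMAT) for text in raw)

# Subtracting two datetimes gives a timedelta: the time between the events.
gap = events[1] - events[0]
print(gap)
print(gap.total_seconds())
```

With separate year, month, day (...) lists, both the sort and the subtraction would have to be reassembled by hand.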

Working with time values greater than 24 hours

How does one work with time periods greater than 24 hours in python? I looked at the datetime.time object but this seems to be for handling the time of a day, not time in general.
datetime.time has the requirement of 0 <= hour < 24 which makes it useless if you want to record a time of more than 24 hours unless I am missing something?
Say for example I wanted to calculate the total time worked by someone. I know the time they've taken to complete tasks individually. What class should I be using to safely calculate that total time.
My input data would look something like this:
# The times in HH:MM:SS
times = ["16:35:21", "8:23:14"]
total_time = ? # 24:58:35

Unfortunately there is no built-in way to construct timedeltas from strings (like strptime() for datetime objects), so we have to build a parser:
>>> from datetime import timedelta
>>> import re
>>> def interval(s):
...     "Converts a string to a timedelta"
...     d = re.match(r'((?P<days>\d+) days, )?(?P<hours>\d+):'
...                  r'(?P<minutes>\d+):(?P<seconds>\d+)', str(s)).groupdict(0)
...     return timedelta(**dict(((key, int(value)) for key, value in d.items())))
...
>>> times = ["16:35:21", "8:23:14"]
>>> print(sum([interval(time) for time in times], timedelta()))
1 day, 0:58:35
EDIT: Old wrong answer (where I misread the question):
If you subtract datetimes you get a timedelta object:
>>> import datetime as dt
>>> times = ["16:35:21", "8:23:14"]
>>> fmt = '%H:%M:%S'
>>> start = dt.datetime.strptime(times[1], fmt )
>>> end = dt.datetime.strptime(times[0], fmt)
>>> diff = (end - start)
>>> diff.total_seconds()
29527.0
>>> (diff.days, diff.seconds, diff.microseconds)
(0, 29527, 0)
>>> print(diff)
8:12:07

As I understand it, you want the sum of all the times, not the difference. So you can convert each time to a timedelta and then sum them:
>>> from datetime import timedelta
>>> # get hours, minutes and seconds
>>> tm1 = [[int(part) for part in x.split(':')] for x in times]
>>> # convert to timedelta
>>> tm2 = [timedelta(hours=x[0], minutes=x[1], seconds=x[2]) for x in tm1]
>>> # sum
>>> print(sum(tm2, timedelta()))
1 day, 0:58:35
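
Printing a timedelta gives "1 day, 0:58:35", whereas the question asked for 24:58:35. A minimal sketch of a formatter that lets the hours run past 23; the function name `format_hms` is made up for this example and it assumes a non-negative duration.

```python
from datetime import timedelta

def format_hms(delta):
    """Format a non-negative timedelta as H:MM:SS, letting hours exceed 23."""
    total = int(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '{}:{:02d}:{:02d}'.format(hours, minutes, seconds)

# The two durations from the question.
total_time = (timedelta(hours=16, minutes=35, seconds=21)
              + timedelta(hours=8, minutes=23, seconds=14))
print(format_hms(total_time))
```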

greater than 'date' python 3

I would like to be able to do greater-than and less-than comparisons on dates. How would I go about doing that? For example:
date1 = "20/06/2013"
date2 = "25/06/2013"
date3 = "01/07/2013"
date4 = "07/07/2013"
datelist = [date1, date2, date3]
for j in datelist:
    if j <= date4:
        print(j)
If I run the above, I get date3 back and not date1 or date2. I think I need to get the system to realise these are dates, and I don't know how to do that. Can someone lend a hand?
Thanks

You can use the datetime module to convert them all to datetime objects. You are comparing strings in your example:
>>> from datetime import datetime
>>> date1 = datetime.strptime(date1, "%d/%m/%Y")
>>> date2 = datetime.strptime(date2, "%d/%m/%Y")
>>> date3 = datetime.strptime(date3, "%d/%m/%Y")
>>> date4 = datetime.strptime(date4, "%d/%m/%Y")
>>> datelist = [date1, date2, date3]
>>> for j in datelist:
...     if j <= date4:
...         print(j.strftime('%d/%m/%Y'))
...
20/06/2013
25/06/2013
01/07/2013

You are comparing strings, not dates. You should use a date-based object-type, such as datetime.
How to compare two dates?

You can use the datetime module:
>>> from datetime import datetime
>>> d = datetime.strptime(date4, '%d/%m/%Y')
>>> for j in datelist:
...     d1 = datetime.strptime(j, '%d/%m/%Y')
...     if d1 <= d:
...         print(j)
...
20/06/2013
25/06/2013
01/07/2013

The problem with your comparison is that a string comparison first compares the first character, then the second, then the third, and so on. You can of course convert the strings to dates, as the other answers suggest, but there is a different solution as well.
In order to compare dates as strings you need to have them in a different format, such as 'yyyy-mm-dd'. This way the comparison looks at the year first, then the month and finally the day:
>>> d1 = '2012-10-11'
>>> d2 = '2012-10-12'
>>> if d2 > d1:
...     print('this works!')
...
this works!
The advantages of this are simplicity (for me at least) and performance, because it saves converting strings to dates (and possibly back) while still comparing the dates reliably. In the programs I use, I compare dates a lot as well. Since I take the dates from files, they are always strings to begin with, and because performance is an issue in my program I normally prefer to compare dates as strings in this way.
Of course this means you have to convert your dates to a different format, but if that is a one-time action, it could well be worth the effort. :)
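
The one-time conversion mentioned above could look like the following sketch, applied to the dates from the question. The helper name `to_iso` is hypothetical.

```python
from datetime import datetime

def to_iso(text):
    """Convert 'DD/MM/YYYY' to 'YYYY-MM-DD', which sorts and compares correctly as a string."""
    return datetime.strptime(text, '%d/%m/%Y').strftime('%Y-%m-%d')

datelist = ['20/06/2013', '25/06/2013', '01/07/2013']
limit = to_iso('07/07/2013')

iso_dates = [to_iso(d) for d in datelist]

# Plain string comparison now agrees with chronological order.
print([d for d in iso_dates if d <= limit])
print(sorted(iso_dates))
```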
