how to generate a range of yyyymm values? [duplicate]

This question already has answers here:
Generate a list of datetimes between an interval
(5 answers)
Closed 8 years ago.
I have two yyyymm values that will be input by a user:
yyyymm_1 = '201406'
yyyymm_2 = '201501'
I want to be able to iterate through this range in increasing month order:
for yyyy and mm in the range of yyyymm_1 to yyyymm_2
    my_function( yyyy, mm )
How can this be done in python?
Update:
Ideally, the solution should be as simple as possible without requiring external libraries. I'm not looking for a generic date manipulation solution, but a solution to answer the specific question I have asked above.
I had seen lots of generic solutions before posting my question. However, being a Python noob, I couldn't see how to adapt them to my question:
Generate a list of datetimes between an interval
Iterating through a range of dates in Python
On that note, the other questions linked to from this page are much more generic. If you are looking to generate a range of yyyymm values, I urge you to look at the selected answer on this page.

Here's another rather simple variant, without even using datetime. Just split the date, calculate the 'total month', and iterate.
def to_month(yyyymm):
    y, m = int(yyyymm[:4]), int(yyyymm[4:])
    return y * 12 + m
def iter_months(start, end):
    for month in range(to_month(start), to_month(end) + 1):
        y, m = divmod(month-1, 12)  # ugly fix to compensate
        yield y, m + 1              # for 12 % 12 == 0
for y, m in iter_months('201406', '201501'):
    print(y, m)
Output:
2014 6
2014 7
...
2014 12
2015 1
For output in the same yyyymm format, use print("%d%02d" % (y, m)).
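As a minimal sketch of the same idea that yields the yyyymm strings directly (the name iter_yyyymm is just an illustration, not part of the answer above), you can count months from 0 so that divmod needs no correction:

```python
def iter_yyyymm(start, end):
    """Yield 'yyyymm' strings from start to end, both inclusive."""
    # Count months from 0 (January -> 0) so divmod maps straight back.
    first = int(start[:4]) * 12 + int(start[4:]) - 1
    last = int(end[:4]) * 12 + int(end[4:]) - 1
    for total in range(first, last + 1):
        year, month0 = divmod(total, 12)
        yield "%d%02d" % (year, month0 + 1)

print(list(iter_yyyymm('201406', '201501')))
```

This only assumes the inputs are well-formed six-digit strings; no validation is done.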

You can do this using the builtin datetime module and the third party package dateutil.
The code first converts your strings to datetime.datetime objects using datetime.datetime.strptime and reduces them to dates. It then uses relativedelta from dateutil to create a period of one month that can be added to your dates.
Within the while loop you can either work with the date objects directly or construct the month and year as strings using strftime; I've shown an example of both in the print calls.
import datetime as dt
from dateutil.relativedelta import relativedelta
yyyymm_1 = '201406'
yyyymm_2 = '201501'
MONTH = relativedelta(months=+1)
fmt = '%Y%m'
date_1 = dt.datetime.strptime(yyyymm_1, fmt).date()
date_2 = dt.datetime.strptime(yyyymm_2, fmt).date()
d = date_1
while d <= date_2:
    print(d)
    print(d.strftime('%Y'), d.strftime('%m'))
    d += MONTH

Related

Python convert from ordinal time with milliseconds [duplicate]

I just started moving from Matlab to Python 2.7 and I have some trouble reading my .mat-files. Time information is stored in Matlab's datenum format. For those who are not familiar with it:
A serial date number represents a calendar date as the number of days that has passed since a fixed base date. In MATLAB, serial date number 1 is January 1, 0000.
MATLAB also uses serial time to represent fractions of days beginning at midnight; for example, 6 p.m. equals 0.75 serial days. So the string '31-Oct-2003, 6:00 PM' in MATLAB is date number 731885.75.
(taken from the Matlab documentation)
I would like to convert this to Python's time format and I found this tutorial. In short, the author states that
If you parse this using python's datetime.fromordinal(731965.04835648148) then the result might look reasonable [...]
(before any further conversions), which doesn't work for me, since datetime.fromordinal expects an integer:
>>> datetime.fromordinal(731965.04835648148)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: integer argument expected, got float
While I could just round them down for daily data, I actually need to import minutely time series. Does anyone have a solution for this problem? I would like to avoid reformatting my .mat files since there's a lot of them and my colleagues need to work with them as well.
If it helps, someone else asked for the other way round. Sadly, I'm too new to Python to really understand what is happening there.
/edit (2012-11-01): This has been fixed in the tutorial posted above.

The solution you link to has a small issue. The corrected conversion is:
python_datetime = datetime.fromordinal(int(matlab_datenum)) + timedelta(days=matlab_datenum%1) - timedelta(days = 366)
A longer explanation can be found here.
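To make the one-liner easier to check, here is a small sketch that wraps it in a function (the name matlab_to_datetime and the constant name are my own) and runs it on the example from the MATLAB documentation quoted in the question:

```python
from datetime import datetime, timedelta

# MATLAB counts days from year 0, Python ordinals start at year 1;
# year 0 is a leap year, hence 366 days.
MATLAB_OFFSET_DAYS = 366

def matlab_to_datetime(datenum):
    whole_days = int(datenum)     # integer part: the calendar day
    day_fraction = datenum % 1    # fractional part: the time of day
    return (datetime.fromordinal(whole_days - MATLAB_OFFSET_DAYS)
            + timedelta(days=day_fraction))

# '31-Oct-2003, 6:00 PM' is datenum 731885.75 according to the MATLAB docs
print(matlab_to_datetime(731885.75))
```

Keep in mind that the fraction is a float, so results are only accurate to roughly the microsecond level.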

Using pandas, you can convert a whole array of datenum values with fractional parts:
import numpy as np
import pandas as pd
datenums = np.array([737125, 737124.8, 737124.6, 737124.4, 737124.2, 737124])
timestamps = pd.to_datetime(datenums-719529, unit='D')
The value 719529 is the datenum value of the Unix epoch start (1970-01-01), which is the default origin for pd.to_datetime().
I used the following Matlab code to set this up:
datenum('1970-01-01') % gives 719529
datenums = datenum('06-Mar-2018') - linspace(0,1,6) % test data
datestr(datenums) % human readable format
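If you want to double-check the 719529 offset without MATLAB or pandas, a rough standard-library sketch (the function name datenum_to_datetime is hypothetical) looks like this:

```python
from datetime import date, datetime, timedelta

# Python ordinals start at 0001-01-01, MATLAB datenums one leap year earlier.
MATLAB_OFFSET_DAYS = 366
epoch_datenum = date(1970, 1, 1).toordinal() + MATLAB_OFFSET_DAYS
print(epoch_datenum)

def datenum_to_datetime(datenum):
    # Days since the Unix epoch, added to the epoch itself.
    return datetime(1970, 1, 1) + timedelta(days=datenum - epoch_datenum)

print(datenum_to_datetime(737125))     # first value of the test data above
print(datenum_to_datetime(737124.5))   # half a day earlier
```

This handles one value at a time; for whole arrays the pd.to_datetime call above is the more convenient route.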

Just in case it's useful to others, here is a full example of loading time series data from a Matlab mat file, converting a vector of Matlab datenums to a list of datetime objects using carlosdc's answer (defined as a function), and then plotting as time series with Pandas:
from scipy.io import loadmat
import pandas as pd
import datetime as dt
import urllib
# In Matlab, I created this sample 20-day time series:
# t = datenum(2013,8,15,17,11,31) + [0:0.1:20];
# x = sin(t)
# y = cos(t)
# plot(t,x)
# datetick
# save sine.mat
urllib.urlretrieve('http://geoport.whoi.edu/data/sine.mat','sine.mat');
# If you don't use squeeze_me=True, then Pandas doesn't like
# the arrays in the dictionary, because they look like arrays
# of 1-element arrays. squeeze_me=True fixes that.
mat_dict = loadmat('sine.mat',squeeze_me=True)
# make a new dictionary with just dependent variables we want
# (we handle the time variable separately, below)
my_dict = { k: mat_dict[k] for k in ['x','y']}
def matlab2datetime(matlab_datenum):
    day = dt.datetime.fromordinal(int(matlab_datenum))
    dayfrac = dt.timedelta(days=matlab_datenum%1) - dt.timedelta(days = 366)
    return day + dayfrac
# convert Matlab variable "t" into list of python datetime objects
my_dict['date_time'] = [matlab2datetime(tval) for tval in mat_dict['t']]
# build the DataFrame and index it by time
df = pd.DataFrame(my_dict)
df = df.set_index('date_time')
# print df
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 201 entries, 2013-08-15 17:11:30.999997 to 2013-09-04 17:11:30.999997
Data columns (total 2 columns):
x 201 non-null values
y 201 non-null values
dtypes: float64(2)
# plot with Pandas
df.plot()

Here's a way to convert these using numpy.datetime64, rather than datetime.
import numpy as np
origin = np.datetime64('0000-01-01', 'D') - np.timedelta64(1, 'D')
date = serdate * np.timedelta64(1, 'D') + origin
This works when serdate is either a single integer or an integer array.

Just building on and adding to previous comments. The key is in the day counting as carried out by the method toordinal and constructor fromordinal in the class datetime and related subclasses. For example, from the Python Library Reference for 2.7, one reads that fromordinal
Return the date corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1. ValueError is raised unless 1 <= ordinal <= date.max.toordinal().
However, MATLAB's count also includes year 0, which is a leap year, so 366 additional days need to be taken into account. (It was a leap year just like 2016, which is exactly 504 four-year cycles later.)
These are two functions that I have been using for similar purposes:
import datetime
def datetime_pytom(d,t):
    '''
    Input
        d   Date as an instance of type datetime.date
        t   Time as an instance of type datetime.time
    Output
        The fractional day count since 0-Jan-0000 (proleptic ISO calendar)
        This is the 'datenum' datatype in matlab
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence an increase of 366 days, for year 0 AD was a leap year
    '''
    dd = d.toordinal() + 366
    tt = datetime.timedelta(hours=t.hour,minutes=t.minute,
                            seconds=t.second)
    tt = tt.total_seconds() / 86400
    return dd + tt
def datetime_mtopy(datenum):
    '''
    Input
        The fractional day count according to datenum datatype in matlab
    Output
        The date and time as an instance of type datetime in python
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence a reduction of 366 days, for year 0 AD was a leap year
    '''
    ii = datetime.datetime.fromordinal(int(datenum) - 366)
    ff = datetime.timedelta(days=datenum%1)
    return ii + ff
Hope this helps and happy to be corrected.

Manipulating timestamp so that the new timestamp is still valid

I have a pandas dataframe in which each cell of a column contains a timestamp, saved as a string:
>>> dataset['DateTime'][1]
'2018-03-14 00:34:46'
I would like to create a new column in which those dates are manipulated in the following way:
year += 1,
month += 2,
day += 3,
hour += 4,
minute += 5,
second += 6
(Important to this manipulation is that the initial date and the new date have a one-to-one relation, so that I can transform the date back later onwards)
In my case, the output I am looking for is as follows:
>>> dataset['newTimestamp'][1]
'2019-05-17 04:39:52'
To do so I am using the datetime library to create datetime objects (as a test, I have started with one variable first):
timestamp = dataset['DateTime'][1]
p = datetime.datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
Currently I am doing arithmetics on the single variables:
year = p.year + 1
if p.month < 12:
    month = p.month + 1
else:
    month = 1
    year += 1
However, as with the months, there are cases where you cannot simply add values and still end up with a real timestamp (12 + 1 = 13, which is not an actual month).
I could program every rule explicitly, but that seems too much work and I expect there are better ways. How could I do this faster?

Use DateOffset (relativedelta from dateutil is also worth a look for this kind of manipulation):
dataset['newTimestamp'] = pd.to_datetime(dataset['DateTime']) + pd.DateOffset(years=1, months=2, days=3, hours=4, minutes=5, seconds=6)
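If pandas is not an option, the same kind of shift can be sketched with the standard library only. This is an illustration under my own assumptions, not what DateOffset does internally: the helper name shift_timestamp is made up, years and months are applied first, and the day is clamped to the length of the target month before the smaller units are added.

```python
import calendar
from datetime import datetime, timedelta

def shift_timestamp(ts, years=0, months=0, **smaller_units):
    # Work with a running month count so that month 13 rolls into the next year.
    total = ts.year * 12 + (ts.month - 1) + years * 12 + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    # Clamp the day, e.g. 31 January + 1 month -> 28 (or 29) February.
    day = min(ts.day, calendar.monthrange(year, month)[1])
    shifted = ts.replace(year=year, month=month, day=day)
    # days, hours, minutes, seconds are passed straight to timedelta.
    return shifted + timedelta(**smaller_units)

p = datetime.strptime('2018-03-14 00:34:46', '%Y-%m-%d %H:%M:%S')
new = shift_timestamp(p, years=1, months=2, days=3, hours=4, minutes=5, seconds=6)
print(new.strftime('%Y-%m-%d %H:%M:%S'))
```

One caveat for the one-to-one requirement in the question: because of the day clamping, different inputs near the end of a month can map to the same output, so the shift is not always reversible.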

You should try out the beautiful-date library:
pip install beautiful-date
And use it like so:
from beautiful_date import *
...
dataset['DateTime'].apply(lambda d: d + 1 * years + 2 * months + ... + 6 * seconds)
should do the trick.

strptime() and strftime() are the functions you are looking for.
Just go ahead and look up these two functions, and you should be able to solve the stated problem.
They are used to parse strings into date-time values and to format date-time values back into strings.

Changing list answers in python

I've been trying to insert rows into a MySQL table using Python. I'm trying to create a list of all dates from April 2016 to now so that I can insert them individually with the SQL INSERT. I searched, but I couldn't find how to format each value in the list depending on whether it has one or two digits:
dates = ['2016-04-'+str(i+1) for i in range(9,30)]
I would like a 0 to be added in front of i whenever it is a single digit (i.e. 1, 2, 3, etc.)
and for it to stay unchanged when it has two digits (i.e. 10, 11, 12, etc.).

dates = ['2016-04-'+ '{:02d}'.format(i) for i in range(9,30)]
>>> print dates
['2016-04-09', '2016-04-10', '2016-04-11', '2016-04-12', '2016-04-13', '2016-04-14', '2016-04-15',
 '2016-04-16', '2016-04-17', '2016-04-18', '2016-04-19', '2016-04-20', '2016-04-21', '2016-04-22',
 '2016-04-23', '2016-04-24', '2016-04-25', '2016-04-26', '2016-04-27', '2016-04-28', '2016-04-29']

Using C style formatting, all the dates in April:
dates = ['2016-04-%02d'%i for i in range(1,31)]
Need range(1,31) since the last value in the range is not used, or use range(30) and add 1 to i.
The same using .format():
dates = ['2016-04-{:02}'.format(i) for i in range(1,31)]

You can use dateutil module
from datetime import datetime
from dateutil.rrule import rrule, DAILY
start_date = datetime(2016, 4, 1)  # no leading zeros: 04 is a syntax error in Python 3
w = [each.strftime('%Y-%m-%d') for each in rrule(freq=DAILY, dtstart=start_date, until=datetime(2016, 5, 9))]
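For comparison, a rough sketch of the same daily list with only the standard library (the helper name daily_dates is hypothetical); it also crosses month boundaries without any string padding logic:

```python
from datetime import date, timedelta

def daily_dates(start, end):
    """Return ISO-formatted dates from start to end, both inclusive."""
    n_days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(n_days + 1)]

dates = daily_dates(date(2016, 4, 1), date(2016, 5, 9))
print(dates[0], dates[-1], len(dates))
```

To run up to the current day, pass date.today() as the end value.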

Python Find # of Months Between 2 Dates

I am trying to find the # of months between 2 dates. Some solutions are off by 1 month and others are off by several months. I found this solution on SO, but the answers there are either too complicated or incorrect.
For example, given the starting date of 04/30/12 and ending date of 03/31/16,
def diff_month(d1, d2):
    return (d1.year - d2.year)*12 + d1.month - d2.month
returns 47 months, not 48
and
dates = [dt for dt in rrule(MONTHLY, dtstart=strt_dt, until=end_dt)]
returns 44 (Reason being that February does not have a day # 30 so it does not see it as a valid date)
I can of course fix that by doing
dates = [dt for dt in rrule(MONTHLY, dtstart=strt_dt.replace(day=2), until=end_dt.replace(day=1))]
But this does not seem like a proper solution (I mean the answer is right but the method sucks).
Is there a proper way of calculating the # of months so that given my example dates, it would return 48?

I realize this post doesn't have a Pandas tag, but if you are willing to use it you can simply do the following which takes the difference between two monthly periods:
import pandas as pd
>>> pd.Period('2016-3-31', 'M') - pd.Period('2012-4-30', 'M')
47
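Whether 47 or 48 is "right" depends on what is being counted. As a small sketch, assuming the asker's 48 is meant as the number of calendar months touched (April 2012 through March 2016, both included), the plain month difference just needs one added:

```python
from datetime import date

def diff_month(d1, d2):
    # Number of month boundaries crossed going from d2 to d1.
    return (d1.year - d2.year) * 12 + d1.month - d2.month

start, end = date(2012, 4, 30), date(2016, 3, 31)
boundaries = diff_month(end, start)
inclusive = boundaries + 1   # count both the first and the last month
print(boundaries, inclusive)
```

The day of the month is ignored entirely here, which is why the missing 30 February does not cause trouble.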

Python: get all months in range?

I want to get all months between now and August 2010, as a list formatted like this:
['2010-08-01', '2010-09-01', .... , '2016-02-01']
Right now this is what I have:
months = []
for y in range(2010, 2017):  # upper bound is exclusive, so 2017 includes 2016
    for m in range(1, 13):
        if (y == 2010) and m < 8:
            continue
        if (y == 2016) and m > 2:
            continue
        month = '%s-%s-01' % (y, ('0%s' % (m)) if m < 10 else m)
        months.append(month)
What would be a better way to do this?

dateutil.relativedelta is handy here.
I've left the formatting out as an exercise.
from dateutil.relativedelta import relativedelta
import datetime
result = []
today = datetime.date.today()
current = datetime.date(2010, 8, 1)
while current <= today:
    result.append(current)
    current += relativedelta(months=1)

I had a look at the dateutil documentation. Turns out it provides an even more convenient way than using dateutil.relativedelta: recurrence rules (examples)
For the task at hand, it's as easy as
from dateutil.rrule import *
from datetime import date
months = map(
    date.isoformat,
    rrule(MONTHLY, dtstart=date(2010, 8, 1), until=date.today())
)
The fine print
Note that we're cheating a little bit, here. The elements dateutil.rrule.rrule produces are of type datetime.datetime, even if we pass dtstart and until of type datetime.date, as we do above. I let map feed them to date's isoformat function, which just turns out to convert them to strings as if it were just dates without any time-of-day information.
Therefore, the seemingly equivalent list comprehension
[day.isoformat()
 for day in rrule(MONTHLY, dtstart=date(2010, 8, 1), until=date.today())]
would return a list like
['2010-08-01T00:00:00',
'2010-09-01T00:00:00',
'2010-10-01T00:00:00',
'2010-11-01T00:00:00',
⋮
'2015-12-01T00:00:00',
'2016-01-01T00:00:00',
'2016-02-01T00:00:00']
Thus, if we want to use a list comprehension instead of map, we have to do something like
[dt.date().isoformat()
 for dt in rrule(MONTHLY, dtstart=date(2010, 8, 1), until=date.today())]

Use Python's standard datetime and timedelta modules, without installing any new libraries:
from datetime import datetime, timedelta
now = datetime(datetime.now().year, datetime.now().month, 1)
ctr = datetime(2010, 8, 1)
months = [ctr.strftime('%Y-%m-%d')]
while ctr < now:
    # step past the end of the current month, then snap back to day 1
    ctr = (ctr + timedelta(days=32)).replace(day=1)
    months.append(ctr.strftime('%Y-%m-%d'))
I'm adding 32 days to enter a new month every time (the longest months have 31 days) and then resetting the day to 1, so the extra days don't accumulate and eventually skip a month.
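Here is the 32-day step as a reusable function, as a sketch only (the name month_starts is mine). It assumes the counter is snapped back to the 1st after every step, so that the surplus days can never add up to a skipped month:

```python
from datetime import date, timedelta

def month_starts(first, last):
    """Return the first day of every month from first to last, inclusive."""
    current = first.replace(day=1)
    stop = last.replace(day=1)
    result = []
    while current <= stop:
        result.append(current.isoformat())
        # Day 1 plus 32 days always lands in the next month; reset to day 1.
        current = (current + timedelta(days=32)).replace(day=1)
    return result

months = month_starts(date(2010, 8, 1), date(2016, 2, 1))
print(months[0], months[-1], len(months))
```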

It seems like there's a very simple and clean way to do this by generating a list of dates and subsetting to take only the first day of each month, as shown in the example below.
import datetime
import pandas as pd
start_date = datetime.date(2010,8,1)
end_date = datetime.date(2016,2,1)
date_range = pd.date_range(start_date, end_date)
date_range = date_range[date_range.day==1]
print(date_range)

I've got another way, using datetime, timedelta and calendar:
from calendar import monthrange
from datetime import datetime, timedelta
def monthdelta(d1, d2):
    delta = 0
    while True:
        mdays = monthrange(d1.year, d1.month)[1]
        d1 += timedelta(days=mdays)
        if d1 <= d2:
            delta += 1
        else:
            break
    return delta
start_date = datetime(2016, 1, 1)
end_date = datetime(2016, 12, 1)
num_months = [i-12 if i>12 else i for i in range(start_date.month, monthdelta(start_date, end_date)+start_date.month+1)]
monthly_daterange = [datetime(start_date.year,i, start_date.day, start_date.hour) for i in num_months]

You could reduce the if statements from four lines to two by combining the conditions, since both branches do the same thing (continue):
if (y == 2010 and m < 8) or (y == 2016 and m > 2):
    continue

I don't know whether it's better, but an approach like the following might be considered more 'pythonic':
months = [
    '{}-{:0>2}-01'.format(year, month)
    for year in xrange(2010, 2016 + 1)
    for month in xrange(1, 12 + 1)
    if not (year <= 2010 and month < 8 or year >= 2016 and month > 2)
]
The main differences here are:
As we want the iteration(s) to produce a list, use a list comprehension instead of aggregating list elements in a for loop.
Instead of explicitly making a distinction between numbers below 10 and numbers 10 and above, use the capabilities of the format specification mini-language for the .format() method of str to specify
a field width (the 2 in the {:0>2} place holder)
right-alignment within the field (the > in the {:0>2} place holder)
zero-padding (the 0 in the {:0>2} place holder)
xrange instead of range returns a lazy sequence object instead of a list, so that the iteration values can be produced as they're being consumed and don't have to be held in memory. (Doesn't matter for ranges this small, but it's a good idea to get used to this in Python 2.) Note: In Python 3, there is no xrange, and range itself already returns a lazy range object instead of a list.
Make the + 1 for the upper bounds explicit. This makes it easier for human readers of the code to recognize that we want to specify an inclusive bound to a method (range or xrange) that treats the upper bound as exclusive. Otherwise, they might wonder what's the deal with the number 13.
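A quick way to convince yourself of what the {:0>2} placeholder does, and that it gives the same result as the other zero-padding spellings used on this page, is a throwaway check like this:

```python
# Each entry pads the month with three different spellings of "width 2, zero-filled".
padded = [
    ('{:0>2}'.format(month), '{:02d}'.format(month), '%02d' % month)
    for month in (7, 12)
]
for row in padded:
    print(row)
```

Single-digit months gain a leading zero, two-digit months are left unchanged.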

A different approach that doesn't require any additional libraries, nor nested or while loops. Simply convert your dates into an absolute number of months from some reference point (it can be any date really, but for simplicity we can use 1st January 0001). For example
a=datetime.date(2010,2,5)
abs_months = a.year * 12 + a.month
Once you have a number representing the month you are in you can simply use range to loop over the months, and then convert back:
Solution to the generalized problem:
import datetime
def range_of_months(start_date, end_date):
    months = []
    for i in range(start_date.year * 12 + start_date.month, end_date.year*12+end_date.month + 1):
        months.append(datetime.date((i-13) // 12 + 1, (i-1) % 12 + 1, 1))
    return months
Additional Notes/explanation:
Here // divides rounding down to the nearest whole number, and % 12 gives the remainder when divided by 12, e.g. 13 % 12 is 1.
(Note also that in the above date.year *12 + date.month does not give the number of months since the 1st of January 0001. For example if date = datetime.datetime(1,1,1), then date.year * 12 + date.month gives 13. If I wanted to do the actual number of months I would need to subtract 1 from the year and month, but that would just make the calculations more complicated. All that matters is that we have a consistent way to convert to and from some integer representation of what month we are in.)
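To see that the conversion to an integer and back really is consistent, here is a small round-trip sketch (to_index and from_index are names I made up for the two directions):

```python
from datetime import date

def to_index(d):
    # Same integer representation as above: year * 12 + month.
    return d.year * 12 + d.month

def from_index(i):
    # Shift by one so that December (index divisible by 12) stays in its own year.
    year, month0 = divmod(i - 1, 12)
    return date(year, month0 + 1, 1)

print(to_index(date(1, 1, 1)))                    # smallest possible index
print(from_index(to_index(date(2010, 12, 25))))   # first day of that month
```

Note that divmod(i - 1, 12) gives the same year as the (i-13) // 12 + 1 expression in the function above, since both shift the index by a whole number of months before dividing.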

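A compact, pythonic variant from me:
from dateutil.relativedelta import relativedelta
import datetime
# start_date and end_date are datetime.date objects, start_date <= end_date
delta = relativedelta(end_date, start_date)  # .months alone would ignore whole years
[(start_date + relativedelta(months=+m)).isoformat()
 for m in range(0, delta.years * 12 + delta.months + 1)]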

In case you don't have any months duplicates and they are in correct order you can get what you want with this.
from datetime import date, timedelta
first = date.today()
last = first + timedelta(weeks=20)
date_format = "%Y-%m"
results = []
while last >= first:
    results.append(last.strftime(date_format))
    last -= timedelta(days=last.day)

Similar to #Mattaf, but simpler...
pandas.date_range() has a frequency option, freq='M', which produces month ends.
Here I am adding a day (pd.Timedelta('1d')) to each month end in order to reach the beginning of the following month:
import pandas as pd
date_range = pd.date_range('2010-07-01','2016-02-01',freq='M')+pd.Timedelta('1d')
print(list(date_range))
