List comprehension of 2+ variables in R - python

What's the best R equivalent of the Python 2-variable list comprehension
[datetime(y,m,15) for y in xrange(2000,2020) for m in [3,6,9,12]]
The result
[datetime.datetime(2000, 3, 15, 0, 0),
datetime.datetime(2000, 6, 15, 0, 0),
datetime.datetime(2000, 9, 15, 0, 0),
datetime.datetime(2000, 12, 15, 0, 0),
datetime.datetime(2001, 3, 15, 0, 0) ... ]

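This will produce equivalent results in R (note 2000:2019, because xrange(2000,2020) excludes 2020):
with(expand.grid(m=c(3,6,9,12), y=2000:2019), ISOdate(y,m,15))
We use expand.grid to get all combinations of year and month, and then use the vectorized ISOdate function to build the dates. Listing m first makes the month vary fastest, which matches the order of the Python result. Note that ISOdate defaults to 12:00 GMT rather than midnight.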

Another way to express the desired result using the listcompr package:
library(listcompr)
gen.list(ISOdate(y, m, 15), y = 2000:2019, m = c(3, 6, 9, 12))
Note that range(a,b) (or xrange(a, b) in Python2) is equivalent to a:(b-1) (stop element b excluded) in R (and not a:b, which includes the stop element).
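For comparison on the Python side, here is a minimal sketch (Python 3, so range rather than xrange) showing that the stop value is excluded and that the two-variable comprehension is the same thing as iterating over a Cartesian product. The names years, months, nested and via_product are only illustrative.

```python
from datetime import datetime
from itertools import product

years = range(2000, 2020)   # stop value 2020 is excluded, so 2000..2019
months = [3, 6, 9, 12]

# Nested comprehension: the leftmost "for" is the outer loop.
nested = [datetime(y, m, 15) for y in years for m in months]

# Same pairs via itertools.product, whose last argument varies fastest.
via_product = [datetime(y, m, 15) for y, m in product(years, months)]
```

Both lists hold 20 years times 4 months, in the same order as the result shown in the question.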

Related

List comprehension of an array that I created using np.loadtxt

I need to do list comprehension of an array that I created:
array([['12/12/80', '0.513393'],
['12/15/80', '0.486607'],
['12/16/80', '0.450893'],
...,
['2/20/19', '172.029999'],
['2/21/19', '171.059998'],
['2/22/19', '172.970001']], dtype='<U10')
The output should look like this:
array([[datetime.datetime(1980, 12, 12, 0, 0), 0.513393],
[datetime.datetime(1980, 12, 15, 0, 0), 0.486607],
[datetime.datetime(1980, 12, 16, 0, 0), 0.450893],
[datetime.datetime(1980, 12, 17, 0, 0), 0.462054],
[datetime.datetime(1980, 12, 18, 0, 0), 0.475446]], dtype=object)
I am struggling for the right code to do the list comprehension. Any help would be highly appreciated.
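Try the following (this assumes import datetime has been run and that arr is the array shown above):
[[datetime.datetime.strptime(dt, '%m/%d/%y'), float(val)] for dt, val in arr]
To get an object array like the one in the question rather than a list of lists, wrap the result in np.array(..., dtype=object).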
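As a self-contained sketch of the same comprehension, the block below uses a plain list of rows as a hypothetical stand-in for the loaded array (rows and parsed are illustrative names). It also shows how the two-digit year directive %y resolves: values 69 to 99 map to 1969 to 1999, and 00 to 68 map to 2000 to 2068.

```python
from datetime import datetime

# Hypothetical stand-in for the loaded array: rows of (date string, value string).
rows = [
    ["12/12/80", "0.513393"],
    ["12/15/80", "0.486607"],
    ["2/22/19", "172.970001"],
]

# Unpack each row, parse the date, and convert the value to a float.
parsed = [[datetime.strptime(dt, "%m/%d/%y"), float(val)] for dt, val in rows]
```

Here "80" becomes 1980 and "19" becomes 2019, which matches the expected output in the question.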

convert yyyymmdd to serial number python

How do I convert a list of dates that are in the form yyyymmdd to a serial number? For example, if I have this list of dates:
t = [1898-10-12 06:00,1898-10-12 12:00,1932-09-30 08:00,1932-09-30 00:00]
How do I convert each date to a serial number? I'm currently using the datetime toordinal() method, but each date is being rounded to the same serial number. How do I get the same dates with different times to be different numbers?
The entries in the list are datetime.datetime objects. I then tried:
thurser = []
for i in range(len(t)):
    thurser.append(t[i].toordinal())
But I am not getting serial numbers as floats.
datetime.toordinal() considers only the 'date' part of the datetime object, not the time. So does date.toordinal() - it only has a date part. The first 2 and last 2 elements in your list have datetimes on the same date but at different times, which .toordinal ignores. So, .toordinal will give you the same value for those same-dated datetimes.
In general, the solution would be to calculate the delta between your dates and a pre-determined/fixed one. I'm using datetime.datetime(1, 1, 1), the earliest possible datetime, so all the deltas are positive:
import datetime
thurser = []
# assuming t is a list of datetime objects
for d in t:
    delta = d - datetime.datetime(1, 1, 1)
    # whole days plus the time of day as a fraction of a day
    thurser.append(delta.days + delta.seconds / (24 * 3600))
>>> print(thurser)
[693149.25, 693149.5, 705555.3333333334, 705555.0]
And if you prefer ints instead of floats, then use seconds instead of days:
thurser.append(int(delta.total_seconds())) # total_seconds has microseconds in the float
>>> print(thurser)
[59888095200, 59888116800, 60959980800, 60959952000]
And to get back the original values in the 2nd example:
>>> [datetime.timedelta(seconds=d) + datetime.datetime(1, 1, 1) for d in thurser]
[datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0),
datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]
>>> _ == t # compare with original values
True
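The two variants above can also be wrapped in small helpers. This is only a sketch under the same assumption (naive datetimes measured from datetime(1, 1, 1)); ORIGIN, to_serial_days, to_serial_seconds and from_serial_seconds are illustrative names, not part of the datetime module. It relies on Python 3 timedelta arithmetic, where dividing one timedelta by another gives a float and floor-dividing gives an int.

```python
from datetime import datetime, timedelta

ORIGIN = datetime(1, 1, 1)  # same reference point as the answer above

def to_serial_days(d):
    """Days since ORIGIN as a float, time of day included."""
    return (d - ORIGIN) / timedelta(days=1)

def to_serial_seconds(d):
    """Whole seconds since ORIGIN as an int (any sub-second part is dropped)."""
    return (d - ORIGIN) // timedelta(seconds=1)

def from_serial_seconds(n):
    """Inverse of to_serial_seconds."""
    return ORIGIN + timedelta(seconds=n)

t = [
    datetime(1898, 10, 12, 6, 0),
    datetime(1898, 10, 12, 12, 0),
    datetime(1932, 9, 30, 8, 0),
    datetime(1932, 9, 30, 0, 0),
]
serial_seconds = [to_serial_seconds(d) for d in t]
restored = [from_serial_seconds(n) for n in serial_seconds]
```

The integer-seconds form round-trips exactly for datetimes without a sub-second part, whereas a float day count this large cannot represent every microsecond, so the float form is better suited to display or plotting than to exact round trips.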
Let me know if my understanding is wrong, but the following gives a distinct number for each value in the list.
I replaced
t = ['1898-10-12 06:00','1898-10-12 12:00','1932-09-30 08:00','1932-09-30 00:00']
with
t = [datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0), datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]
because, as mentioned in the comments, it is a list of datetime.datetime objects.
I use the total number of milliseconds between 1970-01-01 00:00:00 and the given date as the serial number.
Dates before 1970-01-01 therefore give negative values, but the values are still distinct.
import datetime
t = [datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0), datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]
thurser = []
x = []
for i in range(len(t)):
    thurser.append(t[i].toordinal())
    x.append((t[i]-datetime.datetime.utcfromtimestamp(0)).total_seconds() * 1000.0)
print(thurser)
print(x)
Output:
[693150, 693150, 705556, 705556]
[-2247501600000.0, -2247480000000.0, -1175616000000.0, -1175644800000.0]
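A possible variation on the epoch-based approach, sketched here under the assumption that all datetimes are naive: writing the epoch out as datetime(1970, 1, 1) gives the same reference point without calling utcfromtimestamp, which is deprecated as of Python 3.12. EPOCH and to_epoch_millis are illustrative names.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive, matching the naive datetimes in t

def to_epoch_millis(d):
    """Milliseconds between EPOCH and d as a float (negative before 1970)."""
    return (d - EPOCH) / timedelta(milliseconds=1)

t = [
    datetime(1898, 10, 12, 6, 0),
    datetime(1898, 10, 12, 12, 0),
    datetime(1932, 9, 30, 8, 0),
    datetime(1932, 9, 30, 0, 0),
]
millis = [to_epoch_millis(d) for d in t]
```

This gives the same four values as the x list printed above.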

Find third latest date in a list

I have a situation where I need to get the third latest date, i.e.
INPUT:
['14-04-2001', '29-12-2061', '21-10-2019',
'07-01-1973', '19-07-2014','11-03-1992','21-10-2019']
Also, INPUT:
6
14-04-2001
29-12-2061
21-10-2019
07-01-1973
19-07-2014
11-03-1992
OUTPUT : 19-07-2014
import datetime
datelist = ['14-04-2001', '29-12-2061', '21-10-2019', '07-01-1973', '19-07-2014','11-03-1992','21-10-2019' ]
for d in datelist:
    x = datetime.datetime.strptime(d,'%d-%m-%Y')
    print(x)
How can I achieve this?
You can sort the list and take the 3rd element from it.
my_list = [datetime.datetime.strptime(d,'%d-%m-%Y') for d in datelist]
# [datetime.datetime(2001, 4, 14, 0, 0), datetime.datetime(2061, 12, 29, 0, 0), datetime.datetime(2019, 10, 21, 0, 0), datetime.datetime(1973, 1, 7, 0, 0), datetime.datetime(2014, 7, 19, 0, 0), datetime.datetime(1992, 3, 11, 0, 0), datetime.datetime(2019, 10, 21, 0, 0)]
my_list.sort(reverse=True)
my_list[2]
# datetime.datetime(2019, 10, 21, 0, 0)
Also, as per Kerorin's suggestion, if you don't need to sort in-place and just need the 3rd element always, you can simply do
sorted(my_list, reverse=True)[2]
Update
To remove the duplicates, taking inspiration from this answer, you can do the following -
import datetime
datelist = ['14-04-2001', '29-12-2061', '21-10-2019', '07-01-1973', '19-07-2014', '11-03-1992', '21-10-2019']
seen = set()
my_list = [datetime.datetime.strptime(d,'%d-%m-%Y')
           for d in datelist
           if d not in seen and not seen.add(d)]
my_list.sort(reverse=True)
my_list[2]
# datetime.datetime(2014, 7, 19, 0, 0)
You can use heapq.nlargest to do this.
import heapq
from datetime import datetime
datelist = [
    '14-04-2001',
    '29-12-2061',
    '21-10-2019',
    '07-01-1973',
    '19-07-2014',
    '11-03-1992',
    '21-10-2019'
]
heapq.nlargest(3, {datetime.strptime(d, "%d-%m-%Y") for d in datelist})[-1]
This returns datetime.datetime(2014, 7, 19, 0, 0).
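The set-plus-heapq idea can be wrapped in a small function that also handles inputs with fewer than n distinct dates and returns the date in the original string format. This is a sketch only; nth_latest and its parameters are hypothetical names chosen for illustration.

```python
import heapq
from datetime import datetime

def nth_latest(date_strings, n, fmt="%d-%m-%Y"):
    """Return the n-th latest distinct date as a string, or None if there are fewer than n."""
    # A set comprehension drops duplicate dates before ranking.
    distinct = {datetime.strptime(s, fmt) for s in date_strings}
    if len(distinct) < n:
        return None
    # nlargest returns the n latest dates in descending order; the last one is the n-th latest.
    return heapq.nlargest(n, distinct)[-1].strftime(fmt)

datelist = ['14-04-2001', '29-12-2061', '21-10-2019', '07-01-1973',
            '19-07-2014', '11-03-1992', '21-10-2019']
third = nth_latest(datelist, 3)
```

With the list from the question, third is '19-07-2014', the expected output.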

accessing two list elements to get results

I have two lists. One list, named data, contains people's birth dates.
data = [ datetime.datetime(1958, 3, 15, 0, 0), datetime.datetime(1958, 9, 15, 0, 0), datetime.datetime(1930, 10, 23, 0, 0), datetime.datetime(1928, 9, 15, 0, 0), datetime.datetime(1928, 1, 23, 0, 0), datetime.datetime(1925, 11, 15, 0, 0), datetime.datetime(1962, 7, 20, 0, 0),datetime.datetime(1960, 12, 14, 0, 0), datetime.datetime(1960, 5, 10, 0, 0),datetime.datetime(1963, 9, 7, 0, 0), datetime.datetime(1956, 3, 10, 0, 0), datetime.datetime(1955, 2, 15, 0, 0),datetime.datetime(1958, 11, 14, 0, 0),datetime.datetime(1956, 8, 24, 0, 0),datetime.datetime(1990, 4, 30, 0, 0)]
The next list contains marriage dates.
marriage = [ datetime.datetime(1985, 5, 14, 0, 0),datetime.datetime(1945, 6, 15, 0, 0), datetime.datetime(1938, 6, 11, 0, 0), datetime.datetime(1995, 4, 5, 0, 0), datetime.datetime(1987, 2, 26, 0, 0), datetime.datetime(1983, 12, 13, 0, 0), datetime.datetime(1980, 9, 16, 0, 0), datetime.datetime(2011, 6, 19, 0, 0)]
Each date in the marriage list is related to two dates in the data list. I want to compare each date in the marriage list with its two dates in the data list so that I can print "birth date is less than marriage date".
How can I accomplish this using a loop? I am confused by this one.
Please note that I used import datetime and import re to do the date comparison.
for i in range(len(data)):
    if data[i] < marriage[i]:
        print("birthdate is lt marriage date")
    else:
        print("birthdate is gt or eq to marriage date")
I'm not sure what you are trying to accomplish here... Also you don't need re for date comparison, you can use normal < > == <= >= operators.
This also sounds like a job for a hash (dictionary)...
marriage = {
    'marriage1' : {
        '1' : <birthday>,
        '2' : <birthday>,
        'marriage-date' : <marriage-date>
    },
    'marriage2' : {
        '1' : <birthday>,
        '2' : <birthday>,
        'marriage-date' : <marriage-date>
    }
}
A hash (dictionary) makes comparisons much easier when the lists don't contain the same number of values.
This assumes that the marriage and birth dates are in the same order (i.e., the first two birth dates correspond to the first marriage date, the next two birth dates correspond to the second marriage date, and so on).
# only loop over marriages that have both birth dates present
for i in range(min(len(marriage), len(data) // 2)):
    if marriage[i] > data[i*2] and marriage[i] > data[(i*2)+1]:
        print("Both birthdates less than marriage date")
I believe my assumption is correct because there are about twice as many entries in the data list as in the marriage list (15 versus 8). The last marriage date has only one birth date, so it is skipped by the loop above.
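Under the same ordering assumption (birth dates 2i and 2i+1 belong to marriage i), the pairing can also be done with slicing, which tolerates a final marriage that has fewer than two birth dates. This is a sketch; check_marriages and the short sample lists are hypothetical and stand in for the data and marriage lists from the question.

```python
import datetime

def check_marriages(births, marriages):
    """Return a list of (marriage, partner births, ok) tuples.

    ok is True when at least one birth date is listed and every listed
    birth date is earlier than the marriage date.
    """
    results = []
    for i, wedding in enumerate(marriages):
        # Slicing never raises IndexError; it may return 0, 1 or 2 dates.
        partners = births[2 * i: 2 * i + 2]
        ok = bool(partners) and all(b < wedding for b in partners)
        results.append((wedding, partners, ok))
    return results

# Small illustrative sample: three birth dates, two marriages.
births = [datetime.datetime(1958, 3, 15), datetime.datetime(1958, 9, 15),
          datetime.datetime(1930, 10, 23)]
marriages = [datetime.datetime(1985, 5, 14), datetime.datetime(1945, 6, 15)]
results = check_marriages(births, marriages)
```

Returning the partner slice alongside the flag makes it easy to see which marriages were checked against only one birth date.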

adding elements in nested lists

I have a list containing nested lists like this:
[[datetime.datetime(2000, 12, 10, 0, 0), 0.0011], [datetime.datetime(2000, 12, 11, 0, 0), 0.0013], [datetime.datetime(2000, 12, 12, 0, 0), 0.0014]]
etc.
How do I go about adding the sub-elements two by two, like this:
sum(0.0011,0.0013) + 0.0014
and then taking the result of this sum and adding it to the next sub-element?
I'm basically trying to compound the values.
Thanks!
The easiest way to do this is with the sum() builtin and a generator expression:
>>> import datetime
>>> items = [[datetime.datetime(2000, 12, 10, 0, 0), 0.0011], [datetime.datetime(2000, 12, 11, 0, 0), 0.0013], [datetime.datetime(2000, 12, 12, 0, 0), 0.0014]]
>>> sum(item[1] for item in items)
0.0038000000000000004
Edit:
If you want to print out the result of each stage of the summation, you can use functools.reduce() (which in Python 2.x is the reduce builtin).
from functools import reduce
import datetime
items = [[datetime.datetime(2000, 12, 10, 0, 0), 0.0011], [datetime.datetime(2000, 12, 11, 0, 0), 0.0013], [datetime.datetime(2000, 12, 12, 0, 0), 0.0014]]
def add_printing_result(a, b):
    total = a+b
    print(total)
    return total
reduce(add_printing_result, (item[1] for item in items))
Which gives us:
0.0024000000000000002
0.0038000000000000004
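If the intermediate totals are needed as data rather than printed, itertools.accumulate (Python 3) is another option. This is a sketch that swaps in accumulate for the reduce call above; the name running is illustrative.

```python
import datetime
from itertools import accumulate

items = [
    [datetime.datetime(2000, 12, 10, 0, 0), 0.0011],
    [datetime.datetime(2000, 12, 11, 0, 0), 0.0013],
    [datetime.datetime(2000, 12, 12, 0, 0), 0.0014],
]

# accumulate yields the running total after each element, starting with the first value itself.
running = list(accumulate(value for _, value in items))
```

The list has one entry per item: the first value on its own, then each partial sum, with the final entry equal to the overall sum up to floating-point rounding.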
import datetime
total = 0
myarr = [[datetime.datetime(2000, 12, 10, 0, 0), 0.0011], [datetime.datetime(2000, 12, 11, 0, 0), 0.0013], [datetime.datetime(2000, 12, 12, 0, 0), 0.0014]]
for i in myarr:
    total += i[1]
I'm sure there are better ways to do this (I'm no Python expert), but this should sum your values properly, so that total holds the sum of the sub-elements. (The name total is used instead of sum to avoid shadowing the builtin.)
