Python intersection of multiple datetime lists - python

I'm trying to find the intersection list of 5 lists of datetime objects. I know the intersection of lists question has come up a lot on here, but my code is not performing as expected (like the ones from the other questions).
Here are the first 3 elements of the 5 lists with the exact length of the list at the end.
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38790
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38818
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38959
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38802
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 40415
I've made a list of these lists called times. I've tried 2 methods of intersecting.
Method 1:
intersection = times[0] # make intersection the first list
for i in range(len(times)):
    if i == 0:
        continue
    intersection = [val for val in intersection if val in times[i]]
This method results in a list with length 20189 and takes 104 seconds to run.
Method 2:
intersection = times[0] # make intersection the first list
for i in range(len(times)):
    if i == 0:
        continue
    intersection = list(set(intersection) & set(times[i]))
This method results in a list with length 20148 and takes 0.1 seconds to run.
I've run into 2 problems with this. The first problem is that the two methods yield different size intersections and I have no clue why. And the other problem is that the datetime object datetime.datetime(2014, 8, 14, 19, 25, 6) is clearly in all 5 lists (see above) but when I print (datetime.datetime(2014, 8, 14, 19, 25, 6) in intersection) it returns False.

Your first list times[0] has duplicate elements; this is the reason for the inconsistency. If you use intersection = list(set(times[0])) in your first snippet, the problem goes away.
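To see how duplicates produce two different lengths, here is a minimal sketch with made-up values (the names a and b are hypothetical, not the asker's data):

```python
from datetime import datetime

# Hypothetical sample data: the first list repeats one timestamp.
a = [
    datetime(2014, 8, 14, 19, 25, 6),
    datetime(2014, 8, 14, 19, 25, 6),  # duplicate
    datetime(2014, 8, 14, 19, 25, 7),
]
b = [datetime(2014, 8, 14, 19, 25, 6), datetime(2014, 8, 14, 19, 25, 7)]

# Method 1 keeps every occurrence from a that also appears in b.
by_comprehension = [val for val in a if val in b]

# Method 2 collapses duplicates because a set holds each value once.
by_sets = list(set(a) & set(b))

print(len(by_comprehension), len(by_sets))  # 3 2
```

The two results contain the same distinct timestamps; only the repeated entries account for the difference in length.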
As for your second snippet, it will be faster if you never convert back and forth between lists and sets:
intersection = set(times[0]) # make a set of the first list
for timeset in times[1:]:
    intersection.intersection_update(timeset)
# if necessary make into a list again
intersection = list(intersection)
And actually, since intersection supports multiple iterables as separate arguments, you can simply replace all your code with:
intersection = set(times[0]).intersection(*times[1:])
For the in intersection problem, is the instance an actual datetime.datetime or just pretending to be? At least the timestamps seem not to be timezone aware.

Lists can have duplicate items, which can cause inconsistencies with the length. To avoid these duplicates, you can turn each list of datetimes into a set:
map(set, times)
In Python 2 this gives you a list of sets (in Python 3, an iterator over sets), with duplicate times removed. To find the intersection, you can use set.intersection:
intersection = set.intersection(*map(set, times))
With your example, intersection will be this set:
set([datetime.datetime(2014, 8, 14, 19, 25, 9), datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7)])

There might be duplicated times; you can handle that simply like this:
Python3:
import functools
result = functools.reduce(lambda x, y: set(x) & set(y), times)
Python2:
result = reduce(lambda x, y: set(x) & set(y), times)

intersection = set(*times[:1]).intersection(*times[1:])

Related

Python zip inner list of a random list

My code:
import random
randomlist = []
result_list=[]
l=int(input('Enter List Length'))
for i in range(0,l):
    n = random.randint(1,30)
    randomlist.append(n)
print(randomlist)
n=int(input('composite range:'))
composite_list = [randomlist[x:x + n] for x in range(0, len(randomlist), n)]
print(composite_list)
# zip inner list
for i in composite_list:
    pass  # stuck here
I wish to zip all list elements to a new list for example:
Random List: [25, 6, 15, 7, 21, 30, 10, 14, 3]
composite_list:[[25, 6, 15], [7, 21, 30], [10, 14, 3]]
Output list after zip: [[25, 7, 10],[6, 21, 14],[15, 30, 3]]
Because the number of elements in composite_list is random, I have no idea how to use zip().
You can do the following:
rand_lst = [25, 6, 15, 7, 21, 30, 10, 14, 3]
it = iter(rand_lst)
comp_lst = list(zip(it, it, it))
# [(25, 6, 15), (7, 21, 30), (10, 14, 3)]
trans_lst = list(zip(*comp_lst))
# [(25, 7, 10), (6, 21, 14), (15, 30, 3)]
This uses the old "zip iterator with itself" pattern to create the chunks. Then you can zip the chunks by unpacking the list using the * operator. This also works in a single step:
it = iter(rand_lst)
trans_lst = list(zip(*zip(it, it, it)))
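Since the chunk size in the question is entered by the user, the same pattern can be generalized by repeating one iterator n times. This is only a sketch: transpose_chunks is a hypothetical helper name, and a trailing incomplete chunk is silently dropped.

```python
def transpose_chunks(values, n):
    """Split values into chunks of size n, then transpose the chunks."""
    it = iter(values)
    # [it] * n repeats the SAME iterator, so zip pulls n consecutive items per chunk.
    chunks = zip(*[it] * n)
    return [list(column) for column in zip(*chunks)]

print(transpose_chunks([25, 6, 15, 7, 21, 30, 10, 14, 3], 3))
# [[25, 7, 10], [6, 21, 14], [15, 30, 3]]
```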
Use:
list(zip(*composite_list))
# output is a list of tuples
Or:
list(map(list, zip(*composite_list)))
# output is a list of lists
to match your desired output exactly.
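One thing to keep in mind: zip stops at the shortest inner list, so if the last chunk of composite_list is shorter than the others, the trailing columns are dropped. A sketch of the difference, using itertools.zip_longest to pad instead (the sample data is made up):

```python
from itertools import zip_longest

composite_list = [[25, 6, 15], [7, 21, 30], [10, 14]]  # last chunk is short

# zip truncates to the shortest chunk; zip_longest pads the missing slots.
truncated = [list(col) for col in zip(*composite_list)]
padded = [list(col) for col in zip_longest(*composite_list, fillvalue=None)]

print(truncated)  # [[25, 7, 10], [6, 21, 14]]
print(padded)     # [[25, 7, 10], [6, 21, 14], [15, 30, None]]
```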
Using numpy:
import numpy as np
np.array(composite_list).T.tolist()
Outputs:
[[25, 7, 10], [6, 21, 14], [15, 30, 3]]
Caveat: this works best if you keep your whole flow in numpy; otherwise, converting to numpy just for the transpose adds some overhead.

How to randomly select a specific sequence from a list?

I have a list of hours starting from 0 (midnight).
hour = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
I want to generate a sequence of 3 consecutive hours randomly. Example:
[3,6]
or
[15, 18]
or
[23,2]
and so on. random.sample does not achieve what I want!
import random
hourSequence = sorted(random.sample(range(1,24), 2))
Any suggestions?
I'm not exactly sure what you want, but probably:
import random
s = random.randint(0, 23)
r = [s, (s+3)%24]
r
Out[14]: [16, 19]
Note: None of the other answers take into consideration the possible sequence [23,0,1].
Consider the following, which uses itertools from the Python standard library:
from itertools import islice, cycle
from random import choice
hours = list(range(24)) # List w/ 24h
hours_cycle = cycle(hours) # Transform the list into a cycle
select_init = islice(hours_cycle, choice(hours), None) # Start an iterator at a random position
# Get the next 3 values from the iterator
select_range = []
for i in range(3):
    select_range.append(next(select_init))
print(select_range)
This prints a sequence of three values from your hours list in a circular way, so results such as [23,0,1] are also possible.
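The same wrap-around behaviour can also be sketched with plain modular arithmetic. The function name and parameters below are hypothetical:

```python
import random

def random_hour_window(length=3, hours_in_day=24):
    """Return `length` consecutive hours starting at a random hour, wrapping past 23."""
    start = random.randrange(hours_in_day)
    return [(start + offset) % hours_in_day for offset in range(length)]

print(random_hour_window())  # e.g. [23, 0, 1]
```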
You can try this:
import random
hour = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
index = random.randint(0, len(hour) - 1)
l = [hour[index], hour[(index + 3) % len(hour)]]  # wrap around midnight to avoid an IndexError
print(l)
You can pick a random element from the hour list you already created and take the element 3 places after it:
import random
def random_sequence_endpoints(l, span):
    i = random.choice(range(len(l)))
    return [l[i], l[(i+span) % len(l)]]  # index into the argument l, not the global hour
hour = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
result = random_sequence_endpoints(hour, 3)
This works not only for the hours list above but for any other list, whatever elements it contains.

how to generate many lists and assign values to them

I want to read a specific number of lines from a list and assign all those values to a new list. Then I want to read the next bunch from the last_value+1 line from before for the exact same number of lines and assign those to a new list. So far I have this:
Let's say u = [1,2,3....,9,10,11,12,13...,19,20] and I want to assign the first 10 values from u into my newly generated list1 = [] => list1 = [1,2,..9,10]
then I want the next 10 values from u to be assigned to list2 so list2 = [11,12,13..,20]. The code so far is:
nLines = 10
nrepeats = 2
j=0
i=0
while (j<nrepeats):
    ### Generating empty lists ###
    mklist = "list" + str(j) + " = []"
    ### do the segmentation ###
    for i, uline in enumerate(u):
        if i >= i and i < i+nLines:
            mklist.append(uline)
    j=j+1
Now the problem is that I can't append to mklist because it's a string:
AttributeError: 'str' object has no attribute 'append'
How can I assign those values within that loop?
You could use a more suitable collection, for example, a dictionary:
nLines = 10
nrepeats = 2
j=0
i=0
my_dict = {}
while (j<nrepeats):
    ### Generating empty lists ###
    my_dict[str(j)] = []
    ### do the segmentation ###
    start = j*nLines  # index of the first line in this chunk
    for i, uline in enumerate(u):
        if i >= start and i < start+nLines:
            my_dict[str(j)].append(uline)
    j=j+1
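If the goal is just fixed-size chunks, list slicing avoids scanning u once per chunk. A minimal sketch, assuming u is a plain list (the chunks name is hypothetical):

```python
u = list(range(1, 21))  # stand-in for the data in the question
nLines = 10

# One slice per chunk; the last chunk may be shorter if len(u) is not a multiple of nLines.
chunks = [u[start:start + nLines] for start in range(0, len(u), nLines)]

print(chunks[0])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(chunks[1])  # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```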
You can use the zip function to group elements from iterables into groups of the same size. There are actually two ways, which differ in how you want to handle cases where the source data can't be divided cleanly:
u = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
The first way is with regular zip and discards the leftover fragment
>>> list(zip(*[iter(u)]*10))
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), (11, 12, 13, 14, 15, 16, 17, 18, 19, 20)]
The second way uses itertools.zip_longest and pads out the last group with some fillvalue (default None)
>>> import itertools
>>> list(itertools.zip_longest(*[iter(u)]*10, fillvalue=None))
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), (11, 12, 13, 14, 15, 16, 17, 18, 19, 20), (21, None, None, None, None, None, None, None, None, None)]

Subdividing python integer list into groups of linearly spaced items [duplicate]

In this other SO post, a Python user asked how to group continuous numbers such that any sequences could just be represented by its start/end and any stragglers would be displayed as single items. The accepted answer works brilliantly for continuous sequences.
I need to be able to adapt a similar solution but for a sequence of numbers that have potentially (not always) varying increments. Ideally, how I represent that will also include the increment (so they'll know if it was every 3, 4, 5, nth)
Referencing the original question, the user asked for the following input/output
[2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20] # input
[(2,5), (12,17), 20]
What I would like is the following (Note: I wrote a tuple as the output for clarity but xrange would be preferred using its step variable):
[2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20] # input
[(2,5,1), (12,17,1), 20] # note, the last element in the tuple would be the step value
And it could also handle the following input
[2, 4, 6, 8, 12, 13, 14, 15, 16, 17, 20] # input
[(2,8,2), (12,17,1), 20] # note, the last element in the tuple would be the increment
I know that xrange() supports a step so it may be possible to even use a variant of the other user's answer. I tried making some edits based on what they wrote in the explanation but I wasn't able to get the result I was looking for.
For anyone that doesn't want to click the original link, the code that was originally posted by Nadia Alramli is:
ranges = []
for key, group in groupby(enumerate(data), lambda (index, item): index - item):
    group = map(itemgetter(1), group)
    if len(group) > 1:
        ranges.append(xrange(group[0], group[-1]))
    else:
        ranges.append(group[0])
The itertools pairwise recipe is one way to solve the problem. Combined with itertools.groupby, it groups consecutive pairs that share the same difference. For multi-item groups, the first and last items are selected; for singleton groups, the last item is selected:
from itertools import groupby, tee, izip
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)
def grouper(lst):
    result = []
    for k, g in groupby(pairwise(lst), key=lambda x: x[1] - x[0]):
        g = list(g)
        if len(g) > 1:
            try:
                if g[0][0] == result[-1]:
                    del result[-1]
                elif g[0][0] == result[-1][1]:
                    g = g[1:] # patch for duplicate start and/or end
            except (IndexError, TypeError):
                pass
            result.append((g[0][0], g[-1][-1], k))
        else:
            result.append(g[0][-1]) if result else result.append(g[0])
    return result
Trial: input -> grouper(lst) -> output
Input: [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
Output: [(2, 5, 1), (12, 17, 1), 20]
Input: [2, 4, 6, 8, 12, 13, 14, 15, 16, 17, 20]
Output: [(2, 8, 2), (12, 17, 1), 20]
Input: [2, 4, 6, 8, 12, 12.4, 12.9, 13, 14, 15, 16, 17, 20]
Output: [(2, 8, 2), 12, 12.4, 12.9, (13, 17, 1), 20] # 12 does not appear in the second group
Update: (patch for duplicate start and/or end values)
s1 = [i + 10 for i in xrange(0, 11, 2)]; s2 = [30]; s3 = [i + 40 for i in xrange(45)]
Input: s1+s2+s3
Output: [(10, 20, 2), (30, 40, 10), (41, 84, 1)]
# to make 30 appear as an entry instead of a group change main if condition to len(g) > 2
Input: s1+s2+s3
Output: [(10, 20, 2), 30, (41, 84, 1)]
Input: [2, 4, 6, 8, 10, 12, 13, 14, 15, 16, 17, 20]
Output: [(2, 12, 2), (13, 17, 1), 20]
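The question mentions preferring xrange objects over tuples. Assuming integer values and Python 3 (where range replaces xrange), the (start, end, step) tuples above could be converted with a small helper. to_range is a hypothetical name, and because the end is inclusive, the stop is end + step:

```python
def to_range(item):
    """Turn an inclusive (start, end, step) tuple into a range; pass single values through."""
    if isinstance(item, tuple):
        start, end, step = item
        return range(start, end + step, step)
    return item

grouped = [(2, 8, 2), (12, 17, 1), 20]
converted = [to_range(item) for item in grouped]

print(converted)           # [range(2, 10, 2), range(12, 18), 20]
print(list(converted[0]))  # [2, 4, 6, 8]
```

This would not work for the float example above, since range only accepts integers.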
You can create an iterator to help grouping and try to pull the next element from the following group which will be the end of the previous group:
def ranges(lst):
    it = iter(lst)
    next(it) # move to second element for comparison
    grps = groupby(lst, key=lambda x: (x - next(it, -float("inf"))))
    for k, v in grps:
        i = next(v)
        try:
            step = next(v) - i # catches single element v or gives us a step
            nxt = list(next(grps)[1])
            yield xrange(i, nxt.pop(0), step)
            # outliers or another group
            if nxt:
                yield nxt[0] if len(nxt) == 1 else xrange(nxt[0], next(next(grps)[1]), nxt[1] - nxt[0])
        except StopIteration:
            yield i # no seq
which gives you:
In [2]: l1 = [2, 3, 4, 5, 8, 10, 12, 14, 13, 14, 15, 16, 17, 20, 21]
In [3]: l2 = [2, 4, 6, 8, 12, 13, 14, 15, 16, 17, 20]
In [4]: l3 = [13, 14, 15, 16, 17, 18]
In [5]: s1 = [i + 10 for i in xrange(0, 11, 2)]
In [6]: s2 = [30]
In [7]: s3 = [i + 40 for i in xrange(45)]
In [8]: l4 = s1 + s2 + s3
In [9]: l5 = [1, 2, 5, 6, 9, 10]
In [10]: l6 = {1, 2, 3, 5, 6, 9, 10, 13, 19, 21, 22, 23, 24}
In [11]: for l in (l1, l2, l3, l4, l5, l6):
   ....:     print(list(ranges(l)))
   ....:
[xrange(2, 5), xrange(8, 14, 2), xrange(13, 17), 20, 21]
[xrange(2, 8, 2), xrange(12, 17), 20]
[xrange(13, 18)]
[xrange(10, 20, 2), 30, xrange(40, 84)]
[1, 2, 5, 6, 9, 10]
[xrange(1, 3), 5, 6, 9, 10, 13, 19, xrange(21, 24)]
When the step is 1 it is not included in the xrange output.
Here is a quickly written (and extremely ugly) answer:
def test(inArr):
    arr=inArr[:] #copy, unnecessary if we use index in a smart way
    result = []
    while len(arr)>1: #as long as there can be an arithmetic progression
        x=[arr[0],arr[1]] #take first two
        arr=arr[2:] #remove from array
        step=x[1]-x[0]
        while len(arr)>0 and x[1]+step==arr[0]: #check if the next value in array is part of progression too
            x[1]+=step #add it
            arr=arr[1:]
        result.append((x[0],x[1],step)) #append progression to result
    if len(arr)==1:
        result.append(arr[0])
    return result
print test([2, 4, 6, 8, 12, 13, 14, 15, 16, 17, 20])
This returns [(2, 8, 2), (12, 17, 1), 20]
Slow, as it copies a list and removes elements from it
It only finds complete progressions, and only in sorted arrays.
In short, it is shitty, but should work ;)
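The copying mentioned above can be avoided by walking the list with indices. This is a Python 3 sketch of the same greedy idea (the function name progressions is hypothetical); like the original, it assumes sorted input and pairs up any two leftover neighbours:

```python
def progressions(values):
    """Greedily split values into (start, end, step) runs; a final lone item stays as-is."""
    result = []
    i = 0
    n = len(values)
    while i + 1 < n:
        step = values[i + 1] - values[i]
        j = i + 1
        # Extend the run while the next difference matches the current step.
        while j + 1 < n and values[j + 1] - values[j] == step:
            j += 1
        result.append((values[i], values[j], step))
        i = j + 1
    if i < n:
        result.append(values[i])
    return result

print(progressions([2, 4, 6, 8, 12, 13, 14, 15, 16, 17, 20]))
# [(2, 8, 2), (12, 17, 1), 20]
```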
There are other (cooler, more pythonic) ways to do this, for example you could convert your list to a set, keep removing two elements, calculate their arithmetic progression and intersect with the set.
You could also reuse the answer you provided to check for certain step sizes. e.g.:
ranges = []
step_size=2
for key, group in groupby(enumerate(data), lambda (index, item): step_size*index - item):
    group = map(itemgetter(1), group)
    if len(group) > 1:
        ranges.append(xrange(group[0], group[-1]))
    else:
        ranges.append(group[0])
Which finds every group with step size of 2, but only those.
I came across such a case once. Here it goes.
import more_itertools as mit
iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20] # input
x = [list(group) for group in mit.consecutive_groups(iterable)]
output = [(i[0],i[-1]) if len(i)>1 else i[0] for i in x]
print(output)

Python: How to range() multiple values from list or dictionary?

I'm new to programming and trying to work with ranges of numbers. For example, I want more than one range: 1..10, 20..30, 50..100. Where should I store them (a list or a dictionary), and how do I use them one by one?
example = range(1,10)
example2 = range(20,30)
for b in example:
    print b
Or you can use yield from (Python 3.3+):
def ranger():
    yield from range(1, 10)
    yield from range(20, 30)
    yield from range(50, 100)
for x in ranger():
    print(x)
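In Python 2, the range function returns a list. If you want a list of multiple ranges, you need to concatenate these lists (in Python 3, wrap each call in list() first, since range returns a lazy range object). For example: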
range(1, 5) + range(11, 15)
returns [1, 2, 3, 4, 11, 12, 13, 14]
The built-in range function gives you the numbers in a given interval (as a list in Python 2).
Syntax:
range(x) - returns list starting from 0 to x-1
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>
range(x,y) - returns list starting from x to y-1
>>> range(10,20)
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>>
range(x,y,stepsize) - returns list starting from x to y-1 with stepsize
>>> range(10,20,2)
[10, 12, 14, 16, 18]
>>>
In Python 3.5+ you can do:
output = [*range(1, 10), *range(20, 30)]
or using itertools.chain function:
from itertools import chain
data = [range(1, 10), range(20, 30)]
output = [*chain(*data)]
or using chain.from_iterable function
from itertools import chain
data = [range(1, 10), range(20, 30)]
output = [*chain.from_iterable(data)]
output:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
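To address the storage part of the question directly: a list of range objects is usually enough, and each range can then be consumed one by one. A minimal Python 3 sketch with hypothetical names, assuming the bounds in the question are meant to be inclusive (so each stop is the bound plus one):

```python
ranges = [range(1, 11), range(20, 31), range(50, 101)]  # 1..10, 20..30, 50..100

for r in ranges:
    # r[0] and r[-1] are the first and last numbers of each range.
    print(r[0], r[-1], len(r))

# Range objects know their length without being expanded into lists.
total = sum(len(r) for r in ranges)
```

A dictionary would only be needed if each range has to be looked up by a name or key.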
