I did try to figure this out, but I always end up with an index-out-of-bounds error or with some cases skipped.
I have a list of times:
list = ["8:00:33","8:05:02","8:06:12","8:58:17","8:58:58","9:53:11","11:03:54","11:45:51","13:54:42"]
I want to split this list into smaller chunks (lists), where each chunk holds the times within 15 minutes of its first entry.
Expected output:
list = [["8:00:33","8:05:02","8:06:12"],["8:58:17","8:58:58"],["9:53:11","11:03:54"],["11:45:51"],...]
I hope you get what I want; ask any questions, and sorry for my bad English.
Thank you for your time and help :)
I got this far:
start = list[0]
firstchunk.append(list[0])
for i in range(len(list) - 1):
    if time_diff(start, list[i+1]) < 900:  # time_diff returns the gap between two times in seconds
        firstchunk.append(list[i+1])
        print("Start: ", start, " End: ", list[i+1])
    else:
        start = list[i+1]
        print("Finished chunk")
        result.append(firstchunk)
        firstchunk = [start]
    if time_diff(start, list[i+1]) > 5400:  # ignore this part
        print("Start: ", start)
Can you help with a better solution?
Edit:
Thanks for the comments and solutions. Special thanks to Alain T.
Fastest of cars and most of money for you, my brother. Thank you once again to all the good people of Stack Overflow. I wish you a good and long life <3
You could use groupby from itertools to form the groups for you. You simply need to provide it with a key function that returns the same value for times that are within 15 minutes of the last starting point. That running starting point can be computed using the accumulate function (also from itertools):
from itertools import accumulate, groupby

times = ["8:00:33", "8:05:02", "8:06:12", "8:58:17", "8:58:58", "9:53:11",
         "10:03:54", "11:45:51", "13:54:42"]

groups = accumulate(times, lambda g, t: g if time_diff(g, t) < 900 else t)
grouped = [[*g] for _, g in groupby(times, key=lambda _: next(groups))]
print(grouped)
# [['8:00:33', '8:05:02', '8:06:12'], ['8:58:17', '8:58:58'],
# ['9:53:11', '10:03:54'], ['11:45:51'], ['13:54:42']]
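Both snippets assume a time_diff helper that isn't shown in the question. A minimal sketch, assuming "H:MM:SS" strings and that it returns the gap in seconds:

```python
def time_diff(t1, t2):
    """Return the gap in seconds between two 'H:MM:SS' strings (t2 - t1)."""
    def to_seconds(t):
        h, m, s = map(int, t.split(":"))
        return h * 3600 + m * 60 + s
    return to_seconds(t2) - to_seconds(t1)

print(time_diff("8:00:33", "8:05:02"))  # 269
```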
If you don't want to use libraries, a simple loop over times can produce the same result, by either appending each time to the last group or starting a new group, based on the difference from the first entry of the last group:
grouped = []
for t in times:
    if grouped and time_diff(grouped[-1][0], t) < 900:
        grouped[-1].append(t)    # add to the last group
    else:
        grouped.append([t])      # start a new group
[EDIT] Injecting empty groups for each chunk of 1:30 between groups:
Use zip() to get consecutive pairs of groups and compare the end of the first group with the start of the next one. For each full chunk of 1.5 hours (5400 seconds) between them, insert an empty group using a nested comprehension:
grouped[1:] = [g for g1, g2 in zip(grouped, grouped[1:])
               for g in time_diff(g1[-1], g2[0]) // 5400 * [[]] + [g2]]
print(grouped)
# [['8:00:33', '8:05:02', '8:06:12'], ['8:58:17', '8:58:58'],
# ['9:53:11', '10:03:54'], [], ['11:45:51'], [], ['13:54:42']]
df.groupby("female").apply(display)
This displays all the groups in the dataset, but my dataset is very large and VS Code crashed after I ran this since the output was so long. How can I display only the first n groups, just to get an idea of how my grouped data looks?
You can use a generator. For example, if you want only the first group, you can call next on the generator once:
…
group_gen = (group for group in df.groupby("female"))
# get first group: returns tuple
g = next(group_gen)
# show g
display(g[1])
Generators allow you to lazily load only what you need.
To load the first N groups, call next N times:
gs = pd.concat((next(group_gen)[1] for _ in range(N)), ignore_index=True)
display(gs)
To follow up on the nice answer from @Prayson, you can use itertools.islice to get only the first few groups:
from itertools import islice
n = 5
top_n = list(islice(df.groupby('female'), n))
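As a self-contained sketch, with a made-up toy DataFrame standing in for the asker's data:

```python
import pandas as pd
from itertools import islice

df = pd.DataFrame({"female": [0, 1, 0, 1, 1],
                   "score": [10, 20, 30, 40, 50]})

n = 1
top_n = list(islice(df.groupby("female"), n))  # only the first n (key, group) pairs
for key, group in top_n:
    print(key)
    print(group)
```

Because islice stops after n pairs, pandas never has to materialize or render the remaining groups.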
I have a NumPy array of values, and I need to identify groups of consecutive values within the array.
I've tried writing a "for" loop to do this, but I'm running into a host of issues. So I've looked at the documentation for groupby in itertools. I've never used this before, and I'm more than a little confused by the documentation, so here I am.
Could someone give a more "layman speak" explanation of how to use groupby? I don't need a sample code, per se, just a more thorough explanation of the documentation.
A good way to do this is with a generator that groups the values (though it might not be the fastest method):
def groupings(a):
    g = []
    for val in a:
        if not g:
            g.append(val)
        elif abs(g[-1] - val) <= 1.00001:
            g.append(val)
        else:
            yield g
            g = [val]  # start the next group with the current value
    if g:
        yield g        # don't drop the final group

print(list(groupings(my_numpy_array)))
I know this doesn't give you a layman's explanation of groupby (group consecutive items that match some criterion); it would be somewhat painful for this type of application.
I'm tracking a large number of objects in a CSV file and am trying to figure out how many events have happened in a rolling 5-minute interval (for each of a set of players). I'm using a defaultdict to store the event times and then counting the number of stored events to give me my rolling total. Each time it reads a line from the CSV, it's supposed to check the timestamps in the defaultdict for that player and, if any of the times are more than 5 minutes (300 seconds) old, remove them from the defaultdict. It seems to mostly work, but the count never goes all the way down to 0 (when more than 5 minutes have passed between events for a player). Hoping someone can tell me what I'm doing wrong here:
fishrollingmeanqueue = defaultdict(list)

def fishInLastNSeconds(num_seconds, ts, player):  # num_seconds is 300 elsewhere; ts is the event timestamp
    curTime = timestampToEpoch(ts)
    fishrollingmeanqueue[player].append(curTime)
    for elt in fishrollingmeanqueue[player]:
        if elt < (curTime - num_seconds):
            fishrollingmeanqueue[player].remove(elt)
    return str(len(fishrollingmeanqueue[player]))
The issue you're encountering is that you're modifying the list at the same time you are iterating over it. This doesn't work right, because list iterators go by index, and the index of later items changes when an earlier one is removed.
As an example, consider a three-element list, lst=[a,b,c]. When you iterate over it with for elt in lst, Python will create an iterator, which is initially at index 0. On the first pass through the loop, elt will be a reference to a, the object at that index. If, within the loop, you remove a with lst.remove(elt), the list will now be [b,c]. On the next pass, the iterator will be pointing at index 1, and you'll get c as elt. The second element of the original list, b, will have been skipped.
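A minimal demonstration of that skip (the values are illustrative):

```python
lst = ["a", "b", "c"]
visited = []
for elt in lst:
    visited.append(elt)
    if elt == "a":
        lst.remove(elt)  # shifts "b" and "c" down one index

print(visited)  # ['a', 'c'] -- "b" was never visited
print(lst)      # ['b', 'c']
```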
There are a few ways to fix this. Often the best approach is to create a new list with only the items you want to keep, and then replace the old list with the new one:
new_q = [elt for elt in fishrollingmeanqueue[player] if elt >= curTime - num_seconds]
fishrollingmeanqueue[player] = new_q
Other options are to iterate on a copy of the list, or to iterate in reverse so that the indexes of the values yet to be seen won't change.
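For example, iterating over a copy (a full slice) keeps the iterator safe from the removals; a small sketch with made-up numbers:

```python
lst = [100, 250, 400, 120, 500]
cutoff = 300
for elt in lst[:]:  # lst[:] is a shallow copy, so removals from lst don't shift the iterator
    if elt < cutoff:
        lst.remove(elt)

print(lst)  # [400, 500]
```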
I have a list like:
["asdf-1-bhd","uuu-2-ggg","asdf-2-bhd","uuu-1-ggg","asdf-3-bhd"]
that I want to split into the two groups whose elements are equal after I remove the number:
"asdf-1-bhd", "asdf-2-bhd", "asdf-3-bhd"
"uuu-2-ggg" , uuu-1-ggg"
I have been using itertools.groupby with
for key, group in itertools.groupby(elements, key=lambda x: removeIndexNumber(x)):
but this does not work when the elements to be grouped are not consecutive.
I have thought about using list comprehensions, but this seems impossible since the number of groups is not fixed.
tl;dr:
I want to group stuff, with two problems:
I don't know the number of chunks I will obtain
The elements that will be grouped into a chunk might not be consecutive
Why don't you think about it a bit differently? You can map everything into a dict:
import re
from collections import defaultdict

regex = re.compile(r'([a-z]+-)\d(-[a-z]+)')
t = ["asdf-1-bhd", "uuu-2-ggg", "asdf-2-bhd", "uuu-1-ggg", "asdf-3-bhd"]
maps = defaultdict(list)
for x in t:
    parts = regex.match(x).groups()
    maps[parts[0] + parts[1]].append(x)
print(list(maps.values()))
Output:
[['asdf-1-bhd', 'asdf-2-bhd', 'asdf-3-bhd'], ['uuu-2-ggg', 'uuu-1-ggg']]
This is really fast because you don't have to compare one thing to another.
Edit:
On Thinking differently
Your original approach was to iterate through each item and compare them to one another. This is overcomplicated and unnecessary.
Let's consider what my code does. First it gets the stripped down version:
"asdf-1-bhd" -> "asdf--bhd"
"uuu-2-ggg" -> "uuu--ggg"
"asdf-2-bhd" -> "asdf--bhd"
"uuu-1-ggg" -> "uuu--ggg"
"asdf-3-bhd" -> "asdf--bhd"
You can already start to see the groups, and we haven't compared anything yet!
We now do a sort of reverse mapping. We take the stripped-down version on the right and make it a key, and we put each original string on the left into the list mapped by its stripped-down value:
'asdf--bhd' -> ['asdf-1-bhd', 'asdf-2-bhd', 'asdf-3-bhd']
'uuu--ggg' -> ['uuu-2-ggg', 'uuu-1-ggg']
And there we have our groups defined by their common computed value (key). This will work for any amount of elements and groups.
Ok, simple solution (it must be too late over here):
Use itertools.groupby, but sort the list first.
As for the example given above:
elements = ["asdf-1-bhd","uuu-2-ggg","asdf-2-bhd","uuu-1-ggg","asdf-3-bhd"]
elemens.sort(key = lambda x : removeIndex(x))
for key, group in itertools.groupby(elements, key= lambda x : removeIndexNumber(x)):
for element in group:
# do stuff
As you can see, the condition for sorting is the same as for grouping. That way, the elements that will eventually have to be grouped are first put into consecutive order. After this has been done, itertools.groupby can work properly.
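Putting the sort-then-group idea together with a concrete stand-in for removeIndexNumber (the real helper isn't shown in the question, so this split-based version is an assumption):

```python
from itertools import groupby

def remove_index_number(s):
    # assumed stand-in for removeIndexNumber: drop the middle numeric field
    first, _, last = s.split("-")
    return first + "-" + last

elements = ["asdf-1-bhd", "uuu-2-ggg", "asdf-2-bhd", "uuu-1-ggg", "asdf-3-bhd"]
elements.sort(key=remove_index_number)  # make equal-keyed elements consecutive
grouped = [list(g) for _, g in groupby(elements, key=remove_index_number)]
print(grouped)
# [['asdf-1-bhd', 'asdf-2-bhd', 'asdf-3-bhd'], ['uuu-2-ggg', 'uuu-1-ggg']]
```

Note that the sort is stable, so elements within each group keep their original relative order.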
I'm trying to create a list of tasks that I've read from some text files and put them into lists. I want to create a master list of what I'm going to do through the day however I've got a few rules for this.
One list has separate daily tasks that don't depend on the order they are completed. I call this list 'daily'. I've got another list of tasks for my projects, but these do depend on the order completed. This list is called 'projects'. I have a third list of things that must be done at the end of the day. I call it 'endofday'.
So here are the basic rules.
I want a randomized master list where the daily tasks can be performed in any order, the project tasks may be randomly inserted into the main list at any position but must stay in their original order relative to each other, and the end-of-day tasks are appended to the main list.
I understand how to get a random number from random.randint(), append to lists, read files and all that... but the logic is giving me a case of 'hurty brain'. Anyone want to take a crack at this?
EDIT:
Ok I solved it on my own, but at least asking the question got me to picture it in my head. Here's what I did.
random.shuffle(daily)
while projects:
    daily.insert(random.randint(0, len(daily)), projects.pop(0))
random.shuffle(endofday)
daily.extend(endofday)
for x in daily:
    print(x)
Thanks for the answers, I'll give ya guys some kudos anyways!
EDIT AGAIN:
Crap I just realized that's not the right answer lol
LAST EDIT I SWEAR:
position = []
random.shuffle(daily)
for x in range(len(projects)):
    position.append(random.randint(0, len(daily) + x))
position.sort()
while projects:
    daily.insert(position.pop(0), projects.pop(0))
random.shuffle(endofday)
daily.extend(endofday)
for x in daily:
    print(x)
I LIED:
I just thought about what happens when position has duplicate values and lo and behold my first test returned 1,3,2,4 for my projects. I'm going to suck it up and use the answerer's solution lol
OR NOT:
position = []
random.shuffle(daily)
for x in range(len(projects)):
    while 1:
        pos = random.randint(0, len(daily) + x)
        if pos not in position:
            break
    position.append(pos)
position.sort()
while projects:
    daily.insert(position.pop(0), projects.pop(0))
random.shuffle(endofday)
daily.extend(endofday)
for x in daily:
    print(x)
First, copy and shuffle daily to initialize master:
master = list(daily)
random.shuffle(master)
then (the interesting part!-) the alteration of master (to insert projects randomly but without order changes), and finally random.shuffle(endofday); master.extend(endofday).
As I said the alteration part is the interesting one -- what about:
def random_mix(seq_a, seq_b):
    iters = [iter(seq_a), iter(seq_b)]
    while True:
        it = random.choice(iters)
        try:
            yield next(it)
        except StopIteration:
            # one sequence is done: yield the rest of the other and finish
            iters.remove(it)
            yield from iters[0]
            return
Now, the mixing step becomes just master = list(random_mix(master, projects))
Performance is not ideal (lots of random numbers generated here, we could do with fewer, for example), but fine if we're talking about a few dozens or hundreds of items for example.
This insertion randomness is not ideal -- for that, the choice between the two sequences should not be equiprobable, but rather with probability proportional to their lengths. If that's important to you, let me know with a comment and I'll edit to fix the issue, but I wanted first to offer a simpler and more understandable version!-)
Edit: thanks for the accept, let me complete the answer anyway with a different way of "random mixing preserving order" which does use the right probabilities -- it's only slightly more complicated because it cannot just call random.choice;-).
def random_mix_rp(seq_a, seq_b):
    iters = [iter(seq_a), iter(seq_b)]
    lens = [len(seq_a), len(seq_b)]
    while sum(lens):
        r = random.randrange(sum(lens))
        itindex = int(r >= lens[0])  # picks seq_a with probability lens[0]/sum(lens)
        it = iters[itindex]
        lens[itindex] -= 1
        try:
            yield next(it)
        except StopIteration:  # only reachable if the tracked lengths disagree with the iterators
            iters.remove(it)
            yield from iters[0]
            return
Of course other optimization opportunities arise here -- since we're tracking the lengths anyway, we could rely on a length having gone down to zero rather than on try/except to detect that one sequence is finished and we should just exhaust the other one, etc etc. But, I wanted to show the version closest to my original one. Here's one exploiting this idea to optimize and simplify:
def random_mix_rp1(seq_a, seq_b):
    iters = [iter(seq_a), iter(seq_b)]
    lens = [len(seq_a), len(seq_b)]
    while all(lens):
        r = random.randrange(sum(lens))
        itindex = int(r >= lens[0])  # picks seq_a with probability lens[0]/sum(lens)
        it = iters[itindex]
        lens[itindex] -= 1
        yield next(it)
    for it in iters:  # one sequence is exhausted; flush the remainder of the other
        yield from it
Use random.shuffle to shuffle a list. Note that it shuffles in place and returns None, so shuffle a named list rather than a literal:
tasks = ["x", "y", "z"]
random.shuffle(tasks)
How to fetch a random element from a list in Python:
>>> import random
>>> li = ["a", "b", "c"]
>>> ran = li[random.randint(0, len(li) - 1)]
>>> ran
'b'
(random.choice(li) does the same thing more directly.)
But it seems you're more curious about how to design this. If so, the python tag probably shouldn't be there. If not, the question is probably too broad to get you any good answers code-wise.
Combine all 3 lists into a DAG
Perform all possible topological sorts, store each sort in a list (note that the number of topological sorts can grow exponentially, so this is only practical for small task lists)
Choose one from the list at random
In order for the elements of the "project" list to stay in order, you could do the following:
Say you have 4 project tasks: "a,b,c,d". Then you know there are five spots where other, randomly chosen elements can be inserted (before and after each element, including the beginning and the end), while the ordering naturally stays the same.
Next, you can add four special elements (e.g. "-:-") to the daily list, one for each project task. When you now shuffle the daily list, these special items are randomly placed. Now you simply replace the special elements "-:-", from left to right, with the elements of the projects list in order. You keep the ordering, yet have a completely random list with regard to the tasks from the daily list.
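A small sketch of this placeholder idea (the task names are made up for illustration):

```python
import random

daily = ["email", "standup", "gym", "lunch"]
projects = ["a", "b", "c", "d"]
endofday = ["backup", "log hours"]

PLACEHOLDER = "-:-"
pool = daily + [PLACEHOLDER] * len(projects)  # one placeholder per project task
random.shuffle(pool)

# replace the placeholders, left to right, with the project tasks in order
project_iter = iter(projects)
master = [next(project_iter) if t == PLACEHOLDER else t for t in pool]

random.shuffle(endofday)
master.extend(endofday)
print(master)
```

The project tasks always appear in their original relative order, because the k-th placeholder from the left is always filled with the k-th project task.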