Grouping images by time interval in Python

I have a script that extracts the EXIF data from images and puts it into a list, which I then sort. What I end up with is a list of lists: the first position is the image time in seconds and the second is the image path. Here is my list:
[[32372, 'F:\rubish\VOL1\cam\G0013025.JPG'], [32373, 'F:\rubish\VOL1\cam\G0013026.JPG'], [32373, 'F:\rubish\VOL1\cam\G0013027.JPG'],.... etc etc etc
Below is the grouping script, written by blhsing, which works great, but I want to start my grouping not from the first image; I want grouping to start at given positions.
Here is that script:
groups = []
for r in img:
    if groups and r[0] - groups[-1][-1][0] <= 5:
        groups[-1].append(r)
    else:
        groups.append([r])
for g in groups:
    print(g[0][1], g[0][0], g[-1][0], g[-1][1])
And here is what I have now. It does not work well: it picks up only one image and does not create a group. Can somebody please help me fix it?
groups = []
print(iii, "iii")
#print(min_list, " my min list ")
img.sort()
cnt = 0
mili = [32372, 34880]
for n in min_list:
    #print(n, "mili")
    for i in img:
        #print(i[0])
        if n == i[0]:
            if groups and i[0] - groups[-1][-1][0] <= 5:
                groups[-1].append(i)
            else:
                groups.append([i])
for ii in groups:
    print(ii[0][1], ii[0][0], ii[-1][0], ii[-1][1])
Here my min_list has two entries, meaning I want to create only two groups and classify only the images starting from those two positions, with the same 5-second interval as before.
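In other words, something along these lines is what I am after (a sketch of the intent only, assuming img is already sorted by time and min_list holds the timestamps at which each group should start):
groups = []
for start in min_list:
    group = []
    for t, path in img:
        if t < start:                # skip images before this starting position
            continue
        if not group or t - group[-1][0] <= 5:
            group.append([t, path])  # still within 5 seconds of the previous image
        else:
            break                    # a larger gap ends this group
    if group:
        groups.append(group)
for g in groups:
    print(g[0][1], g[0][0], g[-1][0], g[-1][1])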

Since your img list is sorted by time already, you can iterate through the records and append them to the last sub-list of the output list (named groups in my example code) if the time difference to the last entry is no more than 5 seconds; otherwise put the record into a new sub-list of the output list. Keep in mind that in Python a subscript of -1 means the last item in a list.
groups = []
for r in img:
    if groups and r[0] - groups[-1][-1][0] <= 5:
        groups[-1].append(r)
    else:
        groups.append([r])
for g in groups:
    print(g[0][1], g[0][0], g[-1][0], g[-1][1])
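With the three sample records from the question plus two hypothetical later timestamps, the final loop prints one line per group: first path, first time, last time, last path.
img = [[32372, r'F:\rubish\VOL1\cam\G0013025.JPG'],
       [32373, r'F:\rubish\VOL1\cam\G0013026.JPG'],
       [32373, r'F:\rubish\VOL1\cam\G0013027.JPG'],
       [32380, r'F:\rubish\VOL1\cam\G0064646.JPG'],   # hypothetical extra records
       [32381, r'F:\rubish\VOL1\cam\G0064647.JPG']]
# Running the snippet above on this list prints:
# F:\rubish\VOL1\cam\G0013025.JPG 32372 32373 F:\rubish\VOL1\cam\G0013027.JPG
# F:\rubish\VOL1\cam\G0064646.JPG 32380 32381 F:\rubish\VOL1\cam\G0064647.JPG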

Sure! I just actually wrote this same algorithm the other day, but for JavaScript. Easy to port to Python...
import pprint

def group_seq(data, predicate):
    groups = []
    current_group = None
    for datum in data:
        if current_group:
            if not predicate(current_group[-1], datum):  # Abandon the group
                current_group = None
        if not current_group:  # Need to start a new group
            current_group = []
            groups.append(current_group)
        current_group.append(datum)
    return groups

data = [
    [32372, r'F:\rubish\VOL1\cam\G0013025.JPG'],
    [32373, r'F:\rubish\VOL1\cam\G0013026.JPG'],
    [32373, r'F:\rubish\VOL1\cam\G0013027.JPG'],
    [32380, r'F:\rubish\VOL1\cam\G0064646.JPG'],
    [32381, r'F:\rubish\VOL1\cam\G0064646.JPG'],
]

groups = group_seq(
    data=data,
    predicate=lambda a, b: abs(a[0] - b[0]) <= 5,  # True when b belongs in a's group
)
pprint.pprint(groups)
outputs
[[[32372, 'F:\\rubish\\VOL1\\cam\\G0013025.JPG'],
[32373, 'F:\\rubish\\VOL1\\cam\\G0013026.JPG'],
[32373, 'F:\\rubish\\VOL1\\cam\\G0013027.JPG']],
[[32380, 'F:\\rubish\\VOL1\\cam\\G0064646.JPG'],
[32381, 'F:\\rubish\\VOL1\\cam\\G0064646.JPG']]]
Basically the predicate is a function that should return True if b belongs in the same group as a; for your use case, we look at the (absolute) difference of the first items in the tuples/lists, which is the timestamp.
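For instance, the same helper can group records that share an identical timestamp simply by swapping the predicate (a hypothetical variation, not part of the original answer):
same_second = group_seq(
    data=data,
    predicate=lambda a, b: a[0] == b[0],  # True -> b stays in a's group
)
# With the sample data this yields four groups:
# [[32372, ...]], [[32373, ...], [32373, ...]], [[32380, ...]], [[32381, ...]]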

Remove list in lists that satisfied the condition

I'm trying to make a quick OCR for a specific use. I know I should have just written a preprocessor for a normal OCR and that would have been faster, but this idea came to me first and I figured I should just try it anyway. The program takes a picture of a region of the screen and identifies the number within it; right now it only handles 0 and 1, but I've been working on it and am stuck on some problems. Here is my code:
while True:
    if keyboard.is_pressed('`'):
        Firstlist = list(pyautogui.locateAllOnScreen(image[0], confidence=0.95, region=(1570, 990, 230, 70)))
        print(len(Firstlist))
        Firstlist1 = list(pyautogui.locateAllOnScreen(image1, confidence=0.95, region=(1570, 990, 230, 70))) + Firstlist
        print(len(Firstlist1))
        print(Firstlist)
        if len(Firstlist) > 0:
            print(Firstlist[0][0])
        # compare the first coordinate of each match and eliminate duplicates that differ by fewer than 5 x pixels
        break
This identifies a predetermined set of numbers on screen, and right now it gives me a set of coordinates for the number zero (please ignore the other parts of the result, that's just me playing around). The problem is that pyautogui.locateAllOnScreen sometimes generates duplicate values for the same picture, with coordinates differing by roughly 0-5 pixels, if the confidence level isn't set just right.
Example:
The value is supposed to be [(1655,1024,20,26), (1675,1024,20,26)], but it will yield a third value, like [(1655,1024,20,26), (1658,1024,20,26), (1675,1024,20,26)].
That's why I'm trying to correct for this. Is there any way to check whether the x value of the second, duplicate coordinate is within 0-5 pixels of the first coordinate and just delete it, moving the rest up the ladder so that the numbers come out right and in order? Thank you!
Note: I'm still learning about removing items from lists on my own, and reading about removal with lambdas looks like gibberish to me. Please forgive me if something is wrong. Have a good day, y'all!
You can try this.
if len(Firstlist) > 2:
    elems = [f[0] for f in Firstlist]  # create a list of just first index
    i = 0
    while i < len(elems) - 1:  # iterate through the list with i
        j = i + 1
        while j < len(elems):  # iterate through the rest of the list with j
            if abs(elems[i] - elems[j]) <= 5:  # if item at index i is within 5 pixels of item at index j
                del elems[j]  # delete item j and continue
            else:
                j += 1  # otherwise move to next item
        i += 1  # Move to next i item

list1 = [(1655, 1024, 20, 26), (1658, 1024, 20, 26), (1675, 1024, 20, 26)]
x = [list1[0]] + [x for x in list1 if abs(list1[0][0] - x[0]) > 5]
print(x)
Output:
[(1655, 1024, 20, 26), (1675, 1024, 20, 26)]
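A similar pass that keeps the full boxes rather than only the x values could look like this (a sketch, assuming Firstlist holds 4-tuples like the ones above):
deduped = []
for box in sorted(Firstlist, key=lambda b: b[0]):
    if not deduped or box[0] - deduped[-1][0] > 5:  # keep only boxes more than 5 px apart
        deduped.append(box)
print(deduped)  # for the example list: [(1655, 1024, 20, 26), (1675, 1024, 20, 26)]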

How to reduce a collection of ranges to a minimal set of ranges [duplicate]

This question already has answers here:
Union of multiple ranges
(5 answers)
Closed 7 years ago.
I'm trying to remove overlapping values from a collection of ranges.
The ranges are represented by a string like this:
499-505 100-115 80-119 113-140 500-550
I want the above to be reduced to two ranges: 80-140 499-550. That covers all the values without overlap.
Currently I have the following code.
cr = "100-115 115-119 113-125 80-114 180-185 500-550 109-120 95-114 200-250".split(" ")
ar = []
br = []
for i in cr:
(left,right) = i.split("-")
ar.append(left);
br.append(right);
inc = 0
for f in br:
i = int(f)
vac = []
jnc = 0
for g in ar:
j = int(g)
if(i >= j):
vac.append(j)
del br[jnc]
jnc += jnc
print vac
inc += inc
I split each entry on - and store the range limits in ar and br. I iterate over these limits pairwise, and if i is at least as great as j, I want to delete the element. But the program doesn't work. I expect it to produce this result: 80-125 500-550 200-250 180-185
For a quick and short solution,
from operator import itemgetter
from itertools import groupby

cr = "499-505 100-115 80-119 113-140 500-550".split(" ")
fullNumbers = []
for i in cr:
    a = int(i.split("-")[0])
    b = int(i.split("-")[1])
    fullNumbers += range(a, b + 1)

# Remove duplicates and sort it
fullNumbers = sorted(list(set(fullNumbers)))

# Taken From http://stackoverflow.com/questions/2154249
def convertToRanges(data):
    result = []
    for k, g in groupby(enumerate(data), lambda (i, x): i - x):
        group = map(itemgetter(1), g)
        result.append(str(group[0]) + "-" + str(group[-1]))
    return result

print convertToRanges(fullNumbers)
#Output: ['80-140', '499-550']
#Output: ['80-140', '499-550']
For the given set in your program, output is ['80-125', '180-185', '200-250', '500-550']
The main possible drawback of this solution: it may not be scalable!
Let me offer another solution that doesn't take time proportional to the sum of the range sizes. Apart from the initial sort, its running time is linearly proportional to the number of ranges.
def reduce(range_text):
    parts = range_text.split()
    if parts == []:
        return ''
    ranges = [tuple(map(int, part.split('-'))) for part in parts]
    ranges.sort()
    new_ranges = []
    left, right = ranges[0]
    for range in ranges[1:]:
        next_left, next_right = range
        if right + 1 < next_left:             # Is the next range to the right?
            new_ranges.append((left, right))  # Close the current range.
            left, right = range               # Start a new range.
        else:
            right = max(right, next_right)    # Extend the current range.
    new_ranges.append((left, right))          # Close the last range.
    return ' '.join(['-'.join(map(str, range)) for range in new_ranges])
This function works by sorting the ranges, then looking at them in order and merging consecutive ranges that intersect.
Examples:
print(reduce('499-505 100-115 80-119 113-140 500-550'))
# => 80-140 499-550
print(reduce('100-115 115-119 113-125 80-114 180-185 500-550 109-120 95-114 200-250'))
# => 80-125 180-185 200-250 500-550
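Note the right + 1 in the comparison: it also merges ranges that merely touch, not only ones that overlap. A hypothetical call illustrating this:
print(reduce('1-5 6-10'))
# => 1-10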

Python 3.3.2 - 'Grouping' System with Characters

I have a fun little problem.
I need to count the number of 'groups' of characters in a file. Say the file is...
..##.#..#
##..####.
.........
###.###..
##...#...
The code will then count the number of groups of #'s. For example, the above would be 3. Groups include diagonal connections. Here is my code so far:
build = []
height = 0
with open('file.txt') as i:
    build.append(i)
    height += 1
length = len(build[0])
dirs = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1),
        'upleft': (-1, -1), 'upright': (-1, 1), 'downleft': (1, -1), 'downright': (1, 1)}

def find_patches(grid, length):
    queue = []
    queue.append((0, 0))
    patches = 0
    while queue:
        current = queue.pop(0)
        line, cell = path[-1]
        if  ## This is where I am at. I was making a pathfinding system.
Here’s a naive solution I came up with. Originally I just wanted to loop through all the elements once and check, for each one, whether I can put it into an existing group. That didn’t work, however, as some groups are only combined later (e.g. the first # in the second row would not belong to the big group until the second # in that row is processed). So I started working on a merge algorithm and then figured I could just do that from the beginning.
So how this works now is that I put every # into its own group. Then I keep looking at combinations of two groups and check if they are close enough to each other that they belong to the same group. If that’s the case, I merge them and restart the check. If I completely looked at all possible combinations and could not merge any more, I know that I’m done.
from itertools import combinations, product

def canMerge(g, h):
    for i, j in g:
        for x, y in h:
            if abs(i - x) <= 1 and abs(j - y) <= 1:
                return True
    return False

def findGroups(field):
    # initialize one-element groups
    groups = [[(i, j)] for i, j in product(range(len(field)), range(len(field[0]))) if field[i][j] == '#']
    # keep joining until no more joins can be executed
    merged = True
    while merged:
        merged = False
        for g, h in combinations(groups, 2):
            if canMerge(g, h):
                g.extend(h)
                groups.remove(h)
                merged = True
                break
    return groups

# initialize field
field = '''\
..##.#..#
##..####.
.........
###.###..
##...#...'''.splitlines()

groups = findGroups(field)
print(len(groups))  # 3
I'm not exactly sure what your code is trying to do. Your with statement opens a file, but all you do is append the file object to a list before the with ends and it gets closed (without its contents ever being read). I suspect this is not what you intend, but I'm not sure what you were aiming for.
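For completeness, reading the grid into a list of strings might look like this (a sketch of the presumed intent, not the original code):
with open('file.txt') as f:
    build = [line.rstrip('\n') for line in f]  # one string per row of the grid
height = len(build)
length = len(build[0]) if build else 0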
If I understand your problem correctly, you are trying to count the connected components of a graph. In this case, the graph's vertices are the '#' characters, and the edges are wherever such characters are adjacent to each other in any direction (horizontally, vertically or diagonally).
There are pretty simple algorithms for solving that problem. One is to use a disjoint set data structure (also known as a "union-find" structure, since union and find are the two operations it supports) to connect groups of '#' characters together as they're read in from the file.
Here's a fairly minimal disjoint set I wrote to answer another question a while ago:
class UnionFind:
    def __init__(self):
        self.rank = {}
        self.parent = {}

    def find(self, element):
        if element not in self.parent:  # leader elements are not in `parent` dict
            return element
        leader = self.find(self.parent[element])  # search recursively
        self.parent[element] = leader  # compress path by saving leader as parent
        return leader

    def union(self, leader1, leader2):
        rank1 = self.rank.get(leader1, 1)
        rank2 = self.rank.get(leader2, 1)
        if rank1 > rank2:  # union by rank
            self.parent[leader2] = leader1
        elif rank2 > rank1:
            self.parent[leader1] = leader2
        else:  # ranks are equal
            self.parent[leader2] = leader1  # favor leader1 arbitrarily
            self.rank[leader1] = rank1 + 1  # increment rank
And here's how you can use it for your problem, using x, y tuples for the nodes:
nodes = set()
groups = UnionFind()

with open('file.txt') as f:
    for y, line in enumerate(f):  # iterate over lines
        for x, char in enumerate(line):  # and characters within a line
            if char == '#':
                nodes.add((x, y))  # maintain a set of node coordinates
                # check for neighbors that have already been read
                neighbors = [(x-1, y-1),  # up-left
                             (x, y-1),    # up
                             (x+1, y-1),  # up-right
                             (x-1, y)]    # left
                for neighbor in neighbors:
                    if neighbor in nodes:
                        my_group = groups.find((x, y))
                        neighbor_group = groups.find(neighbor)
                        if my_group != neighbor_group:
                            groups.union(my_group, neighbor_group)

# finally, count the number of unique groups
number_of_groups = len(set(groups.find(n) for n in nodes))
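As a quick self-contained check of the same logic on the sample grid from the question (a sketch using io.StringIO instead of a file; it assumes the UnionFind class above is defined):
import io

sample = io.StringIO(
    "..##.#..#\n"
    "##..####.\n"
    ".........\n"
    "###.###..\n"
    "##...#...\n"
)
nodes = set()
groups = UnionFind()
for y, line in enumerate(sample):
    for x, char in enumerate(line):
        if char == '#':
            nodes.add((x, y))
            # same four already-read neighbors as above
            for neighbor in [(x-1, y-1), (x, y-1), (x+1, y-1), (x-1, y)]:
                if neighbor in nodes:
                    a, b = groups.find((x, y)), groups.find(neighbor)
                    if a != b:
                        groups.union(a, b)
print(len(set(groups.find(n) for n in nodes)))  # 3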

Python interval intersection

My problem is as follows:
I have a file with a list of intervals:
1 5
2 8
9 12
20 30
And a range of
0 200
I would like to compute an intersection that reports the positions [start end] lying between my intervals, inside the given range.
For example:
8 9
12 20
30 200
Besides any ideas on how to approach this, it would also be nice to read some thoughts on optimization, since, as always, the input files are going to be huge.
This solution works as long as the intervals are ordered by their start point, and it does not require creating a list as big as the total range.
code
with open("0.txt") as f:
t=[x.rstrip("\n").split("\t") for x in f.readlines()]
intervals=[(int(x[0]),int(x[1])) for x in t]
def find_ints(intervals, mn, mx):
next_start = mn
for x in intervals:
if next_start < x[0]:
yield next_start,x[0]
next_start = x[1]
elif next_start < x[1]:
next_start = x[1]
if next_start < mx:
yield next_start, mx
print list(find_ints(intervals, 0, 200))
output:
(in the case of the example you gave)
[(0, 1), (8, 9), (12, 20), (30, 200)]
Rough algorithm:
Create an array of booleans, all set to False: seen = [False]*200.
Iterate over the input file; for each line start end, set seen[start] .. seen[end] to True.
Once done, you can trivially walk the array to find the unused intervals.
In terms of optimisations, if the list of input ranges is sorted on start number, then you can track the highest seen number and use that to filter ranges as they are processed -
e.g. something like
for (start, end) in input:
    if end <= lowest_unseen:
        continue
    if start < lowest_unseen:
        start = lowest_unseen
    ...
which (ignoring the cost of the original sort) should make the whole thing O(n) - you go through the array once to tag seen/unseen and once to output unseens.
Seems I'm feeling nice. Here is the (unoptimised) code, assuming your input file is called input
seen = [False]*200
file = open('input', 'r')
rows = file.readlines()

for row in rows:
    (start, end) = row.split(' ')
    print "%s %s" % (start, end)
    for x in range(int(start)-1, int(end)-1):
        seen[x] = True

print seen[0:10]

in_unseen_block = False
start = 1
for x in range(1, 200):
    val = seen[x-1]
    if val and not in_unseen_block:
        continue
    if not val and in_unseen_block:
        continue
    # Must be at a change point.
    if val:
        # we have reached the end of the block
        print "%s %s" % (start, x)
        in_unseen_block = False
    else:
        # start of new block
        start = x
        in_unseen_block = True

# Handle end block
if in_unseen_block:
    print "%s %s" % (start, 200)
I'm leaving the optimizations as an exercise for the reader.
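For reference, a sketch of the filtering idea described above, applied while marking the array (an illustration only; it assumes rows has been sorted by start value and keeps the same marking convention as the code above):
lowest_unseen = 0
for row in rows:
    (start, end) = (int(v) for v in row.split(' '))
    if end <= lowest_unseen:
        continue                         # this range is already fully covered
    start = max(start, lowest_unseen)    # skip the part that is already marked
    for x in range(start - 1, end - 1):  # same marking convention as above
        seen[x] = True
    lowest_unseen = end                  # highest value covered so far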
If you make a note every time one of your input intervals either opens or closes, you can collect the keys of opens and closes into one ordered set; then each adjacent pair of numbers in that set forms an interval, and you can focus all of your logic on these intervals as discrete chunks.
myRange = range(201)
intervals = [(1, 5), (2, 8), (9, 12), (20, 30)]
opens = {}
closes = {}

def open(index):
    if index not in opens:
        opens[index] = 0
    opens[index] += 1

def close(index):
    if index not in closes:
        closes[index] = 0
    closes[index] += 1

for start, end in intervals:
    if end > start:  # Making sure to exclude empty intervals, which can be problematic later
        open(start)
        close(end)

# Sort all the interval-endpoints that we really need to look at
oset = {0: None, 200: None}
for k in opens.keys():
    oset[k] = None
for k in closes.keys():
    oset[k] = None
relevant_indices = sorted(oset.keys())

# Find the clear ranges
state = 0
results = []
for i in range(len(relevant_indices) - 1):
    start = relevant_indices[i]
    end = relevant_indices[i+1]
    start_state = state
    if start in opens:
        start_state += opens[start]
    if start in closes:
        start_state -= closes[start]
    end_state = start_state
    if end in opens:
        end_state += opens[end]
    if end in closes:
        end_state -= closes[end]
    state = end_state
    if start_state == 0:
        result_start = start
        result_end = end
        results.append((result_start, result_end))

for start, end in results:
    print(str(start) + " " + str(end))
This outputs:
0 1
8 9
12 20
30 200
The intervals don't need to be sorted.
This question seems to be a duplicate of Merging intervals in Python.
If I understood the problem correctly, you have a list of intervals (1 5; 2 8; 9 12; 20 30) and a range (0 200), and you want to get the positions outside your intervals but inside the given range. Right?
There's a Python library that can help you on that: python-intervals (also available from PyPI using pip). Disclaimer: I'm the maintainer of that library.
Assuming you import this library as follows:
import intervals as I
It's quite easy to get your answer. Basically, you first want to create a disjunction of intervals based on the ones you provide:
inters = I.closed(1, 5) | I.closed(2, 8) | I.closed(9, 12) | I.closed(20, 30)
Then you compute the complement of these intervals, to get everything that is "outside":
compl = ~inters
Then you create the union with [0, 200], as you want to restrict the points to that interval:
print(compl & I.closed(0, 200))
This results in:
[0,1) | (8,9) | (12,20) | (30,200]
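Putting those steps together, with the intervals read from a file as in the question, might look roughly like this (a sketch using only the calls shown above; the file name is hypothetical):
import intervals as I

inters = None
with open('intervals.txt') as f:
    for line in f:
        lo, hi = (int(v) for v in line.split())
        piece = I.closed(lo, hi)
        inters = piece if inters is None else inters | piece

print(~inters & I.closed(0, 200))
# [0,1) | (8,9) | (12,20) | (30,200]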

Split a list of dates by another list of dates

I have a number of nodes in a network. The nodes send status information every hour to indicate that they are alive. So I have a list of nodes and the time when each was last alive. I want to graph the number of alive nodes over time.
The list of nodes is sorted by the time they were last alive, but I can't figure out a nice way to count how many are alive at each date.
from datetime import datetime, timedelta
seen = [ n.last_seen for n in c.nodes ] # a list of datetimes
seen.sort()
start = seen[0]
end = seen[-1]
diff = end - start
num_points = 100
step = diff / num_points
num = len( c.nodes )
dates = [ start + i * step for i in range( num_points ) ]
What I want is basically
alive = [ len([ s for s in seen if s > date]) for date in dates ]
but that's not really efficient. The solution should use the fact that the seen list is sorted and not loop over the whole list for every date.
This generator traverses the list only once:
def get_alive(seen, dates):
    c = len(seen)
    for date in dates:
        for s in seen[-c:]:
            if s >= date:  # replaced your > with >= as it seems to make more sense
                yield c
                break
            else:
                c -= 1
The Python bisect module will find the correct index for you, and from that you can deduce the number of items before and after.
If I'm understanding right, that would be O(len(dates) * log(len(seen))).
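A minimal sketch of that idea (an illustration only; it assumes seen is sorted ascending and counts a node as alive when its last-seen time is on or after the date; the small numeric sample is borrowed from the answer below):
import bisect

def alive_counts(seen, dates):
    # Everything from the insertion point onward is still alive at that date.
    return [len(seen) - bisect.bisect_left(seen, date) for date in dates]

# e.g. alive_counts([-1, 10, 12, 15, 20, 75], [5, 15, 25, 50, 100]) -> [5, 3, 1, 1, 0]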
Edit 1
It should be possible to do in one pass, just like SilentGhost demonstrates. However, itertools.groupby works fine with sorted data; it should be able to do something here, perhaps like this (this is more than O(n) but could be improved):
import itertools

# numbers are easier to make up now
seen = [-1, 10, 12, 15, 20, 75]
dates = [5, 15, 25, 50, 100]

def finddate(s, dates):
    """Find the first date in #dates larger than s"""
    for date in dates:
        if s < date:
            break
    return date

for date, group in itertools.groupby(seen, key=lambda s: finddate(s, dates)):
    print date, list(group)
I took SilentGhost's generator solution a bit further using explicit iterators. This is the linear-time solution I was thinking of.
def splitter(items, breaks):
    """ assuming `items` and `breaks` are sorted """
    c = len(items)
    items = iter(items)
    item = next(items)
    breaks = iter(breaks)
    breaker = next(breaks)
    while True:
        if breaker > item:
            for it in items:
                c -= 1
                if it >= breaker:
                    item = it
                    yield c
                    break
            else:  # no item left that is > the current breaker
                yield 0  # 0 items left for the current breaker
                # and 0 items left for all other breaks, since they are > the current
                for _ in breaks:
                    yield 0
                break  # and done
        else:
            yield c
        for br in breaks:
            if br > item:
                breaker = br
                break
            yield c
        else:
            # there is no break > any item in the list
            break
