Save deque into csv data frame - python

I am tracking elements in a video using OpenCV (basically counting the number of elements after HSV thresholding). I have a deque buffer to store centroid positions. I chose a limited buffer of 64 (~2 seconds at 30 fps, could be longer). My goal is to save the data into a .csv file in a format that I can readily use later (see below). Additionally, I am counting the number of detected regions. The format would be like
cX cY number
444 265 19
444 265 19
444 264 19
444 264 19
...
With cX being the centroid in X and cY the centroid in Y of the largest element, and number being the number of detected regions. Column naming is not the main goal, although it would be nice.
For display purposes, I need to have the centroid as a tuple. I make the buffers grow frame by frame using appendleft:
center_points = deque(maxlen=64)
object_number = deque(maxlen=64)
iteration_number = 1
while True:
    # read video frames..
    # do stuff...
    # get contours
    my_cnts = cv2.findContours(...)
    # get largest object
    c = max(my_cnts, key=cv2.contourArea)
    ((x, y), radius) = cv2.minEnclosingCircle(c)
    M = cv2.moments(c)
    big_center = (int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"]))
    # count object number as int name it 'num'
    center_points.appendleft(big_center)
    object_number.appendleft(num)
Now, when the buffer is full, I want to save the data into a file:
    # Convert to array to save
    # Wait until the iteration number is divisible by the buffer length
    if iteration_number % 64 == 0:
        print("Saving on iteration..." + str(iteration_number))
        array_to_save = np.array([center_points, object_number]).T
        with open(filename,'a') as outfile:
            np.savetxt(outfile, array_to_save,
                       delimiter=',', fmt='%s')
    # Add 1 to the counter
    iteration_number = iteration_number + 1
Problem
The code above works and writes something that looks like this:
(444 265) 19
(444 265) 19
(444 264) 19
(444 263) 19
I would like to do something like np.array(center_points) and bind that to object_number. I have had trouble with dimensions (e.g., (64, 2) and (64,) not being compatible). I have tried np.append and np.stack but can't find the correct way of formatting the data.
Otherwise, I could keep the code as is, but I would like to somehow get rid of the parentheses on columns 1 and 2 and save that object instead (I have tried regular expressions on array_to_save without success). All three columns should be numeric, or saved as strings that can easily be read back as numeric later.
Update
Based on comments I tried
array_to_save = np.concatenate([np.array(center_points), object_number[:, None]])
TypeError: sequence index must be integer, not 'tuple'
I also tried
array_to_save = np.concatenate([np.array(center_points), np.array(object_number)[:, None]])
ValueError: all the input array dimensions except for the concatenation axis must match exactly

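You can concatenate the arrays along the column dimension to create an (X, 3) array out of the (X, 2) and (X,) arrays. To be ready for concatenation, all the arrays need to have the same number of dimensions, so you need to add an extra dimension to the flat array built from object_number: (X,) -> (X, 1). Because object_number is a deque, convert it with np.array(object_number) first (indexing the deque directly is what raised the TypeError in your first attempt); the new axis can then be added via [:, np.newaxis] or [:, None]. You also need to pass axis=-1 (or axis=1), since np.concatenate joins along axis 0 by default, which is what raised the ValueError in your second attempt. The complete solution then is: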
np.concatenate([np.array(center_points),
np.array(object_number)[:, None]], axis=-1)
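As a minimal sketch of how this could fit together with np.savetxt, the example below builds a small buffer, concatenates it, writes it with a header row and reads it back. The sample centroid values, the file name centroids.csv and the temporary directory are assumptions made for illustration only.

```python
import os
import tempfile
from collections import deque

import numpy as np

# Hypothetical sample data standing in for the tracking loop
center_points = deque(maxlen=64)
object_number = deque(maxlen=64)
for i in range(5):
    center_points.appendleft((444, 265 - i))
    object_number.appendleft(19)

centers = np.array(center_points)           # shape (5, 2)
counts = np.array(object_number)[:, None]   # shape (5,) -> (5, 1)
array_to_save = np.concatenate([centers, counts], axis=1)  # shape (5, 3)

# Write integers with a header row; comments="" drops the default "# " prefix
path = os.path.join(tempfile.mkdtemp(), "centroids.csv")
np.savetxt(path, array_to_save, delimiter=",", fmt="%d",
           header="cX,cY,number", comments="")

# Read the numeric columns back, skipping the header row
loaded = np.loadtxt(path, delimiter=",", skiprows=1, dtype=int)
print(loaded)
```

When the file is opened in append mode, as in the question, the header should probably be written only once, for example on the first save.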

Part of your difficulty, I think, is that np.savetxt() does not work well with tuples held in numpy arrays. I've developed some test code that I think replicates the key aspects of your problem and provides solutions to them:
import numpy as np
from collections import deque
# Create test data
center_points = deque(maxlen=64)
number = deque(maxlen=64)
for i in range(10):
    big_center = (i*3,i*100)
    center_points.appendleft(big_center)
    number.appendleft(19)
# Write the test data
# dtype=object keeps each centroid as a tuple; NumPy 1.24 and later raise a
# ValueError for this ragged input without it
array_to_save = np.array([center_points,number], dtype=object).T
print (array_to_save)
with open("test.txt","w") as outfile:
    # one line per row: "cX cY number"
    outfile.write("\n".join([" ".join([str(a[0]),str(a[1]),str(b)]) for a,b in
                             array_to_save]))
# Re-read the test data
center_points2 = deque(maxlen=64)
number2 = deque(maxlen=64)
with open("test.txt","r") as infile:
    for line in infile:
        x = [int(xx) for xx in line.split()]
        center_points2.append((x[0],x[1]))
        number2.append(x[2])
new_array = np.array([center_points2,number2], dtype=object).T
print (new_array)
When run, this code outputs the following, showing that the original array_to_save is identical to the new_array that has been read back in:
[[(27, 900) 19]
[(24, 800) 19]
[(21, 700) 19]
[(18, 600) 19]
[(15, 500) 19]
[(12, 400) 19]
[(9, 300) 19]
[(6, 200) 19]
[(3, 100) 19]
[(0, 0) 19]]
[[(27, 900) 19]
[(24, 800) 19]
[(21, 700) 19]
[(18, 600) 19]
[(15, 500) 19]
[(12, 400) 19]
[(9, 300) 19]
[(6, 200) 19]
[(3, 100) 19]
[(0, 0) 19]]
The file test.txt is as follows:
27 900 19
24 800 19
21 700 19
18 600 19
15 500 19
12 400 19
9 300 19
6 200 19
3 100 19
0 0 19
The file reading and writing code in this version is a little more complicated than just calling np.savetxt() but it handles the tuples explicitly.
Update
Alternatively, if you prefer to do all the manipulation in numpy arrays, you could use:
import numpy as np
from collections import deque
# Create test data
center_points = deque(maxlen=64)
number = deque(maxlen=64)
for i in range(10):
    big_center = (i*3,i*100)
    center_points.appendleft(big_center)
    number.appendleft(19)
print (center_points)
print (number)
# Write the test data
x, y = zip(*center_points)
array_to_save = np.array([x,y,number]).T
print (array_to_save)
np.savetxt("test.txt", array_to_save, fmt="%d")
# Re-read the test data
new_array = np.loadtxt("test.txt", dtype=int)
print (new_array)
center_points2 = deque(zip(new_array.T[0],new_array.T[1]),maxlen=64)
number2 = deque(new_array.T[2],maxlen=64)
print (center_points2)
print (number2)
This uses the approach described in Transpose/Unzip Function (inverse of zip)? to separate the two elements of each tuple into two lists that are then included with the number list into a single numpy array that can be saved with savetxt() and re-loaded with loadtxt().
The print() calls are just to illustrate that the data that the program finishes with is exactly the same as the data that it started with. They produce the following output:
deque([(27, 900), (24, 800), (21, 700), (18, 600), (15, 500), (12, 400), (9, 300), (6, 200), (3, 100), (0, 0)], maxlen=64)
deque([19, 19, 19, 19, 19, 19, 19, 19, 19, 19], maxlen=64)
[[ 27 900 19]
[ 24 800 19]
[ 21 700 19]
[ 18 600 19]
[ 15 500 19]
[ 12 400 19]
[ 9 300 19]
[ 6 200 19]
[ 3 100 19]
[ 0 0 19]]
[[ 27 900 19]
[ 24 800 19]
[ 21 700 19]
[ 18 600 19]
[ 15 500 19]
[ 12 400 19]
[ 9 300 19]
[ 6 200 19]
[ 3 100 19]
[ 0 0 19]]
deque([(27, 900), (24, 800), (21, 700), (18, 600), (15, 500), (12, 400), (9, 300), (6, 200), (3, 100), (0, 0)], maxlen=64)
deque([19, 19, 19, 19, 19, 19, 19, 19, 19, 19], maxlen=64)
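Since the question also mentions column names, here is a hedged sketch of the same round trip using pandas instead of np.savetxt. pandas is not used in the answers above, so treat this as an alternative under that assumption; the in-memory buffer stands in for a real file, and the sample values are taken from the test data above.

```python
import io
from collections import deque

import pandas as pd

center_points = deque([(27, 900), (24, 800), (21, 700)], maxlen=64)
number = deque([19, 19, 19], maxlen=64)

# Each tuple becomes one row with two named columns
df = pd.DataFrame(list(center_points), columns=["cX", "cY"])
df["number"] = list(number)

# Write to an in-memory buffer (a file path would work the same way)
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read it back; the columns come back as integers
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```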

Related

How can I replace pd intervals with integers in python

How can I replace pd intervals with integers
import pandas as pd
df = pd.DataFrame()
df['age'] = [43, 76, 27, 8, 57, 32, 12, 22]
age_band = [0,10,20,30,40,50,60,70,80,90]
df['age_bands']= pd.cut(df['age'], bins=age_band, ordered=True)
output:
age age_bands
0 43 (40, 50]
1 76 (70, 80]
2 27 (20, 30]
3 8 (0, 10]
4 57 (50, 60]
5 32 (30, 40]
6 12 (10, 20]
7 22 (20, 30]
Now I want to add another column that replaces each band with a single number (int), but I could not get it to work.
For example, this did not work:
df['age_code']= df['age_bands'].replace({'(40, 50]':4})
How can I get a column that looks like this?
age_bands age_code
0 (40, 50] 4
1 (70, 80] 7
2 (20, 30] 2
3 (0, 10] 0
4 (50, 60] 5
5 (30, 40] 3
6 (10, 20] 1
7 (20, 30] 2
Assuming you want the first digit of every interval, you can use Series.apply as follows:
df["age_code"] = df["age_bands"].apply(lambda band: str(band)[1])
However, note that this may not be very efficient for a large dataframe.
To convert the column values to an int datatype, you can use pd.to_numeric:
df["age_code"] = pd.to_numeric(df['age_code'])
As the column contains pd.Interval objects, you can use the left property of each interval:
df['age_code'] = df['age_bands'].apply(lambda interval: interval.left // 10)
You can do that by simply adding a second pd.cut call and defining the labels argument.
import pandas as pd
df = pd.DataFrame()
df['age'] = [43, 76, 27, 8, 57, 32, 12, 22]
age_band = [0,10,20,30,40,50,60,70,80,90]
df['age_bands']= pd.cut(df['age'], bins=age_band, ordered=True)
#This is the part of code you need to add
age_labels = [0, 1, 2, 3, 4, 5, 6, 7, 8]
df['age_code']= pd.cut(df['age'], bins=age_band, labels=age_labels, ordered=True)
>>> print(df)
You can create a dictionary of bins and map it to the age_bands column:
bins_sorted = sorted(pd.cut(df['age'], bins=age_band, ordered=True).unique())
bins_dict = {key: idx for idx, key in enumerate(bins_sorted)}
df['age_code'] = df.age_bands.map(bins_dict).astype(int)
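As a quick check, the sketch below runs two of the approaches above (the labels argument and the left edge of each interval) on the question's data and compares the results. The variable names codes_from_labels and codes_from_left are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"age": [43, 76, 27, 8, 57, 32, 12, 22]})
age_band = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
df["age_bands"] = pd.cut(df["age"], bins=age_band)

# Approach 1: let pd.cut assign integer labels, one per bin
age_labels = list(range(len(age_band) - 1))
codes_from_labels = pd.cut(df["age"], bins=age_band,
                           labels=age_labels).astype(int).tolist()

# Approach 2: derive the code from the left edge of each Interval
codes_from_left = [int(interval.left) // 10 for interval in df["age_bands"]]

print(codes_from_labels)
print(codes_from_left)
```

The second approach relies on the bins being exactly 10 wide and starting at 0, while the first works for any bin edges.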

How to delete elements repeated in a 2_D numpy array?

I have the following question which I want to solve with numpy Library.
Let's suppose that we have this 'a' array
a = np.vstack(([10, 10, 20, 20, 30, 10, 40, 50, 20] ,[10, 20, 10, 20, 30, 10, 40, 50, 20]))
As output we have
[[10 10 20 20 30 10 40 50 20]
[10 20 10 20 30 10 40 50 20]]
with the shape (2, 9)
I want to delete the elements repeated vertically in our array so that I have as result:
[[10 10 20 20 30 40 50]
[10 20 10 20 30 40 50]]
In this example I want to delete the elements ((0, 5), (1, 5)) and ((0, 8), (1, 8)). Is there any numpy function that can do the job ?
Thanks
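This is easily done with np.unique, which returns the unique columns in sorted order:
np.unique(a, axis=1)
Following the idea of this answer, you could also build a set of the columns and stack them back together (note that a set does not preserve the original column order):
np.vstack(list({tuple(row) for row in a.T})).T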
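If the original column order matters, one possible variant is to ask np.unique for the index of the first occurrence of each unique column and then select those columns in their original order. This is a sketch under that assumption; the names first_idx and deduped are made up for the example.

```python
import numpy as np

a = np.vstack(([10, 10, 20, 20, 30, 10, 40, 50, 20],
               [10, 20, 10, 20, 30, 10, 40, 50, 20]))

# return_index gives the position of the first occurrence of each unique column
_, first_idx = np.unique(a, axis=1, return_index=True)

# Sorting those positions restores the order in which columns first appeared
deduped = a[:, np.sort(first_idx)]
print(deduped)
```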

How to develop an algorithm in python to make specific pairs out of values in a numpy array

I want to automatically create some pairs based on data stored as numpy arrays. The numbers in my first array are the numbers of some lines. I want to connect the lines and create surfaces using the created pairs. This is the array of lines:
line_no= np.arange (17, 25)
These lines run in two perpendicular directions. I uploaded a fig to show it (they are shown in blue and red). I know where the direction of my lines changes and call it sep.
sep=20
Another piece of data that should be usable is the number of points creating the lines. I call it rep.
rep = np.array([3,3,1])
Then, I used the following code to achieve my goal but it is not correct:
start = line_no[0]
N_x = len(rep) - 1
N_y = max(rep) - 1
grid = np.zeros((N_y + 1, N_x + 1, 2))
kxs = [0] + [min(rep[i], rep[i+1]) for i in range(len(rep)-1)]
T_x = sum(kxs)
T_y = sum(rep) - len(rep)
T_total = T_x + T_y
lines = np.arange(start, start+T_total)
lines_before = 0
for i in range(N_x):
    for j in range(N_y, -1, -1):
        if j >= kxs[i+1]:
            continue
        grid[j,i,0] = lines[lines_before]
        lines_before += 1
for i in range(N_x+1):
    for j in range(N_y-1, -1, -1):
        if j < rep[i] - 1:
            grid[j,i,1] = lines[lines_before]
            lines_before += 1
joints = np.array([])
for i in range(N_x - 1):
    for j in range(N_y - 1):
        square = np.append(grid[j:j+2, i, 0], grid[j, i:i+2, 1])
        if all(square):
            new_joints = square
            joints = np.append(new_joints, joints)
In my fig I have two scenarios: A (rep = np.array([3,3,1])) and B (rep = np.array([1,3,3])). For A I want to have the following pairs:
17, 21, 18, 23
18, 22, 19, 24
And for B:
18, 21, 19, 23
19, 22, 20, 24
In reality the distribution of my lines can change. For example, in scenario A the last line does not create any surface, and in B the first one is not part of any surface. In other cases I may have several lines that are not part of any surface; for example, I may have another red line below line number 21 which does not make any surface. Thanks for paying attention to my problem. I do appreciate any help in advance.
A more complicated case is also shown in the following. In scenario C I have:
line_no= np.arange (17, 42)
sep=29
rep = np.array([5,4,4,2,2,1])
In scenario D I have:
line_no= np.arange (17, 33)
sep=24
rep = np.array([1,3,4,4])
Sorry, but I couldn't follow your implementation. A tip for next time: please try to comment your code, it helps.
Anyway, here is a reasonably readable implementation that gets the job done. I advise you to check it against more scenarios to verify the script's validity before drawing any conclusions.
import numpy as np
line_no = np.arange(17, 25)
sep = 20 # this information is redundant for the problem
nodes = np.array([1, 3, 4, 4])
# for generalised implementation hlines start from 0 and vlines start where hlines end
# offset parameter can be used to change the origin or start number of hlines and hence changes the vlines also
offset = 17
# calculate the number of horizontal lines and vertical lines in sequence
hlines = np.array([min(a, b) for a, b in zip(nodes[:-1], nodes[1:])])
# vlines = np.array([max(a, b) - 1 for a, b in zip(nodes[:-1], nodes[1:])])
vlines = nodes - 1
print(f"hlines: {hlines}, vlines: {vlines}")
# nodes = np.array([3, 3, 1]) ---> hlines: [3 1], vlines: [2 2 0]
# nodes = np.array([1, 3, 3]) ---> hlines: [1 3], vlines: [0 2 2]
hlines_no = list(range(sum(hlines)))
vlines_no = list(range(sum(hlines), sum(hlines)+sum(vlines)))
print(f"hlines numbers: {hlines_no}, vlines numbers: {vlines_no}")
# nodes = np.array([3, 3, 1]) ---> hlines numbers: [0, 1, 2, 3], vlines numbers: [4, 5, 6, 7]
# nodes = np.array([1, 3, 3]) ---> hlines numbers: [0, 1, 2, 3], vlines numbers: [4, 5, 6, 7]
cells = [] # to store complete cell tuples
hidx = 0 # to keep track of horizontal lines index
vidx = 0 # to keep track of vertical lines index
previous_cells = 0
current_cells = 0
for LN, RN in zip(nodes[:-1], nodes[1:]):
    # if either the left or right side nodes is equal to 1, implies only 1 horizontal line exists
    # and the horizontal index is updated
    if LN == 1 or RN == 1:
        hidx += 1
    else:
        # to handle just a blank vertical line
        if LN - RN == 1:
            vidx += 1
        # iterate 'cell' number of times
        # number of cells are always 1 less than the minimum of left and right side nodes
        current_cells = min(LN, RN)-1
        if previous_cells != 0 and previous_cells > current_cells:
            vidx += previous_cells - current_cells
        for C in range(current_cells):
            cell = (offset + hlines_no[hidx],
                    offset + vlines_no[vidx],
                    offset + hlines_no[hidx+1],
                    offset + vlines_no[vidx+current_cells])
            hidx += 1
            vidx += 1
            cells.append(cell)
        # skip the last horizontal line in a column
        hidx += 1
    previous_cells = min(LN, RN)-1
print(cells)
Results
# nodes = np.array([3, 3, 1]) ---> [(17, 21, 18, 23), (18, 22, 19, 24)]
# nodes = np.array([1, 3, 3]) ---> [(18, 21, 19, 23), (19, 22, 20, 24)]
# nodes = np.array([5,4,4,2,2,1]) ---> [(17, 31, 18, 34),
# (18, 32, 19, 35),
# (19, 33, 20, 36),
# (21, 34, 22, 37),
# (22, 35, 23, 38),
# (23, 36, 24, 39),
# (25, 39, 26, 40),
# (27, 40, 28, 41)]
# nodes = np.array([1,3,4,4]) ---> [(18, 25, 19, 27),
# (19, 26, 20, 28),
# (21, 27, 22, 30),
# (22, 28, 23, 31),
# (23, 29, 24, 32)]
Edit: Updated the code to account for the special case scenarios
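To make it easier to check the script against several scenarios, the same logic can be wrapped in a function. This is only a sketch: the name build_cells and its signature are hypothetical, the indentation of the loop is reconstructed, and it has been checked only against the four scenarios listed in the results above.

```python
def build_cells(nodes, offset=17):
    """Return (hline, vline, hline, vline) tuples, one per closed cell.

    `nodes` is the number of points per column (called `rep` in the question);
    `offset` is the number of the first line. Hypothetical helper.
    """
    nodes = [int(n) for n in nodes]
    pairs = list(zip(nodes[:-1], nodes[1:]))

    # Horizontal lines are numbered first, vertical lines follow them
    n_hlines = sum(min(left, right) for left, right in pairs)
    n_vlines = sum(n - 1 for n in nodes)
    hlines_no = list(range(n_hlines))
    vlines_no = list(range(n_hlines, n_hlines + n_vlines))

    cells = []
    hidx = 0            # index of the current horizontal line
    vidx = 0            # index of the current vertical line
    previous_cells = 0
    for left, right in pairs:
        if left == 1 or right == 1:
            # a single horizontal line, no cell can be closed here
            hidx += 1
        else:
            if left - right == 1:
                vidx += 1  # skip a vertical line that closes no cell
            current_cells = min(left, right) - 1
            if previous_cells != 0 and previous_cells > current_cells:
                vidx += previous_cells - current_cells
            for _ in range(current_cells):
                cells.append((offset + hlines_no[hidx],
                              offset + vlines_no[vidx],
                              offset + hlines_no[hidx + 1],
                              offset + vlines_no[vidx + current_cells]))
                hidx += 1
                vidx += 1
            hidx += 1      # skip the last horizontal line in the column
        previous_cells = min(left, right) - 1
    return cells


print(build_cells([3, 3, 1]))
print(build_cells([1, 3, 3]))
```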

How to update matrix based on multiple maximum value per row?

I am a newbie to Python. I have an NxN matrix and I want to know the maximum value of each row. Next, I want to nullify (set to zero) all other values except this maximum value. If the row contains multiple maximum values, all those maximum values should be preserved.
Using a DataFrame, I tried to get the maximum of each row. Then I tried to get the indices of these max values. The code is given below.
matrix = [(22, 16, 23),
(12, 6, 43),
(24, 67, 11),
(87, 9,11),
(66, 36,66)
]
dfObj = pd.DataFrame(matrix, index=list('abcde'), columns=list('xyz'))
maxValuesObj = dfObj.max(axis=1)
maxValueIndexObj = dfObj.idxmax(axis=1)
The above code doesn't consider multiple maximum values. Only the first occurrence is returned.
Also, I am stuck on how to update the matrix accordingly. My expected output is:
matrix = [(0, 0, 23),
(0, 0, 43),
(0, 67, 0),
(87, 0,0),
(66, 0,66)
]
Can you please help me to sort out this?
Using df.where():
dfObj.where(dfObj.eq(dfObj.max(1),axis=0),0)
x y z
a 0 0 23
b 0 0 43
c 0 67 0
d 87 0 0
e 66 0 66
For an ND array instead of a dataframe, call .values after the above code:
dfObj.where(dfObj.eq(dfObj.max(1),axis=0),0).values
Or, better, use to_numpy():
dfObj.where(dfObj.eq(dfObj.max(1),axis=0),0).to_numpy()
Or np.where:
np.where(dfObj.eq(dfObj.max(1),axis=0),dfObj,0)
array([[ 0, 0, 23],
[ 0, 0, 43],
[ 0, 67, 0],
[87, 0, 0],
[66, 0, 66]], dtype=int64)
I'll show how to do it with Python built-ins instead of Pandas, since you're new to Python and should know how to do it outside of Pandas (and the Pandas syntax isn't as clean).
matrix = [(22, 16, 23),
(12, 6, 43),
(24, 67, 11),
(87, 9,11),
(66, 36,66)
]
new_matrix = []
for row in matrix:
    row_max = max(row)
    new_row = tuple(element if element == row_max else 0 for element in row)
    new_matrix.append(new_row)
You can do this with a short for loop pretty easily:
import numpy as np
matrix = np.array([(22, 16, 23), (12, 6, 43), (24, 67, 11), (87, 9,11), (66, 36,66)])
for i in range(0, len(matrix)):
    matrix[i] = [x if x == max(matrix[i]) else 0 for x in matrix[i]]
print(matrix)
output:
[[ 0 0 23]
[ 0 0 43]
[ 0 67 0]
[87 0 0]
[66 0 66]]
I would also use numpy for matrices not pandas.
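For a NumPy array, the loop can also be replaced by a vectorized comparison against the per-row maximum. The following is a minimal sketch of that idea; row_max and result are names chosen for the example.

```python
import numpy as np

matrix = np.array([(22, 16, 23), (12, 6, 43), (24, 67, 11), (87, 9, 11), (66, 36, 66)])

# keepdims=True gives shape (5, 1), so the comparison broadcasts across each row
row_max = matrix.max(axis=1, keepdims=True)

# Keep every entry equal to its row maximum (ties included), zero the rest
result = np.where(matrix == row_max, matrix, 0)
print(result)
```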
This isn't the most performant solution, but you can write a function for the row operation then apply it to each row:
def max_row(row):
    row.loc[row != row.max()] = 0
    return row
dfObj.apply(max_row, axis=1)
Out[17]:
x y z
a 0 0 23
b 0 0 43
c 0 67 0
d 87 0 0
e 66 0 66

Combinatoric / cartesian product of Numpy arrays without iterators and/or loop(s) [duplicate]

This question already has answers here:
Cartesian product of x and y array points into single array of 2D points
(17 answers)
Closed 7 years ago.
The following code
import numpy as np
import itertools
a_p1 = np.arange(0, 4, 1)
a_p2 = np.arange(20, 25, 1)
params = itertools.product(a_p1, a_p2)
for (p1, p2) in params:
    print(p1, p2)
outputs
(0, 20) (0, 21) (0, 22) (0, 23) (0, 24) (1, 20) (1, 21) (1, 22) (1, 23) (1, 24) (2, 20) (2, 21) (2, 22) (2, 23) (2, 24) (3, 20) (3, 21) (3, 22) (3, 23) (3, 24)
Two nested for loops can also output the same results
for i, p1 in enumerate(a_p1):
    for j, p2 in enumerate(a_p2):
        print(p1, p2)
I'm looking for a solution to directly output a Numpy array with such combination (a Numpy array of tuples).
Is there a way to generate such a Numpy array without iterators and/or for loop(s) ?
I'm aware that such a solution will be more memory consuming than using iterators.
Install Scikit-Learn http://scikit-learn.org/
from sklearn.utils.extmath import cartesian
print(cartesian([a_p1, a_p2]))
It should output
[[ 0 20]
[ 0 21]
[ 0 22]
[ 0 23]
[ 0 24]
[ 1 20]
[ 1 21]
[ 1 22]
[ 1 23]
[ 1 24]
[ 2 20]
[ 2 21]
[ 2 22]
[ 2 23]
[ 2 24]
[ 3 20]
[ 3 21]
[ 3 22]
[ 3 23]
[ 3 24]]
This solution was taken from a similar question:
Using numpy to build an array of all combinations of two arrays
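If installing Scikit-Learn is not an option, a NumPy-only variant is possible. The sketch below uses np.meshgrid, which is a different technique from the cartesian helper above, and compares the result with itertools.product as a sanity check.

```python
import itertools

import numpy as np

a_p1 = np.arange(0, 4, 1)
a_p2 = np.arange(20, 25, 1)

# indexing="ij" keeps a_p1 as the slow-moving axis, matching itertools.product
grid_1, grid_2 = np.meshgrid(a_p1, a_p2, indexing="ij")

# Pair up the two grids and flatten to one (p1, p2) row per combination
pairs = np.stack([grid_1, grid_2], axis=-1).reshape(-1, 2)
print(pairs)

# Reference result built with the iterator-based approach from the question
expected = [[int(p1), int(p2)] for p1, p2 in itertools.product(a_p1, a_p2)]
```

The result is a (20, 2) integer array rather than an array of tuples, which is usually easier to work with in NumPy.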
