How to delete repeated elements in a 2-D numpy array? - python

I have the following problem, which I want to solve with the NumPy library.
Suppose we have this array a:
a = np.vstack(([10, 10, 20, 20, 30, 10, 40, 50, 20] ,[10, 20, 10, 20, 30, 10, 40, 50, 20]))
As output we have
[[10 10 20 20 30 10 40 50 20]
[10 20 10 20 30 10 40 50 20]]
with the shape (2, 9)
I want to delete the columns that are repeated in the array, so that the result is:
[[10 10 20 20 30 40 50]
[10 20 10 20 30 40 50]]
In this example I want to delete the elements ((0, 5), (1, 5)) and ((0, 8), (1, 8)). Is there a NumPy function that can do the job?
Thanks

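This is easily done with np.unique along axis 1 (note that it returns the unique columns in sorted order, which happens to match the desired output here):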
np.unique(a, axis=1)
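If the original column order matters, one option is to ask np.unique for the index of each first occurrence and then select those columns in ascending order. This is a minimal sketch using the array from the question; the names first_idx and deduped are only illustrative.

```python
import numpy as np

a = np.vstack(([10, 10, 20, 20, 30, 10, 40, 50, 20],
               [10, 20, 10, 20, 30, 10, 40, 50, 20]))

# return_index gives, for each unique column, the index of its first occurrence
_, first_idx = np.unique(a, axis=1, return_index=True)

# Sorting those indices restores the order in which columns first appeared
deduped = a[:, np.sort(first_idx)]
print(deduped)
```

For this particular input the result is the same as plain np.unique(a, axis=1), because the first occurrences already appear in sorted order.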

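Following the idea of this answer, you could also collect the columns in a set of tuples and stack them again. Note that a set does not preserve the original column order.
np.vstack(list({tuple(col) for col in a.T})).T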


How can I replace pd intervals with integers in python

How can I replace pd intervals with integers?
import pandas as pd
df = pd.DataFrame()
df['age'] = [43, 76, 27, 8, 57, 32, 12, 22]
age_band = [0,10,20,30,40,50,60,70,80,90]
df['age_bands']= pd.cut(df['age'], bins=age_band, ordered=True)
output:
age age_bands
0 43 (40, 50]
1 76 (70, 80]
2 27 (20, 30]
3 8 (0, 10]
4 57 (50, 60]
5 32 (30, 40]
6 12 (10, 20]
7 22 (20, 30]
Now I want to add another column that replaces each band with a single integer, but I could not get it to work.
For example, this did not work:
df['age_code']= df['age_bands'].replace({'(40, 50]':4})
How can I get a column that looks like this?
age_bands age_code
0 (40, 50] 4
1 (70, 80] 7
2 (20, 30] 2
3 (0, 10] 0
4 (50, 60] 5
5 (30, 40] 3
6 (10, 20] 1
7 (20, 30] 2
Assuming you want the first digit of every interval, you can use Series.apply as follows:
df["age_code"] = df["age_bands"].apply(lambda band: str(band)[1])
However, note that this may not be very efficient for a large dataframe.
To convert the column values to an integer dtype, you can use pd.to_numeric:
df["age_code"] = pd.to_numeric(df['age_code'])
As the column contains pd.Interval objects, you can use their left property:
df['age_code'] = df['age_bands'].apply(lambda interval: interval.left // 10)
You can do that by simply adding a second pd.cut call and passing the labels argument.
import pandas as pd
df = pd.DataFrame()
df['age'] = [43, 76, 27, 8, 57, 32, 12, 22]
age_band = [0,10,20,30,40,50,60,70,80,90]
df['age_bands']= pd.cut(df['age'], bins=age_band, ordered=True)
#This is the part of code you need to add
age_labels = [0, 1, 2, 3, 4, 5, 6, 7, 8]
df['age_code']= pd.cut(df['age'], bins=age_band, labels=age_labels, ordered=True)
print(df)
You can create a dictionary of bins and map it to the age_bands column:
bins_sorted = sorted(pd.cut(df['age'], bins=age_band, ordered=True).unique())
bins_dict = {key: idx for idx, key in enumerate(bins_sorted)}
df['age_code'] = df.age_bands.map(bins_dict).astype(int)
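Since pd.cut returns a categorical column, another possible route is to read the integer code of each category through the .cat.codes accessor. The sketch below assumes the bins from the question; a value that falls outside every bin would get the code -1.

```python
import pandas as pd

df = pd.DataFrame({"age": [43, 76, 27, 8, 57, 32, 12, 22]})
age_band = list(range(0, 100, 10))  # 0, 10, ..., 90 as in the question

# pd.cut produces an ordered categorical of Interval objects
df["age_bands"] = pd.cut(df["age"], bins=age_band)

# Each category has an integer position; .cat.codes exposes it per row
df["age_code"] = df["age_bands"].cat.codes
print(df)
```

The codes follow the order of the bins, so this only matches "first digit of the band" when the bins are the regular ten-year bands used here.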

How do I change my code to draw a table from this list?

seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(len(seq)):
    print(seq[i], end="\t")
How do I get my output table to look like this?
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
One of many ways is to iterate over the seq list in steps of 6 and print the elements in each slice:
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(0, len(seq), 6):
    print(*seq[i:i+6], sep=' ')
output
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
You probably want to make use of string formatting. Below, f"{seq[i]:<4d}" means "the integer seq[i] formatted as a string at least 4 characters wide, left-aligned". If you want to right-align, just remove < (right alignment is the default for numbers).
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(len(seq)):
    print(f"{seq[i]:<4d}", end="")
    if not (i+1) % 6:
        print("")
print("")
Output:
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
The simplest relevant technique is padding:
for i in range(0, len(seq), 6):
    print(" ".join(str(k).ljust(2) for k in seq[i:i + 6]))
String formatting, as in Printing Lists as Tabular Data, gives a more sophisticated solution.
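The slicing and the format-spec ideas from the answers above can be combined into one small helper. This is only a sketch: format_rows, per_row and width are hypothetical names chosen for illustration, and the values are right-aligned in fixed-width columns.

```python
def format_rows(values, per_row=6, width=4):
    """Return one string per table row, with each value right-aligned in `width` characters."""
    rows = []
    for start in range(0, len(values), per_row):
        chunk = values[start:start + per_row]
        # A nested format spec lets the column width be a parameter
        rows.append("".join(f"{v:>{width}d}" for v in chunk))
    return rows


seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
table_rows = format_rows(seq)
for row in table_rows:
    print(row)
```

Returning the rows instead of printing inside the helper keeps it easy to reuse, for example to write the table to a file.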

Insert in array at specific location

I have an array [ 0 10 15 20 10 0 35 25 15 35 0 30 20 25 30 0] and I need to insert each element of another array, [5,7,8,15], at every fifth position, so that the final array looks like [ 0 10 15 20 5 10 0 35 25 7 15 35 0 30 8 20 25 30 0 15] and has length 20.
I am trying this code:
arr_fla = [0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0]
arr_split = [5,7,8,15]
node = 5
node_len = node * (node-1)
for w in range(node, node_len, 5):
    for v in arr_split:
        arr_fla = np.insert(arr_fla,w,v)
print(arr_fla)
The result I am getting is
'[ 0 10 15 20 10 15 8 7 5 0 15 8 7 5 35 15 8 7 5 25 15 35 0 30
20 25 30 0]' length 28
Can someone please tell me where I am going wrong?
If the sizes line up as cleanly as in your example you can use reshape ...
np.reshape(arr_fla,(len(arr_split),-1))
# array([[ 0, 10, 15, 20],
# [10, 0, 35, 25],
# [15, 35, 0, 30],
# [20, 25, 30, 0]])
... append arr_split as a new column ...
np.c_[np.reshape(arr_fla,(len(arr_split),-1)),arr_split]
# array([[ 0, 10, 15, 20, 5],
# [10, 0, 35, 25, 7],
# [15, 35, 0, 30, 8],
# [20, 25, 30, 0, 15]])
... and flatten again ...
np.c_[np.reshape(arr_fla,(len(arr_split),-1)),arr_split].ravel()
# array([ 0, 10, 15, 20, 5, 10, 0, 35, 25, 7, 15, 35, 0, 30, 8, 20, 25,
# 30, 0, 15])
I have corrected it:
arr_fla = [0,10,15,20,10,0,35,25,15,35,0,30,20,25,30,0]
arr_split = [5,7,8,15]
node = 5
for w in range(len(arr_split)):
    arr_fla = np.insert(arr_fla, (w+1)*node-1, arr_split[w])
print(arr_fla)
'''
Output:
[ 0 10 15 20 5 10 0 35 25 7 15 35 0 30 8 20 25 30 0 15]
'''
In your code:
for v in arr_split:
This inserts all the elements of arr_split at the same position on every pass of the outer loop, but you need just one element at a time. Thus you do not need the extra for loop.
You want to have a counter that keeps going up every time you insert the item from your second array arr_split.
Try this code. My assumption is that your last element can be inserted directly as the original array has only 16 elements.
arr_fla = [0,10,15,20,10,0,35,25,15,35,0,30,20,25,30,0]
arr_split = [5,7,8,15]
j = 0 #use this as a counter to insert from arr_split
#start iterating from 4th position as you want to insert in the 5th position
for i in range(4,len(arr_fla),5):
    arr_fla.insert(i,arr_split[j]) #insert at the 5th position every time
    #every time you insert an element, the array size increases
    j +=1 #increase the counter by 1 so you can insert the next element
arr_fla.append(arr_split[j]) #add the final element to the original array
print(arr_fla)
Output:
[0, 10, 15, 20, 5, 10, 0, 35, 25, 7, 15, 35, 0, 30, 8, 20, 25, 30, 0, 15]
You could split the list into even chunks, append the matching split value to each chunk, and reassemble the whole (credit to Ned Batchelder for the chunk function). Each chunk must hold node - 1 original elements so that the inserted value becomes every node-th element:
arr_fla = [0,10,15,20,10,0,35,25,15,35,0,30,20,25,30,0]
arr_split = [5,7,8,15]
node = 5
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
tmp_arr = chunks(arr_fla, node - 1)
arr_out = []
for index, chunk in enumerate(tmp_arr):
    if index < len(arr_split): # make sure arr_split is not exhausted
        chunk.append(arr_split[index]) # the chunk index selects the split value to insert
    arr_out += chunk
print(arr_out)
Outputs:
[0, 10, 15, 20, 5, 10, 0, 35, 25, 7, 15, 35, 0, 30, 8, 20, 25, 30, 0, 15]
You can change your code to the following and give it a try.
import numpy as np
arr_fla = [0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0]
arr_split = [5, 7, 8, 15]
index = 4
for ele in arr_split:
    arr_fla = np.insert(arr_fla, index, ele)
    index += 5
print(arr_fla)
the result is
[ 0 10 15 20 5 10 0 35 25 7 15 35 0 30 8 20 25 30 0 15]
As for what went wrong in your code, I think there are two problems:
The second loop is not needed; it makes np.insert put every element of arr_split at the same position.
The first insert position should be 4, not 5.
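np.insert also accepts a sequence of positions, which are interpreted relative to the original array, so the whole job can be done in one call without a loop. A minimal sketch, assuming each block of original values has node - 1 elements as in the question; the names positions and result are illustrative.

```python
import numpy as np

arr_fla = [0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0]
arr_split = [5, 7, 8, 15]
node = 5

# Positions in the ORIGINAL array before which each new value goes: 4, 8, 12, 16
positions = np.arange(1, len(arr_split) + 1) * (node - 1)

# One vectorized call; index 16 equals len(arr_fla), so the last value is appended
result = np.insert(arr_fla, positions, arr_split)
print(result)
```

Because the positions refer to the input array, there is no need to account for the shift caused by earlier insertions.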

Save deque into csv data frame

I am working on tracking elements in video using OpenCV (basically counting the number of elements after HSV thresholding). I have a deque buffer to store centroid positions. I chose a limited buffer of 64 (~2 seconds at 30 fps, could be longer). My goal is to save the data into a .csv file in a format that I can readily use later (see below). Additionally, I am counting the number of detected regions. The format would be like
cX cY number
444 265 19
444 265 19
444 264 19
444 264 19
...
With cX being the centroid in X and cY the centroid in Y of the largest element, and the number of detected regions. Column naming is not the main goal although it would be nice.
For display purposes, I need to have the centroid as tuple. I make them grow frame by frame using appendleft:
center_points = deque(maxlen=64)
object_number = deque(maxlen=64)
iteration_number = 1
while True:
    # read video frames..
    # do stuff...
    # get contours
    my_cnts = cv2.findContours(...)
    # get largest object
    c = max(my_cnts, key=cv2.contourArea)
    ((x, y), radius) = cv2.minEnclosingCircle(c)
    M = cv2.moments(c)
    big_center = (int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"]))
    # count object number as int name it 'num'
    center_points.appendleft(big_center)
    object_number.appendleft(num)
Now, when the buffer is full, I want to save the data into a file:
# Convert to array to save
# Wait until the iteration number is divisible by the buffer length
if(iteration_number % 64 == 0):
    print("Saving on iteration..." + str(iteration_number))
    array_to_save = np.array([center_points, object_number]).T
    with open(filename,'a') as outfile:
        np.savetxt(outfile, array_to_save,
                   delimiter=',', fmt='%s')
# Add 1 to the counter
iteration_number = iteration_number + 1
Problem
The code above works and writes something that looks like this:
(444 265) 19
(444 265) 19
(444 264) 19
(444 263) 19
I would like to do something like np.array(center_points) and bind that to object_number. I have had trouble with dimensions (e.g, (64,2) and (64) not being compatible). I have tried np.append and np.stack but can't find the correct way of formatting the data.
Otherwise, I could keep the code as is, but I would like to somehow get rid of the parentheses in columns 1 and 2 and save that object instead (I have tried regular expressions on array_to_save without success). All three columns should be numeric, or saved as strings that are easily read back as numeric later.
Update
Based on comments I tried
array_to_save = np.concatenate([np.array(center_points), object_number[:, None]])
TypeError: sequence index must be integer, not 'tuple'
I also tried
array_to_save = np.concatenate([np.array(center_points), np.array(object_number)[:, None]])
ValueError: all the input array dimensions except for the concatenation axis must match exactly
You can concatenate the arrays along the column dimension in order to create an (X, 3) array out of the (X, 2) and (X,) arrays. To be ready for concatenation, all the arrays need to have the same number of dimensions, so you need to add an extra dimension to the flat array built from object_number: (X,) -> (X, 1). This can be done via np.array(object_number)[:, np.newaxis] or np.array(object_number)[:, None]; the deque has to be converted to an array first, because a deque does not support this kind of indexing. The complete solution then is:
np.concatenate([np.array(center_points),
np.array(object_number)[:, None]], axis=-1)
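An equivalent way to build the same (X, 3) array is np.column_stack, which turns 1-D inputs into columns itself. The sketch below uses made-up sample values and writes to an in-memory buffer instead of a real file; the header text and the names buffer and csv_text are assumptions for illustration.

```python
import io
from collections import deque

import numpy as np

center_points = deque(maxlen=64)
object_number = deque(maxlen=64)
for i in range(3):  # stand-in for the video loop
    center_points.appendleft((444, 265 - i))
    object_number.appendleft(19)

# (3, 2) centroids plus a (3,) count vector -> one (3, 3) integer array
array_to_save = np.column_stack((np.array(center_points), np.array(object_number)))

buffer = io.StringIO()  # replace with an open file handle in real code
np.savetxt(buffer, array_to_save, delimiter=",", fmt="%d",
           header="cX,cY,number", comments="")  # comments="" drops the leading "# "
csv_text = buffer.getvalue()
print(csv_text)
```

When appending to the same file repeatedly, the header would presumably be written only on the first save.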
Part of your difficulty, I think, is that np.savetxt() does not work well with tuples held in numpy arrays. I've developed some test code that I think replicates the key aspects of your problem and provides solutions to them:
import numpy as np
from collections import deque
# Create test data
center_points = deque(maxlen=64)
number = deque(maxlen=64)
for i in range(10):
    big_center = (i*3,i*100)
    center_points.appendleft(big_center)
    number.appendleft(19)
# Write the test data
array_to_save = np.array([center_points,number]).T
print (array_to_save)
with open("test.txt","w") as outfile:
    outfile.write("\n".join([" ".join([str(a[0]),str(a[1]),str(b)]) for a,b in
                             array_to_save]))
# Re-read the test data
center_points2 = deque(maxlen=64)
number2 = deque(maxlen=64)
with open("test.txt","r") as infile:
    for line in infile:
        x = [int(xx) for xx in line.split()]
        center_points2.append((x[0],x[1]))
        number2.append(x[2])
new_array = np.array([center_points2,number2]).T
print (new_array)
When run, this code outputs the following, showing that the original array_to_save is identical to the new_array that has been read back in:
[[(27, 900) 19]
[(24, 800) 19]
[(21, 700) 19]
[(18, 600) 19]
[(15, 500) 19]
[(12, 400) 19]
[(9, 300) 19]
[(6, 200) 19]
[(3, 100) 19]
[(0, 0) 19]]
[[(27, 900) 19]
[(24, 800) 19]
[(21, 700) 19]
[(18, 600) 19]
[(15, 500) 19]
[(12, 400) 19]
[(9, 300) 19]
[(6, 200) 19]
[(3, 100) 19]
[(0, 0) 19]]
The file test.txt is as follows:
27 900 19
24 800 19
21 700 19
18 600 19
15 500 19
12 400 19
9 300 19
6 200 19
3 100 19
0 0 19
The file reading and writing code in this version is a little more complicated than just calling np.savetxt() but it handles the tuples explicitly.
Update
Alternatively, if you preferred to do all the manipulation in the numpy arrays, you could use:
import numpy as np
from collections import deque
# Create test data
center_points = deque(maxlen=64)
number = deque(maxlen=64)
for i in range(10):
    big_center = (i*3,i*100)
    center_points.appendleft(big_center)
    number.appendleft(19)
print (center_points)
print (number)
# Write the test data
x, y = zip(*center_points)
array_to_save = np.array([x,y,number]).T
print (array_to_save)
np.savetxt("test.txt", array_to_save, fmt="%d")
# Re-read the test data
new_array = np.loadtxt("test.txt", dtype=int)
print (new_array)
center_points2 = deque(zip(new_array.T[0],new_array.T[1]),maxlen=64)
number2 = deque(new_array.T[2],maxlen=64)
print (center_points2)
print (number2)
This uses the approach described in Transpose/Unzip Function (inverse of zip)? to separate the two elements of each tuple into two lists that are then included with the number list into a single numpy array that can be saved with savetxt() and re-loaded with loadtxt().
The print() calls are just to illustrate that the data that the program finishes with is exactly the same as the data that it started with. They produce the following output:
deque([(27, 900), (24, 800), (21, 700), (18, 600), (15, 500), (12, 400), (9, 300), (6, 200), (3, 100), (0, 0)], maxlen=64)
deque([19, 19, 19, 19, 19, 19, 19, 19, 19, 19], maxlen=64)
[[ 27 900 19]
[ 24 800 19]
[ 21 700 19]
[ 18 600 19]
[ 15 500 19]
[ 12 400 19]
[ 9 300 19]
[ 6 200 19]
[ 3 100 19]
[ 0 0 19]]
[[ 27 900 19]
[ 24 800 19]
[ 21 700 19]
[ 18 600 19]
[ 15 500 19]
[ 12 400 19]
[ 9 300 19]
[ 6 200 19]
[ 3 100 19]
[ 0 0 19]]
deque([(27, 900), (24, 800), (21, 700), (18, 600), (15, 500), (12, 400), (9, 300), (6, 200), (3, 100), (0, 0)], maxlen=64)
deque([19, 19, 19, 19, 19, 19, 19, 19, 19, 19], maxlen=64)

Matlab vs Python: Reshape

So I found this:
When converting MATLAB code it might be necessary to first reshape a
matrix to a linear sequence, perform some indexing operations and then
reshape back. As reshape (usually) produces views onto the same
storage, it should be possible to do this fairly efficiently.
Note that the scan order used by reshape in Numpy defaults to the 'C'
order, whereas MATLAB uses the Fortran order. If you are simply
converting to a linear sequence and back this doesn't matter. But if
you are converting reshapes from MATLAB code which relies on the scan
order, then this MATLAB code:
z = reshape(x,3,4);
should become
z = x.reshape(3,4,order='F').copy()
in Numpy.
I have a multidimensional 16*2 array called mafs, when I do in MATLAB:
mafs2 = reshape(mafs,[4,4,2])
I get something different than when in python I do:
mafs2 = reshape(mafs,(4,4,2))
or even
mafs2 = mafs.reshape((4,4,2),order='F').copy()
Any help on this? Thank you all.
Example:
MATLAB:
>> mafs = [(1:16)' (17:32)']
mafs =
1 17
2 18
3 19
4 20
5 21
6 22
7 23
8 24
9 25
10 26
11 27
12 28
13 29
14 30
15 31
16 32
>> reshape(mafs,[4 4 2])
ans(:,:,1) =
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
ans(:,:,2) =
17 21 25 29
18 22 26 30
19 23 27 31
20 24 28 32
Python:
>>> import numpy as np
>>> mafs = np.c_[np.arange(1,17), np.arange(17,33)]
>>> mafs.shape
(16, 2)
>>> mafs[:,0]
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
>>> mafs[:,1]
array([17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32])
>>> r = np.reshape(mafs, (4,4,2), order="F")
>>> r.shape
(4, 4, 2)
>>> r[:,:,0]
array([[ 1, 5, 9, 13],
[ 2, 6, 10, 14],
[ 3, 7, 11, 15],
[ 4, 8, 12, 16]])
>>> r[:,:,1]
array([[17, 21, 25, 29],
[18, 22, 26, 30],
[19, 23, 27, 31],
[20, 24, 28, 32]])
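The difference between the two scan orders is easiest to see on a tiny array. A minimal sketch with illustrative names: order='F' fills the result column by column, as MATLAB's reshape does, while the default order='C' fills it row by row.

```python
import numpy as np

x = np.arange(1, 7)  # 1, 2, ..., 6

# Default C order: consecutive values run along each row
c_order = x.reshape(2, 3)

# Fortran order: consecutive values run down each column (MATLAB behaviour)
f_order = x.reshape((2, 3), order="F")

print(c_order)
print(f_order)
```

Here c_order is [[1, 2, 3], [4, 5, 6]] and f_order is [[1, 3, 5], [2, 4, 6]].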
I was having a similar issue myself, as I am also trying to make the transition from MATLAB to Python. I was finally able to convert a numpy matrix, given in depth, row, col, format to a single sheet of column vectors (per image).
In MATLAB I would have done something like:
output = reshape(imStack,[row*col,depth])
In Python this seems to translate to:
import numpy as np
output=np.transpose(imStack)
output=output.reshape((row*col, depth), order='F')
