I have a 2-D List as follows:
[
[6 4 4 2 5 5 4 5 4 1 3 5]
[4 3 6 5 4 4 5 1 5 5 2 4]
[2 5 2 0 4 5 4 4 2 3 2 6]
[5 5 4 3 5 4 6 7 3 4 4 4]
[3 5 6 5 6 5 3 5 3 4 7 4]
[4 5 5 4 5 4 7 5 3 5 4 1]
[2 5 3 3 5 3 4 4 3 3 1 3]
[2 5 5 2 5 4 6 2 5 6 2 5]
]
Conditions:
Compare columns 1, 5 and 9 (in steps of 4) row-wise and process them in the following order:
Step 1: If one of them is zero, do nothing and move on to the next triple. Otherwise go to Step 2.
For the first row, (6, 5, 4) contains no zero, so go to Step 2.
Step 2: If they are all equal, change all of them to zero. If not, go to Step 3.
Step 3: Take the lowest of the three and subtract it from each of them.
Repeat this with the next three elements, columns (2, 6, 10), and so on until (4, 8, 12).
How can I do this efficiently in Python, using pandas, numpy, or even plain list operations?
Any help appreciated. Thanks!
You could write a custom function and then apply that function to every row of the array.
def check_conditions(x):
    for i in range(4):                     # triples (i, i+4, i+8) for i = 0..3
        if x[i] == 0 or x[i+4] == 0 or x[i+8] == 0:
            continue                       # Step 1: a zero present, leave the triple alone
        elif x[i] == x[i+4] == x[i+8]:
            x[i] = 0                       # Step 2: all three equal, zero them out
            x[i+4] = 0
            x[i+8] = 0
        else:
            min_val = min(x[i], x[i+4], x[i+8])
            x[i] -= min_val                # Step 3: subtract the minimum
            x[i+4] -= min_val
            x[i+8] -= min_val
    return x
new_arr = [check_conditions(x) for x in arr]
This gives the following result:
print(new_arr)
[[2, 3, 1, 0, 1, 4, 1, 3, 0, 0, 0, 3],
[0, 0, 4, 4, 0, 1, 3, 0, 1, 2, 0, 3],
[0, 2, 0, 0, 2, 2, 2, 4, 0, 0, 0, 6],
[2, 1, 0, 0, 2, 0, 2, 4, 0, 0, 0, 1],
[0, 1, 3, 1, 3, 1, 0, 1, 0, 0, 4, 0],
[1, 1, 1, 3, 2, 0, 3, 4, 0, 1, 0, 0],
[0, 2, 2, 0, 3, 0, 3, 1, 1, 0, 0, 0],
[0, 1, 3, 0, 3, 0, 4, 0, 3, 2, 0, 3]]
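Since the question also asks about numpy: here is a sketch (assuming the rules above and a rectangular input; the names a, g and result are mine) that vectorizes the row loop by viewing each row as a 3x4 block, so that g[r, :, k] holds the triple for columns (k+1, k+5, k+9):

import numpy as np

a = np.array(arr)                     # shape (n_rows, 12)
g = a.reshape(a.shape[0], 3, 4)       # g[r, :, k] is the triple (col k+1, k+5, k+9)

has_zero  = (g == 0).any(axis=1)                         # Step 1: triple contains a zero
all_equal = (g == g[:, :1, :]).all(axis=1) & ~has_zero   # Step 2: all three equal
subtract  = ~has_zero & ~all_equal                       # Step 3: subtract the minimum

mins = g.min(axis=1, keepdims=True)
g = np.where(all_equal[:, None, :], 0, g)                # zero out the equal triples
g = np.where(subtract[:, None, :], g - mins, g)          # subtract the minimum elsewhere
result = g.reshape(a.shape[0], 12)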
I am currently working with two sets of dataframes. Each set contains 60 dataframes. They are sorted so they line up for mapping (e.g. set 1 df1 corresponds with set 2 df1). The first set's dataframes are about 27 rows x 2 columns; the second set's are over 25000 rows x 8 columns. I want to create new dataframes that contain rows from the 2nd dataframe according to the values in the 1st dataframe.
For simplicity I've created a shortened example of the first df of each set to illustrate. I want to use the 797 to take the first 796 rows (indexes 0-795) from df2 and add them to a new dataframe, then filter rows 796 to 930 into a 2nd new dataframe. Any suggestions on how I could do that for all 60 pairs of dataframes?
0 1
0 797.0 930.0
1 1650.0 1760.0
2 2500.0 2570.0
3 3250.0 3333.0
4 3897.0 3967.0
0 -1 -2 -1 -3 -2 -1 2 0
1 0 0 0 -2 0 -1 0 0
2 -3 0 0 -1 -2 -1 -1 -1
3 0 1 -1 -1 -3 -2 -1 0
4 0 -3 -3 0 0 0 -4 -2
edit to add:
import pandas as pd
df1 = pd.DataFrame([(3, 5), (8, 11)])
df2 = pd.DataFrame([(1, 0, 2, 3, 1, 0, 1, 2), (2, 0.5, 1, 3, 1, 0, 1, 2), (3, 0, 2, 3, 1, 0, 1, 2),
(4, 0, 2, 3, 1, 0, 1, 2), (5, 0, 2, 3, 1, 0, 1, 2), (6, 0, 2, 3, 1, 0, 1, 2),
(7, 0, 2, 3, 1, 0, 1, 2), (8, 0, 2, 3, 1, 0, 1, 2), (9, 0, 2, 3, 1, 0, 1, 2),
(10, 0, 2, 3, 1, 0, 1, 2), (11, 0, 2, 3, 1, 0, 1, 2), (12, 0, 2, 3, 1, 0, 1, 2),
(13, 0, 2, 3, 1, 0, 1, 2), (14, 0, 0, 1, 2, 5, 2, 3), (15, 0.5, 1, 3, 1.5, 2, 3, 1)])
#expected output will be two dataframes containing rows from df2
output1 = pd.DataFrame([(1, 0, 2, 3, 1, 0, 1, 2), (2, 0.5, 1, 3, 1, 0, 1, 2), (6, 0, 2, 3, 1, 0, 1, 2),
(7, 0, 2, 3, 1, 0, 1, 2), (12, 0, 2, 3, 1, 0, 1, 2), (13, 0, 2, 3, 1, 0, 1, 2),
(14, 0, 0, 1, 2, 5, 2, 3), (15, 0.5, 1, 3, 1.5, 2, 3, 1)])
output2 = pd.DataFrame([(3, 0, 2, 3, 1, 0, 1, 2), (4, 0, 2, 3, 1, 0, 1, 2), (5, 0, 2, 3, 1, 0, 1, 2),
(8, 0, 2, 3, 1, 0, 1, 2), (9, 0, 2, 3, 1, 0, 1, 2), (10, 0, 2, 3, 1, 0, 1, 2),
(11, 0, 2, 3, 1, 0, 1, 2)])
You can use a list comprehension to flatten the ranges into indices:
rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]
print (rng)
[2, 3, 4, 7, 8, 9, 10]
And then filter by DataFrame.iloc and Index.difference:
output1 = df2.iloc[df2.index.difference(rng)]
print (output1)
0 1 2 3 4 5 6 7
0 1 0.0 2 3 1.0 0 1 2
1 2 0.5 1 3 1.0 0 1 2
5 6 0.0 2 3 1.0 0 1 2
6 7 0.0 2 3 1.0 0 1 2
11 12 0.0 2 3 1.0 0 1 2
12 13 0.0 2 3 1.0 0 1 2
13 14 0.0 0 1 2.0 5 2 3
output2 = df2.iloc[rng]
print (output2)
0 1 2 3 4 5 6 7
2 3 0.0 2 3 1.0 0 1 2
3 4 0.0 2 3 1.0 0 1 2
4 5 0.0 2 3 1.0 0 1 2
7 8 0.0 2 3 1.0 0 1 2
8 9 0.0 2 3 1.0 0 1 2
9 10 0.0 2 3 1.0 0 1 2
10 11 0.0 2 3 1.0 0 1 2
EDIT:
#list of DataFrames
L1 = [df11, df21, df31]
L2 = [df12, df22, df32]
#if necessary, output lists
out1 = []
out2 = []
#loop over the zipped lists and apply the solution
for df1, df2 in zip(L1, L2):
    print (df1)
    print (df2)
    rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]
    output1 = df2.iloc[df2.index.difference(rng)]
    output2 = df2.iloc[rng]
    #if necessary, append the output DataFrames to the lists
    out1.append(output1)
    out2.append(output2)
This might not be efficient, but it generates your desired results:
import pandas as pd
import numpy as np

df_out1 = pd.DataFrame()
df_out2 = pd.DataFrame()
#generate the second dataframe
#(DataFrame.append was removed in pandas 2.0, so pd.concat is used here)
for x, y in np.array(df1):
    df_out2 = pd.concat([df_out2, df2.iloc[x-1:y]], ignore_index=True)
#get the difference
df_out1 = pd.concat([df_out2, df2]).drop_duplicates(keep=False)
To compare the results with yours:
np.array_equal(df_out1.values,output1.values)
np.array_equal(df_out2.values,output2.values)
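One caveat with the concat + drop_duplicates trick: if df2 happens to contain two identical rows, both copies are dropped even when only one of them falls inside a range. A mask-based sketch (same assumption that df1 holds 1-based inclusive ranges) avoids this:

import numpy as np

# build a boolean mask of the positions selected by the (start, end) ranges
mask = np.zeros(len(df2), dtype=bool)
for a, b in np.array(df1):
    mask[int(a)-1:int(b)] = True   # rows a..b (1-based) -> positions a-1..b-1

df_out2 = df2[mask]                # rows inside the ranges
df_out1 = df2[~mask]               # everything else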
I am trying to count, for each element in a 2D numpy array, the number of neighbouring values that differ from the element itself (the 4-neighbourhood in this case, but the 8-neighbourhood is also interesting).
Something like this:
input labels:
[[1 1 1 2 2 2 2]
[1 1 1 2 2 2 2]
[1 1 1 2 2 2 2]
[1 1 3 3 3 5 5]
[4 4 4 3 3 5 5]
[4 4 4 3 3 5 5]] (6, 7)
count of unique neighbour labels:
[[0 0 1 1 0 0 0]
[0 0 1 1 0 0 0]
[0 0 2 2 1 1 1]
[1 2 2 1 2 2 1]
[1 1 1 1 1 1 0]
[0 0 1 1 1 1 0]] (6, 7)
I have the code below, and out of curiosity I am wondering if there is a better way to achieve this, perhaps without the for loops?
import numpy as np
import cv2
labels_image = np.array([
[1,1,1,2,2,2,2],
[1,1,1,2,2,2,2],
[1,1,1,2,2,2,2],
[1,1,3,3,3,5,5],
[4,4,4,3,3,5,5],
[4,4,4,3,3,5,5]])
print('input labels:\n', labels_image, labels_image.shape)
# Make a border, otherwise neighbours are counted as wrapped values from the other side
labels_image = cv2.copyMakeBorder(labels_image, 1, 1, 1, 1, cv2.BORDER_REPLICATE)
offsets = [(-1, 0), (0, -1), (0, 1), (1, 0)] # 4 neighbourhood
# Stack labels_image with one shifted per offset so we get a 3d array
# where each z-value corresponds to one of the neighbours
stacked = np.dstack([np.roll(np.roll(labels_image, i, axis=0), j, axis=1) for i, j in offsets])
# count number of unique neighbours, also take the border away again
labels_image = np.array([[(len(np.unique(stacked[i,j])) - 1)
for j in range(1, labels_image.shape[1] - 1)]
for i in range(1, labels_image.shape[0] - 1)])
print('count of unique neighbour labels:\n', labels_image, labels_image.shape)
I tried using np.unique with the return_counts and axis arguments, but could not get it to work.
Here's one approach -
import itertools
import numpy as np

def count_nunique_neighbors(ar):
    a = np.pad(ar, (1,1), mode='reflect')
    c = a[1:-1,1:-1]          # centre
    top = a[:-2,1:-1]         # the four shifted views
    bottom = a[2:,1:-1]
    left = a[1:-1,:-2]
    right = a[1:-1,2:]
    ineq = [top != c, bottom != c, left != c, right != c]
    count = ineq[0].astype(int) + ineq[1] + ineq[2] + ineq[3]
    blck = [top, bottom, left, right]
    # correct for double counting when two differing neighbours carry the same label
    for i, j in itertools.combinations(range(4), r=2):
        count -= ((blck[i] == blck[j]) & ineq[j])
    return count
Sample run -
In [22]: a
Out[22]:
array([[1, 1, 1, 2, 2, 2, 2],
[1, 1, 1, 2, 2, 2, 2],
[1, 1, 1, 2, 2, 2, 2],
[1, 1, 3, 3, 3, 5, 5],
[4, 4, 4, 3, 3, 5, 5],
[4, 4, 4, 3, 3, 5, 5]])
In [23]: count_nunique_neighbors(a)
Out[23]:
array([[0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 0, 0, 0],
[0, 0, 2, 2, 1, 1, 1],
[1, 2, 2, 1, 2, 2, 1],
[1, 1, 1, 1, 1, 1, 0],
[0, 0, 1, 1, 1, 1, 0]])
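For the 8-neighbourhood case the question also mentions, a simple non-vectorized sketch along the same padding idea could look like this (the function name count_nunique_neighbors_8 is mine); it replicates the border and counts the distinct neighbour labels that differ from the centre:

import numpy as np

def count_nunique_neighbors_8(ar):
    a = np.pad(ar, 1, mode='edge')     # replicate the border values
    out = np.zeros(ar.shape, dtype=int)
    for i in range(ar.shape[0]):
        for j in range(ar.shape[1]):
            block = a[i:i+3, j:j+3].ravel()
            centre = block[4]
            neighbours = np.delete(block, 4)          # the 8 surrounding cells
            diff = neighbours[neighbours != centre]   # those differing from the centre
            out[i, j] = len(np.unique(diff))
    return out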
I'm searching for a pythonic way to do this operation faster
import numpy as np
von_knoten = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 4])
zu_knoten = np.array([1, 2, 0, 2, 3, 0, 1, 4, 1, 2])
try:
    for i in range(0,len(von_knoten)-1):
        for j in range(0,len(von_knoten)-1):
            if (i != j) & ([von_knoten[i],zu_knoten[i]] == [zu_knoten[j],von_knoten[j]]):
                print(str(i)+".column equal " +str(j)+".column")
                von_knoten = sp.delete(von_knoten , j)
                zu_knoten = sp.delete(zu_knoten , j)
                print(von_knoten)
                print(zu_knoten)
except:
    print('end')
so I need the fastest way to get
[0 0 1 1 4]
[1 2 2 3 2]
from
[0 0 1 1 1 2 2 2 3 4]
[1 2 0 2 3 0 1 4 1 2]
Thanks ;)
Some comments about your code: as-is, it does not do what you want; it would print a few things, but did you try to run it? Could you show us what you obtain?
First, simply use range(len(von_knoten)); this does what you want, as range starts at 0 by default and stops one step before its argument.
If you delete items from the input arrays and then try to access items near their end, you will likely get an IndexError before the analysis of your input is finished.
You call sp.delete, but we do not know what sp is (neither does the code), so this will raise a NameError.
Finally, please do not use a bare except:. It catches exceptions you never dreamt of, and may explain why you don't understand what's wrong.
Then, what about using the zip built-in to build sorted two-element tuples and removing the duplicates with a set? Something like:
>>> von_knoten = [0, 0, 1, 1, 1, 2, 2, 2, 3, 4]
>>> zu_knoten = [1, 2, 0, 2, 3, 0, 1, 4, 1, 2]
>>> set(tuple(sorted([m, n])) for m, n in zip(von_knoten, zu_knoten))
{(0, 1), (0, 2), (1, 2), (1, 3), (2, 4)}
I let you work around this to obtain the exact thing you're looking for.
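For instance, a possible sketch to unpack the set back into two arrays (the names von_neu and zu_neu are mine; note the pairs come out in sorted orientation, so the question's (4, 2) becomes (2, 4)):

import numpy as np

pairs = sorted(set(tuple(sorted((m, n))) for m, n in zip(von_knoten, zu_knoten)))
von_neu, zu_neu = map(np.array, zip(*pairs))
print(von_neu)  # [0 0 1 1 2]
print(zu_neu)   # [1 2 2 3 4]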
You are trying to build up a collection of pairs you haven't seen before.
You can use not in, but you need to check both orderings:
L = []
for x, y in zip(von_knoten, zu_knoten):
    if (x, y) not in L and (y, x) not in L:
        L.append((x, y))
This gives a list of tuples
[(0, 1), (0, 2), (1, 2), (1, 3), (2, 4)]
which you can reshape.
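For example, a minimal sketch of that reshape, turning the list of tuples back into two numpy arrays:

import numpy as np

von_neu, zu_neu = np.array(L).T
print(von_neu)  # [0 0 1 1 2]
print(zu_neu)   # [1 2 2 3 4]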
Here's a vectorized approach -
import numpy as np

def unique_pairs(von_knoten, zu_knoten):
    s = np.max([von_knoten, zu_knoten])+1
    p1 = zu_knoten*s + von_knoten
    p2 = von_knoten*s + zu_knoten
    p = np.maximum(p1,p2)               # orientation-independent pair id
    sidx = p.argsort(kind='mergesort')  # stable sort keeps first occurrences first
    ps = p[sidx]
    m = np.concatenate(([True],ps[1:] != ps[:-1]))
    sm = sidx[m]
    return von_knoten[sm],zu_knoten[sm]
Sample run -
In [417]: von_knoten = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 4])
...: zu_knoten = np.array([1, 2, 0, 2, 3, 0, 1, 4, 1, 2])
In [418]: unique_pairs(von_knoten, zu_knoten)
Out[418]: (array([0, 0, 1, 1, 2]), array([1, 2, 2, 3, 4]))
Using np.unique and the void view method from here
import numpy as np

def unique_pairs(a, b):
    c = np.sort(np.stack([a, b], axis = 1), axis = 1)    # sort each pair
    c_view = np.ascontiguousarray(c).view(np.dtype((np.void,
                                                    c.dtype.itemsize * c.shape[1])))
    _, i = np.unique(c_view, return_index = True)        # first occurrence of each pair
    return a[i], b[i]
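A sample run, assuming the same inputs as above (the orientation of each first-seen pair is kept):

von_knoten = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 4])
zu_knoten = np.array([1, 2, 0, 2, 3, 0, 1, 4, 1, 2])
print(unique_pairs(von_knoten, zu_knoten))
# (array([0, 0, 1, 1, 2]), array([1, 2, 2, 3, 4]))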
I have a dataset of 3 items, e.g. [1, 2, 3].
I want to find its product with 3 repeats and then separate the result into 3 datasets like this (it should actually be vertical):
[1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3]
[1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3]
[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3]
I noticed that in Python I can use itertools.product for finding products:
data_prod = itertools.product(data, repeat=3)
Now my question is: how can I convert each column of the result (whose datatype is an itertools.product object) into 3 new datasets, as shown in the example above?
Use zip(*..) to turn columns into rows:
dataset1, dataset2, dataset3 = zip(*itertools.product(data,repeat=3))
Demo:
>>> list(zip(*itertools.product(data, repeat=3)))
[(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3), (1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3), (1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)]
>>> dataset1, dataset2, dataset3 = zip(*itertools.product(data,repeat=3))
>>> dataset1
(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3)
>>> dataset2
(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3)
>>> dataset3
(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)
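If numpy arrays are wanted instead of tuples, a meshgrid-based sketch (an assumption, not part of the original question) enumerates the same Cartesian product:

import numpy as np

data = np.array([1, 2, 3])
# indexing='ij' makes the first output vary slowest, matching itertools.product order
g1, g2, g3 = np.meshgrid(data, data, data, indexing='ij')
dataset1, dataset2, dataset3 = g1.ravel(), g2.ravel(), g3.ravel()
print(dataset1)  # [1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3]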
An alternate way, for display purposes, still using itertools.product:
import itertools
import pandas as pd

cols = ['series1', 'series2', 'series3']
originDataset = [1, 2, 3]
data_prod = lambda x: list(itertools.product(x, repeat=3))

df1 = pd.DataFrame(originDataset, columns=['OriginalDataSet'])
df2 = pd.DataFrame(data_prod(originDataset), columns=cols)
print(df1)
print('-' * 80)
print(df2)
print('-' * 80)
series1, series2, series3 = df2.T.values
print(series1)
print(series2)
print(series3)
Output:
OriginalDataSet
0 1
1 2
2 3
--------------------------------------------------------------------------------
series1 series2 series3
0 1 1 1
1 1 1 2
2 1 1 3
3 1 2 1
4 1 2 2
5 1 2 3
6 1 3 1
7 1 3 2
8 1 3 3
9 2 1 1
10 2 1 2
11 2 1 3
12 2 2 1
13 2 2 2
14 2 2 3
15 2 3 1
16 2 3 2
17 2 3 3
18 3 1 1
19 3 1 2
20 3 1 3
21 3 2 1
22 3 2 2
23 3 2 3
24 3 3 1
25 3 3 2
26 3 3 3
--------------------------------------------------------------------------------
[1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3]
[1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3]
[1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3]
I hope it helps you, at the same time, to learn how to use pandas.