I have a numpy array of numpy arrays like the following example:
data = [[0.4, 1.5, 2.6],
        [3.4, 0.2, 0.0],
        [np.nan, 3.2, 1.0],
        [1.0, 4.6, np.nan]]
I would like an efficient way of returning the row index, column index and value if the value meets a condition.
I need the row and column values because I feed them into func_which_returns_lat_long_based_on_row_and_column(column, row) which is applied if the value meets a condition.
Finally I would like to append the value, and outputs of the function to my_list.
I have solved my problem with the nested for loop solution shown below but it is slow. I believe I should be using np.where() however I cannot figure that out.
my_list = []
for ii, array in enumerate(data):
    for jj, value in enumerate(array):
        if value > 1:
            lon, lat = func_which_returns_lat_long_based_on_row_and_column(jj, ii)
            my_list.append([value, lon, lat])
I'm hoping there is a more efficient solution than the one I'm using above.
import numpy as np
import warnings
warnings.filterwarnings('ignore')  # silence the NaN-comparison RuntimeWarning

data = [[0.4, 1.5, 2.6],
        [3.4, 0.2, 0.0],
        [np.nan, 3.2, 1.0],
        [1.0, 4.6, np.nan]]
x = np.array(data)
i, j = np.where(x > 1)
for a, b in zip(i, j):
    print('row: {} col: {} value: {}'.format(a, b, x[a, b]))
Output is
row: 0 col: 1 value: 1.5
row: 0 col: 2 value: 2.6
row: 1 col: 0 value: 3.4
row: 2 col: 1 value: 3.2
row: 3 col: 1 value: 4.6
Because np.nan appears in the comparison, numpy would emit a RuntimeWarning, which is why the warnings filter is applied above.
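If silencing warnings globally feels too blunt, a scoped alternative (a sketch of the same idea) is np.errstate; fancy indexing also pulls out all matching values in one step, so a Python loop is only needed for the lat/long function itself:

```python
import numpy as np

data = [[0.4, 1.5, 2.6],
        [3.4, 0.2, 0.0],
        [np.nan, 3.2, 1.0],
        [1.0, 4.6, np.nan]]
x = np.array(data)

# suppress the NaN-comparison warning only inside this block
with np.errstate(invalid='ignore'):
    rows, cols = np.where(x > 1)

# fancy indexing: all matching values at once, no Python loop
values = x[rows, cols]
print(list(zip(rows.tolist(), cols.tolist(), values.tolist())))
```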
You can use
result = np.where(arr == 15)
which returns a tuple of index arrays (one per dimension) locating every element of arr equal to 15.
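More precisely, np.where with a single condition returns a tuple of index arrays, one per dimension; a small sketch:

```python
import numpy as np

arr = np.array([[15, 2],
                [3, 15]])

# result is a tuple: (row indices, column indices)
result = np.where(arr == 15)
print(result)

# zip them together to get plain (row, col) coordinate pairs
coords = list(zip(result[0].tolist(), result[1].tolist()))
print(coords)   # [(0, 0), (1, 1)]
```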
Try to build a function that works on whole arrays. For instance, a function that adds the corresponding row and column index to every element of the data could look like:
import numpy as np

def func_which_returns_lat_long_based_on_row_and_column(data, indices):
    # returns element of data + row and column index
    return data + indices[:, :, 0] + indices[:, :, 1]
data = np.array([[0.4, 1.5, 2.6],
                 [3.4, 0.2, 0.0],
                 [np.nan, 3.2, 1.0],
                 [1.0, 4.6, np.nan]])

# create a matrix of the same shape as data (plus an additional dim because there are two indices)
# with the corresponding indices of each element in it
x_range = np.arange(0, data.shape[0])
y_range = np.arange(0, data.shape[1])
grid = np.meshgrid(x_range, y_range, indexing='ij')
indice_matrix = np.concatenate((grid[0][:, :, None], grid[1][:, :, None]), axis=2)
# for instance:
# indice_matrix[0, 0] = np.array([0, 0])
# indice_matrix[1, 0] = np.array([1, 0])
# indice_matrix[1, 2] = np.array([1, 2])

# calculate the output
out = func_which_returns_lat_long_based_on_row_and_column(data, indice_matrix)
data.shape
>> (4, 3)
indice_matrix.shape
>> (4, 3, 2)
indice_matrix
>> array([[[0, 0],
           [0, 1],
           [0, 2]],
          [[1, 0],
           [1, 1],
           [1, 2]],
          [[2, 0],
           [2, 1],
           [2, 2]],
          [[3, 0],
           [3, 1],
           [3, 2]]])
indice_matrix[2, 1]
>> array([2, 1])
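As a side note, np.indices (with np.stack) can build the same index grid more directly; a sketch that reproduces indice_matrix:

```python
import numpy as np

# np.indices returns an array of shape (2, 4, 3): the row-index grid
# followed by the column-index grid
row_idx, col_idx = np.indices((4, 3))

# stacking along a new last axis gives the same (4, 3, 2) indice_matrix
indice_matrix = np.stack((row_idx, col_idx), axis=-1)
print(indice_matrix[2, 1])   # [2 1]
```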
I took the adjacency matrix of a graph, then converted it to a sympy matrix. Here is my code:
import networkx as nx
import sympy as sp
import matplotlib.pyplot as plt
G = nx.Graph()
E = [[1,2], [1,3], [2,4], [4,1]]
G.add_edges_from(E)
nx.draw_networkx(G)
A = nx.adjacency_matrix(G).todense()
m = sp.Matrix(A)
print(m.table(sp.StrPrinter()))
And what I get:
Output:
[ 0.0, 1.00000000000000, 1.00000000000000, 1.00000000000000]
[1.00000000000000, 0.0, 0.0, 1.00000000000000]
[1.00000000000000, 0.0, 0.0, 0.0]
[1.00000000000000, 1.00000000000000, 0.0, 0.0]
What happened, and how can I convert all these values to integers? I guess I could write a loop and convert them all with int(i), but there must be a shorter way. A friend ran the same code and his values were integers from the start.
This is happening because the dtype of your matrix is a float (either 64- or 32-bit). To get integers, convert the numpy array to int before building the sympy Matrix, e.g. with astype:
>>> m = sp.Matrix(A.astype(int))
>>> m
Matrix([
[0, 1, 1, 1],
[1, 0, 0, 1],
[1, 0, 0, 0],
[1, 1, 0, 0]])
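If the floats are already inside a sympy Matrix, applyfunc is another route; a sketch, assuming every entry is a whole number (non-integral values would be truncated):

```python
import sympy as sp

m = sp.Matrix([[0.0, 1.0],
               [1.0, 0.0]])

# cast every entry to a sympy Integer
m_int = m.applyfunc(lambda e: sp.Integer(e))
print(m_int)   # Matrix([[0, 1], [1, 0]])
```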
I have a huge array of data and I would like to group the values that share the same integer part, then take the average of each group.
For example:
a = [0, 0.5, 1, 1.5, 2, 2.5]
I want to take sub groups as follows:
[0, 0.5] [1, 1.5] [2, 2.5]
... and then take the average and put all the averages in a new array.
Assuming you want to group by the number's integer part (int() truncates, which equals rounding down for non-negative values like these), something like this could work:
>>> import itertools
>>> a = [0, 0.5, 1, 1.5, 2, 2.5]
>>> groups = [list(g) for _, g in itertools.groupby(a, int)]
>>> groups
[[0, 0.5], [1, 1.5], [2, 2.5]]
Then averaging becomes:
>>> [sum(grp) / len(grp) for grp in groups]
[0.25, 1.25, 2.25]
This assumes a is already sorted, as in your example.
Ref: itertools.groupby, list comprehensions.
If you have no problem using additional libraries:
import pandas as pd
import numpy as np
a = [0, 0.5, 1, 1.5, 2, 2.5]
print(pd.Series(a).groupby(np.array(a, dtype=np.int32)).mean())
Gives:
0 0.25
1 1.25
2 2.25
dtype: float64
If you want an approach with a dictionary, you can go ahead like this:
dic = {}
a = [0, 0.5, 1, 1.5, 2, 2.5]
for items in a:
    if int(items) not in dic:
        dic[int(items)] = []
    dic[int(items)].append(items)
print(dic)
for items in dic:
    dic[items] = sum(dic[items]) / len(dic[items])
print(dic)
You can use groupby to easily get that (you might need to sort the list first):
from itertools import groupby
from statistics import mean

a = [0, 0.5, 1, 1.5, 2, 2.5]
for k, group in groupby(a, key=int):
    print(mean(group))
Will give:
0.25
1.25
2.25
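A numpy-only sketch (assuming the values are non-negative, so int truncation matches grouping by integer part) uses np.bincount to sum and count per key, with no sorting required:

```python
import numpy as np

a = np.array([0, 0.5, 1, 1.5, 2, 2.5])
keys = a.astype(int)                 # integer group for each value

sums = np.bincount(keys, weights=a)  # per-group sums
counts = np.bincount(keys)           # per-group sizes
averages = sums / counts
print(averages)                      # [0.25 1.25 2.25]
```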
I am quite new to Python. Any help would be appreciated.
ret_val generates 0 or 1 labels, and the euclidean distance is a value such as 0.55 or 0.23.
So what I want is to collect those values into numpy arrays, for example:
> arr= np.array([[0.55, 0.23], [0.4, 0.6], [0.8, 0.2]])
> arrdist= np.array([[1, 0], [0, 1], [1, 0]])
I want to apply this in my code; the output should be:
[[0.7 0.3]
[0.4 0.6]
[0.8 0.2]]
[[1 0]
[0 1]
[1 0]]
but my code returns only the last values: [[0]]
[[37.11052]]
When I run the code, the values overwrite the array and I get only the last element:
i = 1
for j in range(1, 5):
    ret_val, euclidean_distance = verifyFace(str(i) + "tst.jpg", str(j) + "train.jpg", epsilon)
    if ret_val == '0':
        a = 0
        print(euclidean_distance)
        arr = np.array([[a]])
        arrdist = np.array([[euclidean_distance]])
    elif ret_val == '1':
        b = 1
        arr = np.array([[b]])
        arrdist = np.array([[euclidean_distance]])
print(arr)
print(arrdist)
You need to move arr and arrdist outside of your for loop. Initialize them as empty lists like so:
i = 1
arr = []
arrdist = []
for j in range(1, 5):
    ...
    if ...:
        ...
        arr.append([a])
        arrdist.append([euclidean_distance])
    ...
After the for loop, convert the lists into numpy arrays if you need numpy arrays:
arr = np.array(arr)
arrdist = np.array(arrdist)
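Put together, the loop might look like the sketch below; verifyFace here is a hypothetical stub standing in for the real face-verification function from the question, included only so the example runs end to end:

```python
import numpy as np

def verifyFace(test_img, train_img, epsilon):
    # hypothetical stub: the real function compares the two images and
    # returns a '0'/'1' label plus a euclidean distance
    return '0', 0.23

arr = []       # labels, initialized once, outside the loop
arrdist = []   # distances, initialized once, outside the loop

for j in range(1, 5):
    ret_val, euclidean_distance = verifyFace("1tst.jpg", str(j) + "train.jpg", 0.5)
    arr.append([1 if ret_val == '1' else 0])
    arrdist.append([euclidean_distance])

arr = np.array(arr)
arrdist = np.array(arrdist)
print(arr.shape, arrdist.shape)   # (4, 1) (4, 1)
```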
I have two pandas dataframes A,B with identical shape, index and column. Each element of A is a np.ndarray with shape (n,1), and each element of B is a float value. Now I want to efficiently append B elementwise to A. A minimal example:
index = ['fst', 'scd']
column = ['a','b']
A
Out[23]:
a b
fst [1, 2] [1, 4]
scd [3, 4] [3, 2]
B
Out[24]:
a b
fst 0.392414 0.641136
scd 0.264117 1.644251
resulting_df = pd.DataFrame([[np.append(A.loc[i,j], B.loc[i,j]) for i in index] for j in column], columns=column, index=index)
resulting_df
Out[27]:
a b
fst [1.0, 2.0, 0.392414377685] [3.0, 4.0, 0.264117463613]
scd [1.0, 4.0, 0.641136433253] [3.0, 2.0, 1.64425062851]
Is there something similar to pd.DataFrame.applymap that can operate elementwise between two instead of just one pandas dataframe?
You can convert the elements in B to lists using applymap and then use ordinary addition to combine them, i.e.
index = ['fst', 'scd']
column = ['a','b']
A = pd.DataFrame([[[1, 2], [1, 4]], [[3, 4], [3, 2]]], index, column)
B = pd.DataFrame([[0.392414, 0.264117], [0.641136, 1.644251]], index, column)
Option 1 :
n = B.applymap(lambda y: [y])
ndf = A.apply(lambda x : x+n[x.name])
Option 2 :
Using pd.concat with a groupby over the index level, i.e.
pd.concat([A,B]).groupby(level=0).apply(lambda g: pd.Series({i: np.hstack(g[i].values) for i in A.columns}))
To make your current method give the correct output, swap the loops, i.e.
pd.DataFrame([[np.append(A.loc[i,j], B.loc[i,j]) for j in A.columns] for i in A.index], columns=A.columns, index=A.index)
Output:
a b
fst [1.0, 2.0, 0.392414] [1.0, 4.0, 0.264117]
scd [3.0, 4.0, 0.641136] [3.0, 2.0, 1.644251]
You can simply do this:
>>> A + B.applymap(lambda x : [x])
a b
fst [1, 2, 0.392414] [1, 4, 0.264117]
scd [3, 4, 0.641136] [3, 2, 1.644251]
I have a data structure
my_list = [ [a, b], [c,d], [e,f], [g,h], [i, j], [k, l] ....]
where the letters are floats.
I need to find the ratio of c and e to a, i.e. c/a and e/a,
then the ratio of d and f to b, i.e. d/b and f/b,
and continue this for all 12 elements in the list, so 8 ratios are calculated.
Is there a function that can do this efficiently, given that we are going between list elements, without having to extract the data from the arrays individually first and then do the math?
ex_array = [[5.0, 2.5], [10.0, 5.0], [20.0, 13.0]]  # nested lists of floats, which makes division easy
for i in range(len(ex_array)):
    print("\n" + str(ex_array[i][0]) + " ratios for x values:\n")
    for j in range(len(ex_array)):
        # gives ratios for each nested 0-index value against the others
        print(str(ex_array[i][0] / ex_array[j][0]) + "\t|{} / {}".format(ex_array[i][0], ex_array[j][0]))
for i in range(len(ex_array)):
    print("\n" + str(ex_array[i][1]) + " ratios for y values:\n")
    for j in range(len(ex_array)):
        # gives ratios for each nested 1-index value against the others
        print(str(ex_array[i][1] / ex_array[j][1]) + "\t|{} / {}".format(ex_array[i][1], ex_array[j][1]))
The required operations must be specified anyway.
def get(l):
    return [l[i + k + 1][j] / float(l[i][j]) for i in range(0, len(l) - 2, 3) for j in range(2) for k in range(2)]

print(get([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]))
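With numpy, the same chunked ratios can be computed without explicit loops; a sketch, assuming (as in the question's 12-element case) the list splits into chunks of 3 pairs with the first pair of each chunk as the base:

```python
import numpy as np

pairs = np.array([[1, 2], [3, 4], [5, 6],
                  [7, 8], [9, 10], [11, 12]], dtype=float)

# group into chunks of 3 pairs, then divide the 2nd and 3rd pair of each
# chunk by the 1st: c/a, d/b, e/a, f/b, and the same for the next chunk
chunks = pairs.reshape(-1, 3, 2)
ratios = chunks[:, 1:, :] / chunks[:, :1, :]
print(ratios.reshape(-1, 2))
```

The division relies on broadcasting: the base slice `chunks[:, :1, :]` keeps its middle axis so it stretches across both remaining pairs in each chunk.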
Using list comprehension,
# sample list
a = [ [1.0, 2.0], [4.0, 8.0], [3.0, 9.0] ]
print('List:', a, '\n\nMatching:')
# divides each element by the other elements; note the comparison is by
# value, so pairs with equal values are skipped, not just the element itself
xs = [x[0] / x1[0] for x1 in a for x in a if x[0] != x1[0]]
ys = [y[1] / y1[1] for y1 in a for y in a if y[1] != y1[1]]
print("Quo of x's:", xs)
print("Quo of y's:", ys)
Outputs to
List: [[1.0, 2.0], [4.0, 8.0], [3.0, 9.0]]
Matching:
Quo of x's: [4.0, 3.0, 0.25, 0.75, 0.3333333333333333, 1.3333333333333333]
Quo of y's: [4.0, 4.5, 0.25, 1.125, 0.2222222222222222, 0.8888888888888888]
Or you can have some fun with difflib (built into the stdlib since Python 2.1):
In [1]: from difflib import SequenceMatcher
In [2]: SequenceMatcher(None, (5,6), (6,)).ratio()
Out[2]: 0.6666666666666666