pandas dataframe how to do "elementwise" concatenation? - python

I have two pandas dataframes A,B with identical shape, index and column. Each element of A is a np.ndarray with shape (n,1), and each element of B is a float value. Now I want to efficiently append B elementwise to A. A minimal example:
index = ['fst', 'scd']
column = ['a','b']
A
Out[23]:
a b
fst [1, 2] [1, 4]
scd [3, 4] [3, 2]
B
Out[24]:
a b
fst 0.392414 0.641136
scd 0.264117 1.644251
resulting_df = pd.DataFrame([[np.append(A.loc[i,j], B.loc[i,j]) for i in index] for j in column], columns=column, index=index)
resulting_df
Out[27]:
a b
fst [1.0, 2.0, 0.392414377685] [3.0, 4.0, 0.264117463613]
scd [1.0, 4.0, 0.641136433253] [3.0, 2.0, 1.64425062851]
Is there something similar to pd.DataFrame.applymap that can operate elementwise between two instead of just one pandas dataframe?

You can convert the elements of B to lists using applymap and then use ordinary addition to combine the lists, i.e.
index = ['fst', 'scd']
column = ['a','b']
A = pd.DataFrame([[[1, 2],[1, 4]],[[3, 4],[3, 2]]],index,column)
B = pd.DataFrame([[0.392414,0.264117],[ 0.641136 , 1.644251]],index,column)
Option 1 :
n = B.applymap(lambda y: [y])
ndf = A.apply(lambda x : x+n[x.name])
Option 2 :
Using pd.concat to stack the two frames, then grouping back by the index level:
pd.concat([A,B]).groupby(level=0).apply(lambda g: pd.Series({i: np.hstack(g[i].values) for i in A.columns}))
To make your current method give the correct output, swap the loops, i.e.
pd.DataFrame([[np.append(A.loc[i,j], B.loc[i,j]) for j in A.columns] for i in A.index], columns=A.columns, index=A.index)
Output:
a b
fst [1.0, 2.0, 0.392414] [1.0, 4.0, 0.264117]
scd [3.0, 4.0, 0.641136] [3.0, 2.0, 1.644251]

You can simply do this:
>>> A + B.applymap(lambda x : [x])
a b
fst [1, 2, 0.392414] [1, 4, 0.264117]
scd [3, 4, 0.641136] [3, 2, 1.644251]
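Another option, not used in the answers above, is DataFrame.combine, which hands the combining function one aligned column from each frame at a time. A minimal sketch with the question's data (the elementwise append then happens inside each column):

```python
import numpy as np
import pandas as pd

index = ['fst', 'scd']
columns = ['a', 'b']
A = pd.DataFrame([[[1, 2], [1, 4]], [[3, 4], [3, 2]]], index, columns)
B = pd.DataFrame([[0.392414, 0.264117], [0.641136, 1.644251]], index, columns)

# combine() calls the lambda once per aligned pair of columns (two Series);
# inside each pair we append the scalar from B to the array from A
result = A.combine(B, lambda s1, s2: pd.Series(
    [np.append(v1, v2) for v1, v2 in zip(s1, s2)], index=s1.index))
print(result)
```

This keeps the index and column alignment inside pandas instead of relying on loop order.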

Related

Extract values from two columns of a dataframe and put it in a list

I have a dataframe as shown below:
df =
     A  col_1  col_45    col_3
   1.0    4.0    45.0   [1, 9]
   2.0    4.0     NaN  [9, 10]
   3.0   49.2    10.8  [1, 10]
The values in col_1 are of type float and the values in col_3 are in a list. For every row, I want to extract the values in col_1 and col_3 and put it together in a list.
I tried the following:
df[['col_1','col_3']].astype(float).values.tolist()
But it threw a ValueError: setting an array element with a sequence.
I would like to have a list as follows:
[[4.0,1.0,9.0],
[4.0,9.0,10.0],
[49.2,1.0,10.0]]
Is there a way to do this?
Thanks.
Convert each element in col_1 to a one-element list, then merge the two lists with + (like list_1 + list_2). You can use pandas apply with axis=1 to iterate over each row:
>>> df.apply(lambda row: [row['col_1']] + row['col_3'], axis=1)
0 [4.0, 1, 9]
1 [4.0, 9, 10]
2 [49.2, 1, 10]
dtype: object
>>> df.apply(lambda row: [row['col_1']] + row['col_3'], axis=1).to_list()
[
[4.0, 1, 9],
[4.0, 9, 10],
[49.2, 1, 10]
]
The best option, IMO, might be to use the underlying numpy arrays:
out = np.c_[df['col_1'].to_numpy(), df['col_3'].to_list()].tolist()
output:
[[4.0, 1.0, 9.0],
[4.0, 9.0, 10.0],
[49.2, 1.0, 10.0]]
If you want to keep a DataFrame:
pd.concat([df['col_1'], pd.DataFrame(df['col_3'].to_list())], axis=1)
output:
col_1 0 1
0 4.0 1 9
1 4.0 9 10
2 49.2 1 10
Use apply to cast each col_1 value to a list, then concatenate with the + operator:
df['col_1'].apply(lambda x: [x]) + df['col_3']
Output
0 [4.0, 1, 9]
1 [4.0, 9, 10]
2 [49.2, 1, 10]
dtype: object
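A plain list comprehension over the two columns also works and avoids per-row apply overhead. A quick sketch, rebuilding the question's frame inline:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [4.0, 4.0, 49.2],
                   'col_3': [[1, 9], [9, 10], [1, 10]]})

# zip the two columns and splice each list after its float,
# casting the list entries to float to match the desired output
out = [[c1, *map(float, c3)] for c1, c3 in zip(df['col_1'], df['col_3'])]
print(out)
```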

How to print the same map result 5 times in a loop?

Here I have a simple dict of map objects that I want to loop over, printing arr['b'] 5 times.
number = 0
arr = {}
arr['a'] = map(float, [1, 2, 3])
arr['b'] = map(float, [4, 5, 6])
arr['c'] = map(float, [7, 8, 9])
arr['d'] = map(float, [10, 11, 12])
while number < 5:
    print(list(arr['b']))
    number = number + 1
Why is the output as such, instead of [4.0, 5.0, 6.0] repeating 5 times? How can I loop to get arr['b'] result 5 times?
Output:
[4.0, 5.0, 6.0]
[]
[]
[]
[]
This is the output I really want.
Intended Output:
[4.0, 5.0, 6.0]
[4.0, 5.0, 6.0]
[4.0, 5.0, 6.0]
[4.0, 5.0, 6.0]
[4.0, 5.0, 6.0]
map produces a lazy iterator which gets consumed the first time you access its content. Therefore, the first time you convert it to a list you get the expected result, but the second time the resulting list is empty. A simple example:
a = map(float, [1, 2, 3])
print(list(a))
# out: [1.0, 2.0, 3.0]
print(list(a))
# out: []
Convert the map object/generator to a list once (outside the loop!) and you can print it as often as you need: arr['a'] = list(map(float, [1, 2, 3])) etc.
Other improvement: In Python you don't need counters in loops as you use it here. Instead, in order to do something 5 times, rather use range (the _ by convention denotes a value we are not interested in):
for _ in range(5):
    print(list(arr['b']))
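Putting both fixes together, a minimal sketch of the corrected program:

```python
arr = {}
# materialize the map iterator once, outside the loop
arr['b'] = list(map(float, [4, 5, 6]))

# a list can be read as many times as needed
for _ in range(5):
    print(arr['b'])  # prints [4.0, 5.0, 6.0] every time
```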

How to divide elements of a list by elements of another list with different dimension?

So for the following two lists:
A=[ [1,2,3,4], [2,4,6,8], [2,5,8,10], [10,20,40,50] ]
B=[2, 3, 4, 5]
A is a list of lists, and B is a list. I would like to divide the first element of each sub-list in A by the first element of B, and the second element of each sub-list in A by the second element of B, etc to produce a third list C:
C = [ [1/2, 2/3, 3/4, 4/5], [2/2, 4/3, 6/4, 8/5], [2/2, 5/3, 8/4, 10/5], [10/2 ,20/3, 40/4, 50/5] ]
I am aware that the zip() function can be used to divide each element of a list by elements of another list, but I have only seen examples of this being used when both lists have identical structures. My attempt was to use the following:
C = [ [(m/n) for m, n in zip(subm, subn)] for subm, subn in zip(A, B)]
But this returns an error, because zip(A, B) pairs each sub-list of A with a single float from B, and a float is not iterable. Can someone explain how I could modify the above line of code to correctly obtain C? Thank you.
Since you need to divide the inner list elements by B, zip each inner sublist with B and loop through A:
A=[ [1,2,3,4], [2,4,6,8], [2,5,8,10], [10,20,40,50] ]
B=[2, 3, 4, 5]
res = [[a/b for a,b in zip(i, B)] for i in A]
Another option to do it without zip is numpy:
import numpy as np
A = np.array([[1,2,3,4], [2,4,6,8], [2,5,8,10], [10,20,40,50]])
B = np.array([2, 3, 4, 5])
>>> (A/B).tolist()
[[0.5, 0.6666666666666666, 0.75, 0.8],
[1.0, 1.3333333333333333, 1.5, 1.6],
[1.0, 1.6666666666666667, 2.0, 2.0],
[5.0, 6.666666666666667, 10.0, 10.0]]
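Both answers produce the same values; a quick sketch checking them against each other on the question's data:

```python
import numpy as np

A = [[1, 2, 3, 4], [2, 4, 6, 8], [2, 5, 8, 10], [10, 20, 40, 50]]
B = [2, 3, 4, 5]

# pure-Python version: pair each sublist of A with B
res_zip = [[a / b for a, b in zip(row, B)] for row in A]

# numpy version: broadcasting divides every row of A by B
res_np = (np.array(A) / np.array(B)).tolist()

assert res_zip == res_np
print(res_zip[0])
```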

Faster method for iterating through a numpy array of numpy arrays

I have a numpy array of numpy arrays like the following example:
data = [[0.4, 1.5, 2.6],
[3.4, 0.2, 0.0],
[null, 3.2, 1.0],
[1.0, 4.6, null]]
I would like an efficient way of returning the row index, column index and value if the value meets a condition.
I need the row and column values because I feed them into func_which_returns_lat_long_based_on_row_and_column(column, row) which is applied if the value meets a condition.
Finally I would like to append the value, and outputs of the function to my_list.
I have solved my problem with the nested for loop solution shown below but it is slow. I believe I should be using np.where() however I cannot figure that out.
my_list = []
for ii, array in enumerate(data):
    for jj, value in enumerate(array):
        if value > 1:
            lon, lat = func_which_returns_lat_long_based_on_row_and_column(jj, ii)
            my_list.append([value, lon, lat])
I'm hoping there is a more efficient solution than the one I'm using above.
import numpy as np
import warnings
warnings.filterwarnings('ignore')
data = [[0.4, 1.5, 2.6],
[3.4, 0.2, 0.0],
[np.nan, 3.2, 1.0],
[1.0, 4.6, np.nan]]
x = np.array(data)
i, j = np.where(x > 1)
for a, b in zip(i, j):
    print('lon: {} lat: {} value: {}'.format(a, b, x[a, b]))
Output is
lon: 0 lat: 1 value: 1.5
lon: 0 lat: 2 value: 2.6
lon: 1 lat: 0 value: 3.4
lon: 2 lat: 1 value: 3.2
lon: 3 lat: 1 value: 4.6
Because np.nan appears in the comparison, NumPy emits a RuntimeWarning (hence the warnings filter above).
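Rather than silencing all warnings, np.errstate can suppress just the nan comparison, and the np.where results can be collected without the Python-level nested loop. A sketch (the lat/long lookup from the question would still be applied per index pair; here I only gather value, column, and row):

```python
import numpy as np

data = [[0.4, 1.5, 2.6],
        [3.4, 0.2, 0.0],
        [np.nan, 3.2, 1.0],
        [1.0, 4.6, np.nan]]
x = np.array(data)

# errstate silences only the invalid-value warning from comparing with nan
with np.errstate(invalid='ignore'):
    rows, cols = np.where(x > 1)

# fancy indexing pulls all matching values at once
values = x[rows, cols]
my_list = [[v, c, r] for v, c, r in zip(values, cols, rows)]
print(my_list)
```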
You can use
result = np.where(arr == 15)
It returns a tuple of index arrays giving the positions in arr where the condition holds.
Try to build a function that works on whole arrays. For instance, a function that adds the corresponding row and column index to every element of the data could look like:
import numpy as np
def func_which_returns_lat_long_based_on_row_and_column(data, indices):
    # returns element of data + column and row index
    return data + indices[:, :, 0] + indices[:, :, 1]
data = np.array([[0.4, 1.5, 2.6],
[3.4, 0.2, 0.0],
[np.nan, 3.2, 1.0],
[1.0, 4.6, np.nan]])
# create a matrix of the same shape as data (plus an additional dim because they are two indices)
# with the corresponding indices of the element in it
x_range = np.arange(0,data.shape[0])
y_range = np.arange(0,data.shape[1])
grid = np.meshgrid(x_range,y_range, indexing = 'ij')
indice_matrix = np.concatenate((grid[0][:,:,None],grid[1][:,:,None]),axis=2)
# for instance:
# indice_matrix[0,0] = np.array([0,0])
# indice_matrix[1,0] = np.array([1,0])
# indice_matrix[1,3] = np.array([1,3])
# calculate the output
out = func_which_returns_lat_long_based_on_row_and_column(data,indice_matrix)
data.shape
>>> (4, 3)
indice_matrix.shape
>>> (4, 3, 2)
indice_matrix
>>> array([[[0, 0],
[0, 1],
[0, 2]],
[[1, 0],
[1, 1],
[1, 2]],
[[2, 0],
[2, 1],
[2, 2]],
[[3, 0],
[3, 1],
[3, 2]]])
indice_matrix[2, 1]
>>> array([2, 1])
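As an aside (my addition, not part of the original answer), the same index grid can be built more directly with np.indices:

```python
import numpy as np

data = np.array([[0.4, 1.5, 2.6],
                 [3.4, 0.2, 0.0],
                 [np.nan, 3.2, 1.0],
                 [1.0, 4.6, np.nan]])

# np.indices stacks the row and column index arrays along axis 0;
# moving that axis to the end gives the same (4, 3, 2) layout as
# the meshgrid + concatenate construction above
indice_matrix = np.moveaxis(np.indices(data.shape), 0, -1)
print(indice_matrix.shape)  # (4, 3, 2)
```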

Python - find the ratios in a list of arrays

I have a data structure
my_list = [ [a, b], [c,d], [e,f], [g,h], [i, j], [k, l] ....]
where the letters are floats.
I need to find the ratio between c,e and a >>>> c/a...e/a
Then find the ratio between d,f and b >>>> d/b, f/b
and continue this for all 12 elements in the list, so 8 ratios are calculated.
Is there a function that can do this efficiently across list elements, without having to extract the data from the arrays individually first and then do the math?
ex_array = [[5.0, 2.5], [10.0, 5.0], [20.0, 13.0]]
for i in range(len(ex_array)):
    print("\n" + str(ex_array[i][0]) + " ratios for x values:\n")
    for j in range(len(ex_array)):
        # gives ratios for each nested 0-index value against the others
        print(str(ex_array[i][0] / ex_array[j][0]) + "\t|{} / {}".format(ex_array[i][0], ex_array[j][0]))
for i in range(len(ex_array)):
    print("\n" + str(ex_array[i][1]) + " ratios for y values:\n")
    for j in range(len(ex_array)):
        # gives ratios for each nested 1-index value against the others
        print(str(ex_array[i][1] / ex_array[j][1]) + "\t|{} / {}".format(ex_array[i][1], ex_array[j][1]))
which prints every pairwise ratio labelled with the pair of operands that produced it.
The required operations must be specified anyway.
def get(l):
    return [l[i+k+1][j] / float(l[i][j]) for i in range(0, len(l)-2, 3) for j in range(2) for k in range(2)]

print(get([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]))
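With numpy, the same grouping-by-threes can be expressed as a reshape plus broadcasting. A sketch assuming the list holds 12 floats in groups of three [x, y] pairs (the value order differs from get above, but the 8 ratios are the same):

```python
import numpy as np

l = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]

# group the pairs in threes: shape (2 groups, 3 pairs, 2 coordinates)
a = np.array(l, dtype=float).reshape(-1, 3, 2)

# divide the 2nd and 3rd pair of each group by the 1st pair (broadcast)
ratios = a[:, 1:, :] / a[:, :1, :]
print(ratios.reshape(-1, 2))
```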
Using list comprehension,
# sample list
a = [ [1.0, 2.0], [4.0, 8.0], [3.0, 9.0] ]
print('List:', a, '\n\nMatching:')
# divides each element to other elements besides itself
xs = [ x[0] / x1[0] for x1 in a for x in a if x[0] != x1[0] ]
ys = [ y[1] / y1[1] for y1 in a for y in a if y[1] != y1[1] ]
print("Quo of x's:", xs)
print("Quo of y's:", ys)
Outputs to
List: [[1.0, 2.0], [4.0, 8.0], [3.0, 9.0]]
Matching:
Quo of x's: [4.0, 3.0, 0.25, 0.75, 0.3333333333333333, 1.3333333333333333]
Quo of y's: [4.0, 4.5, 0.25, 1.125, 0.2222222222222222, 0.8888888888888888]
Or you can have some fun with difflib (builtin to stdlib since Python 2.1):
In [1]: from difflib import SequenceMatcher
In [2]: SequenceMatcher(None, (5,6), (6,)).ratio()
Out[2]: 0.6666666666666666
