Generate 3D "matrix" with Pandas, based on comparing two dataframes [Python]

Good morning everyone. I am working with Python and Pandas.
I have two DataFrames, of the following type:
df_C = pd.DataFrame(data=[[-3,-1,-1], [5,3,3], [3,3,1], [-1,-1,-3], [-3,-1,-1], [2,3,1], [1,1,1]], columns=['C1','C2','C3'])
   C1  C2  C3
0  -3  -1  -1
1   5   3   3
2   3   3   1
3  -1  -1  -3
4  -3  -1  -1
5   2   3   1
6   1   1   1
df_F = pd.DataFrame(data=[[-1,1,-1,-1,-1],[1,1,1,1,1],[1,1,1,-1,1],[1,-1,-1,-1,1],[-1,0,0,-1,-1],[1,1,1,-1,0],[1,1,-1,1,-1]], columns=['F1','F2','F3','F4','F5'])
   F1  F2  F3  F4  F5
0  -1   1  -1  -1  -1
1   1   1   1   1   1
2   1   1   1  -1   1
3   1  -1  -1  -1   1
4  -1   0   0  -1  -1
5   1   1   1  -1   0
6   1   1  -1   1  -1
I would like to be able to "cross" these two DataFrames to generate a 3D one, as follows.
The new data must compare the values of df_F with the values of df_C according to these rules:
If both values are positive, generate 1
If both values are negative, generate 1
If one value is positive and the other negative, generate 0
If either value is zero, generate None (NaN)
Truth table
Comparison of the data df_C vs df_F
df_C  df_F  3D
+     +     1
+     -     0
+     0     None
-     +     0
-     -     1
-     0     None
0     +     None
0     -     None
0     0     None
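For reference, the truth table can be encoded as a small scalar function. This is only a sketch: the name `compare` is hypothetical, and it relies on the observation that the sign of the product c*f already separates the three cases (positive for equal signs, negative for opposite signs, zero when either value is zero).

```python
import math

def compare(c, f):
    """Apply the truth table to a single pair of values (hypothetical helper)."""
    p = c * f
    if p > 0:        # same sign: (+, +) or (-, -)
        return 1
    if p < 0:        # opposite signs: (+, -) or (-, +)
        return 0
    return math.nan  # at least one of the values is zero

# Check all nine rows of the truth table (None in the table is represented by NaN)
table = [(1, 1, 1), (1, -1, 0), (1, 0, None),
         (-1, 1, 0), (-1, -1, 1), (-1, 0, None),
         (0, 1, None), (0, -1, None), (0, 0, None)]
for c, f, expected in table:
    result = compare(c, f)
    assert (math.isnan(result) if expected is None else result == expected)
```

Calling this in nested for loops is exactly the slow approach the question wants to avoid; it is useful mainly as a reference to check a vectorized version against.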
Could you please guide me on how to generate this matrix by comparing the values? I would like to do it with Pandas. I have done it with for loops and if conditions, but the code is visually unpleasant, and I think Pandas would be more efficient and elegant.
Thank you.

Numpy broadcasting and np.select
Broadcast and multiply the values in df_C with the values in df_F so that the resulting product array has shape (3, 7, 5), i.e. one 7x5 slice per column of df_C. The product is positive when both values have the same sign, negative when the signs differ, and zero when either value is zero, so np.select can map those three conditions to 1, 0 and NaN respectively:
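import numpy as np
# (3, 7, 1) * (7, 5) -> (3, 7, 5): a[k, i, j] = df_C.iloc[i, k] * df_F.iloc[i, j]
a = df_C.values.T[:, :, None] * df_F.values
a = np.select([a > 0, a < 0], [1, 0], np.nan)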
array([[[ 1., 0., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 0., 1.],
[ 0., 1., 1., 1., 0.],
[ 1., nan, nan, 1., 1.],
[ 1., 1., 1., 0., nan],
[ 1., 1., 0., 1., 0.]],
[[ 1., 0., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 0., 1.],
[ 0., 1., 1., 1., 0.],
[ 1., nan, nan, 1., 1.],
[ 1., 1., 1., 0., nan],
[ 1., 1., 0., 1., 0.]],
[[ 1., 0., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 0., 1.],
[ 0., 1., 1., 1., 0.],
[ 1., nan, nan, 1., 1.],
[ 1., 1., 1., 0., nan],
[ 1., 1., 0., 1., 0.]]])
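Since the question asks for Pandas, the 3D result can also be wrapped back into a labelled DataFrame. The sketch below is an assumption about a convenient layout, not part of the answer above: it uses np.where instead of np.select (same logic) and puts the first two axes into a (C column, row) MultiIndex with the F columns kept as columns; the level names 'C' and 'row' are made up for illustration.

```python
import numpy as np
import pandas as pd

df_C = pd.DataFrame(data=[[-3,-1,-1], [5,3,3], [3,3,1], [-1,-1,-3], [-3,-1,-1], [2,3,1], [1,1,1]],
                    columns=['C1','C2','C3'])
df_F = pd.DataFrame(data=[[-1,1,-1,-1,-1],[1,1,1,1,1],[1,1,1,-1,1],[1,-1,-1,-1,1],
                          [-1,0,0,-1,-1],[1,1,1,-1,0],[1,1,-1,1,-1]],
                    columns=['F1','F2','F3','F4','F5'])

# Same broadcast as above: prod[k, i, j] = df_C.iloc[i, k] * df_F.iloc[i, j]
prod = df_C.to_numpy().T[:, :, None] * df_F.to_numpy()
# 1 where the product is positive, 0 where it is negative, NaN where it is zero
values = np.where(prod == 0, np.nan, (prod > 0).astype(float))

# Flatten the first two axes into a (C column, row) MultiIndex
result = pd.DataFrame(
    values.reshape(-1, df_F.shape[1]),
    index=pd.MultiIndex.from_product([df_C.columns, df_C.index], names=['C', 'row']),
    columns=df_F.columns,
)
print(result.loc['C1'])  # the 7x5 slice for column C1
```

With this layout, result.loc['C2'] selects one slice and result.loc[('C1', 4), 'F2'] a single cell.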

Related

How to read different blocks from a single csv into different arrays?

My textfile consists of different "blocks", e.g.,
0 0 1
1 1 1
1 0 0

1 0 0 1
1 1 1 1
1 1 1 0

1 0 0 1
1 1 1 1
1 1 1 0
1 0 1 0

1 0 0
0 1 1
1 1 1

1 0 0 0 0 1
1 1 1 1 0 0
1 1 0 0 1 0
1 0 1 0 0 0
I want to read each block into a NumPy array.
I didn't find a parameter for np.loadtxt() that splits the input at blank lines.
I guess looping with f = open('test_case_11x5.txt', 'r') and for line in f: ... while checking conditions is slow.
Does anyone know a neat method?
Here is a working solution using re.split and a small list comprehension. It assumes the full text has been loaded into the variable text:
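import io
import re
import numpy as np
with open('test_case_11x5.txt') as f:  # file name taken from the question
    text = f.read()
# Split on blank lines (tolerating stray whitespace) and parse each block separately
[np.loadtxt(io.StringIO(t)) for t in re.split(r'\n\s*\n', text.strip())]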
output:
[array([[0., 0., 1.],
[1., 1., 1.],
[1., 0., 0.]]),
array([[1., 0., 0., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 0.]]),
array([[1., 0., 0., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 0.],
[1., 0., 1., 0.]]),
array([[1., 0., 0.],
[0., 1., 1.],
[1., 1., 1.]]),
array([[1., 0., 0., 0., 0., 1.],
[1., 1., 1., 1., 0., 0.],
[1., 1., 0., 0., 1., 0.],
[1., 0., 1., 0., 0., 0.]])]
You can use the groupby function in itertools like this:
from itertools import groupby
import numpy as np

arr = []
with open('data.txt') as f_data:
    # Group consecutive lines by whether they are blank; each run of non-blank lines is one block
    for is_blank, group in groupby(f_data, lambda line: not line.strip()):
        if not is_blank:
            arr.append(np.array([[int(x) for x in line.split()] for line in group]))
This will yield a list of np arrays.
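To try the groupby approach without a file, the sample can be fed through io.StringIO, which iterates line by line like an open file. This sketch assumes the blocks are separated by blank lines (as the re.split answer's output indicates); the helper name read_blocks is hypothetical.

```python
import io
from itertools import groupby

import numpy as np

# Sample text from the question, with blank lines separating the blocks
sample = """0 0 1
1 1 1
1 0 0

1 0 0 1
1 1 1 1
1 1 1 0

1 0 0 1
1 1 1 1
1 1 1 0
1 0 1 0

1 0 0
0 1 1
1 1 1

1 0 0 0 0 1
1 1 1 1 0 0
1 1 0 0 1 0
1 0 1 0 0 0
"""

def read_blocks(lines):
    """Return one integer array per run of non-blank lines (hypothetical helper)."""
    blocks = []
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            blocks.append(np.array([[int(x) for x in line.split()] for line in group]))
    return blocks

blocks = read_blocks(io.StringIO(sample))
print([b.shape for b in blocks])
```

Because groupby consumes the lines lazily, the same function works on an open file object without reading the whole file into memory first.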

Expanding a matrix [duplicate]

This question already has answers here:
Quick way to upsample numpy array by nearest neighbor tiling [duplicate]
(3 answers)
Closed 4 years ago.
Given a matrix, such as:
1 0 0
0 1 1
1 1 0
I would like to expand each element into a "sub-matrix" of size AxA; e.g., for 3x3, the result would be:
1 1 1 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0
0 0 0 1 1 1 1 1 1
0 0 0 1 1 1 1 1 1
0 0 0 1 1 1 1 1 1
1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0 0
What is the fastest way of doing it in Python using numpy (or PyTorch)?
What you're describing is the Kronecker product, so use np.kron.
According to its documentation, it computes the Kronecker product, a composite array made of blocks of the second array scaled by the first.
import numpy as np

x = np.array([[1, 0, 0], [0, 1, 1], [1, 1, 0]])
np.kron(x, np.ones((3, 3)))
array([[1., 1., 1., 0., 0., 0., 0., 0., 0.],
[1., 1., 1., 0., 0., 0., 0., 0., 0.],
[1., 1., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 1., 1.],
[0., 0., 0., 1., 1., 1., 1., 1., 1.],
[0., 0., 0., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 0., 0., 0.],
[1., 1., 1., 1., 1., 1., 0., 0., 0.],
[1., 1., 1., 1., 1., 1., 0., 0., 0.]])
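An alternative, in the spirit of the linked "nearest neighbor tiling" question, is to repeat every row and then every column A times with np.repeat. It involves no multiplication and keeps the integer dtype of x; whether it is actually faster than np.kron is not measured here and will depend on the array sizes. In PyTorch, torch.repeat_interleave with a dim argument should play the same role.

```python
import numpy as np

x = np.array([[1, 0, 0], [0, 1, 1], [1, 1, 0]])
A = 3  # size of each square block

# Repeat every row A times, then every column A times (nearest-neighbour upsampling)
up = np.repeat(np.repeat(x, A, axis=0), A, axis=1)

# Same values as the Kronecker product with an AxA block of ones
assert np.array_equal(up, np.kron(x, np.ones((A, A))))
print(up)
```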

How to get rid of nested for loop

How do I replace the nested for loop without affecting the functionality of the code?
def addCoordinate(self, x, y, blockType):
    if self.x1 < x :
        self.x1 = x
    if self.y1 < y:
        self.y1 = y
    if self.x1 >= len(self.mazeboard) or self.y1 >= len(self.mazeboard):
        modified_board = [[1 for a in range(self.x1 + 1)] for b in range(self.y1 + 1)]
        for a in range(len(self.mazeboard)):
            for b in range(len(self.mazeboard[a])):
                modified_board[a][b] = self.mazeboard[a][b]
        self.mazeboard = modified_board
    self.mazeboard[x][y] = blockType
Yes, the nested loops and the range(len(self.mazeboard)) are highly unpythonic here, especially when you just want to extend a matrix like
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
to
0 0 0 0 0 1 1 1
0 0 0 0 0 1 1 1
0 0 0 0 0 1 1 1
0 0 0 0 0 1 1 1
0 0 0 0 0 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
You could work in place instead: complete the existing rows with ones, then add rows of ones until you reach the proper dimension.
Self-contained example:
mazeboard = [[0]*5 for _ in range(5)]
x1 = 7
x2 = 7
old_len = len(mazeboard[0])
# extend the existing rows
for m in mazeboard:
    m += [1]*(x1+1-old_len)
# add rows
mazeboard += [[1]*(x1+1) for i in range(len(mazeboard),x2+1)]
print(mazeboard)
result:
[[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1]]
So there is no nested loop and no useless copy; list multiplication generates lists of the proper length to add.
If you work with matrices in Python, you may want to consider using NumPy.
Your example becomes trivial with NumPy. First, import numpy:
>>> import numpy as np
Create the 5x5 matrix:
>>> a=np.ones(shape=(5,5))
>>> a
array([[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]])
Expand that matrix with 5 more columns and 5 more rows:
>>> a=np.pad(a,((0,5),(0,5)),mode='constant', constant_values=0)
>>> a
array([[ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
Instead of nested Python loops, the work is done by NumPy's compiled C code, which is typically much faster and more efficient.
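Applied to the question's actual case (an existing board of zeros extended with ones), np.pad takes constant_values=1. This is a sketch under the assumption that the board only ever grows, as in addCoordinate; x1 and x2 mirror the list-based example above (largest column and row index needed), and max(0, ...) guards against negative pad widths.

```python
import numpy as np

board = np.zeros((5, 5), dtype=int)  # existing 5x5 board of zeros
x1 = 7  # largest column index needed (as in the list-based example)
x2 = 7  # largest row index needed

# Pad at the bottom and on the right with ones; max(0, ...) keeps the widths
# non-negative when the board is already large enough
pad_rows = max(0, x2 + 1 - board.shape[0])
pad_cols = max(0, x1 + 1 - board.shape[1])
board = np.pad(board, ((0, pad_rows), (0, pad_cols)), mode='constant', constant_values=1)
print(board)
```

np.pad returns a new array rather than modifying the board in place, so the result has to be assigned back.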

Matrix of labels to adjacency matrix

Just wondering if there is an off-the-shelf function to perform the following operation: given a matrix X holding labels (which can be assumed to be integers 0 to N) in each entry, e.g.:
X = [[0 1 1 2 2 3 3 3],
[0 1 1 2 2 3 3 4],
[0 1 5 5 5 5 3 4]]
I want its adjacency matrix G i.e. G[i,j] = 1 if i,j are adjacent in X and 0 otherwise.
For example, G[1,2] = 1 because 1 and 2 are adjacent at (X[0,2], X[0,3]), (X[1,2], X[1,3]), etc.
The naive solution is to loop through all entries and check their neighbors, but I'd rather avoid loops for performance reasons.
You can use fancy indexing to assign the values of G directly from your X array:
import numpy as np
X = np.array([[0,1,1,2,2,3,3,3],
[0,1,1,2,2,3,3,4],
[0,1,5,5,5,5,3,4]])
G = np.zeros([X.max() + 1]*2)
# left-right pairs
G[X[:, :-1], X[:, 1:]] = 1
# right-left pairs
G[X[:, 1:], X[:, :-1]] = 1
# top-bottom pairs
G[X[:-1, :], X[1:, :]] = 1
# bottom-top pairs
G[X[1:, :], X[:-1, :]] = 1
print(G)
#array([[ 1., 1., 0., 0., 0., 0.],
# [ 1., 1., 1., 0., 0., 1.],
# [ 0., 1., 1., 1., 0., 1.],
# [ 0., 0., 1., 1., 1., 1.],
# [ 0., 0., 0., 1., 1., 0.],
# [ 0., 1., 1., 1., 0., 1.]])
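Note that the result above has ones on the diagonal, because identical labels sit next to each other. The same fancy-indexing idea can be packaged as a function that gathers all neighbouring pairs once and optionally clears the diagonal. The name label_adjacency and the include_self flag are hypothetical, introduced only for this sketch.

```python
import numpy as np

def label_adjacency(X, include_self=True):
    """Adjacency matrix of labels that touch horizontally or vertically (hypothetical helper)."""
    X = np.asarray(X)
    n = X.max() + 1
    # All horizontally and vertically neighbouring label pairs (a[k], b[k])
    a = np.concatenate([X[:, :-1].ravel(), X[:-1, :].ravel()])
    b = np.concatenate([X[:, 1:].ravel(), X[1:, :].ravel()])
    G = np.zeros((n, n), dtype=int)
    G[a, b] = 1
    G[b, a] = 1  # adjacency is symmetric
    if not include_self:
        np.fill_diagonal(G, 0)  # drop label-to-itself adjacency
    return G

X = np.array([[0,1,1,2,2,3,3,3],
              [0,1,1,2,2,3,3,4],
              [0,1,5,5,5,5,3,4]])
print(label_adjacency(X, include_self=False))
```

With include_self=True it reproduces the matrix printed above, but as integers rather than floats.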

Matrix-like printing of 2D arrays in Python

Say I have a matrix in a NumPy array in Python:
In [3]: my_matrix
Out[3]:
array([[ 2., 2., 2., 2., 2., 2., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 2., 2., 2., 2., 0., 0., 0.,
0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 2., 2., 2.,
2., 2., 2., 2., 2.]])
Is there a way to have Python/IPython print my array like this (similar to the way MATLAB does it)?
[ 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0;
  0 0 0 0 0 0 2 2 2 2 0 0 0 0 0 0 0 0;
  0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 ]
Also, I have noticed that IPython does not use the full width of my terminal when printing numpy arrays. Other functions do (e.g. pprint.pprint). How can I change that?
Use numpy.set_printoptions. To increase the line width:
np.set_printoptions(linewidth=150)
Replace 150 with whatever you need. Then, to print as you asked (I assume that means without the decimal point):
print(my_matrix.astype('i'))
If you have floating point values you can also control the precision for printouts with the option precision:
np.set_printoptions(precision=3)
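set_printoptions does not add MATLAB's ";" row separators, so for that exact layout a small formatter is one option. This is a sketch, not a NumPy feature: the helper name matlab_str is hypothetical, and the "g" format drops trailing ".0" (very large or small values would switch to exponent notation).

```python
import numpy as np

def matlab_str(m):
    """Format a 2-D array MATLAB-style (hypothetical helper): entries separated
    by spaces, rows terminated by ';', the whole matrix wrapped in [ ]."""
    rows = [" ".join(f"{v:g}" for v in row) for row in np.asarray(m)]
    return "[ " + ";\n  ".join(rows) + " ]"

# Rebuild the matrix from the question
my_matrix = np.zeros((3, 18))
my_matrix[0, :6] = 2
my_matrix[1, 6:10] = 2
my_matrix[2, 10:] = 2
print(matlab_str(my_matrix))
```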
