Let's say I have the following line:
l = Line(Point(25, 0), Point(25, 25))
and I have a dataframe (df) which contains 2500 points, something like:
x y
0 0 49
1 13 48
2 0 47
3 5 46
4 9 45
...
How can I efficiently check whether the lines formed by every combination of those points intersect the above line?
Note that I am using the intersection function from the sympy library.
Note also that two nested loops take forever, so that approach is not efficient.
It's a very beginner-oriented question. I have been working with regular Python threads and C threads, and learnt that I can create threads that run a specific function and use semaphores and other sync primitives.
But I am currently trying to learn CUDA using Numba's Python-based compiler. I have written the following code.
from numba import cuda
import numpy as np
@cuda.jit
def image_saturate(data):
    pos_x, pos_y = cuda.grid(2)
    if (pos_x, pos_y) <= data.shape:
        data[pos_x, pos_y] = 1
if __name__ == "__main__":
    image_quality = (128, 72)
    image = np.zeros(image_quality)
    thread_size = 32
    block_size = image_quality
    image_saturate[block_size, thread_size](image)
    print(image)
But the thing that I find weird is that I can change thread_size as I want and the result is the same, meaning the output is all ones as expected. But the moment I change the block_size, weird things start happening and only that size of the original matrix gets filled with ones, so it's only a partial filling.
From this I understand that cuda.grid(2) returns the block coordinates. But shouldn't I be able to get the actual thread coordinates and the block coordinates as well?
I am terribly new and I can't find any resources to learn online. It would be great if anyone could answer my question and also provide any resources for learning CUDA using Numba.
From this I understand that cuda.grid(2) returns the block coordinates.
That's not the case. That statement returns a fully-qualified 2D thread index. The range of the returned values will extend to the product of the block coordinates limit and the thread coordinates limit.
In CUDA, the grid dimension parameter for a kernel launch (the one you are calling block_size) specifies the grid dimension in terms of number of blocks (in each direction). The block dimension parameter for a kernel launch (the one you are calling thread_size) specifies the size of each block in the grid, in terms of the number of threads per block.
Therefore the total number of threads launched is the product of the grid dimension(s) and the block dimension(s), taken over all dimensions. Per dimension, the number of threads is the grid dimension in that direction multiplied by the block dimension in that direction.
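To make that arithmetic concrete, here is a small plain-Python sketch (no GPU needed) applied to the launch in the question. The helper name total_threads is made up for illustration, and the scalar thread_size = 32 is assumed to mean a block of (32, 1):

```python
def total_threads(grid_dim, block_dim):
    """Return (threads per axis, total threads) for a launch configuration."""
    per_axis = tuple(g * b for g, b in zip(grid_dim, block_dim))
    total = 1
    for n in per_axis:
        total *= n
    return per_axis, total

# The question's launch: grid of (128, 72) blocks, 32 threads per block in x.
per_axis, total = total_threads((128, 72), (32, 1))
print(per_axis)  # (4096, 72)
print(total)     # 294912
```

So cuda.grid(2) can return x values up to 4095 for an image that is only 128 wide in that direction.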
So you have a questionable design choice, in that you have an image size and you are setting the grid dimension equal to the image size. This could only be sensible if you had only 1 thread per block. As you will discover by looking at any proper numba CUDA code (such as the one here) a typical approach is to divide the total desired dimension (in this case, the image size or dimension(s)), by the number of threads per block, to get the grid dimension.
When we do so, the cuda.grid() statement in your kernel code will return a tuple that has sensible ranges. In your case, it would return tuples to threads that correctly go from 0..127 in x, and 0..71 in y. The problem you have at the moment is that the cuda.grid() statement can return tuples that range from 0..((128*32)-1) in x, and that is unnecessary.
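A minimal sketch of that grid-size calculation, using exact ceiling division. The helper name blocks_needed is hypothetical, not part of Numba:

```python
def blocks_needed(extent, threads_per_block):
    """Smallest number of blocks whose threads cover `extent` elements."""
    return (extent + threads_per_block - 1) // threads_per_block

image_shape = (128, 72)
threads = (32, 1)
grid = tuple(blocks_needed(n, t) for n, t in zip(image_shape, threads))
print(grid)  # (4, 72)
```

With this grid, the threads cover exactly 0..127 in x and 0..71 in y. When the extent is not a multiple of the block size, the last block is only partly in bounds, which is why the kernel still needs its bounds check.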
Of course, the goal of your if statement is to prevent out-of-bounds indexing, however the test of <= does not look right to me. This is the classical computer science off-by-1 error. Threads whose indices happen to match the limit returned by shape should be excluded.
But the moment I change the block_size, weird things start happening and only that size of the original matrix gets filled with ones, so it's only a partial filling.
It's really not clear what your expectations are here. Your kernel design is such that each thread populates (at most) one output point. Therefore sensible grid sizing is to match the grid (the total threads in x, and the total threads in y) to the image dimensions. If you follow the above recommendations for grid sizing calculations, and then set your grid size to something less than your image size, I would expect that portions of your output image would not be populated. Don't do that. Or if you must do that, employ a grid-stride loop kernel design.
Having said all that, the following is how I would rewrite your code:
from numba import cuda
import numpy as np
@cuda.jit
def image_saturate(data):
    pos_x, pos_y = cuda.grid(2)
    if pos_x < data.shape[0] and pos_y < data.shape[1]:
        data[pos_x, pos_y] = 1
if __name__ == "__main__":
    image_x = 128
    image_y = 72
    image_quality = (image_x, image_y)
    image = np.zeros(image_quality)
    thread_x = 32
    thread_y = 1
    thread_size = (thread_x, thread_y)
    block_size = ((image_x//thread_x) + 1, (image_y//thread_y) + 1) # "lazy" round-up
    image_saturate[block_size, thread_size](image)
    print(image)
It appears to run correctly for me. If you now suggest that what you want to do is to arbitrarily modify the block_size variable, e.g.:
block_size = (5,5)
and make no other changes, and expect the output image to be fully populated, I would say that is not a sensible expectation. I have no idea how that could be sensible, so I will just say that CUDA doesn't work that way. If you wish to "decouple" the data size from the grid size, the canonical way to do it is the grid stride loop as already discussed.
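The grid-stride idea can be illustrated without a GPU. The following is a plain-Python emulation of the indexing pattern only, not a CUDA kernel: each simulated thread starts at its global index and then steps by the total number of threads in that direction. The function name grid_stride_fill is made up for this sketch. In a real Numba kernel the start would come from cuda.grid(2) and the strides from cuda.gridsize(2).

```python
def grid_stride_fill(shape, grid_dim, block_dim):
    """CPU emulation of a 2D grid-stride loop; counts writes per element."""
    rows, cols = shape
    stride_x = grid_dim[0] * block_dim[0]  # total threads in x
    stride_y = grid_dim[1] * block_dim[1]  # total threads in y
    data = [[0] * cols for _ in range(rows)]
    for start_x in range(stride_x):        # one pass per simulated thread
        for start_y in range(stride_y):
            for x in range(start_x, rows, stride_x):
                for y in range(start_y, cols, stride_y):
                    data[x][y] += 1
    return data

# A grid that is smaller than the image in both directions.
image = grid_stride_fill((128, 72), grid_dim=(2, 5), block_dim=(32, 1))
print(all(v == 1 for row in image for v in row))  # True
```

Every element is written exactly once even though the launch only provides 64 x 5 threads, which is the decoupling of grid size from data size described above.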
I've also removed the tuple comparison. I don't think it is really germane here. If you still want to use the tuple comparison, that should work exactly as you would expect based on python. There isn't anything CUDA specific about it.
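For reference, this is what the comparison does in plain Python: tuples compare lexicographically, so the result is decided by the first pair of elements that differ. The sketch assumes nothing about Numba beyond the statement above that it follows Python's behaviour; the helper in_bounds is hypothetical.

```python
shape = (128, 72)

def in_bounds(x, y, shape):
    """Per-axis bounds check, as used in the rewritten kernel."""
    return x < shape[0] and y < shape[1]

# Lexicographic comparison only looks at y when x equals shape[0].
print((0, 500) < shape)            # True, decided by 0 < 128
print(in_bounds(0, 500, shape))    # False

# With <=, the index equal to the shape also passes.
print((128, 72) <= shape)          # True
print(in_bounds(128, 72, shape))   # False
```

This is why the explicit per-axis test is the safer bounds check: the tuple form can accept indices that are out of range on the second axis.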
I'm late to the game, but I thought an explanation with more visual components might lend some clarity to the existing answer. It is surprisingly hard to find illustrative answers for how thread indexing works in CUDA. The concepts are easy to hold in your head once you understand them, but there are many points of confusion along the way. Hopefully this helps.
Pardon the lack of discussion outside of the comments of the script, but this seems like a case where the context of the code will help avoid miscommunication and help to demonstrate the indexing concepts discussed by others. So I'll leave you with the script, the comments therein, and the compressed output.
To produce uncompressed output, see the first few comments in the thread-main block.
Example script:
from numba import cuda
import numpy as np
@cuda.jit
def image_saturate(data):
    grid_pos_x, grid_pos_y = cuda.grid(2)
    tx = cuda.threadIdx.x
    ty = cuda.threadIdx.y
    bx = cuda.blockIdx.x
    by = cuda.blockIdx.y
    # explicit per-axis bounds check; a tuple comparison against data.shape would be
    # lexicographic and could let out-of-range indices through
    if grid_pos_x < data.shape[0] and grid_pos_y < data.shape[1]:
        # note that the cuda device array retains the stride order of the original numpy array,
        # so the data is in row-major order (numpy default) and the first index (where we are
        # using grid_pos_x) doesn't actually map to the horizontal axis (aka the typical x-axis),
        # but rather maps to vertical axis (the typical y-axis).
        #
        # And, as you would then expect, the second axis (where we are using grid_pos_y) maps
        # to the horizontal axis of the array.
        #
        # What you should take away from this observation is that the x,y labels of cuda elements
        # have no explicit connection to the array's memory layout.
        #
        # Therefore, it is up to you, the programmer to understand the memory layout for your
        # array (whether it's the C-like row-major, or the Fortran-like column-major), and how
        # you should map the (x,y) thread IDs onto the coordinates of your array.
        data[grid_pos_x, grid_pos_y,0,0] = tx
        data[grid_pos_x, grid_pos_y,0,1] = ty
        data[grid_pos_x, grid_pos_y,0,2] = tx + ty
        data[grid_pos_x, grid_pos_y,1,0] = bx
        data[grid_pos_x, grid_pos_y,1,1] = by
        data[grid_pos_x, grid_pos_y,1,2] = bx + by
        data[grid_pos_x, grid_pos_y,2,0] = grid_pos_x
        data[grid_pos_x, grid_pos_y,2,1] = grid_pos_y
        data[grid_pos_x, grid_pos_y,2,2] = grid_pos_x + grid_pos_y
if __name__ == "__main__":
    # uncomment the following line and remove the line after it
    # if you run this code to get more readable results
    # np.set_printoptions(linewidth=500)
    np.set_printoptions(linewidth=500,threshold=3) # compressed output for use on stack overflow
    # image_quality = (128, 72)
    # we are shrinking image_quality to be 23x21 to make the printout easier to read,
    # and intentionally step away from the alignment of threads per block being a
    # multiplicative factor to the image shape. THIS IS BAD for practical applications,
    # it's just helpful for this illustration.
    image_quality = (23,21)
    image = np.zeros(image_quality+(3,3),int)-1
    ## thread_size = 32 # commented to show where the original variable was used
    # below, we rename the variable to be more semantically clear
    # Note: to define the desired thread-count in multiple axes, threads_per_block
    # would have to become a tuple, E.G.:
    # # defines 32 threads in thread-block's x axis, and 16 in the y
    # threads_per_block = 32,16
    threads_per_block = 32
    #threads_per_block = 32 results in an implicit 1 for the y-axis, and implicit 1 in the z.
    ### Thread blocks are always 3d
    # this is also true for the thread grid the device will create for your kernel
    ## block_size = image_quality
    # renaming block_size to semantically more accurate variable name
    # Note: As with the threads_per_block, any axis we don't explicitly specify a size
    # for will be given the default value of 1. So, because image_quality gives 2 values,
    # for x/y respectively, the z axis will implicitly be given size of 1.
    block_count = image_quality
    # REMEMBER: The thread/block/grid dimensions we are passing to the function compiler
    # are NOT used to infer details about the arguments being passed to the
    # compiled function (our image array)
    # It is up to us to write code that appropriately utilizes the arrangement
    # of the thread blocks the device will build for us once inside the kernel.
    # SEE THE COMMENT INSIDE THE image_saturate function above.
    image_saturate[block_count, threads_per_block](image)
    print(f"{block_count=}; {threads_per_block=}; {image.shape=}")
    print("thread id within block; x")
    print(image[:,:,0,0])
    print("\nthread id within block; y"
          "\n-- NOTE 1 regarding all zeros: see comment at the end of printout")
    print(image[:,:,0,1])
    print("\nsum of x,y thread id within block")
    print(image[:,:,0,2])
    print("\nblock id within grid; x"
          "\n-- NOTE 2 also regarding all zeros: see second comment at the end of printout")
    print(image[:,:,1,0])
    print("\nblock id within grid; y")
    print(image[:,:,1,1])
    print("\nsum of x,y block id within grid")
    print(image[:,:,1,2])
    print("\nthread unique global x id within full grid; x")
    print(image[:,:,2,0])
    print("\nthread unique global y id within full grid; y")
    print(image[:,:,2,1])
    print("\nsum of thread's unique global x,y ids")
    print(image[:,:,2,2])
    print(f"{'End of 32 threads_per_block output':-<70}")
    threads_per_block = 16
    # reset the values of image so we can be sure to see if any elements
    # of the image go unassigned
    image *= 0
    image -= 1
    # block_count = image_quality # if you wanted to try
    print(f"\n\n{block_count=}; {threads_per_block=}; {image.shape=}")
    image_saturate[block_count, threads_per_block](image)
    print("thread id within block; x")
    print(image[:,:,0,0])
    print("\nthread id within block; y "
          "\n-- again, see NOTE 1")
    print(image[:,:,0,1])
    print("\nsum of x,y thread id within block")
    print(image[:,:,0,2])
    print("\nblock id within grid; x "
          "\n-- notice that unlike when we had 32 thread_per_block, not all 0")
    print(image[:,:,1,0])
    print("\nblock id within grid; y")
    print(image[:,:,1,1])
    print("\nsum of x,y block id within grid")
    print(image[:,:,1,2])
    print("\nthread unique global x id within full grid; x")
    print(image[:,:,2,0])
    print("\nthread unique global y id within full grid; y")
    print(image[:,:,2,1])
    print("\nsum of thread's unique global x,y ids")
    print(image[:,:,2,2])
    from textwrap import dedent
    print(dedent("""
    NOTE 1:
    The thread IDs recorded for 'thread id within block; y'
    are all zero for both versions of `threads_per_block` because we never
    specify the number of threads per block that should be created for
    the 'y' axis.
    So, the compiler defaults to creating only a single thread along those
    undefined axis of each block. For that reason, we see that the only
    threadID.y value stored is 0 for all i,j elements of the array.
    NOTE 2:
    **Note 2 mostly pertains to the case where threads_per_block == 32
    is greater than the number of elements in the corresponding axis of the image**
    The block.x IDs recorded for 'block id within grid; x' are all zero for
    the `32` version of `threads_per_block` because of the relative difference in
    size between the specified number of threads per block and the number of
    elements in the image along the corresponding axis.
    The size of a block, in any axis, is determined by the specified number
    of threads for that axis. In this example script, we define threads_per_block
    to have an explicit 32 threads in the x axis, leaving the compiler to give an
    implicit 1 for both the y and z axis. We then tell the compiler to create 23 blocks
    in the x-axis, and 21 blocks in the y; resulting in:
    \t* A kernel where the device creates a grid of blocks, 23:21:1 for 483 blocks
    \t\t* (x:y:z -> 23:21:1)
    \t* Where each block has 32 threads
    \t\t* (x:y:z -> 32:1:1)
    \t* And our image has height:width of 23:21 for 483 'pixels' in each
    \t contrived layer of the image.
    As it is hopefully being made clear now, you should see that because each
    block has 32 threads on its x-axis, and we have only 23 elements on the corresponding
    axis in the image, only 1 of the 23 blocks the device created along the grid's x-axis
    will be used. Do note that the overhead of creating those unused blocks is a gross waste
    of GPU processor time and could potentially reduce the available resources to the block
    that does get used."""))
The output:
block_count=(23, 21); threads_per_block=32; image.shape=(23, 21, 3, 3)
thread id within block; x
[[ 0 0 0 ... 0 0 0]
[ 1 1 1 ... 1 1 1]
[ 2 2 2 ... 2 2 2]
...
[20 20 20 ... 20 20 20]
[21 21 21 ... 21 21 21]
[22 22 22 ... 22 22 22]]
thread id within block; y
-- NOTE 1 regarding all zeros: see comment at the end of printout
[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]]
sum of x,y thread id within block
[[ 0 0 0 ... 0 0 0]
[ 1 1 1 ... 1 1 1]
[ 2 2 2 ... 2 2 2]
...
[20 20 20 ... 20 20 20]
[21 21 21 ... 21 21 21]
[22 22 22 ... 22 22 22]]
block id within grid; x
-- NOTE 2 also regarding all zeros: see second comment at the end of printout
[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]]
block id within grid; y
[[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
...
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]]
sum of x,y block id within grid
[[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
...
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]]
thread unique global x id within full grid; x
[[ 0 0 0 ... 0 0 0]
[ 1 1 1 ... 1 1 1]
[ 2 2 2 ... 2 2 2]
...
[20 20 20 ... 20 20 20]
[21 21 21 ... 21 21 21]
[22 22 22 ... 22 22 22]]
thread unique global y id within full grid; y
[[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
...
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]]
sum of thread's unique global x,y ids
[[ 0 1 2 ... 18 19 20]
[ 1 2 3 ... 19 20 21]
[ 2 3 4 ... 20 21 22]
...
[20 21 22 ... 38 39 40]
[21 22 23 ... 39 40 41]
[22 23 24 ... 40 41 42]]
End of 32 threads_per_block output------------------------------------
block_count=(23, 21); threads_per_block=16; image.shape=(23, 21, 3, 3)
thread id within block; x
[[0 0 0 ... 0 0 0]
[1 1 1 ... 1 1 1]
[2 2 2 ... 2 2 2]
...
[4 4 4 ... 4 4 4]
[5 5 5 ... 5 5 5]
[6 6 6 ... 6 6 6]]
thread id within block; y
-- again, see NOTE 1
[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]]
sum of x,y thread id within block
[[0 0 0 ... 0 0 0]
[1 1 1 ... 1 1 1]
[2 2 2 ... 2 2 2]
...
[4 4 4 ... 4 4 4]
[5 5 5 ... 5 5 5]
[6 6 6 ... 6 6 6]]
block id within grid; x
-- notice that unlike when we had 32 thread_per_block, not all 0
[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[1 1 1 ... 1 1 1]
[1 1 1 ... 1 1 1]
[1 1 1 ... 1 1 1]]
block id within grid; y
[[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
...
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]]
sum of x,y block id within grid
[[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
...
[ 1 2 3 ... 19 20 21]
[ 1 2 3 ... 19 20 21]
[ 1 2 3 ... 19 20 21]]
thread unique global x id within full grid; x
[[ 0 0 0 ... 0 0 0]
[ 1 1 1 ... 1 1 1]
[ 2 2 2 ... 2 2 2]
...
[20 20 20 ... 20 20 20]
[21 21 21 ... 21 21 21]
[22 22 22 ... 22 22 22]]
thread unique global y id within full grid; y
[[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
...
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]
[ 0 1 2 ... 18 19 20]]
sum of thread's unique global x,y ids
[[ 0 1 2 ... 18 19 20]
[ 1 2 3 ... 19 20 21]
[ 2 3 4 ... 20 21 22]
...
[20 21 22 ... 38 39 40]
[21 22 23 ... 39 40 41]
[22 23 24 ... 40 41 42]]
NOTE 1:
The thread IDs recorded for 'thread id within block; y'
are all zero for both versions of `threads_per_block` because we never
specify the number of threads per block that should be created for
the 'y' axis.
So, the compiler defaults to creating only a single thread along those
undefined axis of each block. For that reason, we see that the only
threadID.y value stored is 0 for all i,j elements of the array.
NOTE 2:
**Note 2 mostly pertains to the case where threads_per_block == 32
is greater than the number of elements in the corresponding axis of the image**
The block.x IDs recorded for 'block id within grid; x' are all zero for
the `32` version of `threads_per_block` because of the relative difference in
size between the specified number of threads per block and the number of
elements in the image along the corresponding axis.
The size of a block, in any axis, is determined by the specified number
of threads for that axis. In this example script, we define threads_per_block
to have an explicit 32 threads in the x axis, leaving the compiler to give an
implicit 1 for both the y and z axis. We then tell the compiler to create 23 blocks
in the x-axis, and 21 blocks in the y; resulting in:
* A kernel where the device creates a grid of blocks, 23:21:1 for 483 blocks
* (x:y:z -> 23:21:1)
* Where each block has 32 threads
* (x:y:z -> 32:1:1)
* And our image has height:width of 23:21 for 483 'pixels' in each
contrived layer of the image.
As it is hopefully being made clear now, you should see that because each
block has 32 threads on its x-axis, and we have only 23 elements on the corresponding
axis in the image, only 1 of the 23 blocks the device created along the grid's x-axis
will be used. Do note that the overhead of creating those unused blocks is a gross waste
of GPU processor time and could potentially reduce the available resources to the block
that does get used.
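The block counts in NOTE 2 can be checked with a few lines of plain Python. This is only the arithmetic behind the printout above, with a made-up helper name (useful_blocks): a block along x contributes something only if its first thread is still inside the image.

```python
def useful_blocks(extent, blocks, threads_per_block):
    """Count blocks along one axis that have at least one in-bounds thread."""
    return sum(1 for b in range(blocks) if b * threads_per_block < extent)

# 23 blocks along x, image extent 23 along the matching axis.
print(useful_blocks(23, 23, 32))  # 1
print(useful_blocks(23, 23, 16))  # 2
```

This matches the printed block ids: with 32 threads per block only block 0 appears along x, and with 16 threads per block blocks 0 and 1 appear. The remaining 22 (or 21) blocks along x fail the bounds check in every thread.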
I have a large dataframe (500K rows x 100 cols) and want to do the following search-and-masking operation efficiently, but I can't find the right pandas/numpy incantation; better still if it can be vectorized:
on each row, the N columns m1,m2,...,m6 can contain distinct values from 1..9, or else trailing NaNs. (The NaNs are there for a very good reason, to prevent aggregation/taking sum/mean/etc. on nonexistent records when we process the output from this step; it is very strongly desirable that you preserve the NaNs)
distinctness: it is guaranteed that the columns m<i> will contain at most one occurrence of each of the values 1..9
columns x1,x2,...,x6 are associated with the columns m<i>, and contain some integer values
For each possible value v in range 1..9 (I will manually sweep v from 1:9 at top-level of my analysis, don't worry about that part), I want to do the following:
on each row where that value v occurs in one of the m<i>, find which column m<i> equals v (either as boolean mask/array/indices/anything else you prefer)
on rows where v doesn't occur in m<i>, preferably I don't want any result for that row, not even NaN
then I want to use that intermediate boolean mask/array/indices/whatever to slice the corresponding value from the x<i> (x1,x2,...,x6) on that row
Here's my current code; I tried iloc, melt, stack/unstack, mask, np.where, np.select and other things but can't get the desired result:
import io
import numpy as np
from numpy import nan
import pandas as pd
N = 6 # the width of our column-slices of interest
# Sample dataframe (pd.compat.StringIO was removed from newer pandas; io.StringIO is the stdlib equivalent)
dat = io.StringIO("""
text,m1,m2,m3,m4,m5,m6,x1,x2,x3,x4,x5,x6\n
'foo',9,3,4,2,1,, 21,22,23,24,25,26\n
'bar',2,3,4,6,5,, 31,32,33,34,35,36\n
'baz',7,3,4,1,,, 11,12,13,14,15,16\n
'qux',2,6,3,4,7,, 41,42,43,44,45,46\n
'gar',3,1,4,7,,, 51,52,53,54,55,56\n
'wal',3,,,,,, 11,12,13,14,15,16\n
'fre',2,3,4,6,5,, 61,62,63,64,65,66\n
'plu',2,3,4,9,1,, 71,72,73,74,75,76\n
'xyz',2,3,4,9,6,1, 81,82,83,84,85,86\n
'thu',1,3,6,4,5,, 51,52,53,54,55,56""".replace(' ',''))
df = pd.read_csv(dat, header=[0])
v = 1 # For example; Actually we want to sweep v from 1:9 ...
# On each row, find the index 'i' of column 'm<i>' which equals v; or NaN if v doesn't occur
df.iloc[:, 1:N+1] == v
(df.iloc[:, 1:N+1] == 1).astype(np.int64)
# m1 m2 m3 m4 m5 m6
# 0 0 0 0 0 1 0
# 1 0 0 0 0 0 0
# 2 0 0 0 1 0 0
# 3 0 0 0 0 0 0
# 4 0 1 0 0 0 0
# 5 0 0 0 0 0 0
# 6 0 0 0 0 0 0
# 7 0 0 0 0 1 0
# 8 0 0 0 0 0 1
# 9 1 0 0 0 0 0
# np.where() seems useful...
_ = np.where((df.iloc[:, 1:N+1] == 1).astype(np.int64))
# (array([0, 2, 4, 7, 8, 9]), array([4, 3, 1, 4, 5, 0]))
# But you can't directly use df.iloc[ np.where((df.iloc[:, 1:N+1] == 1).astype(np.int64)) ]
# Feels like you want something like df.iloc[ *... ] where we can pass in our intermediate result as separate vectors of row- and col-indices
# can't unpack the np.where output into separate row- and col- indices vectors
irow,icol = *np.where((df.iloc[:, 1:N+1] == 1).astype(np.int64))
SyntaxError: can't use starred expression here
# ...so unpack manually...
irow = _[0]
icol = _[1]
# ... but now can't manage to slice the `x<i>` with those...
df.iloc[irow, 7:13] [:, icol.tolist()]
TypeError: unhashable type: 'slice'
# Want to get numpy-type indexing, rather than pandas iloc[]
# This also doesn't work:
df.iloc[:, 7:13] [list(zip(*_))]
# Want to slice into the x<i> which are located in df.iloc[:, N+1:2*N+1]
# Or any alternative faster numpy/pandas implementation...
For readability, and to avoid float notation in df, I first used
the following instruction to change NaN values to 0 and change their type to int:
df.fillna(0, downcast='infer', inplace=True)
SOLUTION 1
And now get down to the main task, for v == 1. Start with:
x1 = np.argwhere(df.iloc[:, 1:N+1].values == v)
The result is:
[[0 4]
[2 3]
[4 1]
[7 4]
[8 5]
[9 0]]
They are indices of elements == v in the subset of df.
Then, to "shift" to indices of the target elements, in the whole df,
we have to add 7 (actually, N+1) to each column index:
x2 = x1 + [0, N+1]
The result is:
[[ 0 11]
[ 2 10]
[ 4 8]
[ 7 11]
[ 8 12]
[ 9 7]]
And to get the result (for v == 1), execute:
df.values[tuple(x2.T)]
The result is:
array([25, 14, 52, 75, 86, 51], dtype=object)
Alternative: If you want the above result in a single instruction, run:
df.values[tuple((np.argwhere(df.iloc[:, 1:N+1].values == v) + [0, N+1]).T)]
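The same steps can be tried on a small standalone NumPy array. This sketch uses only the first three sample rows, an integer stand-in for the text column, and 0 for the missing m<i> values (as after the fillna above). The names tbl, hits, targets and picked are illustrative.

```python
import numpy as np

N = 6  # number of m<i> columns (and of x<i> columns)
v = 1

# Column 0 stands in for the text column; 0 marks a missing m<i> value.
tbl = np.array([
    [0, 9, 3, 4, 2, 1, 0, 21, 22, 23, 24, 25, 26],
    [1, 2, 3, 4, 6, 5, 0, 31, 32, 33, 34, 35, 36],
    [2, 7, 3, 4, 1, 0, 0, 11, 12, 13, 14, 15, 16],
])

hits = np.argwhere(tbl[:, 1:N + 1] == v)  # (row, col) pairs inside the m<i> slice
targets = hits + [0, N + 1]               # same rows, columns shifted to the x<i> range
picked = tbl[tuple(targets.T)]            # index with (row array, col array)

print(hits.tolist())     # [[0, 4], [2, 3]]
print(targets.tolist())  # [[0, 11], [2, 10]]
print(picked.tolist())   # [25, 14]
```

The second row contains no 1, so it produces no entry at all, which is the behaviour asked for in the question.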
The procedure described above gives the result for v == 1.
It is up to you how to assemble the results of each pass (for v = 1..9) into the
final result. You didn't describe this detail in your question (or I failed
to see and understand it).
One possible solution is:
pd.DataFrame([ df.values[tuple((np.argwhere(df.iloc[:, 1:N+1].values
== v) + [0, N+1]).T)].tolist() for v in range(1,10) ],
index=range(1,10)).fillna('-')
giving the following result:
0 1 2 3 4 5 6 7 8 9
1 25 14 52 75 86 51 - - - -
2 24 31 41 61 71 81 - - - -
3 22 32 12 43 51 11 62 72 82 52
4 23 33 13 44 53 63 73 83 54 -
5 35 65 55 - - - - - - -
6 34 42 64 85 53 - - - - -
7 11 45 54 - - - - - - -
8 - - - - - - - - - -
9 21 74 84 - - - - - - -
Index values are taken from the current value of v.
It is up to you whether you are happy about default
column names (consecutive numbers from 0).
Additional remark: Remove the apostrophes surrounding the values in the first
column (e.g. change 'foo' to just foo).
Otherwise these apostrophes become part of the column content, which I suppose
you don't want.
Note that the column names in the first row of your source have no
apostrophes, and read_csv is still clever enough to recognize them as string
values.
EDIT - SOLUTION 2
Another, maybe simpler solution:
Since we operate on the underlying NumPy table anyway, instead of writing .values
in several places, start with:
tbl = df.values
Then, for a single v value, instead of argwhere use nonzero:
tbl[:, N+1:][np.nonzero(tbl[:, 1:N+1] == v)]
Details:
tbl[:, 1:N+1] - the slice for m... columns.
np.nonzero(tbl[:, 1:N+1] == v) - a tuple of index arrays, one per
axis, pointing at the "wanted" elements, so it can be used directly
for indexing.
tbl[:, N+1:] - the slice for x<i> columns.
An important difference between nonzero and argwhere is that
nonzero returns a tuple so adding of a "shift" value to the
column number is more difficult, so I decided to take a different
slice (for x<i> columns) instead.
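A minimal standalone sketch of the nonzero variant, this time keeping the NaNs in the m<i> block rather than filling them with 0. That is an assumption layered on top of the answer: it relies on NaN == v evaluating to False, so NaN cells simply never match. The function name pick_x and the separate m and x arrays are made up for illustration.

```python
import numpy as np

# First three sample rows, split into the m<i> block and the x<i> block.
m = np.array([[9, 3, 4, 2, 1, np.nan],
              [2, 3, 4, 6, 5, np.nan],
              [7, 3, 4, 1, np.nan, np.nan]])
x = np.array([[21, 22, 23, 24, 25, 26],
              [31, 32, 33, 34, 35, 36],
              [11, 12, 13, 14, 15, 16]])

def pick_x(m, x, v):
    """Return (row indices, x values) for the cells where m equals v."""
    rows, cols = np.nonzero(m == v)
    return rows, x[rows, cols]

rows, values = pick_x(m, x, 1)
print(rows.tolist(), values.tolist())  # [0, 2] [25, 14]
```

Returning the row indices alongside the values makes it possible to join the result back to the original rows, and rows without v are absent rather than NaN.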
link: https://cw.felk.cvut.cz/courses/a4b33alg/task.php?task=pary_py&idu=2341
I want to input the matrix split by space by using:
def neighbour_pair(l):
matrix = [[int(row) for row in input().split()] for i in range(l)]
but the program told me
TypeError: 'str' object cannot be interpreted as an integer
It seems the .split() didn't work but I don't know why.
Here is an example of the input matrix:
13 5
7 50 0 0 1
2 70 10 11 0
4 30 9 0 0
6 70 0 0 0
1 90 8 12 0
9 90 0 2 1
13 90 0 6 0
5 30 4 3 0
12 80 0 0 1
10 50 0 0 1
11 50 0 0 0
3 80 1 13 0
8 70 7 0 1
The input is a binary tree with N nodes, the nodes are labeled by numbers 1 to N in random order, each label is unique. Each node contains an integer key in the range from 0 to (2^31)−1.
The first line of input contains two integers N and R separated by space. N is the number of nodes in the tree, R is the label of the tree root.
Next, there are N lines. Each line describes one node and the order of the nodes is arbitrary. A node is specified by five integer values. The first value is the node label, the second value is the node key, the third and the fourth values represent the labels of the left and right child respectively, and the fifth value represents the node color, white is 0, black is 1. If any of the children does not exist there is value 0 instead of the child label at the corresponding place. The values on the line are separated by a space.
This is the range() complaining that your l variable is a string:
>>> range('1')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object cannot be interpreted as an integer
I suspect you are reading l from standard input as well. Cast it to an integer:
l = int(input())
matrix = [[int(row) for row in input().split()] for i in range(l)]
I agree with @alecxe. It seems that your error refers to the string being used as l in your range(l) call. If I put a static int in the range() function, it seems to work. Entering 3, followed by three rows of input, gives me the output below.
>>> l = input() # define the number of rows expected the input matrix
>>> [[int(row) for row in input().split()] for i in range(int(l))]
13 5
7 50 0 0 1
2 70 10 11 0
output
[[13, 5], [7, 50, 0, 0, 1], [2, 70, 10, 11, 0]]
Implemented as a method, per the OP request in the comments below:
def neighbour_pair():
l = input()
return [[int(row) for row in input().split()] for i in range(int(l))]
print( neighbour_pair() )
# input
# 3
# 13 5
# 7 50 0 0 1
# 2 70 10 11 0
# output
[[13, 5], [7, 50, 0, 0, 1], [2, 70, 10, 11, 0]]
Still nothing wrong with this implementation...
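For the input format described in the question, where the first line holds both N and R, a reader could look like the sketch below. It takes any iterable of lines (a list here, sys.stdin in practice) so it can be tried without typing input. The function name read_tree and the three-node sample are made up for illustration.

```python
def read_tree(lines):
    """Parse 'N R' followed by N rows of five integers each."""
    it = iter(lines)
    n, root = (int(tok) for tok in next(it).split())
    nodes = [[int(tok) for tok in next(it).split()] for _ in range(n)]
    return n, root, nodes

# Hypothetical three-node tree: label, key, left, right, color.
sample = """3 7
7 50 2 4 1
2 70 0 0 0
4 30 0 0 0""".splitlines()

n, root, nodes = read_tree(sample)
print(n, root)   # 3 7
print(nodes[0])  # [7, 50, 2, 4, 1]
```

Splitting the first line before converting avoids calling int() on a string such as "13 5", which would raise a ValueError.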
I have a 3D matrix (lon, lat, height) in which some elements have the value 0. I want to replace those values with the data from the previous level until all zeros are replaced. That is, if a is the matrix, a zero at a[i, j, k] should be replaced with a[i, j, k-1], and if that value is also zero, with the one before it, and so on until a nonzero value is found. I have tried the code below, but it raises an error, and whatever I do the result is nonsense. LW is a netCDF (.nc) file.
LW = S.netcdf_file('/path','r')
a = LW.variables['nflx'][:,:,:]
lona = LW.variables['lon'][:]
lata = LW.variables['lat'][:]
M = np.zeros([96,73,25])
for i in xrange(0, 96):
    for j in xrange(0, 73):
        for k in xrange(0,25):
            while a==0:
                M = a[:,:,k-1]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Does anybody have any idea about it? Any help is appreciated.
Here is a simple, general way. Each subarray M[i, j] is processed in a loop until it contains no zeros. Note that if the first value is zero, it is replaced by the last value, because index -1 wraps around to the end.
>>> M = np.arange(20)
... M[[6,11,12,19]] = 0
... M = M.reshape((2,2,5))
... print(M)
[[[ 0 1 2 3 4]
[ 5 0 7 8 9]]
[[10 0 0 13 14]
[15 16 17 18 0]]]
>>> for i in np.ndindex(M.shape[:-1]):
...     while 0 in M[i]:
...         args = np.argwhere(M[i]==0)
...         M[i][args] = M[i][args-1]
... print(M)
[[[ 4 1 2 3 4]
[ 5 5 7 8 9]]
[[10 10 10 13 14]
[15 16 17 18 18]]]
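As a loop-free alternative, the fill can be expressed as a forward fill along the last axis. This is a sketch of a different technique than the answer above, and it makes one different choice: a zero at the very first level has no previous level, so it is left as 0 instead of wrapping around to the last value. The function name fill_zeros_from_previous is made up.

```python
import numpy as np

def fill_zeros_from_previous(a):
    """Replace each zero with the nearest earlier nonzero value along the last axis."""
    a = np.asarray(a)
    k = np.arange(a.shape[-1])
    # For each element: its own index if nonzero, else 0 as a placeholder.
    idx = np.where(a != 0, k, 0)
    # Running maximum gives the index of the most recent nonzero element.
    idx = np.maximum.accumulate(idx, axis=-1)
    return np.take_along_axis(a, idx, axis=-1)

M = np.arange(20)
M[[6, 11, 12, 19]] = 0
M = M.reshape((2, 2, 5))
print(fill_zeros_from_previous(M).tolist())
# [[[0, 1, 2, 3, 4], [5, 5, 7, 8, 9]], [[10, 10, 10, 13, 14], [15, 16, 17, 18, 18]]]
```

The only difference from the loop version's output is the first element, which stays 0 here.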