Consider the question:
The grid is:
[ [3, 0, 8, 4],
[2, 4, 5, 7],
[9, 2, 6, 3],
[0, 3, 1, 0] ]
The max viewed from top (i.e. max across columns) is: [9, 4, 8, 7]
The max viewed from left (i.e. max across rows) is: [8, 7, 9, 3]
I know how to define a grid in Python:
maximums = [[0 for x in range(len(grid[0]))] for x in range(len(grid))]
Getting the maximum across rows looks easy:
max_left = [max(x) for x in grid]
But how do I get the maximum across columns?
Further, I need to do so in linear space, O(M+N), where MxN is the size of the matrix.
Use zip:
result = [max(i) for i in zip(*grid)]
In Python, * is not a pointer. At a call site it unpacks an iterable into separate arguments, and in a function definition it declares that the function accepts a variable number of positional arguments. For instance:
def f(*args):
    print(args)
f(434, 424, "val", 233, "another val")
Output:
(434, 424, 'val', 233, 'another val')
Or, given an iterable, each item can be passed as its own positional argument:
def f(*args):
    print(args)
f(*["val", "val3", 23, 23])
Output:
('val', 'val3', 23, 23)
zip "transposes" a list of rows, i.e. each row becomes a column, and vice versa.
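To make the transposition concrete, here is a small sketch using the grid from the question. The names columns and col_maxes are only illustrative.

```python
grid = [[3, 0, 8, 4],
        [2, 4, 5, 7],
        [9, 2, 6, 3],
        [0, 3, 1, 0]]

# zip(*grid) is equivalent to zip(grid[0], grid[1], grid[2], grid[3]):
# it yields one tuple per column.
columns = list(zip(*grid))
print(columns)
# [(3, 2, 9, 0), (0, 4, 2, 3), (8, 5, 6, 1), (4, 7, 3, 0)]

# Taking max over each column tuple gives the view from the top.
col_maxes = [max(col) for col in zip(*grid)]
print(col_maxes)  # [9, 4, 8, 7]
```

Materializing columns as a list stores a full transposed copy. The comprehension form consumes one column tuple at a time, so apart from the result it only holds a single column at once.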
You could use numpy:
import numpy as np
x = np.array([ [3, 0, 8, 4],
[2, 4, 5, 7],
[9, 2, 6, 3],
[0, 3, 1, 0] ])
print(x.max(axis=0))
Output:
[9 4 8 7]
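You said that you need to do this in O(m+n) space (not using numpy), so here's a solution that doesn't recreate the matrix (x is a plain list of lists here):
col_max = list(x[0])  # copy the first row so the matrix itself is not modified
for row in x:
    for j, value in enumerate(row):
        if value > col_max[j]:
            col_max[j] = value
print(col_max)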
Output:
[9, 4, 8, 7]
I figured out a shortcut too: transpose the matrix and then just take the maximum over its rows:
grid_transposed = [[grid[i][j] for i in range(len(grid))] for j in range(len(grid[0]))]
max_top = [max(x) for x in grid_transposed]
But then again this takes O(M*N) space, since I have to build a transposed copy of the matrix.
I don't want to use numpy, as external libraries are not allowed in any assignments.
The easiest way is to use the max method of a numpy array along axis 0:
array.max(0)
Something like this works both ways and is quite easy to read:
# 1.
maxLR, maxTB = [], []
maxlr, maxtb = 0, 0
# max across rows (starting from 0 assumes the values are non-negative)
for i, x in enumerate(grid):
    maxlr = 0
    for j, y in enumerate(grid[0]):
        maxlr = max(maxlr, grid[i][j])
    maxLR.append(maxlr)
# max across columns
for j, y in enumerate(grid[0]):
    maxtb = 0
    for i, x in enumerate(grid):
        maxtb = max(maxtb, grid[i][j])
    maxTB.append(maxtb)
# 2.
row_maxes = [max(row) for row in grid]
col_maxes = [max(col) for col in zip(*grid)]
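Since the question asks for O(M+N) extra space without numpy, the two passes can also be folded into one. The following is a minimal sketch under the assumption that the grid is a non-empty rectangular list of lists; the function name row_and_column_maxes is hypothetical.

```python
def row_and_column_maxes(grid):
    """Return (row_maxes, col_maxes) for a non-empty rectangular grid."""
    row_maxes = []
    col_maxes = list(grid[0])  # copy, so the grid itself is never modified
    for row in grid:
        row_maxes.append(max(row))
        for j, value in enumerate(row):
            if value > col_maxes[j]:
                col_maxes[j] = value
    return row_maxes, col_maxes


grid = [[3, 0, 8, 4],
        [2, 4, 5, 7],
        [9, 2, 6, 3],
        [0, 3, 1, 0]]

max_left, max_top = row_and_column_maxes(grid)
print(max_left)  # [8, 7, 9, 3]
print(max_top)   # [9, 4, 8, 7]
```

The only extra storage is the two result lists, one of length M and one of length N. Seeding col_maxes from the first row also means negative values are handled correctly.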
Note: this is almost a duplicate of "Numpy vectorization: Find intersection between list and list of lists".
Differences:
I am focused on efficiency when the lists are large.
I'm searching for the largest intersections.
x = [500 numbers between 1 and N]
y = [[1, 2, 3], [4, 5, 6, 7], [8, 9], [10, 11, 12], etc. up to N]
Here are some assumptions:
y is a list of ~500,000 sublists of ~500 elements each
each sublist in y is a range, so y is characterized by the last element of each sublist. In the example: 3, 7, 9, 12 ...
x is not sorted
y contains each number between 1 and ~500000*500 once and only once
y is sorted in the sense that, as in the example, the sublists are sorted and the first element of one sublist is the successor of the last element of the previous sublist.
y is known long before run time (even before compile time)
My goal is to find which sublists of y have at least 10 elements in common with x.
I can obviously write a loop:
def find_best(x, y):
    result = []
    x_set = set(x)  # build the set once, not once per sublist
    for index, sublist in enumerate(y):
        intersection = x_set.intersection(sublist)
        if len(intersection) >= 2:  # in real life: >= 10
            result.append(index)
    return result
x = [1, 2, 3, 4, 5, 6]
y = [[1, 2, 3], [4], [5, 6], [7], [8, 9, 10, 11]]
res = find_best(x, y)
print(res)  # [0, 2]
Here the result is [0, 2] because the first and third sublists of y each have at least 2 elements in common with x.
Another method is to walk through y only once and count the intersections:
def find_intersec2(x, y):
    remaining = set(x)  # work on a copy so the caller's x is not modified
    res = [0] * len(y)
    for list_no, sublist in enumerate(y):
        for num in sublist:
            if num in remaining:
                remaining.remove(num)
                res[list_no] += 1
    return [n for n, count in enumerate(res) if count >= 2]
This second method makes more use of the assumptions above.
Questions:
What optimizations are possible?
Is there a completely different approach? Indexing, a k-d tree? In my use case, the large list y is known days before the actual run, so I'm not afraid of building an index or whatever else from y. The small list x is only known at runtime.
Since y contains disjoint ranges and their union is also a range, a very fast solution is to binary-search each element of x in y, count how often each resulting sublist index occurs, and return only the indices that occur at least 10 times. The complexity of this algorithm is O(Nx log Ny), where Nx is the number of items in x and Ny the number of sublists in y. This is nearly optimal (since x needs to be read entirely).
Actual implementation
First of all, you need to transform your current y to a Numpy array containing the beginning value of all ranges (in an increasing order) with N as the last value (assuming N is excluded for the ranges of y, or N+1 otherwise). This part can be assumed as free since y can be computed at compile time in your case. Here is an example:
import numpy as np
y = np.array([1, 4, 8, 10, 13, ..., N])
Then, you need to perform the binary search and check that the values fit in the range of y:
indices = np.searchsorted(y, x, 'right')
# The `0 < indices < len(y)` check should not be needed regarding the input.
# If so, you can use only `indices -= 1`.
indices = indices[(0 < indices) & (indices < len(y))] - 1
Then you need to count the indices and keep the ones that occur at least 10 times:
uniqueIndices, counts = np.unique(indices, return_counts=True)
result = uniqueIndices[counts >= 10]
Here is an example based on yours:
x = np.array([1, 2, 3, 4, 5, 6])
# [[1, 2, 3], [4], [5, 6], [7], [8, 9, 10, 11]]
y = np.array([1, 4, 5, 7, 8, 12])
# Actual simplified version of the above algorithm
indices = np.searchsorted(y, x, 'right') - 1
uniqueIndices, counts = np.unique(indices, return_counts=True)
result = uniqueIndices[counts >= 2]
# [0, 2]
print(result.tolist())
It runs in less than 0.1 ms on my machine on a random input based on your input constraints.
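For readers who cannot use numpy, the same binary-search idea can be sketched with the standard library. This is not the answer's implementation: bisect and collections.Counter are swapped in for np.searchsorted and np.unique, and the function name sublists_with_hits is hypothetical. As above, starts is assumed to hold the first value of every range in increasing order plus one sentinel marking the end of the last range.

```python
from bisect import bisect_right
from collections import Counter


def sublists_with_hits(x, starts, min_hits):
    """Return indices of ranges that contain at least min_hits values of x."""
    counts = Counter()
    for value in x:
        idx = bisect_right(starts, value) - 1  # index of the range holding value
        if 0 <= idx < len(starts) - 1:         # ignore values outside every range
            counts[idx] += 1
    return sorted(i for i, c in counts.items() if c >= min_hits)


# Ranges: [[1, 2, 3], [4], [5, 6], [7], [8, 9, 10, 11]], sentinel 12
starts = [1, 4, 5, 7, 8, 12]
print(sublists_with_hits([1, 2, 3, 4, 5, 6], starts, 2))  # [0, 2]
```

The per-query cost stays proportional to len(x) times log(len(starts)), although the pure-Python loop will be slower than the vectorized version.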
Turn y into 2 dicts.
index = { # index to count map
0 : 0,
1 : 0,
2 : 0,
3 : 0,
4 : 0
}
y = { # elem to index map
1: 0,
2: 0,
3: 0,
4: 1,
5: 2,
6: 2,
7: 3,
8 : 4,
9 : 4,
10 : 4,
11 : 4
}
Since you know y in advance, I don't count the above operations into the time complexity. Then, to count the intersection:
x = [1, 2, 3, 4, 5, 6]
for e in x: index[y[e]] += 1
Since you mentioned that x is small, I tried to make the time complexity depend only on the size of x (in this case O(n), where n is the length of x).
Finally, the answer is the list of keys in the index dict whose value is >= 2 (or >= 10 in the real case).
answer = [i for i in index if index[i] >= 2]
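The two dicts above are written out by hand; for a real y they would be generated. Below is a minimal sketch of that preprocessing step and the lookup, assuming y is available as a list of sublists. The helper names build_lookup and count_hits are hypothetical, and a plain list of counters stands in for the index dict.

```python
def build_lookup(y):
    """Map every element of every sublist to the index of its sublist."""
    elem_to_index = {}
    for i, sublist in enumerate(y):
        for e in sublist:
            elem_to_index[e] = i
    return elem_to_index


def count_hits(x, elem_to_index, n_sublists, min_hits):
    """Return indices of sublists hit at least min_hits times by x."""
    counts = [0] * n_sublists
    for e in x:
        counts[elem_to_index[e]] += 1  # raises KeyError if e is not in y
    return [i for i, c in enumerate(counts) if c >= min_hits]


y = [[1, 2, 3], [4], [5, 6], [7], [8, 9, 10, 11]]
lookup = build_lookup(y)  # done once, ahead of time
print(count_hits([1, 2, 3, 4, 5, 6], lookup, len(y), 2))  # [0, 2]
```

Building the lookup costs time and memory proportional to the total number of elements in y, which is the trade-off for constant-time lookups per element of x.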
This uses y to create a linear array, called x2range_counter, that maps every int to 1 plus the index of the range (sublist) the int is in.
x2range_counter is a compact array.array of unsigned integers (type code 'L', at least 32 bits per item) to save memory; it can be cached and reused for every x on the same y.
Calculating the hits in each range for a particular x is then just indirect array incrementing of a counter in the function count_ranges.
import array
from typing import List

y = [[1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11, 12]]
x = [5, 3, 1, 11, 8, 10]
range_counter_max = len(y)
extent = y[-1][-1] + 1  # min in y must be 1 not 0 remember.
x2range_counter = array.array('L', [0] * extent)  # compact unsigned-integer storage
# Map any int in any x to appropriate ranges counter.
for range_counter_index, rng in enumerate(y, start=1):
    for n in rng:
        x2range_counter[n] = range_counter_index
print(x2range_counter)  # array('L', [0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4])
# x2range_counter can be saved for this y and any x on this y.
def count_ranges(x: List[int]) -> array.array:
    "Number of x-hits on each y subgroup in order"
    # Note: count[0] initially catches errors. count[1..] counts x's in y ranges [0..]
    count = array.array('L', [0] * (range_counter_max + 1))
    for xx in x:
        count[x2range_counter[xx]] += 1
    assert count[0] == 0, "x values must all exist in a y range and y must have all int in its range."
    return count[1:]
print(count_ranges(x))  # array('L', [1, 2, 1, 2])
I created a class for this, with extra functionality such as returning the ranges rather than the indices; all ranges hit >=M times; (range, hit-count) tuples sorted most hit first.
Range calculations for different x are proportional to x and are simple array lookups rather than any hashing of dicts.
What do you think?
I have a pandas dataframe,
[[1, 3],
[4, 4],
[2, 8]...
]
I want to create a column that has this:
1*(a)^(3) # = x
1*(a)^(3 + 4) + 4 * (a)^4 # = y
1*(a)^(3 + 4 + 8) + 4 * (a)^(4 + 8) + 2 * (a)^8 # = z
...
Where "a" is some value.
The coefficients 1, 4, 2 come from column one; the exponents 3, 4, 8, which accumulate from row to row, come from column two.
Is this possible using some form of transform/apply?
Essentially getting:
[[1, 3, x],
[4, 4, y],
[2, 8, z]...
]
Where x, y, z is the respective sums from the new column (I want them next to each other)
There is a "groupby" that is being done on the dataframe, and this is what I want to do for a given group
If I'm understanding your question correctly, this should work:
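import pandas as pd

df = pd.DataFrame([[1, 3], [4, 4], [2, 8]], columns=['a', 'b'])
a = 42
new_lst = []
for n in range(len(df)):
    z = 0
    i = 0
    while i <= n:
        # coefficient of row i times a raised to the sum of exponents from row i to row n;
        # int() keeps the arithmetic in Python ints, which cannot overflow
        z += int(df['a'][i]) * a ** int(sum(df['b'][i:n + 1]))
        i += 1
    new_lst.append(z)
df['new'] = new_lst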
Update:
Saw that you are using pandas and updated with dataframe methods. Not sure there's an easy way to do this with apply since you need a mix of values from different rows. I think this for loop is still the best route.
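One observation that may simplify the loop: each value can be obtained from the previous one, because new[n] = (new[n-1] + c[n]) * a**e[n], where c is the first column and e the second. For the example, y = (x + 4) * a**4 = a**7 + 4*a**4, which matches the formula in the question. The sketch below works on plain sequences; the function name running_power_sums is hypothetical.

```python
def running_power_sums(coeffs, exponents, a):
    """Return the running sums described in the question in one pass."""
    result = []
    previous = 0
    for c, e in zip(coeffs, exponents):
        # Every earlier term gains the factor a**e; the new term is c * a**e.
        previous = (previous + c) * a ** e
        result.append(previous)
    return result


a = 2
values = running_power_sums([1, 4, 2], [3, 4, 8], a)
print(values)  # [8, 192, 49664]

# Cross-check against the expanded formulas from the question.
assert values[0] == 1 * a**3
assert values[1] == 1 * a**(3 + 4) + 4 * a**4
assert values[2] == 1 * a**(3 + 4 + 8) + 4 * a**(4 + 8) + 2 * a**8
```

With a DataFrame this could presumably be assigned as df['new'] = running_power_sums(df['a'].tolist(), df['b'].tolist(), a) and applied per group after the groupby; both are assumptions about the surrounding code rather than something shown in the question.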
Here two cells are considered adjacent if they share a boundary.
For example :
A = 5 6 4
2 1 3
7 9 8
Here the elements adjacent to index [0,0] are at indices [0,1] and [1,0], and for index [1,1] the adjacent elements are at indices [0,1], [1,0], [2,1] and [1,2].
Suppose you have an m x n matrix, and you want to find the adjacent indices of the cell (i, j):
def get_adjacent_indices(i, j, m, n):
    adjacent_indices = []
    if i > 0:
        adjacent_indices.append((i-1, j))  # cell above
    if i+1 < m:
        adjacent_indices.append((i+1, j))  # cell below
    if j > 0:
        adjacent_indices.append((i, j-1))  # cell to the left
    if j+1 < n:
        adjacent_indices.append((i, j+1))  # cell to the right
    return adjacent_indices
To also check for diagonals, regarding what Casper Dijkstrao asked, I usually write some code like this:
def adj_finder(matrix, position):
    adj = []
    for dx in range(-1, 2):
        for dy in range(-1, 2):
            rangeX = range(0, matrix.shape[0])  # X bounds
            rangeY = range(0, matrix.shape[1])  # Y bounds
            (newX, newY) = (position[0]+dx, position[1]+dy)  # adjacent cell
            if (newX in rangeX) and (newY in rangeY) and (dx, dy) != (0, 0):
                adj.append((newX, newY))
    return adj
The function takes the matrix argument to extract its row and column sizes (I use numpy, so matrix.shape returns a (row_size, column_size) tuple).
It also takes the current cell as the position argument, a tuple like (X, Y).
It then generates the adjacent cells, and if they are legitimate (1. not out of bounds, and 2. not identical to the reference position), it adds them to the list of adjacent cells, adj.
I'd like to emphasize that using the above algorithm, you can easily obtain neighbors in farther distances as well. Just modify the range in the for loops, like this:
for v in range(0-distance, 1+distance):
    for h in range(0-distance, 1+distance):
        ...
Where distance is the maximum distance of adjacent you want to let in.
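As a rough, numpy-free sketch of the same idea with a configurable distance: the function below takes the matrix dimensions directly and clamps the loop bounds instead of testing each candidate. The name neighbours_within and its parameters are hypothetical.

```python
def neighbours_within(rows, cols, position, distance=1):
    """Return (row, col) indices within `distance` of `position`,
    diagonals included, excluding `position` itself."""
    r0, c0 = position
    result = []
    # Clamp the ranges to the matrix bounds so no index falls outside.
    for r in range(max(0, r0 - distance), min(rows, r0 + distance + 1)):
        for c in range(max(0, c0 - distance), min(cols, c0 + distance + 1)):
            if (r, c) != (r0, c0):
                result.append((r, c))
    return result


print(neighbours_within(3, 3, (0, 0)))
# [(0, 1), (1, 0), (1, 1)]
print(len(neighbours_within(3, 3, (1, 1))))              # 8
print(len(neighbours_within(3, 3, (0, 0), distance=2)))  # 8
```

Here "distance" means the number of steps in any direction, diagonals counting as one step, which matches the square window produced by the nested loops above.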
Here is another way, which relies on a slicing trick and is arguably more concise (if you're more math-inclined):
def neighbours(grid, r, c):
    # (c > 0) is 1 or 0, so each slice starts one cell earlier except at the border
    vals = sum((row[c - (c > 0): c + 2]
                for row in grid[r - (r > 0): r + 2]), [])
    vals.remove(grid[r][c])  # drop one occurrence of the cell's own value
    return vals
grid = [[1, 5, 4, 9],
[2, 8, 3, 8],
[6, 3, 6, 3],
[7, 4, 7, 1]]
Outputs: (all items are in order)
print(f' {neighbours(grid, 2, 2)} ') # [8, 3, 8, 3, 3, 4, 7, 1]
print(f' {neighbours(grid, 0, 0)} ') # [5, 2, 8]
print(f' {neighbours(grid, 1, 1)} ') # [1, 5, 4, 2, 3, 6, 3, 6]
import numpy
square = numpy.reshape(range(0,16),(4,4))
square
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In the above array, how do I access the primary diagonal and secondary diagonal of any given element? For example 9.
by primary diagonal, I mean - [4,9,14],
by secondary diagonal, I mean - [3,6,9,12]
I can't use numpy.diag() because it takes the entire array to get the diagonal.
Based on your description, you can use np.where, np.diagonal and np.fliplr:
import numpy as np
# np.where returns index arrays; take the first match and pass a plain int as offset
x, y = np.where(square == 9)
np.diagonal(square, offset=int(y[0] - x[0]))
Out[382]: array([ 4, 9, 14])
x, y = np.where(np.fliplr(square) == 9)
np.diagonal(np.fliplr(square), offset=int(y[0] - x[0]))
Out[396]: array([ 3, 6, 9, 12])
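For the first diagonal, use the fact that both the x_coordinate and the y_coordinate increase by 1 each step, so their difference is constant:
def first_diagonal(x, y, length_array):
    # x is the column index and y the row index; the pairs returned are (row, column)
    d = y - x  # constant along the primary diagonal
    if d > 0:
        return zip(range(d, length_array), range(length_array - d))
    else:
        return zip(range(length_array + d), range(-d, length_array))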
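For the secondary diagonal, use the fact that x_coordinate + y_coordinate = constant.
def second_diagonal(x, y, length_array):
    tot = x + y
    lo = max(0, tot - (length_array - 1))  # keep both indices inside the matrix
    hi = min(tot, length_array - 1)
    return zip(range(lo, hi + 1), range(hi, lo - 1, -1))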
This gives you two iterators of (row, column) index pairs that you can use to access your matrix.
Of course, if you have a non square matrix these functions will have to be reshaped a bit.
To illustrate how to get the desired output:
a = np.reshape(range(0,16),(4,4))
first = first_diagonal(1, 2, len(a))
second = second_diagonal(1,2, len(a))
primary_diagonal = [a[i[0]][i[1]] for i in first]
secondary_diagonal = [a[i[0]][i[1]] for i in second]
print(primary_diagonal)
print(secondary_diagonal)
this outputs:
[4, 9, 14]
[3, 6, 9, 12]
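For the non-square case mentioned above, one possible adaptation is sketched below. It assumes a plain list of lists and takes the row and column of the target cell; the function name diagonals_through is hypothetical. It uses the same two invariants: row minus column is constant on the primary diagonal, and row plus column is constant on the secondary one.

```python
def diagonals_through(matrix, r, c):
    """Return (primary, secondary) diagonals passing through matrix[r][c]."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Primary diagonal: column = row - (r - c), kept inside the matrix.
    primary = [matrix[i][i - (r - c)]
               for i in range(n_rows) if 0 <= i - (r - c) < n_cols]
    # Secondary diagonal: column = (r + c) - row, kept inside the matrix.
    secondary = [matrix[i][(r + c) - i]
                 for i in range(n_rows) if 0 <= (r + c) - i < n_cols]
    return primary, secondary


square = [[0, 1, 2, 3],
          [4, 5, 6, 7],
          [8, 9, 10, 11],
          [12, 13, 14, 15]]
print(diagonals_through(square, 2, 1))
# ([4, 9, 14], [3, 6, 9, 12])

rect = [[0, 1, 2],
        [3, 4, 5]]
print(diagonals_through(rect, 0, 1))
# ([1, 5], [1, 3])
```

Both diagonals are listed from the top row downwards, so the secondary diagonal runs from top right to bottom left, as in the question.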
If I have a 2 dimensional list like this:
TopRow = [1, 3, 5]
MiddleRow = [7, 9, 11]
BottomRow = [13, 15, 17]
matrix = [TopRow, MiddleRow, BottomRow]
I need to make a function that takes in the 2-dimensional list and two values, row and col, as inputs, and then prints out the number at the specified row and column of the 2-dimensional list. Let's say row and col are defined like this:
row = 2
col = 3
Why doesn't this code retrieve the value (in this case, 11) and print it out?
def get_value(matrix, row, col):
    print(matrix[row][col])
Python indexes start at 0, not 1. 11 is in row 1, column 2.
Indices start at 0, so for your matrix, you have [0][0]...[2][2]
>>> TopRow = [1, 3, 5]
>>> MiddleRow = [7, 9, 11]
>>> BottomRow = [13, 15, 17]
>>> matrix = [TopRow, MiddleRow, BottomRow]
>>>
>>> def get_value(matrix, row, col):
...     print(matrix[row][col])
...
>>> get_value(matrix, 1, 2)
11
>>> get_value(matrix, 2, 1)
15
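If the caller really wants to pass 1-based positions, as in the question (row = 2, col = 3 for the value 11), one option is to convert inside the function. This is only a sketch; the name get_value_one_based is hypothetical, and it returns the value instead of printing it so the result can be reused.

```python
def get_value_one_based(matrix, row, col):
    """Return the value at a 1-based (row, col) position."""
    # Reject 0 and negatives explicitly: Python would otherwise treat
    # an index of -1 as "last element" and silently return the wrong cell.
    if not (1 <= row <= len(matrix)):
        raise IndexError("row out of range")
    if not (1 <= col <= len(matrix[row - 1])):
        raise IndexError("col out of range")
    return matrix[row - 1][col - 1]


matrix = [[1, 3, 5],
          [7, 9, 11],
          [13, 15, 17]]
print(get_value_one_based(matrix, 2, 3))  # 11
```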