I want to create permutations of a matrix that has 10 rows with 70 items each. Every item is either True or False.
The problem is that I would need to write 1400 for statements.
Is there a better way to do these permutations?
matrix = [[False for i in range(0, 70)] for i in range(0, 10)]
possible_items = [True, False]
Edit: I want to loop through all possible combinations of True and False values in the matrix.
I agree 100% with the comment made by @user2357112: there must be an underlying issue that prompted you to pursue such a solution.
However, if for any reason you do want a solution to this, you might consider using itertools.product.
import itertools

VALUES = (True, False)
rows = itertools.product(VALUES, repeat=70)
This will lazily produce every possible row of 70 items drawn from VALUES; there are 2**70 of them, so I do not suggest iterating over all of them.
You can then easily extend this to be a solution to your problem, but I repeat, this is probably not a good way to do this.
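If you did want to spell that extension out, a sketch might look like the following (purely illustrative; N_ROWS, N_COLS and all_matrices are my names, and with 2**700 possible matrices the generator can never be exhausted):

import itertools

VALUES = (True, False)
N_ROWS, N_COLS = 10, 70  # illustrative names for the matrix dimensions

def all_matrices():
    # Lazily enumerate every True/False assignment of the N_ROWS * N_COLS cells,
    # reshaping each flat assignment into N_ROWS rows of N_COLS items.
    for flat in itertools.product(VALUES, repeat=N_ROWS * N_COLS):
        yield [list(flat[r * N_COLS:(r + 1) * N_COLS]) for r in range(N_ROWS)]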
Related
The problem is: given an array, write a generator function that yields all combinations of cutting the array into consistent pieces (sub-arrays of elements that are consecutive in the given array) of any size, which together make up the whole given array. The pieces within any one combination don't have to be the same size.
For example given an array [1,2,3,4] I want to yield:
[[1],[2],[3],[4]]
[[1,2],[3],[4]]
[[1],[2,3],[4]]
[[1],[2],[3,4]]
[[1,2],[3,4]]
[[1],[2,3,4]]
[[1,2,3],[4]]
[[1,2,3,4]]
def powerset_but_consistent(T):
    if len(T) == 0:
        return
    for el in T:
        yield el
        for k in range(len(el)):
            yield [el[:k], el[k:]]
            # for l in range(k, len(el)):
            #     yield [el[:k], el[k:l], el[l:]]
            powerset_but_consistent([T[:k], T[k:]])

T = [[1, 2, 3, 4, 5]]
subsets = [x for x in powerset_but_consistent(T)]
for i in subsets:
    print(i)
And this prints only those combinations that are made of two arrays. If I uncomment the commented lines, it will also print combinations consisting of 3 arrays. If I add another inner for loop, it will print combinations consisting of 4 arrays, and so on... How can I use recursion instead of endlessly nested inner for loops? Is it time to use something like:
for x in powerset_but_consistent(T[some_slicing] or something else) ?
I find it difficult to understand this construction. Can anyone help?
One of the algorithms commonly used for these types of questions (permutations and combinations) is depth-first search (DFS). Here's a link to a similar but harder LeetCode problem on palindrome partitioning that uses backtracking and DFS. My solution is based on that LeetCode post.
Algorithm
If my explanation is not enough, go through the LeetCode link provided above; it may make more sense.
The general idea is to iterate through the list and get all the combinations starting at the current element by recursively traversing the remaining elements after it.
Pseudocode
function recursive(list):
    if the recursive (base) condition is met:
        yield child
    for element in remaining-elements:
        // Get the combinations from all elements starting from 'element'
        partitions = recursive(...)
        // Join the list of elements already explored with the returned combinations
        for every child from partitions:
            yield combination_before + child
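For concreteness, here is a minimal Python sketch of one way to read that pseudocode (the name partitions is mine, not from the linked post):

def partitions(lst):
    # Base case: the empty list has exactly one partition, the empty one.
    if not lst:
        yield []
        return
    # DFS: take every possible prefix as the first piece, then recurse on the
    # remaining suffix and prepend the prefix to each of its partitions.
    for i in range(1, len(lst) + 1):
        prefix = lst[:i]
        for rest in partitions(lst[i:]):
            yield [prefix] + rest

for combo in partitions([1, 2, 3, 4]):
    print(combo)  # prints all 8 combinations from the question, possibly in a different order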
The major concept here is depth-first search, plus figuring out the recursive (base) condition, as that really took me a while.
You can also optimize the code by storing the results of the deep recursive calls in a dictionary and reusing them when you revisit the same sub-problems in later iterations. I'm also pretty sure there is an optimal dynamic programming solution for this out there somewhere. Good luck, I hope this helped.
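As a rough sketch of that caching idea (my own illustration: results are cached by the start index of the remaining suffix via functools.lru_cache, and returned as lists rather than yielded):

from functools import lru_cache

def partitions_memoized(lst):
    @lru_cache(maxsize=None)
    def helper(start):
        # All partitions of lst[start:], cached by the start index.
        if start == len(lst):
            return [[]]
        result = []
        for end in range(start + 1, len(lst) + 1):
            piece = lst[start:end]
            for rest in helper(end):
                result.append([piece] + rest)
        return result

    return helper(0)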
Edit: My bad, I realised I had posted the actual solution before the edit; I had no idea that might slightly conflict with individual community guidelines.
So I have a 2D numpy array arr. It's a relatively big one: arr.shape = (2400, 60000)
What I'm currently doing is the following:
randomly (with replacement) select arr.shape[0] indices
access (row-wise) chosen indices of arr
calculating column-wise averages and selecting the max value
and repeating all of the above no_samples times
It looks something like this:
no_rows = arr.shape[0]
indicies = np.array(range(no_rows))
my_vals = []

for k in range(no_samples):
    random_idxs = np.random.choice(indicies, size=no_rows, replace=True)
    my_vals.append(
        arr[random_idxs].mean(axis=0).max()
    )
My problem is that it is very slow. With my arr size, one loop iteration takes ~3s. As I want a sample that is bigger than 1k, my current solution is pretty bad (1k*~3s -> ~1h). I've profiled it and the bottleneck is accessing rows by the indices; mean and max work fast, and np.random.choice is also ok.
Do you see any area for improvement? A more efficient way of accessing indices or maybe better a faster approach that solves the problem without this?
What I have tried so far:
numpy.take (slower)
numpy.ravel: something similar to:
random_idxs = np.random.choice(sample_idxs, size=sample_size, replace=True)
test = random_idxs.ravel()[arr.ravel()].reshape(arr.shape)
an approach similar to the current one, but without the loop: I created a 3D arr and accessed rows across the additional dimension in one go
Since advanced indexing generates a copy, the program will allocate a huge amount of memory for arr[random_idxs].
So one of the simplest ways to improve efficiency is to do things batch-wise:
BATCH = 512
max(arr[random_idxs,i:i+BATCH].mean(axis=0).max() for i in range(0,arr.shape[1],BATCH))
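For context, here is roughly how that expression could slot into the sampling loop from the question (my framing, reusing no_rows, indicies and no_samples from the question's snippet):

BATCH = 512

my_vals = []
for k in range(no_samples):
    random_idxs = np.random.choice(indicies, size=no_rows, replace=True)
    # Only BATCH columns are copied at a time, so the advanced-indexing copy
    # stays small while the overall mean/max result is unchanged.
    my_vals.append(
        max(arr[random_idxs, i:i + BATCH].mean(axis=0).max()
            for i in range(0, arr.shape[1], BATCH))
    )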
This is not a general solution to the problem, but should make your specific problem much faster. Basically, arr.mean(axis=0).max() won't change, so why not take random samples from that array?
Something like:
mean_max = arr.mean(axis=0)  # the vector of column means; this doesn't change between samples
my_vals = np.array([np.random.choice(mean_max, size=len(mean_max), replace=True) for i in range(no_samples)])
You may even be able to do: my_vals = np.random.choice(mean_max, size=(no_samples, len(mean_max)), replace=True), but I'm not sure how, if at all, that would change your statistics.
I have a Numpy Array of True and False values like:
test = np.array([False, False, False, True, False, True, False, True, False,False, False, False, True, True, False, True])
I would like to know the number of times the following pattern (False, True, False) happens in the array. In the test above it will be 4. This is not the only pattern, but I assume that when I understand this code I can probably also make the others.
Of course, I can loop over the array. If the first value is equal, compare the next and otherwise go to the next value in the loop. Like this:
totalTimes = 0

def swapToBegin(x):
    if x >= len(test):
        x -= len(test)
    return x

for i in range(len(test)):
    if test[i] == False:
        if test[swapToBegin(i+1)] == True:
            if test[swapToBegin(i+2)] == False:
                totalTimes += 1
However, since I need to do this many times, this code will be very slow. Small improvements could be made, but it was written quickly just to show what I need; there must be a better solution.
Is there a better way to search for a pattern in an array? It does not need to combine the end and beginning of the array, since I would be able to this afterwards. But if it can be included it would be nice.
You haven't given any details on how large test is, so for benchmarking the methods below I've used an array with 1000 elements. The next important step is to actually profile the code: you can't say it's slow (or fast) until there are hard numbers to back it up. Your code runs in around 1.49ms on my computer.
You can often get improvements with numpy by removing python loops and replacing them with numpy functions.
So, rather than testing each element individually (lots of if conditions could slow things down) I've put it all into one array comparison, then used all to check that every element matches.
check = np.array([False, True, False])
sum([(test[i:i+3] == check).all() for i in range(len(test) - 2)])
Profiling this shows it running in 1.91ms.
That's actually a step backwards. So, what could be causing the slowdown? Well, each slice access using [] creates a new array object, which could be part of it. A better approach may be to create one large array with the offsets, then use broadcasting to do the comparison.
sum((np.c_[test[:-2], test[1:-1], test[2:]] == check).all(1))
This time check is compared with each row of the array np.c_[test[:-2], test[1:-1], test[2:]]. The axis argument (1) of all is used so that only rows where every element matches are counted. This runs in 40.1us. That's a huge improvement.
Of course, creating the array to broadcast is going to have a large cost in terms of copying elements over. Why not do the comparisons directly?
sum(np.all([test[i:len(test)-2+i] == v for i, v in enumerate(check)], 0))
This runs in 18.7us.
The last idea to speed things up is using as_strided. This is an advanced trick to alter the strides of an array to get the offset array without copying any data. It's usually not worth the effort, but I'm including it here just for fun.
sum((np.lib.stride_tricks.as_strided(test, (len(test) - len(check) + 1, len(check)), test.strides + (1, )) == check).all(1))
This also runs in around 40us. So, the extra effort doesn't add anything in this case.
You can use an array containing [False, True, False] and search for this instead.
searchfor = np.array([False, True, False])
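To make that concrete, here is one possible way to count the non-wrapping matches of such a pattern against the test array from the question (a sketch; sliding_window_view requires NumPy >= 1.20):

# View every length-3 window of test and compare each window to the pattern.
windows = np.lib.stride_tricks.sliding_window_view(test, len(searchfor))
count = int((windows == searchfor).all(axis=1).sum())
print(count)  # 3 without wrap-around; the wrapping match brings the question's total to 4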
I have an operation that I'm doing commonly which I'm calling a "jagged-slice" because I don't know the real name for it. It's best explained by example:
a = np.random.randn(50, 10)
entries_of_interest = np.random.randint(10, size = 50) # Vector of 50 indices between 0 and 9
# Now I want the values contained in each row of a at the corresponding index in "entries of interest"
jagged_slice_of_a = a[np.arange(a.shape[0]), entries_of_interest]
# jagged_slice_of_a is now a vector with 50 elements. Good.
The only problem is that the a[np.arange(a.shape[0]), entries_of_interest] indexing is a bit cumbersome (it seems silly to have to construct np.arange(a.shape[0]) just for the sake of this). I'd like something like the : operator for this, but : does something else. Is there any more succinct way to do this operation?
Best answer:
No, there is no better way with native numpy. You can create a helper function for this if you want.
This is cumbersome only in the sense that it requires more typing for a task that seems so simple to you.
a[np.arange(a.shape[0]), entries_of_interest]
But as you note, the syntactically simpler a[:, entries_of_interest] has another interpretation in numpy. Choosing a subset of the columns of an array is a more common task than choosing one (random) item from each row.
Your case is just a specialized instance of
a[I, J]
where I and J are 2 arrays of the same shape. In the general case entries_of_interest could be smaller than a.shape[0] (not all the rows), or larger (several items from some rows), or even be 2d. It could even select certain elements repeatedly.
I have found in other SO questions that performing this kind of element selection is faster when applied to a.flat. But that requires some math to construct the I*n+J kind of flat index.
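For reference, a small sketch of that I*n+J flat-index form (a.flat indexes in row-major order, so this matches the fancy-indexing result):

I = np.arange(a.shape[0])
J = entries_of_interest
# Row i starts at logical offset i * a.shape[1] in the flattened view.
jagged_slice_of_a = a.flat[I * a.shape[1] + J]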
With your special knowledge of J, constructing I seems extra work, but numpy can't make that kind of assumption. If this selection was more common someone could write a function that wraps your expression
def peter_selection(a, I):
    # check that a.shape[0] == I.shape[0]
    return a[np.arange(a.shape[0]), I]
I think that your current method is probably the best way.
You can also use choose for this kind of selection. This is syntactically clearer, but is trickier to get right and potentially more limited. The equivalent with this method would be:
entries_of_interest.choose(a.T)
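As a quick sanity check (using a and entries_of_interest from the question; note that np.choose accepts only a limited number of choice arrays, which is part of why it is more limited):

via_choose = entries_of_interest.choose(a.T)
via_index = a[np.arange(a.shape[0]), entries_of_interest]
assert (via_choose == via_index).all()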
The elements in jagged_slice_of_a are the diagonal elements of a[:,entries_of_interest]
A slightly less cumbersome way of doing this would therefore be to use np.diagonal to extract them.
jagged_slice_of_a = a[:, entries_of_interest].diagonal()
I'm wondering what the most efficient way is to replace elements in an array with other random elements from the array, given some criteria. More specifically, I need to replace each element which doesn't meet a given criterion with another random value from that row. For example, I want to replace each outlier in a row of data with a random cell from that row whose value is between -.8 and .8. My inefficient solution looks something like this:
import numpy as np
import random as r

data = np.random.normal(0, 1, (10, 100))

for index, row in enumerate(data):
    row_copy = np.copy(row)
    outliers = np.logical_or(row > .8, row < -.8)
    for prob in np.where(outliers == 1)[0]:
        fixed = 0
        while fixed == 0:
            random_other_value = r.randint(0, 99)
            if random_other_value in np.where(outliers == 1)[0]:
                fixed = 0
            else:
                row_copy[prob] = row[random_other_value]
                fixed = 1
Obviously, this is not efficient.
I think it would be faster to pull out all the good values, then use random.choice() to pick one whenever you need it. Something like this:
import numpy as np
import random
from itertools import izip
data = np.random.normal(0, 1, (10, 100))
for row in data:
    good_ones = np.logical_and(row >= -0.8, row <= 0.8)
    good = row[good_ones]
    row_copy = np.array([x if f else random.choice(good) for f, x in izip(good_ones, row)])
High-level Python code that you write is slower than the C internals of Python. If you can push work down into the C internals it is usually faster. In other words, try to let Python do the heavy lifting for you rather than writing a lot of code. It's zen... write less code to get faster code.
I added a loop to run your code 1000 times, and to run my code 1000 times, and measured how long they took to execute. According to my test, my code is ten times faster.
Additional explanation of what this code is doing:
row_copy is being set by building a new list, and then calling np.array() on the new list to convert it to a NumPy array object. The new list is being built by a list comprehension.
The new list is made according to the rule: if the number is good, keep it; else, take a random choice from among the good values.
A list comprehension walks over a sequence of values, but to apply this rule we need two values: the number, and the flag saying whether that number is good or not. The easiest and fastest way to make a list comprehension walk along two sequences at once is to use izip() to "zip" the two sequences together. izip() will yield up tuples, one at a time, where the tuple is (f, x); f in this case is the flag saying good or not, and x is the number. (Python has a built-in feature called zip() which does pretty much the same thing, but actually builds a list of tuples; izip() just makes an iterator that yields up tuple values. But you can play with zip() at a Python prompt to learn more about how it works.)
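A tiny illustration of that pairing (plain zip is shown here, since it behaves the same way for this purpose):

flags = [True, False, True]
numbers = [10, 20, 30]
# Each tuple pairs a flag with the number at the same position.
print(list(zip(flags, numbers)))  # [(True, 10), (False, 20), (True, 30)]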
In Python we can unpack a tuple into variable names like so:
a, b = (2, 3)
In this example, we set a to 2 and b to 3. In the list comprehension we unpack the tuples from izip() into variables f and x.
Then the heart of the list comprehension is a "ternary if" statement like so:
a if flag else b
The above will return the value a if the flag value is true, and otherwise return b. The one in this list comprehension is:
x if f else random.choice(good)
This implements our rule.