Performing bulk arithmetic operations on a Python list - python

I have a list of integers and I want to perform operations like addition, multiplication, and floor division on every element of a list slice (sub-array) or at certain indexes (e.g. range(start, end, jump)) efficiently. The number each element of the slice is combined with is a constant (say 'k').
For example:
nums = [23, 44, 65, 78, 87, 11, 33, 44, 3]
for i in range(2, 7, 2):
    nums[i] //= 2  # here 2 is the constant 'k'
print(nums)
>>> [23, 44, 32, 78, 43, 11, 16, 44, 3]
I have to perform these operations several times on different slices/ranges, and the constant 'k' varies between them. The obvious way is to run a for loop and modify the elements one by one, but that isn't fast enough. This can be done efficiently with a NumPy array, since it supports bulk assignment/modification, but I am looking for a way to do it in pure Python.

One way to avoid the for loop is the following:
>>> nums = [23, 44, 65, 78, 87, 11, 33, 44, 3]
>>> nums[2:7:2] = [x//2 for x in nums[2:7:2]]
>>> nums
[23, 44, 32, 78, 43, 11, 16, 44, 3]
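
If you need to do this repeatedly with different slices and constants, the same slice-assignment idea can be wrapped in a small helper. A minimal sketch (the helper name apply_on_slice is made up for illustration):

import operator

def apply_on_slice(nums, start, end, jump, k, op=operator.floordiv):
    # Rebuild only the affected positions with a single slice assignment.
    s = slice(start, end, jump)
    nums[s] = [op(x, k) for x in nums[s]]

nums = [23, 44, 65, 78, 87, 11, 33, 44, 3]
apply_on_slice(nums, 2, 7, 2, 2)                 # floor-divide every second element in [2, 7)
apply_on_slice(nums, 0, 4, 1, 3, operator.mul)   # multiply the first four elements by 3
print(nums)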

Related

Faster/lazier way to evenly and randomly split m*n into n groups (each with m elements) in python

I want to split m*n elements (e.g., 1, 2, ..., m*n) into n groups randomly and evenly, so that each group has m random elements. Each group will process k (k>=1) elements at a time from its own group, at the same speed (via some synchronization mechanism), until all groups have processed all of their elements. Each group actually runs in an independent process/thread.
I use numpy.random.choice(m*n, m*n, replace=False) to generate the permutation first, and then each group indexes into the permuted result.
The problem is that when m*n is very large (e.g., >=1e8), this is very slow (tens of seconds or minutes).
Is there a faster/lazier way to do this? I think it might be possible to avoid generating the whole permuted result up front: build a generator instead, and have each group draw k elements at a time, with the same effect as the method I currently use. But I don't know how to achieve this lazy approach, or whether it can actually be implemented.
You can make a generator that will progressively shuffle (a copy of) the list and lazily yield distinct groups:
import random

def rndGroups(A, size):
    A = A.copy()                        # work on a copy (if needed)
    p = len(A)                          # target position of random item
    for _ in range(0, len(A), size):    # work in chunks of group size
        for _ in range(size):           # create one group
            i = random.randrange(p)     # random index in remaining items
            p -= 1                      # update randomized position
            A[i], A[p] = A[p], A[i]     # swap items
        yield A[p:p+size]               # return shuffled sub-range
Output:
A = list(range(100))
iG = iter(rndGroups(A, 10))   # 10 groups of 10 items
s = set()                     # set to validate uniqueness
for _ in range(10):           # 10 groups
    g = next(iG)              # get the next group from the generator
    s.update(g)               # to check that all items are distinct
    print(g)
print(len(s))                 # must get 100 distinct values from groups
[87, 19, 85, 90, 35, 55, 86, 58, 96, 68]
[38, 92, 93, 78, 39, 62, 43, 20, 66, 44]
[34, 75, 72, 50, 42, 52, 60, 81, 80, 41]
[13, 14, 83, 28, 53, 5, 94, 67, 79, 95]
[9, 33, 0, 76, 4, 23, 2, 3, 32, 65]
[61, 24, 31, 77, 36, 40, 47, 49, 7, 97]
[63, 15, 29, 25, 11, 82, 71, 89, 91, 30]
[12, 22, 99, 37, 73, 69, 45, 1, 88, 51]
[74, 70, 98, 26, 59, 6, 64, 46, 27, 21]
[48, 17, 18, 8, 54, 10, 57, 84, 16, 56]
100
This will take just as long as pre-shuffling the list (if not longer), but it lets you start/feed threads as you go, which increases the parallelism.
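
To illustrate the "feed threads as you go" point, here is a minimal sketch that hands each lazily produced group to a thread pool as soon as it is generated; process_group is a hypothetical stand-in for the real per-group work:

from concurrent.futures import ThreadPoolExecutor

def process_group(group):
    # hypothetical per-group work; replace with the real processing
    return sum(group)

A = list(range(100))
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map pulls groups from the generator as it submits work
    results = list(pool.map(process_group, rndGroups(A, 10)))
print(results)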

Numpy random array from 0 to 99, including both

I'm trying to create a NumPy array of size (80, 10) where each row has random values in the range 0 to 99.
I've done that by
np.random.randint(99, size=(80, 10))
But I would like to always include both 0 and 99 as values in each row.
So two values in each row are already defined and the other 8 will be random.
How would I accomplish this? Is there a way to generate an array size (80,8) and just concatenate [0,99] to every row to make it (80,10) at the end?
As suggested in the comments by Tim, you can generate a matrix with random values not including 0 and 99. Then replace two random indices along the second axis with the values 0 and 99.
rand_arr = np.random.randint(low=1, high=98, size=(80, 10))
rand_indices = np.random.rand(80,10).argsort(axis=1)[:,:2]
np.put_along_axis(rand_arr, rand_indices, [0,99], axis=1)
The motivation for using argsort is that we want random indices along the second axis without replacement. Just generating a random integer matrix with values 0-9 and size=(80, 2) will not guarantee this.
In this scenario, you could also use np.argpartition with kth=2 instead of np.argsort, which should be more efficient.
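A minimal sketch of that argpartition variant (it reuses rand_arr from the snippet above; only the index selection changes):

rand_indices = np.random.rand(80, 10).argpartition(2, axis=1)[:, :2]   # two distinct column indices per row
np.put_along_axis(rand_arr, rand_indices, [0, 99], axis=1)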
I've tried a few things and this is what I came up with
import numpy as np

def generate_matrix(low, high, shape):
    x, y = shape
    # random values strictly between low and high, i.e. in [low+1, high-1]
    values = np.random.randint(low + 1, high, size=(x, y - 2))
    predefined = np.tile([low, high], (x, 1))
    values = np.hstack([values, predefined])
    for row in values:            # shuffle each row independently
        np.random.shuffle(row)
    return values
Example usage
>>> generate_matrix(0, 99, (5, 10))
array([[94, 0, 45, 99, 18, 31, 78, 80, 32, 17],
[28, 99, 72, 3, 0, 14, 26, 37, 41, 80],
[18, 78, 71, 40, 99, 0, 85, 91, 8, 59],
[65, 99, 0, 45, 93, 94, 16, 33, 52, 53],
[22, 76, 99, 15, 27, 64, 91, 32, 0, 82]])
The way I approached it:
Generate an array of size (80, 8) in the range [1, 98] and then concatenate 0 and 99 to each row. But you probably need the 0/99 to occur at different indices in each row, so you have to shuffle. Unfortunately, np.random.shuffle() only shuffles the rows among themselves, and neither shuffling the transpose nor random.Generator.permutation shuffles the columns within each row independently. I haven't found a vectorised way to shuffle each row independently other than using a Python loop.
Another way:
You can generate an array of size (80, 10) in the range [1, 98] and then substitute the values 0 and 99 at random indices in each row. Again, I couldn't find a way to generate two distinct indices per row (so that 0 doesn't overwrite 99, for example) without a Python loop. Since I couldn't avoid Python loops either way, I opted for the first approach, which seemed more straightforward.
If you don't care about duplicates, create an array of zeros, fill columns 1-8 with random numbers, and set the last column to 99 (column 0 stays 0).
final = np.zeros(shape=(80, 10), dtype=int)   # dtype=int so the result holds integers rather than floats
final[:, 1:9] = np.random.randint(97, size=(80, 8)) + 1   # values in [1, 97]
final[:, 9] = 99
Creating an 80x10 matrix with random values from 0 to 99, no duplicates within a row, and 0 and 99 included in every row:
import random
row99 = [n for n in range(1, 99)]
perm = [n for n in range(0, 10)]
m = []
for i in range(80):
    random.shuffle(row99)
    random.shuffle(perm)
    r = row99[:10]
    r[perm[0]] = 0
    r[perm[1]] = 99
    m.append(r)
print(m)
Partial output:
[
... other elements here ...
[70, 58, 0, 25, 41, 10, 90, 5, 42, 18],
[0, 57, 90, 71, 39, 65, 52, 24, 28, 77],
[55, 42, 7, 9, 32, 69, 90, 0, 64, 2],
[0, 59, 17, 35, 56, 34, 33, 37, 90, 71]]

Multiple indices for numpy array: IndexError: failed to coerce slice entry of type numpy.ndarray to integer

Is there a way to do multiple indexing in a numpy array as described below?
arr=np.array([55, 2, 3, 4, 5, 6, 7, 8, 9])
arr[np.arange(0,2):np.arange(5,7)]
output:
IndexError: too many indices for array
Desired output:
array([[55, 2, 3, 4, 5],
       [2, 3, 4, 5, 6]])
This problem might be similar to calculating a moving average over an array (but I want to do it without any function that is provided).
Here's an approach using strides -
start_index = np.arange(0,2)
L = 5 # Interval length
n = arr.strides[0]
strided = np.lib.stride_tricks.as_strided
out = strided(arr[start_index[0]:],shape=(len(start_index),L),strides=(n,n))
Sample run -
In [976]: arr
Out[976]: array([55, 52, 13, 64, 25, 76, 47, 18, 69, 88])
In [977]: start_index
Out[977]: array([2, 3, 4])
In [978]: L = 5
In [979]: out
Out[979]:
array([[13, 64, 25, 76, 47],
[64, 25, 76, 47, 18],
[25, 76, 47, 18, 69]])
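
If making copies is acceptable, a simpler (though not stride-based) alternative sketch is to build all the window indices with broadcasting and use fancy indexing; this assumes the same arr, start_index and L as in the sample run above:

import numpy as np

arr = np.array([55, 52, 13, 64, 25, 76, 47, 18, 69, 88])
start_index = np.array([2, 3, 4])
L = 5
out = arr[start_index[:, None] + np.arange(L)]   # shape (3, 5); each row is one window
print(out)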

python seed() not keeping same sequence

I'm using random.seed() to try to keep random.sample() the same as I sample more values from a list, but at some point the numbers change... where I thought the whole purpose of seed() was to keep the numbers the same.
Here's a test I did to show it doesn't keep the same numbers.
import random
a=range(0,100)
random.seed(1)
a = random.sample(a,10)
print a
then change the sample size to something much higher and the sequence changes (at least for me it always does):
a = random.sample(a,40)
print a
I'm sort of a newb so maybe this is an easy fix but I would appreciate any help on this.
Thanks!
If you were to draw independent samples from the generator, what would happen would be exactly what you're expecting:
In [1]: import random
In [2]: random.seed(1)
In [3]: [random.randint(0, 99) for _ in range(10)]
Out[3]: [13, 84, 76, 25, 49, 44, 65, 78, 9, 2]
In [4]: random.seed(1)
In [5]: [random.randint(0, 99) for _ in range(40)]
Out[5]: [13, 84, 76, 25, 49, 44, 65, 78, 9, 2, 83, 43 ...]
As you can see, the first ten numbers are indeed the same.
It is the fact that random.sample() is drawing samples without replacement that's getting in the way. To understand how these algorithms work, see Reservoir Sampling. In essence what happens is that later samples can push earlier samples out of the result set.
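For intuition, here is a minimal sketch of the reservoir-sampling idea referenced above (Algorithm R); it is not how CPython's random.sample is implemented, but it shows how items seen later can displace items chosen earlier, so the result depends on how much you sample:

import random

def reservoir_sample(iterable, k):
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            reservoir.append(item)        # fill the reservoir first
        else:
            j = random.randrange(i + 1)   # random slot among all items seen so far
            if j < k:
                reservoir[j] = item       # a later item evicts an earlier pick
    return reservoir

random.seed(1)
print(reservoir_sample(range(100), 10))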
One alternative might be to shuffle a list of indices and then take either 10 or 40 first elements:
In [1]: import random
In [2]: a = range(0,100)
In [3]: random.shuffle(a)
In [4]: a[:10]
Out[4]: [48, 27, 28, 4, 67, 76, 98, 68, 35, 80]
In [5]: a[:40]
Out[5]: [48, 27, 28, 4, 67, 76, 98, 68, 35, 80, ...]
It seems that random.sample is deterministic only if both the seed and the sample size are kept constant. Even if you reset the seed, asking for a sample of a different length is not "the same" random operation, and it may give a different initial subsequence than a smaller sample with the same seed. The same random numbers are generated internally, but the way sample uses them to build the result depends on how large a sample you ask for.
You are assuming an implementation of random.sample something like this:
def samples(lst, k):
    n = len(lst)
    indices = []
    while len(indices) < k:
        index = random.randrange(n)
        if index not in indices:
            indices.append(index)
    return [lst[i] for i in indices]
Which gives:
>>> random.seed(1)
>>> samples(list(range(20)), 5)
[4, 18, 2, 8, 3]
>>> random.seed(1)
>>> samples(list(range(20)), 10)
[4, 18, 2, 8, 3, 15, 14, 12, 6, 0]
However, that isn't how random.sample is actually implemented; seed does work how you think, it's sample that doesn't!
You simply need to re-seed it:
a = list(range(100))
random.seed(1) # seed first time
random.sample(a, 10)
>> [17, 72, 97, 8, 32, 15, 63, 57, 60, 83]
random.seed(1) # seed second time with same value
random.sample(a, 40)
>> [17, 72, 97, 8, 32, 15, 63, 57, 60, 83, 48, 26, 12, 62, 3, 49, 55, 77, 0, 92, 34, 29, 75, 13, 40, 85, 2, 74, 69, 1, 89, 27, 54, 98, 28, 56, 93, 35, 14, 22]
But note that in your code you overwrite a with the result of random.sample, so after the first call a only contains the 10 sampled elements and you have lost the rest of the population; sampling 40 from it will then fail. Keep the original list intact and seed before every sampling.
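A minimal corrected version of the original snippet along those lines (keep the population list, re-seed before each call):

import random

population = list(range(100))   # keep this intact; don't overwrite it with the sample
random.seed(1)
first = random.sample(population, 10)
random.seed(1)                  # same seed again before the larger sample
second = random.sample(population, 40)
print(first)
print(second[:10])              # compare against the first sample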

What is :: (double colon) in Python when subscripting sequences?

I know that I can use something like string[3:4] to get a substring in Python, but what does the 3 mean in somesequence[::3]?
It means 'nothing for the first argument, nothing for the second, and jump by three'. It gets every third item of the sliced sequence.
Extended slices are what you want, new in Python 2.3.
Python sequence slices can be written as a[start:end:step], and any of start, end, or step can be dropped. a[::3] is every third element of the sequence.
seq[::n] is a sequence of every n-th item in the entire sequence.
Example:
>>> range(10)[::2]
[0, 2, 4, 6, 8]
The syntax is:
seq[start:end:step]
So you can do (in Python 2):
>>> range(100)[5:18:2]
[5, 7, 9, 11, 13, 15, 17]
Explanation
s[i:j:k] is, according to the documentation, "slice of s from i to j with step k". When i and j are absent, the whole sequence is assumed and thus s[::k] means "every k-th item".
Examples
First, let's initialize a list:
>>> s = range(20)
>>> s
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Let's take every 3rd item from s:
>>> s[::3]
[0, 3, 6, 9, 12, 15, 18]
Let's take every 3rd item from s[2:]:
>>> s[2:]
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> s[2::3]
[2, 5, 8, 11, 14, 17]
Let's take every 3rd item from s[5:12]:
>>> s[5:12]
[5, 6, 7, 8, 9, 10, 11]
>>> s[5:12:3]
[5, 8, 11]
Let's take every 3rd item from s[:10]:
>>> s[:10]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> s[:10:3]
[0, 3, 6, 9]
TL;DR
This visual example will show you how to neatly select elements in a NumPy matrix (a 2-dimensional array) in a pretty entertaining way (I promise). Step 2 below illustrates the usage of the "double colons" :: in question.
(Caution: this is a NumPy-specific example, with the aim of illustrating a use case of the "double colons" :: for jumping over elements along multiple axes. It does not cover native Python data structures like list.)
One concrete example to rule them all...
Say we have a NumPy matrix that looks like this:
In [1]: import numpy as np
In [2]: X = np.arange(100).reshape(10,10)
In [3]: X
Out[3]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
Say, for some reason, your boss wants you to select the following scattered elements: 23, 25, 27, 53, 55, 57, 83, 85 and 87 (originally shown highlighted in a picture).
"But How???"... Read on! (We can do this in a 2-step approach)
Step 1 - Obtain subset
Specify the "start index" and "end index" in both row-wise and column-wise directions.
In code:
In [5]: X2 = X[2:9,3:8]
In [6]: X2
Out[6]:
array([[23, 24, 25, 26, 27],
[33, 34, 35, 36, 37],
[43, 44, 45, 46, 47],
[53, 54, 55, 56, 57],
[63, 64, 65, 66, 67],
[73, 74, 75, 76, 77],
[83, 84, 85, 86, 87]])
Notice now we've just obtained our subset, with the use of simple start and end indexing technique. Next up, how to do that "jumping"... (read on!)
Step 2 - Select elements (with the "jump step" argument)
We can now specify the "jump steps" in both row-wise and column-wise directions (to select elements in a "jumping" way) like this:
In code (note the double colons):
In [7]: X3 = X2[::3, ::2]
In [8]: X3
Out[8]:
array([[23, 25, 27],
[53, 55, 57],
[83, 85, 87]])
We have just selected all the elements as required! :)
Consolidating Step 1 (start and end) and Step 2 ("jumping")
Now that we know the concept, we can easily combine Step 1 and Step 2 into one consolidated step, for compactness:
In [9]: X4 = X[2:9,3:8][::3,::2]
In [10]: X4
Out[10]:
array([[23, 25, 27],
[53, 55, 57],
[83, 85, 87]])
Done!
When slicing in Python the third parameter is the step. As others mentioned, see Extended Slices for a nice overview.
With this knowledge, [::3] just means that you have not specified any start or end indices for your slice. Since you have specified a step, 3, this will take every third entry of something starting at the first index. For example:
>>> '123123123'[::3]
'111'
Remember that the foundation is what a[start:end:step] means. From there, a[1::2] gets every odd index, a[::2] every even index, a[2::2] every even index starting at 2, and a[2:4:2] every even index starting at 2 and ending before 4. Inspired by https://stackoverflow.com/a/3453102/1601580
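A quick check of those patterns on a concrete list (expected outputs shown as comments):

a = list(range(10))
print(a[1::2])   # [1, 3, 5, 7, 9]  -> every odd index
print(a[::2])    # [0, 2, 4, 6, 8]  -> every even index
print(a[2::2])   # [2, 4, 6, 8]     -> every even index starting at 2
print(a[2:4:2])  # [2]              -> even indices from 2 up to (but not including) 4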
You can also use this notation in your own custom classes to make it do whatever you want
class C(object):
    def __getitem__(self, k):
        return k
# Single argument is passed directly.
assert C()[0] == 0
# Multiple indices generate a tuple.
assert C()[0, 1] == (0, 1)
# Slice notation generates a slice object.
assert C()[1:2:3] == slice(1, 2, 3)
# If you omit any part of the slice notation, it becomes None.
assert C()[:] == slice(None, None, None)
assert C()[::] == slice(None, None, None)
assert C()[1::] == slice(1, None, None)
assert C()[:2:] == slice(None, 2, None)
assert C()[::3] == slice(None, None, 3)
# Tuple with a slice object:
assert C()[:, 1] == (slice(None, None, None), 1)
# Ellipsis class object.
assert C()[...] == Ellipsis
We can then open up slice objects as:
s = slice(1, 2, 3)
assert s.start == 1
assert s.stop == 2
assert s.step == 3
This is notably used in Numpy to slice multi-dimensional arrays in any direction.
Of course, any sane API should use ::3 with the usual "every 3" semantic.
The related Ellipsis is covered further at: What does the Ellipsis object do?
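As a small follow-up, slice objects built this way can be applied directly when subscripting, which is what the extended notation produces under the hood:

data = list(range(20))
every_third = slice(None, None, 3)
print(data[every_third])      # same as data[::3]
print(data[slice(5, 12, 3)])  # same as data[5:12:3] -> [5, 8, 11]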
The third parameter is the step. So [::3] would return every 3rd element of the list/string.
Did I miss it, or has nobody mentioned reversing with [::-1] here?
# Operating System List
systems = ['Windows', 'macOS', 'Linux']
print('Original List:', systems)
# Reversing a list
#Syntax: reversed_list = systems[start:stop:step]
reversed_list = systems[::-1]
# updated list
print('Updated List:', reversed_list)
source:
https://www.programiz.com/python-programming/methods/list/reverse
Python uses the two colons to separate the start, end, and step values.
