Python: Delete random elements in every row

I need to do this quite often in my code, so I wondered if there is a faster way to do it. I have a matrix defined as
import numpy as np
m = 100
n = 10
M = np.random.rand(m,n)
Now I want to "delete" one random element in every row (by setting it to zero). I did the following:
idx2del = np.random.randint(0,n,m)
arr = np.arange(0,m)
M[arr,idx2del] = 0.0
Is there a faster / cleaner way to do that?
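The zero-assignment above is already vectorized (a single fancy-indexing call, no Python loop), so it is probably close to as fast as NumPy gets. If "delete" is meant literally, so that each row loses one entry and the result is m x (n-1), a boolean mask is one way to do it. This is a minimal sketch: `delete_one_per_row` is a hypothetical helper name, and it uses `np.random.default_rng` instead of the legacy `np.random.rand`.

```python
import numpy as np

def delete_one_per_row(M, idx):
    """Return a new (m, n-1) array with column idx[r] removed from each row r."""
    m, n = M.shape
    keep = np.ones((m, n), dtype=bool)
    keep[np.arange(m), idx] = False  # mark exactly one entry per row for removal
    # Boolean indexing returns the kept values in row-major order,
    # and every row keeps n-1 of them, so a reshape restores the rows.
    return M[keep].reshape(m, n - 1)

rng = np.random.default_rng(0)
m, n = 100, 10
M = rng.random((m, n))
idx2del = rng.integers(0, n, m)
M_small = delete_one_per_row(M, idx2del)
print(M_small.shape)  # (100, 9)
```

Each row of `M_small` matches `np.delete(M[r], idx2del[r])`, but without looping over rows in Python.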

Related

improving speed of nested python for-loops which append lists

Inside my nested for-loops I am doing some calculations whose results are appended to lists. In some cases the loops will be huge, generating lists with hundreds of millions or even billions of entries. When the lists get huge, they seem to fill up memory and slow things down.
Does anyone have suggestions for how I can execute the code below faster, or alternative ways to set it up that don't drain memory?
n = 10000  # This number can be much higher
m = 10000  # This number can be much higher
x = []
y = []
a = 1
for i in range(n):  # separate loop names, so n and m are not overwritten
    b = 1
    for j in range(m):
        x.append(a + b)
        y.append(a + b*2)
        b = b + 1
    a = a + 1
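Since x and y depend only on the two loop counters (a runs 1..n, b runs 1..m), one option is to build them with NumPy broadcasting instead of appending. Another is a generator that yields values one at a time, so nothing is stored if each value can be consumed immediately. This is a sketch under assumptions: the function names are hypothetical, and `int32` is assumed to be wide enough for the values. Even with `int32`, 10000 x 10000 entries take about 400 MB per array, so for much larger sizes the lazy version (or processing in chunks) is the one that avoids draining memory.

```python
import numpy as np

def pairs_vectorized(n, m, dtype=np.int32):
    """Build x and y for the loop in the question with NumPy broadcasting."""
    a = np.arange(1, n + 1, dtype=dtype)[:, None]  # outer-loop values, column vector
    b = np.arange(1, m + 1, dtype=dtype)[None, :]  # inner-loop values, row vector
    # (n, m) grids; ravel() flattens row by row, matching the loop order
    x = (a + b).ravel()
    y = (a + 2 * b).ravel()
    return x, y

def pairs_lazy(n, m):
    """Yield (x, y) one pair at a time instead of storing them."""
    for a in range(1, n + 1):
        for b in range(1, m + 1):
            yield a + b, a + 2 * b

x, y = pairs_vectorized(3, 4)
print(x[:4], y[:4])
# Example of consuming the generator without building lists
total = sum(xv + yv for xv, yv in pairs_lazy(3, 4))
```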

Sequentially take a window of rows from array (python)

I have an n x m array and want to take the first 10 rows and perform calculations, then take the next 10 rows and perform calculations, etc. My code below is hard-coded; how can I turn it into a loop?
Code Attempted:
import numpy as np
total = []
x= np.random.random((100,4))
a = np.average(x[:10])
total.append(a)
a = np.average(x[10:20])
total.append(a)
a = np.average(x[20:30])
....
Goal:
for *this array*:
    # do something
    # append value
    # go back and get next 10 values
It looks like you want the following.
import numpy as np
x = np.random.random((100,4))
L = 10
k = 100//L
total = [np.average(x[L*i:L*(i+1)]) for i in range(k)]
If you'd rather implement this with an explicit loop instead of a list comprehension:
import numpy as np
x = np.random.random((100,4))
L = 10
k = 100//L
total = []
for i in range(k):
    total.append(np.average(x[L*i:L*(i+1)]))
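Both versions above hard-code 100 rows. A variant that steps through however many rows the array has, using `range(0, len(x), L)`, is sketched below; `window_averages` is a hypothetical helper name. One assumption worth flagging: if the row count is not a multiple of L, this version keeps the shorter final window rather than dropping it.

```python
import numpy as np

def window_averages(x, L):
    """Average of each consecutive block of L rows (hypothetical helper)."""
    # range(0, len(x), L) gives 0, L, 2L, ...; the last block is shorter
    # if the number of rows is not a multiple of L.
    return [np.average(x[start:start + L]) for start in range(0, len(x), L)]

x = np.random.random((100, 4))
total = window_averages(x, 10)
print(len(total))  # 10
```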
As an alternative, here's an approach using a 3-dimensional reshape.
import numpy as np
x = np.random.random((100,4))
L = 10 # window length
n = x.shape[1] # number of columns
total = x.reshape(-1, L, n).mean(axis=(1, 2))
import numpy as np
x = np.random.random((100,4))
a = 10
b = 100//a
c = 4
You want the average of the first 10 x 4 block, the second 10 x 4 block, and so on, right?
The reshape function is really useful here.
x_splited = x.reshape((-1, a*c))
total = x_splited.mean(axis=1)
This is the answer you need. The reshape call makes the first a*c elements of the original matrix the first row of the new matrix, the next a*c elements the second row, and so on. Then mean(axis=1) averages each row, giving one average per 10-row window.
Also, you could try something like this:
x_splited = x.reshape((-1, a, c))
This 3-D form lets you do more complicated per-window calculations than this question needs.
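For example, with the 3-D reshape you can choose which axes to average over. The sketch below reuses a and c from this answer and assumes the number of rows is a multiple of a; otherwise `reshape` raises a `ValueError`.

```python
import numpy as np

x = np.random.random((100, 4))
a, c = 10, 4  # window length and number of columns, as in the answer above

blocks = x.reshape((-1, a, c))          # shape (10, 10, 4): one 10 x 4 block per window
col_means = blocks.mean(axis=1)         # shape (10, 4): per-window mean of each column
block_means = blocks.mean(axis=(1, 2))  # shape (10,): one mean per window
```

`block_means` gives the same numbers as `x_splited.mean(axis=1)`, while `col_means` keeps the columns separate.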
Just a tip: with NumPy arrays, it is preferable to avoid explicit Python loops over the data, because they are slow compared to vectorized operations.
Second tip: if you are not yet comfortable writing loops in Python, it is worth spending some time practicing them.

Efficiently adding two different sized one dimensional arrays

I want to add two numpy arrays of different sizes, starting at a specific index. As I need to do this a couple of thousand times with large arrays, it needs to be efficient, and I am not sure how to do that without iterating through each cell.
a = [5,10,15]
b = [0,0,10,10,10,0,0]
res = add_arrays(b,a,2)
print(res) => [0,0,15,20,25,0,0]
naive approach:
# b is the bigger array; it is modified in place
def add_arrays(b, a, i):
    for j in range(len(a)):
        b[i+j] += a[j]
    return b
You could copy the smaller array into a zeros array of the same size as the bigger one and then add them. I would do it the following way:
import numpy as np
a = np.array([5,10,15])
b = np.array([0,0,10,10,10,0,0])
z = np.zeros(b.shape,dtype=int)
z[2:2+len(a)] = a # 2 is offset
res = z+b
print(res)
output
[ 0 0 15 20 25 0 0]
Disclaimer: I assume that offset + len(a) is always less than or equal to len(b).
There is nothing wrong with your approach. You cannot get better asymptotic time or space complexity. If you want to reduce the number of code lines (which is not an end in itself), you could use slice assignment and a few built-in utilities:
def add_arrays(b, a, i):
    b[i:i+len(a)] = list(map(sum, zip(b[i:i+len(a)], a)))
    return b
But the functional overhead should make this less performant, if anything.
Some docs: map, sum, zip (Python built-in functions).
This should be faster than Daweo's answer, by about 1.5-5x (depending on the size ratio between a and b):
result = b.copy()
result[offset: offset+len(a)] += a
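Wrapped into a function and run on the arrays from the question, the copy-plus-slice approach might look like the sketch below. `add_at_offset` is a hypothetical name, and the clipping of any part of `a` that runs past the end of `b` is an assumption beyond the original answer, which required offset + len(a) <= len(b).

```python
import numpy as np

def add_at_offset(b, a, offset):
    """Return b + a with a aligned at position offset; b itself is not modified.

    Assumes offset >= 0. Any tail of a that would run past the end of b is dropped.
    """
    result = b.copy()
    k = max(0, min(len(a), len(b) - offset))  # how many elements of a fit
    result[offset:offset + k] += a[:k]
    return result

a = np.array([5, 10, 15])
b = np.array([0, 0, 10, 10, 10, 0, 0])
print(add_at_offset(b, a, 2).tolist())  # [0, 0, 15, 20, 25, 0, 0]
```

If modifying `b` in place is acceptable, dropping the `copy()` saves one allocation per call.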

Improve performance of difference between elements block

I have a rather simple block that computes the absolute difference between selected elements of two arrays.
import numpy as np
# Input data with proper format.
N_bb, N_cc = np.random.randint(1e5), np.random.randint(1e5)
bb = np.random.uniform(0., 1., N_bb)
cc = np.random.uniform(0., 1., N_cc)
# My actual code repeats this process ~500 times.
all_ds = []
for _ in range(500):
    # An index into cc for each element in bb.
    idx_into_cc = np.random.randint(0, len(cc), len(bb))
    # This is the block I need to make faster.
    aa = []
    for i, b in enumerate(bb):
        aa.append(abs(b - cc[idx_into_cc[i]]))
    d = np.median(aa)
    # Use 'd' before the next iteration, and store the result.
    all_ds.append(some_func(d))
I use the absolute difference because I need non-negative values; I could also use a squared difference. The bb and cc arrays stay unchanged during the entire process, but idx_into_cc changes with each iteration.
How can I improve the performance of this code?
We can simply use vectorized (integer-array) indexing to remove the inner loop, like so:
d = np.median(np.abs(bb-cc[idx_into_cc]))
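Dropped into the outer loop, the one-liner replaces the whole inner block. The sketch below keeps the original loop only to check that both give the same median. Assumptions: it uses smaller arrays and `np.random.default_rng` instead of the legacy `np.random` functions, and it stores `d` directly because `some_func` is the question's own, undefined function.

```python
import numpy as np

rng = np.random.default_rng(42)
bb = rng.uniform(0.0, 1.0, 1000)
cc = rng.uniform(0.0, 1.0, 800)

def median_abs_diff_loop(bb, cc, idx):
    # Original element-by-element version, kept only for comparison
    aa = []
    for i, b in enumerate(bb):
        aa.append(abs(b - cc[idx[i]]))
    return np.median(aa)

def median_abs_diff_vec(bb, cc, idx):
    # cc[idx] gathers one element of cc per element of bb in a single call
    return np.median(np.abs(bb - cc[idx]))

all_ds = []
for _ in range(500):
    idx_into_cc = rng.integers(0, len(cc), len(bb))
    d = median_abs_diff_vec(bb, cc, idx_into_cc)
    all_ds.append(d)  # the question applies its own some_func(d) here
```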

Append keeps on appending the same item, does not append the right ones, Python

This is what I have imported:
import random
import matplotlib.pyplot as plt
from math import log, e, ceil, floor
import numpy as np
from numpy import arange,array
import pdb
from random import randint
Here I define the function matrix(p,m)
def matrix(p,m): # A matrix with zeros everywhere, except for the middle entry of each row
    v = [0]*m
    v[(m+1)//2 - 1] = 1  # integer division, so the index is an int in Python 3
    vv = array([v,]*p)
    return vv
ct = np.zeros(5) # Here, I chose 5 because I wanted to work with an example, but it should be p in general
Here I define MHops which basically takes the dimensions of the matrix, the matrix and the vector ct and gives me a new matrix mm and a new vector ct
def MHops(p,m,mm,ct):
    k = 0
    while k < p : # This 'spans' the rows
        i = 0
        while i < m : # This 'spans' the columns
            if mm[k][i] == 0 :
                i+=1
            else:
                R = random.random()
                t = -log(1-R,e) # Calculate time of the hopping
                ct[k] = ct[k] + t
                r = random.random()
                if 0 <= r < 0.5 : # particle hops right
                    if 0 <= i < m-1:
                        mm[k][i] = 0
                        mm[k][i+1] = 1
                        break
                    else:
                        break # Because it is at the boundary
                else: # particle hops left
                    if 0 < i <=m-1:
                        mm[k][i] = 0
                        mm[k][i-1] = 1
                        break
                    else: # Because it is at the boundary
                        break
                break
        k+=1
    return (mm,ct) # Gives me the new matrix showing the new position of the particles and a new vector of times, showing the times taken by each particle to hop
Now I want to iterate this process, but I want to be able to see every step in a list. In short, what I am doing is:
1. Creating a matrix representing a lattice, where 0 means there is no particle in that slot and 1 means there is a particle there.
2. Creating a function MHops which simulates one step of a random walk and gives me the new matrix and a vector ct which shows the times at which the particles move.
Now I want a vector or an array holding 2*n objects, i.e. the matrix mm and the vector ct for each of n iterations. I want them in an array, list or something similar because I need to use them later on.
Here starts my problem:
I create an empty list and use append to add items at every iteration of the while loop. However, the result I get is a list d whose entries are all identical, coming from the last iteration!
My function for the iteration is the following:
def rep_MHops(n,p,m,mm,ct):
    mat = mm
    cct = ct
    d = []
    i = 0
    while i < n :
        y = MHops(p,m,mat,cct) # Calculate the hop, so y is a tuple y = (mm,ct)
        mat = y[0] # I reset mat and cct so that for the next iteration, I go further
        cct = y[1]
        d.append(mat)
        d.append(cct)
        i+=1
    return d
z = rep_MHops(3,5,5,matrix(5,5),ct) #If you check this, it doesn't work
print(z)
However, it doesn't work, and I don't understand why. I call MHops, set the matrix and the vector to the ones MHops returns, and repeat. If you run this code, you will see that each step works, i.e. the vector of times increases and the lattice matrix changes; however, when I append them to d, d is basically a list of identical objects, all equal to the last iteration.
What is my mistake?
Furthermore, if you have any coding advice for this code, it would be more than welcome; I am not sure this is an efficient approach.
To help you understand better: I would like to use the final list d in another function where I first pick a random time T, then check every odd entry (every ct), i.e. every entry of every ct, to see whether these numbers are less than or equal to T. If so, the movement of the particle happened; otherwise it didn't.
From this I will then try to visualize the result with matplotlib, as a histogram or something similar.
Does anyone know how to run this kind of simulation in MATLAB? Do you think it would be easier?
You're passing and storing references, not copies, so on the next iteration of your loop MHops alters the version you previously stored in d. Use import copy; d.append(copy.deepcopy(mat)) to store a copy which won't be altered later. cct is also modified in place, so it needs the same treatment.
Why?
Python passes references to objects rather than copies, and on every loop iteration you're storing a reference to the same matrix object in d.
I had a look through the Python docs, and the only mention I can find is the FAQ entry
"How do I write a function with output parameters (call by reference)?".
Here's a simpler example of your code:
def rep_MHops(mat_init):
    mat = mat_init
    d = []
    for i in range(5):
        mat = MHops(mat)
        d.append(mat)
    return d
def MHops(mat):
    mat[0] += 1
    return mat
mat_init = [10]
z = rep_MHops(mat_init)
print(z)
When run, it prints:
[[15], [15], [15], [15], [15]]
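Strictly speaking, Python passes every object the same way, as a reference; what differs is mutability. A list is mutable, so MHops changes the one shared list in place. An integer is immutable, so mat += 1 creates a new integer object instead of modifying the old one. Here's a slightly modified version of the above example which operates on a single integer: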
def rep_MHops_simple(mat_init):
    mat = mat_init
    d = []
    for i in range(5):
        mat = MHops_simple(mat)
        d.append(mat)
    return d
def MHops_simple(mat):
    mat += 1
    return mat
z = rep_MHops_simple(mat_init=10)
print(z)
When run, it prints:
[11, 12, 13, 14, 15]
which is the behaviour you were expecting.
This Stack Overflow answer to "How do I pass a variable by reference?" explains it very well.
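Applied to the simplified list example, storing a snapshot on every iteration gives the expected history. This is a minimal sketch using `copy.deepcopy` as suggested in this answer. For the NumPy arrays in the original `rep_MHops`, `d.append(mat.copy())` and `d.append(cct.copy())` should be enough, since those arrays hold plain numbers.

```python
import copy

def MHops(mat):
    # Same toy update as above: mutates the list in place
    mat[0] += 1
    return mat

def rep_MHops(mat_init):
    mat = mat_init
    d = []
    for i in range(5):
        mat = MHops(mat)
        # Store an independent snapshot; later in-place updates won't touch it
        d.append(copy.deepcopy(mat))
    return d

print(rep_MHops([10]))  # [[11], [12], [13], [14], [15]]
```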
