Creating fixed length numpy array from variable length lists - python

I have been generating variable-length lists; here is a simplified example:
list_1 = [5] * 5
list_2 = [8] * 10
I then want to convert to np.array for manipulation. As such they need to be the same length (e.g. 1200), with the tail either populated with zeros, or truncated at the target length.
For a fixed length of 8, I considered setting up a zero array then filling the appropriate entries:
np_list_1 = np.zeros(8)
np_list_1[0:5] = list_1[0:5]
np_list_2 = np.zeros(8)
np_list_2[0:8] = list_2[0:8] # longer lists are truncated
I've created the following function to produce these
def get_np_fixed_length(list_like, length):
    list_length = len(list_like)
    np_array = np.zeros(length)
    if list_length <= length:
        np_array[0:list_length] = list_like[:]
    else:
        np_array[:] = list_like[0:length]
    return np_array
Is there a more efficient way to do this? (I couldn't see anything in numpy documentation)

You did the job already. You may save some lines by using min function.
np_array = np.zeros(length)
l = min(length, len(list_like))
np_array[:l] = list_like[:l]
There is a slightly more elegant way: first create an empty array, fill it with the data, and then fill the tail with zeros.
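A minimal sketch of that np.empty variant, reusing the function name from the question. The helper variable n is only illustrative, and this is one possible reading of the suggestion rather than a definitive implementation:

```python
import numpy as np

def get_np_fixed_length(list_like, length):
    # np.empty allocates without initializing, so every slot must be written
    np_array = np.empty(length)
    n = min(length, len(list_like))
    np_array[:n] = list_like[:n]   # copy the data (truncated if too long)
    np_array[n:] = 0               # zero-fill whatever tail remains
    return np_array

padded = get_np_fixed_length([5] * 5, 8)
truncated = get_np_fixed_length([8] * 10, 8)
```

Whether this is measurably faster than np.zeros depends on the array size; for short arrays the difference is likely negligible.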

Related

Creating data in loop subject to moving condition

I am trying to create a list of data in a for loop then store this list in a list if it satisfies some condition. My code is
R = 10
lam = 1
proc_length = 100
L = 1
#Empty list to store lists
exponential_procs_lists = []
for procs in range(0,R):
    #Draw exponential random variables
    z_exponential = np.random.exponential(lam,proc_length)
    #Sort values to increase
    z_exponential.sort()
    #Insert 0 at start of list
    z_dat_r = np.insert(z_exponential,0,0)
    sum = np.sum(np.diff(z_dat_r))
    if sum < 5*L:
        exponential_procs_lists.append(z_dat_r)
which will store those of the R lists that satisfy the sum < 5*L condition. My question is, what is the best way to store R lists where the sum of each list is less than 5*L? The lists can be of different lengths, but they must satisfy the condition that the sum of the increments is less than 5*L. Any help much appreciated.
Okay so based on your comment, I take that you want to generate an exponential_procs_list, inside which every sublist has a sum < 5*L.
Well, I modified your code to chop the sublists as soon as the sum exceeds 5*L.
Edit : See answer history to see my last answer for the approach above.
Well looking closer, notice you don't actually need the discrete difference array. You're finding the difference array, summing it up and checking whether the sum's < 5L and if it is, you append the original array.
But notice this:
If your array is like so: [0, 0.00760541, 0.22281415, 0.60476231], its difference array would be [0.00760541, 0.21520874, 0.38194816].
If you add the first x terms of the difference array, you get the (x+1)th element of the original array (because the array starts at 0). So you really just need to keep the elements which are less than 5*L:
import numpy as np
R = 10
lam = 1
proc_length = 5
L = 1
exponential_procs_lists = []
def chop(nums, target):
    good_list = []
    for num in nums:
        if num >= target:
            break
        good_list.append(num)
    return good_list
for procs in range(0,R):
    z_exponential = np.random.exponential(lam,proc_length)
    z_exponential.sort()
    z_dat_r = np.insert(z_exponential,0,0)
    good_list = chop(z_dat_r, 5*L)
    exponential_procs_lists.append(good_list)
You could probably also just do a binary search (for better time complexity) or use filter with a lambda; that's up to you.
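As a rough sketch of the binary-search idea, np.searchsorted can locate the cut point in a sorted array. The function name chop_sorted is hypothetical, and the input is assumed to be sorted in increasing order, as z_dat_r is above:

```python
import numpy as np

def chop_sorted(nums, target):
    # Assumes nums is sorted ascending. side="left" returns the first index
    # whose value is >= target, so everything before it is < target.
    nums = np.asarray(nums)
    cut = np.searchsorted(nums, target, side="left")
    return nums[:cut]

kept = chop_sorted([0.0, 0.5, 1.2, 5.0, 7.3], 5)
```

This returns a NumPy slice rather than a Python list, so call .tolist() on the result if a list is needed.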

Speed up search of array element in second array

I have a pretty simple operation involving two not so large arrays:
For every element in the first (larger) array, located in position i
Find if it exists in the second (smaller) array
If it does, find its index in the second array: j
Store a float taken from a third array (same length as first array) in the position i, in the position j of a fourth array (same length as second array)
The for block below works, but gets very slow for not so large arrays (>10000).
Can this implementation be made faster?
import numpy as np
import random
##############################################
# Generate some random data.
# 'Nb' is always smaller than 'Na'
Na, Nb = 50000, 40000
# List of IDs (could be any string, I use integers here for simplicity)
ids_a = random.sample(range(1, Na * 10), Na)
ids_a = [str(_) for _ in ids_a]
random.shuffle(ids_a)
# Some floats associated to these IDs
vals_in_a = np.random.uniform(0., 1., Na)
# Smaller list of repeated IDs from 'ids_a'
ids_b = random.sample(ids_a, Nb)
# Array to be filled
vals_in_b = np.zeros(Nb)
##############################################
# This block needs to be *a lot* more efficient
#
# For each string in 'ids_a'
for i, id_a in enumerate(ids_a):
    # if it exists in 'ids_b'
    if id_a in ids_b:
        # find where in 'ids_b' this element is located
        j = ids_b.index(id_a)
        # store in that position the value taken from 'vals_in_a'
        vals_in_b[j] = vals_in_a[i]
In defense of my approach, here is the authoritative implementation:
import itertools as it
def pp():
    la,lb = len(ids_a),len(ids_b)
    ids = np.fromiter(it.chain(ids_a,ids_b),'<S6',la+lb)
    unq,inv = np.unique(ids,return_inverse=True)
    vals = np.empty(la,vals_in_a.dtype)
    vals[inv[:la]] = vals_in_a
    return vals[inv[la:]]
(juanpa()==pp()).all()
# True
timeit(juanpa,number=100)
# 3.1373191522434354
timeit(pp,number=100)
# 2.5256317732855678
That said, @juanpa.arrivillaga's suggestion can also be implemented better:
import operator as op
def ja():
    return op.itemgetter(*ids_b)(dict(zip(ids_a,vals_in_a)))
(ja()==pp()).all()
# True
timeit(ja,number=100)
# 2.015202699229121
I tried the approaches by juanpa.arrivillaga and Paul Panzer. The first one is the fastest by far. It is also the simplest. The second one is faster than my original approach, but considerably slower than the first one. It also has the drawback that the line vals[inv_a] = vals_in_a stores floats into a U5 array, thus converting them into strings. It can be converted back to floats at the end, but I lose digits (unless I'm missing something obvious, of course).
Here are the implementations:
def juanpa():
    dict_ids_b = {_: i for i, _ in enumerate(ids_b)}
    for i, id_a in enumerate(ids_a):
        try:
            vals_in_b[dict_ids_b[id_a]] = vals_in_a[i]
        except KeyError:
            pass
    return vals_in_b
def Paul():
    # 1) concatenate ids_a and ids_b
    ids_ab = ids_a + ids_b
    # 2) apply np.unique with keyword return_inverse=True
    vals, idxs = np.unique(ids_ab, return_inverse=True)
    # 3) split the inverse into inv_a and inv_b
    inv_a, inv_b = idxs[:len(ids_a)], idxs[len(ids_a):]
    # 4) map the values to match the order of uniques: vals[inv_a] = vals_in_a
    vals[inv_a] = vals_in_a
    # 5) use inv_b to pick the correct values: result = vals[inv_b]
    vals_in_b = vals[inv_b].astype(float)
    return vals_in_b
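The loss of digits described above seems to come from writing the floats into the string array that np.unique returns. A minimal sketch that keeps the same np.unique idea but writes into a separate float array (as pp() does) might look like this. The function name lookup_via_unique and the tiny sample data are assumptions for illustration only:

```python
import numpy as np

def lookup_via_unique(ids_a, vals_in_a, ids_b):
    la = len(ids_a)
    # Unique IDs over both lists; inv maps each input position to its unique slot
    unq, inv = np.unique(np.array(list(ids_a) + list(ids_b)), return_inverse=True)
    # Separate float buffer, so the values are never cast to strings
    vals = np.zeros(unq.size, dtype=float)
    vals[inv[:la]] = vals_in_a
    # IDs in ids_b that are missing from ids_a keep the default 0.0
    return vals[inv[la:]]

result = lookup_via_unique(["a", "b", "c", "d"], [0.1, 0.2, 0.3, 0.4], ["c", "a"])
```

This does not change the timing comparison above; the dictionary approach was still reported as the fastest.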

Python: Updating a list or array within a while loop

I have a numpy array of numpy arrays (would be happy to work with a list of numpy arrays), and I want to edit the overall array. More specifically, I check if arrays (within the larger array) share values, and if they do, I remove the shared values from the smaller array.
The issue I'm having is that when I try to reinsert the modified arrays into the all encompassing array, the final output when the while loop is finished does not remember the updated modules.
I believe this has something to do with Python's nuances of copy/view semantics, and that when I access element i or j of the overall array, I'm making a new object within the while loop rather than editing the element within the larger array. However, I'm happy to admit I don't fully understand this and definitely can't think of an alternative despite hours of trying.
#Feature_Modules is an array (or list) of number arrays, each containing a set of integers
i = 0
j = 0
while i < Feature_Modules.shape[0]: # Check element i against every other element j
    if i != j:
        Ref_Module = Feature_Modules[i]
        while j < Feature_Modules.shape[0]:
            if i != j:
                Query_Module = Feature_Modules[j]
                if np.array_equal(np.sort(Ref_Module),np.sort(Query_Module)) == 1: # If modules contain exactly the same integers, delete one of this. This bit actually works and is outputted at the end.
                    Feature_Modules = np.delete(Feature_Modules,j)
                Shared_Features = np.intersect1d(Ref_Module, Query_Module)
                if Shared_Features.shape[0] > 0 and np.array_equal(np.sort(Ref_Module),np.sort(Query_Module)) == 0: # If the modules share elements, remove the shared elements from the smaller module. This is the bit that isn't outputted in the final Feature_Modules object.
                    Module_Cardinalities = np.array([Ref_Module.shape[0],Query_Module.shape[0]])
                    Smaller_Group = np.where(Module_Cardinalities == np.min(Module_Cardinalities))[0][0]
                    New_Groups = np.array([Ref_Module,Query_Module])
                    New_Groups[Smaller_Group] = np.delete(New_Groups[Smaller_Group],np.where(np.isin(New_Groups[Smaller_Group],Shared_Features) == 1))
                    Feature_Modules = Feature_Modules.copy()
                    Feature_Modules[i] = New_Groups[0] # Replace the current module of Feature_Modules with the new module (Isn't outputted at end of while loops)
                    Feature_Modules[j] = New_Groups[1] # Replace the current module of Feature_Modules with the new module (Isn't outputted at end of while loops)
                else:
                    j = j + 1
            else:
                j = j + 1
    else:
        i = i + 1
    i = i + 1
So if we use this small data set as an example,
Feature_Modules = np.array([np.array([1,2,3,4,5,6,7,8]),np.array([9,10,1,2,3,4]), np.array([20,21,22,23])])
The new Feature_Modules should be;
Feature_Modules = np.array([np.array([1,2,3,4,5,6,7,8]), np.array([9,10]), np.array([20,21,22,23])])
since the shared values in arrays [0] and [1] were removed from [1], as it was the smaller array.
I would suggest taking a more python X numpy approach to the code:
import numpy as np
# dtype=object is required for ragged (unequal-length) input on recent NumPy versions
Feature_Modules = np.array([np.array([1,2,3,4,5,6,7,8]), np.array([9,10,1,2,3,4]), np.array([20,21,22,23])], dtype=object)
for n1,arr1 in enumerate(Feature_Modules[:-1]):
    l1 = len(arr1)
    for n2,arr2 in enumerate(Feature_Modules[n1+1:]):
        l2 = len(arr2)
        intersect, ind1, ind2 = np.intersect1d(arr1, arr2, return_indices=True)
        if len(intersect) == 0:
            continue
        if l1 > l2:
            # n2 is relative to the slice, so shift it back to the full index
            Feature_Modules[n2+n1+1] = np.delete(arr2, ind2)
        else:
            Feature_Modules[n1] = np.delete(arr1, ind1)
# [array([1, 2, 3, 4, 5, 6, 7, 8]) array([ 9, 10]) array([20, 21, 22, 23])]
EDIT:
This code edits the original array in place, so that it keeps track of the arrays that have already had elements removed. If you want to leave the original array untouched, just make a copy of it:
copy = np.array(original)
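Since the question mentions being happy to work with a list of arrays, here is a minimal sketch of the same idea on a plain Python list that leaves the input untouched. The function name remove_shared is hypothetical, and unlike the loop above it re-reads the current arrays on every comparison, which is a small variation rather than the answer's exact method:

```python
import numpy as np

def remove_shared(modules):
    # Work on a new list; entries are replaced, never mutated in place
    result = [np.asarray(m) for m in modules]
    for i in range(len(result) - 1):
        for j in range(i + 1, len(result)):
            shared = np.intersect1d(result[i], result[j])
            if shared.size == 0:
                continue
            # Strip the shared values from the smaller array
            # (on a tie, from the first one, as in the else branch above)
            if len(result[i]) > len(result[j]):
                result[j] = result[j][~np.isin(result[j], shared)]
            else:
                result[i] = result[i][~np.isin(result[i], shared)]
    return result

modules = [np.arange(1, 9), np.array([9, 10, 1, 2, 3, 4]), np.array([20, 21, 22, 23])]
cleaned = remove_shared(modules)
```

A list also sidesteps the ragged-array question entirely, because no object array is ever built.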

Python set() and list() functions together

Looking through some python code I encountered this line:
x = list(set(range(height)) - set(array))
where array is just an int array. It's guaranteed that array's len is less than height.
Could someone please explain to me how it works?
Thanks!
Let's take sample values and see what's happening here
height = 5 # height has to be an integer for range()
array = [1,1,2,3]
x = list(set(range(height)) - set(array))
print(x) # [0,4]
Let's break it down into smaller pieces of code
height = 5
array = [1,1,2,3]
a = range(height) # Produces 0,1,2,3,4 (a list in Python 2, a range object in Python 3)
a_set = set(a) # Converts a into a set {0,1,2,3,4}
b_set = set(array) # Converts array into a set {1,2,3}
x_set = a_set - b_set # Set difference A-B, i.e., removes the elements of B from A: {0,4}
x = list(x_set) # Converts it into a list [0,4]
print(x) # [0,4]
I am guessing height is an int.
Your script returns a list of all distinct values from 0 to height (height not included) that do not appear in your array.
range(height) yields the integers from 0 up to, but not including, height.
set() of a collection removes all duplicate elements.
Subtracting one set from another is called the difference; it returns the elements of the first set that are not in the second.
The result is cast back to a list and assigned to the variable x.
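To make the distinction concrete, here is a small sketch contrasting the difference (-) with the symmetric difference (^). The sample values, including the extra 7, are made up for illustration, and sorted() is used because sets do not guarantee any ordering:

```python
height = 5
array = [1, 1, 2, 3, 7]

# Difference: values in range(height) that are absent from array
missing = sorted(set(range(height)) - set(array))

# Symmetric difference: values that appear in exactly one of the two sets
either_only = sorted(set(range(height)) ^ set(array))
```

Here missing is [0, 4], while either_only also picks up the 7 that appears only in array.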

How do I get an empty list of any size in Python?

I basically want a Python equivalent of this Array in C:
int a[x];
but in python I declare an array like:
a = []
and the problem is I want to assign random slots with values like:
a[4] = 1
but I can't do that with Python, since the Python list is empty (of length 0).
If by "array" you actually mean a Python list, you can use
a = [0] * 10
or
a = [None] * 10
You can't do exactly what you want in Python (if I read you correctly). You need to put values in for each element of the list (or as you called it, array).
But, try this:
a = [0 for x in range(N)] # N = size of list you want
a[i] = 5 # as long as i < N, you're okay
For lists of other types, use something besides 0. None is often a good choice as well.
You can use numpy:
import numpy as np
Example from Empty Array:
np.empty([2, 2])
array([[ -9.74499359e+001, 6.69583040e-309],
[ 2.13182611e-314, 3.06959433e-309]])
You can also grow a list with its extend method:
a= []
a.extend([None]*10)
a.extend([None]*20)
Just declare the list and append each element. For ex:
a = []
a.append('first item')
a.append('second item')
If you (or other searchers of this question) were actually interested in creating a contiguous array to fill with integers, consider bytearray and memoryview:
# cast() is available starting Python 3.3
size = 10**6
ints = memoryview(bytearray(size)).cast('i')
ints.contiguous, ints.itemsize, ints.shape
# (True, 4, (250000,))
ints[0]
# 0
ints[0] = 16
ints[0]
# 16
It is also possible to create an empty array with a certain size:
array = [[] for _ in range(n)] # n equal to your desired size
array[0].append(5) # it appends 5 to an empty list, then array[0] is [5]
If you define it as array = [[]] * n instead, then modifying one item changes all of them, because every slot refers to the same mutable list.
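A short sketch of that aliasing pitfall. The names shared and independent are only illustrative:

```python
n = 3

# Multiplying copies the reference, so all n slots point at one list
shared = [[]] * n
shared[0].append(5)

# The comprehension builds a fresh list for every slot
independent = [[] for _ in range(n)]
independent[0].append(5)

same_object = shared[0] is shared[1]
```

After this runs, shared is [[5], [5], [5]] while independent is [[5], [], []].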
x=[]
for i in range(0,5):
    x.append(i)
    print(x[i])
If you actually want a C-style array:
import array
x = 10 # desired number of elements
a = array.array('i', x * [0])
a[3] = 5
try:
    a[5] = 'a'
except TypeError:
    print('integers only allowed')
Note that there's no concept of an uninitialized variable in Python. A variable is a name bound to a value, so it always refers to some object. In the example above the array is initialized with zeros.
However, this is uncommon in python, unless you actually need it for low-level stuff. In most cases, you are better-off using an empty list or empty numpy array, as other answers suggest.
The (I think only) way to assign "random slots" is to use a dictionary, e.g.:
a = {} # initialize empty dictionary
a[4] = 1 # define the entry for index 4 to be equal to 1
a['French','red'] = 'rouge' # the entry for index (French,red) is "rouge".
This can be handy for "quick hacks", and the lookup overhead is irrelevant if you don't have intensive access to the array's elements.
Otherwise, it will be more efficient to work with pre-allocated (e.g., numpy) arrays of fixed size, which you can create with a = np.empty(10) (for a non-initialized vector of length 10) or a = np.zeros([5,5]) (for a 5x5 matrix initialized with zeros).
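A minimal sketch of the pre-allocated route, assuming integer slots are wanted. The sizes and names are illustrative:

```python
import numpy as np

# Zero-initialized vector: any slot can be assigned immediately
a = np.zeros(10, dtype=int)
a[4] = 1

# np.empty leaves the contents arbitrary, so write before reading
buf = np.empty(10)
buf[:] = 0.0
buf[7] = 2.5

# 5x5 matrix initialized with zeros
grid = np.zeros([5, 5])
grid[2, 3] = 1.0
```

np.zeros is usually the safer default; np.empty only pays off when every element is going to be overwritten anyway.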
Remark: in your C example, you also have to allocate the array (your int a[x];) before assigning a (not so) "random slot" (namely, integer index between 0 and x-1).
References:
The dict datatype: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
Function np.empty(): https://numpy.org/doc/stable/reference/generated/numpy.empty.html
