Python set() and list() functions together - python

Looking through some python code I encountered this line:
x = list(set(range(height)) - set(array))
where array is just an int array. It's guaranteed that array's len is less than height.
Could someone please explain how it works?
Thanks!

Let's take sample values and see what's happening here
height = 5 # height has to be an integer for range()
array = [1,1,2,3]
x = list(set(range(height)) - set(array))
print(x) # [0,4]
Let's break it down into smaller pieces of code
height = 5
array = [1,1,2,3]
a = range(height) # Yields 0,1,2,3,4 (a list in Python 2, a lazy range object in Python 3)
a_set = set(a) # Converts a into a set {0,1,2,3,4}
b_set = set(array) # Converts array into a set {1,2,3}; the duplicate 1 is dropped
x_set = a_set - b_set # Set difference A-B, i.e. removes the elements of B from A: {0,4}
x = list(x_set) # Converts it into a list [0,4]
print(x) # [0,4]

I am guessing height is an int.
Your script returns a list of all distinct values from 0 up to height (height not included) that do not appear in your array.

range(n) yields the integers from 0 up to, but not including, n.
set() of a collection removes all duplicate elements.
You can subtract two sets, which is called the difference; it returns the elements of the first set that are not in the second.
The result is cast back to a list and assigned to the variable.
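As a small illustration of the points above, here is a sketch that reuses the sample values from the first answer. The names `all_rows`, `used`, `missing` and `either_only` are made up for this example; the symmetric difference (`^`) is shown only for contrast with the plain difference (`-`).

```python
height = 5
array = [1, 1, 2, 3]

all_rows = set(range(height))    # {0, 1, 2, 3, 4}
used = set(array)                # {1, 2, 3}; the duplicate 1 collapses

# Difference: elements of all_rows that are not in used.
missing = all_rows - used

# Symmetric difference, for comparison: elements in exactly one of the sets.
either_only = all_rows ^ {3, 7}

# Sets are unordered, so sort when the order of the result matters.
x = sorted(missing)
print(x)                    # [0, 4]
print(sorted(either_only))  # [0, 1, 2, 4, 7]
```

Using `sorted()` instead of `list()` is a design choice here: it makes the output order explicit rather than relying on how a set happens to iterate.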

Related

Speed up search of array element in second array

I have a pretty simple operation involving two not so large arrays:
For every element in the first (larger) array, located in position i
Find if it exists in the second (smaller) array
If it does, find its index in the second array: j
Store a float taken from a third array (same length as first array) in the position i, in the position j of a fourth array (same length as second array)
The for block below works, but gets very slow for not so large arrays (>10000).
Can this implementation be made faster?
import numpy as np
import random
##############################################
# Generate some random data.
# 'Nb' is always smaller than 'Na'
Na, Nb = 50000, 40000
# List of IDs (could be any string, I use integers here for simplicity)
ids_a = random.sample(range(1, Na * 10), Na)
ids_a = [str(_) for _ in ids_a]
random.shuffle(ids_a)
# Some floats associated to these IDs
vals_in_a = np.random.uniform(0., 1., Na)
# Smaller list of repeated IDs from 'ids_a'
ids_b = random.sample(ids_a, Nb)
# Array to be filled
vals_in_b = np.zeros(Nb)
##############################################
# This block needs to be *a lot* more efficient
#
# For each string in 'ids_a'
for i, id_a in enumerate(ids_a):
    # if it exists in 'ids_b'
    if id_a in ids_b:
        # find where in 'ids_b' this element is located
        j = ids_b.index(id_a)
        # store in that position the value taken from 'vals_in_a'
        vals_in_b[j] = vals_in_a[i]
In defense of my approach, here is the authoritative implementation:
import itertools as it
def pp():
    la,lb = len(ids_a),len(ids_b)
    ids = np.fromiter(it.chain(ids_a,ids_b),'<S6',la+lb)
    unq,inv = np.unique(ids,return_inverse=True)
    vals = np.empty(la,vals_in_a.dtype)
    vals[inv[:la]] = vals_in_a
    return vals[inv[la:]]
(juanpa()==pp()).all()
# True
timeit(juanpa,number=100)
# 3.1373191522434354
timeit(pp,number=100)
# 2.5256317732855678
That said, @juanpa.arrivillaga's suggestion can also be implemented better:
import operator as op
def ja():
    return op.itemgetter(*ids_b)(dict(zip(ids_a,vals_in_a)))
(ja()==pp()).all()
# True
timeit(ja,number=100)
# 2.015202699229121
I tried the approaches by juanpa.arrivillaga and Paul Panzer. The first one is the fastest by far. It is also the simplest. The second one is faster than my original approach, but considerably slower than the first one. It also has a drawback: the line vals[inv_a] = vals_in_a stores floats into a U5 array, thus converting them into strings. They can be converted back to floats at the end, but I lose digits (unless I'm missing something obvious, of course).
Here are the implementations:
def juanpa():
    dict_ids_b = {_: i for i, _ in enumerate(ids_b)}
    for i, id_a in enumerate(ids_a):
        try:
            vals_in_b[dict_ids_b[id_a]] = vals_in_a[i]
        except KeyError:
            pass
    return vals_in_b
def Paul():
    # 1) concatenate ids_a and ids_b
    ids_ab = ids_a + ids_b
    # 2) apply np.unique with keyword return_inverse=True
    vals, idxs = np.unique(ids_ab, return_inverse=True)
    # 3) split the inverse into inv_a and inv_b
    inv_a, inv_b = idxs[:len(ids_a)], idxs[len(ids_a):]
    # 4) map the values to match the order of uniques: vals[inv_a] = vals_in_a
    vals[inv_a] = vals_in_a
    # 5) use inv_b to pick the correct values: result = vals[inv_b]
    vals_in_b = vals[inv_b].astype(float)
    return vals_in_b
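For reference, the dictionary idea discussed in this thread can be sketched on a tiny data set. This is a minimal illustration, not a benchmark: `fill_from_lookup` is a hypothetical helper name, the sample IDs and values are made up, and it assumes every ID is unique within its list, as in the question's `random.sample` setup.

```python
def fill_from_lookup(ids_a, vals_in_a, ids_b, default=0.0):
    """For each ID in ids_b, return the value paired with that ID in ids_a."""
    # Build the ID -> value mapping once; each lookup is then O(1) on average,
    # instead of the O(len(ids_b)) scan done by `in` and `.index()` on a list.
    value_by_id = dict(zip(ids_a, vals_in_a))
    # IDs absent from ids_a get `default`, mirroring the zero-filled array.
    return [value_by_id.get(id_b, default) for id_b in ids_b]

# Hypothetical sample data (not from the question).
ids_a = ["7", "3", "9", "1"]
vals_in_a = [0.7, 0.3, 0.9, 0.1]
ids_b = ["9", "7", "5"]

vals_in_b = fill_from_lookup(ids_a, vals_in_a, ids_b)
print(vals_in_b)  # [0.9, 0.7, 0.0]
```

The result is a plain list; wrapping it in `np.array(...)` would give the NumPy array the question works with.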

Displaying possible elements from list of list within the range of input floats

Let's say I have a list of lists:
L = [['10.2','9.1','G'],['12.9','7.4','H'],['5.6','4.3','G'],['5.7','4.5','G']]
where the letter at the end of each inner list represents something like a 'type'.
The program asks the user for four float numbers separated by ':', for example:
input = 5.5:4.4:5.7:4.7
The third element of each inner list is its type, and the input should only be compared against the inner lists of type 'G'.
The program should then output the inner lists whose numbers are in range of the user input. So,
input = 5.5:4.4:5.7:4.6
output = [5.6,4.3] and [5.7,4.5]
Note: the input consists of four float numbers separated by ':', and we can assume the first half is one set (5.5:4.4) and the second half is another (5.7:4.6).
I gave it a try, but I don't know how to output the lists that are within range of the input.
L = [['10.2','9.1','G'],['12.9','7.4','H'],['5.6','4.3','G'],['5.8','4.5','G']]
userinput = input("Enter floats:") #example 5.5:4.4:5.7:4.6
strSeparate = userinput.split(':')
floatInput = [float(i) for i in strSeparate] #turn input into float
inputList = [floatInput[:2],floatInput[2:]] #[[5.5,4.4],[5.7,4.6]]
for line in L:
    for val in inputList:#???
output format would be:
[[5.6,4.3],[5.7,4.5]]
You can do it as shown below.
First the user input is split on the :, the values are converted to floats, and an iterator is created to help pair the values with zip(). Then each pair is compared with the ranges in L. A pair lies within the range if both of its values lie between the upper and lower values of the range. Any pair that lies within the range is added to the results list.
L = [['10.2','9.1','G'],['12.9','7.4','H'],['5.6','4.3','G'],['5.8','4.5','G']]
inputs = [float(s) for s in '5.5:4.4:5.7:4.6'.split(':')]
it = iter(inputs)
results = []
for pair in zip(it, it):
    for line in L:
        if line[2] == 'G':
            upper = float(line[0])
            lower = float(line[1])
            if ((lower <= pair[0] <= upper) and
                    (lower <= pair[1] <= upper)):
                results.append([upper, lower])
print(results)
This will output:
[[5.6, 4.3], [5.8, 4.5]]
Note that this code will include duplicate values in results if more than one input pair falls within a range. If this is not wanted you can use a set instead of a list for results, and add tuples to the set (because lists are not hashable).
Also this assumes that the upper and lower bounds for each sub list in L are in order (upper then lower). If that's not the case you can do this:
upper = float(line[0])
lower = float(line[1])
lower, upper = (lower, upper) if lower <= upper else (upper, lower)
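The two notes above (collecting results in a set of tuples, and not relying on the order of the bounds) can be combined in one sketch. This is only an illustration under the same assumptions as the answer: only type 'G' rows are considered, and the names `pairs` and `matches` are made up for the example.

```python
L = [['10.2', '9.1', 'G'], ['12.9', '7.4', 'H'],
     ['5.6', '4.3', 'G'], ['5.8', '4.5', 'G']]
inputs = [float(s) for s in '5.5:4.4:5.7:4.6'.split(':')]

# Pair up consecutive values: [(5.5, 4.4), (5.7, 4.6)]
pairs = list(zip(inputs[0::2], inputs[1::2]))

matches = set()  # a set of tuples, so a range matched twice is stored once
for first, second in pairs:
    for a, b, kind in L:
        if kind != 'G':
            continue
        # sorted() puts the bounds in (lower, upper) order whatever the input order.
        lower, upper = sorted((float(a), float(b)))
        if lower <= first <= upper and lower <= second <= upper:
            matches.add((upper, lower))

results = sorted(matches)
print(results)  # [(5.6, 4.3), (5.8, 4.5)]
```

Sorting the set at the end is optional; it is done here only so the printed order is deterministic.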
The solution using numpy.arange() and numpy.any() functions:
import numpy as np
L = [['10.2','9.1','G'],['12.9','7.4','H'],['5.6','4.3','G'],['5.7','4.5','G']]
userinput = "5.5:4.4:5.7:4.6" #example 5.5:4.4:5.7:4.6
floatInput = [float(i) for i in userinput.split(':')] #turn input into float
result = []
for i in (floatInput[0:2], floatInput[2:]):
    r = np.arange(i[1], i[0], 0.1) # generating float numbers range
    items = [l[0:2] for l in L
             if isinstance(np.any([r[r >= float(l[0])], r[r >= float(l[1])]]), np.ndarray)
             and l[0:2] not in result]
    if (items): result.extend(items)
print(result)
The output:
[['5.6', '4.3'], ['5.7', '4.5']]

how to get the maximum value from a specific portion of a array in python

I have a specific scenario in which I need to scan a portion of an array for its maximum value and return the position of that value with regard to the entire array.
for example
searchArray = [10,20,30,40,50,60,100,80,90,110]
I want to scan for the max value in portion 3 to 8, (40,50,60,100,80,90)
and then return the location of that value.
So in this case the max value is 100 and its location is 6.
Is there a way to get that using Python alone or with the help of numpy?
First slice your list and then use index on the max function:
searchArray = [10,20,30,40,50,60,100,80,90,110]
slicedArray = searchArray[3:9]
print(slicedArray.index(max(slicedArray)) + 3)
This returns the index within the sliced array, plus the offset of the slice start (3).
Try this...Assuming you want the index of the max in the whole list -
import numpy as np
searchArray = [10,20,30,40,50,60,100,80,90,110]
start_index = 3
end_index = 8
print (np.argmax(searchArray[start_index:end_index+1]) + start_index)
Use enumerate to get the (index, value) pairs (it is actually a lazy iterator, so it produces one pair at a time rather than building a whole list of tuples), then use max with a custom key function to find the greatest value:
searchArray = [10,20,30,40,50,60,100,80,90,110]
lower_bound = 3 # the lower bound is inclusive, i.e. element 3 is the first checked one
upper_bound = 9 # the upper bound is exclusive, i.e. element 8 (9-1) is the last checked one
max_index, max_value = max(enumerate(searchArray[lower_bound:upper_bound], lower_bound),
                           key=lambda x: x[1])
print(max_index, max_value)
# output: 6 100
See this code running on ideone.com
I'd do it like this:
sliced = searchArray[3:9]
m = max(sliced)
pos = sliced.index(m) + 3
I've added an offset of 3 to the position to give you the true index in the unmodified list.
With itemgetter:
from operator import itemgetter
pos = max(enumerate(searchArray[3:9], 3), key=itemgetter(1))[0]
I guess this is what you want:
maxVal = max(searchArray[3:9]) # max element in positions 3 to 8
position = searchArray.index(maxVal, 3, 9) # its index in the whole list, searching only positions 3 to 8
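The answers above can be wrapped into a small reusable helper. This is a sketch in plain Python; `argmax_in_range` is a hypothetical name, and it follows the slice convention (start inclusive, stop exclusive), so "portion 3 to 8" becomes `start=3, stop=9`.

```python
def argmax_in_range(values, start, stop):
    """Return the index, in the whole list, of the largest value in values[start:stop]."""
    if not 0 <= start < stop <= len(values):
        raise ValueError("need 0 <= start < stop <= len(values)")
    # Iterate over the indexes themselves, so no offset has to be added back
    # and no copy of the slice is made. On ties, the first index wins.
    return max(range(start, stop), key=values.__getitem__)

searchArray = [10, 20, 30, 40, 50, 60, 100, 80, 90, 110]
pos = argmax_in_range(searchArray, 3, 9)
print(pos, searchArray[pos])  # 6 100
```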

Creating fixed length numpy array from variable length lists

I have been generating variable length lists, a simplified example.
list_1 = [5] * 5
list_2 = [8] * 10
I then want to convert to np.array for manipulation. As such they need to be the same length (e.g. 1200), with the tail either populated with zeros, or truncated at the target length.
For a fixed length of 8, I considered setting up a zero array then filling the appropriate entries:
np_list_1 = np.zeros(8)
np_list_1[0:5] = list_1[0:5]
np_list_2 = np.zeros(8)
np_list_2[0:8] = list_2[0:8] # longer lists are truncated
I've created the following function to produce these
def get_np_fixed_length(list_like, length):
    list_length = len(list_like)
    np_array = np.zeros(length)
    if list_length <= length:
        np_array[0:list_length] = list_like[:]
    else:
        np_array[:] = list_like[0:length]
    return np_array
Is there a more efficient way to do this? (I couldn't see anything in numpy documentation)
You did the job already. You may save some lines by using the min function.
np_array = np.zeros(length)
l = min(length, len(list_like))
np_array[:l] = list_like[:l]
A slightly more elegant way is to first create an empty array, fill in the values, and then fill the tail with zeros.
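The answer does not show code for that last variant, so the following is only one possible reading of it: allocate with `np.empty`, copy the values, then zero the tail. The function name `to_fixed_length` is made up for this sketch.

```python
import numpy as np

def to_fixed_length(list_like, length):
    """Pad with zeros or truncate so the result has exactly `length` entries."""
    n = min(length, len(list_like))
    out = np.empty(length)   # uninitialised memory: contents are arbitrary
    out[:n] = list_like[:n]  # copy as many values as fit
    out[n:] = 0.0            # zero the tail (a no-op when n == length)
    return out

print(to_fixed_length([5] * 5, 8))   # [5. 5. 5. 5. 5. 0. 0. 0.]
print(to_fixed_length([8] * 10, 8))  # [8. 8. 8. 8. 8. 8. 8. 8.]
```

Whether this is measurably faster than starting from `np.zeros` would need to be timed for the array sizes in question.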

How do I get an empty list of any size in Python?

I basically want a Python equivalent of this Array in C:
int a[x];
but in python I declare an array like:
a = []
and the problem is I want to assign random slots with values like:
a[4] = 1
but I can't do that with Python, since the Python list is empty (of length 0).
If by "array" you actually mean a Python list, you can use
a = [0] * 10
or
a = [None] * 10
You can't do exactly what you want in Python (if I read you correctly). You need to put values in for each element of the list (or as you called it, array).
But, try this:
a = [0 for x in range(N)] # N = size of list you want
a[i] = 5 # as long as i < N, you're okay
For lists of other types, use something besides 0. None is often a good choice as well.
You can use numpy:
import numpy as np
Example from Empty Array:
np.empty([2, 2])
array([[ -9.74499359e+001, 6.69583040e-309],
[ 2.13182611e-314, 3.06959433e-309]])
You can also grow the list with the extend method:
a= []
a.extend([None]*10)
a.extend([None]*20)
Just declare the list and append each element. For example:
a = []
a.append('first item')
a.append('second item')
If you (or other searchers of this question) were actually interested in creating a contiguous array to fill with integers, consider bytearray and memoryview:
# cast() is available starting Python 3.3
size = 10**6
ints = memoryview(bytearray(size)).cast('i')
ints.contiguous, ints.itemsize, ints.shape
# (True, 4, (250000,))
ints[0]
# 0
ints[0] = 16
ints[0]
# 16
It is also possible to create an empty array with a certain size:
array = [[] for _ in range(n)] # n equal to your desired size
array[0].append(5) # it appends 5 to an empty list, then array[0] is [5]
If you define it as array = [[]] * n instead, then modifying one item changes all of them the same way, because every slot refers to the same mutable list.
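The aliasing pitfall with list multiplication can be seen in a short sketch. The names `shared` and `separate` are made up for the example.

```python
n = 3

shared = [[]] * n                  # n references to the SAME inner list
separate = [[] for _ in range(n)]  # n distinct inner lists

shared[0].append(5)
separate[0].append(5)

print(shared)    # [[5], [5], [5]]
print(separate)  # [[5], [], []]

# Multiplying a list of immutable values is safe, because assignment
# rebinds a single slot instead of mutating a shared object.
zeros = [0] * n
zeros[0] = 5
print(zeros)     # [5, 0, 0]
```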
x=[]
for i in range(0,5):
    x.append(i)
    print(x[i])
If you actually want a C-style array:
import array
a = array.array('i', x * [0]) # x is the number of elements, as in the question's int a[x]
a[3] = 5
try:
    a[5] = 'a'
except TypeError:
    print('integers only allowed')
Note that there's no concept of an uninitialized variable in Python. A variable is a name that is bound to a value, so there must always be a value. In the example above the array is initialized with zeros.
However, this is uncommon in python, unless you actually need it for low-level stuff. In most cases, you are better-off using an empty list or empty numpy array, as other answers suggest.
The (I think only) way to assign "random slots" is to use a dictionary, e.g.:
a = {} # initialize empty dictionary
a[4] = 1 # define the entry for index 4 to be equal to 1
a['French','red'] = 'rouge' # the entry for index (French,red) is "rouge".
This can be handy for "quick hacks", and the lookup overhead is irrelevant if you don't have intensive access to the array's elements.
Otherwise, it will be more efficient to work with pre-allocated (e.g., numpy) arrays of fixed size, which you can create with a = np.empty(10) (for an uninitialized vector of length 10) or a = np.zeros([5,5]) (for a 5x5 matrix initialized with zeros).
Remark: in your C example, you also have to allocate the array (your int a[x];) before assigning a (not so) "random slot" (namely, integer index between 0 and x-1).
References:
The dict datatype: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
Function np.empty(): https://numpy.org/doc/stable/reference/generated/numpy.empty.html
