How can I write code that, for each individual array within the multidimensional array a, changes to zero the first negative value and every value after it? For example, the second array within a, [12,34,5,6,88,-10,30,75], contains the negative value -10; that value and all values after it should become zeroes, turning the array into [12,34,5,6,88,0,0,0]. How would I get my expected output?
import numpy as np
a = np.array([[12,45,50,60,30],
              [12,34,5,6,88,-10,30,75],
              [3,45,332,45,-12,-4,-64,12],
              [12,45,3,22,323]])
Expected Output:
[[12,45,50,60,30],
[12,34,5,6,88,0,0,0],
[3,45,332,45,0,0,0,0],
[12,45,3,22,323]]
try this:
import numpy as np
a = np.array([[12,45,50,60,30],
              [12,34,5,6,88,-10,30,75],
              [3,45,332,45,-12,-4,-64,12],
              [12,45,3,22,323]], dtype='object')

for l in a:                # each element of a is a plain Python list
    for i in l:
        if i < 0:
            # zero out everything from the first negative value onwards
            l[l.index(i):] = [0] * len(l[l.index(i):])
            break
a
output:
array([list([12, 45, 50, 60, 30]), list([12, 34, 5, 6, 88, 0, 0, 0]),
list([3, 45, 332, 45, 0, 0, 0, 0]), list([12, 45, 3, 22, 323])],
dtype=object)
second solution:
import numpy as np
def neg_to_zero(l):
    for i in l:
        if i < 0:
            l[l.index(i):] = [0] * len(l[l.index(i):])
            break

a = np.array([[12,45,50,60,30],
              [12,34,5,6,88,-10,30,75],
              [3,45,332,45,-12,-4,-64,12],
              [12,45,3,22,323]], dtype='object')

list(map(neg_to_zero, a))   # map is used here purely for its side effect on each list
a
Your array:
In [608]: a = np.array([[12,45,50,60,30],
...: [12,34,5,6,88,-10,30,75],
...: [3,45,332,45,-12,-4,-64,12],
...: [12,45,3,22,323]])
<ipython-input-608-894f7005e102>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
a = np.array([[12,45,50,60,30],
In [609]: a
Out[609]:
array([list([12, 45, 50, 60, 30]), list([12, 34, 5, 6, 88, -10, 30, 75]),
list([3, 45, 332, 45, -12, -4, -64, 12]),
list([12, 45, 3, 22, 323])], dtype=object)
This contains lists that vary in length. It is not multidimensional. Making it an array, as opposed to leaving it as a list of lists, does not make it any easier to process. Either way you have to iterate and change each list separately.
First, pay attention to the answer by hpaulj. Don't use numpy if your data is unsuitable. Your data is unsuitable for numpy because you have a list of lists where each contained list has a different length. It would be suitable for numpy if all had the same length (matrix shape).
To the problem itself: reduce it to solving the task on a single list, then transform each list.
data = [
[12, 45, 50, 60, 30],
[12, 34, 5, 6, 88, -10, 30, 75],
[3, 45, 332, 45, -12, -4, -64, 12],
[12, 45, 3, 22, 323]
]
for row in data:
    transform(row)
The algorithm: iterate over the list; when we find a negative element, we know the current position, and then we can set that element and all following elements to zero.
I'll show you two variants.
The first variant uses slicing. It also uses enumerate(), which gives you (index, value) tuples for a list (or other iterable).
def transform(lst):
    for index, value in enumerate(lst):
        if value < 0:
            lst[index:] = [0] * (len(lst) - index)
It creates a new list filled with zeroes by multiplying [0] (a 1-element list) by the length of the remainder. Then it assigns that to a slice of the list that is being transformed. This slice assignment changes the list itself.
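For instance, applying it to the second row from the question:

lst = [12, 34, 5, 6, 88, -10, 30, 75]
transform(lst)     # mutates lst in place via slice assignment
print(lst)         # [12, 34, 5, 6, 88, 0, 0, 0]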
The second variant works with a bit of "state":
def transform(lst):
    do_overwrite = False
    for index, value in enumerate(lst):
        if value < 0:
            do_overwrite = True   # "flips a switch", stays on
        if do_overwrite:
            lst[index] = 0
Python lists are objects, like pretty much everything else in Python. That means that when you call a function and pass a list as an argument, the list isn't copied; the function gets that same list object to work with. Any changes to that object are "visible" to the caller, because it is the same list object being handled.
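A tiny demonstration of this (set_first_to_zero is a hypothetical helper, just for illustration):

def set_first_to_zero(lst):
    lst[0] = 0     # mutates the caller's list object; nothing is returned

nums = [5, 6, 7]
set_first_to_zero(nums)
print(nums)        # [0, 6, 7] -- the caller sees the change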
Related
I have a vector of the following form:
import numpy as np
vec = np.array([2, 2, 2, 51, 51, 52, 52, 14, 14, 14, 51, 51, 52, 52])
Is there a numpy-thonic way to find the index of the first occurrence of a value that is not (for instance) 51 or 52? In other words, a function that would return the following indexes: [0, 7], where 0 is the index of the first appearance of 2, and 7 is the index of the first appearance of 14.
np.unique returns the first index of each number if you specify return_index=True. You can filter the result pretty easily using, e.g., np.isin:
u, i = np.unique(vec, return_index=True)
result = i[np.isin(u, [51, 52], invert=True)]
The advantage of doing it this way is that u is a significantly reduced search space compared to the original data. Using invert=True also speeds things up a little compared to explicitly negating the resulting mask.
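For the vec from the question, the intermediate values make the mechanics clear (a quick check):

print(u)        # [ 2 14 51 52] -- sorted unique values
print(i)        # [0 7 3 5]     -- first-occurrence index of each
print(result)   # [0 7]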
A version of np.isin that relies on the fact that the data is already sorted could be made using np.searchsorted like this:
def isin_sorted(a, i, invert=False):
    # a must be sorted; find where each value of i would be inserted
    ind = np.searchsorted(a, i)
    # keep only the insertion points that actually hit a matching value
    # (clip to a.size - 1 so the lookup stays in bounds)
    ind = ind[a[ind.clip(max=a.size - 1)] == i]
    if invert:
        mask = np.ones(a.size, dtype=bool)
        mask[ind] = False
    else:
        mask = np.zeros(a.size, dtype=bool)
        mask[ind] = True
    return mask
You could use this version in place of np.isin, after calling np.unique, which always returns a sorted array. For sufficiently large vec and exclusion lists, it will be more efficient:
result = i[isin_sorted(u, [51, 52], invert=True)]
import numpy as np
vec = np.array([2, 2, 2, 51, 51, 52, 52, 14, 14, 14, 51, 51, 52, 52])
first_occurrence = []
for x in np.unique(vec):
    if x not in [51, 52]:
        first_occurrence.append(np.argmax(x == vec))
argmax finds the index of the first occurrence of the maximum (i.e. True) in the boolean array x == vec. As x comes from vec, it is guaranteed that there is at least one True value.
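A minimal check of that trick on the example data (vec as defined above):

print(np.argmax(vec == 14))   # 7 -- index of the first True in the boolean mask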
Performance depends on the size of vec and on how many values there are to find. This simple loop method outperforms the accepted answer for larger arrays, especially when there are only a few values to find, as in the example (for the given toy example it is in fact about 1.7 times faster).
It turns out that using unique with return_index=True is relatively slow; another factor for larger arrays is the memory allocation for the mask.
The following code snippet shows how to initialize a Python array from various container classes (tuple, list, dictionary, set, and so on):
import array as arr
ar_iterator = arr.array('h', range(100))
ar_tuple = arr.array('h', (0, 1, 2))
ar_list = arr.array('h', [0, 1, 2])
ar_dict = arr.array('h', {0: None, 1: None, 2: None}.keys())
ar_set = arr.array('h', set(range(100)))
ar_fset = arr.array('h', frozenset(range(100)))
The array initialized from range(100) is particularly nice because an iterator does not need to store a hundred elements. It can simply store the current value and a transition function describing how to calculate the next value from the current one (add one to the current value every time __next__ is called).
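A rough way to see this (sys.getsizeof reports shallow sizes only, and exact numbers vary by Python version, so treat this as illustrative):

import sys

print(sys.getsizeof(range(100)))          # small, constant-size object (e.g. 48)
print(sys.getsizeof(list(range(100))))    # grows with the element count (e.g. 856)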
However, what if the initial values of an array do not follow a simple pattern, such as counting upwards 0, 1, 2, 3, 4, ..., 99? An iterator might not be practical. It makes no sense to create a list, copy the list into the array, and then delete the list: you would essentially create the array twice and copy it unnecessarily. Is there some way to construct an array directly by passing in the initial values?
From the python docs (https://docs.python.org/3/library/array.html):
class array.array(typecode[, initializer])
A new array whose items are restricted by typecode, and initialized from the optional initializer value, which must be a list, a bytes-like object, or iterable over elements of the appropriate type.
So it would appear that you are constrained to passing in an existing container or other iterable.
Assuming that the initial elements can be derived logically, you could pass a generator as the initialiser. Generators yield their elements as they are iterated over, similar to range.
>>> import array, random
>>> def g():
...     for _ in range(10):
...         yield random.randint(0, 100)
...
>>> arr = array.array('h', g())
>>> arr
array('h', [47, 6, 91, 0, 76, 20, 77, 75, 46, 7])
For simple cases, a generator expression can be used:
>>> arr = array.array('h', (random.randint(0, 100) for _ in range(10)))
>>> arr
array('h', [72, 30, 40, 58, 77, 74, 25, 6, 71, 58])
I want to go through each element of an array I've created. However, while debugging I found it isn't behaving as I expect. Here's what I have so far and what it prints out.
def prob_thirteen(self):
    # create array of numbers 2-30
    xcoords = [range(2,31)]
    ycoords = []

    for i in range(len(xcoords)):
        print 'i:', xcoords[i]
output:
i: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
Why does 'i' return my whole array and not just the first element: 2? I'm not sure why this is returning my whole array.
xcoords = [range(2,31)]
This line creates a list of length 1. The only element in that list is a list of the numbers 2 -> 30. Your loop is printing the elements of the outer list. Change that line to:
xcoords = range(2,31)
This answer is correct for Python 2, where the range function returns a list. Python 3 returns a range object (which can be iterated over to produce the required values). The following line works in both Python 2 and 3:
xcoords = list(range(2,31))
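A quick illustration of the Python 3 behaviour:

xcoords = range(2, 31)
print(xcoords)         # range(2, 31) -- a lazy range object, not a list
print(list(xcoords))   # [2, 3, 4, ..., 30] once materialized into a list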
First of all, change xcoords so that it isn't a list inside a list:
xcoords = range(2, 31)
We don't need to iterate over the list by indexing into it with range(len(xcoords)). In Python we can simply iterate over a list like this:
for coord in xcoords:
    print "i: ", coord
If we did need to keep track of the index we could use enumerate:
for i, coord in enumerate(xcoords):
    print str(i) + ":", coord
I'm trying to randomly select items from a list and add them to another list.
The list of elements I'm choosing from looks like this:
data=[2,3,4,7,8,12,17,24,27,33,35,36,37,38,40,43,44,50,51,54]
I want to randomly take an element from this list and add it to one of four lists until each list has the same number of elements.
lists=[[1,'x','x','x','x','x'],[3,'x','x','x','x','x'],[5,'x','x','x','x','x'],[7,'x','x','x','x','x']]
I have tried using random.choice but this gives me duplicates:
def fill_lists(data):
    for list in lists:
        for n, i in enumerate(list):
            if i == 'x':
                list[n] = random.choice(data)
I want my function to return a list that contains 4 lists each containing a random sample of the data list with no duplicates. I also want the first element of each list to be a value that I have already placed into the list.
import random
data=[2,3,4,7,8,12,17,24,27,33,35,36,37,38,40,43,44,50,51,54]
random.shuffle(data)
lists = [data[i:i+len(data)//4] for i in range(0, len(data), len(data)//4)]  # // so the slice bounds are ints in Python 3
print(lists)
Randomly pulling from your initial list will have the same effect as shuffling then pulling in order. Splitting into sublists can then be done. If you need the sublists sorted, just map sort over the list afterwards.
You can change the number of groups by altering the divisor in len(data)//4.
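For example, to get sorted sublists afterwards (a minimal sketch):

lists = [sorted(sub) for sub in lists]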
Edit: I missed this part of your question:
heads = [1,3,5,7]
for p, q in zip(heads, lists):
    q.insert(0, p)
You can use random.sample:
data=[2,3,4,7,8,12,17,24,27,33,35,36,37,38,40,43,44,50,51,54]
random.sample(data, 5)
# [27, 12, 33, 24, 17]
To get a nested list of it, use a list comprehension
[random.sample(data, 5) for _ in range(5)]
# [[40, 35, 24, 54, 17],
# [17, 54, 35, 43, 37],
# [40, 4, 43, 33, 44],
# [51, 37, 35, 33, 8],
# [54, 4, 44, 27, 50]]
Edit: the above won't give you unique values across the sublists; use the accepted answer if you need uniqueness. I interpreted the question wrong!
Another shuffle-based solution, but one that ensures all sublists have the same size even when the number of elements is not divisible by the number of lists (try n=7, for example).
from random import shuffle
def split(data, n):
    size = len(data) // n          # note: any leftover elements beyond n*size are dropped
    for i in range(0, n * size, size):
        yield data[i:i+size]
data=[2,3,4,7,8,12,17,24,27,33,35,36,37,38,40,43,44,50,51,54]
shuffle(data)
list(split(data, 5))
You could try this, modifying the ranges inside the d function to tune the number of elements you want.
import random
def f(data):
    val = random.choice(data)    # pick a random value...
    ix = data.index(val)
    data.pop(ix)                 # ...and remove it so it cannot be drawn again
    return val, data

def d(data):
    topholder = []
    m = len(data) // 4           # integer division so range() receives an int
    for i in range(4):
        holder = []
        for n in range(m):
            holder.append(f(data)[0])
        topholder.append(holder)
    return topholder
d(data)
This will always give you 4 lists of randomly sampled values without duplication.
This is a dynamic function that returns a list of lists, where each inner list starts with a specified value. The number of nested lists is determined by the number of starting_values.
import random
def get_random_list(element_list, starting_values, size_per_group):
    num_of_groups = len(starting_values)
    size_per_group -= 1    # one slot in each group is taken by its starting value
    total_elements = num_of_groups * size_per_group
    random_data = random.sample(element_list, total_elements)
    return [[starting_values[x]] + random_data[x * size_per_group:(x + 1) * size_per_group]
            for x in range(num_of_groups)]
data = [2, 3, 4, 7, 8, 12, 17, 24, 27, 33, 35, 36, 37, 38, 40, 43, 44, 50, 51, 54]
print(get_random_list(data, starting_values=[1, 2, 3, 4, 5, 6], size_per_group=2))
# OUTPUT: [[1, 36], [2, 54], [3, 17], [4, 7], [5, 35], [6, 33]]
print(get_random_list(data, starting_values=[9, 3, 5], size_per_group=6))
# OUTPUT: [[9, 54, 2, 7, 38, 24], [3, 35, 8, 37, 40, 17], [5, 44, 4, 27, 50, 3]]
It works on both Python 2.x and Python 3.x, but on Python 2.x you should change range() to xrange() for better memory use.
Suppose I have an array/list/string, e.g. arr=[0,1,2,3,...,97,98,99]
How do I slice it such that the output are contiguous chunks stepped by a certain amount, e.g.:
out = [0,1,10,11,20,21..]
I've tried variations on out = arr[(0,1)::10] but to no avail. Am I missing something really simple?
First of all: which type are you interested in? numpy arrays allow extended indexing while python built-ins (i.e. list, tuple, str etc) do not.
If you want a solution that works for any one-dimensional sequence, then simply use:
from itertools import chain
result = list(chain.from_iterable(seq[i:i+step] for i in range(0, len(seq), step2)))
In your case you want step to be 2 and step2 to be 10.
In any case, for generic sequences you must do one slice per consecutive chunk you want to select, so I don't think you can do much better than this.
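A concrete run with the values from the question (step=2, step2=10):

from itertools import chain

seq = list(range(100))
step, step2 = 2, 10
result = list(chain.from_iterable(seq[i:i+step] for i in range(0, len(seq), step2)))
print(result[:6])   # [0, 1, 10, 11, 20, 21]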
For numpy arrays you could reshape the array into a multidimensional one such that the contiguous parts all sit at the start of a row, then select the first portion of each row:
In [1]: import numpy as np
In [2]: seq = np.array(range(100))
In [3]: seq2 = seq.reshape((10, 10))
In [4]: seq2[:, :2]
Out[4]:
array([[ 0, 1],
[10, 11],
[20, 21],
[30, 31],
[40, 41],
[50, 51],
[60, 61],
[70, 71],
[80, 81],
[90, 91]])
In [5]: seq2[:, :2].reshape((2*10,))
Out[5]:
array([ 0, 1, 10, 11, 20, 21, 30, 31, 40, 41, 50, 51, 60, 61, 70, 71, 80,
81, 90, 91])
(There are many ways to reshape and flatten the result; read the numpy documentation if you are interested).
Note however that this will fail if the slices overlap, while the first solution works (repeating some elements, but that's what should happen).
If the slices never overlap, you can simply do:
indices = frozenset(range(step))
result = [el for i, el in enumerate(seq) if i % step2 in indices]
This may seem more efficient than doing multiple slices, but I wouldn't be so sure, because here you need one indexing operation per element instead of one per slice. Especially in CPython, this may not be faster than the first solution, particularly if step is big.
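And the same quick check for this version (assuming the seq, step, and step2 from the example):

seq = list(range(100))
step, step2 = 2, 10
indices = frozenset(range(step))
result = [el for i, el in enumerate(seq) if i % step2 in indices]
print(result[:6])   # [0, 1, 10, 11, 20, 21]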
From this last idea you could also do something to avoid reshapeing the numpy array:
indices = frozenset(range(step))
arr = np.array([i % step2 in indices for i in range(len(seq))])  # a list, not a generator: np.array would wrap a bare generator as a 0-d object array
result = seq[arr]
However, I cannot think of a simple and efficient way of building the arr mask this way, so I doubt it improves performance.
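That said, one vectorized way to build the boolean mask without a Python-level loop does exist (an addition beyond the original answer; a sketch assuming the step and step2 from above):

import numpy as np

seq = np.arange(100)
step, step2 = 2, 10
# True for the first `step` positions of every block of `step2` indices
mask = (np.arange(seq.size) % step2) < step
result = seq[mask]
print(result[:6])   # [ 0  1 10 11 20 21]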