Creating an array of booleans from an array of numbers - python

So say we have some values:
data = np.random.standard_normal(size = 10)
I want my function to output an array which identifies whether the values in data are positive or not, something like:
[1, 0, 1, 1, 0, 1, 1, 0, 0, 0]
I've tried:
def myfunc():
    for a in data > 0:
        if a:
            return 1
        else:
            return 0
But I'm only getting the result for the first value in the random array data. I don't know how to loop this function to output an array.
Thanks

You can do np.where, it's your friend:
np.where(data>0,1,0)
Demo:
print(np.where(data>0,1,0))
Output:
[1 0 1 1 0 1 1 0 0 0]
Use np.where(data>0,1,0).tolist() to get a plain Python list, which prints with commas. The output would be:
[1, 0, 1, 1, 0, 1, 1, 0, 0, 0]
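To make this reproducible, here is a minimal sketch that uses a fixed input in place of the random draw (the values are made up for illustration). It also checks that comparing and then casting gives the same flags as np.where.

```python
import numpy as np

# Fixed, made-up values standing in for np.random.standard_normal(size=10)
data = np.array([0.3, -1.2, 0.8, 2.1, -0.4, 0.0])

flags = np.where(data > 0, 1, 0)   # 1 where the value is positive, 0 elsewhere
same = (data > 0).astype(int)      # boolean mask cast to integers

print(flags.tolist())  # [1, 0, 1, 1, 0, 0]
```

Note that 0.0 is flagged as 0 here, because the test is strictly "greater than zero".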

It's very simple with numpy:
posvals = data > 0
>> [True, False, True, True, False, True, True, False, False, False]
If you explicitly want 1s and 0s:
posvals.astype(int)
>> [1, 0, 1, 1, 0, 1, 1, 0, 0, 0]

You can use ternary operators alongside list comprehension.
data = [10, 15, 58, 97, -50, -1, 1, -33]
output = [ 1 if number >= 0 else 0 for number in data ]
print(output)
This would output:
[1, 1, 1, 1, 0, 0, 1, 0]
What's happening is that each element is mapped to 1 if the number is greater than or equal to 0, and to 0 otherwise.
If you'd like this in function form, then it's as simple as:
def pos_or_neg(my_list):
    return [ 1 if number >= 0 else 0 for number in my_list ]

You are attempting to combine an if and a for statement.
Seeing as you want to manipulate each element in the array using the same criteria and then return an updated array, what you want is the map function:
def my_func(data):
    def map_func(num):
        return num > 0
    return list(map(map_func, data))
The map function applies map_func() to each number in data and yields the results in order. It does not modify data itself. In Python 3, map returns a lazy iterator, which is why the result is wrapped in list() here.
If you explicitly want 1 and 0, map_func() would be:
def map_func(num):
    if num > 0:
        return 1
    return 0
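As a small end-to-end sketch of the map approach (the helper name to_flag and the sample values are made up for illustration), the 1/0 version can be written and materialised like this:

```python
def to_flag(num):
    # 1 for strictly positive numbers, 0 otherwise
    return int(num > 0)

data = [0.5, -1.0, 2.0, 0.0]       # made-up sample values
flags = list(map(to_flag, data))   # list() forces the lazy map object
print(flags)  # [1, 0, 1, 0]
```

For large NumPy arrays, the vectorised answers above will usually be faster than mapping a Python function over every element.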

Related

Bit wise operator manipulation

I have a column within a dataframe that consists of either 1, 0 or -1.
For example:
<0,0,0,1,0,0,1,0,0,1,0,-1,0,0,1,0,0,-1,0,0,1,0,0,1,0,0,1,0,0,0,-1,0,0>
How can I create a new column in that dataframe that holds a run of 1s from the first 1 through the next -1, and then returns to 0 until the next 1?
In this example, it would be:
<0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,0,0,1,1,1,1,0,0,1,1,1,1,1,0,0>
Essentially I'm trying to create a trading strategy where I buy when the price is >1.25 and sell when it goes below 0.5. 1 represents buy and -1 represents sell. If I can get it into the form above I can easily implement it.
Not sure how your data is stored, but an algorithm similar to the following might work without the use of bitwise operators:
x = [0,0,0,1,0,0,1,0,0,1,0,-1,0,0,1,0,0,-1,0,0,1,0,0,1,0,0,1,0,0,0,-1,0,0]
newcol = []
flag = 0
for char in x:
    if char == 1 and flag == 0:
        flag = 1
    newcol.append(flag)
    if char == -1 and flag == 1:
        flag = 0
print(newcol)
Seems like a good use case for pandas:
import pandas as pd
s = pd.Series([0,0,0,1,0,0,1,0,0,1,0,-1,0,0,1,0,0,-1,0,0,1,0,0,1,0,0,1,0,0,0,-1,0,0])
s2 = s.groupby(s[::-1].eq(-1).cumsum()).cummax()
print(s2.to_list())
Output:
[0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
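For reuse, the flag-based loop can be wrapped in a function. This is only a sketch: the name holding_flags and the short signal list are made up, and it assumes the sell row itself should still count as held, as in both outputs above.

```python
def holding_flags(signals):
    """Return 1 from each buy (1) through the following sell (-1), else 0."""
    flags = []
    holding = 0
    for s in signals:
        if s == 1:
            holding = 1
        flags.append(holding)   # the sell row itself still counts as held
        if s == -1:
            holding = 0
    return flags

signals = [0, 0, 1, 0, 1, 0, -1, 0, 1, -1, 0]
print(holding_flags(signals))  # [0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0]
```

With a dataframe, the returned list could be assigned to a new column, for example df["position"] = holding_flags(df["signal"]), where both column names are hypothetical.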

In a one dimensional numpy array of 1's and 0's, how can I convert the next n elements following a 1 to 0's?

For a one dimensional numpy array of 1's and 0's, how can I effectively "mask" the array such that after the occurrence of a 1, the next n elements of the array are converted to zero. After the n elements have passed, the pattern repeats such that the first next occurrence of a 1 is preserved followed again by n zeros.
It is important that the first eligible occurrences of 1 are preserved, so a simple mask such as:
[true, false, false, true ...] won't work.
Furthermore, the data set is massive, so efficiency is important.
I've written crude python code to give me the desired results, but it is way too slow for what I need.
Here is an example:
data = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1]
n = 3
newData = []
tail = 0
for x in data:
    if x == 1 and tail <= 0:
        newData.append(1)
        tail = n
    else:
        newData.append(0)
        tail -= 1
print(newData)
newData: [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
Is there possibly a vectorized numpy solution to this problem?
I'm processing tens of thousands of arrays, with more than a million elements in each array. So far using numpy functions has been the only way to manage this.
As far as I know, there is no way to do this purely in NumPy. You could still use NumPy to cut the time spent finding the indices, though.
import numpy as np

data = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1]
n = 3

def get_new_data(data, n):
    new_data = np.zeros(len(data), dtype=int)
    non_zero = np.flatnonzero(data)  # indices of the 1s
    idx = -1  # last index blocked by the previously kept 1
    for i in non_zero:
        if i > idx:
            new_data[i] = 1
            idx = i + n  # block the n positions after this 1
    return new_data

get_new_data(data, n)
A function like this should give you a better run time since you are not looping over the whole array.
If this is still not optimal to you, you can look at using numba, which works very well with numpy and is relatively easy to use.
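Another option that stays within NumPy is to jump from one kept 1 directly to the next eligible 1 with np.searchsorted, so the Python loop runs once per kept 1 rather than once per 1. This is only a sketch: the function name mask_after_ones is made up, and whether it is faster depends on how dense the 1s are.

```python
import numpy as np

def mask_after_ones(data, n):
    """Keep a 1, zero the next n elements, then keep the next eligible 1."""
    data = np.asarray(data)
    ones = np.flatnonzero(data)      # sorted indices of all 1s
    out = np.zeros_like(data)
    pos = 0                          # position within `ones`
    while pos < len(ones):
        keep = ones[pos]
        out[keep] = 1
        # first 1 whose index is strictly greater than keep + n
        pos = np.searchsorted(ones, keep + n, side="right")
    return out

print(mask_after_ones([0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1], 3).tolist())
# [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
```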
You could do it like this:
N = 3
data = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
newData = data.copy()
i = 0
while i < len(newData):
    if newData[i] == 1:
        # zero the next N elements without running past the end of the list
        end = min(i + 1 + N, len(newData))
        newData[i + 1:end] = [0] * (end - i - 1)
        i += N
    i += 1
print(newData)

How to reduce higher values to 1s and sort the ones from right to left

I want to make a python program that quickly reduces a number greater than 1 in an array/list and places it in an empty spot before it. Say we have:
li = [4,1,0,0,0,1,3,0]
we'd get:
rtr = [1,1,0,1,1,1,1,0]
Note how the 4 turns into a 1 because it's already to the left and then the 3 gets divided into 2 positions before the 1 that has already been taken. Can anyone help me with this problem?
You can iterate the list from end to start and keep track of the sum you collect from the values. When you have a non zero sum, take 1 from it to populate the result list, and otherwise put a 0 in the result list.
Here is how that could work:
def spread(lst):
    carry = 0
    res = []
    for i in reversed(lst):
        carry += i
        res.append(int(carry > 0))
        if carry:
            carry -= 1
    return list(reversed(res))
lst = [4, 1, 0, 0, 0, 1, 3, 0]
print(spread(lst)) # [1, 1, 0, 1, 1, 1, 1, 0]
Using NumPy:
import numpy as np

def fun(l):
    s = np.array(l[::-1])
    for i in range(len(s)):
        if s[i] != 1 and s[i] != 0:
            x = s[i+1:]  # a view into s, so writing to x updates s
            x[(x == 0).nonzero()[0][:s[i]-1]] = 1
            s[i] = 1
    return s[::-1].tolist()
print (fun([4,1,0,0,0,1,3,0]))
print (fun([0, 10]))
Output:
[1, 1, 0, 1, 1, 1, 1, 0]
[1, 1]

When iterating through a list inside of a function, why is 0 not 0.0?

I am trying to solve a problem in which I have to remove the zeroes (both 0 and 0.0) from a list and add them to the end of the list (the zero that is appended can be a 0, doesn't have to be 0.0). But the catch is that I must not remove False. I thought I finally figured out my problem by using is instead of == but for some reason it's not working:
arr = [9,0.0,0,9,1,2,0,1,0,1,0.0,3,0,1,9,0,0,0,0,9]
def move_zeros(array):
    temp = array.copy()
    j = 0
    for p, i in enumerate(array):
        if i is 0 or i is 0.0:
            del temp[p - j]
            j += 1
            temp.append(0)
    return temp
print(move_zeros(arr))
I've tried putting parentheses at the if statement but the result is the same. It removes all the 0 but for some reason it won't remove 0.0. Result:
[9, 0.0, 9, 1, 2, 1, 1, 0.0, 3, 1, 9, 9, 0, 0, 0, 0, 0, 0, 0, 0]
I rewrote the function but outside of the function and for some reason it works:
array = [1, 0, 2, False, 0.0, 4]
temp = array.copy()
j = 0
for p, i in enumerate(array):
    if i is 0 or i is 0.0:
        del temp[p - j]
        j += 1
        temp.append(0)
And here the result is how I would expect it:
[1, 2, False, 4, 0, 0]
Why is the floating point zero being removed when executed outside of the function, but when the function move_zeros is called, the floating points are not recognized in the if statement?
You only handle floats and ints - so you can construct a new list from your input if the value is not 0:
arr = [9,0.0,0,9,1,2,0,1,0,1,0.0,3,0,1,9,0,0,0,0,9]
def move_zeros(array):
    len_arr = len(array)
    return ([ x for x in array if x != 0.0]+[0]*len_arr)[:len_arr]

# equivalent without a list comprehension, using a simple for loop
def m_z(arr):
    k = []
    tmp = []
    for e in arr:
        if e/1.0 != 0.0:
            k.append(e)
        else:
            tmp.append(0)
    return k+tmp
print(move_zeros(arr))
Output:
[9, 9, 1, 2, 1, 1, 3, 1, 9, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
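For an integer x, only 0 compares equal to 0.0; for a float x, only 0.0 and -0.0 do. Simply do not put those into the output. .copy() is not needed because the list comprehension already builds a new list for you.
The is check appears to work for integers because CPython caches small integers (from -5 to 256 or so), so they all get the same id() and is "seems" to work. Floats are not cached this way, so whether i is 0.0 is True depends on whether both happen to be the same object, which is an implementation detail (for example, equal literals compiled in the same block of code may be shared, which would explain why the version outside the function behaved differently).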
Only use is for None checks or if you know what you do, never for numbers.
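To illustrate the difference, here is a small sketch. The helper is_numeric_zero is a made-up name, and the exact-type check is one assumed way to keep False out of the match.

```python
a = 0.0
b = float("0")      # a float created at run time, equal to a

print(a == b)       # True: the values are equal
print(a is b)       # identity check; the result is an implementation detail

# Value-based zero test that still leaves False (a bool) alone
def is_numeric_zero(x):
    return type(x) in (int, float) and x == 0

print([is_numeric_zero(v) for v in [0, 0.0, False, 1, None]])
# [True, True, False, False, False]
```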
If you want to leave values that are not int or float untouched, you can check for that too:
arr = [9,0.0,0,False,9,1,2,0,1,0,1,0.0,3,0,1,9,0,0,0,0,9]
def move_zeros(array):
    len_arr = len(array)
    return ([ x for x in array if type(x) not in {int,float} or
              x != 0.0]+[0]*len_arr)[:len_arr]
# [9, False, 9, 1, 2, 1, 1, 3, 1, 9, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Read more:
What's with the Integer Cache inside Python?
Is there a difference between `==` and `is` in Python?
Truthiness: why False is 0 - Part 1
why False is 0 - Part 2
Small sanity check:
k = 365
i = 365 // 5 * 5  # computed rather than written as the literal 365, so k and i may be distinct objects (implementation-dependent)
for elem in [0,False,k]:
    for olem in [0,False,i]:
        print(f"{str(elem):>8} is {str(olem):<10}: {str(elem is olem):<10} ")
        print(f"{str(elem):>8} == {str(olem):<10}: {str(elem == olem):<10} ")
Output:
0 is 0 : True
0 == 0 : True
0 is False : False
0 == False : True
0 is 365 : False
0 == 365 : False
False is 0 : False
False == 0 : True
False is False : True
False == False : True
False is 365 : False
False == 365 : False
365 is 0 : False
365 == 0 : False
365 is False : False
365 == False : False
365 is 365 : False # k is i
365 == 365 : True # k == i
Use ==, not is. The is operator compares identity (two "things" occupying the same space in memory, essentially), while == compares equality (do two things have the same defining properties). Furthermore, with floats it is often useful to check whether things are "close enough", e.g.
is_equal = abs(a - b) < 0.00001
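A "close enough" test compares the absolute difference against a tolerance; the standard library also provides math.isclose for this. A minimal sketch, with values chosen only for illustration:

```python
import math

a = 0.1 + 0.2
b = 0.3

print(a == b)               # False: the two floats differ in the last bits
print(abs(a - b) < 1e-5)    # True: absolute tolerance
print(math.isclose(a, b))   # True: relative tolerance (default rel_tol=1e-09)
```

When comparing against zero, math.isclose needs an explicit abs_tol, because a purely relative tolerance never matches a nonzero value against 0.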

Count number of tails since the last head

Consider a sequence of coin tosses: 1, 0, 0, 1, 0, 1 where tail = 0 and head = 1.
The desired output is the sequence: 0, 1, 2, 0, 1, 0
Each element of the output sequence counts the number of tails since the last head.
I have tried a naive method:
def timer(seq):
    if seq[0] == 1: time = [0]
    if seq[0] == 0: time = [1]
    for x in seq[1:]:
        if x == 0: time.append(time[-1] + 1)
        if x == 1: time.append(0)
    return time
Question: Is there a better method?
Using NumPy:
import numpy as np
seq = np.array([1,0,0,1,0,1,0,0,0,0,1,0])
arr = np.arange(len(seq))
result = arr - np.maximum.accumulate(arr * seq)
print(result)
yields
[0 1 2 0 1 0 1 2 3 4 0 1]
Why arr - np.maximum.accumulate(arr * seq)? The desired output seemed related to a simple progression of integers:
arr = np.arange(len(seq))
So the natural question is, if seq = np.array([1, 0, 0, 1, 0, 1]) and the expected result is expected = np.array([0, 1, 2, 0, 1, 0]), then what value of x makes
arr + x = expected
Since
In [220]: expected - arr
Out[220]: array([ 0, 0, 0, -3, -3, -5])
it looks like x should be the negative of the cumulative max of arr * seq:
In [234]: arr * seq
Out[234]: array([0, 0, 0, 3, 0, 5])
In [235]: np.maximum.accumulate(arr * seq)
Out[235]: array([0, 0, 0, 3, 3, 5])
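One detail worth noting: with 0-based positions, a sequence that starts with tails gets a 0 for its first element, whereas the naive timer in the question starts at 1. A possible variation, assuming the question's convention is the one wanted, is to use 1-based positions so that "no head yet" maps to position 0. The function name tails_since_head is made up for this sketch.

```python
import numpy as np

def tails_since_head(seq):
    seq = np.asarray(seq)
    pos = np.arange(1, len(seq) + 1)               # 1-based positions
    # 1-based position of the most recent head, or 0 if none seen yet
    last_head = np.maximum.accumulate(pos * seq)
    return pos - last_head

print(tails_since_head([1, 0, 0, 1, 0, 1]).tolist())  # [0, 1, 2, 0, 1, 0]
print(tails_since_head([0, 0, 1]).tolist())           # [1, 2, 0]
```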
Step 1: Invert l:
In [311]: l = [1, 0, 0, 1, 0, 1]
In [312]: out = [int(not i) for i in l]; out
Out[312]: [0, 1, 1, 0, 1, 0]
Step 2: List comp; add previous value to current value if current value is 1.
In [319]: [out[0]] + [x + y if y else y for x, y in zip(out[:-1], out[1:])]
Out[319]: [0, 1, 2, 0, 1, 0]
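This gets rid of windy ifs by zipping adjacent elements. Note that it only looks one element back in out, so it gives the right counts only when no run of tails is longer than two.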
Using itertools.accumulate:
>>> from itertools import accumulate
>>> a = [1, 0, 0, 1, 0, 1]
>>> b = [1 - x for x in a]
>>> list(accumulate(b, lambda total,e: total+1 if e==1 else 0))
[0, 1, 2, 0, 1, 0]
accumulate is only available in Python 3. The itertools documentation gives equivalent pure-Python code, though, if you want to use it in Python 2.
a has to be inverted because the first value accumulate yields is the first list element itself, regardless of the accumulator function:
>>> list(accumulate(a, lambda total,e: 0))
[1, 0, 0, 0, 0, 0]
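On Python 3.8 or later, accumulate also accepts an initial value, which is one way to avoid the inversion: the seed goes first, so every toss passes through the function. A sketch under that assumption (the function name is made up):

```python
from itertools import accumulate

def tails_since_head(seq):
    # initial=0 seeds the running count, so the first toss is handled by
    # the same function as every other toss (requires Python 3.8+)
    counts = accumulate(
        seq,
        lambda total, toss: 0 if toss == 1 else total + 1,
        initial=0,
    )
    return list(counts)[1:]   # drop the seed value

print(tails_since_head([1, 0, 0, 1, 0, 1]))  # [0, 1, 2, 0, 1, 0]
```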
The required output is a new array of the same length as the input, so building it takes at least O(n) time. For this specific problem you also need to read every value of the input array, which is again O(n). Your method already runs in O(n), so only the constant factor can improve, not the complexity.
Using reduce:
from functools import reduce  # reduce is no longer a builtin in Python 3
time = reduce(lambda l, r: l + [(l[-1]+1)*(not r)], seq, [0])[1:]
I try to be clear in the following code, which differs from the original in using an explicit accumulator.
>>> s = [1,0,0,1,0,1,0,0,0,0,1,0]
>>> def zero_run_length_or_zero(seq):
...     "Return the run length of zeroes so far in the sequence or zero"
...     accumulator, answer = 0, []
...     for item in seq:
...         accumulator = 0 if item == 1 else accumulator + 1
...         answer.append(accumulator)
...     return answer
...
>>> zero_run_length_or_zero(s)
[0, 1, 2, 0, 1, 0, 1, 2, 3, 4, 0, 1]
