Add values when there is a gap between elements - python

I have a list defined as
A = [1.0, 3.0, 6.0, 7.0, 8.0]
I am trying to fill the gaps between the elements of the list with zero values. A gap is an increment between consecutive elements that is greater than one. For instance, between 1.0 and 3.0 there is one gap (2.0), and between 3.0 and 6.0 there are two gaps (4.0 and 5.0).
I am working with this code, but it is incomplete: it does not add multiple values when the gap is bigger than one increment.
B = []
cnt = 0
for i in range(len(A)-1):
    if A[i] == A[i+1] - 1:
        B.append(A[cnt])
        cnt += 1
    if A[i] != A[i+1] - 1:
        B.append(A[cnt])
        B.append(0.0)
        cnt += 1
The output of this code is:
B = [1.0, 0.0, 3.0, 0.0, 6.0, 7.0]
But since there are two gaps between 3.0 and 6.0 I need B to look like this:
B = [1.0, 0.0, 3.0, 0.0, 0.0, 6.0, 7.0]
I am a bit stuck on how to do this and I already have a feeling that my code is not very optimized. Any help is appreciated!

You can use a list comprehension. Assuming your list is ordered, you can take the first and last elements of A as the bounds of the range. We use a set for O(1) lookup complexity within the comprehension.
A = [1.0, 3.0, 6.0, 7.0, 8.0]
A_set = set(A)
res = [i if i in A_set else 0 for i in range(int(A[0]), int(A[-1])+1)]
print(res)
[1, 0, 3, 0, 0, 6, 7, 8]
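However, for larger arrays I'd recommend you use a specialist library such as NumPy. The indexing below assumes positive whole-number values, with each value v stored at index v-1, so the result always starts at 1: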
import numpy as np
A = np.array([1.0, 3.0, 6.0, 7.0, 8.0]).astype(int)
B = np.zeros(A.max())
B[A-1] = A
print(B)
[1. 0. 3. 0. 0. 6. 7. 8.]

Based on comments to the question, I can suggest the following solution:
B = [float(x) if x in A else 0.0 for x in range(int(min(A)), int(max(A)) + 1)]
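If you would rather keep the shape of the loop from the question, a minimal sketch along these lines should also work. It assumes A is sorted ascending and holds whole-number floats; fill_gaps is a name made up for this illustration, not something from the answers above.

```python
def fill_gaps(values, fill=0.0):
    """Return a copy of `values` with `fill` inserted for every missing integer step.

    Assumes `values` is sorted ascending and holds whole-number floats.
    `fill_gaps` is a hypothetical helper name used only for this sketch.
    """
    result = []
    for current, following in zip(values, values[1:]):
        result.append(current)
        # Number of missing steps between two neighbours, e.g. 3.0 -> 6.0 gives 2.
        missing = int(following - current) - 1
        result.extend([fill] * max(missing, 0))
    if values:
        result.append(values[-1])  # the last element has no successor
    return result


A = [1.0, 3.0, 6.0, 7.0, 8.0]
print(fill_gaps(A))  # [1.0, 0.0, 3.0, 0.0, 0.0, 6.0, 7.0, 8.0]
```

Unlike the comprehension over range(), this keeps the original float values and does a single pass over the pairs of neighbours.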

Related

How do I count the last elements in a list after a certain element?

I have a python list containing zeros and ones like this:
a = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
I know how to count the ones and zeros in this, but what I can't figure out is how to count the trailing zeros after the last 1.0 in that list. In this case the solution would be "2".
I would like simple code for this that I can put in a loop.
I hope someone can help me with that. Thank you!
Try this:
a = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
a[::-1].index(1)
You reverse the list, and take the index of the first 1.
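One thing to watch: list.index() raises ValueError when the value is not found, so the one-liner fails on a list with no 1 in it. A small sketch that guards against that is below. The name count_trailing_zeros is made up here, and returning the full length when there is no 1 is an assumption about what you would want in that case.

```python
def count_trailing_zeros(values):
    """Count the zeros after the last 1 in a list of 0s and 1s.

    Hypothetical helper for illustration; assumes the list holds only 0/1 values.
    """
    try:
        # Reversed copy: the index of the first 1 equals the number of trailing zeros.
        return values[::-1].index(1)
    except ValueError:
        # No 1 anywhere. Assumption for this sketch: every element counts as trailing.
        return len(values)


a = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
print(count_trailing_zeros(a))  # 2
```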
An alternative using functional programming:
from itertools import takewhile
from operator import eq
from functools import partial
equal_0 = partial(eq, 0)
a = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
res = sum(1 for _ in takewhile(equal_0, reversed(a)))
print(res)
Output
2
Another solution, if you want to use an explicit for loop:
a = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
cnt = 0
for v in reversed(a):
    if v:
        break
    cnt += 1
print(cnt)
Prints:
2

insert element in a list after every 2 element

I have this code:
l = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
def add(l, n):
    lret = l.copy()
    i = n
    while i <= len(lret):
        lret.insert(i, 0.0)
        i += (n+1)
    return lret
lret = add(l,2)
print(lret)
Is there a way to make the add() function a one-line function?
You can use zip() to build tuples that contain n elements from the original list with an added element, and then use chain.from_iterable() to flatten those tuples into the final list:
from itertools import repeat, chain
def add(lst, n):
    return list(chain.from_iterable(zip(*(lst[s::n] for s in range(n)), repeat(0.0))))
This outputs:
[1.0, 2.0, 0.0, 3.0, 4.0, 0.0, 5.0, 6.0, 0.0]
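Keep in mind that zip() stops at the shortest input, so when len(lst) is not a multiple of n the leftover elements at the end are dropped. If that matters, a chunk-based sketch like the one below should match the loop from the question for any length. The names add_original and add_chunks are made up for this comparison.

```python
from itertools import chain


def add_original(lst, n):
    """The loop from the question, kept as the reference behaviour."""
    out = lst.copy()
    i = n
    while i <= len(out):
        out.insert(i, 0.0)
        i += n + 1
    return out


def add_chunks(lst, n):
    """Hypothetical variant: split into chunks of n, append 0.0 after full chunks only."""
    chunks = (lst[i:i + n] for i in range(0, len(lst), n))
    return list(chain.from_iterable(
        chunk + [0.0] if len(chunk) == n else chunk for chunk in chunks
    ))


print(add_chunks([1.0, 2.0, 3.0, 4.0, 5.0], 2))  # [1.0, 2.0, 0.0, 3.0, 4.0, 0.0, 5.0]
```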
Possible solution if you want the whole block in one line:
def add(l, n):
    # append 0.0 only after chunks that contain a full n elements
    return [x for y in (l[i:i + n] + [0.0] * (i + n <= len(l))
                        for i in range(0, len(l), n)) for x in y]
Gives the same output
[1.0, 2.0, 0.0, 3.0, 4.0, 0.0, 5.0, 6.0, 0.0]
Found another way to do it:
def add(l, n=2):
    # i is 0-based, so the n-th, 2n-th, ... elements are those where (i + 1) % n == 0
    return [item for sublist in [(e, 0.0) if (i + 1) % n == 0 else (e,) for i, e in enumerate(l)] for item in sublist]
[1.0, 2.0, 0.0, 3.0, 4.0, 0.0, 5.0, 6.0, 0.0]

Is there a way to add list elements that is equal to a given number with higher elements

I am trying to add up list elements so that the total is closest to or equal to 15.
I am assuming the 1st element from the list as the total.
It should add the 3rd element to the total, going from top to bottom.
If the total would exceed 15, it should not add that element and should move on to the next iteration.
I am trying the code below. Could you suggest what I am doing wrong?
list1 = [
    [5.0, 1.3, 6.6, 5.076923076923077],
    [9.0, 1.5, 7.0, 4.666666666666667],
    [4.0, 1.0, 4.0, 4.0],
    [3.0, 2.0, 5.5, 2.75],
    [7.0, 1.6, 3.5, 2.1875],
    [2.0, 1.7, 3.5, 2.058823529411765],
    [1.0, 3.0, 6.0, 2.0],
    [6.0, 1.0, 2.0, 2.0],
    [8.0, 2.5, 5.0, 2.0],
    [10.0, 1.8, 1.0, 0.5555555555555556]
]
income = 15
total = 0
for i in list1:
    if not (total + i[1] > 15):
        total += i[1]
print(total)
The output should be 14.9.
The problem is that you use a break. You have to check that adding the current number in your loop will not result in the total sum being more than 15.
income = 15
total = 0
for i in list1:
    if not (total + i[1] > income):
        total += i[1]
However, this code will not always give the best result. It adds numbers greedily in list order, so the total depends on that order: a different order, or a different choice of numbers, might add up to exactly 15. Finding the best combination is a bit more complicated.
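For a list this small, one way to get a result that does not depend on the order is to try every combination. The following is only a rough brute-force sketch (it checks 2^n subsets, so it is not meant for long lists). best_total and second_column are names introduced here, and the values are the second column of list1 from the question.

```python
from itertools import combinations


def best_total(values, limit):
    """Largest sum of any subset of `values` that does not exceed `limit`.

    Hypothetical brute-force helper: it checks every subset, so keep `values` short.
    Sums are plain floats, so a total that is mathematically equal to `limit`
    could be rejected because of rounding.
    """
    best = 0.0
    for size in range(1, len(values) + 1):
        for combo in combinations(values, size):
            total = sum(combo)
            if best < total <= limit:
                best = total
    return best


# Second column of list1 from the question.
second_column = [1.3, 1.5, 1.0, 2.0, 1.6, 1.7, 3.0, 1.0, 2.5, 1.8]
print(round(best_total(second_column, 15), 1))  # 14.9
```

For this data the greedy loop happens to reach the same total, but that is not guaranteed in general.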

query dataframe column on array values

traj0
Out[52]:
              state  action  reward
0   [1.0, 4.0, 6.0]     3.0     4.0
1  [4.0, 6.0, 11.0]     4.0     5.0
2   [6.0, 7.0, 3.0]     3.0    22.0
3   [3.0, 3.0, 2.0]     1.0    10.0
4   [2.0, 9.0, 5.0]     2.0     2.0
Suppose I have a pandas dataframe like this, where the entries of the state column are 3-element numpy arrays.
How can I query for the row that has state equal to np.array([3.0,3.0,2.0]) here?
I know traj0.query("state == '[3.0,3.0,2.0]'") works. But I don't want to hardcode the array value in my query.
I'm looking for something like
x = np.array([3.0,3.0,2.0])
traj0.query('state ==' + x)
=============
It's not a duplicate question: my previous question, "pandas query with a column consisting of array entries", covered only the case where each array held a single value. Here the arrays have multiple values.
You can do this with df.loc and a lambda function using numpy.array_equal:
x = [1., 4., 6.]
traj0.loc[traj0.state.apply(lambda a: np.array_equal(a, x))]
Basically this checks each element of the state column for equivalence to x and returns only those rows where the column matches.
Example
df = pd.DataFrame(data={'state': [[1., 4., 6.], [4., 5., 6.]],
                        'value': [5, 6]})
print(df.loc[df.state.apply(lambda a: np.array_equal(a, x))])
state value
0 [1.0, 4.0, 6.0] 5
import numpy as np
import pandas as pd
df = pd.DataFrame([[np.array([1.0, 4.0, 6.0]), 3.0, 4.0],
                   [np.array([4.0, 6.0, 11.0]), 4.0, 5.0],
                   [np.array([6.0, 7.0, 3.0]), 3.0, 22.0],
                   [np.array([3.0, 3.0, 2.0]), 1.0, 10.0],
                   [np.array([2.0, 9.0, 5.0]), 2.0, 2.0]
                   ], columns=['state','action','reward'])
x = str(np.array([3.0, 3.0, 2.0]))
df[df.state.astype(str) == x]
# to use DataFrame.query
df['state_str'] = df.state.astype(str)
df.query("state_str == '{}'".format(x))
Output
state action reward
3 [3.0, 3.0, 2.0] 1.0 10.0
Best not to use pd.DataFrame.query here. You can perform a vectorised comparison and then use Boolean indexing:
x = [3, 3, 2]
mask = (np.array(df['state'].values.tolist()) == x).all(1)
res = df[mask]
print(res)
state action reward
3 [3.0, 3.0, 2.0] 1.0 10.0
In general, you shouldn't store lists or arrays within a Pandas series. This is inefficient and removes the possibility of direct vectorised operations. Here, we've had to convert to a NumPy array explicitly for a simple comparison.

identify patterns within an array in python

I've got a question on identifying patterns within an array. I'm working with the following array:
A = [1.0, 1.1, 9.0, 9.2, 0.9, 9.1, 1.0, 1.0, 1.2, 9.2, 8.9, 1.1]
Now, this array is clearly made of elements clustering about ~1 and elements about ~9.
Is there a way to separate these clusters? I.e., to get to something like:
a_1 = [1.0, 1.1, 0.9, 1.0, 1.0, 1.2, 1.1] # elements around ~1
a_2 = [9.0, 9.2, 9.1, 9.2, 8.9] # elements around ~9
Thanks a lot. Best.
You can do that by checking which value each element is closer to, 1 or 9:
a_1 = [i for i in A if abs(i-1)<=abs(i-9)]
a_2 = [i for i in A if abs(i-1)>abs(i-9)]
But of course this is not a general solution for clustering. It only works in this case, where you know the centers of the clusters (1 and 9).
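The same idea extends to any number of known centers. Here is a minimal sketch, where split_by_nearest is a made-up name and the centers are assumed to be known in advance:

```python
def split_by_nearest(values, centers):
    """Group each value under the center it is closest to.

    Hypothetical helper for illustration. On a tie, min() keeps the first
    center listed, which mirrors the <= in the two comprehensions.
    """
    groups = {center: [] for center in centers}
    for value in values:
        nearest = min(centers, key=lambda center: abs(value - center))
        groups[nearest].append(value)
    return groups


A = [1.0, 1.1, 9.0, 9.2, 0.9, 9.1, 1.0, 1.0, 1.2, 9.2, 8.9, 1.1]
groups = split_by_nearest(A, [1, 9])
print(groups[1])  # [1.0, 1.1, 0.9, 1.0, 1.0, 1.2, 1.1]
print(groups[9])  # [9.0, 9.2, 9.1, 9.2, 8.9]
```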
If you don't know the centers of the clusters, I think you should use a clustering algorithm like K-Means.
This is a simple K-Means implementation (with k=2 and a limit of 100 iterations). You don't need to know the centers of the clusters; it picks them randomly at first.
from random import sample
A = [1.0, 1.1, 9.0, 9.2, 0.9, 9.1, 1.0, 1.0, 1.2, 9.2, 8.9, 1.1]
# pick two distinct values as starting centers so that neither cluster starts empty
x, y = sample(sorted(set(A)), 2)
for _ in range(100):
    a_1 = [i for i in A if abs(i-x)<=abs(i-y)]
    a_2 = [i for i in A if abs(i-x)>abs(i-y)]
    print(x,y)
    # move each center to the mean of its cluster
    x = sum(a_1)/len(a_1)
    y = sum(a_2)/len(a_2)
print(a_1)
print(a_2)
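If you want the same result on every run, one option is to drop the random start. The sketch below is a variation on the loop above, not a standard library routine: it assumes exactly two clusters, seeds the centers with the smallest and largest values, and stops as soon as the centers no longer move. The name two_means is made up for this example.

```python
def two_means(values, max_iter=100):
    """Split 1-D `values` into two clusters with a deterministic 2-means loop.

    Hypothetical helper: seeds the centers with min(values) and max(values)
    instead of random picks, and assumes `values` is not empty.
    """
    low, high = min(values), max(values)
    if low == high:
        # All values identical: there is nothing to separate.
        return list(values), []
    for _ in range(max_iter):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the mean of the values assigned to it.
        new_low = sum(near_low) / len(near_low)
        new_high = sum(near_high) / len(near_high)
        if (new_low, new_high) == (low, high):
            break  # centers stopped moving, so the assignment is stable
        low, high = new_low, new_high
    return near_low, near_high


A = [1.0, 1.1, 9.0, 9.2, 0.9, 9.1, 1.0, 1.0, 1.2, 9.2, 8.9, 1.1]
a_1, a_2 = two_means(A)
print(a_1)  # [1.0, 1.1, 0.9, 1.0, 1.0, 1.2, 1.1]
print(a_2)  # [9.0, 9.2, 9.1, 9.2, 8.9]
```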
