Python, appending to a list using conditions - python

I have a file with two columns, let's say A and B
A B
1 10
0 11
0 12
0 15
1 90
0 41
I want to create a new column (a list); let's call the empty list C = []
I would like to loop through A and, whenever A == 1, take the corresponding value of B (10 in the first case) and keep appending that value to C until the next A == 1 arrives.
So my final result would be:
A B C
1 10 10
0 11 10
0 12 10
0 15 10
1 90 90
0 41 90
I have tried using the for loop, but only to my dismay:
for a in A:
    if a == 1:
        C.append(B[a==1])
    elif a == 0:
        C.append(B[a==1])

You could use another variable to keep the value of the last index in A that had a value of 1, and update it when the condition is met:
temp = 0
for index, value in enumerate(A):
    if value == 1:
        C.append(B[index])
        temp = index
    else:
        C.append(B[temp])
enumerate() yields (index, value) pairs from an iterable.
For A, list(enumerate(A)) is [(0, 1), (1, 0), (2, 0), (3, 0), (4, 1), (5, 0)].
P.S: When you index a list with a boolean, as in B[a == 1], Python treats False as 0 and True as 1. So B[a == 1] returns B[0] when the condition is false (B[False] => B[0]) and B[1] when it is true (B[True] => B[1]), which is why your loop never picks the value you expect.
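The same idea can be sketched without indices by remembering the last value of B seen where A was 1. This is a minimal sketch, not the answerer's code: the function name forward_fill is made up for illustration, and returning None before the first 1 is an assumption.

```python
def forward_fill(flags, values):
    """For each position, return the value seen at the most recent flag == 1."""
    result = []
    last = None  # assumption: None until the first flag == 1 is seen
    for flag, value in zip(flags, values):
        if flag == 1:
            last = value  # a new group starts here
        result.append(last)
    return result


A = [1, 0, 0, 0, 1, 0]
B = [10, 11, 12, 15, 90, 41]
C = forward_fill(A, B)
print(C)  # [10, 10, 10, 10, 90, 90]
```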

You may also try using groupby.
Though the solution I have come up with looks a bit convoluted to me:
>>> from itertools import groupby, repeat
>>> from operator import itemgetter
>>> def gen_group(L):
...     acc = 0
...     for item in L:
...         acc += item
...         yield acc
...
>>> [number_out for number, length in ((next(items)[1], 1 + sum(1 for _ in items)) for group, items in groupby(zip(gen_group(A), B), itemgetter(0))) for number_out in repeat(number, length)]
[10, 10, 10, 10, 90, 90]
The idea is to prepare groups and then use them to group your input:
>>> list(gen_group(A))
[1, 1, 1, 1, 2, 2]
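A more spread-out variant of the same grouping idea, assuming Python 3, where itertools.accumulate can stand in for the running-sum generator. This is only a sketch; the variable names are illustrative.

```python
from itertools import accumulate, groupby

A = [1, 0, 0, 0, 1, 0]
B = [10, 11, 12, 15, 90, 41]

# Running sum of A: every 1 starts a new group id.
group_ids = list(accumulate(A))

C = []
for _, pairs in groupby(zip(group_ids, B), key=lambda pair: pair[0]):
    pairs = list(pairs)
    first_value = pairs[0][1]              # the B value where A == 1
    C.extend([first_value] * len(pairs))   # repeat it for the whole group

print(group_ids)  # [1, 1, 1, 1, 2, 2]
print(C)          # [10, 10, 10, 10, 90, 90]
```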

Related

Find largest cycle of 'TRUE' booleans

I need to count the largest cycle of 'TRUE' in a boolean
I have a boolean Series with several TRUE sequences. I would like to be able to identify the largest cycle of TRUE values.
E.G: [0,0,1,1,0,0,0,0,0,0,1,1,1,1,1]
I would like to have the cycle: [10,14]
My first approach would be to compare element by element and take the index of each true value. The problem I see with this it's that I'm working with a considerably large dataset so I'm afraid it will take a long time.
Do you guys have any other idea that might work?
Thanks :)
One possible solution with no loops is to count consecutive 1s (or Trues), find the index where that running count reaches its maximum, and subtract the maximum from that index to get the start of the longest group:
import pandas as pd
s = pd.Series([0,0,1,1,0,0,0,0,0,0,1,1,1,1,1])
print (s)
a = s == 1
b = a.cumsum()
c = b.sub(b.mask(a).ffill().fillna(0)).astype(int)
print (c)
0 0
1 0
2 1
3 2
4 0
5 0
6 0
7 0
8 0
9 0
10 1
11 2
12 3
13 4
14 5
dtype: int32
m = c.max()
idx = c.index[c == m]
print (idx)
Int64Index([14], dtype='int64')
out = list(zip(idx - m + 1, idx))
print (out)
[(10, 14)]
Another idea uses itertools.groupby: enumerate the values to keep their positions, collect each group of 1s as a list, take the list with maximum length, and read off its minimal and maximal indices:
s = pd.Series([0,0,1,1,0,0,0,0,0,0,1,1,1,1,1])
print (s)
from itertools import groupby
a = [ list(group) for key, group in groupby(enumerate(s), key= lambda x:x[1]) if key]
print (a)
[[(2, 1), (3, 1)], [(10, 1), (11, 1), (12, 1), (13, 1), (14, 1)]]
L=[x[0] for x in max(a, key=len)]
out = [min(L), max(L)]
print (out)
[10, 14]
It looks like you'll have to go through the whole dataset somehow. But you don't need the index of each True value. You just need the index of the final one in the longest streak.
Note that if there's a tie, this will only report the earliest streak, because a later streak has to be strictly longer to replace it.
my_bools = [0,0,1,1,0,0,0,0,0,0,1,1,1,1,1]
max_streak = 0        # length of the longest streak so far
cur_streak = 0        # length of the streak currently being counted
max_streak_idx = -1   # index of the last element of the longest streak
for x in range(len(my_bools)):
    if my_bools[x]:
        cur_streak += 1
        if cur_streak > max_streak:
            max_streak_idx = x
            max_streak = cur_streak
    else:
        cur_streak = 0
    print(x, cur_streak, max_streak)  # debug trace of the running counts
if max_streak_idx == -1:
    print("No trues found")
else:
    print("Start of max = ", max_streak_idx - max_streak + 1, "End of max = ", max_streak_idx)
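The single-pass scan above could also be wrapped in a function that returns the pair of indices instead of printing. This is a hedged sketch; the name longest_true_run and the choice to return None when there is no true value are assumptions made for illustration.

```python
def longest_true_run(flags):
    """Return (start, end) of the longest run of truthy values, or None."""
    best_len = 0   # length of the longest run so far
    best_end = -1  # index of the last element of that run
    current = 0    # length of the run being counted
    for i, flag in enumerate(flags):
        if flag:
            current += 1
            if current > best_len:  # strict: the earliest run wins a tie
                best_len = current
                best_end = i
        else:
            current = 0
    if best_end == -1:
        return None
    return (best_end - best_len + 1, best_end)


print(longest_true_run([0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]))  # (10, 14)
```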

How to use 2 index variable in a single for loop in python

In C we can use two index variables in a single for loop, like below.
for (i = 0, j = 0; i < i_max && j < j_max; i++, j++)
Can someone tell me how to do this in Python?
With zip we can achieve this.
>>> i_max = 5
>>> j_max = 7
>>> for i, j in zip(range(0, i_max), range(0, j_max)):
...     print(str(i) + ", " + str(j))
...
0, 0
1, 1
2, 2
3, 3
4, 4
If the first answer does not suffice, one option that accounts for lists of different sizes is to pad the shorter one with None:
a = list(range(5))
b = list(range(15))
for i, j in zip(a + [None] * (len(b) - len(a)), b + [None] * (len(a) - len(b))):
    print(i, j)
Or, if one wants to cycle around the shorter list:
from itertools import cycle
for i, j in zip(range(5), cycle(range(2))):
    print(i, j)
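For the different-sizes case, the standard library's itertools.zip_longest (Python 3) does the None padding itself. A short sketch with made-up list sizes:

```python
from itertools import zip_longest

a = list(range(3))
b = list(range(5))

# zip_longest keeps going until the longer input is exhausted and
# fills the missing positions with fillvalue (None by default).
padded = list(zip_longest(a, b))
for i, j in padded:
    print(i, j)
```

A different filler can be passed with the fillvalue keyword argument, for example zip_longest(a, b, fillvalue=-1).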
Another possible way is to iterate over a list comprehension that builds the pairs. Note that this reproduces a nested loop (every combination of the two ranges), not the lock-step iteration of the C example.
For example, the nested loop
for r in range(1, 3):
    for c in range(5, 7):
        print(r, c)
which produces
# 1 5
# 1 6
# 2 5
# 2 6
can be written as a single loop:
for i, j in [[_i, _j] for _i in range(1, 3) for _j in range(5, 7)]:
    print(i, j)
Maybe, sometimes one more line is not so bad.
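If every combination of the two ranges is what you want, itertools.product expresses the same nested iteration without building the pairs by hand. A minimal sketch using the same ranges:

```python
from itertools import product

# product yields tuples in the same order as the nested for loops.
pairs = list(product(range(1, 3), range(5, 7)))
for r, c in pairs:
    print(r, c)
```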
You can do this in Python by using the syntax for i,j in iterable:. Of course in this case the iterable must return a pair of values. So for your example, you have to:
1. define the list of values for i and for j
2. build the iterable returning a pair of values from both lists: zip() is your friend in this case (if the lists have different sizes, it stops at the last element of the shortest one)
3. use the syntax for i,j in iterable:
Here is an example:
i_max = 7
j_max = 9
i_values = range(i_max)
j_values = range(j_max)
for i,j in zip(i_values,j_values):
    print(i,j)
# 0 0
# 1 1
# 2 2
# 3 3
# 4 4
# 5 5
# 6 6
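When the two C indices start at different values or advance by different steps, each index can get its own range before zipping. The bounds and steps below are made-up illustration values, not taken from the question:

```python
# Illustrative C loop: for (i = 0, j = 10; i < 5 && j > 0; i++, j -= 3)
pairs = []
for i, j in zip(range(0, 5), range(10, 0, -3)):
    pairs.append((i, j))
    print(i, j)
# The loop ends as soon as either range runs out, like the && condition in C.
```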

Show the number that appears the most times in a row

How to code a program that shows me the item that appears most side-by-side?
Example:
6 1 6 4 4 4 6 6
I want four, not six, because there are only two sixes together.
This is what I tried (from comments):
c = int(input())
h = []
for c in range(c):
h.append(int(input()))
final = []
n = 0
for x in range(c-1):
c = x
if h[x] == h[x+1]:
n+=1
while h[x] != h[c]:
n+=1
final.append([h[c],n])
print(final)
Depends on what exactly you want for an input like
lst = [1, 1, 1, 2, 2, 2, 2, 1, 1, 1]
If you consider the four 2 the most common, because it's the longest unbroken stretch of same items, then you can groupby same values and pick the one with max len:
max((len(list(g)), k) for k, g in itertools.groupby(lst))
# (4, 2) # meaning 2 appeared 4 times
If you are interested in the element that appears the most often next to itself, you can zip the list to get pairs of adjacent items, filter those that are same, pass them through a Counter, and get the most_common:
collections.Counter((x,y) for (x,y) in zip(lst, lst[1:]) if x == y).most_common(1)
# [((1, 1), 4)] # meaning (1,1) appeared 4 times
For your example of 6 1 6 4 4 4 6 6, both will return 4.
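The first approach can be sketched as a small function that reports both the value and the length of its run. The name longest_run is made up for illustration, and the input is assumed to be non-empty (max raises ValueError otherwise).

```python
from itertools import groupby


def longest_run(items):
    """Return (value, run_length) for the longest run of equal adjacent items."""
    runs = [(key, sum(1 for _ in group)) for key, group in groupby(items)]
    # max returns the first maximal element, so the earliest run wins a tie.
    return max(runs, key=lambda run: run[1])


print(longest_run([6, 1, 6, 4, 4, 4, 6, 6]))  # (4, 3)
```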
arr = [6, 1, 6, 4, 4, 4, 6, 6]
maxcount = 0  # length of the longest side-by-side run so far
num = None    # element that forms that run
i = 0
while i < len(arr):
    j = i
    # advance j past the run of elements equal to arr[i]
    while j < len(arr) and arr[j] == arr[i]:
        j += 1
    count = j - i
    if count >= maxcount:  # on a tie the later run wins
        maxcount = count
        num = arr[i]
    i = j  # continue with the next run
print(num, maxcount)  # 4 3

Constructing Lists

I'm new to Python and I came across the following query. Can anyone explain why the following:
[ n**2 for n in range(1, 6)]
gives:
[1, 4, 9, 16, 25]
It is called a list comprehension. What is happening is similar to the following:
results = []
for n in range(1,6):
    results.append(n**2)
It therefore iterates through the values [1, 2, 3, 4, 5] (range(1, 6) starts at 1 and stops before 6) and squares each value. The result of the squaring is then added to the results list, and you get back the result you see (which is equivalent to 1**2, 2**2, 3**2, etc., where the **2 means 'raised to the second power').
This structure (populating a list with values based on some other criteria) is a common one in Python, so the list comprehension provides a shorthand syntax for allowing you to do so.
Breaking it down into manageable chunks in the interpreter:
>>> list(range(1, 6))
[1, 2, 3, 4, 5]
>>> 2 ** 2 # `x ** 2` means `x * x`
4
>>> 3 ** 2
9
>>> for n in range(1, 6):
...     print(n)
1
2
3
4
5
>>> for n in range(1, 6):
...     print(n ** 2)
1
4
9
16
25
>>> [n ** 2 for n in range(1, 6)]
[1, 4, 9, 16, 25]
So that's a list comprehension.
If you break it down into 3 parts, separated by the words 'for' and 'in':
e.g.
[ 1 for 2 in 3 ]
Reading it backwards is probably easiest:
3 - This is the list of input into the whole operation
2 - This is a single item from that list
1 - This is the operation to do on that item
Parts 1 and 2 are run multiple times, once for each item in the list that part 3 gives us. The outputs of part 1, collected in order, form the output of the whole operation.
So in your example:
3 - Generates a list: [1, 2, 3, 4, 5] (range runs from the first parameter to one before the second parameter)
2 - 'n' represents a single number in that list
1 - Computes n**2 (n to the power of 2) for that number
So an equivalent code would be:
result = []
for n in range(1, 6):
    result.append(n**2)
Finally breaking it all out:
input = [1, 2, 3, 4, 5]
output = []
v = input[0] # value is 1
o = v**2 # 1 to the power of two is 1
output.append(o)
v = input[1] # value is 2
o = v**2 # 2 to the power of two = (2*2) = 4
output.append(o)
v = input[2] # value is 3
o = v**2 # 3 to the power of two is = (3*3) = 9
output.append(o)
v = input[3] # value is 4
o = v**2 # 4 to the power of two is = (4*4) = 16
output.append(o)
v = input[4] # value is 5
o = v**2 # 5 to the power of two is = (5*5) = 25
output.append(o)
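The three-part reading is not specific to squaring numbers. As a small sketch with made-up data, here is the same pattern with a different operation, next to its explicit loop:

```python
words = ["alpha", "beta", "gamma"]  # part 3: the input list (hypothetical data)

# part 1 is len(word), part 2 is word, part 3 is words
lengths = [len(word) for word in words]

# the same thing written as an explicit loop
lengths_loop = []
for word in words:
    lengths_loop.append(len(word))

print(lengths)       # [5, 4, 5]
print(lengths_loop)  # [5, 4, 5]
```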

Get value at list/array index or "None" if out of range in Python

Is there a clean way to get the value at a list index or None if the index is out of range in Python?
The obvious way to do it would be this:
if len(the_list) > i:
    return the_list[i]
else:
    return None
However, the verbosity reduces code readability. Is there a clean, simple, one-liner that can be used instead?
Try:
try:
    return the_list[i]
except IndexError:
    return None
Or, one liner:
l[i] if i < len(l) else None
Example:
>>> l=list(range(5))
>>> i=6
>>> print(l[i] if i < len(l) else None)
None
>>> i=2
>>> print(l[i] if i < len(l) else None)
2
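Since the try/except version uses return, it has to live inside a function. A minimal sketch of such a helper; the name safe_get and the default parameter are assumptions for illustration:

```python
def safe_get(seq, index, default=None):
    """Return seq[index], or default if the index is out of range."""
    try:
        return seq[index]
    except IndexError:
        return default


numbers = list(range(5))
print(safe_get(numbers, 2))   # 2
print(safe_get(numbers, 6))   # None
print(safe_get(numbers, -1))  # 4
```

Unlike the i < len(l) one-liner, this keeps Python's usual meaning of negative indices and also returns the default for indices that are too negative, such as -6 here.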
I find list slices good for this:
>>> x = [1, 2, 3]
>>> a = x [1:2]
>>> a
[2]
>>> b = x [4:5]
>>> b
[]
So, always access x[i:i+1], if you want x[i]. You'll get a list with the required element if it exists. Otherwise, you get an empty list.
If you are dealing with small lists, you do not need to add an if statement or something of the sorts. An easy solution is to transform the list into a dict. Then you can use dict.get:
table = dict(enumerate(the_list))
return table.get(i)
You can even set another default value than None, using the second argument to dict.get. For example, use table.get(i, 'unknown') to return 'unknown' if the index is out of range.
Note that this method does not work with negative indices.
Combining slicing and iterating
next(iter(the_list[i:i+1]), None)
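One caveat of the slicing approach concerns negative indices: the_list[-1:0] is an empty slice, so i = -1 yields the default even though the_list[-1] exists. A small sketch to show this; the helper name get_by_slice is made up for illustration:

```python
def get_by_slice(seq, i, default=None):
    """Return seq[i] via a one-element slice, or default if the slice is empty."""
    return next(iter(seq[i:i + 1]), default)


items = [10, 20, 30]
print(get_by_slice(items, 1))   # 20
print(get_by_slice(items, 5))   # None
print(get_by_slice(items, -2))  # 20
print(get_by_slice(items, -1))  # None, because items[-1:0] is empty
```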
For your purposes you can leave out the else part, as None is returned by default if the condition is not met.
def return_ele(x, i):
    if len(x) > i: return x[i]
Result
>>> x = [2,3,4]
>>> b = return_ele(x, 2)
>>> b
4
>>> b = return_ele(x, 5)
>>> b
>>> type(b)
<type 'NoneType'>
return the_list[i] if len(the_list) > i else None
1. if...else...
l = [1, 2, 3, 4, 5]
for i, current in enumerate(l):
    following = l[i + 1] if i + 1 < len(l) else None
    print(current, following)
# 1 2
# 2 3
# 3 4
# 4 5
# 5 None
2. try...except...
l = [1, 2, 3, 4, 5]
for i, current in enumerate(l):
    try:
        following = l[i + 1]
    except IndexError:
        following = None
    print(current, following)
# 1 2
# 2 3
# 3 4
# 4 5
# 5 None
3. dict
suitable for small list
l = [1, 2, 3, 4, 5]
dl = dict(enumerate(l))
for i, current in enumerate(l):
    following = dl.get(i + 1)
    print(current, following)
# 1 2
# 2 3
# 3 4
# 4 5
# 5 None
4. List slicing
l = [1, 2, 3, 4, 5]
for i, current in enumerate(l):
    following = next(iter(l[i + 1:i + 2]), None)
    print(current, following)
# 1 2
# 2 3
# 3 4
# 4 5
# 5 None
5. itertools.zip_longest
from itertools import zip_longest
l = [1, 2, 3, 4, 5]
for i, (current, following) in enumerate(zip_longest(l, l[1:])):
    print(current, following)
# 1 2
# 2 3
# 3 4
# 4 5
# 5 None
Timing with the Jupyter %%timeit magic command
Setup:
from itertools import zip_longest
l = list(range(10000000))
Result:
Method                   Time
if...else...             2.62 s
try...except...          1.14 s
dict                     2.61 s
List slicing             3.75 s
itertools.zip_longest    1.14 s
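Outside Jupyter, a comparison like this could be approximated with the standard timeit module. The sketch below is an assumption about how one might set it up, not the measurement code used above; it uses a much smaller list and only two of the methods, and the timings it prints will vary by machine.

```python
import timeit
from itertools import zip_longest

l = list(range(100_000))  # far smaller than the 10,000,000 items timed above


def with_if_else():
    return [l[i + 1] if i + 1 < len(l) else None for i in range(len(l))]


def with_zip_longest():
    return [following for _, following in zip_longest(l, l[1:])]


for func in (with_if_else, with_zip_longest):
    seconds = timeit.timeit(func, number=3)  # total time for 3 runs
    print(f"{func.__name__}: {seconds:.3f} s for 3 runs")
```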
Another one-liner:
return (the_list + [None] * (i + 1))[i]
