Hi, I'm looking for a way to split a list based on some values, assuming the list's length equals the sum of those values, e.g.:
list: l = ['a','b','c','d','e','f']
values: v = (1,1,2,2)
so len(l) = sum(v)
and I'd like to have a function to return a tuple or a list, like: (['a'], ['b'], ['c','d'], ['e','f'])
currently my code is like:
(list1,list2,list3,list4) = (
l[0:v[0]],
l[v[0]:v[0]+v[1]],
l[v[0]+v[1]:v[0]+v[1]+v[2]],
l[v[0]+v[1]+v[2]:v[0]+v[1]+v[2]+v[3]])
I'm thinking about making this clearer, but the closest I have so far is this (note the results are incorrect, not what I wanted):
s=0
[list1,list2,list3,list4] = [l[s:s+i] for i in v]
The problem is that I can't increase s while iterating over the values in v. I'm hoping for better code to do this; any suggestion is appreciated, thanks!
If you weren't stuck on ancient Python, I'd point you to itertools.accumulate. Of course, even on ancient Python, you could use the (roughly) equivalent code provided in the docs I linked to do it. Using either the Py3 code or equivalent, you could do:
from itertools import accumulate # Or copy accumulate equivalent Python code
from itertools import chain
# Calls could be inlined in listcomp, but easier to read here
starts = accumulate(chain((0,), v)) # Extra value from starts ignored when ends exhausted
ends = accumulate(v)
list1,list2,list3,list4 = [l[s:e] for s, e in zip(starts, ends)]
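If you want this as a reusable function, here is a minimal sketch of the same idea, assuming Python 3 (where itertools.accumulate exists). The name split_by_sizes is just something I made up for illustration:

```python
from itertools import accumulate

def split_by_sizes(items, sizes):
    """Split items into consecutive chunks whose lengths are given by sizes."""
    ends = list(accumulate(sizes))   # running totals, e.g. 1, 2, 4, 6
    starts = [0] + ends[:-1]         # each chunk starts where the previous one ended
    return [items[s:e] for s, e in zip(starts, ends)]

l = ['a', 'b', 'c', 'd', 'e', 'f']
v = (1, 1, 2, 2)
print(split_by_sizes(l, v))
```

It does not check that len(l) == sum(v); slicing simply returns shorter chunks if the list runs out.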
Maybe make a generator of the values in l?
def make_list(l, v):
    g = (x for x in l)
    if len(l) == sum(v):
        return [[next(g) for _ in range(val)] for val in v]
    return None
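A variant of the same iterator idea, sketched with itertools.islice so you don't have to call next by hand. The function name split_with_islice is hypothetical, and unlike make_list it does not validate the lengths:

```python
from itertools import islice

def split_with_islice(items, sizes):
    """Consume one shared iterator, taking `size` items for each chunk."""
    it = iter(items)
    return [list(islice(it, size)) for size in sizes]

print(split_with_islice(['a', 'b', 'c', 'd', 'e', 'f'], (1, 1, 2, 2)))
```

Because every chunk pulls from the same iterator, each islice call continues where the previous one stopped.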
You could just write a simple loop to iterate over v to generate a result:
l = ['a','b','c','d','e','f']
v = (1,1,2,2)
result = []
offset = 0
for size in v:
    result.append(l[offset:offset+size])
    offset += size
print(result)
Output:
[['a'], ['b'], ['c', 'd'], ['e', 'f']]
The idea here is to use a nested loop. Assuming that your condition always holds true, the logic is to run through v and pick up i elements from l, where i is a number from v.
data = [] # this will hold the final result
index = 0 # this is the start index
for num in v:
    temp = [] # this is a temp array, to hold individual elements in your result array.
    for j in range(index, index+num): # this loop will pickup the next num elements from l
        temp.append(l[j])
    data.append(temp)
    index += num
print(data)
Output:
[['a'], ['b'], ['c', 'd'], ['e', 'f']]
The first answer https://stackoverflow.com/a/39715361/5759063 is the most pythonic way to do it. This is just the algorithmic backbone.
Best I could find is a two line solution:
breaks=[0]+[sum(v[:i+1]) for i in range(len(v))] #build a list of section indices
result=[l[breaks[i]:breaks[i+1]] for i in range(len(breaks)-1)] #split array according to indices
print result
I'm working with a Pandas dataframe, and I need to reduce a column's list of values while preserving alternating duplicates, if they exist, and while preserving order. I'm able to mask the values such that there are only ever two distinct values to work with (e.g., A and B below).
(It's best to show...) I'm looking to define the reduce_list() method below...
dummy_arr_one = ['A','A','B','B','A','A','A','A','B','B','B']
dummy_arr_two = ['A','A','A','B','B','B']
df = pd.DataFrame({"instance":
["group_one" for x in range(0,len(dummy_arr_one))] + ["group_two" for y in range(0,len(dummy_arr_two))],
"value":dummy_arr_one + dummy_arr_two
})
>> x = df[df['instance']=='group_one']['value'].values # ['A','A','B','B','A','A','A','A','B','B','B']
>> y = reduce_list(x)
[output] >> ['A','B','A','B']
OR
>> x = df[df['instance']=='group_two']['value'].values # ['A','A','A','B','B','B']
>> y = reduce_list(x)
[output] >> ['A','B']
I've tried a few approaches with collections and dictionaries, but I can't wrap my head around getting farther than the following (unrelated to collections attempts):
for group in df['instance'].unique():
    val_arr = df[df['instance'] == group]['value'].values
    unique_vals = np.unique(val_arr)
    ...<then what to do?>
since dictionaries need unique keys and I may need to dynamically create the keys (e.g., A_1, B_1, A_2), but then I also need to keep in mind preserving the order.
I feel like I'm overlooking something obvious. So any help is greatly appreciated!
Use itertools.groupby
from itertools import groupby
reduced = [k for k, _ in groupby(df['value'])]
print(reduced)
Output
['A', 'B', 'A', 'B', 'A', 'B']
If you need the result per instance group, group first, then apply it to each group:
res = [[k for k, _ in groupby(vs)] for k, vs in df.groupby('instance')['value']]
print(res)
Output
[['A', 'B', 'A', 'B'], ['A', 'B']]
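To match the reduce_list() name from the question, the same groupby idea can be wrapped in a function. This is only a sketch; it takes any iterable, so plain lists work without pandas:

```python
from itertools import groupby

def reduce_list(values):
    """Collapse each run of equal consecutive values into a single value."""
    return [key for key, _ in groupby(values)]

print(reduce_list(['A', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B']))
print(reduce_list(['A', 'A', 'A', 'B', 'B', 'B']))
```

groupby only merges neighbours, which is why the alternating duplicates survive and the order is preserved.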
It works for the lists; I might have misunderstood the context.
def reduce_list(x):
    unique = []
    reduced = []
    for k in x:
        if k not in unique:
            unique.append(k)
    # now we have the uniques.
    for k in range(len(x)-1):
        if x[k] != x[k+1]:
            reduced.append(x[k])
    # the extra guards cover empty input and input with only one distinct value
    if len(x) > 0 and (not reduced or x[len(x)-1] != reduced[len(reduced)-1]):
        reduced.append(x[len(x)-1])
    return reduced
This is a loop-intensive implementation.
The first loop collects the unique values, which is easy to understand (the result does not depend on it).
The second loop checks whether two consecutive elements are different. If they are, it appends the one at the earlier position to reduced. However, this loop misses the final run of repeated values.
Therefore, you have to add an additional check that compares the last element of x with the last element of reduced and appends it if they differ.
I want to get the intersection of two lists without eliminating duplicates.
I'm hoping for a fast method that doesn't use loops.
Below is my attempt, but it fails because the duplicates are removed.
a = ['a','b','c','f']
b = ['a','b','b','o','k']
tmp = list(set(a) & set(b))
>>> tmp
['b','a']
I want the result to be ['a', 'b', 'b'].
Here, a is a fixed list and b is a list that varies.
The idea is to extract from b the values that also appear in a.
Is there a way to get this intersection without removing duplicate values?
A solution could be
good = set(a)
result = [x for x in b if x in good]
There are two loops here: one is the set-building loop of set (implemented in C, so much faster than an equivalent loop written in Python), and the other is the comprehension, which runs in the interpreter.
The first loop is done to avoid a linear search in a for each element of b (if a becomes big this can be a serious problem).
Note that using filter instead is probably not going to gain much (if anything) because despite the filter loop being in C, for each element it will have to get back to the interpreter to call the filtering function.
Note that if you care about speed, then Python is probably not a good choice. For example, maybe PyPy would be better here, and in that case just writing an optimal algorithm explicitly should be OK (avoiding re-searching a for duplicates when they are consecutive in b, as happens in your example):
good = set(a)
res = []
i = 0
while i < len(b):
    x = b[i]
    if x in good:
        while i < len(b) and b[i] == x: # is?
            res.append(x)
            i += 1
    else:
        i += 1
Of course in performance optimization the only real way is try and measure with real data on the real system... guessing works less and less as technology advances and becomes more complicated.
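As a rough sketch of what "try and measure" could look like, here is a timeit comparison of the list lookup against the set lookup. The enlarged b is made-up test data and the function names are mine; real numbers depend entirely on your data and machine:

```python
import timeit

a = ['a', 'b', 'c', 'f']
b = ['a', 'b', 'b', 'o', 'k'] * 1000   # made-up, enlarged test input

def with_list_lookup():
    # linear search in a for every element of b
    return [x for x in b if x in a]

def with_set_lookup():
    # build the set once, then do constant-time lookups
    good = set(a)
    return [x for x in b if x in good]

for fn in (with_list_lookup, with_set_lookup):
    seconds = timeit.timeit(fn, number=100)
    print(fn.__name__, round(seconds, 4))
```

With a list as short as a, the difference may be small; the set starts to pay off as a grows.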
If you insist on not using for explicitly then this will work:
>>> list(filter(a.__contains__, b))
['a', 'b', 'b']
But directly calling magic methods like __contains__ is not a recommended practice to the best of my knowledge, so consider this instead:
>>> list(filter(lambda x: x in a, b))
['a', 'b', 'b']
And if you want to improve the lookup in a from O(n) to O(1) then create a set of it first:
>>> a_set = set(a)
>>> list(filter(lambda x: x in a_set, b))
['a', 'b', 'b']
>>a = ['a','b','c','f']
>>b = ['a','b','b','o','k']
>>items = set(a)
>>found = [i for i in b if i in items]
>>items
{'f', 'a', 'c', 'b'}
>>found
['a', 'b', 'b']
This should do your work.
I guess it's not faster than a loop and finally you probably still need a loop to extract the result. Anyway...
from collections import Counter
a = ['a','a','b','c','f']
b = ['a','b','b','o','k']
count_b = Counter(b)
count_ab = Counter(x for x in b if x not in a) # elements of b that are missing from a, with their full counts
res = count_b - count_ab
res
#=> Counter({'b': 2, 'a': 1})
I mean, with res holding the result, you then need:
[ val for sublist in [ [s] * n for s, n in res.items() ] for val in sublist ]
#=> ['a', 'b', 'b']
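If the result is already a Counter, Counter.elements() does the same expansion as the nested comprehension above. A small sketch, assuming (as in the question) that you want every occurrence in b of a value that also appears in a:

```python
from collections import Counter

a = ['a', 'a', 'b', 'c', 'f']
b = ['a', 'b', 'b', 'o', 'k']

a_set = set(a)
res = Counter(x for x in b if x in a_set)   # count only the values of b found in a
flat = list(res.elements())                 # repeat each value by its count
print(flat)
```

Note that this groups equal values together, so the original order of b is only kept when duplicates are adjacent.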
It isn't clear how duplicates are handled when performing an intersection of lists which contain duplicate elements, as you have given only one test case and its expected result, and you did not explain duplicate handling.
Going by your single example, the common elements are 'a' and 'b', and the intersection list contains 'a' with multiplicity 1 and 'b' with multiplicity 2. Note that 'a' occurs once in both a and b, but 'b' occurs twice in b. So I assume the intersection lists each common element with the highest multiplicity it has in any of the lists.
The answer is yes. However, a loop may implicitly be called - though you want your code to not explicitly use any loop statements. This algorithm, however, will always be iterative.
Step 1: Create the intersection set, intersect, which does not contain duplicates (you have already done that). Convert it to a list so it can be indexed.
Step 2: Create a second list, intersectD. For each common element, compute freq, the maximum number of occurrences of that element in any list, using count. Then append the element intersect[k] to intersectD freq times.
An example code with 3 lists would be
a = ['a','b','c','1','1','1','1','2','3','o']
b = ['a','b','b','o','1','o','1']
c = ['a','a','a','b','1','2']
intersect = list(set(a) & set(b) & set(c)) # 3-set case
intersectD = []
for k in range(len(intersect)):
    cmn = intersect[k]
    freq = max(a.count(cmn), b.count(cmn), c.count(cmn)) # 3-set case
    for i in range(freq): # Can be done with itertools
        intersectD.append(cmn)
>>> intersectD
['b', 'b', 'a', 'a', 'a', '1', '1', '1', '1']
For cases involving more than two lists, freq for this common element can be computed using a more complex set intersection and max expression. If using a list of lists, freq can be computed using an inner loop. You can also replace the inner i-loop with an itertools expression from How can I count the occurrences of a list item?.
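For the list-of-lists case, a sketch of how the two steps might be generalized. The name intersect_keep_max is hypothetical, and it follows my assumption above that each common element is repeated by its maximum multiplicity:

```python
def intersect_keep_max(lists):
    """Common elements of all lists, each repeated by its highest count in any list."""
    common = set(lists[0]).intersection(*lists[1:])
    result = []
    for item in sorted(common):   # sorted only to make the output order repeatable
        freq = max(lst.count(item) for lst in lists)
        result.extend([item] * freq)
    return result

a = ['a', 'b', 'c', '1', '1', '1', '1', '2', '3', 'o']
b = ['a', 'b', 'b', 'o', '1', 'o', '1']
c = ['a', 'a', 'a', 'b', '1', '2']
print(intersect_keep_max([a, b, c]))
# ['1', '1', '1', '1', 'a', 'a', 'a', 'b', 'b']
```

Using extend with [item] * freq replaces the inner i-loop.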
x = [['a','b'],['c','d','g'],['e','f','h','i','j'].......['zzy','xxx']]
If I got a compound list like this (a large list) in Python, how can I elegantly remove only, say, the element 'c' without removing the whole element ['c','d','g'] together?
Obviously merely list.remove() doesn't work for this, and implementing a for loop works
for i in x:
    for j in i:
        if j == 'c':
            i = i.remove(j)
but is computationally expensive since it's a very long list...
thank you
The only improvement I could find to your code is:
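x = [['a','b'],['c','d','g'],['e','f','h','i','j'],['zzy','xxx']]
def remove(compound_list, elem):
    for lst in compound_list:
        # walk the indices backwards so that deleting an item does not skip the next one
        for ix in range(len(lst) - 1, -1, -1):
            if lst[ix] == elem:
                del lst[ix]
remove(x, 'c')
It avoids the second search that i.remove(j) performs for every match, but each del still shifts the tail of the sublist, so the worst case remains quadratic in the length of a sublist.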
You can find the "smallest" element using min (provide a custom key function if you want to change the definition of smallest).
An O(n) solution (where n is the total number of elements in all the lists) can be to rebuild each list by omitting the smallest element:
x = [['a','b'],['b','b'],['c','c']]
smallest = min(min(l) for l in x)
x = [[e for e in l if e != smallest] for l in x]
print(x)
# [['b'], ['b', 'b'], ['c', 'c']]
I have a list of lists as follows:
list=[]
*some code to append elements to list*
list=[['a','bob'],['a','bob'],['a','john']]
I want to go through this list and change all instances of 'bob' to 'b' and leave the others unchanged.
for x in list:
    for a in x:
        if "bob" in a:
            a.replace("bob", 'b')
After printing, list is still the same as before, rather than the following:
list=[['a','b'],['a','b'],['a','john']]
Why is the change not being reflected in list?
Because str.replace doesn't work in place; it returns a copy. Strings are immutable, so you need to assign the new string back to the element in your list of lists.
You can assign directly to your list of lists if you extract indexing integers via enumerate:
L = [['a','bob'],['a','bob'],['a','john']]
for i, x in enumerate(L):
    for j, a in enumerate(x):
        if 'bob' in a:
            L[i][j] = a.replace('bob', 'b')
Result:
[['a', 'b'], ['a', 'b'], ['a', 'john']]
More Pythonic would be to use a list comprehension to create a new list. For example, if only the second of two values contains names which need checking:
L = [[i, j if j != 'bob' else 'b'] for i, j in L]
You can try using a dictionary object of python
import numpy as np
L = [['a','bob'],['a','bob'],['a','john']]
dic = {'bob':'b'} # you can specify more changes here
new_list = [dic.get(n, n) for n in np.concatenate(L)]
print(np.reshape(new_list,[-1,2]).tolist())
Result is
[['a', 'b'], ['a', 'b'], ['a', 'john']]
I'm going to use a simple example, but basically x is another variable and isn't linked to the list element. You have to change the list element directly in order to alter the list.
l=[1,2,3,4]
for x in l:
    x=x+1
This doesn't change the list.
l=[1,2,3,4]
for i,x in enumerate(l):
    l[i]=x+1
This changes the list.
I might be a little late to the party, but a more Pythonic way of doing this is to use map inside a list comprehension. It works on a list of lists with any number of values.
l = [['a','bob'],['a','bob'],['a','john']]
[list(map(lambda x: x if x != 'bob' else 'b', i)) for i in l]
it gives you the desired output
[['a', 'b'], ['a', 'b'], ['a', 'john']]
The main idea is that the list comprehension iterates over the outer list, and map applies the simple lambda function to each element of the inner list to perform the replacement.
I hope that this helps anyone else who is looking out for something similar.
This is the case because you are only changing the temporary variable a.
list = [1,2,3]
for i in list:
    i+=1
list will still be [1,2,3].
You have to edit the string through its index in the list.
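A small sketch that puts both points side by side, using a shortened made-up list: the first loop throws away what replace returns, the second one writes it back through the index.

```python
names = ['a', 'bob']

for name in names:
    name.replace('bob', 'b')             # returns a new string that is never stored
unchanged = list(names)                  # still ['a', 'bob']

for i, name in enumerate(names):
    names[i] = name.replace('bob', 'b')  # assigning by index updates the list itself

print(unchanged, names)
```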
I'm going through Problem 3 of the MIT lead python course, and I have an admittedly long drawn out script that feels like it's getting close. I need to print the longest substring of s in which the letters occur in alphabetical order. I'm able to pull out any characters that are in alphabetical order with regards to the character next to it. What I need to see is:
Input : 'aezcbobobegghakl'
needed output: 'beggh'
my output: ['a', 'e', 'b', 'b', 'b', 'e', 'g', 'g', 'a', 'k']
My code:
s = 'aezcbobobegghakl'
a = 'abcdefghijklmnopqrstuvwxyz'
len_a = len(a)
len_s = len(s)
number_list = []
letter_list = []
for i in range(len(s)):
    n = 0
    letter = s[i+n]
    if letter in a:
        number_list.append(a.index(letter))
        n += 1
print(number_list)
for i in number_list:
    letter_list.append(a[i])
print(letter_list)
index_list = []
for i in range(len(letter_list)):
    index_list.append(i)
print(index_list)
first_check = []
for i in range(len(letter_list)-1):
    while number_list[i] <= number_list[i+1]:
        print(letter_list[i])
        first_check.append(letter_list[i])
        break
print(first_check)
I know after looking that there are much shorter and completely different ways to solve the problem, but for the sake of my understanding, is it even possible to finish this code to get the output I'm looking for? Or is this just a lost cause rabbit hole I've dug?
I would build a generator to output all the runs of characters such that l[i] >= l[i-1]. Then find the longest of those runs. Something like
def runs(l):
    it = iter(l)
    try:
        run = [next(it)]
    except StopIteration:
        return
    for i in it:
        if i >= run[-1]:
            run.append(i)
        else:
            yield run
            run = [i]
    yield run
def longest_increasing(l):
    return ''.join(max(runs(l), key=len))
Edit: Notes on your code
for i in range(len(s)):
    n = 0
    letter = s[i+n]
    if letter in a:
        number_list.append(a.index(letter))
        n += 1
is getting the "number value" for each letter. You can use the ord function to simplify this
number_list = [ord(c) - 97 for c in s if c.islower()]
You never use index_list, and you never should. Look into the enumerate function.
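In case enumerate is new to you, a quick sketch on a shortened sample input: it hands you the index together with the element, so a separate index_list is not needed.

```python
letter_list = list('aezcb')   # shortened sample input, for illustration only

for index, letter in enumerate(letter_list):
    print(index, letter)

pairs = list(enumerate(letter_list))   # [(0, 'a'), (1, 'e'), (2, 'z'), (3, 'c'), (4, 'b')]
```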
first_check = []
for i in range(len(letter_list)-1):
    while number_list[i] <= number_list[i+1]:
        print(letter_list[i])
        first_check.append(letter_list[i])
        break
this part doesn't make a ton of sense. You break out of the while loop every time, so it's basically an if. You have no way of keeping track of more than one run. You have no mechanism here for comparing runs of characters against one another. I think you might be trying to do something like
max_run = []
for i in range(len(letter_list)):
    run = []
    for j in range(i, len(letter_list)):
        run.append(letter_list[j])
        # stop at the end of the list or when the next letter breaks the order
        if j + 1 == len(letter_list) or letter_list[j] > letter_list[j+1]:
            break
    if len(run) > len(max_run):
        max_run = run
The above can be improved in a lot of ways. Note that it can revisit the same characters once for every starting position, making it an n**2 solution in the worst case. Also, I'm not sure why you need number_list, as strings can be compared directly.
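For comparison, here is a sketch of a single-pass version that only tracks indices and compares characters directly. The name longest_run is mine, and on ties it keeps the first longest run:

```python
def longest_run(s):
    """Return the longest substring of s whose characters are in non-decreasing order."""
    if not s:
        return ''
    best_start, best_len = 0, 1
    start = 0                      # index where the current run began
    for i in range(1, len(s)):
        if s[i] < s[i - 1]:        # order broken, so a new run starts here
            start = i
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return s[best_start:best_start + best_len]

print(longest_run('aezcbobobegghakl'))   # beggh
```

Each character is looked at once, so this is linear in len(s).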
What about a simple recursive approach :
data = 'ezcbobobegghakl'
words=list(data)
string_s=list(map(chr,range(97,123)))
final_=[]
def ok(list_1,list_2):
    if not list_1:
        return 0
    else:
        first = list_1[0]
        chunks = list_2[list_2.index(first):]
        track = []
        for j, i in enumerate(list_1):
            if i in chunks:
                track.append(i)
                chunks=list_2[list_2.index(i):]
            else:
                final_.append(track)
                return ok(list_1[j:],list_2)
        final_.append(track)
ok(words,string_s) # fills final_ with every run; the return value is not needed
print(max(final_,key=lambda x:len(x)))
output:
['b', 'e', 'g', 'g', 'h']
You can build all substrings of the input string and then keep those that are sorted alphabetically. To determine whether a substring is sorted alphabetically, sort it by position in the alphabet and check that the sorted string equals the original:
from string import ascii_lowercase as l
s = 'aezcbobobegghakl'
substrings = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)} # every non-empty substring, including those ending at the last character
final_substring = max([i for i in substrings if i == ''.join(sorted(i, key=l.index))], key=len)
Output:
'beggh'
This is one way of getting the job done:
s = 'aezcbobobegghakl'
l = list(s)
run = []
allrun = []
element = 'a'
for e in l:
    if e >= element:
        run.append(e)
        element = e
    else:
        allrun.append(run)
        run = [e]
        element = e
allrun.append(run) # don't forget the run that is still open when the loop ends
lengths = [len(e) for e in allrun]
result = ''.join(allrun[lengths.index(max(lengths))])
"run" is basically an uninterrupted run; it keeps growing as long as each new element is at least as big as the previous one ("b" is bigger than "a", just string comparison), and resets otherwise.
"allrun" contains all "run"s, which looks like this:
[['a', 'e', 'z'], ['c'], ['b', 'o'], ['b', 'o'], ['b', 'e', 'g', 'g', 'h'], ['a', 'k', 'l']]
"result" finally picks the longest "run" in "allrun", and merges it into one string.
Regarding your code:
It is very very inefficient, I would not proceed with it. I would adopt one of the posted solutions.
Your number_list can be written as [a.index(_) for _ in s], one liner.
Your letter_list is actually just list(s), and you are using a loop for that!
Your index_list, what does it even do? It is equivalent to range(len(letter_list)), so what are you aiming with the append in the loop?
Finally, the way you write loops reminds me of MATLAB. You can iterate directly over the elements of a list; there is no need to iterate over indices and fetch the corresponding element.
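As an illustration of iterating over elements instead of indices, this sketch rebuilds your first_check list by pairing each character with its neighbour via zip. It is only meant to show the looping style, not to solve the whole problem:

```python
s = 'aezcbobobegghakl'

# zip(s, s[1:]) yields each character together with the one that follows it
first_check = [prev for prev, cur in zip(s, s[1:]) if prev <= cur]
print(first_check)
# ['a', 'e', 'b', 'b', 'b', 'e', 'g', 'g', 'a', 'k']
```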