I know the basic way that enumerate works, but what difference does it make when you have two variables in the for loop? I used count and i in the examples below.
This code:
Letters = ['a', 'b', 'c']
for count, i in enumerate(Letters):
    print(count, i)
and this:
Letters = ['a', 'b', 'c']
for i in enumerate(Letters):
    print(i)
Both give the same output, this:
>>>
0 'a'
1 'b'
2 'c'
Is writing code in the style of the first example beneficial in any circumstances? What is the difference?
If you know of any other useful approaches, please let me know; I'm trying to expand my knowledge of Python.
In the first example, count is set to the index, and i is set to the element.
In the second example, i is being set to the 2-element tuple (index, element).
On the first iteration, the first example is equivalent to:
count, i = 0, 'a'
which is the same as:
count = 0
i = 'a'
And the second example is the same as:
i = (0, 'a')
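A minimal side-by-side sketch of the two forms (the names letters, count, letter, and pair here are just illustrative):
letters = ['a', 'b', 'c']

# unpacking: count gets the index, letter gets the element
for count, letter in enumerate(letters):
    print(count, letter)      # 0 a, then 1 b, then 2 c

# no unpacking: pair is the whole (index, element) tuple
for pair in enumerate(letters):
    print(pair[0], pair[1])   # same values, reached by indexing the tuple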
I'm working with a Pandas dataframe, and I need to reduce a column's list of values while preserving alternating duplicates, if they exist, and while preserving order. I'm able to mask the values such that there are only ever two distinct values to work with (e.g., A and B below).
(It's best to show...) I'm looking to define the reduce_list() method below...
import numpy as np
import pandas as pd

dummy_arr_one = ['A','A','B','B','A','A','A','A','B','B','B']
dummy_arr_two = ['A','A','A','B','B','B']
df = pd.DataFrame({
    "instance": ["group_one" for x in range(0, len(dummy_arr_one))]
              + ["group_two" for y in range(0, len(dummy_arr_two))],
    "value": dummy_arr_one + dummy_arr_two
})
>> x = df[df['instance']=='group_one']['value'].values # ['A','A','B','B','A','A','A','A','B','B','B']
>> y = reduce_list(x)
[output] >> ['A','B','A','B']
OR
>> x = df[df['instance']=='group_two']['value'].values # ['A','A','A','B','B','B']
>> y = reduce_list(x)
[output] >> ['A','B']
I've tried a few approaches with collections and dictionaries, but I can't get farther than the following (separate from the collections attempts):
for group in df['instance'].unique():
    val_arr = df[df['instance'] == group]['value'].values
    unique_vals = np.unique(val_arr)
    # ...<then what to do?>
since dictionaries need unique keys and I may need to dynamically create the keys (e.g., A_1, B_1, A_2), but then I also need to keep in mind preserving the order.
I feel like I'm overlooking something obvious. So any help is greatly appreciated!
Use itertools.groupby
from itertools import groupby
reduced = [k for k, _ in groupby(df['value'])]
print(reduced)
Output
['A', 'B', 'A', 'B', 'A', 'B']
If you need it per instance group, group first, then apply the same idea to each group:
res = [[k for k, _ in groupby(vals)] for grp, vals in df.groupby('instance')['value']]
print(res)
Output
[['A', 'B', 'A', 'B'], ['A', 'B']]
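If you want this wrapped up as the reduce_list() function from the question, a minimal sketch (assuming the input is any iterable of values, as in the examples above) could be:
from itertools import groupby

def reduce_list(values):
    # keep one representative per run of equal consecutive values
    return [k for k, _ in groupby(values)]

print(reduce_list(['A','A','B','B','A','A','A','A','B','B','B']))  # ['A', 'B', 'A', 'B']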
This works on plain lists; I might have misunderstood the Pandas context.
def reduce_list(x):
    unique = []
    reduced = []
    for k in x:
        if k not in unique:
            unique.append(k)
    # now we have the uniques.
    for k in range(len(x) - 1):
        if x[k] != x[k + 1]:
            reduced.append(x[k])
    if x[len(x) - 1] != reduced[len(reduced) - 1]:
        reduced.append(x[len(x) - 1])
    return reduced
This is a loop-intensive implementation.
The first loop collects the unique values, which is easy to understand.
The second loop checks whether two consecutive elements are different; if they are, it appends the element at the earlier position to reduced. However, this loop misses the final run of values at the end of the list.
Therefore, you have to add one more check: if the last element of x differs from the last element of reduced, append it.
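For example, a quick sanity check with the first dummy array from the question (not an exhaustive test):
x = ['A','A','B','B','A','A','A','A','B','B','B']
print(reduce_list(x))  # ['A', 'B', 'A', 'B']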
I need to count the elements that appear only once (Python).
For example, the result for
mylist = ['a', 'a', 'a', 'a', 'b', 'c']
would be
2
You can use collections.Counter to count the number of occurrences of each distinct item, and retain only those with a count of 1 with a generator expression:
from collections import Counter
sum(1 for c in Counter(mylist).values() if c == 1)
This returns: 2
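If you also want the elements themselves rather than just how many there are, a small variation on the same Counter idea (still assuming the items are hashable) would be:
from collections import Counter

mylist = ['a', 'a', 'a', 'a', 'b', 'c']
once = [item for item, c in Counter(mylist).items() if c == 1]
print(once)       # ['b', 'c']
print(len(once))  # 2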
This situation looks like a job for a pure set structure.
If I were you, I would turn the list into a set and check its size.
You can find examples of how to do that here.
You basically want to iterate through the list and check how many times each element occurs. If it occurs more than once, you don't want it; if it occurs only once, you increase your counter by 1.
count = 0
for letter in mylist:
    if mylist.count(letter) == 1:
        count += 1
print(count)
This should work for you:
len(set(mylist))
It does require your values to be hashable.
I want to get the intersection of two lists without the duplicates being eliminated.
Ideally the method would be fast and avoid explicit loops.
Below was my attempt, but it failed because duplicates were removed.
a = ['a','b','c','f']
b = ['a','b','b','o','k']
tmp = list(set(a) & set(b))
>>> tmp
['b', 'a']
I want the result to be ['a', 'b', 'b'].
Here, a is fixed and b varies; the idea is to pull the values of a out of b.
Is there a way to extract the intersection as a list without removing duplicate values?
A solution could be
good = set(a)
result = [x for x in b if x in good]
There are two loops here: one is the set-building loop inside set() (implemented in C, hundreds of times faster than anything you can write in Python); the other is the comprehension, which runs in the interpreter.
The first loop is done to avoid a linear search through a for each element of b (if a gets big, this can be a serious problem).
Note that using filter instead is probably not going to gain much (if anything) because despite the filter loop being in C, for each element it will have to get back to the interpreter to call the filtering function.
Note that if you really care about speed, Python itself may not be a good choice... PyPy, for example, might do better here, and in that case just writing the optimal algorithm explicitly should be fine (avoiding re-searching a for duplicates when they are consecutive in b, as happens in your example):
good = set(a)
res = []
i = 0
while i < len(b):
    x = b[i]
    if x in good:
        while i < len(b) and b[i] == x:  # is?
            res.append(x)
            i += 1
    else:
        i += 1
Of course, in performance optimization the only real way to know is to try and measure with real data on the real system... guessing works less and less well as technology advances and becomes more complicated.
If you insist on not using for explicitly then this will work:
>>> list(filter(a.__contains__, b))
['a', 'b', 'b']
But directly calling magic methods like __contains__ is not a recommended practice to the best of my knowledge, so consider this instead:
>>> list(filter(lambda x: x in a, b))
['a', 'b', 'b']
And if you want to improve the lookup in a from O(n) to O(1) then create a set of it first:
>>> a_set = set(a)
>>> list(filter(lambda x: x in a_set, b))
['a', 'b', 'b']
>>> a = ['a','b','c','f']
>>> b = ['a','b','b','o','k']
>>> items = set(a)
>>> found = [i for i in b if i in items]
>>> items
{'f', 'a', 'c', 'b'}
>>> found
['a', 'b', 'b']
This should do the job.
I guess it's not faster than a loop, and in the end you probably still need a loop to extract the result. Anyway...
from collections import Counter
a = ['a','a','b','c','f']
b = ['a','b','b','o','k']
count_b = Counter(b)
count_ab = Counter(set(b)-set(a))
count_b - count_ab
#=> Counter({'a': 1, 'b': 2})
If res holds that resulting Counter (count_b - count_ab), you can flatten it with:
[ val for sublist in [ [s] * n for s, n in res.items() ] for val in sublist ]
#=> ['a', 'b', 'b']
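As a possible shortcut for that flattening step, Counter also provides an elements() method that repeats each key by its count, so something like this should produce the same list (the order of distinct keys may differ):
result = list((count_b - count_ab).elements())
#=> ['a', 'b', 'b']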
It isn't entirely clear how duplicates should be handled when intersecting lists that contain duplicate elements, since you gave only one test case with its expected result and did not spell out the rule.
Going by your expected result, the common elements are 'a' and 'b', and the intersection lists 'a' with multiplicity 1 and 'b' with multiplicity 2. Note that 'a' occurs once in both a and b, but 'b' occurs twice in b. So each common element appears in the intersection with the maximum multiplicity it has in any of the lists.
The answer is yes, although a loop is still executed implicitly even if your code contains no explicit loop statements; the algorithm is inherently iterative.
Step 1: Create the intersection set, intersect, which contains no duplicates (you have already done that). Convert it to a list to allow indexing.
Step 2: Create a second list, intersectD. For each common element, compute freq, the maximum number of occurrences of that element across the lists (using count), and append intersect[k] to intersectD a number of times given by its corresponding freq[k].
An example code with 3 lists would be
a = ['a','b','c','1','1','1','1','2','3','o']
b = ['a','b','b','o','1','o','1']
c = ['a','a','a','b','1','2']

intersect = list(set(a) & set(b) & set(c))  # 3-set case
intersectD = []
for k in range(len(intersect)):
    cmn = intersect[k]
    freq = max(a.count(cmn), b.count(cmn), c.count(cmn))  # 3-set case
    for i in range(freq):  # can be done with itertools
        intersectD.append(cmn)
>>> intersectD
['b', 'b', 'a', 'a', 'a', '1', '1', '1', '1']
For cases involving more than two lists, freq for each common element can be computed with a larger set intersection and max expression; if you use a list of lists, freq can be computed with an inner loop. You can also replace the inner i-loop with an itertools expression (see How can I count the occurrences of a list item?) or with simple list repetition, as sketched below.
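A minimal sketch of the list-repetition variant, reusing the intersect and freq idea from the code above:
intersectD = []
for cmn in intersect:
    freq = max(a.count(cmn), b.count(cmn), c.count(cmn))  # 3-set case
    intersectD.extend([cmn] * freq)  # append cmn freq times without an inner loop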
I have a set of 4 strings and want to generate a list of 16 elements, while enforcing the rule (or getting the same result as enforcing it) that the same element never appears in two contiguous positions of the resulting list.
Being almost a total newbie in Python, I checked the various functions in the random module and found many useful ways to do something similar (random.shuffle almost does the trick), but none of them addresses this particular need.
What data format and what methods should I use?
Pseudocode algorithm:
1. For i in n (n being the number of elements you want)
2. Generate the next element
3. If it's the same as the previous element, repeat step 2
Use random.choice to pick an element from a list of elements randomly.
Here's a proof of concept Python code:
import random

sources = ['a', 'b', 'c', 'd']  # you said 4 strings

result = [random.choice(sources)]
while len(result) < 16:  # you said you need 16 elements
    elem = random.choice(sources)
    if elem != result[-1]:
        result.append(elem)
This code is optimized for clarity, not succinctness, cleverness or speed.
For a more general solution, you could turn to Python generators.
Given an arbitrary iterable of inputs (eg: your four input strings), the following generator will generate an infinite iterable of choices from that list, with no two side-by-side elements being the same:
import random

def noncontiguous(inputs):
    last = random.choice(inputs)
    yield last
    while True:
        next = random.choice(inputs)
        if next != last:
            last = next
            yield next
You can then use list comprehensions or a basic for loop to obtain the 16 element subset of this infinite sequence:
>>> gen = noncontiguous(['a', 'b', 'c', 'd'])
>>> [gen.next() for i in range(16)]
['c', 'b', 'c', 'b', 'a', 'c', 'b', 'c', 'd', 'a', 'd', 'c', 'a', 'd', 'b', 'c']
More interestingly, you can continue to use the same generator object to create more noncontiguous elements
>>> for i in range(8):
... gen.next()
...
'b'
'c'
'd'
'c'
'b'
'd'
'a'
'c'
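You could also take the 16 elements with itertools.islice instead of a list comprehension; it works the same way on the infinite generator:
import itertools
print(list(itertools.islice(noncontiguous(['a', 'b', 'c', 'd']), 16)))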
Zart's code modified to (a) work and (b) pre-calculate the set subtractions:
import random

def setsub():
    # 4 strings
    sources = ['a', 'b', 'c', 'd']
    # convert them to set
    input = set(sources)
    subs = {}
    for word in sources:
        subs[word] = list(input - set([word]))
    # choose first element
    output = [random.choice(sources)]
    # append random choices excluding previous element till required length
    while len(output) < 16:
        output.append(random.choice(subs[output[-1]]))
    return output
A rather severe abuse of itertools:
import itertools
import random
print list(itertools.islice((x[0] for x in
itertools.groupby(random.randint(1, 10) for y in itertools.count())), 16))
It uses islice() to get the first 16 elements of an infinite generator based around count(), using groupby() to collapse equal adjacent elements.
This is a revised version of Eli's answer that doesn't brute-force elements and hopefully doesn't lose clarity:
import random
# 4 strings
sources = ['a', 'b', 'c', 'd']
# convert them to set
input = set(sources)
# choose first element
output = [random.choice(input)]
# append random choices excluding previous element till required length
while len(output) < 16:
    output.append(random.choice(input - set(output[-1:])))
I have a list in Python, ('A','B','C','D','E'); how do I get the item at a particular index number?
Example:
Say it was given 0, it would return A.
Given 2, it would return C.
Given 4, it would return E.
What you show, ('A','B','C','D','E'), is not a list, it's a tuple (the round parentheses instead of square brackets show that). Nevertheless, whether you're indexing a list or a tuple (to get one item at an index), in either case you append the index in square brackets.
So:
thetuple = ('A','B','C','D','E')
print thetuple[0]
prints A, and so forth.
Tuples (unlike lists) are immutable, so you couldn't assign to thetuple[0] etc. (as you could to an item of a list). However, you can definitely access ("get") the item by indexing in either case.
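A small illustration of that difference; the last line raises an error on purpose (exact error wording may vary between Python versions):
thetuple = ('A', 'B', 'C', 'D', 'E')
thelist = ['A', 'B', 'C', 'D', 'E']

print(thetuple[0])   # A  -- indexing ("getting") works the same for both
print(thelist[0])    # A

thelist[0] = 'Z'     # fine: lists are mutable
thetuple[0] = 'Z'    # TypeError: 'tuple' object does not support item assignment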
values = ['A', 'B', 'C', 'D', 'E']
values[0] # returns 'A'
values[2] # returns 'C'
# etc.
You can use the __getitem__(key) method.
>>> iterable = ('A', 'B', 'C', 'D', 'E')
>>> key = 4
>>> iterable.__getitem__(key)
'E'
Same as in most other languages: just pass the index number of the element that you want to retrieve.
#!/usr/bin/env python
x = [2,3,4,5,6,7]
print(x[5])
You can use pop():
x=[2,3,4,5,6,7]
print(x.pop(2))
output is 4 (note that pop(2) also removes that element from the list)