I need the number of elements that appear only once - python

I need the number of elements that occur only once (Python).
For example the result for
mylist = ['a', 'a', 'a', 'a', 'b', 'c']
would be
2

You can use collections.Counter to count the number of occurrences of each distinct item, and retain only those with a count of 1 with a generator expression:
from collections import Counter
sum(1 for c in Counter(mylist).values() if c == 1)
This returns: 2

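This situation looks like a pure set structure.
If I were you, I would turn the list into a set and check its size.
Note, however, that the size of the set is the number of distinct elements (3 for the example above), not the number of elements that occur exactly once.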

You basically want to iterate through the list and check to see how many times each element occurs in the list. If it occurs more than once, you don't want it but if it occurs only once, you increase your counter by 1.
count = 0
for letter in mylist:
    if mylist.count(letter) == 1:
        count += 1
print(count)

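This counts the distinct values in the list:
len(set(mylist))
It does require your values to be hashable. For the example it returns 3, because it also counts 'a', which occurs more than once.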
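The answers above compute different quantities, so it may help to compare them on the example list. This is a minimal sketch; the names `distinct` and `singletons` are made up here for illustration.

```python
from collections import Counter

mylist = ['a', 'a', 'a', 'a', 'b', 'c']

# Number of distinct values: 'a', 'b' and 'c'.
distinct = len(set(mylist))

# Number of values that occur exactly once: 'b' and 'c'.
counts = Counter(mylist)
singletons = sum(1 for n in counts.values() if n == 1)

print(distinct, singletons)  # 3 2
```

Only the second number matches the result asked for in the question.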

Related

How to find the two strings that occur most in a list?

I'm trying to get the counts of the two most frequent elements in a list. For example, in the list ['aa','bb','cc','dd','bb','bb','cc','ff'] the count of the most frequent element should be 3 (the number of times 'bb' appears) and the count of the second most frequent should be 2 (the number of times 'cc' appears).
I tried this:
max = 0
snd_max = 0
for i in x:
    aux = x.count(i)
    if aux > max:
        snd_max = max
        max = aux
print(max, snd_max)
But is there an easier way?
You can use collections.Counter:
from collections import Counter
x = ['aa','bb','cc','dd','bb','bb','cc','ff']
counter = Counter(x)
print(counter.most_common(2))
[('bb', 3), ('cc', 2)]
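Since the question asks only for the two counts, the pairs returned by `most_common` can be reduced to their second items. A small sketch under that reading of the question (the name `top_counts` is made up here):

```python
from collections import Counter

x = ['aa', 'bb', 'cc', 'dd', 'bb', 'bb', 'cc', 'ff']

# most_common(2) yields (element, count) pairs; keep only the counts.
top_counts = [n for _, n in Counter(x).most_common(2)]
print(top_counts)  # [3, 2]
```

If the list has fewer than two distinct elements, `top_counts` is simply shorter.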
Try this:
l = ['aa','bb','cc','dd','bb','bb','cc','ff']
b = list(dict.fromkeys(l))
a = [(l.count(x), x) for x in b]
a.sort(reverse=True)
a = a[:2]
print(a)
I use max(); it's simple, although it returns only the single most frequent element:
lst = ['aa','bb','cc','dd','bb','bb','cc','ff']
print(max(set(lst), key=lst.count))
You could use pandas value_counts():
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html
Put the list into a Series, then call value_counts().
That will give you a Series with each element and how many times it appears, sorted with the most common on top.

Is it possible to extract intersection list that contains duplicate values?

I want to get the intersection of two lists without eliminating duplicates, preferably with a fast method that does not use loops.
Below is my attempt, but it failed because duplicates were removed.
a = ['a','b','c','f']
b = ['a','b','b','o','k']
tmp = list(set(a) & set(b))
>>> tmp
['b', 'a']
I want the result to be ['a', 'b', 'b'].
Here, a is a fixed list and b is a variable one; the idea is to extract from b the values that also appear in a.
Is there a way to extract the list of common values without removing duplicates?
A solution could be
good = set(a)
result = [x for x in b if x in good]
There are two loops here: one is the set-building loop of set (implemented in C, perhaps a hundred times faster than whatever you can do in Python); the other is the comprehension, which runs in the interpreter.
The first loop is done to avoid a linear search in a for each element of b (if a becomes big, this can be a serious problem).
Note that using filter instead is probably not going to gain much (if anything) because, despite the filter loop being in C, for each element it will have to get back to the interpreter to call the filtering function.
Note that if you care about speed, then Python is probably not a good choice. PyPy, for example, may be better here, and in that case just writing an optimal algorithm explicitly should be OK (avoiding re-searching a for duplicates when they are consecutive in b, as happens in your example):
good = set(a)
res = []
i = 0
while i < len(b):
    x = b[i]
    if x in good:
        while i < len(b) and b[i] == x: # is?
            res.append(x)
            i += 1
    else:
        i += 1
Of course, in performance optimization the only real way is to try and measure with real data on the real system. Guessing works less and less as technology advances and becomes more complicated.
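One way to do that measuring is the standard `timeit` module. The sketch below compares a list lookup with a set lookup; the data is made up purely for illustration, so the numbers it prints say nothing about your real lists.

```python
import timeit

a = [str(n) for n in range(500)]          # made-up "fixed" list
b = [str(n % 750) for n in range(5000)]   # made-up "variable" list

def with_list():
    # Linear search in a for every element of b.
    return [x for x in b if x in a]

def with_set():
    # Build the set once, then do constant-time lookups.
    good = set(a)
    return [x for x in b if x in good]

# Both versions must agree before their timings are worth comparing.
same = with_list() == with_set()

for fn in (with_list, with_set):
    seconds = timeit.timeit(fn, number=3)
    print(fn.__name__, round(seconds, 4))
```

The timings vary from machine to machine, which is exactly why measuring on the real system matters.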
If you insist on not using for explicitly then this will work:
>>> list(filter(a.__contains__, b))
['a', 'b', 'b']
But directly calling magic methods like __contains__ is not a recommended practice to the best of my knowledge, so consider this instead:
>>> list(filter(lambda x: x in a, b))
['a', 'b', 'b']
And if you want to improve the lookup in a from O(n) to O(1) then create a set of it first:
>>> a_set = set(a)
>>> list(filter(lambda x: x in a_set, b))
['a', 'b', 'b']
>>> a = ['a','b','c','f']
>>> b = ['a','b','b','o','k']
>>> items = set(a)
>>> found = [i for i in b if i in items]
>>> items
{'f', 'a', 'c', 'b'}
>>> found
['a', 'b', 'b']
This should do the job.
I guess it's not faster than a loop, and in the end you probably still need a loop to extract the result. Anyway:
from collections import Counter
a = ['a','a','b','c','f']
b = ['a','b','b','o','k']
count_b = Counter(b)
count_ab = Counter(set(b) - set(a))
res = count_b - count_ab
# res is Counter({'b': 2, 'a': 1})
To turn res back into a list, you need:
[val for sublist in [[s] * n for s, n in res.items()] for val in sublist]
# ['a', 'b', 'b']
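The flattening step can probably be shortened with `Counter.elements()`, which repeats each key as many times as its count. This is a sketch of a variation, not the answer's own code: it builds the counter by filtering the counts of b directly, and the names `in_a` and `flat` are made up here.

```python
from collections import Counter

a = ['a', 'a', 'b', 'c', 'f']
b = ['a', 'b', 'b', 'o', 'k']

# Keep the counts from b, but only for keys that also appear in a.
in_a = set(a)
res = Counter({item: n for item, n in Counter(b).items() if item in in_a})

# elements() repeats each key as many times as its count.
flat = list(res.elements())
print(flat)  # ['a', 'b', 'b']
```

Equal elements come out grouped together, so the result does not necessarily follow the original order of b.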
It isn't clear how duplicates should be handled when intersecting lists that contain duplicate elements, since you gave only one test case with its expected result and did not explain the duplicate handling.
Going by your expected result, the common elements are 'a' and 'b', and the intersection list contains 'a' with multiplicity 1 and 'b' with multiplicity 2. Note that 'a' occurs once in both a and b, but 'b' occurs twice in b. So the intersection list contains each common element with the maximum multiplicity it has in any of the lists.
The answer is yes. However, a loop may be called implicitly, even if your code does not explicitly use any loop statements. The algorithm itself will always be iterative.
Step 1: Create the intersection set, Intersect, which does not contain duplicates (you have already done that). Convert it to a list so it can be indexed.
Step 2: Create a second list, IntersectD. Create a new variable Freq which holds the maximum number of occurrences of each common element, using count. Use Intersect and Freq to append the element Intersect[k] as many times as its corresponding Freq[k].
An example with 3 lists would be
a = ['a','b','c','1','1','1','1','2','3','o']
b = ['a','b','b','o','1','o','1']
c = ['a','a','a','b','1','2']
intersect = list(set(a) & set(b) & set(c)) # 3-set case
intersectD = []
for k in range(len(intersect)):
    cmn = intersect[k]
    freq = max(a.count(cmn), b.count(cmn), c.count(cmn)) # 3-set case
    for i in range(freq): # Can be done with itertools
        intersectD.append(cmn)
>>> intersectD
['b', 'b', 'a', 'a', 'a', '1', '1', '1', '1']
For cases involving more than two lists, freq for this common element can be computed using a more complex set intersection and max expression. If using a list of lists, freq can be computed using an inner loop. You can also replace the inner i-loop with an itertools expression from How can I count the occurrences of a list item?.
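For an arbitrary number of lists, the two steps could be sketched as below. The function name `intersect_keep_max` is made up here, and the sketch assumes at least one list is passed and that the elements are hashable and sortable (sorting is used only to make the output order predictable).

```python
from collections import Counter

def intersect_keep_max(*lists):
    """Common elements, each repeated by its maximum count in any list."""
    counters = [Counter(lst) for lst in lists]
    # Step 1: elements present in every list, without duplicates.
    common = set.intersection(*(set(c) for c in counters))
    # Step 2: repeat each common element by its maximum multiplicity.
    result = []
    for item in sorted(common):
        freq = max(c[item] for c in counters)
        result.extend([item] * freq)
    return result

a = ['a', 'b', 'c', '1', '1', '1', '1', '2', '3', 'o']
b = ['a', 'b', 'b', 'o', '1', 'o', '1']
c = ['a', 'a', 'a', 'b', '1', '2']
print(intersect_keep_max(a, b, c))
# ['1', '1', '1', '1', 'a', 'a', 'a', 'b', 'b']
```

Counting each list once with `Counter` avoids calling `count` repeatedly for every common element.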

Different ways of using enumerate

I know the basic way that enumerate works, but what difference does it make when you have two variables in the for loop? I used count and i in the examples below
This code:
Letters = ['a', 'b', 'c']
for count, i in enumerate(Letters):
    print(count, i)
and this:
Letters = ['a', 'b', 'c']
for i in enumerate(Letters):
    print(i)
Both give essentially the same output. The first prints:
0 a
1 b
2 c
and the second prints the same pairs as tuples, such as (0, 'a').
Is writing code in the style of the first example beneficial in any circumstances? What is the difference?
If you know any other ways that could be useful, just let me know; I am trying to expand my knowledge of Python.
In the first example, count is set to the index, and i is set to the element.
In the second example, i is being set to the 2-element tuple (index, element).
On the first iteration, the first example is equivalent to:
count, i = 0, 'a'
which is the same as:
count = 0
i = 'a'
And the second example is the same as:
i = (0, 'a')
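The difference can be checked directly. The sketch below collects what each loop style sees and also shows the optional `start` argument of `enumerate`; the variable names are made up for illustration.

```python
letters = ['a', 'b', 'c']

# Unpacking: each (index, element) tuple is split into two names.
unpacked = [(count, item) for count, item in enumerate(letters)]

# No unpacking: the single name is bound to the whole tuple.
as_tuples = [pair for pair in enumerate(letters)]

# Both loops see exactly the same tuples.
print(unpacked == as_tuples)  # True

# The optional start argument changes the first index.
numbered = list(enumerate(letters, start=1))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Unpacking is mostly a readability choice: it gives the index and the element their own names instead of requiring `pair[0]` and `pair[1]`.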

Not able to delete string item with colon in python list

So I'm having the following problem while coding in python: I have a few string items in a list like so:
['X','Y','Z','A', 'B:C', 'D']
I want to delete everything past 'Z'. I use the following code to attempt this:
for item in lines:
    if ((item == "A")):
        lines.remove(item)
    if (item == "B:C"):
        lines.remove(item)
    if (item == "D"):
        lines.remove(item)
A and D get removed perfectly. However, B:C is not removed and stays in the list...
Mind you, A, D, B:C etc represent strings, not characters (e.g. A could be Transaction failed! and B:C can represent WRITE failure: cannot be done!)
How can this be solved?
Modifying a list while iterating over it is usually a bad thing. Some of the elements get skipped when you remove the current element. You may be able to fix it by iterating over reversed(lines), but it is better to create a new list that doesn't have the elements that you want to drop:
to_remove = {'A', 'B:C', 'D'}
new_lines = [line for line in lines if line not in to_remove]
Or, if you want to modify in-place:
to_remove = {'A', 'B:C', 'D'}
lines[:] = [line for line in lines if line not in to_remove]
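To see why 'B:C' survives in the question's loop, the skipping can be reproduced on the example list. This is an illustrative sketch; the names `buggy` and `fixed` are made up here.

```python
to_remove = {'A', 'B:C', 'D'}

# Removing while iterating: after 'A' is removed, 'B:C' slides into the
# slot the loop has already visited, so it is never examined.
buggy = ['X', 'Y', 'Z', 'A', 'B:C', 'D']
for item in buggy:
    if item in to_remove:
        buggy.remove(item)
print(buggy)  # ['X', 'Y', 'Z', 'B:C']

# Iterating over a copy keeps the iteration independent of the removals.
fixed = ['X', 'Y', 'Z', 'A', 'B:C', 'D']
for item in list(fixed):
    if item in to_remove:
        fixed.remove(item)
print(fixed)  # ['X', 'Y', 'Z']
```

The colon in 'B:C' plays no role; any element that directly follows a removed one is skipped in the same way.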
You may use the .index() method to find the index of a specific element inside a list.
Then after finding the z_index, you may create another list by slicing the first one.
Here's an example:
l1 = ['X','Y','Z','A', 'B:C', 'D']
# find the index of element 'Z'
z_index = l1.index('Z')
# slice the list from 0 up to (but not including) z_index;
# use l1[:z_index + 1] instead if 'Z' itself should be kept
l2 = l1[:z_index]
print(l2)
Output:
['X', 'Y']
Generally, it is not a good idea to delete elements from a list you are iterating. In your case, you may consider creating a new list with the result you want:
l = ['X','Y','Z','A', 'B:C', 'D']
clean_l = [i for i in l if i not in ('A', 'B:C', 'D')]
This is a good option if you know which elements you want to delete. However, if you know that you don't want anything after 'Z', regardless of value, then just slice the list:
clean_l = l[:l.index('Z') + 1]
Firstly you would want to find the position of 'Z' by using the index() method.
x = ['X','Y','Z','A', 'B:C', 'D']
position = x.index('Z')
Then, to delete everything after 'Z', I would do this:
del x[position+1:]
You have to add one to the position; otherwise it will delete 'Z' as well.
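Note that index() raises ValueError when the marker is not in the list. A small sketch of a helper that handles this case follows; the name `keep_through` and the choice to keep everything when the marker is missing are assumptions made here, not part of the answers above.

```python
def keep_through(items, marker):
    """Return a new list ending at the first occurrence of marker."""
    try:
        position = items.index(marker)
    except ValueError:
        # Assumption: if the marker is absent, keep everything.
        return list(items)
    return items[:position + 1]

x = ['X', 'Y', 'Z', 'A', 'B:C', 'D']
print(keep_through(x, 'Z'))  # ['X', 'Y', 'Z']
print(keep_through(x, 'Q'))  # a copy of the whole list
```

Unlike `del`, this leaves the original list untouched and returns a new one.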

Manipulating counter information - Python 2.7

I'm fairly new to Python and I have this program that I was tinkering with. It's supposed to get a string from input and display which character is the most frequent.
stringToData = raw_input("Please enter your string: ")
# import the collections module
import collections
# get the most common character and its count
letter, count = collections.Counter(stringToData).most_common(1)[0]
# print the results
print "The most frequent character is %s, which occurred %d times." % (
    letter, count)
However, if the string has one of each character, it displays only one letter and says it's the most frequent character. I thought about changing the number in the parentheses in most_common(number), but I didn't want it to display how many times the other letters occurred every time.
Thank you to all that help!
As I explained in the comment:
You can leave off the parameter to most_common to get a list of all characters, ordered from most common to least common. Then just loop through that result and collect the characters as long as the counter value is still the same. That way you get all characters that are most common.
Counter.most_common(n) returns the n most common elements from the counter. Or in case where n is not specified, it will return all elements from the counter, ordered by the count.
>>> collections.Counter('abcdab').most_common()
[('a', 2), ('b', 2), ('c', 1), ('d', 1)]
You can use this behavior to simply loop through all elements, ordered by their count. As long as the count is the same as that of the first element in the output, you know that the element occurred just as often in the string.
>>> c = collections.Counter('abcdefgabc')
>>> maxCount = c.most_common(1)[0][1]
>>> elements = []
>>> for element, count in c.most_common():
...     if count != maxCount:
...         break
...     elements.append(element)
...
>>> elements
['a', 'c', 'b']
>>> [e for e, n in c.most_common() if n == maxCount]
['a', 'c', 'b']
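The same idea can be wrapped in a function. The sketch below is written for Python 3 rather than the Python 2.7 used above, and the name `most_frequent_chars` is made up here; it also covers the empty-string case, which would otherwise fail on `most_common(1)[0]`.

```python
from collections import Counter

def most_frequent_chars(text):
    """Return every character that shares the highest count in text."""
    counts = Counter(text)
    if not counts:
        return []  # an empty string has no most frequent character
    max_count = max(counts.values())
    return [ch for ch, n in counts.items() if n == max_count]

print(most_frequent_chars('abcdab'))  # ['a', 'b']
print(most_frequent_chars('abc'))     # ['a', 'b', 'c']
```

When every character occurs once, all of them are returned, which addresses the case described in the question.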
