Counting specific strings within another string [duplicate] - python

I am trying to figure out how to count how many times one string appears in another string. The code I have been experimenting with has not worked so far.
needle = input()
haystack = input()
count = 0
for needle in haystack:
    count += 1
print(count)
My expected result if the input for haystack is 'sesses' and needle is 'ses' would be to print out 2. (ses is in haystack twice)

'sesses'.count('ses') will give you the answer

Use count()
needle = input()
haystack = input()
print(haystack.count(needle))

for needle in haystack:
    count += 1
The reason this doesn't work is that "for x in y" is iterating over values in y, and /placing/ them into x; it does not read from x. So in your code example, you are iterating over haystack (which, in Python, means looping through each letter), and then placing the value (each letter) into needle. Which is not what you want.
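A small sketch of what that loop actually does. The literal values stand in for the question's input() calls, which is an assumption made only so the snippet runs without prompting:

```python
needle = "ses"
haystack = "sesses"

count = 0
for needle in haystack:  # rebinds needle to each character of haystack in turn
    count += 1

print(count)   # 6, i.e. len(haystack), not the number of matches
print(needle)  # 's', the last character; the original "ses" has been overwritten
```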
There's no built-in iterator to give you what you want -- an iterator of the occurrences of another string. haystack.count(needle) will give you the desired answer; other than that, you could also use haystack.find to find an occurrence starting at a given point, and keep track of how far you are in the string yourself:
count = 0
index = haystack.find(needle)
while index >= 0:
    count += 1
    # Resume one character past the last match so overlapping matches are found.
    index = haystack.find(needle, index + 1)
Note that this will give a different answer than haystack.count: haystack.count never counts a letter twice, whereas the loop above does. So "aaaaaa".count("aaa") returns 2 (the matches starting at indices 0 and 3), but the loop above returns 4, because it finds the overlapping matches starting at indices 0, 1, 2, and 3.
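The difference can be checked with a short sketch. The helper name count_overlapping is hypothetical, and rejecting an empty needle is an assumption rather than something the answer specifies:

```python
import re

def count_overlapping(haystack, needle):
    """Count occurrences of needle in haystack, allowing overlaps."""
    if not needle:
        raise ValueError("needle must be non-empty")  # assumption: reject empty patterns
    count = 0
    index = haystack.find(needle)
    while index >= 0:
        count += 1
        # Step forward by one character only, so overlapping matches are seen.
        index = haystack.find(needle, index + 1)
    return count

print("aaaaaa".count("aaa"))               # 2 (non-overlapping)
print(count_overlapping("aaaaaa", "aaa"))  # 4 (overlapping)
print(count_overlapping("sesses", "ses"))  # 2

# The same overlapping count using a zero-width lookahead:
print(len(re.findall("(?=" + re.escape("aaa") + ")", "aaaaaa")))  # 4
```

For the question's own example both approaches agree, because the two occurrences of 'ses' in 'sesses' do not overlap.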

Try this :)
characters = 'ses'
word = 'sesses'
chunkSize = 1
count = 0
# Try every window size; only windows of len(characters) can actually match.
while chunkSize <= len(word):
    for i in range(len(word) - chunkSize + 1):
        if word[i:chunkSize+i] == characters:
            count += 1
    chunkSize += 1
print(count)
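Since only windows whose length equals the pattern's length can match, the same idea can be sketched with a single window size. The function name count_windows and the treatment of an empty pattern are assumptions for illustration:

```python
def count_windows(word, characters):
    """Count positions where characters occurs in word, overlaps included."""
    size = len(characters)
    if size == 0:
        return 0  # assumption: an empty pattern counts as zero matches
    return sum(
        1
        for i in range(len(word) - size + 1)
        if word[i:i + size] == characters
    )

print(count_windows("sesses", "ses"))  # 2
```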

This should work:
a='seses'
b='se'
print(a.count(b))
# 2

Related

Python strings: quickly summarize the character count in order of appearance

Let's say I have the following strings in Python3.x
string1 = 'AAAAABBBBCCCDD'
string2 = 'CCBADDDDDBACDC'
string3 = 'DABCBEDCCAEDBB'
I would like to create a summary "frequency string" that counts the number of characters in the string in the following format:
string1_freq = '5A4B3C2D' ## 5 A's, followed by 4 B's, 3 C's, and 2D's
string2_freq = '2C1B1A5D1B1A1C1D1C'
string3_freq = '1D1A1B1C1B1E1D2C1A1E1D2B'
My problem:
How would I quickly create such a summary string?
My idea would be: create an empty list to keep track of the count. Then create a for loop which checks the next character. If there's a match, increase the count by +1 and move to the next character. Otherwise, append to end of the string 'count' + 'character identity'.
That's very inefficient in Python. Is there a quicker way (maybe using the functions below)?
There are several ways to count the elements of a string in python. I like collections.Counter, e.g.
from collections import Counter
counter_str1 = Counter(string1)
print(counter_str1['A']) # 5
print(counter_str1['B']) # 4
print(counter_str1['C']) # 3
print(counter_str1['D']) # 2
There's also str.count(sub[, start[, end]]):
Return the number of non-overlapping occurrences of substring sub in
the range [start, end]. Optional arguments start and end are
interpreted as in slice notation.
As an example:
print(string1.count('A')) ## 5
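One caveat, sketched below: both Counter and str.count report totals over the whole string, not the lengths of consecutive runs, so on their own they cannot produce the run-by-run summary requested for string2.

```python
from collections import Counter

string2 = 'CCBADDDDDBACDC'

totals = Counter(string2)
print(totals['C'])          # 4, although the string starts with a run of only 2 C's
print(totals['D'])          # 6: a run of 5 plus one later D
print(string2.count('C'))   # 4, the same total as the Counter
```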
The following code accomplishes the task without importing any modules.
def freq_map(s):
    num = 0  # number of adjacent, identical characters
    curr = s[0]  # current character being processed
    result = ''  # result of function
    for i in range(len(s)):
        if s[i] == curr:
            num += 1
        else:
            result += str(num) + curr
            curr = s[i]
            num = 1
    result += str(num) + curr
    return result
Note: Since you requested a solution based on performance, I suggest you use this code or a modified version of it.
I ran a rough performance test against the code provided by CoryKramer for reference. This code performed the same task in 58% of the time, without importing any modules. The snippet can be found here.
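A timing claim like this can be checked locally with timeit. The harness below is only a sketch: the groupby-based summarize function is assumed to be the comparison target, the sample input is hypothetical, and the measured numbers will vary by machine and Python version.

```python
import timeit
from itertools import groupby

def freq_map(s):
    num = 0
    curr = s[0]
    result = ''
    for i in range(len(s)):
        if s[i] == curr:
            num += 1
        else:
            result += str(num) + curr
            curr = s[i]
            num = 1
    result += str(num) + curr
    return result

def summarize(s):
    # Assumed comparison target: one count per consecutive run.
    return ''.join(str(sum(1 for _ in run)) + char for char, run in groupby(s))

sample = 'AAAAABBBBCCCDD' * 1000  # hypothetical test input

# Both functions must agree before their timings are worth comparing.
assert freq_map(sample) == summarize(sample)

for func in (freq_map, summarize):
    seconds = timeit.timeit(lambda: func(sample), number=20)
    print(func.__name__, round(seconds, 4))
```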
I would use itertools.groupby to group consecutive runs of the same letter. Then use a generator expression within join to create a string representation of the count and letter for each run.
from itertools import groupby
def summarize(s):
    return ''.join(str(sum(1 for _ in i[1])) + i[0] for i in groupby(s))
Examples
>>> summarize(string1)
'5A4B3C2D'
>>> summarize(string2)
'2C1B1A5D1B1A1C1D1C'
>>> summarize(string3)
'1D1A1B1C1B1E1D2C1A1E1D2B'
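To see what groupby contributes, the intermediate (character, run length) pairs can be listed before joining. The helper name run_lengths is hypothetical:

```python
from itertools import groupby

def run_lengths(s):
    """Return (character, run length) pairs for the consecutive runs in s."""
    return [(char, len(list(run))) for char, run in groupby(s)]

pairs = run_lengths('AAAAABBBBCCCDD')
print(pairs)  # [('A', 5), ('B', 4), ('C', 3), ('D', 2)]
print(''.join(f'{n}{char}' for char, n in pairs))  # 5A4B3C2D
```

Because groupby only groups adjacent equal items, a character that reappears later in the string starts a new pair, which is the behaviour the question asks for.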

Matching exact elements in a set where order doesn't matter

I'm new to python and I'm trying to match the exact elements between two sets, regardless of order. So if my 2 sets are:
reflist = [1],[2,3,4],[5,6]
qlist = [1,2,3,4],[6,5]
The number of matches should be 1, which is 5,6
I tried to write the following loop to match the elements in qlist against reflist, and count the number of matches:
i = 0
count = 0
for each in qlist:
    while i < len(qlist):
        if each.split(",").sort == reflist[i].split(",").sort:
            count = count + 1
        i = i + 1
print(count)
However, I keep getting count = 0, even if the order of 5 and 6 in qlist is 5,6. Would really appreciate any help with this!
If there are no duplicates in your "sets", convert your "sets" to a set of frozensets, and find the set intersection -
i = set(map(frozenset, reflist))
j = map(frozenset, qlist)
len(i.intersection(j))
1
This could do:
If you have no duplicates:
matches = [x for x in map(set, reflist) if x in map(set, qlist)]
If you have duplicates:
matches = [x for x in map(sorted, reflist) if x in map(sorted, qlist)]
You could always use collections.Counter() for this:
from collections import Counter
reflist = [[1],[2,3,4],[5,6]]
qlist = [[1,2,3,4],[6,5]]
result = [list(x.keys()) for x in [Counter(y) for y in reflist] if x in [Counter(y) for y in qlist]]
print(result)
Which Outputs:
[[5, 6]]
Here is my one-liner, using frozensets and the & (set intersection) operator:
len(set(map(frozenset, qlist)) & set(map(frozenset, reflist)))
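A short sketch of the frozenset approach, which also shows why the set operator & is needed here rather than the boolean keyword and. The variable names ref_sets, q_sets and common are illustrative:

```python
reflist = [[1], [2, 3, 4], [5, 6]]
qlist = [[1, 2, 3, 4], [6, 5]]

ref_sets = set(map(frozenset, reflist))
q_sets = set(map(frozenset, qlist))

common = q_sets & ref_sets               # set intersection
print(len(common))                       # 1
print([sorted(s) for s in common])       # [[5, 6]]

# `and` is boolean logic, not intersection: it returns its second operand
# whenever the first is non-empty, so this is just len(ref_sets).
print(len(q_sets and ref_sets))          # 3
```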
I understand you are new to Python, hence I will answer your question using your own method, just for the sake of recording the basic straightforward answer for future reference.
First of all, your code shouldn't run at all. It must error out, because both each and reflist[i] are lists, and you are applying a string method of split(",") on them. Therefore you are getting the initial value of count = 0. You must check in the first place whether your code is even touching all the elements of qlist and reflist. This is not Code Review, hence I will leave it to you to run this and see the answer:
i = 0
count = 0
for each in qlist:
    while i < len(qlist):
        print(i)
        print(each)
        print(reflist[i])
        i = i + 1
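Besides split(), the original comparison has a second pitfall worth spelling out: .sort written without parentheses refers to the method object instead of calling it, and even when called, list.sort() returns None. A minimal sketch of these behaviours:

```python
pair = [6, 5]

# A list has no split() method; split() belongs to str.
print(hasattr(pair, "split"))  # False

# list.sort() sorts in place and returns None, so comparing two results
# compares None with None and is always True.
print([6, 5].sort() == [1, 2].sort())  # True, misleadingly

# sorted() returns a new list, which is what an order-insensitive
# comparison needs.
print(sorted([6, 5]) == sorted([5, 6]))  # True
print(sorted([6, 5]) == sorted([1, 2]))  # False
```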
Keep in mind: You don't have to iterate on index! You can just loop over the elements of iterables directly! This is the answer you are looking for:
match = []  # Optional, to see all the matching elements
count = 0
for q in qlist:
    for r in reflist:
        if set(q) == set(r):
            print(q, r)
            match.append(q)
            count += 1
print(match)
print(count, len(match))

Intro to Python - Lists questions

We've started doing lists in our class and I'm a bit confused, so I'm coming here since previous questions/answers have helped me in the past.
The first question was to sum up all negative numbers in a list, I think I got it right but just want to double check.
import random
def sumNegative(lst):
    sum = 0
    for e in lst:
        if e < 0:
            sum = sum + e
    return sum
lst = []
for i in range(100):
    lst.append(random.randrange(-1000, 1000))
print(sumNegative(lst))
For the 2nd question, I'm a bit stuck on how to write it. The question was:
Count how many words occur in a list up to and including the first occurrence of the word “sap”. I'm assuming it's a random list but wasn't given much info so just going off that.
I know the ending would be similar, but I have no idea how the initial part would look since it deals with strings as opposed to numbers.
I wrote code for an in-class problem, which was to count how many odd numbers are in a list (it was a random list there, so I'm assuming it's random for this question as well), and got:
import random
def countOdd(lst):
    odd = 0
    for e in lst:
        if e % 2 = 0:
            odd = odd + 1
    return odd
lst = []
for i in range(100):
    lst.append(random.randint(0, 1000))
print(countOdd(lst))
How exactly would I change this to fit the criteria for the 2nd question? I'm just confused on that part. Thanks.
The code to sum negative numbers looks fine! I might suggest testing it on a list that you can manually check, such as:
print(sumNegative([1, -1, -2]))
The same logic would apply to your random list.
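The manual check can be written out with known answers. The sketch below uses a hypothetical sum_negative helper built on a generator expression; it is an equivalent formulation, not the asker's original function:

```python
def sum_negative(lst):
    """Sum only the negative numbers in lst."""
    return sum(e for e in lst if e < 0)

print(sum_negative([1, -1, -2]))  # -3
print(sum_negative([]))           # 0
print(sum_negative([4, 5]))       # 0
```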
A note about your countOdd function, it appears that you are missing an = (== checks for equality, = is for assignment) and the code seems to count even numbers, not odd. The code should be:
def countOdd(lst):
    odd = 0
    for e in lst:
        if e%2 == 1: # Odd%2 == 1
            odd = odd + 1
    return odd
As for your second question, you can use a very similar function:
def countWordsBeforeSap(inputList):
    numWords = 0
    for word in inputList:
        if word.lower() != "sap":
            numWords = numWords + 1
        else:
            return numWords
    return numWords  # "sap" never appeared: every word was counted
inputList = ["trees", "produce", "sap"]
print(countWordsBeforeSap(inputList))
To explain the above, the countWordsBeforeSap function:
Starts iterating through the words.
If the word is anything other than "sap" it increments the counter and continues
If the word IS "sap" then it returns early from the function
The function could be more general by passing in the word that you wanted to check for:
def countWordsBefore(inputList, wordToCheckFor):
    numWords = 0
    for word in inputList:
        if word.lower() != wordToCheckFor:
            numWords = numWords + 1
        else:
            return numWords
    return numWords  # wordToCheckFor never appeared
inputList = ["trees", "produce", "sap"]
print(countWordsBefore(inputList, "sap"))
If the words that you are checking come from a single string then you would initially need to split the string into individual words like so:
inputString = "Trees produce sap"
inputList = inputString.split(" ")
Which splits the initial string into words that are separated by spaces.
Hope this helps!
Tom
def count_words(lst, end="sap"):
    """Note that I added an extra input parameter.
    This input parameter has a default value of "sap" which is the actual question.
    However you can change this input parameter to any other word if you want to by
    just doing count_words(lst, "another_word").
    """
    words = []
    # First we need to loop through each item in the list.
    for item in lst:
        # We append the item to our "words" list first thing in this loop,
        # as this will make sure we will count up to and INCLUDING.
        words.append(item)
        # Now check if we have reached the 'end' word.
        if item == end:
            # Break out of the loop prematurely, as we have reached the end.
            break
    # Our 'words' list now has all the words up to and including the 'end' variable.
    # 'len' will return how many items there are in the list.
    return len(words)
lst = ["something", "another", "woo", "sap", "this_wont_be_counted"]
print(count_words(lst))
Hope this helps you understand lists better!
You can make effective use of list/generator comprehensions. The versions below are fast and memory efficient.
1. Sum of negatives:
print(sum(i for i in lst if i < 0))
2. Count of words before sap: like your sample list, this assumes there are no numbers in the list.
print(lst.index('sap'))
If it's a random list, filter out the strings first, then find the index of sap:
l = ['a','b',1,2,'sap',3,'d']
l = [x for x in l if isinstance(x, str)]
print(l.index('sap'))
3. Count of odd numbers:
print(sum(i%2 != 0 for i in lst))
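Since the assignment asks for the count up to and including the first "sap", the index needs one added to it. A minimal sketch, where the function name is hypothetical and returning the full length when the target is missing is an assumption (the assignment does not say what should happen in that case):

```python
def count_up_to_and_including(words, target="sap"):
    """Count the words up to and including the first occurrence of target."""
    try:
        # index() is zero-based, so add 1 to include the target itself.
        return words.index(target) + 1
    except ValueError:
        # Assumption: if target never occurs, every word is counted.
        return len(words)

print(count_up_to_and_including(["something", "another", "woo", "sap", "not_counted"]))  # 4
print(count_up_to_and_including(["trees", "produce", "sap"]))  # 3
```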

How to count the number of a specific character at the end of a string ignoring duplicates? [duplicate]

I have a series of strings like:
my_text = "one? two three??"
I want to count only the number of ? at the end of the string. The above should return 2 (rather than 3).
What I've tried so far:
my_text.count("?") # returns 3
There's not a built-in method for it. But something simple like this should do the trick:
>>> len(my_text) - len(my_text.rstrip('?'))
2
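The rstrip approach wraps naturally into a small function. The name count_trailing is hypothetical; note that rstrip treats its argument as a set of characters, so the sketch assumes a single character is passed:

```python
def count_trailing(text, char="?"):
    """Count how many times char repeats at the very end of text."""
    # rstrip removes every trailing char; the length difference is the count.
    return len(text) - len(text.rstrip(char))

print(count_trailing("one? two three??"))  # 2
print(count_trailing("one? two three"))    # 0
print(count_trailing("???"))               # 3
```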
You could also use a regexp to count the number of trailing question marks:
import re
def count_trailing_question_marks(text):
    last_question_marks = re.compile(r"\?*$")
    return len(last_question_marks.search(text).group(0))
print(count_trailing_question_marks("one? two three??"))
# 2
print(count_trailing_question_marks("one? two three"))
# 0
Not so clean but simple way:
my_text = "one? two three??"
total = 0
question_mark = '?'
i = 0
for c in my_text:
    i -= 1
    if my_text[i] == question_mark:
        total += 1
    else:
        break
One-liner using my favourite itertools:
First reverse the string, then continue iterating (taking the values) while our condition is satisfied (value == '?'). This returns an iterable which we exhaust into a list and finally take its length.
import itertools
len(list(itertools.takewhile(lambda x: x == '?', reversed(my_text))))

Counting the number of times a character in string1 is found in string2?

I'm trying to create a function which takes two strings and then returns the sum total of how many times every character in the first string is found in the second string under the condition that duplicate characters in the first are ignored.
e.g. search_counter('aabbaa','a') would mean a count of 1, since the second string only has one a and no b's, and we only want to search for a once despite there being four a's.
Here's my attempt so far:
def search_counter(search_string, searchme):
    count = 0
    for x in search_string:
        for y in searchme:
            if x == y:
                count = count + 1
    return count
The problem with my example is that there is no check to ignore duplicate characters in search_string.
So instead of getting search_counter('aaa','a') = 1 I get 3.
for x in search_string:
You can get a list of characters without duplicates by converting the string to a set.
for x in set(search_string):
You can eliminate repetitions from a string by transforming it into a set:
>>> set("asdddd")
{'a', 's', 'd'}
Preprocess your string this way, and the rest of the algorithm will remain the same (set objects are iterables, just like strings)
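Putting the set-based suggestion together, one possible sketch of the complete function looks like this. It uses str.count for the inner loop, which is a substitution for the question's nested loops rather than the asker's original code:

```python
def search_counter(search_string, searchme):
    """Total occurrences in searchme of each distinct character of search_string."""
    # set() removes duplicate characters, so each one is searched for once.
    return sum(searchme.count(ch) for ch in set(search_string))

print(search_counter('aabbaa', 'a'))  # 1
print(search_counter('aaa', 'a'))     # 1
print(search_counter('ab', 'abba'))   # 4
```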
You can use iteration to do this
def search_counter(string, search):
    count = 0
    for i in range(len(string)):
        count += string[i:].startswith(search)
    return count
Or this one-liner
search_counter = lambda string, search: sum([string[i:].startswith(search) for i in range(len(string))])
