Count occurrences of char in a single string - python

string = input(" ")
count = string.count()
print(string + str(count))
I need to use a for loop to get the output: ll2a1m1a1

Use groupby from itertools
>>> from itertools import groupby
>>> s = 'llama'
>>> [[k, len(list(g))] for k, g in groupby(s)]
[['l', 2], ['a', 1], ['m', 1], ['a', 1]]
If you want exactly the output you asked for, try the following. As suggested by @DanielMesejo, use sum(1 for _ in g) instead of len(list(g)):
>>> from itertools import groupby
>>> s = 'llama'
>>> groups = [[k, sum(1 for _ in g)] for k, g in groupby(s)]
>>> ''.join(f'{a * b}{b}' for a, b in groups)
'll2a1m1a1'
This works for any word. For example, with the word 'happen':
>>> from itertools import groupby
>>> s = 'happen'
>>> groups = [[k, sum(1 for _ in g)] for k, g in groupby(s)]
>>> ''.join(f'{a * b}{b}' for a, b in groups)
'h1a1pp2e1n1'

A more basic approach:
string = 'llama'
def get_count_str(s):
    # previous holds the current run of identical characters
    previous = s[0]
    for c in s[1:]:
        # compare against one character of the run, not the whole run
        if c != previous[0]:
            yield f'{previous}{len(previous)}'
            previous = c
        else:
            previous += c
    # yield last
    yield f'{previous}{len(previous)}'
print(*get_count_str(string), sep='')
output:
ll2a1m1a1

Look, you need to explain more about what you want. The code below loops through the string, counts how many times each letter occurs in the whole string, and prints the letter with its count.
greeting = 'llama'
for i in range(0, len(greeting)):
    # start count at 1 for the original instance
    count = 1
    for k in range(0, len(greeting)):
        # skip the letter at the same position
        if not k == i:
            # check if letters match
            if greeting[i] == greeting[k]:
                count += 1
    print(greeting[i] + str(count))
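Note that the nested loops above count every occurrence of a letter anywhere in the string, not consecutive runs, so the result differs from the ll2a1m1a1 format in the question. If total counts are what is wanted, a minimal sketch with collections.Counter could look like this (the function name total_counts is made up for illustration):

```python
from collections import Counter

def total_counts(text):
    """Return one 'letter + total count' string per character of text."""
    counts = Counter(text)  # counts every character in a single pass
    return [f"{ch}{counts[ch]}" for ch in text]

for item in total_counts("llama"):
    print(item)
```

This avoids the quadratic inner loop, which only starts to matter for long strings.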

Related

How to get multiple most frequent k-mers of a string using Python?

If I insert the following
Insert the Text:
ACACACA
Insert a value for k:
2
into the following code:
print("Insert the Text:")
Text = input()
print("Insert a value for k:")
k = int(input())
Pattern = " "
count = [ ]
FrequentPatterns = [ ]
def FrequentWords(Text, k):
    for i in range (len(Text)-k+1):
        Pattern = Text[i: i+k]
        c = 0
        for i in range (len(Text)-len(Pattern)+1):
            if Text[i: i+len(Pattern)] == Pattern:
                c = c+1
            else:
                continue
        count.extend([c])
    print(count)
    if count[i] == max(count):
        FrequentPatterns.extend([Pattern])
    return FrequentPatterns
FrequentWords(Text, k)
I get the following output:
Insert the Text:
ACACACA
Insert a value for k:
2
[3, 3, 3, 3, 3, 3]
['CA']
Clearly there are two most frequent patterns, so the last list in the output should be ['AC', 'CA'].
I don't know why this code isn't working. I would really appreciate it if anyone could help.

Here's how I would solve this:
from itertools import groupby
def find_kgrams(string, k):
    kgrams = sorted(
        string[j:j+k]
        for i in range(k)
        for j in range(i, (len(string) - i) // k * k, k)
    )
    groups = [(k, len(list(g))) for k, g in groupby(kgrams)]
    return sorted(groups, key=lambda i: i[1], reverse=True)
The way this works is:
it produces string chunks of the given length k, e.g.:
starting from 0: 'ACACACA' -> 'AC', 'AC', 'AC'
starting from 1: 'ACACACA' -> 'CA', 'CA', 'CA'
...up to k - 1 (1 is the maximum for k == 2)
groupby() groups those chunks
sorted() sorts them by count
a list of tuples of kgrams and their count is returned
Test:
s = 'ACACACA'
kgrams = find_kgrams(s, 2)
print(kgrams)
prints:
[('AC', 3), ('CA', 3)]
It's already sorted, so you can pick the most frequent one(s) from the front of the returned list:
max_kgrams = [k for k, s in kgrams if s == kgrams[0][1]]
print(max_kgrams)
prints:
['AC', 'CA']
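As an alternative to sorting and grouping, the same counts can be collected with collections.Counter over a sliding window. This is only a sketch under the assumption that overlapping k-mers should all be counted; the function name most_frequent_kmers is hypothetical:

```python
from collections import Counter

def most_frequent_kmers(text, k):
    """Return every k-mer that occurs the maximum number of times in text."""
    # Slide a window of width k over the text, one position at a time.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        return []  # text is shorter than k
    best = max(counts.values())
    # Keep every k-mer that ties for the highest count.
    return [kmer for kmer, n in counts.items() if n == best]

print(most_frequent_kmers("ACACACA", 2))
```

Collecting all ties in the last step is what the original code was missing: it compared only a single index after the loops had finished.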

Grouping the nested attribute list in Python

I have a list
lst = ['orb|2|3|4', 'obx|2|3|4', 'orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3','obx|1|2|3']
How can I group the list by the first three characters of each line, so that the result looks like the one below? If a line starts with "orb", it begins a new group, and the lines that follow are added to that group. Thanks for any answers.
result = [['orb|2|3|4', 'obx|2|3|4'], ['orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3','obx|1|2|3']]

Here is an algorithm of O(N) complexity:
res = []
tmp = []
for x in lst:
    if x.startswith('orb'):
        if tmp:
            res.append(tmp)
        tmp = [x]
    elif tmp:
        tmp.append(x)
res.append(tmp)
result:
In [133]: res
Out[133]:
[['orb|2|3|4', 'obx|2|3|4'],
['orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3', 'obx|1|2|3']]

You can use itertools.groupby:
import itertools, re
lst = ['orb|2|3|4', 'obx|2|3|4', 'orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3','obx|1|2|3']
new_result = [list(b) for _, b in itertools.groupby(lst, key=lambda x:re.findall(r'^\w+', x)[0])]
final_result = [new_result[i]+new_result[i+1] for i in range(0, len(new_result), 2)]
Output:
[['orb|2|3|4', 'obx|2|3|4'], ['orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3', 'obx|1|2|3']]
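The groupby version above pairs up consecutive runs, so it relies on the input strictly alternating between a run of "orb" lines and a run of other lines. Two "orb" lines in a row, or a trailing "orb" line, would break the pairing. A small sketch of a reusable function that starts a new group at every header line is shown below; the name group_by_header and its prefix parameter are assumptions made for illustration, and unlike the loop-based answer it keeps any lines that appear before the first header in a group of their own:

```python
def group_by_header(lines, prefix="orb"):
    """Start a new group at each line that begins with prefix."""
    groups = []
    for line in lines:
        if line.startswith(prefix) or not groups:
            # A header line (or the very first line) opens a new group.
            groups.append([line])
        else:
            # Any other line joins the most recent group.
            groups[-1].append(line)
    return groups

lst = ['orb|2|3|4', 'obx|2|3|4', 'orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3', 'obx|1|2|3']
print(group_by_header(lst))
```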

search an array element in other 2d list and count sublists in python

I'm new to Python.
I have a list like: A=['a','b','c']
and a list like B=[['a','c'],['a','c'],['b','b']]
I want to get a list like C=[2,1,2]
C stores, for each element of A, the number of sublists of B that contain it.
That means 'a' is in 2 sublists,
'b' is in 1 sublist,
and 'c' is in 2 sublists.
How can I achieve this?
Thanks

You can use sum:
a=['a','b','c']
b=[['a','c'],['a','c'],['b','b']]
final_list = [sum(i in c for c in b) for i in a]
Output:
[2, 1, 2]

You can loop over b and update a collections.Counter for each sublist, using set to remove duplicates:
from collections import Counter
a = ['a','b','c']
b = [['a','c'],['a','c'],['b','b']]
counter = Counter()
for sublist in b:
    counter.update(set(sublist))
c = [counter[x] for x in a]
# result: [2, 1, 2]

You can loop over both lists and compare:
a=['a','b','c']
b=[['a','c'],['a','c'],['b','b']]
result = []
for letter in a:
    count = 0
    for l in b:
        if letter in l:
            count += 1
    result.append(count)

You can try a dict approach:
A=['a','b','c']
B=[['a','c'],['a','c'],['b','b']]
d={}
for i in A:
    for j in B:
        if i in j:
            if i not in d:
                d[i]=1
            else:
                d[i]+=1
print(d)
output:
{'c': 2, 'b': 1, 'a': 2}

You can use a list comprehension with sum to construct C.
C = [sum(elem in sub for sub in B) for elem in A]
This has the same effect as using nested for loops:
C = []
for elem in A:
    # named count rather than sum, to avoid shadowing the built-in
    count = 0
    for sub in B:
        count += elem in sub
    C.append(count)

Here is a solution with collections.defaultdict.
from collections import defaultdict
a = ['a','b','c']
b = [['a','c'],['a','c'],['b','b']]
# initialise defaultdict
d = defaultdict(int)
# convert to sets for performance
a_set = set(a)
b_sets = list(map(set, b))
# loop through list of sets
for item in b_sets:
    for i in item & a_set:
        d[i] += 1
# retrieve counts in correct order
res = list(map(d.get, a))
print(res)
# [2, 1, 2]
Performance note
This may not matter, but the performance difference is interesting, as it clearly shows the Counter overhead (about 4x slower in the timings below).
from collections import defaultdict, Counter
a = ['a','b','c']
b = [['a','c'],['a','c'],['b','b']]
b = b*100000
def dd(a, b):
    d = defaultdict(int)
    a_set = set(a)
    b_sets = list(map(set, b))
    for item in b_sets:
        for i in item & a_set:
            d[i] += 1
    return list(map(d.get, a))
def counter(a, b):
    counter = Counter()
    for sublist in b:
        counter.update(set(sublist))
    return [counter[x] for x in a]
assert dd(a, b) == counter(a, b)
%timeit dd(a, b) # 414 ms
%timeit counter(a, b) # 1.65 s
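The %timeit lines above are IPython magic and will not run in a plain Python script. A rough equivalent using the standard timeit module is sketched below. The input is deliberately much smaller than in the measurement above so that the script finishes quickly, which means the absolute numbers will not match the quoted timings, and the ratio may differ by machine and Python version:

```python
import timeit
from collections import Counter, defaultdict

a = ['a', 'b', 'c']
# Smaller than the b*100000 used above, purely to keep the run short.
b = [['a', 'c'], ['a', 'c'], ['b', 'b']] * 1000

def dd(a, b):
    d = defaultdict(int)
    a_set = set(a)
    for item in map(set, b):
        for i in item & a_set:
            d[i] += 1
    return [d[x] for x in a]

def counter(a, b):
    c = Counter()
    for sublist in b:
        c.update(set(sublist))
    return [c[x] for x in a]

# Both approaches must agree before the timings mean anything.
assert dd(a, b) == counter(a, b)

for fn in (dd, counter):
    seconds = timeit.timeit(lambda: fn(a, b), number=10)
    print(f"{fn.__name__}: {seconds:.4f} s for 10 runs")
```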

Find the position of the longest repeated letter

I have a file that contains letters. I need to find the position of the longest repeated letters. For example, if the file contains aaassdddffccsdddfgssfrsfspppppppppppddsfs, I need a program that finds the position of ppppppppppp. I know that I need to use a .index function to find the location however I am stuck on the loop.

Using itertools.groupby:
import itertools
mystr = 'aaassdddffccsdddfgssfrsfspppppppppppddsfs'
idx = 0
maxidx, maxlen = 0, 0
for _, group in itertools.groupby(mystr):
    grouplen = sum(1 for _ in group)
    if grouplen > maxlen:
        maxidx, maxlen = idx, grouplen
    idx += grouplen
This gives the start index and the length of the longest run of identical letters:
>>> print(maxidx, maxlen)
25 11
>>> mystr[25:25+11]
'ppppppppppp'
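The same loop can be wrapped in a function so it can be reused on the contents of a file. This is a sketch only; the name longest_run is made up here, and it returns the first run when several runs tie for the longest:

```python
from itertools import groupby

def longest_run(text):
    """Return (start, length) of the first longest run of one repeated character."""
    best_start, best_len = 0, 0
    start = 0
    for _, group in groupby(text):
        length = sum(1 for _ in group)  # size of this run
        if length > best_len:
            best_start, best_len = start, length
        start += length  # the next run begins right after this one
    return best_start, best_len

start, length = longest_run('abbbcc')
print(start, length)  # 1 3
```

For an empty string the function returns (0, 0), which the caller may want to treat as "no run found".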

You're going to need to loop through the entire string. Keep track of each new letter you come across, as well as its index and how long its sequence is. Only store the longest sequence:
s = 'aaassdddffccsdddfgssfrsfspppppppppppddsfs'
max_c = max_i = max_len = None
cur_c = cur_i = cur_len = None
for i, c in enumerate(s):
    if c != cur_c:
        if max_len is None or cur_len > max_len:
            max_c, max_i, max_len = cur_c, cur_i, cur_len
        cur_c = c
        cur_i = i
        cur_len = 1
    else:
        cur_len += 1
else:
    # One last check when the loop completes
    if max_len is None or cur_len > max_len:
        max_c, max_i, max_len = cur_c, cur_i, cur_len
print(max_c, max_i, max_len)

Here is a one-liner:
from itertools import groupby
from operator import itemgetter
[(k, next(g)[0], sum(1 for _ in g)+1) for k, g in groupby(enumerate(
'aaassdddffccsdddfgssfrsfspppppppppppddsfs'), key=itemgetter(1))]
The above generates (key, position, length) tuples. You can get the tuple with the maximum length by applying reduce:
from itertools import groupby
from functools import reduce
from operator import itemgetter
reduce(lambda x,y:x if x[2] >= y[2] else y,
((k, next(g)[0], sum(1 for _ in g)+1) for k, g in groupby(enumerate(
'aaassdddffccsdddfgssfrsfspppppppppppddsfs'), key=itemgetter(1))))

A quick way of achieving this is to use a regex to match repeating characters with (.)(\1+). Then we loop over all those results using a generator expression and find the max according to the length (key=len). Finally, having found the longest string, we call str.index() to find where the longest repeated letter occurred:
import re
txt = "aaassdddffccsdddfgssfrsfspppppppppppddsfs"
idx = txt.index(max((''.join(f) for f in re.findall(r"(.)(\1+)", txt)), key=len))
print(idx)
Here is the same code broken out into stages:
>>> import re
>>> txt = "aaassdddffccsdddfgssfrsfspppppppppppddsfs"
>>> matches = list(''.join(f) for f in re.findall(r"(.)(\1+)", txt))
>>> print(matches)
['aaa', 'ss', 'ddd', 'ff', 'cc', 'ddd', 'ss', 'ppppppppppp', 'dd']
>>> longest = max(matches, key=len)
>>> print(longest)
ppppppppppp
>>> print(txt.index(longest))
25
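One limitation of the findall version is that max() raises ValueError when the text contains no repeated letters at all, because the list of matches is empty. A variation that reads the position straight from the match objects, and treats single letters as runs of length 1, might look like the sketch below (the function name longest_run_regex is hypothetical):

```python
import re

def longest_run_regex(text):
    """Return (start, length) of the first longest run, or None for empty text."""
    # (.)\1* matches one character followed by any number of repeats of it,
    # so a single character counts as a run of length 1.
    runs = re.finditer(r"(.)\1*", text, flags=re.DOTALL)
    best = max(runs, key=lambda m: len(m.group(0)), default=None)
    if best is None:
        return None
    return best.start(), len(best.group(0))

print(longest_run_regex("abbbcc"))  # (1, 3)
```

Using m.start() also avoids the second search through the string that str.index() performs.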

how to apply a groupby on list of tuples in python?

In my function I will create different tuples and add to an empty list :
tup = (pattern,matchedsen)
matchedtuples.append(tup)
The patterns are regular expressions. I want to apply groupby() to matchedtuples in the following way:
For example :
matchedtuples = [(p1, s1) , (p1,s2) , (p2, s5)]
And I am looking for this result:
result = [ (p1,(s1,s2)) , (p2, s5)]
So, in this way I will have groups of sentences with the same pattern. How can I do this?

My answer will work for the input structures shown below and prints the same output as you gave. I will use only groupby from the itertools module:
# Let's suppose your input is something like this
a = [("p1", "s1"), ("p1", "s2"), ("p2", "s5")]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(tuple(j for j in b)[0])
print(result)
Output:
[('p1', ('s1', 's2')), ('p2', 's5')]
The same solution works if you add more values to your input:
# When you add more values to your input
a = [("p1", "s1"), ("p1", "s2"), ("p2", "s5"), ("p2", "s6"), ("p3", "s7")]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(tuple(j for j in b)[0])
print(result)
Output:
[('p1', ('s1', 's2')), ('p2', ('s5', 's6')), ('p3', 's7')]
Now, if you modify your input structure:
# Let's suppose your modified input is something like this
a = [(["p1"], ["s1"]), (["p1"], ["s2"]), (["p2"], ["s5"])]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(tuple(j for j in b)[0])
print(result)
Output:
[(['p1'], (['s1'], ['s2'])), (['p2'], ['s5'])]
Also, the same solution works if you add more values to your new input structure:
# When you add more values to your new input
a = [(["p1"], ["s1"]), (["p1"], ["s2"]), (["p2"], ["s5"]), (["p2"], ["s6"]), (["p3"], ["s7"])]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(tuple(j for j in b)[0])
print(result)
Output:
[(['p1'], (['s1'], ['s2'])), (['p2'], (['s5'], ['s6'])), (['p3'], ['s7'])]
PS: Test this code, and if it breaks with any other kind of input, please let me know.

If you require the output you present, you'll need to manually loop through the grouping of matchedtuples and build your list.
First, of course, if the matchedtuples list isn't sorted, sort it with itemgetter:
from itertools import groupby
from operator import itemgetter as itmg
li = sorted(matchedtuples, key=itmg(0))
Then, loop through the result supplied by groupby on the sorted list and append to the list r based on the size of the group:
r = []
for i, j in groupby(li, key=itmg(0)):
    j = list(j)
    ap = (i, j[0][1]) if len(j) == 1 else (i, tuple(s[1] for s in j))
    r.append(ap)
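If sorting is undesirable, a dictionary can collect the sentences per pattern in a single pass. The sketch below assumes the patterns are hashable (strings and compiled regular expressions both are) and uses a hypothetical function name, group_sentences. Note that it deliberately differs from the requested output in one respect: a pattern with a single sentence still gets a one-element tuple, which is usually easier to consume than a mix of tuples and bare values.

```python
from collections import defaultdict

def group_sentences(pairs):
    """Group (pattern, sentence) pairs by pattern, keeping first-seen order."""
    grouped = defaultdict(list)
    for pattern, sentence in pairs:
        grouped[pattern].append(sentence)
    # Every pattern maps to a tuple of sentences, even when there is only one.
    return [(pattern, tuple(sentences)) for pattern, sentences in grouped.items()]

matchedtuples = [("p1", "s1"), ("p2", "s5"), ("p1", "s2")]
print(group_sentences(matchedtuples))
```

Because the grouping does not depend on adjacency, the unsorted input above still puts both "p1" sentences in the same group.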
