Counting consecutive characters in a string - python

I need to write code that walks through an input string and builds a list of [letter, count] pairs. If a letter is identical to the one before it, it should not be added to the list again; instead, the count of the previous entry should be increased.
This is how the result should look:
assassin -> [['a', 1], ['s', 2], ['a', 1], ['s', 2], ['i', 1], ['n', 1]]
The word assassin is just an example.
My code so far goes like this:
userin = raw_input("Please enter a string :")
inputlist = []
inputlist.append(userin)
biglist = []
i=0
count = {}
while i<(len(userin)):
    slicer = inputlist[0][i]
    for s in userin:
        if count.has_key(s):
            count[s] += 1
        else:
            count[s] = 1
    biglist.append([slicer,s])
    i = i+1
print biglist
Thanks!

Use collections.Counter() if you only need the total count of each letter; a dictionary is a better way to store that:
>>> from collections import Counter
>>> strs = "assassin"
>>> Counter(strs)
Counter({'s': 4, 'a': 2, 'i': 1, 'n': 1})
For counts of consecutive letters, use itertools.groupby():
>>> from itertools import groupby
>>> [[k, len(list(g))] for k, g in groupby(strs)]
[['a', 1], ['s', 2], ['a', 1], ['s', 2], ['i', 1], ['n', 1]]
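The groupby() one-liner above can be wrapped in a small function. This is only a sketch written for Python 3; the name run_lengths is an assumption made for illustration and is not part of the answer.

```python
from itertools import groupby


def run_lengths(text):
    """Return [char, run_length] pairs, one per run of identical characters.

    run_lengths is a hypothetical helper name, not from the original answer.
    """
    # groupby() starts a new group whenever the character changes, so each
    # group is exactly one run of consecutive identical characters.
    return [[char, sum(1 for _ in group)] for char, group in groupby(text)]


print(run_lengths("assassin"))
# [['a', 1], ['s', 2], ['a', 1], ['s', 2], ['i', 1], ['n', 1]]
```

Counting with sum(1 for _ in group) avoids building a temporary list for each run, and an empty string simply yields an empty list.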

last = ''
results = []
word = 'assassin'
for letter in word:
    if letter == last:
        # Same letter as before: bump the count of the last entry
        results[-1] = (letter, results[-1][1] +1)
    else:
        results.append((letter, 1))
    last = letter
print results # [('a', 1), ('s', 2), ('a', 1), ('s', 2), ('i', 1), ('n', 1)]

Using only builtins:
def cnt(s):
    current = [s[0],1]
    out = [current]
    for c in s[1:]:
        if c == current[0]:
            current[1] += 1
        else:
            current = [c, 1]
            out.append(current)
    return out
print cnt('assassin')

Related

Getting the desired string

def split(word):
    return [char for char in word]
a = "8hypotheticall024y6wxz"
alp = "ABCDEFGHIKLMNOPQRSTVXYZ"
alph = alp.lower()
b= split(alph)
c = set(b)-set(a)
c = sorted(c)
c = str(c)
res = [int(i) for i in a if i.isdigit()]
num_list = [0,1,2,3,4,5,6,7,8,9]
l = set(num_list)-set(res)
l = sorted(l)
l = str(l)
print(l,c)
The output I get is
[1, 3, 5, 7, 9] ['b', 'd', 'f', 'g', 'k', 'm', 'n', 'q', 'r', 's', 'v']
The output I want is
"13579bdfgjkmnqrsuv"
How do I get it?
Please provide me with the code to get rid of these square brackets, commas and quotation marks.

Add this at the end, and drop the c = str(c) and l = str(l) lines so that l and c are still lists:
l.extend(c)
l = [str(i) for i in l]
print(''.join(l)) # 13579bdfgkmnqrsv
Additionally, you could simplify your code (no need for the function or the many reassignments):
a = "8hypotheticall024y6wxz"
alp = "ABCDEFGHIKLMNOPQRSTVXYZ"
b = alp.lower()
c = sorted(set(b)-set(a))
res = [int(i) for i in a if i.isdigit()]
num_list = [0,1,2,3,4,5,6,7,8,9]
l = sorted(set(num_list)-set(res))
l.extend(c)
l = [str(i) for i in l]
print(''.join(l))

I guess you can build another string like this (avoid naming it str, which would shadow the builtin):
result = ""
for i in your_output_list:  # the list you want to print
    result += str(i)

For the list c, which is a list of strings, you can convert it with ''.join(c) instead of str(c).
For the list l, you can do the same thing, but you have to convert every element to a string first. One concise way to do this is with map(str, l). To accomplish the conversion in one line, you could use ''.join(map(str, l)).
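Putting the join() advice together, the whole task might look like the sketch below (Python 3). The function name missing_chars and its default alphabet argument are assumptions for illustration; the default is the question's alp string lowercased, which has no j, u or w.

```python
def missing_chars(text, alphabet="abcdefghiklmnopqrstvxyz"):
    """Return the digits and alphabet letters that do not occur in text.

    missing_chars is a hypothetical helper; the default alphabet is the
    question's alp string in lower case.
    """
    present = set(text)
    # Work with digit characters directly, so no int/str conversion is needed.
    digits = [d for d in "0123456789" if d not in present]
    letters = [ch for ch in sorted(alphabet) if ch not in present]
    # Both lists hold strings, so a single join produces the final text.
    return "".join(digits + letters)


print(missing_chars("8hypotheticall024y6wxz"))  # 13579bdfgkmnqrsv
```

Keeping the digits as characters from the start is a design choice that removes the need for map(str, ...) altogether.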

Find the first repeated letter in a string and the times it is repeated

I have the following string: "WPCOPEO" and I need to find the first repeated letter and the times it is repeated. I would appreciate some help with the coding.
string = "WPCOPEO"
def is_repeated(letter):
    for letter in String:
        if letter == letter
            print (letter)

It's pretty easy if you think about it: for each character, check whether it is already in a set; if it is, you have found the first repeat, otherwise add it to the set.
>>> s = set()
>>> for i in string:
...     if i in s:
...         c = i
...         break
...     else:
...         s.add(i)
...
>>> c
'P'
>>> string.count(c)
2

One way to get it is with a generator expression passed to next():
word = "WPCOPEO"
print (next([letter, word.count(letter)] for pos, letter in enumerate(word) if letter in word[pos+1:]))
Using a for loop:
for pos, letter in enumerate(word):
    if letter in word[pos+1:]:
        print (letter, word.count(letter))
        break
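"First repeated letter" can be read in two ways, and the set-based and enumerate-based approaches answer different readings. The sketch below (Python 3) puts both side by side; the function names are made up for illustration.

```python
def first_repeat_by_second_occurrence(word):
    """Letter whose repeat is reached first when scanning left to right."""
    seen = set()
    for letter in word:
        if letter in seen:
            return letter, word.count(letter)
        seen.add(letter)
    return None  # no letter occurs twice


def first_repeat_by_first_occurrence(word):
    """Earliest-positioned letter that occurs again later in the word."""
    for pos, letter in enumerate(word):
        if letter in word[pos + 1:]:
            return letter, word.count(letter)
    return None  # no letter occurs twice


print(first_repeat_by_second_occurrence("WPCOPEO"))  # ('P', 2)
print(first_repeat_by_first_occurrence("WPCOPEO"))   # ('P', 2)
print(first_repeat_by_second_occurrence("ABBA"))     # ('B', 2)
print(first_repeat_by_first_occurrence("ABBA"))      # ('A', 2)
```

For "WPCOPEO" both readings give P, but for a word like "ABBA" they differ, so it is worth deciding which one is meant.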

You can find the answer by first storing each letter's initial appearance together with its frequency.
{'O': [3, 2], 'E': [4, 1], 'P': [1, 2], 'W': [0, 1], 'C': [2, 1]}
Next, you can transform that into a list of (appearance, letter frequency).
[[0, 'W', 1], [1, 'P', 2], [2, 'C', 1], [3, 'O', 2], [4, 'E', 1]]
Then you can sort by frequency and appearance and grab the first item.
[[1, 'P', 2], [3, 'O', 2], [0, 'W', 1], [2, 'C', 1], [4, 'E', 1]]
Example
#! /usr/bin/env python3
def letter_frequency_by_initial_pos(word):
    occurs = {}
    first_appearance = 0
    for letter in word:
        if letter in occurs:
            occurs[letter][1] += 1
        else:
            occurs[letter] = [ first_appearance, 1 ]
            first_appearance += 1
    return occurs
def appearance_to_frequency(occurs):
    freq_by_appearance = [None] * len(occurs)
    for letter in occurs:
        freq_by_appearance[occurs[letter][0]] = [ occurs[letter][0], letter, occurs[letter][1] ]
    return sorted(freq_by_appearance, key = lambda x: (-x[2], x[0], x[1]))
if __name__ == '__main__':
    freq = appearance_to_frequency(letter_frequency_by_initial_pos('WPCOPEO'))
    print('The letter {} appears {} times.'.format(freq[0][1], freq[0][2]))
Output:
The letter P appears 2 times.
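The same "highest frequency, earliest appearance wins ties" result can probably be had more briefly with collections.Counter. This sketch assumes Python 3.7 or later, where Counter keeps letters in first-seen order; most_frequent_earliest is a hypothetical name.

```python
from collections import Counter


def most_frequent_earliest(word):
    """Return (letter, count) for the most frequent letter.

    Ties go to the letter that appears first in word. This relies on
    Counter preserving insertion order (Python 3.7+) and on max()
    returning the first of several equal maxima.
    """
    if not word:
        return None
    counts = Counter(word)
    letter = max(counts, key=counts.get)
    return letter, counts[letter]


print(most_frequent_earliest("WPCOPEO"))  # ('P', 2)
```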

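def is_repeated(string):
    # Compare each character with the ones after it; stop at the first repeat.
    for i in range(len(string)):
        check = string[i]
        if check in string[i + 1:]:
            print("This character is repeated:", check, "times:", string.count(check))
            return
string = "WPCOPEO"
is_repeated(string)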

Detecting if string iterator is a blank space

I'm attempting to write a small block of code that detects the most frequently occurring character. However, I've become stuck on not being able to detect if a value is blank space.
Below is the code I have:
text = "Hello World!"
## Use lower() because case does not matter
setList = list(set(text.lower()))
for s in setList:
    if s.isalpha() and s != " ":
        pass  ## Do Something
    else:
        setList.remove(s)
The problem is that setList ends up with the following values:
[' ', 'e', 'd', 'h', 'l', 'o', 'r', 'w']
I've tried multiple ways of detecting the blank space with no luck, including using strip() on the original string value. isspace() will not work because it looks for at least one character.

The problem is that you are removing items from a list while iterating over it. Never do that. Consider this case
['!', ' ', 'e', 'd', 'h', 'l', 'o', 'r', 'w']
This is what setList looks like after converting to a set and then to a list. In the first iteration, ! is seen and removed from setList. Now that ! is gone, every remaining item shifts one position to the left, so the space character moves into the slot the iterator has just visited. For the next iteration, the iterator advances and points to e, so the space is never examined. That is why it is still there in the output. You can check this with this program (Python 2, where range() returns a list)
num_list = range(10)
for i in num_list:
    print i,
    if i % 2 == 1:
        num_list.remove(i)
        pass
Output
0 1 3 5 7 9
But if you comment num_list.remove(i), the output will become
0 1 2 3 4 5 6 7 8 9
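Two common ways around the remove-while-iterating problem are to loop over a copy of the list or to build a new list. The sketch below shows both in Python 3; the sample string "h e!l" is made up for illustration.

```python
# Option 1: iterate over a copy (chars[:]) so removing from chars is safe.
chars = list("h e!l")  # ['h', ' ', 'e', '!', 'l']
for ch in chars[:]:
    if not ch.isalpha():
        chars.remove(ch)
print(chars)  # ['h', 'e', 'l']

# Option 2: build a new list instead of mutating the old one.
letters = [ch for ch in "h e!l" if ch.isalpha()]
print(letters)  # ['h', 'e', 'l']
```

The list comprehension is usually the clearer choice, since nothing is mutated during the loop.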
To solve your actual problem, you can use collections.Counter to find the frequency of characters, like this
from collections import Counter
d = Counter(text.lower())
if " " in d: del d[" "] # Remove the count of space char
print d.most_common()
Output
[('l', 3), ('o', 2), ('!', 1), ('e', 1), ('d', 1), ('h', 1), ('r', 1), ('w', 1)]

A short way is to first remove the spaces from the text (Python 2 syntax; on Python 3, text.replace(" ", "") does the same)
>>> text = "Hello world!"
>>> text = text.translate(None, " ")
>>> max(text, key=text.count)
'l'
This isn't very efficient though, because count scans the entire string once for each character (O(n^2)).
For longer strings it's better to use collections.Counter or collections.defaultdict to do the counting in a single pass.
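A single-pass count with collections.defaultdict might look like the following sketch (Python 3). The name most_frequent_char is an assumption, and skipping whitespace with str.isspace() is one possible way to ignore blanks.

```python
from collections import defaultdict


def most_frequent_char(text):
    """Return the most frequent non-whitespace character, ignoring case.

    most_frequent_char is a hypothetical helper name for this sketch.
    """
    counts = defaultdict(int)
    for ch in text.lower():
        if not ch.isspace():  # " ".isspace() is True, so blanks are skipped
            counts[ch] += 1
    if not counts:
        return None  # nothing but whitespace (or an empty string)
    return max(counts, key=counts.get)


print(most_frequent_char("Hello World!"))  # l
```

Each character is visited once, so the work grows linearly with the length of the text.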

How about removing the blank spaces before you start with lists and sets:
import re
text = "Hello world!"
text = re.sub(' ', '', text)
# text = "Helloworld!"

The above answers are legitimate. You could also use the built-in str.count method if you are not concerned with algorithmic complexity. For example:
## Use lower() because case does not matter
text = text.lower()
num = 0
most_freq = None
for s in text:
    cur = text.count(s)
    if cur > num:
        most_freq = s
        num = cur

How about using split()? x.split() returns an empty list, which is falsy, when x is a blank space:
>>> [x for x in text if x.split()]
['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd', '!']
To count the duplicates:
>>> d = dict()
>>> for e in [x for x in text if x.split()]:
...     d[e] = d.get(e, 0) + 1
...
>>> print d
{'!': 1, 'e': 1, 'd': 1, 'H': 1, 'l': 3, 'o': 2, 'r': 1, 'W': 1}

To get the single most frequent, use max:
text = "Hello World!"
count={}
for c in text.lower():
    if c.isspace():
        continue
    count[c]=count.get(c, 0)+1
print count
# {'!': 1, 'e': 1, 'd': 1, 'h': 1, 'l': 3, 'o': 2, 'r': 1, 'w': 1}
print max(count, key=count.get)
# 'l'
If you want the whole shebang:
print sorted(count.items(), key=lambda t: (-t[1], t[0]))
# [('l', 3), ('o', 2), ('!', 1), ('d', 1), ('e', 1), ('h', 1), ('r', 1), ('w', 1)]
If you want to use Counter and use a generator type approach, you could do:
from collections import Counter
from string import ascii_lowercase
print Counter(c.lower() for c in text if c.lower() in ascii_lowercase)
# Counter({'l': 3, 'o': 2, 'e': 1, 'd': 1, 'h': 1, 'r': 1, 'w': 1})

Turning a string into a list with specifications

I want to create a list from my string in Python that shows how many times each letter appears in a row inside the string.
For example:
my_string= "google"
I want to create a list that looks like this:
[['g', 1], ['o', 2], ['g', 1], ['l', 1], ['e', 1]]
Thanks!

You could use groupby from itertools:
from itertools import groupby
my_string= "google"
[(c, len(list(i))) for c, i in groupby(my_string)]

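You can use a regular expression and a dictionary to find and store the longest run of each letter like this. Note that every run of a letter is reported with that letter's longest run length, so the result is only correct when all runs of the same letter have equal length, as in 'google'.
import re
s = 'google'
# Collapse each run of repeated letters to a single letter: 'google' -> 'gogle'
nodubs = [s[0]] + [s[x] if s[x-1] != s[x] else '' for x in range(1,len(s))]
nodubs = ''.join(nodubs)
dic = {}
for letter in set(s):
    # re.escape guards against letters that are regex metacharacters
    matches = re.findall('%s+' % re.escape(letter), s)
    longest = max([len(x) for x in matches])
    dic[letter] = longest
print [[n,dic[n]] for n in nodubs]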
Result:
[['g', 1], ['o', 2], ['g', 1], ['l', 1], ['e', 1]]
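A regular expression with a backreference can also report the length of each individual run, rather than one length per letter. This is a sketch in Python 3; run_lengths_regex is a made-up name.

```python
import re


def run_lengths_regex(text):
    """Return [char, run_length] pairs using a backreference pattern.

    run_lengths_regex is a hypothetical helper name for this sketch.
    """
    # (.) captures one character and \1* matches any immediate repeats of
    # it, so every match is one complete run. DOTALL lets "." match "\n".
    pattern = r"(.)\1*"
    return [[m.group(1), len(m.group(0))]
            for m in re.finditer(pattern, text, flags=re.DOTALL)]


print(run_lengths_regex("google"))
# [['g', 1], ['o', 2], ['g', 1], ['l', 1], ['e', 1]]
print(run_lengths_regex("aabab"))
# [['a', 2], ['b', 1], ['a', 1], ['b', 1]]
```

Because each match is a single run, inputs such as "aabab", where runs of the same letter have different lengths, are handled as well.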

Finding neighbors in a list

I have a list:
l=['a','>>','b','>>','d','e','f','g','>>','i','>>','>>','j','k','l','>>','>>']
I need to extract all the neighbors of '>>' and split them into groups wherever they are separated by elements that are neither '>>' nor neighbors of '>>'.
For the example list the expected outcome would be:
[['a', 'b', 'd'], ['g', 'i', 'j'], ['l']]
I have tried quite a few things, but all the simple ones have failed one way or another. At the moment the only code that seems to work is this:
def func(L,N):
    outer=[]
    inner=[]
    for i,e in enumerate(L):
        if e!=N:
            try:
                if L[i-1]==N or L[i+1]==N:
                    inner.append(e)
                elif len(inner)>0:
                    outer.append(inner)
                    inner=[]
            except IndexError:
                pass
    if len(inner):
        outer.append(inner)
    return outer
func(l,'>>')
Out[196]:
[['a', 'b', 'd'], ['g', 'i', 'j'], ['l']]
Although it seems to work, I am wondering if there is a better, cleaner way to do it?

I would argue that the most pythonic and easy to read solution would be something like this:
import itertools
def neighbours(items, fill=None):
    """Yield the elements with their neighbours as (before, element, after).
    neighbours([1, 2, 3]) --> (None, 1, 2), (1, 2, 3), (2, 3, None)
    """
    before = itertools.chain([fill], items)
    after = itertools.chain(items, [fill]) #You could use itertools.zip_longest() later instead.
    next(after)
    return zip(before, items, after)
def split_not_neighbour(seq, mark):
    """Split the sequence on each item where the item is not the mark, or next
    to the mark.
    split_not_neighbour([1, 0, 2, 3, 4, 5, 0], 0) --> (1, 2), (5)
    """
    output = []
    for items in neighbours(seq):
        if mark in items:
            _, item, _ = items
            if item != mark:
                output.append(item)
        else:
            if output:
                yield output
                output = []
    if output:
        yield output
Which we can use like so:
>>> l = ['a', '>>', 'b', '>>', 'd', 'e', 'f', 'g', '>>', 'i', '>>', '>>',
... 'j', 'k', 'l', '>>', '>>']
>>> print(list(split_not_neighbour(l, ">>")))
[['a', 'b', 'd'], ['g', 'i', 'j'], ['l']]
Note the neat avoidance of any direct indexing.
Edit: A more elegant version.
def split_not_neighbour(seq, mark):
    """Split the sequence on each item where the item is not the mark, or next
    to the mark.
    split_not_neighbour([1, 0, 2, 3, 4, 5, 0], 0) --> (1, 2), (5)
    """
    neighboured = neighbours(seq)
    for has_mark, items in itertools.groupby(neighboured, key=lambda x: mark in x):
        # Only groups that touch the mark hold neighbours; skip the others.
        if has_mark:
            yield [item for _, item, _ in items if item != mark]
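For reference, here is a self-contained Python 3 sketch of the same groupby idea that can be run on its own. The name split_neighbours is an assumption, and the sketch pads with None, so it assumes the mark itself is never None.

```python
from itertools import groupby


def split_neighbours(seq, mark):
    """Group the items that sit next to mark, splitting where other items intervene.

    split_neighbours is a hypothetical name; mark is assumed not to be None
    because None is used as padding at both ends.
    """
    seq = list(seq)
    padded = [None] + seq + [None]
    # One (before, item, after) triple per element of seq.
    triples = zip(padded, padded[1:], padded[2:])
    groups = []
    for touches_mark, run in groupby(triples, key=lambda t: mark in t):
        if touches_mark:
            kept = [item for _, item, _ in run if item != mark]
            if kept:  # a run made only of marks contributes nothing
                groups.append(kept)
    return groups


data = ['a', '>>', 'b', '>>', 'd', 'e', 'f', 'g', '>>', 'i', '>>', '>>',
        'j', 'k', 'l', '>>', '>>']
print(split_neighbours(data, '>>'))
# [['a', 'b', 'd'], ['g', 'i', 'j'], ['l']]
```

Consecutive triples that contain the mark form one group, so a stretch of non-neighbouring items is what separates one output list from the next.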

Here is one alternative:
import itertools
def func(L, N):
    def key(i_e):
        i, e = i_e
        # i + 1 < len(L) guards against an IndexError on the last element
        return e == N or (i > 0 and L[i-1] == N) or (i + 1 < len(L) and L[i+1] == N)
    outer = []
    for k, g in itertools.groupby(enumerate(L), key):
        if k:
            outer.append([e for i, e in g if e != N])
    return outer
Or an equivalent version with a nested list comprehension:
def func(L, N):
    def key(i_e):
        i, e = i_e
        return e == N or (i > 0 and L[i-1] == N) or (i + 1 < len(L) and L[i+1] == N)
    return [[e for i, e in g if e != N]
            for k, g in itertools.groupby(enumerate(L), key) if k]

You can simplify it like this
l = ['']+l+['']
stack = []
connected = last_connected = False
for i, item in enumerate(l):
    if item in ['','>>']: continue
    connected = l[i-1] == '>>' or l[i+1] == '>>'
    if connected:
        if not last_connected:
            stack.append([])
        stack[-1].append(item)
    last_connected = connected

My naive attempt (it joins the list into one string, so it assumes every element is a single character):
things = (''.join(l)).split('>>')
output = []
inner = []
for i in things:
    if not i:
        continue
    i_len = len(i)
    if i_len == 1:
        inner.append(i)
    elif i_len > 1:
        inner.append(i[0])
        output.append(inner)
        inner = [i[-1]]
output.append(inner)
print output # [['a', 'b', 'd'], ['g', 'i', 'j'], ['l']]

Something like this:
l=['a','>>','b','>>','d','e','f','g','>>','i','>>','>>','j','k','l','>>','>>']
l= filter(None,"".join(l).split(">>"))
lis=[]
for i,x in enumerate(l):
    if len(x)==1:
        if len(lis)!=0:
            lis[-1].append(x[0])
        else:
            lis.append([])
            lis[-1].append(x[0])
    else:
        if len(lis)!=0:
            lis[-1].append(x[0])
            lis.append([])
            lis[-1].append(x[-1])
        else:
            lis.append([])
            lis[-1].append(x[0])
            lis.append([])
            lis[-1].append(x[-1])
print lis
output:
[['a', 'b', 'd'], ['g', 'i', 'j'], ['l']]
or:
l=['a','>>','b','>>','d','e','f','g','>>','i','>>','>>','j','k','l','>>','>>']
l= filter(None,"".join(l).split(">>"))
lis=[[] for _ in range(len([1 for x in l if len(x)>1])+1)]
for i,x in enumerate(l):
    if len(x)==1:
        for y in reversed(lis):
            if len(y)!=0:
                y.append(x)
                break
        else:
            lis[0].append(x)
    else:
        if not all(len(x)==0 for x in lis):
            for y in reversed(lis):
                if len(y)!=0:
                    y.append(x[0])
                    break
            for y in lis:
                if len(y)==0:
                    y.append(x[-1])
                    break
        else:
            lis[0].append(x[0])
            lis[1].append(x[-1])
print lis
output:
[['a', 'b', 'd'], ['g', 'i', 'j'], ['l']]

Another method, superimposing the original list on a copy shifted by one position:
import copy
# lis is the input list (l in the question)
lis_dup = copy.deepcopy(lis)
lis_dup.insert(0,'')
prev_in = 0
tmp=[]
res = []
for (x,y) in zip(lis,lis_dup):
    if '>>' in (x,y):
        if y!='>>' :
            if y not in tmp:
                tmp.append(y)
        elif x!='>>':
            if x not in tmp:
                print 'x is ' ,x
                tmp.append(x)
            else:
                if prev_in ==1:
                    res.append(tmp)
                    prev_in =0
                    tmp = []
        prev_in = 1
    else:
        if prev_in == 1:
            res.append(tmp)
            prev_in =0
            tmp = []
res.append(tmp)
print res
