How to find a continuous string using Python

Given a string (e.g., jaghiuuabc), I want to find the substrings whose letters are consecutive in the alphabet.
Here is my code:
import string
alpha = list(string.ascii_lowercase)
s = 'jaghiuuabc'
a = []
for i in range(len(alpha)-1):
    for j in range(len(s)-1):
        if s[j] in alpha[i]:
            a.append(s[j])
print(a)

There's a nice example in the Python 2.6 itertools docs that shows how to find consecutive sequences. To quote:
Find runs of consecutive numbers using groupby. The key to the
solution is differencing with a range so that consecutive numbers all
appear in same group.
For some strange reason, that example is not in the later versions of the docs. That code works for sequences of numbers; the code below shows how to adapt it to work on letters.
from itertools import groupby
s = 'jaghiuuabc'
def keyfunc(t):
    ''' Subtract the character's index in the string
        from its Unicode codepoint number.
    '''
    i, c = t
    return ord(c) - i
a = []
for k, g in groupby(enumerate(s), key=keyfunc):
    # Extract the chars from the (index, char) tuples in the group
    seq = [t[1] for t in g]
    if len(seq) > 1:
        a.append(''.join(seq))
print(a)
output
['ghi', 'abc']
How it works
The heart of this code is
groupby(enumerate(s), key=keyfunc)
enumerate(s) generates tuples containing the index number and character for each character in s. For example:
s = 'ABCEF'
for t in enumerate(s):
    print(t)
output
(0, 'A')
(1, 'B')
(2, 'C')
(3, 'E')
(4, 'F')
groupby takes items from a sequence or iterator and gathers adjacent equal items together into groups. By default, it simply compares the values of the items to see if they're equal. But you can also give it a key function. When you do that, it passes each item to the key function and uses the result returned by that key function for its equality test.
Here's a simple example. First, we define a function div_by_10 that divides a number by 10, using integer division. This basically gets rid of the last digit in the number.
def div_by_10(n):
    return n // 10
a = [2, 5, 10, 13, 17, 21, 22, 29, 33, 35]
b = [div_by_10(u) for u in a]
print(a)
print(b)
output
[2, 5, 10, 13, 17, 21, 22, 29, 33, 35]
[0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
So if we use div_by_10 as the key function to groupby it will ignore the last digit in each number and thus it will group adjacent numbers together if they only differ in the last digit.
from itertools import groupby
def div_by_10(n):
    return n // 10
a = [2, 5, 10, 13, 17, 21, 22, 29, 33, 35]
print(a)
for key, group in groupby(a, key=div_by_10):
    print(key, list(group))
output
[2, 5, 10, 13, 17, 21, 22, 29, 33, 35]
0 [2, 5]
1 [10, 13, 17]
2 [21, 22, 29]
3 [33, 35]
My keyfunc receives a (index_number, character) tuple and subtracts that index_number from the character's code number and returns the result. Let's see what that does with my earlier example of 'ABCEF':
def keyfunc(t):
    i, c = t
    return ord(c) - i
for t in enumerate('ABCEF'):
    print(t, keyfunc(t))
output
(0, 'A') 65
(1, 'B') 65
(2, 'C') 65
(3, 'E') 66
(4, 'F') 66
The code number for 'A' is 65, the code number for 'B' is 66, the code number for 'C' is 67, etc. So when we subtract the index from the code number for each of 'A', 'B', and 'C' we get 65. But we skipped over 'D' so when we do the subtractions for 'E' and 'F' we get 66. And that's how groupby can put 'A', 'B', & 'C' in one group and 'E' & 'F' in the next group.
This can be tricky stuff. Don't expect to understand it all completely straight away. But if you do some experiments yourself I'm sure it will gradually sink in. ;)
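For experimenting, it may help to wrap the groupby approach in a small function. This is only a sketch: the helper name consecutive_runs and its min_len parameter are made up for illustration and are not part of the code above; the key is the same codepoint-minus-index trick.

```python
from itertools import groupby

def consecutive_runs(s, min_len=2):
    """Return runs of alphabetically consecutive characters in s.

    consecutive_runs and min_len are illustrative names, not part of
    the original answer.
    """
    runs = []
    # Characters in an ascending run share the same (codepoint - index) key.
    for _, group in groupby(enumerate(s), key=lambda t: ord(t[1]) - t[0]):
        run = ''.join(ch for _, ch in group)
        if len(run) >= min_len:
            runs.append(run)
    return runs

print(consecutive_runs('jaghiuuabc'))  # ['ghi', 'abc']
print(consecutive_runs('ABCEF'))       # ['ABC', 'EF']
```

Trying it on your own strings (including ones with repeated letters, like the 'uu' above) is a quick way to see which characters end up sharing a key.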
Just for fun, here's the unreadable multiply-nested list comprehension version of that code. ;)
print([z for _, g in groupby(enumerate(s),lambda t:ord(t[1])-t[0])for z in[''.join([*zip(*g)][1])]if len(z)>1])
Here's another version which was inspired by Amit Tripathi's answer. This one doesn't use any imports because it does the grouping manually. prev contains the codepoint number of the previous character. We initialize prev to -2 so that the first time the if i != prev + 1 test is performed it's guaranteed to be true because the smallest possible value of ord(ch) is zero, so a new empty list will be added to groups.
s = 'jaghiuuabcxyzq'
prev, groups = -2, []
for ch in s:
    i = ord(ch)
    if i != prev + 1:
        groups.append([])
    groups[-1].append(ch)
    prev = i
print(groups)
a = [''.join(u) for u in groups if len(u) > 1]
print(a)
output
[['j'], ['a'], ['g', 'h', 'i'], ['u'], ['u'], ['a', 'b', 'c'], ['x', 'y', 'z'], ['q']]
['ghi', 'abc', 'xyz']

This can be done easily in pure Python.
A Python 3 implementation (it should also work with Python 2). A simple 8-liner:
s = 'jaghiuuabc'
prev, counter, dct = None, 0, dict()
for i in s:
    if prev is not None:
        if not chr(ord(prev) + 1) == i:
            counter += 1
    prev = i
    dct.setdefault(counter, []).append(prev)
[''.join(dct[d]) for d in dct if len(dct[d]) > 1]
Out[51]: ['ghi', 'abc']
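ord converts a character to its integer code point (the ASCII number for ASCII characters)
chr converts a number to the corresponding character
setdefault returns the value for a key, first setting it to the given default (here an empty list) if the key doesn't exist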
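A tiny sketch of those three calls in isolation may make the loop above easier to follow. The variable names (code, groups) are just for illustration.

```python
# ord() maps a character to its integer code point; chr() is the inverse.
code = ord('a')
print(code)            # 97
print(chr(code + 1))   # b

# setdefault() returns the value stored under the key, first inserting
# the given default (here an empty list) if the key is missing.
groups = {}
groups.setdefault(0, []).append('g')
groups.setdefault(0, []).append('h')
groups.setdefault(1, []).append('u')
print(groups)          # {0: ['g', 'h'], 1: ['u']}
```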

What about some recursion, using only the standard library?
a='jaghiuuabc'
import string
alpha = list(string.ascii_lowercase)
def trech(string_1,chr_list,new_string):
    final_list=[]
    if not string_1:
        return 0
    else:
        for chunk in range(0,len(string_1),chr_list):
            for sub_chunk in range(2,len(string_1)+1):
                if string_1[chunk:chunk + sub_chunk] in ["".join(alpha[i:i + sub_chunk]) for i in range(0, len(alpha), 1)]:
                    final_list.append(string_1[chunk:chunk + sub_chunk])
        if final_list:
            print(final_list)
        return trech(string_1[1:],chr_list-1,new_string)
print(trech(a,len(a),alpha))
output:
['gh', 'ghi']
['hi']
['ab', 'abc']
['bc']
0

Related

Explain [:0] in Python

Ok, so before I get flamed for not RTFM, I understand that [:0] in my case of:
s ="itsastring"
newS= []
newS[:0] = s
ends up converting s to a list through slicing. This is my end goal, but coming from a Java background, I don't fully understand the "0" part in "[:0]" or, syntactically, why it's placed there (I know it roughly means increase by 0). Finally, how does Python know from this syntax that I want each char of s to be an element? I want to understand it so I can remember it more clearly.
If S and T are sequences, S[a:b] = T will replace the subsequence from index a to b-1 of S by the elements of T.
If a == b, it will act as a simple insertion.
And S[:0] is the same thing as S[0:0] : so it's a simple insertion at the front.
s = [11,22,33,44,55,66,77]
s[3:3] = [1,2,3] # insertion at position 3
print( s )
s = [11,22,33,44,55,66,77]
s[3:4] = [1,2,3] # deletion of element at position 3, and then insertion
print( s )
s = [11,22,33,44,55,66,77]
s[3:6] = [1,2,3] # deletion of elements from position 3 to 5, and then insertion
print( s )
s = [11,22,33,44,55,66,77]
s[:] = [1,2,3] # deletion of all elements, and then insertion : whole replacement
print( s )
output:
[11, 22, 33, 1, 2, 3, 44, 55, 66, 77]
[11, 22, 33, 1, 2, 3, 55, 66, 77]
[11, 22, 33, 1, 2, 3, 77]
[1, 2, 3]
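To connect this back to the string case in the question, here is a minimal sketch. The names new_s and letters are only illustrative. The point is that slice assignment accepts any iterable on the right-hand side, and iterating over a string yields one character at a time.

```python
s = "itsastring"

new_s = []
# Slice assignment accepts any iterable on the right-hand side. Iterating
# over a string yields its characters one at a time, so each character
# becomes a separate list element.
new_s[:0] = s
print(new_s)

# The same insertion at the front works on a non-empty list.
letters = ['x', 'y']
letters[:0] = "ab"
print(letters)  # ['a', 'b', 'x', 'y']

# list(s) is the more conventional spelling of the conversion.
print(list(s) == new_s)  # True
```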
Hope it helps:
s ="itsastring"
# s[start:stop] starts at index start and stops one before index stop;
# omitting both copies the whole string
newS = s[:]
So s[:0] means: slice the string from the beginning up to (but not including) index 0, and the result is empty. By default, slicing runs from the first element to the last, so print(s[0:]) is the same as print(s).
I suggest you try a loop through the whole string like this:
s = [x for x in "itsastring"]
print(s)
# result
['i', 't', 's', 'a', 's', 't', 'r', 'i', 'n', 'g']

What is the fastest way to convert a dictionary frequency to list in Python?

I have dictionary frequency as follows:
freq = {'a': 1, 'b': 2, 'c': 3}
It simply means that I have one a, two b's, and three c's.
I would like to convert it into a complete list:
lst = ['a', 'b', 'b', 'c', 'c', 'c']
What is the fastest way (time-efficient) or most compact way (space-efficient) to do so?
Yes, but only if the items are (or can be represented as) integers, and if the number of items between the smallest and largest item is sufficiently close to the difference between the two, in which case you can use bucket sort, resulting in O(n) time complexity, where n is the difference between the smallest and the largest item. This would be more efficient than using other sorting algorithms, with an average time complexity of O(n log n).
In the case of List = [1, 4, 5, 2, 6, 7, 9, 3] as it is in your question, it is indeed more efficient to use bucket sort when it is known that 1 is the smallest item and 9 is the largest item, since only 8 is missing from the range. The following example uses collections.Counter to account for the possibility of duplicates in the input list:
from collections import Counter
List = [1, 4, 5, 2, 6, 7, 9, 3]  # the input list from the question
counts = Counter(List)
print(list(Counter({i: counts[i] for i in range(1, 10)}).elements()))
This outputs:
[1, 2, 3, 4, 5, 6, 7, 9]
Let's break this into two O(N) passes: one to catalog the numbers, and one to create the sorted list. I updated the variable names; List is an especially bad choice, given the built-in type list. I also added 10 to each value, so you can see how the low-end offset works.
coll = [11, 14, 15, 12, 16, 17, 19, 13]
last = 19
first = 11
offset = first
size = last-first+1
# Recognize all values in a dense "array"
need = [False] * size
for item in coll:
    need[item - offset] = True
# Iterate again in numerical order; for each True value, add that item to the new list
sorted_list = [idx + offset for idx, needed_flag in enumerate(need) if needed_flag]
print(sorted_list)
OUTPUT:
[11, 12, 13, 14, 15, 16, 17, 19]
The most compact way I usually use is a dict comprehension plus a counting loop, which builds the frequency dictionary from the list -
lst = ['a', 'b', 'b', 'c', 'c', 'c']
freq = {i: 0 for i in lst}
for i in lst: freq[i] += 1
Space complexity - O(n)
Time complexity - O(n)
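For the direction the question actually asks about (frequency dict to expanded list), a few standard-library options are sketched below. Which one is fastest is not claimed here; that would need measuring with timeit on your own data. The variable names lst_counter and lst_chain are illustrative.

```python
from collections import Counter
from itertools import chain, repeat

freq = {'a': 1, 'b': 2, 'c': 3}

# Nested comprehension: repeat each key as many times as its count.
lst = [key for key, count in freq.items() for _ in range(count)]
print(lst)  # ['a', 'b', 'b', 'c', 'c', 'c']

# Counter.elements() yields each key repeated by its count.
lst_counter = list(Counter(freq).elements())

# chain.from_iterable + repeat avoids the inner Python-level loop.
lst_chain = list(chain.from_iterable(repeat(k, n) for k, n in freq.items()))
```

All three take time and space proportional to the length of the resulting list; if the list is only needed for iteration, Counter(freq).elements() can be consumed lazily without building it.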

Count all sequences in a list

My self-learning task is to find how many sequences are in the list. A sequence is a group of numbers where each is 1 bigger than the previous one. So, in the list:
[1,2,3,5,8,10,12,13,14,15,17,19,21,23,24,25,26]
there are 3 sequences:
1,2,3
12,13,14,15
23,24,25,26
I've spent a few hours and got a solution, which I think is a workaround rather than a real solution.
My solution is to keep a separate list for the sequences and count the attempts to update this list. I count the very first append, and every new append except when the number belongs to a sequence that already exists.
I believe there is a solution without an additional list, which counts the sequences themselves rather than the list manipulation attempts.
numbers = [1,2,3,5,8,10,12,13,14,15,17,19,21,23,24,25,26]
goods = []
count = 0
for i in range(len(numbers)-1):
    if numbers[i] + 1 == numbers[i+1]:
        if goods == []:
            goods.append(numbers[i])
            count = count + 1
        elif numbers[i] != goods[-1]:
            goods.append(numbers[i])
            count = count + 1
        if numbers[i+1] != goods[-1]:
            goods.append(numbers[i+1])
The output from my debugging:
Number 1 added to: [1]
First count change: 1
Number 12 added to: [1, 2, 3, 12]
Normal count change: 2
Number 23 added to: [1, 2, 3, 12, 13, 14, 15, 23]
Normal count change: 3
Thanks everyone for your help!
Legman suggested the original solution, which I had failed to implement before I ended up with the other solution in this post.
MSeifert helped me find the right way with the lists:
numbers = [1,2,3,5,8,10,12,13,14,15,17,19,21,23,24,25,26]
print("Numbers:", numbers)
goods = []
count = 0
for i in range(len(numbers)-1):
    if numbers[i] + 1 == numbers[i+1]:
        if goods == []:
            goods.append([numbers[i]])
            count = count + 1
        elif numbers[i] != goods[-1][-1]:
            goods.append([numbers[i]])
            count = count + 1
        # Compare against the last number of the last sequence
        if numbers[i+1] != goods[-1][-1]:
            goods[-1].extend([numbers[i+1]])
print("Sequences:", goods)
print("Number of sequences:", len(goods))
One way would be to iterate over pairwise elements:
l = [1,2,3,5,8,10,12,13,14,15,17,19,21,23,24,25,26]
res = [[]]
for item1, item2 in zip(l, l[1:]): # pairwise iteration
    if item2 - item1 == 1:
        # The difference is 1, if we're at the beginning of a sequence add both
        # to the result, otherwise just the second one (the first one is already
        # included because of the previous iteration).
        if not res[-1]: # index -1 means "last element".
            res[-1].extend((item1, item2))
        else:
            res[-1].append(item2)
    elif res[-1]:
        # The difference isn't 1 so add a new empty list in case it just ended a sequence.
        res.append([])
# In case "l" doesn't end with a "sequence" one needs to remove the trailing empty list.
if not res[-1]:
    del res[-1]
>>> res
[[1, 2, 3], [12, 13, 14, 15], [23, 24, 25, 26]]
>>> len(res) # the amount of these sequences
3
A solution without zip only requires small changes (the loop header and the beginning of the loop body) compared to the approach above:
l = [1,2,3,5,8,10,12,13,14,15,17,19,21,23,24,25,26]
res = [[]]
for idx in range(1, len(l)):
    item1 = l[idx-1]
    item2 = l[idx]
    if item2 - item1 == 1:
        if not res[-1]:
            res[-1].extend((item1, item2))
        else:
            res[-1].append(item2)
    elif res[-1]:
        res.append([])
if not res[-1]:
    del res[-1]
Taken from the Python itertools documentation: you can use itemgetter and groupby to do that using only one list, like so:
>>> from itertools import groupby
>>> from operator import itemgetter
>>>
>>> l = [1, 2, 3, 5, 8, 10, 12, 13, 14, 15, 17, 19, 21, 23, 24, 25, 26]
>>>
>>> counter = 0
>>> for k, g in groupby(enumerate(l), lambda (i,x):i-x):
...     seq = map(itemgetter(1), g)
...     if len(seq)>1:
...         print seq
...         counter+=1
...
[1, 2, 3]
[12, 13, 14, 15]
[23, 24, 25, 26]
>>> counter
3
Notice: As correctly mentioned by @MSeifert, tuple unpacking in the signature is only possible in Python 2 and it will fail on Python 3, so this is a Python 2.x solution.
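For Python 3, the same idea can be sketched as follows. This is an adaptation, not the original poster's code: the lambda indexes the (index, value) pair instead of unpacking it in the signature, and a list comprehension replaces map(), which is lazy in Python 3 and has no len().

```python
from itertools import groupby

l = [1, 2, 3, 5, 8, 10, 12, 13, 14, 15, 17, 19, 21, 23, 24, 25, 26]

counter = 0
# Python 3 has no tuple parameters, so index the (index, value) pair instead.
for _, g in groupby(enumerate(l), key=lambda pair: pair[0] - pair[1]):
    # map() is lazy in Python 3; build a list so len() works.
    seq = [value for _, value in g]
    if len(seq) > 1:
        print(seq)
        counter += 1

print(counter)  # 3
```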
This could be solved with dynamic programming. If you only want to know the number of sequences and don't need the sequences themselves, you can do this with only a couple of variables. As you go through the list, you only need to know whether you are currently in a sequence. If you are not, check whether the next number is larger by exactly 1, which marks the beginning of a sequence; if you are, check whether the next number differs by something other than 1, which marks the exit from the sequence. Also make sure the loop ends one cell before the end of the list: the last cell can't form a sequence by itself, and stopping early avoids an index error in the check. Below is example code:
numbers = [1,2,3,5,8,10,12,13,14,15,17,19,21,23,24,25,26]
count = 0
isSeq = False  # True while we are inside a sequence
for i in range(len(numbers)-1):
    if not isSeq:
        # Not in a sequence: does one start here?
        if numbers[i]+1 == numbers[i+1]:
            isSeq = True
            count = count+1
    elif numbers[i]+1 != numbers[i+1]:
        # In a sequence and the next number breaks it
        isSeq = False
print(count)
Here is a link to a dynamic programming tutorial.
https://www.codechef.com/wiki/tutorial-dynamic-programming

How to apply a dict in python to a string as opposed to a single letter

I am trying to output the alphabetical values of a user-entered string. I have created a dict and this process works, but only with one letter.
If I try entering more than one letter, it raises a KeyError: (string I entered)
If I try creating a list from the string so it becomes ['e', 'x', 'a', 'm', 'p', 'l', 'e'], I get a TypeError: unhashable type: 'list'
I cannot use the chr and ord functions (I know how to, but they aren't applicable in this situation), and I have tried using the map function once I've turned it into a list, but only got strange results.
I've also tried turning the list into a tuple, but that produces the same error.
Here is my code:
import string
step = 1
values = dict()
for index, letter in enumerate(string.ascii_lowercase):
    values[letter] = index + 1
keyw=input("Enter your keyword for encryption")
keylist=list(keyw)
print(values[keylist])
Alt version without the list:
import string
step=1
values=dict()
for index, letter in enumerate(string.ascii_lowercase):
    values[letter] = index + 1
keyw=input("Enter your keyword for encryption")
print(values[keyw])
You need to loop through all the letters and map each one individually:
mapped = [values[letter] for letter in keyw]
print(mapped)
This uses a list comprehension to build the list of integers:
>>> [values[letter] for letter in 'example']
[5, 24, 1, 13, 16, 12, 5]
The map() function would do the same thing, essentially, but returns an iterator; you need to loop over that object to see the results:
>>> for result in map(values.get, 'example'):
...     print(result)
5
24
1
13
16
12
5
Note that you can build your values dictionary in one line; enumerate() takes a second argument, the start value (which defaults to 0); using a dict comprehension to reverse the value-key tuple would give you:
values = {letter: index for index, letter in enumerate(string.ascii_lowercase, 1)}
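Putting the pieces together, a self-contained sketch might look like this. The helper name keyword_values is hypothetical, and skipping characters outside a-z (after lowercasing) is an assumption about the desired behaviour, not something the question specifies.

```python
import string

# Letter -> 1-based position in the alphabet.
values = {letter: index for index, letter in enumerate(string.ascii_lowercase, 1)}

def keyword_values(keyword):
    """Map each letter of keyword to its alphabet position.

    keyword_values is a hypothetical helper; characters outside a-z
    (after lowercasing) are skipped, which is an assumption about the
    desired behaviour.
    """
    return [values[ch] for ch in keyword.lower() if ch in values]

print(keyword_values('example'))    # [5, 24, 1, 13, 16, 12, 5]
print(keyword_values('Key word!'))  # [11, 5, 25, 23, 15, 18, 4]
```

Filtering with "if ch in values" avoids the KeyError that a space or an uppercase letter would otherwise raise.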
You most certainly can use ord():
inp = input('enter stuff:')
# a list of the ord() value of alphabetic character
# made uppercase and subtracted 64 --> position in the alphabet
alpha_value = [ord(n.upper())-64 for n in inp if n.isalpha()]
print(alpha_value)
Test:
import string
print([ord(n.upper())-64 for n in string.ascii_lowercase if n.isalpha()])
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
You can write a simple loop to map each letter to its integer.
Try this:
print([(item, values[item]) for item in keylist])

How to change letters in Python in a different way

I want to learn Python, and I thought I would try changing letters without any module or library. I tried something like this, but it doesn't work:
d=list('banana')
a=list('abcdefghijklmnopqrstuvwxyz')
for i in range:
    d[i]=a[i+2]
print d
I got this error:
TypeError: 'builtin_function_or_method' object is not iterable
I would appreciate your help.
You forgot to specify the parameters for the range function:
d=list('banana')
a=list('abcdefghijklmnopqrstuvwxyz')  # one letter per element
for i in range(len(d)):
    d[i]=a[i+2]
print d
From the Python 2 documentation:
range(start, stop[, step]) This is a versatile function to create
lists containing arithmetic progressions. It is most often used in for
loops. The arguments must be plain integers. If the step argument is
omitted, it defaults to 1. If the start argument is omitted, it
defaults to 0. The full form returns a list of plain integers [start,
start + step, start + 2 * step, ...]. If step is positive, the last
element is the largest start + i * step less than stop; if step is
negative, the last element is the smallest start + i * step greater
than stop. step must not be zero (or else ValueError is raised).
Example:
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> range(1, 11)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> range(0, 30, 5)
[0, 5, 10, 15, 20, 25]
Edit per request:
d = list('banana')
a = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
mappings = dict((ch, a[idx+2]) for idx, ch in enumerate(set(d)))
for idx in range(len(d)):
    d[idx] = mappings[d[idx]]
#OR:
d = [mappings[d[idx]] for idx in range(len(d))]
print d
In [63]: d=list('aabbcc')
In [64]: a='a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z'.split(",")
In [65]: for i,x in enumerate(d):
   ....:     d[i]=a[(a.index(x)+3)%26]
In [66]: d
Out[66]: ['d', 'd', 'e', 'e', 'f', 'f']
string.translate is ideal for this ... I'm not sure if that counts as a library ...
>>> import string
>>> tab = string.maketrans("abcdefghijklmnopqrstuvwxyz","mnopqrstuvwxyzabcdefghijkl")
>>> print "hello".translate(tab)
tqxxa
alternatively
>>> print "".join([chr((ord(c) - ord('a') + 13) % 26 + ord('a')) for c in "hello"])
uryyb
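The snippets above are Python 2. In Python 3, string.maketrans is gone and maketrans is a static method of str, so a rough equivalent could look like the sketch below. The function name shift_letters is made up for illustration, and it only handles lowercase ASCII letters.

```python
import string

def shift_letters(text, shift):
    """Shift lowercase ASCII letters by `shift` places, wrapping at 'z'.

    shift_letters is an illustrative name; other characters pass through.
    """
    alphabet = string.ascii_lowercase
    shift %= len(alphabet)
    rotated = alphabet[shift:] + alphabet[:shift]
    # In Python 3, maketrans is a static method of str.
    table = str.maketrans(alphabet, rotated)
    return text.translate(table)

print(shift_letters('hello', 12))   # tqxxa
print(shift_letters('hello', 13))   # uryyb
print(shift_letters('banana', 2))   # dcpcpc
```

Building the rotated alphabet with slicing keeps the wrap-around logic in one place, which avoids the off-by-one mistakes that are easy to make with ord/chr arithmetic.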
