Persistent index in python string - python

I'm trying to get string.index() to ignore instances of a character that it has already located within a string. Here is my best attempt:
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def save_alphabet(phrase):
    saved_alphabet = ""
    for item in phrase:
        if item in alphabet:
            saved_alphabet = saved_alphabet + str(phrase.index(item))
    return saved_alphabet
print(save_alphabet("aAaEaaUA"))
The output I'd like is "1367" but, as it only finds the first instance of item it is outputting "1361".
What's the best way to do this? The returned value should be in string format.

>>> from string import ascii_uppercase as alphabet
>>> "".join([str(i) for i, c in enumerate("aAaEaaUA") if c in alphabet])
'1367'
A regex solution also works, although I would not prefer regex in this case:
>>> import re
>>> "".join([str(m.start()) for m in re.finditer(r'[A-Z]', "aAaEaaUA")])
'1367'
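Since the question asks for the result as a string, the enumerate approach can be wrapped in a function. This is a minimal sketch; the name uppercase_positions is made up for illustration, and it assumes that simply concatenating the indices is acceptable even when an index has two digits.

```python
from string import ascii_uppercase

def uppercase_positions(phrase):
    # enumerate yields (index, character) pairs, so a repeated character
    # keeps its own position instead of the first match from str.index
    return "".join(str(i) for i, c in enumerate(phrase) if c in ascii_uppercase)

print(uppercase_positions("aAaEaaUA"))  # 1367
```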

Related

How to join '\' and string to get unicode char

I want to build a dictionary keyed by the characters in a Unicode range, but I am unable to join '\' and 'u0061', for example:
for i in range(97, 123):
    dict[('\\u00' + (hex(i)[2:]))] = ''
'\u00' + '61' #does not work because after '\u' required 4 symbols
r'\u00' + '61' #returns '\\u0061' instead of 'a'
'\\u0061'[1:] # slices both "\\"
To get the corresponding character from an int, use the built-in chr function:
>>> chr(101)
'e'
>>> chr(0x1F600)
'😀'
However you seem to want to iterate over all lowercase letters. The constant ascii_lowercase from the string module is better suited for this purpose.
import string
dct = {} # don't use 'dict' as a variable name, it shadows the dict constructor
for c in string.ascii_lowercase:
    dct[c] = ''
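For comparison, here is a short sketch that builds the same mapping straight from the code points in the question's range, and that also shows one way to turn escape text such as \u0061 into a character if that text really is the input. The variable names are illustrative only.

```python
# Build the mapping from code points with chr instead of string escapes.
dct = {chr(i): '' for i in range(97, 123)}
print(''.join(dct))  # abcdefghijklmnopqrstuvwxyz

# If the escape text itself is the input, it can be decoded explicitly.
escaped = '\\u00' + '61'   # the six characters \u0061
print(escaped.encode('ascii').decode('unicode_escape'))  # a
```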

Python function with two strings - sub-anagram

I want to define a function that takes two strings as arguments and returns true if the first string is a 'sub-anagram' of the second. The function should only return true if every letter in the first string appears at least as many times in the second string.
eg. key is a 'sub-anagram' of keyboard but mouse isn't.
Here's my code so far:
# -*- coding: utf-8 -*-
def anagram(str1,str2):
    # string to list
    str1 = list(str1.lower())
    str2 = list(str2.lower())
    #sort list
    str1.sort()
    str2.sort()
    #join list back to string
    str1 = ''.join(str1)
    str2 = ''.join(str2)
    return str1 == str2
print(anagram('trainers', 'strainer'))
So far it will return true if both strings are exact anagrams and I am not sure how to change it.
Thank you
As @achampion mentioned, Counter is the best way to go about this. To check if string a has all the characters to make string b:
from collections import Counter
def contains_anagram(a, b):
    a = Counter(a)
    b = Counter(b)
    return all(b[letter] <= a[letter] for letter in b)
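A variation on the same idea, sketched with the argument order from the question (first string is the candidate sub-anagram). The name is_sub_anagram is hypothetical, and lowercasing both inputs is an assumption carried over from the asker's code.

```python
from collections import Counter

def is_sub_anagram(small, big):
    # Counter subtraction drops counts that fall to zero or below, so an
    # empty result means `big` covers every letter of `small`.
    return not (Counter(small.lower()) - Counter(big.lower()))

print(is_sub_anagram('key', 'keyboard'))    # True
print(is_sub_anagram('mouse', 'keyboard'))  # False
```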

How do I reverse words in a string with Python

I am trying to reverse words of a string, but having difficulty, any assistance will be appreciated:
S = " what is my name"
def reversStr(S):
    for x in range(len(S)):
        return S[::-1]
        break
What I get now is: eman ym si tahw
However, I am trying to get: tahw is ym eman (individual words reversed)
def reverseStr(s):
    return ' '.join([x[::-1] for x in s.split(' ')])
orig = "what is my name"
reverse = ""
for word in orig.split():
    reverse = "{} {}".format(reverse, word[::-1])
print(reverse)
Since everyone else's covered the case where the punctuation moves, I'll cover the one where you don't want the punctuation to move.
import re
def reverse_words(sentence):
    return re.sub(r'[a-zA-Z]+', lambda x : x.group()[::-1], sentence)
Breaking this down.
re is python's regex module, and re.sub is the function in that module that handles substitutions. It has three required parameters.
The first is the regex you're matching by. In this case, I'm using r'[a-zA-Z]+'. The r denotes a raw string, [a-zA-Z] matches all ASCII letters, and + means "at least one".
The second is either a string to substitute in, or a function that takes in a match object and outputs a string. I'm using a lambda (or nameless) function that simply outputs the matched string, reversed.
The third is the string you want to do a find-and-replace in.
So "What is my name?" -> "tahW si ym eman?"
Addendum:
I considered a regex of r'\w+' initially, because it has better Unicode support (if the right flags are given), but \w also includes digits and underscores. Matching - might also be desired behavior: the regexes would be r'[a-zA-Z-]+' (note the trailing hyphen) and r'[\w-]+', but then you'd probably not want to match double dashes (i.e. --), so more regex modifications might be needed.
The built-in reversed outputs a reversed object, which you have to cast back to string, so I generally prefer the [::-1] option.
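A runnable check of the two points above, written as a small sketch: the letter-only regex leaves punctuation and spacing where they were, and reversed needs a join to get back to a string. The sample sentence with a double space is made up for illustration.

```python
import re

def reverse_words(sentence):
    # Reverse each run of ASCII letters; everything else stays where it is.
    return re.sub(r'[a-zA-Z]+', lambda m: m.group()[::-1], sentence)

print(reverse_words("What  is my name?"))   # tahW  si ym eman?

# reversed() returns an iterator, so it needs a join to become a string again.
word = "name"
print(''.join(reversed(word)) == word[::-1])  # True
```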
In-place refers to modifying the object without creating a copy. As many have already pointed out, Python strings are immutable, so technically we cannot reverse a Python str object in place. However, if you use a mutable datatype, say bytearray, for storing the string characters, you can actually reverse it in place:
#slicing creates copy; implies not-inplace reversing
def rev(x):
    return x[-1::-1]
# inplace reversing, if input is bytearray datatype
def rev_inplace(x: bytearray):
    i = 0; j = len(x)-1
    while i<j:
        t = x[i]
        x[i] = x[j]
        x[j] = t
        i += 1; j -= 1
    return x
Input:
x = bytearray(b'some string to reverse')
rev_inplace(x)
Output:
bytearray(b'esrever ot gnirts emos')
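As a side note, bytearray is a mutable sequence and already provides a reverse() method, so the swap loop does not have to be written by hand. A minimal sketch, assuming ASCII-only text so that reversing bytes is the same as reversing characters:

```python
data = bytearray(b'some string to reverse')
same_object = data           # second reference to the same buffer
data.reverse()               # mutates in place and returns None
print(data.decode('ascii'))  # esrever ot gnirts emos
print(same_object is data)   # True
```

With multi-byte encodings such as UTF-8, reversing the raw bytes would scramble non-ASCII characters, so this only suits the ASCII case.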
Try splitting each word in the string into a list (see: https://docs.python.org/2/library/stdtypes.html#str.split).
Example:
>>> string = "This will be split up"
>>> string_list = string.split(" ")
>>> string_list
['This', 'will', 'be', 'split', 'up']
Then iterate through the list and reverse each constituent list item (i.e. word) which you have working already.
def reverse_in_place(phrase):
    res = []
    phrase = phrase.split(" ")
    for word in phrase:
        word = word[::-1]
        res.append(word)
    res = " ".join(res)
    return res
[thread has been closed, but IMO, not well answered]
Python's str type doesn't include an in-place reverse() method.
So use the built-in reversed() function to accomplish the same thing.
>>> S = " what is my name"
>>> ("").join(reversed(S))
'eman ym si tahw'
There is no obvious way of reversing a string "truly" in-place with Python. However, you can do something like:
def reverse_string_inplace(string):
    w = len(string)-1
    p = w
    while True:
        q = string[p]
        string = ' ' + string + q
        w -= 1
        if w < 0:
            break
    return string[(p+1)*2:]
Hope this makes sense.
In Python, strings are immutable. This means you cannot change the string once you have created it. So in-place reverse is not possible.
There are many ways to reverse the string in python, but memory allocation is required for that reversed string.
print(' '.join(word[::-1] for word in string.split()))
s1 = input("Enter a string with multiple words:")
print(f'Original:{s1}')
print(f'Reverse is:{s1[::-1]}')
each_word_new_list = []
s1_split = s1.split()
for i in range(0,len(s1_split)):
    each_word_new_list.append(s1_split[i][::-1])
print(f'New Reverse as List:{each_word_new_list}')
each_word_new_string=' '.join(each_word_new_list)
print(f'New Reverse as String:{each_word_new_string}')
If the sentence contains multiple spaces then usage of split() function will cause trouble because you won't know then how many spaces you need to rejoin after you reverse each word in the sentence. Below snippet might help:
# Sentence having multiple spaces
given_str = "I know this country runs by mafia "
tmp = ""
tmp_list = []
for i in given_str:
    if i != ' ':
        tmp = tmp + i
    else:
        if tmp == "":
            tmp_list.append(i)
        else:
            tmp_list.append(tmp)
            tmp_list.append(i)
            tmp = ""
print(tmp_list)
rev_list = []
for x in tmp_list:
    rev = x[::-1]
    rev_list.append(rev)
print(rev_list)
print(''.join(rev_list))
output:
def rev(a):
    if a == "":
        return ""
    else:
        z = rev(a[1:]) + a[0]
        return z
Reverse string --> gnirts esreveR
def rev_words(k):
    # uses rev() above; a separate name avoids calling itself forever
    y = rev(k).split()
    for i in range(len(y)-1,-1,-1):
        print(y[i], end=' ')
-->esreveR gnirts

Most Common letter in a string

I'm completing an exercise to find the most common letter in a string, excluding punctuation; the result should be in lowercase. So in the example "HHHHello World!!!!!!!!!!" the result should be "h".
What I have so far is:
text=input('Insert String: ')
def mwl(text):
    import string
    import collections
    for p in text:
        p.lower()
    for l in string.punctuation:
        for x in text:
            if x==l:
                text.replace(x,'')
    collist=collections.Counter(text).most_common(1)
    print(collist[0][0])
mwl(text)
I would appreciate your help to understand why:
The case is not remaining changed to lower in text
The punctuation is not being permanently removed from the text string
There are several issues:
Strings are immutable. This means that functions like lower() and replace() return the results and leave the original string as is. You need to assign that return value somewhere.
lower() can operate on the entire string: text = text.lower().
For some ideas on how to remove punctuation characters from a string, see Best way to strip punctuation from a string in Python
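Putting those points together, a minimal sketch might look like the following. The name most_common_letter is made up here, and filtering with str.isalpha is just one possible way to drop punctuation (it also drops digits and spaces, which is an assumption about what the exercise wants).

```python
from collections import Counter

def most_common_letter(text):
    # str.lower returns a new string; keeping only alphabetic characters
    # drops punctuation, digits and spaces in one pass.
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return Counter(letters).most_common(1)[0][0]

print(most_common_letter("HHHHello World!!!!!!!!!!"))  # h
```

Input without any letters would raise an IndexError in this sketch, so a real version may want to handle that case.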
you can try this:
>>> import re
>>> from collections import Counter
>>> my_string = "HHHHello World!!!!!!!!!!"
>>> Counter("".join(re.findall("[a-z]+",my_string.lower()))).most_common(1)
[('h', 4)]
text = input('Insert String: ')
from string import punctuation
from collections import Counter
def mwl(text):
    st = set(punctuation)
    # remove all punctuation and make every letter lowercase
    filtered = (ch.lower() for ch in text if ch not in st)
    # make counter dict from remaining letters and return the most common
    return Counter(filtered).most_common()[0][0]
Or use str.translate to remove the punctuation:
from string import punctuation
from collections import Counter
def mwl(text):
    # the third argument of str.maketrans lists characters to delete
    text = text.lower().translate(str.maketrans("", "", punctuation))
    return Counter(text).most_common()[0][0]
Using your own code you need to reassign text to the updated string:
def mwl(text):
    import string
    import collections
    text = text.lower()
    for l in string.punctuation:
        for x in text:
            if x == l:
                text = text.replace(x,'')
    collist=collections.Counter(text).most_common(1)
    print(collist[0][0])
Also instead of looping over the text in your code you could just use in:
for l in string.punctuation:
    if l in text:
        text = text.replace(l,'')
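The first big issue is that you never actually assign anything.
p.lower()
just returns a lowercase version of p. It does not set p to the lowercase version, and rebinding p inside the loop would not change text either. Lowercase the whole string and assign the result:
text = text.lower()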
Same with the text.replace(x,''). It should be
text = text.replace(x,'')
You could do:
>>> from collections import Counter
>>> from string import ascii_letters
>>> tgt="HHHHello World!!!!!!!!!!"
>>> Counter(c.lower() for c in tgt if c in ascii_letters).most_common(1)
[('h', 4)]
If input is ascii-only then you could use bytes.translate() to convert it to lowercase and remove punctuation:
#!/usr/bin/env python3
from string import ascii_uppercase, ascii_lowercase, punctuation
table = b''.maketrans(ascii_uppercase.encode(), ascii_lowercase.encode())
def normalize_ascii(text, todelete=punctuation.encode()):
    return text.encode('ascii', 'strict').translate(table, todelete)
s = "HHHHello World!!!!!!!!!!"
count = [0]*256 # number of all possible bytes
for b in normalize_ascii(s): count[b] += 1 # count bytes
# print the most common byte
print(chr(max(range(len(count)), key=count.__getitem__)))
If you want to count letters in a non-ascii Unicode text then you could use .casefold() method (proper caseless comparison) and remove_punctuation() function:
#!/usr/bin/env python3
from collections import Counter
import regex # $ pip install regex
def remove_punctuation(text):
    return regex.sub(r"\p{P}+", "", text)
s = "HHHHello World!!!!!!!!!!"
no_punct = remove_punctuation(s)
characters = (c.casefold() for c in regex.findall(r'\X', no_punct))
print(Counter(characters).most_common(1)[0][0])
r'\X' regex is used to count user-perceived characters instead of mere Unicode codepoints.
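If installing the third-party regex module is not an option, a rough standard-library approximation is sketched below. It assumes that counting individual code points is good enough, so unlike \X it does not group combining sequences into user-perceived characters; the function name is hypothetical.

```python
import unicodedata
from collections import Counter

def most_common_char(text):
    kept = (
        c.casefold()
        for c in text
        # Unicode general categories starting with "P" are punctuation.
        if not unicodedata.category(c).startswith('P') and not c.isspace()
    )
    return Counter(kept).most_common(1)[0][0]

print(most_common_char("HHHHello World!!!!!!!!!!"))  # h
```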

python string manipulation

I have a string s with nested brackets: s = "AX(p>q)&E((-p)Ur)"
I want to remove all characters between all pairs of brackets and store in a new string like this: new_string = AX&E
I tried doing this:
p = re.compile("\(.*?\)", re.DOTALL)
new_string = p.sub("", s)
It gives output: AX&EUr)
Is there any way to correct this, rather than iterating each element in the string?
Another simple option is removing the innermost parentheses at every stage, until there are no more parentheses:
p = re.compile(r"\([^()]*\)")
count = 1
while count:
    s, count = p.subn("", s)
Working example: http://ideone.com/WicDK
You can just use string manipulation without regular expression
>>> s = "AX(p>q)&E(qUr)"
>>> [ i.split("(")[0] for i in s.split(")") ]
['AX', '&E', '']
I leave it to you to join the strings up.
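For completeness, the join step might look like the sketch below (the function name is made up). Note that this split-based approach only suits flat, non-nested parentheses; with the nested string from the question it leaves part of the inner text behind.

```python
def strip_flat_parens(s):
    # Cut at each ")" and keep only the text before the "(" in each piece.
    return "".join(piece.split("(")[0] for piece in s.split(")"))

print(strip_flat_parens("AX(p>q)&E(qUr)"))     # AX&E
print(strip_flat_parens("AX(p>q)&E((-p)Ur)"))  # AX&EUr (nesting is not handled)
```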
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> re.compile("""\([^\)]*\)""").sub('', s)
'AX&E'
Yeah, it should be:
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> p = re.compile("\(.*?\)", re.DOTALL)
>>> new_string = p.sub("", s)
>>> new_string
'AX&E'
Nested brackets (or tags, ...) cannot be handled in a general way using regex. See http://www.amazon.de/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=gateway&qid=1304230523&sr=8-1-spell for details on why. You would need a real parser.
It's possible to construct a regex that handles two levels of nesting, but it is already ugly, and three levels will be quite long. And you don't want to think about four levels. ;-)
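A "real parser" for this particular task can be very small. The sketch below is one possible approach, not taken from any answer here: it tracks nesting depth with a counter and keeps only characters seen at depth zero. The function name and the choice to ignore stray closing brackets are assumptions.

```python
def strip_nested_parens(s):
    depth = 0
    kept = []
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            # Ignore a stray ")" instead of letting depth go negative.
            depth = max(depth - 1, 0)
        elif depth == 0:
            kept.append(ch)
    return ''.join(kept)

print(strip_nested_parens("AX(p>q)&E((-p)Ur)"))  # AX&E
```

It does visit every character, which the question hoped to avoid, but it handles any nesting depth in a single pass.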
You can use PyParsing to parse the string:
from pyparsing import nestedExpr
import sys
s = "AX(p>q)&E((-p)Ur)"
expr = nestedExpr('(', ')')
result = expr.parseString('(' + s + ')').asList()[0]
s = ''.join(filter(lambda x: isinstance(x, str), result))
print(s)
Most code is from: How can a recursive regexp be implemented in python?
You could use re.subn():
import re
s = 'AX(p>q)&E((-p)Ur)'
while True:
    s, n = re.subn(r'\([^)(]*\)', '', s)
    if n == 0:
        break
print(s)
Output
AX&E
this is just how you do it:
# strings
# double and single quotes use in Python
"hey there! welcome to CIP"
'hey there! welcome to CIP'
"you'll understand python"
'i said, "python is awesome!"'
'i can\'t live without python'
# use of 'r' before string
print(r"\new code", "\n")
first = "code in"
last = "python"
first + last #concatenation
# slicing of strings
user = "code in python!"
print(user)
print(user[5]) # print an element
print(user[-3]) # print an element from rear end
print(user[2:6]) # slicing the string
print(user[:6])
print(user[2:])
print(len(user)) # length of the string
print(user.upper()) # convert to uppercase
print(user.lstrip())
print(user.rstrip())
print(max(user)) # max alphabet from user string
print(min(user)) # min alphabet from user string
print(user.join(["1", "2", "3", "4"])) # join needs strings; ints raise TypeError
input()
