How to use the strip() function efficiently - python

Can you please tell me why the strip() function does not work?
str1 = 'aaaadfffdswefoijeowji'
def char_freq():
    for x in range(0, len(str1)):
        sub = str1[x]
        print 'the letter', str1[x], 'appearence in the sentence=', str1.count(sub, 0, len(str1))
        str1.strip(str1[x])
def main():
    char_freq()
main()

.strip() is working just fine, but strings are immutable. str.strip() returns the new stripped string:
>>> str1 = 'foofoof'
>>> str1.strip('f')
'oofoo'
>>> str1
'foofoof'
You are ignoring the return value. If you do store the altered string, however, your for loop will run into an IndexError, as the string will be shorter the next iteration:
>>> for x in range(0, len(str1)):
...     str1 = str1.strip(str1[x])
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IndexError: string index out of range
To count characters, don't use str.strip(); it only removes characters from the start and end of a string, not from the middle. You could use str.replace(character, '') instead, although that is inefficient too. Combined with a while loop to avoid the IndexError problem, that would look like:
while str1:
    c = str1[0]
    print 'the letter {} appearence in the sentence={}'.format(c, str1.count(c))
    str1 = str1.replace(c, '')
Much easier would be to just use a collections.Counter() object:
from collections import Counter
freq = Counter(str1)
for character, count in freq.most_common():
    print '{} appears {} times'.format(character, count)
Without a dedicated Counter object, you could use a dictionary to count characters instead:
freq = {}
for c in str1:
    if c not in freq:
        freq[c] = 0
    freq[c] += 1
for character, count in freq.items():
    print '{} appears {} times'.format(character, count)
where freq then holds character counts after the loop.
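For readers on Python 3, the same counting idea can be sketched as below. This is only an illustration: the helper name char_freq is reused from the question, and the f-string formatting assumes Python 3.6 or later.

```python
from collections import Counter

def char_freq(text):
    """Return (character, count) pairs, most frequent first."""
    return Counter(text).most_common()

# The string is never modified; Counter only reads it.
for character, count in char_freq('aaaadfffdswefoijeowji'):
    print(f'{character!r} appears {count} times')
```

Because nothing is stripped or replaced, the immutability of strings never comes into play here.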


How do I find the predominant letters in a list of strings

I want to check for each position in the string what is the character that appears most often on that position. If there are more of the same frequency, keep the first one. All strings in the list are guaranteed to be of identical length!!!
I tried the following way:
print(max(((letter, strings.count(letter)) for letter in strings), key=lambda x:[1])[0])
But I get: mistul or qagic
And I can not figure out what's wrong with my code.
My list of strings looks like this:
Input: strings = ['mistul', 'aidteh', 'mhfjtr', 'zxcjer']
Output: mister
Explanation: In the first position, m appears twice. In the second, i appears twice. In the third, there is no predominant character, so we choose the first one, s. In the fourth position, t and j each appear twice, but t comes first, so we keep it. In the fifth position e appears twice, and in the last position r appears twice.
More examples:
Input: ['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih', 'mbpzu', 'pbghn', 'mzsev', 'saqbl', 'myead']
Output: magic
Input: ['sacbkt', 'tnqaex', 'vhcrhl', 'obotnq', 'vevleg', 'rljnlv', 'jdcjrk', 'zuwtee', 'xycbvm', 'szgczt', 'imhepi', 'febybq', 'pqkdfg', 'swwlds', 'ecmrut', 'buwruy', 'icjwet', 'gebgbq', 'djtfzr', 'uenleo']
Expected Output: secret
Some help?
Finally a use case for zip() :-)
If you like cryptic code, it could even be done in one statement:
def solve(strings):
    return ''.join([max([(letter, letters.count(letter)) for letter in letters], key=lambda x: x[1])[0] for letters in zip(*strings)])
But I prefer a more readable version:
def solve(strings):
    result = ''
    # "zip" the strings, so in the first iteration `letters` would be a tuple
    # containing the first letter of each word, in the second iteration it would
    # be a tuple of all second letters of each word, and so on...
    for letters in zip(*strings):
        # Create a list of (letter, count) pairs:
        letter_counts = [(letter, letters.count(letter)) for letter in letters]
        # Get the first letter with the highest count, and append it to result:
        result += max(letter_counts, key=lambda x: x[1])[0]
    return result
# Test function with input data from question:
assert solve(['mistul', 'aidteh', 'mhfjtr', 'zxcjer']) == 'mister'
assert solve(['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih', 'mbpzu', 'pbghn',
              'mzsev', 'saqbl', 'myead']) == 'magic'
assert solve(['sacbkt', 'tnqaex', 'vhcrhl', 'obotnq', 'vevleg', 'rljnlv',
              'jdcjrk', 'zuwtee', 'xycbvm', 'szgczt', 'imhepi', 'febybq',
              'pqkdfg', 'swwlds', 'ecmrut', 'buwruy', 'icjwet', 'gebgbq',
              'djtfzr', 'uenleo']) == 'secret'
UPDATE
@dun suggested a smarter way of using the max() function, which makes the one-liner actually quite readable :-)
def solve(strings):
    return ''.join([max(letters, key=letters.count) for letters in zip(*strings)])
Using collections.Counter() is a nice strategy here. Here's one way to do it:
from collections import Counter
def most_freq_at_index(strings, idx):
    chars = [s[idx] for s in strings]
    char_counts = Counter(chars)
    return char_counts.most_common(n=1)[0][0]
strings = ['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih',
           'mbpzu', 'pbghn', 'mzsev', 'saqbl', 'myead']
result = ''.join(most_freq_at_index(strings, idx) for idx in range(5))
print(result)
## 'magic'
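The hard-coded range(5) above only suits five-letter words. As a minimal sketch, assuming all strings have the same length (as the question guarantees), zip() and Counter can be combined so the length is never spelled out. The name predominant_letters is made up for this illustration, and the tie-breaking relies on most_common() returning tied elements in first-encountered order, which is documented for Python 3.7 and later.

```python
from collections import Counter

def predominant_letters(strings):
    """Build a word from the most common letter at each position."""
    # zip(*strings) yields one tuple of letters per character position.
    return ''.join(Counter(column).most_common(1)[0][0]
                   for column in zip(*strings))

print(predominant_letters(['mistul', 'aidteh', 'mhfjtr', 'zxcjer']))  # mister
```

On older interpreters, max(column, key=column.count) is the safer choice for the "keep the first one" rule.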
If you want something more manual without the magic of Python libraries you can do something like this:
def f(strings):
    dic = {}
    for string in strings:
        for i in range(len(string)):
            word_dic = dic.get(i, { string[i]: 0 })
            word_dic[string[i]] = word_dic.get(string[i], 0) + 1
            dic[i] = word_dic
    largest_string = max(strings, key = len)
    result = ""
    for i in range(len(largest_string)):
        result += max(dic[i], key = lambda x : dic[i][x])
    return result
strings = ['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih', 'mbpzu', 'pbghn', 'mzsev', 'saqbl', 'myead']
f(strings)
'magic'

Python isspace function

I'm having difficulty with the isspace function. Any idea why my code is wrong and how to fix it?
Here is the problem:
Implement the get_num_of_non_WS_characters() function. get_num_of_non_WS_characters() has a string parameter and returns the number of characters in the string, excluding all whitespace.
Here is my code:
def get_num_of_non_WS_characters(s):
    count = 0
    for char in s:
        if char.isspace():
            count = count + 1
    return count
You want to count non-whitespace characters, so you should use not:
def get_num_of_non_WS_characters(s):
    count = 0
    for char in s:
        if not char.isspace():
            count += 1
    return count
>>> get_num_of_non_WS_characters('hello')
5
>>> get_num_of_non_WS_characters('hello ')
5
For completeness, this could be done more succinctly using a generator expression:
def get_num_of_non_WS_characters(s):
    return sum(1 for char in s if not char.isspace())
A shorter version of @CoryKramer's answer:
def get_num_of_non_WS_characters(s):
    return sum(not c.isspace() for c in s)
As an alternative you could also simply do:
def get_num_of_non_WS_characters(s):
    return len(''.join(s.split()))
Then
s = 'i am a string'
get_num_of_non_WS_characters(s)
will return 10
This will also remove tabs and new line characters:
s = 'i am a string\nwith line break'
''.join(s.split())
will give
'iamastringwithlinebreak'
I would just use n = s.replace(" ", "") and then len(n), although this removes only space characters, not tabs or newlines.
Otherwise I think you should put a continue inside the if statement and increase the count after it.
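The difference between the isspace() approaches and a plain replace(" ", "") shows up as soon as the input contains tabs or newlines. A small sketch, using a made-up helper name count_non_whitespace and a made-up sample string:

```python
def count_non_whitespace(s):
    """Count characters for which str.isspace() is False."""
    return sum(1 for ch in s if not ch.isspace())

sample = 'a b\tc\n'
print(count_non_whitespace(sample))    # 3
print(len(sample.replace(' ', '')))    # 5: the tab and the newline remain
```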

Python replacing string given a word

Hi does anyone know how to make a function that replaces every alphabetic character in a string with a character from a given word (repeated indefinitely). If a character is not alphabetic it should stay where it is. Also this has to be done without importing anything.
def replace_string(string, word):
    '''
    >>> replace_string('my name is', 'abc')
    'ab cabc ab'
    '''
So far I have come up with:
def replace_string(string,word):
    new=''
    for i in string:
        if i.isalpha():
            new=new+word
        else: new=new+i
    print(new)
but this function just prints 'abcabc abcabcabcabc abcabc' instead of 'ab cabc ab'.
Change as follows:
def replace(string, word):
    new, pos = '', 0
    for c in string:
        if c.isalpha():
            new += word[pos%len(word)] # rotate through replacement string
            pos += 1 # increment position in replacement string
        else:
            new += c # keep other characters; pos is not reset, so 'abc' gives 'ab cabc ab'
    return new
>>> replace('my name is greg', 'hi')
'hi hihi hi hihi'
If you can't use the itertools module, first create a generator function that will cycle through your replacement word indefinitely:
def cycle(string):
    while True:
        for c in string:
            yield c
Then, adjust your existing function just a little bit:
def replace_string(string,word):
    new=''
    repl = cycle(word)
    for i in string:
        if i.isalpha():
            new = new + next(repl)
        else:
            new = new+i
    return new
Output:
>>> replace_string("Hello, I'm Greg, are you ok?", "hi")
"hihih, i'h ihih, ihi hih ih?"
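Another way to write this (but I think the first version is more readable and therefore better). Note that the generator must be created once, outside the expression; calling cycle(word) for every character would always yield the first letter:
def replace_string(string,word):
    repl = cycle(word)
    return ''.join(next(repl) if c.isalpha() else c for c in string)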
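If importing turns out to be allowed after all, itertools.cycle does the same job as the hand-written generator. The sketch below is only an illustration of that variant; the name replace_with_cycle is invented here, and it assumes word is not empty.

```python
from itertools import cycle

def replace_with_cycle(string, word):
    """Replace each alphabetic character with the next letter of word."""
    letters = cycle(word)  # one shared iterator for the whole string
    return ''.join(next(letters) if c.isalpha() else c for c in string)

print(replace_with_cycle('my name is', 'abc'))  # ab cabc ab
```

The important detail is that the iterator is created once, so its position carries over from one word to the next.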

I'm trying to replace a character in Python while iterating over a string, but it doesn't work

This is the code I currently have:
letter = raw_input("Replace letter?")
traversed = raw_input("Traverse in?")
replacewith = raw_input("Replace with?")
traverseint = 0
for i in traversed:
    traverseint = traverseint + 1
    if i == letter:
        traversed[traverseint] = replacewith
        print i
print(traversed)
Strings (str) in Python are immutable. That means you cannot modify an existing string object. For example:
>>> 'HEllo'[3] = 'o'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
To replace a character in a string, the ideal way is to use the str.replace() method. For example:
>>> 'HEllo'.replace('l', 'o')
'HEooo'
Without using str.replace(), you can make your program work by building up a temporary string:
my_str = '' # Temporary string
for i in traversed:
    # traverseint = traverseint + 1 # Not required
    if i == letter:
        i = replacewith
    my_str += i
Here my_str will hold the transformed value of traversed. An even better way is to convert the string to a list (as mentioned by @chepner), update the values in the list, and finally join the list to get the string back. For example:
traversed_list = list(traversed)
for i, val in enumerate(traversed_list):
    if val == letter:
        traversed_list[i] = replacewith
        print i
my_str = ''.join(traversed_list)
I cannot comment yet, but I want to add a bit to Moinuddin Quadri's answer.
If the index of the replacement is not required, str.replace() is the best solution.
If the replacement index is required, use str.index() or str.find() to determine it, then use slicing to cut out the two ends and concatenate the replacement between them, or just call str.replace() with a count of 1.
while True:
    index = traversed.find(letter)
    if index < 0:
        break
    print index
    traversed = traversed[:index] + replacewith + traversed[index + len(letter):]
    # or, equivalently:
    # traversed = traversed.replace(letter, replacewith, 1)
    # (note: this loop never ends if replacewith itself contains letter)
A str is immutable, so direct slice assignment is not possible.
If you want to modify a string in place, you should use a mutable type, like bytearray.
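To make the bytearray remark concrete, here is a rough sketch. It assumes the text is pure ASCII and that both letter and replacewith are single characters; the function name replace_in_place is made up for this example.

```python
def replace_in_place(text, letter, replacewith):
    """Replace single ASCII characters by mutating a bytearray copy."""
    buf = bytearray(text, 'ascii')    # mutable sequence of byte values
    old, new = ord(letter), ord(replacewith)
    for i, value in enumerate(buf):
        if value == old:
            buf[i] = new              # item assignment is allowed here
    return buf.decode('ascii')

print(replace_in_place('HEllo', 'l', 'o'))  # HEooo
```

For general Unicode text, converting to a list of characters and joining at the end is usually the simpler route.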
To check whether a string contains a substring, you can use in:
letter in traversed
You shouldn't modify containers you are iterating over, and you can't edit strings by position.
Make a copy of the string first, as a list object:
letter = raw_input("Replace letter?")
traversed = raw_input("Traverse in?")
modify = list(traversed)
replacewith = raw_input("Replace with?")
for traverseint,i in enumerate(modify):
    if i == letter:
        modify[traverseint] = replacewith
        print i
print(''.join(modify))
You can also just create empty string and add letters (python 3.5)
letter = input("Replace letter?")
traversed = input("Traverse in?")
replacewith = input("Replace with?")
temp = ''
for i in traversed:
    if i == letter:
        temp += replacewith
    else:
        temp += i
print(temp)
We can also define our own replace, as below:
def replace(str, idx, char):
    if -1 < idx < len(str):
        return '{str_before_idx}{char}{str_after_idx}'.format(
            str_before_idx=str[0:idx],
            char=char,
            str_after_idx=str[idx+1:len(str)]
        )
    else:
        raise IndexError
Here str is the string to be manipulated, idx is an index, and char is the character to place at index idx.

How to get the position of a character in Python?

How can I get the position of a character inside a string in Python?
There are two string methods for this, find() and index(). The difference between the two is what happens when the search string isn't found. find() returns -1 and index() raises a ValueError.
Using find()
>>> myString = 'Position of a character'
>>> myString.find('s')
2
>>> myString.find('x')
-1
Using index()
>>> myString = 'Position of a character'
>>> myString.index('s')
2
>>> myString.index('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
From the Python manual
string.find(s, sub[, start[, end]])
Return the lowest index in s where the substring sub is found such that sub is wholly contained in s[start:end]. Return -1 on failure. Defaults for start and end and interpretation of negative values is the same as for slices.
And:
string.index(s, sub[, start[, end]])
Like find() but raise ValueError when the substring is not found.
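One practical consequence of the -1 return value is that it is itself a valid index, so an unchecked find() result can silently point at the last character. The sketch below shows one way to guard against that; position_or_none is a hypothetical helper name, not part of the standard library.

```python
def position_or_none(text, sub):
    """Return the first index of sub in text, or None if it is absent."""
    try:
        return text.index(sub)
    except ValueError:
        return None

my_string = 'Position of a character'
print(position_or_none(my_string, 's'))   # 2
print(position_or_none(my_string, 'x'))   # None
print(my_string[my_string.find('x')])     # r, because -1 indexes the last character
```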
Just for the sake of completeness, if you need to find all positions of a character in a string, you can do the following:
s = 'shak#spea#e'
c = '#'
print([pos for pos, char in enumerate(s) if char == c])
which will print: [4, 9]
>>> s="mystring"
>>> s.index("r")
4
>>> s.find("r")
4
"Long winded" way
>>> for i,c in enumerate(s):
...     if "r"==c: print i
...
4
To get a substring:
>>> s="mystring"
>>> s[4:10]
'ring'
Just for completeness: if I want to find the extension in a file name in order to check it, I need to find the last '.'. In this case use rfind:
>>> path = 'toto.titi.tata..xls'
>>> path.find('.')
4
>>> path.rfind('.')
15
In my case, I use the following, which works for any file name that contains at least one dot (without a dot, rfind returns -1 and the last character is cut off):
filename_without_extension = complete_name[:complete_name.rfind('.')]
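As an alternative to slicing at rfind('.'), the standard library's os.path.splitext handles names without any dot. This is a sketch of that swapped-in technique, not the answerer's method; strip_extension is an invented name.

```python
import os.path

def strip_extension(filename):
    """Return filename without its final extension, if it has one."""
    root, _ext = os.path.splitext(filename)
    return root

print(strip_extension('toto.titi.tata..xls'))  # toto.titi.tata.
print(strip_extension('README'))               # README
print('README'[:'README'.rfind('.')])          # READM, the last character is lost
```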
What happens when the string contains a duplicate character?
From my experience with index(), for a duplicate character you get back the same index each time (that of the first occurrence).
For example:
s = 'abccde'
for c in s:
    print('%s, %d' % (c, s.index(c)))
would return:
a, 0
b, 1
c, 2
c, 2
d, 4
e, 5
In that case you can do something like this:
for i, character in enumerate(my_string):
    # i is the position of the character in the string
    print(i, character)
string.find(character)
string.index(character)
Perhaps you'd like to have a look at the documentation to find out what the difference between the two is.
A character might appear multiple times in a string. For example, in the string sentence, the positions of e are 1, 4, and 7 (because indexing starts from zero). However, both find() and index() return only the first position of a character. This can be solved as follows:
def charposition(string, char):
    pos = [] #list to store positions for each 'char' in 'string'
    for n in range(len(string)):
        if string[n] == char:
            pos.append(n)
    return pos
s = "sentence"
print(charposition(s, 'e'))
#Output: [1, 4, 7]
If you want to find the first match.
Python has a built-in string method that does the work: index().
string.index(value, start, end)
Where:
value: (Required) The value to search for.
start: (Optional) Where to start the search. Default is 0.
end: (Optional) Where to end the search. Default is to the end of the string.
def character_index():
    string = "Hello World! This is an example sentence with no meaning."
    match = "i"
    return string.index(match)
print(character_index())
# 15
If you want to find all the matches.
Let's say you need all the indexes where the character match is and not just the first one.
The pythonic way would be to use enumerate().
def character_indexes():
    string = "Hello World! This is an example sentence with no meaning."
    match = "i"
    indexes_of_match = []
    for index, character in enumerate(string):
        if character == match:
            indexes_of_match.append(index)
    return indexes_of_match
print(character_indexes())
# [15, 18, 42, 53]
Or even better with a list comprehension:
def character_indexes_comprehension():
    string = "Hello World! This is an example sentence with no meaning."
    match = "i"
    return [index for index, character in enumerate(string) if character == match]
print(character_indexes_comprehension())
# [15, 18, 42, 53]
more_itertools.locate is a third-party tool that finds all indices of items that satisfy a condition.
Here we find all index locations of the letter "i".
Given
import more_itertools as mit
text = "supercalifragilisticexpialidocious"
search = lambda x: x == "i"
Code
list(mit.locate(text, search))
# [8, 13, 15, 18, 23, 26, 30]
Most methods I found only locate the first occurrence of a substring in a string. To find all occurrences, you need a workaround.
For example:
Define the string
vars = 'iloveyoutosimidaandilikeyou'
Define the substring
key = 'you'
Define a function that can find the location for all the substrings within the string
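def find_all_loc(vars, key):
    pos = []
    start = 0
    end = len(vars)
    while True:
        loc = vars.find(key, start, end)
        if loc == -1: # compare by value; "is -1" relies on an implementation detail
            break
        else:
            pos.append(loc)
            start = loc + len(key) # continue after this match (non-overlapping)
    return pos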
pos = find_all_loc(vars, key)
print(pos)
[5, 24]
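The same non-overlapping search can also be sketched with the standard re module. This is a different technique from the find() loop above, shown only for comparison; find_all_positions is a made-up name, and key is assumed to be non-empty.

```python
import re

def find_all_positions(text, key):
    """Return start indexes of non-overlapping occurrences of key."""
    # re.escape makes key match literally, even if it contains
    # regular-expression metacharacters such as '.' or '*'.
    return [m.start() for m in re.finditer(re.escape(key), text)]

print(find_all_positions('iloveyoutosimidaandilikeyou', 'you'))  # [5, 24]
```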
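A solution with numpy for quick access to all indexes (my_string is the string to search):
import numpy as np
string_array = np.array(list(my_string))
char_indexes = np.where(string_array == 'C') # a tuple; char_indexes[0] is the array of positions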
