Detecting if string iterator is a blank space - python

I'm attempting to write a small block of code that detects the most frequently occurring character. However, I've become stuck on not being able to detect if a value is blank space.
Below is the code I have:
text = "Hello World!"
## Use lower() because case does not matter
setList = list(set(text.lower()))
for s in setList:
    if s.isalpha() and s != " ":
        pass  ## Do Something
    else:
        setList.remove(s)
The problem is that setList ends up with the following values:
[' ', 'e', 'd', 'h', 'l', 'o', 'r', 'w']
I've tried multiple ways of detecting the blank space with no luck, including using strip() on the original string value. isspace() will not work because it looks for at least one character.

The problem is that you are removing items from a list while iterating over it. Never do that. Consider this case:
['!', ' ', 'e', 'd', 'h', 'l', 'o', 'r', 'w']
This is what setList looks like after converting to a set and then to a list. In the first iteration, '!' is seen and removed from setList. Once '!' is removed, every remaining item shifts one position to the left, so the space ' ' now sits at index 0. For the next iteration, the iterator advances to index 1, which now holds 'e', so the space is skipped and never examined. That is why it is still there in the output. You can check this with this program:
num_list = range(10)
for i in num_list:
    print i,
    if i % 2 == 1:
        num_list.remove(i)
    pass
Output
0 1 3 5 7 9
But if you comment out num_list.remove(i), the output will become
0 1 2 3 4 5 6 7 8 9
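A minimal sketch of the usual way around this, written for Python 3: instead of removing items from the list being iterated, build a new list that contains only the items to keep. The helper name keep_letters is made up for illustration and is not part of the original answer.

```python
def keep_letters(chars):
    """Return a new list holding only the alphabetic items of chars."""
    # The comprehension builds a fresh list, so the input is never
    # mutated while it is being iterated.
    return [c for c in chars if c.isalpha()]


text = "Hello World!"
# sorted() is only here to make the order predictable; set order is arbitrary.
unique_chars = sorted(set(text.lower()))
letters = keep_letters(unique_chars)
print(letters)
```

Iterating over a copy (for s in setList[:]) and removing from the original would also work, but filtering into a new list is usually easier to read.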
To solve your actual problem, you can use collections.Counter to find the frequency of characters, like this
from collections import Counter
d = Counter(text.lower())
if " " in d: del d[" "] # Remove the count of space char
print d.most_common()
Output
[('l', 3), ('o', 2), ('!', 1), ('e', 1), ('d', 1), ('h', 1), ('r', 1), ('w', 1)]
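For readers on Python 3, the same idea can be sketched as follows. It skips whitespace while counting instead of deleting the space entry afterwards, and it uses str.isspace(), which does return True for a single space character. The variable names are illustrative only.

```python
from collections import Counter

text = "Hello World!"
# Count every character except whitespace, ignoring case.
counts = Counter(c for c in text.lower() if not c.isspace())
# most_common(1) returns a list with the single highest (char, count) pair.
char, freq = counts.most_common(1)[0]
print(char, freq)
```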

A short way is to first remove the spaces from the text
>>> text = "Hello world!"
>>> text = text.translate(None, " ")
>>> max(text, key=text.count)
'l'
This isn't very efficient though, because count scans the entire string once for each character (O(n^2)).
For longer strings it's better to use collections.Counter or collections.defaultdict to do the counting in a single pass.
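The single-pass counting mentioned above could look roughly like this with collections.defaultdict (Python 3). The function name most_frequent is an assumption made for this sketch, and it raises ValueError if the text contains nothing but whitespace.

```python
from collections import defaultdict


def most_frequent(text):
    """Return the most frequent non-whitespace character of text."""
    counts = defaultdict(int)  # missing keys start at 0
    for c in text:
        if not c.isspace():
            counts[c] += 1  # one pass over the text, O(n)
    # max() over the keys, ranked by their counts
    return max(counts, key=counts.get)


print(most_frequent("Hello world!"))
```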

How about removing the blank spaces before you start with lists and sets:
import re
text = "Hello world!"
text = re.sub(' ', '', text)
# text is now "Helloworld!"

The above answers are legitimate. You could also use the built-in str.count method if you are not concerned with algorithmic complexity. For example:
## Use lower() because case does not matter
text=text.lower()
num=0
most_freq=None
for s in text:
    cur=text.count(s)
    if cur>num:
        most_freq=s
        num=cur

How about using split()? It returns an empty list, which is falsy, when the character is a blank space:
>>> [ x for x in text if x.split()]
['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd', '!']
>>>
To count the duplicates:
>>> d = dict()
>>> for e in [ x for x in text if x.split()]:
...     d[e] = d.get(e,0) + 1
...
>>> print d
{'!': 1, 'e': 1, 'd': 1, 'H': 1, 'l': 3, 'o': 2, 'r': 1, 'W': 1}
>>>

To get the single most frequent, use max:
text = "Hello World!"
count={}
for c in text.lower():
    if c.isspace():
        continue
    count[c]=count.get(c, 0)+1
print count
# {'!': 1, 'e': 1, 'd': 1, 'h': 1, 'l': 3, 'o': 2, 'r': 1, 'w': 1}
print max(count, key=count.get)
# 'l'
If you want the whole shebang:
print sorted(count.items(), key=lambda t: (-t[1], t[0]))
# [('l', 3), ('o', 2), ('!', 1), ('d', 1), ('e', 1), ('h', 1), ('r', 1), ('w', 1)]
If you want to use Counter and use a generator type approach, you could do:
from collections import Counter
from string import ascii_lowercase
print Counter(c.lower() for c in text if c.lower() in ascii_lowercase)
# Counter({'l': 3, 'o': 2, 'e': 1, 'd': 1, 'h': 1, 'r': 1, 'w': 1})

Related

How to count each occurrence in a nested list and print progress into console in python?

I'm writing a program that is supposed to read a .txt file containing coordinates expressed as N, S, W, E. For example, my test file is:
"NNEEENNW
NSWENNNS"
This function reads the data file and converts the data into a list of lists, splitting each line into its characters.
def read_file(file_name):
    with open(file_name, "r") as f:
        lines = [list(str(line.rstrip().upper())) for line in f]
    return lines
So for my file above it gives me
[['N', 'N', 'E', 'E', 'E', 'N', 'N', 'W'], ['N', 'S', 'W', 'E', 'N', 'N', 'N','S']]
In a Cartesian map at (y, x) starting at (0, 0) each movement to N and E adds 1, and each movement to W and S subtracts 1.
I have this function that gives me the final coordinates:
def convert_coodinates(coordinates):
    s = Counter({'N': 0, "S":0, "W": 0, 'E':0})
    for i in coordinates:
        s.update(i)
    final_latitude = s['N'] - s["S"]
    final_longitude = s['E'] - s["W"]
    final = (final_longitude, final_latitude)
    print(final)
For the example above it would give me:
(2, 6)
Ideally I would print
(2, 4)
(2, 6)
Basically, it is supposed to print the position after the first set of coordinates, then after the first plus the second set, and so on until the last line, which would be the final destination.
Any ideas?
The problem is that you update the Counter with every row inside the loop, but only calculate the latitude and the longitude once, after the loop has finished.
As a result, s only holds the totals accumulated over all the list elements.
Dumping s confirms this: it has the value Counter({'N': 8, 'E': 4, 'S': 2, 'W': 2}).
What you need is to calculate the values in each iteration:
from collections import Counter
def convert_coodinates(coordinates):
    s = Counter({'N': 0, "S":0, "W": 0, 'E':0})
    for idx, i in enumerate(coordinates):
        s.update(i)
        final_latitude = s['N'] - s["S"]
        final_longitude = s['E'] - s["W"]
        final = (final_longitude, final_latitude)
        print(final)
coord = [['N', 'N', 'E', 'E', 'E', 'N', 'N', 'W'], ['N', 'S', 'W', 'E', 'N', 'N', 'N','S']]
convert_coodinates(coord)
Output:
(2, 4)
(2, 6)
The simple method is to put the calculation and print code inside your for loop:
from collections import Counter
def convert_coodinates(coordinates):
    s = Counter({'N': 0, "S":0, "W": 0, 'E':0})
    longitude,latitude = 0,0
    for i in coordinates:
        s.update(i)
        longitude = s['E'] - s["W"]
        latitude = s['N'] - s["S"]
        final = (longitude, latitude)
        print(final)
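If the intermediate positions are needed as values rather than printed, one possible variation is a generator that yields the position after each row. This is only a sketch: the name running_positions is invented here, and it returns (x, y) tuples in the same (longitude, latitude) order as the answers above.

```python
from collections import Counter


def running_positions(rows):
    """Yield the (x, y) position reached after each row of moves."""
    totals = Counter()
    for row in rows:
        totals.update(row)  # counts each character of the row
        # Directions that have not occurred yet count as 0 in a Counter.
        yield (totals['E'] - totals['W'], totals['N'] - totals['S'])


rows = ["NNEEENNW", "NSWENNNS"]
positions = list(running_positions(rows))
for position in positions:
    print(position)
```

The rows can be strings or lists of single characters, since Counter.update accepts any iterable.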

Creating a word-translator application in Python 3. Any "Best practices"?

I am trying to write code that translates English into the Chef's language (from The Muppet Show).
It should change these letters/sounds into the other language:
Input Output
tion shun
an un
th z
v f
w v
c k
o oo
i ee
I have to write a function (def) that converts it so that it will print the new language
print(eng2chef('this is a chicken'))
should return
zees ees a kheekken
My code so far is:
help_dict = {
    'tion': 'shun',
    'an': 'un',
    'th': 'z',
    'v': 'f',
    'w': 'v',
    'c': 'k',
    'o': 'oo',
    'i': 'ee',
}
def eng2chef(s):
    s.split()
    for i in s:
        if i in help_dict:
            i = help_dict[i]
            print(i)
eng2chef('this is a chicken')
but this only changes certain letters and then prints those letters:
ee
ee
k
ee
k
Can someone please Help!!
You can do this by iteratively replacing substrings:
help_dict = [
    ('tion', 'shun'),
    ('an', 'un'),
    ('th', 'z'),
    ('v', 'f'),
    ('w', 'v'),
    ('c', 'k'),
    ('o', 'oo'),
    ('i', 'ee')
]
def eng2chef(s):
    for x, y in help_dict:
        s = s.replace(x,y)
    return s
But note that I changed your dict into a list so that the replacement order is enforced. This is critical because you do not want to replace 'w' with 'v' and then replace that 'v' with 'f'; the 'v' to 'f' replacement has to come first.
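A different technique that sidesteps the ordering problem is to do all replacements in one pass with a regular expression, so that the output of one rule is never fed into another. This is a sketch of that alternative, not the approach from the answer above; RULES, PATTERN and eng2chef_single_pass are names chosen for this example.

```python
import re

RULES = {'tion': 'shun', 'an': 'un', 'th': 'z', 'v': 'f',
         'w': 'v', 'c': 'k', 'o': 'oo', 'i': 'ee'}

# Longer keys go first, so a longer key is preferred whenever one key
# is a prefix of another. re.escape keeps the keys literal.
PATTERN = re.compile('|'.join(
    re.escape(key) for key in sorted(RULES, key=len, reverse=True)))


def eng2chef_single_pass(s):
    """Replace every match of a rule key exactly once, left to right."""
    return PATTERN.sub(lambda match: RULES[match.group(0)], s)


print(eng2chef_single_pass('this is a chicken'))
```

Because each part of the input is matched at most once, the rules could also be kept in any order in the dictionary.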
Your code iterates over the string one character at a time, so it can only match and change single letters.
help_dict = {
    'tion': 'shun',
    'an': 'un',
    'th': 'z',
    'v': 'f',
    'w': 'v',
    'c': 'k',
    'o': 'oo',
    'i': 'ee',
}
def eng2chef(word):
    new_word = word
    for key in help_dict:
        # replace on the running result, not on the original word
        new_word = new_word.replace(key, help_dict[key])
    print(new_word)
eng2chef('this is a chicken')
Output: zees ees a kheekken
This replaces every substring of the input that is found in the dictionary with the corresponding value. It relies on the dict keeping its insertion order, which Python guarantees from version 3.7 on.
I hope this helps.

Getting ord() to read from a file

My code counts the number of times each letter appears and reports the count next to the respective letter. So if A appears two times, it will show 2:A. My problem is that I want it to read from a file, and when ord() tries to process what it reads, it can't. I don't know how to work around this.
t=open('lettersTEst.txt','r')
tList=[0]*26
aL=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
idx=0
for char in t:
    ch=ord(char)
    if ch >=65 and ch <= 90:
        pos=int(ch)-65
        tList[pos]+=1
for ele in tList:
    print(idx, ": ", tList[ch])
    idx+=1
When you iterate over a file you get lines, and ord() only accepts a single character. If you want characters you need to iterate over each line as well.
for line in t:
    for char in line:
        ch = ord(char)
        ...
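Put together, the nested loop might be used as in the following Python 3 sketch. It keeps the original idea of a 26-slot list indexed with ord(), counts uppercase A to Z only, and uses io.StringIO as a stand-in for the opened file so the example runs on its own. The function name count_uppercase and the sample text are assumptions for illustration.

```python
import io


def count_uppercase(lines):
    """Count the letters A-Z in an iterable of lines (e.g. an open file)."""
    counts = [0] * 26
    for line in lines:        # iterating a file yields whole lines
        for char in line:     # iterating a line yields single characters
            if 'A' <= char <= 'Z':
                counts[ord(char) - ord('A')] += 1
    return counts


sample = io.StringIO("ABBA\nCab\n")  # stand-in for open('lettersTEst.txt')
counts = count_uppercase(sample)
for idx, n in enumerate(counts):
    if n:
        print(chr(ord('A') + idx), ":", n)
```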
You need to loop over the individual characters of each line of the file, and you could use a Counter instead of a list.
And if you want uppercase characters only, then add an if char.isupper() check before you add to the Counter.
Example
>>> from collections import Counter
>>> c = Counter()
>>> with open('lettersTEst.txt') as f:
...     for line in f:
...         for char in line:
...             c[char] += 1
...
>>> for k,v in c.items():
...     print('{}:{}'.format(k,v))
...
a:2
:4
e:1
g:1
i:3
h:1
m:1
l:1
n:1
p:1
s:4
r:1
t:2
While I prefer @JohnKugelman's answer over my own, I'd like to show two alternative methods of iterating over every character of a file in a single for loop.
The first uses the second form of iter, which takes a callable (read one character) and a sentinel (keep calling the function until it returns this value). In this case I'd use functools.partial to make the function that reads one character:
import functools
read_a_byte = functools.partial(t.read, 1)
for char in iter(read_a_byte,''):
    ch = ord(char)
    ...
The second is frequently used to flatten two-dimensional lists: itertools.chain.from_iterable takes something that is iterated over (the file) and chains each generated value (each line) together into a single iteration.
import itertools
char_iterator = itertools.chain.from_iterable(t)
for char in char_iterator:
    ch = ord(char)
    ...
Then you could pass either to collections.Counter to construct a basic counter, but it wouldn't follow the same logic you have applied with ord:
import collections, functools, pprint
read_a_byte = functools.partial(t.read, 1)
c = collections.Counter(iter(read_a_byte,''))
>>> pprint.pprint(dict(c))
{'a': 8,
'b': 2,
'c': 9,
'd': 4,
'e': 11,
...}
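As a self-contained illustration of the itertools.chain.from_iterable variant combined with a Counter, here is a short Python 3 sketch. io.StringIO stands in for the open file, and the filter on isalpha() is an added assumption so that newlines and other non-letters are not counted.

```python
import io
from collections import Counter
from itertools import chain

sample = io.StringIO("abc\nab\n")  # stand-in for an open text file
# chain.from_iterable flattens the lines into one stream of characters.
letter_counts = Counter(
    ch for ch in chain.from_iterable(sample) if ch.isalpha())
for letter, n in sorted(letter_counts.items()):
    print('{}:{}'.format(letter, n))
```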

Counting consecutive characters in a string

I need to write code that slices the input string into letters, appends them to a list, and counts each letter. If a letter is identical to the one before it, it should not be added to the list again; instead, the count of the previous entry should be increased.
This is how it should look:
assassin [['a', 1], ['s', 2], ['a', 1], ['s', 2], ['i', 1], ['n', 1]]
The word assassin is just an example.
My code so far goes like this:
userin = raw_input("Please enter a string :")
inputlist = []
inputlist.append(userin)
biglist = []
i=0
count = {}
while i<(len(userin)):
    slicer = inputlist[0][i]
    for s in userin:
        if count.has_key(s):
            count[s] += 1
        else:
            count[s] = 1
        biglist.append([slicer,s])
    i = i+1
print biglist
Thanks!
Use collections.Counter(); a dictionary is a better way to store this:
>>> from collections import Counter
>>> strs="assassin"
>>> Counter(strs)
Counter({'s': 4, 'a': 2, 'i': 1, 'n': 1})
or, to count consecutive runs, use itertools.groupby():
>>> from itertools import groupby
>>> [[k, len(list(g))] for k, g in groupby(strs)]
[['a', 1], ['s', 2], ['a', 1], ['s', 2], ['i', 1], ['n', 1]]
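The groupby approach can be wrapped in a small function, sketched here for Python 3. The name run_lengths is made up for this example; unlike indexing s[0], it also copes with an empty string.

```python
from itertools import groupby


def run_lengths(s):
    """Return [char, count] pairs for each run of equal consecutive chars."""
    # groupby yields one (key, group) pair per run of equal items;
    # summing 1 per item counts the run without building a list.
    return [[key, sum(1 for _ in group)] for key, group in groupby(s)]


print(run_lengths('assassin'))
```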
last = ''
results = []
word = 'assassin'
for letter in word:
    if letter == last:
        results[-1] = (letter, results[-1][1] +1)
    else:
        results.append((letter, 1))
    last = letter
print results # [('a', 1), ('s', 2), ('a', 1), ('s', 2), ('i', 1), ('n', 1)]
Using only builtins:
def cnt(s):
    current = [s[0],1]
    out = [current]
    for c in s[1:]:
        if c == current[0]:
            current[1] += 1
        else:
            current = [c, 1]
            out.append(current)
    return out
print cnt('assassin')

How do I replace characters in a string in Python?

I'm trying to find the best way to do the following:
I have a string, let's say:
str = "pkm adp"
and I have a code in a dictionary that maps each character to its replacement, such as this one:
code = {'a': 'c', 'd': 'a', 'p': 'r', 'k': 'e', 'm': 'd'}
('a' should be replaced by 'c', 'd' by 'a' ...)
How can I convert the first string using the required characters from the dictionary to get the new string? Here for example I should get "red car" as the new string.
Try this:
>>> import string
>>> code = {'a': 'c', 'd': 'a', 'p': 'r', 'k': 'e', 'm': 'd'}
>>> trans = string.maketrans(*["".join(x) for x in zip(*code.items())])
>>> str = "pkm adp"
>>> str.translate(trans)
'red car'
Explanation:
>>> help(str.translate)
Help on built-in function translate:
translate(...)
    S.translate(table [,deletechars]) -> string
    Return a copy of the string S, where all characters occurring
    in the optional argument deletechars are removed, and the
    remaining characters have been mapped through the given
    translation table, which must be a string of length 256.
>>> help(string.maketrans)
Help on built-in function maketrans in module strop:
maketrans(...)
    maketrans(frm, to) -> string
    Return a translation table (a string of 256 bytes long)
    suitable for use in string.translate. The strings frm and to
    must be of the same length.
The maketrans line turns the dictionary into two separate strings suitable for input into maketrans:
>>> code = {'a': 'c', 'd': 'a', 'p': 'r', 'k': 'e', 'm': 'd'}
>>> code.items()
[('a', 'c'), ('p', 'r'), ('k', 'e'), ('m', 'd'), ('d', 'a')]
>>> zip(*code.items())
[('a', 'p', 'k', 'm', 'd'), ('c', 'r', 'e', 'd', 'a')]
>>> ["".join(x) for x in zip(*code.items())]
['apkmd', 'creda']
"".join(code.get(k, k) for k in str)
would also work in your case.
code.get(k, k) returns code[k] if k is a valid key in code; if it isn't, it returns k itself.
>>> s = "pkm adp"
>>> code = {'a': 'c', 'd': 'a', 'p': 'r', 'k': 'e', 'm': 'd'}
>>> from string import maketrans
>>> s.translate(maketrans(''.join(code.keys()), ''.join(code.values())))
'red car'
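A quick, though tedious, fix is to chain str.replace("old", "new") calls. Note that the order of the calls matters here, because the output of one replacement (for example 'd' becoming 'a') can be matched again by a later one ('a' becoming 'c'). Here is some documentation that may help too: http://www.tutorialspoint.com/python/string_replace.htm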
Assuming you are using Python 2.x:
>>> from string import translate, maketrans
>>> data = "pkm adp"
>>> code = {'a': 'c', 'd': 'a', 'p': 'r', 'k': 'e', 'm': 'd'}
>>> table = maketrans(''.join(code.keys()), ''.join(code.values()))
>>> translate(data, table)
'red car'
>>> print ''.join([code.get(s,s) for s in str])
red car
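The string.maketrans examples above are Python 2 only. In Python 3 the equivalent is the static method str.maketrans, which accepts a dictionary of single-character keys directly. A minimal sketch, with variable names chosen for illustration:

```python
code = {'a': 'c', 'd': 'a', 'p': 'r', 'k': 'e', 'm': 'd'}

# str.maketrans turns the dict into a table keyed by code points.
table = str.maketrans(code)
# translate() maps every character in one pass; characters without an
# entry (such as the space) are left unchanged.
result = "pkm adp".translate(table)
print(result)
```

Because every character is mapped in a single pass, the 'd' to 'a' and 'a' to 'c' rules cannot interfere with each other.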
