String editing in python to find anagrams [duplicate] - python

This question already has an answer here:
Python Anagram Finder from given File
(1 answer)
Closed 9 years ago.
Given the string...
able\nacre\nbale\nbeyond\nbinary\nboat\nbrainy\ncare\ncat\ncater\ncrate\nlawn\nlist\nrace\nreact\nsheet\nsilt\nslit\ntrace\n
I am trying to figure out how to assign each word in the string to a variable, and then sort each word alphabetically which will allow me to compare them to see which ones are anagrams and which ones are not. I have around a month of Python experience so dumb everything WAY down if you could.

Instead of saving each word to a variable, you should save them all to a list. Here is how I would approach the complete problem:
from itertools import groupby
from operator import itemgetter
s = 'able\nacre\nbale\nbeyond\nbinary\nboat\nbrainy\ncare\ncat\ncater\ncrate\nlawn\nlist\nrace\nreact\nsheet\nsilt\nslit\ntrace\n'
words = s.strip().split()
sorted_words = (''.join(sorted(line)) for line in words)
grouped = sorted((v, i) for i, v in enumerate(sorted_words))
anagrams = [[words[i] for v, i in g] for k, g in groupby(grouped, itemgetter(0))]
Result:
>>> import pprint
>>> pprint.pprint(anagrams)
[['able', 'bale'],
['binary', 'brainy'],
['boat'],
['acre', 'care', 'race'],
['cater', 'crate', 'react', 'trace'],
['cat'],
['lawn'],
['beyond'],
['sheet'],
['list', 'silt', 'slit']]

In [27]: s = 'able\nacre\nbale\nbeyond\nbinary\nboat\nbrainy\ncare\ncat\ncater\ncrate\nlawn\nlist\nrace\nreact\nsheet\nsilt\nslit\ntrace\n'
In [28]: words = s.split()
In [29]: [''.join(sorted(w)) for w in words]
Out[29]:
['abel',
'acer',
'abel',
'bdenoy',
'abinry',
'abot',
'abinry',
...

You can do yourstring.split('whattosplitat'). In this case, that would be
l='able\nacre\nbale\nbeyond\nbinary\nboat\nbrainy\ncare\ncat\ncater\ncrate\nlawn\nlist\nrace\nreact\nsheet\nsilt\nslit\ntrace\n'.split('\n')
Then you can do l.sort() which will sort your list alphabetically.

s = 'able\nacre\nbale\nbeyond\nbinary\nboat\nbrainy\ncare\ncat\ncater\ncrate\nlawn\nlist\nrace\nreact\nsheet\nsilt\nslit\ntrace\n'
words = sorted(s.split('\n')[:-1]) # the last one will be '', so you want to get rid of that
To test whether or not a string is an anagram of another string:
def isAnagram(a, b):
aLtrs = sorted(list(a)) # if a='test', aLtrs=['e', 's', 't', 't']
bLtrs = sorted(list(a)) # same as above
return True if aLtrs==bLtrs else False

Related

Counting by partial string in Python

So, I have a list of strings (upper case letters).
list = ['DOG01', 'CAT02', 'HORSE04', 'DOG02', 'HORSE01', 'CAT01', 'CAT03', 'HORSE03', 'HORSE02']
How can I group and count occurrence in the list?
Expected output:
You may try using the Counter library here:
from collections import Counter
import re
list = ['DOG01', 'CAT02', 'HORSE04', 'DOG02', 'HORSE01', 'CAT01', 'CAT03', 'HORSE03', 'HORSE02']
list = [re.sub(r'\d+$', '', x) for x in list]
print(Counter(list))
This prints:
Counter({'HORSE': 4, 'CAT': 3, 'DOG': 2})
Note that the above approach simply strips off the number endings of each list element, then does an aggregation on the alpha names only.
you can also use dictionary
list= ['DOG01', 'CAT02', 'HORSE04', 'DOG02', 'HORSE01',
'CAT01', 'CAT03', 'HORSE03', 'HORSE02']
dic={}
for i in list:
i=i[:-2]
if i in dic:
dic[i]=dic[i]+1
else:
dic[i]=1
print(dic)

how can I found the most repeated word and how much repeated it [duplicate]

I am using Python 3.3
I need to create two lists, one for the unique words and the other for the frequencies of the word.
I have to sort the unique word list based on the frequencies list so that the word with the highest frequency is first in the list.
I have the design in text but am uncertain how to implement it in Python.
The methods I have found so far use either Counter or dictionaries which we have not learned. I have already created the list from the file containing all the words but do not know how to find the frequency of each word in the list. I know I will need a loop to do this but cannot figure it out.
Here's the basic design:
original list = ["the", "car",....]
newlst = []
frequency = []
for word in the original list
if word not in newlst:
newlst.append(word)
set frequency = 1
else
increase the frequency
sort newlst based on frequency list
use this
from collections import Counter
list1=['apple','egg','apple','banana','egg','apple']
counts = Counter(list1)
print(counts)
# Counter({'apple': 3, 'egg': 2, 'banana': 1})
You can use
from collections import Counter
It supports Python 2.7,read more information here
1.
>>>c = Counter('abracadabra')
>>>c.most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
use dict
>>>d={1:'one', 2:'one', 3:'two'}
>>>c = Counter(d.values())
[('one', 2), ('two', 1)]
But, You have to read the file first, and converted to dict.
2.
it's the python docs example,use re and Counter
# Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
words = file("test.txt", "r").read().split() #read the words into a list.
uniqWords = sorted(set(words)) #remove duplicate words and sort
for word in uniqWords:
print words.count(word), word
Pandas answer:
import pandas as pd
original_list = ["the", "car", "is", "red", "red", "red", "yes", "it", "is", "is", "is"]
pd.Series(original_list).value_counts()
If you wanted it in ascending order instead, it is as simple as:
pd.Series(original_list).value_counts().sort_values(ascending=True)
Yet another solution with another algorithm without using collections:
def countWords(A):
dic={}
for x in A:
if not x in dic: #Python 2.7: if not dic.has_key(x):
dic[x] = A.count(x)
return dic
dic = countWords(['apple','egg','apple','banana','egg','apple'])
sorted_items=sorted(dic.items()) # if you want it sorted
One way would be to make a list of lists, with each sub-list in the new list containing a word and a count:
list1 = [] #this is your original list of words
list2 = [] #this is a new list
for word in list1:
if word in list2:
list2.index(word)[1] += 1
else:
list2.append([word,0])
Or, more efficiently:
for word in list1:
try:
list2.index(word)[1] += 1
except:
list2.append([word,0])
This would be less efficient than using a dictionary, but it uses more basic concepts.
You can use reduce() - A functional way.
words = "apple banana apple strawberry banana lemon"
reduce( lambda d, c: d.update([(c, d.get(c,0)+1)]) or d, words.split(), {})
returns:
{'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}
Using Counter would be the best way, but if you don't want to do that, you can implement it yourself this way.
# The list you already have
word_list = ['words', ..., 'other', 'words']
# Get a set of unique words from the list
word_set = set(word_list)
# create your frequency dictionary
freq = {}
# iterate through them, once per unique word.
for word in word_set:
freq[word] = word_list.count(word) / float(len(word_list))
freq will end up with the frequency of each word in the list you already have.
You need float in there to convert one of the integers to a float, so the resulting value will be a float.
Edit:
If you can't use a dict or set, here is another less efficient way:
# The list you already have
word_list = ['words', ..., 'other', 'words']
unique_words = []
for word in word_list:
if word not in unique_words:
unique_words += [word]
word_frequencies = []
for word in unique_words:
word_frequencies += [float(word_list.count(word)) / len(word_list)]
for i in range(len(unique_words)):
print(unique_words[i] + ": " + word_frequencies[i])
The indicies of unique_words and word_frequencies will match.
The ideal way is to use a dictionary that maps a word to it's count. But if you can't use that, you might want to use 2 lists - 1 storing the words, and the other one storing counts of words. Note that order of words and counts matters here. Implementing this would be hard and not very efficient.
Try this:
words = []
freqs = []
for line in sorted(original list): #takes all the lines in a text and sorts them
line = line.rstrip() #strips them of their spaces
if line not in words: #checks to see if line is in words
words.append(line) #if not it adds it to the end words
freqs.append(1) #and adds 1 to the end of freqs
else:
index = words.index(line) #if it is it will find where in words
freqs[index] += 1 #and use the to change add 1 to the matching index in freqs
Here is code support your question
is_char() check for validate string count those strings alone, Hashmap is dictionary in python
def is_word(word):
cnt =0
for c in word:
if 'a' <= c <='z' or 'A' <= c <= 'Z' or '0' <= c <= '9' or c == '$':
cnt +=1
if cnt==len(word):
return True
return False
def words_freq(s):
d={}
for i in s.split():
if is_word(i):
if i in d:
d[i] +=1
else:
d[i] = 1
return d
print(words_freq('the the sky$ is blue not green'))
for word in original_list:
words_dict[word] = words_dict.get(word,0) + 1
sorted_dt = {key: value for key, value in sorted(words_dict.items(), key=lambda item: item[1], reverse=True)}
keys = list(sorted_dt.keys())
values = list(sorted_dt.values())
print(keys)
print(values)
Simple way
d = {}
l = ['Hi','Hello','Hey','Hello']
for a in l:
d[a] = l.count(a)
print(d)
Output : {'Hi': 1, 'Hello': 2, 'Hey': 1}
word and frequency if you need
def counter_(input_list_):
lu = []
for v in input_list_:
ele = (v, lc.count(v)/len(lc)) #if you don't % remove <</len(lc)>>
if ele not in lu:
lu.append(ele)
return lu
counter_(['a', 'n', 'f', 'a'])
output:
[('a', 0.5), ('n', 0.25), ('f', 0.25)]
the best thing to do is :
def wordListToFreqDict(wordlist):
wordfreq = [wordlist.count(p) for p in wordlist]
return dict(zip(wordlist, wordfreq))
then try to :
wordListToFreqDict(originallist)

Detect substrings using regex in python [duplicate]

This question already has answers here:
Identify strings while removing substrings in python
(3 answers)
Closed 5 years ago.
I have a dictionary as follows.
myfood = {'yummy tim tam': 1, 'tasty chips': 3, 'yummy': 10, 'a loaf of bread': 5}
I also have a set as follows.
myset = {'yummy', 'a', 'tasty', 'of', 'delicious', 'yum'}
Now I want to identify the elements of myset in substrings of myfood and remove them. Hence, my final myfood dictionary should look as follows.
myfood = {'tim tam': 1, 'chips': 3, 'yummy': 10, 'loaf bread':5}
NOTE: I do not want to remove myset elements if they are full strings. E.g., 'yummy': 10 in myfood is not removed as it is not a substring, but a full string.
My current code is as follows.
for word in myfood.keys():
if word in myset:
#Do nothing
else:
######Find the substring part and remove it
Please help me.
Use re.sub to replace only keys that are substrings:
pat = re.compile(r'|'.join([r'(\s|\b){}\b'.format(x) for x in myset]))
dct = {}
for k, v in myfood.items():
if k not in myset: # exclude full strings
k = pat.sub('', k).strip()
dct[k] = v
print(dct)
# {'yummy': 10, 'loaf bread': 5, 'tim tam': 1, 'chips': 3}

put for loop in dict comprehension [duplicate]

I am using Python 3.3
I need to create two lists, one for the unique words and the other for the frequencies of the word.
I have to sort the unique word list based on the frequencies list so that the word with the highest frequency is first in the list.
I have the design in text but am uncertain how to implement it in Python.
The methods I have found so far use either Counter or dictionaries which we have not learned. I have already created the list from the file containing all the words but do not know how to find the frequency of each word in the list. I know I will need a loop to do this but cannot figure it out.
Here's the basic design:
original list = ["the", "car",....]
newlst = []
frequency = []
for word in the original list
if word not in newlst:
newlst.append(word)
set frequency = 1
else
increase the frequency
sort newlst based on frequency list
use this
from collections import Counter
list1=['apple','egg','apple','banana','egg','apple']
counts = Counter(list1)
print(counts)
# Counter({'apple': 3, 'egg': 2, 'banana': 1})
You can use
from collections import Counter
It supports Python 2.7,read more information here
1.
>>>c = Counter('abracadabra')
>>>c.most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
use dict
>>>d={1:'one', 2:'one', 3:'two'}
>>>c = Counter(d.values())
[('one', 2), ('two', 1)]
But, You have to read the file first, and converted to dict.
2.
it's the python docs example,use re and Counter
# Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
words = file("test.txt", "r").read().split() #read the words into a list.
uniqWords = sorted(set(words)) #remove duplicate words and sort
for word in uniqWords:
print words.count(word), word
Pandas answer:
import pandas as pd
original_list = ["the", "car", "is", "red", "red", "red", "yes", "it", "is", "is", "is"]
pd.Series(original_list).value_counts()
If you wanted it in ascending order instead, it is as simple as:
pd.Series(original_list).value_counts().sort_values(ascending=True)
Yet another solution with another algorithm without using collections:
def countWords(A):
dic={}
for x in A:
if not x in dic: #Python 2.7: if not dic.has_key(x):
dic[x] = A.count(x)
return dic
dic = countWords(['apple','egg','apple','banana','egg','apple'])
sorted_items=sorted(dic.items()) # if you want it sorted
One way would be to make a list of lists, with each sub-list in the new list containing a word and a count:
list1 = [] #this is your original list of words
list2 = [] #this is a new list
for word in list1:
if word in list2:
list2.index(word)[1] += 1
else:
list2.append([word,0])
Or, more efficiently:
for word in list1:
try:
list2.index(word)[1] += 1
except:
list2.append([word,0])
This would be less efficient than using a dictionary, but it uses more basic concepts.
You can use reduce() - A functional way.
words = "apple banana apple strawberry banana lemon"
reduce( lambda d, c: d.update([(c, d.get(c,0)+1)]) or d, words.split(), {})
returns:
{'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}
Using Counter would be the best way, but if you don't want to do that, you can implement it yourself this way.
# The list you already have
word_list = ['words', ..., 'other', 'words']
# Get a set of unique words from the list
word_set = set(word_list)
# create your frequency dictionary
freq = {}
# iterate through them, once per unique word.
for word in word_set:
freq[word] = word_list.count(word) / float(len(word_list))
freq will end up with the frequency of each word in the list you already have.
You need float in there to convert one of the integers to a float, so the resulting value will be a float.
Edit:
If you can't use a dict or set, here is another less efficient way:
# The list you already have
word_list = ['words', ..., 'other', 'words']
unique_words = []
for word in word_list:
if word not in unique_words:
unique_words += [word]
word_frequencies = []
for word in unique_words:
word_frequencies += [float(word_list.count(word)) / len(word_list)]
for i in range(len(unique_words)):
print(unique_words[i] + ": " + word_frequencies[i])
The indicies of unique_words and word_frequencies will match.
The ideal way is to use a dictionary that maps a word to it's count. But if you can't use that, you might want to use 2 lists - 1 storing the words, and the other one storing counts of words. Note that order of words and counts matters here. Implementing this would be hard and not very efficient.
Try this:
words = []
freqs = []
for line in sorted(original list): #takes all the lines in a text and sorts them
line = line.rstrip() #strips them of their spaces
if line not in words: #checks to see if line is in words
words.append(line) #if not it adds it to the end words
freqs.append(1) #and adds 1 to the end of freqs
else:
index = words.index(line) #if it is it will find where in words
freqs[index] += 1 #and use the to change add 1 to the matching index in freqs
Here is code support your question
is_char() check for validate string count those strings alone, Hashmap is dictionary in python
def is_word(word):
cnt =0
for c in word:
if 'a' <= c <='z' or 'A' <= c <= 'Z' or '0' <= c <= '9' or c == '$':
cnt +=1
if cnt==len(word):
return True
return False
def words_freq(s):
d={}
for i in s.split():
if is_word(i):
if i in d:
d[i] +=1
else:
d[i] = 1
return d
print(words_freq('the the sky$ is blue not green'))
for word in original_list:
words_dict[word] = words_dict.get(word,0) + 1
sorted_dt = {key: value for key, value in sorted(words_dict.items(), key=lambda item: item[1], reverse=True)}
keys = list(sorted_dt.keys())
values = list(sorted_dt.values())
print(keys)
print(values)
Simple way
d = {}
l = ['Hi','Hello','Hey','Hello']
for a in l:
d[a] = l.count(a)
print(d)
Output : {'Hi': 1, 'Hello': 2, 'Hey': 1}
word and frequency if you need
def counter_(input_list_):
lu = []
for v in input_list_:
ele = (v, lc.count(v)/len(lc)) #if you don't % remove <</len(lc)>>
if ele not in lu:
lu.append(ele)
return lu
counter_(['a', 'n', 'f', 'a'])
output:
[('a', 0.5), ('n', 0.25), ('f', 0.25)]
the best thing to do is :
def wordListToFreqDict(wordlist):
wordfreq = [wordlist.count(p) for p in wordlist]
return dict(zip(wordlist, wordfreq))
then try to :
wordListToFreqDict(originallist)

How to find the most frequent character? [duplicate]

This question already has answers here:
Finding the most frequent character in a string
(18 answers)
Closed 9 years ago.
I am looking to write a program that lets a user enter a string and then shows the character that appears the most often in their inputted string. I originally wrote a program that counted the number of times a specific letter appeared (the letter T), but was looking to convert that. It may take a whole new code, though.
My former program is here:
# This program counts the number of times
# the letter T (uppercase or lowercase)
# appears in a string.
def main():
# Create a variable to use to hold the count.
# The variable must start with 0.
count = 0
# Get a string from the user.
my_string = input('Enter a sentence: ')
# Count the Ts.
for ch in my_string:
if ch == 'T' or ch == 't':
count += 1
# Print the result.
print('The letter T appears', count, 'times.')
# Call the main function.
main()
Similar to this code, but I am not familiar with a lot of what is used there. I am thinking the version of Python I use is older, but I may not be correct.
You could use collections.Counter from Py's std lib.
Take a look here.
Also, a small example:
>>> from collections import Counter
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
EDIT
One liner!
count = [(i, string.count(i)) for i in set(string)]
Or if you want to write your own function (that's interesting!) you could start with something like this:
string = "abracadabra"
def Counter(string):
count = {}
for each in string:
if each in count:
count[each] += 1
else:
count[each] = 1
return count # Or 'return [(k, v) for k, v in count]'
Counter(string)
out: {'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1}
>>> s = 'abracadabra'
>>> max([(i, s.count(i)) for i in set(s.lower)], key=lambda x: x[1])
('a', 5)
This will not show ties though. This will:
>>> s = 'abracadabrarrr'
>>> lcounts = sorted([(i, s.count(i)) for i in set(s)], key=lambda x: x[1])
>>> count = max(lcounts, key=lambda x: x[1])[1]
>>> filter(lambda x: x[1] == count, lcounts)
[('a', 5), ('r', 5)]

Categories