Find the most common letter in a word; alphabetically

The program input is a string. From this string I want the most common letter. If several letters have the same frequency, I want to return the one that comes first in the Latin alphabet.
Code:
def most_Wanted(text="Hello Oman"):
    lst = [x for x in text.replace(" ","").lower() if x.isalpha]
    count = {}
    for letter in lst:
        if letter in count:
            count[letter] += 1
        else:
            count[letter] = 1
    count = list(count.items())
    sorted(count, key=lambda x: (-x[1],x[0]))
    print(count[0][0])
expected:
l #though o and l each appear 2 times, l is before o in the Latin alphabet
output:
h #there seems to be an error in the sorting as the first pair of tuples in the list always seems to be the first letter of the text?
Any suggestions to spruce up the code would be fine, though I would prefer not using modules at the moment so I can learn the core python. Thank you:)

The main issue is that sorted() returns a new list; it does not sort in place.
You should either reassign its return value or use .sort():
count = sorted(count, key=lambda x: (-x[1],x[0]))
or
count.sort(key=lambda x: (-x[1],x[0]))
There is also an issue in the line
lst = [x for x in text.replace(" ","").lower() if x.isalpha]
The condition if x.isalpha is always true, because it only references the method instead of calling it. It should be changed to
lst = [x for x in text.replace(" ","").lower() if x.isalpha()]
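Putting both fixes together, a minimal sketch without any modules might look like the following. The name most_wanted, the use of dict.get, and returning None for input without letters are choices made for this illustration, not part of the original code.

```python
def most_wanted(text="Hello Oman"):
    # Count only alphabetic characters, ignoring case.
    counts = {}
    for ch in text.lower():
        if ch.isalpha():  # note the call: isalpha(), not isalpha
            counts[ch] = counts.get(ch, 0) + 1
    if not counts:
        return None  # assumption: no letters means no answer
    # Sort by descending count, then alphabetically, and keep the result.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[0][0]

print(most_wanted())  # l
```

Returning the letter instead of printing it inside the function makes the result easier to reuse and to check.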

Related

Return Alternating Letters With the Same Length From two Strings

There was a similar question asked on here, but they wanted the remaining letters returned if one word was longer. I'm trying to return the same number of characters from both strings.
Here's my code:
def one_each(st, dum):
    total = ""
    for i in (st, dum):
        total += i
    return total
x = one_each("bofa", "BOFAAAA")
print(x)
It doesn't work but I'm trying to get this desired output:
>>>bBoOfFaA
How would I go about solving this? Thank you!

str.join with zip is possible, since zip only iterates pairwise up to the shortest iterable. You can combine with itertools.chain to flatten an iterable of tuples:
from itertools import chain
def one_each(st, dum):
    return ''.join(chain.from_iterable(zip(st, dum)))
x = one_each("bofa", "BOFAAAA")
print(x)
bBoOfFaA

I'd probably do something like this
s1 = "abc"
s2 = "123"
ret = "".join(a+b for a,b in zip(s1, s2))
print(ret)

Here's a short way of doing it.
def one_each(short, long):
    if len(short) > len(long):
        short, long = long, short # Swap if the input is in incorrect order
    index = 0
    new_string = ""
    for character in short:
        new_string += character + long[index]
        index += 1
    return new_string
x = one_each("bofa", "BOFAAAA") # returns bBoOfFaA
print(x)
It might show wrong results when you enter x = one_each("abcdefghij", "ABCD"), i.e. when the lowercase string is longer than the uppercase one: after the swap the capital letters come first (AaBbCcDd). That can be easily fixed by altering the case of each letter of the output.

Finding regular expression with at least one repetition of each letter

From any *.fasta DNA sequence (only 'ACTG' characters) I must find all sequences which contain at least one occurrence of each letter.
For example, from the sequence 'AAGTCCTAG' I should be able to find: 'AAGTC', 'AGTC', 'GTCCTA', 'TCCTAG', 'CCTAG' and 'CTAG' (iterating over each letter).
I have no clue how to do that in Python 2.7. I tried regular expressions, but they did not find every variant.
How can I achieve that?

You could find all substrings of length 4+, and then down select from those to find only the shortest possible combinations that contain one of each letter:
s = 'AAGTCCTAG'
def get_shortest(s):
    l, b = len(s), set('ATCG')
    options = [s[i:j+1] for i in range(l) for j in range(i,l) if (j+1)-i > 3]
    return [i for i in options if len(set(i) & b) == 4 and (set(i) != set(i[:-1]))]
print(get_shortest(s))
Output:
['AAGTC', 'AGTC', 'GTCCTA', 'TCCTAG', 'CCTAG', 'CTAG']

This is another way you can do it. It may not be as fast or as neat as chrisz's answer, but it may be a little simpler to read and understand for beginners.
DNA='AAGTCCTAG'
toSave=[]
for i in range(len(DNA)):
    letters=['A','G','T','C']
    j=i
    seq=[]
    while len(letters)>0 and j<(len(DNA)):
        seq.append(DNA[j])
        try:
            letters.remove(DNA[j])
        except ValueError:
            # this letter was already found for the current start position
            pass
        j+=1
    if len(letters)==0:
        toSave.append(seq)
print(toSave)

Since the substring you are looking for may be of almost any length, a FIFO queue seems to work. Append one letter at a time and check whether there is at least one of each letter. If so, yield the substring. Then remove letters from the front and keep checking until the substring is no longer valid.
def find_agtc_seq(seq_in):
    chars = 'AGTC'
    cur_str = []
    for ch in seq_in:
        cur_str.append(ch)
        while all(map(cur_str.count,chars)):
            yield("".join(cur_str))
            cur_str.pop(0)
seq = 'AAGTCCTAG'
for substr in find_agtc_seq(seq):
    print(substr)
That seems to result in the substrings you are looking for:
AAGTC
AGTC
GTCCTA
TCCTAG
CCTAG
CTAG
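The same queue idea can be sketched with collections.deque and a running count per letter, so that neither the removal from the front nor the validity check has to rescan the whole window. This is only an illustration: the function name find_acgt_windows and its alphabet parameter are made up here, and characters outside the alphabet are simply carried along.

```python
from collections import deque

def find_acgt_windows(seq, alphabet="ACGT"):
    window = deque()                     # current candidate substring
    counts = dict.fromkeys(alphabet, 0)  # occurrences of each required letter
    for ch in seq:
        window.append(ch)
        if ch in counts:
            counts[ch] += 1
        # While every required letter is present, report and shrink from the left.
        while all(counts.values()):
            yield "".join(window)
            removed = window.popleft()
            if removed in counts:
                counts[removed] -= 1

print(list(find_acgt_windows("AAGTCCTAG")))
# ['AAGTC', 'AGTC', 'GTCCTA', 'TCCTAG', 'CCTAG', 'CTAG']
```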

I really wanted to create a short answer for this, so this is what I came up with!
s = 'AAGTCCTAG'
d = 'ACGT'
c = len(d)
while c <= len(s):
    x,c = s[:c],c+1
    if all(l in x for l in d):
        print(x)
        s,c = s[1:],len(d)
It works as follows:
c is set to the length of the string of characters we are ensuring exist in the string (d = ACGT).
The while loop tries each prefix of s of length c, as long as c does not exceed the length of s.
c is increased by 1 on each iteration of the while loop.
If every character in our string d (ACGT) exists in the substring, we print the result, reset c to its default value and drop the first character of s.
The loop ends once c exceeds the length of s, which happens when s becomes shorter than d or when no prefix of s contains every character.
Result:
AAGTC
AGTC
GTCCTA
TCCTAG
CCTAG
CTAG
To get the output in a list instead:
s = 'AAGTCCTAG'
d = 'ACGT'
c,r = len(d),[]
while c <= len(s):
    x,c = s[:c],c+1
    if all(l in x for l in d):
        r.append(x)
        s,c = s[1:],len(d)
print(r)
Result:
['AAGTC', 'AGTC', 'GTCCTA', 'TCCTAG', 'CCTAG', 'CTAG']

If you can break the sequence into a list, e.g. of 5-letter sequences, you could then use this function to find repeated sequences.
from itertools import groupby
import numpy as np
def find_repeats(input_list, n_repeats):
    flagged_items = []
    for item in input_list:
        # Create itertools.groupby object
        groups = groupby(str(item))
        # Create list of tuples: (character, number of repeats)
        result = [(label, sum(1 for _ in group)) for label, group in groups]
        # Extract just number of repeats
        char_lens = np.array([x[1] for x in result])
        # Append to flagged items
        if any(char_lens >= n_repeats):
            flagged_items.append(item)
    # Return flagged items
    return flagged_items
#--------------------------------------
test_list = ['aatcg', 'ctagg', 'catcg']
find_repeats(test_list, n_repeats=2) # Returns ['aatcg', 'ctagg']

Intro to Python - Lists questions

We've started doing lists in our class and I'm a bit confused, so I'm coming here since previous questions and answers have helped me in the past.
The first question was to sum up all negative numbers in a list. I think I got it right but just want to double check.
import random
def sumNegative(lst):
    sum = 0
    for e in lst:
        if e < 0:
            sum = sum + e
    return sum
lst = []
for i in range(100):
    lst.append(random.randrange(-1000, 1000))
print(sumNegative(lst))
For the 2nd question, I'm a bit stuck on how to write it. The question was:
Count how many words occur in a list up to and including the first occurrence of the word “sap”. I'm assuming it's a random list, but I wasn't given much info, so I'm just going off that.
I know the ending would be similar, but I have no idea how the initial part would look since it's strings as opposed to numbers.
I wrote code for an in-class problem, which was to count how many odd numbers are in a list (it was a random list here, so I'm assuming it's random for that question as well) and got:
import random
def countOdd(lst):
    odd = 0
    for e in lst:
        if e % 2 = 0:
            odd = odd + 1
    return odd
lst = []
for i in range(100):
    lst.append(random.randint(0, 1000))
print(countOdd(lst))
How exactly would I change this to fit the criteria for the 2nd question? I'm just confused on that part. Thanks.

The code to sum negative numbers looks fine! I might suggest testing it on a list that you can manually check, such as:
print(sumNegative([1, -1, -2]))
The same logic would apply to your random list.
A note about your countOdd function: you are missing an = (== checks for equality, = is for assignment), and the code counts even numbers, not odd. The code should be:
def countOdd(lst):
    odd = 0
    for e in lst:
        if e%2 == 1: # Odd%2 == 1
            odd = odd + 1
    return odd
As for your second question, you can use a very similar function:
def countWordsBeforeSap(inputList):
    numWords = 0
    for word in inputList:
        if word.lower() != "sap":
            numWords = numWords + 1
        else:
            return numWords
inputList = ["trees", "produce", "sap"]
print(countWordsBeforeSap(inputList))
To explain the above, the countWordsBeforeSap function:
Starts iterating through the words.
If the word is anything other than "sap" it increments the counter and continues
If the word IS "sap" then it returns early from the function
The function could be more general by passing in the word that you wanted to check for:
def countWordsBefore(inputList, wordToCheckFor):
    numWords = 0
    for word in inputList:
        if word.lower() != wordToCheckFor:
            numWords = numWords + 1
        else:
            return numWords
inputList = ["trees", "produce", "sap"]
print(countWordsBefore(inputList, "sap"))
If the words that you are checking come from a single string then you would initially need to split the string into individual words like so:
inputString = "Trees produce sap"
inputList = inputString.split(" ")
Which splits the initial string into words that are separated by spaces.
Hope this helps!
Tom

def count_words(lst, end="sap"):
    """Note that I added an extra input parameter.
    This input parameter has a default value of "sap" which is the actual question.
    However you can change this input parameter to any other word if you want to by
    just doing count_words(lst, "another_word").
    """
    words = []
    # First we need to loop through each item in the list.
    for item in lst:
        # We append the item to our "words" list first thing in this loop,
        # as this will make sure we will count up to and INCLUDING.
        words.append(item)
        # Now check if we have reached the 'end' word.
        if item == end:
            # Break out of the loop prematurely, as we have reached the end.
            break
    # Our 'words' list now has all the words up to and including the 'end' variable.
    # 'len' will return how many items there are in the list.
    return len(words)
lst = ["something", "another", "woo", "sap", "this_wont_be_counted"]
print(count_words(lst))
Hope this helps you understand lists better!

You can make effective use of list/generator comprehensions. The generator expressions below avoid building intermediate lists.
1. Sum of negatives:
print(sum(i for i in lst if i < 0))
2. Count of words before sap: Like your sample list, this assumes there are no numbers in the list.
print(lst.index('sap'))
If it's a random list, filter the strings first, then find the index of sap:
l = ['a','b',1,2,'sap',3,'d']
l = [x for x in l if isinstance(x, str)]
print(l.index('sap'))
3. Count of odd numbers:
print(sum(i%2 != 0 for i in lst))
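Since the original exercise asks for the count up to and including the first "sap", while list.index gives the number of words before it, a small sketch of the inclusive count could look like this. The function name count_through is hypothetical, and counting the whole list when the word is absent is an assumption, since the exercise does not say what should happen in that case.

```python
def count_through(words, end="sap"):
    try:
        # index() is the number of items before `end`; add 1 to include it.
        return words.index(end) + 1
    except ValueError:
        # Assumption: if `end` never occurs, count every word.
        return len(words)

print(count_through(["trees", "produce", "sap", "slowly"]))  # 3
```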

Find longest string in a list

I need to write a code where a function takes in a list and then returns the longest string from that list.
So far I have:
def longestword(alist):
    a = 0
    answer = ''
    for i in alist:
        x = i
    if x > a:
        a = x
        answer = x
    elif i == a:
        if i not in alist:
            answer = answer + ' ' + i
    return answer
The example I have is longestword([11.22,"hello",20000,"Thanksgiving!",True])
which is supposed to return 'Thanksgiving!' but my function always returns True.

For starters, this always assigns x to the very last value in the list, which in your example is True.
for i in alist:
    x = i
And you should try not to access a loop value outside of the loop because, again, it's the last value of the thing you looped over, so True
elif i == a:
The key to solving the problem is to pick out which values are strings (using isinstance()) and tracking the longest length ones (using the len() function)
def longeststring(lst):
    longest = ""
    for x in lst:
        if isinstance(x, str) and len(x) > len(longest):
            longest = x
    return longest
Do be mindful of equal length strings. I don't know the requirements of your assignment.
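If ties matter, one possible way to handle them is to return every string that shares the maximum length. This is only a sketch under that assumption; the name longest_strings is made up for the example, and the assignment may well want something else (for instance only the first longest string, which is what the loop above returns).

```python
def longest_strings(items):
    # Keep only the strings; numbers and booleans are ignored.
    strings = [x for x in items if isinstance(x, str)]
    if not strings:
        return []
    top = max(len(s) for s in strings)
    # Preserve the original order among equally long strings.
    return [s for s in strings if len(s) == top]

print(longest_strings([11.22, "hello", 20000, "Thanksgiving!", True]))  # ['Thanksgiving!']
print(longest_strings(["ab", "cd", "e"]))  # ['ab', 'cd']
```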

I prefer to keep the for loops to a minimum; here's how I find the longest string in a list:
listo = ["long word 1 ", " super OMG long worrdddddddddddddddddddddd", "shortword" ]
lenList = []
for x in listo:
    lenList.append(len(x))
length = max(lenList)
print("Longest string is: {} at {} characters.".format(listo[lenList.index(length)] , length))

Why not use...
str_list = [x for x in alist if isinstance(x, str)]
longestword = sorted(str_list, key=len, reverse=True)[0]
Using a list comprehension, create str_list by iterating over the elements of your original list and retaining only the strings, using isinstance().
The list is then sorted by the sorted() function.
With the key=len argument, the list is sorted by the length of each element, using the built-in function len().
And with the reverse=True argument, your list will be returned sorted in descending order, i.e. longest first.
Then you select index [0] of the sorted list which in this case, is the longest string.
To complete then:
def longestword(alist):
    str_list = [x for x in alist if isinstance(x, str)]
    longestword = sorted(str_list, key=len, reverse=True)[0]
    return longestword
As Gavin rightly points out below, the sorting could be achieved without having to pass the reverse=True argument by returning the last element in the list with [-1], i.e.:
longestword = sorted(str_list, key=len)[-1]

You can try something like this :
def longest_string(some_list):
    only_str = [s for s in some_list if isinstance(s, str)]
    return max(only_str, key=len)
This method will create a list of strings named only_str. Then it will return the max length string using max() function.
When we run the longest_string() method with the example provided in the question:
some_list = [11.22,"hello",20000,"Thanksgiving!",True]
longest_string(some_list)
The output is: 'Thanksgiving!'
Hope this solves this problem!!

Determine prefix from a set of (similar) strings

I have a set of strings, e.g.
my_prefix_what_ever
my_prefix_what_so_ever
my_prefix_doesnt_matter
I simply want to find the longest common portion of these strings, here the prefix. In the above the result should be
my_prefix_
The strings
my_prefix_what_ever
my_prefix_what_so_ever
my_doesnt_matter
should result in the prefix
my_
Is there a relatively painless way in Python to determine the prefix (without having to iterate over each character manually)?
PS: I'm using Python 2.6.3.

Never rewrite what is provided to you: os.path.commonprefix does exactly this:
Return the longest path prefix (taken
character-by-character) that is a prefix of all paths in list. If list
is empty, return the empty string (''). Note that this may return
invalid paths because it works a character at a time.
For comparison to the other answers, here's the code:
# Return the longest prefix of all list elements.
def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1
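For reference, calling the standard-library function on the two example sets from the question would look roughly like this (the variable names are only for illustration):

```python
import os.path

first = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
second = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_doesnt_matter"]

# commonprefix compares character by character, so it works on any strings,
# not only on file system paths.
print(os.path.commonprefix(first))   # my_prefix_
print(os.path.commonprefix(second))  # my_
```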

Ned Batchelder is probably right. But for the fun of it, here's a more efficient version of phimuemue's answer using itertools.
import itertools
strings = ['my_prefix_what_ever',
           'my_prefix_what_so_ever',
           'my_prefix_doesnt_matter']
def all_same(x):
    return all(x[0] == y for y in x)
char_tuples = itertools.izip(*strings)
prefix_tuples = itertools.takewhile(all_same, char_tuples)
''.join(x[0] for x in prefix_tuples)
As an affront to readability, here's a one-line version :)
>>> from itertools import takewhile, izip
>>> ''.join(c[0] for c in takewhile(lambda x: all(x[0] == y for y in x), izip(*strings)))
'my_prefix_'
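The code above targets Python 2, where itertools.izip exists. On Python 3 the built-in zip is already lazy, so a roughly equivalent sketch could be written as follows. The function name common_prefix and the set-based column test are choices made for this example, not part of the original answer.

```python
from itertools import takewhile

def common_prefix(strings):
    # zip(*strings) yields one tuple of characters per position and stops
    # at the end of the shortest string.
    columns = zip(*strings)
    # Keep columns only while every string has the same character there.
    same = takewhile(lambda col: len(set(col)) == 1, columns)
    return "".join(col[0] for col in same)

strings = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
print(common_prefix(strings))  # my_prefix_
```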

Here's my solution:
a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
prefix_len = len(a[0])
for x in a[1 : ]:
    prefix_len = min(prefix_len, len(x))
    while not x.startswith(a[0][ : prefix_len]):
        prefix_len -= 1
prefix = a[0][ : prefix_len]

The following is a working, but probably quite inefficient solution.
a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
b = zip(*a)
c = [x[0] for x in b if x==(x[0],)*len(x)]
result = "".join(c)
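For small sets of strings, the above is no problem at all. But for larger sets, I personally would code another, manual solution that checks each character one after another and stops when there are differences.
Algorithmically, this yields the same procedure; however, one might be able to avoid constructing the list c. Note that the comprehension keeps every position at which all strings agree, so it is only correct when the strings do not agree again after their first mismatch.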

Just out of curiosity I figured out yet another way to do this:
def common_prefix(strings):
    if len(strings) == 1:#rule out trivial case
        return strings[0]
    prefix = strings[0]
    for string in strings[1:]:
        while string[:len(prefix)] != prefix and prefix:
            prefix = prefix[:len(prefix)-1]
        if not prefix:
            break
    return prefix
strings = ["my_prefix_what_ever","my_prefix_what_so_ever","my_prefix_doesnt_matter"]
print(common_prefix(strings))
#Prints "my_prefix_"
As Ned pointed out it's probably better to use os.path.commonprefix, which is a pretty elegant function.

The list comprehension below applies the reduce function to each tuple of characters taken from the same position in the input strings. It produces a list of N+1 elements, where N is the length of the shortest input string.
Each element in lot is either (a) the input character, if all input strings match at that position, or (b) None. lot.index(None) is the position of the first None in lot: the length of the common prefix. out is that common prefix.
from functools import reduce  # required on Python 3; also available on Python 2.6+
val = ["axc", "abc", "abc"]
lot = [reduce(lambda a, b: a if a == b else None, x) for x in zip(*val)] + [None]
out = val[0][:lot.index(None)]

Here's a simple clean solution. The idea is to use zip() function to line up all the characters by putting them in a list of 1st characters, list of 2nd characters,...list of nth characters. Then iterate each list to check if they contain only 1 value.
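a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
# matches[i] is True when all strings share the same character at position i
# (renamed from "list" so the built-in is not shadowed)
matches = [all(x[i] == x[i+1] for i in range(len(x)-1)) for x in zip(*a)]
print(a[0][:matches.index(False) if False in matches else len(matches)])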
output: my_prefix_

Here is another way of doing this using OrderedDict with minimal code.
import collections
import itertools
def commonprefix(instrings):
    """ Common prefix of a list of input strings using OrderedDict """
    d = collections.OrderedDict()
    for instring in instrings:
        for idx,char in enumerate(instring):
            # Make sure index is added into key
            d[(char, idx)] = d.get((char,idx), 0) + 1
    # Return prefix of keys while value == length(instrings)
    return ''.join([k[0] for k in itertools.takewhile(lambda x: d[x] == len(instrings), d)])

I had a slight variation of the problem and Google sent me here, so I think it will be useful to document it:
I have a list like:
my_prefix_what_ever
my_prefix_what_so_ever
my_prefix_doesnt_matter
some_noise
some_other_noise
So I would expect my_prefix to be returned. That can be done with:
from collections import Counter
def get_longest_common_prefix(values, min_length):
    substrings = [value[0: i-1] for value in values for i in range(min_length, len(value))]
    counter = Counter(substrings)
    # remove count of 1
    counter -= Counter(set(substrings))
    return max(counter, key=len)

In one line without using itertools, for no particular reason, although it does iterate through each character:
''.join([z[0] for z in zip(*(list(s) for s in strings)) if all(x==z[0] for x in z)])

Find the common prefix of all words in the given list of strings; if there is no common prefix, print -1:
stringList = ['my_prefix_what_ever', 'my_prefix_what_so_ever', 'my_prefix_doesnt_matter']
len2 = len( stringList )
if len2 != 0:
    # start with the lexicographically smallest word as the candidate prefix
    prefix = min( stringList )
    for i in range( len2 ):
        word = stringList[ i ]
        len1 = len( prefix )
        # slice each word to the length of the prefix
        word = word[ 0:len1 ]
        for j in range( len1 ):
            # compare each letter of the word and the prefix
            if word[ j ] != prefix[ j ]:
                # if a letter does not match, shorten the prefix
                prefix = prefix[ :j ]
                break # after getting the common prefix, move to the next word
    if len( prefix ) != 0:
        print("common prefix: ",prefix)
    else:
        print("-1")
else:
    print("string List is empty")
