Sort text based on last 3rd character

I am using the sorted() function to sort the text based on the last character, which works perfectly:
def sort_by_last_letter(strings):
    def last_letter(s):
        return s[-1]
    return sorted(strings, key=last_letter)
print(sort_by_last_letter(["hello","from","last","letter","a"]))
Output
['a', 'from', 'hello', 'letter', 'last']
My requirement is to sort based on the third-from-last character. The problem is that a few of the words are shorter than 3 characters; in that case the sort should fall back to the next available character (the second-from-last if present, otherwise the last). I am looking for a Pythonic way to do this.
Presently I am getting
IndexError: string index out of range
def sort_by_last_letter(strings):
    def last_letter(s):
        return s[-3]
    return sorted(strings, key=last_letter)
print(sort_by_last_letter(["hello","from","last","letter","a"]))

You can use:
return sorted(strings,key=lambda x: x[max(0,len(x)-3)])
Here we first compute the length of the string, len(x), and subtract 3 from it. If the string is shorter than three characters this gives a negative index, but max(0, ...) clamps it to 0, so we take the first character instead: the second-to-last character of a two-character string, or the only character of a one-character string.
This works provided every string has at least one character (an empty string still raises IndexError). It produces:
>>> sorted(["hello","from","last","letter","a"],key=lambda x: x[max(0,len(x)-3)])
['last', 'a', 'hello', 'from', 'letter']
If you do not mind how ties are broken (with this approach the remaining characters of the slice decide the order, rather than the original position in the list), you can use a more elegant approach:
from operator import itemgetter
return sorted(strings,key=itemgetter(slice(-3,None)))
What we do here is generate a slice with the last three characters (or fewer, for short strings) and compare these substrings. This then generates:
>>> sorted(strings,key=itemgetter(slice(-3,None)))
['a', 'last', 'hello', 'from', 'letter']
Since we compare with:
['a', 'last', 'hello', 'from', 'letter']
# ['a', 'ast', 'llo', 'rom', 'ter'] (comparison key)

You can simply use the minimum of the string length and 3:
def sort_by_last_letter(strings):
    def last_letter(s):
        return s[-min(len(s), 3)]
    return sorted(strings, key=last_letter)
print(sort_by_last_letter(["hello","from","last","letter","a"]))
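Both answers above assume that no string is empty. As a minimal sketch of one way to tolerate empty strings as well, the key can be built from slices, which never raise IndexError. The function name sort_by_third_last is hypothetical, and sorting empty strings first is an assumption of this sketch, not something the question asks for.

```python
def sort_by_third_last(strings):
    """Sort by the third-from-last character, falling back for short strings."""
    def key(s):
        # s[-3:] holds at most the last three characters and never raises.
        # Its first character is the third-from-last one when it exists,
        # otherwise the second-from-last or the last one.
        # An empty string yields '', which sorts before any other key.
        return s[-3:][:1]
    return sorted(strings, key=key)

print(sort_by_third_last(["hello", "from", "last", "letter", "a", ""]))
# ['', 'last', 'a', 'hello', 'from', 'letter']
```

Because sorted() is stable, 'last' and 'a' (both keyed on 'a') keep their original relative order.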

Related

python string match partial match [duplicate]

How can I check if any of the strings in an array exists in another string?
For example:
a = ['a', 'b', 'c']
s = "a123"
if a in s:
    print("some of the strings found in s")
else:
    print("no strings found in s")
How can I replace the if a in s: line to get the appropriate result?

You can use any:
a_string = "A string is more than its parts!"
matches = ["more", "wholesome", "milk"]
if any(x in a_string for x in matches):
    print("some of the strings found in a_string")
Similarly, to check whether all the strings from the list are found, use all instead of any.
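To make the difference between any and all concrete, here is a small sketch. The helper names contains_any and contains_all are hypothetical and exist only for illustration.

```python
def contains_any(text, candidates):
    # Generator expression: any() stops at the first candidate that is found.
    return any(c in text for c in candidates)

def contains_all(text, candidates):
    # all() stops at the first candidate that is missing.
    return all(c in text for c in candidates)

a_string = "A string is more than its parts!"
print(contains_any(a_string, ["more", "wholesome", "milk"]))  # True
print(contains_all(a_string, ["more", "wholesome", "milk"]))  # False
print(contains_all(a_string, ["more", "parts"]))              # True
```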

any() is by far the best approach if all you want is True or False, but if you want to know specifically which string or strings match, you can use a couple of things (here s is the string being searched, as in the question).
If you want the first match (with False as a default):
match = next((x for x in a if x in s), False)
If you want to get all matches (including duplicates):
matches = [x for x in a if x in s]
If you want to get all non-duplicate matches (disregarding order):
matches = {x for x in a if x in s}
If you want to get all non-duplicate matches in the right order:
matches = []
for x in a:
    if x in s and x not in matches:
        matches.append(x)

You should be careful if the strings in a or the string s get longer. With naive substring matching, the straightforward solutions can take O(S*A) time in the worst case, where S is the length of s and A is the sum of the lengths of all strings in a. For a faster solution, look at the Aho-Corasick algorithm for string matching, which runs in linear time O(S+A).

Just to add some diversity with regex:
import re
if any(re.findall(r'a|b|c', s, re.IGNORECASE)):
    print('possible matches thanks to regex')
else:
    print('no matches')
or, if your list is too long to write out by hand, build the pattern from it (re.escape keeps special characters literal): any(re.findall('|'.join(map(re.escape, a)), s, re.IGNORECASE))

A surprisingly fast approach is to use set:
a = ['a', 'b', 'c']
s = "a123"
if set(a) & set(s):
    print("some of the strings found in s")
else:
    print("no strings found in s")
This works only if a contains no multiple-character values (if it does, use any as listed above). With single characters only, it's simpler to specify a as a string: a = 'abc'.

You need to iterate over the elements of a.
a = ['a', 'b', 'c']
s = "a123"
found_a_string = False
for item in a:
    if item in s:
        found_a_string = True
if found_a_string:
    print("found a match")
else:
    print("no match found")

a = ['a', 'b', 'c']
s = "a123"
a_match = [True for match in a if match in s]
if True in a_match:
    print("some of the strings found in s")
else:
    print("no strings found in s")

jbernadas already mentioned the Aho-Corasick-Algorithm in order to reduce complexity.
Here is one way to use it in Python:
Download aho_corasick.py from here
Put it in the same directory as your main Python file and name it aho_corasick.py
Try the algorithm with the following code:
from aho_corasick import aho_corasick #(string, keywords)
print(aho_corasick(string, ["keyword1", "keyword2"]))
Note that the search is case-sensitive

The regex module, recommended in the Python docs, supports this through named lists:
import regex
words = {'he', 'or', 'low'}
p = regex.compile(r"\L<name>", name=words)
m = p.findall('helloworld')
print(m)
output:
['he', 'low', 'or']
Some details on implementation: link

A compact way to find multiple strings in another list of strings is to use set.intersection. This executes much faster than list comprehension in large sets or lists.
>>> astring = ['abc','def','ghi','jkl','mno']
>>> bstring = ['def', 'jkl']
>>> a_set = set(astring) # convert list to set
>>> b_set = set(bstring)
>>> matches = a_set.intersection(b_set)
>>> matches
{'def', 'jkl'}
>>> list(matches) # if you want a list instead of a set
['def', 'jkl']
>>>

Just some more info on how to get all the list elements that are present in the string:
a = ['a', 'b', 'c']
str = "a123"
list(filter(lambda x: x in str, a))

It depends on the context.
If you want to check for a single literal (any single character such as a, e, w, etc.), in is enough:
original_word = "hackerearcth"
if 'h' in original_word:
    print("YES")
If you want to check whether any of the characters of original_word occurs in the input, use any:
if any(your_required in yourinput for your_required in original_word):
    print("YES")
If you want every character of original_word to occur in the input, use all:
original_word = ['h', 'a', 'c', 'k', 'e', 'r', 'e', 'a', 'r', 't', 'h']
yourinput = str(input()).lower()
if all(requested_word in yourinput for requested_word in original_word):
    print("yes")

flog = open('test.txt', 'r')
flogLines = flog.readlines()
strlist = ['SUCCESS', 'Done','SUCCESSFUL']
res = False
for line in flogLines:
    for fstr in strlist:
        if line.find(fstr) != -1:
            print('found')
            res = True
if res:
    print('res true')
else:
    print('res false')

I would use this kind of function for speed:
def check_string(string, substring_list):
    for substring in substring_list:
        if substring in string:
            return True
    return False

Yet another solution with set, using set intersection, for a one-liner. Note that this matches whole whitespace-separated words only.
subset = {"some" ,"words"}
text = "some words to be searched here"
if len(subset & set(text.split())) == len(subset):
    print("All values present in text")
if subset & set(text.split()):
    print("At least one value present in text")

If you want exact matches of words then consider word tokenizing the target string. I use the recommended word_tokenize from nltk:
from nltk.tokenize import word_tokenize
Here is the tokenized string from the accepted answer:
a_string = "A string is more than its parts!"
tokens = word_tokenize(a_string)
tokens
Out[46]: ['A', 'string', 'is', 'more', 'than', 'its', 'parts', '!']
The accepted answer gets modified as follows:
matches_1 = ["more", "wholesome", "milk"]
[x in tokens for x in matches_1]
Out[42]: [True, False, False]
As in the accepted answer, the word "more" is still matched. If "mo" becomes a match string, however, the accepted answer still finds a match. That is a behavior I did not want.
matches_2 = ["mo", "wholesome", "milk"]
[x in a_string for x in matches_2]
Out[43]: [True, False, False]
Using word tokenization, "mo" is no longer matched:
[x in tokens for x in matches_2]
Out[44]: [False, False, False]
That is the additional behavior that I wanted. This answer also responds to the duplicate question here.
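If installing nltk is not an option, a rough approximation of the same idea can be sketched with the standard library. This swaps word_tokenize for a plain regular expression (\w+), which is an assumption of this sketch and handles punctuation and contractions far less carefully than a real tokenizer. The function name whole_word_matches is hypothetical.

```python
import re

def whole_word_matches(text, candidates):
    # Split the text into runs of word characters; punctuation is dropped.
    tokens = set(re.findall(r"\w+", text))
    # Membership is now tested against whole tokens, not substrings.
    return [c in tokens for c in candidates]

a_string = "A string is more than its parts!"
print(whole_word_matches(a_string, ["more", "wholesome", "milk"]))  # [True, False, False]
print(whole_word_matches(a_string, ["mo", "wholesome", "milk"]))    # [False, False, False]
```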

data = "firstName and favoriteFood"
mandatory_fields = ['firstName', 'lastName', 'age']
# for each
for field in mandatory_fields:
    if field not in data:
        print("Error, missing req field {0}".format(field))
# still fine, multiple if statements
if ('firstName' not in data or
        'lastName' not in data or
        'age' not in data):
    print("Error, missing a req field")
# not very readable, list comprehension
missing_fields = [x for x in mandatory_fields if x not in data]
if len(missing_fields) > 0:
    print("Error, missing fields {0}".format(", ".join(missing_fields)))

Want to remove elements based on first character - Python

This is a program that lists all the substrings except the ones that start with a vowel.
However, I don't understand why the startswith() function doesn't work as I expected. It is not removing all the substrings that start with the letter 'A'.
Here is my code:
ban = 'BANANA'
cur_pos=0
sub = []
#Finding the substrings
for i in range(len(ban)):
    limit=1
    for j in range(len(ban)):
        a = ban[cur_pos:limit]
        sub.append(a)
        limit+=1
    cur_pos+=1
#removing the substrings that start with vowels
for i in sub:
    if (i.startswith(('A','E','I','O','U'))):
        sub.remove(i)
print(sub)

Why this doesn't work...
To answer your question, the mantra for this issue is delete array elements in reverse order, which I occasionally forget and wonder whatever has gone wrong.
Explanation
The problem isn't with startswith() but using remove() inside this specific type of for loop, which uses an iterator rather than a range.
for i in sub:
This fails in this code for the following reason.
ban = 'BANANA'
cur_pos=0
sub = []
#Finding the substrings
for i in range(len(ban)):
    limit=1
    for j in range(len(ban)):
        a = ban[cur_pos:limit]
        sub.append(a)
        limit+=1
    cur_pos+=1
print(sub)
#removing the substrings that start with vowels
for i in sub:
    if (i.startswith(('A','E','I','O','U'))):
        sub.remove(i)
        print(sub)
print(sub)
I've added some print statements to assist debugging.
Initially the array is:
['B', 'BA', 'BAN', 'BANA', 'BANAN', 'BANANA', '', 'A', 'AN', 'ANA', 'ANAN', 'ANANA', '', '', 'N', 'NA', 'NAN', 'NANA', '', '', '', 'A', 'AN', 'ANA', '', '', '', '', 'N', 'NA', '', '', '', '', '', 'A']
...then we eventually get to remove the first 'A', which seems to be removed fine...
['B', 'BA', 'BAN', 'BANA', 'BANAN', 'BANANA', '', 'AN', 'ANA', 'ANAN', ...etc...
...but there is some nastiness happening behind the scenes that shows up when we reach the next vowel...
['B', 'BA', 'BAN', 'BANA', 'BANAN', 'BANANA', '', 'AN', 'ANAN',
Notice that 'ANA' was removed, not the expected 'AN'!
Why?
Because the remove() modified the array and shifted all the elements along by one position, but the for loop index behind the scenes does not know about this. The index is still pointing to the next element which it expects is 'AN' but because we moved all the elements by one position it is actually pointing to the 'ANA' element.
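The skipping effect described above can be reproduced on a much smaller list. This is a minimal sketch with a hypothetical four-element list; the visited list is only there to record which elements the loop actually saw.

```python
items = ['A', 'AN', 'ANA', 'B']
visited = []
for item in items:
    visited.append(item)          # record what the loop actually looks at
    if item.startswith('A'):
        items.remove(item)        # shifts every later element one slot left

print(visited)  # ['A', 'ANA']  ('AN' was never visited)
print(items)    # ['AN', 'B']   ('AN' survived although it starts with 'A')
```

After 'A' is removed, 'AN' slides into index 0, but the loop has already moved on to index 1, which now holds 'ANA'.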
Fixing the problem
One way is to append the substrings that do not start with a vowel to a new, empty array:
ban = 'BANANA'
cur_pos=0
sub = []
add = []
#Finding the substrings
for i in range(len(ban)):
    limit=1
    for j in range(len(ban)):
        a = ban[cur_pos:limit]
        sub.append(a)
        limit+=1
    cur_pos+=1
#adding the substrings that don't start with vowels
for i in sub:
    if (not i.startswith(('A','E','I','O','U'))):
        add.append(i)
print(add)
Another way
There is, however, a simple way to modify the original array, as you wanted: iterate through the array in reverse order using an index-based for loop.
The important part here is that you are not modifying any of the array elements that you are processing, only the parts that you are finished with, so that when you remove an element from the array, the array index won't point to the wrong element. This is common and acceptable practice, so long as you understand and make clear what you're doing.
ban = 'BANANA'
cur_pos=0
sub = []
#Finding the substrings
for i in range(len(ban)):
    limit=1
    for j in range(len(ban)):
        a = ban[cur_pos:limit]
        sub.append(a)
        limit+=1
    cur_pos+=1
#removing the substrings that start with vowels, in reverse index order
start = len(sub)-1 # index of the last element (zero-based array indexing)
stopAt = -1 # one before the first index, because range() excludes its stop value
step = -1 # step backwards
for index in range(start,stopAt,step): # count backwards from last element to the first
    i = sub[index]
    if (i.startswith(('A','E','I','O','U'))):
        print('#'+str(index)+' = '+i)
        del sub[index]
print(sub)
For more details, see the official page on for
https://docs.python.org/3/reference/compound_stmts.html#index-6
Aside: This is my favourite array problem.
Edit: I just got bitten by this in Javascript, while removing DOM nodes.

It is not good practice to remove items from a list while iterating over it. I suggest you change it to this:
sub2=list()
#keeping the substrings that do not start with vowels
for i in sub:
    if not (i.startswith(('A','E','I','O','U'))):
        sub2.append(i)
print(sub2)
So if the substring does not start with a vowel, add it to another list, sub2.

As mentioned in the comments, in Python you shouldn't remove items from a list while iterating over its elements, since you mutate the original list before the loop ends. If you want to do that, you'll either have to build another list and then assign it to your old one, or do it directly using a list comprehension like so:
sub = [i for i in sub if not i.startswith(('A','E','I','O','U'))]

Identify a sequence of numbers written as words

I have lists of words in python. In the list elements I have numbers written as words. For example:
list = ['man', 'ball', 'apple', 'thirty-one', 'five', 'seven', 'twelve', 'queen']
I have also the dictionary with every number written as word as the key and the corresponding digit as value. For example:
n_dict = {'zero':0, 'one':1, 'two':2, ...., 'hundred':100}
What I need to do is to identify, let's say, 4 or more numbers written as words consecutively in the list and convert them to digits based on the dictionary. For example, list should become:
list = ['man', 'ball', 'apple', '31', '5', '7', '12', 'queen']
However, if there are less consecutive elements than the number specified (in our case 4) the list shall be the same. For example:
list2 = ['bike', 'earth', 't-shirt', 'twenty-five', 'zero', 'seven', 'home', 'bottle']
list2 Shall remain as it is.
In addition, if there are multiple sequences with numbers written as words but they are not reaching the minimum amount of consecutive words required the words should not change to digits. For example:
list3 = ['stairs', 'tree', 'street', 'forty-two', 'nine', 'submarine', 'two', 'eighty-five']
list3 Shall remain as it is.
The sequence of numbers written as words can be anywhere at the list. At the beginning, at the end, somewhere in the middle.
What I have tried so far:
def check_if_consecutive(l):
    return sorted(l) == list(range(min(l), max(l)+1))
def replace_numbers(word_list, num_dict):
    flag = False
    intersect = list(set(word_list) & set(num_dict.keys()))
    intersect_index = [word_list.index(elem) for elem in intersect]
    flag = check_if_consecutive(intersect_index)
    if len(intersect_index) >= 4 and flag:
        flag = True
        for index in intersect_index:
            word_list[index] = num_dict[word_list[index]]
    return word_list, flag
I need to return the flag as well to keep track which of the lists changed.
The above code works fine but I think it's not that efficient. My question is whether can be implemented in a better way. E.g. using operator.itemgetter or something in a similar fashion.

For digits:
from itertools import filterfalse
list_of_strings_that_are_secretly_integers = [*filterfalse(lambda x: isinstance(x, bool), (n_dict.get(i, False) for i in list_of_strings))]
To check that a sequence is consecutive, the following should work for any indexable candidate (each element is compared with the one before it):
def continuous(candidate, differential=1):
    return all(e == candidate[i] + differential for i, e in enumerate(candidate[1:]))
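As an alternative sketch of the whole task, itertools.groupby can split the list into consecutive runs of number words and other words, so that only runs of the required length are converted. This is not the approach from the answer above; the function name replace_number_runs, the min_run parameter and the abbreviated dictionary are assumptions made for illustration. The sketch returns a new list plus a flag and leaves the input list unmodified.

```python
from itertools import groupby

def replace_number_runs(words, n_dict, min_run=4):
    """Return (new_list, changed). Number words are converted only inside
    runs of at least min_run consecutive number words."""
    result = []
    changed = False
    # groupby yields consecutive runs that share the same key value:
    # True for words found in n_dict, False for everything else.
    for is_number, run in groupby(words, key=lambda w: w in n_dict):
        run = list(run)
        if is_number and len(run) >= min_run:
            result.extend(str(n_dict[w]) for w in run)
            changed = True
        else:
            result.extend(run)
    return result, changed

# Hypothetical, abbreviated dictionary covering only the words used below.
n_dict = {'zero': 0, 'five': 5, 'seven': 7, 'twelve': 12,
          'twenty-five': 25, 'thirty-one': 31}

words = ['man', 'ball', 'apple', 'thirty-one', 'five', 'seven', 'twelve', 'queen']
print(replace_number_runs(words, n_dict))
# (['man', 'ball', 'apple', '31', '5', '7', '12', 'queen'], True)

words2 = ['bike', 'earth', 't-shirt', 'twenty-five', 'zero', 'seven', 'home', 'bottle']
print(replace_number_runs(words2, n_dict))
# (['bike', 'earth', 't-shirt', 'twenty-five', 'zero', 'seven', 'home', 'bottle'], False)
```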

Comparing strings in a list and appending those that have the same first and last character to a new list

I'm in an Intro to Python class and was given this assignment:
Given a list of strings, return a new list containing all the strings from the original list that begin and end with the same character. Matching is not case-sensitive, meaning 'a' should match with 'A'. Do not alter the original list in any way.
I was running into problems with slicing and comparing the strings because the possible lists given include '' (empty string). I'm pretty stumped and any help would be appreciated.
def first_last(strings):
    match=[]
    x=''
    count=0
    while count<len(strings):
        if x[0] == x[-1]:
            match.append(x)
            x+=x
        count+=1
So, when given:
['aba', 'dcn', 'z', 'zz', '']
or
['121', 'NbA', '898', '']
I get this:
string index out of range
When I should be seeing:
['aba', 'z', 'zz']
and
['121', '898']

Your list contains an empty string (''). Thus, you will have to check for the length of each element that you are currently iterating over. Also, it does not seem that you use x:
def first_last(strings):
    match=[]
    count=0
    while count<len(strings):
        if strings[count]:
            if strings[count][0].lower() == strings[count][-1].lower():
                match.append(strings[count])
        count += 1
    return match
Note, however, that you can also use list comprehension:
s = ['aba', 'dcn', 'z', 'zz', '']
final_strings = [i for i in s if i and i[0].lower() == i[-1].lower()]

def first_last(strings):
    match=[]
    for x in strings:
        if x == '':
            continue
        if x.lower()[0] == x.lower()[-1]:
            match.append(x)
    return match

Test that the list element is not empty first (an empty string is falsy):
def first_last(strings):
    match = []
    for element in strings:
        if element and element[0].lower() == element[-1].lower():
            match.append(element)
    return match
or with list comp:
match = [element for element in strings if element and element[0].lower() == element[-1].lower()]

Split string into array in Python

I have a string with the following structure.
string = "[abcd, abc, a, b, abc]"
I would like to convert that into an array. I keep using the split function in Python but I get spaces and the brackets on the start and the end of my new array. I tried working around it with some if statements but I keep missing letters in the end from some words.
Keep in mind that I don't know the length of the elements in the string. It could be 1, 2, 3 etc.

Assuming your elements never end or start with spaces or square brackets, you could just strip them out (the bracket can be stripped out before splitting):
arr = [ x.strip() for x in string.strip('[]').split(',') ]
It gives as expected
print (arr)
['abcd', 'abc', 'a', 'b', 'abc']
The nice part with strip is that it leaves all inner characters untouched. With:
string = "[ab cd, a[b]c, a, b, abc]"
You get: ['ab cd', 'a[b]c', 'a', 'b', 'abc']

You can also do this
>>> s = string[1:len(string)-1].split(", ")
>>> s
['abcd', 'abc', 'a', 'b', 'abc']

If the values in this list are variables themselves (it looks like it, because they're not quoted), the easiest way to convert this string to the equivalent list is
string = eval(string)
Caution: if the values in your list should be strings, this will not work. Also note that eval executes arbitrary code, so it should never be used on untrusted input.

another way to solve this problem
string = "[abcd, abc, a, b, abc]"
result = string[1:len(string)-1].split(", ")
print(result)
Hope this helps

First remove [ and ] from your string, then split on commas, then remove spaces from resulting items (using strip).
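The three steps just described can be sketched as a small function. The name parse_bracketed_list is hypothetical, and returning an empty list for "[]" is an assumption of this sketch that the question does not cover.

```python
def parse_bracketed_list(text):
    """Turn a string such as "[abcd, abc, a, b, abc]" into a list of strings."""
    # 1. Remove the surrounding brackets (and any whitespace around them).
    inner = text.strip().strip('[]')
    # An empty pair of brackets should give an empty list, not [''].
    if not inner.strip():
        return []
    # 2. Split on commas, then 3. strip spaces from each item.
    return [item.strip() for item in inner.split(',')]

print(parse_bracketed_list("[abcd, abc, a, b, abc]"))  # ['abcd', 'abc', 'a', 'b', 'abc']
print(parse_bracketed_list("[]"))                      # []
```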

If you do not want to use strip, it can be done in the following, rather clumsy, way (each item loses its leading bracket or space, then the closing bracket is removed from the last item):
arr = [e[1:] for e in string.split(',')]
arr[-1] = arr[-1].replace(']', '')
print(arr)
['abcd', 'abc', 'a', 'b', 'abc']

I would suggest the following:
[list_element.strip() for list_element in string.strip("[]").split(",")]
First remove the brackets, then split on commas and strip each element.
