Identify a sequence of numbers written as words - python

I have lists of words in Python. Some of the list elements are numbers written as words. For example:
list = ['man', 'ball', 'apple', 'thirty-one', 'five', 'seven', 'twelve', 'queen']
I also have a dictionary with every number written as a word as the key and the corresponding digit as the value. For example:
n_dict = {'zero':0, 'one':1, 'two':2, ...., 'hundred':100}
What I need to do is identify runs of, say, 4 or more numbers written as words that appear consecutively in the list, and convert them to digits based on the dictionary. For example, the list should become:
list = ['man', 'ball', 'apple', '31', '5', '7', '12', 'queen']
However, if there are fewer consecutive elements than the number specified (in our case 4), the list shall stay the same. For example:
list2 = ['bike', 'earth', 't-shirt', 'twenty-five', 'zero', 'seven', 'home', 'bottle']
list2 shall remain as it is.
In addition, if there are multiple sequences of numbers written as words but none of them reaches the minimum number of consecutive words required, the words should not be changed to digits. For example:
list3 = ['stairs', 'tree', 'street', 'forty-two', 'nine', 'submarine', 'two', 'eighty-five']
list3 shall remain as it is.
The sequence of numbers written as words can be anywhere in the list: at the beginning, at the end, or somewhere in the middle.
What I have tried so far:
def check_if_consecutive(l):
    # True when the indexes form an unbroken range
    return sorted(l) == list(range(min(l), max(l)+1))
def replace_numbers(word_list, num_dict):
    flag = False
    intersect = list(set(word_list) & set(num_dict.keys()))
    intersect_index = [word_list.index(elem) for elem in intersect]
    flag = check_if_consecutive(intersect_index)
    if len(intersect_index) > 4 and flag:
        flag = True
        for index in intersect_index:
            word_list[index] = num_dict[word_list[index]]
    return word_list, flag
I need to return the flag as well to keep track which of the lists changed.
The above code works fine but I think it's not that efficient. My question is whether it can be implemented in a better way, e.g. using operator.itemgetter or something similar.

For digits
from itertools import filterfalse
list_of_strings_that_are_secretly_integers = [*filterfalse(lambda x: isinstance(x, bool), (n_dict.get(i, False) for i in list_of_strings))]
To check that the values are consecutive, the following should work for any indexable candidate:
def continuous(candidate, differential=1):
    # every element must equal its predecessor plus the step
    return all(e == candidate[i] + differential for i, e in enumerate(candidate[1:]))
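Another way to look at the original question, as a rough sketch only: assuming "consecutive" means adjacent positions in the list and that a run qualifies once it has at least min_run members, itertools.groupby can split the list into runs of number words and other words, and only the long enough runs get converted. The names replace_number_runs and min_run are made up for this illustration, and the small n_dict below is a stand-in for the full dictionary from the question.

```python
from itertools import groupby

def replace_number_runs(words, num_dict, min_run=4):
    """Return (new_list, changed); only runs of >= min_run number words are converted."""
    result = []
    changed = False
    # groupby yields maximal runs of adjacent items that share the same key
    for is_number, group in groupby(words, key=lambda w: w in num_dict):
        run = list(group)
        if is_number and len(run) >= min_run:
            result.extend(str(num_dict[w]) for w in run)
            changed = True
        else:
            result.extend(run)
    return result, changed

# Hypothetical mini dictionary, just enough for the examples in the question
n_dict = {'zero': 0, 'five': 5, 'seven': 7, 'twelve': 12,
          'twenty-five': 25, 'thirty-one': 31}
words = ['man', 'ball', 'apple', 'thirty-one', 'five', 'seven', 'twelve', 'queen']
print(replace_number_runs(words, n_dict))
```

This makes a single pass over the list and returns a new list instead of modifying the input, and the returned flag tells you whether anything was converted.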


how do i align a list of strings to the right?

Is there a way for me to 'align' a list of strings to the right? I'm performing a counting sort here and I want to sort my characters from the right.
For example, given a list of strings
eg. list = ['abc', 'a','qwerty', 'cd']
The length of the longest string in the list is 6 (qwerty),
list = ['abc', 'a','qwerty', 'cd']
biggest = max(list, key=len)
max = len(biggest) - 1
list2= []
for col in range(-max, 0):
    for i in list:
        list2.append(i[abs(col)])
As my other strings are not the same length as 'qwerty', this raises an error. How do I 'align' all my strings to the right, so that when I sort from the last letter, 'a' is aligned with the 'y' of 'qwerty'?
     a
    cd
   abc
qwerty
And I would like to accomplish this without padding
You can sort your whole list by length and use the format mini language for output:
data = ['abc', 'a','qwerty', 'cd']
s = sorted(data, key=len) # sorted copy of your list
maxlen = len(s[-1]) # longest is last element in sorted list
for l in s:
    print(f"{l:>{maxlen}}") # the strings are unchanged, only the printed output is right-aligned
Output:
     a
    cd
   abc
qwerty
As far as I understand the question, this should be the solution:
list_1 = ['aaaz', 'abc', 'a', 'qwerty', 'cd', "xxxxxxxxca"]
def my_sort(data):
    # sort by the reversed string, i.e. compare characters from right to left
    inverted = sorted(data, key=lambda x: x[::-1])
    return inverted
max_len = max([len(s) for s in list_1])
list_2 = my_sort(list_1)
print(list_2)
>>> ['a', 'xxxxxxxxca', 'abc', 'cd', 'qwerty', 'aaaz']
I understand that the strings should be sorted alphabetically but from right to left.
list_1 = ['abc', 'a','qwerty', 'cd']
biggest = len(max(list_1, key=len))
list_2 = []
# padding by hand
for i in list_1:
    list_2.append(' '*(biggest-len(i))+i)
# same result with the format mini-language (this also pads with spaces)
list_2 = []
for i in list_1:
    list_2.append(f"{i:>{biggest}}")
I'd go with this approach
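If the goal really is to read the strings column by column from the right without creating padded copies, one possible sketch is to index from the end and treat a missing character as an empty string. The helper names char_from_right and columns_from_right are invented for this example, and using '' as the "no character here" marker is an assumption, not something from the question.

```python
def char_from_right(s, k):
    """k-th character counted from the right (k=0 is the last one), or '' if s is too short."""
    return s[-1 - k] if k < len(s) else ''

def columns_from_right(strings):
    """One list of characters per column, starting with the rightmost column."""
    width = max(len(s) for s in strings)
    return [[char_from_right(s, k) for s in strings] for k in range(width)]

data = ['abc', 'a', 'qwerty', 'cd']
for column in columns_from_right(data):
    print(column)
```

The first column printed holds the last character of every string, so 'a' lines up with the 'y' of 'qwerty' while the original strings stay untouched.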

joining strings iteratively between two indexes python

I have a test file with the following format:
one
two
three
four
=
five
six
seven
eight
=
nine
ten
one
two
=
and I am writing Python code to create a list in which each line of the file is one item:
import sys
dump = sys.argv[1]
lines = []
with open(dump) as f:
    for line in f:
        x = line.strip()
        lines.append(x)
print(lines)
lines list =
['one', 'two', 'three', 'four', '=', 'five', 'six', 'seven', 'eight', '=', 'nine', 'ten', 'one', 'two', '=']
I then get the indexes of the equals signs in order to try to use those at a later point to make a new list, combining the strings:
equals_indexes = [i for i, x in enumerate(lines) if x == '=']
equals_indexes list:
[4, 9, 14]
I am good up until this point. Now I would like to join the strings one, two, three, four before the first index as new_list element 1. I would like to join the next group of strings between equals sign 1 and 2, and the next group of strings between equals sign 2 and 3 to produce the following:
[[one two three four], [five six seven eight], [nine ten one two]]
I have tried to do this by iterating over the list of equals indexes, then iterating over the list lines:
groups = []
for i in equals_indexes:
    sequences = ""
    for x,y in enumerate(lines):
        if x < i:
            sequences = ' '.join(lines[x:i])
            groups.append(sequences)
print(groups)
Which produces the following:
['one two three four', 'two three four', 'three four', 'four', 'one two three four = five six seven eight', 'two three four = five six seven eight', ....]
I understand why this is happening, because at each iteration of x, it is checking to see if it is less than i and if so appending each string at x to the string "sequences". I am doing this because I have a large file with huge blocks of text corresponding to one iteration of a program. The separator between iteration 1 and iteration 2 of the program is a single '=' in the line. This way I can parse the list elements after I am able to split them by equals sign.
Any help would be great!
I think this gets you what you are looking for, although there is one part that is unclear. If you want to join the strings between equals signs as each element in your final list:
with open(dump) as f:
    full_string = ' '.join([line.strip() for line in f])
# skip pieces that are empty or whitespace only, e.g. after the trailing '='
my_list = [string.strip() for string in full_string.split('=') if string.strip()]
print(my_list)
['one two three four', 'five six seven eight', 'nine ten one two']
If, instead, you want sub-lists comprising each string between the equals signs, just replace my_list above with:
my_list = [string.split() for string in full_string.split('=') if string.strip()]
[['one', 'two', 'three', 'four'], ['five', 'six', 'seven', 'eight'], ['nine', 'ten', 'one', 'two']]
As a bonus, both versions use list comprehensions, which are a more Pythonic way of looping.
Here's a small IDLE example:
>>> stuff = ['a', 'b', 'c', '=', 'd', 'e', '=', 'f', 'g']
>>> "".join(stuff).split('=')
['abc', 'de', 'fg']
It joins all of the characters together (so you can skip separating them out into separate lists), and then splits that string on the = character. For whole words, join with ' ' instead of '' so the words stay separated, and strip each piece afterwards.
Read in lines until you hit a =, merge them into one list entry and add it, continue until done, then add whatever is left in the temporary list:
t = """one
two
three
four
=
five
six
seven
eight
=
nine
ten
one
two
="""
data = [] # global list
line = [] # temp list
for n in [x.strip() for x in t.splitlines()]:
    if n == "=":
        if line:
            data.append(' '.join(line))
            line = []
    else:
        line.append(n)
if line:
    data.append(' '.join(line))
print(data)
Output:
['one two three four', 'five six seven eight', 'nine ten one two']
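The index-based idea from the question can also be made to work. As a sketch: pair each group's start position with the index of the separator that ends it, and slice once per group instead of looping over every element. The function name join_groups and the sep parameter are only illustrative.

```python
def join_groups(lines, sep='='):
    """Join the items between consecutive separators into one string per group."""
    sep_indexes = [i for i, x in enumerate(lines) if x == sep]
    # Each group starts right after the previous separator (the first one at 0)
    starts = [0] + [i + 1 for i in sep_indexes]
    # The end of the list acts as a final boundary
    ends = sep_indexes + [len(lines)]
    groups = [' '.join(lines[a:b]) for a, b in zip(starts, ends)]
    # Drop empty groups, e.g. the one after a trailing separator
    return [g for g in groups if g]

lines = ['one', 'two', 'three', 'four', '=', 'five', 'six', 'seven', 'eight', '=',
         'nine', 'ten', 'one', 'two', '=']
print(join_groups(lines))
```

Each element is touched once by a slice, so this avoids the nested loop that produced the overlapping strings in the question.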

Sort text based on last 3rd character

I am using the sorted() function to sort the text based on the last character, which works perfectly:
def sort_by_last_letter(strings):
    def last_letter(s):
        return s[-1]
    return sorted(strings,key=last_letter)
print(sort_by_last_letter(["hello","from","last","letter","a"]))
Output
['a', 'from', 'hello', 'letter', 'last']
My requirement is to sort based on the third-from-last character. The problem is that a few of the words are shorter than 3 characters; in that case the sort should fall back to the next available position (the second-from-last character if present, otherwise the last). I am looking for a Pythonic way to do this.
Presently I am getting
IndexError: string index out of range
def sort_by_last_letter(strings):
    def last_letter(s):
        return s[-3]
    return sorted(strings,key=last_letter)
print(sort_by_last_letter(["hello","from","last","letter","a"]))
You can use:
return sorted(strings,key=lambda x: x[max(0,len(x)-3)])
Here we first calculate the length of the string, len(x), and subtract 3 from it. If the string is shorter than three characters this would be a negative index, but max(0,..) clamps it to 0, so we take the first character instead: the second-from-last for a two-character string, or the last for a one-character string.
This will work given every string has at least one character. This will produce:
>>> sorted(["hello","from","last","letter","a"],key=lambda x: x[max(0,len(x)-3)])
['last', 'a', 'hello', 'from', 'letter']
In case you do not care about tie-breakers (in other words if 'a' and 'abc' can be reordered), you can use a more elegant approach:
from operator import itemgetter
return sorted(strings,key=itemgetter(slice(-3,None)))
What we do here is build a slice with the last three characters (or fewer, for short strings), and then compare these substrings. This then generates:
>>> sorted(strings,key=itemgetter(slice(-3,None)))
['a', 'last', 'hello', 'from', 'letter']
Since we compare with:
['a', 'last', 'hello', 'from', 'letter']
# ['a', 'ast', 'llo', 'rom', 'ter'] (comparison key)
You can simply use the minimum of the string length and 3:
def sort_by_last_letter(strings):
    def last_letter(s):
        return s[-min(len(s), 3)]
    return sorted(strings,key=last_letter)
print(sort_by_last_letter(["hello","from","last","letter","a"]))

Assigning words a unique number identifier

Task
I am trying to assign a number identifier to words in a string.
Code
I have currently done the following:
mystr = 'who are you you are who'
str_values = mystr.split()
list_values = [str(i) for i, w in enumerate(mystr.split())]
Output:
>>> str_values
['who', 'are', 'you', 'you', 'are', 'who']
>>> list_values
['0', '1', '2', '3', '4', '5']
Query/Desired Output
mystr contains repeating words, so I would like to assign each distinct word one number rather than a different number for each occurrence, but I am not sure how to begin. I would like list_values to output something along the lines of:
['0', '1', '2', '2', '1', '0']
You could do this with help of another list -
n = []
output = [n.index(i) for i in mystr.split() if i in n or not n.append(i)]
At first n is an empty list. The list comprehension iterates over all the elements of mystr.split() and, whenever the condition is met, adds the index of the element in n to the output.
Now for the condition. It has two parts joined by or. The first part checks whether the element is already present in n. If it is, we take the index of the element. If not, the second part runs and appends the element to n. Since append() returns None, I added a not in front of it, so the condition is still satisfied and the comprehension yields the index of the newly inserted element.
Basically, the first part of the if condition prevents duplicates from being added to n and the second part does the adding.
Well we can work in two phases:
first we construct a dictionary that maps words on indices, given they do not exist yet, and
next we use the dictionary to obtain the word identifiers.
Like:
identifiers = {}
idx = 0
for word in mystr.split():
    if word not in identifiers:
        identifiers[word] = idx
        idx += 1
list_values = [identifiers[word] for word in mystr.split()]
This generates:
>>> [identifiers[word] for word in mystr.split()]
[0, 1, 2, 2, 1, 0]
If you want, you can also convert the identifiers to strings with str(..), but I do not see why you would do that:
>>> [str(identifiers[word]) for word in mystr.split()]
['0', '1', '2', '2', '1', '0']
The algorithm will usually run in O(n), since dictionary lookups and insertions take constant time on average.
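The two phases can also be folded into a single pass. As a compact sketch, dict.setdefault returns the stored identifier if the word is already known and otherwise stores and returns the default, which here is the current size of the dictionary. The function name word_ids is just for illustration.

```python
def word_ids(text):
    """Map each word to the number of distinct words seen before its first occurrence."""
    ids = {}
    # len(ids) is evaluated before the insertion, so a new word gets the next free number
    return [ids.setdefault(word, len(ids)) for word in text.split()]

print(word_ids('who are you you are who'))
```

This gives the same identifiers as the two-phase version, as integers; wrap the expression in str(..) if strings are needed.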
You need to use a dictionary to keep track of which words have already been seen:
word_map = {}
word_id_counter = 0
def word_id(word):
    global word_id_counter
    if word in word_map:
        return word_map[word]
    else:
        word_map[word] = word_id_counter
        word_id_counter += 1
        return word_map[word]
To avoid using global variables you can wrap it in a class:
class WordIdGenerator:
    def __init__(self):
        # per-instance state, so separate generators do not share one mapping
        self.word_map = {}
        self.word_id_counter = 0
    def word_id(self, word):
        if word not in self.word_map:
            self.word_map[word] = self.word_id_counter
            self.word_id_counter += 1
        return self.word_map[word]
And you can use it like this:
gen = WordIdGenerator()
[gen.word_id(w) for w in 'who are you you are who'.split()]
And the output will be:
[0, 1, 2, 2, 1, 0]

Returning Dictionary-length of words in string [duplicate]

I need to build a function that takes as input a string and returns a dictionary.
The keys are numbers and the values are lists that contain the unique words that have a number of letters equal to the keys.
For example, if the input function is as follows:
n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")
The function should return:
{2: ['is'], 3: ['and', 'see', 'the', 'way', 'you'], 4: ['them', 'they', 'what'], 5: ['treat'], 6: ['become', 'people']}
The code that I have written is as follows:
def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        words=len(word)
        sample_dictionary[words]=word
    print(sample_dictionary)
    return sample_dictionary
The function is returning a dictionary as follows:
{2: 'is', 3: 'you', 4: 'they', 5: 'treat', 6: 'become'}
The dictionary does not contain all the words with the same number of letters but is returning only the last one in the string.
Since you only want to store unique values in your lists, it actually makes more sense to use a set. Your code is almost right, you just need to make sure that you create a set if words isn't already a key in your dictionary, but that you add to the set if words is already a key in your dictionary. The following displays this:
def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        words=len(word)
        if words in sample_dictionary:
            sample_dictionary[words].add(word)
        else:
            sample_dictionary[words] = {word}
    print(sample_dictionary)
    return sample_dictionary
n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")
Output
{2: set(['is']), 3: set(['and', 'the', 'see', 'you', 'way']),
4: set(['them', 'what', 'they']), 5: set(['treat']), 6: set(['become', 'people'])}
The problem with your code is that you just put the latest word into the dictionary. Instead, you have to add that word to some collection of words that have the same length. In your example, that is a list, but a set seems to be more appropriate, assuming order is not important.
def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        if len(word) not in sample_dictionary:
            sample_dictionary[len(word)] = set()
        sample_dictionary[len(word)].add(word)
    return sample_dictionary
You can make this a bit shorter by using a collections.defaultdict(set):
import collections
def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary=collections.defaultdict(set)
    for word in my_string:
        sample_dictionary[len(word)].add(word)
    return dict(sample_dictionary)
Or use itertools.groupby, but for this you have to sort by length, first:
import itertools
def n_letter_dictionary(my_string):
    words_sorted = sorted(my_string.lower().split(), key=len)
    return {k: set(g) for k, g in itertools.groupby(words_sorted, key=len)}
Example (same result for each of the three implementations):
>>> n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")
{2: {'is'}, 3: {'way', 'the', 'you', 'see', 'and'}, 4: {'what', 'them', 'they'}, 5: {'treat'}, 6: {'become', 'people'}}
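If the exact shape from the question is needed (alphabetically sorted lists rather than sets, with keys in ascending order), one way to get there is to collect into sets first and convert at the end. This is only a sketch building on the defaultdict(set) version above; the parameter name text and the local name groups are arbitrary.

```python
from collections import defaultdict

def n_letter_dictionary(text):
    """Map word length -> sorted list of the unique lowercase words of that length."""
    groups = defaultdict(set)
    for word in text.lower().split():
        groups[len(word)].add(word)  # the set removes duplicates
    # Sort the keys and each group so the result is deterministic
    return {length: sorted(words) for length, words in sorted(groups.items())}

sentence = ("The way you see people is the way you treat them"
            " and the Way you treat them is what they become")
print(n_letter_dictionary(sentence))
```

Collecting into sets keeps the duplicate check cheap, and the sorting happens only once per group at the end.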
With sample_dictionary[words]=word you overwrite whatever you have stored under that key so far. You need a list that you can append to.
Instead of that line, use:
if words in sample_dictionary.keys():
    sample_dictionary[words].append(word)
else:
    sample_dictionary[words]=[word]
So if the key already has a value, append to it; otherwise create a new list.
You can use a defaultdict found in the collections library. You can use it to create a default type for the value portion of your dictionary, in this case a list, and just append to it based on the length of your word.
from collections import defaultdict
def n_letter_dictionary(my_string):
    my_dict = defaultdict(list)
    for word in my_string.split():
        my_dict[len(word)].append(word)
    return my_dict
You could still do this without a defaultdict, but it would be a little longer.
def n_letter_dictionary(my_string):
    my_dict = {}
    for word in my_string.split():
        word_length = len(word)
        if word_length in my_dict:
            my_dict[word_length].append(word)
        else:
            my_dict[word_length] = [word]
    return my_dict
To ensure there are no duplicates in the value lists without using set(), check for membership before appending. Be warned though: if your value lists are large and your input data is mostly unique, you'll experience a performance setback, because the membership check has to scan the list and only exits early once the value is found.
from collections import defaultdict
def n_letter_dictionary(my_string):
    my_dict = defaultdict(list)
    for word in my_string.split():
        if word not in my_dict[len(word)]:
            my_dict[len(word)].append(word)
    return my_dict
# without defaultdicts
def n_letter_dictionary(my_string):
    my_dict = {} # Init an empty dict
    for word in my_string.split(): # Split the string and iterate over it
        word_length = len(word) # Get the length, also the key
        if word_length in my_dict: # Check if the length is in the dict
            if word not in my_dict[word_length]: # If the length exists as a key, but the word doesn't exist in the value list
                my_dict[word_length].append(word) # Add the word
        else:
            my_dict[word_length] = [word] # The length/key doesn't exist, so you can safely add it without checking for its existence
    return my_dict
So if you have a high frequency of duplicates and a short list of words to scan through, this approach would be acceptable. If you had for example a list of randomly generated words with just permutations of alphabetic characters, causing the value list to bloat, scanning through them will become expensive.
The shortest solution I came up with uses a defaultdict:
from collections import defaultdict
sentence = ("The way you see people is the way you treat them"
" and the Way you treat them is what they become")
Now the algorithm:
wordsOfLength = defaultdict(list)
for word in sentence.split():
    wordsOfLength[len(word)].append(word)
Now wordsOfLength will hold the desired dictionary.
itertools.groupby is the perfect tool for this.
from itertools import groupby
def n_letter_dictionary(string):
    result = {}
    # groupby only groups adjacent items, so sort by length first
    for key, group in groupby(sorted(string.split(), key=len), key=len):
        result[key] = list(group)
    return result
print(n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become"))
# {2: ['is', 'is'], 3: ['The', 'way', 'you', 'see', 'the', 'way', 'you', 'and', 'the', 'Way', 'you'], 4: ['them', 'them', 'what', 'they'], 5: ['treat', 'treat'], 6: ['people', 'become']}
my_string="a aa bb ccc a bb".lower().split()
sample_dictionary={}
for word in my_string:
    words=len(word)
    if words not in sample_dictionary:
        sample_dictionary[words] = []
    sample_dictionary[words].append(word)
print(sample_dictionary)
