My issue is as follows: I want to create a program that accepts strings separated from each other by one space. The program should then prompt for a number, which is the number of words to shift the list forward by. I also want to use lists for the words as well as for the output, for practice.
Input: one two three four five six seven 3
Output: ['four', 'five', 'six', 'seven', 'one', 'two', 'three']
This is what I've come up with, using the same input as above. However, when I increase the prompt number by N, the number of strings appended to the list drops by N. Likewise, when I decrease the prompt number by N, the number of appended strings grows by N. What could be the issue here?
l_words = list(input().split())
shift = int(input()) + 1 #shifting strings' number
l = [l_words[shift - 1]]
for k in range(len(l_words)):
    if (shift+k) < len(l_words):
        l.append(l_words[shift+k])
    else:
        if (k-shift)>=0:
            l.append(l_words[k-shift])
print(l)
You can use slicing: take the part of the list from the rotation number onward and append the part before it.
inp = input().split()
shift_by = int(inp[-1])
li = inp[:-1]
print(li[shift_by:] + li[:shift_by]) # ['four', 'five', 'six', 'seven', 'one', 'two', 'three']
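The slicing idea above can be wrapped in a small function. This is only a sketch: the name rotate_left and the modulo step are assumptions added here (not part of the answer) so that a shift larger than the list length wraps around instead of returning the list unchanged.

```python
def rotate_left(words, shift):
    """Return a new list with the first `shift` items moved to the end."""
    if not words:
        return []
    # Wrap the shift so that values >= len(words) still rotate sensibly
    shift %= len(words)
    return words[shift:] + words[:shift]


words = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
print(rotate_left(words, 3))   # ['four', 'five', 'six', 'seven', 'one', 'two', 'three']
print(rotate_left(words, 10))  # 10 % 7 == 3, so the same result as above
```

Returning a new list leaves the input untouched, which keeps the function safe to call repeatedly on the same data.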
A search on SO with just [regex] gave me 249'446 hits and a search with [regex] inclusion exclusion gave me 47 hits but I guess none of the latter (maybe some of the former?) fit my case.
I am also aware, e.g. about this regex page https://www.regular-expressions.info/refquick.html,
but I guess there might be a regex concept which I am not yet familiar with
and would be grateful for hints.
Here is a minimal example of what I am trying to do with a given list of strings.
Find all items which:
have a fixed defined number of characters, i.e. length
must include all characters from a certain list (doesn't matter at what position and if multiple times)
must NOT include any characters from a certain list
Constructs like: [ei^no]{4}, ((?![no])[ei]){4} and a lot of other more complex trials didn't give the desired results.
Hence, I currently implemented this as a 3 step process with checking the length, doing a search and a match. This looks pretty cumbersome and inefficient to me.
Is there a more efficient way to do this?
Script:
import re
items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve']
count = 4
mustContain = 'ei' # all of these characters at least once
mustNotContain = 'no' # none of those chars
hits1 = []
for item in items:
    if len(item)==count:
        hits1.append(item)
print("Hits1:",hits1)
hits2 = []
for hit in hits1:
    regex = '[{}]'.format(mustContain)
    if re.search(regex,hit):
        hits2.append(hit)
print("Hits2:", hits2)
hits3 = []
for hit in hits2:
    regex = '[{}]'.format(mustNotContain)
    if not re.search(regex,hit): # keep only items without any excluded char
        hits3.append(hit)
print("Hits3:", hits3)
Result:
Hits1: ['four', 'five', 'nine']
Hits2: ['five', 'nine']
Hits3: ['five']
If you are interested in a regex approach, you can create a single dynamic pattern that looks like:
^(?=.{4}$)(?![^no\n]*[no])(?=[^e\n]*e)[^i\n]*i.*$
Explanation
^ Start of string
(?=.{4}$) Assert 4 characters
(?![^no\n]*[no]) Assert no occurrence of n or o to the right using a leading negated character class
(?=[^e\n]*e) Assert an e char to the right
[^i\n]*i Match any chars except i or a newline, and then match i
.* Match the rest of the line
$ end of string
Example
import re
items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve', 'tree']
hits = [item for item in items if re.match(r"(?=.{4}$)(?![^no\n]*[no])(?=[^e\n]*e)[^i\n]*i.*$", item)]
print(hits)
Output
['five']
Using a variation of all and a list comprehension:
items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve', 'tree']
count = 4
mustContain = ["e", "i"] # all of these characters at least once
mustNotContain = ["n", "o"] # none of those chars
hits = [
    item for item in items if
    len(item) == count and
    all([c in item for c in mustContain]) and
    all([c not in item for c in mustNotContain])
]
print(hits)
Output
['five']
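The same three conditions can also be written with set operations instead of all. This is a sketch of an alternative, not what the answer above uses; the snake_case names are chosen here for illustration.

```python
items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven',
         'eight', 'nine', 'ten', 'eleven', 'twelve', 'tree']
count = 4
must_contain = set('ei')      # every one of these characters must appear
must_not_contain = set('no')  # none of these characters may appear

hits = [
    item for item in items
    if len(item) == count
    and must_contain <= set(item)           # subset test: all required chars present
    and must_not_contain.isdisjoint(item)   # no excluded char present
]
print(hits)  # ['five']
```

The subset and isdisjoint tests state the "all of" and "none of" requirements directly, which some readers find easier to scan than two nested comprehensions.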
Apparently, the "trick" which I was missing was the "Positive lookahead" (?=regex).
I guess the regex in @Thefourthbird's solution can be shortened,
unless I overlooked something and somebody will prove me wrong.
The regex for the included characters can be generated dynamically.
The regex for the original minimal example of the question would be:
^(?=.{4}$)(?!.*[no])(?=.*e)(?=.*i)
Script: (dynamically generated regex)
import re
items = ['one', 'two', 'three', 'four', 'five', 'six',
         'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve',
         'tree', 'mean', 'mine', 'fine', 'dime', 'eire']
count = 4
mustContain = 'ei' # all of these characters at least once
mustNotContain = 'no' # none of those chars
hits = []
regex1 = '^(?=.{' + str(count) + '}$)' # limit number of chars
regex2 = '(?!.*[' + mustNotContain + '])' if mustNotContain else '' # excluded chars
regex3 = ''.join(['(?=.*{})'.format(c) for c in mustContain]) # included chars
regex = regex1 + regex2 + regex3
for item in items:
    if re.match(regex,item,re.IGNORECASE):
        hits.append(item)
print("Hits:", hits)
Result:
Hits: ['five', 'dime', 'eire']
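One caveat with building the pattern by string concatenation: a character such as . or ] in either list would be interpreted as regex syntax. A minimal sketch of the same lookahead construction with re.escape applied is shown here; the function name build_pattern and its parameter names are made up for this example.

```python
import re


def build_pattern(count, must_contain, must_not_contain):
    """Compile a lookahead-based pattern for the three conditions."""
    parts = ['^(?=.{%d}$)' % count]                      # exact length
    if must_not_contain:
        excluded = ''.join(re.escape(c) for c in must_not_contain)
        parts.append('(?!.*[' + excluded + '])')         # none of these chars
    # one positive lookahead per required character
    parts.extend('(?=.*' + re.escape(c) + ')' for c in must_contain)
    return re.compile(''.join(parts), re.IGNORECASE)


items = ['one', 'two', 'three', 'four', 'five', 'six',
         'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve',
         'tree', 'mean', 'mine', 'fine', 'dime', 'eire']
pattern = build_pattern(4, 'ei', 'no')
hits = [item for item in items if pattern.match(item)]
print("Hits:", hits)  # Hits: ['five', 'dime', 'eire']
```

Compiling once and reusing the pattern object also avoids rebuilding the regex string for every item.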
I have lists of words in python. In the list elements I have numbers written as words. For example:
list = ['man', 'ball', 'apple', 'thirty-one', 'five', 'seven', 'twelve', 'queen']
I have also the dictionary with every number written as word as the key and the corresponding digit as value. For example:
n_dict = {'zero':0, 'one':1, 'two':2, ...., 'hundred':100}
What I need to do is identify runs of, let's say, 4 or more numbers written as words that appear consecutively in the list, and convert them to digits based on the dictionary. For example, the list should become:
list = ['man', 'ball', 'apple', '31', '5', '7', '12', 'queen']
However, if there are fewer consecutive number words than the specified minimum (4 in our case), the list shall stay the same. For example:
list2 = ['bike', 'earth', 't-shirt', 'twenty-five', 'zero', 'seven', 'home', 'bottle']
list2 Shall remain as it is.
In addition, if there are multiple sequences of numbers written as words but none of them reaches the required minimum number of consecutive words, the words should not be changed to digits. For example:
list3 = ['stairs', 'tree', 'street', 'forty-two', 'nine', 'submarine', 'two', 'eighty-five']
list3 Shall remain as it is.
The sequence of numbers written as words can be anywhere at the list. At the beginning, at the end, somewhere in the middle.
What I have tried so far:
def check_consecutive(l):
    # True if the indexes in l form one unbroken run
    return sorted(l) == list(range(min(l), max(l)+1))
def replace_numbers(word_list, num_dict):
    flag = False
    intersect = list(set(word_list) & set(num_dict.keys()))
    intersect_index = [word_list.index(elem) for elem in intersect]
    flag = check_consecutive(intersect_index)
    if (len(intersect_index) > 4) & flag:
        flag = True
        for index in intersect_index:
            word_list[index] = num_dict[word_list[index]]
    return word_list, flag
I need to return the flag as well to keep track which of the lists changed.
The above code works fine, but I think it's not that efficient. My question is whether it can be implemented in a better way, e.g. using operator.itemgetter or something in a similar fashion.
For digits
from itertools import filterfalse
list_of_strings_that_are_secretly_integers = [*filterfalse(lambda x: isinstance(x, bool), (n_dict.get(i, False) for i in list_of_strings))]
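To check that the values are consecutive, the following should work for any indexable candidate
def continuous(candidate, differential=1):
    # enumerate starts at 0 while the slice starts at 1, so candidate[i] is the element before e
    return all(e == candidate[i] + differential for i, e in enumerate(candidate[1:]))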
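Putting the two pieces together, one possible way to convert only sufficiently long runs is itertools.groupby, which groups adjacent items that share a key. This is a sketch under assumptions: the function name replace_number_runs and the min_run parameter are invented here, "4 or more" is read as a run length of at least 4, digits are emitted as strings as in the expected output, and n_dict is only a small hypothetical subset of the full dictionary from the question.

```python
from itertools import groupby


def replace_number_runs(words, num_dict, min_run=4):
    """Convert runs of at least min_run adjacent number words to digit strings.

    Returns the new list and a flag telling whether anything changed.
    """
    result = []
    changed = False
    # groupby yields maximal runs of adjacent words with the same key value
    for is_number, group in groupby(words, key=lambda w: w in num_dict):
        run = list(group)
        if is_number and len(run) >= min_run:
            result.extend(str(num_dict[w]) for w in run)
            changed = True
        else:
            result.extend(run)
    return result, changed


# Hypothetical subset of the number dictionary, just enough for the examples
n_dict = {'zero': 0, 'two': 2, 'five': 5, 'seven': 7, 'nine': 9, 'twelve': 12,
          'twenty-five': 25, 'thirty-one': 31, 'forty-two': 42, 'eighty-five': 85}

words1 = ['man', 'ball', 'apple', 'thirty-one', 'five', 'seven', 'twelve', 'queen']
words3 = ['stairs', 'tree', 'street', 'forty-two', 'nine', 'submarine', 'two', 'eighty-five']
print(replace_number_runs(words1, n_dict))
print(replace_number_runs(words3, n_dict))
```

Because each run is judged on its own, several short runs in one list (as in the third example of the question) never add up to the minimum.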
I have two lists, a short one and a longer one.
list1= ['one', 'two']
list2= ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']
I need to search the long list for every word in the short list. If a match is found, it should stop searching and do something. If no match is found, it should do something else. The actual list can be quite long, so once a match is found I don't want it to keep looking. The only part I can't figure out is getting it to stop once found. Maybe my search terms are wrong. How do I get it to stop searching once a match is found, and return None if nothing is found? What's the most efficient or Pythonic way of doing this? Here is what I have (the fuzzy search is part of something else):
for name in list1:
    for dict in reversed(list2):
        if fuzz.WRatio(name, dict['Number']) > 90:
I know I can add what to do when found and then break but then I'm not sure what to do if it isn't found except put in another if but now it's starting to seem kludgy.
The pattern you described is often designed to be a function of the form def find(content, pattern) -> offset.
You iterate over the candidates and find the first one matching the pattern, which in your case is by checking if it matches any string in the second list.
When no match is found, this kind of function often returns -1. For example, the str.find method in Python returns -1 when nothing is found.
So in your case you may create a function like the following:
def find(candidates, patterns):
    for i, name in enumerate(candidates):
        for dict in reversed(patterns):
            if fuzz.WRatio(name, dict['Number']) > 90:
                return i # return the index of the first name that matches a pattern
    return -1
As far as I understand, maybe code like this is what you want.
list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']
list1_count = 0
for name1 in list1:
    for name2 in list2:
        if name1 == name2:
            list1_count = list1_count + 1
            break
if list1_count == len(list1):
    print("found")
else:
    print("not found")
The lines from list1_count = 0 to break can be replaced (perhaps more Pythonically) with:
list1_count = 0
for name1 in list1:
    if name1 in list2:
        list1_count = list1_count + 1
I'm not sure I understand what you're looking for, but here is something that finds the first value and then stops:
list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']
for l in list1:
    a = list2.index(l)
    break
print(a)
If you want to return None if you find nothing, try
list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']
a = None
for l in list1:
    try:
        a = list2.index(l) # raises ValueError when l is not in list2
        break
    except ValueError:
        continue
print(a)
The following will tell you if all of the values from list1 are in list2.
all_in = all([val in list2 for val in list1])
If all of the values from list1 are in list2, the value of all_in will be True, and if they weren't, the value of all_in will be False.
If you wanted, you could use this line directly to control your if-else logic.
if all([val in list2 for val in list1]):
    pass #do thing if match
else:
    pass #do thing if no match
Edit
If you were looking for the first match of any word in the first list, this might be closer to what you were looking for.
This will give you a True value if there is any match from the first list in the second. Again you can use this for an if statement.
any_in = any((val in list2 for val in list1))
If you need the value of the first match, or a None value if no match is found, this should work.
first_match = next((val for val in list1 if val in list2), None)
That will make use of Python's generators to stop on the very first matching case of any of the words in the first list.
Edit 2
I'm now fairly sure that the behavior you were trying to describe is a membership test nested inside a loop:
for val in list1:
    if val in list2:
        pass #do something
    else:
        pass #do something else
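For the "stop at the first hit, otherwise do something else" requirement, Python's for ... else construct avoids an extra flag: the else branch runs only when the loop finishes without hitting break. The sketch below uses plain equality in place of the fuzzy comparison from the question, and the function name search is chosen here for illustration.

```python
def search(names, pool):
    """Report the first name found in pool, or that none was found."""
    for name in names:
        if name in pool:
            result = f"found {name!r}"
            break  # stop at the first hit; the else branch is skipped
    else:
        # reached only if the loop ran to completion without break
        result = "not found"
    return result


list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']
print(search(list1, list2))        # found 'one'
print(search(['eleven'], list2))   # not found
```

With nested loops the break only leaves the inner loop, so wrapping the search in a function and using return, as in the find answer above, is often the simpler choice there.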
Say I have a list of strings such as
words = ['one', 'two', 'one', 'three', 'three']
I want to create a new list in alphabetical order like
newList = ['one', 'three', 'two']
Anyone have any solutions? I have seen suggestions that output duplicates, but I cannot figure out how to achieve this particular goal (or maybe I just can't figure out how to google well.)
Throw the contents into a set to remove duplicates and sort:
newList = sorted(set(words))
OR maybe this, using set:
newList=sorted({*words})
If the original order of the elements in words matters to you, you can try this:
from collections import OrderedDict
words = ['one', 'two', 'one', 'three', 'three']
w1 = OrderedDict()
for i in words:
    if i in w1:
        w1[i]+=1
    else:
        w1[i] = 1
print(list(w1.keys())) # unique words in order of first appearance
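A shorter variant of the order-preserving idea, offered as a sketch: since Python 3.7 plain dicts keep insertion order, so dict.fromkeys can remove duplicates without an explicit loop, and sorted can be applied afterwards if alphabetical order is wanted. The variable name unique_in_order is made up for this example.

```python
words = ['one', 'two', 'one', 'three', 'three']

# dict keys are unique and keep the order in which they were first inserted
unique_in_order = list(dict.fromkeys(words))
print(unique_in_order)          # ['one', 'two', 'three']

# sorting afterwards gives the alphabetical list asked for in the question
print(sorted(unique_in_order))  # ['one', 'three', 'two']
```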
I have a test file with the following format:
one
two
three
four
=
five
six
seven
eight
=
nine
ten
one
two
=
and I am writing Python code to create a list in which each line of the file is one item:
import sys
dump = sys.argv[1]
lines = []
with open(dump) as f:
    for line in f:
        x = line.strip()
        lines.append(x)
print(lines)
lines list:
['one', 'two', 'three', 'four', '=', 'five', 'six', 'seven', 'eight', '=', 'nine', 'ten', 'one', 'two', '=']
I then get the indexes of the equals signs in order to try to use those at a later point to make a new list, combining the strings:
equals_indexes = [i for i, x in enumerate(lines) if x == '=']
equals_indexes list:
[4, 9, 14]
I am good up until this point. Now I would like to join the strings one, two, three, four before the first index as new_list element 1. I would like to join the next group of strings between equals sign 1 and 2, and the next group of strings between equals sign 2 and 3 to produce the following:
[[one two three four], [five six seven eight], [nine ten one two]]
I have tried to do this by iterating over the list of equals indexes, then iterating over the list lines:
groups = []
for i in equals_indexes:
    sequences = ""
    for x,y in enumerate(lines):
        if x < i:
            sequences = ' '.join(lines[x:i])
            groups.append(sequences)
print(groups)
Which produces the following:
['one two three four', 'two three four', 'three four', 'four', 'one two three four = five six seven eight', 'two three four = five six seven eight', ....]
I understand why this is happening, because at each iteration of x, it is checking to see if it is less than i and if so appending each string at x to the string "sequences". I am doing this because I have a large file with huge blocks of text corresponding to one iteration of a program. The separator between iteration 1 and iteration 2 of the program is a single '=' in the line. This way I can parse the list elements after I am able to split them by equals sign.
Any help would be great!
I think this gets you what you are looking for, although there is one part that is unclear. If you want to join the strings between equals signs as each element in your final list:
with open(dump) as f:
    full_string = ' '.join([line.strip() for line in f])
# skip the empty piece that follows the final '='
my_list = [string.strip() for string in full_string.split('=') if string.strip()]
print(my_list)
['one two three four', 'five six seven eight', 'nine ten one two']
If, instead, you want sub-lists comprising each string between the equals signs, just replace my_list above with:
my_list = [string.split() for string in full_string.split('=') if string.strip()]
[['one', 'two', 'three', 'four'], ['five', 'six', 'seven', 'eight'], ['nine', 'ten', 'one', 'two']]
As a bonus, both versions use list comprehensions, which are a more Pythonic way of looping.
Here's a small IDLE example:
>>> stuff = ['a', 'b', 'c', '=', 'd', 'e', '=', 'f', 'g']
>>> "".join(stuff).split('=')
['abc', 'de', 'fg']
It joins all of the characters together (so you can skip separating them out into separate lists), and then splits that string on the = character.
Read lines until you hit a =, merge them into one list entry and add it, continue until done, then add whatever is left in the temporary list:
t = """one
two
three
four
=
five
six
seven
eight
=
nine
ten
one
two
="""
data = [] # global list
line = [] # temp list
for n in [x.strip() for x in t.splitlines()]:
    if n == "=":
        if line:
            data.append(' '.join(line))
            line = []
    else:
        line.append(n)
if line:
    data.append(' '.join(line))
print(data)
Output:
['one two three four', 'five six seven eight', 'nine ten one two']
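The same grouping can also be sketched with itertools.groupby, which collects adjacent items that share a key, here whether a line is the separator. The function name split_on_separator and its sep parameter are assumptions made for this example, and the input is the lines list from the question rather than a file.

```python
from itertools import groupby


def split_on_separator(lines, sep='='):
    """Split a list of lines into sub-lists at every separator line."""
    return [
        list(group)
        for is_sep, group in groupby(lines, key=lambda s: s == sep)
        if not is_sep  # drop the separator runs themselves
    ]


lines = ['one', 'two', 'three', 'four', '=', 'five', 'six', 'seven', 'eight',
         '=', 'nine', 'ten', 'one', 'two', '=']
blocks = split_on_separator(lines)
print(blocks)
print([' '.join(block) for block in blocks])
```

Consecutive separator lines fall into a single separator run, so they do not produce empty blocks.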