regex for combining length, inclusion and exclusion?


A search on SO with just [regex] gave me 249'446 hits and a search with [regex] inclusion exclusion gave me 47 hits but I guess none of the latter (maybe some of the former?) fit my case.
I am also aware, e.g. about this regex page https://www.regular-expressions.info/refquick.html,
but I guess there might be a regex concept which I am not yet familiar with
and would be grateful for hints.
Here is a minimal example of what I am trying to do with a given list of strings.
Find all items which:
have a fixed defined number of characters, i.e. length
must include all characters from a certain list (doesn't matter at what position and if multiple times)
must NOT include any characters from a certain list
Constructs like: [ei^no]{4}, ((?![no])[ei]){4} and a lot of other more complex trials didn't give the desired results.
Hence, I currently implemented this as a 3 step process with checking the length, doing a search and a match. This looks pretty cumbersome and inefficient to me.
Is there a more efficient way to do this?
Script:
import re
items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve']
count = 4
mustContain = 'ei'     # all of these characters at least once
mustNotContain = 'no'  # none of those chars
# Step 1: keep only items of the required length
hits1 = []
for item in items:
    if len(item) == count:
        hits1.append(item)
print("Hits1:", hits1)
# Step 2: keep items containing a required character
# (note: [ei] matches as soon as ANY one of the characters is present)
hits2 = []
for hit in hits1:
    regex = '[{}]'.format(mustContain)
    if re.search(regex, hit):
        hits2.append(hit)
print("Hits2:", hits2)
# Step 3: drop items containing any excluded character
hits3 = []
for hit in hits2:
    regex = '[{}]'.format(mustNotContain)
    if not re.search(regex, hit):
        hits3.append(hit)
print("Hits3:", hits3)
Result:
Hits1: ['four', 'five', 'nine']
Hits2: ['five', 'nine']
Hits3: ['five']

If you are interested in a regex approach, you can create a single dynamic pattern that looks like:
^(?=.{4}$)(?![^no\n]*[no])(?=[^e\n]*e)[^i\n]*i.*$
Explanation
^ Start of string
(?=.{4}$) Assert 4 characters
(?![^no\n]*[no]) Assert no occurrence of n or o to the right using a leading negated character class
(?=[^e\n]*e) Assert an e char to the right
[^i\n]*i Match any char except i and then match i
.* Match the rest of the line
$ end of string
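To see what each part of the pattern contributes, here is a minimal sketch that runs the individual assertions separately against a few words. The checks dictionary and the explain helper are names made up for this illustration; they are not part of the pattern itself.

```python
import re

# Each piece of the combined pattern, tried on its own from the start of the word.
checks = {
    "exactly 4 chars": r"(?=.{4}$)",
    "no n or o": r"(?![^no\n]*[no])",
    "contains e": r"(?=[^e\n]*e)",
    "contains i": r"[^i\n]*i",
}

def explain(word):
    """Hypothetical helper: report which individual checks pass for `word`."""
    return {label: bool(re.match(pattern, word)) for label, pattern in checks.items()}

for word in ("five", "nine", "four"):
    print(word, explain(word))
```

Only a word for which every check passes is matched by the combined pattern, because all the lookaheads are anchored at the same starting position.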
Example
import re
items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve', 'tree']
hits = [item for item in items if re.match(r"(?=.{4}$)(?![^no\n]*[no])(?=[^e\n]*e)[^i\n]*i.*$", item)]
print(hits)
Output
['five']
Using a variation of all and a list comprehension:
items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve', 'tree']
count = 4
mustContain = ["e", "i"] # all of these characters at least once
mustNotContain = ["n", "o"] # none of those chars
hits = [
    item for item in items
    if len(item) == count
    and all(c in item for c in mustContain)
    and all(c not in item for c in mustNotContain)
]
print(hits)
Output
['five']
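The same three conditions can probably also be written with set operations instead of all(). This is only a sketch of that variation; the snake_case names must_contain and must_not_contain are my own renaming of the variables above.

```python
items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven',
         'eight', 'nine', 'ten', 'eleven', 'twelve', 'tree']
count = 4
must_contain = set("ei")      # all of these characters at least once
must_not_contain = set("no")  # none of these characters

hits = [
    item for item in items
    if len(item) == count
    and must_contain <= set(item)           # subset test: every required char is present
    and must_not_contain.isdisjoint(item)   # no excluded char is present
]
print(hits)
```

For the sample list this gives the same single hit as the all() version.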

Apparently, the "trick" which I was missing was the "Positive lookahead" (?=regex).
I guess the regex in @Thefourthbird's solution can be shortened,
unless I overlooked something and somebody will prove me wrong.
The regex for the included characters can be generated dynamically.
The regex for the original minimal example of the question would be:
^(?=.{4}$)(?!.*[no])(?=.*e)(?=.*i)
Script: (dynamically generated regex)
import re
items = ['one', 'two', 'three', 'four', 'five', 'six',
'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve',
'tree', 'mean', 'mine', 'fine', 'dime', 'eire']
count = 4
mustContain = 'ei' # all of these characters at least once
mustNotContain = 'no' # none of those chars
hits = []
regex1 = '^(?=.{' + str(count) + '}$)' # limit number of chars
regex2 = '(?!.*[' + mustNotContain + '])' if mustNotContain else '' # excluded chars
regex3 = ''.join(['(?=.*{})'.format(c) for c in mustContain]) # included chars
regex = regex1 + regex2 + regex3
for item in items:
    if re.match(regex, item, re.IGNORECASE):
        hits.append(item)
print("Hits:", hits)
Result:
Hits: ['five', 'dime', 'eire']
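One caveat with building the pattern by plain string concatenation: characters such as ], ^, - or . would be read as regex syntax rather than as literal characters. A possible way around that, sketched here under the assumption that every character should be taken literally, is to pass each one through re.escape. The build_pattern function is a hypothetical helper, not something from the script above.

```python
import re

def build_pattern(count, must_contain, must_not_contain):
    """Hypothetical helper: compile the three constraints into one pattern."""
    parts = ['(?=.{%d}$)' % count]                    # exact length
    if must_not_contain:
        # escape so that ']', '^' or '-' stay literal inside the character class
        excluded = ''.join(re.escape(c) for c in must_not_contain)
        parts.append('(?!.*[' + excluded + '])')      # excluded chars
    # one positive lookahead per required char
    parts.extend('(?=.*' + re.escape(c) + ')' for c in must_contain)
    return re.compile(''.join(parts), re.IGNORECASE)

items = ['one', 'two', 'three', 'four', 'five', 'six',
         'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve',
         'tree', 'mean', 'mine', 'fine', 'dime', 'eire']

pattern = build_pattern(4, 'ei', 'no')
hits = [item for item in items if pattern.match(item)]
print("Hits:", hits)
```

Compiling once and reusing the pattern object also avoids rebuilding the regex string inside the loop.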

Related

How to continue for loop until it meets condition? [closed]

I want to keep looping until a condition is met. In this case I want to continue until List_list looks like
["one","one","two","two","three","three","four","four","five","five","six","seven","eight","nine","ten"]
lst = ["one","two","three","four","five","six","seven","eight","nine","ten"]
List_list = list()
for rn in lst:
    List_list.append(rn)
    if 15 == len(List_list):
        break

Ask #2:
Solution to repeat first 5 items, then single instance of next 5 items
lst =["one","two","three","four","five","six","seven","eight","nine","ten"]
List_list = []
for i in range(10):
    List_list.append(lst[i])
    if i < 5:
        List_list.append(lst[i])
print(List_list)
The output of this will be:
['one', 'one', 'two', 'two', 'three', 'three', 'four', 'four', 'five', 'five', 'six', 'seven', 'eight', 'nine', 'ten']
If you are looking for a single line answer using list comprehension, then you can use this.
List_list = [y for x in lst[:5] for y in [x,x]] + [x for x in lst[5:]]
print (List_list)
Output is the same:
['one', 'one', 'two', 'two', 'three', 'three', 'four', 'four', 'five', 'five', 'six', 'seven', 'eight', 'nine', 'ten']
Ask #1:
Solution for the earlier question: add 15 items to a list, i.e. all 10 items from the original list plus the first 5 items from the original list
You can do something as simple as this:
List_lst = lst + lst[:5]
print (List_lst)
If you still insist on using a for loop and you want 15 items, then do this and it will give you same output.
List_list = list()
for i in range(15):
    List_list.append(lst[i%10])
print(List_list)
A list comprehension version of this will be:
List_list = [lst[i%10] for i in range(15)]
print (List_list)
If you want to fix your code with a while loop, see the details below.
Convert the for loop to while True:. Start iterating using a counter i and check for mod of 10 to get the position to be inserted.
lst = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]
List_list = list()
i = 0
while True:
    List_list.append(lst[i%10])
    i += 1
    if len(List_list) == 15:
        break
print(List_list)
This will result in
["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "one", "two", "three", "four", "five"]

This seems like a simple modulo 10 loop...
lst =["01.one","02.two","03.three","04.four","05.five","06.six","07.seven","08.eight","09.nine","10.ten"]
[w[3:] for w in sorted([lst[n%10] for n in range(15)])]
output
['one', 'one', 'two', 'two', 'three', 'three', 'four', 'four', 'five', 'five', 'six', 'seven', 'eight', 'nine', 'ten']

["one","two","three","four","five","six","seven","eight","nine","ten","one","two","three","four","five"]
lst =["one","two","three","four","five","six","seven","eight","nine","ten"]
List_list = []
length = 0
# Respect the looping statement
while length < 15:
    List_list.append(lst[length%10])
    length += 1
print(List_list)
#Output ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'one', 'two', 'three', 'four', 'five']

EDIT 2: Complemented the answer for the updated question.
Using itertools from Python Standard Library, you can do it in three steps (steps 1 and 2 can be combined):
Use the cycle function to create an infinite iterator of the original list.
Use the islice function to get the first 15 elements from the infinite iterator.
Sort items in the resultant list by the position in the original list.
from itertools import cycle, islice
lst = ["one","two","three","four","five","six","seven","eight","nine","ten"]
infinite_lst = cycle(lst)
List_list = list(islice(infinite_lst, 15))
List_list.sort(key=lst.index)
print(List_list)
And here you have:
['one', 'one', 'two', 'two', 'three', 'three', 'four', 'four', 'five', 'five', 'six', 'seven', 'eight', 'nine', 'ten']
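If the sort step feels like a detour, the same result can probably be computed directly by working out how many copies of each item are needed. This is only a sketch; the spread function and its parameter names are hypothetical and not part of itertools.

```python
def spread(items, total):
    """Hypothetical helper: return `total` elements taken cyclically from
    `items`, with repeated elements placed next to each other."""
    # every item appears `full_rounds` times; the first `extra` items once more
    full_rounds, extra = divmod(total, len(items))
    return [
        item
        for position, item in enumerate(items)
        for _ in range(full_rounds + (position < extra))
    ]

lst = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]
List_list = spread(lst, 15)
print(List_list)
```

With 15 requested elements and 10 items, divmod gives one full round plus 5 extra, so the first five items are doubled and the rest appear once.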

Use itertools.cycle:
from itertools import cycle
lst = {
    "one": 1,
    "two": 2,
    "three": 3,
    "four": 4,
    "five": 5,
    "six": 6,
    "seven": 7,
    "eight": 8,
    "nine": 9,
    "ten": 10
}
def get_lst(times):
    output = []
    for item in cycle(lst):
        output.append(item)
        if len(output) >= times:
            break
    return sorted(output, key=lambda i: lst[i])
print(get_lst(10))
-> ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten']
print(get_lst(11))
-> ['one', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten']

I agree that the second snippet in this answer is not more complicated, and it is much easier to adapt to other code. The best programmers write code that can be reused in as many circumstances as possible.

How to filter list items based on regex in python?

I have two list items and I want to generate a list based on non-matching items. Here is what I am trying to do:
from __future__ import print_function
import os
mainline_list = ['one', 'two', 'three', 'four']
non_main_list = ['two', 'seven', 'six', 'four', 'four-3.1', 'new_three']
itemized_list = [item for item in non_main_list if item not in mainline_list]
print(itemized_list)
What it returns is
['seven', 'six', 'four-3.1', 'new_three']
while what I want is:
['seven', 'six']

Regex is not necessary, you can use all() builtin function:
mainline_list = ['one', 'two', 'three', 'four']
non_main_list = ['two', 'seven', 'six', 'four', 'four-3.1', 'new_three']
print([item for item in non_main_list if all(i not in item for i in mainline_list)])
Prints:
['seven', 'six']
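If you do want a regex, as the title asks, one possible sketch is to join the mainline names into a single alternation and drop every item in which it matches anywhere. The use of re.escape here is my own precaution in case a name contains regex metacharacters.

```python
import re

mainline_list = ['one', 'two', 'three', 'four']
non_main_list = ['two', 'seven', 'six', 'four', 'four-3.1', 'new_three']

# one|two|three|four, with each name escaped so it is matched literally
pattern = re.compile('|'.join(re.escape(word) for word in mainline_list))

# keep only items that contain none of the mainline names
itemized_list = [item for item in non_main_list if not pattern.search(item)]
print(itemized_list)
```

For this data it prints the same two items as the all() version.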

Search list with another list but stop on first match

I have two lists, a short one and a longer one.
list1= ['one', 'two']
list2= ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']
I need to search the long list for every word in the short list. If it finds a match, stop searching and do something. If it doesn't find it, do something else. The actual list can be quite long so if it finds it I don't want it to keep looking. The only part I can't figure out is getting it to stop once found. Maybe my search terms are wrong. How do I get it to stop search once found, return None if not found? What's the most efficient or pythonic way of doing this? Here is what I have (the fuzzy search is part of something else):
for name in list1:
    for dict in reversed(list2):
        if fuzz.WRatio(name, dict['Number']) > 90:
I know I can add what to do when found and then break but then I'm not sure what to do if it isn't found except put in another if but now it's starting to seem kludgy.

The pattern you described is often designed to be a function of the form def find(content, pattern) -> offset.
You iterate over the candidates and find the first one matching the pattern, which in your case is by checking if it matches any string in the second list.
When no match is found, this kind of function often returns -1; for example, the str.find method in Python returns -1 when nothing is found.
So in your case you may create a function like the following:
# assumes `fuzz` is already imported, as in the question
def find(candidates, patterns):
    for i, name in enumerate(candidates):
        for entry in reversed(patterns):
            if fuzz.WRatio(name, entry['Number']) > 90:
                return i  # index of the first name that matches a pattern
    return -1  # no candidate matched
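To try this shape of function without the fuzzy-matching library, here is a self-contained sketch. Note the substitution: similarity is a stand-in built on difflib.SequenceMatcher from the standard library, not fuzz.WRatio, and its scores will differ from the real thing. The records list and the threshold parameter are also made up for the example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Stand-in for fuzz.WRatio (an assumption for this demo): 0-100 score
    return SequenceMatcher(None, a, b).ratio() * 100

def find(candidates, patterns, threshold=90):
    for i, name in enumerate(candidates):
        for entry in reversed(patterns):
            if similarity(name, entry['Number']) > threshold:
                return i  # stop at the first candidate that matches
    return -1             # nothing matched

# hypothetical data in the dict-with-'Number' shape used in the question
records = [{'Number': 'ten'}, {'Number': 'seven'}, {'Number': 'two'}]

print(find(['one', 'two'], records))
print(find(['zero'], records))
```

Because the function returns as soon as a match is found, neither loop continues past the first hit, and the caller can branch on whether the result is -1.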

As far as I understand, maybe code like this is what you want.
list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']
list1_count = 0
for name1 in list1:
    for name2 in list2:
        if name1 == name2:
            list1_count = list1_count + 1
            break
if list1_count == len(list1):
    print("found")
else:
    print("not found")
Lines from list1_count = 0 to break can be (maybe more Pythonically) replaced with:
list1_count = 0
for name1 in list1:
    if name1 in list2:
        list1_count = list1_count + 1

I don't know if I understand what you're looking for, but here is something that finds the first value and stops:
list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']
for l in list1:
    a = list2.index(l)
    break
print(a)
If you want to return None if you find nothing, try
list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']
try:
    for l in list1:
        a = list2.index(l)
        break
except ValueError:  # list.index raises ValueError when the value is missing
    a = None
print(a)

The following will tell you if all of the values from list1 are in list2.
all_in = all([val in list2 for val in list1])
If all of the values from list1 are in list2, the value of all_in will be True, and if they weren't, the value of all_in will be False.
If you wanted, you could use this line directly to control your if-else logic.
if all([val in list2 for val in list1]):
    pass  # do thing if match
else:
    pass  # do thing if no match
Edit
If you were looking for the first match of any word in the first list, this might be closer to what you were looking for.
This will give you a True value if there is any match from the first list in the second. Again you can use this for an if statement.
any_in = any((val in list2 for val in list1))
If you need the value of the first match, or a None value if no match is found, this should work.
first_match = next((val for val in list1 if val in list2), None)
That will make use of Python's generators to stop on the very first matching case of any of the words in the first list.
Edit 2
I'm now fairly sure that the behavior you were trying to describe is nesting the loops.
for val in list1:
    if val in list2:
        pass  # do something
    else:
        pass  # do something else
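Another option for the "stop on first match, otherwise do something else" requirement is Python's for ... else construct, where the else branch runs only if the loop was not ended by break. A minimal sketch follows; the search function name and the choice to return the matched word are my own for illustration.

```python
list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']

def search(needles, haystack):
    for name in needles:
        if name in haystack:
            result = name   # "do something" with the first match
            break           # stop looking
    else:
        result = None       # runs only when the loop finished without break
    return result

print(search(list1, list2))
print(search(['zero'], list2))
```

This avoids the extra flag variable or second if that made the original attempt feel kludgy.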

Printing numbers from 1-100 as words in Python 3

List_of_numbers1to19 = ['one', 'two', 'three', 'four', 'five', 'six', 'seven',
'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen',
'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen',
'nineteen']
List_of_numbers1to9 = List_of_numbers1to19[0:9]
List_of_numberstens = ['twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy',
                       'eighty', 'ninety']
for i in List_of_numbers1to19:
    print(i)
list_of_numbers21to99 = []
count = 19
tens_count = 0
for j in List_of_numberstens:
    for k in List_of_numbers1to9:
        if tens_count%10 == 0:
            #should print an iteration of List_of_numberstens
            tens_count +=1
        tens_count +=1
        print(j, k)
As you can see, this is getting messy :P So sorry for that.
Basically I'm trying to print them using three different for-loops with a different index. I have tried slicing the list and indexing the list, but for the numbers that are multiples of 10 I keep getting the full list of List_of_numberstens as output.
I think it's clear what I'm trying to do here.
Thanks in advance for your help!

I know you already accepted an answer, but you particularly mentioned nested loops - which it doesn't use - and you're missing what's great about Python's iteration and not needing to do that kind of i//10-2 and print(j,k) stuff to work out indexes into lists.
Python's for loop iteration runs over the items in the list directly and you can just print them, so I answer:
digits = ['one', 'two', 'three', 'four', 'five',
'six', 'seven', 'eight', 'nine']
teens = ['ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen',
'sixteen', 'seventeen', 'eighteen', 'nineteen']
tens = ['twenty', 'thirty', 'forty', 'fifty',
        'sixty', 'seventy', 'eighty', 'ninety']
for word in digits + teens:
    print(word)
for tens_word in tens:
    print(tens_word)  # e.g. twenty
    for digits_word in digits:
        print(tens_word, digits_word)  # e.g. twenty one
print("one hundred")

I think you're overcomplicating the 20-100 case. From 20-100, numbers are very regular. (i.e. they come in the form <tens_place> <ones_place>).
Using just one loop instead of nested loops makes the code simpler to follow. Now we just need to figure out what the tens place is, and what the ones place is.
The tens place can be easily found by using integer division by 10. (we subtract 2 since the list starts with twenty).
The ones place can similarly be found by using the modulo operator by 10.
(we subtract 1 since the list starts with 1 and not 0).
Finally we just take care of the case of the ones place being 0 separately by using an if statement (and just not print any ones place value).
List_of_numbers1to19 = ['one', 'two', 'three', 'four', 'five', 'six', 'seven',
'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen',
'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen',
'nineteen']
List_of_numberstens = ['twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy',
                       'eighty', 'ninety']
for i in range(19):
    print(List_of_numbers1to19[i])
for i in range(20, 100):
    if i%10 == 0:  # if multiple of ten only print tens place
        print(List_of_numberstens[i//10-2])  # 20//10-2 = 0, 30//10-2 = 1, ...
    else:  # if not, print tens and ones place
        print(List_of_numberstens[i//10-2] + ' ' + List_of_numbers1to19[i%10-1])
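The same index arithmetic can be wrapped in a function that returns the words for a single number, which makes it easy to spot-check individual values. This is only a sketch; number_to_words, ONES and TENS are hypothetical names, and the explicit handling of 100 is an addition so that the whole 1-100 range is covered.

```python
ONES = ['one', 'two', 'three', 'four', 'five', 'six', 'seven',
        'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen',
        'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen',
        'nineteen']
TENS = ['twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy',
        'eighty', 'ninety']

def number_to_words(n):
    """Hypothetical helper: words for an integer from 1 to 100."""
    if not 1 <= n <= 100:
        raise ValueError("only 1..100 are supported")
    if n == 100:
        return 'one hundred'
    if n < 20:
        return ONES[n - 1]          # list starts at 'one', so shift by 1
    tens, ones = divmod(n, 10)      # e.g. 47 -> (4, 7)
    word = TENS[tens - 2]           # list starts at 'twenty', so shift by 2
    return word if ones == 0 else word + ' ' + ONES[ones - 1]

for i in range(1, 101):
    print(number_to_words(i))
```

Keeping the lookup in a function separates the conversion from the printing loop.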

Python - Unique Lists [closed]

I have a list of lists. I want to get all unique lists based on just the first three elements. If there are duplicates, then it should just return the last item. So for instance based on this
[['one', 'two', 'three', 'teennn'], ['five', 'five', 'five', 'five'],
['seven', 'nine', 'ten', 'eleven'], ['one', 'two', 'three', 'four']]
I want to return this
[['five', 'five', 'five', 'five'],
['seven', 'nine', 'ten', 'eleven'], ['one', 'two', 'three', 'four']]

lst = [['one', 'two', 'three', 'teennn'],
       ['five', 'five', 'five', 'five'],
       ['seven', 'nine', 'ten', 'eleven'],
       ['one', 'two', 'three', 'four']]
output = []
seen = set()
lst.reverse()
for item in lst:
    key = tuple(item[:3])  # lists are unhashable, so use a tuple as the set key
    if key not in seen:
        output.append(item)
        seen.add(key)
output.reverse()
This ensures that the first three items are always unique. Starting from the end of your list lst, using reverse, ensures that the last appearance of each starting set is included.

If order isn't important, then you can use a dict:
data = [['one', 'two', 'three', 'teennn'], ['five', 'five', 'five', 'five'], ['seven', 'nine', 'ten', 'eleven'], ['one', 'two', 'three', 'four']]
new = {tuple(el[:3]): el for el in data}.values()
# [['one', 'two', 'three', 'four'], ['seven', 'nine', 'ten', 'eleven'], ['five', 'five', 'five', 'five']]
Or, if you really wanted to maintain order, then something like:
new = [data[idx] for idx in sorted({tuple(el[:3]): idx for idx, el in enumerate(data)}.values())]
