A simple question about regex usage in list

I have a list of lists, list_all, shown below, and I am searching each sublist for the string 'c'. The second sublist contains no 'c'. The code below returns ['c', 'c'], but I want ['c', '', 'c'] so that the result has the same length as list_all. How can I add an empty element to the result when a sublist has no match?
import re
list_all = [['a','b','c','d'],['a','b','d'],['a','b','c','d','e']]
listofresult =[]
for h in [*range(len(list_all))]:
    for item in list_all[h]:
        patern = r"(c)"
        if re.search(patern, item):
            listofresult.append(item)
        else:
            None
print(listofresult)

Try this:
import re
list_all = [['a','b','c','d'],['a','b','d'],['a','b','c','d','e']]
temp = True
listofresult =[]
for h in range(len(list_all)):
    for item in list_all[h]:
        patern = r"(c)"
        if re.search(patern, item):
            listofresult.append(item)
            temp = False
    # no match in this sublist: add a placeholder
    if temp:
        listofresult.append("")
    temp = True
print(listofresult)

That's an unusual use of regex! But if you insist, this correction might help:
import re
list_all = [['a', 'b', 'c', 'd'], ['a', 'b', 'd'], ['a', 'b', 'c', 'd', 'e']]
list_of_result = []
for h in list_all:
    result = ''
    for item in h:
        pattern = r"(c)"
        if re.search(pattern, item):
            result = item
            break
    if result:
        list_of_result.append(result)
    else:
        list_of_result.append('')
print(list_of_result)
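
The same idea can be written more compactly. The following is only a sketch, assuming that the first match in each sublist is the one wanted and that an empty string marks "no match". It uses next() with a default value in place of the flag variable:

```python
import re

list_all = [['a', 'b', 'c', 'd'], ['a', 'b', 'd'], ['a', 'b', 'c', 'd', 'e']]
pattern = re.compile(r"c")  # compiled once, outside the loops

# next() returns the first matching item of each sublist,
# or the default '' when the sublist has no match
list_of_result = [
    next((item for item in sublist if pattern.search(item)), '')
    for sublist in list_all
]
print(list_of_result)
```

Because exactly one element is produced per sublist, the result always has the same length as list_all.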

Related

How to remove elements from a list that appear less than k = 2?

I am trying to keep elements of a list that appear at least twice, and remove the elements that appear less than twice.
For example, my list can look like:
letters = ['a', 'a', 'b', 'b', 'b', 'c']
I want to get a list with the letters that appear at least twice, to get the following list:
letters_appear_twice = ['a', 'b'].
But since this is part of a bigger program, I don't know exactly what my lists look like, only that I want to keep the letters that are repeated at least twice. But for the sake of understanding, we can assume I know what the list looks like!
I have tried the following:
'''
letters = ['a', 'a', 'b', 'b', 'b', 'c']
for x in set(letters):
    if letters.count(x) > 2:
        while x in letters:
            letters.remove(x)
print(letters)
'''
But this doesn't quite work the way I want it to...
Thank you in advance for any help!

letters = ['a', 'a', 'b', 'b', 'b', 'c']
res = []
for x in set(letters):
    if letters.count(x) >= 2:
        res.append(x)
print(res)
Prints:
['b', 'a']

Using your code above, you can make a new list and append to it.
new_list = []
for x in set(letters):
    if letters.count(x) >= 2:
        new_list.append(x)
print(new_list)
Output
['b', 'a']

Easier to create a new list instead of manipulating the source list
def letters_more_or_equal_to_k(letters, k):
    result = []
    for x in set(letters):
        if letters.count(x) >= k:
            result.append(x)
    result.sort()
    return result
def main():
    letters = ['a', 'a', 'b', 'b', 'b', 'c']
    k = 2
    result = letters_more_or_equal_to_k(letters, k)
    print(result) # prints ['a', 'b']
if __name__ == "__main__":
    main()

If you don't mind shuffling the values, here's one possible solution:
from collections import Counter
letters = ['a', 'a', 'b', 'b', 'b', 'c']
c = Counter(letters)
to_remove = {x for x, i in c.items() if i < 2}
result = list(set(letters) - to_remove)
print(result)
Output:
['a', 'b']
You can always sort later.
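Counter tallies every element in a single pass, whereas letters.count(x) rescans the whole list for each unique element, so this solution tends to pay off for lists with more than ~10 unique elements.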
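
If the original order matters, a variation on the Counter approach avoids the set entirely. This is a sketch, not a benchmarked solution. The function name keep_repeated is made up for illustration, and it relies on Counter preserving first-seen order, which holds on Python 3.7 and later:

```python
from collections import Counter

def keep_repeated(letters, k=2):
    """Return the distinct items that occur at least k times, in first-seen order."""
    counts = Counter(letters)  # one pass over the list
    # Counter is a dict subclass, so items() follows insertion order (Python 3.7+)
    return [x for x, n in counts.items() if n >= k]

print(keep_repeated(['a', 'a', 'b', 'b', 'b', 'c']))
```

No sorting is needed afterwards, since the letters come out in the order they first appear.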

How to compare strings containing numbers in a list?

How do I delete rows of strings containing numbers bigger than a certain threshold from a list of lists?
My threshold is 60, since it's impossible for a song to have a duration of 4:63 (4 mins 63 secs).
List before being filtered:
[['4:63', ' Test', 'Results'], ['A', 'B', 'C'], ['D', '4:20', 'F']]
Intended List after being filtered:
[['A', 'B', 'C'], ['D', '4:20', 'F']]
So basically my question is: how do I find, inside a list of lists, the rows whose strings contain a number bigger than 60, and delete those rows?
My code:
def get_csv_as_table(a, b):
    import csv
    with open(a) as csv_file:
        file_reader = csv.reader(csv_file, delimiter=b)
        member = list(file_reader)
        for i in range(len(member)):
            for j in range(len(member)):
                if member[i][j].isdigit():
                    member[i][j] = int(member[i][j])
        print(member)
        return member
def filter_table(member):
    clean_sample_data = [row for row in member if not "" in row]
    print(clean_sample_data)
member = []
print ("Enter filename: ")
a = input()
print ("Enter the delimiter: ")
b = input()
member = get_csv_as_table(a, b)
filter_table(member)

You can use a regex to get all the numbers and then use any() to check that none of them are above 60 in a list comprehension:
import re
lst = [['4:63', ' Test', 'Results'], ['A', 'B', 'C'], ['D', '4:20', 'F']]
filtered_lst = [
    sub_lst
    for sub_lst in lst
    if not any(x > 60 for x in map(int, re.findall(r'\d+', ''.join(sub_lst))))
]
print(filtered_lst)
Output:
[['A', 'B', 'C'], ['D', '4:20', 'F']]

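One way is to let datetime.strptime be the judge:
import re
from datetime import datetime
matcher = lambda x: re.match(r"\d+:\d+", x)
def is_validtime(str_):
    # only strings that look like m:s are checked; anything else is accepted
    if matcher(str_):
        try:
            datetime.strptime(str_, "%M:%S")
        except ValueError:
            return False
    return True
l = [['4:63', ' Test', 'Results'], ['A', 'B', 'C'], ['D', '4:20', 'F']]
[i for i in l if all(is_validtime(j) for j in i)]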
Output:
[['A', 'B', 'C'], ['D', '4:20', 'F']]
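
Both answers above treat the minutes and the seconds alike. A narrower check, sketched below, looks only at the seconds part. It assumes that a duration fills a whole cell in the form minutes:seconds and that only 60 or more seconds makes a row invalid. The names DURATION and has_invalid_duration are made up for illustration:

```python
import re

# Assumption: a duration is a whole cell of the form minutes:seconds, e.g. '4:20'
DURATION = re.compile(r"\s*(\d+):(\d+)\s*")

def has_invalid_duration(row):
    """Return True if any cell looks like m:ss but has 60 or more seconds."""
    for cell in row:
        m = DURATION.fullmatch(cell)
        if m and int(m.group(2)) >= 60:
            return True
    return False

rows = [['4:63', ' Test', 'Results'], ['A', 'B', 'C'], ['D', '4:20', 'F']]
filtered = [row for row in rows if not has_invalid_duration(row)]
print(filtered)
```

With this check a long track such as '75:10' is kept, because only the seconds are compared against 60.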

How to number elements in a list of string and return a new list?

I have a list of strings. I want to create a new list in which each distinct string is replaced by a letter: the first distinct string becomes "A", the next different string becomes "B", and so on, with repeated strings getting the same letter each time.
If the string is:
['F4','A3','F4','B5','A3','K2']
Then it should give me a result of:
['A','B','A','C','B','D']
I don't know how to start the code and can only think of something like a dictionary.
dict = {}
result = []
for line in list1:
    if line not in dict:
        dict.update({line:str(chr(65+len(dict)))})
    result.append(dict.get(line))
Then I don't know how to continue. Any help will be appreciated.

You can make an iterator of ASCII upper-case letters and pull them off one at a time in a defaultdict factory. Once you have that, it's just a list comprehension. Something like:
import string
from collections import defaultdict
keys = iter(string.ascii_uppercase)
d = defaultdict(lambda: next(keys))
l = ['F4','A3','F4','B5','A3','K2']
[d[k] for k in l]
# ['A', 'B', 'A', 'C', 'B', 'D']

import string
mapping = {}
offset = 0
for item in l:
    if item in mapping:
        continue
    mapping[item] = string.ascii_uppercase[offset]
    offset += 1
[mapping.get(item) for item in l]
Output
['A', 'B', 'A', 'C', 'B', 'D']

You can create a simple class to store the running results:
import string
class L:
    def __init__(self):
        self.l = {}
    def __getitem__(self, _v):
        if (val:=self.l.get(_v)) is not None:
            return val
        self.l[_v]= (k:=string.ascii_uppercase[len(self.l)])
        return k
l = L()
vals = ['F4','A3','F4','B5','A3','K2']
result = [l[i] for i in vals]
Output:
['A', 'B', 'A', 'C', 'B', 'D']
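
The dictionary idea from the question can also be kept as a single plain loop. The sketch below wraps it in a function. The name label_strings is hypothetical, and it assumes there are at most 26 distinct strings, since it draws labels from string.ascii_uppercase:

```python
import string

def label_strings(values):
    """Map each distinct string to 'A', 'B', ... in order of first appearance."""
    labels = {}
    result = []
    for value in values:
        if value not in labels:
            # Assumption: at most 26 distinct values; a 27th raises IndexError
            labels[value] = string.ascii_uppercase[len(labels)]
        result.append(labels[value])
    return result

print(label_strings(['F4', 'A3', 'F4', 'B5', 'A3', 'K2']))
```

Using len(labels) as the index means no separate counter is needed: the next free letter is always the one at the current size of the dictionary.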

How to create a list which only adds letters that are unique to adjacent indices in python?

I have created a function which randomly generates a list of the letters "a", "b", "c", and "d". I would like to create a new list which is the same as the first list, but with any item that is the same as the previous item removed. Where I am having problems is referring to the previous letter in the list.
For example, if:
letterlist = ['a','a','a','b','b','a','b']
then the output should be:
nodupeletterlist = ['a','b','a','b']
The problem is that nodupeletterlist comes out the same as letterlist, meaning it is not removing items which are the same as the previous one, because I am referring to the previous item in letterlist incorrectly. I have tried using index and enumerate, but I am obviously using them wrong because I'm not getting the correct results. Below is my current attempt.
import random
def rdmlist(letterlist, nodupeletterlist):
    for item in range(20):
        rng = random.random()
        if rng < 0.25:
            letterlist.append("a")
        elif 0.25 <= rng and rng < 0.5:
            letterlist.append("b")
        elif 0.5 <= rng and rng < 0.75:
            letterlist.append("c")
        else:
            letterlist.append("d")
    for letter in letterlist:
        if letter != letterlist[letterlist.index(letter)-1]:
            nodupeletterlist.append(letter)
        else:
            pass
    return
letterlist1 = []
nodupeletterlist1 = []
rdmlist(letterlist1, nodupeletterlist1)
EDIT:
This is what I ended up using. I used this solution simply because I understand how it works. The answers below may provide more succinct or pythonic solutions.
for index, letter in enumerate(letterlist, start=0):
    if 0 == index:
        nodupeletterlist.append(letter)
    else:
        pass
for index, letter in enumerate(letterlist[1:], start = 1):
    if letter != letterlist[index-1]:
        nodupeletterlist.append(letter)
    else:
        pass

# padded[i] is the item before letterlist[i]; None stands in before the first item
padded = [None] + letterlist
for i, letter in enumerate(letterlist):
    if letter != padded[i]:
        nodupeletterlist.append(letter)

You can use itertools.groupby:
import itertools
nodupeletterlist = [k for k, _ in itertools.groupby(letterlist)]
Solution without using itertools, as requested in the comments:
def nodupe(letters):
    if not letters:
        return []
    r = [letters[0]]
    for ch in letters[1:]:
        if ch != r[-1]:
            r.append(ch)
    return r
nodupeletterlist = nodupe(letterlist)
A fixed version of the proposed "working solution":
def nodupe(letters):
    if not letters:
        return []
    r = [letters[0]]
    r += [l for i, l in enumerate(letters[1:]) if l != letters[i]]
    return r
nodupeletterlist = nodupe(letterlist)
You can also simplify your random generator a bit, by using random.choices:
import random
chars = 'abcd'
letterlist = random.choices(chars, k=20)
or by using random.randint:
import random
start, end = ord('a'), ord('d')
letterlist = [chr(random.randint(start, end)) for _ in range(20)]

Here's what I came up with. Using random.choices() would be better than what I have below, but it's the same idea. It doesn't involve itertools.
>>> li_1 = [random.choice("abcdefg") for i in range(20)]
>>> li_1
['c', 'e', 'e', 'g', 'b', 'd', 'b', 'g', 'd', 'c', 'e', 'g', 'e', 'c', 'd',
'e', 'e', 'f', 'd', 'd']
>>>
>>> li_2 = [li_1[i] for i in range(len(li_1))
... if not i or i and li_1[i - 1] != li_1[i]]
>>> li_2
['c', 'e', 'g', 'b', 'd', 'b', 'g', 'd', 'c', 'e', 'g', 'e', 'c',
'd', 'e', 'f', 'd']

The problem with the way that you are using letterlist.index(letter)-1 is that list.index(arg) returns the index of the first occurrence of arg in the list, in this case the letter. This means that if you have list = ["a", "b", "a"] and you run list.index("a"), it will always return 0.
A way to do what you intend to (removing consecutive repetitions of letters) would be:
nodupeletterlist.append(letterlist[0])
for idx in range(1, len(letterlist)):
    if letterlist[idx] != letterlist[idx-1]:
        nodupeletterlist.append(letterlist[idx])
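
The list.index behaviour described above is easy to check. The short demonstration below uses a made-up four-letter list and compares the positions reported by list.index with the real positions supplied by enumerate:

```python
letterlist = ['a', 'b', 'a', 'a']

# list.index returns the position of the FIRST match, regardless of
# which occurrence the loop is currently visiting
first_positions = [letterlist.index(letter) for letter in letterlist]
print(first_positions)

# enumerate yields the actual position of every element
real_positions = [i for i, _ in enumerate(letterlist)]
print(real_positions)
```

Every 'a' reports position 0, so subtracting 1 always points at the last element of the list instead of the element before the current one.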

Do this:
L1 = ['a','a','a','b','b','c','d']
L2 = []
L2.append(L1[0])
for i in range(1,len(L1)):
    if L1[i] != L1[i-1]:
        L2.append(L1[i])
The loop compares each element with the one before it and appends it only when the two differ. Note that list(set(L1)) would not work here: set() keeps only unique values, so it removes every repetition rather than just consecutive ones, and it does not preserve the order.
I hope this helps...
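
Several of the answers above compare each item with its predecessor by index. The same comparison can be sketched without indices by zipping the list with itself shifted by one position. The function name drop_consecutive_repeats is made up for illustration:

```python
def drop_consecutive_repeats(items):
    """Keep an item only when it differs from the item directly before it."""
    # items[:1] keeps the first item (or nothing, for an empty list);
    # zip pairs every item with its predecessor
    return items[:1] + [cur for prev, cur in zip(items, items[1:]) if cur != prev]

letterlist = ['a', 'a', 'a', 'b', 'b', 'a', 'b']
print(drop_consecutive_repeats(letterlist))
```

Slicing with items[:1] rather than indexing with items[0] means an empty list needs no special case.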

Obtain all subtrees in value

Given "a.b.c.d.e" I want to obtain all subtrees, efficiently, e.g. "b.c.d.e" and "c.d.e", but not "a.d.e" or "b.c.d".
Real world situation:
I have foo.bar.baz.example.com and I want all possible subdomain trees.

listed = "a.b.c.d.e".split('.')
subtrees = ['.'.join(listed[idx:]) for idx in range(len(listed))]
Given your sample data, subtrees equals ['a.b.c.d.e', 'b.c.d.e', 'c.d.e', 'd.e', 'e'].

items = data.split('.')
['.'.join(items[i:]) for i in range(0, len(items))]

def parts(s, sep):
    while True:
        yield s
        try:
            # cut the string after the next sep
            s = s[s.index(sep)+1:]
        except ValueError:
            # no `sep` left
            break
print(list(parts("a.b.c.d.e", '.')))
# ['a.b.c.d.e', 'b.c.d.e', 'c.d.e', 'd.e', 'e']

Not sure if this is what you want, but slicing the list at successive start positions yields that.
>>> x = "a.b.c.d.e"
>>> k = x.split('.')
>>> k
['a', 'b', 'c', 'd', 'e']
>>> l = []
>>> for el in range(len(k)): l.append(k[el+1:])
...
>>> l
[['b', 'c', 'd', 'e'], ['c', 'd', 'e'], ['d', 'e'], ['e'], []]
>>> [".".join(l1) for l1 in l if l1]
['b.c.d.e', 'c.d.e', 'd.e', 'e']
>>>
Of course, the above was to illustrate the process. You could combine them into one liner.
[Edit: I thought the answer is same as any here and explains it well]
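
Applied to the real-world case from the question, the split-and-join approach can be sketched as a small function. The name suffixes is made up for illustration, and the full name itself is included as the first entry:

```python
def suffixes(name, sep='.'):
    """Return every trailing subtree of a separator-delimited name."""
    parts = name.split(sep)
    # re-join the tail of the list starting at each position in turn
    return [sep.join(parts[i:]) for i in range(len(parts))]

for subtree in suffixes("foo.bar.baz.example.com"):
    print(subtree)
```

Slice the result with [1:] if the full name should be left out.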
