check a list of strings with list comprehension - python

I want to filter out of a list of strings every element that contains any of several specific strings, using a list comprehension. I tried the following code, but it didn't work.
ignore_paths = ['.ipynb_checkpoints', 'New', '_calibration', 'images']
A = [x for x in lof1 if ignore_paths not in x]
But when I try with a single string, it works:
A = [x for x in lof1 if 'images' not in x]
In fact, I want to define a list of forbidden paths (or strings), check whether a given path contains any of them, and ignore it if it does.
I can do that with a normal for loop and an if statement, checking one by one, but that is not fast. I am looking for a fast way because I need to check around 150k paths.
Thanks
EDIT:
To make it clear, if I had a list of files as below:
lof1 = ['User/images/20210701_151111_G1100_E53100_r121_g64_b154_WBA0_GA0_EA0_6aa87af_crop.png', 'User/images/16f48a97-7111-4f66-92cc-dc7329e7ec92.png', 'User/images/image_2022_06_21-11_41_04_AM.png']
I need to return an empty list since all of the elements contain 'images'

Do you have duplicates? If not, you can use sets, which will be very fast:
allowed = set(lof1) - set(ignore_paths)

Try:
A = [x for x in lof1 if x not in ignore_paths]
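Note that both snippets above test whether a whole path is equal to one of the ignore_paths entries, while the question asks about substrings. A minimal sketch of the substring check, assuming the goal is to drop every path that contains any forbidden string (the path 'User/data/report.csv' is made up for illustration):

```python
ignore_paths = ['.ipynb_checkpoints', 'New', '_calibration', 'images']
lof1 = [
    'User/images/image_2022_06_21-11_41_04_AM.png',
    'User/data/report.csv',  # hypothetical path, not from the question
]

# Keep a path only if none of the forbidden strings occurs inside it.
# any() stops at the first match, so most rejected paths exit early.
A = [x for x in lof1 if not any(bad in x for bad in ignore_paths)]
print(A)  # ['User/data/report.csv']
```

With the three example paths from the question, this returns an empty list, since each of them contains 'images'.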

Related

Conditionally join with previous list entry using comprehension

I am trying to fix some broken linux paths in a list I am working with.
List:
mylist = ['/root/path/path', '/cat', '/dog', '/root/path/path', '/blue', '/red']
Requirements:
If an element does not begin with '/root', join it to the element to its left.
Code so far:
mylist2 = [''.join(x) for x in mylist]
print(mylist2)
Expected output:
['/root/path/path/cat/dog', '/root/path/path/blue/red']
Actual output:
['/root/path/path', '/cat', '/dog', '/root/path/path', '/blue', '/red']
I've also tried:
mylist2 = [''.join(x) if myroot not x for mylist]
...which produces a syntax error...
Any ideas on what I am doing wrong?
This is simpler if you just use a regular loop. The problem with the list comprehension is that you don't have a uniform operation on each element of the first list that creates an element for the new list. (Think of a list comprehension as a combination of map and filter. You can map one old value to one new value, or drop an old value, but you can't combine multiple old values into a single new value.)
mylist2 = []
for path in mylist:
    if path.startswith('/root'):
        mylist2.append(path)
    else:
        mylist2[-1] += path
(This is only partially correct; it assumes the first element of mylist will actually start with /root, so that mylist2[-1] will never be used if mylist2 is empty.)
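One possible way to lift that assumption is sketched below. The function name join_broken_paths and the choice to keep a leading fragment as its own entry are assumptions made for illustration, not part of the original answer:

```python
def join_broken_paths(paths, root='/root'):
    """Glue every fragment onto the closest preceding entry that starts with root."""
    fixed = []
    for path in paths:
        # Start a new entry for a proper root path, or when there is
        # nothing yet to append to (the list began with a fragment).
        if path.startswith(root) or not fixed:
            fixed.append(path)
        else:
            fixed[-1] += path
    return fixed


mylist = ['/root/path/path', '/cat', '/dog', '/root/path/path', '/blue', '/red']
print(join_broken_paths(mylist))
# ['/root/path/path/cat/dog', '/root/path/path/blue/red']
```

If a leading fragment should be dropped or reported instead, the `not fixed` branch is the place to change.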
This is one method using list comprehension:
mylist2 = ['/root' + x for x in ''.join(mylist).split('/root') if x] # if x eliminates the empty split elements
# ['/root/path/path/cat/dog', '/root/path/path/blue/red']
Since your goal is basically to join everything and then split them by /root, this line of list comprehension does exactly that and adds /root back to each element.
But as you can see, given just the code, @chepner's answer is much clearer and easier to understand. Just because list comprehensions exist doesn't mean they should be your go-to.
Also I should note, if there's /root within any of your elements (not necessarily at the beginning), this code will also separate it because of the split, so it's not as exact as explicitly going through the loop. If you wanted to handle that scenario it becomes very ugly...:
['/root' + y for y in ''.join("_" + x if x.startswith("/root") else x for x in mylist).split("_/root") if y]
# eww

List comprehension: retrieve valid strings if not contained in a list of substrings

I need to check whether the files in a list of filenames can be copied, and I check this against a list of forbidden substrings.
Here is what I have:
exclude = ['ex','xe']
files = ['include', 'exclude']
And this is what I expect:
['include']
I already got it working with a list comprehension, like this:
[a[0] for a in [(f, any([e in f for e in exclude])) for f in files] if not a[1]]
Where I create a tuple (f, any([e in f for e in exclude])) checking if there is any correspondence on the filename to the excluding substrings.
I do this for every file in the list of files for f in files and include only those that don't exist on the excluding substrings if not a[1].
Is there a better way to do this? A more Pythonic one?
I'm looping through the file list twice, and I'm guessing there is a way to do this in one go!
Thank you!
I don't really understand your logic. Looks like you're building tuples with boolean values, then filter the False values out.
It seems to me that this also works and is simpler:
exclude = ['ex','xe']
files = ['include', 'exclude']
print([x for x in files if not any(e in x for e in exclude)])
It loops through files and, for each file, checks that no member of exclude is contained in it.
Note that you don't have to build an actual list in any. Omit the square brackets, let any perform a lazy evaluation, it's faster.
This will only loop through the file list one time:
[file for file in files if not any(item in file for item in exclude)]
It does loop through the exclude list once for every item in the file list so if you have a long exclude list the performance might take a hit.
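If the exclude list grows long, one alternative worth measuring is to combine all substrings into a single compiled regular expression. This is only a sketch of a swapped-in technique, not something the answers above proposed, and whether it is actually faster depends on the data:

```python
import re

exclude = ['ex', 'xe']
files = ['include', 'exclude']

# re.escape makes every substring match literally, even if it contains
# regex metacharacters such as '.' or '+'.
# Caveat: an empty exclude list would give an empty pattern, which
# matches every string, so guard against that case separately.
pattern = re.compile('|'.join(map(re.escape, exclude)))

allowed = [f for f in files if not pattern.search(f)]
print(allowed)  # ['include']
```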

Sorting out unique elements from a list to a set

I was writing a function to save the unique values from a list "list_accepted_car" to a set "unique_accepted_car". list_car_id is a list with the value ['12','18','3','7']. When I run the code, I get an "unhashable type: 'list'" error. Can anyone tell me what is causing it?
list_accepted_car = []  # list to store the accepted cars
unique_accepted_car = set()  # set to store the unique accepted cars
num_accepted = 2  # predefined value for the number of cars allowed to enter
def DoIOpenTheDoor(list_car_id):  # list_car_id is a list of cars allowed to enter
    if len(list_accepted_car) < num_accepted:
        if len(list_car_id) > 0:
            list_accepted_car.append(list_car_id[0:min(len(list_car_id),num_accepted-len(list_accepted_car))])
    unique_accepted_list = set(list_accepted_car)
    print(unique_accepted_list)
    return list_accepted_car
Under the assumption that list_car_id looks like: [1,2,3,4,5].
You add in list_accepted_car a sublist of list_car_id, so list_accepted_car will look like [[1,2]] i.e. a list of a list.
Then you should change
unique_accepted_list = set(list_accepted_car)
to
unique_accepted_list = set([x for y in list_accepted_car for x in y])
which will extract each element of the sublists and give you a flattened list. (There are other ways to flatten a list of lists.)
You are saving a list of lists, which can't be converted to a set. You have to flatten it first. There are many examples of how to do it (I'll supply one using itertools.chain, which I prefer to Python's nested comprehension).
Also, as a side note, I'd make this line more readable by separating to several lines:
list_accepted_car.append(list_car_id[0:min(len(list_car_id),num_accepted-len(list_accepted_car))])
You can do:
from itertools import chain
# code ...
unique_accepted_list = set(chain.from_iterable(list_accepted_car))
The best option would be to not use a list at all here, and use a set from the start.
Lists are not hashable objects, and only hashable objects can be members of sets. So, you can't have a set of lists. This instruction:
list_accepted_car.append(list_car_id[0:min(len(list_car_id),num_accepted-len(list_accepted_car))])
appends a slice of list_car_id to list_accepted_car, and a slice of a list is a list. So in effect list_accepted_car becomes a list of lists, and that's why converting it to a set:
unique_accepted_list = set(list_accepted_car)
fails. Maybe what you wanted is extend rather than append? I can't say, because I don't know what you wanted to achieve.
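A minimal sketch of the extend variant, assuming the intent was to admit cars one by one until num_accepted is reached (the function name accept_cars is hypothetical, and the slice is simplified because slicing already clamps to the list length):

```python
num_accepted = 2          # number of cars allowed to enter
list_accepted_car = []    # flat list of accepted car ids


def accept_cars(list_car_id):
    """Admit as many cars as there are free slots; return the unique ids."""
    free_slots = num_accepted - len(list_accepted_car)
    if free_slots > 0:
        # extend adds the individual ids, so the list stays flat
        # and its elements (strings) remain hashable.
        list_accepted_car.extend(list_car_id[:free_slots])
    return set(list_accepted_car)


print(accept_cars(['12', '18', '3', '7']))
```

Because the list now holds strings rather than sublists, set(list_accepted_car) no longer raises an error.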

How to find an item with a specific start string in a set

I have a set of ~10 million items which look something like this:
1234word:something
4321soup:ohnoes
9cake123:itsokay
[...]
Now I need to quickly check whether an item with a specific start is in the set.
For example
x = "4321soup"
is x+* in a_set:
print ("somthing that looks like " +x +"* is in the set!")
How do I accomplish this? I've considered using a regex, but I have no clue whether it is even possible in this scenario.
^4321soup.*$
Yes, it is possible. Try match: if the result is not None, you have it; if it is None, you don't.
Do not forget to set the m and g flags.
See demo.
http://regex101.com/r/lS5tT3/28
Use str.startswith instead of a regex if you only want to match the start of the string, especially considering that you have ~10 million items.
#!/usr/bin/python
item = "1234word:something"
print(item.startswith('1234'))
In Python, assuming your contents are in a file named "mycontentfile":
>>> with open("mycontentfile", "r") as myfile:
...     data = myfile.read()
...
>>> for item in data.split("\n"):
...     if item.startswith("4321soup"):
...         print(item.strip())
...
4321soup:ohnoes
In this case, what matters is how to iterate over the set efficiently.
Since you have to check each item until you find a match, the best way is to create a generator expression and run it only until it yields a result.
To accomplish this, I would use next().
a_set = set(['1234word:something','4321soup:ohnoes','9cake123:itsokay',]) #a huge set
prefix = '4321soup' #prefix you want to search
next((x for x in a_set if x.startswith(prefix)), False) # the generator expression needs its own parentheses because next() gets a second argument; returns the first match, or False if the generator is exhausted
Hash sets are very good for checking the existence of a complete element. In your task you need to check for the existence of a starting part, not a complete element. That's why it is better to use a tree or a sorted sequence instead of a hash mechanism (the internal implementation of Python's set).
However, according to your examples, it looks like you want to check the whole part before ':'. For that purpose you can build a set from these first parts, and then a set works well for checking existence:
items = set(x.split(':')[0] for x in a_set) # a_set can be any iterable
def is_in_the_set(x):
    return x in items
is_in_the_set("4321soup") # True
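For the general case of an arbitrary prefix, the sorted-sequence idea mentioned above could look roughly like this. It is a sketch using the standard bisect module; the function name has_prefix is made up for illustration:

```python
import bisect

a_set = {'1234word:something', '4321soup:ohnoes', '9cake123:itsokay'}

# Sort once up front; each lookup afterwards is a binary search.
sorted_items = sorted(a_set)


def has_prefix(items, prefix):
    """Return True if any string in the sorted list starts with prefix."""
    # Every string that starts with prefix sorts at or after prefix itself,
    # and such strings come first among everything >= prefix, so it is
    # enough to inspect the single element at the insertion point.
    i = bisect.bisect_left(items, prefix)
    return i < len(items) and items[i].startswith(prefix)


print(has_prefix(sorted_items, '4321soup'))  # True
print(has_prefix(sorted_items, '5555'))      # False
```

Sorting 10 million strings costs time and memory once, so this only pays off when many lookups are made against the same data.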
I'm currently thinking that the most reasonable solution would be
something like a sorted tree of dicts (key = x and value = y) and the
tree is sorted by the dicts keys. - no clue how to do that though –
Daedalus Mythos
No need for a tree of dicts ... just a single dictionary would do. If you have the key:value pairs stored in a dictionary, let's say itemdict, you can write
x = "4321soup"
if x in itemdict:
    print("something that looks like " + x + "* is in the set!")

Check if an int/str is in the list and its location. Python 3.3.2

Say that I have a list that looks something like this:
MyList = [1,2,3,4,5,"z","x","c","v","b"]
Now the user inputs "5z1b3". How would you replace each int/str with its location in the list? I'm thinking of using something like this:
for x in MyList.... if located in list, replace with letter/number with its location.
Not entirely sure how to do it though. Help would be much appreciated.
Edit: It's something I'm working on and I must use both ints and strs in the list. Also, I was wrong about the output I need. Thanks for mentioning it, avarnert. Commas between each letter/number in the output would make it work for me. Any ideas how to do it?
Use a list comprehension:
[MyList.index(c) for c in inputstring]
This'll have to scan through MyList for each entry; you could optimize that quite a bit by using a dictionary indexing from character to position; this has the added advantage we can ensure we only have strings as well:
index = {str(c): i for i, c in enumerate(MyList)}
[index[c] for c in inputstring]
If you then need a formatted string, turn the indices to strings and join the final output:
index = {str(c): str(i) for i, c in enumerate(MyList)}
','.join([index[c] for c in inputstring])
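Put together with the list and input from the question, the dictionary approach can be sketched as follows (a character that is not in MyList would raise a KeyError here):

```python
MyList = [1, 2, 3, 4, 5, "z", "x", "c", "v", "b"]
inputstring = "5z1b3"

# Map the string form of every element to its position, also as a string,
# so the int 5 in MyList matches the character '5' in the input.
index = {str(c): str(i) for i, c in enumerate(MyList)}

result = ','.join(index[c] for c in inputstring)
print(result)  # 4,5,0,9,2
```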
I would use the list.index() method. For example:
positions = [MyList.index(ch) for ch in user_input]
EDIT:
This assumes, however, that each character of the user input can be found in MyList, and that each element of MyList appears only once. Note also that the digits in the input are strings, so they will not match the ints in MyList ('5' != 5) unless one side is converted first.
