Extracting sublists of specific elements from Python lists of strings - python

I have a large list of elements
a = [['qc1l1.1',
'qc1r2.1',
'qc1r3.1',
'qc2r1.1',
'qc2r2.1',
'qt1.1',
'qc3.1',
'qc4.1',
'qc5.1',
'qc6.1',.................]
From this list I want to extract several sublists of the elements that start with the prefixes "qfg1", "qdg2", "qf3", "qd1" and so on,
such that:
list1 = ['qfg1.1', 'qfg1.2',....]
list2 = ['qfg2.1', 'qfg2.2',,,]
I tried:
list1 = []
for i in all_quads_names:
    if i in ['qfg']:
        list1.append(i)
but this gives an empty list. How can I do this without writing explicit loops, since the list is very large?

Using in (as others have suggested) is incorrect, because you want to check for a prefix, not merely whether the string is contained somewhere in the name.
What you want is startswith(), for example:
list1 = []
for name in all_quads_names:
    if name.startswith('qc1r'):
        list1.append(name)
The full solution would be something like:
prefixes = ['qfg', 'qc1r']
lists = []
for pref in prefixes:
    matches = []  # named matches to avoid shadowing the built-in list
    for name in all_quads_names:
        if name.startswith(pref):
            matches.append(name)
    lists.append(matches)
Now lists will contain all the lists you want.
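If the prefixes are known in advance, the same idea can be written more compactly. This is only a sketch, not the code from the answer above: the sample names and the variable names names_by_prefix and any_match are made up for illustration, and the comprehension still loops internally, so it is shorter rather than fundamentally faster.

```python
# Hypothetical sample data; the real list comes from the question.
all_quads_names = ['qfg1.1', 'qfg1.2', 'qfg2.1', 'qc1r2.1', 'qc1r3.1', 'qt1.1']
prefixes = ['qfg1', 'qfg2', 'qc1r']

# One list per prefix, keyed by the prefix itself.
names_by_prefix = {
    pref: [name for name in all_quads_names if name.startswith(pref)]
    for pref in prefixes
}

# str.startswith also accepts a tuple of prefixes, which gives one combined list.
any_match = [name for name in all_quads_names if name.startswith(tuple(prefixes))]

print(names_by_prefix['qfg1'])  # ['qfg1.1', 'qfg1.2']
print(any_match)
```

Keying the result by prefix means you can look up a group by name instead of remembering its position in a list of lists.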

You are looking for 'qfg' in the i-th element of all_quads_names. So, try the following:
list1 = []
for i in all_quads_names:
    if 'qfg' in i:
        list1.append(i)

Try reversing your if statement so that it looks for the string inside each list element:
if "qfg" in i:
You can refer to this question:
Searching a sequence of characters from a string in python
Several Python methods for doing this are described here:
https://stackabuse.com/python-check-if-string-contains-substring/
Edit after AminM's comment:
From your example it doesn't seem necessary to use the startswith() method, but if your substring ("qfg") can also appear later in the string and you only want the strings that start with "qfg", then you should have something like:
if i.startswith("qfg"):
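To make the difference between the three checks concrete, here is a small sketch. The name 'xqfg1.1' is a made-up value chosen because it contains 'qfg' without starting with it.

```python
name = 'xqfg1.1'  # hypothetical name: contains 'qfg' but does not start with it

is_list_member = name in ['qfg']       # list membership needs an exact match
has_substring = 'qfg' in name          # substring test matches anywhere in the string
has_prefix = name.startswith('qfg')    # prefix test only matches at the start

print(is_list_member)  # False
print(has_substring)   # True
print(has_prefix)      # False
```

The first check is the one from the question, which is why the result there was always empty.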

Try something like this:
prefixes = ['qfg1', … ]
top = [ […], […], … ]
extracted = []
for sublist in top:
    if sublist and any(prefix in sublist[0] for prefix in prefixes):
        extracted.append(sublist)

Related

Segregate the list based on condition that starts with same pattern `string`

I have the list below, which I would like to segregate so that all strings starting with the same string end up in a new list.
E.g.:
list1 = ["glibc-2.11.3/include/sys/file.h", "glibc-2.11.3/include/sys/ioctl.h", "glibc-2.11.3/lib/crtn.o", "linux-libc-headers-2.6.32/asm-generic/bitsperlong.h" , "linux-libc-headers-2.6.32/asm-generic/bitsperlong.h", "test-3.7.10/asm/posix_types.h", "test-3.7.10/dsm/posix_types.h"]
Here is my try:-
list1 = ["glibc-2.11.3/include/sys/file.h", "glibc-2.11.3/include/sys/ioctl.h", "glibc-2.11.3/lib/crtn.o", "linux-libc-headers-2.6.32/asm-generic/bitsperlong.h" , "linux-libc-headers-2.6.32/asm-generic/bitsperlong.h"]
element = list1[0].split("/")[0]
newlist = []
for i in list1:
    if i.startswith(element):
        newlist.append(i)
print(newlist)
Output: ['glibc-2.11.3/include/sys/file.h', 'glibc-2.11.3/include/sys/ioctl.h', 'glibc-2.11.3/lib/crtn.o']
I get the first set of paths that start with the same string. I need to loop over the remaining sets.
Basically, on the first iteration I expect to get all paths that start with glibc-2.11.3, on the second iteration all paths that start with linux-libc-headers-2.6.32, and so on. I then need to perform some check on each set of paths that share the same prefix. Please help!
Use a dictionary to keep track of your filepaths
list1 = ["glibc-2.11.3/include/sys/file.h", "glibc-2.11.3/include/sys/ioctl.h", "glibc-2.11.3/lib/crtn.o", "linux-libc-headers-2.6.32/asm-generic/bitsperlong.h" , "linux-libc-headers-2.6.32/asm-generic/bitsperlong.h", "test-3.7.10/asm/posix_types.h", "test-3.7.10/dsm/posix_types.h"]
directories = {}
for filepath in list1:
    key = filepath.split("/")[0]
    directories.setdefault(key, []).append(filepath)
print(directories)
Outputs:
{'glibc-2.11.3': ['glibc-2.11.3/include/sys/file.h',
'glibc-2.11.3/include/sys/ioctl.h',
'glibc-2.11.3/lib/crtn.o'],
'linux-libc-headers-2.6.32': ['linux-libc-headers-2.6.32/asm-generic/bitsperlong.h',
'linux-libc-headers-2.6.32/asm-generic/bitsperlong.h'],
'test-3.7.10': ['test-3.7.10/asm/posix_types.h',
'test-3.7.10/dsm/posix_types.h']}
list(directories.values()) would give you the list of lists you were trying to create, but instead of doing that you can iterate over directories.items(), which yields each key together with its list of paths.
dictionary.setdefault(key, []) is a compact way of saying: give me the list stored at this key, or, if there is no list there yet, create a new one, save it in the dictionary under this key, and then give me that. See the documentation for dict.setdefault.
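The behaviour of setdefault is easier to see in isolation. A minimal sketch, using one key from the example above; the variable names first and second are just for illustration.

```python
directories = {}

# Key is missing: setdefault stores the empty list under the key and returns it.
first = directories.setdefault('glibc-2.11.3', [])
first.append('glibc-2.11.3/lib/crtn.o')

# Key exists: setdefault returns the stored list and ignores the default argument.
second = directories.setdefault('glibc-2.11.3', [])

print(second is first)  # True
print(directories)      # {'glibc-2.11.3': ['glibc-2.11.3/lib/crtn.o']}
```

Because the same list object comes back every time, appending to the return value updates the dictionary directly.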

Python - How to create sublists from list of strings based on part of the string?

I saw similar questions, but unfortunately I didn't find an answer to my problem.
I have a list:
list = ['a_abc', 'a_xyz', 'a_foo', 'b_abc', 'b_xyz', 'b_foo']
I want to split this list into 3 based on the characters after the underscore _.
Desired output would be:
list_1 = ['a_abc', 'b_abc']
list_2 = ['a_xyz', 'b_xyz']
list_3 = ['a_foo', 'b_foo']
I would like to avoid something like:
for element in list:
    if 'abc' in element...
    if 'xyz' in element...
because I have over 200 strings to group this way in my use case. So the code should "recognize" the shared part of the string (after the underscore) and group the strings into sublists accordingly.
Since I didn't find a similar question, any advice is highly appreciated.
You shouldn't do this with separately named lists, because you don't know until runtime how many there will be (and even if you did know, it would mean repeated code).
Instead, you can use defaultdict; it behaves like a regular dictionary, but handles a missing key by creating a new value with the factory you specify.
In this case, defaultdict(list) means a dictionary with a list factory; when a key is missing, the object will create an empty list for that key.
from collections import defaultdict
l = ['a_abc', 'a_xyz', 'a_foo', 'b_abc', 'b_xyz', 'b_foo']
d = defaultdict(list)
for el in l:
    key = el.split("_")[1]
    # key = el[2:] # use this if the format of elements is <letter>_<other_chars>
    d[key].append(el)
print(d)
# defaultdict(<class 'list'>, {'abc': ['a_abc', 'b_abc'], 'xyz': ['a_xyz', 'b_xyz'], 'foo': ['a_foo', 'b_foo']})
print(d["abc"])
# ['a_abc', 'b_abc']
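One caveat with el.split("_")[1] is that it raises IndexError for an element without an underscore and drops everything after a second underscore. If your real data might contain such elements, a variant with str.partition is one way to cope. This is only a sketch: the irregular sample elements and the choice of None as the key for elements without an underscore are assumptions, not part of the question.

```python
from collections import defaultdict

# Hypothetical input: one element with two underscores, one with none.
items = ['a_abc', 'b_abc', 'a_x_y', 'nounderscore']

groups = defaultdict(list)
for el in items:
    # partition splits at the first underscore only and never raises.
    _, sep, suffix = el.partition('_')
    key = suffix if sep else None  # None collects elements without an underscore
    groups[key].append(el)

print(dict(groups))
# {'abc': ['a_abc', 'b_abc'], 'x_y': ['a_x_y'], None: ['nounderscore']}
```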

How to read and print a list in a specific order/format based on the content in the list for python?

I'm new to Python. Take this example list:
lst = ['<name>bob</name>', '<job>doctor</job>', '<gender>male</gender>', '<name>susan</name>', '<job>teacher</job>', '<gender>female</gender>', '<name>john</name>', '<gender>male</gender>']
There are 3 categories of name, job, and gender. I would want those 3 categories to be on the same line which would look like
<name>bob</name>, <job>doctor</job>, <gender>male</gender>
My actual list is really big with 10 categories I would want to be on the same line. I am also trying to figure out a way where if one of the categories is not in the list, it would print something like N/A to indicate that it is not in the list
for example I would want it to look like
<name>bob</name>, <job>doctor</job>, <gender>male</gender>
<name>susan</name>, <job>teacher</job>, <gender>female</gender>
<name>john</name>, N/A, <gender>male</gender>
What would be the best way to do this?
This is one way to do it. It handles a list of any length and groups the items correctly no matter how many there are, as long as they are in the correct order and each record starts with its <name> item.
Updated to convert each record to a dict, so you can test for key existence.
lst = ['<name>bob</name>', '<job>doctor</job>', '<gender>male</gender>', '<name>susan</name>', '<job>teacher</job>', '<gender>female</gender>', '<name>john</name>', '<gender>male</gender>']
newlst = []
tmplist = {}
for item in lst:
    value = item.split('>')[1].split('<')[0]
    key = item.split('<')[1].split('>')[0]
    if '<name>' in item:
        if tmplist:
            newlst.append(tmplist)
        tmplist = {}
    tmplist[key] = value
# handle the remaining items left over in the list
if tmplist:
    newlst.append(tmplist)
print(newlst)
# test for existence
for each in newlst:
    print(each.get('job', 'N/A'))
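To get the one-line-per-person output asked for in the question, the dicts can be turned back into tagged strings. A rough sketch, assuming the records have the shape produced by the loop above (hard-coded here so the snippet runs on its own); the names records, categories and format_record are made up for illustration, and the column order is assumed to be fixed.

```python
# Records in the shape produced by the grouping loop (hard-coded for this example).
records = [
    {'name': 'bob', 'job': 'doctor', 'gender': 'male'},
    {'name': 'susan', 'job': 'teacher', 'gender': 'female'},
    {'name': 'john', 'gender': 'male'},
]
categories = ['name', 'job', 'gender']  # assumed fixed column order


def format_record(record):
    # Rebuild the tag for each category, or fall back to N/A when it is missing.
    parts = []
    for cat in categories:
        if cat in record:
            parts.append('<{0}>{1}</{0}>'.format(cat, record[cat]))
        else:
            parts.append('N/A')
    return ', '.join(parts)


lines = [format_record(r) for r in records]
for line in lines:
    print(line)
# <name>bob</name>, <job>doctor</job>, <gender>male</gender>
# <name>susan</name>, <job>teacher</job>, <gender>female</gender>
# <name>john</name>, N/A, <gender>male</gender>
```

With 10 categories, only the categories list would need to grow.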

How to remove items in a list of strings based on duplicate substrings among the elements?

I have a list of files from different paths, but some of those paths contain the same file (with the same file name).
I would like to remove these duplicate files, but since they're from different paths, I can't just do set(thelist).
Minimal Example
Say that my list looks like this
thelist = ['/path1/path2/file13332', '/path11/path21/file21', 'path1232/path1112/file13332', '/path1/path2/file13339']
What is the most pythonic way to get this
deduplicatedList = ['/path1/path2/file13332', '/path11/path21/file21', '/path1/path2/file13339']
File file13332 was in the list twice. I am not concerned about which element was removed
One way is to use dictionary:
thelist = ['/path1/path2/file13332', '/path11/path21/file21', 'path1232/path1112/file13332', '/path1/path2/file13339']
deduplicatedList = list({f.split('/')[-1]: f for f in thelist}.values())
print(deduplicatedList)
['path1232/path1112/file13332', '/path11/path21/file21', '/path1/path2/file13339']
Alternatively, keep the first occurrence of each basename:
import os
s = set()
deduped = [s.add(os.path.basename(i)) or i for i in thelist if os.path.basename(i) not in s]
s contains the basenames seen so far, which guards against adding a path whose basename is already in deduped.
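If relying on a side effect inside a list comprehension feels too clever, the same first-occurrence behaviour can be written with a plain loop and dict.setdefault. This is a sketch rather than one of the original answers; first_seen and deduplicated are made-up names, and it relies on dicts preserving insertion order (guaranteed from Python 3.7).

```python
import os

thelist = ['/path1/path2/file13332', '/path11/path21/file21',
           'path1232/path1112/file13332', '/path1/path2/file13339']

# Map each basename to the first full path seen with that basename.
first_seen = {}
for path in thelist:
    # setdefault only stores the path if the basename is not already a key.
    first_seen.setdefault(os.path.basename(path), path)

deduplicated = list(first_seen.values())
print(deduplicated)
# ['/path1/path2/file13332', '/path11/path21/file21', '/path1/path2/file13339']
```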

inserting into a list but ensuring that there isnt multiple insertions in the list

I've been trying to write some code that takes data from an Excel file and adds it to a list, but only once; all later occurrences should be ignored. I've managed to get all the data I need, I just need to know how to remove the unwanted duplicates. I'm also wondering whether I should do this with a dictionary, and how it would be done if I did.
for cellObj in rows:
    Lat = str(cellObj[5].value)
    if 'S' in Lat:
        majorCity.append(str(cellObj[3].value))
        print(majorCity)
    elif majorCity == majorCity:
        majorCity.pop(str(cellObj[3].value))
You can use set(); it ignores values that have already been added.
a = set()
a.add("1")
a.add("1")
print(a)
Output:
{'1'}
set is indeed a good way to do this:
>>> my_list = [1,1,2,2]
>>> my_list_no_dups = list(set(my_list))
>>> my_list_no_dups
[1, 2]
but it will not necessarily preserve the order of the list. If you do care about the order, you can do it like this:
my_list_no_dups = []
for item in my_list:
    if item not in my_list_no_dups:
        my_list_no_dups.append(item)
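For a long list, the "item not in" test above rescans the result list on every iteration. A shorter order-preserving alternative is dict.fromkeys, sketched below. It assumes the items are hashable and relies on dicts keeping insertion order (guaranteed from Python 3.7); the sample values are made up.

```python
my_list = [1, 1, 2, 2, 3, 1]

# dict keys are unique and keep first-insertion order, so this drops later duplicates.
my_list_no_dups = list(dict.fromkeys(my_list))

print(my_list_no_dups)  # [1, 2, 3]
```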
