substring of a list from another list in python - python

I have a list like below
fulllist = ['item1 cost is 1', 'item2 cost is 2', 'bla bla bla', 'item3 cost is 3', 'item4 cost is 4', 'bla bla bla']
Another list with keywords
keywords = ['item1','item2','item4','item3']
I need to search fulllist for each item of the keywords list, and if the keyword is found as a substring of an element of fulllist, write that element to a new list, in the same order as the keywords list,
like this:
newlist = ['item1 cost is 1', 'item2 cost is 2', 'item4 cost is 4', 'item3 cost is 3']
I tried below piece of code:
fulllist = ['item1 cost is 1', 'item2 cost is 2', 'bla bla bla', 'item3 cost is 3', 'item4 cost is 4', 'bla bla bla']
keywords = ['item1','item2','item4','item3']
newlist = [fulllistitem for fulllistitem in fulllist for i in range(0,len(keywords)) if keywords[i] in fulllistitem]
but newlist comes out in fulllist order rather than in the order of the keywords list:
newlist = ['item1 cost is 1', 'item2 cost is 2', 'item3 cost is 3', 'item4 cost is 4']
instead of
newlist = ['item1 cost is 1', 'item2 cost is 2', 'item4 cost is 4', 'item3 cost is 3']
How do I get the list in the intended order?

You are pretty close. You can do it like this, and there is no need to use range. Putting keywords in the outer loop makes the results follow the keyword order:
newlist = [item for keyword in keywords for item in fulllist if keyword in item]
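A runnable version of the one-liner above with the question's data, for checking. The outer loop over keywords is what fixes the output order; note that a keyword contained in several items would contribute all of them, in fulllist order.

```python
fulllist = ['item1 cost is 1', 'item2 cost is 2', 'bla bla bla',
            'item3 cost is 3', 'item4 cost is 4', 'bla bla bla']
keywords = ['item1', 'item2', 'item4', 'item3']

# Outer loop over keywords sets the order; inner loop scans fulllist for matches.
newlist = [item for keyword in keywords for item in fulllist if keyword in item]
print(newlist)
# ['item1 cost is 1', 'item2 cost is 2', 'item4 cost is 4', 'item3 cost is 3']
```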
Modified answer: this version puts a 'dummy' placeholder in the result list for any keyword that matches nothing in fulllist.
result = ['dummy'] * len(keywords)
for index, keyword in enumerate(keywords):
    for item in fulllist:
        if keyword in item:
            result[index] = item
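A more compact sketch of the same placeholder idea uses the built-in next() with a default value: for each keyword it returns the first matching item, or 'dummy' if there is none. One difference from the loop above: the loop keeps the last match it sees, while this keeps the first. The extra keyword 'item9' is made up purely to show the placeholder.

```python
fulllist = ['item1 cost is 1', 'item2 cost is 2', 'bla bla bla',
            'item3 cost is 3', 'item4 cost is 4', 'bla bla bla']
# 'item9' is a hypothetical keyword with no match in fulllist.
keywords = ['item1', 'item2', 'item4', 'item3', 'item9']

# next(generator, default) returns the first item containing the keyword, else 'dummy'.
result = [next((item for item in fulllist if keyword in item), 'dummy')
          for keyword in keywords]
print(result)
# ['item1 cost is 1', 'item2 cost is 2', 'item4 cost is 4', 'item3 cost is 3', 'dummy']
```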

Alternate solution, using a regex (re.escape guards against keywords that contain regex metacharacters):
import re
join_ = "|".join(fulllist)
print([re.search(re.escape(x) + "[^|]+", join_).group() for x in keywords if x in join_])
['item1 cost is 1', 'item2 cost is 2', 'item4 cost is 4', 'item3 cost is 3']

Related

How can I return the entire dictionary?

This is my method. I am having trouble returning the entire dictionary:
def get_col(amount):
    letter = 0
    value = []
    values = {}
    for i in range(amount):
        letter = get_column_letter(i + 1)
        [value.append(row.value) for row in ws[letter]]
        values = dict(zip(letter, [value]))
        value = []
    return values
I want it to output it like this:
{'A': ['ID', 'value is 1', 'value is 2', 'value is 3', 'value is 4', 'value is 5', 'value is 6']}
{'B': ['Name', 'value is 1', 'value is 2', 'value is 3', 'value is 4', 'value is 5', 'value is 6']}
{'C': ['Math', 'value is 1', 'value is 2', 'value is 3', 'value is 4', 'value is 5', 'value is 6']}
But when the return is inside the 'for' loop, it only returns
{'A': ['ID', 'value is 1', 'value is 2', 'value is 3', 'value is 4', 'value is 5', 'value is 6']}
and when the return is outside the 'for' loop, it returns
{'C': ['Math', 'value is 1', 'value is 2', 'value is 3', 'value is 4', 'value is 5', 'value is 6']}
Any help would be appreciated. Thank you!
I am assuming you want all of the data in one dictionary:
values = dict(zip(letter, [value]))
Currently, this line overwrites the dictionary on every iteration. That is why you get only the "A" dict when you return before the for loop finishes, and only the "C" dict when you return after the loop finishes: "A" and "B" have been overwritten.
Put the return outside the for loop, and instead of
values = dict(zip(letter, [value]))
use
values[letter] = value
as this adds a new key/value pair to the same dict on each iteration.
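A self-contained sketch of the single-dictionary fix, under some assumptions: the worksheet is passed in as a parameter (the original uses a global ws), column_letter is a hypothetical stand-in for openpyxl's get_column_letter so the sketch runs without openpyxl, and fake_ws is made-up data whose columns yield objects with a .value attribute, as openpyxl cells do.

```python
from string import ascii_uppercase
from types import SimpleNamespace


def column_letter(n):
    """Convert a 1-based column index to an Excel-style letter (1 -> 'A', 27 -> 'AA').

    Hypothetical stand-in for openpyxl.utils.get_column_letter.
    """
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = ascii_uppercase[rem] + letters
    return letters


def get_col(ws, amount):
    """Return one dict mapping each column letter to that column's cell values."""
    values = {}
    for i in range(amount):
        letter = column_letter(i + 1)
        # Add a key per column instead of rebuilding the dict each time.
        values[letter] = [cell.value for cell in ws[letter]]
    return values  # after the loop, so every column is included


# Made-up worksheet: ws[letter] yields cell-like objects with a .value attribute.
fake_ws = {
    'A': [SimpleNamespace(value=v) for v in ['ID', 1, 2]],
    'B': [SimpleNamespace(value=v) for v in ['Name', 'x', 'y']],
    'C': [SimpleNamespace(value=v) for v in ['Math', 90, 85]],
}
print(get_col(fake_ws, 3))
# {'A': ['ID', 1, 2], 'B': ['Name', 'x', 'y'], 'C': ['Math', 90, 85]}
```

Building each column with a list comprehension also avoids the separate value = [] reset and the side-effect-only comprehension.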
P.S. This is my first post; I hope it helps and is understandable.
Edit: if you want a list of three dictionaries, as your desired output shows, do this:
def get_col(amount):
    letter = 0
    value = []
    values = []
    for i in range(amount):
        letter = get_column_letter(i + 1)
        [value.append(row.value) for row in ws[letter]]
        values.append(dict(zip(letter, [value])))
        value = []
    return values
Your desired output is not a single dictionary. It's a list of dictionaries.
In the for loop, you create a new dictionary on every iteration. So you return either the first one you created or the last one, depending on whether the return is inside or outside the loop.
You need to return a list of the created dictionaries:
def get_col(amount):
    letter = 0
    value = []
    values = {}
    values_list = []
    for i in range(amount):
        letter = get_column_letter(i + 1)
        [value.append(row.value) for row in ws[letter]]
        values = dict(zip(letter, [value]))
        value = []
        values_list.append(values)
    return values_list

Adding values from one list to another when they share value

I'm trying to add values from List2 if the type is the same in List1. All the data is strings within lists. This isn't the exact data I'm using, just a representation. This is my first programme so please excuse any misunderstandings.
List1 = [['Type A =', 'Value 1', 'Value 2', 'Value 3'], ['Type B =', 'Value 4', 'Value 5']]
List2 = [['Type Z =', 'Value 6', 'Value 7', 'Value 8'], ['Type A =', 'Value 9', 'Value 10', 'Value 11'], ['Type A =', 'Value 12', 'Value 13']]
Desired result:
new_list =[['Type A =', 'Value 1', 'Value 2', 'Value 3', 'Value 9', 'Value 10', 'Value 11', 'Value 12', 'Value 13'], ['Type B =', 'Value 4', 'Value 5']]
Current attempt:
newlist = []
for values in List1:
    for valuestoadd in List2:
        if values[0] == valuestoadd[0]:
            newlist = [List1 + [valuestoadd[1:]]]
        else:
            print("Types don't match")
return newlist
This would work if there weren't two 'Type A =' entries in List2; with two of them, my code creates two instances of List1. If I could add the values at a specific index of the list, that would be great, but I can work around that.
It's probably easier to use a dictionary for this:
def merge(d1, d2):
    return {k: v + d2[k] if k in d2 else v for k, v in d1.items()}
d1 = {'A': [1, 2, 3], 'B': [4, 5, 6]}
d2 = {'A': [7, 8, 9], 'C': [0]}
print(merge(d1, d2))  # {'A': [1, 2, 3, 7, 8, 9], 'B': [4, 5, 6]}
If you must use a list, it's fairly easy to temporarily convert to a dictionary and back to a list:
from collections import defaultdict
def list_to_dict(xss):
    d = defaultdict(list)
    for xs in xss:
        d[xs[0]].extend(xs[1:])
    return d
def dict_to_list(d):
    return [[k, *v] for k, v in d.items()]
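Putting the three helpers together on the question's data, as a sketch: convert both lists, merge, and convert back. Because merge only keeps the keys of its first argument, 'Type Z =' (which appears only in List2) is dropped, which matches the desired result.

```python
from collections import defaultdict


def merge(d1, d2):
    # Keep every key of d1; append d2's values when the key also appears there.
    return {k: v + d2[k] if k in d2 else v for k, v in d1.items()}


def list_to_dict(xss):
    # The first element of each sublist is the key; the rest are its values.
    d = defaultdict(list)
    for xs in xss:
        d[xs[0]].extend(xs[1:])
    return d


def dict_to_list(d):
    return [[k, *v] for k, v in d.items()]


List1 = [['Type A =', 'Value 1', 'Value 2', 'Value 3'], ['Type B =', 'Value 4', 'Value 5']]
List2 = [['Type Z =', 'Value 6', 'Value 7', 'Value 8'],
         ['Type A =', 'Value 9', 'Value 10', 'Value 11'],
         ['Type A =', 'Value 12', 'Value 13']]

new_list = dict_to_list(merge(list_to_dict(List1), list_to_dict(List2)))
print(new_list)
```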
Rather than using List1 + [valuestoadd[1:]], you should use newlist[0].append(valuestoadd[1:]), so that it never creates a new list and only appends to the old one. The [0] is needed so that it appends to the first sublist rather than to the whole list.
newlist = List1  # you're doing this already - might as well initialize the new list with this code (note: newlist and List1 are now the same list, not a copy)
for values in List1:
    for valuestoadd in List2:
        if values[0] == valuestoadd[0]:
            newlist[0].append(valuestoadd[1:])  # adds the values on to the end of the first list
        else:
            print("Types don't match")
Output:
[['Type A =', 'Value 1', 'Value 2', 'Value 3', ['Value 9', 'Value 10', 'Value 11'], ['Value 12', 'Value 13']], ['Type B =', 'Value 4', 'Value 5']]
This does, unfortunately, insert the values as nested lists. If you want them as individual values, you need to iterate through each list you're adding and append the individual values to newlist[0].
This could be achieved with another for loop, like so:
        if values[0] == valuestoadd[0]:
            for subvalues in valuestoadd[1:]:  # splits the list into subvalues
                newlist[0].append(subvalues)  # appends those subvalues
Output:
[['Type A =', 'Value 1', 'Value 2', 'Value 3', 'Value 9', 'Value 10', 'Value 11', 'Value 12', 'Value 13'], ['Type B =', 'Value 4', 'Value 5']]
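A variant of the loop above, offered as a sketch: it extends the sublist that actually matched (values) rather than always newlist[0], so it would also work if a later type such as 'Type B =' had matches, and list.extend replaces the inner append loop. It also copies List1 first, so the original list is left unchanged.

```python
List1 = [['Type A =', 'Value 1', 'Value 2', 'Value 3'], ['Type B =', 'Value 4', 'Value 5']]
List2 = [['Type Z =', 'Value 6', 'Value 7', 'Value 8'],
         ['Type A =', 'Value 9', 'Value 10', 'Value 11'],
         ['Type A =', 'Value 12', 'Value 13']]

# Copy each sublist so that List1 itself is not modified.
newlist = [sublist[:] for sublist in List1]
for values in newlist:
    for valuestoadd in List2:
        if values[0] == valuestoadd[0]:
            # extend() adds the individual values, not a nested list
            values.extend(valuestoadd[1:])
print(newlist)
```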
I agree with the other answers that it would be better to use a dictionary from the start. But if for some reason you want to stick to your current data structure, you could transform it into a dictionary and back:
type_dict = {}
for tlist in List1 + List2:
    curr_type = tlist[0]
    type_dict[curr_type] = tlist[1:] if curr_type not in type_dict else type_dict[curr_type] + tlist[1:]
new_list = [[k] + type_dict[k] for k in type_dict]
In the creation of new_list, you can take the keys from a subset of type_dict only if you do not want to include all of them.
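For example, a sketch that keeps only the types present in List1 (which drops 'Type Z =' and gives the desired result). The accumulation line here uses dict.get with an empty-list default, a small swap for the conditional expression above with the same effect.

```python
List1 = [['Type A =', 'Value 1', 'Value 2', 'Value 3'], ['Type B =', 'Value 4', 'Value 5']]
List2 = [['Type Z =', 'Value 6', 'Value 7', 'Value 8'],
         ['Type A =', 'Value 9', 'Value 10', 'Value 11'],
         ['Type A =', 'Value 12', 'Value 13']]

type_dict = {}
for tlist in List1 + List2:
    curr_type = tlist[0]
    # Start from an empty list the first time a type is seen, then concatenate.
    type_dict[curr_type] = type_dict.get(curr_type, []) + tlist[1:]

# Take the keys only from List1, in List1's order.
wanted = [tlist[0] for tlist in List1]
new_list = [[k] + type_dict[k] for k in wanted]
print(new_list)
```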

Get elements of a sublist in python based on indexes of a different list

I have two lists of lists.
I want to get the elements from second list of lists, based on a value from the first list of lists.
If I have simple lists, everything goes smoothly, but once I have lists of lists, I'm missing something at the end.
Here is the code working for two lists (N = names, and V = values):
N = ['name 1', 'name 2','name 3','name 4','name 5','name 6','name 7','name 8','name 9','name 10']
V = ['val 1', 'val 2','val 3','val 4','val 5','val 6','val 7','val 8','val 9','val 10']
bool_ls = []
NN = N
for i in NN:
    if i == 'name 5':
        i = 'y'
    else:
        i = 'n'
    bool_ls.append(i)
# GOOD INDEXES = GI
GI = [i for i, x in enumerate(bool_ls) if x == 'y']
# SELECT THE GOOD VALUES = "GV" FROM V
GV = [V[index] for index in GI]
If I define a function, it works well when applied to the two lists:
def GV(N,V,name):
    bool_ls = []
    NN = N
    for i in NN:
        if i == name:
            i = 'y'
        else:
            i = 'n'
        bool_ls.append(i)
    GI = [i for i, x in enumerate(bool_ls) if x == 'y']
    GV = [V[index] for index in GI]
    return GV
Once I try "list of list", I cannot get the similar results. My code looks like below so far:
NN = [['name 1', 'name 2','name 3'], ['name 1', 'name 2','name 3'], ['name 1', 'name 2','name 3'], ['name 1', 'name 2','name 3'], ['name 1', 'name 2','name 3'], ['name 1', 'name 2','name 3']]
VV = [['val 1', 'val 2', 'val 3'], ['val 1', 'val 2', 'val 3'], ['val 1', 'val 2', 'val 3'], ['val 1', 'val 2', 'val 3'], ['val 1', 'val 2', 'val 3']]
def GV(NN,VV,name):
    bool_ls = []
    NNN = NN
    for j in NNN:
        for i in j:
            if i == name:
                i = 'y'
            else:
                i = 'n'
            bool_ls.append(i)
    # here is where I'm lost
Help greatly appreciated! Thank you.
You can pair up the elements of both lists using zip and then filter them in a list comprehension.
For the flat lists:
def GV(N, V, name):
    return [j for i, j in zip(N, V) if i == name]
For the nested lists, you add an extra level of nesting:
def GV(NN, VV, name):
    return [j for tup in zip(NN, VV) for i, j in zip(*tup) if i == name]
In case you want a list of lists, you can move the nesting into new lists inside the parent comprehension.
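A sketch of that list-of-lists variant; GV_nested is a hypothetical name and the data is trimmed from the question. zip stops at the shorter list, so with the question's NN (six sublists) and VV (five), the sixth NN sublist would simply be ignored.

```python
def GV_nested(NN, VV, name):
    # One inner list per (names, vals) pair of sublists.
    return [[v for n, v in zip(names, vals) if n == name]
            for names, vals in zip(NN, VV)]


NN = [['name 1', 'name 2', 'name 3'], ['name 1', 'name 2', 'name 3']]
VV = [['val 1', 'val 2', 'val 3'], ['val 4', 'val 5', 'val 6']]
print(GV_nested(NN, VV, 'name 2'))  # [['val 2'], ['val 5']]
```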
There's an easier way to do what your function does, but, to answer your question, you just need two loops (one for each level of lists): the first iterates over the list of lists, and the second iterates over the inner lists and does the somewhat odd y or n thing to choose a value.
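A minimal sketch of the two-loop idea, assuming NN and VV have the same shape (the question's NN has one more sublist than VV, which would raise an IndexError here). It skips the y/n flags and uses the positions from enumerate directly to index into VV.

```python
def GV(NN, VV, name):
    result = []
    # First loop: walk the outer list, keeping each sublist's index.
    for row, names in enumerate(NN):
        # Second loop: walk the names inside that sublist.
        for col, n in enumerate(names):
            if n == name:
                result.append(VV[row][col])
    return result


NN = [['name 1', 'name 2', 'name 3'], ['name 1', 'name 2', 'name 3']]
VV = [['val 1', 'val 2', 'val 3'], ['val 4', 'val 5', 'val 6']]
print(GV(NN, VV, 'name 2'))  # ['val 2', 'val 5']
```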

How to use lower() with multiple lists in python?

I'm trying to find certain keywords in the HTML source code of multiple websites. I want my crawler to find these keywords regardless of whether they are written in uppercase or lowercase in the website's HTML source. To do this, I've tried using the .lower() method in this script:
from selenium import webdriver
import csv
def keywords():
    with open('urls.csv') as csv_file:
        csv_reader = csv.reader(csv_file)
        driver = webdriver.Chrome(executable_path=r'C:\Users\Peter\PycharmProjects\Testing\chromedriver_win32\chromedriver.exe')
        list_1 = ['keyword 1', 'keyword 2', 'keyword 3']
        list_2 = ['keyword 4', 'keyword 5', 'keyword 6']
        list_3 = ['keyword 7', 'keyword 8']
        keywords = [list_1, list_2, list_3]
        for row in csv_reader:
            driver.get(row[0])
            html = driver.page_source
            for searchstring in keywords:
                if searchstring.lower() in html.lower():
                    print(row[0], searchstring, 'found')
                else:
                    print(row[0], searchstring, 'not found')
keywords()
Error:
AttributeError: 'list' object has no attribute 'lower'
So I found out that .lower() doesn't work on lists; it only works on strings.
I've googled the error and my issue but didn't find a solution. Any suggestions on how I can solve this with my current script?
You can flatten your keywords from a list of lists of strings into a single list of strings.
Here I am also lowercasing the keywords up front.
from selenium import webdriver
import csv
def keywords():
    with open('urls.csv') as csv_file:
        csv_reader = csv.reader(csv_file)
        driver = webdriver.Chrome(executable_path=r'C:\Users\Peter\PycharmProjects\Testing\chromedriver_win32\chromedriver.exe')
        list_1 = ['keyword 1', 'keyword 2', 'keyword 3']
        list_2 = ['keyword 4', 'keyword 5', 'keyword 6']
        list_3 = ['keyword 7', 'keyword 8']
        # list() matters: a bare map object would be used up after the first URL
        keywords = list(map(str.lower, list_1 + list_2 + list_3))
        for row in csv_reader:
            driver.get(row[0])
            html = driver.page_source.lower()
            for searchstring in keywords:
                if searchstring in html:
                    print(row[0], searchstring, 'found')
                else:
                    print(row[0], searchstring, 'not found')
You can use the map function, like so
l = ['Item 1', 'ITEM 2', 'ITEM 3', 'ItEM 4']
m = map(str.lower, l)
print(list(m))
This gets you ['item 1', 'item 2', 'item 3', 'item 4']
map applies a function to every element of an iterable and returns a map object, which is itself an iterable. So you can write for searchstring in map(str.lower, keywords): directly in your loop.
EDIT: Oops, I didn't notice that you wanted to combine three lists that way. You could flatten and lowercase them with [item.lower() for sublist in keywords for item in sublist] and get the results you want.
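The matching step can be sketched without Selenium: find_keywords is a hypothetical helper that takes a page's source as a string, flattens and lowercases the keyword lists, and checks each keyword against the lowercased page. The HTML string below is made-up sample data; in the script it would be driver.page_source.

```python
def find_keywords(html, keyword_lists):
    """Return (keyword, found) pairs, matching case-insensitively."""
    # Flatten the list of lists and lowercase each keyword once.
    keywords = [item.lower() for sublist in keyword_lists for item in sublist]
    page = html.lower()
    return [(kw, kw in page) for kw in keywords]


list_1 = ['keyword 1', 'keyword 2', 'keyword 3']
list_2 = ['keyword 4', 'keyword 5', 'keyword 6']
list_3 = ['keyword 7', 'keyword 8']
html = "<html><body>Keyword 1 and KEYWORD 5 appear here</body></html>"  # sample data

for kw, found in find_keywords(html, [list_1, list_2, list_3]):
    print(kw, 'found' if found else 'not found')
```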

Append every other line from a text file? (Python)

What I'm trying to do is append every other line of my text file to one list, and the remaining lines to a separate list. E.g.
Text File 'example'
Item 1
Item 2
Item 3
Item 4
Item 5
So I want 'Item 1', 'Item 3' and 'Item 5' in a list called exampleOne and the other items in a list called exampleTwo?
I've tried for ages to try and work this out by myself by slicing and then appending in different ways, but I just can't seem to get it, if anyone could help it would be greatly appreciated!
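from itertools import zip_longest  # izip_longest in Python 2
with open("some_file.txt") as f:
    linesA, linesB = zip(*zip_longest(f, f))
is one way you could do something like this.
This basically just abuses the fact that file handles are iterators: zip_longest(f, f) pulls two lines at a time from the same handle, and zip(*...) transposes those pairs into the odd lines and the even lines. With an odd number of lines, linesB ends with a None filler, and every line keeps its trailing newline.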
What about
with open('example') as f:
    lists = [[], []]
    i = 0
    for line in f:
        lists[i].append(line.strip())
        i ^= 1
print(lists[0])  # ['Item 1', 'Item 3', 'Item 5']
print(lists[1])  # ['Item 2', 'Item 4']
Or simpler, with enumerate:
with open('example') as f:
    lists = [[], []]
    for i, line in enumerate(f):
        lists[i % 2].append(line.strip())
print(lists[0])  # ['Item 1', 'Item 3', 'Item 5']
print(lists[1])  # ['Item 2', 'Item 4']
EDIT
print(lists[0][0]) # 'Item 1'
print(lists[0][1]) # 'Item 3'
print(lists[0][2]) # 'Item 5'
print(lists[1][0]) # 'Item 2'
print(lists[1][1]) # 'Item 4'
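Since the question mentions trying slicing: once the lines are in a list, extended slicing with a step of 2 splits them directly. A sketch, where split_alternate is a hypothetical helper and the in-memory list stands in for reading the file (the commented lines show one way to get it from 'example').

```python
def split_alternate(lines):
    # [::2] takes the 1st, 3rd, 5th... lines; [1::2] takes the 2nd, 4th...
    return lines[::2], lines[1::2]


# With the real file, something like:
#     with open('example') as f:
#         lines = f.read().splitlines()
lines = ['Item 1', 'Item 2', 'Item 3', 'Item 4', 'Item 5']
exampleOne, exampleTwo = split_alternate(lines)
print(exampleOne)  # ['Item 1', 'Item 3', 'Item 5']
print(exampleTwo)  # ['Item 2', 'Item 4']
```

splitlines() also drops the newline characters, so no separate strip() is needed.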
