Combinations in Lists of List - python

Given the following list, which mixes strings and one nested list, ['boundari', 'special', ['forest', 'arb'], 'abod'], I would like to obtain the following combinations:
[['boundari', 'special', 'forest', 'abod'], ['boundari', 'special', 'arb', 'abod']]
The closest I have come is the following call to itertools.product, which only works after removing the last item 'abod' (which I need to keep):
print([list(p) for p in product([toks[:2]], *toks[2:])])
[[['boundari', 'special'], 'forest'], [['boundari', 'special'], 'arb']]
However, that is not the combination I need, which is:
[['boundari', 'special', 'forest', 'abod'], ['boundari', 'special', 'arb', 'abod']]

You can do something like this:
arr = ['boundari', 'special', ['forest', 'arb'], 'abod']
def get_combinations(arr):
    n = len(arr)
    def _get_combinations(so_far, idx):
        if idx >= n:
            yield so_far[:]
            return
        if isinstance(arr[idx], list):
            for val in arr[idx]:
                so_far.append(val)
                yield from _get_combinations(so_far, idx + 1)
                so_far.pop()
        else:
            so_far.append(arr[idx])
            yield from _get_combinations(so_far, idx + 1)
            so_far.pop()
    yield from _get_combinations([], 0)
expected_ans = [
    ['boundari', 'special', 'forest', 'abod'],
    ['boundari', 'special', 'arb', 'abod'],
]
assert list(get_combinations(arr)) == expected_ans
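Since the question already experiments with itertools.product, here is a shorter sketch along those lines. It assumes that every non-list item can be treated as a one-element list of choices; the function name get_combinations_product is made up for this example.

```python
from itertools import product

def get_combinations_product(items):
    # Wrap each plain item in a one-element list so that product()
    # treats it as a fixed choice; nested lists are used as they are.
    pools = [item if isinstance(item, list) else [item] for item in items]
    return [list(combo) for combo in product(*pools)]

arr = ['boundari', 'special', ['forest', 'arb'], 'abod']
print(get_combinations_product(arr))
```

Unlike the loop-based approaches, this keeps the original position of every item and also handles more than one nested list.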

Another solution uses only simple loops. It applies when the order of the items inside each resulting list is not important and, as written, when the input contains a single nested list:
my_list_of_list = ['boundari', 'special', ['forest', 'arb'], 'abod']
base_list = []
lists_of_lists = []
output = []
for item in my_list_of_list:
    if isinstance(item, list):
        lists_of_lists.append(item)
    else:
        base_list.append(item)
for item in lists_of_lists:
    for sub_item in item:
        new_list = list(base_list)
        new_list.append(sub_item)
        output.append(new_list)
print(output)
The output will be [['boundari', 'special', 'abod', 'forest'], ['boundari', 'special', 'abod', 'arb']]

Related

How to insert values to a Sqlite database from a Python Set or Dict?

I have the following function which produces results;
myNames = ['ULTA', 'CSCO', ...]
def get_from_min_match(var):
    temp = []
    count_elem = generate_elem_count()
    for item in count_elem:
        if var <= count_elem[item]:
            temp.append(item)
    return set(temp) if len(set(temp)) > 0 else "None"
def generate_elem_count():
    result_data = []
    for val in mapper.values():
        if type(val) == list:
            result_data += val
        elif type(val) == dict:
            for key in val:
                result_data.append(key)
    count_elem = {elem: result_data.count(elem) for elem in result_data}
    return count_elem
I call this function like this;
myNames_dict_1 = ['AME', 'IEX', 'PAYC']
myNames_dict_2 = ['ULTA', 'CSCO', 'PAYC']
mapper = {1: myNames_dict_1, 2: myNames_dict_2}
print(" These meet three values ", get_from_min_match(3))
print(" These meet four values ", get_from_min_match(4))
The output I get from these functions are as follows;
These meet three values {'ULTA', 'CSCO', 'SHW', 'MANH', 'TTWO', 'SAM', 'RHI', 'PAYC', 'AME', 'CCOI', 'RMD', 'AMD', 'UNH', 'AZO', 'APH', 'EW', 'FFIV', 'IEX', 'IDXX', 'ANET', 'SWKS', 'HRL', 'ILMN', 'PGR', 'ATVI', 'CNS', 'EA', 'ORLY', 'TSCO'}
These meet four values {'EW', 'PAYC', 'TTWO', 'AME', 'IEX', 'IDXX', 'ANET', 'RMD', 'SWKS', 'HRL', 'UNH', 'CCOI', 'ORLY', 'APH', 'PGR', 'TSCO'}
Now I want to insert the output of the get_from_min_match function into a SQLite database. The insert statement looks like this:
dbase.execute("INSERT OR REPLACE INTO min_match (DATE, SYMBOL, NAME, NUMBEROFMETRICSMET) \
VALUES (?,?,?,?)", (datetime.today(), symbol, name, NUMBEROFMETRICSMET?))
dbase.commit()
So what I need is essentially a new function that calculates the NUMBEROFMETRICSMET value for each symbol, instead of calling get_from_min_match once per threshold, and whose output is inserted into the database. Here 3 and 4 are the number of times each company matched, so the resulting rows should look like this:
date ULTA name 3
date EW name 4
...
How can I achieve this? Thanks!
I fixed this by simply reusing the function I had already written, since it returns the match count for every symbol:
count_elem = generate_elem_count()
print("Count Elem: " + str(count_elem))
This prints {'AMPY': 1} and so on.
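As a rough sketch of how such a dict of counts could then be written to the database in one call, the following uses sqlite3's executemany. The table definition, the in-memory database and the sample counts are assumptions made for illustration; the question shows neither the real schema nor where the NAME column comes from, so NAME is left empty here.

```python
import sqlite3
from datetime import date

# Hypothetical counts, in the shape returned by generate_elem_count()
count_elem = {"ULTA": 3, "EW": 4, "PAYC": 4}

dbase = sqlite3.connect(":memory:")  # in-memory database, for illustration only
# Assumed schema: the real table definition is not shown in the question
dbase.execute(
    "CREATE TABLE min_match ("
    "DATE TEXT, SYMBOL TEXT PRIMARY KEY, NAME TEXT, NUMBEROFMETRICSMET INTEGER)"
)

today = date.today().isoformat()
# One row per symbol; NAME stays NULL because its source is unknown
rows = [(today, symbol, None, count) for symbol, count in count_elem.items()]
dbase.executemany(
    "INSERT OR REPLACE INTO min_match (DATE, SYMBOL, NAME, NUMBEROFMETRICSMET) "
    "VALUES (?, ?, ?, ?)",
    rows,
)
dbase.commit()

stored = dbase.execute(
    "SELECT SYMBOL, NUMBEROFMETRICSMET FROM min_match ORDER BY SYMBOL"
).fetchall()
print(stored)
```

With SYMBOL as the primary key, INSERT OR REPLACE overwrites the previous row for a symbol instead of adding a duplicate.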

How to map nested list to flat values

I'm trying to parse a spreadsheet whose header contains nested columns.
My problem is the nested keys below "Контрагент". I decided to represent the header like this:
['Дата',
'Номер документа',
'Дебет',
'Кредит',
['Контрагент',
['Наименование', 'ИНН', 'КПП', 'Счет', 'БИК', 'Наименование банка']],
'Назначение платежа',
'Код дебитора',
'Тип документа']
But now I don't really know how to map it onto a flat list of values:
['21.05.2021',
'591324565436',
'0.00',
'526345428.99',
'asdasd',
'234525460140679',
'77130100123412341',
'302328105423534200000000280',
'0445252345234974',
'asdfsadfsd',
'sdfghsfgdhfdghdfgh',
'',
'dfghfgdhfdgh']
Given these variables, I want a function to return following dict:
{
"Дата": "21.05.2021",
"Номер документа": "591324565436",
"Дебет": "0.00",
"Кредит": "526345428.99",
"Контрагент": {
"Наименование": "asdasd",
"ИНН": "234525460140679",
"КПП": "77130100123412341",
"Счет": "302328105423534200000000280",
"БИК": "0445252345234974",
"Наименование банка": "asdfsadfsd"
},
"Назначение платежа": "sdfghsfgdhfdghdfgh",
"Код дебитора": "",
"Тип документа": "dfghfgdhfdgh"
}
I got this far before realizing that it raises an IndexError on the third line (key = schema[i]):
def map_to_schema(schema, data):
    for i, elem in enumerate(data):
        key = schema[i]
        if isinstance(key, list):
            if key[0] not in result:
                result[key[0]] = {}
            result[key[0]] |= {
                key[1][i-len(key)]: elem
            }
        else:
            result[key] = elem
What should I do? Maybe the structure for the schema isn't good enough? I really have no idea...
You could use a dictionary comprehension and an iterator:
headers = ['Дата', 'Номер документа', 'Дебет', 'Кредит',
['Контрагент', ['Наименование', 'ИНН', 'КПП', 'Счет', 'БИК', 'Наименование банка']],
'Назначение платежа', 'Код дебитора', 'Тип документа']
values = ['21.05.2021', '591324565436', '0.00', '526345428.99', 'asdasd', '234525460140679', '77130100123412341',
'302328105423534200000000280', '0445252345234974', 'asdfsadfsd', 'sdfghsfgdhfdghdfgh', '',
'dfghfgdhfdgh']
it = iter(values)
out = {k[0] if (islist := isinstance(k, list)) else k:
       {k2: next(it) for k2 in k[1]} if islist else next(it)
       for k in headers}
Output:
{'Дата': '21.05.2021',
'Номер документа': '591324565436',
'Дебет': '0.00',
'Кредит': '526345428.99',
'Контрагент': {'Наименование': 'asdasd',
'ИНН': '234525460140679',
'КПП': '77130100123412341',
'Счет': '302328105423534200000000280',
'БИК': '0445252345234974',
'Наименование банка': 'asdfsadfsd'},
'Назначение платежа': 'sdfghsfgdhfdghdfgh',
'Код дебитора': '',
'Тип документа': 'dfghfgdhfdgh'}
Thanks @mozway for this solution! This is essentially the same algorithm, using a for loop.
def map(schema, s_length, row: list):
    # If len(row) were less than the *true* schema length, next() would raise StopIteration,
    # so I ended up padding the row with delta empty strings.
    if (delta := s_length - len(row)) > 0:
        row.extend([""] * delta)
    iter_row = iter(row)
    result = {}
    for key in schema:
        if isinstance(key, list):
            result[key[0]] = {}
            for sub_key in key[1]:
                result[key[0]][sub_key] = next(iter_row)
        else:
            result[key] = next(iter_row)
    return result
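A possible variation on the loop above, sketched under the assumption that missing trailing cells should simply become empty strings: passing a default to next() avoids StopIteration without computing the schema length or modifying the row. The names map_to_schema and default, and the short English schema, are illustrative only.

```python
def map_to_schema(schema, row, default=""):
    """Map a flat row onto a schema of plain keys and [group, [sub_keys]] pairs."""
    it = iter(row)
    result = {}
    for key in schema:
        if isinstance(key, list):
            group, sub_keys = key
            # next(it, default) returns the default once the row is exhausted
            result[group] = {sub_key: next(it, default) for sub_key in sub_keys}
        else:
            result[key] = next(it, default)
    return result

schema = ["date", "number", ["party", ["name", "tax_id"]], "purpose"]
row = ["21.05.2021", "591324565436", "asdasd"]  # deliberately too short
print(map_to_schema(schema, row))
```

The input row is left untouched, which matters if the same list is reused elsewhere.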

Matching a list's item with another list in python

I have list1 let's say:
items=['SETTLEMENT DATE:', 'CASH ACCOUNT:', 'ISIN:', 'TRADE DATE:', 'PRICE CFA', 'CASH ACCOUNT:', 'SECURITY NAME:']
I have a list2 let's say:
split_t=['{1:F01SCBLMUMUXSSU0438794344}{2:O5991054200218SCBLGHACXSSU04387943442002181454N}{3:{108:2175129}}{4:', ':20:EPACK', 'SALE', 'CDI', ':21:EPACK', 'SALE', 'CDI', ':79:ATTN:MU', 'TEAM', 'KINDLY', 'ACCEPT', 'THIS', 'AS', 'AUTHORISATION', 'TO', 'SETTLE', 'TRADE', 'WITH', 'DETAILS', 'BELOW', 'MARKET:', 'COTE', 'DIVOIRE', 'CLIENT', 'NAME:', 'EPACK', 'OFFSHORE', 'ACCOUNT', 'NAME:', 'STANDARD', 'CHARTERED', 'GHANA', 'NOMINEE', 'RE', 'DATABANK', 'EPACK', 'INVESTMENT', 'FUND', 'LTD', 'IVORY', 'COAST', 'TRADE', 'TYPE:', 'DELIVER', 'AGAINST', 'PAYMENT', 'SCA:', '2CEPACKIVO', 'CASH', 'ACCOUNT:', '420551901501', 'TRADE', 'DETAILS:', 'TRADE', 'DATE:', '17.02.2020', 'SETTLEMENT', 'DATE:', '20.02.2020', 'SECURITY', 'NAME:', 'SONATEL', 'ISIN:', 'SN0000000019', 'CLEARING', 'BIC:', 'SCBLCIABSSUXXX', 'QUANTITY:', '10,500', 'PRICE', 'CFA', '14,500.4667', 'CONSIDERATION', 'CFA', '152,254,900.00', 'TOTAL', 'FEES', '1,796,608.00', 'SETTLEMENT', 'AMOUNT', 'CFA', '150,458,292.35', 'CURRENCY:', 'CFA', 'AC:', 'CI0000010373', 'REGARDS', 'STANDARD', 'CHARTERED', 'BANK', '-}']
I want to search for each item of list1 as a contiguous run of elements in list2 and, when there is a match, return the element of list2 that immediately follows it.
As you can see, one item of list1 may correspond to two contiguous items in list2.
For example, the first element of list1, 'SETTLEMENT DATE:', matches 'SETTLEMENT', 'DATE:' in list2, and I want to return the element that follows the match, '20.02.2020'.
I have written my Python function accordingly:
def test(items, split_t):
    phrases = [w for w in items]
    for i, t in enumerate(split_t):
        to_match = split_t[i+1: i+1+len(phrases)]
        if to_match and all(p == m for p,m in zip(phrases, to_match)):
            return [*map(lambda x:split_t[i])]
This returns None even though there are matches. I may be misusing *map in the return statement, but I could not work out what is wrong from debugging. Any help is highly appreciated.
One way is:
>>> import re
>>> def test(items, split_t):
...     split_t_str = ' '.join(split_t)
...     res = {}
...     for i in items:
...         m = re.search(rf'(?<={i})\s(.*?)\s', split_t_str)
...         res[i] = m.group(1)
...     return res
...
>>> test(items, split_t)
{'SETTLEMENT DATE:': '20.02.2020', 'CASH ACCOUNT:': '420551901501', 'ISIN:': 'SN0000000019', 'TRADE DATE:': '17.02.2020', 'PRICE CFA': '14,500.4667', 'SECURITY NAME:': 'SONATEL'}
The above:
creates a str from split_t, i.e., split_t_str,
iterates over items using each element to construct a regex for performing a positive lookbehind assertion (see re's docs) against split_t_str,
stores each element as key in a dict, called res, and the corresponding match as value, and
returns the dict
If the items of list 2 contain no spaces, you can do it this way:
def match(l1, l2):
    result = []
    string = ' '.join(l2) + ' '
    for i in l1:
        index = string.find(i)
        if index != -1:
            result.append(string[index + len(i) + 1:string.find(' ', index + len(i) + 1)])
    return result
print(match(items, split_t))
Output:
['20.02.2020', '420551901501', 'SN0000000019', '17.02.2020', '14,500.4667', '420551901501', 'SONATEL']
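Both answers above join list 2 into one string first. As an alternative sketch that stays on the token level, each phrase can be split on whitespace and compared against slices of list 2. The function name next_after and the shortened sample data are made up for this example, and only the first match of each phrase is reported.

```python
def next_after(phrases, tokens):
    """Return {phrase: token following the first contiguous match of phrase}."""
    found = {}
    for phrase in phrases:
        words = phrase.split()
        n = len(words)
        # Stop early enough that a token after the match always exists.
        for i in range(len(tokens) - n):
            if tokens[i:i + n] == words:
                found[phrase] = tokens[i + n]
                break
    return found

tokens = ['TRADE', 'DATE:', '17.02.2020', 'SETTLEMENT', 'DATE:', '20.02.2020',
          'ISIN:', 'SN0000000019', 'PRICE', 'CFA', '14,500.4667']
phrases = ['SETTLEMENT DATE:', 'ISIN:', 'PRICE CFA', 'SECURITY NAME:']
print(next_after(phrases, tokens))
```

Phrases with no match, such as 'SECURITY NAME:' in this shortened sample, are simply left out of the result instead of raising an error.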

Trouble getting right values against each item

I'm trying to parse the item names and their corresponding values from the snippet below. The dt tags hold the names and the dd tags hold the values. A few dt tags have no corresponding dd, so not every name has a value. For any name without a value, I want the value to be left blank.
These are the elements I would like to scrape data from:
content="""
<div class="movie_middle">
<dl>
<dt>Genres:</dt>
<dt>Resolution:</dt>
<dd>1920*1080</dd>
<dt>Size:</dt>
<dd>1.60G</dd>
<dt>Quality:</dt>
<dd>1080p</dd>
<dt>Frame Rate:</dt>
<dd>23.976 fps</dd>
<dt>Language:</dt>
</dl>
</div>
"""
I've tried the following:
from bs4 import BeautifulSoup
soup = BeautifulSoup(content,"lxml")
title = [item.text for item in soup.select(".movie_middle dt")]
result = [item.text for item in soup.select(".movie_middle dd")]
vault = dict(zip(title,result))
print(vault)
It gives me messy results (wrong pairs):
{'Genres:': '1920*1080', 'Resolution:': '1.60G', 'Size:': '1080p', 'Quality:': '23.976 fps'}
My expected result:
{'Genres:': '', 'Resolution:': '1920*1080', 'Size:': '1.60G', 'Quality:': '1080p','Frame Rate:':'23.976 fps','Language:':''}
Any help on fixing the issue will be highly appreciated.
You can loop through the elements inside dl. If the current element is dt and the next element is dd, then store the value as the next element, else set the value as empty string.
dl = soup.select('.movie_middle dl')[0]
elems = dl.find_all() # Returns the list of dt and dd
data = {}
for i, el in enumerate(elems):
    if el.name == 'dt':
        key = el.text.replace(':', '')
        # check if the next element is a `dd`
        if i < len(elems) - 1 and elems[i+1].name == 'dd':
            data[key] = elems[i+1].text
        else:
            data[key] = ''
You can use BeautifulSoup to parse the dl structure, and then write a function to create the dictionary:
from bs4 import BeautifulSoup as soup
import re
def parse_result(d):
    while d:
        a, *_d = d
        if _d:
            if re.findall('<dt', a) and re.findall('<dd', _d[0]):
                yield [a[4:-5], _d[0][4:-5]]
                d = _d[1:]
            else:
                yield [a[4:-5], '']
                d = _d
        else:
            yield [a[4:-5], '']
            d = []
print(dict(parse_result(list(filter(None, str(soup(content, 'html.parser').find('dl')).split('\n')))[1:-1])))
Output:
{'Genres:': '', 'Resolution:': '1920*1080', 'Size:': '1.60G', 'Quality:': '1080p', 'Frame Rate:': '23.976 fps', 'Language:': ''}
For a slightly longer, although cleaner solution, you can create a decorator to strip the HTML tags of the output, thus removing the need for the extra string slicing in the main parse_result function:
def strip_tags(f):
    def wrapper(data):
        return {a[4:-5]:b[4:-5] for a, b in f(data)}
    return wrapper
@strip_tags
def parse_result(d):
    while d:
        a, *_d = d
        if _d:
            if re.findall('<dt', a) and re.findall('<dd', _d[0]):
                yield [a, _d[0]]
                d = _d[1:]
            else:
                yield [a, '']
                d = _d
        else:
            yield [a, '']
            d = []
print(parse_result(list(filter(None, str(soup(content, 'html.parser').find('dl')).split('\n')))[1:-1]))
Output:
{'Genres:': '', 'Resolution:': '1920*1080', 'Size:': '1.60G', 'Quality:': '1080p', 'Frame Rate:': '23.976 fps', 'Language:': ''}
from collections import defaultdict
test = soup.text.split('\n')
d = defaultdict(list)
for i in range(len(test)):
    if (':' in test[i]) and (':' not in test[i+1]):
        d[test[i]] = test[i+1]
    elif ':' in test[i]:
        d[test[i]] = ''
d
defaultdict(list,
{'Frame Rate:': '23.976 fps',
'Genres:': '',
'Language:': '',
'Quality:': '1080p',
'Resolution:': '1920*1080',
'Size:': '1.60G'})
The logic here is that you know that every key will have a colon. Knowing this, you can write an if else statement to capture the unique combinations, whether that is key followed by key or key followed by value
Edit:
In case you wanted to clean your keys, below replaces the : in each one:
d1 = { x.replace(':', ''): d[x] for x in d.keys() }
d1
{'Frame Rate': '23.976 fps',
'Genres': '',
'Language': '',
'Quality': '1080p',
'Resolution': '1920*1080',
'Size': '1.60G'}
The problem is that empty elements are not present. Since there is no hierarchy between the <dt> and the <dd>, I'm afraid you'll have to craft the dictionary yourself.
vault = {}
category = ""
for item in soup.find("dl").findChildren():
    if item.name == "dt":
        if category:
            # the previous <dt> had no <dd>, so it gets an empty value
            vault[category] = ""
        category = item.text
    elif item.name == "dd":
        vault[category] = item.text
        category = ""
if category:
    # a trailing <dt> without a <dd>
    vault[category] = ""
Basically this code iterates over the child elements of the <dl> and fills the vault dictionary with the values.
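The same pairing idea can also be sketched without BeautifulSoup, using the standard library's html.parser instead (a different tool from the one used in the answers above). The class name DlParser is made up, and the sketch assumes each dt and dd contains a single plain-text chunk, as in the snippet from the question.

```python
from html.parser import HTMLParser

class DlParser(HTMLParser):
    """Collect <dt>/<dd> pairs; a <dt> without a <dd> keeps an empty value."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._tag = None   # 'dt' or 'dd' while inside one of those tags
        self._key = None   # text of the most recent <dt>

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd"):
            self._tag = None

    def handle_data(self, data):
        if self._tag == "dt":
            self._key = data.strip()
            self.pairs[self._key] = ""   # default until a <dd> follows
        elif self._tag == "dd" and self._key is not None:
            self.pairs[self._key] = data.strip()

content = """
<div class="movie_middle">
<dl>
<dt>Genres:</dt>
<dt>Resolution:</dt>
<dd>1920*1080</dd>
<dt>Size:</dt>
<dd>1.60G</dd>
<dt>Quality:</dt>
<dd>1080p</dd>
<dt>Frame Rate:</dt>
<dd>23.976 fps</dd>
<dt>Language:</dt>
</dl>
</div>
"""

parser = DlParser()
parser.feed(content)
parser.close()
print(parser.pairs)
```

Every name is stored with an empty value as soon as its dt is seen, so no special handling is needed for a dt followed by another dt or for the last dt in the list.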

How to uncollapse a list with comma-separated columns in Python?

I'm working on some code and I have a list like:
listA = (
['name1', 'A11,A12,A13', 'B11,B12,B13', 'C11,C12,C13'],
['name2', 'A21,A22', 'B21,B22', 'C21,C22'],
['name3', 'A31,A32,A33,A34,A35', 'B31,B32,B33,B34,B35', 'C31,C32,C33,C34,C35' ],
)
and I need to get:
listA = (
['name1', 'A11', 'B11', 'C11'],
['name1', 'A12', 'B12', 'C12'],
['name1', 'A13', 'B13', 'C13'],
['name2', 'A21', 'B21', 'C21'],
['name2', 'A22', 'B22', 'C22'],
['name3', 'A31', 'B31', 'C31'],
['name3', 'A32', 'B32', 'C32'],
['name3', 'A33', 'B33', 'C33'],
['name3', 'A34', 'B34', 'C34'],
['name3', 'A35', 'B35', 'C35'],
)
Please help me, I'm stuck.
Thanks for your time.
list_b = []
for x in listA:
    i = iter(x)
    name = next(i)
    list_b.extend((name,) + t for t in zip(*(y.split(",") for y in i)))
Kinda ugly, but...
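listB = []
for tup in listA:
    # split every column; the name becomes a one-item list
    splt = [elt.split(',') for elt in tup]
    for n in range(len(splt[1])):
        # one output row per position: the name plus the n-th value of each column
        listB.append([splt[0][0]] + [col[n] for col in splt[1:]])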
listA = tuple([name, a, b, c] for (name, aas, bbs, ccs) in listA
              for (a, b, c) in zip(aas.split(','), bbs.split(','), ccs.split(',')))
def uncollapse(L):
    answer = []
    for item in L:
        # split every column except the name, then regroup the values by position
        columns = [i.split(',') for i in item[1:]]
        for row in zip(*columns):
            answer.append([item[0]] + list(row))
    return answer
Tested and working
