How to create unique list from list with duplicates

I know how to remove duplicates from a list using set() or two lists, but how do I keep the list as it is and append a number to each duplicate? I could do it with a chain of if statements, but it's not Pythonic. Thanks, guys!
nome_a = ['Anthony','Rudolph', 'Chuck', 'Chuck', 'Chuck', 'Rudolph', 'Bob']
nomes = []
for item in nome_a:
    if item in nomes:
        if (str(item) + ' 5') in nomes:
            novoitem = str(item) + ' 6'
            nomes.append(novoitem)
        if (str(item) + ' 4') in nomes:
            novoitem = str(item) + ' 5'
            nomes.append(novoitem)
        if (str(item) + ' 3') in nomes:
            novoitem = str(item) + ' 4'
            nomes.append(novoitem)
        if (str(item) + ' 2') in nomes:
            novoitem = str(item) + ' 3'
            nomes.append(novoitem)
        else:
            novoitem = str(item) + ' 2'
            nomes.append(novoitem)
    if item not in nomes:
        nomes.append(item)
print(nomes)
Edit(1): Sorry. I edited for clarification.

You could use the following:
names = ['Anthony','Rudolph', 'Chuck', 'Chuck', 'Chuck', 'Rudolph', 'Bob']
answer = []
name_dict = {}
for name in names:
    if name_dict.get(name):
        name_dict[name] += 1
        answer.append('{}_{}'.format(name, name_dict[name]))
    else:
        name_dict[name] = 1
        answer.append(name)
print(answer)
Output
['Anthony', 'Rudolph', 'Chuck', 'Chuck_2', 'Chuck_3', 'Rudolph_2', 'Bob']
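
The same counting idea can be wrapped in a reusable function. This is only a sketch: the name number_duplicates and the sep parameter are made up for illustration, and it assumes the first occurrence should stay unchanged, as in the output above.

```python
from collections import defaultdict


def number_duplicates(names, sep="_"):
    """Return a copy of names where repeats get a running number appended."""
    seen = defaultdict(int)  # how many times each name has appeared so far
    result = []
    for name in names:
        seen[name] += 1
        if seen[name] == 1:
            result.append(name)  # first occurrence stays as it is
        else:
            result.append(f"{name}{sep}{seen[name]}")
    return result


names = ['Anthony', 'Rudolph', 'Chuck', 'Chuck', 'Chuck', 'Rudolph', 'Bob']
print(number_duplicates(names))
```

Passing sep=' ' would give the 'Chuck 2' style used in the question instead of 'Chuck_2'.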

Related

TypeError when parsing XML

I have an XML file of metadata on dissertations and I'm trying to get the author name as a single string. Names in the XML look like this:
<DISS_name>
    <DISS_surname>Clark</DISS_surname>
    <DISS_fname>Brian</DISS_fname>
    <DISS_middle/>
    <DISS_suffix/>
</DISS_name>
All names have first and last names, but only some have middle names and/or suffixes. Here is my code:
author_surname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_surname').text.strip().title()
author_fname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_fname').text.strip().title()
author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle')
author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix')
if author_mname is not None and author_suffix is not None:
    author_name = author_surname + ', ' + author_fname + author_mname.text + ', ' + author_suffix.text
if author_mname is not None and author_suffix is None:
    author_name = author_surname + ', ' + author_fname + author_mname.text
if author_mname is None and author_suffix is None:
    author_name = author_surname + ', ' + author_fname
Why am I getting this output and how can I fix it?
Traceback (most recent call last):
  File "C:\Users\bpclark2\pythonProject3\prqXML-to-dcCSV.py", line 185, in <module>
    author_name = author_surname + ', ' + author_fname + author_mname.text + author_suffix.text
TypeError: can only concatenate str (not "NoneType") to str
Revised code:
author_surname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_surname').text.strip().title()
author_fname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_fname').text.strip().title()
author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle').text or ''
author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix').text or ''
author_name = author_surname + ', ' + author_fname + ' ' + str(author_mname.strip().title()) + str(', ' + author_suffix.strip().title())
row.append(author_name)
This gets the output I was looking for:
author_surname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_surname').text.strip().title()
author_fname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_fname').text.strip().title()
author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle').text or ''
author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix').text or ''
author_name = author_surname + ', ' + author_fname + ' ' + author_mname.strip().title() + ', ' + author_suffix.strip().title()
if author_mname != '' and author_suffix != '':
    author_name = author_surname + ', ' + author_fname + ' ' + author_mname.strip().title() + ', ' + author_suffix.strip().title()
    row.append(author_name)
if author_mname != '' and author_suffix == '':
    author_name = author_surname + ', ' + author_fname + ' ' + author_mname.strip().title()
    row.append(author_name)
if author_mname == '' and author_suffix != '':
    author_name = author_surname + ', ' + author_fname + ', ' + author_suffix.strip().title()
    row.append(author_name)
if author_mname == '' and author_suffix == '':
    author_name = author_surname + ', ' + author_fname
    row.append(author_name)

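What about changing your code to something like this:
author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle').text or ''
author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix').text or ''
Put the "or ''" on .text rather than on the element itself: an element with no children is falsy, so record.find(...) or '' would replace it with '' even when it has text.
You could also add str casts like:
... + str(author_suffix.text)
but note that str(None) is the string 'None', so the "or ''" form gives cleaner output.
And if you are on a recent Python, please use f-strings! Life is much easier with them.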
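
As a rough illustration of the f-string suggestion, here is a sketch that builds the name from a single <DISS_name> node. The helper name format_author is hypothetical. It relies on Element.findtext, which returns the default when a tag is missing and an empty string when the element has no text, so no None ever reaches the string formatting.

```python
import xml.etree.ElementTree as ET


def format_author(name_node):
    """Build 'Surname, First Middle, Suffix', skipping the empty parts."""
    surname, fname, middle, suffix = (
        # findtext gives '' for an empty element such as <DISS_middle/>
        name_node.findtext(tag, default="").strip().title()
        for tag in ("DISS_surname", "DISS_fname", "DISS_middle", "DISS_suffix")
    )
    author = f"{surname}, {fname}"
    if middle:
        author += f" {middle}"
    if suffix:
        author += f", {suffix}"
    return author


node = ET.fromstring(
    "<DISS_name><DISS_surname>Clark</DISS_surname>"
    "<DISS_fname>Brian</DISS_fname><DISS_middle/><DISS_suffix/></DISS_name>"
)
print(format_author(node))
```

The .title() call mirrors the question's code; it would turn a suffix like "III" into "Iii", so it may be worth dropping for the suffix.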

I'd keep everything simple, with just minor edits to your code. You can use the XPath .//DISS_name to find all <DISS_name> nodes and then unpack each one into separate variables with corresponding names. Code:
import xml.etree.ElementTree as ET
data = """\
<DISS_authorship>
    <DISS_author>
        <DISS_name>
            <DISS_surname>Clark</DISS_surname>
            <DISS_fname>Brian</DISS_fname>
            <DISS_middle/>
            <DISS_suffix/>
        </DISS_name>
    </DISS_author>
</DISS_authorship>"""
root = ET.fromstring(data)
row = []
for name_node in root.iterfind(".//DISS_name"):
    surname, fname, middle, suffix = name_node  # 4 child nodes in this order
    name_str = surname.text + ", " + fname.text
    if middle.text:
        name_str += " " + middle.text
    if suffix.text:
        name_str += ", " + suffix.text
    row.append(name_str)
Or even shorter:
import xml.etree.ElementTree as ET
data = ...
root = ET.fromstring(data)
row = []
for (surname, fname, middle, suffix) in root.iterfind(".//DISS_name"):
    name_str = surname.text + ", " + fname.text
    if middle.text:
        name_str += " " + middle.text
    if suffix.text:
        name_str += ", " + suffix.text
    row.append(name_str)

A shorter concept below
import xml.etree.ElementTree as ET
xml = '''<r><DISS_name>
    <DISS_surname>Clark</DISS_surname>
    <DISS_fname>Brian</DISS_fname>
    <DISS_middle/>
    <DISS_suffix/>
</DISS_name>
<DISS_name>
    <DISS_surname>Jack</DISS_surname>
    <DISS_fname>Brian</DISS_fname>
    <DISS_middle>Smith</DISS_middle>
    <DISS_suffix/>
</DISS_name>
</r>'''
root = ET.fromstring(xml)
for name in root.findall('.//DISS_name'):
    parts = [name.find(f'DISS_{f}').text for f in ['surname','fname','middle','suffix'] if name.find(f'DISS_{f}').text is not None ]
    print(", ".join(parts))
output
Clark, Brian
Jack, Brian, Smith

How to remove elements with common characters in a list?

I would like to remove elements of the form ' + 0x^n' (except the last one, if it has the form ' + 0x^0') from this list:
polynomial = ['-7x^5', ' + 0x^4', ' + 0x^3', ' + 4x^2', ' + 4x^1', ' + 2x^0']
i.e. the output should look like this:
['-7x^5', ' + 4x^2', ' + 4x^1', ' + 2x^0']
I tried looping over the list indices, with an if statement that removes any element whose character at index 3 is '0' (see code below):
res = []
for elements in range(0, len(polynomial) - 1):
    if polynomial[elements][3] == '0':
        polynomial.remove(polynomial[elements])
        res.append(polynomial)
    else:
        res.append(polynomial)
print(res[0])

Try:
polynomial = ['-7x^5', ' + 0x^4', ' + 0x^3', ' + 4x^2', ' + 4x^1', ' + 2x^0',' + 0x^0']
res=[]
for p in polynomial:
    if p==' + 0x^0' or p[:-1]!=' + 0x^':
        res.append(p)
print(res) #['-7x^5', ' + 4x^2', ' + 4x^1', ' + 2x^0', ' + 0x^0']

Without regex :
polynomial = ['-7x^5', ' + 0x^4', ' + 0x^3', ' + 4x^2', ' + 4x^1', ' + 2x^0',' + 0x^0']
res = [i for i in polynomial if "0x^" not in i or "0x^0" in i]
print(res)

You can try:
polynomial = ['-7x^5', ' + 0x^4', ' + 0x^3', ' + 4x^2', ' + 4x^1', ' + 2x^0', '0x^2']
res = []
s = "0x^" # substring that marks a term for removal; change as needed
for i in polynomial:
    if s not in i:
        res.append(i)
print(res)
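
The substring checks above also match coefficients that merely end in 0, such as ' + 10x^2'. A regex that anchors on the whole term avoids that. The sketch below is one possible approach, not taken from any of the answers: the names ZERO_TERM and drop_zero_terms are made up, and it assumes every term looks like an optional sign, an integer coefficient, 'x^' and an integer exponent.

```python
import re

# A term whose coefficient is exactly 0, e.g. ' + 0x^4' or '0x^2'.
ZERO_TERM = re.compile(r"^\s*[+-]?\s*0x\^(\d+)$")


def drop_zero_terms(terms):
    """Remove zero-coefficient terms, keeping a trailing '0x^0' term."""
    kept = []
    last = len(terms) - 1
    for index, term in enumerate(terms):
        match = ZERO_TERM.match(term)
        if match is None:
            kept.append(term)  # non-zero coefficient
        elif index == last and match.group(1) == "0":
            kept.append(term)  # the final ' + 0x^0' is allowed to stay
    return kept


polynomial = ['-7x^5', ' + 0x^4', ' + 0x^3', ' + 4x^2', ' + 4x^1', ' + 2x^0']
print(drop_zero_terms(polynomial))
```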

Difficulty parsing a section of XML file with ElementTree

I have written the code below to parse this XML file. It's still a bit messy, but I think I'm on the right track for most of it.
The part I'm stuck on is the 'targets' section (I've left my attempt at it in the code, inside triple quotes, but that section doesn't work).
Could someone show me where I'm going wrong and how to parse the targets section? If you look at the HTML of the XML file here, I basically just want to extract the information in the targets section for each gene/entry. The XML file seems to have more information in its targets section than the HTML page, so if I could get that as well, even better.
Thanks
import requests
import xml.etree.ElementTree as ET
import urllib2
#get the XML file
#response = requests.get('https://www.drugbank.ca/drugs/DB01048.xml')
#with open('output.txt', 'w') as input:
#    input.write(response.content)
tree = ET.parse('output.txt')
root = tree.getroot()
val = lambda x: "{http://www.drugbank.ca}" + str(x)
key_list = ['drugbank-id','name','description','cas-number','unii','average-mass','monoisotopic-mass','state','indication','pharmacodynamics','mechanism-of-action','toxicity','metabolism','absorption','half-life','protein-binding','route-of-elimination','volume-of-distribution','fda-label','msds']
key_dict = {}
for i in key_list:
    for child in root.getchildren():
        key_dict[i] = child.find(val(i)).text.encode('utf-8')
#print key_dict
def method1(str_name,list_name):
    if subnode.tag == str_name:
        list_name = []
        for i in subnode:
            list_name.append(i.text)
        return list_name
def method2(list1_name,list2_name,list3_name,list4_name):
    if subnode.tag == list1_name:
        for i in subnode:
            if i.tag == list2_name:
                for a in i:
                    if a.tag == list3_name:
                        for u in a:
                            if u.tag == list4_name:
                                yield u.text
def method3(list1_name, list2_name):
    list_of_tuples = []
    if subnode.tag == list1_name:
        for i in subnode:
            if i.tag == list2_name:
                temp_list = []
                for a in i:
                    temp_list.append(a.text)
                list_of_tuples.append(temp_list)
    return list_of_tuples
alternative_parents = []
substituents = []
list_to_run_thru = ['description','direct-parent','kingdom','superclass','class','subclass']
ap_sub = lambda x:'{http://www.drugbank.ca}'+ x
for node in root:
    for subnode in node:
        print method1('{http://www.drugbank.ca}groups','group_list')
        print method1('{http://www.drugbank.ca}synonyms','synonym_list')
        print method1('{http://www.drugbank.ca}patent','patent_list')
        print method2('{http://www.drugbank.ca}general-references','{http://www.drugbank.ca}articles','{http://www.drugbank.ca}article','{http://www.drugbank.ca}pubmed-id')
        if subnode.tag == '{http://www.drugbank.ca}classification':
            for each_item in list_to_run_thru:
                for i in subnode:
                    if i.tag == ap_sub(each_item):
                        print i.text
                    if i.tag == '{http://www.drugbank.ca}alternative-parent':
                        alternative_parents.append(i.text)
                    if i.tag == '{http://www.drugbank.ca}substituent':
                        substituents.append(i.text)
        print method3('{http://www.drugbank.ca}salts','{http://www.drugbank.ca}salt')
        print method3('{http://www.drugbank.ca}products','{http://www.drugbank.ca}product')
        print method3('{http://www.drugbank.ca}mixtures','{http://www.drugbank.ca}mixture')
        print method3('{http://www.drugbank.ca}packagers','{http://www.drugbank.ca}packager')
        print method3('{http://www.drugbank.ca}categories','{http://www.drugbank.ca}category')
        print method3('{http://www.drugbank.ca}dosages','{http://www.drugbank.ca}dosage')
        print method3('{http://www.drugbank.ca}atc-codes','{http://www.drugbank.ca}atc-code')
        print method3('{http://www.drugbank.ca}ahfs-codes','{http://www.drugbank.ca}ahfs-code')
        print method3('{http://www.drugbank.ca}pdb-entries','{http://www.drugbank.ca}pdb-entry')
        print method3('{http://www.drugbank.ca}food-interactions','{http://www.drugbank.ca}food-interaction')
        print method3('{http://www.drugbank.ca}drug-interactions','{http://www.drugbank.ca}drug-interaction')
        print method3('{http://www.drugbank.ca}calculated-properties','{http://www.drugbank.ca}property')
        print method3('{http://www.drugbank.ca}external-identifiers','{http://www.drugbank.ca}external-identifier')
        print method3('{http://www.drugbank.ca}external-links','{http://www.drugbank.ca}external-link')
        print method3('{http://www.drugbank.ca}snp-adverse-drug-reactions','{http://www.drugbank.ca}reaction')
print substituents
print alternative_parents
'''
if subnode.tag == '{http://www.drugbank.ca}pathways':
    for i in subnode:
        if i.tag == '{http://www.drugbank.ca}pathway':
            for a in i:
                print a.text
                for u in a:
                    if u.tag == '{http://www.drugbank.ca}drug':
                        for x in u:
                            print x.text
#missing a bit of data here
if subnode.tag == '{http://www.drugbank.ca}targets':
    for i in subnode:
        if i.tag == '{http://www.drugbank.ca}target':
            print i.text
            for a in i:
                print a.text
                if a.tag == '{http://www.drugbank.ca}actions':
                    for u in a:
                        print u.text
                if a.tag == '{http://www.drugbank.ca}references':
                    for u in a:
                        if u.tag == '{http://www.drugbank.ca}articles':
                            for x in u:
                                if x.tag == '{http://www.drugbank.ca}article':
                                    for z in x:
                                        print z.text
'''

I used BeautifulSoup for parsing because it is a simple library.
Code:
import pprint
import requests
from bs4 import BeautifulSoup
html = requests.get('https://www.drugbank.ca/drugs/DB01048#BE0004136').text
soup = BeautifulSoup(html, 'html.parser')
div_targets = soup.find('div', class_='bond-list-container targets')
targets = div_targets.find_all('div', class_='bond card')
t = {}
for target in targets:
    k = []
    v = []
    for property in target.find_all('dt'):
        k.append(property.get_text())
    for property in target.find_all('dd'):
        v.append(property.get_text())
    t[target.find('strong').get_text()] = dict(zip(k, v))
pprint.pprint(t)
Output:
{'1. Reverse transcriptase/RNaseH': {
    'Actions': 'Inhibitor',
    'Gene Name': 'pol',
    'General Function': 'Rna-dna hybrid ribonuclease activity',
    'Kind': 'Protein',
    'Molecular Weight': '65223.615 Da',
    'Organism': 'Human immunodeficiency virus 1',
    'Pharmacological action': 'Yes',
    'Specific Function': 'Not Available',
    'Uniprot ID': 'Q72547',
    'Uniprot Name': 'Reverse transcriptase/RNaseH'},
 '2. HLA class I histocompatibility antigen, B-57 alpha chain': {
    'Gene Name': 'HLA-B',
    'General Function': 'Involved in the presentation of foreign antigens to the immune system.',
    'Kind': 'Protein',
    'Molecular Weight': '40223.825 Da',
    'Organism': 'Human',
    'Pharmacological action': 'Unknown',
    'Specific Function': 'Peptide antigen binding',
    'Uniprot ID': 'P18465',
    'Uniprot Name': 'HLA class I histocompatibility antigen, B-57 alpha chain'}}
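
If you would rather stay with ElementTree and the XML file, as in the question, passing a namespace mapping to the find methods saves writing the '{http://www.drugbank.ca}' prefix on every tag. The sketch below runs on a small made-up sample, not the real DrugBank file: the targets/target/actions nesting follows the question's code, while the action and name child tags, the sample values and the function name extract_targets are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"db": "http://www.drugbank.ca"}  # namespace URI taken from the question

# Made-up sample; the real file has many more elements per target.
SAMPLE = """\
<drugbank xmlns="http://www.drugbank.ca">
  <drug>
    <targets>
      <target>
        <name>Example target</name>
        <actions>
          <action>inhibitor</action>
        </actions>
      </target>
    </targets>
  </drug>
</drugbank>"""


def extract_targets(root):
    """Collect one dict per <target> element found anywhere under root."""
    targets = []
    for target in root.iterfind(".//db:targets/db:target", NS):
        targets.append({
            "name": target.findtext("db:name", default="", namespaces=NS),
            "actions": [a.text for a in target.iterfind("db:actions/db:action", NS)],
        })
    return targets


print(extract_targets(ET.fromstring(SAMPLE)))
```

Further fields, such as the references the question tries to print, could be added to the dict with the same kind of path once you have checked the tag names in your file.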

Mix columns in CSV?

I have a csv file and I need to mix 2 of its columns:
Sitio, ID_espacio, Espacio, Tamano, Country, Impresiones_exchange, Importe_a_cobrar, eCPM, Subastas, Fill_rate
NUEVO_Infotechnology, 264244, NUEVO_Infotechnology - Home_IT - IT_Header, Variable (1240x90), Bangladesh, 0, 0.00, 0.00, 1, 0.00
NUEVO Apertura, 274837, NUEVO Apertura - Nota_Ap - Right3_300x250, 300x250, Paises Bajos, 0, 0.00, 0.00, 4, 0.00
The problem is that I need to combine ID_espacio with Espacio in this way:
Example:
NUEVO_Infotechnology, 264244, NUEVO_Infotechnology - Home_IT - IT_Header, Variable (1240x90), Bangladesh, 0, 0.00, 0.00, 1, 0.00
What I need:
NUEVO_Infotechnology, 264244 - Home_IT - IT_Header, Variable (1240x90), Bangladesh, 0, 0.00, 0.00, 1, 0.00
As you can see, I remove the site name at the start of Espacio (everything up to the first '-') and put ID_espacio in its place.
I managed to do that part, but now I need the whole CSV in the result and not only my modified column:
import csv
lista_ok = []
base = []
with open("test.csv", 'rb') as f:
    reader = csv.reader(f)
    your_list = list(reader)
for item in your_list[1:]:
    a = item[2].split(" - ")
    base.append(a)
for item in base:
    for itemf in your_list[1:]:
        b = []
        a = itemf[1] + ' - ' + ' - '.join(item[1:])
        b.append(a)
        lista_ok.append(b)
Output:
[[' 264244 - Home_IT - IT_Header'], [' 274837 - Home_IT - IT_Header'], [' 264244 - Nota_Ap - Right3_300x250'], [' 274837 - Nota_Ap - Right3_300x250']]
Output I need:
[['Sitio', ' ID_espacio', ' Espacio', ' Tamano', ' Country', ' Impresiones_exchange', ' Importe_a_cobrar', ' eCPM', ' Subastas', ' Fill_rate'], ['NUEVO_Infotechnology', ' 264244 - Home_IT - IT_Header', ' Variable (1240x90)', ' Bangladesh', ' 0', ' 0.00', ' 0.00', ' 1', ' 0.00'], ['NUEVO Apertura', ' 274837 - Nota_Ap - Right3_300x250', ' 300x250', ' Paises Bajos', ' 0', ' 0.00', ' 0.00', ' 4', ' 0.00']]

Here is another version:
import csv
lista_ok = []
with open("test.csv", 'rb') as f:
    reader = csv.reader(f)
    your_list = list(reader)
for item in your_list:
    sitio = item[0]
    id_espacio = item[1]
    item.remove(id_espacio)
    espacio_parts = item[1].split(' - ')
    if your_list.index(item) > 0:
        espacio_parts[0] = espacio_parts[0].lstrip().replace(sitio,id_espacio)
    espacio = ' - '.join(espacio_parts)
    item[1] = espacio
    lista_ok.append(item)

You could write a function that transforms a single row the way you want. Then call that function for each row as you read it from the file and put the result in your final list:
import csv
def row_transform(row, is_header=False):
    if not is_header:
        # trim Sitio from Espacio
        espacio = row[2].split(" - ", 1)[1]
        # add ID to espacio
        row[2] = " - ".join((row[1], espacio))
        # remove ID col
        del row[1]
    return row
with open("test.csv") as fp:
    reader = csv.reader(fp)
    lista_ok = [row_transform(next(reader), True)]
    lista_ok.extend((row_transform(row) for row in reader))
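
For a version that can be run without a file, here is a sketch of the same row-by-row idea on an in-memory sample. The sample is a shortened copy of the question's data (only four columns), and the names merge_id_into_espacio and transform are hypothetical. It uses skipinitialspace=True so the space after each comma is not kept in the fields, which differs slightly from the output shown in the question.

```python
import csv
import io

# Shortened copy of the question's data, first four columns only.
SAMPLE = """\
Sitio, ID_espacio, Espacio, Tamano
NUEVO_Infotechnology, 264244, NUEVO_Infotechnology - Home_IT - IT_Header, Variable (1240x90)
NUEVO Apertura, 274837, NUEVO Apertura - Nota_Ap - Right3_300x250, 300x250
"""


def merge_id_into_espacio(row):
    """Replace the site name at the start of Espacio with the ID column."""
    sitio, id_espacio, espacio, *rest = row
    # Everything after the first " - " is the part of Espacio we keep.
    _, _, tail = espacio.partition(" - ")
    return [sitio, f"{id_espacio} - {tail}", *rest]


def transform(lines):
    reader = csv.reader(lines, skipinitialspace=True)
    header = next(reader)  # the header row is left untouched
    return [header] + [merge_id_into_espacio(row) for row in reader]


rows = transform(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

To process the real file, an open file object can be passed to transform in place of the StringIO.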

Function that transforms a list in a list of dictionary

I have to write a function that takes a list as its parameter.
Such as this one:
['music', ' extension=mp3', 'reports/INFOB131', ' extension=doc,docx,pdf', ' name_contains=INFOB131', ' max_size=100000', 'reports/INFOB132', ' extension=doc,docx,pdf', ' name_contains=INFOB132', ' max_size=100000', 'games', ' name_contains=SC2,Wesnoth', 'pictures/Namur', ' extension=jpeg', ' min_size=5000000', ' name_contains=cercle', 'pictures/autres', ' extension=jpeg', ' min_size=5000000']
And return a list similar to this :
data_config = [{'music' : {'extension':'mp3'}}, {'reports/INFOB131': {'extension': ['doc', 'docx','pdf'], 'name_contains':'INFOB131', 'max_size':100000}}, {'reports/INFOB132': {'extension': ['doc', 'docx','pdf'], 'name_contains':'INFOB132', 'max_size':100000}}]
So I wrote this function:
def my_function(list_in_question, my_config_list =[], prev_list = []):
    """ """
    enumerated_list = list(enumerate(list_in_question))
    if not '=' in enumerated_list[0][1]:
        main_key = enumerated_list[0][1]# referenced before assignment
        pre_dict = {main_key : {}}
        for i in enumerated_list[1:]:
            if '=' in i[1] :
                splitted = i[1].split('=')
                prev_list.append({splitted[0] : splitted[1]})
            elif not '=' in i[1] and i[1] != main_key:
                for j in prev_list:
                    pre_dict[main_key].update(j)
                my_config_list.append(pre_dict)
                return my_function(list_in_question[i[0]:])
            elif not '=' in i[1] and i[1] == main_key and main_key!= enumerated_list[0][1]:
                return my_config_list
    else:
        print("The format of the file containing the data is not adequate!")
But I don't understand why when I execute it this way :
new_lines = ['music', ' extension=mp3', '', 'reports/INFOB131', ' extension=doc,docx,pdf', ' name_contains=INFOB131', ' max_size=100000', '', 'reports/INFOB132', ' extension=doc,docx,pdf', ' name_contains=INFOB132', ' max_size=100000', '', 'games', ' name_contains=SC2,Wesnoth', '', 'pictures/Namur', ' extension=jpeg', ' min_size=5000000', ' name_contains=cercle', '', 'pictures/autres', ' extension=jpeg', ' min_size=5000000']
my_function(new_lines)
I end up with this output...
None
I would be very grateful if someone could help me.
Thank you!
PS: If anyone has an idea of how I could do this recursively, without a loop, that would be awesome!
Everyone... thank you!!! You really helped me, and all your answers are awesome. I'm having trouble understanding some parts, so I'll be annoying just a little longer with some questions about your code. Anyway, thank you for the time you took to help me; you were all a great help!

Try the following code:
def foo(my_list):
    # Create an iterator
    list_iter = iter(my_list)
    # zip the iterator with itself
    key_val_tuple = zip(list_iter, list_iter) # This will group two items in the list at a time
    output_list = []
    for i in key_val_tuple:
        value_dict = {}
        value = i[1].split('=')
        value_dict[value[0]] = value[1].split(",") if len(value[1].split(","))>1 else value[1]
        element_dict = {}
        element_dict[i[0]] = value_dict
        output_list.append(element_dict)
    return output_list
input_list = ['music', ' extension=mp3', 'reports/INFOB131', ' extension=doc,docx,pdf', ' name_contains=INFOB131', ' max_size=100000', 'reports/INFOB132', ' extension=doc,docx,pdf', ' name_contains=INFOB132', ' max_size=100000', 'games', ' name_contains=SC2,Wesnoth', 'pictures/Namur', ' extension=jpeg', ' min_size=5000000', ' name_contains=cercle', 'pictures/autres', ' extension=jpeg', ' min_size=5000000']
# Call the function foo
output = foo(input_list)
print(output) # python3
Got the following output
[{'music': {' extension': 'mp3'}}, {'reports/INFOB131': {' extension': ['doc', 'docx', 'pdf']}}, {' name_contains=INFOB131': {' max_size': '100000'}}, {'reports/INFOB132': {' extension': ['doc', 'docx', 'pdf']}}, {' name_contains=INFOB132': {' max_size': '100000'}}, {'games': {' name_contains': ['SC2', 'Wesnoth']}}, {'pictures/Namur': {' extension': 'jpeg'}}, {' min_size=5000000': {' name_contains': 'cercle'}}, {'pictures/autres': {' extension': 'jpeg'}}]
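zip(list_iter, list_iter): this groups the items of the list two at a time.
output : [('music', ' extension=mp3'), ('reports/INFOB131', ' extension=doc,docx,pdf'), ...]
Note that the pairing is strictly two by two, so a section with more than one setting gets split, as the ' name_contains=INFOB131' key in the output above shows.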
Reference:
python zip()
What exactly are Python's iterator, iterable, and iteration protocols?
Convert List to a list of tuples python

You need to traverse the list only once. The pattern is this:
Start with an empty list (let's call it new_list).
Take each element of the original list (original_list) in turn.
If it is an empty string, skip it.
If it does not contain '=', append a new dictionary to new_list.
If it contains the '=' sign, split the element into k and w (before and after the '='), and add that key-value pair to the inner dictionary of the last entry in new_list.
def parse_list(original_list):
    new_list=[]
    for element in original_list:
        if not element:
            continue  # skip the empty separator strings
        if '=' not in element:
            new_list.append({element:{}})
        else:
            k,w=element.split('=')
            new_list[-1][list(new_list[-1].keys())[0]][k]=w
    return new_list
new_lines = ['music', ' extension=mp3', '', 'reports/INFOB131', ' extension=doc,docx,pdf', ' name_contains=INFOB131', ' max_size=100000', '', 'reports/INFOB132', ' extension=doc,docx,pdf', ' name_contains=INFOB132', ' max_size=100000', '', 'games', ' name_contains=SC2,Wesnoth', '', 'pictures/Namur', ' extension=jpeg', ' min_size=5000000', ' name_contains=cercle', '', 'pictures/autres', ' extension=jpeg', ' min_size=5000000']
parse_list(new_lines)
Now I should explain the line before the return statement:
new_list[-1] is the dictionary for the most recent entry without an equal sign that was found in original_list. After the first pass through the loop,
new_list=[{'music': {}}]
during the second pass
new_list[-1]={'music': {}}
list(new_list[-1].keys())=['music']
list(new_list[-1].keys())[0]='music'
new_list[-1][list(new_list[-1].keys())[0]]={}
Now you just update this dictionary with the parsed k,w pair. The list() call is needed on Python 3, where keys() returns a view that cannot be indexed.

One more way of doing it:
import re
def my_function(list_in_question):
    """Group 'key=value' strings under the section name that precedes them."""
    result = {}
    main_key = ''
    for element in list_in_question:
        if element == '':
            # blank separator: forget the current section
            main_key = ''
        elif re.search('=', element):
            key, value = element.split('=')
            print("key, value = ", key, value)
            if re.search(',', value):
                value_list = value.split(',')
                print("value list =", value_list)
                result[main_key][key] = value_list
            else:
                result[main_key][key] = value
        else:
            main_key = element
            result[main_key] = {}
    return result
new_lines = ['music', ' extension=mp3', '', 'reports/INFOB131', ' extension=doc,docx,pdf', ' name_contains=INFOB131',
             ' max_size=100000', '', 'reports/INFOB132', ' extension=doc,docx,pdf', ' name_contains=INFOB132',
             ' max_size=100000', '', 'games', ' name_contains=SC2,Wesnoth', '', 'pictures/Namur', ' extension=jpeg',
             ' min_size=5000000', ' name_contains=cercle', '', 'pictures/autres', ' extension=jpeg',
             ' min_size=5000000']
print(my_function(new_lines))

Yet another try with only lists and dicts:
def make(lst):
    data_config=[]
    for st in lst:
        if '=' not in st: # new entry
            dd = dict()
            dds = dd[st] = dict()
            data_config.append(dd)
        else: # fill entry
            k,v = st.split('=')
            if ',' in v:
                v = v.split(',')
            dds[k] = v
    return data_config
For :
In [564]: make(l)
Out[564]:
[{'music': {' extension': 'mp3'}},
 {'reports/INFOB131': {' extension': ['doc', 'docx', 'pdf'],
                       ' max_size': '100000',
                       ' name_contains': 'INFOB131'}},
 {'reports/INFOB132': {' extension': ['doc', 'docx', 'pdf'],
                       ' max_size': '100000',
                       ' name_contains': 'INFOB132'}},
 {'games': {' name_contains': ['SC2', 'Wesnoth']}},
 {'pictures/Namur': {' extension': 'jpeg',
                     ' min_size': '5000000',
                     ' name_contains': 'cercle'}},
 {'pictures/autres': {' extension': 'jpeg', ' min_size': '5000000'}}]
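
The outputs above still carry the leading space in each key and keep sizes as strings, whereas the question's target has 'extension' and 100000. Here is a sketch of one way to close that gap, under a few assumptions: whitespace around each item is not significant, a value containing a comma is a list, and a value made only of digits is an integer. The name parse_config is made up.

```python
def parse_config(lines):
    """Turn ['section', ' key=value', ...] into [{section: {key: value}}, ...]."""
    config = []
    for raw in lines:
        item = raw.strip()
        if not item:
            continue  # blank separator between sections
        if "=" not in item:
            config.append({item: {}})  # a new section starts here
            continue
        if not config:
            raise ValueError(f"setting {item!r} appears before any section name")
        key, _, value = item.partition("=")
        if "," in value:
            parsed = value.split(",")
        elif value.isdigit():
            parsed = int(value)
        else:
            parsed = value
        # Each entry holds exactly one section, so unpack its single inner dict.
        (settings,) = config[-1].values()
        settings[key] = parsed
    return config


sample = ['music', ' extension=mp3', '', 'reports/INFOB131',
          ' extension=doc,docx,pdf', ' name_contains=INFOB131', ' max_size=100000']
print(parse_config(sample))
```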
