Python: transform a complex list of lists into a string

I have a complex list of lists that looks like this:
[[['MARIA DUPONT',
' infos : ',
[' age = 28',
' yeux = bleus',
' sexe = femme']],
[' + ']],
[['PATRICK MARTIN',
' infos : ',
[' age = 53',
' yeux = marrons',
' sexe = homme']],
[' + ']],
[['JULIE SMITH',
' infos : ',
[' age = 17',
'yeux = verts',
'sexe = femme']],
[' fin ']]]
I am trying to transform it into a string. In the end I want to print this:
MARIA DUPONT,
infos :
age = 28
yeux = bleus
sexe = femme
+
PATRICK MARTIN
infos :
age = 53
yeux = marrons
sexe = homme
+
JULIE SMITH
infos :
age = 17
yeux = verts
sexe = femme
fin
My real data is more complicated, with lists nested up to five levels deep.
So I am looking for a way to solve the problem described above that I can adapt and apply to my real data.
I have tried
''.join(list)
and
''.join(x for x in list)
But in both cases I get the error TypeError: list indices must be integers or slices, not list
I've tried other approaches, but now I'm confused and I haven't found a good solution.
Any help would be appreciated, and thanks in advance. (And sorry for my bad English!)

You can use str.join with a single pass over the lists:
data = [[['MARIA DUPONT', ' infos : ', [' age = 28', ' yeux = bleus', ' sexe = femme']], [' + ']], [['PATRICK MARTIN', ' infos : ', [' age = 53', ' yeux = marrons', ' sexe = homme']], [' + ']], [['JULIE SMITH', ' infos : ', [' age = 17', 'yeux = verts', 'sexe = femme']], [' fin ']]]
r = '\n'.join('\n'.join([a, b, *c, f'\n{k}\n']) for [a, b, c], [k] in data)
Output:
MARIA DUPONT
infos :
age = 28
yeux = bleus
sexe = femme
+
PATRICK MARTIN
infos :
age = 53
yeux = marrons
sexe = homme
+
JULIE SMITH
infos :
age = 17
yeux = verts
sexe = femme
fin
If your lists are arbitrarily nested, then you can use recursion with a generator:
def flatten(d):
    if isinstance(d, str):
        yield d
    else:
        yield from [i for b in d for i in flatten(b)]
print('\n'.join(flatten(data)))

str.join() does not work when the list contains other lists. Here is a solution based on recursion.
def list_to_str(_list):
    result = ""
    if isinstance(_list, list):
        for l in _list:
            result += list_to_str(l)
    else:
        result += _list
    return result
result_string = list_to_str(your_list)
print(result_string)
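To see why str.join rejects nested input, here is a minimal sketch. The two-level list is a hypothetical sample rather than the asker's real data, and the code assumes Python 3.

```python
# Hypothetical two-level sample, not the asker's real data.
nested = [['MARIA DUPONT', ' infos : '], [' + ']]

try:
    ''.join(nested)          # every item must be a str, but these are lists
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

str.join only accepts an iterable of strings, so any inner list has to be flattened first.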

I can't tell whether your lists are nested to varying depths, but if so, you need a conditional that checks whether an element is itself a list and, if it is, recurses into it.
def convert_list(dataset):
    result = ''
    for element in dataset:
        if isinstance(element, list):
            result += convert_list(element)
        else:
            result += str(element)
    return result
This will not print the newlines you want but it does return the list as a string.
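If the newlines are wanted, one possible variation is to yield the strings one at a time and join them afterwards. This is only a sketch: the name flatten_strings and the shortened sample are assumptions for illustration, and stripping each string is a guess at the desired output.

```python
def flatten_strings(nested):
    """Yield every string found in an arbitrarily nested list, depth first."""
    for element in nested:
        if isinstance(element, list):
            yield from flatten_strings(element)
        else:
            # Strip the stray spaces around each string (an assumption
            # about the wanted output).
            yield str(element).strip()

# Shortened, hypothetical version of the data from the question.
sample = [[['MARIA DUPONT', ' infos : ', [' age = 28', 'yeux = bleus']], [' + ']],
          [['JULIE SMITH', ' infos : ', [' age = 17']], [' fin ']]]

text = '\n'.join(flatten_strings(sample))
print(text)
```

Because the recursion does not care how deep the lists go, the same function should also cope with data nested five levels deep.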

Write a recursive function to get inside your lists like below:
def print_data(input_list):
    for obj in input_list:
        if isinstance(obj, list):
            print_data(obj)
        else:
            print(obj)
input_list = [[['MARIA DUPONT',
' infos : ',
[' age = 28',
' yeux = bleus',
' sexe = femme']],
[' + ']],
[['PATRICK MARTIN',
' infos : ',
[' age = 53',
' yeux = marrons',
' sexe = homme']],
[' + ']],
[['JULIE SMITH',
' infos : ',
[' age = 17',
'yeux = verts',
'sexe = femme']],
[' fin ']]]
print_data(input_list)

Related

A confusing issue with a for loop in Python

This code:
user_data_list = [['Full Name', ' Email Address'],['Blossom Gill', ' blossom#abc.edu'],
['Hayes Delgado', ' nonummy#utnisia.com'], ['Petra Jones', ' ac#abc.edu'],
['Oleg Noel', ' noel#liberomauris.ca']]
old_domain_email_list = ['blossom#abc.edu','ac#abc.edu']
new_domain_email_list = ['blossom#xyz.edu','ac#xyz.edu']
for user in user_data_list[1:]:
    for old_domain, new_domain in zip(old_domain_email_list, new_domain_email_list):
        if user[1] == ' ' + old_domain:
            user[1] = ' ' + new_domain
print(user_data_list)
The result:
[['Full Name', ' Email Address'], ['Blossom Gill', ' blossom#xyz.edu'], ['Hayes Delgado', ' nonummy#utnisia.com'], ['Petra Jones', ' ac#xyz.edu'], ['Oleg Noel', ' noel#liberomauris.ca']]
I really don't understand why the value of user_data_list changed in this code.
As far as I can see, only the user variable bound by the for loop is changed when the if condition is true.
I tried the same approach with a slightly different list, my_list, but the result is different from the code above: my_list did not change.
my_list = ['a','b','c','d']
old_my_list = ['b','d']
new_my_list = ['repalce_1','repalce_2']
for i in my_list:
    for old_, new_ in zip(old_my_list,new_my_list):
        if i == old_:
            i= new_
print(my_list)
The result:
['a', 'b', 'c', 'd']
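Although the loop variable looks like a separate name, behind the scenes it refers to the same list object as the element of user_data_list, so changing it in place (user[1] = ...) affects the original list. In your second example, i = new_ only rebinds the name i and leaves my_list untouched. The code below shows that the loop variable and the list element point to the same memory address.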
user_data_list = [['Full Name', ' Email Address'],['Blossom Gill', ' blossom#abc.edu'],
['Hayes Delgado', ' nonummy#utnisia.com'], ['Petra Jones', ' ac#abc.edu'],
['Oleg Noel', ' noel#liberomauris.ca']]
print("External id -", id(user_data_list[0]))
for item in user_data_list:
    print("internal for loop id -", id(item))
    break
# Output
# External id - 2306933340288
# internal for loop id - 2306933340288
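The difference between the two loops in the question can be reduced to mutation versus rebinding. The sketch below uses small hypothetical lists (rows and letters are made-up names) to show both cases side by side.

```python
rows = [['a', 1], ['b', 2]]
for row in rows:
    # Item assignment mutates the inner list object that rows also holds.
    row[1] = row[1] * 10

letters = ['a', 'b']
for letter in letters:
    # Plain assignment only rebinds the loop variable; the list is untouched.
    letter = letter.upper()

print(rows)
print(letters)
```

To change a flat list such as letters, assign by index (for example inside a loop over enumerate(letters)) or build a new list.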
In addition to Abhi's answer, you can work around this behaviour by making a deep copy of the list. The copy, including each of its inner lists, lives at a different memory address.
from copy import deepcopy
user_data_list = [['Full Name', ' Email Address'],['Blossom Gill', ' blossom#abc.edu'],
['Hayes Delgado', ' nonummy#utnisia.com'], ['Petra Jones', ' ac#abc.edu'],
['Oleg Noel', ' noel#liberomauris.ca']]
original = deepcopy(user_data_list) # 140007211377088
# Note: user_data_list[:] is only a shallow copy. The outer list is new,
# but the inner lists are shared, so the loop below would still modify them.
print(id(user_data_list))
print(id(original))
old_domain_email_list = ['blossom#abc.edu','ac#abc.edu']
new_domain_email_list = ['blossom#xyz.edu','ac#xyz.edu']
for user in user_data_list[1:]:
    for old_domain, new_domain in zip(old_domain_email_list, new_domain_email_list):
        if user[1] == ' ' + old_domain:
            user[1] = ' ' + new_domain
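A slice copy and deepcopy behave differently for nested lists, which matters here because the loop modifies the inner lists. The following sketch uses a one-row hypothetical list to make the difference visible.

```python
from copy import deepcopy

data = [['Blossom Gill', ' blossom#abc.edu']]
shallow = data[:]        # new outer list, but the same inner list objects
deep = deepcopy(data)    # new outer list and new inner lists

data[0][1] = ' blossom#xyz.edu'   # mutate an inner list in place

print(shallow[0][1])  # follows the change, because the inner list is shared
print(deep[0][1])     # keeps the old value
```

Only the deep copy preserves the original values when the inner lists are changed in place.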

Convert a list of tab prefixed strings to a dictionary

I am attempting some text mining and would like to turn the list below:
a=['Colors.of.the universe:\n',
' Black: 111\n',
' Grey: 222\n',
' White: 11\n'
'Movies of the week:\n',
' Mission Impossible: 121\n',
' Die_Hard: 123\n',
' Jurassic Park: 33\n',
'Lands.categories.said:\n',
' Desert: 33212\n',
' forest: 4532\n',
' grassland : 431\n',
' tundra : 243451\n']
to this:
{'Colors.of.the universe':{Black:111,Grey:222,White:11},
'Movies of the week':{Mission Impossible:121,Die_Hard:123,Jurassic Park:33},
'Lands.categories.said': {Desert:33212,forest:4532,grassland:431,tundra:243451}}
I tried the code below, but it did not work:
{words[1]:words[1:] for words in a}
which gives
{'o': 'olors.of.the universe:\n',
' ': ' tundra : 243451\n',
'a': 'ands.categories.said:\n'}
It only takes the first word as the key which is not what's needed.
A dict comprehension is an interesting approach, but a plain loop that tracks the current category is easier to follow:
a = ['Colors.of.the universe:\n',
' Black: 111\n',
' Grey: 222\n',
' White: 11\n',
'Movies of the week:\n',
' Mission Impossible: 121\n',
' Die_Hard: 123\n',
' Jurassic Park: 33\n',
'Lands.categories.said:\n',
' Desert: 33212\n',
' forest: 4532\n',
' grassland : 431\n',
' tundra : 243451\n']
result = dict()
current_key = None
for w in a:
    # If the line is indented, it is an item (under a category)
    if w.startswith(' '):
        # Split the item, e.g. ' Desert: 33212\n' -> [' Desert', ' 33212\n']
        splitted = w.split(':')
        # Set the key and the value of the item,
        # removing redundant spaces and '\n'
        # and converting the value to a number
        k, v = splitted[0].strip(), int(splitted[1].replace('\n', ''))
        result[current_key][k] = v
    # Otherwise, it is a category
    else:
        # Remove ':' and '\n' from the category name
        current_key = w.replace(':', '').replace('\n', '')
        # If the category does not exist yet, create a dictionary for it
        if current_key not in result:
            result[current_key] = {}
# {'Colors.of.the universe': {'Black': 111, 'Grey': 222, 'White': 11}, 'Movies of the week': {'Mission Impossible': 121, 'Die_Hard': 123, 'Jurassic Park': 33}, 'Lands.categories.said': {'Desert': 33212, 'forest': 4532, 'grassland': 431, 'tundra': 243451}}
print(result)
That's really close to valid YAML already. You could just quote the property labels and parse. And parsing a known format is MUCH superior to dealing with and/or inventing your own. Even if you're just exploring base python, exploring good practices is just as (probably more) important.
import re
import yaml
raw = ['Colors.of.the universe:\n',
' Black: 111\n',
' Grey: 222\n',
' White: 11\n',
'Movies of the week:\n',
' Mission Impossible: 121\n',
' Die_Hard: 123\n',
' Jurassic Park: 33\n',
'Lands.categories.said:\n',
' Desert: 33212\n',
' forest: 4532\n',
' grassland : 431\n',
' tundra : 243451\n']
# Fix spaces in property names
fixed = []
for line in raw:
    match = re.match(r'^( *)(\S.*?): ?(\S*)\s*', line)
    if match:
        fixed.append('{indent}{safe_label}:{value}'.format(
            indent = match.group(1),
            safe_label = "'{}'".format(match.group(2)),
            value = ' ' + match.group(3) if match.group(3) else ''
        ))
    else:
        raise Exception("regex failed")
parsed = yaml.load('\n'.join(fixed), Loader=yaml.FullLoader)
print(parsed)

How to read a particular line of interest from a text file?

Here I have a text file. I want to read the Adress, Beneficiary, Beneficiary Bank, Acc Nbr, Total US$, the Date at the top, RUT, and BOX. I tried writing some code myself, but I cannot reliably get the required information, and if the length of a field changes I will get the wrong output. How should I do this so that each required piece of information ends up in its own string?
The main problem arises when my slices go wrong. For example, I am using line[31:] for Acc Nbr, but if the address changes then my slicing will be wrong too.
My Text.txt
2014-11-09 BOX 1531 20140908123456 RUT 21 654321 0123
Girry S.A. CONTADO
G 5 Y Serie A
NO 098765
11 al Rayo 321 - Oqwerty 108 Monteaudio - Gruguay
Pharm Cosco, Inc - Britania PO Box 43215
Dirección Hot Springs AR 71903 - Estados Unidos
Oescripción Importe
US$
DO 7640183 - 50% of the Production Degree 246,123
Beneficiary Bank: Bankue Heritage (Gruguay) S.A Account Nbr: 1234563 Swift: MANIUYMM
Adress: Tencon 108 Monteaudio, Gruguay.
Beneficiary: Girry SA Acc Nbr: 1234567
Servicios prestados en el exterior, exentos de IVA o IRAE
Subtotal US$ 102,500
Iva US$ ---------------
Total US$ 102,500
I.V.A AL DIA Fecha de Vencimiento
IMPRENTA IRIS LTDA. - RUT 210161234015 - 0/40987 17/11/2015
CONSTANCIA N9 1234559842 -04/2013
CONTADO A 000.001/ A 000.050 x 2 VIAS
QWERTYAS ZXCVBIZADA
R. U.T. Bamprador Asdfumldor Final
Fecha 12/12/2014
1º ORIGINAL CLLLTE (Blanco) 2º CASIA AQWERVO (Rosasd)
My Code:
txt = 'Text.txt'
lines = [line.rstrip('\n') for line in open(txt)]
for line in lines:
    if 'BOX' in line:
        Date = line.split("BOX")[0]
        BOX = line.split('BOX ', 1)[-1].split("RUT")[0]
        RUT = line.split('RUT ',1)[-1]
        print 'Date : ' + Date
        print 'BOX : ' + BOX
        print 'RUT : ' + RUT
    if 'Adress' in line:
        Adress = line[8:]
        print 'Adress : ' + Adress
    if 'NO ' in line:
        Invoice_No = line.split('NO ',1)[-1]
        print 'Invoice_No : ' + Invoice_No
    if 'Swift:' in line:
        Swift = line.split('Swift: ',1)[-1]
        print 'Swift : ' + Swift
    if 'Fecha' in line and '/' in line:
        Invoice_Date = line.split('Fecha ',1)[-1]
        print 'Invoice_Date : ' + Invoice_Date
    if 'Beneficiary Bank' in line:
        Beneficiary_Bank = line[18:]
        Ben_Acc_Nbr = line.split('Nbr: ', 1)[-1]
        print 'Beneficiary_Bank : ' + Beneficiary_Bank.split("Acc")[0]
        print 'Ben_Acc_Nbr : ' + Ben_Acc_Nbr.split("Swift")[0]
    if 'Beneficiary' in line and 'Beneficiary Bank' not in line:
        Beneficiary = line[13:]
        print 'Beneficiary : ' + Beneficiary.split("Acc")[0]
    if 'Acc Nbr' in line:
        Acc_Nbr = line.split('Nbr: ', 1)[-1]
        print 'Acc_Nbr : ' + Acc_Nbr
    if 'Total US$' in line:
        Total_US = line.split('US$ ', 1)[-1]
        print 'Total_US : ' + Total_US
Output:
Date : 2014-11-09
BOX : 1531 20140908123456
RUT : 21 654321 0123
Invoice_No : 098765
Swift : MANIUYMM
Beneficiary_Bank : Bankue Heritage (Gruguay) S.A
Ben_Acc_Nbr : 1234563
Adress : Tencon 108 Monteaudio, Gruguay.
Beneficiary : Girry SA
Acc_Nbr : 1234567
Total_US : 102,500
Invoice_Date : 12/12/2014
Some Code Changes
I have made some changes but still I am not convinced as I need to provide spaces also in split.
I would recommend using regular expressions to extract the information you need. That avoids having to count character offsets.
import re
with open(r'C:\Quad.txt') as f:
    for line in f:
        # \S+ captures the run of non-space characters after the label;
        # a trailing lazy (.*?) would always match the empty string
        match = re.search(r"Acc Nbr: (\S+)", line)
        if match is not None:
            Acc_Nbr = match.group(1)
            print Acc_Nbr
# etc...
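The same idea extends to several fields by keeping one pattern per label. The sketch below is written for Python 3; the PATTERNS table, the extract_fields helper, and the regexes are all assumptions made for illustration, based only on the sample invoice lines in the question.

```python
import re

# Hypothetical field patterns: the labels come from the sample invoice,
# the regexes are a guess at where each value ends.
PATTERNS = {
    'Adress': re.compile(r'Adress:\s*(.+)'),
    'Beneficiary': re.compile(r'Beneficiary:\s*(.+?)\s+Acc Nbr:'),
    'Acc_Nbr': re.compile(r'Acc Nbr:\s*(\S+)'),
    'Total_US': re.compile(r'Total US\$\s*(\S+)'),
}

def extract_fields(lines):
    """Return {field name: first matching value} for the given lines."""
    found = {}
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match and name not in found:
                found[name] = match.group(1).strip()
    return found

# A few lines copied from the sample Text.txt in the question.
sample_lines = [
    'Beneficiary Bank: Bankue Heritage (Gruguay) S.A Account Nbr: 1234563 Swift: MANIUYMM',
    'Adress: Tencon 108 Monteaudio, Gruguay.',
    'Beneficiary: Girry SA Acc Nbr: 1234567',
    'Subtotal US$ 102,500',
    'Total US$ 102,500',
]
fields = extract_fields(sample_lines)
print(fields)
```

Because each pattern anchors on its label rather than on a column position, a longer address or bank name does not shift the other fields.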
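You can also search for the label to obtain its index. For example:
label = 'Acc Nbr: '
if label in line:
    # skip past the label itself instead of hard-coding an offset
    Acc_Nbr = line[line.find(label) + len(label):]
    print Acc_Nbr
Note that find gives you the index of the first character of the string you searched for.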

how to get values from the keys in dictionary

I have some code for car registration for parking. I have created a dictionary with the car registration numbers as keys and the rest of the information as values. I am trying to get the details (values) for a registration by entering its registration number. Even when the registration number is in the dictionary, I get the message that it is not in the dictionary.
# global variable
data_dict = {}
def createNameDict(filename):
    path = "C:\Users\user\Desktop"
    basename = "ParkingData_Part2.txt"
    filename = path + "\\" + basename
    file = open(filename)
    contents = file.read()
    print contents,"\n"
    data_list = [lines.split(",",1) for lines in contents.split("\n")]
    #data_list.sort()
    #print data_list
    #dict_list = []
    for line in data_list:
        keys = line[0]
        values = line[1]
        data_dict[keys] = values
    print data_dict,"\n"
    print data_dict.keys(),"\n"
    print data_dict.values(),"\n"
    print data_list
def detailForRegistrationNumber(regNumber):
    regNumber == "keys"
    if regNumber in data_dict:
        print data_dict[regNumber]
    else:
        print regNumber, "Not in dictionary"
The error message I am getting is:
======= Loading Progam =======
>>> detailForRegistrationNumber('EDF768')
EDF768 Not in dictionary
But the dictionary has the above registration number:
{'HUY768': ' Wilbur Matty, 8912, Creche_Parking', 'GH7682': ' Clara Hill, 7689, AgHort_Parking', 'GEF123': ' Jill Black, 3456, Creche_Parking', 'WER546': ' Olga Grey, 9898, Creche_Parking', 'TY5678': ' Jane Miller, 8987, AgHort_Parking', 'ABC234': ' Fred Greenside, 2345, AgHort_Parking', 'KLOI98': ' Martha Miller, 4563, Vet_Parking', **'EDF768'**: ' Bill Meyer, 2456, Vet_Parking', 'JU9807': ' Jacky Blair, 7867, Vet_Parking', 'DF7800': ' Jacko Frizzle, 4532, Creche_Parking', 'ADF645': ' Cloe Freckle, 6789, Vet_Parking'}
I think one likely problem is that your function def createNameDict(filename): doesn't return anything, so the lookup only works if that function has already run and filled the module-level data_dict.
Make the last line of the function return data_dict and then use it like data_dict = createNameDict(filename). There is then no need for the global variable part, so just remove that.
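A minimal Python 3 sketch of that suggestion follows. The snake_case function names, the in-memory sample lines (used instead of the real file), and the stripping of whitespace around keys are all assumptions for illustration.

```python
def create_name_dict(lines):
    """Build {registration: details} from 'REG, details' lines and return it."""
    data = {}
    for line in lines:
        if ',' not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(',', 1)
        data[key.strip()] = value.strip()
    return data

def detail_for_registration_number(data, reg_number):
    """Look the registration up in the dict that is passed in explicitly."""
    return data.get(reg_number.strip(), reg_number + ' not in dictionary')

# Hypothetical lines standing in for the contents of the parking file.
sample = ['EDF768, Bill Meyer, 2456, Vet_Parking',
          'HUY768, Wilbur Matty, 8912, Creche_Parking',
          '']
parking = create_name_dict(sample)
print(detail_for_registration_number(parking, 'EDF768'))
```

Passing the dictionary in as an argument makes it obvious which data the lookup runs against, which is harder to see with a module-level variable.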

how to rename key value in python

How can I rename a key in Python?
I have this code:
t = { u'last_name': [u'hbkjh'], u'no_of_nights': [u'1'], u'check_in': [u'2012-03-19'], u'no_of_adult': [u'', u'1'], u'csrfmiddlewaretoken': [u'05e5bdb542c3be7515b87e8160c347a0'], u'memo': [u'kjhbn'], u'totalcost': [u'1800.0'], u'product': [u'4'], u'exp_month': [u'1'], u'quantity': [u'2'], u'price': [u'900.0'], u'first_name': [u'sdhjb'], u'no_of_kid': [u'', u'0'], u'exp_year': [u'2012'], u'check_out': [u'2012-03-20'], u'email': [u'ebmalifer#agile.com.ph'], u'contact': [u'3546576'], u'extra_test1': [u'jknj'], u'extra_test2': [u'jnjl'], u'security_code': [u'3245'], u'extra_charged': [u'200.0']}
key = {str(k): str(v[0]) for k,v in t.iteritems() if k.startswith('extra_')}
array = []
for val in key:
    data = str(val) + ' = ' + key[val] + ','
    array.append(data)
print array
It gives me this:
["extra_charged = 200.0,", "extra_test1 = jknj,", "extra_test2 = jnjl,"]
What should I do to remove the 'extra_' prefix so that the output looks like this:
["CHARGED = 200.0,", "TEST1 = jknj,", "TEST2 = jnjl,"]
Does anyone have an idea about my case?
Thanks in advance.
String slicing can strip off the first 6 characters (the length of 'extra_'), and upper() will uppercase the rest.
Replace the data = line with:
data = str(val)[6:].upper() + ' = ' + key[val] + ','
That should work.
I found .replace()
and used it like this:
data = str(val).replace("extra_","").upper() + ' = ' + key[val] + ','
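For Python 3, the whole transformation can also be sketched as a dict comprehension that drops the prefix while building the dictionary. The shortened form_data sample and the names prefix, renamed and lines are assumptions for illustration.

```python
# Shortened, hypothetical version of the form data from the question.
form_data = {'extra_test1': ['jknj'], 'extra_charged': ['200.0'], 'memo': ['kjhbn']}

prefix = 'extra_'
# Keep only the prefixed keys, cut the prefix off and uppercase the rest.
renamed = {key[len(prefix):].upper(): values[0]
           for key, values in form_data.items() if key.startswith(prefix)}

lines = ['{} = {},'.format(k, v) for k, v in sorted(renamed.items())]
print(lines)
```

Slicing by len(prefix) only removes the leading 'extra_', whereas .replace("extra_", "") would also remove the text if it appeared later in a key.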
