I have the following list:
lines
['line_North_Mid', 'line_South_Mid',
'line_North_South', 'line_Mid_South',
'line_South_North','line_Mid_North' ]
I would like to couple them in a tuple list as follows, with respect to their names:
tuple_list
[('line_Mid_North', 'line_North_Mid'),
('line_North_South', 'line_South_North'),
('line_Mid_South', 'line_South_Mid')]
I thought I could do a string search over the elements of lines, but that won't be efficient. Is there a better way to arrange the elements of lines so that the result looks like tuple_list?
Pairing criteria:
Two elements are paired if they contain the same two area names (out of 'North', 'Mid', 'South').
E.g.: 'line_North_Mid' should be coupled with 'line_Mid_North'
Try this:
from itertools import combinations
tuple_list = [i for i in combinations(lines,2) if i[0].split('_')[1] == i[1].split('_')[2] and i[0].split('_')[2] == i[1].split('_')[1]]
or I think this is better:
[i for i in combinations(lines,2) if i[0].split('_')[1:] == i[1].split('_')[1:][::-1]]
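For reference, here is the second variant wrapped in a function and run on the list from the question. The function name pair_reversed is only illustrative. Note that combinations inspects every pair, so the work grows quadratically with the length of the list, which is fine for a handful of names.

```python
from itertools import combinations

def pair_reversed(lines):
    # Keep each pair whose area names (everything after the prefix)
    # are mirror images of each other.
    return [
        (a, b)
        for a, b in combinations(lines, 2)
        if a.split('_')[1:] == b.split('_')[1:][::-1]
    ]

lines = ['line_North_Mid', 'line_South_Mid',
         'line_North_South', 'line_Mid_South',
         'line_South_North', 'line_Mid_North']

pairs = pair_reversed(lines)
print(pairs)
```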
An order-agnostic O(n) solution is possible using collections.defaultdict. The idea is to use a frozenset of the last two '_'-delimited components of each string as the dictionary key, appending the matching strings from your input list as values. Then extract the values and convert them to a list of tuples.
from collections import defaultdict
L = ['line_North_Mid', 'line_South_Mid',
'line_North_South', 'line_Mid_South',
'line_South_North', 'line_Mid_North']
dd = defaultdict(list)
for item in L:
    dd[frozenset(item.rsplit('_', maxsplit=2)[1:])].append(item)
res = list(map(tuple, dd.values()))
# [('line_North_Mid', 'line_Mid_North'),
# ('line_South_Mid', 'line_Mid_South'),
# ('line_North_South', 'line_South_North')]
You can use the following list comprehension:
lines = ['line_Mid_North', 'line_North_Mid',
'line_North_South', 'line_South_North',
'line_Mid_South', 'line_South_Mid']
[(j,i) for i in lines for j in lines if j not in i
if set(j.split('_')[1:]) < set(i.split('_'))][::2]
[('line_Mid_North', 'line_North_Mid'),
('line_North_South', 'line_South_North'),
('line_Mid_South', 'line_South_Mid')]
I suggest you write a function that returns the same key for strings that are supposed to be together (a grouping key).
def key(s):
    # ignore first part and sort other 2 parts, so they will always be in same order
    _, part_1, part_2 = s.split('_')
    return tuple(sorted([part_1, part_2]))
Then you have to use some grouping method; I used defaultdict, for example:
import collections
lines = [
'line_North_Mid', 'line_South_Mid',
'line_North_South', 'line_Mid_South',
'line_South_North','line_Mid_North',
]
dd = collections.defaultdict(list)
for s in lines:
    dd[key(s)].append(s)  # those with same key get grouped
print(list(tuple(v) for v in dd.values()))
# [
# ('line_North_Mid', 'line_Mid_North'),
# ('line_South_Mid', 'line_Mid_South'),
# ('line_North_South', 'line_South_North'),
# ]
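If the order inside each tuple matters (the question lists 'line_Mid_North' before 'line_North_Mid'), one option is to sort each group before converting it to a tuple. A minimal sketch, assuming alphabetical order within a pair is what is wanted; group_pairs is a made-up name, while key is the function defined above:

```python
import collections

def key(s):
    # ignore the prefix and sort the two area names
    _, part_1, part_2 = s.split('_')
    return tuple(sorted([part_1, part_2]))

def group_pairs(lines):
    dd = collections.defaultdict(list)
    for s in lines:
        dd[key(s)].append(s)
    # sort inside each group so the tuple order does not depend on input order
    return [tuple(sorted(group)) for group in dd.values()]

lines = [
    'line_North_Mid', 'line_South_Mid',
    'line_North_South', 'line_Mid_South',
    'line_South_North', 'line_Mid_North',
]
result = group_pairs(lines)
print(result)
```

The order of the tuples themselves still follows the first appearance of each key in the input.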
I'm having trouble with itertools.groupby. Given a list of strings,
my_list= [
"AD01", "AD01AA", "AD01AB", "AD01AC", "AD01AD","AD02", "AD02AA", "AD02AB", "AD02AC"]
From this list, I want to create a list of dictionaries in which "Legacy" holds the short name and "rphy" holds the longer names that start with it.
Example:
[
{"Legacy" : "AD01", "rphy" : ["AD01AA", "AD01AB", "AD01AC", "AD01AD"]},
{"Legacy" : "AD02", "rphy" : ["AD02AA", "AD02AB", "AD02AC"]},
]
Could you help me, please?
You can use itertools.groupby, with some nexts:
from itertools import groupby
my_list= ["AD01", "AD01AA", "AD01AB", "AD01AC", "AD01AD","AD02", "AD02AA", "AD02AB", "AD02AC"]
groups = groupby(my_list, len)
output = [{'Legacy': next(g), 'rphy': list(next(groups)[1])} for _, g in groups]
print(output)
# [{'Legacy': 'AD01', 'rphy': ['AD01AA', 'AD01AB', 'AD01AC', 'AD01AD']},
# {'Legacy': 'AD02', 'rphy': ['AD02AA', 'AD02AB', 'AD02AC']}]
This is not robust to reordering of the input list.
Also, if there is some "gap" in the input, e.g., if "AD01" does not have corresponding 'rphy' entries, then it will throw a StopIteration error as you have found out. In that case you can use a more conventional approach:
my_list= ["AD01", "AD02", "AD02AA", "AD02AB", "AD02AC"]
output = []
for item in my_list:
    if len(item) == 4:
        # a 4-character name starts a new group
        dct = {'Legacy': item, 'rphy': []}
        output.append(dct)
    else:
        # longer names belong to the most recent group
        dct['rphy'].append(item)
print(output)
# [{'Legacy': 'AD01', 'rphy': []}, {'Legacy': 'AD02', 'rphy': ['AD02AA', 'AD02AB', 'AD02AC']}]
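If the input may also arrive in a different order, a dictionary keyed on the prefix avoids depending on position. This is only a sketch: it assumes (as the loop above does) that a legacy name is exactly 4 characters long and that every longer name starts with its legacy name. group_by_prefix and prefix_len are illustrative names.

```python
def group_by_prefix(names, prefix_len=4):
    groups = {}
    for name in names:
        prefix = name[:prefix_len]
        # Create the entry on first sight of a prefix. Note this also creates
        # an entry when only longer names (and no bare legacy name) are present.
        entry = groups.setdefault(prefix, {'Legacy': prefix, 'rphy': []})
        if len(name) > prefix_len:
            entry['rphy'].append(name)
    return list(groups.values())

shuffled = ["AD02AB", "AD01", "AD02", "AD02AA", "AD02AC"]
output = group_by_prefix(shuffled)
print(output)
```

Groups come out in the order their prefix first appears, so sort the result if a fixed order is needed.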
One approach would be: (see the note at the end of the answer)
from itertools import groupby
from pprint import pprint
my_list = [
"AD01",
"AD01AA",
"AD01AB",
"AD01AC",
"AD01AD",
"AD02",
"AD02AA",
"AD02AB",
"AD02AC",
]
res = []
for _, g in groupby(my_list, len):
    lst = list(g)
    if len(lst) == 1:
        # a run of length 1 is treated as a key
        res.append({"Legacy": lst[0], "rphy": []})
    else:
        # extend (not append) so that 'rphy' stays a flat list
        res[-1]["rphy"].extend(lst)
pprint(res)
output:
[{'Legacy': 'AD01', 'rphy': ['AD01AA', 'AD01AB', 'AD01AC', 'AD01AD']},
 {'Legacy': 'AD02', 'rphy': ['AD02AA', 'AD02AB', 'AD02AC']}]
This assumes that your data always starts with a key (the name that is shorter than the values following it).
In every iteration you check the length of the list that groupby produced. If it is 1, the item is a key; otherwise the items are added to the most recently created dictionary.
Note: This code breaks unless every key is followed by at least two longer names before the next key.
I have a list of strings that goes like this:
1;213;164
2;213;164
3;213;164
4;213;164
5;213;164
6;213;164
7;213;164
8;213;164
9;145;112
10;145;112
11;145;112
12;145;112
13;145;112
14;145;112
15;145;112
16;145;112
17;145;112
1001;1;151
1002;2;81
1003;3;171
1004;4;31
I would like to remove all duplicates where the last two numbers are the same. After running the list through the program, I would get something like this:
1;213;164
9;145;112
1001;1;151
1002;2;81
1003;3;171
1004;4;31
But something like
8;213;164
15;145;112
1001;1;151
1002;2;81
1003;3;171
1004;4;31
would also be correct.
Here is a nice and fast trick you can use (assuming l is your list):
list({ s.split(';', 1)[1] : s for s in l }.values())
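No imports are needed, and it runs in a single pass over the list. For each key it keeps the last matching string, which the question accepts as correct.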
In general you can define:
def custom_unique(L, keyfunc):
    return list({ keyfunc(li): li for li in L }.values())
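Because later entries overwrite earlier ones in a dict comprehension, custom_unique keeps the last item seen for each key. If the first item should win instead, dict.setdefault can be used. A small sketch, relying on dictionaries preserving insertion order (Python 3.7+); unique_first is a hypothetical name:

```python
def unique_first(items, keyfunc):
    seen = {}
    for item in items:
        # setdefault stores the item only if the key is not present yet
        seen.setdefault(keyfunc(item), item)
    return list(seen.values())

rows = ['1;213;164', '2;213;164', '9;145;112', '10;145;112', '1001;1;151']
first_rows = unique_first(rows, lambda s: s.split(';', 1)[1])
print(first_rows)
```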
You can group the items by this key and then use the first item in each group (assuming l is your list).
import itertools
keyfunc = lambda x: x.split(";", 1)[1]
[next(g) for k, g in itertools.groupby(sorted(l, key=keyfunc), keyfunc)]
Here is the code run on the first few items; just swap my list for yours:
x = [
'7;213;164',
'8;213;164',
'9;145;112',
'10;145;112',
'11;145;112',
]
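new_list = []
for i in x:
    check = True
    # everything from the first ';' onward, e.g. ';213;164'
    s_part = i[i.find(';'):]
    for j in new_list:
        # compare suffixes exactly; a substring test could match inside a longer number
        if j.endswith(s_part):
            check = False
    if check:
        new_list.append(i)
print(new_list)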
Output:
['7;213;164', '9;145;112']
I'm parsing through a response of XML using xpath from lxml library.
I'm getting the results and creating lists out of them like below:
object_name = [o.text for o in response.xpath('//*[name()="objectName"]')]
object_size_KB = [o.text for o in response.xpath('//*[name()="objectSize"]')]
I want to use the lists to create a dictionary per element in list and then add them to a final list like this:
[{'object_name': 'file1234', 'object_size_KB': 9347627},
{'object_name': 'file5671', 'object_size_KB': 9406875}]
I wanted a generator because I might need to search for more metadata in the response in the future, so I want my code to be future-proof and to reduce repetition:
meta_names = {
'object_name': '//*[name()="objectName"]',
'object_size_KB': '//*[name()="objectSize"]'
}
def parse_response(response, meta_names):
    """
    input: response: api xml response text from lxml xpath
    input: meta_names: key names used to generate dictionary per object
    return: list of objects dictionary
    """
    mylist = []
    # create list of each xpath match assign them to variables
    for key, value in meta_names.items():
        mylist.append({key: [o.text for o in response.xpath(value)]})
    return mylist
However the function gives me this:
[{'object_name': ['file1234', 'file5671']}, {'object_size_KB': ['9347627', '9406875']}]
I've been searching for a similar case in the forums but couldn't find something to match my needs.
Appreciate your help.
UPDATE: Renney's answer was what I wanted. I only adjusted the length passed to range: the number of xpath matches per key is not always the same as the number of keys, and since my lists always have identical lengths, I use the length of the first one (index [0]).
The function now looks like this:
def create_entries(root, keys):
    tmp = []
    for key in keys:
        tmp.append([o.text for o in root.xpath('//*[name()="' + key + '"]')])
    ret = []
    # print(len(tmp[0]))
    for i in range(len(tmp[0])):
        add = {}
        for j in range(len(keys)):
            add[keys[j]] = tmp[j][i]
        ret.append(add)
    return ret
Use a two dimensional array:
def createEntries(root, keys):
    # tmp[j] holds every match for keys[j]
    tmp = []
    for key in keys:
        tmp.append([o.text for o in root.xpath('//*[name()="' + key + '"]')])
    ret = []
    # one entry per match; assumes every key has the same number of matches
    for i in range(len(tmp[0])):
        add = {}
        for j in range(len(keys)):
            add[keys[j]] = tmp[j][i]
        ret.append(add)
    return ret
I think this is what you are looking for.
You can use zip to combine your two lists into a list of value pairs.
Then, you can use a list comprehension or a generator expression to pair your value pairs with your desired keys.
import pprint
object_name = ['file1234', 'file5671']
object_size = [9347627, 9406875]
# List Comprehension
obj_list = [{'object_name': name, 'object_size': size} for name,size in zip(object_name,object_size)]
pprint.pprint(obj_list)
print('\n')
# Generator Expression
generator = ({'object_name': name, 'object_size': size} for name,size in zip(object_name,object_size))
for obj in generator:
    print(obj)
Live Code Example -> https://onlinegdb.com/SyNSwd7jU
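The same zip idea extends to any number of keys. A rough sketch, assuming the per-key value lists have already been extracted (for example with the xpath calls from the question) and all have the same length; rows_from_columns and columns are names invented here:

```python
def rows_from_columns(columns):
    keys = list(columns)
    # zip(*...) walks the value lists in parallel, one tuple per object;
    # it stops at the shortest list, so unequal lengths are silently truncated
    return [dict(zip(keys, values)) for values in zip(*columns.values())]

columns = {
    'object_name': ['file1234', 'file5671'],
    'object_size_KB': ['9347627', '9406875'],
}
rows = rows_from_columns(columns)
print(rows)
```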
I think the accepted answer is more efficient, but here's an example of how list comprehensions could be used.
import pprint

meta_names = {
    'object_name': ['file1234', 'file5671'],
    'object_size_KB': ['9347627', '9406875'],
    'object_text': ['Bob', 'Ross']
}
def parse_response(meta_names):
    """
    input: meta_names: dict mapping each key name to its list of values
    prints: a list with one dictionary per object
    """
    # List comprehensions
    to_dict = lambda l: [{key:val for key,val in pairs} for pairs in l]
    # build [key, value] pairs per key, then zip them so each tuple describes one object
    objs = list(zip(*list([[key,val] for val in vals] for key,vals in meta_names.items())))
    pprint.pprint(to_dict(objs))
parse_response(meta_names)
Live Code -> https://onlinegdb.com/ryLq4PVjL
I'm trying to get the matching IDs and store the data into one list. I have a list of dictionaries:
list = [
{'id':'123','name':'Jason','location': 'McHale'},
{'id':'432','name':'Tom','location': 'Sydney'},
{'id':'123','name':'Jason','location':'Tompson Hall'}
]
Expected output would be something like
# {'id':'123','name':'Jason','location': ['McHale', 'Tompson Hall']},
# {'id':'432','name':'Tom','location': 'Sydney'},
How can I get matching data based on dict ID value? I've tried:
for item in mylist:
    list2 = []
    row = any(list['id'] == list.id for id in list)
    list2.append(row)
This doesn't work (it throws: TypeError: tuple indices must be integers or slices, not str). How can I get all items with the same ID and store into one dict?
First, you're iterating through the list of dictionaries in your for loop, but never referencing the dictionaries, which you're storing in item. I think when you wrote list['id'] you meant item['id'].
Second, any() returns a boolean (True or False), which isn't what you want. Instead, maybe try row = [dic for dic in list if dic['id'] == item['id']]
Third, if you define list2 within your for loop, it will go away every iteration. Move list2 = [] before the for loop.
That should give you a good start. Remember that row is just a list of all dictionaries that have the same id.
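Putting the three points together, the loop might look like the sketch below. It uses mylist rather than list for the input so that the builtin is not shadowed; that naming is a choice made here, not taken from the question.

```python
mylist = [
    {'id': '123', 'name': 'Jason', 'location': 'McHale'},
    {'id': '432', 'name': 'Tom', 'location': 'Sydney'},
    {'id': '123', 'name': 'Jason', 'location': 'Tompson Hall'},
]

list2 = []  # created once, before the loop
for item in mylist:
    # all dictionaries sharing this item's id (including the item itself)
    row = [dic for dic in mylist if dic['id'] == item['id']]
    list2.append(row)

print([len(row) for row in list2])
```

Each id that occurs more than once produces the same row several times, so a merging step is still needed afterwards.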
I would use kdopen's approach along with a merging method after converting the dictionary entries I expect to become lists into lists. Of course if you want to avoid redundancy then make them sets.
mylist = [
{'id':'123','name':['Jason'],'location': ['McHale']},
{'id':'432','name':['Tom'],'location': ['Sydney']},
{'id':'123','name':['Jason'],'location':['Tompson Hall']}
]
def merge(mylist,ID):
    matches = [d for d in mylist if d['id']== ID]
    shell = {'id':ID,'name':[],'location':[]}
    for m in matches:
        shell['name']+=m['name']
        shell['location']+=m['location']
        mylist.remove(m)
    mylist.append(shell)
    return mylist
updated_list = merge(mylist,'123')
Given this input
mylist = [
{'id':'123','name':'Jason','location': 'McHale'},
{'id':'432','name':'Tom','location': 'Sydney'},
{'id':'123','name':'Jason','location':'Tompson Hall'}
]
You can just extract it with a comprehension
matched = [d for d in mylist if d['id'] == '123']
Then you want to merge the locations. Assuming matched is not empty
final = matched[0]
final['location'] = [d['location'] for d in matched]
Here it is in the interpreter
In [1]: mylist = [
...: {'id':'123','name':'Jason','location': 'McHale'},
...: {'id':'432','name':'Tom','location': 'Sydney'},
...: {'id':'123','name':'Jason','location':'Tompson Hall'}
...: ]
In [2]: matched = [d for d in mylist if d['id'] == '123']
In [3]: final=matched[0]
In [4]: final['location'] = [d['location'] for d in matched]
In [5]: final
Out[5]: {'id': '123', 'location': ['McHale', 'Tompson Hall'], 'name': 'Jason'}
Obviously, you'd want to replace '123' with a variable holding the desired id value.
Wrapping it all up in a function:
def merge_all(df):
    ids = {d['id'] for d in df}
    result = []
    for id_ in ids:  # id_ avoids shadowing the builtin id()
        matches = [d for d in df if d['id'] == id_]
        combined = matches[0]
        combined['location'] = [d['location'] for d in matches]
        result.append(combined)
    return result
Also, please don't use list as a variable name. It shadows the builtin list class.
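A single pass with a dictionary keyed on id avoids rescanning the list for every id and leaves the input dictionaries untouched. This is a sketch under the assumption that name is the same for all records sharing an id; merge_by_id is an illustrative name.

```python
def merge_by_id(records):
    merged = {}
    for rec in records:
        # the first time an id is seen, start a fresh entry with an empty location list
        entry = merged.setdefault(
            rec['id'], {'id': rec['id'], 'name': rec['name'], 'location': []}
        )
        entry['location'].append(rec['location'])
    return list(merged.values())

mylist = [
    {'id': '123', 'name': 'Jason', 'location': 'McHale'},
    {'id': '432', 'name': 'Tom', 'location': 'Sydney'},
    {'id': '123', 'name': 'Jason', 'location': 'Tompson Hall'},
]
merged = merge_by_id(mylist)
print(merged)
```

Unlike the expected output in the question, every location here is a list, even when an id occurs only once.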
I have some variables and I need to compare each of them and fill three lists according to the comparison: if the var == 1, add a 1 to lista_a; if var == 2, add a 1 to lista_b; and so on. For example:
inx0=2 inx1=1 inx2=1 inx3=1 inx4=4 inx5=3 inx6=1 inx7=1 inx8=3 inx9=1
inx10=2 inx11=1 inx12=1 inx13=1 inx14=4 inx15=3 inx16=1 inx17=1 inx18=3 inx19=1
inx20=2 inx21=1 inx22=1 inx23=1 inx24=2 inx25=3 inx26=1 inx27=1 inx28=3 inx29=1
lista_a=[]
lista_b=[]
lista_c=[]
#this example is the comparison for the first variable inx0
#and the same for inx1, inx2, etc...
for k in range(1,30):
    if inx0==1:
        lista_a.append(1)
    elif inx0==2:
        lista_b.append(1)
    elif inx0==3:
        lista_c.append(1)
I need get:
#lista_a = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
#lista_b = [1,1,1]
#lista_c = [1]
Your inx* variables should almost certainly be a list to begin with:
inx = [2,1,1,1,4,3,1,1,3,1,2,1,1,1,4,3,1,1,3,1,2,1,1,1,2,3,1,1,3,1]
Then, to find out how many 2's it has:
inx.count(2)
If you must, you can build a new list out of that:
list_a = [1]*inx.count(1)
list_b = [1]*inx.count(2)
list_c = [1]*inx.count(3)
but it seems silly to keep a list of ones. Really the only data you need to keep is a single integer (the count), so why bother carrying around a list?
An alternate approach to get the lists of ones would be to use a defaultdict:
from collections import defaultdict
d = defaultdict(list)
for item in inx:
    d[item].append(1)
In this case, what you want as list_a can be accessed as d[1], list_b as d[2], etc.
Or, as stated in the comments, you could get the counts using a collections.Counter:
from collections import Counter #python2.7+
counts = Counter(inx)
list_a = [1]*counts[1]
list_b = [1]*counts[2]
...
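For reference, a complete run of the Counter variant on the inx list from above. The variable names follow the answer; list_c is filled in by analogy with list_a and list_b.

```python
from collections import Counter

inx = [2, 1, 1, 1, 4, 3, 1, 1, 3, 1,
       2, 1, 1, 1, 4, 3, 1, 1, 3, 1,
       2, 1, 1, 1, 2, 3, 1, 1, 3, 1]

counts = Counter(inx)  # maps each value to how often it occurs
list_a = [1] * counts[1]
list_b = [1] * counts[2]
list_c = [1] * counts[3]

print(len(list_a), len(list_b), len(list_c))  # 18 4 6
```

A Counter returns 0 for values that never occur, so a missing value simply yields an empty list.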