I am storing the output of a find -exec stat command in a list and need to condense it in order to run du on it. Each item in the list is (username, /path/to/file).
I want to condense it by combining entries with the same username, so the end result is (username, /path/to/file1, /path/to/file2, etc.).
Is there any way to do this?
Here is the current code that builds my list:
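from subprocess import Popen, PIPE
cmd = ['find', dir_loc, '-type', 'f', '-exec', 'stat', '-c', '%U %n', '{}', '+']
process = Popen(cmd, stdout=PIPE, universal_newlines=True)
stdout, _ = process.communicate()  # waits for find to finish; returns (stdout, stderr)
exit_code = process.returncode
out_tuple = []
for item in stdout.splitlines():
    if item:  # skip blank lines
        # split on the first space only, so paths that contain spaces stay whole
        out_tuple.append(item.split(' ', 1))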
Assuming you have a list of tuples or a list of lists of the form:
out_tuple = [('user_one', 'path_one'),
('user_three', 'path_seven'),
('user_two', 'path_five'),
('user_one', 'path_two'),
('user_one', 'path_three'),
('user_two', 'path_four')]
You can do:
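from itertools import groupby
# sort by user name only; the sort is stable, so each user's paths keep their original order
out_tuple.sort(key=lambda x: x[0])
total_grouped = []
for key, group in groupby(out_tuple, lambda x: x[0]):
    grouped_list = [key] + [x[1] for x in group]
    total_grouped.append(tuple(grouped_list))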
This will give you the list of tuples:
print(total_grouped)
# Prints:
# [('user_one', 'path_one', 'path_two', 'path_three'),
# ('user_three', 'path_seven'),
# ('user_two', 'path_five', 'path_four')]
If you started with a list of lists, then instead of:
total_grouped.append(tuple(grouped_list))
You can get rid of the tuple construction:
total_grouped.append(grouped_list)
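Sorting first matters because itertools.groupby only merges consecutive items that share a key. The short sketch below uses made-up data (and a hypothetical helper, group_pairs) to show the difference; sorting on the user name alone with key= keeps each user's paths in their original order, because Python's sort is stable.

```python
from itertools import groupby
from operator import itemgetter

# Made-up (user, path) pairs; user_one appears in two separate runs.
pairs = [('user_one', 'a'), ('user_two', 'b'), ('user_one', 'c')]

def group_pairs(items):
    """Hypothetical helper: one (user, path, path, ...) tuple per consecutive run of a user."""
    return [(user,) + tuple(path for _, path in run)
            for user, run in groupby(items, key=itemgetter(0))]

unsorted_groups = group_pairs(pairs)
# Stable sort on the user name only: paths keep their original relative order.
sorted_groups = group_pairs(sorted(pairs, key=itemgetter(0)))

print(unsorted_groups)  # [('user_one', 'a'), ('user_two', 'b'), ('user_one', 'c')]
print(sorted_groups)    # [('user_one', 'a', 'c'), ('user_two', 'b')]
```

Without the sort, user_one ends up split across two tuples.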
I'll say one thing, though: you might be better off using something like a dict, as BradBeattie suggests. If you're going to perform some operation later on that treats the first item in your tuple (or list) in a special way, then a dict is better.
Not only does it enforce unique keys, it's also less cumbersome because the nesting has two distinct levels: first you have the dict, then you have the inner item, which is a tuple (or a list). This is much clearer than having two similar collections nested one inside the other.
Just use a dict of lists:
out_tuple = [('user1', 'path1'),
('user1', 'path2'),
('user2', 'path3'),
('user1', 'path4'),
('user2', 'path5'),
('user1', 'path6')]
d = {}
for user_name, path in out_tuple:
    # setdefault inserts an empty list the first time a user name appears
    d.setdefault(user_name, []).append(path)
print(d)
Prints (Python 3.7+, where dicts keep insertion order):
{'user1': ['path1', 'path2', 'path4', 'path6'], 'user2': ['path3', 'path5']}
Then if you want the output for each user name as a tuple:
for user_name in d:
    print(tuple([user_name] + d[user_name]))
Prints:
('user1', 'path1', 'path2', 'path4', 'path6')
('user2', 'path3', 'path5')
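To get from this dict to the du step in the question, one possible sketch (an assumption, not part of either answer) builds one du argument list per user. It assumes a du that supports -c (print a grand total), as GNU and BSD du do, and uses -- so that paths starting with - are not read as options. The helper name build_du_commands is hypothetical, and the commands are only built here, not run.

```python
def build_du_commands(files_by_user):
    """Hypothetical helper: map each user name to a du argument list for that user's files.

    '-c' asks du for a grand-total line; '--' ends option parsing so that
    file names beginning with '-' are treated as paths.
    """
    return {user: ['du', '-c', '--'] + list(paths)
            for user, paths in files_by_user.items()}

d = {'user1': ['path1', 'path2', 'path4', 'path6'], 'user2': ['path3', 'path5']}
commands = build_du_commands(d)
for user, cmd in commands.items():
    print(user, cmd)
```

Each list could then be passed to subprocess.run(cmd, stdout=subprocess.PIPE). With a very large number of files, a single command line can exceed the system's argument-length limit, so the paths may need to be split into batches.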
I'm parsing an XML response using XPath from the lxml library.
I'm getting the results and creating lists from them like this:
object_name = [o.text for o in response.xpath('//*[name()="objectName"]')]
object_size_KB = [o.text for o in response.xpath('//*[name()="objectSize"]')]
I want to use the lists to create a dictionary per element in list and then add them to a final list like this:
[{'object_name': 'file1234', 'object_size_KB': 9347627},
 {'object_name': 'file5671', 'object_size_KB': 9406875}]
I wanted a generator because I might need to extract more metadata from the response in the future, so I want my code to be future-proof and avoid repetition:
meta_names = {
'object_name': '//*[name()="objectName"]',
'object_size_KB': '//*[name()="objectSize"]'
}
def parse_response(response, meta_names):
    """
    input: response: api xml response text from lxml xpath
    input: meta_names: key names used to generate dictionary per object
    return: list of objects dictionary
    """
    mylist = []
    # create list of each xpath match assign them to variables
    for key, value in meta_names.items():
        mylist.append({key: [o.text for o in response.xpath(value)]})
    return mylist
However the function gives me this:
[{'object_name': ['file1234', 'file5671']}, {'object_size_KB': ['9347627', '9406875']}]
I've been searching for a similar case in the forums but couldn't find something to match my needs.
Appreciate your help.
UPDATE: Renney's answer was what I wanted. I just adjusted the length passed to range(): the number of xpath results per key isn't always the same as the number of keys, and since my result lists always have identical lengths, I used the length of the first one, tmp[0].
Now the function looks like this:
def create_entries(root, keys):
    tmp = []
    for key in keys:
        tmp.append([o.text for o in root.xpath('//*[name()="' + key + '"]')])
    ret = []
    # print(len(tmp[0]))
    for i in range(len(tmp[0])):
        add = {}
        for j in range(len(keys)):
            add[keys[j]] = tmp[j][i]
        ret.append(add)
    return ret
Use a two-dimensional array:
def createEntries(root, keys):
    tmp = []
    for key in keys:
        tmp.append([o.text for o in root.xpath('//*[name()="' + key + '"]')])
    ret = []
    # one entry per matched object: iterate over the length of a result list, not over the keys
    for i in range(len(tmp[0])):
        add = {}
        for j in range(len(keys)):
            add[keys[j]] = tmp[j][i]
        ret.append(add)
    return ret
I think this is what you are looking for.
You can use zip to combine your two lists into a list of value pairs.
Then, you can use a list comprehension or a generator expression to pair your value pairs with your desired keys.
import pprint
object_name = ['file1234', 'file5671']
object_size = [9347627, 9406875]
# Desired output:
# [{'object_name': 'file1234', 'object_size_KB': 9347627},
#  {'object_name': 'file5671', 'object_size_KB': 9406875}]
# Current output from the question:
# [{'object_name': ['file1234', 'file5671']}, {'object_size_KB': ['9347627', '9406875']}]
# List Comprehension
obj_list = [{'object_name': name, 'object_size_KB': size} for name, size in zip(object_name, object_size)]
pprint.pprint(obj_list)
print('\n')
# Generator Expression
generator = ({'object_name': name, 'object_size_KB': size} for name, size in zip(object_name, object_size))
for obj in generator:
    print(obj)
Live Code Example -> https://onlinegdb.com/SyNSwd7jU
I think the accepted answer is more efficient, but here's an example of how list comprehensions could be used.
import pprint
meta_names = {
'object_name': ['file1234', 'file5671'],
'object_size_KB': ['9347627', '9406875'],
'object_text': ['Bob', 'Ross']
}
def parse_response(meta_names):
    """
    input: meta_names: dict mapping each key name to the list of values extracted for it
    return: None; pretty-prints the list of per-object dictionaries
    """
    # List comprehensions
    to_dict = lambda l: [{key: val for key, val in pairs} for pairs in l]
    # one [key, val] list per value, regrouped by zip(*) into one row per object
    objs = list(zip(*list([[key, val] for val in vals] for key, vals in meta_names.items())))
    pprint.pprint(to_dict(objs))
parse_response(meta_names)
Live Code -> https://onlinegdb.com/ryLq4PVjL
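Since the question asks for a generator and something future-proof, here is one more sketch (not from any of the answers) that yields one dict per object for any number of keys. It assumes every key's result list has the same length, because zip() silently stops at the shortest list. The name iter_objects is made up, and meta_values stands in for the already-extracted XPath results.

```python
def iter_objects(columns):
    """Yield one dict per object from a {key: [value, value, ...]} mapping.

    zip(*columns.values()) turns the per-key lists into per-object rows;
    zip(keys, row) then pairs each value with its key.
    """
    keys = list(columns)
    for row in zip(*columns.values()):
        yield dict(zip(keys, row))

meta_values = {
    'object_name': ['file1234', 'file5671'],
    'object_size_KB': ['9347627', '9406875'],
}
objects = list(iter_objects(meta_values))
# objects is now:
# [{'object_name': 'file1234', 'object_size_KB': '9347627'},
#  {'object_name': 'file5671', 'object_size_KB': '9406875'}]
```

With lxml, the input could be built from the question's meta_names as {key: [o.text for o in response.xpath(path)] for key, path in meta_names.items()}, so adding a new field only means adding a new entry to meta_names.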
I have the following list:
lines = ['line_North_Mid', 'line_South_Mid',
         'line_North_South', 'line_Mid_South',
         'line_South_North', 'line_Mid_North']
I would like to couple them in a tuple list as follows, with respect to their names:
tuple_list = [('line_Mid_North', 'line_North_Mid'),
              ('line_North_South', 'line_South_North'),
              ('line_Mid_South', 'line_South_Mid')]
I thought I could do a string search on the elements of lines, but it won't be efficient. Is there a better way to arrange the elements of lines so that they look like tuple_list?
Pairing criteria:
Two elements are paired if they contain the same two area names ('North', 'Mid', 'South'), in either order.
E.g.: 'line_North_Mid' should be coupled with 'line_Mid_North'
Try this:
from itertools import combinations
tuple_list = [i for i in combinations(lines,2) if i[0].split('_')[1] == i[1].split('_')[2] and i[0].split('_')[2] == i[1].split('_')[1]]
or I think this is better:
[i for i in combinations(lines,2) if i[0].split('_')[1:] == i[1].split('_')[1:][::-1]]
An order-agnostic O(n) solution is possible using collections.defaultdict. The idea is to use the last two '_'-delimited components of each string as the dictionary key (as a frozenset, so their order doesn't matter) and to append each string from your input list under its key. Then extract the values and convert them to a list of tuples.
from collections import defaultdict
L = ['line_North_Mid', 'line_South_Mid',
'line_North_South', 'line_Mid_South',
'line_South_North', 'line_Mid_North']
dd = defaultdict(list)
for item in L:
    dd[frozenset(item.rsplit('_', maxsplit=2)[1:])].append(item)
res = list(map(tuple, dd.values()))
# [('line_North_Mid', 'line_Mid_North'),
# ('line_South_Mid', 'line_Mid_South'),
# ('line_North_South', 'line_South_North')]
You can use the following list comprehension:
lines = ['line_Mid_North', 'line_North_Mid',
'line_North_South', 'line_South_North',
'line_Mid_South', 'line_South_Mid']
[(i, j) for i in lines for j in lines if j not in i
 if set(j.split('_')[1:]) < set(i.split('_'))][::2]
[('line_Mid_North', 'line_North_Mid'),
('line_North_South', 'line_South_North'),
('line_Mid_South', 'line_South_Mid')]
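The [::2] slice above relies on both orderings of each pair coming out next to each other. A sketch that does not depend on that (an alternative technique, not from the answer): for each name, build its partner by swapping the last two parts and look the partner up in a set. It assumes every name has exactly three '_'-separated parts; pair_lines is a made-up name.

```python
def pair_lines(lines):
    """Pair 'prefix_A_B' with 'prefix_B_A'; names without a partner are skipped."""
    remaining = set(lines)
    pairs = []
    for name in lines:
        if name not in remaining:
            continue  # already used as the partner of an earlier name
        prefix, first, second = name.split('_')
        partner = '_'.join([prefix, second, first])
        if partner in remaining and partner != name:
            remaining.discard(name)
            remaining.discard(partner)
            pairs.append((name, partner))
    return pairs

lines = ['line_North_Mid', 'line_South_Mid',
         'line_North_South', 'line_Mid_South',
         'line_South_North', 'line_Mid_North']
print(pair_lines(lines))
# [('line_North_Mid', 'line_Mid_North'),
#  ('line_South_Mid', 'line_Mid_South'),
#  ('line_North_South', 'line_South_North')]
```

Each name is split once and each partner check is a set lookup, so the work grows linearly with the number of names.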
I suggest writing a function that returns the same key for strings that are supposed to go together (a grouping key).
def key(s):
    # ignore first part and sort other 2 parts, so they will always be in same order
    _, part_1, part_2 = s.split('_')
    return tuple(sorted([part_1, part_2]))
Then you need some grouping method; I used defaultdict as an example:
import collections
lines = [
'line_North_Mid', 'line_South_Mid',
'line_North_South', 'line_Mid_South',
'line_South_North','line_Mid_North',
]
dd = collections.defaultdict(list)
for s in lines:
    dd[key(s)].append(s)  # those with same key get grouped
print(list(tuple(v) for v in dd.values()))
# [
# ('line_North_Mid', 'line_Mid_North'),
# ('line_South_Mid', 'line_Mid_South'),
# ('line_North_South', 'line_South_North'),
# ]
I found a similar question, but I'm not able to adapt the answer to my needs.
(Find if value exists in multiple lists)
Basically, I have multiple lists, and I want to list all of those that contain the current user's username.
import getpass
value = getpass.getuser()
rep_WOHTEL = ['user1','user2','user3']
rep_REPDAY = ['user4','user1','user3']
rep_ZARKGL = ['user3','user1','user2']
rep_WOHOPL = ['user3','user2','user5']
#No idea how code below works
w = next(n for n,v in filter(lambda t: isinstance(t[1],list) and t[0].startswith('rep_'), globals().items()) if value in v)
print(w)
If the current user is user1, I want it to print rep_WOHTEL, rep_REPDAY and rep_ZARKGL. The code above prints only one of them.
How should I change this part of script, to print all I want?
Like I commented in the linked question, iterating through all of globals() or locals() is a bad idea. Store your lists together in a single dictionary or list, and iterate through that instead.
value = "user1"
named_lists = {
"WOHTEL": ['user1','user2','user3'],
"REPDAY": ['user4','user1','user3'],
"ZARKGL": ['user3','user1','user2'],
"WOHOPL": ['user3','user2','user5']
}
names = [name for name, seq in named_lists.items() if value in seq]
print(names)
Result (on Python 3.7+, where dicts preserve insertion order):
['WOHTEL', 'REPDAY', 'ZARKGL']
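If you need this answer for many users, one option (a sketch, not part of the answer) is to invert the dictionary once into a user-to-list-names index, so that each later query is a single dict lookup. named_lists is the dictionary from the answer; build_user_index is a made-up helper name.

```python
from collections import defaultdict

def build_user_index(named_lists):
    """Hypothetical helper: map each user to the names of every list that contains them."""
    index = defaultdict(list)
    for list_name, users in named_lists.items():
        for user in users:
            index[user].append(list_name)
    return index

named_lists = {
    "WOHTEL": ['user1', 'user2', 'user3'],
    "REPDAY": ['user4', 'user1', 'user3'],
    "ZARKGL": ['user3', 'user1', 'user2'],
    "WOHOPL": ['user3', 'user2', 'user5'],
}
index = build_user_index(named_lists)
print(index['user1'])           # ['WOHTEL', 'REPDAY', 'ZARKGL']
print(index.get('nobody', []))  # []
```

Using .get() for unknown users avoids inserting empty entries into the defaultdict.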
This checks every global list whose name starts with rep_ and, if it contains the required value, prints that list's name.
Code:
rep_WOHTEL = ['user1','user2','user3']
rep_REPDAY = ['user4','user1','user3']
rep_ZARKGL = ['user3','user1','user2']
rep_WOHOPL = ['user3','user2','user5']
value = 'user1'
x = list(globals().items())  # snapshot: the loop below adds n and v to globals()
for n, v in filter(lambda t: isinstance(t[1], list) and t[0].startswith('rep_'), x):
    if value in v:
        print(n)
Output:
rep_WOHTEL
rep_REPDAY
rep_ZARKGL
More info about the functions used:
globals()
dict.items()
filter()
isinstance()
startswith()
I have an application that creates a list of lists. The second element of each inner list needs to be assigned using a lookup list, which also consists of a list of lists.
I have used the all() function to match the values in the list. If the list value exists in the lookup list, it should update the second element in the new list. However, this is not the case: the == comparison yields False for all elements, even though they all exist in both lists.
I have also tried various combinations of index-finding commands, but they are not able to unpack the values of each list.
My code is below. The goal is to replace the "xxx" values in the newData with the numbers in the lookupList.
lookupList= [['Garry','34'],['Simon', '24'] ,['Louise','13'] ]
newData = [['Louise','xxx'],['Garry', 'xxx'] ,['Simon','xxx'] ]
#Matching values
for i in newData:
    if (all(i[0] == elem[0] for elem in lookupList)):
        i[1] = elem[1]
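You can't do what you want with all(). First, elem is not a local variable outside of the generator expression, so it isn't available on the next line. Second, all() is True only if every entry in lookupList has the same name as i[0], which never happens when the names differ; that is why every check comes out False.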
Instead of using a list, use a dictionary to store the lookupList:
lookupDict = dict(lookupList)
and looking up matches is a simple constant-time (fast) lookup:
for entry in newData:
    if entry[0] in lookupDict:
        entry[1] = lookupDict[entry[0]]
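If you would rather keep lookupList as a list, next() with a default is one way to fetch the matching entry itself (a sketch; it scans the whole list for every record, so the dictionary above is faster for large inputs):

```python
lookupList = [['Garry', '34'], ['Simon', '24'], ['Louise', '13']]
newData = [['Louise', 'xxx'], ['Garry', 'xxx'], ['Simon', 'xxx']]

for entry in newData:
    # First lookup row whose name matches, or None if there is none.
    match = next((row for row in lookupList if row[0] == entry[0]), None)
    if match is not None:
        entry[1] = match[1]

print(newData)  # [['Louise', '13'], ['Garry', '34'], ['Simon', '24']]
```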
You should use dictionaries instead, like this:
# two separate dicts; a chained assignment (a = b = {}) would make both names refer to the same dict
lookupList = {}
newData = {}
old_lookupList = [['Garry','34'],['Simon', '24'] ,['Louise','13'] ]
old_newData = [['Louise','xxx'],['Garry', 'xxx'] ,['Simon','xxx'] ]
#convert into dictionary
for e in old_newData: newData[e[0]] = e[1]
for e in old_lookupList: lookupList[e[0]] = e[1]
#Matching values
for key in lookupList:
    if key in newData:
        newData[key] = lookupList[key]
#convert into list
output_list = []
for x in newData:
    output_list.append([x, newData[x]])
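If you don't need to update the data in place, the same dictionary idea fits in two lines. This sketch builds a new list in the original order and, through dict.get's default, keeps 'xxx' for any name missing from the lookup ('Peter' here is made-up test data):

```python
old_lookupList = [['Garry', '34'], ['Simon', '24'], ['Louise', '13']]
old_newData = [['Louise', 'xxx'], ['Garry', 'xxx'], ['Peter', 'xxx']]

lookup = dict(old_lookupList)  # {'Garry': '34', 'Simon': '24', 'Louise': '13'}
# Keep the existing value when a name has no entry in the lookup.
output_list = [[name, lookup.get(name, value)] for name, value in old_newData]
print(output_list)  # [['Louise', '13'], ['Garry', '34'], ['Peter', 'xxx']]
```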
I like the following code since it can be tweaked and used in different ways:
lookupList= [ ['Garry', '34'],['Simon', '24'] ,['Louise', '13'] ]
newData = [ ['Louise', 'xxx'],['Garry', 'xxx'], ['Peter', 'xxx'] ,['Simon', 'xxx'] ]
#Matching values
for R in newData:
    for entry in lookupList:
        if entry[0] == R[0]:
            R[1] = entry[1]
            break
    else:
        # the loop finished without break: no lookup entry matched this record
        print('Lookup fail on record:', R)
print(newData)
I'm trying to get the matching IDs and store the data into one list. I have a list of dictionaries:
list = [
{'id':'123','name':'Jason','location': 'McHale'},
{'id':'432','name':'Tom','location': 'Sydney'},
{'id':'123','name':'Jason','location':'Tompson Hall'}
]
Expected output would be something like
# {'id':'123','name':'Jason','location': ['McHale', 'Tompson Hall']},
# {'id':'432','name':'Tom','location': 'Sydney'},
How can I get matching data based on dict ID value? I've tried:
for item in mylist:
    list2 = []
    row = any(list['id'] == list.id for id in list)
    list2.append(row)
This doesn't work (it throws: TypeError: tuple indices must be integers or slices, not str). How can I get all items with the same ID and store into one dict?
First, you're iterating through the list of dictionaries in your for loop but never referencing the dictionaries, which you're storing in item. I think when you wrote list['id'] you meant item['id'].
Second, any() returns a boolean (True or False), which isn't what you want. Instead, maybe try row = [dic for dic in list if dic['id'] == item['id']]
Third, if you define list2 within your for loop, it is reset on every iteration. Move list2 = [] before the for loop.
That should give you a good start. Remember that row is just a list of all dictionaries that have the same id.
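Putting those three points together, the loop could look like the sketch below. It uses mylist instead of list to avoid shadowing the built-in. Note that every item gets its own row, so an id that appears twice produces the same group twice.

```python
mylist = [
    {'id': '123', 'name': 'Jason', 'location': 'McHale'},
    {'id': '432', 'name': 'Tom', 'location': 'Sydney'},
    {'id': '123', 'name': 'Jason', 'location': 'Tompson Hall'},
]

list2 = []  # created once, before the loop
for item in mylist:
    # every dictionary sharing this item's id (including item itself)
    row = [dic for dic in mylist if dic['id'] == item['id']]
    list2.append(row)

print(len(list2))     # 3
print(len(list2[0]))  # 2: both '123' dictionaries
```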
I would use kdopen's approach along with a merging method, after converting the dictionary entries I expect to become lists into lists. Of course, if you want to avoid redundancy, make them sets.
mylist = [
{'id':'123','name':['Jason'],'location': ['McHale']},
{'id':'432','name':['Tom'],'location': ['Sydney']},
{'id':'123','name':['Jason'],'location':['Tompson Hall']}
]
def merge(mylist, ID):
    matches = [d for d in mylist if d['id'] == ID]
    shell = {'id': ID, 'name': [], 'location': []}
    for m in matches:
        shell['name'] += m['name']
        shell['location'] += m['location']
        mylist.remove(m)
    mylist.append(shell)
    return mylist
updated_list = merge(mylist,'123')
Given this input
mylist = [
{'id':'123','name':'Jason','location': 'McHale'},
{'id':'432','name':'Tom','location': 'Sydney'},
{'id':'123','name':'Jason','location':'Tompson Hall'}
]
You can just extract it with a comprehension:
matched = [d for d in mylist if d['id'] == '123']
Then you want to merge the locations. Assuming matched is not empty:
final = matched[0]
final['location'] = [d['location'] for d in matched]
Here it is in the interpreter
In [1]: mylist = [
...: {'id':'123','name':'Jason','location': 'McHale'},
...: {'id':'432','name':'Tom','location': 'Sydney'},
...: {'id':'123','name':'Jason','location':'Tompson Hall'}
...: ]
In [2]: matched = [d for d in mylist if d['id'] == '123']
In [3]: final=matched[0]
In [4]: final['location'] = [d['location'] for d in matched]
In [5]: final
Out[5]: {'id': '123', 'location': ['McHale', 'Tompson Hall'], 'name': 'Jason'}
Obviously, you'd want to replace '123' with a variable holding the desired id value.
Wrapping it all up in a function:
def merge_all(df):
    ids = {d['id'] for d in df}
    result = []
    for id_ in ids:  # id_ avoids shadowing the built-in id()
        matches = [d for d in df if d['id'] == id_]
        combined = dict(matches[0])  # copy, so the input dicts are not modified
        combined['location'] = [d['location'] for d in matches]
        result.append(combined)
    return result
Also, please don't use list as a variable name. It shadows the builtin list class.
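merge_all scans the whole list once per distinct id. A single-pass variant (a sketch, not from the answer; merge_by_id is a made-up name) keeps one merged dict per id in a regular dict, which preserves first-seen order on Python 3.7+. Like the question's expected output, it leaves location as a plain string for ids that occur only once, and it copies each first dict so the input is not modified.

```python
def merge_by_id(records):
    """Merge dicts sharing an 'id', collecting their 'location' values into a list."""
    merged = {}
    for rec in records:
        key = rec['id']
        if key not in merged:
            merged[key] = dict(rec)  # shallow copy: input dicts stay untouched
        else:
            current = merged[key]['location']
            if not isinstance(current, list):
                current = [current]
            merged[key]['location'] = current + [rec['location']]
    return list(merged.values())

mylist = [
    {'id': '123', 'name': 'Jason', 'location': 'McHale'},
    {'id': '432', 'name': 'Tom', 'location': 'Sydney'},
    {'id': '123', 'name': 'Jason', 'location': 'Tompson Hall'},
]
result = merge_by_id(mylist)
# result is:
# [{'id': '123', 'name': 'Jason', 'location': ['McHale', 'Tompson Hall']},
#  {'id': '432', 'name': 'Tom', 'location': 'Sydney'}]
```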