Merge nested list items based on a repeating value

Merge nested list items based on a repeating value - python

Although poorly written, this code:
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
marker_array_DS = []
for i in range(len(marker_array)):
if marker_array[i-1][1] != marker_array[i][1]:
marker_array_DS.append(marker_array[i])
print marker_array_DS
Returns:
[['hard', '2', 'soft'], ['fast', '3'], ['turtle', '4', 'wet']]
It accomplishes part of the task which is to create a new list containing all nested lists except those that have duplicate values in index [1]. But what I really need is to concatenate the matching index values from the removed lists creating a list like this:
[['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
The values in index [1] must not be concatenated. I kind of managed to do the concatenation part using a tip from another post:
newlist = [i + n for i, n in zip(list_a, list_b]
But I am struggling with figuring out the way to produce the desired result. The "marker_array" list will be already sorted in ascending order before being passed to this code. All like-values in index [1] position will be contiguous. Some nested lists may not have any values beyond [0] and [1] as illustrated above.

Quick stab at it... use itertools.groupby to do the grouping for you, but do it over a generator that converts the 2 element list into a 3 element.
from itertools import groupby
from operator import itemgetter
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
def my_group(iterable):
temp = ((el + [''])[:3] for el in marker_array)
for k, g in groupby(temp, key=itemgetter(1)):
fst, snd = map(' '.join, zip(*map(itemgetter(0, 2), g)))
yield filter(None, [fst, k, snd])
print list(my_group(marker_array))

from collections import defaultdict
d1 = defaultdict(list)
d2 = defaultdict(list)
for pxa in marker_array:
d1[pxa[1]].extend(pxa[:1])
d2[pxa[1]].extend(pxa[2:])
res = [[' '.join(d1[x]), x, ' '.join(d2[x])] for x in sorted(d1)]
If you really need 2-tuples (which I think is unlikely):
for p in res:
if not p[-1]:
p.pop()

marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
marker_array_DS = []
marker_array_hit = []
for i in range(len(marker_array)):
if marker_array[i][1] not in marker_array_hit:
marker_array_hit.append(marker_array[i][1])
for i in marker_array_hit:
lists = [item for item in marker_array if item[1] == i]
temp = []
first_part = ' '.join([str(item[0]) for item in lists])
temp.append(first_part)
temp.append(i)
second_part = ' '.join([str(item[2]) for item in lists if len(item) > 2])
if second_part != '':
temp.append(second_part);
marker_array_DS.append(temp)
print marker_array_DS
I learned python for this because I'm a shameless rep whore

marker_array = [
['hard','2','soft'],
['heavy','2','light'],
['rock','2','feather'],
['fast','3'],
['turtle','4','wet'],
]
data = {}
for arr in marker_array:
if len(arr) == 2:
arr.append('')
(first, index, last) = arr
firsts, lasts = data.setdefault(index, [[],[]])
firsts.append(first)
lasts.append(last)
results = []
for key in sorted(data.keys()):
current = [
" ".join(data[key][0]),
key,
" ".join(data[key][1])
]
if current[-1] == '':
current = current[:-1]
results.append(current)
print results
--output:--
[['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]

A different solution based on itertools.groupby:
from itertools import groupby
# normalizes the list of markers so all markers have 3 elements
def normalized(markers):
for marker in markers:
yield marker + [""] * (3 - len(marker))
def concatenated(markers):
# use groupby to iterator over lists of markers sharing the same key
for key, markers_in_category in groupby(normalized(markers), lambda m: m[1]):
# get separate lists of left and right words
lefts, rights = zip(*[(m[0],m[2]) for m in markers_in_category])
# remove empty strings from both lists
lefts, rights = filter(bool, lefts), filter(bool, rights)
# yield the concatenated entry for this key (also removing the empty string at the end, if necessary)
yield filter(bool, [" ".join(lefts), key, " ".join(rights)])
The generator concatenated(markers) will yield the results. This code correctly handles the ['fast', '3'] case and doesn't return an additional third element in such cases.

Related

How can i create a dictionary from 2 lists with one as the key and the other as as the value with only loops? Without using zip() or enumerate()

I wanna achieve this without any libraries or special functions just loops. I wanna have a main program that takes in 2 inputs which are the 2 lists and returns the dictionary like shown below.
Please enter the item names: Cans, bottles, boxes, jugs
please enter quantities : 20,34,10
output : {'Cans':'20','bottles':'34','boxes':'10','jugs':'0'}
If the list of items is longer than the quantities then the quantity becomes automatically 0 as it did with the jugs above.
If the List of Quantity is longer than the items list then the item should automatically become 'unknown object_1' with the number changing accordingly.

Split with comma as delimiter. Fill values with zero for a number of iterations equal to the difference in length between keys and values.
Then use dict comprehension to build your dict. This with the zip built-in function.
keys = 'a,b,c,d'
values = '1,2,3'
keys = keys.split(',')
values = values.split(',')
for i in range(len(keys) - len(values)):
values.append('0')
dct = {}
for i in range(len(keys)):
dct[keys[i]] = values[i]
print(dct)
Output:
{'a': '1', 'b': '2', 'c': '3', 'd': '0'}
This uses only built-in calls so it fits your requirements at best. At the OP requirements it is not using the zip function.

item_names = ['Cans', 'Bottles', 'boxes', 'jugs']
quantities = [20, 34, 10]
output_dict = {}
for i, item in enumerate(item_names):
if i > len(quantities) - 1:
output_dict.update({item : 0})
else:
output_dict.update({item : quantities[i]})

a = list(input().split(','))
b = list(map(int, input().split(',')))
res = {}
for i in range(len(a)):
res[a[i]] = b[i] if i < len(b) else 0
print(res)

list1 = ['cans','Bottles','Boxes','Jugs']
list2 = [1,2,3]
res = {}
for i, element in enumerate(list1):
try:
res[element] = list2[i]
except IndexError:
res[element] = 0
print(res)
Edited code without enumerate or zip:
list1 = ['cans','Bottles','Boxes','Jugs']
list2 = [1,2,3]
res = {}
i=0
for element in list1:
try:
res[element] = list2[i]
except IndexError:
res[element] = 0
i+=1
print(res)
```

python edit tuple duplicates in a list

my target is:
while for looping a list I would like to check for duplicates and if there are some i would like to append a number to it see following example
my list output as an example:
[('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
in a loop i would like to edit those duplicates so instead of the 2nd microsoft i would like to have microsoft1 (if there would be 3 microsoft guys so the third guy would have microsoft2)
with this i can filter the duplicates but i dont know how to edit them directly in the list
list = [('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
names = []
double = []
for u in list[1:]:
names.append(u[1])
list_size = len(names)
for i in range(list_size):
k = i + 1
for j in range(k, list_size):
if names[i] == names[j] and names[i] not in double:
double.append(names[i])

This is one approach using collections.defaultdict.
Ex:
from collections import defaultdict
lst = [('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
seen = defaultdict(int)
result = []
for k, v in lst:
if seen[v]:
result.append((k, "{}_{}".format(v, seen[v])))
else:
result.append((k,v))
seen[v] += 1
print(result)
Output:
[('name', 'company'),
('someguy', 'microsoft'),
('anotherguy', 'microsoft_1'),
('thirdguy', 'amazon')]

adding empty string while joining the 2 lists - Python

I have 2 lists
mainlist=[['RD-12',12,'a'],['RD-13',45,'c'],['RD-15',50,'e']] and
sublist=[['RD-12',67],['RD-15',65]]
if i join both the list based on 1st element condition by using below code
def combinelist(mainlist,sublist):
dict1 = { e[0]:e[1:] for e in mainlist }
for e in sublist:
try:
dict1[e[0]].extend(e[1:])
except:
pass
result = [ [k] + v for k, v in dict1.items() ]
return result
Its results in like below
[['RD-12',12,'a',67],['RD-13',45,'c',],['RD-15',50,'e',65]]
as their is no element in for 'RD-13' in sublist, i want to empty string on that.
The final output should be
[['RD-12',12,'a',67],['RD-13',45,'c'," "],['RD-15',50,'e',65]]
Please help me.

Your problem can be solved using a while loop to adjust the length of your sublists until it matches the length of the longest sublist by appending the wanted string.
for list in result:
while len(list) < max(len(l) for l in result):
list.append(" ")

You could just go through the result list and check where the total number of your elements is 2 instead of 3.
for list in lists:
if len(list) == 2:
list.append(" ")
UPDATE:
If there are more items in the sublist, just subtract the lists containing the 'keys' of your lists, and then add the desired string.
def combinelist(mainlist,sublist):
dict1 = { e[0]:e[1:] for e in mainlist }
list2 = [e[0] for e in sublist]
for e in sublist:
try:
dict1[e[0]].extend(e[1:])
except:
pass
for e in dict1.keys() - list2:
dict1[e].append(" ")
result = [[k] + v for k, v in dict1.items()]
return result

You can try something like this:
mainlist=[['RD-12',12],['RD-13',45],['RD-15',50]]
sublist=[['RD-12',67],['RD-15',65]]
empty_val = ''
# Lists to dictionaries
maindict = dict(mainlist)
subdict = dict(sublist)
result = []
# go through all keys
for k in list(set(list(maindict.keys()) + list(subdict.keys()))):
# pick the value from each key or a default alternative
result.append([k, maindict.pop(k, empty_val), subdict.pop(k, empty_val)])
# sort by the key
result = sorted(result, key=lambda x: x[0])
You can set up your empty value to whatever you need.
UPDATE
Following the new conditions, it would look like this:
mainlist=[['RD-12',12,'a'], ['RD-13',45,'c'], ['RD-15',50,'e']]
sublist=[['RD-12',67], ['RD-15',65]]
maindict = {a:[b, c] for a, b, c in mainlist}
subdict = dict(sublist)
result = []
for k in list(set(list(maindict.keys()) + list(subdict.keys()))):
result.append([k, ])
result[-1].extend(maindict.pop(k, ' '))
result[-1].append(subdict.pop(k, ' '))
sorted(result, key=lambda x: x[0])

Another option is to convert the sublist to a dict, so items are easily and rapidly accessible.
sublist_dict = dict(sublist)
So you can do (it modifies the mainlist):
for i, e in enumerate(mainlist):
data: mainlist[i].append(sublist_dict.get(e[0], ""))
#=> [['RD-12', 12, 'a', 67], ['RD-13', 45, 'c', ''], ['RD-15', 50, 'e', 65]]
Or a one liner list comprehension (it produces a new list):
[ e + [sublist_dict.get(e[0], "")] for e in mainlist ]
If you want to skip the missing element:
for i, e in enumerate(mainlist):
data = sublist_dict.get(e[0])
if data: mainlist[i].append(data)
print(mainlist)
#=> [['RD-12', 12, 'a', 67], ['RD-13', 45, 'c'], ['RD-15', 50, 'e', 65]]

Given a linear order completely represented by a list of tuples of strings, output the order as a list of strings

Given pairs of items of form [(a,b),...] where (a,b) means a > b, for example:
[('best','better'),('best','good'),('better','good')]
I would like to output a list of form:
['best','better','good']
This is very hard for some reason. Any thoughts?
======================== code =============================
I know why it doesn't work.
def to_rank(raw):
rank = []
for u,v in raw:
if u in rank and v in rank:
pass
elif u not in rank and v not in rank:
rank = insert_front (u,v,rank)
rank = insert_behind(v,u,rank)
elif u in rank and v not in rank:
rank = insert_behind(v,u,rank)
elif u not in rank and v in rank:
rank = insert_front(u,v,rank)
return [[r] for r in rank]
# #Use: insert word u infront of word v in list of words
def insert_front(u,v,words):
if words == []: return [u]
else:
head = words[0]
tail = words[1:]
if head == v: return [u] + words
else : return ([head] + insert_front(u,v,tail))
# #Use: insert word u behind word v in list of words
def insert_behind(u,v,words):
words.reverse()
words = insert_front(u,v,words)
words.reverse()
return words
=================== Update ===================
Per suggestion of many, this is a straight forward topological sort setting, I ultimately decided to use the code from this source: algocoding.wordpress.com/2015/04/05/topological-sorting-python/
which solved my problem.
def go_topsort(graph):
in_degree = { u : 0 for u in graph } # determine in-degree
for u in graph: # of each node
for v in graph[u]:
in_degree[v] += 1
Q = deque() # collect nodes with zero in-degree
for u in in_degree:
if in_degree[u] == 0:
Q.appendleft(u)
L = [] # list for order of nodes
while Q:
u = Q.pop() # choose node of zero in-degree
L.append(u) # and 'remove' it from graph
for v in graph[u]:
in_degree[v] -= 1
if in_degree[v] == 0:
Q.appendleft(v)
if len(L) == len(graph):
return L
else: # if there is a cycle,
return []
RockBilly's solution also work in my case, because in my setting, for every v < u, we are guaranteed to have a pair (u,v) in our list. So his answer is not very "computer-sciency", but it gets the job done in this case.

If you have a complete grammar specified then you can simply count up the items:
>>> import itertools as it
>>> from collections import Counter
>>> ranks = [('best','better'),('best','good'),('better','good')]
>>> c = Counter(x for x, y in ranks)
>>> sorted(set(it.chain(*ranks)), key=c.__getitem__, reverse=True)
['best', 'better', 'good']
If you have an incomplete grammar then you can build a graph and dfs all paths to find the longest. This isn't very inefficient, as I haven't thought about that yet :):
def dfs(graph, start, end):
stack = [[start]]
while stack:
path = stack.pop()
if path[-1] == end:
yield path
continue
for next_state in graph.get(path[-1], []):
if next_state in path:
continue
stack.append(path+[next_state])
def paths(ranks):
graph = {}
for n, m in ranks:
graph.setdefault(n,[]).append(m)
for start, end in it.product(set(it.chain(*ranks)), repeat=2):
yield from dfs(graph, start, end)
>>> ranks = [('black', 'dark'), ('black', 'dim'), ('black', 'gloomy'), ('dark', 'gloomy'), ('dim', 'dark'), ('dim', 'gloomy')]
>>> max(paths(ranks), key=len)
['black', 'dim', 'dark', 'gloomy']
>>> ranks = [('a','c'), ('b','a'),('b','c'), ('d','a'), ('d','b'), ('d','c')]
>>> max(paths(ranks), key=len)
['d', 'b', 'a', 'c']

What you're looking for is topological sort. You can do this in linear time using depth-first search (pseudocode included in the wiki I linked)

Here is one way. It is based on using the complete pairwise rankings to make an old-style (early Python 2) cmp function and then using functools.cmp_to_key to convert it to a key suitable for the Python 3 approach to sorting:
import functools
def sortByRankings(rankings):
def cmp(x,y):
if x == y:
return 0
elif (x,y) in rankings:
return -1
else:
return 1
items = list({x for y in rankings for x in y})
items.sort(key = functools.cmp_to_key(cmp))
return items
Tested like:
ranks = [('a','c'), ('b','a'),('b','c'), ('d','a'), ('d','b'), ('d','c')]
print(sortByRankings(ranks)) #prints ['d', 'b', 'a', 'c']
Note that to work correctly, the parameter rankings must contain an entry for each pair of distinct items. If it doesn't, you would first need to compute the transitive closure of the pairs that you do have before you feed it to this function.

You can take advantage of the fact that the lowest ranked item in the list will never appear at the start of any tuple. You can extract this lowest item, then remove all elements which contain this lowest item from your list, and repeat to get the next lowest.
This should work even if you have redundant elements, or have a sparser list than some of the examples here. I've broken it up into finding the lowest ranked item, and then the grunt work of using this to create a final ranking.
from copy import copy
def find_lowest_item(s):
#Iterate over set of all items
for item in set([item for sublist in s for item in sublist]):
#If an item does not appear at the start of any tuple, return it
if item not in [x[0] for x in s]:
return item
def sort_by_comparison(s):
final_list = []
#Make a copy so we don't mutate original list
new_s = copy(s)
#Get the set of all items
item_set = set([item for sublist in s for item in sublist])
for i in range(len(item_set)):
lowest = find_lowest_item(new_s)
if lowest is not None:
final_list.insert(0, lowest)
#For the highest ranked item, we just compare our current
#ranked list with the full set of items
else:
final_list.insert(0,set(item_set).difference(set(final_list)).pop())
#Update list of ranking tuples to remove processed items
new_s = [x for x in new_s if lowest not in x]
return final_list
list_to_compare = [('black', 'dark'), ('black', 'dim'), ('black', 'gloomy'), ('dark', 'gloomy'), ('dim', 'dark'), ('dim', 'gloomy')]
sort_by_comparison(list_to_compare)
['black', 'dim', 'dark', 'gloomy']
list2 = [('best','better'),('best','good'),('better','good')]
sort_by_comparison(list2)
['best', 'better', 'good']
list3 = [('best','better'),('better','good')]
sort_by_comparison(list3)
['best', 'better', 'good']

If you do sorting or create a dictionary from the list items, you are going to miss the order as #Rockybilly mentioned in his answer. I suggest you to create a list from the tuples of the original list and then remove duplicates.
def remove_duplicates(seq):
seen = set()
seen_add = seen.add
return [x for x in seq if not (x in seen or seen_add(x))]
i = [(5,2),(1,3),(1,4),(2,3),(2,4),(3,4)]
i = remove_duplicates(list(x for s in i for x in s))
print(i) # prints [5, 2, 1, 3, 4]
j = [('excellent','good'),('excellent','great'),('great','good')]
j = remove_duplicates(list(x for s in j for x in s))
print(j) # prints ['excellent', 'good', 'great']
See reference: How do you remove duplicates from a list in whilst preserving order?
For explanation on the remove_duplicates() function, see this stackoverflow post.

If the list is complete, meaning has enough information to do the ranking(Also no duplicate or redundant inputs), this will work.
from collections import defaultdict
lst = [('best','better'),('best','good'),('better','good')]
d = defaultdict(int)
for tup in lst:
d[tup[0]] += 1
d[tup[1]] += 0 # To create it in defaultdict
print sorted(d, key = lambda x: d[x], reverse=True)
# ['best', 'better', 'good']
Just give them points, increment the left one each time you encounter it in the list.
Edit: I do think the OP has a determined type of input. Always have tuple count of combination nCr(n, 2). Which makes this a correct solution. No need to complain about the edge cases, which I already knew posting the answer(and mentioned it).

Removing values from a list in python

I have a large file of names and values on a single line separated by a space:
name1 name2 name3....
Following the long list of names is a list of values corresponding to the names. The values can be 0-4 or na. What I want to do is consolidate the data file and remove all the names and and values when the value is na.
For instance, the final line of name in this file is like so:
namenexttolast nameonemore namethelast 0 na 2
I would like the following output:
namenexttolast namethelast 0 2
How would I do this using Python?

Let's say you read the names into one list, then the values into another. Once you have a names and values list, you can do something like:
result = [n for n, v in zip(names, values) if v != 'na']
result is now a list of all names whose value is not "na".

s = "name1 name2 name3 v1 na v2"
s = s.split(' ')
names = s[:len(s)/2]
values = s[len(s)/2:]
names_and_values = zip(names, values)
names, values = [], []
[(names.append(n) or values.append(v)) for n, v in names_and_values if v != "na"]
names.extend(values)
print ' '.join(names)
Update
Minor improvement after suggestion from Paul. I'm sure the list comprehension is fairly unpythonic, as it leverages the fact that list.append returns None, so both append expressions will be evaluated and a list of None values will be constructed and immediately thrown away.

I agree with Justin than using zip is a good idea. The problems is how to put the data into two different lists. Here is a proposal that should work ok.
reader = open('input.txt')
writer = open('output.txt', 'w')
names, nums = [], []
row = reader.read().split(' ')
x = len(row)/2
for (a, b) in [(n, v) for n, v in zip(row[:x], row[x:]) if v!='na']:
names.append(a)
nums.append(b)
writer.write(' '.join(names))
writer.write(' ')
writer.write(' '.join(nums))
#writer.write(' '.join(names+nums)) is nicer but cause list to be concat

or say you have a string which you have read from a file. Let's call this string as "s"
words = filter(lambda x: x!="na", s.split())
should give you all the strings except for "na"
edit: the code above obviously doesn't do what you want it to do.
the one below should work though
d = s.split()
keys = d[:len(d)/2]
vals = d[len(d)/2:]
w = " ".join(map(lambda (k,v): (k + " " + v) if v!="na" else "", zip(keys, vals)))
print " ".join([" ".join(w.split()[::2]), " ".join(w.split()[1::2])])

strlist = 'namenexttolast nameonemore namethelast 0 na 2'.split()
vals = ('0', '1', '2', '3', '4', 'na')
key_list = [s for s in strlist if s not in vals]
val_list = [s for s in strlist if s in vals]
#print [(key_list[i],v) for i, v in enumerate(val_list) if v != 'na']
filtered_keys = [key_list[i] for i, v in enumerate(val_list) if v != 'na']
filtered_vals = [v for v in val_list if v != 'na']
print filtered_keys + filtered_vals
If you'd rather group the vals, you could create a list of tuples instead (commented out line)

Here is a solution that uses just iterators plus a single buffer element, with no calls to len and no other intermediate lists created. (In Python 3, just use map and zip, no need to import imap and izip from itertools.)
from itertools import izip, imap, ifilter
def iterStartingAt(cond, seq):
it1,it2 = iter(seq),iter(seq)
while not cond(it1.next()):
it2.next()
for item in it2:
yield item
dataline = "namenexttolast nameonemore namethelast 0 na 2"
datalinelist = dataline.split()
valueset = set("0 1 2 3 4 na".split())
print " ".join(imap(" ".join,
izip(*ifilter(lambda (n,v): v != 'na',
izip(iter(datalinelist),
iterStartingAt(lambda s: s in valueset,
datalinelist))))))
Prints:
namenexttolast namethelast 0 2

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Merge nested list items based on a repeating value - python

Related

How can i create a dictionary from 2 lists with one as the key and the other as as the value with only loops? Without using zip() or enumerate()

python edit tuple duplicates in a list

adding empty string while joining the 2 lists - Python

Given a linear order completely represented by a list of tuples of strings, output the order as a list of strings

Removing values from a list in python

Categories

Resources