Create sublists with duplicate list elements - python

I'm new to Python and I'm trying to create sublists for list elements sharing the same base:
listRaw = ['AKS/STB', 'SBHS/AME', 'SBJ/OAK', 'SBJ/ALS', 'AKS/OSMX', 'SBHS/ABNX', 'AKS/AKX']
desiredOutput = [['AKS/STB', 'AKS/OSMX', 'AKS/AKX'], ['SBHS/AME', 'SBHS/ABNX'], ['SBJ/OAK', 'SBJ/ALS']]
I've tried to first isolate the base from each list element using:
def commonNumerator(self):
checkPosition = self.find('/')
commonNumerator = self[:checkPosition]
return commonNumerator
listRawModified = [commonNumerator(x) for x in listRaw]
print(listRawModified)
which gets me:
['AKS', 'SBHS', 'SBJ', 'SBJ', 'AKS', 'SBHS', 'AKS']
but from then I don't know how to proceed to get to the desired ouput.
Can someone explain to me how to do it?

Typical usecase for itertools.groupby():
from itertools import groupby
listRaw = ['AKS/STB', 'SBHS/AME', 'SBJ/OAK', 'SBJ/ALS', 'AKS/OSMX', 'SBHS/ABNX', 'AKS/AKX']
def key(s):
return s.split('/')[0]
[list(g) for k, g in groupby(sorted(listRaw, key=key), key=key)]
# [['AKS/STB', 'AKS/OSMX', 'AKS/AKX'], ['SBHS/AME', 'SBHS/ABNX'], ['SBJ/OAK', 'SBJ/ALS']]
The key() function helps in extracting the sorting/grouping key: key('AKS/STB') == 'AKS'.

Another way to do this would be to split each element and create a dictionary and then construct your desired output from that dictionary, e.g.:
In []:
d = {}
for i in listRaw:
k, v = i.split('/')
d.setdefault(k, []).append(v)
[['/'.join([k, v]) for v in d[k]] for k in d]
Out[]:
[['AKS/STB', 'AKS/OSMX', 'AKS/AKX'], ['SBHS/AME', 'SBHS/ABNX'], ['SBJ/OAK', 'SBJ/ALS']]

This is a typical usecase for itertools. But you could also consider storing the values in a dictionary:
from collections import defaultdict
d = defaultdict(list)
listRaw = ['AKS/STB', 'SBHS/AME', 'SBJ/OAK', 'SBJ/ALS', 'AKS/OSMX', 'SBHS/ABNX', 'AKS/AKX']
for item in listRaw:
i,y = item.split('/')
d[i].append(y)
print(dict(d))
# {'AKS': ['STB', 'OSMX', 'AKX'], 'SBHS': ['AME', 'ABNX'], 'SBJ': ['OAK', 'ALS']}
You can then access the values to AKS with a simple command as:
d['AKS'] # ['STB', 'OSMX', 'AKX']

Related

Handle dictionary collision in python3

I currently have the code below working fine:
Can someone help me solve the collision created from having two keys with the same number in the dictionary?
I tried multiple approach (not listed here) to try create an array to handle it but my approaches are still unsuccessful.
I am using #python3.7
def find_key(dic1, n):
'''
Return the key '3' from the dict
below.
'''
d = {}
for x, y in dic1.items():
# swap keys and values
# and update the result to 'd'
d[y] = x
try:
if n in d:
return d[y]
except Exception as e:
return (e)
dic1 = {'james':2,'david':3}
# Case to test that return ‘collision’
# comment 'dic1' above and replace it by
# dic1 below to create a 'collision'
# dic1 = {'james':2,'david':3, 'sandra':3}
n = 3
print(find_key(dic1,n))
Any help would be much appreciated.
You know there should be multiple returns, so plan for that in advance.
def find_keys_for_value(d, value):
for k, v in d.items():
if v == value:
yield k
data = {'james': 2, 'david': 3, 'sandra':3}
for result in find_keys_for_value(data, 3):
print (result)
You can use a defaultdict:
from collections import defaultdict
def find_key(dct, n):
dd = defaultdict(list)
for x, y in dct.items():
dd[y].append(x)
return dd[n]
dic1 = {'james':2, 'david':3, 'sandra':3}
print(find_key(dic1, 3))
print(find_key(dic1, 2))
print(find_key(dic1, 1))
Output:
['david', 'sandra']
['james']
[]
Building a defaultdict from all keys and values is only justified if you will repeatedly search for keys of the same dict given different values, though. Otherwise, the approach of Kenny Ostrom is preferrable. In any case, the above makes little sense if left as it stands.
If you are not at ease with generators and yield, here is the approach of Kenny Ostrom translated to lists (less efficient than generators, better than the above for one-shot searches):
def find_key(dct, n):
return [x for x, y in dct.items() if y == n]
The output is the same as above.

how to separate cam1,2,3,4,5,6 first images from the list

lst = ['Cam218-10-03_16-05-21-54.jpg',
'Cam318-10-03_17-04-21-54.jpg',
'Cam418-10-03_16-04-21-54.jpg',
'Cam218-10-02_16-05-21-54.jpg',
'Cam318-10-02_17-04-21-54.jpg',
'Cam418-10-02_16-04-21-54.jpg',
'Cam218-10-02_16-04-08-31.jpg',
'Cam318-10-02_16-04-08-30.jpg',
'Cam418-10-02_16-04-08-30.jpg',
'Cam518-10-02_16-04-08-35.jpg',
'Cam618-10-02_16-04-08-36.jpg',
'Cam118-10-02_16-04-09-33.jpg',
'Cam218-10-02_16-04-09-33.jpg',
'Cam318-10-02_16-04-09-33.jpg',
'Cam418-10-02_16-04-09-33.jpg',
'Cam518-10-02_16-04-09-33.jpg',
'Cam618-10-02_16-04-09-33.jpg',
'Cam118-10-02_16-04-11-53.jpg',
'Cam218-10-02_16-04-11-53.jpg',
'Cam318-10-02_16-04-11-53.jpg',
'Cam418-10-02_16-04-08-30.jpg',
'Cam118-10-02_16-04-08-31.jpg',
'Cam518-10-02_16-04-11-53.jpg',
'Cam118-10-02_16-04-11-53.jpg']
From this list I want the output:
['Cam118-10-02_16-04-08-31.jpg',
'Cam218-10-02_16-04-08-31.jpg',
'Cam318-10-02_16-04-08-30.jpg',
'Cam418-10-02_16-04-08-30.jpg',
'Cam518-10-02_16-04-08-35.jpg',
'Cam618-10-02_16-04-08-36.jpg']
by using Python. Could anybody help me?
With itertools.groupby - O(n*log(n))
>>> from itertools import groupby
>>> [next(g) for _, g in groupby(sorted(lst), key=lambda cam: cam.partition('-')[0])]
['Cam118-10-02_16-04-08-31.jpg',
'Cam218-10-02_16-04-08-31.jpg',
'Cam318-10-02_16-04-08-30.jpg',
'Cam418-10-02_16-04-08-30.jpg',
'Cam518-10-02_16-04-08-35.jpg',
'Cam618-10-02_16-04-08-36.jpg']
With keeping track of duplicates manually (output not sorted, but potentially useful to other readers) - O(n)
>>> seen = set()
>>> result = []
>>>
>>> for cam in lst:
...: model, *_ = cam.partition('-')
...: if model not in seen:
...: result.append(cam)
...: seen.add(model)
...:
>>> result
['Cam218-10-03_16-05-21-54.jpg',
'Cam318-10-03_17-04-21-54.jpg',
'Cam418-10-03_16-04-21-54.jpg',
'Cam518-10-02_16-04-08-35.jpg',
'Cam618-10-02_16-04-08-36.jpg',
'Cam118-10-02_16-04-09-33.jpg']
you can make if condition to check for the occurrence of the photo tag after sorting the list
list.sort()
i = 1
for item in list:
if(item[3]==str(i)):
i=i+1
print(item)
continue
the result is
Cam118-10-02_16-04-08-31.jpg
Cam218-10-02_16-04-08-31.jpg
Cam318-10-02_16-04-08-30.jpg
Cam418-10-02_16-04-08-30.jpg
Cam518-10-02_16-04-08-35.jpg
Cam618-10-02_16-04-08-36.jpg
if you want to get the first occurrence of item with no regards to its order ascendingly, removing list.sort() shall resolve that.

Parse dict into specific list format using Python

Say I have the following dict
{'red':'boop','white':'beep','rose':'blip'}
And I want to get it to a list like so
['red','boop','end','white','beep','rose','blip','end']
The key / value which is to be placed in front of the list is an input.
So I essentially I want [first_key, first_value,end, .. rest of the k/v pairs..,end]
I wrote a brute force approach but I feel like there's a more pythonic way of doing it (and also because once implemented the snippet would make my code O(n^2) )
for item in lst_items
data_lst = []
for key, value in item.iteritems():
data_lst.append(key)
ata_lst.append(value)
#insert 'end' at the appropiate indeces
#more code ...
Any pythonic approach?
The below relies on itertools.chain.from_iterable to flatten the items into a single list. We pull the first two values from the chain and then use them to build a new list, which we extend with the rest of the values.
from itertools import chain
def ends(d):
if not d:
return []
c = chain.from_iterable(d.iteritems())
l = [next(c), next(c), "end"]
l.extend(c)
l.append("end")
return l
ends({'red':'boop','white':'beep','rose':'blip'})
# ['rose', 'blip', 'end', 'white', 'beep', 'red', 'boop', 'end']
If you know the key you want first, and don't care about the rest, we can use a lazily evaluated generator expression to remove it from the flattened list.
def ends(d, first):
if not d:
return []
c = chain.from_iterable((k, v) for k, v in d.iteritems() if k != first)
l = [first, d[first], "end"]
l.extend(c)
l.append("end")
return l
ends({'red':'boop','white':'beep','rose':'blip'}, 'red')
# ['red', 'boop', 'end', 'rose', 'blip', 'white', 'beep', 'end']
The first key is specified in first variable:
first = 'red'
d = {'red':'boop','white':'beep','rose':'blip'}
new_l = [first, d[first], 'end']
for k, v in d.items():
if k == first:
continue
new_l.append(k)
new_l.append(v)
new_l.append('end')
print(new_l)
Prints:
['red', 'boop', 'end', 'white', 'beep', 'rose', 'blip', 'end']
You could use enumerate and check the current index:
>>> d = {'red':'boop','white':'beep','rose':'blip'}
>>> [x for i, e in enumerate(d.items())
... for x in (e + ("end",) if i in (0, len(d)-1) else e)]
...
['white', 'beep', 'end', 'red', 'boop', 'rose', 'blip', 'end']
However, your original idea, first chaining the keys and values and then inserting the "end" items would not have O(n²), either. It would be O(n) followed by another O(n), hence still O(n).
from itertools import chain
list(chain(*item.items())) + ['end']
data_lst = [x for k, v in lst_itemsL.iteritems() for x in (k, v) ]
data_lst.insert(2, 'end')
data_lst.append('end')
This is pythonic; though will likely have the same efficiency (which can't be helped here).
This should be faster than placing if blocks inside the loops...

Python Tulpe Key For Dict Partial Lookup

I have a dictionary with a tuple of 5 values as a key. For example:
D[i,j,k,g,h] = value.
Now i need to process all elements with a certain partial key pair (i1,g1):
I need now for each pair (i1,g1) all values that have i == i1 and g == g1 in the full key.
What is an pythonic and efficient way to retrieve this, knowing that i need the elements for all pairs and each full key belongs to exactly one partial key?
Is there a more appropriate data structure than dictionaries?
One reference implementation is this:
results = {}
for i in I:
for g in G:
results[i,g] = []
for i,j,k,g,h in D:
if i1 == i and g1 == g:
results[i,g].append(D[i,j,k,g,h])
Assuming you know all the valid values for the different indices you can get all possible keys using itertools.product:
import itertools
I = [3,6,9]
J = range(10)
K = "abcde"
G = ["first","second"]
H = range(10,20)
for tup in itertools.product(I,J,K,G,H):
my_dict[tup] = 0
To restrict the indices generated just put a limit on one / several of the indices that gets generated, for instance all of the keys where i = 6 would be:
itertools.product((6,), J,K,G,H)
A function to let you specify you want all the indices where i==6 and g =="first" would look like this:
def partial_indices(i_vals=I, j_vals=J, k_vals=K, g_vals = G, h_vals = H):
return itertools.product(i_vals, j_vals, k_vals, g_vals, h_vals)
partial_indices(i_vals=(6,), g_vals=("first",))
Or assuming that not all of these are present in the dictionary you can also pass the dictionary as an argument and check for membership before generating the keys:
def items_with_partial_indices(d, i_vals=I, j_vals=J, k_vals=K, g_vals = G, h_vals = H):
for tup in itertools.product(i_vals, j_vals, k_vals, g_vals, h_vals):
try:
yield tup, d[tup]
except KeyError:
pass
for k,v in D.iteritems():
if i in k and p in k:
print v

combine like lists withing a nested list

i have a large nested list, and with in each nested list are two values, a company name and an amount, i am wondering if there is a way to combine the nested lists that have the same name together and then add the values? so for example here is a section of the list
[['Acer', 481242.74], ['Beko', 966071.86], ['Cemex', 187242.16], ['Datsun', 748502.91], ['Equifax', 146517.59], ['Gerdau', 898579.89], ['Haribo', 265333.85], ['Gerdau', 13019.63676], ['Gerdau', 34107.12062], ['Acer', 52153.02848]
i would expect an outcome that looks like the one below
[['Acer',(481242.74+52153.02848)],['Beko', 966071.86],['Cemex', 187242.16],['Datsun', 748502.91],['Equifax', 146517.59],['Gerdau',(898579.89+13019.63676+34107.12062)],['Haribo', 265333.85]]
so essentially im trying to write a code that will go through a nested list and return a list made by finding all the lists with the same [0] element and combining there [1] element
from collections import defaultdict
d = defaultdict(float)
for name, amt in a:
d[name] += amt
What this does is to create a dict where the amount will be zero (float()) by default, and then sum up using the names as keys.
If you really need the result to be a list, you can get it this way:
>>> print d.items()
[('Equifax', 146517.59), ('Haribo', 265333.85), ('Gerdau', 945706.64738), ('Cemex', 187242.16), ('Datsun', 748502.91), ('Beko', 966071.86), ('Acer', 533395.76848)]
defaultdict is probably a good way to go but you can do this with a normal dictionary:
>>> data = [['Acer', 481242.74], ['Beko', 966071.86], ['Cemex', 187242.16], ...]
>>> result = {}
>>> for k, v in data:
... result[k] = result.get(k, 0) + v
>>> result
{'Acer': 533395.76848, 'Beko': 966071.86, 'Cemex': 187242.16, ... }
>>> list(result.items())
[('Acer', 533395.76848), ('Beko', 966071.86), ('Cemex', 187242.16), ...]
from collections import defaultdict
d = defaultdict(list)
l=[['Acer', 481242.74], ['Beko', 966071.86], ['Cemex', 187242.16], ['Datsun', 748502.91], ['Equifax', 146517.59], ['Gerdau', 898579.89], ['Haribo', 265333.85], ['Gerdau', 13019.63676], ['Gerdau', 34107.12062], ['Acer', 52153.02848]]
for k,v in l:
d[k].append(v)
w=[]
for x,y in d.items():
w.append([x,sum(y)])
print w
Prints-
[['Equifax', 146517.59], ['Haribo', 265333.85], ['Gerdau', 945706.64738], ['Cemex', 187242.16], ['Datsun', 748502.91], ['Beko', 966071.86], ['Acer', 533395.76848]]
If want to make it a tuple
map(tuple,w)
Output is-
[('Equifax', 146517.59), ('Haribo', 265333.85), ('Gerdau', 945706.64738), ('Cemex', 187242.16), ('Datsun', 748502.91), ('Beko', 966071.86), ('Acer', 533395.76848)]

Categories