Unique values in a list of lists in Python

I have the following list of lists in Python:
[
u'aaaaa',
[1, 6, u'testing', 20.0, 18.0, 2.0, 'In time'],
u'zzzzzz',
[1, 6, u'testing', 20.0, 18.0, 2.0, 'In time'],
[1, 1, u'xyz ', 30.0, 25.0, 5.0, 'On Going'],
[2, 1, u'abcd', 10.0, 8.0, 2.0, 'In time'],
u'bbbbb',
[1, 6, u'testing', 20.0, 18.0, 2.0, 'In time'],
[1, 1, u'xyz ', 30.0, 25.0, 5.0, 'On Going'],
[2, 1, u'abcd', 10.0, 8.0, 2.0, 'In time'],
[1, 7, u'develop', 20.0, 15.0, 5.0, 'On Going']
]
I want the following output:
[
[u'aaaaa', [1, 6, u'testing', 20.0, 18.0, 2.0, 'In time']],
[u'zzzzzz', [1, 1, u'xyz ', 30.0, 25.0, 5.0, 'On Going'], [2, 1, u'abcd', 10.0, 8.0, 2.0, 'In time']],
[u'bbbbb', [1, 7, u'develop', 20.0, 15.0, 5.0, 'On Going']]
]
Please suggest how this can be done in Python while preserving the order.

The following should give you the desired output. It uses a dictionary to spot duplicate entries.
entries = [
u'aaaaa', [1, 6, u'testing', 20.0, 18.0, 2.0, 'In time'],
u'zzzzzz', [1, 6, u'testing', 20.0, 18.0, 2.0, 'In time'],
[1, 1, u'xyz ', 30.0, 25.0, 5.0, 'On Going'],
[2, 1, u'abcd', 10.0, 8.0, 2.0, 'In time'],
u'bbbbb',
[1, 6, u'testing', 20.0, 18.0, 2.0, 'In time'],
[1, 1, u'xyz ', 30.0, 25.0, 5.0, 'On Going'],
[2, 1, u'abcd', 10.0, 8.0, 2.0, 'In time'],
[1, 7, u'develop', 20.0, 15.0, 5.0, 'On Going']]
d = {}
output = []
entry = []
for item in entries:
    if type(item) == type([]):
        t = tuple(item)
        if t not in d:
            d[t] = 0
            entry.append(item)
    else:
        if len(entry):
            output.append(entry)
        entry = [item]
output.append(entry)
print output
This gives the following output:
[[u'aaaaa', [1, 6, u'testing', 20.0, 18.0, 2.0, 'In time']], [u'zzzzzz', [1, 1, u'xyz ', 30.0, 25.0, 5.0, 'On Going'], [2, 1, u'abcd', 10.0, 8.0, 2.0, 'In time']], [u'bbbbb', [1, 7, u'develop', 20.0, 15.0, 5.0, 'On Going']]]
Tested using Python 2.7
Update: If a list of lists format is needed, simply wrap item in [] in the above script as follows:
entry.append([item])
This would give the following output:
[[u'aaaaa', [[1, 6, u'testing', 20.0, 18.0, 2.0, 'In time']]], [u'zzzzzz', [[1, 1, u'xyz ', 30.0, 25.0, 5.0, 'On Going']], [[2, 1, u'abcd', 10.0, 8.0, 2.0, 'In time']]], [u'bbbbb', [[1, 7, u'develop', 20.0, 15.0, 5.0, 'On Going']]]]

If you want all unique values from a flat list:
mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
myset = set(mylist)
print(myset)  # This is now a set containing all unique values.
# This will not maintain the order of the items.
# For a list of lists (say testdata), convert each inner list to a tuple first:
# unique = [list(x) for x in set(tuple(x) for x in testdata)]
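Since a set discards the original order, here is a minimal sketch of an order-preserving alternative. It assumes Python 3.7 or later, where dict keys keep insertion order; the name unique_in_order is only illustrative.

```python
mylist = ['nowplaying', 'PBS', 'PBS', 'nowplaying', 'job', 'debate', 'thenandnow']

# dict.fromkeys keeps the first occurrence of each key, in insertion order
unique_in_order = list(dict.fromkeys(mylist))
print(unique_in_order)
```

The same idea works for a list of lists if each inner list is converted to a tuple before being used as a key.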

1) I really think you should check out Python dictionaries. They would make much more sense, looking at the kind of output you want.
2) In this case, if I understand you correctly, you want to convert a list with elements that are either strings or lists into a list of lists. This list of lists should have a starting element as a string, and the remaining elements as the following list items within the main list, till you hit the next string. (At least that's what it looks like from your example).
output_list = []
for elem in main_list:
    if isinstance(elem,basestring):
        output_list.append([elem])
    else:
        output_list[-1].append(elem)
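The grouping loop above keeps repeated sub-lists. A minimal Python 3 sketch that combines the grouping with duplicate removal might look like this; group_unique and seen are hypothetical names, and it assumes the input starts with a string label.

```python
def group_unique(entries):
    """Group sub-lists under the preceding string, skipping sub-lists seen before."""
    seen = set()      # tuples of the sub-lists already emitted
    groups = []
    for item in entries:
        if isinstance(item, str):
            groups.append([item])          # a string starts a new group
        else:
            key = tuple(item)              # lists are unhashable, tuples are not
            if key not in seen:
                seen.add(key)
                groups[-1].append(item)
    return groups

entries = [
    'aaaaa',
    [1, 6, 'testing', 20.0, 18.0, 2.0, 'In time'],
    'zzzzzz',
    [1, 6, 'testing', 20.0, 18.0, 2.0, 'In time'],
    [1, 1, 'xyz ', 30.0, 25.0, 5.0, 'On Going'],
    [2, 1, 'abcd', 10.0, 8.0, 2.0, 'In time'],
    'bbbbb',
    [1, 6, 'testing', 20.0, 18.0, 2.0, 'In time'],
    [1, 1, 'xyz ', 30.0, 25.0, 5.0, 'On Going'],
    [2, 1, 'abcd', 10.0, 8.0, 2.0, 'In time'],
    [1, 7, 'develop', 20.0, 15.0, 5.0, 'On Going'],
]
grouped = group_unique(entries)
print(grouped)
```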

Related

The function keeps unwanted information

For some reason, when I call the function "novalinhasubtraida",
it keeps its information and this affects its next calls:
matriz = [[1.0, 7.0, 9.0, 5.0],
          [1.125, 1.0, 0.25, 0.875],
          [0.4, 0.6, 1.0, 0.2]]
result = list()
def subrairlinhas(matriz,linhasubtraida,linhasubtraiadora):
    result.clear()
    for item1, item2 in zip(matriz[linhasubtraida], matriz[linhasubtraiadora]):
        item = item1 - item2*(matriz[linhasubtraida][linhasubtraiadora])
        #print(f'item:{item}')
        result.append(item)
    return result
#novalinhasubtraida contains subtrairlinhas
def novalinhasubtraida(matriz,linhatransformada,linhado1):
    result = subrairlinhas(matriz,linhatransformada,linhado1)
    #print(result)
    matriz.remove(matriz[linhatransformada])
    matriz.insert(linhatransformada,result)
    return matriz
For example:
INPUT:
novalinhasubtraida(matriz,1,0)
print(matriz)
novalinhasubtraida(matriz,2,0)
print(matriz)
Output:
[[1.0, 7.0, 9.0, 5.0], [0.0, -6.875, -9.875, -4.75], [0.4, 0.6, 1.0, 0.2]]
[[1.0, 7.0, 9.0, 5.0], [0.0, -2.2, -2.6, -1.8], [0.0, -2.2, -2.6, -1.8]]
When I instead run this:
INPUT:
novalinhasubtraida(matriz,2,0)
print(matriz)
novalinhasubtraida(matriz,1,0)
print(matriz)
OUTPUT:
[[1.0, 7.0, 9.0, 5.0], [1.125, 1.0, 0.25, 0.875], [0.0, -2.2, -2.6, -1.8]]
[[1.0, 7.0, 9.0, 5.0], [0.0, -6.875, -9.875, -4.75], [0.0, -6.875, -9.875, -4.75]]
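The outputs above are consistent with both rows pointing at the same module-level result list: the first call inserts that list object into matriz, and the next call clears and refills it, so the earlier row changes too. A minimal sketch that builds a fresh list on every call follows; subtract_rows and replace_row are hypothetical names, not part of the original code.

```python
def subtract_rows(matrix, target, pivot):
    """Return a NEW list: row `target` minus row `pivot` scaled by matrix[target][pivot]."""
    factor = matrix[target][pivot]
    return [a - b * factor for a, b in zip(matrix[target], matrix[pivot])]

def replace_row(matrix, target, pivot):
    """Replace row `target` in place with the freshly built subtracted row."""
    matrix[target] = subtract_rows(matrix, target, pivot)
    return matrix

matriz = [[1.0, 7.0, 9.0, 5.0],
          [1.125, 1.0, 0.25, 0.875],
          [0.4, 0.6, 1.0, 0.2]]

replace_row(matriz, 1, 0)
replace_row(matriz, 2, 0)
print(matriz)
```

Because each call returns a new list, rows 1 and 2 stay independent regardless of the order of the calls.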

Relabelling ticks on Seaborn axes?

I'm doing a log-log plot with Seaborn; the data is actually derived from a StackOverflow developer survey. I tried using the built-in log scale, but the results didn't make sense, so this simply calculates the logs before plotting.
df = pd.DataFrame( {'company_size_range': {7800: 7.0, 7801: 700.0, 7802: 7.0, 7803: 20000.0, 7805: 200.0, 7806: 20000.0, 7808: 2000.0, 7809: 2000.0, 7810: 7.0, 7811: 200.0, 7812: 50.0, 7813: 20000.0, 7816: 2.0, 7819: 200.0, 7820: 2000.0, 7824: 2.0, 7825: 2.0, 7827: 2.0, 7828: 50.0, 7830: 14.0, 7831: 50.0, 7833: 200.0, 7834: 50.0, 7835: 50.0, 7838: 2.0, 7840: 50.0, 7841: 50.0, 7842: 7000.0, 7843: 20000.0, 7844: 14.0, 7846: 2.0, 7850: 20000.0, 7851: 700.0, 7852: 200.0, 7853: 200.0, 7855: 200.0, 7856: 7.0, 7857: 50.0, 7858: 700.0, 7861: 20000.0, 7863: 20000.0, 7865: 20000.0, 7867: 700.0, 7868: 20000.0, 7870: 50.0, 7871: 2000.0, 7872: 50.0, 7873: 20000.0, 7874: 200.0, 7876: 14.0, 7877: 20000.0, 7879: 50.0, 7880: 50.0 }, 'team_size_range': {7800: 7.0, 7801: 7.0, 7802: 7.0, 7803: 2.0, 7805: 7.0, 7806: 2.0, 7808: 7.0, 7809: 7.0, 7810: 2.0, 7811: 17.0, 7812: 7.0, 7813: 2.0, 7816: 2.0, 7819: 7.0, 7820: 30.0, 7824: 2.0, 7825: 2.0, 7827: 2.0, 7828: 2.0, 7830: 2.0, 7831: 7.0, 7833: 2.0, 7834: 2.0, 7835: 7.0, 7838: 2.0, 7840: 7.0, 7841: 30.0, 7842: 7.0, 7843: 7.0, 7844: 2.0, 7846: 2.0, 7850: 7.0, 7851: 11.0, 7852: 7.0, 7853: 7.0, 7855: 2.0, 7856: 7.0, 7857: 7.0, 7858: 11.0, 7861: 7.0, 7863: 2.0, 7865: 30.0, 7867: 7.0, 7868: 7.0, 7870: 2.0, 7871: 17.0, 7872: 7.0, 7873: 17.0, 7874: 7.0, 7876: 2.0, 7877: 7.0, 7879: 17.0, 7880: 7.0}} )
g = sns.jointplot(x=np.log10(df['company_size_range']+1),
                  y=np.log10(df['team_size_range']+1), kind='kde', color='g')
That's fine, but the axes show the log values, not the underlying values. The X-axis, for example, is:
-1, 1, 2, 3, 4, 5, 6
So I added this to fix it, using the X position of the labels as the X values:
g.ax_joint.set_xticklabels(["{:.0f}".format(10**label.get_position()[0]-1)
                            for label in g.ax_joint.get_xticklabels()])
The trouble is the resulting X-axis labels are nonsense:
1, 2, 3, 5, 9, 0, 0, 0
What is going on, and how best to fix it, please?
You could make use of a FuncFormatter. The benefit is that the tick labels stay correct even after the window is resized.
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import numpy as np
import pandas as pd
import seaborn as sns
def tickformat_pow10(value, tick_number):
    return f'{10**value:,.0f}'
# df = ...
g = sns.jointplot(x=np.log10(df['company_size_range'] + 1),
                  y=np.log10(df['team_size_range'] + 1), kind='kde', color='g')
g.ax_joint.xaxis.set_major_formatter(FuncFormatter(tickformat_pow10))
g.ax_joint.yaxis.set_major_formatter(FuncFormatter(tickformat_pow10))
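Because the data was plotted as log10(x + 1), a formatter that exactly inverts that transform would subtract 1 again. The following is a small sketch of such a formatter using only the standard library; tickformat_pow10_minus1 is a hypothetical name, and whether the offset matters for the labels is a judgment call.

```python
import math

def tickformat_pow10_minus1(value, tick_number=None):
    """Format a tick located at log10(x + 1) as the original value x."""
    return f"{10 ** value - 1:,.0f}"

# Tick positions as they would appear on the transformed axis
for original in (0, 9, 200, 20000):
    position = math.log10(original + 1)
    print(original, "->", tickformat_pow10_minus1(position))
```

It takes the same (value, tick_number) arguments as tickformat_pow10, so it could be wrapped in FuncFormatter in the same way.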
Try drawing the canvas first with canvas.draw(), so that the tick labels are populated before you read them. Also, I do not understand why you are subtracting 1.
g.fig.canvas.draw()
g.ax_joint.set_xticklabels(["{:.0f}".format(10**label.get_position()[0]-1)
                            for label in g.ax_joint.get_xticklabels()]);

Binning a list in groups python

I have a list:
l = [2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 12.0,96.0, 192.0, 480.0, 360.0, 504.0, 300.0]
I want to group the elements of the list into bins of width 10 (i.e. 0-10, 10-20, 20-30, 30-40, etc.).
For example, the output that I'm looking for is:
[ [2,4,5,6,7,8,10],[12],[96],[192],[300],[360],[480],[504] ]
I tried using:
list(zip(*[iter(l)] * 10))
But this gives the wrong answer.
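Use itertools.groupby to group the elements after integer-dividing (//) each one by 10. Note that groupby only groups consecutive items, so the bins come out in input order: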
from itertools import groupby
l = [2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 12.0,96.0, 192.0, 480.0, 360.0, 504.0, 300.0]
groups = []
for _, g in groupby(l, lambda x: (x-1)//10):
    groups.append(list(g)) # Store group iterator as a list
print(groups)
Output:
[[2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0], [12.0], [96.0], [192.0], [480.0], [360.0], [504.0], [300.0]]
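The output above follows the input order (480 appears before 360) because groupby only merges adjacent items with the same key. A small sketch, assuming sorted bins are wanted as in the question, sorts the list before grouping:

```python
from itertools import groupby

l = [2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 12.0, 96.0, 192.0, 480.0, 360.0, 504.0, 300.0]

# Sorting first makes equal keys adjacent and puts the bins in ascending order
groups = [list(g) for _, g in groupby(sorted(l), key=lambda x: (x - 1) // 10)]
print(groups)
```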
A defaultdict works well for this. It is not a single pass, but you can sort the result to keep the bins in order. The integer division by 10 does the binning for you:
from collections import defaultdict
groups = defaultdict(list)
for i in l:
    groups[int((i-1)//10)].append(i)
groups_list = sorted(groups.values())
groups_list
[[2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0], [12.0], [96.0], [192.0], [300.0], [360.0], [480.0], [504.0]]
Even though an answer has been accepted, here is another way. Note that it groups the values by their number of digits rather than into bins of width 10:
l = [2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 12.0,96.0, 192.0, 480.0, 360.0, 504.0, 300.0]
l1 = [int(k) for k in l]
l2 = list(list([k for k in l1 if len(str(k))==j]) for j in range(1,len(str(max(l1))) +1))
OUTPUT:
l2 = [[2, 4, 5, 6, 7, 8], [10, 12, 96], [192, 480, 360, 504, 300]]
The list can also be split into sub-lists using a dictionary: the key is (value-1)//10, and values that share a key are appended to the same list:
gd={}
for i in l:
    k=int((i-1)//10)
    if k in gd:
        gd[k].append(i)
    else:
        gd[k]=[i]
print(gd.values())
You can loop over your list l and build a new list using extend, append, and an if condition:
smaller_list = []
larger_list = []
desired_result_list = []
for element in l:
    if element <= 10:
        smaller_list.extend([element])
    else:
        larger_list.append([element])
# Put the 0-10 bin first, matching the order in the requested output
desired_result_list.extend([smaller_list] + larger_list)

Replace dictionary keys from values of another dictionary

I have three dictionaries:
packed_items = {0: [0, 3],
2: [1],
1: [2]}
trucks_dict = {0: [9.5, 5.5, 5.5],
1: [13.0, 5.5, 7.0],
2: [16.0, 6.0, 7.0]}
items_dict = {0: [4.6, 4.3, 4.3],
1: [4.6, 4.3, 4.3],
2: [6.0, 5.6, 9.0],
3: [8.75, 5.6, 6.6]}
packed_items has trucks as keys and lists of items as values. I want to transform packed_items so that it gives me output in this format:
packed_dict = {[9.5, 5.5, 5.5]: [[4.6, 4.3, 4.3], [8.75, 5.6, 6.6]],
               [16.0, 6.0, 7.0]: [[4.6, 4.3, 4.3]],
               [13.0, 5.5, 7.0]: [[6.0, 5.6, 9.0]]}
Basically I want to replace my keys in packed_items with the values in trucks_dict, and values in packed_items with values in items_dict.
By converting your list keys to tuples, you can do that with something like:
Code:
result = {}
for k, v in packed_items.items():
    for i in v:
        result.setdefault(tuple(trucks_dict[k]), []).append(items_dict[i])
Test Code:
packed_items = {0: [0, 3],
2: [1],
1: [2]}
trucks_dict = {0: [9.5, 5.5, 5.5],
1: [13.0, 5.5, 7.0],
2: [16.0, 6.0, 7.0]}
items_dict = {0: [4.6, 4.3, 4.3],
1: [4.6, 4.3, 4.3],
2: [6.0, 5.6, 9.0],
3: [8.75, 5.6, 6.6]}
result = {}
for k, v in packed_items.items():
    for i in v:
        result.setdefault(tuple(trucks_dict[k]), []).append(items_dict[i])
print(result)
Results:
{(9.5, 5.5, 5.5): [[4.6, 4.3, 4.3], [8.75, 5.6, 6.6]],
(16.0, 6.0, 7.0): [[4.6, 4.3, 4.3]],
(13.0, 5.5, 7.0): [[6.0, 5.6, 9.0]]
}
You cannot have lists as dictionary keys because they are unhashable.
If string keys are acceptable, you can do:
from collections import defaultdict
packed_items = {0: [0, 3],
2: [1],
1: [2]}
trucks_dict = {0: [9.5, 5.5, 5.5],
1: [13.0, 5.5, 7.0],
2: [16.0, 6.0, 7.0]}
items_dict = {0: [4.6, 4.3, 4.3],
1: [4.6, 4.3, 4.3],
2: [6.0, 5.6, 9.0],
3: [8.75, 5.6, 6.6]}
d = defaultdict(list)
# Walk packed_items so each truck is paired with exactly the items packed into it
for truck, items in packed_items.items():
    for i in items:
        d[str(trucks_dict[truck])].append(items_dict[i])
print(dict(d))
# {'[9.5, 5.5, 5.5]': [[4.6, 4.3, 4.3], [8.75, 5.6, 6.6]], '[16.0, 6.0, 7.0]': [[4.6, 4.3, 4.3]], '[13.0, 5.5, 7.0]': [[6.0, 5.6, 9.0]]}
You can use a dict comprehension to map the lists in trucks_dict to items in items_dict. The lists have to be converted to tuples so that they can be hashable as keys:
{tuple(trucks_dict[k]): [items_dict[i] for i in l] for k, l in packed_items.items()}
This returns:
{(9.5, 5.5, 5.5): [[4.6, 4.3, 4.3], [8.75, 5.6, 6.6]],
(13.0, 5.5, 7.0): [[6.0, 5.6, 9.0]],
(16.0, 6.0, 7.0): [[4.6, 4.3, 4.3]]}
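To illustrate why the tuple conversion is needed, here is a short sketch showing that a list key is rejected while the equivalent tuple is accepted. The variable names are only illustrative.

```python
truck = [9.5, 5.5, 5.5]
items = [[4.6, 4.3, 4.3], [8.75, 5.6, 6.6]]

try:
    packed = {truck: items}          # a list cannot be used as a dict key
except TypeError as exc:
    error_message = str(exc)
    print("list key failed:", error_message)

packed = {tuple(truck): items}       # a tuple of floats is hashable
print(packed[(9.5, 5.5, 5.5)])
```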

How to efficiently do a grid search for parameter combinations in Python?

Problem
For a computational engineering model, I want to do a grid search over all feasible parameter combinations. Each parameter has a certain range of possible values, e.g. (0 … 100), and the parameter combination must fulfil the condition a+b+c=100. An example:
ranges = {
'a': (95, 99),
'b': (1, 4),
'c': (1, 2)}
increment = 1.0
target = 100.0
So the combinations that fulfil the condition a+b+c=100 are:
[(95, 4, 1), (95, 3, 2), (96, 2, 2), (96, 3, 1), (97, 1, 2), (97, 2, 1), (98, 1, 1)]
This algorithm should run with any number of parameters, range lengths, and increments.
My solutions (so far)
The solutions I have come up with are all brute-forcing the problem. That means calculating all combinations and then discarding the ones that do not fulfil the given condition:
import itertools
import numpy as np
import pandas as pd
def solution1(ranges, increment, target):
    combinations = []
    for parameter in ranges:
        combinations.append(list(np.arange(ranges[parameter][0], ranges[parameter][1], increment)))
        # np.arange() is exclusive of the upper bound, let's fix that
        if combinations[-1][-1] != ranges[parameter][1]:
            combinations[-1].append(ranges[parameter][1])
    combinations = list(itertools.product(*combinations))
    df = pd.DataFrame(combinations, columns=ranges.keys())
    # using np.isclose() so that the algorithm works for floats
    return df[np.isclose(df.sum(axis=1), target)]
Since I ran into RAM problems with solution1(), I used itertools.product as an iterator.
def solution2(ranges, increment, target):
    combinations = []
    for parameter in ranges:
        combinations.append(list(np.arange(ranges[parameter][0], ranges[parameter][1], increment)))
        # np.arange() is exclusive of the upper bound, let's fix that
        if combinations[-1][-1] != ranges[parameter][1]:
            combinations[-1].append(ranges[parameter][1])
    result = []
    for combination in itertools.product(*combinations):
        # using np.isclose() so that the algorithm works for floats
        if np.isclose(sum(combination), target):
            result.append(combination)
    df = pd.DataFrame(result, columns=ranges.keys())
    return df
However, this quickly takes a few days to compute. Hence, neither solution is viable for a large number of parameters and ranges. For instance, one set that I am trying to solve is (showing the already unpacked combinations variable):
[[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0], [22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], [0.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0], [0.0]]
This results in memory use of >40 GB for solution1() and calculation time >400 hours for solution2().
Question
Do you see a solution that is either faster or more intelligent, i.e. not trying to brute-force the problem?
P.S.: I am not 100% sure if this question would be a better fit on one of the other Stackexchange sites. Please suggest in the comments if you think it should be moved and I will delete it here.
Here is a recursive solution:
a = [95, 100]
b = [1, 4]
c = [1, 2]
Params = (a, b, c)
def GetValidParamValues(Params, constriantSum, prevVals):
    validParamValues = []
    if (len(Params) == 1):
        if (constriantSum >= Params[0][0] and constriantSum <= Params[0][1]):
            # wrap the value in a list so it can be concatenated with prevVals
            validParamValues.append([constriantSum])
        for v in validParamValues:
            print(prevVals + v)
        return
    sumOfLowParams = sum([Params[x][0] for x in range(1, len(Params))])
    sumOfHighParams = sum([Params[x][1] for x in range(1, len(Params))])
    # restrict the first parameter to values the remaining parameters can complete
    lowEnd = max(Params[0][0], constriantSum - sumOfHighParams)
    highEnd = min(Params[0][1], constriantSum - sumOfLowParams) + 1
    if (len(Params) == 2):
        for av in range(lowEnd, highEnd):
            bv = constriantSum - av
            if (bv <= Params[1][1]):
                validParamValues.append([av, bv])
        for v in validParamValues:
            print(prevVals + v)
        return
    for av in range(lowEnd, highEnd):
        nexPrevVals = prevVals + [av]
        subSeParams = Params[1:]
        GetValidParamValues(subSeParams, constriantSum - av, nexPrevVals)
# start with an empty list of previously chosen values
GetValidParamValues(Params, 100, [])
The idea is that with 2 parameters, a and b, we can list all the valid pairs by iterating over the values of a, taking (ai, S - ai), and checking whether S - ai is a valid value for b.
This can be improved, since we can calculate ahead of time which values of ai make S - ai a valid value for b, so we never check values that don't work.
When there are more than 2 parameters, we can again look at every valid value of ai, and we know the other numbers must sum to S - ai. So all we need is every possible way for the other numbers to add up to S - ai, which is the same problem with one fewer parameter. Using recursion, we can reduce the problem all the way down to size 2 and solve it.
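The same pruning idea can be sketched as a generator that returns the combinations instead of printing them. This is a minimal illustration, assuming integer bounds and an increment of 1; feasible_combinations is a hypothetical name, and the ranges are the ones from the question's example.

```python
def feasible_combinations(ranges, target):
    """Yield integer tuples, one value per (low, high) range, that sum to target."""
    if not ranges:
        if target == 0:
            yield ()
        return
    (low, high), rest = ranges[0], ranges[1:]
    rest_low = sum(r[0] for r in rest)
    rest_high = sum(r[1] for r in rest)
    # Keep only values that leave a remainder the other parameters can reach
    start = max(low, target - rest_high)
    stop = min(high, target - rest_low)
    for value in range(start, stop + 1):
        for tail in feasible_combinations(rest, target - value):
            yield (value,) + tail

combos = list(feasible_combinations([(95, 99), (1, 4), (1, 2)], 100))
print(combos)
```

Because it is a generator, the results can be consumed one at a time instead of being held in memory all at once.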
