Probability of an element in list - python

My data
List = [[[12,1,6],[12,1,6],15],[[12,2,6],[12,2,6],18]],[[12,3,6],[12,3,6],24]]
I have data, stored as a list, recording transition counts:
the number of rows with a transition from 12,1,6 to 12,1,6 is 15
the number of rows with a transition from 12,2,6 to 12,2,6 is 18
the number of rows with a transition from 12,3,6 to 12,3,6 is 24
This data is not artificially generated; there are many other possible combinations in my data. The above is just a sample.
I want my output to be a list that also holds the probability of each transition.
For example:
P1 = the probability of a transition from 12,1,6 to 12,1,6
   = 15 / total number of rows (elements) in the list (in this case it is 3)
P2 = the probability of a transition from 12,2,6 to 12,2,6
   = 18 / total number of rows in the list
My output needs to be
List =[[[12,1,6],[12,1,6],15,P1=(15/3)*100],[[12,2,6],[12,2,6],18,P2]],[[12,3,6],[12,3,6],24,P3]]
I have tried a lot and it would be helpful if I get suggestions.
def Sort(sub_li):
    sub_li.sort(reverse=True, key=lambda x: x[1])
    return sub_li
print(Sort(List))

List had an extra ] after 18, so I removed it before writing this piece of code, which assumes that the length is the same for all the rows.
List = [[[12,1,6],[12,1,6],15],[[12,2,6],[12,2,6],18],[[12,3,6],[12,3,6],24]]
for i, value in enumerate(List):
    # value[2] is the count; len(value[0]) is the row length (3 here, the same as the number of rows)
    value.append("P%d=%f" % ((i + 1), value[2] / len(value[0]) * 100))
print(List)
Output:
[[[12, 1, 6], [12, 1, 6], 15, 'P1=500.000000'], [[12, 2, 6], [12, 2, 6], 18, 'P2=600.000000'], [[12, 3, 6], [12, 3, 6], 24, 'P3=800.000000']]
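If by "total length of rows" you meant the number of rows in List (which is also 3 in this sample), a variant that divides by len(List) instead would look like this (just a sketch of that reading of the question):
List = [[[12,1,6],[12,1,6],15],[[12,2,6],[12,2,6],18],[[12,3,6],[12,3,6],24]]
total_rows = len(List)  # 3 rows in this sample
for i, row in enumerate(List):
    # row is [from_state, to_state, count]; append the percentage as a fourth item
    row.append("P%d=%.2f" % (i + 1, row[2] / total_rows * 100))
print(List)
# [[[12, 1, 6], [12, 1, 6], 15, 'P1=500.00'], [[12, 2, 6], [12, 2, 6], 18, 'P2=600.00'], [[12, 3, 6], [12, 3, 6], 24, 'P3=800.00']]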


Merge lists in a dataframe column if they share a common value

What I need:
I have a dataframe where the elements of a column are lists. There are no duplications of elements in a list. For example, a dataframe like the following:
import pandas as pd
>>d = {'col1': [[1, 2, 4, 8], [15, 16, 17], [18, 3], [2, 19], [10, 4]]}
>>df = pd.DataFrame(data=d)
col1
0 [1, 2, 4, 8]
1 [15, 16, 17]
2 [18, 3]
3 [2, 19]
4 [10, 4]
I would like to obtain a dataframe where, if at least one number contained in the list at row i is also contained in the list at row j, the two lists are merged (without duplication). A value could also be shared by more than two lists; in that case I want all lists that share at least one value to be merged.
col1
0 [1, 2, 4, 8, 19, 10]
1 [15, 16, 17]
2 [18, 3]
Neither the order of the rows of the output dataframe nor the order of the values inside a list is important.
What I tried:
I have found this answer, which shows how to tell if at least one item in a list is contained in another list, e.g.
>>not set([1, 2, 4, 8]).isdisjoint([2, 19])
True
Returns True, since 2 is contained in both lists.
I have also found this useful answer that shows how to compare each row of a dataframe with each other. The answer applies a custom function to each row of the dataframe using a lambda.
df.apply(lambda row: func(row['col1']), axis=1)
However, I'm not sure how to put these two things together, i.e. how to create the func function. I also don't know whether this approach is even feasible, since the resulting dataframe will probably have fewer rows than the original one.
Thanks!
You can use networkx and graphs for that:
import networkx as nx
G = nx.Graph([edge for nodes in df['col1'] for edge in zip(nodes, nodes[1:])])
result = pd.Series(nx.connected_components(G))
This basically treats every number as a node; whenever two numbers appear in the same list they end up connected (the code links consecutive elements, which is enough to chain the whole list together). Finally you find the connected components.
Output:
0 {1, 2, 4, 8, 10, 19}
1 {16, 17, 15}
2 {18, 3}
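If you need plain lists rather than sets in the result, you can map sorted over it (a small follow-up, not part of the original answer):
result = result.map(sorted)  # each connected component becomes a sorted list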
This is not straightforward. Merging lists has many pitfalls.
One solid approach is to use a specialized library, for example networkx, and take a graph approach: generate successive edges and find the connected components.
You can thus:
generate successive edges with add_edges_from
find the connected_components
craft a dictionary and map the first item of each list
groupby and merge the lists (you could use the connected components directly but I'm giving a pandas solution in case you have more columns to handle)
import networkx as nx

G = nx.Graph()
for l in df['col1']:
    G.add_edges_from(zip(l, l[1:]))

groups = {k: v for v, l in enumerate(nx.connected_components(G)) for k in l}
# {1: 0, 2: 0, 4: 0, 8: 0, 10: 0, 19: 0, 16: 1, 17: 1, 15: 1, 18: 2, 3: 2}

out = (df.groupby(df['col1'].str[0].map(groups), as_index=False)
         .agg(lambda x: sorted(set().union(*x)))
      )
output:
col1
0 [1, 2, 4, 8, 10, 19]
1 [15, 16, 17]
2 [3, 18]
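As noted in the steps above, if col1 is your only column you could also build the output directly from the connected components instead of going through groupby (a small shortcut sketch):
out = pd.DataFrame({'col1': [sorted(c) for c in nx.connected_components(G)]})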
This seems more like a Python problem than a pandas one, so here's one attempt: for each list, check every list after it, and merge (and remove) the later list if the two intersect:
vals = d["col1"]

# while there is at least one more list after the current one to process...
i = 0
while i < len(vals) - 1:
    current = set(vals[i])
    # for each of the following lists...
    j = i + 1
    while j < len(vals):
        # any intersection? then fold the other list into current and delete it
        other = vals[j]
        if current.intersection(other):
            current.update(other)
            del vals[j]
        else:
            # no intersection, so move on to the next list
            j += 1
    # put the updated current back, and move on
    vals[i] = current
    i += 1
at the end, vals is
In [108]: vals
Out[108]: [{1, 2, 4, 8, 10, 19}, {15, 16, 17}, {3, 18}]
In [109]: pd.Series(map(list, vals))
Out[109]:
0 [1, 2, 19, 4, 8, 10]
1 [16, 17, 15]
2 [18, 3]
dtype: object
If you don't want vals modified, you can chain .copy() onto it first.
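For instance (just illustrating the .copy() suggestion; a shallow copy is enough here because the inner lists are only replaced, never mutated in place):
vals = d["col1"].copy()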
To add on to mozway's answer: it wasn't clear from the question, but I also had rows with single-valued lists. Those values are never added to the graph by add_edges_from(zip(l, l[1:])), since l[1:] is empty. I solved it by adding a single node to the graph whenever l[1:] is empty. I leave the solution here in case anyone needs it.
import networkx as nx
import pandas as pd

d = {'col1': [[1, 2, 4, 8], [15, 16, 17], [18, 3], [2, 19], [10, 4], [9]]}
df = pd.DataFrame(data=d)

G = nx.Graph()
for l in df['col1']:
    if len(l[1:]) == 0:
        G.add_node(l[0])
    else:
        G.add_edges_from(zip(l, l[1:]))

groups = {k: v for v, l in enumerate(nx.connected_components(G)) for k in l}
out = (df.groupby(df['col1'].str[0].map(groups), as_index=False)
         .agg(lambda x: sorted(set().union(*x))))
Result:
col1
0 [1, 2, 4, 8, 10, 19]
1 [15, 16, 17]
2 [3, 18]
3 [9]

Comparisons between an arbitrary number of lists of arbitrary length Python

Given an arbitrary number of lists of integers of arbitrary length, I would like to group the integers into new lists based on a given distance threshold.
Input:
l1 = [1, 3]
l2 = [2, 4, 6, 10]
l3 = [12, 13, 15]
threshold = 2
Output:
[1, 2, 3, 4, 6] # group 1
[10, 12, 13, 15] # group 2
The elements of the groups act as a growing chain so first we have
abs(l1[0] - l2[0]) < threshold #true
so l1[0] and l2[0] are in group 1, and then the next check could be
abs(group[-1] - l1[1]) < threshold #true
so now l1[1] is added to group 1
Is there a clever way to do this without first grouping l1 and l2 and then grouping l3 with that output?
Based on the way that you asked the question, it sounds like you just want a basic python solution for utility, so I'll give you a simple solution.
Instead of treating the lists as separate entities, it's easiest to work with one big pool of non-duplicate numbers. You can exploit the fact that a set only keeps unique values to cluster all of the lists together:
# Throws all contents of lists into a set, converts it back to list, and sorts
elems = sorted(list({*l1, *l2, *l3}))
# elems = [1, 2, 3, 4, 6, 10, 12, 13, 15]
If you had a list of lists that you wanted to perform this on:
lists = [l1, l2, l3]
elems = []
for l in lists:
    elems.extend(l)
elems = sorted(set(elems))
# elems = [1, 2, 3, 4, 6, 10, 12, 13, 15]
If you want to keep duplicates:
elems = sorted([*l1, *l2, *l3])
# or, for the list-of-lists variant, skip the set and just sort:
elems = sorted(elems)
From there, you can just do the separation iteratively. Specifically:
Go through the elements one-by-one. If the next element is validly spaced, add it to the list you're building on.
When an invalidly-spaced element is encountered, create a new list containing that element, and start appending to the new list instead.
This can be done as follows (note: index -1 refers to the last element):
out = [[elems[0]]]
thresh = 2
for el in elems[1:]:
    if el - out[-1][-1] <= thresh:
        out[-1].append(el)
    else:
        out.append([el])
# out = [[1, 2, 3, 4, 6], [10, 12, 13, 15]]
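Putting both steps together, a small helper that accepts any number of lists and a threshold might look like this (a sketch; the name group_by_distance is made up here):
def group_by_distance(*lists, thresh=2, keep_duplicates=False):
    # pool all values together, optionally de-duplicated, and sort
    pooled = [x for l in lists for x in l]
    elems = sorted(pooled if keep_duplicates else set(pooled))
    out = [[elems[0]]]
    for el in elems[1:]:
        # start a new group whenever the gap to the previous element exceeds the threshold
        if el - out[-1][-1] <= thresh:
            out[-1].append(el)
        else:
            out.append([el])
    return out

print(group_by_distance([1, 3], [2, 4, 6, 10], [12, 13, 15], thresh=2))
# [[1, 2, 3, 4, 6], [10, 12, 13, 15]]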

How to calculate average of sub-lists based on Timestamp values

I have 2 lists. The first list contains the timestamps (in seconds) at which data was measured, and the second list contains the data.
I want to calculate the average of the data for every 10 seconds. Note that the interval between 2 consecutive data points is not fixed.
Example:
Timestamp = [2, 5, 8, 11, 18, 23, 25, 28]
Data = [1, 2, 3, 4, 5, 6, 7, 8]
And the expected output is supposed to be:
Output = [average of [1,2,3] , average of [4,5] , average of [6,7,8]]
I was wondering if there is any built-in function in Python, like an average-if analysis, that does this automatically.
Thank you for your help.
You can use math.floor together with a defaultdict for that:
from collections import defaultdict
from math import floor

timestamp = [2, 5, 8, 11, 18, 23, 25, 28]
data = [1, 2, 3, 4, 5, 6, 7, 8]

average_dc = defaultdict(list)
for t, d in sorted(zip(timestamp, data), key=lambda x: x[0]):
    # bucket each value by the 10-second window its timestamp falls into
    average_dc[floor(t / 10)].append(d)

averages = [sum(i) / len(i) for i in average_dc.values()]
Output
[2.0, 4.5, 7.0]
sorted(zip(timestamp, data), key=lambda x: x[0]) pairs each timestamp with the data value at the same index and sorts the pairs by timestamp; the for loop then appends each data value to average_dc under the key of its 10-second window.
In the last line, the list comprehension iterates over each list in average_dc and calculates its average.
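As an aside, if pandas is an option, the same bucketing can be sketched in a couple of lines (not a plain-Python built-in, just an alternative):
import pandas as pd

s = pd.Series(data, index=timestamp)
averages = s.groupby(s.index // 10).mean().tolist()
# [2.0, 4.5, 7.0]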

Reusable code to iterate along different array dimensions

Say, I have an N dimensional array my_array[D1][D2]...[DN]
For a certain application, like sensitivity analysis, I need to fix a point p=(d1, d2, ..., dN) and iterate along one dimension at a time.
The resulting behavior is
for x1 in range(0, D1):
    do_something(my_array[x1][d2][d3]...[dN])
for x2 in range(0, D2):
    do_something(my_array[d1][x2][d3]...[dN])
.
.
.
for xN in range(0, DN):
    do_something(my_array[d1][d2][d3]...[xN])
As you can see, there is a lot of duplicated code here. How can I reduce the work and write some elegant code instead?
For example, is there any approach to the generation of code similar to the below?
for d in range(0, N):
    iterate along the (d+1)th dimension of my_array, denoting the element as x:
        do_something(x)
You can use numpy.take and do something like the following. Go through the documentation for reference.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.take.html
import numpy as np

my_array = np.asarray(my_array)  # make sure we have an ndarray
p = list(p)                      # the fixed point (d1, ..., dN)

for i in range(my_array.ndim):            # one pass per dimension
    for x in range(my_array.shape[i]):    # walk along dimension i
        # np.take selects index x along axis i, leaving an (N-1)-dimensional slice
        sub = np.take(my_array, x, axis=i)
        # index the remaining dimensions with the fixed point
        do_something(sub[tuple(p[:i] + p[i + 1:])])
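For instance, with hypothetical values (only to make the snippet above concrete):
my_array = np.arange(24).reshape(2, 3, 4)   # D1=2, D2=3, D3=4
p = [1, 1, 2]                               # the fixed point
do_something = print
the loop above prints the 2 + 3 + 4 elements lying on the three axis-aligned lines through the point (1, 1, 2).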
I don't understand what d1, d2, d3 are, but I guess you can do something like this:
def get_list_item_by_indexes_list(in_list, indexes_list):
    if len(indexes_list) <= 1:
        return in_list[indexes_list[0]]
    else:
        return get_list_item_by_indexes_list(in_list[indexes_list[0]], indexes_list[1:])

def do_to_each_dimension(multi_list, func, dimensions_lens):
    d0_to_dN_list = [l - 1 for l in dimensions_lens]  # stand-in for the fixed point (I don't know what d1, d2, d3 should be); here the last index of each dimension
    for dimension_index in range(0, len(dimensions_lens)):
        dimension_len = dimensions_lens[dimension_index]
        for x in range(0, dimension_len):
            curr_d0_to_dN_list = d0_to_dN_list.copy()
            curr_d0_to_dN_list[dimension_index] = x
            func(get_list_item_by_indexes_list(multi_list, curr_d0_to_dN_list))

def do_something(n):
    print(n)

dimensions_lens = [3, 5]
my_array = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15]
]
do_to_each_dimension(my_array, do_something, dimensions_lens)
Output:
5 10 15 11 12 13 14 15
This code iterates through the last column and the last row of a 2d array.
Now, to iterate through the last line of each dimension of a 3d array:
dimensions_lens = [2, 4, 3]
my_array = [
    [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
        [10, 11, 12]
    ],
    [
        [13, 14, 15],
        [16, 17, 18],
        [19, 20, 21],
        [22, 23, 24]
    ],
]
do_to_each_dimension(my_array, do_something, dimensions_lens)
Output:
12 24 15 18 21 24 22 23 24
(Note: don't use zero-length dimensions with this code)
You could mess with the string representation of your array access (my_arr[d1][d2]...[dN]) and eval it afterwards to get the values you want. This is fairly "hacky", but it will work on arrays with arbitrary dimensions and allows you to supply the indices as a list while handling the nested array access under the hood, allowing for a clean double for loop.
def access_at(arr, point):
    # build 'arr[p1][p2]...[pN]'
    access_str = 'arr' + ''.join([f'[{p}]' for p in point])
    return eval(access_str)
Using this access method is pretty straightforward:
p = [p1, ..., pN]
D = [D1, ..., DN]
for i in range(N):
    # copy p (a shallow copy is enough, since the entries are plain indices)
    pt = p[:]
    for x in range(D[i]):
        pt[i] = x
        do_something(access_at(my_array, pt))
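If you'd rather avoid eval, the same nested access can be written with functools.reduce and operator.getitem (a small alternative sketch, not part of the answer above):
from functools import reduce
from operator import getitem

def access_at(arr, point):
    # arr[p1][p2]...[pN] without building and evaluating a string
    return reduce(getitem, point, arr)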

Finding the Maximum Route in a given input [closed]

I have this as homework and I need to do it in Python.
Problem:
The Maximum Route is defined as the maximum total obtained by traversing from the tip of the triangle to its base. Here the maximum route is 3+7+4+9 = 23.
3
7 4
2 4 6
8 5 9 3
Now, given a certain triangle, my task is to find the Maximum Route for it.
Not sure how to do it....
We can solve this problem using backtracking. To do that, for each element of the triangle in a given row, we have to determine the maximum of the sums of the current element with each of its connected neighbors in the next row, i.e.
if elem = triangle[row][col] and the next row is triangle[row+1]
then backtrack_elem = max([elem + i for i in connected_neighbors of col in row])
First, try to find a way to determine the connected neighbors of col in a row.
For an elem at position (row, col), the connected neighbors in the next row next are [next[col-1], next[col], next[col+1]], provided col-1 >= 0 and col+1 < len(next). Here is a sample implementation:
>>> def neigh(n,sz):
        return [i for i in (n-1,n,n+1) if 0<=i<sz]
This will return the index of the connected neighbors.
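For example, a quick check of what it returns (indices, not values):
>>> neigh(0, 4)
[0, 1]
>>> neigh(2, 4)
[1, 2, 3]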
Now we can write backtrack_elem = max([elem + i for i in connected_neighbors of col in row]) as
triangle[row][i] = max([elem + next[n] for n in neigh(i,len(next))])
and if we iterate over the triangle row-wise, with curr as the given row and i as the column index within that row, then we can write
curr[i]=max(next[n]+e for n in neigh(i,len(next)))
Now we have to iterate over the triangle, reading the current and the next row together. This can be done as
for (curr,next) in zip(triangle[-2::-1],triangle[::-1]):
and then we use enumerate to generate a tuple of index and the elem itself
for (i,e) in enumerate(curr):
Putting them together, we have
>>> for (curr,next) in zip(triangle[-2::-1],triangle[::-1]):
        for (i,e) in enumerate(curr):
            curr[i]=max(next[n]+e for n in neigh(i,len(next)))
But the above operation is destructive, so we have to create a copy of the original triangle and work on that.
route = triangle    # This will not work, because assignment only copies the reference
route = triangle[:] # This will also not work, because triangle is a list of lists
                    # and the inner lists would still be shared by reference
So we have to use copy.deepcopy:
import copy
route = copy.deepcopy(triangle) # This will work
and rewrite our traversal as
>>> for (curr,next) in zip(route[-2::-1],route[::-1]):
        for (i,e) in enumerate(curr):
            curr[i]=max(next[n]+e for n in neigh(i,len(next)))
We end up with another triangle where every element gives the highest route cost achievable from that position. To get the actual route, we have to use the original triangle and trace the path back down from the top.
For an element at index [row][col], the highest route cost is route[row][col]. If it lies on the max route, then the next element must be a connected neighbor in the row below whose route cost is route[row][col] - orig[row][col]. Iterating row-wise, we can write this as
i = [x for x in neigh(i, len(next)) if next[x] == curr[i] - orig[i]][0]
emitting orig[i] at each row, and we should loop downwards starting from the peak element. Thus we have
>>> for (curr,next,orig) in zip(route,route[1:],triangle):
        print(orig[i], end=' ')
        i=[x for x in neigh(i,len(next)) if next[x] == curr[i]-orig[i]][0]
Let's take a slightly more complex example, as yours is too trivial:
>>> triangle=[
[3],
[7, 4],
[2, 4, 6],
[8, 5, 9, 3],
[15,10,2, 7, 8]
]
>>> route=copy.deepcopy(triangle) # Create a Copy
Generating the Route
>>> for (curr,next) in zip(route[-2::-1],route[::-1]):
        for (i,e) in enumerate(curr):
            curr[i]=max(next[n]+e for n in neigh(i,len(next)))
>>> route
[[37], [34, 31], [25, 27, 26], [23, 20, 19, 11], [15, 10, 2, 7, 8]]
and finally we calculate the route
>>> def enroute(triangle):
        route=copy.deepcopy(triangle) # Create a Copy
        # Generating the Route
        for (curr,next) in zip(route[-2::-1],route[::-1]): #Read the curr and next row
            for (i,e) in enumerate(curr):
                #Backtrack calculation
                curr[i]=max(next[n]+e for n in neigh(i,len(next)))
        path=[] #Start with the peak elem (i is 0 here, left over from the loop above)
        for (curr,next,orig) in zip(route,route[1:],triangle): #Read the curr, next and orig row
            path.append(orig[i])
            i=[x for x in neigh(i,len(next)) if next[x] == curr[i]-orig[i]][0]
        path.append(triangle[-1][i]) #Don't forget the last row, which the loop above doesn't cover
        return (route[0],path)
To test it on our triangle, we have
>>> enroute(triangle)
([37], [3, 7, 4, 8, 15])
Reading a comment by jamylak, I realized this problem is similar to Euler 18, but the difference is the representation. The problem in Euler 18 considers a pyramid, whereas the problem in this question is a right-angled triangle; as I explained in my reply to his comment, that is why the results differ. Nevertheless, this solution can easily be ported to work with Euler 18. Here is the port:
>>> def enroute(triangle,neigh=lambda n,sz:[i for i in (n-1,n,n+1) if 0<=i<sz]):
        route=copy.deepcopy(triangle) # Create a Copy
        # Generating the Route
        for (curr,next) in zip(route[-2::-1],route[::-1]): #Read the curr and next row
            for (i,e) in enumerate(curr):
                #Backtrack calculation
                curr[i]=max(next[n]+e for n in neigh(i,len(next)))
        path=[] #Start with the peak elem (i is 0 here, left over from the loop above)
        for (curr,next,orig) in zip(route,route[1:],triangle): #Read the curr, next and orig row
            path.append(orig[i])
            i=[x for x in neigh(i,len(next)) if next[x] == curr[i]-orig[i]][0]
        path.append(triangle[-1][i]) #Don't forget the last row, which the loop above doesn't cover
        return (route[0],path)
>>> enroute(t1) # For Right angle triangle
([1116], [75, 64, 82, 87, 82, 75, 77, 65, 41, 72, 71, 70, 91, 66, 98])
>>> enroute(t1,neigh=lambda n,sz:[i for i in (n,n+1) if i<sz]) # For a Pyramid
([1074], [75, 64, 82, 87, 82, 75, 73, 28, 83, 32, 91, 78, 58, 73, 93])
>>>
Even though this is homework, @abhijit gave an answer, so I will too!
To understand this you will need to read up on Python generators; you might need to google it ;)
>>> triangle=[
[3],
[7, 4],
[2, 4, 6],
[8, 5, 9, 3]
]
The first step is to find all possible routes
>>> def routes(rows,current_row=0,start=0):
        for i,num in enumerate(rows[current_row]): # gets the index and number of each number in the row
            if abs(i-start) > 1: # checks if it is within a 1-number radius; if not, skip it. Use `if not (0 <= (i-start) < 2)` to check in a pyramid
                continue
            if current_row == len(rows) - 1: # we are iterating through the last row, so simply yield the number as it has no children
                yield [num]
            else:
                for child in routes(rows,current_row+1,i): # this is not the last row, so get all children of this number and yield them
                    yield [num] + child
This gives
>>> list(routes(triangle))
[[3, 7, 2, 8], [3, 7, 2, 5], [3, 7, 4, 8], [3, 7, 4, 5], [3, 7, 4, 9], [3, 4, 2, 8], [3, 4, 2, 5], [3, 4, 4, 8], [3, 4, 4, 5], [3, 4, 4, 9], [3, 4, 6, 5], [3, 4, 6, 9], [3, 4, 6, 3]]
Getting the max is simple now; max accepts generators since they are iterables, so we don't need to convert to a list.
>>> max(routes(triangle),key=sum)
[3, 7, 4, 9]
I will give you some hints on this specific case. Try to create a generalized function for an n-row triangle yourself.
triangle=[
[3],
[7, 4],
[2, 4, 6],
[8, 5, 9, 3]
]
possible_roads={}
for i1 in range(1):
    for i2 in range(max(i1-1,0),i1+2):
        for i3 in range(max(i2-1,0),i2+2):
            for i4 in range(max(i3-1,0),i3+2):
                road=(triangle[0][i1],triangle[1][i2],triangle[2][i3],triangle[3][i4])
                possible_roads[road]=sum(road)
best = max(possible_roads, key=possible_roads.get)  # pick the road with the largest sum
print("Best road: %s (sum: %s)" % (best, possible_roads[best]))
[EDIT] Since everyone posted their answers, here is mine.
triangle=[
[3],
[7, 4],
[2, 4, 6],
[8, 5, 9, 3]
]
def generate_backtrack(triangle):
    n=len(triangle)
    routes=[[{'pos':i,'val':triangle[n-1][i]}] for i in range(n)]
    while n!=1:
        base_routes=[]
        for idx in range(len(routes)):
            i=routes[idx][-1]['pos'] #last node
            movements=range(
                max(0,i-1),
                min(i+2,n-1)
            )
            for movement in movements:
                base_routes.append(routes[idx]+[{'pos':movement,'val':triangle[n-2][movement]}])
        n-=1
        routes=base_routes
    return [[k['val'] for k in j] for j in routes]

print(sorted(generate_backtrack(triangle),key=sum,reverse=True)[0][::-1])
My answer
def maxpath(listN):
    liv = len(listN) - 1
    return calcl(listN, liv)

def calcl(listN, liv):
    if liv == 0:
        return listN[0]
    listN[liv-1] = [(listN[liv-1][i]+listN[liv][i+1], listN[liv-1][i]+listN[liv][i]) \
                    [listN[liv][i] > listN[liv][i+1]] for i in range(0, liv)]
    return calcl(listN, liv-1)
Output:
l5=[
[3],
[7, 4],
[2, 4, 6],
[8, 5, 9, 3],
[15,10,2, 7, 8]
]
print(maxpath(l5))
>>>[35]
