If column A repeats, sum column B - python

I have two columns and if column A repeats I want to sum the values of column B.
A = {1 2 3 3 4 4 4}
B = {1 2 3 4 5 6 7}
the result should look like:
A B
1 1
2 2
3 7
4 18
My code:
for i in range(len(a)):
    r = np.sqrt(((x-x[j])**2)+((y-y[j])**2))
    if r <= A[i] <= r-5:
        B = np.abs((r-0.007)-b[i])

A1 = [1, 2, 3, 3, 4, 4, 4]
B1 = [1, 2, 3, 4, 5, 6, 7]
A2 = []
B2 = []
for i in range(len(A1)):
    # A1 and B1 shrink while duplicates are deleted, so stop at the new end
    if i >= len(A1):
        break
    j = i + 1
    total = B1[i]
    # fold every following duplicate of A1[i] into the running total
    while j < len(A1) and A1[i] == A1[j]:
        total += B1[j]
        del A1[j]
        del B1[j]
    A2.append(A1[i])
    B2.append(total)
print(A2)
print(B2)
output is:
[1, 2, 3, 4]
[1, 2, 7, 18]

I think the easiest way would be using the following algorithm:
def create_buckets(l):
    return [0]*(max(l)+1)
def fill_buckets(A, B):
    buckets = create_buckets(A)
    for i in range(len(A)):
        buckets[A[i]] += B[i]
    return buckets
A = [1, 2, 3, 3, 4, 4, 4]
B = [1, 2, 3, 4, 5, 6, 7]
output = fill_buckets(A, B)
for i in range(len(output)):
    if output[i] != 0:
        print(i, output[i])
We make a list of zeros whose length is the largest value in A plus 1, so that every value of A is a valid index into the list. These are the buckets.
We loop over A. Say the loop is at index Y and A holds the value X there:
• We look up the value of B at the same index (Y).
• We add that value to the bucket at index X.
Finally, we print every bucket that is not zero (use a different default value if zero is a legitimate sum). Note that this only works when A contains non-negative integers.
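If the keys in A may be negative, non-integer, or very large, the same bucket idea can be sketched with a dictionary instead of a list. This is only an illustration; the helper name sum_by_key is made up for this example and is not part of the answer above.

```python
from collections import defaultdict

def sum_by_key(keys, values):
    """Sum `values` per distinct key, keeping keys in first-seen order."""
    totals = defaultdict(int)  # missing keys start at 0, like an empty bucket
    for key, value in zip(keys, values):
        totals[key] += value
    return dict(totals)

A = [1, 2, 3, 3, 4, 4, 4]
B = [1, 2, 3, 4, 5, 6, 7]
result = sum_by_key(A, B)
for key, total in result.items():
    print(key, total)
```

Unlike the list of buckets, this does not need a sentinel value to tell "key never seen" apart from "sum is zero".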

I think this is the simplest solution.
A1 = [1, 2, 3, 3, 4, 4, 4]
B1 = [1, 2, 3, 4, 5, 6, 7]
A2 = []
B2 = []
A2.append(A1[0])
B2.append(B1[0])
for i in range(len(A1)-1):
    if A1[i] != A1[i+1]:
        A2.append(A1[i+1])
        B2.append(B1[i+1])
    else:
        A2.pop()
        A2.append(A1[i+1])
        b = B2.pop()
        B2.append(b+B1[i+1])
print(A2)
print(B2)
Output:
A2 = [1,2,3,4]
B2 = [1,2,7,18]

Related

How to get many sums stepping through varying length lists that have specific orders?

My Problem:
I have a list of lists. These lists have varying lengths, e.g. [[2, 1, 5, 3], [2,4,8]]
For each item in each list I need to print its sum with the next list item, then the next 2 list items, until I print the sum of all of the list items. Then I move to the second list item and do the same until I have reached the last list item.
The output I need is:
My Desired Output:
2 + 1 = 3
2 + 1 + 5 = 8
2 + 1 + 5 + 3 = 11
1 + 5 = 6
1 + 5 + 3 = 9
5 + 3 = 8
2 + 4 = 6
2 + 4 + 8 = 14
4 + 8 = 12
My (bad) Attempt:
I have tried for hours but have not been able to get close. I was doing something along the lines of the below code but I am wondering if I need to make a recursive function??
for cluster in [[2, 1, 5, 3], [2,4,8]]:
    for trip in cluster:
        for trip_cluster_index in range(len(cluster)):
            if trip != cluster[trip_cluster_index]:
                print(cluster, trip, cluster[trip_cluster_index])
O(n^3)
list_sum = [[2, 1, 5, 3], [2,4,8]]
list_out = []
for l in list_sum:
    for i in range(1, len(l)):
        aux = l[i-1]
        for j in range(i, len(l)):
            aux += l[j]
            list_out.append(aux)
print(list_out)
[3, 8, 11, 6, 9, 8, 6, 14, 12]
O(n^2)
list_sum = [[2, 1, 5, 3], [2,4,8]]
list_out = []
for l in list_sum:
    list_1 = []
    aux = l[0]
    for i in range(1, len(l)):
        aux += l[i]
        list_1.append(aux)
    list_out.extend(list_1)
    sum_list = 0
    for j in range(0, len(list_1)-1):
        sum_list += l[j]
        list_2 = [x-sum_list for x in list_1[j+1:]]
        list_out.extend(list_2)
print(list_out)
[3, 8, 11, 6, 9, 8, 6, 14, 12]
Inverted O(n^3)
list_sum = [[2, 1, 5, 3], [2,4,8]]
list_out = []
for l in list_sum:
    for i in range(0,len(l)-1):
        aux = sum(l[i:])
        list_out.append(aux)
        for j in range(len(l)-1,i+1,-1):
            aux -= l[j]
            list_out.append(aux)
print(list_out)
[11, 8, 3, 9, 6, 8, 14, 6, 12]
This should give you what you want.
n = -1
listy = [[1,1,1],[2,2,2],[3,3,3]]
for l in listy:
    while n < len(listy)-1:
        n +=1
        total = sum(l) + sum(listy[n])
        print(total)
I assumed that your output must contain the whole equations, and this is what I came up with:
L=[[2, 1, 5, 3], [2,4,8]]
for i in L:
    for j in range(len(i)):
        for k in range(j+2, len(i)+1):
            print(' + '.join([str(n) for n in i[j:k]]), '=', sum(i[j:k]))
Hope it is what you were looking for!
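For longer lists, re-summing every slice repeats a lot of work. A possible variant, sketched here with a made-up helper name (running_sum_equations), keeps a running total per starting position and still produces the full equations:

```python
def running_sum_equations(clusters):
    """Build 'a + b + ... = total' strings for every contiguous run of 2+ items."""
    lines = []
    for cluster in clusters:
        for start in range(len(cluster) - 1):
            total = cluster[start]
            terms = [str(cluster[start])]
            # extend the run one item at a time, reusing the previous total
            for value in cluster[start + 1:]:
                total += value
                terms.append(str(value))
                lines.append(" + ".join(terms) + " = " + str(total))
    return lines

equations = running_sum_equations([[2, 1, 5, 3], [2, 4, 8]])
for line in equations:
    print(line)
```

The function returns the lines instead of printing them directly, which makes it easier to check or reuse the result.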

Splitting a heap at given key

Given a list: [10, 4, 9, 3, 2, 5, 8, 1, 0]
that has the heap structure below (drawn sideways: root on the left, right children above left children):
        8
    9
        5
10
        2
    4
            0
        3
            1
What is a good algorithm in Python to get [4,3,2,1,0], which is basically the left subtree of 10?
parent is (index-1)//2
left child is 2i+1, right child is 2i+2
L = [10, 4, 9, 3, 2, 5, 8, 1, 0]
index = 1
newheap = []
newheap.append(L[index])
leftc = 2 * index + 1
rightc = 2 * index + 2
while(leftc < len(L)):
    newheap.append(L[leftc])
    if(rightc < len(L)):
        newheap.append(L[rightc])
    leftc = 2 * leftc + 1
    rightc = 2 * rightc + 2
print(newheap)
which outputs
[4,3,2,1]
but I need [4,3,2,1, 0], so not what I wanted. I started the index at 1 which points to 4.
Would recursion be better? Not sure how to go about this.
You can try something like this:
L = [10, 4, 9, 3, 2, 5, 8, 1, 0]
index = 0
offset = 1
newheap = []
while index < len(L):
    index += offset
    for i in range(offset):
        # stop as soon as we run past the end of the list
        if index+i >= len(L):
            break
        newheap += [L[index+i]]
    offset = 2 * offset
print(newheap)
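The same level-by-level walk can be sketched for a subtree rooted at any index, not only the left child of the root. The function name subtree is an assumption made for this example; it relies on the fact that, in the array layout, each level of a subtree occupies a contiguous slice that starts at the leftmost child of the previous level and is twice as wide.

```python
def subtree(heap, root):
    """Return the elements of the subtree rooted at index `root`, level by level."""
    result = []
    first, width = root, 1
    while first < len(heap):
        # slicing silently truncates the last, possibly incomplete, level
        result.extend(heap[first:first + width])
        first = 2 * first + 1  # leftmost child of the level's first node
        width *= 2
    return result

L = [10, 4, 9, 3, 2, 5, 8, 1, 0]
left = subtree(L, 1)
right = subtree(L, 2)
print(left)
print(right)
```

No recursion is needed for this; a loop over levels is enough.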

Detect peaks in list of numbers and record their positions

I am trying to create some code that returns the positions and the values of the "peaks" (or local maxima) of a numeric array.
For example, the list arr = [0, 1, 2, 5, 1, 0] has a peak at position 3 with a value of 5 (since arr[3] equals 5).
The first and last elements of the array will not be considered as peaks (in the context of a mathematical function, you don't know what is after and before and therefore, you don't know if it is a peak or not).
def pick_peaks(arr):
    print(arr)
    posPeaks = {
        "pos": [],
        "peaks": [],
    }
    startFound = False
    n = 0
    while startFound == False:
        if arr[n] == arr[n+1]:
            n += 1
        else:
            startFound = True
    endFound = False
    m = len(arr) - 1
    while endFound == False:
        if arr[m] == arr[m-1]:
            m -= 1
        else:
            endFound = True
    for i in range(n+1, m):
        if arr[i] == arr[i-1]:
            pass
        elif arr[i] >= arr[i-1] and arr[i] >= arr[i+1]:
            posPeaks["pos"].append(i)
            posPeaks["peaks"].append(arr[i])
    return posPeaks
My issue is with plateaus. [1, 2, 2, 2, 1] has a peak while [1, 2, 2, 2, 3] does not. When a plateau is a peak, the first position of the plateau is recorded.
Any help is appreciated.
I suggest you use groupby to group contiguous equal values, then store the first position of each group. For example, [1, 2, 2, 2, 1] produces the following list of (value, position) tuples: [(1, 0), (2, 1), (1, 4)]. Putting it all together:
from itertools import groupby
def peaks(data):
    start = 0
    sequence = []
    for key, group in groupby(data):
        sequence.append((key, start))
        start += sum(1 for _ in group)
    for (b, bi), (m, mi), (a, ai) in zip(sequence, sequence[1:], sequence[2:]):
        if b < m and a < m:
            yield m, mi
print(list(peaks([0, 1, 2, 5, 1, 0])))
print(list(peaks([1, 2, 2, 2, 1])))
print(list(peaks([1, 2, 2, 2, 3])))
Output
[(5, 3)]
[(2, 1)]
[]
I know I may be a little late for the party, but I'd like to share my solution using NumPy arrays:
import numpy as np
def get_level_peaks(v):
    peaks = []
    i = 1
    while i < v.size-1:
        pos_left = i
        pos_right = i
        while v[pos_left] == v[i] and pos_left > 0:
            pos_left -= 1
        while v[pos_right] == v[i] and pos_right < v.size-1:
            pos_right += 1
        is_lower_peak = v[pos_left] > v[i] and v[i] < v[pos_right]
        is_upper_peak = v[pos_left] < v[i] and v[i] > v[pos_right]
        if is_upper_peak or is_lower_peak:
            peaks.append(i)
        i = pos_right
    peaks = np.array(peaks)
    """
    # uncomment this part of the code
    # to include first and last positions
    first_pos, last_pos = 0, v.size-1
    peaks = np.append([first_pos], peaks)
    peaks = np.append(peaks, [last_pos])
    """
    return peaks
The outputs below include the first and last positions, so they correspond to running the function with the optional block enabled.
Example 1 (see graph):
v = np.array([7, 2, 0, 4, 4, 6, 6, 9, 5, 5])
p = get_level_peaks(v)
print(v) # [7 2 0 4 4 6 6 9 5 5]
print(p) # [0 2 7 9] (peak indexes)
print(v[p]) # [7 0 9 5] (peak elements)
Example 2 (see graph):
v = np.array([8, 2, 1, 0, 1, 2, 2, 5, 9, 3])
p = get_level_peaks(v)
print(v) # [8 2 1 0 1 2 2 5 9 3]
print(p) # [0 3 8 9] (peak indexes)
print(v[p]) # [8 0 9 3] (peak elements)
Example 3 (see graph):
v = np.array([9, 8, 8, 8, 0, 8, 9, 9, 9, 6])
p = get_level_peaks(v)
print(v) # [9 8 8 8 0 8 9 9 9 6]
print(p) # [0 4 6 9] (peak indexes)
print(v[p]) # [9 0 9 6] (peak elements)
In example 3, we have a flat upper peak (a plateau) that goes from index 6 to index 8. In this case, the reported index is always the leftmost position of the plateau. If you want the middle or the rightmost position instead, just change this part of the code:
...
        if is_upper_peak or is_lower_peak:
            peaks.append(i)
...
to this:
...
        # middle position
        if is_upper_peak or is_lower_peak:
            peaks.append((pos_left + pos_right) // 2)
...
...
        # rightmost position (pos_right is the first index after the plateau)
        if is_upper_peak or is_lower_peak:
            peaks.append(pos_right - 1)
...
This code takes a window size n and reports every element that is strictly greater than the n elements on either side of it:
l=[1,2,3,4,5,4,3,2,1,2,3,4,3,2,4,2,1,2]
n=int(input("The size of window on either side "))
for i in range(n,len(l)-n):
    if max(l[i-n:i]+l[i+1:i+n+1])<l[i]:
        print(l[i],' at index = ',i)
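To try the window idea without typing the size at a prompt, the loop can be wrapped in a function. This is just a sketch; the name window_peaks is invented for the example, and it assumes n is at least 1.

```python
def window_peaks(values, n):
    """Return (index, value) pairs strictly greater than the n neighbours on each side."""
    found = []
    for i in range(n, len(values) - n):
        neighbours = values[i - n:i] + values[i + 1:i + n + 1]
        if max(neighbours) < values[i]:
            found.append((i, values[i]))
    return found

l = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 3, 2, 4, 2, 1, 2]
result = window_peaks(l, 2)
print(result)
```

Because the comparison is strict, a plateau such as [1, 2, 2, 2, 1] yields no peak with this approach.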
You can use the same algorithm with the plateaus as well if you can preprocess the data to remove the repeating numbers and keep only 1 unique number. Thus, you can convert the example [1, 2, 2, 2, 1] to [1, 2, 1] and apply the same algorithm.
Edit:
The Code:
from itertools import groupby
def process_data(data):
    return [list(val for num in group) for val, group in groupby(data)]
def peaks(arr):
    #print(arr)
    posPeaks = {
        "pos": [],
        "peaks": [],
    }
    startFound = False
    n = 0
    while startFound == False:
        if arr[n][0] == arr[n+1][0]:
            n += 1
        else:
            startFound = True
    endFound = False
    m = len(arr) - 1
    while endFound == False:
        if arr[m][0] == arr[m-1][0]:
            m -= 1
        else:
            endFound = True
    for i in range(n+1, m):
        if arr[i][0] == arr[i-1][0]:
            pass
        elif arr[i][0] >= arr[i-1][0] and arr[i][0] >= arr[i+1][0]:
            pos = sum([len(arr[idx]) for idx in range(i)])
            posPeaks["pos"].append(pos) #.append(i)
            posPeaks["peaks"].append(arr[i][0])
    return posPeaks
print(peaks(process_data([0, 1, 2, 5, 1, 0])))
print(peaks(process_data([1, 2, 2, 2, 1])))
print(peaks(process_data([1, 2, 2, 2, 3])))
Output:
{'pos': [3], 'peaks': [5]}
{'pos': [1], 'peaks': [2]}
{'pos': [], 'peaks': []}
Here is a fairly simple generator function. Just loop and maintain the necessary state: i (the last index of "growth") and up (true if the last value change was "growth").
def peaks(ar):
    i, up = 0, False
    for j in range(1, len(ar)):
        prev, val = ar[j-1], ar[j]
        if up and val < prev:
            yield prev, i
            up = False
        if val > prev:
            i, up = j, True
>>> list(peaks([0,1,2,5,1,0]))
[(5, 3)]
>>> list(peaks([0,1,2,5,1,2,0]))
[(5, 3), (2, 5)]
>>> list(peaks([0,1,2,5,1,2,0,3]))
[(5, 3), (2, 5)]
>>> list(peaks([1,2,2,2,1]))
[(2, 1)]
>>> list(peaks([1,2,2,2,3]))
[]
A shorter script could be:
data_array = [1, 2, 5, 4, 6, 9]
# Drop the first and the last element of the data array.
reduced_array = data_array[1:-1]
# Find the maximum value of the reduced array (only the single largest interior value is reported).
peak_value = max(reduced_array)
# Print the maximum value and its index in the original data array.
print('The peak value is: ' + str(peak_value))
print('And its position is: ' + str(data_array.index(peak_value)))
Output:
The peak value is: 6
And its position is: 4

Generate random array of integers with a number of appearance of each integer

I need to create a random array of 6 integers between 1 and 5 in Python but I also have another data say a=[2 2 3 1 2] which can be considered as the capacity. It means 1 can occur no more than 2 times or 3 can occur no more than 3 times.
I need to set up a counter for each integer from 1 to 5 to make sure each integer is not generated by the random function more than a[i].
Here is the initial array I created in python but I need to find out how I can make sure about the condition I described above. For example, I don't need a solution like [2 1 5 4 5 4] where 4 is shown twice or [2 2 2 2 1 2].
solution = np.array([np.random.randint(1,6) for i in range(6)])
Even if I can add probability, that should work. Any help is appreciated on this.
You can create a pool that contains each number as many times as its maximum count, and then sample from it:
import numpy as np
a = [2, 2, 3, 1, 2]
data = [i + 1 for i, e in enumerate(a) for _ in range(e)]
print(data)
result = np.random.choice(data, 6, replace=False)
print(result)
Output
[1, 1, 2, 2, 3, 3, 3, 4, 5, 5]
[1 3 2 2 3 1]
Note that data contains each number as many times as its capacity allows. Sampling from data without replacement therefore guarantees that no number appears more often than its specified count.
UPDATE
If you need that each number appears at least one time, you can start with a list of each of the numbers, sample from the rest and then shuffle:
import numpy as np
result = [1, 2, 3, 4, 5]
a = [1, 1, 2, 0, 1]
data = [i + 1 for i, e in enumerate(a) for _ in range(e)]
print(data)
result = result + np.random.choice(data, 1, replace=False).tolist()
np.random.shuffle(result)
print(result)
Output
[1, 2, 3, 3, 5]
[3, 4, 2, 5, 1, 2]
Notice that I subtracted 1 from each of the original values of a, and the sample size of 6 was changed to 1 because the variable result already holds 5 numbers.
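If NumPy is not needed for anything else, the pooling idea can also be sketched with the standard library only. The function name sample_with_capacity and its rng parameter are assumptions made for this example.

```python
import random
from collections import Counter

def sample_with_capacity(capacity, k, rng=random):
    """Draw k values from 1..len(capacity); value i appears at most capacity[i-1] times."""
    # the pool holds each value exactly as often as it is allowed to appear
    pool = [value for value, cap in enumerate(capacity, start=1) for _ in range(cap)]
    return rng.sample(pool, k)  # sampling without replacement respects the caps

a = [2, 2, 3, 1, 2]
result = sample_with_capacity(a, 6)
print(result)
print(Counter(result))
```

random.sample raises ValueError when k is larger than the pool, so an impossible request fails loudly instead of looping forever.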
You could test your count against a dictionary
import random
a = [2, 2, 3, 1, 2]
d = {idx: item for idx,item in enumerate(a, start = 1)}
l = []
while len(set(l) ^ set([*range(1, 6)])) > 0:
    l = []
    while len(l) != 6:
        x = random.randint(1,5)
        while l.count(x) == d[x]:
            x = random.randint(1,5)
        l.append(x)
print(l)

Python equivalent of R "split"-function

In R, you could split a vector according to the factors of another vector:
> a <- 1:10
[1] 1 2 3 4 5 6 7 8 9 10
> b <- rep(1:2,5)
[1] 1 2 1 2 1 2 1 2 1 2
> split(a,b)
$`1`
[1] 1 3 5 7 9
$`2`
[1] 2 4 6 8 10
Thus, grouping a list (in terms of python) according to the values of another list (according to the order of the factors).
Is there anything handy in python like that, except from the itertools.groupby approach?
From your example, it looks like each element in b contains the 1-indexed list in which the node will be stored. Python lacks the automatic numeric variables that R seems to have, so we'll return a tuple of lists. Suppose you can use zero-indexed lists and you only need two of them (i.e., the values 1 and 2 from your R use case become 0 and 1 in Python), so that a call looks like this:
>>> a = range(1, 11)
>>> b = [0,1] * 5
>>> split(a, b)
([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])
Then you can use itertools.compress:
import itertools
def split(x, f):
    # group 0 (falsy entries of f) comes first, then group 1 (truthy entries)
    return list(itertools.compress(x, (not i for i in f))), list(itertools.compress(x, f))
If you need more general input (multiple numbers), something like the following will return an n-tuple:
def split(x, f):
    count = max(f) + 1
    return tuple(list(itertools.compress(x, (el == i for el in f))) for i in range(count))
>>> split([1,2,3,4,5,6,7,8,9,10], [0,1,1,0,2,3,4,0,1,2])
([1, 4, 8], [2, 3, 9], [5, 10], [6], [7])
Edit: warning, this is a groupby solution, which is not what the OP asked for, but it may be of use to someone looking for a less specific way to split the R way in Python.
Here's one way with itertools.
import itertools
# make your sample data
a = range(1,11)
b = [k for _, k in zip(a, itertools.cycle((1,2)))]
# wrap zip results in tuple(): in Python 3 zip returns an iterator, which cannot be indexed
{k: tuple(v for _, v in g) for k, g in itertools.groupby(sorted(zip(b,a)), lambda x: x[0])}
# {1: (1, 3, 5, 7, 9), 2: (2, 4, 6, 8, 10)}
This gives you a dictionary, which is analogous to the named list that you get from R's split.
As a long time R user I was wondering how to do the same thing. It's a very handy function for tabulating vectors. This is what I came up with:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
from collections import defaultdict
def split(x, f):
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    return res
>>> split(a, b)
defaultdict(list, {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]})
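R's split returns the groups ordered by factor level, while the defaultdict keeps them in first-seen order. If that ordering matters, one possible tweak is to sort the keys before returning a plain dict. The name split_sorted is made up for this sketch, and it assumes the keys are mutually comparable.

```python
from collections import defaultdict

def split_sorted(values, factors):
    """Group `values` by the matching entry of `factors`, with keys in sorted order."""
    groups = defaultdict(list)
    for value, key in zip(values, factors):
        groups[key].append(value)
    # dicts preserve insertion order, so inserting in sorted key order is enough
    return {key: groups[key] for key in sorted(groups)}

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [2, 1, 2, 1, 2, 1, 2, 1, 2, 1]
result = split_sorted(a, b)
print(result)
```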
You could try:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
split_1 = [a[k] for k in (i for i,j in enumerate(b) if j == 1)]
split_2 = [a[k] for k in (i for i,j in enumerate(b) if j == 2)]
results in:
In [22]: split_1
Out[22]: [1, 3, 5, 7, 9]
In [24]: split_2
Out[24]: [2, 4, 6, 8, 10]
To make this generalise you can simply iterate over the unique elements in b:
splits = {}
for index in set(b):
    splits[index] = [a[k] for k in (i for i,j in enumerate(b) if j == index)]
