summing a column in python


I have a list of lists which consists of numerical data, a kind of matrix.
I'd like to create a function to sum up any column I later choose (1+2+9+10=?, 3+4+11+12=?, etc.).
The constraints are that I want to accomplish this with for loops and old-school Python: no numpy, and preferably without the zip function.
Outside the loop, I'd like to calculate the average of the chosen column.
What would be the simplest way to accomplish that?
Here's what I came up with thus far:
data = [[1, 3, 5, 7], [2, 4, 6, 8], [9, 11, 13, 15], [10, 12, 14, 16]]
def calc_avg(data, column):
    total = 0
    for row in data:
        total += ....
    avg = total / len(calc_avg)
Later on, I would print the average for the column I choose.

Introduce a variable nr to keep count of the number of rows added as you loop.
def calc_avg(data, column):
    total = 0
    nr = 0
    for row in data:
        nr += 1
        total += row[column]
    return total / nr
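As a hedged usage sketch of the counting approach above: the helper name column_sum and the empty-input guard are additions for illustration, not part of the original answer. It assumes Python 3, where / is true division; under Python 2 the same expression would floor when both operands are ints.

```python
data = [[1, 3, 5, 7], [2, 4, 6, 8], [9, 11, 13, 15], [10, 12, 14, 16]]

def column_sum(data, column):
    """Add up the values at index `column` (0-based) of every row."""
    total = 0
    for row in data:
        total += row[column]
    return total

def calc_avg(data, column):
    """Average of one column; returns None for an empty matrix (an assumed convention)."""
    if not data:
        return None
    # len(data) is the number of rows, so no separate counter is needed here
    return column_sum(data, column) / len(data)

print(column_sum(data, 0))  # 1 + 2 + 9 + 10 = 22
print(calc_avg(data, 0))    # 22 / 4 = 5.5
```

Using len(data) as the denominator is equivalent to the manual counter as long as every row contributes exactly one value to the total.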

You'd probably need some counter to keep track of the "denominator" for your average:
data = [[1, 3, 5, 7], [2, 4, 6, 8], [9, 11, 13, 15], [10, 12, 14, 16]]
def calc_avg(data, column):
    total = 0
    counter = 0
    for row in data:
        total += row[column]
        counter += 1
    avg = total / counter
    return avg

You can write a simple function that collects all the values of a column and then does the math on them.
E.g.:
def get_sum_avg(chosen_column, dataset):
    # chosen_column is 1-based; collect its values and ignore rows that are too short to have it
    chosen_column_values = [element[chosen_column - 1] for element in dataset if len(element) >= chosen_column]
    # find sum
    col_sum = sum(chosen_column_values)
    # find avg (0 if no row has the chosen column)
    average = col_sum / len(chosen_column_values) if len(chosen_column_values) > 0 else 0
    return col_sum, average
data = [[1, 3, 5, 7], [2, 4, 6, 8], [9, 11, 13, 15], [10, 12, 14, 16]]
print(get_sum_avg(1, data))
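Since the question also mentions an average for every column, here is a minimal sketch that handles all columns in one pass using only for loops (no numpy, no zip). The name column_averages is hypothetical, and the sketch assumes a rectangular matrix (every row as long as the first) and Python 3 division.

```python
data = [[1, 3, 5, 7], [2, 4, 6, 8], [9, 11, 13, 15], [10, 12, 14, 16]]

def column_averages(data):
    """Return a list with the average of each column (assumes equal-length rows)."""
    if not data:
        return []
    n_cols = len(data[0])
    totals = [0] * n_cols
    # accumulate one running total per column
    for row in data:
        for c in range(n_cols):
            totals[c] += row[c]
    # divide each total by the number of rows
    averages = []
    for total in totals:
        averages.append(total / len(data))
    return averages

print(column_averages(data))  # [5.5, 7.5, 9.5, 11.5]
```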

Related

Python | Nested loops | Sum of Sublists

My goal with this is to generate the sum of each sublist separately using nested loops
This is my list called sales_data
sales_data = [[12, 17, 22], [2, 10, 3], [5, 12, 13]]
The sublist can be represented by any variable but for the purpose of this exercise, we will call it scoops_sold, which I have set to 0
scoops_sold = 0
So far I am able to run the nested loop as follows
sales_data = [[12, 17, 22], [2, 10, 3], [5, 12, 13]]
scoops_sold = 0
for location in sales_data:
    for element in location:
        scoops_sold += element
print(scoops_sold)
This gives me 96 as the result
What I essentially want is the sum of each sublist, but I am not sure how to do that. I thought about using slicing, but that was not effective.

You can easily solve it using sum():
print(list(map(sum, sales_data)))
If you insist on loops:
sums = []
for location in sales_data:
    sold = 0
    for element in location:
        sold += element
    sums.append(sold)
print(sums)

How about
[sum(sub_list) for sub_list in sales_data]
# [51, 15, 30]
However, the question is a bit confusing because you are setting scoops_sold to 0, an int, when the result you describe is a list of int.

If you want the sum of each sublist, you can use a list to store each sublist's sum. With the built-in Python function sum, you need only one loop:
scoops_sold = []
for sale_data in sales_data:
    scoops_sold.append(sum(sale_data))
The same result can be achieved in one line with a list comprehension:
scoops_sold = [sum(sale_data) for sale_data in sales_data]

sales_data = [[12, 17, 22], [2, 10, 3], [5, 12, 13]]
scoops_sold = 0
for location in sales_data:
    print(location)
    for element in location:
        scoops_sold += element
    print(scoops_sold)
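To connect the single total of 96 from the question with the per-sublist sums, here is a small sketch using nested loops only. The function name sublist_sums is hypothetical; the key point is that the accumulator is reset at the start of every sublist.

```python
sales_data = [[12, 17, 22], [2, 10, 3], [5, 12, 13]]

def sublist_sums(rows):
    """Return one sum per sublist, using nested loops only."""
    sums = []
    for row in rows:
        subtotal = 0  # reset for every sublist
        for value in row:
            subtotal += value
        sums.append(subtotal)
    return sums

per_location = sublist_sums(sales_data)

# the grand total from the question is just the sum of the per-sublist sums
grand_total = 0
for subtotal in per_location:
    grand_total += subtotal

print(per_location)  # [51, 15, 30]
print(grand_total)   # 96
```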

Inserting a value to list according to a threshold value

I have a list a = [1,2,3,6,8,12,13,18,33,23] and a list b = [] that is empty. I need to compare each value in list a with all the values in list b by taking the difference between the new value from a and each element of b. If the difference is greater than the threshold, the value must be inserted into list b rather than skipping to the next value in a. How can I do that?
a =[1,2,3,6,8,12,13,18,33,23]
b=[]
b.append(a[0])
for index in range(len(a)):
    for i in range(len(b)):
        x = a[index] - b[i]
        if x > 1:
            b.append(a[index])
print("\nOutput list is")
for v in range(len(b)):
    print(b[v])
The desired output is:
output = [1,6,8,12,18,33,23]
To further clarify: at first, list b contains the first item from list a. I need to check whether a[0] - b[0] > 1 and, if so, insert a[0] into b; next, if a[1] - b[0] > 1, insert a[1] into b; then, if a[2] - b[0] > 1 and a[2] - b[1] > 1, insert a[2] into b; and so on.

Here is a probable solution to the stated problem, although its output does not match your desired outcome. I am sharing it based on how I understood the problem.
a = [1, 2, 3, 6, 8, 12, 13, 18, 33, 23]
b = []
b.append(a[0])
threshold = 1 # Set Threshold value
for index in range(len(a)):
    difference = 0
    for i in range(len(b)):
        difference = abs(a[index] - b[i])
        if difference > threshold:
            continue  # Keep comparing other values in list b
        else:
            break  # No need for further comparison
    if difference > threshold:
        b.append(a[index])
print("\nOutput list is")
print(b)
Output is:
Output list is
[1, 3, 6, 8, 12, 18, 33]
Also, I notice that after swapping the last two elements (33 <-> 23 ) of the list a as below:
a = [1, 2, 3, 6, 8, 12, 13, 18, 23, 33]
and running the same code, the output was closer to your desired output:
Output list is
[1, 3, 6, 8, 12, 18, 23, 33]
The problem became more interesting as I investigated further. Let me explain. First, consider list a as a list of integers from 1 to N. For example:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
and set the threshold to 1
threshold = 1 # Set Threshold value
Now, run the programme with threshold = 1 and you will get the output:
Output list is
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
and if you rerun with threshold = 2, you will get the following output:
threshold = 2
Output list is
[1, 4, 7, 10, 13, 16, 19]
Basically, this programme generates a hopping series of integers in which the hop size is determined by the threshold: a threshold of 1 gives a hop of 2, and a threshold of 2 gives a hop of 3.
Interesting, isn't it?
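The same rule (keep a value only if it differs from every value already kept by more than the threshold) can be sketched more compactly with all(). The function name filter_by_threshold is hypothetical, and the sketch assumes a non-negative threshold so that the first element can simply seed the result.

```python
def filter_by_threshold(values, threshold):
    """Keep a value only if it is more than `threshold` away from every kept value."""
    if not values:
        return []
    kept = [values[0]]  # the first value always seeds the result
    for value in values[1:]:
        # all() stops at the first kept value that is too close
        if all(abs(value - k) > threshold for k in kept):
            kept.append(value)
    return kept

numbers = list(range(1, 21))
print(filter_by_threshold(numbers, 1))  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(filter_by_threshold(numbers, 2))  # [1, 4, 7, 10, 13, 16, 19]
```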

Python array logic

I am trying to create a list of lists with the input of m and n, where m is the number of lists within the main list and n is the number of elements within each given list. The grid should contain the integers from start to start + rows * cols - 1 and be ascending. But, every odd numbered row should be descending instead.
The code I've written is returning the expected results, but my automated tester is saying it's incorrect. Maybe my logic is messed up somewhere?
inputs:
start = 1, m = 3, n = 5
expected:
[[1,2,3,4,5],[10,9,8,7,6],[11,12,13,14,15]]
result = []
mylist = []
start = 1
for x in range(0, rows):
    for x in range(0, cols):
        result.append(start)
        start += 1
for y in range(0, rows):
    if y%2 != 0:
        mylist.append(result[cols - 1::-1])
        del result[cols - 1::-1]
    else:
        mylist.append(result[0:cols])
        del result[0:cols]
return mylist

One possible solution, using itertools.count:
from itertools import count
def build(m, n, start=1):
    lst, c = [], count(start)
    for i in range(m):
        lst.append([next(c) for j in range(n)][::-1] if i % 2 else [next(c) for j in range(n)])
    return lst
print(build(3, 5, 1))
Prints:
[[1, 2, 3, 4, 5], [10, 9, 8, 7, 6], [11, 12, 13, 14, 15]]
print(build(3, 0, 1))
Prints:
[[], [], []]

Just generate the numbers you need, of which there are n * m; in your case that would be 0 to 14 with Python's range function. However, since we want to begin at the start value, we also need to add the start offset to the end of the range.
Now that we can generate all the numbers we need, we just have to think about how to group them.
We can add numbers to the current list until it reaches the size n, and then start a new list. However, if the list we just finished is an even-numbered row, we need to reverse it.
def build_lists(m, n, start=1):
    data = [[]]
    for i in range(start, n * m + start):
        if len(data[-1]) < n:
            data[-1].append(i)
        else:
            if len(data) % 2 == 0:
                data[-1] = data[-1][::-1]
            data.append([i])
    if len(data) % 2 == 0:
        data[-1] = data[-1][::-1]
    return data
print(build_lists(3, 5))
print(build_lists(6, 3))
print(build_lists(6, 2, 100))
OUTPUT
[[1, 2, 3, 4, 5], [10, 9, 8, 7, 6], [11, 12, 13, 14, 15]]
[[1, 2, 3], [6, 5, 4], [7, 8, 9], [12, 11, 10], [13, 14, 15], [18, 17, 16]]
[[100, 101], [103, 102], [104, 105], [107, 106], [108, 109], [111, 110]]
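An alternative sketch computes each row directly from its first value, so no running counter or post-hoc chunking is needed. The function name build_grid and its parameter order are assumptions for illustration; rows are numbered from 0, so rows 1, 3, 5, ... are the descending ones.

```python
def build_grid(start, rows, cols):
    """Build rows of consecutive integers; rows with an odd 0-based index are reversed."""
    grid = []
    for r in range(rows):
        first = start + r * cols  # smallest value in this row
        row = list(range(first, first + cols))
        if r % 2 == 1:
            row.reverse()
        grid.append(row)
    return grid

print(build_grid(1, 3, 5))
# [[1, 2, 3, 4, 5], [10, 9, 8, 7, 6], [11, 12, 13, 14, 15]]
```

Because start is only read and never reassigned, the caller's starting value is respected, which a hard-coded start = 1 inside the function body would not do.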

Multiprocessing Pooling Fails at Dask Functions

I am trying to take two arrays, "day 1", ranging from 0 to 11 (in steps of +1), and "day 2", ranging from 11 to 0 (in steps of -1), and sum them. However, I wish to use multiprocessing and dask arrays to speed up the process (I will be going to bigger numbers later). I want to split day 1 and day 2 into four equal parts (day 1: [0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11] and day 2: [11, 10, 9], [8, 7, 6], [5, 4, 3], [2, 1, 0]) and have four processes each add one pair of corresponding chunks (i.e., day 1's [0, 1, 2] with day 2's [11, 10, 9] to get [11, 11, 11]). After all four processes are done, I hope to combine the results back into one big list of [11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11]. However, at the bolded step inside the function, the code fails to run and appears to be stuck in an infinite loop or calculation of some sort.
Code:
import numpy as np
import dask.array as da
from dask import delayed
import threading
import multiprocessing as mp
NUM_WORKERS = 4
# create list from 0 to 11
day1 = list(range(12))
# create list from 11 to 0
day2 = day1[::-1]
def get_sum(i, base):
    z = []
    x = day1[i * length: i * length + length]
    y = day2[i * length: i * length + length]
    z.append(x)
    z.append(y)
    converted = da.from_array(z, chunks = NUM_WORKERS)
    summed = da.sum(converted, axis = 0).compute()  # the bolded step in the original post: execution hangs here
    list_concatenate = np.concatenate((base, summed), axis=0)
    all_sum = sum(list_concatenate)
process_list = []
for i in range(NUM_WORKERS):
    process_list = mp.Process(target = get_sum, args = (i, process_list))
    process_list.start()
    process_list.join()

Python list slicing

I'm not able to understand what to do here. Can someone help?
I have a few lists:
array = [7,8,2,3,4,10,5,6,7,10,8,9,10,4,5,12,13,14,1,2,15,16,17]
slice = [2, 4, 6, 8, 10, 12, 15, 17, 20, 22]
intervals = [12, 17, 22]
output = []
intermediate = []
slice is a list of indices at which I need to slice array. intervals is a list of indices that mark where a group ends: slicing stops when slice[i] equals intervals[j], where i and j are loop variables.
I need to form a list of lists from array based on slice and intervals. While slice[i] is not equal to intervals[j], I accumulate:
intermediate = intermediate + array[slice[i]:slice[i+1]+1]
Here, in my case,
slice[i] and intervals[j] are equal for the value 12, so I need to build the first list from array:
intermediate = array[slice[0]:slice[0+1]+1] + array[slice[2]:slice[2+1]+1] + array[slice[4]:slice[4+1]+1]
which is
intermediate = array[2:(4+1)] + array[6:(8+1)] + array[10:(12+1)]
and when slice[i] equals intervals[j], intermediate is appended to output and the slicing continues:
output = output + [intermediate]
which is
output = output + [array[2:(4+1)] + array[6:(8+1)] + array[10:(12+1)]]
Now the next value in intervals is 17, so until we reach 17 in slice we form another list from array[slice[6]:slice[6+1]+1] and add it to the output. This continues.
The final output should be:
output = [array[slice[0]:slice[0+1]+1] + array[slice[2]:slice[2+1]+1] + array[slice[4]:slice[4+1]+1] , array[slice[6]:slice[6+1]+1], array[slice[8]:slice[8+1]+1]]
which is
output = [[2, 3, 4, 5, 6, 7, 8, 9, 10], [12, 13, 14], [15, 16, 17]]

A straightforward solution:
array_ = [7,8,2,3,4,10,5,6,7,10,8,9,10,4,5,12,13,14,1,2,15,16,17]
slice_ = [2, 4, 6, 8, 10, 12, 15, 17, 20, 22]
intervals = [12, 17, 22]
output = []
intermediate = []
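for i in range(0, len(slice_), 2):
    # slice_ is read in (start, end) pairs; the end index is inclusive
    intermediate.extend(array_[slice_[i]:slice_[i+1]+1])
    if slice_[i+1] in intervals:
        # an end index listed in intervals closes the current group
        output.append(intermediate)
        intermediate = []
print(output)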
# [[2, 3, 4, 5, 6, 7, 8, 9, 10], [12, 13, 14], [15, 16, 17]]
I have changed some variable names to avoid conflicts.
On large data, you may convert intervals to a set.
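As a sketch of that suggestion, the loop can be wrapped in a function that converts intervals to a set once, so each membership test is a hash lookup instead of a list scan. The function name group_slices and its parameter names are hypothetical; the loop bound len(bounds) - 1 is an added safeguard that skips a dangling unpaired index.

```python
array_ = [7, 8, 2, 3, 4, 10, 5, 6, 7, 10, 8, 9, 10, 4, 5, 12, 13, 14, 1, 2, 15, 16, 17]
slice_ = [2, 4, 6, 8, 10, 12, 15, 17, 20, 22]
intervals = [12, 17, 22]

def group_slices(values, bounds, stops):
    """Concatenate inclusive (start, end) slices, closing a group when end is in stops."""
    stop_set = set(stops)  # constant-time membership checks on average
    output, current = [], []
    for i in range(0, len(bounds) - 1, 2):
        lo, hi = bounds[i], bounds[i + 1]
        current.extend(values[lo:hi + 1])
        if hi in stop_set:
            output.append(current)
            current = []
    return output

print(group_slices(array_, slice_, intervals))
# [[2, 3, 4, 5, 6, 7, 8, 9, 10], [12, 13, 14], [15, 16, 17]]
```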

Here is a recursive solution which goes through the index list once, dynamically checks whether each index is in the intervals, and appends the sliced results to a list accordingly:
def slicing(array, index, stops, sliced):
    # if the length of index is smaller than two, stop
    if len(index) < 2:
        return
    # if the first element of the index is in the intervals, create a new list in the result
    # accordingly and move one index forward
    elif index[0] in stops:
        if len(index) >= 3:
            sliced += [[]]
        slicing(array, index[1:], stops, sliced)
    # if the second element of the index is in the intervals, append the slice to the last
    # element of the list, create a new sublist and move two indexes forward accordingly
    elif index[1] in stops:
        sliced[-1] += array[index[0]:(index[1]+1)]
        if len(index) >= 4:
            sliced += [[]]
        slicing(array, index[2:], stops, sliced)
    # append the new slice to the last element of the result list and move two indexes
    # forward if none of the above conditions is satisfied:
    else:
        sliced[-1] += array[index[0]:(index[1]+1)]
        slicing(array, index[2:], stops, sliced)
sliced = [[]]
slicing(array, slice_, intervals, sliced)
sliced
# [[2, 3, 4, 5, 6, 7, 8, 9, 10], [12, 13, 14], [15, 16, 17]]
Data:
array = [7,8,2,3,4,10,5,6,7,10,8,9,10,4,5,12,13,14,1,2,15,16,17]
slice_ = [2, 4, 6, 8, 10, 12, 15, 17, 20, 22]
intervals = [12, 17, 22]
