Single line chunk re-assignment - python

As shown in the following code, I have a chunk list x and the full list h. I want to write the values stored in x back into the correct positions of h.
index = 0
for t1 in range(lbp, ubp):
    h[4 + t1] = x[index]
    index = index + 1
Does anyone know how to write it in a single line/expression?
Disclaimer: This is part of a bigger project and I simplified the questions as much as possible. You can expect the matrix sizes to be correct but if you think I am missing something please ask for it. For testing you can use the following variable values:
h = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x = [20, 21]
lbp = 2
ubp = 4

You can use slice assignment to expand on the left-hand side and assign your x list directly to the indices of h, e.g.:
h = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x = [20, 21]
lbp = 2
ubp = 4
h[4 + lbp:4 + ubp] = x # or better yet h[4 + lbp:4 + lbp + len(x)] = x
print(h)
# [1, 2, 3, 4, 5, 6, 20, 21, 9, 10]
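I'm not really sure why you are adding 4 to the indexes in your loop, nor what lbp and ubp are supposed to mean, though. Keep in mind that a plain slice assignment like this does not require x to have the same length as the slice: if the lengths differ, h silently grows or shrinks. If h must keep its length, make sure that len(x) == ubp - lbp (only extended slices with a step other than 1 enforce equal lengths).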
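To illustrate how slice assignment behaves when the lengths do or do not match, here is a minimal sketch using the test values from the question. The names same, shorter, strided and raised are made up for this illustration.

```python
h = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x = [20, 21]
lbp, ubp = 2, 4

# Same length on both sides: the list keeps its size.
same = h.copy()
same[4 + lbp:4 + ubp] = x

# Shorter replacement: the list shrinks by one element.
shorter = h.copy()
shorter[4 + lbp:4 + ubp] = [20]

# An extended slice (step other than 1) insists on equal lengths.
strided = h.copy()
try:
    strided[0:4:2] = [20]
    raised = False
except ValueError:
    raised = True

print(same)     # [1, 2, 3, 4, 5, 6, 20, 21, 9, 10]
print(shorter)  # [1, 2, 3, 4, 5, 6, 20, 9, 10]
print(raised)   # True
```

If the chunk length is not guaranteed, checking len(x) before the assignment is probably the safest option.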

Applying a mask to a dataframe, but only over a certain range inside the dataframe

I currently have some code that uses a mask to calculate the mean of values that are overloads, and the mean of values that are baseline values. It does this over the entire length of the dataframe. However, I now want to apply this only to a certain range in the dataframe column, between first and last (i.e., a specified region in the column, dictated by user input). Here is my code as it stands:
import numpy as np
import pandas as pd

mask_number = 5
no_overload_cycles = 1
hyst = pd.DataFrame({"test": [12, 4, 5, 4, 1, 3, 2, 5, 10, 9, 7, 5, 3, 6, 3, 2, 1, 5, 2]})
list_test = []
for i in range(0, len(hyst) - 1, mask_number):
    for x in range(no_overload_cycles):
        list_test.append(i + x)
mask = np.array(list_test)
print(mask)
# [ 0  5 10 15]
first = 4
last = 17
regression_area = hyst.iloc[first:last]
mean_range_overload = regression_area.loc[np.where(mask == regression_area.index)]['test'].mean()
mean_range_baseline = regression_area.drop(mask[first:last])['test'].mean()
So the overload mean would be the mean of cycles 5, 10 and 15 in test, and the baseline mean would be taken over positions 4 to 17, excluding 5, 10 and 15. This would be my expected output:
print (mean_range_overload)
4
print(mean_range_baseline)
4.545454
However, the no_overload_cycles value can change, and may for example, be 3, which would then create a mask of this:
mask_number = 5
no_overload_cycles = 3
hyst = pd.DataFrame({"test": [12, 4, 5, 4, 1, 3, 2, 5, 10, 9, 7, 5, 3, 6, 3, 2, 1, 5, 2]})
list_test = []
for i in range(0, len(hyst) - 1, mask_number):
    for x in range(no_overload_cycles):
        list_test.append(i + x)
mask = np.array(list_test)
print(mask)
# [ 0  1  2  5  6  7 10 11 12 15 16 17]
So the mean_range_overload would be the mean of the values at 5, 6, 7, 10, 11, 12, 15, 16 and 17, and the mean_range_baseline would be the mean of the values in between these, within the range from first to last in the dataframe column.
Any help on this would be greatly appreciated!
Assuming no_overload_cycles == 1 always, you can simply use slice objects to index the DataFrame.
Say you wish to, in your example, specifically pick cycles 5, 10 and 15 and use them as overload. Then you can get them by doing df.loc[5:15:5].
On the other hand, if you wish to pick the 5th, 10th and 15th cycles from the range you selected, you can get them by doing df.iloc[5:15+1:5] (iloc does not include the right index, so we add one). No loops required.
As mentioned in the comments, your question is slightly confusing, and it'd be helpful if you gave a better description and some expected results; in general I'd also advise you to decouple the domain-specific part of your problem before asking it in a forum, since not everyone knows what you mean by "overload", "baseline", "cycles" etc. I'm not commenting that since I still don't have enough reputation to do so.
I renamed a few of the variables, so what I called a "mask" is not exactly what you called a mask, but I reckon this is what you were trying to make:
import pandas as pd

mask_length = 5
overload_cycles_per_mask = 3
df = pd.DataFrame({"test": [12, 4, 5, 4, 1, 3, 2, 5, 10, 9, 7, 5, 3, 6, 3, 2, 1, 5, 2]})
selected_range = (4, 17)
overload_indices = []
baseline_indices = []
# `range` does not include the right hand side so we add one
# ideally you would specify the range as (4, 18) instead
for i in range(selected_range[0], selected_range[1] + 1):
    if i % mask_length < overload_cycles_per_mask:
        overload_indices.append(i)
    else:
        baseline_indices.append(i)
print(overload_indices)
print(df.iloc[overload_indices].test.mean())
print(baseline_indices)
print(df.iloc[baseline_indices].test.mean())
Basically, the DataFrame rows inside selected_range are divided into segments of length mask_length, each of which has their first overload_cycles_per_mask elements marked as overload, and any others, as baseline.
With that, you get two lists of indices, which you can directly pass to df.iloc, as according to the documentation it supports a list of integers.
Here is the output for mask_length = 5 and overload_cycles_per_mask = 1:
[5, 10, 15]
4.0
[4, 6, 7, 8, 9, 11, 12, 13, 14, 16, 17]
4.545454545454546
And here is for mask_length = 5 and overload_cycles_per_mask = 3:
[5, 6, 7, 10, 11, 12, 15, 16, 17]
3.6666666666666665
[4, 8, 9, 13, 14]
5.8
I do believe calling this a single mask makes things more confusing. In any case, I would tuck the logic for getting the indices away in some separate function to the one which calculates the mean.
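As a rough sketch of that separation, the index logic could live in one function and the mean in another. The names split_indices and mean_at are hypothetical, and a plain list stands in for the DataFrame column so the example stays self-contained; the returned index lists could be passed to df.iloc in the same way as above.

```python
from statistics import mean


def split_indices(start, stop, mask_length, overload_cycles_per_mask):
    """Split the positions start..stop (inclusive) into overload and baseline."""
    overload, baseline = [], []
    for i in range(start, stop + 1):
        if i % mask_length < overload_cycles_per_mask:
            overload.append(i)
        else:
            baseline.append(i)
    return overload, baseline


def mean_at(values, indices):
    """Mean of the values found at the given positions."""
    return mean(values[i] for i in indices)


test = [12, 4, 5, 4, 1, 3, 2, 5, 10, 9, 7, 5, 3, 6, 3, 2, 1, 5, 2]
overload, baseline = split_indices(4, 17, mask_length=5, overload_cycles_per_mask=1)
print(overload)  # [5, 10, 15]
print(baseline)  # [4, 6, 7, 8, 9, 11, 12, 13, 14, 16, 17]
print(mean_at(test, overload), mean_at(test, baseline))
```

Keeping the two steps apart makes it easier to check the index selection on its own before any statistics are computed.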

Trying to understand how my python code works

i = [1, 2, 3, 5, 5, 7, 9, 12, 14, 14,]
list_length = len(i)
def numbers_found(x):
    y = i.count(x)
    return y
latest_num = i[list_length - 1]
for z in range(latest_num + 1):
    print("Found", numbers_found(z), "of number", "\"" + str(z) + "\".")
I am trying to find how many times each number occurs in the list. If I subtract 1 from the list length to get the maximum number in the list (assuming it is in ascending order) and then add 1 again in range, it works. Please help explain this to me.
Let's break it down, step by step.
# a list; indexes shown on the next line
#    0  1  2  3  4  5  6  7   8   9
i = [1, 2, 3, 5, 5, 7, 9, 12, 14, 14]
# getting the length of the list (10),
# i.e. the number of elements
list_length = len(i)
# a function returning the number of
# times a passed value is found in the list i
def numbers_found(x):
    y = i.count(x)
    return y
# see above that list_length is 10,
# but we need one less than that to retrieve the last element,
# which will be 14
latest_num = i[list_length - 1]
# range given 1 argument iterates from 0
# up to the number you pass it, but not including it;
# since latest_num is 14, range(14) would stop at 13.
# So range(15) would iterate like
# 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
for z in range(latest_num + 1):
    print("Found", numbers_found(z), "of number", "\"" + str(z) + "\".")
The reason range works like this, is it's common to see range(length_of_my_list) and expect it to return indexes for the full list.
In order for that to happen you need to only iterate to and not include the length. In your case (10).
What you are using it for is something else. You are trying to find the occurrences of all the numbers in the list. Since you are not using it for indexes, adding + 1 works since you WANT it to include 14.
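As a small illustration of the point about range, and one possible alternative to calling count repeatedly, here is a sketch using collections.Counter from the standard library. Using max(i) instead of the last element is an assumption made here so that the loop does not depend on the list being sorted.

```python
from collections import Counter

i = [1, 2, 3, 5, 5, 7, 9, 12, 14, 14]

# range(stop) stops one short of its argument.
print(list(range(3)))  # [0, 1, 2]

# Counter tallies every value in a single pass over the list.
counts = Counter(i)

# max(i) does not rely on the list being in ascending order.
for z in range(max(i) + 1):
    # A Counter returns 0 for values it has never seen.
    print("Found", counts[z], "of number", '"' + str(z) + '".')
```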
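Use
for z in range(len(i)):
    print("Found", numbers_found(z), "of number", "\"" + str(z) + "\".")
Here len(i) is the size of i, i.e. the number of elements in i (10), and range(n) yields 0, 1, ..., n - 1, i.e. all numbers from 0 to n - 1 in ascending order. Note that this loop therefore only checks the numbers 0 to 9, so the counts for 12 and 14 are never printed; to cover every value in the list, the upper bound has to be the largest value plus one, as in range(latest_num + 1).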

Add 1 for odd numbers on list Python

So this is my code:
numbers = []#1
for i in range(10):
    i += 1
    numbers.append(i)
print (numbers)
numbers2 = [n + 2 for n in numbers]#2
print (numbers2)
numbers3 = []#3
for x in numbers2:
    if (x % 2 == 1) :
        x += 1
        numbers3 = x
        print (numbers3)
I'm using Google Colab and run this code in 3 separate code cells (numbered by the # comments). The output of cell #1 is [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], and the output of cell #2 is [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. In cell #3 I want the program to add 1 to each odd number in the numbers2 list. The output I want is [4, 4, 6, 6, 8, 8, 10, 10, 12, 12]. But what I get is:
4
6
8
10
12
Also, I'm trying not to use functions (just for this code), and the for loop in cell #1 is intentional.
Additional question: is it possible to modify the elements of a list without appending the results to another list (as in cell #2)? For example, just add 2 to each number in the list.
Try the following:
second_list = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
third_list = second_list  # same list object, so second_list is modified in place too
for i in range(len(second_list)):
    if (second_list[i] % 2 == 1):
        make_even = second_list[i] + 1
        third_list[i] = make_even
    else:
        third_list[i] = second_list[i]
print (third_list)
You'll have to tweak it for the Google Colab project you're doing, but the logic here should be right. Right now this answer works in regular Python 3.
Your third cell should look like this. You were close to the solution; all you need to do is append the x values to the numbers3 list:
numbers3 = []#3
for x in numbers2:
    if (x % 2 == 1) :
        x += 1
    numbers3.append(x)
print(numbers3)
In order to fill the list instead of just printing the value you must use the append function instead of =
numbers3 = x
#becomes
numbers3.append(x)
To get the list in the format you want, you also have to move the append out of the if statement, and move the print out of the for loop, like so:
numbers3 = []#3
for x in numbers2:
    if (x % 2 == 1) :
        x += 1
    numbers3.append(x) # Here
print (numbers3) # and here
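Regarding the additional question about changing a list without building a second one, a minimal sketch is to assign back to each position by index. The loop variable names idx and n are only illustrative.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Overwrite each element by index: no second list is created.
for idx, n in enumerate(numbers):
    numbers[idx] = n + 2
print(numbers)  # [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# Same idea for the odd-number step, again writing back into the same list.
for idx, n in enumerate(numbers):
    if n % 2 == 1:
        numbers[idx] = n + 1
print(numbers)  # [4, 4, 6, 6, 8, 8, 10, 10, 12, 12]
```

Replacing elements while iterating is fine here because the length of the list never changes.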

Python: Find outliers inside a list

I have a list with a random number of integers and/or floats. What I'm trying to achieve is to find the exceptions among my numbers (I hope these are the right words to explain this). For example:
list = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]
90 to 99% of my integer values are between 1 and 20
sometimes there are values that are much higher, let's say somewhere around 100 or 1.000 or even more
My problem is, that these values can be different all the time. Maybe the regular range is somewhere between 1.000 to 1.200 and the exceptions are in the range of half a million.
Is there a function to filter out these special numbers?
Assuming your list is l:
If you know you want to filter by a certain percentile/quantile, you can use the following. It removes the values at or below the 10th percentile and at or above the 90th percentile. Of course, you can change either cut-off as you like (for example, you can drop the lower filter and only cut off values above the 90th percentile, as in your example):
import numpy as np
l = np.array(l)
l = l[(l>np.quantile(l,0.1)) & (l<np.quantile(l,0.9))].tolist()
output:
[3, 2, 14, 2, 8, 4, 3, 5]
If you are not sure of the percentile cut-off and are looking to remove outliers:
You can adjust the cut-off for outliers with the argument m in the function call. The larger it is, the fewer outliers are removed. This function seems to be more robust to various types of outliers than other outlier removal techniques.
import numpy as np
l = np.array(l)
def reject_outliers(data, m=6.):
    d = np.abs(data - np.median(data))
    mdev = np.median(d)
    s = d / (mdev if mdev else 1.)
    return data[s < m].tolist()
print(reject_outliers(l))
output:
[1, 3, 2, 14, 2, 1, 8, 1, 4, 3, 5]
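For readers without NumPy, the same median-based idea can be sketched with the standard library only. The function name reject_outliers_mad is hypothetical, and this is an illustration of the technique rather than a drop-in replacement.

```python
from statistics import median


def reject_outliers_mad(values, m=6.0):
    """Keep values whose scaled distance from the median is below m."""
    center = median(values)
    distances = [abs(v - center) for v in values]
    mad = median(distances)
    # Fall back to 1.0 so that a zero median deviation does not divide by zero.
    scale = mad if mad else 1.0
    return [v for v, d in zip(values, distances) if d / scale < m]


data = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]
print(reject_outliers_mad(data))       # [1, 3, 2, 14, 2, 1, 8, 1, 4, 3, 5]
print(reject_outliers_mad(data, m=3))  # [1, 3, 2, 2, 1, 8, 1, 4, 3, 5]
```

Lowering m from 6 to 3 also drops 14, which shows how the parameter controls strictness.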
You can use the built-in filter() function with a fixed threshold, for example to pick out the values above 5:
lst1 = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]
lst2 = list(filter(lambda x: x > 5,lst1))
print(lst2)
Output:
[14, 108, 8, 97]
Here is a method to filter out those outliers, keeping only the values whose normal-distribution density is above a threshold:
import math
_list = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]
def consts(_list):
    mu = 0
    for i in _list:
        mu += i
    mu = mu/len(_list)
    sigma = 0
    for i in _list:
        sigma += math.pow(i-mu,2)
    sigma = math.sqrt(sigma/len(_list))
    return sigma, mu
def frequence(x, sigma, mu):
    return (1/(sigma*math.sqrt(2*math.pi)))*math.exp(-(1/2)*math.pow(((x-mu)/sigma),2))
sigma, mu = consts(_list)
new_list = []
for i in range(len(_list)):
    if frequence(_list[i], sigma, mu) > 0.01:
        new_list.append(_list[i])  # keep the value itself, not its index
print(new_list)

summing element in a range for all element in an array

I have to get the total sum of a range from an array. However, the range needs to move along the array from one element to the next. For example, if the array is 1,2,3,4,5,6 and every two elements need to be added, then it should add 1+2, then 2+3, then 3+4 and so on.
I have tried but have not found the right approach. I am sure there is a pythonic way of doing this.
Here is what I have tried:
import numpy as np
data = np.arange(0,20,.3)
for i in range (0,len(data)):
    for j in range(i,len(data)):
        get_range = data[j:5]
        get_add = get_range.sum()
        print("sum:",get_add)
Here I have tried to add every 5 elements.
You could use a list comprehension that sums each chunk (window) of the list.
l = [1,2,3,4,5,6]
n = 2
output = [sum(l[i:i + n]) for i in range(0, len(l) - n + 1, 1)]
Output
[3, 5, 7, 9, 11]
There is a NumPy way to do this. It is more memory- and CPU-efficient if your input data is big enough.
import numpy as np
# input array: [1, 2, 3, 4, 5, 6]
data = np.arange(1, 7)
# cumulative sum: [1, 3, 6, 10, 15, 21]
data_cumsum = np.cumsum(data)
# append zero to start: [0, 1, 3, 6, 10, 15, 21]
data_cumsum = np.hstack([0, data_cumsum])
# calculate moving sum
window = 2
moving_sum = data_cumsum[window:] - data_cumsum[:-window]
print(moving_sum)
Output:
[ 3 5 7 9 11]
A minor change will solve the problem:
import numpy as np
data = np.arange(0,10)
for j in range(0,len(data)-1):
    get_range = data[j:j+2] # changed from data[j:5] to data[j:j+2]
    get_add = get_range.sum()
    print("sum:",get_add)
OUTPUT
sum: 1
sum: 3
sum: 5
sum: 7
sum: 9
sum: 11
sum: 13
sum: 15
sum: 17
You can easily condense above steps to form a list comprehension giving the same results with same complexity
[sum(data[j:j+2]) for j in range(0,len(data)-1)]
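Another fancy approach could be to use the sliding_window function from the third-party toolz library (in Python 3, map returns an iterator, so wrap it in list to see the result):
from toolz.itertoolz import sliding_window
list(map(sum, sliding_window(2, range(0, 10))))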
Output
[1, 3, 5, 7, 9, 11, 13, 15, 17]
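For an arbitrary window size, such as the 5 elements mentioned in the question, the running-total idea can also be sketched with the standard library only. The function name moving_sums is hypothetical; with float data such as np.arange(0, 20, .3), small rounding differences compared with summing each window directly are to be expected.

```python
from itertools import accumulate


def moving_sums(values, window):
    """Sum of every run of `window` consecutive elements."""
    # Running totals with a leading 0: [0, v0, v0 + v1, ...]
    totals = [0, *accumulate(values)]
    # Each window sum is the difference of two running totals.
    return [totals[i + window] - totals[i]
            for i in range(len(values) - window + 1)]


print(moving_sums([1, 2, 3, 4, 5, 6], 2))  # [3, 5, 7, 9, 11]
print(moving_sums(list(range(10)), 5))     # [10, 15, 20, 25, 30, 35]
```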
