I am looking to implement a rolling window on a list, but instead of a fixed window length I would like to supply a list of window lengths, one per element.
Something like this:
l1 = [5, 3, 8, 2, 10, 12, 13, 15, 22, 28]
l2 = [1, 2, 2, 2, 3, 4, 2, 3, 5, 3]
get_custom_roling( l1, l2, np.average)
and the result would be:
[5, 4, 5.5, 5, 6.67, ....]
6.67 is calculated as the average of the 3 elements 10, 2, 8.
I implemented a slow solution, and any ideas to make it quicker are welcome :):
import numpy as np
import pandas as pd

def get_the_list(end_point, number_points):
    """
    Indices of the window ending at end_point, newest first.
    example: get_the_list(6, 3) ==> [6, 5, 4]
    example: get_the_list(9, 5) ==> [9, 8, 7, 6, 5]
    """
    if np.isnan(number_points):
        return []
    number_points = int(number_points)
    return list(range(end_point, end_point - number_points, -1))

def get_idx(s):
    ss = list(enumerate(s))
    sss = (get_the_list(*elem) for elem in ss)
    return sss

def get_custom_roling(s, ss, funct):
    # s is expected to be a pandas Series (it is indexed with lists and uses s.index)
    output_get_idx = get_idx(ss)
    agg_stuff = [s[elem] for elem in output_get_idx]
    res_agg_stuff = [funct(elem) for elem in agg_stuff]
    res_agg_stuff = pd.Series(data=res_agg_stuff, index=s.index)
    return res_agg_stuff
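Pandas supports custom rolling windows through BaseIndexer, which lets you vary the window size from row to row.
Simple explanation: the start and end arrays hold the index positions used to slice your data, so row i is aggregated over the slice start[i]:end[i].
#start = [0 0 1 2 2 2 5 5 4 7]
#end = [1 2 3 4 5 6 7 8 9 10]
The arguments passed to get_window_bounds are supplied by pandas, and keyword arguments given to the indexer's constructor (custom_name_whatever below) are available as attributes on self.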
import pandas as pd
import numpy as np
from pandas.api.indexers import BaseIndexer
from typing import Optional, Tuple

class CustomIndexer(BaseIndexer):
    def get_window_bounds(self,
                          num_values: int = 0,
                          min_periods: Optional[int] = None,
                          center: Optional[bool] = None,
                          closed: Optional[str] = None,
                          step: Optional[int] = None  # passed by pandas >= 1.5; unused here
                          ) -> Tuple[np.ndarray, np.ndarray]:
        # Each window ends just after its own row (exclusive end index)
        end = np.arange(1, num_values + 1, dtype=np.int64)
        # ...and starts as many rows back as the per-row window size says
        start = end - np.array(self.custom_name_whatever, dtype=np.int64)
        return start, end

df = pd.DataFrame({"l1": [5, 3, 8, 2, 10, 12, 13, 15, 22, 28],
                   "l2": [1, 2, 2, 2, 3, 4, 2, 3, 5, 3]})
indexer = CustomIndexer(custom_name_whatever=df.l2)
df["variable_mean"] = df.l1.rolling(indexer).mean()
print(df)
Outputs:
   l1  l2  variable_mean
0   5   1       5.000000
1   3   2       4.000000
2   8   2       5.500000
3   2   2       5.000000
4  10   3       6.666667
5  12   4       8.000000
6  13   2      12.500000
7  15   3      13.333333
8  22   5      14.400000
9  28   3      21.666667
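For the specific case of a mean, the same start/end bounds can also be applied without pandas by using a cumulative sum. This is only a minimal sketch and not part of the answer above: the function name variable_window_mean is hypothetical, and it assumes every window length is at least 1.

```python
import numpy as np

def variable_window_mean(values, window_sizes):
    """Mean over a trailing window whose length varies per element (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    window_sizes = np.asarray(window_sizes, dtype=np.int64)
    # Exclusive end of each window: element i uses values[start[i]:end[i]].
    end = np.arange(1, len(values) + 1, dtype=np.int64)
    # Clip so a window longer than the available history starts at index 0.
    start = np.maximum(end - window_sizes, 0)
    # Prefix sums with a leading 0, so sum(values[a:b]) == csum[b] - csum[a].
    csum = np.concatenate(([0.0], np.cumsum(values)))
    return (csum[end] - csum[start]) / (end - start)

l1 = [5, 3, 8, 2, 10, 12, 13, 15, 22, 28]
l2 = [1, 2, 2, 2, 3, 4, 2, 3, 5, 3]
means = variable_window_mean(l1, l2)
print(np.round(means, 2))
```

This trick only covers sums and means; for an arbitrary aggregation function the indexer approach (or an explicit loop) is still needed.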
I have a vector of size (1, 16), for example x = [1, 2, 3, 4, ..., 16], and another vector y = [1, 2, 3, 4] of size (1, 4).
I want to set every 4th value of x to the values of y, that is, something like x(1:4:16) = y. How can I do that in Python?
The expected output is x = [1 2 3 4 2 6 7 8 3 10 11 12 4 14 15 16].
Try using slice assignment:
x[::len(y)] = y
And now:
print(x)
Will give:
[1, 2, 3, 4, 2, 6, 7, 8, 3, 10, 11, 12, 4, 14, 15, 16]
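To make the index mapping explicit: the 1-based, inclusive x(1:4:16) corresponds to the 0-based slice x[0:16:4] in Python. A short sketch follows; the step is written as 4 here rather than len(y), since the two only happen to coincide in this example.

```python
x = list(range(1, 17))   # [1, 2, ..., 16]
y = [1, 2, 3, 4]

# Indices 0, 4, 8 and 12 are overwritten. For a plain list, the slice and y
# must have the same length, otherwise a ValueError is raised.
x[0:16:4] = y
print(x)
```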
I have an array of values and want to map each value to one from another array. The mapped value is the largest one that is lower than or equal to it (I assume it always exists).
For example from the values [6, 15, 4, 12, 10, 5] and the lookup table [4, 6, 7, 8, 10, 12] I would print:
6 is between 6 and 7
15 is between 12 and None
4 is between 4 and 6
12 is between 12 and None
10 is between 10 and 12
5 is between 4 and 6
I do this like this:
import numpy as np

def last_smallest(values, limits):
    count = values.shape[0]
    value = np.zeros(count, dtype='int')
    for i in range(count):
        found = np.where(limits <= values[i])
        value[i] = found[-1][-1]
    return value

lookup_table = np.array([4, 6, 7, 8, 10, 12])
samples = np.array([6, 15, 4, 12, 10, 5])
result = last_smallest(samples, lookup_table)
for i, value in enumerate(samples):
    index = result[i]
    high = lookup_table[index+1] if index < lookup_table.shape[0] - 1 else None
    print(f'{value} is between {lookup_table[index]} and {high}')
This works, but the last_smallest function is really not elegant. I've tried to vectorize it, but I couldn't.
Is it possible to replace result = last_smallest(samples, lookup_table) with pure numpy array operations?
np.digitize can be used here:
lookup_table = np.array([4, 6, 7, 8, 10, 12])
samples = np.array([6, 15, 4, 12, 10, 5])
res = np.digitize(samples, lookup_table)
lookup_table = np.append(lookup_table, None) # you might want to change this line
for sample, idx in zip(samples, res):
    print(f'{sample} is between {lookup_table[idx-1]} and {lookup_table[idx]}')
Output:
6 is between 6 and 7
15 is between 12 and None
4 is between 4 and 6
12 is between 12 and None
10 is between 10 and 12
5 is between 4 and 6
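If the goal is to reproduce the indices returned by last_smallest directly, np.searchsorted is another option. This is a small sketch that assumes lookup_table is sorted in ascending order; a sample smaller than every limit would yield -1.

```python
import numpy as np

lookup_table = np.array([4, 6, 7, 8, 10, 12])   # must be sorted ascending
samples = np.array([6, 15, 4, 12, 10, 5])

# side='right' returns the insertion point after any equal entries, so
# subtracting 1 gives the index of the largest limit <= each sample.
result = np.searchsorted(lookup_table, samples, side='right') - 1
print(result)
```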
I am trying to print out the list of nearest neighbours. I should be getting a list of length 12; however, the list I am getting is of shorter length. When I print out the values for the variable 'a', it is getting stuck at the last element of the list 'path'. I've been trying to debug it, but couldn't find why the value of 'a' is not following the same pattern and why I can't get the other elements.
G is a graph with 11 nodes in total, and what I should be getting is G.NNA(3) = [3, 4, 0, 10, 2, 8, 1, 7, 5, 11, 6, 9]. I am getting the list up until the eighth index [3, 4, 0, 10, 2, 8, 1, 7, 5].
I put print statements inside the two 'if' statements, and the value 'a' clearly gets stuck. It would be amazing if I could fix it. Thanks in advance!
def NNA(self, start):
    path = [start]  # Initialize the path that starts with the given point
    usedNodes = [start]  # List for used nodes
    currentNode = start
    a = 1
    while len(path) < self.n and a < self.n:
        distances = self.dists[currentNode]  # self.dists[3] = [4, 3, 3, 0, 1, 3, 3, 2, 3, 3, 4, 2] and the indices correspond to the nodes
        nextNode = distances.index(sorted(distances)[a])  # Gets the next node
        if nextNode not in usedNodes:
            a = 1
            path.append(nextNode)
            usedNodes.append(nextNode)
            currentNode = nextNode
            print(currentNode, a)
        elif nextNode in usedNodes:
            a = a + 1  # Check the next smallest value
            print(currentNode, a)
    self.perm = path
    print(path)
# Here is the output
4 1
4 2
0 1
10 1
10 2
2 1
8 1
8 2
1 1
7 1
5 1 # gets stuck at 5
5 2
5 3
5 4
5 5
5 6
5 7
5 8
5 9
5 10
5 11
5 12
[3, 4, 0, 10, 2, 8, 1, 7, 5] # expecting [3, 4, 0, 10, 2, 8, 1, 7, 5, 11, 6, 9]
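One likely cause, judging from the code above: distances.index(value) always returns the first node with that distance, so when several nodes are equally far away the later ones can never be selected, and a keeps growing until the loop ends. A minimal sketch of one way around this is to pick the closest node among the unvisited ones directly. The standalone function and the small distance matrix below are hypothetical, written only to illustrate the idea.

```python
def nearest_neighbour_path(dists, start):
    """Greedy nearest-neighbour walk over a distance matrix (list of lists)."""
    n = len(dists)
    path = [start]
    visited = {start}
    current = start
    while len(path) < n:
        # Consider only unvisited nodes; ties go to the lowest node index.
        candidates = [node for node in range(n) if node not in visited]
        nxt = min(candidates, key=lambda node: dists[current][node])
        path.append(nxt)
        visited.add(nxt)
        current = nxt
    return path

# Made-up symmetric matrix with ties: nodes 1, 2 and 3 are all at distance 2 from node 0.
example_dists = [
    [0, 2, 2, 2],
    [2, 0, 1, 3],
    [2, 1, 0, 1],
    [2, 3, 1, 0],
]
path = nearest_neighbour_path(example_dists, 0)
print(path)
```

Because the search is restricted to unvisited nodes, no counter like a is needed and the loop always adds exactly one node per iteration.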
I'm trying to figure out a way to loop over a pandas DataFrame to generate a new key.
Here's an example of the dataframe:
df = pd.DataFrame({"pdb" : ["a", "b"], "beg": [1, 2], "end" : [10, 11]})
for index, row in df.iterrows():
    df['range'] = [list(x) for x in zip(df['beg'], df['end'])]
And now I want to create a new key that takes the first and last number of df["range"] and fills the list with the numbers in between (i.e., the first one will be [1 2 3 4 5 6 7 8 9 10]).
So far I think that I should be using something like this, but I could be completely wrong:
df["total"] = df["range"].map(lambda x: #and here I should append all the "x" that are between df["range"][0] and df["range"][1]
Here's an example of the result that I'm looking for:
  pdb  beg  end  range  total
0   a    1   10  1 10   1 2 3 4 5 6 7 8 9 10
1   b    2   11  2 11   2 3 4 5 6 7 8 9 10 11
I could use some help with the lambda function, I get really confused with the syntax.
Try with apply
df['new'] = df.apply(lambda x : list(range(x['beg'],x['end']+1)),axis=1)
Out[423]:
0 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
1 [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
dtype: object
This should work:
df['total'] = df['range'].apply(lambda x: [n for n in range(x[0], x[1]+1)])
As per your output, you need
In [18]: df['new'] = df.apply(lambda x : " ".join(list(map(str,range(x['beg'],x['end']+1)))),axis=1)
In [19]: df
Out[19]:
  pdb  beg  end    range                    new
0   a    1   10  [1, 10]   1 2 3 4 5 6 7 8 9 10
1   b    2   11  [2, 11]  2 3 4 5 6 7 8 9 10 11
If you want to use iterrows, you can do it in the loop itself as follows (note that range excludes its end point, hence the + 1):
Code:
import pandas as pd
df = pd.DataFrame({"pdb" : ["a", "b"], "beg": [1, 2], "end" : [10, 11]})
for index, row in df.iterrows():
    df['range'] = [list(x) for x in zip(df['beg'], df['end'])]
    df['total'] = [list(range(beg, end + 1)) for beg, end in zip(df['beg'], df['end'])]
Output:
  pdb  beg  end    range                             total
0   a    1   10  [1, 10]   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
1   b    2   11  [2, 11]  [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
My assignment states that I get a list of birthdays and that I have to arrange them chronologically. I must write my own sort, so I can't use Python's predefined functions, such as this:
import datetime
d = ['09-2012', '04-2007', '11-2012', '05-2013', '12-2006', '05-2006', '08-2007']
sorted(d, key=lambda x: datetime.datetime.strptime(x, '%m-%Y'))
Here is what I'm thinking of doing.
Step 1: Read all dates and put them into a list as dd/mm/yyyy
date_list = [[1,2,1991],[2,1,1991],[3,4,1992],[5,6,1993],[4,5,1992],[8,5,1993]]
For better visualization, I will rearrange them like so:
1 / 2 / 1991
2 / 1 / 1991
3 / 4 / 1992
5 / 6 / 1993
4 / 5 / 1992
8 / 5 / 1993
Step 2: Sort the entire list by year (col 2)
1 / 2 / 1991
2 / 1 / 1991
3 / 4 / 1992
4 / 5 / 1992
5 / 6 / 1993
8 / 5 / 1993
Step 3: For each unique year, sort that sublist by the column near it (col 1)
2 / 1 / 1991
1 / 2 / 1991
3 / 4 / 1992
4 / 5 / 1992
8 / 5 / 1993
5 / 6 / 1993
Step 4: Do the same for the sublist of each unique month of that year (col 0)
2 / 1 / 1991
1 / 2 / 1991
3 / 4 / 1992
4 / 5 / 1992
8 / 5 / 1993
5 / 6 / 1993
And that should be it. I've used the following functions to try to do it:
# Sorts the sublist date_list[position..position+length] by the col
def insertion(date_list, position, length, col):
    for i in range(position + 1, position + length - 1):
        aux = date_list[i]
        j = i - 1
        while j >= 0 and aux[col] < date_list[j][col]:
            date_list[j+1] = date_list[j]
            j -= 1
        date_list[j+1] = aux
    return date_list

def sortDateList(date_list, position, length, col):
    # Nothing to do here
    if col < 0:
        return date_list
    # If it's the first sort, sort everything
    if col == 2:
        date_list = insertion(date_list, 0, len(date_list), 2)
    for i in range(position, position + length - 1):
        # Divides the list into sublists based on the column
        if date_list[i][col] == date_list[i][col]:
            length += 1
        else:
            # Sorts the sublist, then sorts it after the previous column in it
            date_list = insertion(date_list, position, length, col)
            date_list = sortDateList(date_list, position, length, col - 1)
            position += length
            length = 1
    date_list = insertion(date_list, position, length, col)
    return date_list
I'm not sure exactly what the problem is here, I'm pretty sure it's something really basic that slipped my mind, and I can't keep track of recursion in my brain that well. It gives me some index out of bound errors and such.
For debug, I've printed out info as such:
col position position + length
date_list[position:position+length] before insertion()
date_list[position:position+length] after insertion()
Here is what the console gives me:
2 0 6
2 0 7
[[1, 2, 1991], [2, 1, 1991], [3, 4, 1992], [4, 5, 1992], [5, 6, 1993], [8, 5, 1993]]
[[1, 2, 1991], [2, 1, 1991], [3, 4, 1992], [4, 5, 1992], [5, 6, 1993], [8, 5, 1993]]
1 0 7
[[1, 2, 1991], [2, 1, 1991], [3, 4, 1992], [4, 5, 1992], [5, 6, 1993], [8, 5, 1993]]
[[2, 1, 1991], [1, 2, 1991], [3, 4, 1992], [4, 5, 1992], [8, 5, 1993], [5, 6, 1993]]
0 0 7
[[2, 1, 1991], [1, 2, 1991], [3, 4, 1992], [4, 5, 1992], [8, 5, 1993], [5, 6, 1993]]
[[1, 2, 1991], [2, 1, 1991], [3, 4, 1992], [4, 5, 1992], [5, 6, 1993], [8, 5, 1993]]
0 7 8
[]
[]
0 8 9
[]
[]
0 9 10
[]
[]
0 10 11
[]
[]
0 11 12
Any help is greatly appreciated!
Just write a simple sort algorithm and a compare function, such as this:
date_list = [[1,2,1991],[2,1,1991],[3,4,1992],[5,6,1993],[4,5,1992],[8,5,1993]]

# first compare years, if equal compare months, if equal compare days
def compare(date1, date2):
    if date1[2] != date2[2]:
        return date1[2] < date2[2]
    if date1[1] != date2[1]:
        return date1[1] < date2[1]
    return date1[0] < date2[0]

# swap any pair that is out of order
for i in range(len(date_list)):
    for j in range(i+1, len(date_list)):
        if not compare(date_list[i], date_list[j]):
            date_list[i], date_list[j] = date_list[j], date_list[i]
print(date_list)
The time complexity is O(n^2), but you can improve it by using a more efficient sorting algorithm.
If you convert each date to a YYYYMMDD string, you can sort it easily. Try sorting the concatenated string instead of splitting the date into 3 parts.
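The YYYYMMDD idea can be sketched with an integer key instead of a string, combined with a hand-written insertion sort so that no built-in sorting function is used. The helper names date_key and insertion_sort_dates are hypothetical, and dates are assumed to be [day, month, year] lists as in the question.

```python
def date_key(date):
    """Turn [day, month, year] into a single integer of the form YYYYMMDD."""
    day, month, year = date
    return year * 10000 + month * 100 + day

def insertion_sort_dates(dates):
    """Return a new list of dates sorted chronologically (insertion sort)."""
    result = list(dates)  # copy so the input list is left untouched
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift later dates one slot to the right until current fits.
        while j >= 0 and date_key(result[j]) > date_key(current):
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result

date_list = [[1, 2, 1991], [2, 1, 1991], [3, 4, 1992], [5, 6, 1993], [4, 5, 1992], [8, 5, 1993]]
sorted_dates = insertion_sort_dates(date_list)
print(sorted_dates)
```

Comparing a single key per date removes the need to sort column by column, which is where the recursive version in the question becomes hard to follow.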