Python - Recursive Date Sorting Algorithm

My assignment states that I get a list of birthdays and that I have to arrange them chronologically. I must write my own sort, so I can't use Python's predefined functions, such as this:
import datetime
d = ['09-2012', '04-2007', '11-2012', '05-2013', '12-2006', '05-2006', '08-2007']
sorted(d, key=lambda x: datetime.datetime.strptime(x, '%m-%Y'))
Here is what I'm thinking of doing.
Step 1: Read all dates and put them into a list in dd/mm/yyyy order
date_list = [[1,2,1991],[2,1,1991],[3,4,1992],[5,6,1993],[4,5,1992],[8,5,1993]]
For better visualization, I will rearrange them like so:
1 / 2 / 1991
2 / 1 / 1991
3 / 4 / 1992
5 / 6 / 1993
4 / 5 / 1992
8 / 5 / 1993
Step 2: Sort the entire list by year (col 2)
1 / 2 / 1991
2 / 1 / 1991
3 / 4 / 1992
4 / 5 / 1992
5 / 6 / 1993
8 / 5 / 1993
Step 3: For each unique year, sort that sublist by the column near it (col 1)
2 / 1 / 1991
1 / 2 / 1991
3 / 4 / 1992
4 / 5 / 1992
8 / 5 / 1993
5 / 6 / 1993
Step 4: Do the same for the sublist of each unique month of that year (col 0)
2 / 1 / 1991
1 / 2 / 1991
3 / 4 / 1992
4 / 5 / 1992
8 / 5 / 1993
5 / 6 / 1993
And that should be it. I've used the following functions to try to do it:
# Sorts the sublist date_list[position..position+length] by the col
def insertion(date_list, position, length, col):
    for i in range(position + 1, position + length - 1):
        aux = date_list[i]
        j = i - 1
        while j >= 0 and aux[col] < date_list[j][col]:
            date_list[j+1] = date_list[j]
            j -= 1
        date_list[j+1] = aux
    return date_list

def sortDateList(date_list, position, length, col):
    # Nothing to do here
    if col < 0:
        return date_list
    # If it's the first sort, sort everything
    if col == 2:
        date_list = insertion(date_list, 0, len(date_list), 2)
    for i in range(position, position + length - 1):
        # Divides the list into sublists based on the column
        if date_list[i][col] == date_list[i][col]:
            length += 1
        else:
            # Sorts the sublist, then sorts it after the previous column in it
            date_list = insertion(date_list, position, length, col)
            date_list = sortDateList(date_list, position, length, col - 1)
            position += length
            length = 1
    date_list = insertion(date_list, position, length, col)
    return date_list
I'm not sure exactly what the problem is here. I'm pretty sure it's something really basic that slipped my mind, but I can't keep track of the recursion in my head that well. It gives me index out of bounds errors and the like.
For debug, I've printed out info as such:
col position position + length
date_list[position:position+length] before insertion()
date_list[position:position+length] after insertion()
Here is what the console gives me:
2 0 6
2 0 7
[[1, 2, 1991], [2, 1, 1991], [3, 4, 1992], [4, 5, 1992], [5, 6, 1993], [8, 5, 1993]]
[[1, 2, 1991], [2, 1, 1991], [3, 4, 1992], [4, 5, 1992], [5, 6, 1993], [8, 5, 1993]]
1 0 7
[[1, 2, 1991], [2, 1, 1991], [3, 4, 1992], [4, 5, 1992], [5, 6, 1993], [8, 5, 1993]]
[[2, 1, 1991], [1, 2, 1991], [3, 4, 1992], [4, 5, 1992], [8, 5, 1993], [5, 6, 1993]]
0 0 7
[[2, 1, 1991], [1, 2, 1991], [3, 4, 1992], [4, 5, 1992], [8, 5, 1993], [5, 6, 1993]]
[[1, 2, 1991], [2, 1, 1991], [3, 4, 1992], [4, 5, 1992], [5, 6, 1993], [8, 5, 1993]]
0 7 8
[]
[]
0 8 9
[]
[]
0 9 10
[]
[]
0 10 11
[]
[]
0 11 12
Any help is greatly appreciated!

Just write a simple sort algorithm and a compare function, such as this:
date_list = [[1,2,1991],[2,1,1991],[3,4,1992],[5,6,1993],[4,5,1992],[8,5,1993]]

# first compare years, if equal compare months, if equal compare days
def compare(date1, date2):
    if date1[2] != date2[2]:
        return date1[2] < date2[2]
    if date1[1] != date2[1]:
        return date1[1] < date2[1]
    return date1[0] < date2[0]

# swap whenever a later element belongs before the current one
for i in range(len(date_list)):
    for j in range(i+1, len(date_list)):
        if not compare(date_list[i], date_list[j]):
            date_list[i], date_list[j] = date_list[j], date_list[i]
print(date_list)
The time complexity is O(n^2) but you can improve it by using a more efficient sort algorithm.
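As one possible "more efficient sort algorithm", here is a minimal merge sort sketch that runs in O(n log n). The helper names date_key and merge_sort are made up for this illustration, and it assumes each date is a [day, month, year] list as in the question. It relies on Python's built-in tuple comparison but not on any built-in sort function.

```python
def date_key(date):
    """Return (year, month, day) so that tuple comparison is chronological."""
    day, month, year = date
    return (year, month, day)

def merge_sort(dates):
    """Return a new chronologically sorted list; the input list is left untouched."""
    if len(dates) <= 1:
        return list(dates)
    mid = len(dates) // 2
    left = merge_sort(dates[:mid])
    right = merge_sort(dates[mid:])
    # Merge the two sorted halves, always taking the earlier date first
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if date_key(left[i]) <= date_key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

date_list = [[1,2,1991],[2,1,1991],[3,4,1992],[5,6,1993],[4,5,1992],[8,5,1993]]
sorted_dates = merge_sort(date_list)
print(sorted_dates)
```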

If you convert each date to a zero-padded YYYYMMDD string, you can sort it easily, because the strings then compare in chronological order. Try sorting the concatenated string instead of splitting the date into 3 parts.
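A rough sketch of the YYYYMMDD idea, assuming the dates are [day, month, year] lists as in the question. The names to_yyyymmdd and insertion_sort_by_key are hypothetical helpers written for this example, and the sort is hand-written to respect the "no predefined functions" constraint.

```python
def to_yyyymmdd(date):
    """Turn [day, month, year] into a zero-padded 'YYYYMMDD' string."""
    day, month, year = date
    return "{:04d}{:02d}{:02d}".format(year, month, day)

def insertion_sort_by_key(items, key):
    """Plain insertion sort that compares key(item) instead of the item itself."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift later dates one slot to the right until current fits
        while j >= 0 and key(result[j]) > key(current):
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result

date_list = [[1,2,1991],[2,1,1991],[3,4,1992],[5,6,1993],[4,5,1992],[8,5,1993]]
by_string = insertion_sort_by_key(date_list, to_yyyymmdd)
print(by_string)
```

The zero padding matters: without it, a key such as '199121' would be ambiguous and strings of different lengths would not compare chronologically.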

Related

Loop stuck while calculating the nearest neighbour

I am trying to print out the list of nearest neighbours. I should be getting a list of length 12; however, the list I am getting is of shorter length. When I print out the values for the variable 'a', it is getting stuck at the last element of the list 'path'. I've been trying to debug it, but couldn't find why the value of 'a' is not following the same pattern and why I can't get the other elements.
G is a graph with 12 nodes in total (0 to 11), and what I should be getting is G.NNA(3) = [3, 4, 0, 10, 2, 8, 1, 7, 5, 11, 6, 9]. I am getting the list only up to the eighth index: [3, 4, 0, 10, 2, 8, 1, 7, 5].
I put print statements inside the two 'if' statements, and the value 'a' clearly gets stuck. It would be amazing if I could fix it. Thanks in advance!
def NNA(self, start):
    path = [start]  # Initialize the path that starts with the given point
    usedNodes = [start]  # List for used nodes
    currentNode = start
    a = 1
    while len(path) < self.n and a < self.n:
        distances = self.dists[currentNode]  # self.dists[3] = [4, 3, 3, 0, 1, 3, 3, 2, 3, 3, 4, 2] and the indices correspond to the nodes
        nextNode = distances.index(sorted(distances)[a])  # Gets the next node
        if nextNode not in usedNodes:
            a = 1
            path.append(nextNode)
            usedNodes.append(nextNode)
            currentNode = nextNode
            print(currentNode, a)
        elif nextNode in usedNodes:
            a = a + 1  # Check the next smallest value
            print(currentNode, a)
    self.perm = path
    print(path)
# Here is the output
4 1
4 2
0 1
10 1
10 2
2 1
8 1
8 2
1 1
7 1
5 1 # gets stuck at 5
5 2
5 3
5 4
5 5
5 6
5 7
5 8
5 9
5 10
5 11
5 12
[3, 4, 0, 10, 2, 8, 1, 7, 5] # expecting [3, 4, 0, 10, 2, 8, 1, 7, 5, 11, 6, 9]
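One likely cause, judging from the code alone: distances.index(value) always returns the first index holding that value, so when several nodes are at the same distance the lookup keeps landing on the same (already used) node while a counts up to self.n. The sketch below sidesteps that by picking the closest node among the unvisited ones directly. The function name nearest_neighbour_path, the small 4-node matrix, and the lowest-index tie-break are assumptions made for illustration; the question's full distance matrix is not shown, so its expected tie-breaking may differ.

```python
def nearest_neighbour_path(dists, start):
    """Greedy nearest-neighbour tour over a square distance matrix."""
    n = len(dists)
    path = [start]
    visited = {start}
    current = start
    while len(path) < n:
        # Only consider nodes that are not in the path yet
        candidates = [node for node in range(n) if node not in visited]
        # Closest candidate wins; ties are broken by the lower node index (an assumption)
        next_node = min(candidates, key=lambda node: (dists[current][node], node))
        path.append(next_node)
        visited.add(next_node)
        current = next_node
    return path

# Hypothetical 4-node matrix with tied distances
dists = [
    [0, 2, 2, 5],
    [2, 0, 1, 2],
    [2, 1, 0, 2],
    [5, 2, 2, 0],
]
tour = nearest_neighbour_path(dists, 0)
print(tour)
```

On this matrix the index-based lookup from the question would stop before reaching node 3, because from node 2 the tied distance 2 always resolves to node 0.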

Is it okay to use lambda in this case?

I'm trying to figure out a way to loop over a pandas DataFrame to generate a new column.
Here's an example of the dataframe:
df = pd.DataFrame({"pdb" : ["a", "b"], "beg": [1, 2], "end" : [10, 11]})
for index, row in df.iterrows():
    df['range'] = [list(x) for x in zip(df['beg'], df['end'])]
And now I want to create a new column that takes the first and last number of df["range"] and fills the list with the numbers in between (i.e., the first one will be [1 2 3 4 5 6 7 8 9 10]).
So far I think that I should be using something like this, but I could be completely wrong:
df["total"] = df["range"].map(lambda x: #and here I should append all the "x" that are between df["range"][0] and df["range"][1]
Here's an example of the result that I'm looking for:
  pdb  beg  end  range  total
0   a    1   10  1 10   1 2 3 4 5 6 7 8 9 10
1   b    2   11  2 11   2 3 4 5 6 7 8 9 10 11
I could use some help with the lambda function, I get really confused with the syntax.
Try with apply
df['new'] = df.apply(lambda x : list(range(x['beg'],x['end']+1)),axis=1)
Out[423]:
0 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
1 [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
dtype: object
This should work:
df['total'] = df['range'].apply(lambda x: [n for n in range(x[0], x[1]+1)])
As per your output, you need
In [18]: df['new'] = df.apply(lambda x : " ".join(list(map(str,range(x['beg'],x['end']+1)))),axis=1)
In [19]: df
Out[19]:
pdb beg end range new
0 a 1 10 [1, 10] 1 2 3 4 5 6 7 8 9 10
1 b 2 11 [2, 11] 2 3 4 5 6 7 8 9 10 11
If you want to use iterrows then you can do it in the loop itself as follows:
Code:
import pandas as pd
df = pd.DataFrame({"pdb" : ["a", "b"], "beg": [1, 2], "end" : [10, 11]})
for index, row in df.iterrows():
    df['range'] = [list(x) for x in zip(df['beg'], df['end'])]
    # end + 1 because range() excludes its stop value
    df['total'] = [list(range(beg, end + 1)) for beg, end in zip(df['beg'], df['end'])]
Output:
  pdb  beg  end    range                             total
0   a    1   10  [1, 10]   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
1   b    2   11  [2, 11]  [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

non fixed rolling window

I am looking to implement a rolling window on a list, but instead of a fixed length of window, I would like to provide a rolling window list:
Something like this:
l1 = [5, 3, 8, 2, 10, 12, 13, 15, 22, 28]
l2 = [1, 2, 2, 2, 3, 4, 2, 3, 5, 3]
get_custom_roling( l1, l2, np.average)
and the result would be:
[5, 4, 5.5, 5, 6.67, ....]
6.67 is calculated as average of 3 elements 10, 2, 8.
I implemented a slow solution, and every idea is welcome to make it quicker :):
import numpy as np
import pandas as pd

def get_the_list(end_point, number_points):
    """
    Return the indices of the window ending at end_point, newest first.
    example: get_the_list(6, 3) ==> [6, 5, 4]
    example: get_the_list(9, 5) ==> [9, 8, 7, 6, 5]
    """
    if np.isnan(number_points):
        return []
    number_points = int(number_points)
    return list(range(end_point, end_point - number_points, -1))

def get_idx(s):
    ss = list(enumerate(s))
    sss = (get_the_list(*elem) for elem in ss)
    return sss

def get_custom_roling(s, ss, funct):
    # s is expected to be a pandas Series, so that s[elem] accepts a list of index labels
    output_get_idx = get_idx(ss)
    agg_stuff = [s[elem] for elem in output_get_idx]
    res_agg_stuff = [funct(elem) for elem in agg_stuff]
    res_agg_stuff = pd.Series(data=res_agg_stuff, index=s.index)
    return res_agg_stuff
Pandas supports custom rolling windows through BaseIndexer, which lets you vary the window size from row to row.
Simple explanation: the start and end arrays hold the index positions used to slice your data, one window per row.
#start = [0 0 1 2 2 2 5 5 4 7]
#end = [1 2 3 4 5 6 7 8 9 10]
Arguments passed to get_window_bounds are given by BaseIndexer.
import pandas as pd
import numpy as np
from pandas.api.indexers import BaseIndexer
from typing import Optional, Tuple
class CustomIndexer(BaseIndexer):
    def get_window_bounds(self,
                          num_values: int = 0,
                          min_periods: Optional[int] = None,
                          center: Optional[bool] = None,
                          closed: Optional[str] = None,
                          step: Optional[int] = None,  # also passed by newer pandas versions
                          ) -> Tuple[np.ndarray, np.ndarray]:
        # Each window ends just after its own row and reaches back l2[i] rows
        end = np.arange(1, num_values+1, dtype=np.int64)
        start = end - np.array(self.custom_name_whatever, dtype=np.int64)
        return start, end
df = pd.DataFrame({"l1": [5, 3, 8, 2, 10, 12, 13, 15, 22, 28],
"l2": [1, 2, 2, 2, 3, 4, 2, 3, 5, 3]})
indexer = CustomIndexer(custom_name_whatever=df.l2)
df["variable_mean"] = df.l1.rolling(indexer).mean()
print(df)
Outputs:
l1 l2 variable_mean
0 5 1 5.000000
1 3 2 4.000000
2 8 2 5.500000
3 2 2 5.000000
4 10 3 6.666667
5 12 4 8.000000
6 13 2 12.500000
7 15 3 13.333333
8 22 5 14.400000
9 28 3 21.666667
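If the aggregation is a mean (or a sum), a dependency-free alternative is to precompute prefix sums so that each window costs O(1). This is only a sketch: the name variable_window_mean is made up here, it assumes integer window sizes, and clamping windows at the start of the list is a choice of this example rather than something stated in the question.

```python
def variable_window_mean(values, window_sizes):
    """Mean of the last window_sizes[i] items ending at position i, for every i."""
    # prefix[k] holds the sum of values[:k]
    prefix = [0]
    for v in values:
        prefix.append(prefix[-1] + v)

    means = []
    for end, size in enumerate(window_sizes, start=1):
        start = max(0, end - size)  # clamp windows that would reach before the list
        count = end - start
        if count > 0:
            means.append((prefix[end] - prefix[start]) / count)
        else:
            means.append(float("nan"))  # empty window
    return means

l1 = [5, 3, 8, 2, 10, 12, 13, 15, 22, 28]
l2 = [1, 2, 2, 2, 3, 4, 2, 3, 5, 3]
result = variable_window_mean(l1, l2)
print(result)
```

Unlike the generic funct argument in the question, this trick only works for aggregations that can be derived from running totals.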

solving 8 puzzle problem with BFS DFS (Using Python. Needs some suggestion)

My final state is
0 1 2
3 4 5
6 7 8
my graph would look like this
graph = {0: [1, 3],
         1: [0, 4, 2],
         2: [1, 5],
         3: [0, 4, 6],
         4: [1, 3, 5, 7],
         5: [2, 4, 8],
         6: [3, 7],
         7: [4, 6, 8],
         8: [5, 7]
         }
1 - I was wondering whether I should use some other approach, such as lists or if/else statements, rather than the graph above.
2 - Is anything wrong with the graph?
The given problem:
Example: [1,5,3,2,0,4,7,8,6], laid out like this:
1 5 3
2 0 4
7 8 6
I am supposed to reach the final state from the given state.
Thank You
So, there are 4 corner cases:
Top row
Bottom row
Leftmost column
Rightmost column
(And combinations)
We can handle them easily like this:
import pprint

data = [1, 5, 3,
        2, 0, 4,
        7, 8, 6]
width = 3
height = 3
graph = {number: list() for number in data}
for idx, number in enumerate(data):
    current_height = idx // width
    current_width = idx % width
    if current_width != width - 1:  # if next element in same row
        graph[number].append(data[idx + 1])
    if current_width != 0:  # if prev element in same row
        graph[number].append(data[idx - 1])
    if current_height != 0:  # if there is top element
        graph[number].append(data[idx - width])
    if current_height != height - 1:  # if there is bottom element
        graph[number].append(data[idx + width])
pprint.pprint(graph)
This code will construct the graph, but is that all you need for the puzzle?
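A neighbour graph of board positions is only part of the picture: the search itself has to run over whole board states. Below is a minimal BFS sketch, assuming the goal 0 1 2 / 3 4 5 / 6 7 8 from the question and treating 0 as the blank. The names neighbours and bfs are illustrative, not taken from either post.

```python
from collections import deque

GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)
WIDTH = 3

def neighbours(state):
    """Yield every board reachable by sliding one tile into the blank (0)."""
    blank = state.index(0)
    row, col = divmod(blank, WIDTH)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        new_row, new_col = row + d_row, col + d_col
        if 0 <= new_row < WIDTH and 0 <= new_col < WIDTH:
            target = new_row * WIDTH + new_col
            board = list(state)
            board[blank], board[target] = board[target], board[blank]
            yield tuple(board)

def bfs(start, goal=GOAL):
    """Return the shortest list of states from start to goal, or None if unreachable."""
    start = tuple(start)
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            # Walk the parent links back to the start
            path = []
            while state is not None:
                path.append(state)
                state = parents[state]
            return path[::-1]
        for nxt in neighbours(state):
            if nxt not in parents:
                parents[nxt] = state
                queue.append(nxt)
    return None

path = bfs([1, 5, 3, 2, 0, 4, 7, 8, 6])
print(len(path) - 1, "moves")
```

A DFS variant would swap the queue for a stack, but it needs a depth limit to stay practical and does not guarantee the shortest solution.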

Pythonic way to calculate streaks in pandas dataframe

Given df
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 5, 2, 8, 2], [2, 4, 4, 20, 2], [3, 3, 1, 20, 2], [4, 2, 2, 1, 3], [5, 1, 4, -5, -4], [1, 5, 2, 2, -20],
                   [2, 4, 4, 3, -8], [3, 3, 1, -1, -1], [4, 2, 2, 0, 12], [5, 1, 4, 20, -2]],
                  columns=['A', 'B', 'C', 'D', 'E'], index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Based on this answer, I created a function to calculate streaks (up, down).
def streaks(df, column):
    # Create sign column
    df['sign'] = 0
    df.loc[df[column] > 0, 'sign'] = 1
    df.loc[df[column] < 0, 'sign'] = 0
    # Downstreak
    df['d_streak2'] = (df['sign'] == 0).cumsum()
    df['cumsum'] = np.nan
    df.loc[df['sign'] == 1, 'cumsum'] = df['d_streak2']
    df['cumsum'] = df['cumsum'].ffill()
    df['cumsum'] = df['cumsum'].fillna(0)
    df['d_streak'] = df['d_streak2'] - df['cumsum']
    df.drop(['d_streak2', 'cumsum'], axis=1, inplace=True)
    # Upstreak
    df['u_streak2'] = (df['sign'] == 1).cumsum()
    df['cumsum'] = np.nan
    df.loc[df['sign'] == 0, 'cumsum'] = df['u_streak2']
    df['cumsum'] = df['cumsum'].ffill()
    df['cumsum'] = df['cumsum'].fillna(0)
    df['u_streak'] = df['u_streak2'] - df['cumsum']
    df.drop(['u_streak2', 'cumsum'], axis=1, inplace=True)
    del df['sign']
    return df
The function works well, but it is very long. I'm sure there's a much better way to write this. I tried the other answer there, but it didn't work well.
This is the desired output
streaks(df, 'E')
A B C D E d_streak u_streak
1 1 5 2 8 2 0.0 1.0
2 2 4 4 20 2 0.0 2.0
3 3 3 1 20 2 0.0 3.0
4 4 2 2 1 3 0.0 4.0
5 5 1 4 -5 -4 1.0 0.0
6 1 5 2 2 -20 2.0 0.0
7 2 4 4 3 -8 3.0 0.0
8 3 3 1 -1 -1 4.0 0.0
9 4 2 2 0 12 0.0 1.0
10 5 1 4 20 -2 1.0 0.0
You could simplify the function as shown:
def streaks(df, col):
    sign = np.sign(df[col])
    s = sign.groupby((sign!=sign.shift()).cumsum()).cumsum()
    return df.assign(u_streak=s.where(s>0, 0.0), d_streak=s.where(s<0, 0.0).abs())
Using it:
streaks(df, 'E')
First, compute the sign of each value in the column under consideration using np.sign. This assigns +1 to positive numbers, -1 to negative numbers and 0 to zeros.
Next, identify runs of adjacent values with the same sign by comparing each value with the previous one using sign!=sign.shift(), and take the cumulative sum of that comparison, which gives each run its own label for grouping.
Perform the groupby with these labels as the key and again take the cumulative sum, this time of the signs within each group.
Finally, assign the positive cumulative sums to u_streak and the absolute values of the negative ones to d_streak.
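To make the run-counting logic easier to follow, here is the same idea as a plain-Python sketch without pandas. The function name count_streaks is hypothetical, and, like the np.sign version above, it assumes that a zero breaks both streaks.

```python
def count_streaks(values):
    """Return (up, down): the running length of positive and negative runs."""
    up, down = [], []
    u = d = 0
    for v in values:
        if v > 0:
            u, d = u + 1, 0   # extend the up streak, reset the down streak
        elif v < 0:
            u, d = 0, d + 1   # extend the down streak, reset the up streak
        else:
            u = d = 0         # a zero ends both streaks (assumption)
        up.append(u)
        down.append(d)
    return up, down

# Column E from the question
e_values = [2, 2, 2, 3, -4, -20, -8, -1, 12, -2]
u_streak, d_streak = count_streaks(e_values)
print(u_streak)
print(d_streak)
```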
