Euler Project #18 - Pythonic approach - python

I'm trying to solve Problem 18 from Project Euler, but I'm stuck. Working it out on paper gives me the same result as my code, but I know the correct answer differs from what I'm getting by 10.
By starting at the top of the triangle below and moving to adjacent numbers on the row below, the maximum total from top to bottom is 23.
3
7 4
2 4 6
8 5 9 3
That is, 3 + 7 + 4 + 9 = 23.
Find the maximum total from top to bottom of the triangle below:
75
95 64
17 47 82
18 35 87 10
20 04 82 47 65
19 01 23 75 03 34
88 02 77 73 07 63 67
99 65 04 28 06 16 70 92
41 41 26 56 83 40 80 70 33
41 48 72 33 47 32 37 16 94 29
53 71 44 65 25 43 91 52 97 51 14
70 11 33 28 77 73 17 78 39 68 17 57
91 71 52 38 17 14 91 43 58 50 27 29 48
63 66 04 68 89 53 67 30 73 16 69 87 40 31
04 62 98 27 23 09 70 98 73 93 38 53 60 04 23
NOTE: As there are only 16384 routes, it is possible to solve this problem by trying every route. However, Problem 67, is the same challenge with a triangle containing one-hundred rows; it cannot be solved by brute force, and requires a clever method! ;o)
Here is my code
filename = "triangle.txt"
f = open(filename,"r+")
total = 0
#will store the position of the maximum value in the line
index = 0
#get the first pyramid value
total = [int(x) for x in f.readline().split()][0]
#since it's only one value, the position will start with 0
current_index = 0
# loop through the lines
for line in f:
    # transform the line into a list of integers
    cleaned_list = [int(x) for x in line.split()]
    # get the maximum value between index and index + 1 (adjacent positions)
    maximum_value_now = max(cleaned_list[current_index],cleaned_list[current_index + 1])
    #print maximum_value_now
    # stores the index to the next iteration
    future_indexes = [ind for (ind,value) in enumerate(cleaned_list) if value == maximum_value_now]
    # we have more than one value in our list with this maximum value
    # must return only that which is greater than our previous index
    if (len(future_indexes) > 1):
        current_index = [i for i in future_indexes if (i >= current_index and i <= current_index + 1)][0]
    else:
        #only one occurrence of the maximum value
        current_index = future_indexes[0]
    # add the value found to the total sum
    total = total + maximum_value_now
print total
Thanks!

First of all, read the entire triangle into a 2d structure. It is handy to note that we can do an affine transformation to the triangle and therefore use an easier coordinate system:
   3                  \   3
  7 4            ====\    7 4
 2 4 6           ====/    2 4 6
8 5 9 3               /   8 5 9 3
It is easy to read this into a jagged array in Python:
with open(filename, 'r') as file:
    rows = [[int(i) for i in line.split()] for line in file]
Now, with x as the horizontal coordinate and y as the vertical coordinate, increasing to the right and downward respectively, there are 2 valid moves from (x, y): (x + 1, y + 1) and (x, y + 1). It is as simple as that.
The trick now is to calculate the maximum sum for each cell in each row. This is called dynamic programming. The answer is then the largest sum in the last row.
Actually, there is no need to remember anything beyond the sums on the immediately preceding row and the sums on the current row. To calculate the maximal sums current_sums for the current row, notice that to arrive at position x in the latest row, the previous position must have been x - 1 or x. We choose the larger of these two sums, then add the current cell_value. We can treat any number outside the triangle as 0 for simplicity, as this does not affect the maximal solution here. Therefore we get:
with open('triangle.txt', 'r') as file:
    triangle = [[int(i) for i in line.split()] for line in file]
previous_sums = []
for row in triangle:
    current_sums = []
    for position, cell_value in enumerate(row):
        sum_from_right = 0 if position >= len(previous_sums) else previous_sums[position]
        sum_from_left = (previous_sums[position - 1]
                         if 0 < position <= len(previous_sums)
                         else 0)
        current_sums.append(max(sum_from_right, sum_from_left) + cell_value)
    previous_sums = current_sums
print('The maximum sum is', max(previous_sums))
If you like list comprehensions, the inner loop can be written into one:
current_sums = []
for row in triangle:
    len_previous = len(current_sums)
    current_sums = [
        max(0 if pos >= len_previous else current_sums[pos],
            current_sums[pos - 1] if 0 < pos <= len_previous else 0)
        + cell_value
        for pos, cell_value in enumerate(row)
    ]
print('The maximum sum is', max(current_sums))
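For comparison, here is a minimal sketch that runs the same idea from the bottom row upward, next to a plain greedy walk of the kind the question's code performs. The names parse, greedy_path_sum and max_path_sum and the inline TRIANGLE string are illustrative assumptions, not part of the original posts. Working upward means every cell always has two children, so no out-of-range checks are needed.

```python
TRIANGLE = """\
75
95 64
17 47 82
18 35 87 10
20 04 82 47 65
19 01 23 75 03 34
88 02 77 73 07 63 67
99 65 04 28 06 16 70 92
41 41 26 56 83 40 80 70 33
41 48 72 33 47 32 37 16 94 29
53 71 44 65 25 43 91 52 97 51 14
70 11 33 28 77 73 17 78 39 68 17 57
91 71 52 38 17 14 91 43 58 50 27 29 48
63 66 04 68 89 53 67 30 73 16 69 87 40 31
04 62 98 27 23 09 70 98 73 93 38 53 60 04 23
"""


def parse(text):
    """Turn the whitespace-separated text into a list of integer rows."""
    return [[int(n) for n in line.split()] for line in text.splitlines() if line.strip()]


def greedy_path_sum(triangle):
    """Always step to the larger adjacent value (a locally best choice)."""
    index = 0
    total = triangle[0][0]
    for row in triangle[1:]:
        if row[index + 1] > row[index]:
            index += 1
        total += row[index]
    return total


def max_path_sum(triangle):
    """Fold rows from the bottom up, keeping the best sum below each cell."""
    sums = list(triangle[-1])
    for row in reversed(triangle[:-1]):
        sums = [value + max(sums[i], sums[i + 1]) for i, value in enumerate(row)]
    return sums[0]


triangle = parse(TRIANGLE)
print(greedy_path_sum(triangle))  # 1064
print(max_path_sum(triangle))     # 1074
```

The greedy walk commits to 95 on the second row and can therefore never reach the 64, 82 branch that the best route uses, which accounts for the difference of 10 mentioned in the question.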

Here is a simple recursive solution which uses memoization
L1 = [
" 3 ",
" 7 4 ",
" 2 4 6 ",
"8 5 9 3",
]
L2 = [
" 75 ",
" 95 64 ",
" 17 47 82 ",
" 18 35 87 10 ",
" 20 04 82 47 65 ",
" 19 01 23 75 03 34 ",
" 88 02 77 73 07 63 67 ",
" 99 65 04 28 06 16 70 92 ",
" 41 41 26 56 83 40 80 70 33 ",
" 41 48 72 33 47 32 37 16 94 29 ",
" 53 71 44 65 25 43 91 52 97 51 14 ",
" 70 11 33 28 77 73 17 78 39 68 17 57 ",
" 91 71 52 38 17 14 91 43 58 50 27 29 48 ",
" 63 66 04 68 89 53 67 30 73 16 69 87 40 31 ",
"04 62 98 27 23 09 70 98 73 93 38 53 60 04 23 ",
]
class Max(object):
    def __init__(self, l):
        "parse triangle, initialize cache"
        self.l = l
        # split() with no argument skips the empty strings left by runs of spaces
        self.t = [
            list(map(int, x.split()))
            for x in l
        ]
        self.cache = {}
    def maxsub(self, r=0, c=0):
        "compute max path starting at (r,c)"
        saved = self.cache.get((r,c))
        if saved:
            return saved
        if r >= len(self.t):
            answer = (0, [], [])
        else:
            v = self.t[r][c]
            s1, l1, c1 = self.maxsub(r+1, c)
            s2, l2, c2 = self.maxsub(r+1, c+1)
            if s1 > s2:
                answer = (v+s1, [v]+l1, [c]+c1)
            else:
                answer = (v+s2, [v]+l2, [c]+c2)
        self.cache[(r,c)] = answer
        return answer
    def report(self):
        "find and report max path"
        m = self.maxsub()
        print()
        print("\n".join(self.l))
        print("maxsum:%s\nvalues:%s\ncolumns:%s" % m)
if __name__ == '__main__':
    Max(L1).report()
    Max(L2).report()
Sample output
3
7 4
2 4 6
8 5 9 3
maxsum:23
values:[3, 7, 4, 9]
columns:[0, 0, 1, 2]
75
95 64
17 47 82
18 35 87 10
20 04 82 47 65
19 01 23 75 03 34
88 02 77 73 07 63 67
99 65 04 28 06 16 70 92
41 41 26 56 83 40 80 70 33
41 48 72 33 47 32 37 16 94 29
53 71 44 65 25 43 91 52 97 51 14
70 11 33 28 77 73 17 78 39 68 17 57
91 71 52 38 17 14 91 43 58 50 27 29 48
63 66 04 68 89 53 67 30 73 16 69 87 40 31
04 62 98 27 23 09 70 98 73 93 38 53 60 04 23
maxsum:1074
values:[75, 64, 82, 87, 82, 75, 73, 28, 83, 32, 91, 78, 58, 73, 93]
columns:[0, 1, 2, 2, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 9]
To solve the 100-row Project Euler problem 67 we make a small change to __main__
def main():
    # open() works in both Python 2 and 3; the file() builtin exists only in Python 2
    with open('triangle.txt') as f:
        L = f.readlines()
    Max(L).report()
if __name__ == '__main__':
    main()
Last lines of output:
maxsum:7273
values:[59, 73, 52, 53, 87, 57, 92, 81, 81, 79, 81, 32, 86, 82, 97, 55, 97, 36, 62, 65, 90, 93, 95, 54, 71, 77, 68, 71, 94, 8, 89, 54, 42, 90, 84, 91, 31, 71, 93, 94, 53, 69, 73, 99, 89, 47, 80, 96, 81, 52, 98, 38, 91, 78, 90, 70, 61, 17, 11, 75, 74, 55, 81, 87, 89, 99, 73, 88, 95, 68, 37, 87, 73, 77, 60, 82, 87, 64, 96, 65, 47, 94, 85, 51, 87, 65, 65, 66, 91, 83, 72, 24, 98, 89, 53, 82, 57, 99, 98, 95]
columns:[0, 0, 0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9, 10, 11, 12, 12, 12, 13, 13, 13, 14, 14, 15, 15, 16, 17, 17, 17, 18, 19, 20, 21, 22, 23, 24, 25, 25, 25, 26, 27, 27, 28, 29, 30, 31, 32, 32, 32, 32, 33, 33, 34, 35, 36, 36, 36, 36, 36, 36, 36, 37, 38, 39, 40, 41, 41, 42, 42, 42, 42, 42, 42, 42, 43, 43, 43, 44, 45, 45, 45, 45, 45, 45, 46, 46, 46, 46, 47, 47, 48, 49, 49, 50, 51, 52, 52, 53]
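On my Mac it returns the answer immediately. Here is a timeit measurement (note that the statement passed is the bare name main rather than the call main(), so the figure below times a name lookup, not the solve itself):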
$ python -m timeit -s 'from p067 import main' main
100000000 loops, best of 3: 0.0181 usec per loop
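The same memoized recursion can also be sketched with functools.lru_cache doing the caching instead of a hand-written dictionary. This is a simplified variant under stated assumptions: the function name max_path_sum is made up for illustration, the triangle is passed as a list of integer rows, and only the sum is returned, not the path.

```python
from functools import lru_cache


def max_path_sum(triangle):
    """Return the largest top-to-bottom sum using memoized recursion."""

    @lru_cache(maxsize=None)
    def best(r, c):
        # Best sum of a path that starts at (r, c) and runs to the bottom row.
        if r == len(triangle):
            return 0  # stepped past the last row: nothing left to add
        return triangle[r][c] + max(best(r + 1, c), best(r + 1, c + 1))

    return best(0, 0)


small = [[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]]
print(max_path_sum(small))  # 23
```

The recursion depth equals the number of rows, so a 100-row triangle stays well inside Python's default recursion limit.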


How to print prime numbers in a tabular format

My goal is to print prime numbers in a tabular format instead of one value per line. So far all my attempts have ended in either single lines or misprinted tables.
start = int(input("Start number: "))
end = int(input("End number: "))
if start < 0 or end < 0:
    print("Start and End must be positive.")
    start = int(input("Start number: "))
    end = int(input("End number: "))
if end < start:
    print("End must be greater than Start number: ")
    start = int(input("Start number: "))
    end = int(input("End number: "))
prime = True
for num in range(start,end+1):
    if num > 1:
        for i in range(2,num):
            if num % i == 0:
                break
        else:
            num = print(num)
The one I have here can only print them line by line.
#start number: 1
#end number: 100
# 2  3  5  7 11 13 17 19 23 29
#31 37 41 43 47 53 59 61 67 71
#73 79 83 89 97
This can be done with str.rjust or its friends
>>> "2".rjust(3)
'  2'
>>>
First we gather the numbers we want to print, work out how many characters the largest of them takes, and add one to that value; the result is the width we will pass to rjust.
>>> nums=[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
>>> j = len(str(max(nums))) + 1
>>>
Now we pick how many we want to print per line:
>>> linesize = 10
>>>
Finally, we use print's keyword-only argument end to control whether the next value stays on the same line, and enumerate to keep count of how many we have already printed:
>>> for i,p in enumerate(nums,1):
        print( str(p).rjust(j), end="" )
        if i%linesize==0:
            print() #to go to the next line
  2  3  5  7 11 13 17 19 23 29
 31 37 41 43 47 53 59 61 67 71
 73 79 83 89 97
>>>
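Putting the pieces together, a minimal sketch might first collect the primes into a list (instead of printing each one as it is found) and then lay them out with rjust. The function names primes_between and format_table are hypothetical, and trial division is assumed to be fast enough for the small ranges in the question.

```python
def primes_between(start, end):
    """Primes in [start, end], found by trial division up to the square root."""
    return [n for n in range(max(start, 2), end + 1)
            if all(n % d for d in range(2, int(n ** 0.5) + 1))]


def format_table(nums, per_line=10):
    """Right-justify nums in equal-width columns, per_line values per row."""
    if not nums:
        return ""
    width = len(str(max(nums))) + 1  # widest number plus one space of padding
    rows = ["".join(str(n).rjust(width) for n in nums[i:i + per_line])
            for i in range(0, len(nums), per_line)]
    return "\n".join(rows)


print(format_table(primes_between(1, 100)))
```

Returning the table as a string, rather than printing inside the loop, keeps the layout logic reusable for any list of integers.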
You could use str.format and implement a reusable solution using a generator:
from math import floor
def tabular(records, line_width=42, sep_space=3):
    width = len(str(max(records))) + sep_space
    columns = floor(line_width/width)
    for i in range(0, len(records), columns):
        row_records = records[i:i+columns]
        row_format = ("{:>" + str(width) + "}") * len(row_records)
        yield row_format.format(*row_records)
# test data / prime numbers
numbers = [
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37,
    41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83,
    89, 97
]
for row in tabular(numbers):
    print(row)
#    2    3    5    7   11   13   17   19
#   23   29   31   37   41   43   47   53
#   59   61   67   71   73   79   83   89
#   97
Example with some other numbers:
for row in tabular(list(range(0, 1600, 50)), 79, 2):
    print(row)
#     0    50   100   150   200   250   300   350   400   450   500   550   600
#   650   700   750   800   850   900   950  1000  1050  1100  1150  1200  1250
#  1300  1350  1400  1450  1500  1550
Example with str.format but without using a generator:
# test data / prime numbers
numbers = [
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37,
41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83,
89, 97
]
width = len(str(max(numbers))) + 3
for i in range(0, len(numbers), 10):
    row_records = numbers[i:i+10]
    # width is an int, so it must be converted before string concatenation
    row_format = ("{:>" + str(width) + "}") * len(row_records)
    print(row_format.format(*row_records))
#    2    3    5    7   11   13   17   19   23   29
#   31   37   41   43   47   53   59   61   67   71
#   73   79   83   89   97

Maximum average of n consecutive values in DataFrame

I want to find the maximum average of n consecutive values in a DataFrame.
import pandas as pd
list1 = [120, 130, 135, 140, 170, 131, 131, 151, 181, 191, 200, 210, 220, 170, 160, 151, 120, 140, 170, 173]
list2 = [80, 81, 82, 82, 82, 83, 84, 84, 85, 85, 85, 86, 87, 88, 89, 90, 90, 90, 91, 91 ]
df = pd.DataFrame(zip(list1, list2), columns=['value1', 'value2'])
df['interval'] = 0
interval_duration = 3 # set interval duration
number_of_intervals = 4 # set number of intervals
# I found only a way with for loop:
for x in range(1, number_of_intervals + 1):
    max_average_interval = sum(df['value1'][0 : interval_duration]) / interval_duration
    item_max = 0
    for item in range(len(df['value1']) - interval_duration + 1):
        if sum(df['interval'].loc[item : item + interval_duration - 1]) == 0:
            if max_average_interval < sum(df['value1'][item : item + interval_duration]) / interval_duration:
                max_average_interval = sum(df['value1'][item : item + interval_duration]) / interval_duration
                item_max = item
    df['interval'].loc[item_max : item_max + interval_duration - 1] = x
Result:
    value1  value2  interval
0      120      80         0
1      130      81         0
2      135      82         0
3      140      82         0
4      170      82         0
5      131      83         0
6      131      84         0
7      151      84         2
8      181      85         2
9      191      85         2
10     200      85         1
11     210      86         1
12     220      87         1
13     170      88         4
14     160      89         4
15     151      90         4
16     120      90         0
17     140      90         3
18     170      91         3
19     173      91         3
where in the interval column:
1 - first maximum interval of consecutive values
2 - second maximum interval of consecutive values
and so on.
Question: is there a more efficient way to do this? It matters because I may have many thousands of values.
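One possible direction, offered only as a sketch, is to compute every window sum once with prefix sums and then repeatedly pick the best window that does not overlap an earlier pick. The function name label_top_windows is hypothetical, the code works on a plain list rather than on the DataFrame, and it assumes ties should go to the earliest window, as the strict comparison in the loop above does.

```python
from itertools import accumulate


def label_top_windows(values, window, count):
    """Label the `count` best non-overlapping windows 1, 2, ... (0 = unlabelled)."""
    prefix = [0] + list(accumulate(values))
    # Sum of every length-`window` run; comparing sums is equivalent to
    # comparing averages because all windows have the same length.
    sums = [prefix[i + window] - prefix[i] for i in range(len(values) - window + 1)]
    labels = [0] * len(values)
    free = [True] * len(sums)  # free[i]: window starting at i overlaps no earlier pick
    for rank in range(1, count + 1):
        candidates = [i for i, ok in enumerate(free) if ok]
        if not candidates:
            break  # no room left for another window
        start = max(candidates, key=lambda i: sums[i])  # earliest window wins ties
        labels[start:start + window] = [rank] * window
        # Any window starting within window - 1 positions of `start` overlaps it.
        for i in range(max(0, start - window + 1), min(len(free), start + window)):
            free[i] = False
    return labels


list1 = [120, 130, 135, 140, 170, 131, 131, 151, 181, 191,
         200, 210, 220, 170, 160, 151, 120, 140, 170, 173]
print(label_top_windows(list1, 3, 4))
```

With the question's DataFrame this could be applied as df['interval'] = label_top_windows(df['value1'].tolist(), interval_duration, number_of_intervals). Each window sum is computed once, so the work is roughly proportional to the number of intervals times the number of rows, rather than repeating a slice-and-sum for every candidate.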

Linear regression: ValueError: all the input array dimensions except for the concatenation axis must match exactly

I am looking for a solution for the following problem and it just won't work the way I want to.
So my goal is to calculate a regression analysis and get the slope, intercept, rvalue, pvalue and stderr for multiple rows (this could go up to 10000). In this example, I have a file with 15 rows. Here are the first two rows:
array([
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24],
[ 100, 10, 61, 55, 29, 77, 61, 42, 70, 73, 98,
62, 25, 86, 49, 68, 68, 26, 35, 62, 100, 56,
10, 97]]
)
Full trial data set:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
100 10 61 55 29 77 61 42 70 73 98 62 25 86 49 68 68 26 35 62 100 56 10 97
57 89 25 89 48 56 67 17 98 10 25 90 17 52 85 56 18 20 74 97 82 63 45 87
192 371 47 173 202 144 17 147 174 483 170 422 285 13 77 116 500 136 276 392 220 121 441 268
The first row is the x-variable and this is the independent variable. This has to be kept fixed while iterating over every following row.
For each following row, which holds the y-variable and is thus the dependent variable, I want to calculate the slope, intercept, rvalue, pvalue and stderr and have them in a dataframe (if possible added to the same dataframe, but this is not necessary).
I tried the following code:
import pandas as pd
import scipy.stats
import numpy as np
df = pd.read_excel("Directory\\file.xlsx")
def regr(row):
    r = scipy.stats.linregress(df.iloc[1:, :], row)
    return r
full_dataframe = None
for index,row in df.iterrows():
    x = regr(index)
    if full_dataframe is None:
        full_dataframe = x.T
    else:
        full_dataframe = full_dataframe.append([x.T])
full_dataframe.to_excel('Directory\\file.xlsx')
But this fails and gives the following error:
ValueError: all the input array dimensions except for the concatenation axis
must match exactly
I'm really lost in here.
So, I want to achieve that I have the slope, intercept, pvalue, rvalue and stderr per row, starting from the second one, because the first row is the x-variable.
Anyone has an idea HOW to do this and tell me WHY mine isn't working and WHAT the code should look like?
Thanks!!
Guessing the issue
Most likely your problem is the format of your numbers: they are Unicode strings (dtype('<U21')) instead of integers or floats.
Always check types:
df.dtypes
Cast your dataframe using:
df = df.astype(np.float64)
Below is a small example showing the issue:
import numpy as np
import pandas as pd
# DataFrame without numbers (will not work for Math):
df = pd.DataFrame(['1', '2', '3'])
df.dtypes # object: placeholder for everything that is not number or timestamps (string, etc...)
# Casting DataFrame to make it suitable for Math Operations:
df = df.astype(np.float64)
df.dtypes # float64
But it is difficult to be sure of this without having the original file or data you are working with.
Carefully read the Exception
This is consistent with the Exception you get:
TypeError: ufunc 'add' did not contain a loop with signature matching types
dtype('<U21') dtype('<U21') dtype('<U21')
The method scipy.stats.linregress raises a TypeError (so it is about types) and is telling you that it cannot perform the add operation, because adding strings of dtype('<U21') does not make any sense in the context of a linear regression.
Understand the Design
Loading the data:
import io
fh = io.StringIO("""1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
100 10 61 55 29 77 61 42 70 73 98 62 25 86 49 68 68 26 35 62 100 56 10 97
57 89 25 89 48 56 67 17 98 10 25 90 17 52 85 56 18 20 74 97 82 63 45 87
192 371 47 173 202 144 17 147 174 483 170 422 285 13 77 116 500 136 276 392 220 121 441 268""")
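df = pd.read_fwf(fh).astype(float)  # the np.float alias was removed in NumPy 1.24
Then we can regress the second row against the first. Note that read_fwf, called as above without header=None, consumes the line 1 2 ... 24 as column labels, so row 0 of df is the line starting with 100 and the figures below regress the row starting with 57 on it; pass header=None to keep the x values as row 0: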
scipy.stats.linregress(df.iloc[0,:].values, df.iloc[1,:].values)
It returns:
LinregressResult(slope=0.12419744768547877, intercept=49.60998434527584, rvalue=0.11461693561751324, pvalue=0.5938303095361301, stderr=0.22949908667668056)
Assembling all together:
result = pd.DataFrame(columns=["slope", "intercept", "rvalue"])
for i, row in df.iterrows():
    fit = scipy.stats.linregress(df.iloc[0,:], row)
    result.loc[i] = (fit.slope, fit.intercept, fit.rvalue)
Returns:
      slope   intercept    rvalue
0  1.000000    0.000000  1.000000
1  0.124197   49.609984  0.114617
2 -1.095801  289.293224 -0.205150
Which is, as far as I understand your question, what you expected.
The second exception you get comes because of this line:
x = regr(index)
You sent the index of the row instead of the row itself to the regression method.
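The overall shape of the fix, keeping the first row as x and fitting every later row against it, can be sketched without any libraries. The helper fit_line below is a hypothetical stand-in for scipy.stats.linregress: it uses the closed-form least-squares formulas and returns only slope, intercept and rvalue (no pvalue or stderr). The data are the first three lines of the trial data set.

```python
from math import sqrt


def fit_line(x, y):
    """Closed-form simple linear regression: returns (slope, intercept, rvalue)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rvalue = sxy / sqrt(sxx * syy)
    return slope, intercept, rvalue


rows = [
    list(range(1, 25)),  # first row: the fixed x variable
    [100, 10, 61, 55, 29, 77, 61, 42, 70, 73, 98, 62,
     25, 86, 49, 68, 68, 26, 35, 62, 100, 56, 10, 97],
    [57, 89, 25, 89, 48, 56, 67, 17, 98, 10, 25, 90,
     17, 52, 85, 56, 18, 20, 74, 97, 82, 63, 45, 87],
]

x = rows[0]
# Pass each row itself (not its index) as y, starting from the second row.
results = [fit_line(x, y) for y in rows[1:]]
for slope, intercept, rvalue in results:
    print(round(slope, 6), round(intercept, 6), round(rvalue, 6))
```

With scipy available, the call fit_line(x, y) would simply be replaced by scipy.stats.linregress(x, y), which also reports pvalue and stderr.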

simple usecase if numpy.delete() is not working

here is some code:
c = np.delete(a,b)
print(len(a))
print(a)
print(len(b))
print(b)
print(len(c))
print(c)
it gives back:
24
[32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55]
20
[46, 35, 37, 54, 40, 49, 34, 48, 50, 38, 42, 47, 33, 52, 41, 36, 39, 44, 55,
51]
24
[32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55]
As you can see, all elements of b appear in a, but they are not being deleted. I cannot figure out why. Any ideas? Thank you.
numpy.delete does not remove the elements contained in b; it deletes a[b]. In other words, b needs to contain the indices to remove. Since your b contains only values larger than the length of a, no values are removed. Currently out-of-bounds indices are ignored, but this will not be true in the future:
/usr/local/bin/ipython3:1: DeprecationWarning: in the future out of bounds indices will raise an error instead of being ignored by `numpy.delete`.
#!/usr/bin/python3
A pure Python solution would be to use set:
set_b = set(b)
c = np.array([x for x in a if x not in set_b])
# array([32, 43, 45, 53])
And using numpy broadcasting to create a mask to determine which values to delete (b is converted with np.asarray because a plain list cannot be indexed with [:, None]):
c = a[~(a[None,:] == np.asarray(b)[:, None]).any(axis=0)]
# array([32, 43, 45, 53])
They are about the same speed with the given example, but the numpy approach takes more memory (because it generates a 2D matrix that contains all combinations of a and b).
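Another option, sketched here with the arrays from the question, is numpy.isin, which builds the membership mask directly without the 2D intermediate. The variable names mask and c_via_delete are illustrative only.

```python
import numpy as np

a = np.arange(32, 56)
b = [46, 35, 37, 54, 40, 49, 34, 48, 50, 38,
     42, 47, 33, 52, 41, 36, 39, 44, 55, 51]

# Boolean mask: True where an element of a also occurs in b.
mask = np.isin(a, b)

# Keep everything that is not in b ...
c = a[~mask]

# ... or, equivalently, give np.delete the positions rather than the values.
c_via_delete = np.delete(a, np.flatnonzero(mask))

print(c)  # [32 43 45 53]
```

The second form shows what np.delete expects: np.flatnonzero(mask) turns the mask into the indices of the matching elements.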

Pivot Table to Dictionary

I have this pivot table:
[in]:unit_d
[out]:
                    units
store_nbr item_nbr
1         9         27396
          28         4893
          40          254
          47         2409
          51          925
          89          157
          93         1103
          99          492
2         5         55104
          11          655
          44       117125
          85          106
          93          653
I want to have a dictionary with 'store_nbr' as the key and 'item_nbr' as the values.
So, {'1': [9, 28, 40,...,99], '2': [5, 11 ,44, 85, 93], ...}
I'd use groupby here, after resetting the index to make it into columns:
>>> d = unit_d.reset_index()
>>> {k: v.tolist() for k, v in d.groupby("store_nbr")["item_nbr"]}
{1: [9, 28, 40, 47, 51, 89, 93, 99], 2: [5, 11, 44, 85, 93]}
