Try to calculate X = sum_{i=1..N}(1/N), storing 1/N and X as float variables. Which result do you get for N = 10000, 100000 and 1000000?
Now try to use double variables. Does it change the outcome?
In order to do this I wrote this code:
# TRUNCATION ERRORS
import numpy as np               # library for numerical calculations
import matplotlib.pyplot as plt  # library for plotting purposes

x = 0
n = 10**6
X = []
N = []
for i in range(1, n + 1):
    x = x + 1/n
    item = float(x)
    item2 = float(n)
    X.append(item)
    N.append(item2)

plt.figure()  # block for plot purposes
plt.plot(N,X,marker=".")
plt.xlabel('N')
plt.ylabel('X')
plt.grid()
plt.show()
The output of this code is wrong; it should look like the (hand-drawn) plot shown in the lecture.
First, you want to plot N on the x-axis, but you're actually plotting 1/N.
Second, you aren't calculating the expression you think you're calculating. It looks like you're calculating sum_{i=1..N}(1/i).
You need to calculate sum_{i=1..N}(1/N), which is 1/N + 1/N + ... + 1/N repeated N times. In other words, you want to calculate N * (1/N), which should be equal to 1. Your exercise is showing you that it won't be exactly 1 when you use floating-point math, because 1/N usually cannot be represented exactly and the rounding errors accumulate over the N additions.
To do this correctly, let's first define a list of values for N
Nvals = [1, 10, 100, 1000, 10000, 100000, 1000000]
Let's define a function that will calculate our summation for a single value of N:
def calc_sum(N):
    total = 0
    for i in range(N):
        total += 1 / N
    return total
Next, let's create an empty list of Xvals and fill it up with the calculated sum for each N
Xvals = []
for N in Nvals:
    Xvals.append(calc_sum(N))
or, as a list comprehension:
Xvals = [calc_sum(N) for N in Nvals]
Now we get these values for Xvals:
[1.0,
0.9999999999999999,
1.0000000000000007,
1.0000000000000007,
0.9999999999999062,
0.9999999999980838,
1.000000000007918]
Clearly, they are not all equal to 1.
You can increase the number of values in Nvals to get a denser plot, but the idea is the same.
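For example, one way to build a denser, roughly logarithmically spaced set of N values (the spacing and count here are my own choice, not part of the original exercise), reusing the calc_sum function defined above:

import numpy as np

# roughly log-spaced integers between 1 and 10**6, duplicates removed
Nvals = sorted({int(x) for x in np.logspace(0, 6, 60)})
Xvals = [calc_sum(N) for N in Nvals]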
Now pay attention to what #khelwood said in their comment:
"float variables" and "double variables" are not a thing in Python. Variables don't have types. And floats are 64 bit
Python floats are all 64-bit IEEE 754 floating-point numbers, so you can't reproduce the float/double comparison in pure Python.
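One quick way to confirm that from inside Python (a small check of my own, using the standard sys module):

import sys

print(sys.float_info.mant_dig)  # 53 significand bits -> IEEE 754 double precision
print(sys.float_info.max)       # ~1.7976931348623157e+308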
If you used a language like C or C++ that actually has a 32-bit float type and a 64-bit double type, you'd get something like this:
#include <iostream>

float calc_sum_f(int N) {
    float total = 0.0;
    for (int i = 0; i < N; i++)
        total += ((float)1 / N);
    return total;
}

double calc_sum_d(int N) {
    double total = 0.0;
    for (int i = 0; i < N; i++)
        total += ((double)1 / N);
    return total;
}

int main()
{
    int Nvals[7] = { 1, 10, 100, 1000, 10000, 100000, 1000000 };
    std::cout << "N\tdouble\tfloat" << std::endl;
    for (int ni = 0; ni < 7; ni++) {
        int N = Nvals[ni];
        double x_d = calc_sum_d(N);
        float x_f = calc_sum_f(N);
        std::cout << N << "\t" << x_d << "\t" << x_f << std::endl;
    }
}
Output:
N double float
1 1 1
10 1 1
100 1 0.999999
1000 1 0.999991
10000 1 1.00005
100000 1 1.00099
1000000 1 1.00904
Here you can see that 32-bit floats don't have enough precision beyond a certain value of N to calculate N * (1/N) accurately. There's also no reason the plot should look like your hand-drawn plot: the error doesn't have to decrease consistently as N grows, as you can evidently see here.
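The gap in precision between the two types is easy to quantify; here is a small check of my own using numpy's type metadata (np.finfo):

import numpy as np

print(np.finfo(np.float32).eps)  # ~1.19e-07, roughly 7 significant decimal digits
print(np.finfo(np.float64).eps)  # ~2.22e-16, roughly 16 significant decimal digits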
Using numpy (thanks to #Kelly for the suggestion): to get 32-bit and 64-bit floating-point types in Python, we can similarly define two functions:
def calc_sum_64(N):
    c = np.float64(0)
    one_over_n = np.float64(1) / np.float64(N)
    for i in range(N):
        c += one_over_n
    return c

def calc_sum_32(N):
    c = np.float32(0)
    one_over_n = np.float32(1) / np.float32(N)
    for i in range(N):
        c += one_over_n
    return c
Then, we find Xvals_64 and Xvals_32
Nvals = [10**i for i in range(7)]
Xvals_32 = [calc_sum_32(N) for N in Nvals]
Xvals_64 = [calc_sum_64(N) for N in Nvals]
And we get:
Xvals_32 = [1.0, 1.0000001, 0.99999934, 0.9999907, 1.0000535, 1.0009902, 1.0090389]
Xvals_64 = [1.0,
0.9999999999999999,
1.0000000000000007,
1.0000000000000007,
0.9999999999999062,
0.9999999999980838,
1.000000000007918]
I haven't vectorized my numpy code to make it easier for you to understand what's going on, but Kelly shows a great way to vectorize it to speed up the calculation:
sum(1/N) from i = 1 to N is (1 / N) + (1 / N) + (1 / N) + ... {N times} , which is an array of N ones, divided by N and then summed. You could write the calc_sum_32 and calc_sum_64 functions like so:
def calc_sum_32(N):
    return (np.ones((N,), dtype=np.float32) / np.float32(N)).sum()

def calc_sum_64(N):
    return (np.ones((N,), dtype=np.float64) / np.float64(N)).sum()
You can then call these functions for every value of N you care about and plot the results; the plot shows the result oscillating about 1 for float32, with barely any visible oscillation for float64.
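If you want to reproduce such a plot yourself, here is a minimal sketch of my own, reusing the vectorized calc_sum_32/calc_sum_64 defined above (the choice of 50 log-spaced values is arbitrary):

import numpy as np
import matplotlib.pyplot as plt

Nvals = sorted({int(x) for x in np.logspace(0, 6, 50)})
plt.figure()
plt.semilogx(Nvals, [calc_sum_32(N) for N in Nvals], marker=".", label="float32")
plt.semilogx(Nvals, [calc_sum_64(N) for N in Nvals], marker=".", label="float64")
plt.axhline(1.0, color="gray", linestyle="--")  # the exact answer
plt.xlabel("N")
plt.ylabel("X")
plt.legend()
plt.grid()
plt.show()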
Related
If I specify a number, is there a way to randomly split it into portions assigned to several groups, so that the portions add up to the total?
e.g. Total 1:
Group 1 - 0.1
Group 2 - 0.3
Group 3 - 0.4
Group 4 - 0.2
It's very simple to do in Java.
You generate a random number from 1 to 100 using a function like this:
// min = 1, max = 100 in your case
public int getRandomNumber(int min, int max) {
    // max - min + 1 so that max itself can also be returned
    return (int) ((Math.random() * (max - min + 1)) + min);
}
Then, in the function which selects the group, you map the number to the given portions:
Group 1 - 0.1 Group 2 - 0.3 Group 3 - 0.4 Group 4 - 0.2
If the number is 1 to 10, select group 1.
If the number is 11 to 40, select group 2.
If the number is 41 to 80, select group 3.
If the number is 81 to 100, select group 4.
It's as easy as calculating percentages.
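If it helps, here is a rough Python sketch of the same threshold idea (my own illustration, using the fixed example portions from the question):

import random

portions = {"Group 1": 0.1, "Group 2": 0.3, "Group 3": 0.4, "Group 4": 0.2}

def pick_group():
    number = random.randint(1, 100)      # like getRandomNumber(1, 100) above
    threshold = 0
    for name, share in portions.items():
        threshold += round(share * 100)  # cumulative thresholds: 10, 40, 80, 100
        if number <= threshold:
            return name

print(pick_group())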
Does this solve your problem? Let me know in the comments.
Well, if you don't care about getting a roughly even distribution, you can just do:
void foo(){
    double total = 1.0;
    double[] group = new double[4];
    for(int i = 0; i < group.length-1; i++){
        // get a random value in 0.0-1.0
        double rand = getRandom(0.0, 1.0);
        double portion = rand * total;
        group[i] = portion;
        total -= portion;
    }
    group[group.length-1] = total;
}
If you do care, you can set getRandom to your liking, e.g.
//get random 0.0-1.0
double rand = getRandom(1.0/group.length*0.7, 1.0/group.length * 1.3);
so it will be 70% to 130% of the average.
This method randomly distributes a value across an array of "groups" of that value's type;
you can switch out double for int or float as you see fit.
public void assignGroups(double[] groups, double number){
    Random rand = new Random();
    for(int i = 0; i < groups.length-1; i++){
        double randomNum = rand.nextDouble()*number; // randomly picks a number between 0 and the current amount left
        groups[i] = randomNum;
        number -= randomNum; // subtracts the random number from the total
    }
    groups[groups.length-1] = number; // sets the last group to whatever is left
}
In Python:
from random import random
num_groups = 5 # Number of groups
total = 5 # The given number
base = [0.] + sorted(random() for _ in range(num_groups - 1)) + [1.]
portions = [(right - left) * total for left, right in zip(base[:-1], base[1:])]
Result (print(portions)): a list of length num_groups containing the randomly distributed total:
[2.5749833618941995, 0.010389749273946869, 0.3718137712569358, 0.3725336641218424, 1.6702794534530752]
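Because base starts at 0, ends at 1, and is sorted in between, each difference is non-negative and the differences sum to exactly 1, so the scaled portions always add up to the given total (up to floating-point rounding). A quick check:

print(sum(portions))  # ~5.0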
Using Java:
private static double roundValue(double value, int precision) {
    return (double) Math.round(value * Math.pow(10, precision)) /
            Math.pow(10, precision);
}

public static double[] generateGroups(double total, int groupsNumber, int precision){
    double[] result = new double[groupsNumber];
    double sum = 0;
    for (int i = 0; i < groupsNumber - 1; i++) {
        result[i] = roundValue((total - sum) * Math.random(), precision);
        sum += result[i];
    }
    result[groupsNumber-1] = roundValue((total - sum), precision);
    return result;
}

public static void main(String... args) {
    double[] result = generateGroups(1.0, 4, 1);
    System.out.println(Arrays.toString(result));
}
I am still teaching some R mainly to myself (and to my students).
Here's an implementation of the Collatz sequence in R:
f <- function(n)
{
  # construct the entire Collatz path starting from n
  if (n == 1) return(1)
  if (n %% 2 == 0) return(c(n, f(n/2)))
  return(c(n, f(3*n + 1)))
}
Calling f(13) I get
13, 40, 20, 10, 5, 16, 8, 4, 2, 1
However note that a vector is growing dynamically in size here. Such moves tend to be a recipe for inefficient code. Is there a more efficient version?
In Python I would use
def collatz(n):
    assert isinstance(n, int)
    assert n >= 1

    def __colla(n):
        while n > 1:
            yield n
            if n % 2 == 0:
                n = int(n / 2)
            else:
                n = int(3 * n + 1)
        yield 1

    return list(__colla(n))
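For example, calling it with the same starting value as the R example above gives the same path:

print(collatz(13))  # [13, 40, 20, 10, 5, 16, 8, 4, 2, 1]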
I found a way to write into vectors without specifying their dimension a priori. Therefore a solution could be
collatz <- function(n)
{
  stopifnot(n >= 1)
  # define a vector without specifying the length
  x = c()
  i = 1
  while (n > 1)
  {
    x[i] = n
    i = i + 1
    n = ifelse(n %% 2, 3*n + 1, n/2)
  }
  x[i] = 1
  # now "cut" the vector
  dim(x) = c(i)
  return(x)
}
I was curious to see how a C++ implementation through Rcpp would compare to your two base R approaches. Here are my results.
First let's define a function collatz_Rcpp that returns the Hailstone sequence for a given integer n. The (non-recursive) implementation was adapted from Rosetta Code.
library(Rcpp)
cppFunction("
  std::vector<int> collatz_Rcpp(int i) {
    std::vector<int> v;
    while (true) {
      v.push_back(i);
      if (i == 1) break;
      i = (i % 2) ? (3 * i + 1) : (i / 2);
    }
    return v;
  }
")
We now run a microbenchmark analysis using both your base R implementations and the Rcpp implementation, calculating the Hailstone sequences for the first 10000 integers:
# base R implementation (recursive)
collatz_R <- function(n) {
  # construct the entire Collatz path starting from n
  if (n == 1) return(1)
  if (n %% 2 == 0) return(c(n, collatz_R(n/2)))
  return(c(n, collatz_R(3*n + 1)))
}

# "updated" base R implementation (non-recursive)
collatz_R_updated <- function(n) {
  stopifnot(n >= 1)
  # define a vector without specifying the length
  x = c()
  i = 1
  while (n > 1) {
    x[i] = n
    i = i + 1
    n = ifelse(n %% 2, 3*n + 1, n/2)
  }
  x[i] = 1
  # now "cut" the vector
  dim(x) = c(i)
  return(x)
}
library(microbenchmark)
n <- 10000
res <- microbenchmark(
baseR = sapply(1:n, collatz_R),
baseR_updated = sapply(1:n, collatz_R_updated),
Rcpp = sapply(1:n, collatz_Rcpp))
res
# expr                  min          lq        mean      median          uq        max
# baseR             65.68623    73.56471    81.42989    77.46592    83.87024   193.2609
# baseR_updated   3861.99336  3997.45091  4240.30315  4122.88577  4348.97153  5463.7787
# Rcpp              36.52132    46.06178    51.61129    49.27667    53.10080   168.9824
library(ggplot2)
autoplot(res)
The (non-recursive) Rcpp implementation seems to be around 30% faster than the original (recursive) base R implementation. The "updated" (non-recursive) base R implementation is significantly slower than the original (recursive) base R approach (the microbenchmark takes around 10 minutes to finish on my MacBook Air due to baseR_updated).
I'm running these two programs. They both perform the same mathematical procedure (calculating the value of a series up to a large number of terms) and, as expected, produce the same output.
But for some reason, the PyPy code is running significantly faster than the C code.
I cannot figure out why this is happening, as I expected the C code to run faster.
I'd be thankful if anyone could help me by clarifying that (maybe there is a better way to write the C code?)
C code:
#include <stdio.h>
#include <math.h>

int main()
{
    double Sum = 0.0;
    long n;
    for (n = 2; n < 1000000000; n = n + 1) {
        double Sign;
        Sign = pow(-1.0, n % 2);
        double N;
        N = (double) n;
        double Sqrt;
        Sqrt = sqrt(N);
        double InvSqrt;
        InvSqrt = 1.0 / Sqrt;
        double Ln;
        Ln = log(N);
        double LnSq;
        LnSq = pow(Ln, 2.0);
        double Term;
        Term = Sign * InvSqrt * LnSq;
        Sum = Sum + Term;
    }
    double Coeff;
    Coeff = Sum / 2.0;
    printf("%0.14f \n", Coeff);
    return 0;
}
PyPy code (faster implementation of Python):
from math import log, sqrt

Sum = 0
for n in range(2, 1000000000):
    Sum += ((-1)**(n % 2) * (log(n))**2) / sqrt(n)

print(Sum / 2)
This is far from surprising: PyPy does a number of run-time optimizations by default, whereas C compilers perform no optimization by default. Dave Beazley's 2012 PyCon keynote covers this pretty explicitly and provides a deep explanation of why this happens.
Per the referenced talk, C should surpass PyPy when compiled with optimization level 2 or 3 (you can watch the full section on the performance of Fibonacci generation in CPython, PyPy and C starting here).
In addition to the compiler's optimisation level, you can improve the code itself:
int main()
{
    double Sum = 0.0;
    long n;
    for (n = 2; n < 1000000000; ++n)
    {
        double N = n;         // cast is implicit, only for code readability, no effect on runtime!
        double Sqrt = sqrt(N);
        //double InvSqrt;     // spare that:
        //InvSqrt = 1.0/Sqrt; // you spare this division with!
        double Ln = log(N);
        double LnSq;
        //LnSq = pow(Ln, 2.0);
        LnSq = Ln * Ln;       // more efficient
        double Term;
        //Term = Sign * InvSqrt * LnSq;
        Term = LnSq / Sqrt;
        if (n % 2)
            Term = -Term;     // just negating, no multiplication
                              // (IEEE provided: just one bit inverted)
        Sum = Sum + Term;
    }
// ...
Now we can simplify the code a little more:
int main()
{
    double Sum = 0.0;
    for (long n = 2; n < 1000000000; ++n)
    //   ^^^^ possible since C99, better scope, no runtime effect
    {
        double N = n;
        double Ln = log(N);
        double Term = Ln * Ln / sqrt(N);
        if (n % 2)
            Sum -= Term;
        else
            Sum += Term;
    }
// ...
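As a side note, the Python loop itself can also be sped up considerably (without PyPy) by vectorizing it with numpy. Here is a rough sketch of my own; it computes the same series in chunks so the arrays stay small (the chunk size of 1,000,000 is an arbitrary choice):

import numpy as np

Sum = 0.0
limit = 1_000_000_000
chunk = 1_000_000
for start in range(2, limit, chunk):
    n = np.arange(start, min(start + chunk, limit), dtype=np.int64)
    sign = np.where(n % 2 == 0, 1.0, -1.0)  # same as pow(-1.0, n % 2) in the C code
    x = n.astype(np.float64)
    Sum += float(np.sum(sign * np.log(x) ** 2 / np.sqrt(x)))
print(Sum / 2)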
I have been looking at different algorithms for a problem on LeetCode, beginning with approach 1. The problem asks you to calculate the total area of water collected (column width = 1) when the array values are treated as wall heights.
For each column, the first approach finds the maximum wall height to its left and to its right, takes the minimum of those two maxima, and adds water on top of the column if the column is lower than that minimum. The minimum is used because that is the highest level the collected water can reach. Calculating the maximums on each side requires up to n-1 traversal steps to the left and to the right for every column.
I code in Python, but here is the C++ code as per the solution given on LeetCode. My problem is not with understanding the C++, but with the math that is explained after the code.
int trap(vector<int>& height)
{
    int ans = 0;
    int size = height.size();
    for (int i = 1; i < size - 1; i++) {
        int max_left = 0, max_right = 0;
        for (int j = i; j >= 0; j--) { // search the left part for max bar size
            max_left = max(max_left, height[j]);
        }
        for (int j = i; j < size; j++) { // search the right part for max bar size
            max_right = max(max_right, height[j]);
        }
        ans += min(max_left, max_right) - height[i];
    }
    return ans;
}
What I don't get is how they arrived at time complexity O(n^2). I got O(n^3).
Index | Comparisons/Traversals
-------------------------------
1 | n
2 | n
3 | n
4 | n
. | .
. | .
. | .
n-1 | n
The total operations performed here would be:
n + 2n + 3n + 4n + ... + n(n-1) + n^2
Now using the arithmetic series formula
Sum = n * (a_1 + a_n) / 2
The sum above would end up being:
Sum = n * [n + n(n-1)] / 2 = n * [n + n^2 - n] / 2 = n^3 / 2
which would give O(n^3).
What am I getting wrong in my reasoning? It seems to be O(n^2), as GeeksforGeeks also points out.
The problem is here:
The total operations performed here would be: n + 2n + 3n + 4n + ... + n(n-1) + n^2
But each row of your table is just n, not n, 2n, …, n^2.
And from a quick glance, it's obvious that you've filled out the table correctly, too: the inner loop has O(n) constant-time steps.
All of the rest of the math you're doing is correct, but irrelevant. To sum up n copies of n, you just multiply n * n, which is of course n^2, not O(n^3).
The complexity of this algorithm can also be seen, probably more easily, by considering the fact that you have two nested loops. All inner operations are O(1), so they don't increase the complexity in any way. With two nested loops whose ranges are n and whose step is 1, it is pretty clear that the algorithm is of order O(n^2).
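If the asymptotics still feel abstract, you can count the inner-loop iterations directly. Here is a small Python sketch (my own illustration, not part of the LeetCode solution) that mirrors the two inner loops of trap:

def inner_steps(n):
    steps = 0
    for i in range(1, n - 1):
        steps += i + 1   # left scan: j = i, i-1, ..., 0
        steps += n - i   # right scan: j = i, i+1, ..., n-1
    return steps

for n in (100, 200, 400, 800):
    print(n, inner_steps(n), n * n)  # the count tracks n^2: doubling n roughly quadruples it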
I read in this question that Eigen has very good performance. However, I tried to compare Eigen MatrixXi multiplication speed against numpy array multiplication, and numpy performs better (~26 seconds vs. ~29). Is there a more efficient way to do this in Eigen?
Here is my code:
Numpy:
import numpy as np
import time
n_a_rows = 4000
n_a_cols = 3000
n_b_rows = n_a_cols
n_b_cols = 200
a = np.arange(n_a_rows * n_a_cols).reshape(n_a_rows, n_a_cols)
b = np.arange(n_b_rows * n_b_cols).reshape(n_b_rows, n_b_cols)
start = time.time()
d = np.dot(a, b)
end = time.time()
print("time taken : {}".format(end - start))
Result:
time taken : 25.9291000366
Eigen:
#include <iostream>
#include <ctime>
#include <Eigen/Dense>

using namespace Eigen;

int main()
{
    int n_a_rows = 4000;
    int n_a_cols = 3000;
    int n_b_rows = n_a_cols;
    int n_b_cols = 200;

    MatrixXi a(n_a_rows, n_a_cols);
    for (int i = 0; i < n_a_rows; ++i)
        for (int j = 0; j < n_a_cols; ++j)
            a(i, j) = n_a_cols * i + j;

    MatrixXi b(n_b_rows, n_b_cols);
    for (int i = 0; i < n_b_rows; ++i)
        for (int j = 0; j < n_b_cols; ++j)
            b(i, j) = n_b_cols * i + j;

    MatrixXi d(n_a_rows, n_b_cols);

    clock_t begin = clock();
    d = a * b;
    clock_t end = clock();

    double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << "Time taken : " << elapsed_secs << std::endl;
}
Result:
Time taken : 29.05
I am using numpy 1.8.1 and eigen 3.2.0-4.
My question has been answered by #Jitse Niesen and #ggael in the comments.
I needed to add flags to turn on optimizations when compiling: -O2 -DNDEBUG (that is a capital O, not a zero).
After including these flags, the Eigen code runs in 0.6 seconds, as opposed to ~29 seconds without them.
Change:
a = np.arange(n_a_rows * n_a_cols).reshape(n_a_rows, n_a_cols)
b = np.arange(n_b_rows * n_b_cols).reshape(n_b_rows, n_b_cols)
into:
a = np.arange(n_a_rows * n_a_cols).reshape(n_a_rows, n_a_cols)*1.0
b = np.arange(n_b_rows * n_b_cols).reshape(n_b_rows, n_b_cols)*1.0
This gives roughly a factor-of-100 boost, at least on my laptop:
time taken : 11.1231250763
vs:
time taken : 0.124922037125
That is, unless you really want to multiply integers. In Eigen it is also quicker to multiply double-precision numbers (which amounts to replacing MatrixXi with MatrixXd three times), but there I see just a factor of 1.5: Time taken : 0.555005 vs 0.846788.
Is there a more efficient way to do this eigen?
Whenever you have a matrix multiplication where the matrix on the left side of the = does not also appear on the right side, you can safely tell Eigen that there is no aliasing taking place. This saves you an unnecessary temporary variable and assignment operation, which for big matrices can make an important difference in performance. This is done with the .noalias() member function, as follows.
d.noalias() = a * b;
This way a*b is evaluated directly into d. Otherwise, to avoid aliasing problems, Eigen will first store the product in a temporary variable and then assign that temporary to your target matrix d.
So, in your code, the line:
d = a * b;
is actually compiled as follows:
temp = a*b;
d = temp;
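Incidentally, numpy offers something similar on the Python side: np.dot accepts an out argument, so the product can be written straight into a pre-allocated array instead of a freshly allocated temporary (a rough analogy only; out must have the exact dtype and shape of the result and be C-contiguous):

import numpy as np

a = np.arange(4000 * 3000, dtype=np.float64).reshape(4000, 3000)
b = np.arange(3000 * 200, dtype=np.float64).reshape(3000, 200)
d = np.empty((4000, 200), dtype=np.float64)

np.dot(a, b, out=d)  # result written directly into d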