For a couple of hours I have been trying to write a simple time vector to a txt file using Python.
import numpy as np
Tp = 2000 * 10**(-9)
dt = Tp / (90000)
t = np.linspace(0,Tp,dt)
timing = open("time.txt","w")
for ii in range(len(t)) :
    timing.write(str(t[ii]))
    timing.write("\n")
timing.close()
But I still get an empty file and I don't understand at all why.
Maybe I have to be more specific in the function with the precision I want.
Since I have a lot of small numbers (4e-10 ..) to process, I would like to understand a general method for writing variables (not the entire vector at once) to a txt file in exponential notation (in MATLAB it's more or less automatic, I think).
Thx
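You have an error in your use of linspace: its third argument is the number of samples to generate, not the step size. Passing dt (about 2.2e-11) therefore requests no samples, which is why the file is empty. Please check https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html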
Try this:
import numpy as np
Tp = 2000 * 10**(-9)
# dt = Tp / 90000.0
# Despite its name, dt now holds the number of samples, which is what linspace expects.
dt = 90000
t = np.linspace(0,Tp,dt)
timing = open("time.txt","w")
for ii in range(len(t)) :
    timing.write(str(t[ii]))
    timing.write("\n")
timing.close()
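For the second part of the question (exponential notation, one value at a time), Python's string formatting can do this. A minimal sketch, assuming six digits after the decimal point are enough; write_values, out_path and the five-sample vector are illustrative names and values, not from the original post:

```python
import os
import tempfile

def write_values(path, values, fmt="{:.6e}"):
    """Write one value per line, formatted with `fmt` (exponential notation by default)."""
    with open(path, "w") as handle:
        for value in values:
            handle.write(fmt.format(value) + "\n")

# Illustrative time vector built without NumPy: n samples spaced dt apart.
Tp = 2000e-9
n = 5
dt = Tp / (n - 1)
t = [i * dt for i in range(n)]

# Hypothetical output location in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "time_example.txt")
write_values(out_path, t)
```

The same format string works on a single number, e.g. "{:.6e}".format(4e-10), so values can be written one at a time inside any loop.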
Function I tried to replicate:
I'm doing a project for coursework in which I need to implement the blackbody function and manipulate it in some ways.
I'm trying out alternate equations, and for two of them I keep getting an overflow error.
This is the error message:
alt2_2 = (1/((const_e**(freq/temp))-1))
OverflowError: (34, 'Result too large')
temp is given in kelvin (I'm using 5800 as my test value, as it is approximately the temperature of the Sun).
freq is the speed of light divided by whatever wavelength is input:
freq = (3*(10**8))/wavelength
In this case I am using 0.00000005 as the test value for wavelength,
and const_e is 2.7182.
This is my first time using Stack Overflow and also my first time doing a project on my own; any help is appreciated.
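The overflow comes from the exponent: freq/temp is about 1e12, and e raised to that power is far too large for a float. Planck's law uses h*f/(k*T) as the exponent, which is about 49 here. This does the blackbody computation with your values.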
import math
# Planck constant
h = 6.6e-34
# Boltzmann constant
k = 1.38e-23
# Speed of light
c = 3e+8
# Wavelength
wl = 0.00000005
# Temp
T = 5800
# Frequency
f = c/wl
# This is the exponent for e (about 49).
k1 = h*f / (k*T)
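# This computes 2*h*f^3 / (exp(h*f/(k*T)) - 1). Note that Planck's law for
# spectral radiance also divides by c*c, which is not applied here.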
Bvh = 2*f*f*f*h / (math.exp(k1)-1)
print(Bvh)
Output:
9.293819741690355e-08
Since we only used one or two digits on the way in, the resulting value is only good to one or two digits, 9.3E-08.
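If the exponent can become large for other inputs (very short wavelengths or low temperatures), math.exp itself will overflow. A possible guard is sketched below; it assumes the same rounded constants as above, includes the 1/c^2 factor of Planck's law, and the name planck_radiance and the cutoff of 700 are illustrative choices rather than part of the original answer:

```python
import math

# Rounded constants, matching the precision used above (assumed values).
H = 6.6e-34   # Planck constant, J*s
K = 1.38e-23  # Boltzmann constant, J/K
C = 3e8       # speed of light, m/s

def planck_radiance(wavelength, temperature):
    """Spectral radiance per unit frequency: 2*h*f^3 / c^2 / (exp(x) - 1), x = h*f/(k*T)."""
    f = C / wavelength
    x = H * f / (K * temperature)
    if x > 700.0:
        # math.exp overflows a little above 709; the radiance is effectively zero here.
        return 0.0
    # expm1(x) computes exp(x) - 1 and stays accurate when x is small.
    return 2.0 * H * f**3 / C**2 / math.expm1(x)

value = planck_radiance(5e-8, 5800)
```

With the test inputs from the question this stays well inside the float range, since the exponent is only about 49.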
I was trying to find a fast way to sort strings in Python where the locale is not a concern, i.e. I just want to sort the array lexically according to the underlying bytes. This is a perfect fit for something like radix sort. Here is my MWE:
import numpy as np
import timeit
# randChar is workaround for MemoryError in mtrand.RandomState.choice
# http://stackoverflow.com/questions/25627161/how-to-solve-memory-error-in-mtrand-randomstate-choice
def randChar(f, numGrp, N) :
    things = [f%x for x in range(numGrp)]
    return [things[x] for x in np.random.choice(numGrp, N)]
N=int(1e7)
K=100
id3 = randChar("id%010d", N//K, N) # small groups (char)
timeit.Timer("id3.sort()" ,"from __main__ import id3").timeit(1) # 6.8 seconds
As you can see it took 6.8 seconds which is almost 10x slower than R's radix sort below.
N = 1e7
K = 100
id3 = sample(sprintf("id%010d",1:(N/K)), N, TRUE)
system.time(sort(id3,method="radix"))
I understand that Python's .sort() doesn't use radix sort. Is there an implementation somewhere that allows me to sort strings as performantly as R?
AFAIK both R and Python "intern" strings so any optimisations in R can also be done in Python.
The top google result for "radix sort strings python" is this gist which produced an error when sorting on my test array.
It is true that R interns all strings, meaning it has a "global character cache" which serves as a central dictionary of all strings ever used by your program. This has its advantages: the data takes less memory, and certain algorithms (such as radix sort) can take advantage of this structure to achieve higher speed. This is particularly true for the scenarios such as in your example, where the number of unique strings is small relative to the size of the vector. On the other hand it has its drawbacks too: the global character cache prevents multi-threaded write access to character data.
In Python, afaik, only string literals are interned. For example:
>>> 'abc' is 'abc'
True
>>> x = 'ab'
>>> (x + 'c') is 'abc'
False
In practice it means that, unless you've embedded data directly into the text of the program, nothing will be interned.
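Interning can also be requested explicitly with sys.intern from the standard library. A small sketch of the idea; the variable names are only illustrative, and this shows the identity behaviour, not any speed-up:

```python
import sys

x = "ab"
built_at_runtime = x + "c"               # constructed while the program runs
interned = sys.intern(built_at_runtime)  # returns the canonical "abc" object
again = sys.intern("a" + x[1:] + "c")    # another "abc" built at runtime
same_object = interned is again          # both names refer to one shared string
```

After interning, equal strings share one object, so comparing them can be done by identity.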
Now, for your original question: "what is the fastest way to sort strings in Python"? You can achieve very good speeds, comparable with R, with the Python datatable package. Here's a benchmark that sorts N = 10⁸ strings, randomly selected from a set of 1024:
import datatable as dt
import pandas as pd
import random
from time import time
n = 10**8
src = ["%x" % random.getrandbits(10) for _ in range(n)]
f0 = dt.Frame(src)
p0 = pd.DataFrame(src)
f0.to_csv("test1e8.csv")
t0 = time(); f1 = f0.sort(0); print("datatable: %.3fs" % (time()-t0))
t0 = time(); src.sort(); print("list.sort: %.3fs" % (time()-t0))
t0 = time(); p1 = p0.sort_values(0); print("pandas: %.3fs" % (time()-t0))
Which produces:
datatable: 1.465s / 1.462s / 1.460s (multiple runs)
list.sort: 44.352s
pandas: 395.083s
The same dataset in R (v3.4.2):
> require(data.table)
> DT = fread("test1e8.csv")
> system.time(sort(DT$C1, method="radix"))
user system elapsed
6.238 0.585 6.832
> system.time(DT[order(C1)])
user system elapsed
4.275 0.457 4.738
> system.time(setkey(DT, C1)) # sort in-place
user system elapsed
3.020 0.577 3.600
Jeremy Mets posted in the comments of this blog post that NumPy can sort strings fairly quickly if the list is converted to an np.array. This does indeed improve performance; however, it is still slower than Julia's implementation.
import numpy as np
import timeit
# randChar is workaround for MemoryError in mtrand.RandomState.choice
# http://stackoverflow.com/questions/25627161/how-to-solve-memory-error-in-mtrand-randomstate-choice
def randChar(f, numGrp, N) :
    things = [f%x for x in range(numGrp)]
    return [things[x] for x in np.random.choice(numGrp, N)]
N=int(1e7)
K=100
id3 = np.array(randChar("id%010d", N//K, N)) # small groups (char)
timeit.Timer("id3.sort()" ,"from __main__ import id3").timeit(1) # 6.8 seconds
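When the number of distinct strings is small relative to the length of the list, as in the id3 example, another option is to count the duplicates, sort only the distinct keys, and expand the result. This is a plain-Python sketch of that idea, not a radix sort and not benchmarked here; sort_by_counting and the small test data are hypothetical:

```python
from collections import Counter
import random

def sort_by_counting(strings):
    """Sort a list with few distinct values: sort the distinct keys, then repeat each by its count."""
    counts = Counter(strings)
    result = []
    for key in sorted(counts):            # only the distinct keys are compared
        result.extend([key] * counts[key])
    return result

random.seed(0)
groups = ["id%010d" % i for i in range(100)]
data = [random.choice(groups) for _ in range(10_000)]
sorted_data = sort_by_counting(data)
```

The output matches sorted(data), because Python compares str values by code point in both cases.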
Calling MATLAB from Python is bound to give some performance reduction that I could avoid by rewriting (a lot of) code in Python. However, this isn't a realistic option for me, but it annoys me that a huge loss of efficiency lies in the simple conversion from a numpy array to a MATLAB double.
I'm talking about the following conversion from data1 to data1m, where
data1 = np.random.uniform(low = 0.0, high = 30000.0, size = (1000000,))
data1m = matlab.double(list(data1))
Here matlab.double comes from MathWorks' own MATLAB package / engine. The second line of code takes 20 s on my system, which just seems like too much for a conversion that doesn't really do anything other than make the numbers 'edible' for MATLAB.
So basically I'm looking for a trick opposite to the one given here that works for converting MATLAB output back to Python.
Passing numpy arrays efficiently
Take a look at the file mlarray_sequence.py in the folder PYTHONPATH\Lib\site-packages\matlab\_internal. There you will find the construction of the MATLAB array object. The performance problem comes from copying data with loops within the generic_flattening function.
To avoid this behavior we will edit the file a bit. This fix should work on complex and non-complex datatypes.
Make a backup of the original file in case something goes wrong.
Add import numpy as np to the other imports at the beginning of the file
In line 38 you should find:
init_dims = _get_size(initializer)
Replace this with:
try:
    init_dims=initializer.shape
except:
    init_dims = _get_size(initializer)
In line 48 you should find:
if is_complex:
    complex_array = flat(self, initializer,
                         init_dims, typecode)
    self._real = complex_array['real']
    self._imag = complex_array['imag']
else:
    self._data = flat(self, initializer, init_dims, typecode)
Replace this with:
if is_complex:
    try:
        self._real = array.array(typecode,np.ravel(initializer, order='F').real)
        self._imag = array.array(typecode,np.ravel(initializer, order='F').imag)
    except:
        complex_array = flat(self, initializer,init_dims, typecode)
        self._real = complex_array['real']
        self._imag = complex_array['imag']
else:
    try:
        self._data = array.array(typecode,np.ravel(initializer, order='F'))
    except:
        self._data = flat(self, initializer, init_dims, typecode)
Now you can pass a numpy array directly to the MATLAB array creation method.
data1 = np.random.uniform(low = 0.0, high = 30000.0, size = (1000000,))
#faster
data1m = matlab.double(data1)
#or slower method
data1m = matlab.double(data1.tolist())
data2 = np.random.uniform(low = 0.0, high = 30000.0, size = (1000000,)).astype(np.complex128)
#faster
data1m = matlab.double(data2,is_complex=True)
#or slower method
data1m = matlab.double(data2.tolist(),is_complex=True)
The performance in MATLAB array creation increases by a factor of 15 and the interface is easier to use now.
While awaiting better suggestions, I'll post the best trick I've come up with so far. It comes down to saving the data to a file with `scipy.io.savemat` and then loading this file in MATLAB.
This is not the prettiest hack and it requires some care to ensure different processes relying on the same script don't end up writing and loading each other's .mat files, but the performance gain is worth it for me.
As a test case I wrote two simple, almost identical MATLAB functions that require 2 numpy arrays (I tested with length 1000000) and one int as input.
function d = test(x, y, fs_signal)
d = sum((x + y))./double(fs_signal);
function d = test2(path)
load(path)
d = sum((x + y))./double(fs_signal);
The function test requires conversion, while test2 requires saving.
Testing test: Converting the two numpy arrays takes about 40 s on my system. The total time to prepare for and run test comes to 170 s.
Testing test2: Saving the arrays and int takes about 0.35 s on my system. Surprisingly, loading the .mat file in MATLAB is extremely efficient (or, more surprisingly, MATLAB is extremely inefficient at dealing with its own doubles)... The total time to prepare for and run test2 comes to 0.38 s.
That's a performance gain of almost 450x...
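Regarding the care needed so that concurrent processes don't overwrite each other's .mat files: one possible approach is to let the tempfile module pick a unique file name per call. A minimal sketch, where unique_mat_path is a hypothetical helper and the savemat call is only indicated in a comment:

```python
import os
import tempfile

def unique_mat_path(directory=None):
    """Return the path of a freshly created, uniquely named .mat file."""
    fd, path = tempfile.mkstemp(suffix=".mat", dir=directory)
    os.close(fd)  # only the name is needed; the saving routine reopens the file
    return path

path_a = unique_mat_path()
path_b = unique_mat_path()
# e.g. scipy.io.savemat(path_a, {"x": x, "y": y, "fs_signal": fs_signal}),
# then pass path_a to test2 and delete the file once MATLAB has loaded it.
os.remove(path_a)
os.remove(path_b)
```

Because mkstemp creates the file atomically, two processes calling the helper at the same time should not receive the same path.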
My situation was a bit different (Python script called from MATLAB), but for me converting the ndarray into an array.array massively sped up the process. Basically it is very similar to Alexandre Chabot's solution, but without the need to alter any files:
#untested, i.e. only deduced from my "MATLAB calls Python" situation
import numpy
import array
import matlab
data1 = numpy.random.uniform(low = 0.0, high = 30000.0, size = (1000000,))
ar = array.array('d',data1.flatten('F').tolist())
p = matlab.double(ar)
C = matlab.reshape(p,data1.shape) #this part I am definitely not sure about if it will work like that
At least when done from MATLAB, the combination of "array.array" and "double" is relatively fast. Tested with MATLAB 2016b + Python 3.5.4 64-bit.
I'd like to read differential voltage values from an MCP3304 (5v VDD, 3.3v Vref, main channel = 7, diff channel = 6) connected to an RPi 2 b+ as close as possible to the MCP3304's max sample rate of 100ksps. Preferably, I'd get > 1 sample per 100µs (> 10 ksps).
A kind user recently suggested I try porting my code to C for some speed gains. I'm VERY new to C, so thought I'd give Cython a shot, but can't seem to figure out how to tap into the C-based speed gains.
My guess is that I need to write a .pyx file that includes a more-raw means of accessing the ADC's bits/bytes via SPI than the python package I'm currently using (the python-wrapped gpiozero package). 1) Does this seem correct, and, if so, might someone 2) please help me understand how to manipulate bits/bytes appropriately for the MCP3304 in a way that will produce speed gains from Cython? I've seen C tutorials for the MCP3008, but am having trouble adapting this code to fit the timing laid out in the MCP3304's spec sheet; though I might be able to adapt a Cython-specific MCP3008 (or other ADC) tutorial to fit the MCP3304.
Here's a little .pyx loop I wrote to test how fast I'm reading voltage values. (Timing how long it takes to read 25,000 samples). It's ~ 9% faster than running it straight in Python.
# Load packages
import time
from gpiozero import MCP3304
# define a function to ping PD every ms for 1 minute
def pd_ping():
    cdef char *FILENAME = "testfile.txt"
    cdef double v
    # cdef int adc_channel_pd = 7
    cdef size_t i
    # send message to user re: expected timing
    print "Running for ", 25000 , "iterations. Fingers crossed!"
    print time.clock()
    s = []
    for i in range(25000):
        v = MCP3304(channel = 7, diff = True).value * 3.3
        # concatenate
        s.append( str(v) )
    print "donezo" , time.clock()
    # write to file
    out = '\n'.join(s)
    f = open(FILENAME, "w")
    f.write(out)
    f.close()
There is probably no need to create an MCP3304 object for each iteration. Also conversion from float to string can be delayed.
s = []
mcp = MCP3304(channel = 7, diff = True)
for i in range(25000):
    v = mcp.value * 3.3
    s.append(v)
out = '\n'.join('{:.2f}'.format(v) for v in s) + '\n'
If that multiplication by 3.3 is not strictly necessary at that point, it could be done later.
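To check how much these changes help, it may be useful to time the bare acquisition loop separately from scaling and formatting. A rough sketch in plain Python; fake_reading is a hypothetical stand-in for the ADC, and on the Pi one would pass something like `lambda: mcp.value` instead:

```python
import time

def acquire(read_value, n_samples):
    """Collect n_samples raw readings and report the achieved rate in samples per second."""
    raw = [0.0] * n_samples          # preallocate so the loop only reads and stores
    start = time.perf_counter()
    for i in range(n_samples):
        raw[i] = read_value()
    elapsed = time.perf_counter() - start
    rate = n_samples / elapsed if elapsed > 0 else float("inf")
    return raw, rate

def fake_reading():
    """Hypothetical stand-in for `mcp.value`; returns a constant normalized reading."""
    return 0.5

raw, rate = acquire(fake_reading, 25000)
volts = [r * 3.3 for r in raw]       # scaling deferred until after acquisition
lines = "\n".join("{:.2f}".format(v) for v in volts) + "\n"
```

The measured rate then reflects only the cost of the reads themselves, which is the number to compare against the 10 ksps target.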
I'm calling a Python script from my Excel file with VBA and the ExcelPython reference. The Python script is working fine, except that Excel keeps telling me I have a type mismatch on the noted line:
Function calcCapValue(times As Range, fRates As Range, strike As Range, vol As Range, delta As Double, pv As Range) As Variant
    Set methods = PyModule("PyFunctions", AddPath:=ThisWorkbook.Path)
    Set result = PyCall(methods, "CalculateCapValue", KwArgs:=PyDict("times", times.Value2, "fwdRates", fRates.Value2, "strike", strike.Cells(1, 1).Value2, "flatVol", vol.Cells(1, 1).Value2, "delta", delta, "pv", pv.Cells(1, 1).Value2))
    calcCapValue = PyVar(PyGetItem(result, "res")) ' <--- TYPE MISMATCH HERE
    Exit Function
End Function
I can't figure out why; I'm following the example code from here: https://sourceforge.net/p/excelpython/wiki/Putting%20it%20all%20together/
and here: http://www.codeproject.com/Articles/639887/Calling-Python-code-from-Excel-with-ExcelPython
I still get this type mismatch.
Here's the python script:
#imports
import numpy as np
from scipy.stats import norm
#
# Cap Price Calculation
#
def CalculateCapValue(times, fwdRates, strike, flatVol, delta, pv):
    capPrice = 0
    #Iterate through each time for caplet price
    for i in range(0, len(times)):
        ifr = float(fwdRates[i][0])
        iti = float(times[i][0])
        d1 = (np.log(ifr/strike)+((flatVol**2)*iti)/2)/(flatVol*np.sqrt(iti))
        d2 = d1 - (flatVol*np.sqrt(iti))
        Nd1 = norm.cdf(d1)
        Nd2 = norm.cdf(d2)
        capPrice += pv*delta*(ifr*Nd1 - strike*Nd2)
    return {"res": capPrice}
Thanks!
Question answered on ExcelPython discussion forums on Sourceforge:
http://sourceforge.net/p/excelpython/discussion/general/thread/97f6c1b0/