I'm debugging my Theano code and printing the values of my tensors as advised here:
a_printed = theano.printing.Print("a: ")(a)
The issue is that, when a is a relatively large matrix, the value is truncated to the first couple of rows and the last couple of rows. However, I would like the whole matrix to be printed. Is this possible?
I believe you can print the underlying numpy array, accessed as a.get_value(). Within numpy you can modify printing with
numpy.set_printoptions(threshold=10000000)
where threshold should be bigger than the number of elements expected; then the whole array will be shown. See the documentation for set_printoptions. Note that if the output goes to a console, it may freeze up because of the very large amount of text.
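Putting the two together, a minimal sketch (assuming a is a Theano shared variable, so its contents can also be read back directly with get_value()):
import numpy
import theano

numpy.set_printoptions(threshold=10000000)   # larger than a's element count
a_printed = theano.printing.Print("a: ")(a)  # prints the full matrix when evaluated
print(a.get_value())                         # or inspect the shared value directly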
I came across a problem when trying to do a resize as follows:
I have a 2D array after processing data, and now I want to resize it, ignoring the first 5 items of each row.
What I am doing right now is to:
Edit: this approach works fine as long as you make sure you are working with a list, not a string. It failed to work on my side because I hadn't done the conversion from string to list properly,
so it ended up eliminating the first five characters of the entire string.
2dArray=[[array1],[array2]]
new_Resize_2dArray= [array[5:] for array in 2dArray]
However, it does not seem to work; it just copies all the elements over to new_Resize_2dArray.
I would like to ask for help to see what I did wrong, or whether there is any scientific computing library I could use to achieve this.
First, because Python list indexing is zero-based, your code should read new_Resize_2dArray = [array[5:] for array in 2dArray] if you want to exclude the first 5 columns. Otherwise, I see no issue with your single line of code.
As for scientific computing libraries, numpy is a highly prevalent third-party package with a high-performance multidimensional array type, ndarray. Using ndarrays, your code could be shortened to new_Resize_2dArray = 2dArray[:, 5:]
Aside: it would help to include a bit more of your code or a minimal example where you are getting the unexpected result (e.g., use a fake/stand-in 2D array to see if it works as expected or still fails); a quick stand-in check is sketched below.
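For instance, a minimal sketch with made-up data (the variable names are illustrative):
import numpy as np

data = np.arange(30).reshape(3, 10)  # stand-in 2D array: 3 rows, 10 columns
trimmed = data[:, 5:]                # drop the first 5 columns of every row
print(trimmed.shape)                 # (3, 5)

as_list = data.tolist()                        # the pure-list version of the same idea
trimmed_list = [row[5:] for row in as_list]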
I'm kind of a newbie in Python, and I'm reading some code written by someone experienced. This part is supposed to take a part of a numpy array:
a=np.random.random((10000,32,32,3)) # random values as an example
mask=list(range(5000))
a=a[mask]
To me it looks rather wasteful to create another list just to get part of an array. Moreover, the resulting array is really just the first 5000 entries; no complex selection is required.
As far as I know, the following code should give the same result:
a=a[:5000]
What is the advantage of the first example? Is it faster? Or did I miss something?
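A quick way to check that both select the same values (a scaled-down stand-in shape so it runs fast; the main difference to note is view versus copy):
import numpy as np

a = np.random.random((100, 32, 32, 3))    # smaller stand-in for the real data
mask = list(range(50))

by_mask = a[mask]     # fancy indexing: allocates a new copy
by_slice = a[:50]     # basic slicing: returns a view of the same memory

print(np.array_equal(by_mask, by_slice))  # True, identical values
print(by_slice.base is a)                 # True: it is a view
print(by_mask.base is a)                  # False: it is a copy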
I have created a function to rotate a vector by a quaternion:
def QVrotate_toLocal(Quaternion, Vector):
    # Quaternion: NumSamples x [w, x, y, z], e.g. shape (20000000, 4), values in range 0..1
    # Vector:     NumSamples x [x, y, z],    e.g. shape (20000000, 3), values in range -100..100
    # All numbers are float64s
    Quaternion[:, 2] *= -1
    x, y, z = QuatVectorRotate(Quaternion, Vector)
    norm = np.linalg.norm(Quaternion, axis=1)
    x *= (1 / norm)
    y *= (1 / norm)
    z *= (1 / norm)
    return np.stack([x, y, z], axis=1)
Everything within QuatVectorRotate is addition and multiplication of (20000000, 1) numpy arrays.
For the data I have (20 million samples for both the quaternion and the vector), every time I run the code the solution oscillates between a (known) correct solution and a very incorrect solution, never deviating from the pattern correct, incorrect, correct, incorrect...
This kind of numerical oscillation in static code usually means there is an ill-conditioned matrix being operated on, Python is running out of floating-point precision, or there is a silent memory overflow somewhere.
There is little linear algebra in my code, and I have checked and found the norm line to be static with every run. The problem seems to be happening somewhere in lines a= ... to d= ...
Which led me to believe that, given these large arrays, I was running out of memory somewhere along the line. This could still be the issue, but I don't believe it is; I have 16 GB of memory, and while running I never get above 75% usage. But again, I do not know enough about memory allocation to definitively rule this out. I attempted to force garbage collection at the beginning and end of the function, to no avail.
Any ideas would be appreciated.
EDIT:
I just reproduced this issue with the following data, and the same behavior was observed.
Q=np.random.random((20000000,4))
V=np.random.random((20000000,3))
When you do Quaternion[:,2]*=-1 in the first line, you are mutating the Quaternion array. This is not a local copy of that array, but the actual array you pass in from the outside.
So every time you run this code, you have different signs on those elements. After having run the function twice, the array is back to the start (since, obviously, -1*-1 = 1).
One way to get around this is to make a local copy first:
Quaternion_temp = Quaternion.copy()
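A sketch of the function with that fix applied (QuatVectorRotate is the same helper as in the original code, not defined here):
import numpy as np

def QVrotate_toLocal(Quaternion, Vector):
    Quaternion = Quaternion.copy()   # local copy: the caller's array is left untouched
    Quaternion[:, 2] *= -1
    x, y, z = QuatVectorRotate(Quaternion, Vector)
    norm = np.linalg.norm(Quaternion, axis=1)
    x *= (1 / norm)
    y *= (1 / norm)
    z *= (1 / norm)
    return np.stack([x, y, z], axis=1)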
I'm writing a program that creates vario-function plots for a fixed region of a digital elevation model that has been converted to an array. I calculate the variance (difference in elevation) and lag (distance) between point pairs within the window constraints. Every array position is compared with every other array position. For each pair, the lag and variance values are appended to separate lists. Once all pairs have been compared, these lists are then used for data binning, averaging and eventually plotting.
The program runs fine for smaller window sizes (say 60x60 px). For windows up to about 120x120 px or so, which would give two lists of 207,360,000 entries each, I am able to slowly get the program running. Greater than this, and I run into "MemoryError" reports; e.g., for a 240x240 px region, I would have 3,317,760,000 entries.
At the beginning of the program, I create two empty lists:
variance = []
lag = []
Then within a for loop where I calculate my lags and variances, I append the values to the different lists:
variance.append(var_val)
lag.append(lag_val)
I've had a look over the Stack Overflow pages and have seen a similar issue discussed here. That solution would potentially improve the program's run time; however, the solution offered only goes up to 100 million entries and therefore doesn't help me out with the larger regions (as with the 240x240 px example). I've also considered using numpy arrays to store the values, but I don't think this will stave off the memory issues.
Any suggestions for ways to store lists of the sizes I have described for the larger window sizes would be much appreciated.
I'm new to python so please forgive any ignorance.
The main bulk of the code can be seen here
Use the array module of Python. It offers some list-like types that are more memory efficient (but cannot be used to store random objects, unlike regular lists). For example, you can have arrays containing regular floats ("doubles" in C terms), or even single-precision floats (four bytes each instead of eight, at the cost of a reduced precision). An array of 3 billion such single-floats would fit into 12 GB of memory.
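For example, a minimal sketch using the single-precision typecode 'f' (variable names mirror the question; var_val and lag_val stand for the per-pair values computed in the question's loop):
import array

variance = array.array('f')   # 4-byte floats; use 'd' for 8-byte doubles
lag = array.array('f')

variance.append(var_val)      # append works just like list.append
lag.append(lag_val)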
You could look into PyTables, a library wrapping the HDF5 C library that can be used with numpy and pandas.
Essentially PyTables will store your data on disk and transparently load it into memory as needed.
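A minimal sketch of that idea with PyTables (the file and node names here are just illustrative):
import numpy as np
import tables

h5 = tables.open_file("variogram.h5", mode="w")
variance = h5.create_earray(h5.root, "variance", tables.Float32Atom(), shape=(0,))
lag = h5.create_earray(h5.root, "lag", tables.Float32Atom(), shape=(0,))

# append() takes array-like chunks, so values can be written to disk in batches
variance.append(np.array([0.25, 0.5], dtype=np.float32))
lag.append(np.array([12.5, 30.0], dtype=np.float32))

h5.close()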
Alternatively if you want to stick to pure python, you could use a sqlite3 database to store and manipulate your data - the docs say the size limit for a sqlite database is 140TB, which should be enough for your data.
Try using heapq (import heapq). It uses the heap for storage rather than the stack, allowing you to access the computer's full memory.
I'm still confused about whether to use a list or a numpy array.
I started with the latter, but since I have to do a lot of appends
I ended up with many vstacks, slowing my code down.
Using a list would solve this problem, but I also need to delete elements,
which in turn works well with delete on a numpy array.
As it looks now, I'll have to write my own data type (in a compiled language, and wrap it).
I'm just curious whether there isn't a way to get the job done using a Python type.
To summarize, these are the criteria my data type would have to fulfil:
2D: n (variable) rows, each row with k (fixed) elements
stored in memory in one piece (would be nice for efficient operations)
append a row (in amortized constant time, like a C++ vector; always k elements)
delete a set of elements (ideally in place, keeping free space at the end for later appends)
access an element given the row and column index (O(1), like data[row*k + column])
It appears generally useful to me to have a data type like this, and not impossible to implement in C/Fortran.
What would be the closest I could get with python?
(Or maybe, do you think it would work to write a Python class for the datatype? What performance should I expect in this case?)
As I see it, if you were doing this in C or Fortran, you'd have to have an idea of the size of the array so that you can allocate the correct amount of memory (ignoring realloc!). So assuming you do know this, why do you need to append to the array?
In any case, numpy arrays have the resize method, which you can use to extend the size of the array.
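A minimal sketch of that approach (the shape values are just illustrative):
import numpy as np

k = 3
data = np.zeros((4, k))

# grow in place to 8 rows; the new rows are zero-filled
# refcheck=False is needed if other references to the array may exist
data.resize((8, k), refcheck=False)

data[4] = [1.0, 2.0, 3.0]   # newly added row, O(1) access by row and column index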