I came across a problem when trying to do a resize as follows:
I have a 2D array after processing data, and now I want to resize it so that each row ignores its first 5 items.
What I am doing right now is:

2dArray = [[array1], [array2]]
new_Resize_2dArray = [array[5:] for array in 2dArray]

edit: this approach works fine as long as you make sure you are working with a list, not a string. It failed on my side because I hadn't converted from string to list properly, so it ended up eliminating the first five characters of the entire string.
However, it does not seem to work: it just copies all the elements over to new_Resize_2dArray.
I would like to ask for help to see what I did wrong, or whether there is any scientific computing library I could use to achieve this.
First, because Python list indexing is zero-based, array[5:] is the right slice if you want to exclude the first 5 columns: it keeps the elements at index 5 and beyond. Otherwise, I see no issue with your single line of code.
As for scientific computing libraries, numpy is a highly prevalent third-party package with a high-performance multidimensional array type, ndarray. Using ndarrays, your code could be shortened to new_Resize_2dArray = 2dArray[:, 5:]
Aside: it would help to include a bit more of your code, or a minimal example where you get the unexpected result (e.g., use a fake/stand-in 2D array to see whether it works as expected or still fails).
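For instance, a minimal stand-in test along those lines (the data here is made up; also note that a name like 2dArray is not actually a valid Python identifier, since names cannot start with a digit):

import numpy as np

# A small fake 2D array: 2 rows of 8 columns each.
rows = [[0, 1, 2, 3, 4, 5, 6, 7],
        [8, 9, 10, 11, 12, 13, 14, 15]]

# Pure-list version: keep everything from index 5 onward in each row.
trimmed_rows = [row[5:] for row in rows]
print(trimmed_rows)   # [[5, 6, 7], [13, 14, 15]]

# numpy version of the same trim.
arr = np.array(rows)
print(arr[:, 5:])     # [[ 5  6  7]
                      #  [13 14 15]]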
Related
I'm kind of a newbie in Python, and I read some code written by someone experienced. This part is supposed to take part of a NumPy array:
a=np.random.random((10000,32,32,3)) # random values as an example
mask=list(range(5000))
a=a[mask]
To me it looks rather wasteful to create another list just to take part of an array. Moreover, the resulting array is simply the first 5000 entries; no complex selection is required.
As far as I know, the following code should give the same result:
a=a[:5000]
What is the advantage of the first example? Is it faster? Or did I miss something?
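A quick check (not part of the original post) suggests the two forms select the same rows, with one practical difference worth knowing: basic slicing returns a view into the original data, while indexing with a list (fancy indexing) makes a copy:

import numpy as np

a = np.random.random((10000, 32, 32, 3))

mask = list(range(5000))
b = a[mask]     # fancy indexing: builds the index list, then copies the data
c = a[:5000]    # basic slicing: returns a view, no data is copied

print(np.array_equal(b, c))   # True  -- both select the same 5000 entries
print(c.base is a)            # True  -- the slice shares memory with a
print(b.base is a)            # False -- the fancy-indexed result is a copy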
I'm working with a large 3-dimensional array of binary data: each value is one of two possible values. I currently have this data stored in a numpy array as int32 objects that are either 1 or 0.
It works fine for small arrays, but eventually I will need to make the array 5000x5000x20, which I can't even get close to without getting a MemoryError.
Does anyone have any suggestions for a better way to do this? I am really hoping that I can keep it all together in one data structure because I will need to access slices of it along all three axes.
Another possibility is to represent the last axis of 20 bits as a single 32 bit integer. This way a 5000x5000 array would suffice.
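A rough sketch of that bit-packing idea (demonstrated on a small shape; the same code works for 5000x5000 cells):

import numpy as np

# Small demo shape; the same packing works for (5000, 5000, 20).
bits = np.random.randint(0, 2, size=(4, 4, 20), dtype=np.uint8)

# Pack the 20 bits of each cell into one int32: bit k gets weight 2**k.
weights = 1 << np.arange(20)
packed = (bits * weights).sum(axis=-1).astype(np.int32)   # shape (4, 4)

# Read bit k back out of every cell.
k = 7
bit_k = (packed >> k) & 1
print(np.array_equal(bit_k, bits[..., k]))   # True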
You'll get better performance if you change the datatype of your numpy array to something smaller.
For data which can take one of two values, you could use uint8, which will always be a single byte:
arr = np.array(your_data, dtype=np.uint8)
Alternatively, you could use np.bool, though I'm not sure offhand whether that is in fact an 8-bit value or whether it uses the native word size. (I tend to explicitly use the 8-bit value for clarity, though that's more a personal choice.)
At the end of the day, though, you're talking about a lot of data, and it's quite possible that even with a smaller set of values, you won't be able to load it all into python at once.
In that case, it might be worth investigating whether you can break up your problem into smaller parts.
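To put rough numbers on it (a back-of-the-envelope sketch for the 5000x5000x20 shape):

import numpy as np

shape = (5000, 5000, 20)
n_elements = 5000 * 5000 * 20   # 500 million values

print(n_elements * np.dtype(np.int32).itemsize / 1e9)   # ~2.0 GB as int32
print(n_elements * np.dtype(np.uint8).itemsize / 1e9)   # ~0.5 GB as uint8
print(np.dtype(np.bool_).itemsize)                      # 1 -- numpy bools are one byte each

arr = np.zeros(shape, dtype=np.uint8)   # ~500 MB instead of ~2 GB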
I'm debugging my Theano code and printing the values of my tensors as advised here:
a_printed = theano.printing.Print("a: ")(a)
The issue is that, when a is a relatively large matrix, the value is truncated to the first couple of rows and the last couple of rows. However, I would like the whole matrix to be printed. Is this possible?
I believe you can print the underlying numpy array, accessed as a.get_value(). Within numpy you can modify printing by
numpy.set_printoptions(threshold=10000000)
where threshold should be bigger than the number of elements expected; the whole array will then be shown. See the documentation for set_printoptions. Note that if this is output to a console, it may freeze up because of the possibly very large amount of text.
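A minimal demonstration of the print-options change (plain numpy, outside Theano):

import numpy as np

np.set_printoptions(threshold=10000000)   # higher than the element count
big = np.zeros((200, 200))
print(big)   # prints all 40000 elements instead of a "..."-truncated summary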
I'm still confused about whether to use a list or a numpy array. I started with the latter, but since I have to do a lot of appends, I ended up with many vstacks slowing my code down. Using a list would solve this problem, but I also need to delete elements, which in turn works well with delete on a numpy array. As it looks now, I'll have to write my own data type (in a compiled language, and wrap it). I'm just curious whether there isn't a way to get the job done using a Python type.
To summarize, these are the criteria my data type would have to fulfill:
2D: n (variable) rows, each row k (fixed) elements
in memory in one piece (would be nice for efficient operations)
append a row (in amortized constant time, like a C++ vector; always exactly k elements)
delete a set of elements (best: in place, keeping free space at the end for later appends)
access an element given the row and column index (O(1), like data[row*k + column])
It appears generally useful to me to have a data type like this and not impossible to implement in C/Fortran.
What would be the closest I could get with python?
(Or maybe: do you think it would work to write a Python class for the data type? What performance should I expect in that case?)
As I see it, if you were doing this in C or Fortran, you'd have to have an idea of the size of the array so that you can allocate the correct amount of memory (ignoring realloc!). So assuming you do know this, why do you need to append to the array?
In any case, numpy arrays have the resize method, which you can use to extend the size of the array.
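If the final size really isn't known up front, one possible sketch (the class and names here are hypothetical, not from the thread) is to wrap a numpy array and grow it geometrically, which gives the amortized-constant-time append the question asks for:

import numpy as np

class GrowableRows:
    # Hypothetical sketch: n (variable) rows of k (fixed) columns,
    # stored contiguously, with capacity doubling on overflow.
    def __init__(self, k, capacity=16):
        self.k = k
        self.n = 0                    # rows currently in use
        self.data = np.empty((capacity, k))

    def append(self, row):
        if self.n == len(self.data):  # buffer full: double the capacity
            self.data = np.resize(self.data, (2 * len(self.data), self.k))
        self.data[self.n] = row       # amortized O(1)
        self.n += 1

    def view(self):
        return self.data[:self.n]     # contiguous view of the live rows, no copy

g = GrowableRows(k=3)
for i in range(100):
    g.append([i, i + 1, i + 2])
print(g.view().shape)    # (100, 3)
print(g.view()[42, 1])   # O(1) element access: 43.0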
I want to implement a 1024x1024 monochromatic grid. I need to read data from any cell and insert rectangles of various dimensions. I have tried making a list of lists (and using it like a 2D array), and what I found is that a list of booleans is slower than a list of integers. I tried a 1D list, and it was slower than the 2D one. numpy is about 10 times slower than a standard Python list. The fastest way I have found is PIL with a monochromatic bitmap, used via its "load" method, but I want it to run a lot faster, so I tried to compile it with Shed Skin; unfortunately there is no PIL support there. Do you know any way of implementing such a grid faster, without rewriting it in C or C++?
Raph's suggestion of using array is good, but it won't help on CPython; in fact, I'd expect it to be 10-15% slower. However, if you use it on PyPy (http://pypy.org/), I'd expect excellent results.
One thing I might suggest is using Python's built-in array class (http://docs.python.org/library/array.html), with a type of 'B'. Coding will be simplest if you use one byte per pixel, but if you want to save memory, you can pack 8 pixels to a byte and access them using your own bit manipulation.
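A quick sketch of the one-byte-per-pixel variant (function names are illustrative; rectangle rows are filled with slice assignment, which stays in C):

from array import array

W = H = 1024
grid = array('B', bytes(W * H))   # 1 MB total, one byte per pixel, all zero

def fill_rect(x, y, w, h, value=1):
    # Set a w x h rectangle whose top-left corner is (x, y).
    row_fill = array('B', [value]) * w
    for row in range(y, y + h):
        start = row * W + x
        grid[start:start + w] = row_fill

def get_pixel(x, y):
    return grid[y * W + x]

fill_rect(10, 20, 5, 3)
print(get_pixel(12, 21))   # 1
print(get_pixel(0, 0))     # 0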
I would look into Cython, which translates Python into C that is readily compiled (or compiled for you if you use distutils). Just compiling your code with Cython will make it faster for something like this, but you can get much greater speed-ups by adding a few cdef statements. If you use it with NumPy, you can access NumPy arrays quickly. The speed-up can be quite large when using Cython in this manner. However, it would be easier to help you if you provided some example code.