I have a Scene object and I would like to load all channels into a numpy array of shape (24, 24, 3), where 3 is the number of channels.
scene_xybox = scn.crop(xy_bbox=box)
I have to select each channel:
channel = scene_xybox['VIS006'].values
then repeat for each channel and stack the results at the end.
Is there a way to get the stacked numpy array in one line?
This takes about 5 seconds for each box. I have many files, and applying the same operation to multiple boxes in an image, across multiple images, will take a very long time.
A perfect answer may require more information from you regarding what your end goal is, how many "boxes" you are cutting out, etc. But I'll see what I can clear up first. I assume you are not resampling the data with Scene.resample in your code at all.
Satpy uses dask, so if possible it is best to compute everything at once, or at least to limit how many times things are computed (.values computes the dask array). If you have a lot of boxes to cut out and your system has the available memory, you may want to calculate the slices yourself for all the xy bboxes (I think there are methods to help with this), load the entire image (see xr.concat below), and then use basic slicing to get each of the box cutouts. This saves you from loading the data from disk each time you call .values, and it will also really help with processing the other files you have, since the slices should be the same across all times (except for special instrument cases).
You say you want the final shape to be (rows, cols, N). Is there a good reason you can't use (N, rows, cols)? The latter should be faster, since the arrays stay in their original contiguous form. If whatever processing you do afterwards can be done with dask at all, that layout would also "flow" really well with the tasks that get generated.
You can use xr.concat, passing all the DataArrays at once and then call .values to get the full numpy array underneath. This should compute all the bands at the same time. Something like:
final_arr = xr.concat([scn['VIS006'], scn['band2'], scn['band3']], "bands").values
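For illustration, here is a hedged sketch of the "calculate the slices yourself, load once, then cut out every box" idea from above. The band names and box slices are placeholders for whatever your Scene actually contains:
import xarray as xr

bands = ["VIS006", "VIS008", "IR_016"]                        # assumed band names
full = xr.concat([scn[name] for name in bands], dim="bands")  # still lazy (dask-backed)
full_np = full.values                                         # one compute/load for the whole image

boxes = [(slice(0, 24), slice(0, 24)), (slice(100, 124), slice(40, 64))]  # example row/col slices
cutouts = [full_np[:, ys, xs] for ys, xs in boxes]            # cheap numpy slices, shape (bands, rows, cols)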
I have a tensor of example shape (543, 133, 3), meaning 543 frames, each with 133 points of X, Y, Z.
I would like to run savgol_filter on every point in every dimension; however, done naively, this is quite slow:
import numpy as np
import scipy.signal

points, frames, dims = tensor.shape
new_data = []
for point in range(points):
    new_dims = []
    for dim in range(dims):
        # filter each 1-D series along axis 1 of the tensor
        new_dims.append(scipy.signal.savgol_filter(tensor[point, :, dim], 3, 1))
    new_data.append(new_dims)
tensor = np.array(new_data)
On my computer, for this small tensor, this takes 300ms, which is quite a long time.
Is there a way to make this faster?
This is by no means the fastest method, but it should be quite a lot faster than what you're currently doing. We can utilize vectorized operations instead of for loops to achieve much better performance.
From your code, it seems like you want to smooth along the 133-length dimension (axis 1), so you can apply the Savitzky-Golay filter all at once with savgol_filter(data, 3, 1, axis=1). In general, you can specify the axis along which the filter is applied. On my computer, this brought the computation from 500 ms down to 2 ms.
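As a concrete, hedged sketch of that single vectorized call, with random data standing in for your tensor:
import numpy as np
from scipy.signal import savgol_filter

tensor = np.random.rand(543, 133, 3)  # placeholder data with the shape from the question

# One call instead of the nested Python loops: filter along axis 1.
smoothed = savgol_filter(tensor, window_length=3, polyorder=1, axis=1)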
A side note: Since you care about performance, I would pay attention to what your data order is. Depending on what you're doing, it might be advisable to reorder your data once to save time.
For example: let's say you have a matrix of 5 signals (5x299). If you want a single signal, that's easy: try signal[0]. This doesn't actually require copying the data; we can just "view" it in memory. But what if you want a particular band of every signal? If you do signal[:, 0], you can't take a "view" of the memory, because you first need to access every signal and take that index. If you had transposed the matrix first, then the first index would give you that band of every signal directly, with no need for iteration. Data order can be an important part of getting the best performance out of your computations.
There are two related concepts here: contiguous memory and vectorized operations. My explanation of why data order is important glosses over some complications, and you will need to do your own research to determine what data ordering will give the best performance for your application. The big thing to watch out for is the C vs. Fortran contiguous memory layout.
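A small sketch of the view-vs-strided-access point above (array sizes are arbitrary):
import numpy as np

signals = np.random.rand(5, 299)   # 5 signals, C-contiguous: each row is contiguous in memory

row = signals[0]       # a view of one signal: contiguous, no copy
col = signals[:, 0]    # also a view, but strided: consecutive elements are a full row apart

print(row.flags["C_CONTIGUOUS"], col.flags["C_CONTIGUOUS"])  # True False

# If you mostly access "bands" (columns), transposing and copying once can pay off:
by_band = np.ascontiguousarray(signals.T)   # each band is now a contiguous row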
Here are some resources I found: (not an endorsement)
StackOverflow question on contiguous memory: What is the difference between contiguous and non-contiguous arrays?
Towards Data Science article on vectorized operations https://towardsdatascience.com/vectorization-must-know-technique-to-speed-up-operations-100x-faster-50b6e89ddd45
list_of_ds = []
for fname in ids['fnames']:
    aq = xr.open_dataset(fname, chunks='auto', mask_and_scale=False)
    aq = aq[var_lists]
    aq = aq.isel(lat=slice(yoff, yoff+ysize), lon=slice(xoff, xoff+xsize))
    list_of_ds.append(aq)
    aq.close()
all_ds = xr.concat(list_of_ds, dim='time')
all_ds.to_netcdf('tmp.nc')
Hi all, I am using xarray to read netCDF files (around 1000) and save selected results to a temporary file, as shown above. However, the saving part runs very slowly. How can I speed this up?
I also tried directly loading the data, but it is still very slow.
I've also tried using open_mfdataset with parallel=True, and it's also slow:
aq = xr.open_mfdataset(
    sorted(ids_list),
    data_vars=var_lists,
    preprocess=add_time_dim,
    combine='by_coords',
    mask_and_scale=False,
    decode_cf=False,
    parallel=True,
)
aq.isel({'lon':irlon,'lat':irlat}).to_netcdf('tmp.nc')
Unfortunately, concatenating ~1000 files in xarray will be slow. Not a great way around that.
It's hard for us to offer specific advice without more detail about your data and setup. But here are some things I'd try:
Use xr.open_mfdataset. Your second code block looks good; dask will generally be faster and more efficient at managing tasks than you will be with a for loop.
Make sure your chunks are aligned with how you're slicing the data; you don't want to read more than you have to. If you're reading netCDFs, you have flexibility in how the data is read into dask. Since you're selecting (it looks like) a small spatial region within each array, it may make sense to explicitly chunk the data so that you only read a small portion of each array, e.g. with chunks={"lat": 50, "lon": 50}. You'll want to balance a few things here: keep chunk sizes manageable but not too small (which leads to too many tasks). As a general rule, shoot for chunks in the ~100-500 MB range, and try to keep the number of tasks below about 1 million (or the number of chunks below ~10-100k across all your datasets).
Be explicit about your concatenation. The more "magic" the process feels, the more work xarray is doing to infer what you mean. Generally, combine='nested' performs better than 'by_coords', so if you're concatenating files that are structured logically along one or more dimensions, it may help to arrange the file list to match the dimension you provide.
Skip the pre-processing. If you can, add new dimensions on concatenation rather than as an ingestion step. This lets dask plan the computation more fully, rather than treating your preprocess function as a black box, and, worse, as a prerequisite to scheduling the final array construction (because with combine='by_coords' the coords are the result of an earlier dask operation). If you need to attach a time dim to each file, with one element per file, something like xr.open_mfdataset(files, concat_dim=pd.Index(pd.date_range("2020-01-01", freq="D", periods=1000), name="time"), combine="nested") works well in my experience. A sketch pulling these suggestions together follows below.
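Here is that hedged sketch, combining the points above. Variable names like ids_list, var_lists, yoff, ysize, xoff and xsize are taken from your snippets; the chunk sizes and dates are placeholders:
import pandas as pd
import xarray as xr

files = sorted(ids_list)
times = pd.Index(pd.date_range("2020-01-01", freq="D", periods=len(files)), name="time")

ds = xr.open_mfdataset(
    files,
    combine="nested",                # explicit concatenation along one dimension
    concat_dim=times,                # attach the time coordinate here instead of in preprocess
    data_vars=var_lists,
    chunks={"lat": 50, "lon": 50},   # read only small spatial blocks per task
    mask_and_scale=False,
    decode_cf=False,
    parallel=True,
)

ds.isel(lat=slice(yoff, yoff + ysize), lon=slice(xoff, xoff + xsize)).to_netcdf("tmp.nc")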
If this is all taking too long, you could try pre-processing the data. Using a compiled utility like nco or even just subsetting the data and grouping smaller subsets of the data into larger files using dask.distributed's client.map might help cut down on the complexity of the final dataset join.
I have multiple dask arrays and would like to save them to a GIF or some movie format using imageio one frame at a time, but I think the problem is generic enough that the solution could help other people. I'm wondering if there is a way to compute the arrays in order and while computing one array and writing it to disk, start computing the next one on the remaining workers. If possible, it would be nice if the scheduler/graph could share tasks between the dask arrays if any.
The code would look something like this in my eyes:
import dask.array as da

writer = Writer(...)
for np_arr in da.compute(dask_arr1, dask_arr2, dask_arr3):
    writer.write_frame(np_arr)
It looks like this is probably hackable by users with the distributed scheduler, but I'd like to use the threaded scheduler if possible. I'm also not sure how useful this is in my exact real-world case, given memory usage and possibly having to write entire frames at a time instead of chunks. I also don't doubt that this could be handled in a custom array-like object with da.store... somehow.
If you're able to write a function that takes in a slice of the array and then writes it appropriately you might be able to use a function like da.map_blocks.
This would become much more complex if you're trying to write into a single file where random access is harder to guarantee.
Perhaps you could use map_blocks to save each slice as a single image and then use some post-processing tool to stitch those images together.
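For what it's worth, here is a hedged sketch of that map_blocks idea, assuming one frame per chunk and PNG output via imageio (the file names and array shapes are made up):
import numpy as np
import dask.array as da
import imageio

frames = da.random.random((10, 256, 256), chunks=(1, 256, 256))  # one frame per chunk

def save_frame(block, block_info=None):
    # block is a single (1, H, W) chunk; block_info tells us which frame it is
    frame_idx = block_info[0]["chunk-location"][0]
    img = (block[0] * 255).astype(np.uint8)
    imageio.imwrite(f"frame_{frame_idx:04d}.png", img)
    return block  # pass the data through so chunking/dtype stay unchanged

frames.map_blocks(save_frame, dtype=frames.dtype).compute()
The per-frame images could then be stitched into a GIF afterwards, e.g. with imageio.mimsave or an external tool like ffmpeg.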
I'm writing a program that creates vario-function plots for a fixed region of a digital elevation model that has been converted to an array. I calculate the variance (difference in elevation) and lag (distance) between point pairs within the window constraints. Every array position is compared with every other array position. For each pair, the lag and variance values are appended to separate lists. Once all pairs have been compared, these lists are then used for data binning, averaging and eventually plotting.
The program runs fine for smaller window sizes (say 60x60 px). For windows up to about 120x120 px or so, which would give 2 lists of 207,360,000 entries, I am able to slowly get the program running. Greater than this, and I run into "MemoryError" reports - e.g. for a 240x240 px region, I would have 3,317,760,000 entries
At the beginning of the program, I create two empty lists:
variance = []
lag = []
Then within a for loop where I calculate my lags and variances, I append the values to the different lists:
variance.append(var_val)
lag.append(lag_val)
I've had a look over the StackOverflow pages and have seen a similar issue discussed here. The solution offered would potentially improve temporal program performance, but it only goes up to 100 million entries and therefore doesn't help me with the larger regions (as in the 240x240 px example). I've also considered using numpy arrays to store the values, but I don't think this will stave off the memory issues.
Any suggestions for ways to store lists of the size I have described for the larger window sizes would be much appreciated.
I'm new to python so please forgive any ignorance.
The main bulk of the code can be seen here
Use the array module of Python. It offers some list-like types that are more memory efficient (but cannot be used to store random objects, unlike regular lists). For example, you can have arrays containing regular floats ("doubles" in C terms), or even single-precision floats (four bytes each instead of eight, at the cost of a reduced precision). An array of 3 billion such single-floats would fit into 12 GB of memory.
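For instance, a minimal sketch of the array-module approach, with made-up values standing in for your loop results:
from array import array

variance = array('f')   # 'f' = single-precision float, 4 bytes per entry
lag = array('f')

# inside your pair-comparison loop, append exactly as you do with lists:
variance.append(0.25)
lag.append(12.5)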
You could look into PyTables, a library wrapping the HDF5 C library that can be used with numpy and pandas.
Essentially PyTables will store your data on disk and transparently load it into memory as needed.
Alternatively if you want to stick to pure python, you could use a sqlite3 database to store and manipulate your data - the docs say the size limit for a sqlite database is 140TB, which should be enough for your data.
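As a hedged illustration of the PyTables route (the file name, batch sizes and random data are placeholders):
import numpy as np
import tables

h5 = tables.open_file("pairs.h5", mode="w")
variance = h5.create_earray(h5.root, "variance", tables.Float32Atom(), shape=(0,))
lag = h5.create_earray(h5.root, "lag", tables.Float32Atom(), shape=(0,))

for _ in range(10):  # stand-in for your pair-comparison loop
    var_batch = np.random.rand(1_000_000).astype(np.float32)
    lag_batch = np.random.rand(1_000_000).astype(np.float32)
    variance.append(var_batch)   # written out to disk, not held in RAM
    lag.append(lag_batch)

h5.close()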
Try using heapq (import heapq). It uses the heap for storage rather than the stack, allowing you to access the computer's full memory.
There are times when you have to perform many intermediate operations on one, or more, large numpy arrays. This can quickly result in MemoryErrors. In my research so far, I have found that pickling (pickle, cPickle, PyTables, etc.) and gc.collect() are ways to mitigate this. I was wondering if there are any other techniques experienced programmers use when dealing with large quantities of data (other than removing redundancies in your strategy/code, of course).
Also, if there's one thing I'm sure of, it's that nothing is free. With some of these techniques, what are the trade-offs (i.e., speed, robustness, etc.)?
I feel your pain... You sometimes end up storing several times the size of your array in values you will later discard. When processing one item in your array at a time, this is irrelevant, but can kill you when vectorizing.
I'll use an example from work for illustration purposes. I recently coded the algorithm described here using numpy. It is a color map algorithm, which takes an RGB image, and converts it into a CMYK image. The process, which is repeated for every pixel, is as follows:
Use the most significant 4 bits of every RGB value, as indices into a three-dimensional look up table. This determines the CMYK values for the 8 vertices of a cube within the LUT.
Use the least significant 4 bits of every RGB value to interpolate within that cube, based on the vertex values from the previous step. The most efficient way of doing this requires computing 16 arrays of uint8s the size of the image being processed. For a 24-bit RGB image, that is equivalent to needing roughly 6x the storage of the image itself to process it.
A couple of things you can do to handle this:
1. Divide and conquer
Maybe you cannot process a 1,000x1,000 array in a single pass. But if you can do it with a python for loop iterating over 10 arrays of 100x1,000, it is still going to beat a python iterator over 1,000,000 items by a very wide margin! It's going to be slower than a single pass, yes, but not by as much.
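A rough sketch of the idea (the array size, block size and the arithmetic are arbitrary):
import numpy as np

img = np.random.rand(1000, 1000)
out = np.empty_like(img)
block = 100   # process 100 rows at a time to cap the size of the temporaries

for start in range(0, img.shape[0], block):
    chunk = img[start:start + block]
    # whatever vectorized expression you need, applied block by block
    out[start:start + block] = np.sqrt(chunk) * 2.0 + chunk ** 2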
2. Cache expensive computations
This relates directly to my interpolation example above, and is harder to come across, although it is worth keeping an eye open for. Because I am interpolating on a three-dimensional cube with 4 bits in each dimension, there are only 16x16x16 possible outcomes, which can be stored in 16 arrays of 16x16x16 bytes. So I can precompute them and store them using 64 KB of memory, and look up the values one by one for the whole image, rather than redoing the same operations for every pixel at huge memory cost. This already pays off for images as small as 64x64 pixels, and basically allows processing images with 6x the number of pixels without having to subdivide the array.
3. Use your dtypes wisely
If your intermediate values can fit in a single uint8, don't use an array of int32s! This can turn into a nightmare of mysterious errors due to silent overflows, but if you are careful, it can provide a big saving of resources.
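A quick sketch of both sides of that trade-off (the sizes are arbitrary):
import numpy as np

a32 = np.zeros((4000, 4000), dtype=np.int32)    # ~64 MB
a8 = np.zeros((4000, 4000), dtype=np.uint8)     # ~16 MB
print(a32.nbytes // 2**20, a8.nbytes // 2**20)  # 61 15 (MiB)

# ...but beware of silent overflow with small dtypes:
x = np.array([200, 100], dtype=np.uint8)
print(x + x)   # wraps around modulo 256 -> [144 200]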
First, most important trick: allocate a few big arrays, and use and recycle portions of them, instead of bringing lots of temporary arrays into life and discarding/garbage-collecting them. It sounds a little old-fashioned, but with careful programming the speed-up can be impressive. (You have better control of alignment and data locality, so numeric code can be made more efficient.)
Second: use numpy.memmap and hope that the OS caching of accesses to the disk is efficient enough.
Third: as pointed out by @Jaime, work on block sub-matrices if the whole matrix is too big.
EDIT:
Avoid unnecessary list comprehensions, as pointed out in this answer on SE.
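A minimal numpy.memmap sketch of the second point (the file name, dtype and shape are made up): the array lives on disk and the OS pages pieces of it in and out of RAM as you touch them.
import numpy as np

big = np.memmap("scratch.dat", dtype=np.float32, mode="w+", shape=(50_000, 10_000))
big[:1000] = np.random.rand(1000, 10_000)   # only this slice needs to fit in memory
big.flush()                                 # push the written block out to disk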
The dask.array library provides a numpy interface that uses blocked algorithms to handle larger-than-memory arrays with multiple cores.
You could also look into Spartan, Distarray, and Biggus.
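For context, a minimal dask.array sketch (the sizes and chunking are arbitrary): the full array never has to exist in memory at once, and work proceeds roughly one chunk at a time per core.
import dask.array as da

x = da.random.random((100_000, 100_000), chunks=(5_000, 5_000))
result = (x + x.T).mean(axis=0)   # lazily built task graph
print(result[:5].compute())       # only now is anything actually computed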
If it is possible for you, use numexpr. For numeric calculations like a**2 + b**2 + 2*a*b (with a and b being arrays), it:
compiles machine code that executes fast and with minimal memory overhead, taking care of memory locality (and thus cache optimization) if the same array occurs several times in your expression,
uses all cores of your dual- or quad-core CPU,
and is an extension to numpy, not an alternative.
For medium and large arrays, it is faster than numpy alone.
Take a look at the numexpr web page; there are examples there that will help you decide whether numexpr is for you.
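A small hedged example of what that looks like in practice (the array sizes are arbitrary):
import numpy as np
import numexpr as ne

a = np.random.rand(10_000_000)
b = np.random.rand(10_000_000)

# Evaluated in one multi-threaded pass with small temporaries, instead of numpy
# materializing full-size intermediates for a**2, b**2 and 2*a*b separately.
result = ne.evaluate("a**2 + b**2 + 2*a*b")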
On top of everything said in the other answers: when we do want to store all the intermediate results of a computation (we don't always need to keep intermediate results in memory), we can also use accumulate from numpy alongside the various types of aggregations:
Aggregates
For binary ufuncs, there are some interesting aggregates that can be computed directly from the object. For example, if we'd like to reduce an array with a particular operation, we can use the reduce method of any ufunc. A reduce repeatedly applies a given operation to the elements of an array until only a single result remains.
For example, calling reduce on the add ufunc returns the sum of all elements in the array:
x = np.arange(1, 6)
np.add.reduce(x) # Outputs 15
Similarly, calling reduce on the multiply ufunc results in the product of all array elements:
np.multiply.reduce(x) # Outputs 120
Accumulate
If we'd like to store all the intermediate results of the computation, we can instead use accumulate:
np.add.accumulate(x) # Outputs array([ 1, 3, 6, 10, 15], dtype=int32)
np.multiply.accumulate(x) # Outputs array([ 1, 2, 6, 24, 120], dtype=int32)
Using these numpy operations wisely while performing many intermediate operations on one or more large numpy arrays can give you great results without the use of any additional libraries.