I'm getting a MemoryError while running some large matrix operations (chroma, CQT, MFCC extraction) with numpy (1.81), scipy (0.17.0) and librosa (0.4.2) on a Jetson TK1 with ~2 GB RAM and a 16 GB swap file.
Any help is much appreciated!
ERROR MESSAGE
Traceback (most recent call last):
File "./analyze_structure.py", line 480, in <module>
args.cutoff, args.order, args.sr, args.feature, bool(args.as_diff))
File "./analyze_structure.py", line 452, in plotData
tracks)
File "./analyze_structure.py", line 178, in plotStructure
feat, beat_times = extractChroma(filename, file_ext)
File "./analyze_structure.py", line 75, in extractChroma
hop_length=HOP_LENGTH)
File "/usr/local/lib/python2.7/dist-packages/librosa-0.4.2-py2.7.egg/librosa/feature/spectral.py", line 800, in chroma_stft
tuning = estimate_tuning(S=S, sr=sr, bins_per_octave=n_chroma)
File "/usr/local/lib/python2.7/dist-packages/librosa-0.4.2-py2.7.egg/librosa/core/pitch.py", line 82, in estimate_tuning
pitch, mag = piptrack(y=y, sr=sr, S=S, n_fft=n_fft, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/librosa-0.4.2-py2.7.egg/librosa/core/pitch.py", line 270, in piptrack
util.localmax(S * (S > (threshold * S.max(axis=0)))))
File "/usr/local/lib/python2.7/dist-packages/librosa-0.4.2-py2.7.egg/librosa/util/utils.py", line 820, in localmax
x_pad = np.pad(x, paddings, mode='edge')
File "/usr/lib/python2.7/dist-packages/numpy/lib/arraypad.py", line 1364, in pad
newmat = _prepend_edge(newmat, pad_before, axis)
File "/usr/lib/python2.7/dist-packages/numpy/lib/arraypad.py", line 175, in _prepend_edge
axis=axis)
MemoryError
The Jetson TK1 is a 32-bit processor. It doesn't have sufficient virtual address space to access more than 4GB of RAM from one process.
The kernel can use your 16 GB swap file to provide memory to many separate processes, but this still does not expose more than 4 GB of addresses to a single process. It simply allows separate processes to each use up to 4 GB of RAM (on Linux, the per-process limit will most likely be 2 GB or 3 GB, depending on your kernel settings).
You should split your work into smaller pieces or use a platform with more address space available.
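For example, the chroma step could be computed over fixed-length audio segments instead of the whole file at once. This is only a rough sketch, not the code from analyze_structure.py: the function name, HOP_LENGTH and CHUNK_SECONDS are illustrative, and per-segment features will differ slightly at segment boundaries compared to a single-pass computation.
import numpy as np
import librosa

HOP_LENGTH = 512        # illustrative value, not necessarily the one used in analyze_structure.py
CHUNK_SECONDS = 60.0    # analyse one minute of audio at a time to bound peak memory

def extract_chroma_chunked(path, sr=22050):
    total = librosa.get_duration(filename=path)
    chunks = []
    offset = 0.0
    while offset < total:
        # load and analyse a single segment instead of the whole file
        y, _ = librosa.load(path, sr=sr, offset=offset, duration=CHUNK_SECONDS)
        chunks.append(librosa.feature.chroma_stft(y=y, sr=sr, hop_length=HOP_LENGTH))
        offset += CHUNK_SECONDS
    return np.hstack(chunks)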
I believe that's because your processor is 32-bit:
The board has the following devices on-board:
NVIDIA Tegra124 (Tegra K1 32-bit)
A 32-bit install of Python only has about 2 GB of address space available (as with any 32-bit application by default). Try to refactor your code accordingly.
No amount of swap space will help here, and relying on swap for large calculations is a really bad idea since it is very slow. Swap is meant to absorb occasional overflows, not to be relied on.
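If in doubt, you can check which kind of interpreter you are actually running:
import struct
import sys

# 4-byte pointers mean a 32-bit interpreter, 8-byte pointers a 64-bit one
print(struct.calcsize("P") * 8)   # 32 or 64
print(sys.maxsize)                # 2**31 - 1 on 32-bit builds, 2**63 - 1 on 64-bit builds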
Related
I'm currently working on a seminar paper on NLP, specifically summarization of source-code function documentation. For this I created my own dataset with roughly 64,000 samples (37,453 of them form the training set) and I want to fine-tune the BART model. I use the simpletransformers package, which is built on top of the Hugging Face transformers package. My dataset is a pandas DataFrame.
An example of my dataset:
My code:
import logging

import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

train_df = pd.read_csv(train_path, index_col=0)
train_df.rename(columns={'text':'input_text', 'summary':'target_text'}, inplace=True)

# Logging
logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

# Hyperparameters
model_args = Seq2SeqArgs()
model_args.num_train_epochs = 10
# bart-base = 32, bart-large-cnn = 16
model_args.train_batch_size = 16
# model_args.no_save = True
# model_args.evaluate_generated_text = True
model_args.evaluate_during_training = True
model_args.evaluate_during_training_verbose = True
model_args.overwrite_output_dir = True
model_args.save_model_every_epoch = False
model_args.save_eval_checkpoints = False
model_args.save_optimizer_and_scheduler = False
model_args.save_steps = -1

best_model_dir = 'drive/MyDrive/outputs/bart-large-cnn/best_model/'
model_args.best_model_dir = best_model_dir

# Initialize model
model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-base",
    args=model_args,
    use_cuda=True,
)

# Train the model
model.train_model(
    train_df,
    # eval_data=eval_df,
    # matches=count_matches,
)
Everything is fine so far, but I get this error when I start the training.
Here is the error from a run I did in a Colab notebook:
Exception in thread Thread-14:
Traceback (most recent call last):
File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.7/multiprocessing/pool.py", line 470, in _handle_results
task = get()
File "/usr/lib/python3.7/multiprocessing/connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/reductions.py", line 287, in rebuild_storage_fd
storage = cls._new_shared_fd(fd, size)
RuntimeError: unable to mmap 1024 bytes from file <filename not specified>: Cannot allocate memory (12)
One would think that I simply do not have enough memory, but this was my System Monitor about 3 seconds after the error:
and this was the lowest my available/free memory got between starting the training and getting the error:
After a lot of tuning I found out that, for some reason, everything works fine when I train the model with a dataset of at most about 21,000 samples. It doesn't matter whether I train the "base" version or the "large-cnn" version of the BART model; it only depends on the size of my dataset. The error always occurs during the "Creating features from dataset file at cache_dir/" phase.
So what have I already tried:
I added a lot of swap memory (as you can see in the screenshot of my System Monitor)
reduced the number of workers to 1
I increased both the hard and the soft limit of my system's open-files limit (-n) to 86000
I also tried to train the model in a Google Colab notebook, but I had the same issue: if the dataset size goes above roughly 21,000 samples, the training fails, even after I doubled the memory of my Colab session while keeping the dataset size just a little bit over that limit.
Desktop:
transformers 4.6.0
simpletransformers 0.61.4
Ubuntu 20.04.2 LTS
After trying to solve this by myself for literally weeks, I would be more than happy if any of you had an idea how I can solve this :)
(I am aware of the post "mmap returns can not allocate memory, even though there is enough"; unfortunately it couldn't solve my problem. My vm.max_map_count is at 860000.)
So I just found a simple workaround.
You can just set use_multiprocessing of the model to False:
model_args.use_multiprocessing = False
Now I can run with my whole dataset.
While I do not know how to deal with this problem directly,
I had a somewhat similar issue (and solved it). The differences were:
I use fairseq
I could run my code on Google Colab with 1 GPU
I got RuntimeError: unable to mmap 280 bytes from file </torch_40419_282117887>: Cannot allocate memory (12) immediately when I tried to run it on multiple GPUs.
From other people's code, I found that they use python -m torch.distributed.launch -- ... to run fairseq-train, and once I added it to my bash script the RuntimeError was gone and training ran.
So I guess that, since you can run with 21000 samples, you could use torch.distributed to split the whole dataset into smaller batches and distribute them across several workers.
I am trying to perform PCA on a large dataset (410 000 entries and 32 000 features) in Python, but sklearn.decomposition.PCA does not work, as the underlying LAPACK implementation can't handle as much data as I have. It throws the following error:
Traceback (most recent call last):
File "main.py", line 47, in <module>
model.fit(x_std.transform(deep_data))
File "/home/lib/python3.6/site-
packages/sklearn/decomposition/_pca.py", line 344, in fit
self._fit(X)
File "/home/lib/python3.6/site-
packages/sklearn/decomposition/_pca.py", line 416, in _fit
return self._fit_full(X, n_components)
File "/home/lib/python3.6/site-
packages/sklearn/decomposition/_pca.py", line 447, in _fit_full
U, S, V = linalg.svd(X, full_matrices=False)
File "/home/lib/python3.6/site-
packages/scipy/linalg/decomp_svd.py", line 125, in svd
compute_uv=compute_uv, full_matrices=full_matrices)
File "/home/lib/python3.6/site-
packages/scipy/linalg/lapack.py", line 605, in _compute_lwork
raise ValueError("Too large work array required -- computation cannot "
ValueError: Too large work array required -- computation cannot be performed with standard 32-bit LAPACK.
I have also tried sklearn.decomposition.IncrementalPCA, but since I don't have any issues with RAM it did not solve my problem; it only introduced more, because it does not allow me to keep all 32 000 components if my batch size is smaller than that.
Is there any other implementation of PCA that can handle this much data? I don't necessarily need all 410 000 samples, but I need at least 32 000 so that I can analyze all principal components.
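For reference, a minimal example of the IncrementalPCA constraint I ran into (toy sizes only; my real matrix is 410 000 x 32 000):
import numpy as np
from sklearn.decomposition import IncrementalPCA

X = np.random.rand(1200, 320)   # small stand-in for the real data

IncrementalPCA(n_components=320, batch_size=400).fit(X)   # works: every batch has at least 320 samples
IncrementalPCA(n_components=320, batch_size=200).fit(X)   # ValueError: each batch has fewer samples than n_components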
So I am following the tutorial at https://www.pyimagesearch.com/2018/06/18/face-recognition-with-opencv-python-and-deep-learning/ in a PyCharm environment. When I run the encode-faces file it fails with this error:
Traceback (most recent call last):
File "Encoding_Faces.py", line 29, in <module>
boxes = face_recognition.face_locations(rgb, model=args["detection_method"])
File "C:\Users\my name\AppData\Local\Programs\Python\Python36-
32\Webcam_Face_Detect\lib\site-packages\face_recognition\api.py", line 116,
in face_locations
return [_trim_css_to_bounds(_rect_to_css(face.rect), img.shape) for face in
_raw_face_locations(img, number_of_times_to_upsample, "cnn")]
File "C:\Users\my name\AppData\Local\Programs\Python\Python36-
32\Webcam_Face_Detect\lib\site-packages\face_recognition\api.py", line 100,
in _raw_face_locations
return cnn_face_detector(img, number_of_times_to_upsample)
MemoryError: bad allocation
But when I look at the memory usage at the bottom right of the screen it is around 200 of 4096M. I increased the memory limit from 750M, but to no avail. Weirdly, the error occurred on the very first photo. My images are around 200 KB each at 1920 by 1080, 17 images in total. My computer has no GPU, so I am not sure if that is the problem.
I checked Task Manager as well, and memory usage was about 50% when the program crashed.
My computer is an HP Spectre x360 with a 6th-gen i5 and 8 GB of RAM, about 2 years old, if that is important.
Just realised the issue was that I had my code configured to run with a GPU. My bad... I changed the default detection method from "cnn" to "hog".
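For anyone else hitting this, the change boils down to something like the following sketch ("example.jpg" is just a placeholder image path):
import face_recognition

image = face_recognition.load_image_file("example.jpg")   # placeholder path

# "hog" runs on the CPU; "cnn" needs a GPU (or far more memory) for its allocations
boxes = face_recognition.face_locations(image, model="hog")
print(boxes)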
I have been trying to use TieDIE. In a few words, this software includes an algorithm that finds significant subnetworks when you pass it some query nodes and a network. With smaller networks it works just fine, but the network that I am interested in is quite big: it has 21988 nodes and 360474 edges. TieDIE generates an initial network kernel using scipy (Matlab is also an option for generating this kernel, but I do not own a license). During the generation of this kernel I get the following error:
Not enough memory to perform factorization.
Traceback (most recent call last):
File "Trials.py", line 44, in <module>
diffuser = SciPYKernel(network_path)
File "lib/kernel_scipy.py", line 83, in __init__
self.kernel = expm(time_T*L)
File "/home/agmoreno/TieDIE-trials/TieDIE/local/lib/python2.7/site-packages/scipy/sparse/linalg/matfuncs.py", line 602, in expm
return _expm(A, use_exact_onenorm='auto')
File "/home/agmoreno/TieDIE-trials/TieDIE/local/lib/python2.7/site-packages/scipy/sparse/linalg/matfuncs.py", line 665, in _expm
X = _solve_P_Q(U, V, structure=structure)
File "/home/agmoreno/TieDIE-trials/TieDIE/local/lib/python2.7/site-packages/scipy/sparse/linalg/matfuncs.py", line 699, in _solve_P_Q
return spsolve(Q, P)
File "/home/agmoreno/TieDIE-trials/TieDIE/local/lib/python2.7/site-packages/scipy/sparse/linalg/dsolve/linsolve.py", line 198, in spsolve
Afactsolve = factorized(A)
File "/home/agmoreno/TieDIE-trials/TieDIE/local/lib/python2.7/site-packages/scipy/sparse/linalg/dsolve/linsolve.py", line 440, in factorized
return splu(A).solve
File "/home/agmoreno/TieDIE-trials/TieDIE/local/lib/python2.7/site-packages/scipy/sparse/linalg/dsolve/linsolve.py", line 309, in splu
ilu=False, options=_options)
MemoryError
The most interesting thing about this is that I am using a cluster computer with 64 CPUs and 700 GB of RAM, and according to ps monitoring the software peaks at about 1.3% of memory usage (~10 GB) at some point of the execution before crashing later. I have been told that there is no limit on RAM usage... so I really have no clue about what could be happening.
Maybe someone here could help me find an alternative to scipy, or a way to solve this.
Is it possible that the memory error arises because only one node is being used? If that is the case, how could I distribute the work across the nodes?
Thanks in advance.
That's right, for a very large network like that you'll need high memory on a single node. The easiest solution is of course a workaround:
(1) Is there any way you can reduce the size of your input network while still capturing the relevant biology? Maybe just keep the nodes within 2 steps of your input nodes (see the sketch after this list)?
(2) Use the new Cytoscape API to do the diffusion for you: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005598 (https://github.com/idekerlab/heat-diffusion)
(3) Use PageRank instead of computing a heat kernel (not ideal, as we've shown that Diffusion tends to work better on biological networks).
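A rough sketch of option (1) using networkx (the file names, format and query nodes below are placeholders, not part of TieDIE):
import networkx as nx

# load the full network; the file name and format are placeholders
G = nx.read_edgelist("network.tsv", delimiter="\t")
query_nodes = ["TP53", "EGFR"]   # placeholder query/input nodes

# keep only the nodes within 2 steps of any query node
keep = set()
for n in query_nodes:
    keep.update(nx.single_source_shortest_path_length(G, n, cutoff=2))

subnetwork = G.subgraph(keep).copy()
nx.write_edgelist(subnetwork, "reduced_network.tsv", delimiter="\t", data=False)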
Hope this helps!
-Evan Paull (TieDIE developer/lead author)
In relation to my other question here: this code works if I use a small chunk of my dataset with dtype='int32'; using float64 produces a TypeError in my main process after this portion because of safe casting rules, so I'll stick to working with int32. Nonetheless, I'm curious and want to understand the errors I'm getting.
fp = np.memmap("E:/TDM-memmap.txt", dtype='int32', mode='w+', shape=(len(documents), len(vocabulary)))
matrix = np.genfromtxt("Results/TDM-short.csv", dtype='int32', delimiter=',', skip_header=1)
fp[:] = matrix[:]
If I use the full data (where shape=(329568, 27519)), with these dtypes:
I get a WindowsError when I use int32 or int, and an OverflowError when I use float64.
Why does this happen, and how can I fix it?
Edit: Added Tracebacks
Traceback for int32
Traceback (most recent call last):
File "C:/Users/zeferinix/PycharmProjects/Projects/NLP Scripts/NEW/LDA_Experimental1.py", line 123, in <module>
fp = np.memmap("E:/TDM-memmap.txt", dtype='int32', mode='w+', shape=(len(documents), len(vocabulary)))
File "C:\Python27\lib\site-packages\numpy\core\memmap.py", line 260, in __new__
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
WindowsError: [Error 8] Not enough storage is available to process this command
Traceback for float64
Traceback (most recent call last):
File "C:/Users/zeferinix/PycharmProjects/Projects/NLP Scripts/NEW/LDA_Experimental1.py", line 123, in <module>
fp = np.memmap("E:/TDM-memmap.txt", dtype='float64', mode='w+', shape=(len(documents), len(vocabulary)))
File "C:\Python27\lib\site-packages\numpy\core\memmap.py", line 260, in __new__
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
OverflowError: cannot fit 'long' into an index-sized integer
Edit: Added other info
Other info that might help:
I have a 1 TB (931 GB usable) HDD with 2 partitions: Drive D (22.8 GB free of 150 GB), which holds my work files, including this script, and where the memmap will be written, and Drive E (406 GB free of 781 GB), where my torrent stuff goes. At first I tried to write the memmap file to Drive D, and it generated a 1,903,283 KB file for int32 and a 3,806,566 KB file for float64. I thought those errors might be because it was running out of space, so I tried Drive E, which should be more than enough, but it generated the same file sizes and gave the same errors.
I don't think it is possible to generate an np.memmap file that large using a 32 bit build of numpy, regardless of how much disk space you have.
The error occurs when np.memmap tries to call mmap.mmap internally. The second argument to mmap.mmap specifies the length of the file in bytes. For a 329568 by 27519 array containing 64-bit (8-byte) values, the length will be 72555054336 bytes (i.e. ~72 GB).
The value 72555054336 needs to be converted to an integer type that can be used as an index. In 32 bit Python, indices need to be 32 bit integer values. However, the largest number that can be represented by a 32 bit integer is much smaller than 72555054336:
print(np.iinfo(np.int32(1)).max)
# 2147483647
Even a 32 bit array would require a length of 36277527168 bytes, which is still about 16x larger than the largest representable 32 bit integer.
I don't see any easy way around this problem besides switching to 64 bit Python/numpy. There are other very good reasons to do this - 32 bit Python can only address a maximum of 3GB of RAM, even though your machine has 8GB available.
Even if you could generate an np.memmap that big, the next line
matrix = np.genfromtxt("Results/TDM-short.csv", dtype='int32', delimiter=',', skip_header=1)
will definitely fail, since it requires creating an array in RAM that's 32GB in size. The only way that you could possibly read that CSV file is in smaller chunks, like in my answer here that I linked to in the comments above.
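Roughly, the chunked idea looks like this (a sketch for 64-bit Python using pandas, not the code from the linked answer; the CSV name and chunk size are illustrative):
import numpy as np
import pandas as pd

n_rows, n_cols = 329568, 27519
fp = np.memmap("E:/TDM-memmap.txt", dtype='int32', mode='w+', shape=(n_rows, n_cols))

start = 0
# read a few thousand CSV rows at a time instead of the whole file at once
for chunk in pd.read_csv("Results/TDM-full.csv", dtype='int32', chunksize=5000):   # placeholder file name
    fp[start:start + len(chunk)] = chunk.values
    start += len(chunk)
fp.flush()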
As I mentioned in the comments for your other question, what you ought to do is convert your TermDocumentMatrix to a scipy.sparse matrix rather than writing it to a CSV file. This would require much, much less storage space and RAM, since it can take advantage of the fact that almost all of the word counts are zero-valued.
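A minimal sketch of that idea (the tiny rows/cols/counts arrays below are placeholders for whatever builds your term-document counts):
import numpy as np
from scipy import sparse
from scipy.io import mmwrite, mmread

# placeholders standing in for the real (row, column, count) triples
rows   = np.array([0, 0, 1])
cols   = np.array([3, 7, 3])
counts = np.array([2, 1, 5], dtype=np.int32)

# build a sparse term-document matrix; zero counts take up no space
tdm = sparse.coo_matrix((counts, (rows, cols)), shape=(329568, 27519)).tocsr()

# store and reload it in a sparse format instead of a dense CSV
mmwrite("TDM-sparse.mtx", tdm)
tdm = mmread("TDM-sparse.mtx").tocsr()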