Read Data into Google Colab Environment - python
I am trying to run sentiment analysis code in Google Colab to get faster processing than on my own device, but I am running into a strange error that I cannot solve.
I mounted the drive using the following code:
from google.colab import drive
drive.mount('/content/drive')
Then I want to load a pickle file, which I have saved in MyDrive, using the following code:
import pickle

with open('/content/drive/MyDrive/data_colab_new.pkl', 'rb') as file:
    data = pickle.load(file)
But I get the following error message:
EOFError                                  Traceback (most recent call last)
<ipython-input> in <module>
      2 with open('/content/drive/MyDrive/data_colab_new.pkl', 'rb') as file:
      3     # Load the pickle file
----> 4     data = pickle.load(file)

EOFError: Ran out of input
I already googled this, and the only explanation I found was that the pickle file is empty. But I have checked multiple times now, and I am sure the file is not empty.
What could be another reason for this error, and do you know any way to fix it? I'm not able to figure it out myself.
Use this in Google Colab to upload the pickle file first:

from google.colab import files
files.upload()

Then load it with pickle like this:

import pickle

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)
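Since EOFError: Ran out of input almost always means pickle hit end-of-file before reading a complete object, a quick sanity check is to look at the file size the Colab runtime actually sees before unpickling. A minimal sketch, with a temporary file standing in for the asker's Drive path:

```python
import os
import pickle
import tempfile

# Stand-in for '/content/drive/MyDrive/data_colab_new.pkl' in this sketch.
path = os.path.join(tempfile.mkdtemp(), "data_colab_new.pkl")

with open(path, "wb") as f:
    pickle.dump({"text": "great movie", "label": "pos"}, f)

# Check what the runtime actually sees: a size of 0 here means the Drive
# copy never synced to this machine and pickle.load() would raise
# "Ran out of input" even though the file looks fine in the Drive UI.
size = os.path.getsize(path)
if size == 0:
    raise EOFError(f"{path} is empty on this runtime")

with open(path, "rb") as f:
    data = pickle.load(f)
```

If the size printed by os.path.getsize is 0 on Colab but not on your local machine, the problem is Drive syncing, not the pickle itself; remounting the drive usually fixes it.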
Related
how to open and read .nc files?
I had a problem opening .nc files and converting them to .csv files, but I still cannot read them (meaning the first part). I saw this link and also this link, but I could not find out how to open them. I have written a piece of code and I get an error, which I will post below. To elaborate on the error: it is able to find the files but not able to open them.

#from netCDF4 import Dataset # use scipy instead
from scipy.io import netcdf  #### <--- This is the library to import.
import os

# Open file in a netCDF reader
directory = './'
#wrf_file_name = directory+'filename'
wrf_file_name = [f for f in sorted(os.listdir('.')) if f.endswith('.nc')]
nc = netcdf.netcdf_file(wrf_file_name, 'r')

#Look at the variables available
nc.variables

#Look at the dimensions
nc.dimensions

And the error is:

Error: LAKE00000002-GloboLakes-L3S-LSWT-v4.0-fv01.0.nc is not a valid NetCDF 3 file
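Two things go wrong in the snippet above: wrf_file_name is a list, but netcdf.netcdf_file expects a single filename; and scipy's reader only understands the classic NetCDF-3 format, while the error suggests the GloboLakes file is NetCDF-4. A hedged sketch of the usual fix, assuming the third-party netCDF4 package is installed (pip install netCDF4):

```python
import os

# Collect the .nc files, then open them one by one
# instead of passing the whole list as a filename.
nc_files = [f for f in sorted(os.listdir('.')) if f.endswith('.nc')]

for name in nc_files:
    # netCDF4.Dataset reads both NetCDF-3 and NetCDF-4 files; the import is
    # an assumption here (the package must be installed separately).
    from netCDF4 import Dataset
    with Dataset(name) as nc:
        print(name, list(nc.variables), list(nc.dimensions))
```

The loop simply does nothing when no .nc files are present, so the filename filtering can be checked independently of the data.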
Problem loading ML model saved using joblib/pickle
I saved a model from a Jupyter notebook (.ipynb) to .pkl format using joblib. My ML model is built using pandas, numpy and the statsmodels Python library. I saved the fitted model to a variable called fitted_model, and here is how I used joblib:

from sklearn.externals import joblib

# Save RL_Model to file in the current working directory
joblib_file = "joblib_RL_Model.pkl"
joblib.dump(fitted_model, joblib_file)

I get this as output:

['joblib_RL_Model.pkl']

But when I try to load from the file, in a new Jupyter notebook, using:

# Load from file
joblib_file = "joblib_RL_Model.pkl"
joblib_LR_model = joblib.load(joblib_file)
joblib_LR_model

I only get this back:

<statsmodels.tsa.holtwinters.HoltWintersResultsWrapper at 0xa1a8a0ba8>

and no model. I was expecting to see the model load there and see the graph outputs as in the original notebook.
Use with open; it is better because it automatically opens and closes the file, with the proper mode:

with open('joblib_RL_Model.pkl', 'wb') as f:
    pickle.dump(fitted_model, f)

with open('joblib_RL_Model.pkl', 'rb') as f:
    joblib_LR_model = pickle.load(f)

And my implementation in Colab is here. Check it.
You can use pickle, Python's default serialization package, to save models. You can use the following function to save an ML model:

import pickle

def save_model(model):
    pickle.dump(model, open("model.pkl", "wb"))

A template for the function would be:

import pickle

def save_model(model):
    pickle.dump(model, open(PATH_AND_FILE_NAME_TO_BE_SAVED, "wb"))

To load a model saved with the pickle library, you can use the following function:

def load_model(path):
    return pickle.load(open(path, 'rb'))

where path is the path and file name under which the model was saved.

Note: this only works for basic ML models and PyTorch models. It does not work for TensorFlow-based models, where you need to use model.save(PATH_TO_MODEL_AND_NAME), with model being a tensorflow.keras model.
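The save/load helpers described in these answers can be sketched as a full round trip. A plain dict stands in for a fitted statsmodels result here, and the file name is just an example:

```python
import os
import pickle
import tempfile

def save_model(model, path):
    # 'wb' because pickle writes bytes; the with-block closes the file
    # reliably even if dump() raises.
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# A plain dict stands in for a fitted model in this sketch.
fitted_model = {"alpha": 0.8, "trend": "add"}
path = os.path.join(tempfile.mkdtemp(), "joblib_RL_Model.pkl")

save_model(fitted_model, path)
restored = load_model(path)
```

A round trip like this, run in a fresh environment, is also a quick way to confirm that an object pickled in one notebook can actually be restored in another.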
How to create a template for "could not convert string to float"?
Is there a way to test a CSV file for errors? For example, I have a CSV file downloaded from Kaggle. When I try to run it in Anaconda, it throws an error.

a) How do you test files for string-to-float errors before you run them?
b) Is there a way to set up a template to do this for all files moving forward?

Here is the text from Notepad. I have converted all text to numbers and it still throws an error.

My code:

from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense

# load the dataset
dataset = loadtxt('data.csv', delimiter=',')

data.csv file:

15,1,14,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
34,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
52,5,16,4,1,37,37,1,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0
46,3,21,4,0,0,0,1,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
42,3,23,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
51,3,17,6,1,34,3,0,0,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1
26,1,26,3,0,0,0,1,2,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
45,1,20,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0
44,3,15,0,1,1,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
44,3,26,4,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
27,1,17,3,0,0,0,1,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
45,4,14,6,0,0,0,1,10,1,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
44,2,25,2,0,0,0,1,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
43,2,18,5,0,0,0,0,0,1,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
40,3,18,2,0,0,0,1,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Seems like certain CSV files from Kaggle and elsewhere have encoding issues. Instead of opening the file with the default encoding (which is 'utf-8'), use 'utf-8-sig':

dataset = loadtxt('data.csv', delimiter=',', encoding='utf-8-sig')

Once I create some code to scan for this prior to running it in a deep learning algorithm, I will post it as a follow-up.
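The encoding issue behind this answer is easy to reproduce without Kaggle: a file saved as "UTF-8 with BOM" starts with an invisible \ufeff character, which glues itself onto the first number and breaks float conversion. A stdlib-only sketch:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.csv")

# Write a CSV the way some tools export it: UTF-8 with a byte-order mark.
with open(path, "w", encoding="utf-8-sig") as f:
    f.write("15,1,14\n34,1,0\n")

# Read with plain utf-8: the BOM survives as \ufeff stuck to the first
# field, so float() on it raises "could not convert string to float".
with open(path, encoding="utf-8") as f:
    first_field = f.read().split(",")[0]

# Read with utf-8-sig: the BOM is stripped and the field converts cleanly.
with open(path, encoding="utf-8-sig") as f:
    clean_field = f.read().split(",")[0]

print(repr(first_field), repr(clean_field))
```

Checking whether a file's first bytes are the BOM (b'\xef\xbb\xbf') is one way to build the "test before running" template the question asks for.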
Python read pickle protocol 4 error: STACK_GLOBAL requires str
In Python 3.7.5 on Ubuntu 18.04, a pickle read gives an error (pickle protocol 4). Sample code:

import pickle as pkl

file = open("sample.pkl", "rb")
data = pkl.load(file)

Error:

UnpicklingError                           Traceback (most recent call last)
<ipython-input> in <module>
----> 1 data = pickle.load(file)

UnpicklingError: STACK_GLOBAL requires str

Reading from the same file object again solves the problem. Reading with pandas gives the same problem.
I also had this error; it turned out I was opening a NumPy file with pickle. ;)
Turns out it is a known issue; there is an issue page on GitHub.
I had this problem and just added pckl to the end of the file name.
My problem was that I was trying to pickle and un-pickle across different Python environments - watch out to make sure your pickle versions match!
Perhaps this will be the solution to this error for someone. I needed to load a NumPy array:

torch.load(file)

When I loaded the array, this error appeared. All that is needed is to turn the array into a tensor, for example:

result = torch.from_numpy(np.load(file))
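Following up on the "opening a NumPy file with pickle" answers: .npy files written by np.save() always start with the magic bytes \x93NUMPY, so peeking at the header tells the two formats apart before pickle complains. A stdlib-only sketch (the magic bytes are written by hand here so the demo does not need numpy):

```python
import os
import pickle
import tempfile

NPY_MAGIC = b"\x93NUMPY"

def looks_like_npy(path):
    # Peek at the first bytes; np.save() always writes this magic string.
    with open(path, "rb") as f:
        return f.read(len(NPY_MAGIC)) == NPY_MAGIC

tmp = tempfile.mkdtemp()

# A real pickle file...
pkl_path = os.path.join(tmp, "data.pkl")
with open(pkl_path, "wb") as f:
    pickle.dump([1, 2, 3], f)

# ...and a stand-in for an .npy file (just the magic header, for the demo).
npy_path = os.path.join(tmp, "data.npy")
with open(npy_path, "wb") as f:
    f.write(NPY_MAGIC + b"\x01\x00")

print(looks_like_npy(pkl_path), looks_like_npy(npy_path))
```

When looks_like_npy returns True, the file should go through np.load (or torch.from_numpy(np.load(...))) rather than pickle.load.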
NaN while opening csv file in Google Colab
I am trying to open a CSV file in Google Colab. I read the file with pandas:

import pandas as pd

df = pd.read_csv("airsim_rec.csv", 'r')

When I try to view this dataset with df.head(), I get NaN values. I tried to import the file in different ways. One:

from google.colab import files
uploaded = files.upload()

Another:

!mkdir -p drive
!google-drive-ocamlfuse drive

Neither of them worked. The CSV file is not damaged; when I opened it with the same commands on my own drive, it opened perfectly. Any ideas how to solve this?
Solved! I just needed to not pass 'r' as an argument: pd.read_csv's second positional parameter is the separator, so 'r' was being used as the delimiter instead of the comma.
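What the stray 'r' argument does can be seen with the stdlib csv module: since the second parameter of pd.read_csv is the separator, passing 'r' splits every row at the letter r instead of at commas, which is why the columns came back as garbage/NaN. A minimal sketch of the same mistake (the row content is made up for illustration):

```python
import csv
import io

row = "airsim_rec,12.5,0.3\n"

# Correct delimiter: the row splits into its three comma-separated fields.
good = next(csv.reader(io.StringIO(row), delimiter=","))
print(good)   # ['airsim_rec', '12.5', '0.3']

# 'r' as the delimiter: the row is split at every letter r instead,
# mangling both the field boundaries and the numeric columns.
bad = next(csv.reader(io.StringIO(row), delimiter="r"))
print(bad)    # ['ai', 'sim_', 'ec,12.5,0.3']
```

Dropping the second argument (or passing sep=',' explicitly) restores the intended comma splitting.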