Accessing '.pickle' file in Google Colab - python

I am fairly new to using Google's Colab as my go-to tool for ML.
In my experiments, I have to use the 'notMNIST' dataset, and I have saved the 'notMNIST' data as notMNIST.pickle in my Google Drive under a folder called Data.
Having said this, I want to access this '.pickle' file in my Google Colab so that I can use this data.
Is there a way I can access it?
I have read the documentation and some questions on StackOverflow, but they deal with uploading and downloading files and/or working with 'Sheets'.
However, what I want is to load the notMNIST.pickle file in the environment and use it for further processing.
Any help will be appreciated.
Thanks!

You can try the following:
from google.colab import drive
import pickle

drive.mount('/content/drive')
# After mounting, your Drive contents appear under 'My Drive'
DATA_PATH = '/content/drive/My Drive/Data'
with open(DATA_PATH + '/notMNIST.pickle', 'rb') as infile:
    best_model2 = pickle.load(infile)

The data in Google Drive resides in the cloud, while Colaboratory gives you a personal Linux virtual machine on which your notebooks run. So you need to download the data from Google Drive to your Colaboratory virtual machine before you can use it. You can follow this download tutorial.
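If you mount your Drive instead of going through the API, the copy onto the VM's local disk can be sketched like this; the Data folder and file names below are assumptions taken from the question, not fixed paths:

```python
import os
import shutil

def copy_to_vm(src, dst):
    """Copy a file from the mounted Drive into the Colab VM's local disk
    and return its size in bytes as a quick sanity check."""
    shutil.copy(src, dst)
    return os.path.getsize(dst)

# Hypothetical usage once drive.mount('/content/drive') has run:
# copy_to_vm('/content/drive/My Drive/Data/notMNIST.pickle',
#            '/content/notMNIST.pickle')
```

Reading from the local copy is then faster than reading through the Drive mount on every access.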

Thanks, guys, for your answers. Google Colab has quickly grown into a mature development environment, and my favorite feature is the 'Files' tab.
We can easily upload the model to the folder we want and access it as if it were on a local machine.
This solves the issue.
Thanks.
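As a minimal sketch of that workflow (the path below is a hypothetical example; use whatever path the Files tab shows for your file):

```python
import pickle

def load_pickle(path):
    """Load a pickle from any path visible in Colab's Files tab."""
    with open(path, 'rb') as f:
        return pickle.load(f)

# Hypothetical path once Drive is mounted:
# data = load_pickle('/content/drive/My Drive/Data/notMNIST.pickle')
```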

You can use pydrive for that. First, you need to find the ID of your file.
# Install the PyDrive wrapper & import libraries.
# This only needs to be done once per notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Download a file based on its file ID.
#
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
listed = drive.ListFile({'q': "title contains '.pkl' and 'root' in parents"}).GetList()
for file in listed:
    print('title {}, id {}'.format(file['title'], file['id']))
You can then load the file using the following code:
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
import io
import pickle
from googleapiclient.http import MediaIoBaseDownload
file_id = 'laggVyWshwcyP6kEI-y_W3P8D26sz'
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while not done:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()
downloaded.seek(0)
f = pickle.load(downloaded)


Error downloading a file from Google Drive

I exported some images from Google Earth Engine to Google Drive. I need to download those images to a local drive using a Python script. I tried to use oauth2client and apiclient, as I saw here:
I got a list of files in Drive and the corresponding IDs, then I use the ID to try to download the file using the gdown lib:
gdown.download(f'https://drive.google.com/uc?id={file_data["id"]}',
               f'{download_path}{os.sep}{filename_to_download}.tif')
I got the following error message:
Access denied with the following error:
Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
You may still be able to access the file from the browser:
https://drive.google.com/uc?id=<id>
As I got the Drive file list, I suppose the Drive authentication is OK. If I open the link suggested in the error message in a browser, I can download the file. If I check the file's properties in Drive, I see:
Who can access: not shared.
What should I do to download the files?
This is the complete code:
# https://medium.com/swlh/google-drive-api-with-python-part-i-set-up-credentials-1f729cb0372b
# https://levelup.gitconnected.com/google-drive-api-with-python-part-ii-connect-to-google-drive-and-search-for-file-7138422e0563
# https://stackoverflow.com/questions/38511444/python-download-files-from-google-drive-using-url
import os

from apiclient import discovery
from httplib2 import Http
from oauth2client import client, file, tools
import gdown


class GoogleDrive(object):
    # define API scope
    def __init__(self, secret_credentials_file_path='./credentials'):
        self.DriveFiles = None
        SCOPE = 'https://www.googleapis.com/auth/drive'
        self.store = file.Storage(f'{secret_credentials_file_path}{os.sep}credentials.json')
        self.credentials = self.store.get()
        if not self.credentials or self.credentials.invalid:
            flow = client.flow_from_clientsecrets(
                f'{secret_credentials_file_path}{os.sep}client_secret.json', SCOPE)
            self.credentials = tools.run_flow(flow, self.store)
        oauth_http = self.credentials.authorize(Http())
        self.drive = discovery.build('drive', 'v3', http=oauth_http)

    def RetrieveAllFiles(self):
        results = []
        page_token = None
        while True:
            try:
                param = {}
                if page_token:
                    param['pageToken'] = page_token
                files = self.drive.files().list(**param).execute()
                # append the files from the current result page to our list
                results.extend(files.get('files'))
                # the Google Drive API returns our files in multiple pages when the number of files exceeds 100
                page_token = files.get('nextPageToken')
                if not page_token:
                    break
            except Exception as error:
                print(f'An error has occurred: {error}')
                break
        self.DriveFiles = results

    def GetFileData(self, filename_to_search):
        for file_data in self.DriveFiles:
            if file_data.get('name') == filename_to_search:
                return file_data
        else:
            return None

    def DownloadFile(self, filename_to_download, download_path):
        file_data = self.GetFileData(f'{filename_to_download}.tif')
        gdown.download(f'https://drive.google.com/uc?id={file_data["id"]}',
                       f'{download_path}{os.sep}{filename_to_download}.tif')
Google Drive may not be the best tool for this. You may want to upload the images to a raw file hosting service like Imgur and download them with requests; you can then read the file from disk, or skip writing it altogether and use image.content directly. Here's an example:
import requests

image = requests.get("https://i.imgur.com/5SMNGtv.png")
with open("image.png", 'wb') as file:
    file.write(image.content)
(You can specify where the file is downloaded by adding the path before the file name, like this:)
image = requests.get("https://i.imgur.com/5SMNGtv.png")
with open("C://Users//Admin//Desktop//image.png", 'wb') as file:
    file.write(image.content)
Solution 1.
Access denied with the following error:
Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
You may still be able to access the file from the browser:
https://drive.google.com/uc?id=<id>
In the sharing tab on Google Drive (right-click on the image, then open Share or Get link), change the permission to 'Anyone with the link'. Hopefully your code will then work.
Solution 2.
If you can use Google Colab, then you can mount gdrive easily and access files there using
from google.colab import drive
drive.mount('/content/gdrive')
Google has a policy of not accepting your regular Google/Gmail password when third-party apps authenticate. They only accept so-called "App Passwords" that you need to create for your Google account.

How do I automatically generate files to the same google drive folder as my colab notebook?

I am performing LDA on a simple Wikipedia dump file, but the code I am following needs to output the articles to a file. I need some guidance, as Python and Colab are really broad and I can't seem to find an answer to this specific problem. Here's my code for mounting Google Drive:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate the user
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Get your file
fileId ='xxxx'
fileName = 'simplewiki-20170820-pages-meta-current-reduced.xml'
downloaded = drive.CreateFile({'id': fileId})
downloaded.GetContentFile(fileName)
And here's the culprit; this code is trying to create a file from the article:
if article_txt is not None and article_txt != "" and len(article_txt) > 150 and is_ascii(article_txt):
    outfile = dir_path + str(i+1) + "_article.txt"
    f = codecs.open(outfile, "w", "utf-8")
    f.write(article_txt)
    f.close()
    print(article_txt)
I have tried so many things already that I can't recall them all. Basically, what I need to know is how to convert this code so that it works with Google Drive. I've been trying solutions for hours. Something I recall doing is converting the code into this:
file_obj = drive.CreateFile()
file_obj['title'] = "file name"
But then I got the error 'expected str, bytes or os.PathLike object, not GoogleDriveFile'. It's not a question of how to upload a file and open it with Colab, as I already know how to do that with the XML file; what I need to know is how to generate files through my Colab script and place them in the same folder as the script. Any help would be appreciated. Thanks!
I am not sure whether the problem is with generating the files or with copying them to Google Drive. If it is the latter, a simpler approach would be to mount your Drive directly on the instance as follows:
from google.colab import drive
drive.mount('drive')
You can then access any item in your drive as if it were a hard disk and copy your files using bash commands:
!cp filename 'drive/My Drive/folder1/'
Another alternative is to use shutil:
import shutil
shutil.copy(filename, 'drive/My Drive/folder1/')

creating a data bunch with fastai in colab using data from google drive

I am trying to load a dataset into Google Colab from my Google Drive account using fast.ai.
I am using as a dataset the alien-versus-predator set from Kaggle, from here.
I downloaded it and loaded it into my Google Drive. Then I ran this code:
# Load the Drive helper and mount
from google.colab import drive
drive.mount("/content/drive")
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai import *
from fastai.vision import *
path='/content/drive/My Drive/FastaiData/Alien-vs-Predator'
tfms = get_transforms(do_flip=False)
#default bs=64, set image size=100 to run successfully on colab
data = ImageDataBunch.from_folder(path,ds_tfms=tfms, size=100)
and then I got this error:
/usr/local/lib/python3.6/dist-packages/fastai/data_block.py:399: UserWarning: Your training set is empty. Is this is by design, pass `ignore_empty=True` to remove this warning.
/usr/local/lib/python3.6/dist-packages/fastai/data_block.py:402: UserWarning: Your validation set is empty. Is this is by design, use `no_split()`
or pass `ignore_empty=True` when labelling to remove this warning.
IndexError Traceback (most recent call last)
<ipython-input-3-5b3f66a4d360> in <module>()
4
5 #default bs=64, set image size=100 to run successfully on colab
----> 6 data = ImageDataBunch.from_folder(path,ds_tfms=tfms, size=100)
7
8 data.show_batch(rows=3, figsize=(10,10))
/usr/local/lib/python3.6/dist-packages/fastai/vision/data.py in from_folder(cls, path, train, valid, valid_pct, classes, **kwargs)
118 if valid_pct is None: src = il.split_by_folder(train=train, valid=valid)
119 else: src = il.random_split_by_pct(valid_pct)
--> 120 src = src.label_from_folder(classes=classes)
121 return cls.create_from_ll(src, **kwargs)
122
and so on...
It seems it found the folder I indicated, but it reports that the train and validation sets are empty, which is not true.
Thank you for your help
After uploading files to Google Drive, you can use PyDrive. Sample snippet:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# PyDrive reference:
# https://gsuitedevs.github.io/PyDrive/docs/build/html/index.html
# 2. Create & upload a file text file.
uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))
# 3. Load a file by ID and print its contents.
downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))
For detailed reference, please refer to https://colab.research.google.com/notebooks/io.ipynb#scrollTo=zU5b6dlRwUQk

How to upload csv file into google drive and read it from same into python

I have a Google Drive with my csv file already uploaded; the share link for that file is:
https://drive.google.com/open?id=1P_UYUsgvGXUhPCKQiZWlEAynKoeldWEi
I also know the directory of the drive on my machine:
C:/Users/.../Google Drive/
Please give me a step-by-step guide to reading this csv file directly from Google Drive into Python, rather than downloading it to my PC first.
I have searched this forum and tried some given solutions such as:
How to upload csv file (and use it) from google drive into google colaboratory
It did not work for me; it resulted in the error below:
3 from pydrive.auth import GoogleAuth
4 from pydrive.drive import GoogleDrive
----> 5 from google.colab import auth
6 from oauth2client.client import GoogleCredentials
7
ModuleNotFoundError: No module named 'google.colab'
You don't need much of that example to upload a file to Google Drive:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
# access the drive
gauth = GoogleAuth()
drive = GoogleDrive(gauth)
# the file you want to upload, here simple example
f = drive.CreateFile()
f.SetContentFile('document.txt')
# upload the file
f.Upload()
print('title: %s, mimeType: %s' % (f['title'], f['mimeType']))
# read all files, the newly uploaded file will be there
file_list = drive.ListFile({'q': "'root' in parents and trashed=false"}).GetList()
for file1 in file_list:
print('title: %s, id: %s' % (file1['title'], file1['id']))
Note: in this example I created an empty file instead of using an existing one; you just have to change it to load the csv file from the local PC where the Python script is running.
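To adapt the example to a csv, SetContentFile just needs the path of an existing local file. Here is a small sketch; the sample file name is hypothetical, and the upload lines are commented out because they need an authenticated `drive` client:

```python
import csv

def write_sample_csv(path):
    """Write a tiny csv locally so there is a real file to upload."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['a', 'b'])
        writer.writerow([1, 2])
    return path

# Hypothetical upload, given an authenticated PyDrive client `drive`:
# f = drive.CreateFile({'title': 'sample.csv'})
# f.SetContentFile(write_sample_csv('sample.csv'))
# f.Upload()
```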
Kind regards
Here is a simple approach I use for all my csv files stored in Google Drive.
First import the necessary libraries that will facilitate your connection.
!pip install -U -q PyDrive
from google.colab import auth
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from oauth2client.client import GoogleCredentials
The next step is authentication and creating the PyDrive client in order to connect to your Drive.
This should give you a link to connect to the Google Cloud SDK.
Select the Google Drive account you want to access, copy the authorization code, and paste it into the text field prompt in your Colab notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
To get the file, you will need the id of the file in Google Drive.
downloaded = drive.CreateFile({'id':'1P_UYUsgvGXUhPCKQiZWlEAynKoeldWEi'}) # replace the id with id of the file you want to access
downloaded.GetContentFile('file.csv')
Finally, you can read the file as pandas dataframe.
import pandas as pd
df = pd.read_csv('file.csv')

How to upload csv file (and use it) from google drive into google colaboratory

I wanted to try out Python, and Google Colaboratory seemed the easiest option. I have some files in my Google Drive and wanted to upload them into Google Colaboratory.
So here is the code that I am using:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# 2. Create & upload a file text file.
uploaded = drive.CreateFile({'xyz.csv': 'C:/Users/abc/Google Drive/def/xyz.csv'})
uploaded.Upload()
print('Uploaded file with title {}'.format(uploaded.get('title')))
import pandas as pd
xyz = pd.read_csv('Untitled.csv')
Basically, for user "abc", I wanted to upload the file xyz.csv from the folder "def".
I can upload the file, but when I ask for the title it says the title is "Untitled".
When I ask for the ID of the uploaded file, it changes every time, so I cannot use the ID.
How do I read the file, and how do I set a proper file name?
xyz = pd.read_csv('Untitled.csv') doesn't work
xyz = pd.read_csv('Untitled') doesn't work
xyz = pd.read_csv('xyz.csv') doesn't work
Here are some other links that I found:
How to import and read a shelve or Numpy file in Google Colaboratory?
Load local data files to Colaboratory
To read a csv file from my Google Drive into Colaboratory, I needed to do the following steps:
1) I first needed to authorize Colaboratory to access my Google Drive with PyDrive. I used their code example for that (pasted below).
2) I also needed to log into drive.google.com to find the target ID of the file I wanted to download. I found this by right-clicking on the file and copying the shared link; the ID looks something like this: '1BH-rffqv_1auzO7tdubfaOwXzf278vJK'
3) Then I ran downloaded.GetContentFile('myName.csv'), putting in the name I wanted (in your case it is xyz.csv).
This seems to work for me!
I used the code they provided in their example:
# Code to read csv file into colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
#2. Get the file
downloaded = drive.CreateFile({'id':'1BH-rffqv_1auzO7tdubfaOwXzf278vJK'}) # replace the id with id of file you want to access
downloaded.GetContentFile('xyz.csv')
#3. Read file as panda dataframe
import pandas as pd
xyz = pd.read_csv('xyz.csv')
Okay, I'm pretty sure I'm quite late, but I'd like to put this out there, just in case.
I think the easiest way you could do this is:
from google.colab import drive
drive.mount("/content/drive")
This will generate a link; click on it and sign in using Google OAuth, paste the key into the Colab cell, and you're connected!
Check out the list of available files in the sidebar on the left and copy the path of the file you want to access. Then read it as you would any other file.
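For instance, once the mount succeeds, a csv can be read straight through the mounted path; the exact path below is a hypothetical example, so substitute the one you copied from the sidebar:

```python
import pandas as pd

def read_drive_csv(path):
    """Read a csv through the mounted Drive path, just like a local file."""
    return pd.read_csv(path)

# Hypothetical path copied from the Files sidebar:
# xyz = read_drive_csv('/content/drive/My Drive/def/xyz.csv')
```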
files.create takes a file body in its first parameter. If you check the documentation for files.create, there are a number of fields you can fill out. In the example below you would add them to file_metadata, comma separated.
file_metadata = {'name': 'photo.jpg'}
media = MediaFileUpload('files/photo.jpg',
                        mimetype='image/jpeg')
file = drive_service.files().create(body=file_metadata,
                                    media_body=media,
                                    fields='id').execute()
I suggest you read the file upload section of the documentation to get a better idea of how upload works and which files can actually be read from within Google Drive. I am not sure that this is going to give you access to Google Colaboratory.
Possible fix for your code.
I am not a Python dev, but my guess would be that you can set your title by doing this:
uploaded = drive.CreateFile({'xyz.csv': 'C:/Users/abc/Google Drive/def/xyz.csv',
                             'name': 'xyz.csv'})
I think it's that simple with this command
# Mount Google Drive
import os
from google.colab import drive
drive.mount('/content/drive')
!pwd
!ls
import pandas as pd
df = pd.read_csv('Untitled.csv')
It will require authorization with your Google OAuth and will create an authorization key; put the key into the Colab cell.
Please be aware: sometimes the files within the Google Colab directory are not updated or in sync with Google Drive if you delete or add files in your Drive.
