Good Afternoon from Germany, everybody!
Google Colab and I seem to have divergent opinions on what is possible or not...
I just want a way to access the contents of all other cells from within a cell.
My use case is that I want to POST the contents of the current Colab notebook to an external server for grading, with minimal user interaction (just running a cell).
So my question is: Is there a way to access the code cells of a Colab NB programmatically?
I saw this answer for Jupyter notebooks, but it does not work in Google Colab because the Jupyter JavaScript variable is not available. The google.colab variable does not seem to provide the same functionality, or am I missing something?
Google Colab seems to sandbox each cell in its own iframe, so I cannot query the contents of other cells via JS:
%%js
document.getElementsByClassName('cell')
When run in a cell, this just yields an empty HTMLCollection; when run in my browser's developer tools, I get the expected results.
Am I missing something here? Is there a way to escape the sandbox or access the current NB contents within a Python cell?
Thanks in Advance for your help!
Sure, here's a complete example:
https://colab.research.google.com/drive/1mXuyMsPEXFU4ik9EGLBiWUuNfztf7J6_
The key bit is this:
# Obtain the notebook JSON as a string
from google.colab import _message
notebook_json_string = _message.blocking_request('get_ipynb', request='', timeout_sec=5)
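Building on that, here is a rough sketch of how the result might be POSTed to a grading server. The endpoint URL is a placeholder, and the assumption that the returned dict carries the notebook under an 'ipynb' key should be verified by inspecting the result first:
import json
import requests
from google.colab import _message

# Ask the Colab frontend for the current notebook (returned as a dict).
result = _message.blocking_request('get_ipynb', request='', timeout_sec=5)

# Assumption: the notebook JSON sits under an 'ipynb' key; fall back to the
# raw result if not.
notebook = result.get('ipynb', result)

# Hypothetical grading endpoint; replace with your server's URL.
GRADING_URL = 'https://grading.example.com/submit'
response = requests.post(GRADING_URL, data=json.dumps(notebook),
                         headers={'Content-Type': 'application/json'})
print(response.status_code)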
The executed example is reproduced in the linked notebook.
Probably not an ideal solution, but one option may be to read the notebook file directly instead of trying to access the cells internally. The following code mounts the user's Google Drive in the notebook so that you can read the notebook's contents as a file:
from pathlib import Path
from google.colab import drive
# Mount the user's Google Drive into the notebook (takes the
# user through an auth flow to grant access).
drive.mount('/content/drive')
# Read the contents of the notebook file to POST to grading server.
base = Path('/content/drive/MyDrive/Colab Notebooks')
notebook_path = 'notebook.ipynb'
with open(base / notebook_path) as infile:
    notebook_contents = infile.read()
Finding the path of the notebook seems to be tricky, but if it has a standard name you could search for it with base.rglob('*.ipynb') and present the user with a few candidate files to submit, as sketched below.
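A minimal sketch of that search idea; the folder name is the default Colab Notebooks folder and may differ in your setup:
from pathlib import Path

# Default Colab Notebooks folder in the mounted drive; adjust if needed.
base = Path('/content/drive/MyDrive/Colab Notebooks')

# List candidate notebooks so the user can pick the one to submit.
candidates = sorted(base.rglob('*.ipynb'))
for i, path in enumerate(candidates):
    print(f"{i}: {path.relative_to(base)}")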
Related
I am trying to download a large .csv file from my GitHub, where I have it stored, to a notebook I have in Google Colab. Here is the outline of the code I am using:
#download fixed data sets
import pandas as pd
import numpy as np
url_train = 'https://raw.githubusercontent.com/username/data/master/train_fixed.csv?token=[long_string]'
x_train = pd.read_csv(url_train)
Usually this works fine. Frequently, however (but not always), if I close the notebook and re-open it a day later and simply re-run the code, I get a 404 Not Found error for the URL, and I have to go back to GitHub and re-copy the (now changed) raw URL for my file.
I am not sure why this is happening or what to do about it, and I wanted to ask whether anyone else has experienced this problem and what solutions you would recommend. Perhaps the problem is that the repo is private?
If the repo is private, it's likely that the token parameter expires, so that accidental disclosure of the URL does not grant irrevocable access to the data. My recommendation is to construct the URL dynamically, fetching a fresh token in the context of your current session rather than reusing a copied raw URL.
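One hedged way to do that from Python is to go through the GitHub contents API with a personal access token instead of the copied raw URL; the owner, repo, path, and token below are placeholders:
import io
import requests
import pandas as pd

# Placeholders: fill in your repo details and a personal access token.
OWNER, REPO, PATH, BRANCH = 'username', 'data', 'train_fixed.csv', 'master'
TOKEN = '<personal-access-token>'

# Request the raw file contents through the API; the Authorization header
# replaces the short-lived ?token=... query parameter on raw URLs.
resp = requests.get(
    f'https://api.github.com/repos/{OWNER}/{REPO}/contents/{PATH}',
    params={'ref': BRANCH},
    headers={'Authorization': f'token {TOKEN}',
             'Accept': 'application/vnd.github.v3.raw'})
resp.raise_for_status()
x_train = pd.read_csv(io.BytesIO(resp.content))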
I have an Excel file stored in GitHub and Python installed on an AWS machine. I want to read the Excel file from the AWS machine using a Python script. Can someone help me achieve this? So far I have used the code below:
#Importing required Libraries
import pandas as pd
import xlwt
import xlrd
#Formatting WLM data
URL= 'https://github.dev.global.tesco.org/DotcomPerformanceTeam/Sample-WLM/blob/master/LEGO_LIVE_FreshOrderStableProfile_2019_v0.1.xlsx'
data = pd.read_excel(r"URl", sheet_name='WLM', dtype=object)
When I execute this, I get the error below:
IOError: [Errno 2] No such file or directory: 'URl'
You can use the wget command to download the file from GitHub. The key here is to use the raw version of the link; otherwise you will download an HTML file. To get the raw link, click on the file you uploaded on GitHub, then right-click the Raw button and copy the link address. Finally, use it to download the file and then read it with pd.read_excel("Your Excel file URL or disk location"). Example:
#Raw link: https://raw.github.com/<username>/<repo>/<branch>/Excelfile.xlsx
!wget --show-progress --continue -O /content/Excelfile.xlsx https://raw.github.com/<username>/<repo>/<branch>/Excelfile.xlsx
import pandas as pd
df = pd.read_excel("/content/Excelfile.xlsx")
Note: this example applies to Colab; if you are using a local environment, do not use the exclamation mark. You can also find more ideas here: Download single files from GitHub
These instructions are for a CSV file but should work for an Excel file as well.
If the repository is private, you might need to create a personal access token as described in "Creating a personal access token" (pay attention to the permissions, especially if the repository belongs to an organisation).
Click the "raw" button in GitHub. Here below is an example from https://github.com/udacity/machine-learning/blob/master/projects/boston_housing/housing.csv:
If the repo is private and there is no ?token=XXXX at the end of the URL (see below), you might need to create a personal access token and append it to the URL. I can see from your URL that you also need to configure your access token to work with SAML SSO; please read "About identity and access management with SAML single sign-on" and "Authorizing a personal access token for use with SAML single sign-on".
Copy the link to the file from the browser navigation bar, e.g.:
https://raw.githubusercontent.com/udacity/machine-learning/master/projects/boston_housing/housing.csv
Then use code:
import pandas as pd

url = (
    "https://raw.githubusercontent.com/udacity/machine-learning/master"
    "/projects/boston_housing/housing.csv"
)
df = pd.read_csv(url)
In case your repo is private, the link copied would have a token at the end:
https://raw.githubusercontent.com/. . ./my_file.csv?token=XXXXXXXXXXXXXXXXXX
I have been trying to get the file path of my CSV file in Watson Studio. The file is saved in my project's data assets in Watson Studio, and all I need is the file path to read its contents in a Jupyter notebook. I'm trying to use a simple Python file reader that reads a file at a specified path. I have tried using Watson Studio's "insert file credentials", but I can't get it to work.
This works fine when I run the same file on the IBM cognitiveclass.ai platform, but I can't get it to work in IBM Watson Studio. Please help.
The file name is enrollments.csv:
import unicodecsv
with open('enrollments.csv', 'rb') as f:
    reader = unicodecsv.DictReader(f)
    enrollments = list(reader)
I assume you mean you uploaded the "enrollments.csv" file to the Files section.
This uploads the file to the bucket of the Cloud Object Storage instance that provides storage for your project.
You can use project-lib to fetch the file URL.
# Import the lib
from project_lib import Project
project = Project(sc, "<ProjectId>", "<ProjectToken>")
# Get the url
url = project.get_file_url("myFile.csv")
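If you want the file contents rather than a URL, project-lib also has a get_file method that, as far as I know, returns the file as an in-memory buffer; a hedged sketch:
import pandas as pd

# get_file returns the asset's bytes as a file-like buffer (assumption based
# on the project-lib docs); pandas can read it directly.
buffer = project.get_file("myFile.csv")
buffer.seek(0)
df = pd.read_csv(buffer)
df.head()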
For more, refer to:
https://dataplatform.cloud.ibm.com/docs/content/analyze-data/project-lib-python.html
https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/a972effc-394f-4825-af91-874cb165dcfc/view?access_token=ee2bd90bee679afc278cdb23453946a3922c454a6a7037e4bd3c4b0f90eb0924
For the sake of future readers, try this one.
Upload your csv file as your asset in your Watson Studio Project (you can also do this step later).
Open your notebook. On the top ribbon, in the upper-right corner of the page (below your name), click the data icon (the one that looks like "1010").
Make sure you're on the Files tab; below it you will see the list of your uploaded datasets (you can also upload files here).
Click the drop-down and choose "pandas DataFrame" to add a block of code that loads the uploaded data into your notebook. Note that you should select a blank cell first so that the generated code doesn't land in a cell that already contains code.
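For reference, the generated "pandas DataFrame" cell is roughly of this shape; the credentials, endpoint, and bucket below are placeholders that the tool fills in for you, and the exact generated code may differ between Watson Studio versions:
import io
import pandas as pd
import ibm_boto3
from ibm_botocore.client import Config

# Placeholders: Watson Studio inserts your project's Cloud Object Storage
# credentials and endpoint here.
cos_client = ibm_boto3.client(
    service_name='s3',
    ibm_api_key_id='<API_KEY>',
    ibm_auth_endpoint='https://iam.cloud.ibm.com/identity/token',
    config=Config(signature_version='oauth'),
    endpoint_url='<COS_ENDPOINT_URL>')

# Fetch the uploaded CSV from the project's bucket and load it into pandas.
obj = cos_client.get_object(Bucket='<BUCKET_NAME>', Key='enrollments.csv')
df = pd.read_csv(io.BytesIO(obj['Body'].read()))
df.head()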
I struggled to define the path in Watson as well.
Here is what worked for me:
Within a Project, select the "Settings" tab. I believe the default view is on the "Assets" tab.
Create a token. Scroll down to "Access Tokens". Then click on "New token"
Go back to "Assets" and open your notebook.
Click the three vertical dots in the notebook header. One option is "Insert project token". This creates a new code cell that initializes the Project object with the correct parameters, roughly as shown below.
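As far as I remember, the inserted cell is just a project-lib initialization along these lines (the id and token are filled in automatically; the values below are placeholders):
# @hidden_cell
# The project token is an authorization token used to access project resources.
from project_lib import Project
project = Project(project_id='<PROJECT_ID>', project_access_token='<PROJECT_TOKEN>')
pc = project.project_context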
I think you are really asking how you can read a file from the assets in your Watson Studio project. This is documented here: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.0?topic=lib-watson-studio-python
# Import the lib
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space()
# Fetch the data from a file
my_file = wslib.load_data("my_asset_name.csv")
# Read the CSV data file into a pandas DataFrame
my_file.seek(0)
import pandas as pd
pd.read_csv(my_file, nrows=10)
[Screenshot: project file in the project's assets]
[Screenshot: the project file read in a notebook]
The old project-lib has been deprecated; see the deprecation announcement: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.0?topic=notebook-using-project-lib-python-deprecated
To access Google Drive files, you need to call google.colab.auth.authenticate_user(), which presents a link to an authentication screen that gives you a key to paste back into the original notebook.
Is it possible to skip this altogether? After all, the notebook is already 'linked' to a specific account.
Is it possible to save this token hard-coded in the notebook for future runs?
Is it possible to create a token that can access only some files (useful when sharing the notebook with others, when you want to give access only to some data files)?
Is it possible to simplify the process (make it a single click, without needing to copy and paste the token)?
Nope, there's no way to avoid this step at the moment.
No, there's no safe way to save this token between runs.
Sharing the notebook doesn't share the token. Another user executing your notebook will go through the auth flow as themselves, and will only be able to use the token they get for Drive files they already have access to.
Sadly, not right now. :)
I have a simple script set up for getting a file off my Google Drive account and updating it. I have no problems authenticating and getting access to the drive. The file is a Google spreadsheet, so once I have the PyDrive file object, I get the URL of the file in CSV format via google_file['exportLinks']['text/csv']. This has worked in the past; however, today I tried the same method on a new file and, instead of getting the CSV form of the data, I keep getting HTML. In addition, if I copy the link from google_file['exportLinks']['text/csv'] and paste it into a browser, the browser downloads the file in CSV format as requested. I really have no idea what is going on, especially since this has worked in the past.
Here is basically what my code does:
drive = GoogleDrive(gauth)
drivefiles = drive.ListFile().GetList()
form_file = None
for f in drivefiles:
    if f['title'] == formFileName:
        form_file = f
        break
output = requests.get(form_file['exportLinks']['text/csv'])
print output.text  # this ends up being HTML, not text/csv
Has anyone else out there seen this problem before? Should I just try to delete and re-add the google spreadsheet file on the drive?
EDIT: UPDATE
So, after changing the permissions on the file in Google Drive from "accessible only to people the file has been shared with" to "accessible/editable by all", I was able to access and download the file. Does the client_secrets.json file allow a particular Google Drive user to authorize securely, or is it a general key that allows anyone holding it to access the Google API in that particular session? Is there any special data that has to be sent with the HTTP request if the file has only been shared with a limited set of email addresses?
I also found handling the idiosyncrasies of the Google Drive API a little tricky, so I have written a wrapper around it that makes it relatively easy to work with. See if it helps:
Google Drive Client
Sample code:
file_id = 'abc'
file_access_token = '34244324324'
scope = 'https://www.googleapis.com/auth/drive.readonly'
path_to_private_key_file = '#'
service_account_name = '284765467291-0qnqb1do03dlaj0crl88srh4pbhocj35#developer.gserviceaccount.com'

drive_client = GoogleDriveClient(
    scope=scope,
    private_key=open(path_to_private_key_file).read(),
    service_account_name=service_account_name)

file = drive_client.get_drive_file(file_id, file_access_token)
Of course, you need to change the access variables according to your profile.
In case anyone else has been having problems with this: changing the file's sharing settings in Drive to "everyone can view" solved the problem, so it seems there were some permission restrictions. I would suggest changing the permissions on the file to the least restrictive setting possible while debugging.
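If you would rather loosen the sharing settings from code than through the Drive UI, PyDrive exposes InsertPermission on the file object; a sketch, assuming form_file is the PyDrive file object from the question:
# Make the file readable by anyone with the link, so the export URL no
# longer comes back as an HTML sign-in/permission page.
form_file.InsertPermission({
    'type': 'anyone',
    'value': 'anyone',
    'role': 'reader',
})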