I would like to download a Kaggle dataset. I generated the kaggle.json file, but unfortunately I don't have a Drive (I can't use it). Is there any option to generate the username and token directly in the code?
For example, I tried this:
import json
from kaggle.api.kaggle_api_extended import KaggleApi

x = '{"username":"<USERNAME>","key":"<TOKEN>"}'
y = json.loads(x)
api = KaggleApi(y)
api.authenticate()
files = api.competition_download_files("two-sigma-financial-news")
The error is
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-6-237de0539a08> in <module>()
1 api = KaggleApi(y)
----> 2 api.authenticate()
3 files = api.competition_download_files("two-sigma-financial-news")
/usr/local/lib/python3.6/dist-packages/kaggle/api/kaggle_api_extended.py in authenticate(self)
164 raise IOError('Could not find {}. Make sure it\'s located in'
165 ' {}. Or use the environment method.'.format(
--> 166 self.config_file, self.config_dir))
167
168 # Step 3: load into configuration!
OSError: Could not find kaggle.json. Make sure it's located in /root/.kaggle. Or use the environment method.
But it doesn't work. Could someone help me, please? I'm using Colab, but I don't want to store the JSON file in my Google Drive. Is there any option to generate the JSON file directly?
Thanks in advance.
Maybe this post helps: https://www.kaggle.com/general/51898
It links to this script:
# Info on how to get your api key (kaggle.json) here: https://github.com/Kaggle/kaggle-api#api-credentials
!pip install kaggle
api_token = {"username":"USERNAME","key":"API_KEY"}
import json
import zipfile
import os
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(api_token, file)
!chmod 600 /content/.kaggle/kaggle.json
!kaggle config path -p /content
!kaggle competitions download -c jigsaw-toxic-comment-classification-challenge
os.chdir('/content/competitions/jigsaw-toxic-comment-classification-challenge')
for file in os.listdir():
    zip_ref = zipfile.ZipFile(file, 'r')
    zip_ref.extractall()
    zip_ref.close()
from: https://gist.github.com/jayspeidell/d10b84b8d3da52df723beacc5b15cb27
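Alternatively, the "environment method" mentioned in the error message avoids writing kaggle.json entirely: the Kaggle client also reads the KAGGLE_USERNAME and KAGGLE_KEY environment variables. A minimal sketch, assuming you substitute your own credentials:
import os

# The Kaggle client falls back to these environment variables when no
# kaggle.json is present ("the environment method" from the error message).
os.environ['KAGGLE_USERNAME'] = '<USERNAME>'
os.environ['KAGGLE_KEY'] = '<TOKEN>'

# Import only after setting the variables: many versions of the package
# try to authenticate as soon as it is imported.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
files = api.competition_download_files("two-sigma-financial-news")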
Related
I'm writing code in Google Colab, and at some point I need to read a data file. This data file is called "apr.dat" and is inside a folder called "Eos_table". I hosted this folder on the Drive and used the following structure to access it:
import os
import numpy as np

def set_eos(file):
    directory = os.getcwd()
    data_eos = np.loadtxt(directory + str("\\") + str("\\") + str("Eos_table") + str("\\") + str("\\") + str(file), skiprows=1)
    ...
    return ...

file_eos = "apr.dat"
ep_eos = set_eos(file_eos)[0]
But Google Colab returns an error, and I don't know whether it's caused by the way I built the directory path or by something else. The error is:
OSError: /content\\Eos_table\\apr.dat not found.
What am I doing wrong? How can I fix this error?
Thanks
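A likely fix, sketched below: Colab runs on Linux, where the path separator is "/" rather than "\", so the literal backslashes end up inside the file name instead of separating directories, which is exactly what the error shows. Building the path with os.path.join uses the correct separator on any OS; the function and file names below are the ones from the question:
import os
import numpy as np

def set_eos(file):
    # os.path.join picks the right separator for the OS Colab runs on (Linux).
    path = os.path.join(os.getcwd(), "Eos_table", file)
    data_eos = np.loadtxt(path, skiprows=1)
    return data_eos

file_eos = "apr.dat"
ep_eos = set_eos(file_eos)[0]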
Good day, friends.
I'm trying to load my own file into Google Colab from my own disk, and I use code with image.load_img. But the program thinks that there is no such file. I can see this file and don't agree with Google. )
Could you please advise how I can make the code work correctly, and tell me if I made any mistake? And what is the right way to type the path when the file is on my PC versus in a Colab folder?
Thank you very much.
Code is:
from tensorflow.keras import utils
from tensorflow.keras.preprocessing import image
import numpy as np
import pandas as pd
import pylab
from google.colab import files
from PIL import Image
path = 'C:\XYZ\pic7.jpg'
x = image.load_img(path, target_size = (800, 600), color_mode = 'grayscale')
Colab says:
...
FileNotFoundError Traceback (most recent call last)
<ipython-input-43-a3755b9d97b5> in <module>()
9 path = 'C:\XYZ\pic7.jpg'
10
---> 11 x = image.load_img(path, target_size = (800, 600), color_mode = 'grayscale')
1 frames
/usr/local/lib/python3.7/dist-packages/keras_preprocessing/image/utils.py in load_img(path, grayscale, color_mode, target_size, interpolation)
111 raise ImportError('Could not import PIL.Image. '
112 'The use of `load_img` requires PIL.')
--> 113 with open(path, 'rb') as f:
114 img = pil_image.open(io.BytesIO(f.read()))
115 if color_mode == 'grayscale':
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\XYZ\\pic7.jpg'
Google Colab runs on a remote server, not your local machine, so it has no access to "C:\" or any of your local drives.
See the examples of how to work with external data in Colab (https://colab.research.google.com/notebooks/io.ipynb), including mounting Google Drive; you'll need to put your images there first.
For those who are interested in the solution:
I mounted my Google Drive, as described at https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA (Aneroid advised this link),
and changed the path to '/content/drive/MyDrive/myfolder/pic7.jpg'.
After that the program stopped criticizing my path, and the error changed. That's a success for me. )
Best wishes to Aneroid.
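Put together, the working version looks roughly like this (a sketch assuming the image was uploaded to a Drive folder named myfolder, as in the follow-up above):
from google.colab import drive
from tensorflow.keras.preprocessing import image

# Mount Google Drive into the Colab filesystem.
drive.mount('/content/drive')

# Use the Linux-style path under the mount point, not a local 'C:\...' path.
path = '/content/drive/MyDrive/myfolder/pic7.jpg'
x = image.load_img(path, target_size=(800, 600), color_mode='grayscale')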
To use files from Google Drive in Google Colab, I used this code:
from google.colab import drive
drive.mount('/content/gdrive')
!ln -s /content/gdrive/My\ Drive/ /mydrive
!ls /mydrive
Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
app 'Colab Notebooks' lixo 'My Drive' pixellib yolov3
Inside of pixellib I have the folder meat and the file pretraining.h5.
I installed this too:
!pip3 install pixellib
Until here, ok, but when I run this code:
import pixellib
from pixellib.custom_train import instance_custom_training
train_maskrcnn = instance_custom_training()
train_maskrcnn.modelConfig(network_backbone = "resnet101", num_classes= 2, batch_size = 4)
train_maskrcnn.load_pretrained_model("/mydrive/pixellib/pretraining.h5")
train_maskrcnn.load_dataset("/mydrive/pixellib/meat")
train_maskrcnn.train_model(num_epochs = 20, augmentation=True, path_trained_models = "/mydrive/pixellib/mask_rcnn_models")
the following error message appears:
Using resnet101 as network backbone For Mask R-CNN model
---------------------------------------------------------------------------
IsADirectoryError Traceback (most recent call last)
<ipython-input-44-8a3b66f50c89> in <module>()
7 train_maskrcnn.modelConfig(network_backbone = "resnet101", num_classes= 2, batch_size = 4)
8 train_maskrcnn.load_pretrained_model("/mydrive/pixellib/pretraining.h5")
----> 9 train_maskrcnn.load_dataset("/mydrive/pixellib/meat")
10 train_maskrcnn.train_model(num_epochs = 20, augmentation=True, path_trained_models = "/mydrive/pixellib/mask_rcnn_models")
7 frames
/usr/local/lib/python3.6/dist-packages/PIL/Image.py in open(fp, mode)
2807
2808 if filename:
-> 2809 fp = builtins.open(filename, "rb")
2810 exclusive_fp = True
2811
IsADirectoryError: [Errno 21] Is a directory: '/mydrive/pixellib/meat/train/78-3.jpg'
What kind of error is that?
Is something wrong with my code?
If you have a JSON file for every image in this directory (as you said in a comment), it seems likely to me that one of your JSON files has a .jpg ending. Either download both the JSON and the JPG and confirm that they are the correct file types, or try to plot the image to see whether it really is an image.
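A quick way to check, sketched below (the dataset path is the one from the question; the scan itself is plain os.walk): since the traceback says the .jpg path is a directory, this lists anything under the dataset that is a directory but named like an image file.
import os

dataset_root = '/mydrive/pixellib/meat'  # path from the question

# The IsADirectoryError above points at an entry that is named like an
# image but is actually a directory; list all such entries.
for dirpath, dirnames, filenames in os.walk(dataset_root):
    for name in dirnames:
        if name.lower().endswith(('.jpg', '.jpeg', '.png')):
            print('Directory named like an image:', os.path.join(dirpath, name))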
I am performing LDA on a simple Wikipedia dump file, but the code I am following needs to write the articles out to files. I need some guidance, as Python and Colab are really broad and I can't seem to find an answer to this specific problem. Here's my code for accessing Google Drive:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate the user
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Get your file
fileId ='xxxx'
fileName = 'simplewiki-20170820-pages-meta-current-reduced.xml'
downloaded = drive.CreateFile({'id': fileId})
downloaded.GetContentFile(fileName)
And here's the culprit; this code tries to create a file from each article:
if not article_txt == None and not article_txt == "" and len(article_txt) > 150 and is_ascii(article_txt):
    outfile = dir_path + str(i+1) + "_article.txt"
    f = codecs.open(outfile, "w", "utf-8")
    f.write(article_txt)
    f.close()
    print(article_txt)
I have tried so many things already that I can't recall them all. Basically, what I need to know is how to convert this code so that it works with Google Drive. I've been trying solutions for hours now. Something I recall doing is converting the code into this:
file_obj = drive.CreateFile()
file_obj['title'] = "file name"
But then I got the error 'expected str, bytes or os.PathLike object, not GoogleDriveFile'. It's not a question of how to upload a file and open it with Colab, as I already know how to do that with the XML file; what I need to know is how to generate files through my Colab script and place them in the same folder as my script. Any help would be appreciated. Thanks!
I am not sure whether the problem is with generating the files or with copying them to Google Drive. If it is the latter, a simpler approach would be to mount your Drive directly in the instance as follows:
from google.colab import drive
drive.mount('drive')
You can then access any item in your drive as if it were a hard disk and copy your files using bash commands:
!cp filename 'drive/My Drive/folder1/'
Another alternative is to use shutil:
import shutil
shutil.copy(filename, 'drive/My Drive/folder1/')
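Applied to the loop from the question, you can also skip the copy step and write the files straight into the mounted Drive folder. A sketch reusing article_txt, i, and is_ascii from the question's snippet; 'folder1' is a placeholder for your target Drive folder:
import codecs
from google.colab import drive

drive.mount('drive')
dir_path = 'drive/My Drive/folder1/'  # hypothetical target folder on Drive

if article_txt and len(article_txt) > 150 and is_ascii(article_txt):
    # Writing under the mount point puts the file directly into Google Drive.
    outfile = dir_path + str(i+1) + "_article.txt"
    with codecs.open(outfile, "w", "utf-8") as f:
        f.write(article_txt)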
I am trying to load a dataset in Google Colab from my Google Drive account using fast.ai.
I am using as a dataset the Alien vs. Predator dataset from Kaggle, from here.
I downloaded it and uploaded it to my Google Drive. Then I ran this code:
# Load the Drive helper and mount
from google.colab import drive
drive.mount("/content/drive")
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai import *
from fastai.vision import *
path='/content/drive/My Drive/FastaiData/Alien-vs-Predator'
tfms = get_transforms(do_flip=False)
#default bs=64, set image size=100 to run successfully on colab
data = ImageDataBunch.from_folder(path,ds_tfms=tfms, size=100)
and then I got this error:
/usr/local/lib/python3.6/dist-packages/fastai/data_block.py:399: UserWarning: Your training set is empty. Is this is by design, pass `ignore_empty=True` to remove this warning.
/usr/local/lib/python3.6/dist-packages/fastai/data_block.py:402: UserWarning: Your validation set is empty. Is this is by design, use `no_split()` or pass `ignore_empty=True` when labelling to remove this warning.
IndexError Traceback (most recent call last)
<ipython-input-3-5b3f66a4d360> in <module>()
4
5 #default bs=64, set image size=100 to run successfully on colab
----> 6 data = ImageDataBunch.from_folder(path,ds_tfms=tfms, size=100)
7
8 data.show_batch(rows=3, figsize=(10,10))
/usr/local/lib/python3.6/dist-packages/fastai/vision/data.py in from_folder(cls, path, train, valid, valid_pct, classes, **kwargs)
118 if valid_pct is None: src = il.split_by_folder(train=train, valid=valid)
119 else: src = il.random_split_by_pct(valid_pct)
--> 120 src = src.label_from_folder(classes=classes)
121 return cls.create_from_ll(src, **kwargs)
122
and so on...
It seems that it found the folder I indicated, but it reports the training and validation sets as empty, which is not true.
Thank you for your help
After uploading files to Google Drive, you can use PyDrive.
Sample snippet.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# PyDrive reference:
# https://gsuitedevs.github.io/PyDrive/docs/build/html/index.html
# 2. Create & upload a text file.
uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))
# 3. Load a file by ID and print its contents.
downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))
For a detailed reference, please see https://colab.research.google.com/notebooks/io.ipynb#scrollTo=zU5b6dlRwUQk
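That said, the empty train/valid warnings in the question usually point at folder layout rather than file access: ImageDataBunch.from_folder expects train and valid subfolders under the given path by default, each containing one subfolder per class. A quick check, sketched under that assumption with the path from the question:
import os

path = '/content/drive/My Drive/FastaiData/Alien-vs-Predator'

# from_folder expects  <path>/train/<class>/...  and  <path>/valid/<class>/...
for split in ('train', 'valid'):
    split_dir = os.path.join(path, split)
    print(split_dir, '->', os.listdir(split_dir) if os.path.isdir(split_dir) else 'MISSING')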