How to export a dataframe to csv on local desktop - python

I have created a dataframe from an existing file. Now I am trying to download it onto my local desktop with the code shown below:
data.to_csv(r'C:\Users\pmishr50\Desktop\Skills\python\new.csv')
The code doesn't show any error, but I can't find the file at the given path.
I have found answers that download the data, including downloading it to Google Drive, but I want to save the data to the path mentioned here.
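If the notebook is running in a remote environment such as Google Colab (which the related questions below suggest), the Windows path refers to the remote machine's filesystem, so the file will never show up on the local desktop even though to_csv succeeds. A minimal sketch, assuming a Colab environment, is to write the CSV on the VM and then push it to the browser as a download:
from google.colab import files
# Write the CSV on the Colab VM first, then trigger a browser download to the local machine
data.to_csv('new.csv', index=False)
files.download('new.csv')
Outside Colab, the same to_csv call with a path that exists on the machine running the code is all that is needed.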

Related

Retrieve File IDs from a mounted Google Drive via Google Colab

I have generated multiple files and stored them in a directory on Google Drive.
I want to retrieve the File IDs of those files such that I can use them in my python script further down-stream in the following way:
col='image'
df[col] = 'https://drive.google.com/file/d/' + df['ID']
Where df['ID'] contains the required File IDs in my pd.DataFrame.
As I am very inexperienced in this regard, I would like to ask whether it is possible to:
- retrieve these File IDs in an easy way, e.g. via a bash command (similar to ls) or directly via Python, without using the Drive API; or
- alternatively, store the File IDs while generating the files in my Colab notebook.
Thanks in advance for the help! :-)
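As far as I know, the filesystem mounted at /content/drive does not expose Drive File IDs directly, so the second option (recording the IDs when the files are created) is usually simpler. Below is a hedged sketch using PyDrive in Colab, which does go through the Drive API under the hood; the file names are placeholders:
from google.colab import auth
from oauth2client.client import GoogleCredentials
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

# Authenticate the Colab user and build a Drive client
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

file_ids = {}
for name in ['image_001.png', 'image_002.png']:  # placeholder names of files generated in the notebook
    f = drive.CreateFile({'title': name})
    f.SetContentFile(name)      # upload the locally generated file to Drive
    f.Upload()
    file_ids[name] = f['id']    # the File ID needed for the drive.google.com/file/d/ URL
The collected IDs can then be written into df['ID'] and concatenated with the base URL as in the snippet above.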

Loading Yelp Kaggle DataSet on Google Colab Taking too Long

I am having difficulty loading the Yelp Dataset downloaded from Kaggle:
https://www.kaggle.com/yelp-dataset/yelp-dataset
I downloaded the zip file from Kaggle directly into my local drive under my Desktop folder and extracted all the files from it.
I was able to upload 4 out of the 5 JSON files from the extracted folder, but one of them could not be uploaded at all (the upload never finished):
yelp_academic_dataset_review.JSON
This file is around 7 GB, which seems to be too large to upload to Google Colab.
I also tried uploading the file from my Google Drive as well.
Is there a way around this?
I couldn’t even read the data from this JSON file since I couldn’t upload it.
I tried this code:
from google.colab import files
uploaded = files.upload()
The problem is that nothing happens for hours on end and the data never loads.
Is there any way to bypass this?
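One workaround is to avoid the browser upload entirely: keep the file in Google Drive (or fetch it with the Kaggle API), mount Drive in Colab, and read the 7 GB review file in chunks instead of all at once. A rough sketch, assuming the file sits in a MyDrive/yelp folder (the path is a placeholder):
from google.colab import drive
import pandas as pd

drive.mount('/content/drive')
path = '/content/drive/MyDrive/yelp/yelp_academic_dataset_review.json'  # placeholder location

# The review file is newline-delimited JSON, so it can be read lazily in chunks
reader = pd.read_json(path, lines=True, chunksize=100_000)
for chunk in reader:
    pass  # filter or aggregate each chunk here instead of holding the whole file in memory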

How to access Excel data stored in GitHub from an AWS machine using Python

I have an Excel file stored in GitHub and Python installed on an AWS machine. I want to read the Excel file from the AWS machine using a Python script. Can someone help me achieve this? So far I have used the code below...
#Importing required Libraries
import pandas as pd
import xlwt
import xlrd
#Formatting WLM data
URL= 'https://github.dev.global.tesco.org/DotcomPerformanceTeam/Sample-WLM/blob/master/LEGO_LIVE_FreshOrderStableProfile_2019_v0.1.xlsx'
data = pd.read_excel(r"URl", sheet_name='WLM', dtype=object)
When I executed this I got the error below:
IOError: [Errno 2] No such file or directory: 'URl'
You can use the wget command to download the file from GitHub. The key here is to use the raw version link, otherwise you will download an HTML file. To get the raw link, click on the file you uploaded on GitHub, then right-click on the Raw button and choose the save path or copy the path. Finally you can use it to download the file, and then read it with pd.read_excel("Your Excel file URL or disk location"). Example:
#Raw link: https://raw.github.com/<username>/<repo>/<branch>/Excelfile.xlsx
!wget --show-progress --continue -O /content/Excelfile.xlsx https://raw.github.com/<username>/<repo>/<branch>/Excelfile.xlsx
df = pd.read_excel("/content/Excelfile.xlsx")
Note: this example applies to Colab; if you are using a local environment, do not use the exclamation mark. You can also find more ideas here: Download single files from GitHub
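As a side note, pandas can also read the raw link directly, so the wget step can often be skipped for a public repository. A minimal sketch, with the URL as a placeholder:
import pandas as pd

# Use the raw.githubusercontent.com link, not the regular github.com page URL
url = "https://raw.githubusercontent.com/<username>/<repo>/<branch>/Excelfile.xlsx"  # placeholder
df = pd.read_excel(url, sheet_name='WLM')  # reading .xlsx requires openpyxl to be installed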
These instructions are for a CSV file but should work for an Excel file as well.
If the repository is private, you might need to create a personal access token as described in "Creating a personal access token" (pay attention to the permissions, especially if the repository belongs to an organisation).
Click the "raw" button in GitHub. Here below is an example from https://github.com/udacity/machine-learning/blob/master/projects/boston_housing/housing.csv:
If the repo is private and there is no ?token=XXXX at the end of the url (see below), you might need to create a personal access token and add it at the end of the url. I can see from your URL that you need to configure your access token to work with SAML SSO, please read About identity and access management with SAML single sign-on and Authorizing a personal access token for use with SAML single sign-on
Copy the link to the file from the browser navigation bar, e.g.:
https://raw.githubusercontent.com/udacity/machine-learning/master/projects/boston_housing/housing.csv
Then use code:
import pandas as pd
url = (
"https://raw.githubusercontent.com/udacity/machine-learning/master"
"/projects/boston_housing/housing.csv"
)
df = pd.read_csv(url)
In case your repo is private, the link copied would have a token at the end:
https://raw.githubusercontent.com/. . ./my_file.csv?token=XXXXXXXXXXXXXXXXXX

How to get the file path to my CSV file in data assets in Watson Studio?

I have been trying to get the file path of my CSV file in Watson Studio. The file is saved in my project's data assets in Watson Studio, and all I need is the file path so I can read its content in a Jupyter notebook. I'm trying to use a simple Python file reader that reads a file from a specified path. I have tried using Watson Studio's "insert file credentials", but can't get it to work.
This works fine when I run the same file on the IBM cognitiveclass.ai platform, but I can't get it to work in IBM Watson Studio. Please help.
The file name is enrollments.csv:
import unicodecsv
with open('enrollments.csv', 'rb') as f:
    reader = unicodecsv.DictReader(f)
    enrollments = list(reader)
I assume you mean you uploaded the "enrollments.csv" file to the Files section.
This uploads the file to the bucket of the Cloud Object Storage service that provides storage for your project.
You can use project-lib to fetch the file URL.
# Import the lib
from project_lib import Project
project = Project(sc,"<ProjectId>", "<ProjectToken>")
# Get the url
url = project.get_file_url("myFile.csv")
For more details, refer to:
https://dataplatform.cloud.ibm.com/docs/content/analyze-data/project-lib-python.html
https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/a972effc-394f-4825-af91-874cb165dcfc/view?access_token=ee2bd90bee679afc278cdb23453946a3922c454a6a7037e4bd3c4b0f90eb0924
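As a hedged follow-up, project-lib can also hand back the file contents directly via get_file, which avoids dealing with the URL at all; the asset name below is the one from the question:
from project_lib import Project
import pandas as pd

project = Project(sc, "<ProjectId>", "<ProjectToken>")

# get_file returns the asset contents as a file-like buffer
buffer = project.get_file("enrollments.csv")
buffer.seek(0)
enrollments = pd.read_csv(buffer)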
For the sake of future readers, try this one.
1) Upload your CSV file as an asset in your Watson Studio project (you can also do this step later).
2) Open your notebook. On the top ribbon, in the upper-right corner of the page (below your name), click the 1010 (data) icon.
3) Make sure you're on the Files tab; below it you will see the list of your uploaded datasets (you can also upload files here).
4) Click the drop-down and choose "pandas DataFrame" to add a block of code that loads the uploaded data into your notebook. Note that you need to select a blank cell first so that it doesn't mess up an existing cell that already has code.
I struggled to define the path in Watson as well.
Here is what worked for me:
1) Within a project, select the "Settings" tab. I believe the default view is the "Assets" tab.
2) Create a token: scroll down to "Access tokens", then click "New token".
3) Go back to "Assets" and open your notebook.
4) Click the three vertical dots in the header. One option is "Insert project token". This creates a new code block that defines the correct parameters for the Project method.
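For illustration, the cell generated by "Insert project token" typically looks something like the following (the IDs are placeholders filled in by Watson Studio; treat the exact shape as an assumption):
# @hidden_cell
from project_lib import Project
project = Project(project_id='<your-project-id>', project_access_token='<your-project-token>')
pc = project.project_context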
I think you are really asking how you can read a file from the assets in your Watson Studio project. This is documented here: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.0?topic=lib-watson-studio-python
# Import the lib
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space()
# Fetch the data from a file
my_file = wslib.load_data("my_asset_name.csv")
# Read the CSV data file into a pandas DataFrame
my_file.seek(0)
import pandas as pd
pd.read_csv(my_file, nrows=10)
[Screenshots: the project file asset, and reading the file in a notebook]
The old project-lib has been deprecated. See the deprecation announcement: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.0?topic=notebook-using-project-lib-python-deprecated

BigQuery: loading an Excel file

Is there any way we can load an Excel file directly into BigQuery, instead of converting it to CSV?
I get the files every day in Excel format and need to load them into BigQuery. Right now I convert them to CSV manually and load them into BigQuery.
I am planning to schedule the job.
If it is not possible to load the Excel files directly into BigQuery, then I need to write a process (Python) to convert them to CSV before loading into BigQuery.
Please let me know if there are any better options.
Thanks,
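If the Python route is taken, a minimal sketch (file name, project, dataset and table names are placeholders) could read the workbook with pandas and load the DataFrame straight into BigQuery, with no intermediate CSV:
import pandas as pd
from google.cloud import bigquery

# Read the daily Excel file (placeholder path and sheet)
df = pd.read_excel('daily_report.xlsx', sheet_name=0)

# Load the DataFrame directly into a BigQuery table (requires pyarrow)
client = bigquery.Client()
table_id = 'my_project.my_dataset.daily_report'  # placeholder table id
job = client.load_table_from_dataframe(df, table_id)
job.result()  # wait for the load job to finish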
I think you could achieve the above in a few clicks, without any code.
You need to use Google Drive and external (federated) tables.
1) You could manually upload your Excel files to Google Drive, or synchronise them.
2) In Google Drive Settings find:
"**Convert uploads** [x] Convert uploaded files to Google Docs editor format"
and check it.
To access this option, go to https://drive.google.com/drive/my-drive, click the gear (Settings) icon and then choose Settings.
Now your Excel files will be accessible to BigQuery.
3) Last part: https://cloud.google.com/bigquery/external-data-drive
You can reference your file by its Drive URI (see https://cloud.google.com/bigquery/external-data-drive#drive-uri) and then create the table manually using that URI.
You could also do the last step via the API.
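For reference, a rough sketch of that last step with the Python BigQuery client (the project, dataset, table and spreadsheet URI are placeholders, and the client's credentials need the Google Drive scope for Drive-backed tables):
from google.cloud import bigquery

client = bigquery.Client()  # credentials must include the Drive scope to query Drive data

# Describe the Drive-hosted spreadsheet as an external (federated) data source
external_config = bigquery.ExternalConfig('GOOGLE_SHEETS')
external_config.source_uris = ['https://docs.google.com/spreadsheets/d/<file-id>']  # placeholder URI
external_config.options.skip_leading_rows = 1  # skip the header row

table = bigquery.Table('my_project.my_dataset.daily_report_ext')  # placeholder table id
table.external_data_configuration = external_config
client.create_table(table)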
