TSV (Tab Separated Values) file can't be uploaded to Google Colab and read with pandas
I used this to upload my file:
import io
df2 = pd.read_csv(io.BytesIO(uploaded['Filename.csv']))
import io
stk = pd.read_csv(io.BytesIO(uploaded['train.tsv']))
The expected result is that the .tsv file is uploaded and read into the dataframe stk.
import pandas as pd
from google.colab import files
import io
# First, upload the file to Colab
uploaded = files.upload()
# Second, wrap the uploaded bytes in an in-memory buffer
file_path = io.BytesIO(uploaded['file_name.tsv'])
# Finally, read it with the tab separator
df = pd.read_csv(file_path, sep='\t', header=0)
To save a .tsv file in Google Colab, the .to_csv function can be used as follows:
df.to_csv('path_in_drive/filename.tsv', sep='\t', index=False, header=False)
stk = pd.read_csv('path_in_drive/filename.tsv', sep='\t', header=None)  # read the file back; no header row was written
I don't know if this solves your problem, since it doesn't upload the files, but with this approach you can read files that are stored on your Google Drive.
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive/My\ Drive/
After mounting, you should be able to load files into your script just like on your desktop, as in the sketch below.
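For example, a minimal sketch (assuming the drive was mounted as above and the file is the train.tsv from the question, sitting directly in My Drive; adjust the path to your own folder layout):
import pandas as pd
# hypothetical location inside the mounted drive; change it to wherever the file actually lives
stk = pd.read_csv('/gdrive/My Drive/train.tsv', sep='\t')
print(stk.head())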
I tried to get the genres of the songs in regional-us-daily-latest and output the genres and other data as a CSV file, but Colab said:
FileNotFoundError: [Errno 2] No such file or directory: 'regional-us-daily-latest.csv'
I mounted My Drive, but it still didn't work.
Could you shed some light on this?
!pip3 install spotipy
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import json
from google.colab import drive
client_id = 'ID'
client_secret = 'SECRET'
client_credentials_manager = spotipy.oauth2.SpotifyClientCredentials(client_id, client_secret)
spotify = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
import csv
csvfile = open('/content/drive/MyDrive/regional-us-daily-latest.csv', encoding='utf-8')
csvreader = csv.DictReader(csvfile)
us = ("regional-us-daily-latest.csv", "us.csv")
for region in (us):
    inputfile = region[0]
    outputfile = region[1]
    songs = pd.read_csv(inputfile, index_col=0, header=1)
    songs = songs.assign(Genre=0)
    for index, row in songs.iterrows():
        artist = row["Artist"]
        result = spotify.search(artist, limit=1, type="artist")
        genre = result["artists"]["items"][0]["genres"]
        songs['Genre'][index] = genre
    songs.head(10)
    songs.to_csv(outputfile)
    files.download(outputfile)
Save the CSV file in Google Drive, go to your notebook, click on Drive, and search for your file there. Then copy the path of the CSV file into a variable and pass that variable to the read_csv() method.
Please mount the drive first:
from google.colab import drive
drive.mount('/content/drive')
Change the directory to My Drive and check the current directory:
import os
os.chdir("drive/My Drive/")
print(os.getcwd())
!ls
Set the path of the file, and use the source_file variable wherever the file name is required, as shown below:
source_file = os.path.join(os.getcwd(), "regional-us-daily-latest.csv")
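With that in place, a minimal sketch of the last step (reusing the read_csv arguments from the question; index_col and header are simply carried over from there):
songs = pd.read_csv(source_file, index_col=0, header=1)
print(songs.head())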
I'm trying to automate my code a bit more, but for that I need my program to know the filenames. This is what I do now:
uploaded1 = files.upload()
df = pd.read_csv('Formulário sem título1.csv')
I already tried doing it like this:
uploaded1 = files.upload()
df = pd.read_csv(uploaded1)
But it doesn't work like that. I don't know if it's the best approach, but I'm thinking of doing something like this:
uploaded1 = files.upload()
file_name = uploaded1[filename]
df = pd.read_csv(uploaded1)
You didn't say what files is, and I found files.upload() only in snippets in Google Colab, so I assume it comes from Google Colab and is not part of pandas.
The snippets in Google Colab show that you can get the filenames using .keys():
from google.colab import files
uploaded = files.upload()
for name in uploaded.keys():
    print('filename:', name)
    print('length:', len(uploaded[name]))
EDIT:
Full working code
from google.colab import files
import pandas as pd
uploaded = files.upload()
for name in uploaded.keys():
    print('filename:', name)
    print('length:', len(uploaded[name]))

    df = pd.read_csv(name)
    print(df)
For some reason, when I attempt to read an HDF file from S3 using the pandas.read_hdf() method, I get a FileNotFoundError when I pass an s3 URL. The file definitely exists, and I have tried the pandas.read_csv() method with a CSV file in the same S3 directory, and that works. Is there something else I need to be doing? Here's the code:
import boto3
import h5py
import s3fs
import pandas as pd
csvDataframe = pd.read_csv('s3://BUCKET_NAME/FILE_NAME.csv')
print("Csv data:")
print(csvDataframe)
dataframe = pd.read_hdf('s3://BUCKET_NAME/FILE_NAME.h5', key='df')
print("Hdf data:")
print(dataframe)
Here is the error:
FileNotFoundError: File s3://BUCKET_NAME/FILE_NAME.h5 does not exist
In the actual code, BUCKET_NAME and FILE_NAME are replaced with their actual strings.
Please make sure the file extension is .h5.
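As a quick check (not part of the original answer, just a sketch assuming s3fs is installed and your AWS credentials are configured), you can list the bucket to confirm the exact key and extension before calling read_hdf:
import s3fs
fs = s3fs.S3FileSystem()
# prints every key in the bucket so you can verify the object really ends in .h5
print(fs.ls('s3://BUCKET_NAME'))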
While uploading a CSV file to Google Drive, it is automatically converted to Google Sheets. How can I save it as a CSV file in Drive? Or can I read a Google Sheet into a pandas data frame?
Development environment: Google Colab
Code Snippet:
Input
data = pd.read_csv("ner_dataset.desktop (3dec943a)",
encoding="latin1").fillna(method="ffill")
data.tail(10)
Output
[Desktop Entry]
0 Type=Link
1 Name=ner_dataset
2 URL=https://docs.google.com/spreadsheets/d/1w0...
WORKING CODE
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
worksheet = gc.open('Your spreadsheet name').sheet1
# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
print(rows)
# Convert to a DataFrame and render.
import pandas as pd
pd.DataFrame.from_records(rows)
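Optionally, assuming the first sheet row holds the column names (an assumption about your sheet layout), you can promote it to the header:
df = pd.DataFrame.from_records(rows[1:], columns=rows[0])
df.head()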
#Mount the Drive
from google.colab import drive
drive.mount('drive')
# Authenticate with your own credentials (fill in yourself)
from pydrive.auth import GoogleAuth  # GoogleAuth comes from the PyDrive package
gauth = GoogleAuth()
# Create the CSV and copy it to Drive
df.to_csv('data.csv')
!cp data.csv drive/'your drive'
I want to download a CSV file stored in Azure Storage into a stream and use it directly in my Python script, but after I did this with help from Thomas, I cannot use the pandas read_csv method; the error message is: pandas.io.common.EmptyDataError: No columns to parse from file. I therefore assumed the downloaded CSV stream was empty, but after checking the storage account, the CSV file is fine with all the data inside it. What is the problem here? Below is the code from Thomas:
from azure.storage.blob import BlockBlobService
import io
from io import BytesIO, StringIO
import pandas as pd
from shutil import copyfileobj
with BytesIO() as input_blob:
    with BytesIO() as output_blob:
        block_blob_service = BlockBlobService(account_name='my account', account_key='mykey')
        block_blob_service.get_blob_to_stream('my counter', 'datatest1.csv', input_blob)
        df = pd.read_csv(input_blob)
        print(df)
        copyfileobj(input_blob, output_blob)
        #print(output_blob)
        # Create a new blob
        block_blob_service.create_blob_from_stream('my counter', 'datatest2.csv', output_blob)
If I don't execute the read_csv code, create_blob_from_stream creates an empty file, but if I do execute the read_csv code, I get this error:
pandas.parser.TextReader.cinit (pandas\parser.c:6171)
pandas.io.common.EmptyDataError: No columns to parse from file
The downloaded file is stored fine in blob storage, with all the data in it.
I finally figured it out, after spending so much time on this!
You have to execute:
input_blob.seek(0)
to use the stream again after the blob has been written into input_blob.
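Putting it together, a minimal sketch of where the seek(0) calls belong (same names as in the code above; the extra rewinds before copyfileobj and before the upload are my additions, on the assumption that each consumer reads from the current stream position):
from io import BytesIO
from shutil import copyfileobj
import pandas as pd
from azure.storage.blob import BlockBlobService

with BytesIO() as input_blob:
    with BytesIO() as output_blob:
        block_blob_service = BlockBlobService(account_name='my account', account_key='mykey')
        block_blob_service.get_blob_to_stream('my counter', 'datatest1.csv', input_blob)
        input_blob.seek(0)   # rewind after the download, otherwise the stream appears empty
        df = pd.read_csv(input_blob)
        print(df)
        input_blob.seek(0)   # rewind again, since read_csv consumed the stream
        copyfileobj(input_blob, output_blob)
        output_blob.seek(0)  # rewind the copy before uploading it
        block_blob_service.create_blob_from_stream('my counter', 'datatest2.csv', output_blob)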