file_name = "r1.csv"
client = storage.Client()
bucket = client.get_bucket('upload-testing')
blob = bucket.get_blob(file_name)
blob.download_to_filename("csv_file")
I want to open the r1.csv file in read-only mode.
Instead, I am getting this error:
with open(filename, 'wb') as file_obj:
Error: [Errno 30] Read-only file system: 'csv_file'
So the function download_to_filename opens files in 'wb' mode; is there any way through which I can open r1.csv in read-only mode?
As mentioned in the previous answer, you need to use the 'r' mode; however, you don't need to specify it, since that's the default mode.
In order to be able to read the file itself, you'll need to download it first, then read its content and treat the data as you want. The following example downloads the GCS file to a temporary folder, opens the downloaded file and reads all of its data:
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket("<BUCKET_NAME>")
blob = bucket.blob("<CSV_NAME>")
blob.download_to_filename("/tmp/test.csv")
with open("/tmp/test.csv") as file:
    data = file.read()
    <TREAT_DATA_AS_YOU_WISH>
This example is intended to run inside GAE, where the file system is read-only except for the /tmp directory.
If you want to open a file as read-only you should use the 'r' mode; 'wb' means write binary:
with open(filename, 'r') as file_obj:
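For completeness, here is a minimal sketch (reusing the bucket and file names from the question) that downloads the object to a writable location such as /tmp and then opens the local copy in the default read mode:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('upload-testing')
blob = bucket.get_blob('r1.csv')

# Download to /tmp, which stays writable even when the rest of the file system is read-only
blob.download_to_filename('/tmp/r1.csv')

# 'r' is the default mode, so it could be omitted entirely
with open('/tmp/r1.csv', 'r') as file_obj:
    data = file_obj.read()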
Related
I have a task that requires reading a tar.gz file from an FTP server and storing it in Azure Blob Storage.
The way I think I can accomplish this is to first create a temp file in the Azure Function temp directory, write all the content to it, close it, and then upload it to blob storage.
What I have done so far is:
fp = tempfile.NamedTemporaryFile()
filesDirListInTemp = listdir(tempFilePath)
logging.info(filesDirListInTemp)
try:
    with open('/tmp/fp', 'w+') as fp:
        data = BytesIO()
        save_file = ftp.retrbinary('RETR ' + filesDirListInTemp, data.write, 1024)
        data.seek(0)
        blobservice = BlobClient.from_connection_string(
            conn_str=connection_string,
            container_name=container_name,
            blob_name=filename,
            max_block_size=4*1024*1024,
            max_single_put_size=16*1024*1024,
        )
        blobservice.upload_blob(gzip.decompress(data.read()))
        print("File Uploaded!")
except Exception as X:
    logging.info(X)
But I am getting this error: expected str, bytes or os.PathLike object, not list.
Can you please tell me what I am doing wrong here?
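For illustration only, here is a minimal sketch of the flow described above, with hypothetical connection details and placeholder configuration values; note that listdir() returns a list, while a call such as retrbinary('RETR ' + name, ...) expects a single filename string at a time:

import gzip
from io import BytesIO
from ftplib import FTP
from azure.storage.blob import BlobClient

connection_string = '<AZURE_STORAGE_CONNECTION_STRING>'
container_name = '<CONTAINER_NAME>'

ftp = FTP('ftp.example.com', 'user', 'password')  # hypothetical FTP server and credentials
remote_name = 'archive.tar.gz'                    # one filename, not the whole directory listing

data = BytesIO()
ftp.retrbinary('RETR ' + remote_name, data.write, blocksize=1024)
data.seek(0)

blobservice = BlobClient.from_connection_string(
    conn_str=connection_string,
    container_name=container_name,
    blob_name=remote_name,
)
blobservice.upload_blob(gzip.decompress(data.read()))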
When I try to upload a compressed gzip file to Cloud Storage using a Python script on a Cloud Shell instance, it always uploads an empty file.
Here's the code to reproduce the error:
import gzip
from google.cloud import storage
storage_client = storage.Client()
list=['hello', 'world', 'please', 'upload']
out_file=gzip.open('test.gz', 'wt')
for line in list:
    out_file.write(line + '\n')
out_file.close
out_bucket = storage_client.bucket('test-bucket')
out_blob = out_bucket.blob('test')
out_blob.upload_from_filename('test.gz')
It uploads only an empty file named 'test' to my bucket, which is not what I expect.
However, the file written in my Cloud Shell is not empty, because when I run zcat test.gz it shows the expected content:
hello
world
please
upload
To understand what's happening in your code, here's a description from the gzip docs:
Calling a GzipFile object’s close() method does not close fileobj, since you might wish to append more material after the compressed data.
This explains why a GzipFile object that is never closed affects the upload of your file. Here's a supporting answer which describes the closing behavior in more detail:
The warning about fileobj not being closed only applies when you open the file, and pass it to the GzipFile via the fileobj= parameter. When you pass only a filename, GzipFile "owns" the file handle and will also close it.
In your code, out_file.close is missing the parentheses, so close() is never actually called and the buffered data is not flushed to test.gz before upload_from_filename runs. (zcat shows the expected content later only because the data is flushed when the script exits, which happens after the upload.) The fix is to write the file inside a with block, which guarantees it is closed first:
import gzip
from google.cloud import storage

storage_client = storage.Client()
list = ['hello', 'world', 'please', 'upload']

with gzip.open('test.gz', 'wt') as f_out:
    for line in list:
        f_out.write(line + '\n')

out_bucket = storage_client.bucket('test-bucket')
out_blob = out_bucket.blob('test.gz')  # include the file extension in the destination name
out_blob.upload_from_filename('test.gz')
I have multiple methods in my Python script that work with a CSV file. They work on my local machine, but not when the same CSV file is stored inside a Google Cloud Storage bucket. I need to keep track of my current_position in the file, which is why I am using seek() and tell(). I tried to use the pandas library, but it has no such methods. Does anyone have a basic example of a Python script that reads a CSV stored in a GCP bucket with those methods?
def read_line_from_csv(position):
    #df = pandas.read_csv('gs://trends_service_v1/your_path.csv')
    with open('keywords.csv') as f:
        f.seek(position)
        keyword = f.readline()
        position = f.tell()
        f.close()
        return position, keyword

def save_new_position(current_positon):
    f = open("position.csv", "w")
    f.write(str(current_positon))
    f.close()
    update_csv_bucket("position.csv")

def get_position_reader():
    try:
        with open('position.csv') as f:
            return int(f.readline())
    except OSError as e:
        print(e)
The official library does not have such capabilities, I think.
You can download the file first, then open it and work with it normally.
Apart from the official one, you can use gcsfs, which implements the missing functionality:
import gcsfs
fs = gcsfs.GCSFileSystem(project='my-google-project')
with fs.open('my-bucket/my-file.txt', 'rb') as f:
    print(f.seek(location))
Another way, other than @emil-gi's suggestions, would be to use the method mentioned here:
#Download the contents of this blob as a bytes object
blob.download_as_string()
Where blob is the object associated with your CSV in your GCS bucket.
If you need to create the connection to the blob first (I don't know what you do in other parts of the code), see the docs.
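As a rough sketch of how that could fit the seek()/tell() requirement (the bucket and object names are taken from the commented-out path in the question, and the position handling is simplified): download the CSV once, then wrap its content in an in-memory buffer, which supports seek() and tell() like a local file:

import io
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket('trends_service_v1')  # bucket name from the question's commented path
blob = bucket.blob('your_path.csv')

# Download once, then treat the content like a regular file object
content = blob.download_as_string().decode('utf-8')
f = io.StringIO(content)

position = 0  # or the value read back from position.csv
f.seek(position)
keyword = f.readline()
position = f.tell()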
You can use Google Cloud Storage fileio.
For instance:
from google.cloud import storage
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(file_path) #folder/filename.csv
#Instantiate a BlobReader
blobReader=storage.fileio.BlobReader(blob)
#Get current position in your file
print(blobReader.tell())
#Read line by line
print(blobReader.readline().decode('utf-8')) #read and print row 1
print(blobReader.readline().decode('utf-8')) #read and print row 2
#Read chunk of X bytes
print(blobReader.read(1000).decode('utf-8')) #read next 1000 bytes
#To seek a specific position.
blobReader.seek(position)
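As a sketch, the question's read_line_from_csv could be adapted to BlobReader along these lines (bucket and object names again taken from the question's commented-out path):

from google.cloud import storage

def read_line_from_csv(position):
    storage_client = storage.Client()
    bucket = storage_client.bucket('trends_service_v1')
    blob = bucket.blob('your_path.csv')
    blobReader = storage.fileio.BlobReader(blob)
    blobReader.seek(position)             # jump to the saved position
    keyword = blobReader.readline().decode('utf-8')
    position = blobReader.tell()          # remember where the next read should start
    return position, keyword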
I am using Paramiko to create an SFTP client that makes a backup copy of a JSON file, reads in the contents of the original, and then updates the original. I am able to get this snippet of code to work:
# open sftp connection stuff
# read in json create backup copy - but have to 'open' twice
read_file = sftp_client.open(file_path)
settings = json.load(read_file)
read_file = sftp_client.open(file_path)
sftp_client.putfo(read_file, backup_path)
# json stuff and updating
new_settings = json.dumps(settings, indent=4, sort_keys = True)
# update remote json file
with sftp_client.open(file_path, 'w') as f:
    f.write(new_settings)
However, when I try to clean up the code and combine the backup file creation and the JSON load:
with sftp_client.open(file_path) as f:
    sftp_client.putfo(f, backup_path)
    settings = json.load(f)
The backup file will be created, but json.load will fail due to not having any content. And if I reverse the order, json.load will read in the values, but the backup copy will be empty.
I'm using Python 2.7 on a Windows machine, creating a remote connection to a QNX (Linux) machine. Appreciate any help.
Thanks in advance.
If you want to read the file a second time, you have to seek the file read pointer back to the beginning of the file:
with sftp_client.open(file_path) as f:
    sftp_client.putfo(f, backup_path)
    f.seek(0, 0)
    settings = json.load(f)
Though that is functionally equivalent to your original code with two opens.
If your aim was to optimize the code to avoid downloading the file twice, you will have to read/cache the file into memory and then upload and load the contents from that cache:
from io import BytesIO

f = BytesIO()
sftp_client.getfo(file_path, f)
f.seek(0, 0)
sftp_client.putfo(f, backup_path)
f.seek(0, 0)
settings = json.load(f)
I am using the following code to upload a SQLite3 database file. For some reason, the script does not completely upload the file (the uploaded file size is less than the original).
FTP = ftplib.FTP('HOST','USERNAME','PASSWORD')
FTP.cwd('/public_html/')
FILE = 'Database.db';
FTP.storbinary("STOR " + FILE, open(FILE, 'r'))
FTP.quit()
When I go to open the uploaded file in SQLite Browser, it says it is an invalid file.
What am I doing incorrectly?
In the open() call, you need to specify that the file is a binary file, like so:
FTP.storbinary("STOR " + FILE, open(FILE, 'rb'))