I have a zipfile on my Google Drive. In that zipfile is an XML file, which I want to parse, extract specific information from, and save to my local computer (or wherever).
My goal is to use Python & Google Drive API (with help of PyDrive) to achieve this. The workflow could be as follows:
1. Connect to my Google Drive via the Google Drive API (PyDrive)
2. Get my zipfile's id
3. Load my zipfile into memory
4. Unzip it, obtain the XML file
5. Parse the XML, extract the desired information
6. Save it as a csv on my local computer
Right now, I am able to do steps 1, 2, 4, 5 and 6, but I don't know how to load the zipfile into memory without writing it to my local HDD first.
The following PyDrive code obtains the zipfile and places it on my local HDD, which is not exactly what I want:
toUnzip = drive.CreateFile({'id':'MY_FILE_ID'})
toUnzip.GetContentFile('zipstuff.zip')
I guess one solution could be as follows:
I could read the zipfile as a string with some encoding:
toUnzip = drive.CreateFile({'id':'MY_FILE_ID'})
zipAsString = toUnzip.GetContentString(encoding='??')
and then I could somehow (no idea how, perhaps StringIO could be useful) read this string with Python's zipfile library. Is this solution even possible? Is there a better way?
You could try StringIO; it emulates a file but resides in memory. Here is the code from a related SO post:
# get_zip_data() gets a zip archive containing 'foo.txt', reading 'hey, foo'
import zipfile
from StringIO import StringIO

zipdata = StringIO()
zipdata.write(get_zip_data())
myzipfile = zipfile.ZipFile(zipdata)
foofile = myzipfile.open('foo.txt')
print foofile.read()
# output: "hey, foo"
or using a URL:
from urllib2 import urlopen
from zipfile import ZipFile
from StringIO import StringIO

url = urlopen("http://www.test.com/file.zip")
archive = ZipFile(StringIO(url.read()))  # avoid naming this 'zipfile', which would shadow the module
Hope this helps.
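Note that in Python 3 the class lives in the io module, and ZipFile needs a binary buffer, so io.BytesIO is the drop-in replacement for StringIO here. A minimal sketch of the same round trip (the archive is built in memory to stand in for the downloaded bytes):

```python
import io
import zipfile

# build a zip archive in memory, standing in for the downloaded bytes
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("foo.txt", "hey, foo")
zip_bytes = buf.getvalue()

# read it back without ever touching the disk
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    print(zf.read("foo.txt").decode())  # hey, foo
```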
Eventually, I solved it using BytesIO and the cp862 encoding:
from io import BytesIO
import zipfile

toUnzipStringContent = toUnzip.GetContentString(encoding='cp862')
# cp862 maps all 256 byte values, so the decode/encode round trip preserves the raw bytes
toUnzipBytesContent = BytesIO(toUnzipStringContent.encode('cp862'))
readZipfile = zipfile.ZipFile(toUnzipBytesContent, "r")
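The remaining steps (parse the XML, extract fields, write a csv) only need the standard library. A sketch with xml.etree.ElementTree and csv — the `<record>` layout and the field names here are invented for illustration, since the real XML structure isn't shown; in the real workflow the bytes would come out of the zip, e.g. readZipfile.read('data.xml'):

```python
import csv
import os
import tempfile
import xml.etree.ElementTree as ET

# hypothetical XML content standing in for the file inside the zip
xml_bytes = b"""<records>
  <record><name>alpha</name><value>1</value></record>
  <record><name>beta</name><value>2</value></record>
</records>"""

# parse and pull out the desired fields
root = ET.fromstring(xml_bytes)
rows = [(rec.findtext("name"), rec.findtext("value"))
        for rec in root.iter("record")]

# write the extracted rows as a csv
csv_path = os.path.join(tempfile.gettempdir(), "extracted.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "value"])
    writer.writerows(rows)
```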
How can I read a csv file from Dropbox Online (without downloading it to local machine)?
I installed and imported dropbox in Python, created an app and got a token from "https://www.dropbox.com/developers/apps". I have read some resources online, but I am still fuzzy on how to move on from here.
Line 149 of the example here provides a function that downloads a file and returns it as a byte string.
https://github.com/dropbox/dropbox-sdk-python/blob/main/example/updown.py
Then just parse the byte string, for example using pandas:
import pandas as pd
from io import BytesIO
from example.updown import download  # import the download function from updown.py, or copy/replicate that function; I'm only providing pseudo code here

file_as_bytes = download(dbx, folder, subfolder, name)
df = pd.read_csv(BytesIO(file_as_bytes))  # read_csv wants a file-like object, not raw bytes
print(df)
I think, as suggested here, you can just read directly over the URL (but you need to change the argument from ?dl=0 to ?dl=1), e.g.
df = pd.read_csv("https://www.dropbox.com/s/asdfasdfasfasdf/movie_data.csv?dl=1")
I have a really large 7z file in an S3 bucket, say s3://tempbucket1/Test_For7zip.7z, that runs into several tens of GB. I do not want to download it, unzip it and re-upload it back to S3. I want to use Boto3 to unzip it on the fly and save it into S3.
I tried to solve this using the lzma package, based on a previous SO answer which dealt with on-the-fly unzipping of *.zip files using the fileobj option of gzip.GzipFile.
from io import BytesIO
import gzip
import lzma
import boto3
# setup constants
bucket = 'tempbucket1'
gzipped_key = 'Test_For7zip.7z'
uncompressed_key = 'Test_Unzip7zip'
# initialize s3 client, this is dependent upon your aws config being done
s3 = boto3.client('s3', use_ssl=False)
s3.upload_fileobj(                    # upload a new obj to s3
    Fileobj=lzma.LZMAFile(
        BytesIO(s3.get_object(Bucket=bucket,
                              Key=gzipped_key)['Body'].read()),
        'rb'),                        # read binary
    Bucket=bucket,                    # target bucket, writing to
    Key=uncompressed_key)             # target key, writing to
However, this throws the following error:
LZMAError: Input format not supported by decoder
Is there a Python package that can decode 7z files from a BytesIO object, or is there a better way of achieving this?
I never tried this, but Googling gave me this as a possible solution. Please reach out through this post if this solves your problem.
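For context, the error is expected: the lzma module decodes raw .lzma/.xz streams, while .7z is a container format (which merely tends to use LZMA compression inside). The stdlib check below shows what LZMAFile does handle; for the 7z container itself a third-party package such as py7zr (which can open a file-like object like BytesIO) would be needed — treat that as a pointer to research rather than tested advice:

```python
import io
import lzma

raw = b"some payload " * 100

# LZMAFile happily decodes a plain .xz stream held in memory
xz_stream = lzma.compress(raw)  # produces an .xz stream by default
restored = lzma.LZMAFile(io.BytesIO(xz_stream), "rb").read()
print(restored == raw)  # True

# a .7z archive instead begins with the 7z container signature
# (b"7z\xbc\xaf\x27\x1c"), which lzma's decoder rejects -- hence the LZMAError;
# the .xz magic bytes below are a different format entirely
print(xz_stream.startswith(b"\xfd7zXZ\x00"))  # True
```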
I have an Excel sheet called last_run.xlsx, and I use a small Python script to upload it to Slack, as follows:
import os
import slack

slack_token = "XXX"
client = slack.WebClient(token=slack_token)
response = client.files_upload(
    channels="#viktor",
    file="last_run.xlsx")
But when I receive it on Slack it is a weird zip file and not an Excel file anymore... any idea what I am doing wrong?
Excel files are actually zipped collections of XML documents, so it appears that Slack's automatic file detection recognizes the upload as a ZIP file for that reason.
Manually specifying xlsx as the filetype does not change that behavior either.
What works is if you also specify a filename. Then it will be correctly recognized and uploaded as Excel file.
Code:
import slack

client = slack.WebClient(token="MY_TOKEN")
response = client.files_upload(
    channels="#viktor",
    file="last_run.xlsx",
    filename="last_run.xlsx")
This looks like a bug in the automatic file type detection to me, so I would suggest submitting a bug report to Slack about this behavior.
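The confusion is easy to reproduce: an .xlsx file begins with the same PK magic bytes as any ZIP archive, so a naive sniffer cannot tell them apart. A sketch that fakes a tiny zip-of-XML in memory, mimicking the xlsx layout, and checks its signature:

```python
import io
import zipfile

# an .xlsx is a zip archive of XML parts; fake a minimal one in memory
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("xl/workbook.xml", "<workbook/>")

header = buf.getvalue()[:4]
print(header)  # b'PK\x03\x04' -- the generic ZIP signature
```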
I am not a developer and I have a very limited knowledge of programming. I understand how to use and operate python scripts, however writing them is something I am yet to learn. Please can someone help a total noob :)
I am using an API by Sightengine to assess a large folder of .jpg images for their sharpness and colour properties. The documentation for the API only provides a small script for assessing one image at a time. I have spoken to Sightengine's support and they are unwilling to provide a script for batch processing, which is bizarre considering other API companies usually do.
I need some help creating a for loop that will use a python script to iterate through a folder of images and output the API result into a single JSON file. Any help in how to structure this script would be extremely appreciated.
Here is the sightengine code for a simple one image check:
from sightengine.client import SightengineClient
client = SightengineClient("{api_user}", "{api_secret}")
output = client.check('properties','type').set_file('/path/to/local/file.jpg')
Thank you
Part of this is sort of a guess, as I don't know exactly what the output will look like; I am assuming it's returned in json format. If that's the case, you can append the individual json responses to a single json structure, then use json.dump() to write it to file.
The other aspect is iterating through your jpg files, which you can do with os and fnmatch. Just adjust the root directory/folder for it to walk through while it searches for all the .jpg extensions.
from sightengine.client import SightengineClient
import os
from fnmatch import fnmatch
import json

client = SightengineClient("{api_user}", "{api_secret}")

# Get your jpg files into a list
r = 'C:/path/to/local'
pattern = "*.jpg"
filenames = []
for path, subdirs, files in os.walk(r):
    for name in files:
        if fnmatch(name, pattern):
            #print (path+'/'+name)
            filenames.append(path+'/'+name)

# Now iterate through those jpg files
jsonData = []
for file in filenames:
    output = client.check('properties','type').set_file(file)
    jsonData.append(output)

with open('C:/result.json', 'w') as fp:
    json.dump(jsonData, fp, indent=2)
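As a side note, the os.walk/fnmatch combination works fine, but pathlib from the standard library does the same recursive search in one line. A sketch on a throwaway directory:

```python
import pathlib
import tempfile

# build a small throwaway tree to search
root = pathlib.Path(tempfile.mkdtemp())
(root / "sub").mkdir()
(root / "a.jpg").touch()
(root / "sub" / "b.jpg").touch()
(root / "notes.txt").touch()

# recursive glob replaces the os.walk + fnmatch loop
filenames = sorted(str(p) for p in root.rglob("*.jpg"))
print(len(filenames))  # 2
```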
I am new to Python and I am using urllib2 to download files over the internet.
I am using this code
import urllib2
response = urllib2.urlopen('http://www.example.com/myfile.zip')
...
This code actually saves the zip file to my temp folder, which I don't want; I want to save it to a location of my choice. Is that possible?
You can use the urllib.urlretrieve function to download the remote file to your local filesystem.
>>> import urllib
>>> urllib.urlretrieve('http://www.example.com/myfile.zip', 'path/to/download/dir/myfile.zip')
See the urllib.urlretrieve documentation for more info.
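For anyone on Python 3, the function moved to urllib.request.urlretrieve. A sketch of the same call, exercised against a file:// URL so it runs without network access:

```python
import pathlib
import tempfile
from urllib.request import urlretrieve  # Python 3 home of urlretrieve

# stand-in for the remote file: a local file reachable via a file:// URL
src = pathlib.Path(tempfile.mkdtemp()) / "myfile.zip"
src.write_bytes(b"zip-content-here")

# download to a destination of your choosing
dest = pathlib.Path(tempfile.mkdtemp()) / "myfile.zip"
urlretrieve(src.as_uri(), str(dest))
print(dest.read_bytes())  # b'zip-content-here'
```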
Simply use something like this:
import urllib

f = open("path_to_your_file_to_save", 'wb')  # binary mode, a zip is not text
f.write(urllib.urlopen(url).read())
f.close()
where path_to_your_file_to_save equals [path_where_save] + [filename.ext]