Why are Excel files uploaded as zip files? - python

I have an Excel sheet called last_run.xlsx, and I use a small Python script to upload it to Slack, as follows:
import os
import slack
slack_token = "XXX"  # placeholder for the real bot/user token
client = slack.WebClient(token=slack_token)
response = client.files_upload(
    channels="#viktor",
    file="last_run.xlsx")
But when I receive it on Slack, it is a weird zip file and not an Excel file anymore. Any idea what I am doing wrong?

Excel (.xlsx) files are actually zipped collections of XML documents, so it appears that Slack's automatic file-type detection recognizes the upload as a ZIP file for that reason.
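To see why content-based detection lands on ZIP, here is a minimal sketch that checks a file's leading bytes and lists the parts stored inside it. It assumes only the standard library; `looks_like_zip` and `list_parts` are hypothetical helper names, not part of any Slack API.

```python
import zipfile

# A non-empty ZIP archive (and therefore an .xlsx workbook) normally starts
# with a local file header whose signature is the four bytes b"PK\x03\x04".
ZIP_MAGIC = b"PK\x03\x04"

def looks_like_zip(path):
    """Return True if the file at `path` starts with the ZIP signature."""
    with open(path, "rb") as f:
        return f.read(4) == ZIP_MAGIC

def list_parts(path):
    """List the member files stored inside an .xlsx workbook (or any ZIP)."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Running `looks_like_zip("last_run.xlsx")` on a real workbook should return True, and `list_parts` should show XML parts such as `[Content_Types].xml` and `xl/workbook.xml`, which is exactly what a detector sniffing the file contents would see.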
Manually specifying xlsx as the filetype does not change that behavior either.
What works is also specifying a filename. Then the file is correctly recognized and uploaded as an Excel file.
Code:
import os
import slack
client = slack.WebClient(token="MY_TOKEN")
response = client.files_upload(
    channels="#viktor",
    file="last_run.xlsx",
    filename="last_run.xlsx")
This looks like a bug in the automatic file-type detection to me, so I would suggest submitting a bug report to Slack about this behavior.
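If the file lives in a subdirectory, the `filename` argument should be just the base name rather than the whole path. A minimal sketch of deriving it, assuming the same `files_upload` keyword arguments used above; `upload_kwargs` is a hypothetical helper, not part of the Slack client.

```python
import os

def upload_kwargs(path, channel):
    """Build keyword arguments for client.files_upload(...).

    `filename` is set explicitly (the workaround described above) to the
    base name of `path`, so "reports/last_run.xlsx" becomes "last_run.xlsx".
    """
    return {
        "channels": channel,
        "file": path,
        "filename": os.path.basename(path),
    }
```

Usage would look like `client.files_upload(**upload_kwargs("reports/last_run.xlsx", "#viktor"))`, which keeps the local path and the displayed name in sync without typing the name twice.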

Related

Python code to extract single/multiple attachments (.pdf/.png) from .msg files

I have a master folder containing 10-15 .msg files.
Each file may or may not have attachments in either PDF or PNG format.
Is there any Python code to extract those attachments?
P.S. I already tried pywin32, but it is specific to Windows.
I am looking to run my code in a Linux/Ubuntu terminal.
This can be done with the package extract_msg, as shown below in the form of an MWE (without looping over all mail files, without handling overwrites caused by duplicate filenames, etc.).
import extract_msg
with extract_msg.openMsg('Mail.msg') as msg:
    for attm in msg.attachments:
        file = attm.save()
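The MWE deliberately skips looping over the folder, filtering by type, and avoiding overwrites. A minimal standard-library sketch of those three pieces follows; `msg_files`, `wanted` and `unique_path` are hypothetical helper names, and they do not call extract_msg themselves.

```python
from pathlib import Path

def msg_files(folder):
    """Return every .msg file directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".msg"
    )

def wanted(name, exts=(".pdf", ".png")):
    """True if an attachment name has one of the requested extensions."""
    return Path(name).suffix.lower() in exts

def unique_path(folder, name):
    """Return folder/name, or folder/stem_1.ext, stem_2.ext, ... if taken."""
    folder = Path(folder)
    candidate = folder / name
    stem, suffix = candidate.stem, candidate.suffix
    n = 1
    while candidate.exists():
        candidate = folder / f"{stem}_{n}{suffix}"
        n += 1
    return candidate
```

These would be combined with the extract_msg loop above, for example by writing each wanted attachment's bytes to `unique_path(out_dir, name)`. The attribute names for an attachment's file name and raw bytes (commonly `longFilename` and `data`) should be checked against the installed extract_msg version; they are an assumption here.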

How can I read a csv file from Dropbox Online?

How can I read a csv file from Dropbox Online (without downloading it to local machine)?
I installed and imported dropbox in Python, created an app, and got a token from "https://www.dropbox.com/developers/apps". Although I have read some resources online, I am still unsure how to proceed from here.
Line 149 of the example below provides a function that downloads a file and returns it as a byte string:
https://github.com/dropbox/dropbox-sdk-python/blob/main/example/updown.py
Then just parse the byte string, for example with pandas:
import io
import pandas as pd
from example.updown import download # import the download function from updown.py, or copy or replicate that function; I'm only providing pseudo code here
file_as_bytes = download(dbx, folder, subfolder, name)
df = pd.read_csv(io.BytesIO(file_as_bytes))  # read_csv needs a path or file-like object, not raw bytes
print(df)
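Without copying updown.py, the same in-memory flow can be sketched directly with the SDK's `files_download`, which returns the file metadata plus an HTTP response whose `.content` holds the bytes. `read_dropbox_csv` and `csv_bytes_to_df` are hypothetical names, and the Dropbox path is a placeholder.

```python
import io

import pandas as pd

def csv_bytes_to_df(data):
    """Parse CSV content held in memory as bytes into a DataFrame."""
    return pd.read_csv(io.BytesIO(data))

def read_dropbox_csv(token, path):
    """Download `path` from Dropbox into memory and parse it as CSV."""
    import dropbox  # imported here so csv_bytes_to_df works without the SDK
    dbx = dropbox.Dropbox(token)
    _metadata, response = dbx.files_download(path)
    return csv_bytes_to_df(response.content)
```

A call such as `df = read_dropbox_csv("YOUR_TOKEN", "/folder/data.csv")` never writes the file to the local disk; the bytes go straight from the HTTP response into pandas.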
As suggested here, I think you can just read the file directly from its URL (but you need to change the query parameter from ?dl=0 to ?dl=1),
e.g.:
df = pd.read_csv("https://www.dropbox.com/s/asdfasdfasfasdf/movie_data.csv?dl=1")
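Rather than editing the link by hand, the `dl` parameter can be forced to 1 with `urllib.parse`. A minimal sketch; `direct_download_url` is a hypothetical helper, and it keeps any other query parameters the shared link already carries.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def direct_download_url(share_url):
    """Return `share_url` with its dl query parameter set to 1."""
    parts = urlparse(share_url)
    query = dict(parse_qsl(parts.query))  # e.g. {"dl": "0"}
    query["dl"] = "1"
    return urlunparse(parts._replace(query=urlencode(query)))
```

It can then be passed straight to pandas: `pd.read_csv(direct_download_url(link))`.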

Unable to open saved Excel file using urllib.request.urlretrieve (sample link included)

Currently, I'm using Flask with Python 3.
For sample purposes, here is a Dropbox link.
To fetch the file and save it, I'm doing the following:
urllib.request.urlretrieve("https://www.dropbox.com/s/w1h6vw2st3wvtfb/Sample_List.xlsx?dl=0", "Sample_List.xlsx")
The file is saved successfully to my project's root directory; however, there is a problem: when I try to open the file, I get this error.
What am I doing wrong over here?
Also, is there a way to get the file name and extension from the URL itself? For example, filename = Sample_List and extension = xlsx, or something like that.
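For the second question, the name and extension can be taken from the last segment of the URL's path, ignoring the query string. A minimal sketch using only the standard library; `name_and_extension` is a hypothetical helper. Separately, a Dropbox link ending in ?dl=0 usually serves the HTML preview page rather than the file itself, so the saved "Sample_List.xlsx" may well be HTML; ?dl=1 requests the raw file.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

def name_and_extension(url):
    """Return (stem, extension) of the last path segment of `url`.

    The query string (e.g. "?dl=0") is ignored and percent-escapes such as
    %20 are decoded. The extension is returned without its leading dot.
    """
    last = PurePosixPath(unquote(urlparse(url).path))
    return last.stem, last.suffix.lstrip(".")
```

For the sample link this gives `("Sample_List", "xlsx")`, which could also be reused as the local file name passed to `urlretrieve`.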

Python: Get zip file from Google Drive API and load its content

I have a zip file on my Google Drive. Inside that zip file is an XML file, which I want to parse, extract specific information from, and save that information on my local computer (or wherever).
My goal is to use Python and the Google Drive API (with the help of PyDrive) to achieve this. The workflow could be as follows:
1. Connect to my Google Drive via the Google Drive API (PyDrive)
2. Get my zip file's ID
3. Load my zip file into memory
4. Unzip it and obtain the XML file
5. Parse the XML and extract the desired information
6. Save it as a CSV on my local computer
Right now, I am able to do steps 1, 2, 4, 5 and 6. But I don't know how to load the zip file into memory without writing it to my local HDD first.
The following PyDrive code obtains the zip file and places it on my local HDD, which is not exactly what I want:
toUnzip = drive.CreateFile({'id':'MY_FILE_ID'})
toUnzip.GetContentFile('zipstuff.zip')
I guess one solution could be as follows.
I could read the zip file as a string with some encoding:
toUnzip = drive.CreateFile({'id':'MY_FILE_ID'})
zipAsString = toUnzip.GetContentString(encoding='??')
and then somehow (no idea how; perhaps StringIO could be useful) read this string with Python's zipfile library. Is this solution even possible? Is there a better way?
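You could try an in-memory file object (StringIO in Python 2, or io.BytesIO for binary data such as a zip archive in Python 3); these emulate files but reside in memory.
Here is the code from a related SO post: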
# get_zip_data() gets a zip archive containing 'foo.txt', reading 'hey, foo'
# (Python 3 version: zip data is binary, so io.BytesIO replaces StringIO)
import zipfile
from io import BytesIO
zipdata = BytesIO()
zipdata.write(get_zip_data())
myzipfile = zipfile.ZipFile(zipdata)
foofile = myzipfile.open('foo.txt')
print(foofile.read().decode())
# output: hey, foo
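or using a URL:
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile
url = urlopen("http://www.test.com/file.zip")
myzipfile = ZipFile(BytesIO(url.read()))  # avoid shadowing the zipfile module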
Hope this helps.
Eventually, I solved it using BytesIO and the cp862 encoding:
from io import BytesIO
import zipfile
toUnzipStringContent = toUnzip.GetContentString(encoding='cp862')
toUnzipBytesContent = BytesIO(toUnzipStringContent.encode('cp862'))
readZipfile = zipfile.ZipFile(toUnzipBytesContent, "r")
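Once the raw zip bytes are in memory (for example via the cp862 round trip above), steps 3 to 6 of the workflow can be sketched with the standard library alone. The member name, the tag to extract and the CSV layout are hypothetical placeholders; `extract_to_csv` is not part of PyDrive.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

def extract_to_csv(zip_bytes, xml_name, tag, csv_path):
    """Open a zip held in memory, parse one XML member, write matches to CSV.

    Every element named `tag` becomes a (tag, text) row. Nothing except the
    final CSV is written to disk.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(xml_name) as xml_file:
            root = ET.parse(xml_file).getroot()
    rows = [(el.tag, (el.text or "").strip()) for el in root.iter(tag)]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["tag", "text"])
        writer.writerows(rows)
    return rows
```

With the accepted solution, the input would be `toUnzipStringContent.encode('cp862')`. Any single-byte codec that maps all 256 byte values one-to-one, such as latin-1, round-trips binary data the same way.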

Reading mbox files with mbox Python module

Good afternoon, I'm working on a kind of spam filter in Python, and I've downloaded some spam and ham emails from this corpus:
https://spamassassin.apache.org/publiccorpus/
This is the code I wrote to read the mbox files:
import os
import mailbox
import sys
import pprint
print("Reading emails:")
for mbox_file in os.listdir(os.getcwd()+"/spam"):
    print("Processing "+mbox_file)
    mbox = mailbox.mbox(mbox_file)
    for message in mbox:
        print(message['from'])
The thing is that it apparently does not recognize the files, because it never reads anything at all. I created a separate .mbox file by copying the contents of one of the files, and it was read perfectly. I also tried reading the files with read(), which throws an error saying that the file does not exist. I do not know what I'm missing; any help would be appreciated. Thanks for your time.
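A likely cause: `os.listdir` returns bare file names, so `mailbox.mbox(mbox_file)` looks in the current directory instead of in spam/, and because `mbox` creates a missing file by default, it silently opens a new, empty mailbox (which is also why a plain `read()` reports that the file does not exist). A minimal sketch of the loop with full paths and `create=False`; `senders_in_folder` is a hypothetical name, and it assumes each corpus file begins with a "From " separator line.

```python
import mailbox
import os

def senders_in_folder(folder):
    """Yield (file name, From header) for every message in every file of `folder`."""
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)  # full path, not just the bare name
        if not os.path.isfile(path):
            continue
        # create=False makes a wrong path raise NoSuchMailboxError instead of
        # silently creating (and reading) a new, empty mailbox.
        box = mailbox.mbox(path, create=False)
        try:
            for message in box:
                yield name, message["from"]
        finally:
            box.close()
```

Usage: `for name, sender in senders_in_folder(os.path.join(os.getcwd(), "spam")): print(name, sender)`. If a file holds a single message without a leading "From " line, `mbox` finds no messages in it; `email.message_from_binary_file` can parse such a file as one message instead.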
