I want to zip a CSV file in memory (in a buffer) using zipfile in Python.
Below is the code I have tried, with the error log attached.
I don't want to use the compression option in df.to_csv because of a version issue.
import pandas as pd
import numpy as np
import io
import zipfile
df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
s_buf = io.StringIO()
df.to_csv(s_buf,index=False)
s_buf.seek(0)
s_buf.name = 'my_filename.csv'
localfile= io.BytesIO()
localzip = io.BytesIO()
zf = zipfile.ZipFile(localzip, mode="w",compression=zipfile.ZIP_DEFLATED)
zf.writestr(localfile, s_buf.read())
zf.close()
with open("D:/my_zip.zip", "wb") as f:
    f.write(zf.getvalue())
The error I am getting:
Traceback (most recent call last):
File "C:/Users/Window/PycharmProjects/dfZip/dfZiptest.py", line 25, in <module>
zf.writestr(localfile, s_buf.read())
File "C:\Python\Python37\lib\zipfile.py", line 1758, in writestr
date_time=time.localtime(time.time())[:6])
File "C:\Python\Python37\lib\zipfile.py", line 345, in __init__
null_byte = filename.find(chr(0))
AttributeError: '_io.BytesIO' object has no attribute 'find'
zf = zipfile.ZipFile("localzip.zip", mode="w", compression=zipfile.ZIP_DEFLATED)
zf.writestr(filename + '.csv', s_buf.read())
zf.close()
What you are doing here is:
1 - You initialize the ZipFile with the path of the archive.
2 - You pass the name the entry should have inside the archive, followed by the data to write. In your case you were passing an io.BytesIO() object as the name, which made no sense to Python, hence the error.
I would strongly advise you to resolve any version issues first, because while a 'clever' solution may seem like a quick way out, such workarounds tend to rack up technical debt later, which can be a nightmare to deal with.
You are passing an io.BytesIO() object as the first argument to ZipFile.writestr(), where it expects either an archive name or a ZipInfo object.
zf.writestr(localfile, s_buf.read())
zinfo_or_arcname is either the file name it will be given in the
archive, or a ZipInfo instance.
source: Docs
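Putting the two points together, here is a minimal sketch of zipping CSV text entirely in memory. It assumes the CSV is already available as a string (for example from the question's s_buf.getvalue()); the helper name zip_csv_text and the sample data are hypothetical. Note that the finished archive bytes come from the BytesIO buffer, since ZipFile itself has no getvalue() method.

```python
import io
import zipfile


def zip_csv_text(csv_text, arcname="my_filename.csv"):
    """Return the bytes of a ZIP archive that stores csv_text under arcname."""
    zip_buf = io.BytesIO()
    # The first argument to writestr() is the entry name (a str), not a buffer.
    with zipfile.ZipFile(zip_buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(arcname, csv_text)
    # Read the bytes only after the archive is closed, so the central
    # directory has been written to the buffer.
    return zip_buf.getvalue()


# Hypothetical sample data standing in for the DataFrame's CSV output.
csv_text = "A,B\n1,2\n3,4\n"
zip_bytes = zip_csv_text(csv_text)
```

The returned bytes can then be written to disk with open(path, "wb") or sent elsewhere without ever creating an intermediate CSV file.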
Related
I have a folder named sampledata with 4 Excel files (i.e., 'b.xlsx', 'call.xlsx', 'Daily.xlsx', 'Whatsapp metadata.xlsx'). I want to read the content of each Excel file in Python.
Can anyone help me?
fi contains the path of each file.
import os
import xlrd
import pandas as pd
path='/Users/user78/Downloads/references/forensic/sampledata/'
root,dir,files=next(os.walk(path),[])
print(files)
excel_count=0
text_count=0
excel_files=[]
text_files=[]
for file in files:
    fi=os.path.join(root,file)
    print(type(fi))
    print(fi)
    with open(fi,'r') as f:
        workbook=xlrd.open_workbook(f)
        sheet=workbook.sheet_by_index(0)
        for i in range(sheet.ncols):
            print(sheet.cell_value(0,i))
Above is my code and the resulting error is given below.
['b.xlsx', 'call.xlsx', 'Daily.xlsx', 'Whatsapp metadata.xlsx', '~$call metadata.xlsx', '~$Whatsapp metadata.xlsx']
<class 'str'>
/Users/user78/Downloads/references/forensic/sampledata/b.xlsx
Traceback (most recent call last):
File "C:\Users\user78\Downloads\references\forensic\sample.py", line 28, in <module>
workbook=xlrd.open_workbook(f)
File "C:\Users\user78\AppData\Local\Programs\Python\Python36\lib\site-packages\xlrd\__init__.py", line 110, in open_workbook
filename = os.path.expanduser(filename)
File "C:\Users\user78\AppData\Local\Programs\Python\Python36\lib\ntpath.py", line 312, in expanduser
path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not _io.TextIOWrapper
Can anyone help me?
xlrd.open_workbook() expects a filename as the first parameter, not an already opened file as in your case. Try xlrd.open_workbook(fi) and remove the with-context.
The xlrd PyPI page contains simple example code for reference.
However, as already mentioned by Shijith and also clearly stated in the library's description, xlrd should only be used for the "old" Excel (.xls) files.
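Note that the directory listing in the question also contains names starting with ~$, which are the lock files Excel creates while a workbook is open; trying to parse them as workbooks would likely fail. A minimal sketch of filtering the listing before opening anything, using only the standard library (the helper name list_excel_files is hypothetical):

```python
import os


def list_excel_files(folder):
    """Return full paths of .xlsx/.xls files in folder, skipping '~$' lock files."""
    paths = []
    for name in sorted(os.listdir(folder)):
        # Excel writes '~$<name>' owner files next to open workbooks.
        if name.startswith("~$"):
            continue
        if not name.lower().endswith((".xlsx", ".xls")):
            continue
        full = os.path.join(folder, name)
        if os.path.isfile(full):
            paths.append(full)
    return paths
```

Each returned path is a plain string, so it can be passed directly to xlrd.open_workbook() (for .xls files) or to pandas.read_excel().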
You can do that by using pandas.read_excel().
Code Structure
According to the pandas.read_excel() documentation, the signature is:
pandas.read_excel(io, sheet_name=0, header=0,
    names=None, index_col=None, usecols=None,
    squeeze=False, dtype=None, engine=None,
    converters=None, true_values=None, false_values=None,
    skiprows=None, nrows=None, na_values=None,
    keep_default_na=True, na_filter=True, verbose=False,
    parse_dates=False, date_parser=None, thousands=None,
    comment=None, skipfooter=0, convert_float=True,
    mangle_dupe_cols=True, storage_options=None)
Code Syntax
import pandas as pd
b_df = pd.read_excel('sampledata/b.xlsx')
b_df_list = b_df.values.tolist() # if you want to convert the data into a list of rows
call_df = pd.read_excel('sampledata/call.xlsx')
call_df_list = call_df.values.tolist()
....
import gzip
input_file = open("example.bed","rb")#compress existing file
data = input_file.read()
with gzip.open("example.bed.gz", "wb") as filez:
    filez.write(data)
filez.close()
fileopen= gzip.open("example.bed.gz", "r+")
output=fileopen.read()
decode=output.decode("utf-8")
import pandas as pd
df=pd.read_csv(decode, delimiter='\t',header=1 )#Works partially, missing '\t' at start of file throws error
df.to_csv('exampleziptotxt.bed', index=False)
I am converting a gzipped .bed file to a .txt file using pandas. The file starts with chr8\t..., and because there is no leading \t, the error FileNotFoundError: [Errno 2] File chr8 is returned. Any advice on how to correct the input file so that it includes a leading \t?
read_csv() accepts either a filename or a file-like object.
You're passing it the content of your file as a string, which it attempts to interpret as a filename, trying to open something called chr8, and fails.
Instead just pass the gzip file handle to the function (as it's a file-like object):
import gzip
import pandas as pd
with gzip.open("example.bed.gz", "rt") as fileopen:
    df = pd.read_csv(fileopen, delimiter="\t", header=1)
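The same distinction between a file name and a file-like object applies outside pandas. As a minimal sketch using only the standard library (the helper name read_gzipped_tsv is hypothetical), the open gzip handle is passed straight to the parser, so the decompressed text is never mistaken for a path:

```python
import csv
import gzip


def read_gzipped_tsv(path):
    """Return the rows of a gzip-compressed, tab-separated file as lists of strings."""
    # "rt" opens the archive in text mode, so the reader receives str lines.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))
```

Because the handle is consumed while the with-block is still open, there is no need to read the whole file into a string first.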
I have a large number of txt files that contain the base64 encoding for image files. Each txt file has a single encoding line starting with "data:image/jpeg;base64,/9j/.........". I got the following to work as far as saving the image:
import base64
import os
import fnmatch
os.chdir(r'D:\Users\dubs\slidesets')
with open('data.image.jpeg.0bac61939da0c.txt', 'r') as file:
    str = file.read().replace('data:image/jpeg;base64,', '')
print str
picname = open("data.image.jpeg.0bac61939da0c.jpg", "wb")
picname.write(str.decode('base64'))
picname.close()
My end goal would be to look in a directory for any txt file with "jpeg" in the name, get and edit the string from it, change to image, and save the image in the same directory with the same filename ('data.image.jpeg.0bff54917a8c7.txt' to 'data.image.jpeg.0bff54917a8c7.jpg').
import fnmatch
import os
import base64
os.chdir(r'D:\Users\dubs\slidesets')
for file in os.listdir(r'D:\Users\dubs\slidesets'):
    if fnmatch.fnmatch(file, "*jpeg*.txt"):
        newname = os.path.basename(file).replace(".txt", ".jpg")
        with open(file, 'r') as file:
            str = file.read().replace('data:image/jpeg;base64,', '')
        picname = open("newname", "wb")
        picname.write(str.decode('base64'))
        picname.close()
The error that I am getting:
Traceback (most recent call last):
File "<stdin>", line 7, in <module>
AttributeError: 'str' object has no attribute 'decode'
I tried "newname", 'newname' and newname because I was unsure how that works with a variable, but that didn't help. I'm not sure why it works for one file in my top code but not in the loop.
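The traceback suggests the loop was run under Python 3, where str has no decode method; there, base64.b64decode() is the usual replacement for str.decode('base64'). A minimal Python 3 sketch of the loop under that assumption (the function name convert_txt_to_jpg is hypothetical, and the output name is used as a variable rather than as the literal string "newname"):

```python
import base64
import fnmatch
import os

PREFIX = "data:image/jpeg;base64,"


def convert_txt_to_jpg(folder):
    """Decode every '*jpeg*.txt' data-URI file in folder to a .jpg beside it."""
    written = []
    for name in sorted(os.listdir(folder)):
        if not fnmatch.fnmatch(name, "*jpeg*.txt"):
            continue
        with open(os.path.join(folder, name), "r") as src:
            encoded = src.read().strip()
        # Drop the data-URI header so only the base64 payload remains.
        if encoded.startswith(PREFIX):
            encoded = encoded[len(PREFIX):]
        # Same file name, with the .txt extension swapped for .jpg.
        out_path = os.path.join(folder, name[:-len(".txt")] + ".jpg")
        with open(out_path, "wb") as dst:
            dst.write(base64.b64decode(encoded))
        written.append(out_path)
    return written
```

Calling convert_txt_to_jpg(r'D:\Users\dubs\slidesets') would process the directory from the question; building full paths with os.path.join avoids the need for os.chdir.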
I'm trying to download a bz2-compressed tar file and create a tarfile.TarFile object from it.
import MyModule
import StringIO
import tarfile
tardata = StringIO.StringIO()
tardata.write(MyModule.getBz2TarFileData())
tardata.seek(0)
tar = tarfile.open(fileobj = tardata, mode="r:bz2")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/lib/python2.4/tarfile.py", line 896, in open
return func(name, filemode, fileobj)
File "/usr/lib/python2.4/tarfile.py", line 987, in bz2open
pre, ext = os.path.splitext(name)
File "/usr/lib/python2.4/posixpath.py", line 92, in splitext
i = p.rfind('.')
AttributeError: 'NoneType' object has no attribute 'rfind'
According to the docs (http://docs.python.org/library/tarfile.html#tarfile.open), when you pass fileobj= it is used in place of a file opened from name=. However, it looks like it's still trying to access a null file name?
If fileobj is specified, it is used as an alternative to a file object
opened for name. It is supposed to be at position 0.
If I don't use tarfile.open(), and instead decompress the bz2 data and create the tarfile.TarFile object manually, it works with StringIO and fileobj:
>>> import MyModule
>>> import tarfile
>>> import StringIO
>>> import bz2
>>> tardata = StringIO.StringIO()
>>> tardata.write(bz2.decompress(MyModule.getBz2TarFileData()))
>>> tardata.seek(0)
>>> tar = tarfile.TarFile(fileobj=tardata, mode='r')
>>> tar.getmembers()
[<TarInfo 'FileNumber1' at -0x48e150f4>, <TarInfo 'FileNumber2' at -0x48e150d4>, <TarInfo 'FileNumber3' at -0x48e11fb4>]
>>>
I was trying to streamline since tarfile is supposed to support bz2 compression.
I just had a look at tarfile.py on my system. The line numbers were quite different (I have 2.6), so I suppose the module has been reworked heavily since 2.4.
Maybe the module had a bug in 2.4 that has since been fixed, or the interface has changed so that the docs no longer match your module version.
It is just a guess, however.
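For reference, a minimal sketch of the streamlined call on a current Python 3, where the archive bytes live in io.BytesIO rather than StringIO. It builds a small bz2-compressed tar in memory so that it is self-contained; the helper names are hypothetical, and it says nothing about how 2.4 behaves:

```python
import io
import tarfile


def make_bz2_tar(files):
    """Build a bz2-compressed tar archive in memory from a {name: bytes} dict."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_bz2_tar(data):
    """Open bz2-compressed tar bytes via fileobj= and return the member names."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:bz2") as tar:
        return tar.getnames()
```

Here make_bz2_tar only stands in for the download step (MyModule.getBz2TarFileData() in the question); list_bz2_tar is the part that corresponds to the original tarfile.open(fileobj=..., mode="r:bz2") call.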
tempfile.mkstemp() returns:
a tuple containing an OS-level handle to an open file (as would be returned by os.open()) and the absolute pathname of that file, in that order.
How do I convert that OS-level handle to a file object?
The documentation for os.open() states:
To wrap a file descriptor in a "file
object", use fdopen().
So I tried:
>>> import tempfile
>>> tup = tempfile.mkstemp()
>>> import os
>>> f = os.fdopen(tup[0])
>>> f.write('foo\n')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
IOError: [Errno 9] Bad file descriptor
You can use
os.write(tup[0], "foo\n")
to write to the handle.
If you want to open the handle for writing you need to add the "w" mode
f = os.fdopen(tup[0], "w")
f.write("foo")
Here's how to do it using a with statement:
from __future__ import with_statement
from contextlib import closing
import os
import tempfile
fd, filepath = tempfile.mkstemp()
with closing(os.fdopen(fd, 'w')) as tf:
    tf.write('foo\n')
You forgot to specify the open mode ('w') in fdopen(). The default is 'r', causing the write() call to fail.
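A minimal sketch of that fix on Python 3, where the object returned by os.fdopen() can be used directly as a context manager. The helper name write_temp_text and the .txt suffix are hypothetical choices for illustration:

```python
import os
import tempfile


def write_temp_text(text):
    """Create a temp file, write text through a file object, return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    # os.fdopen() wraps the existing descriptor; "w" makes the wrapper writable.
    # Leaving the with-block closes the file object and the descriptor with it.
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    return path
```

The caller is responsible for deleting the file afterwards (for example with os.remove(path)), since mkstemp() never removes it automatically.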
mkstemp() actually opens the file for both reading and writing. Calling fdopen with 'w' does not reopen the file; it wraps the existing descriptor in a file object that allows writing.
temp = tempfile.NamedTemporaryFile(delete=False)
temp.file.write('foo\n')
temp.close()
What's your goal, here? Is tempfile.TemporaryFile inappropriate for your purposes?
I can't comment on the answers, so I will post my comment here:
To write to the temporary file you can use the descriptor returned by tempfile.mkstemp directly. Note that the fourth parameter is the text flag rather than an open mode, so "w" is simply treated as a true value:
f = tempfile.mkstemp("", "", "", "w") # the four params are suffix, prefix, dir, text
os.write(f[0], "write something")