I have an encrypted Excel file that I need to work with. I know how to read data from it using this method:
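import io
import pandas as pd
import msoffcrypto

def decrypt_excel(path_to_excel, password='something'):
    decrypted_file = io.BytesIO()
    with open(path_to_excel, "rb") as file:
        excel_file = msoffcrypto.OfficeFile(file)
        excel_file.load_key(password=password)
        excel_file.decrypt(decrypted_file)
    decrypted_file.seek(0)  # rewind so pandas reads from the start
    return decrypted_file

df = pd.read_excel(decrypt_excel(path_to_excel))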
Reading approach taken from: From password-protected Excel file to pandas DataFrame
Now my question is: how can I write back to such files?
Related
I am trying to read a gzip file using pandas.read_csv like so:
import pandas as pd
df = pd.read_csv("data.ZIP.gz", usecols=[*range(0, 39)], encoding="latin1", skipinitialspace=True)
But it throws this error:
ValueError: Passed header names mismatches usecols
However, if I manually extract the zip file from the gz file, read_csv is able to read the data without errors:
df = pd.read_csv("data.ZIP", usecols=[*range(0, 39)], encoding="latin1", skipinitialspace=True)
Since I have to read a lot of these files, I don't want to extract them manually. How can I fix this error?
You have two levels of compression, gzip and zip, but pandas can only handle one level of compression.
You can use the gzip and zipfile modules with io.BytesIO to extract the data into a file-like object in memory.
Here is minimal working code.
It is useful if the zip contains many files and you want to select which one to extract.
import pandas as pd
import gzip
import zipfile
import io

with gzip.open('data.csv.zip.gz') as f1:
    data = f1.read()
file_like_object_1 = io.BytesIO(data)

with zipfile.ZipFile(file_like_object_1) as f2:
    #print([x.filename for x in f2.filelist])  # list all filenames
    #data = f2.read('data.csv')  # extract selected filename
    #data = f2.read(f2.filelist[0])  # extract first file
    data = f2.read(f2.filelist[0].filename)  # extract first file
file_like_object_2 = io.BytesIO(data)

df = pd.read_csv(file_like_object_2)
print(df)
If the zip contains only one file, you can let read_csv extract it. You have to add the option compression='zip'. A file-like object has no filename, so read_csv cannot use the extension to detect the compression.
import pandas as pd
import gzip
import io

with gzip.open('data.csv.zip.gz') as f1:
    data = f1.read()
file_like_object_1 = io.BytesIO(data)

df = pd.read_csv(file_like_object_1, compression='zip')
print(df)
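The sketch below checks the two-level unwrapping without needing a real data file. It builds a gzip-wrapped zip archive in memory and then reverses it with gzip.decompress and zipfile. It uses only the standard library: the csv module stands in for pandas.read_csv. The helper name read_first_csv_from_zip_gz and the sample data are made up for illustration.

```python
import csv
import gzip
import io
import zipfile


def read_first_csv_from_zip_gz(raw: bytes):
    """Hypothetical helper: unwrap gzip, then zip, and parse the first member as CSV."""
    zip_bytes = gzip.decompress(raw)  # layer 1: gzip
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:  # layer 2: zip
        first_name = zf.namelist()[0]
        text = zf.read(first_name).decode("latin1")  # same encoding as in the question
    return list(csv.reader(io.StringIO(text)))


# Build a sample "data.csv.zip.gz" entirely in memory.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n3,4\n")
sample = gzip.compress(zip_buffer.getvalue())

rows = read_first_csv_from_zip_gz(sample)
print(rows)  # [['a', 'b'], ['1', '2'], ['3', '4']]
```

With pandas, the final step would be pd.read_csv(io.BytesIO(zf.read(first_name)), encoding='latin1') instead of csv.reader.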
Use the gzip module to decompress all your files, something like this:
for file in list_file_names:
    file_name = file.replace(".gz", "")
    with gzip.open(file, 'rb') as f:
        file_content = f.read()
    with open(file_name, "wb") as r:
        r.write(file_content)
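The loop above has two weaknesses. Reading each file with f.read() holds the whole file in memory at once. Also, file.replace(".gz", "") would remove ".gz" anywhere in the name, not only at the end. The variant below avoids both: it streams the decompression with shutil.copyfileobj and strips only a trailing .gz suffix. gunzip_all is a hypothetical name.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_all(paths):
    """Decompress each *.gz file next to the original; return the new paths."""
    outputs = []
    for path in map(Path, paths):
        if path.suffix != ".gz":
            continue  # skip files that are not gzip-compressed
        target = path.with_suffix("")  # "data.ZIP.gz" -> "data.ZIP"
        with gzip.open(path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # copy in chunks instead of one big read()
        outputs.append(target)
    return outputs


# Usage: compress a small file in a temporary folder, then unpack it again.
with tempfile.TemporaryDirectory() as tmp:
    gz_path = Path(tmp) / "data.ZIP.gz"
    with gzip.open(gz_path, "wb") as f:
        f.write(b"payload")
    [out] = gunzip_all([gz_path, Path(tmp) / "notes.txt"])
    out_name, out_bytes = out.name, out.read_bytes()

print(out_name, out_bytes)  # data.ZIP b'payload'
```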
You can use the zipfile module, for example:
import zipfile

with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)
I've just started out with pandas and have managed to convert my xls file into an xlsx file. However, I now want the file to be saved to a different location, such as OneDrive. Could you help me out?
Here is the code I have written for it:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
#Deleting original file
path = (r"C:\Users\MQ\Downloads\Incident Report.xls")
os.remove(path)
print("Original file has been deleted :)")
#Identifying the xls file
excel_file_1 = 'Incident Report.xls'
df_first_shift = pd.read_excel(r'C:\Users\MQ\3D Objects\New Folder\Incident Report.xls')
print(df_first_shift)
#combining data
df_all = pd.concat([df_first_shift])
print(df_all)
#Creating the .xlsx file
df_all.to_excel("Incident_Report.xlsx")
Use pd.ExcelWriter by passing in your destination path!
destination_path = "path\\to\\your\\onedrive\\filename.xlsx"
# writer.save() was removed in pandas 2.0; the context manager saves and closes the file
with pd.ExcelWriter(destination_path, engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='sheetname')
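If OneDrive is synced to the local disk, saving into the synced folder is enough for the client to upload the file. On Windows, the OneDrive client usually exposes that folder through an environment variable named OneDrive. The sketch below assumes that and falls back to a guessed ~/OneDrive otherwise. The helper name onedrive_destination and the "Reports" subfolder are hypothetical.

```python
import os
from pathlib import Path


def onedrive_destination(filename, subfolder="", env=None):
    """Return a path inside the local OneDrive sync folder (location is an assumption)."""
    env = os.environ if env is None else env
    # Assumption: the Windows OneDrive client sets the "OneDrive" variable;
    # if it is missing, fall back to a guessed ~/OneDrive folder.
    root = Path(env.get("OneDrive") or Path.home() / "OneDrive")
    return root / subfolder / filename


# Usage with the question's DataFrame (not run here):
# dest = onedrive_destination("Incident_Report.xlsx", "Reports")
# dest.parent.mkdir(parents=True, exist_ok=True)
# df_all.to_excel(dest)
```

Passing env explicitly makes the lookup easy to test. In normal use, leave it as None so os.environ is read.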
To write to OneDrive in the cloud, I suggest the following code. I have not run it, so treat it as a starting point.
Refer to www.lieben.nu's example for uploading a file to OneDrive.
import requests
import io
import pandas as pd

def cloudOneDrive(filename, bytesIO):
    '''
    Reference : https://www.lieben.nu/liebensraum/2019/04/uploading-a-file-to-onedrive-for-business-with-python/
    Write to cloud (bytesIO). Untested; the UPPERCASE values are placeholders.
    '''
    # 1. Request an access token (client-credentials flow, as in the reference)
    data = {'grant_type': "client_credentials",
            'resource': "https://graph.microsoft.com",
            'client_id': 'XXXXX',
            'client_secret': 'XXXXX'}
    token_url = "https://login.windows.net/YOURTENANTDOMAINNAME/oauth2/token?api-version=1.0"
    token = requests.post(token_url, data=data).json()["access_token"]
    headers = {'Authorization': "Bearer " + token}

    # 2. Upload the bytes into a OneDrive folder via Microsoft Graph
    folder_url = "https://graph.microsoft.com/v1.0/users/YOURONEDRIVEUSERNAME/drive/root:/YOURFOLDER"
    r = requests.put(folder_url + "/" + filename + ":/content", data=bytesIO, headers=headers)
    if r.status_code in (200, 201):
        print("succeeded")
        return True
    print("Fail", r.status_code)
    return False

fn = 'junk.xlsx'
with io.BytesIO() as bio:
    with pd.ExcelWriter(bio) as xio:  # closing the writer finishes the workbook
        df.to_excel(xio, sheet_name='sh1')
    bio.seek(0)  # rewind before uploading
    cloudOneDrive(fn, bio)
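Microsoft Graph simple uploads go to a URL of the form https://graph.microsoft.com/v1.0/users/{user}/drive/root:/{folder}/{filename}:/content. Simple upload is meant for smaller files; large files need an upload session. File names such as "Incident Report.xlsx" contain spaces, so it is safer to percent-encode each path segment. graph_upload_url is a hypothetical helper, and the user and folder values are placeholders.

```python
from urllib.parse import quote

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def graph_upload_url(user, folder, filename):
    """Build a Graph simple-upload URL, percent-encoding each path segment."""
    segments = [s for s in folder.strip("/").split("/") if s] + [filename]
    path = "/".join(quote(s, safe="") for s in segments)
    return f"{GRAPH_ROOT}/users/{quote(user, safe='@')}/drive/root:/{path}:/content"


url = graph_upload_url("someone@example.com", "Reports/2024", "Incident Report.xlsx")
print(url)
# https://graph.microsoft.com/v1.0/users/someone@example.com/drive/root:/Reports/2024/Incident%20Report.xlsx:/content
```

The resulting string is what requests.put(..., data=bio, headers=headers) would be sent to.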
I am trying to convert an xlsb file to xlsx using Python, but after several unsuccessful attempts I still can't figure out what the problem is.
Code:
import pandas as pd
import os
import glob

source = 'C:\\Users\\JS Developer\\sample.xlsb'
dest = 'C:\\Users\\JS Developer\\Desktop\\New folder'
os.chdir(source)

for file in glob.glob("*.xlb"):
    df.to_csv(dest+file+'.csv', index=False)
    os.remove(file)

for file in glob.glob("*.xlsb"):
    df = pd.read_excel(file)
    df.to_csv(dest+file+'.csv', index=False)
    os.remove(file)
Once you have read the Excel file and stored it in a pandas DataFrame, save it with:
df.to_excel(r'Path\name.xlsx')
Try:
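for file in glob.glob("*.xlsb"):
    df = pd.read_excel(file, engine='pyxlsb')  # reading .xlsb needs the pyxlsb package
    out_name = os.path.splitext(file)[0] + '.xlsx'  # sample.xlsb -> sample.xlsx
    df.to_excel(os.path.join(dest, out_name), index=False, header=True)
    os.remove(file)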
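Two details in these loops are easy to get wrong. First, dest+file has no path separator between the folder and the file name. Second, file+'.xlsx' produces names like sample.xlsb.xlsx. The stdlib sketch below builds the output path correctly. converted_path is a hypothetical helper name.

```python
import os


def converted_path(dest_dir, source_file, new_ext=".xlsx"):
    """Put source_file's base name, with its extension swapped, inside dest_dir."""
    base = os.path.splitext(os.path.basename(source_file))[0]  # "sample.xlsb" -> "sample"
    return os.path.join(dest_dir, base + new_ext)


print(converted_path("out", "sample.xlsb"))          # out/sample.xlsx on Linux/macOS
print(converted_path("out", "sample.xlsb", ".csv"))  # out/sample.csv on Linux/macOS
```

Inside the loop, this becomes df.to_excel(converted_path(dest, file), index=False).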
I want to take a PDF file as input and produce a CSV file as output, so that all the text in the PDF is converted to CSV. I don't understand how to do this. I've tried but couldn't get it to work, so I'd appreciate help soon.
What I've done is use a library called tabula-py, which converts PDF to CSV. It does create a CSV file, but none of the content from the PDF is copied into it.
Here's the code:
from tabula import convert_into,read_pdf
import tabula
df = tabula.read_pdf("crimestory.pdf", spreadsheet=True,
pages='all',output_format="csv")
df.to_csv('crimestoryy.csv', index=False)
The output should be a CSV file containing the data.
What I get is a blank CSV file.
I found an answer to this question on my own.
To tackle this, I converted the PDF file into a text file and then converted that text file into a CSV file. Here's my code:
conversion.py
import os.path
import csv
import pdftotext

save_path = "/home/mayureshk/PycharmProjects/NLP/"
completeName_in = os.path.join(save_path, 'crimestory' + '.txt')
completeName_out = os.path.join(save_path, 'crimestoryycsv' + '.csv')

# Load your PDF
with open("crimestory.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# Save all text to a txt file in save_path, where the next step reads it
with open(completeName_in, 'w') as f:
    f.write("\n\n".join(pdf))

# Read the text back as comma-separated rows and write them out as CSV
with open(completeName_in, newline='') as file1, open(completeName_out, 'w', newline='') as file2:
    in_text = csv.reader(file1, delimiter=',')
    out_csv = csv.writer(file2)
    out_csv.writerows(in_text)
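Note that csv.reader with delimiter=',' only splits lines on commas, so plain text usually ends up with one column per line. The extracted text may instead keep table columns separated by runs of spaces. Assuming that layout, the sketch below splits each line on two or more spaces and skips blank lines. lines_to_csv is a hypothetical name, and the sample text is made up.

```python
import csv
import io
import re


def lines_to_csv(text):
    """Turn column-aligned text into CSV, splitting on runs of 2+ whitespace chars."""
    out = io.StringIO()
    writer = csv.writer(out)
    for line in text.splitlines():
        if not line.strip():
            continue  # drop blank lines (e.g. the "\n\n" page separators)
        writer.writerow(re.split(r"\s{2,}", line.strip()))
    return out.getvalue()


sample = "Name    City     Year\nAlice   Paris    2019\n\nBob     Rome     2020\n"
print(lines_to_csv(sample))
```

The csv writer quotes fields that contain commas, which the plain comma-splitting approach cannot handle. To save the result, write lines_to_csv(text) to a file opened with newline=''.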
Try this; hopefully it works:
import tabula
# convert PDF into CSV
tabula.convert_into("crimestory.pdf", "crimestory.csv", output_format="csv", pages='all')
or
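import pandas as pd
# tabula-py >= 2.0 returns a list of DataFrames (one per table); lattice=True replaces the older spreadsheet=True
dfs = tabula.read_pdf("crimestory.pdf", encoding='utf-8', lattice=True, pages='all')
pd.concat(dfs).to_csv('crimestory.csv', encoding='utf-8', index=False)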
or
from tabula import read_pdf
df = read_pdf("crimestory.pdf")
df
# make sure df shows your PDF contents in the output
from tabula import convert_into
convert_into("crimestory.pdf", "crimestory.csv", output_format="csv")
# in Jupyter, print the resulting file
!cat crimestory.csv
I'm trying to write some Chinese characters to a CSV file using an io.StringIO object.
Here is my code:
import csv
import io
csvRow=['emp_name','Erica Meyers','中国日报网','IT']
data_temp = io.StringIO()
writer1 = csv.writer(data_temp, delimiter=',')
writer1.writerow(csvRow)
and then attaching the data_temp object to a Jira issue:
thisJira.add_attachment(issue=new_issue, attachment=data_temp, filename='CSVResult.csv')
In the CSV file, I get garbled characters instead of the Chinese text (screenshot: Csv Result).
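A frequent cause of this symptom is that Excel opens a UTF-8 CSV without a byte-order mark using a legacy code page. Assuming that is what happens here, one option is to encode the CSV text explicitly as 'utf-8-sig' (UTF-8 with a BOM) and attach bytes rather than the StringIO object. The sketch uses only the standard library. csv_bytes is a hypothetical name, and the Jira call is shown only as a comment.

```python
import csv
import io


def csv_bytes(rows, encoding="utf-8-sig"):
    """Render rows as CSV text, then encode it; utf-8-sig prepends a BOM for Excel."""
    text = io.StringIO()
    csv.writer(text).writerows(rows)
    return io.BytesIO(text.getvalue().encode(encoding))


csvRow = ['emp_name', 'Erica Meyers', '中国日报网', 'IT']
attachment = csv_bytes([csvRow])
# thisJira.add_attachment(issue=new_issue, attachment=attachment, filename='CSVResult.csv')
```

A freshly created BytesIO is positioned at the start, so whatever reads the attachment gets the whole content.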