pandas pd.read_excel() returning empty dictionary - python

I am a novice Python programmer and I am having an issue loading an xlsx workbook with the pd.read_excel() function. The pandas read_excel documentation says that specifying 'sheet_name = None' should return "All sheets as a dictionary of DataFrames", however I am getting an empty dictionary back:
template_workbook = pd.read_excel(template_path, sheet_name=None, index_col=None)
template_workbook
Returns:
OrderedDict()
When I try to print the worksheet names in the dictionary:
template_workbook.sheet_name
Returns:
AttributeError                            Traceback (most recent call last)
<ipython-input-67-e76a0b915981> in <module>()
----> 1 template_workbook.sheet_name
AttributeError: 'OrderedDict' object has no attribute 'sheet_name'
It is not clear to me why the worksheets are not being listed in the output dictionary. Any tips are greatly appreciated.
I have 26 tabs/sheets, and am trying to fill 23 using the tab names for indexing.

When you use read_excel with multiple sheets, pandas will return a dictionary:
Returns: DataFrame or Dict of DataFrames
If you have a dictionary, you can use the .keys() method to see the sheet (tab) names, as in:
print(template_workbook.keys())
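As a rough sketch of what a populated result looks like, the snippet below first writes a small two-sheet workbook and then reads it back with sheet_name=None. The file name and the sheet names "Inputs" and "Results" are made up for illustration, and it assumes an .xlsx engine such as openpyxl is installed. The last step uses pd.ExcelFile to list the sheets pandas can see, which may help when the dictionary comes back empty.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example is self-contained.
# The file name and sheet names are hypothetical.
tmp_dir = tempfile.mkdtemp()
template_path = os.path.join(tmp_dir, "template.xlsx")
with pd.ExcelWriter(template_path) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Inputs", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Results", index=False)

# sheet_name=None returns a dict that maps each sheet name to a DataFrame.
template_workbook = pd.read_excel(template_path, sheet_name=None)
sheet_names = list(template_workbook.keys())
print(sheet_names)

# Individual sheets are looked up by key, not by attribute.
inputs = template_workbook["Inputs"]
print(inputs)

# ExcelFile lists the sheets pandas can see without parsing any data.
with pd.ExcelFile(template_path) as xls:
    visible_sheets = xls.sheet_names
print(visible_sheets)
```

If visible_sheets is empty for your real file, the problem is most likely in the workbook itself rather than in the read_excel arguments.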

I found this post through Google as I ran into this same problem. Unfortunately, no errors were thrown which is not very helpful, so I'm posting this answer to help the next person who might find this.
The read_excel function in pandas doesn't support every Excel feature. If your workbook uses some of the more advanced ones (named ranges, for example), your data might not be parsed correctly when pandas tries to read it.
I tried to simplify my Excel file as much as possible which still didn't work, so I created a new Excel Workbook and copied my data in sheet by sheet. This ended up working for me.
So my advice is to keep your Excel file as simple as possible and you'll probably be able to import it with Pandas. If you send over your exact Excel file I'm happy to help debug (I know this is coming years after the question though).

Related

Export SAS lib to csv with correct date format (in CSV file)

I use:
Python 3.7
SAS v7.1 Enterprise
I want to export some data (from library) from SAS to CSV. After that I want to import this CSV to Pandas Dataframe and use it.
I have problem, because when I export data from SAS with this code:
proc export data=LIB.NAME
outfile='path\to\export\file.csv'
dbms=csv
replace;
run;
Every column was exported correctly except the date column. In SAS I see something like:
06NOV2018
16APR2018
and so on. In the CSV it looks the same. But when I import this CSV into a DataFrame, Python unfortunately sees the date column as object/string instead of a date type.
So here is my question: how can I export a whole library from SAS to CSV with the correct column types (especially the date column)? Maybe I should convert something before the export? Please help me with this. I'm new to SAS; I just want to import data from it and use it in Python.
Before you answer, keep in mind that I have already tried the pandas read_sas function, but that call raised the following exception:
df1 = pd.read_sas(path)
ValueError: Unexpected non-zero end_of_first_byte Exception ignored
in: 'pandas.io.sas._sas.Parser.process_byte_array_with_data' Traceback
(most recent call last): File "pandas\io\sas\sas.pyx", line 31, in
pandas.io.sas._sas.rle_decompress
I also wrapped the call in fillna, but it shows the same error:
df = pd.DataFrame.fillna((pd.read_sas(path)), value="")
I tried with sas7bdat module in Python, but I've got the same error.
Then I tried the sas7bdat_converter module, but the CSV it produces has the same values in the date column, so the dtype problem comes back once the CSV is converted to a DataFrame.
Do you have any suggestions? I've spent 2 days trying to figure this out, without any positive results.
Regarding the read_sas error, an issue was reported on GitHub but closed for lack of a reproducible example. However, I can easily import SAS data files with pandas using .sas7bdat files generated from base SAS 9.4 (possibly the v7.1 Enterprise version is the issue).
In the meantime, consider the parse_dates argument of read_csv, as it can convert your DDMMMYYYY dates to datetime during import. No change is needed to your SAS-exported dataset.
sas_df = pd.read_csv(r"path\to\export\file.csv", parse_dates = ['DATE_COLUMN'])
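If the automatic parsing does not recognise the dates in your pandas version, an explicit format string is a possible fallback. The sketch below is only an illustration: the CSV text is a stand-in for the exported file, and DATE_COLUMN and AMOUNT are hypothetical column names. It also assumes English month abbreviations.

```python
import io

import pandas as pd

# Stand-in for the exported file; DATE_COLUMN and AMOUNT are hypothetical names.
csv_text = "DATE_COLUMN,AMOUNT\n06NOV2018,10\n16APR2018,20\n"

sas_df = pd.read_csv(io.StringIO(csv_text))

# %d = day of month, %b = abbreviated month name, %Y = four-digit year.
sas_df["DATE_COLUMN"] = pd.to_datetime(sas_df["DATE_COLUMN"], format="%d%b%Y")

print(sas_df.dtypes)
print(sas_df)
```

Stating the format explicitly avoids any guessing about day/month order and fails loudly if a value does not match.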

Using Python to load template excel file, insert a DataFrame to specific lines and save as a new file

I'm having trouble writing something that I believe should be relatively easy.
I have a template Excel file that has some visualizations on it, spread over a few sheets. I want to write a script that loads the template, inserts rows from an existing DataFrame into specific cells on each sheet, and saves the result as a new Excel file.
The template already has all the cells and the visualizations designed, so I want to insert only the data without changing the design.
I tried several packages and none of them seemed to work for me.
Thanks for your help! :-)
I have written a package for inserting pandas DataFrames into Excel sheets (specific rows/cells/columns). It's called pyxcelframe:
https://pypi.org/project/pyxcelframe/
It has very simple and short documentation, and the method you need is insert_frame.
So, let's say we have a pandas DataFrame called df that we have to insert into the sheet named "MySheet" of the Excel file "MyWorkbook", starting from cell B5. We can just use the insert_frame function as follows:
from pyxcelframe import insert_frame
from openpyxl import load_workbook
workbook = load_workbook("MyWorkbook.xlsx")
worksheet = workbook["MySheet"]
insert_frame(worksheet=worksheet,
             dataframe=df,
             row_range=(5, 0),
             col_range=(2, 0))
A 0 as the second element of row_range or col_range means that no ending row or column is specified. If you need a specific ending row or column, replace the 0 with it.
Sounds like a job for xlwings. You didn't post any test data, but modifying the code below to suit your needs should be quite straightforward.
import xlwings as xw
wb = xw.Book('your_excel_template.xlsx')
wb.sheets['Sheet1'].range('A1').value = df[your_selected_rows]
wb.save('new_file.xlsx')
wb.close()
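For comparison, here is a minimal sketch of the same idea using openpyxl directly, which does not need Excel to be installed. The template is created on the fly so the example runs on its own; the file names, the sheet name "Sheet1", the start cell B5 and the DataFrame contents are all assumptions for illustration. One caveat: as far as I know, openpyxl does not carry charts and images over from an existing workbook when it loads and saves it, so this approach suits templates that contain only cell formatting. xlwings, which drives Excel itself, is likely the safer choice when the visualizations must survive.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Create a stand-in template so the example runs on its own (hypothetical names).
tmp_dir = tempfile.mkdtemp()
template_path = os.path.join(tmp_dir, "your_excel_template.xlsx")
output_path = os.path.join(tmp_dir, "new_file.xlsx")
template = Workbook()
template.active.title = "Sheet1"
template.active["A1"] = "Report title"
template.save(template_path)

df = pd.DataFrame({"item": ["potato", "carrot"], "cost": [4, 2]})

wb = load_workbook(template_path)
ws = wb["Sheet1"]

# Write the values cell by cell, starting at B5 (row 5, column 2).
start_row, start_col = 5, 2
rows = dataframe_to_rows(df, index=False, header=False)
for r_offset, row in enumerate(rows):
    for c_offset, value in enumerate(row):
        ws.cell(row=start_row + r_offset, column=start_col + c_offset, value=value)

# Save under a new name so the template itself is left untouched.
wb.save(output_path)
```

Writing into existing cells this way changes only their values, so number formats and styles already set on those cells in the template are kept.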

Python xlwt produces AttributeError when searching for empty cell in Excel spreadsheet file

I have an Excel file and I am using Python to fill its rows and columns.
I want to use the following function to find the first empty row in the table and fill it:
from xlwt import Workbook, easyxf
def next_available_row(sheet):
    str_list = filter(None, sheet.col_values(1)) # error
    return str(len(str_list)+1)
wb=Workbook()
sheet=wb.add_sheet('sheet1')
sheet.write(0,0,'item')
sheet.write(0,1,'cost')
sheet.write(next_available_row(sheet),0,'potato')
sheet.write(next_available_row(sheet),1,4)
but I get the following error:
AttributeError: 'sheet' object has no attribute 'col_values'
What should I do?
The library you are using, xlwt, only writes .xls spreadsheets. It has no col_values method for reading a sheet's contents, exactly as the error message says.
The function next_available_row() (from How to find the first empty row of a google spread sheet using python GSPREAD?) that you want to use to search for an empty cell is based on a different library, gspread, which works with Google Sheets and apparently not with Excel files (e.g. .xls; note that there are several versions of this file type).
So you probably are looking for an entirely different library, one that reads and writes Excel files.
http://www.python-excel.org/ lists several libraries (including xlwt, which you are using):
https://pypi.python.org/pypi/xlrd
https://pypi.python.org/pypi/xlwt
https://pypi.python.org/pypi/XlsxWriter
https://pypi.python.org/pypi/openpyxl
Or maybe try to manage something by reading the file first, e.g. with xlwt's sister project, xlrd.
It seems the xlwt API has no col_values method: http://xlwt.readthedocs.io/en/latest/api.html
Maybe you can reach your goal by using xlrd alongside it.
http://xlrd.readthedocs.io/en/latest/api.html?highlight=col_values#xlrd-sheet
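If the newer .xlsx format is acceptable, openpyxl (one of the libraries listed above) can both read and write, which removes the need to search for the empty row by hand. The following is a minimal sketch under that assumption, not a drop-in replacement for the xlwt code: append() writes to the row after the last used one, and max_row reports the last row that holds data.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.title = "sheet1"

# append() writes to the first row after the last used one.
sheet.append(["item", "cost"])
sheet.append(["potato", 4])

# max_row is the last row that holds data, so the next free row is one past it.
next_free_row = sheet.max_row + 1
print(next_free_row)
```

With xlwt alone, the usual workaround is to keep your own row counter in Python, since the library cannot read back what it has written.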

Pandas read csv - dealing with mixed named/nameless columns

I am trying to open a csv file using pandas.
This is a screenshot of the file opened in excel.
Some columns have names and some do not. When trying to read this in with pandas I get the "ValueError: Passed header names mismatches usecols" error.
When I open part of the file in excel, add column names, save, and then import with pandas it works.
The problem is the files are large and cannot fully open in excel (plus I'd prefer a more elegant solution anyway).
Is there a way to deal with this issue in pandas?
I have read answers to other questions regarding this error but none were relevant.
Thanks so much in advance!
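The names argument lets you supply the column names yourself (add header=0 if the file's first line is a partial header that should be replaced rather than read as data):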
df = pd.read_csv('pandas_dataframe_importing_csv/example.csv', names=['col1', 'col2', 'col3'], engine='python')
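To make the effect of header=0 together with names concrete, here is a small sketch. The CSV text stands in for the real file (its last two header cells are blank), and the names value_1 and value_2 are hypothetical. It assumes you supply exactly one name per column in the file.

```python
import io

import pandas as pd

# Stand-in for the large file: the last two header cells are blank.
csv_text = "id,name,,\n1,alpha,10,20\n2,beta,30,40\n"

# header=0 tells pandas to discard the file's own header row and use names instead.
# The names below are hypothetical; supply one per column in the file.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    names=["id", "name", "value_1", "value_2"],
)
print(df)
```

Without header=0, pandas treats the first line as data whenever names is passed, so the partial header would show up as the first row of the DataFrame.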

Reading an excel data set saved as CSV file in pandas

There is a very similar question to the one I am about to ask posted here:
Reading an Excel file in python using pandas
However, when I try the solutions posted there, I get:
AttributeError: 'DataFrame' object has no attribute 'read'
All I want to do is convert this Excel sheet into a pandas DataFrame so that I can perform data analysis on some of the subjects in my table. I am super new to this, so any information, advice or feedback would be greatly appreciated.
Here's my code:
import pandas
file = pandas.read_csv('FILENAME.csv', 'rb')
# reads specified file name from my computer in Pandas format
print file.read()
By the way, I also tried running the same query with
file = pandas.read_excel('FILENAME.csv', 'rb') returning the same error.
Finally, when I try to resave the file as a .xlsx I am unable to open the document.
Cheers!
read_csv() already returns a DataFrame, so there is no need to convert it. Just assign the result to a variable.
I think this should work
import pandas as pd #It is best practice to import package with as a short name. Makes it easier to reference later.
file = pd.read_csv('FILENAME.csv')
print (file)
Your error message means exactly what it says: AttributeError: 'DataFrame' object has no attribute 'read'
When you use pandas.read_csv you're actually reading the CSV file into a DataFrame. BTW, you don't need the 'rb': read_csv takes a file path, not a file mode as open() does.
df = pandas.read_csv('FILENAME.csv')
You can print (df) but you can not do print(df.read()) because the dataframe object doesn't have a .read() attribute. This is what's causing your error.
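A short sketch of the point above: the object that read_csv returns is already a DataFrame, so you work with it directly instead of calling .read() on it. The CSV text is a stand-in for FILENAME.csv and the column names are invented for the example.

```python
import io

import pandas as pd

# Stand-in for FILENAME.csv; the column names are invented for the example.
csv_text = "subject,score\nmath,90\nart,75\n"

df = pd.read_csv(io.StringIO(csv_text))

print(type(df))            # the result is already a DataFrame
print(df.head())           # show the first rows
print(df["score"].mean())  # ordinary DataFrame operations work directly

# A DataFrame has no .read() method, which is why df.read() fails.
has_read = hasattr(df, "read")
print(has_read)
```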
