Why does openpyxl take forever to import in Django?

I'm writing (or attempting to write) views to handle importing and exporting xlsx files, but the statement import openpyxl never finishes execution in my view (or anywhere else in my Django application). If I run it from ./manage.py shell it works fine—takes a half-second or so, but works.
My view is as follows, stripped to barebones to make sure there's nothing weird to interfere:
def test_view(request):
    import openpyxl
    return HttpResponse('Testing')
and the view never loads. If I get rid of (comment out) that one line, it works. The same behavior occurs if I import at the top of views.py, except then the problem applies to every view. Even if I try to load a subset, like from openpyxl import Workbook, or from openpyxl.workbook import Workbook as suggested here, it's the same deal.
Pertinent info:
openpyxl 2.4.2
django 1.10.2
python 3.4.3
pandas 0.19.2
numpy 1.12.0
Any ideas as to what's happening? Any way I can perhaps get an error message to show up to tell me what's going on?
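One way to get an actual traceback out of a hung import is the standard-library faulthandler module, which can dump every thread's stack after a timeout. A minimal sketch (the 15-second timeout is arbitrary; under a WSGI server the dump goes to stderr, i.e. the server's error log):
import faulthandler

# Dump every thread's traceback to stderr after 15 seconds, then exit.
# If "import openpyxl" hangs, the dump shows exactly where it is stuck.
faulthandler.dump_traceback_later(15, exit=True)

import openpyxl

faulthandler.cancel_dump_traceback_later()  # import finished, cancel the dump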
One way to "fix" the issue but it's not a real fix is by changing the line
_eps = np.finfo('f4').eps
to
_eps = 1.1920929e-07
in pandas.core.indexing. The number given does seem to hold for the particular machines I'm using, and may be a common value, but I know this is a terrible idea, so I'm still looking for a better solution.

Related

Excessive indirect references in NAME formula

I am trying to read in 'xls' files in Python using pandas. My code is basically a one-liner:
import pandas as pd
df = pd.read_excel("/test/test_file.xls")
This code works for the majority of the files, but there are cases when it fails with the error:
Excessive indirect references in NAME formula
What I tried so far:
Tried changing the stack limit (panic and warning) to as high as 10000 in the Pandas package itself, where the exception was occurring. A recursion limit was then encountered, so I raised that as far as 125000, which led to my Mac/Python hitting its limit, so I am guessing that's not the right solution.
Used a memory-intensive EMR to see if it can read the file - nope.
Looked at the GitHub repo for xlrd here to raise a bug, only to find out it's out of support.
Opened the file, saved it as an xlsx, used the same code to read it into a dataframe. Worked like a charm.
Tried using Spark Excel Library to read in a particular section of the data - this worked too but I need to use pandas.
Googled it only to find out the results would show me the XLRD code where the exception is defined. Not one person has reported it.
Tried using Python2 and Python3 with the latest and older versions of Pandas - no use.
I cannot share the file, but has anyone faced this issue before? Can someone help? All suggestions are welcome!
Try the following:
Open the xls file
Copy/paste all cells as values
Rerun your script
Hard to help further without having access to the file to explain exactly what is happening.
But chances are xlrd is trying to resolve the value of a formula and is exceeding the STACK_PANIC_LEVEL. Without seeing the formula, it is very difficult to say more.
xlrd has a method evaluate_name_formula(). When you try to open a .xls file with xlrd, it will raise an error (as you described) if your file has many user-defined formulas. To solve your problem, you could delete these user-defined formulas and keep the file free of them. Or you can try to edit the xlrd code to prevent it from raising the error, which seems much more difficult.
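For the second route, a sketch that raises the limit at runtime instead of editing the installed package, assuming an xlrd 1.x release where the threshold lives as the module-level constant xlrd.formula.STACK_PANIC_LEVEL (check your version before relying on this; very deep formula chains may still hit Python's own recursion limit):
import xlrd.formula
import pandas as pd

# Raise the depth threshold behind "Excessive indirect references in
# NAME formula" before pandas hands the file to xlrd (default is 10).
xlrd.formula.STACK_PANIC_LEVEL = 100

df = pd.read_excel("/test/test_file.xls")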

Xlwings won't close book after saving

I am currently trying to use xlwings to open a book and update its links, then save and close. The relevant code I am using is:
import os
import xlwings as xw

app = xw.App(add_book=False)
app.display_alerts = False
for file in os.scandir(dirname):
    if file.name.endswith("Unposted Summary.xlsm"):
        path = file.path
        tmp = app.books.api.Open(path, UpdateLinks=True)
        tmp.save(path)
app.quit()
After having read the documentation several times and tried several different methods such as app.quit(), app.kill(), and book.close(), I have been unable to get xlwings to close the current book after saving it, so I haven't even approached the question of whether the links are updating properly or not.
I'm guessing the problem is coming from how I'm opening the books. If so, I don't know the syntax to close them.
I don't usually use xlwings, but from what I understand app.books.api.Open calls into and returns the raw COM object, on which I don't think tmp.save(...) would even work (at least it didn't in my case).
A better option would be to work directly with the xw.Book wrapper instead, without the api call:
for file in os.scandir(dirname):
    if file.name.endswith("Unposted Summary.xlsm"):
        tmp = app.books.open(file.path, update_links=True)
        tmp.save()
        tmp.close()
I would also advise you to use os.path.abspath and keep your working directory in mind while looping through dirname.
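Putting it together, a sketch of the whole loop with the app released in a finally block (dirname here is a hypothetical folder, and update_links is assumed to be available in your xlwings version):
import os
import xlwings as xw

dirname = r'C:\reports'  # hypothetical folder of workbooks

app = xw.App(add_book=False)
app.display_alerts = False
try:
    for file in os.scandir(dirname):
        if file.name.endswith("Unposted Summary.xlsm"):
            # Open through the xlwings wrapper, not the raw COM api
            book = app.books.open(os.path.abspath(file.path), update_links=True)
            book.save()
            book.close()
finally:
    app.quit()  # release the Excel instance even if an open/save fails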

Cannot find csv data file where progress is saved (Python/PyInstaller, possibly sys._MEIPASS or NSIS?)

Qualifier: quite new to Python.
I wrote some Python code importing pandas, selenium, sys, os, tkinter, and pillow, then put it together with PyInstaller and NSIS.
The programme uses a csv file for its input and updates this based on user actions. The updates are to be saved internally so if the user quits they can continue where they left off.
It all saves properly, user progress saves and picks up correctly, and if I "download CSV" the file is up-to-date. It all works perfectly, functionally speaking.
However, when the programme is run, the csv that it starts with is not where it is saving the progress; that file stays the same. The "progress" data is being saved somewhere else. From a data security perspective, I need to know where it is saving. I could not find it after hours of looking.
Even if I uninstall the programme and reinstall, it still remembers the progress. Tested also on machines with no Python etc.
I am using:
if getattr(sys, 'frozen', False):
    CurrentPath = sys._MEIPASS
else:
    CurrentPath = os.path.dirname(__file__)
Could that be it?
The save line itself is rather standard:
df.to_csv('file_s.csv',encoding='utf-8', index=False)
My only other idea is it relates to NSIS installer and the uninstall script. There are painfully few tutorials for beginners on that. If the community thinks that is the issue, I think it best I post a new question with relevant info.
Apologies if this is too vague; happy to provide any more info needed!
You are defining the CurrentPath variable using the standard "if frozen" approach, which looks OK. But you don't then seem to use it when you save the file.
Try explicitly joining it with the filename when saving:
df.to_csv(os.path.join(CurrentPath,'file_s.csv'),encoding='utf-8', index=False)
Note that when you run as an executable, sys._MEIPASS will be a temporary folder created (on Windows at least) somewhere like C:\Users\<you>\AppData\Local\Temp\MEIXXX
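That temp folder is deleted and recreated between runs, so it is no place for persistent data; a bare relative path like 'file_s.csv' meanwhile lands in whatever the current working directory happens to be, which would explain progress surviving a reinstall. A sketch of one common alternative, keeping the csv next to the executable itself (any writable per-user directory would work the same way):
import os
import sys
import pandas as pd

if getattr(sys, 'frozen', False):
    # Directory containing the .exe itself; unlike sys._MEIPASS,
    # this persists across runs.
    SavePath = os.path.dirname(sys.executable)
else:
    SavePath = os.path.dirname(os.path.abspath(__file__))

df = pd.DataFrame({'progress': [1, 2, 3]})  # stand-in for the real data
df.to_csv(os.path.join(SavePath, 'file_s.csv'), encoding='utf-8', index=False)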

Why are the indexing methods missing when working on a dataframe?

My code is an update of an existing script which outputs an xlsx file with a lot of data. The original script is pretty stable and has worked for ages.
What I'm trying to do is that, after the original script has ended and the xlsx is created, I want to read the file into Pandas and then run a series of analyses on it, using the methods .loc(), .iloc(), .index().
But after I read the file into a variable, when I hit '.' after the variable's name in PyCharm, I get all the DataFrame and ndarray methods... except the three that I need.
No errors, no explanations. They are just not there.
And if I ignore this and go on and type them up manually, the variable I put the results into doesn't show ANY methods when I next hit '.' for it (instead of showing the methods for, say, a Series).
I've tried clearing the xlsx file of all formatting (it originally had empty lines hidden). I tried running .info() and .head() to make sure they both run fine (they seem to, yes). I even updated my code from Python 2.7 to Python 3.7 using the 2to3 scripts to see if that might change anything. It didn't.
import pandas as pd
analysis_file = pd.read_excel("F:\\myprogram\\output1.xlsx", "Sheet1")
analysis_file.  # <--- The problem's here
Really not sure how to proceed, and no one I've asked so far has been able to help me.
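For what it's worth, .loc, .iloc and .index are not methods in the usual sense, which may be part of why an IDE's completion treats them differently: the first two are indexer attributes used with square brackets, and .index is a plain attribute. A minimal sketch with an in-memory frame standing in for the read_excel result:
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30]})  # stand-in for the sheet data

first_row = df.iloc[0]   # positional indexing: square brackets, not a call
row_zero = df.loc[0]     # label-based indexing: square brackets again
labels = df.index        # plain attribute, no parentheses at all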

py2exe and Tableau Python API

First of all, please excuse me if I'm using some of the terminology incorrectly (accountant by trade ...)
I'm writing a piece of code that I was planning to pack as .exe product.
I've already included a number of standard libraries (xlrd, csv, math, operator, os, shutil, time, datetime, and xlwings). Unfortunately, when I added the 'dataextract' library my program stopped working.
dataextract is an API written specifically for software called Tableau (one of the leading BI solutions on the market). The Tableau website also says it does not provide any maintenance support for it at the moment.
I've tested it on very basic setup:
from xlwings import Workbook, Sheet, Range
Workbook.set_mock_caller(r'X:\JAC Reporting\Tables\Pawel\Development\_DevXL\Test1.xlsx')
f = Workbook.caller()
s = raw_input('Type in anything: ')
Range(1, (2, 1)).value = s
This works perfectly fine. After adding:
import dataextract as tde
The Console (black box) will only flash on the screen and nothing happens.
Questions:
Does a library (in this case 'dataextract') have to meet certain criteria to be compatible with py2exe?
As Tableau does not maintain the original code, does it mean I won't be able to pack it into .exe using py2exe?
Finally: I've been using 'dataextract' for almost 2 years now, and as long as you run the program through a .py file it works like a charm :) I just decided to take it one step further.
Any comments/input would be greatly appreciated.
EDIT:
Not sure if it helps or not, but when I tried to run the same script using the cx_Freeze compiler I got the error below:
First of all, massive thanks to @Andris, as he pointed me in the right direction.
It turned out the dataextract library dlls are not automatically copied while the compiler is running. Therefore you need to copy them from 'site-packages/dataextract/bin' into the 'dist' folder.
Also, of the 12 dlls you only need 9 (I tried running the exe file for each of them). The ones you don't need are: icin44.dll, msvcp100.dll and msvcr100.dll.
To be on the safe side I will be copying them all anyway though.
Hope this post will be of help to others :)
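A sketch of that copy step as a small script, so it can be rerun after each build (the source path is hypothetical; point it at your own site-packages and build output):
import glob
import os
import shutil

# Hypothetical paths - adjust to your Python installation and build output.
src = r'C:\Python27\Lib\site-packages\dataextract\bin'
dst = 'dist'

for dll in glob.glob(os.path.join(src, '*.dll')):
    shutil.copy(dll, dst)  # copies all 12 dlls; 9 are strictly required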
