How to resolve Koala2 package dependency issue? - python

I have some code that I wrote that relies on the Koala2 package. The code was running a few weeks ago but has since stopped working. The following code throws an error
!pip install koala2
from koala.ExcelCompiler import ExcelCompiler
ImportError Traceback (most recent call last)
<ipython-input-2-715ed1f6c9fe> in <module>()
----> 1 from koala.ExcelCompiler import ExcelCompiler
2 from koala.Spreadsheet import Spreadsheet
3 import pandas as pd
4 import numpy as np
5 import string
1 frames
/usr/local/lib/python3.7/dist-packages/koala/__init__.py in <module>()
2
3 from openpyxl import *
----> 4 from .ast import *
5 from .Cell import *
6 from .ExcelCompiler import *
/usr/local/lib/python3.7/dist-packages/koala/ast/__init__.py in <module>()
7 import networkx
8 from networkx.classes.digraph import DiGraph
----> 9 from openpyxl.compat import unicode
10
11 from koala.utils import uniqueify, flatten, max_dimension, col2num, resolve_range
ImportError: cannot import name 'unicode' from 'openpyxl.compat'
(/usr/local/lib/python3.7/dist-packages/openpyxl/compat/__init__.py)
This code is taken directly from the very first example that they have on PyPI.
I can get around this by:
!pip install openpyxl==2.5.14
But then I am no longer able to use Pandas read_excel() because it requires openpyxl >= 3.0.0
This is strange to me because as I mentioned earlier, everything was running ok a few weeks ago and I don't think there are new versions of koala or openpyxl. Is anyone aware of a workaround for this? Any help is greatly appreciated.
Edit: Also, it looks like Koala has been abandoned by its creators. My goal was to read in an Excel file, record where the formulas are, and be able to apply these formulas to a new Excel file or data frame with the same layout and dimensions (number of rows and columns) as the original Excel file. If there is a better way to do this without Koala, any insight would be very helpful. Thanks

This looks like a known problem dating from 2019.
The Koala code has not been updated in more than 2 years (maybe you should look for a newer alternative?).
Here are two possible solutions:
Use older versions of all the other libraries in your code so that the dependencies stay compatible.
Patch the Koala library manually every time you install it:
Until the bugfix is put into the repository, the fix I made locally is: replace the "from openpyxl.compat import unicode" line with "unicode = str" in all the koala files that have it.
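The manual patch described above could be scripted so it can be re-run after each install. This is only a minimal sketch: the helper name patch_unicode_import is hypothetical, and it assumes the failing import always sits on a line of its own, as it does in the traceback.

```python
import re
from pathlib import Path

# Matches the failing import on a line of its own and keeps its indentation.
UNICODE_IMPORT = re.compile(
    r"^([ \t]*)from openpyxl\.compat import unicode[ \t]*$", re.MULTILINE
)


def patch_unicode_import(package_dir):
    """Replace the import with `unicode = str` in every .py file under package_dir.

    Returns the list of files that were changed. Hypothetical helper, not part of koala.
    """
    patched = []
    for path in sorted(Path(package_dir).rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        new_text = UNICODE_IMPORT.sub(r"\1unicode = str", text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            patched.append(path)
    return patched
```

With the install location shown in the traceback, the call would be patch_unicode_import("/usr/local/lib/python3.7/dist-packages/koala"). Running it a second time changes nothing, because the import line is already gone.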

Related

Graph.Read_Ncol (csv) for igraph in Python

I'm completely new to coding and Python and am having trouble with the simple task of reading a csv file.
Naturally, I started with:
import pandas as pd
import igraph as ig
I tested the csv using:
test_df = pd.read_csv('griplinks.csv')
print(test_df.head())
It seemed to work because I was able to come up with the output:
From To
0 1 11
1 1 31
2 1 40
3 1 44
4 1 53
However, when it was time to actually read my csv file using:
griplinks = ig.Graph.Read_Ncol('griplinks.csv', directed=False)
I would come up with:
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
in ()
1 # Attempt 1
2
----> 3 griplinks = ig.Graph.Read_Ncol('griplinks.csv', directed=False)
InternalError: Error at c:\users\vssadministrator\appdata\local\temp\pip-req-build-ft6_7fco\vendor\build\igraph\igraph-0.8.3-msvc\src\foreign.c:244: Parse error in NCOL file, line 1 (syntax error, unexpected NEWLINE, expecting ALNUM), Parse error
Since nothing's really wrong with my csv file or its path, I was wondering if there's something wrong with the code I used to read it?
The documentation is indeed not really clear: it is expected that the nodes are separated by whitespace, not by a comma. It might be easier to actually construct your graph from the pandas dataframe:
griplinks = ig.Graph.DataFrame(test_df)
Note that this was only introduced in python-igraph version 0.8.3, so make sure to use at least that version.
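If you would rather keep using Read_Ncol, another option is to rewrite the CSV as a whitespace-separated edge list first. The sketch below uses only the standard library. The helper name csv_to_ncol is hypothetical, and it assumes the file has a single header row (such as From,To) and that no field contains spaces.

```python
import csv


def csv_to_ncol(csv_path, ncol_path, has_header=True):
    """Copy a comma-separated edge list to a whitespace-separated file.

    Returns the number of edges written. Hypothetical helper for illustration.
    """
    count = 0
    with open(csv_path, newline="") as src, open(ncol_path, "w") as dst:
        reader = csv.reader(src)
        if has_header:
            next(reader, None)  # skip the header row (assumed to be present)
        for row in reader:
            if not row:
                continue  # ignore blank lines
            dst.write(" ".join(field.strip() for field in row) + "\n")
            count += 1
    return count
```

After csv_to_ncol('griplinks.csv', 'griplinks.ncol'), where the output filename is just an example, the new file can be passed to ig.Graph.Read_Ncol in place of the CSV.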

NameError: name 'pd' is not defined when calling a function in custom package

Context
I'm learning Python for data science and I'm using the Foursquare API to explore venues near a coordinate. It returns a JSON file, so I created a function that fetches the Foursquare results using the 'foursquare' package (github.com/mLewisLogic/foursquare), then extracts the data and appends it to a Pandas DataFrame.
The function works in my Jupyter Notebook (you can check the function here https://github.com/dacog/foursquare_api_tools/blob/master/foursquare_api_tools/foursquare_api_tools.py), so I thought about making it easier for others and tried to create a package which could be installed using pip directly from GitHub. I successfully created a package and published it to GitHub to test it, but when I try to use the function it returns
NameError: name 'pd' is not defined
Steps to try the package
!pip install git+https://github.com/dacog/foursquare_api_tools.git#egg=foursquare_api_tools
# #hidden_cell
CLIENT_ID = 'Secret' # your Foursquare ID
CLIENT_SECRET = 'Secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
from foursquare_api_tools import foursquare_api_tools as ft
ft.venues_explore(client,lat='40.7233',lng='-74.0030',limit=100)
and I get
NameError Traceback (most recent call last)
<ipython-input-47-0a062ed9d667> in <module>()
3 import pandas as pd
4
----> 5 ft.venues_explore(client,lat='40.7233',lng='-74.0030',limit=100)
/opt/conda/envs/DSX-Python35/lib/python3.5/site-packages/foursquare_api_tools/foursquare_api_tools.py in venues_explore(client, lat, lng, limit)
3 This returns a pandas dataframe with name, city ,country, lat, long, postal code, address and main category as columns'''
4 # creata a dataframe
----> 5 df_a = pd.DataFrame(columns=['Name', 'City', 'Latitude','Longitude','Category','Postal Code', 'Address'])
6 ll=lat+','+lng
7 #get venues using client https://github.com/mLewisLogic/foursquare
NameError: name 'pd' is not defined
I tried import pandas as pd in the main notebook, inside the function, in __init__.py always with the same result.
You can check the code at https://github.com/dacog/foursquare_api_tools
It's the first time I'm creating a package and pretty new to python, so any help will be greatly appreciated.
UPDATES
Pandas is working fine in the environment when I'm doing the tests.
The installed Python versions are:
!which python --> /home/jupyterlab/conda/bin/python
!whereis python
/usr/bin/python /usr/bin/python2.7 /usr/lib/python2.7 /etc/python /etc/python2.7
/usr/local/lib/python2.7 /usr/share/python
/home/jupyterlab/conda/bin/python /home/jupyterlab/conda/bin/python3.6
/home/jupyterlab/conda/bin/python3.6-config /home/jupyterlab/conda/bin/python3.6m /home/jupyterlab/conda/bin/python3.6m-config /usr/share/man/man1/python.1.gz
You are missing an import pandas as pd statement in foursquare_api_tools.py. Just add that line at the top of that file, and you should be good to go.
The clue is in the error: a NameError on line 5, where you call pd.DataFrame. Because there is no import statement in that file, Python does not know what the "name" pd means.
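This also explains why importing pandas in the notebook did not help: each module has its own global namespace, and a function looks names up in the module where it was defined. The sketch below reproduces this with the standard library only. It uses json as a stand-in for pandas and builds two throwaway modules from source strings; load_module and dumps_sorted are hypothetical names made up for the demonstration.

```python
import json  # imported here, in the caller's namespace only
import types


def load_module(source, name="demo_tools"):
    """Build a throwaway module object from source text (demonstration only)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


# The module uses the name `json` but never imports it.
BROKEN = "def dumps_sorted(obj):\n    return json.dumps(obj, sort_keys=True)\n"
# Same function, with the import added at the top of the module.
FIXED = "import json\n\n" + BROKEN

broken = load_module(BROKEN)
try:
    broken.dumps_sorted({"b": 1, "a": 2})
    error_message = None
except NameError as exc:
    # The caller's `import json` is not visible inside the other module.
    error_message = str(exc)

fixed = load_module(FIXED)
result = fixed.dumps_sorted({"b": 1, "a": 2})
```

The first call fails with a NameError even though the calling code imported json, and the second succeeds once the import lives in the module itself.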
In addition to "import pandas as pd", you can add seaborn to your libraries. Use this:
import pandas as pd
import seaborn as sns
sns.set()
This should work in a Jupyter notebook.

No lines of output shown after importing pandas

I installed pandas through pip, but when I import it, the code runs but no output is shown at all, right after the import statement.
Here's a sample of my code
import xlrd, xlwt
print("1")
import pandas as pd
print("2")
from math import trunc
1 is printed, but 2 isn't. After 1 is printed, the script just hangs for a few seconds and terminates. This occurs regardless of the code written below the import statement. I also seem to get the same error for the openpyxl module. Does anyone know a fix to this?

Python and importing sub-modules - Pandas example

I was trying to use the pandas.tseries.holiday module within pandas, but for some reason it was not showing up. I tried the following:
import pandas as pd
pd.tseries.<TAB>
This does give me a list of options, but holiday was not among them. According to the documentation of holiday, it should be as simple as what I tried above.
This was on my system's Python. I tried it in Jupyter using Anaconda, then in Terminal and even in Emacs, but it was never found. So it must be a general design choice that I am unaware of. I have looked for clues, but all information I find tells me that importing a whole module or parts of it is a subjective choice - example: readability versus name-space pollution etc.
Eventually I just tried importing it manually (the next step would have been downloading the actual holiday file from the pandas git repository).
So I did:
from pandas.tseries import holiday # no error
holiday.<TAB>
... and I am shown all the stuff I need - great!
But what is going on here??
Looking at the actual code of holiday.py does not give me any hint as to why the file/module is not imported when I simply import pandas using the statements above.
Edit
Here is some additional information, showing how holiday is not found within pandas.tseries itself, but can be imported and used explicitly:
>>> import pandas as pd
>>> pd.tseries.holiday.USFederalHolidayCalendar()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'pandas.tseries' has no attribute 'holiday'
>>> from pandas.tseries import holiday
>>> holiday.USFederalHolidayCalendar()
<pandas.tseries.holiday.USFederalHolidayCalendar object at 0x7f3b18dc7fd0>
Using simply import pandas as pd does not automatically import all sub-modules of the pandas library (as pointed out by TomAugspurger in the comments above).
This is because the __init__.py of the pandas library does not import everything; in particular, it does not import the holiday sub-module.
Either adapt the __init__.py file to do so, or be aware that one must explicitly import certain sub-modules of the pandas library!
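The same behaviour can be reproduced with a tiny throwaway package, which may make the mechanism clearer. This is a sketch under stated assumptions: the package name demo_pkg_submodules and its holiday.py are made up for the demonstration and have nothing to do with the real pandas code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a package whose __init__.py is empty, so it imports no sub-modules.
root = Path(tempfile.mkdtemp())
package_name = "demo_pkg_submodules"  # hypothetical name for this demo
package_dir = root / package_name
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "holiday.py").write_text("def describe():\n    return 'holiday loaded'\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

pkg = importlib.import_module(package_name)
# Importing the package alone does not load the sub-module.
visible_before = hasattr(pkg, "holiday")

# An explicit import loads it and binds it as an attribute of the package.
importlib.import_module(package_name + ".holiday")
visible_after = hasattr(pkg, "holiday")
```

Here visible_before is False and visible_after is True. That matches the session above: once the sub-module has been imported explicitly anywhere in the process, attribute access through the parent package works too.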

TypeError: read_excel() takes exactly 2 arguments (1 given)

I get this problem when I try to read a file:
import numpy as np
import pandas as pd
pos = pd.read_excel('pos.xls', header=None)
and the error is like this:
Traceback (most recent call last):
File "one-hot.py", line 4, in <module>
pos = pd.read_excel('pos.xls', header=None)
TypeError: read_excel() takes exactly 2 arguments (1 given)
But to my surprise, when I run the code on my own PC with PyCharm, there is no error. I get the problem only when I use my school's Ubuntu machine (without PyCharm).
My own Python is 2.7.12, and the Python on the school's Ubuntu machine is 2.7.6.
My best guess (I can't try it on Python 2.7.6 since I don't have it) is that you are using pandas version 0.13 or below. According to the docs, you must also provide sheetname, which in later versions has a default value of 0.
pandas.io.excel.read_excel(io, sheetname, **kwds)
This sounds like an issue with a different version of the pandas library installed. Looking back at the older documentation pages for pandas library, it seems that pandas did in fact require 2 parameters back in version 0.13.0 (and potentially other old versions, but I did not check any others). For version 0.13.0, the docs define the function as:
pandas.read_excel(io, sheetname, **kwds)
You can read those details here: http://pandas.pydata.org/pandas-docs/version/0.13.0/generated/pandas.read_excel.html?highlight=read_excel#pandas.read_excel
Chances are, it is just an issue with a different library version.
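To see why the same call behaves differently on the two machines, here is a small sketch with two stand-in functions. They only mimic the shape of the signatures quoted above (a required sheetname versus a defaulted one). They are not pandas code, and the names read_excel_old, read_excel_new and call_portably are made up for illustration.

```python
def read_excel_old(io, sheetname, **kwds):
    """Stand-in for the 0.13-style signature: sheetname is required."""
    return (io, sheetname, kwds)


def read_excel_new(io, sheet_name=0, **kwds):
    """Stand-in for a later signature: the sheet defaults to 0."""
    return (io, sheet_name, kwds)


def call_portably(reader, path):
    """Pass the sheet index positionally so both signatures accept the call."""
    return reader(path, 0, header=None)


try:
    read_excel_old("pos.xls", header=None)  # the call from the question
    old_call_failed = False
except TypeError:
    old_call_failed = True  # the required sheet argument is missing

old_result = call_portably(read_excel_old, "pos.xls")
new_result = call_portably(read_excel_new, "pos.xls")
```

Under that assumption, passing the sheet index as the second positional argument should be accepted by both the old and the newer signature, though upgrading pandas on the school machine is likely the cleaner fix.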
I actually had a similar problem which was solved by adding '.xlsx' to the end of my proposed file name:
practicetoexcel.to_excel('Thisxldoc.xlsx', sheet_name = 'Practice')
