Pandas read_excel - python

I struggled for a few hours trying to read an Excel file with pd.read_excel where the path is a website address. I figured out that the link doesn't point directly to the file but only triggers a download. Is there an easy way to solve this?
Part of code:
link_energy = 'http://unstats.un.org/unsd/environment/excel_file_tables/2013/Energy%20Indicators.xls'
df_energy = pd.read_excel(link_energy)
Error message:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n<!DOC'
It's probably not a problem with pandas but my lack of skill in how to do it.

For me, everything works as expected with the following code:
import pandas as pd
link_energy = 'http://unstats.un.org/unsd/environment/excel_file_tables/2013/Energy%20Indicators.xls'
df_energy = pd.read_excel(link_energy)
df_energy
without errors on the following env:
The version of the notebook server is: 5.2.2
The server is running on this version of Python:
Python 3.6.3 | packaged by conda-forge | (default, Nov 4 2017, 10:10:56)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
Current Kernel Information:
Python 3.6.3 | packaged by conda-forge | (default, Nov 4 2017, 10:10:56)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

I can't access the URL you posted.
For a file like the one below, though, pd.read_excel won't work and you need to use pd.read_csv:
import pandas as pd
df = pd.read_csv('https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls')
Next, inspect the file to see what it contains and which separator it uses. If any columns hold other values, skip them so that only the useful data is loaded.
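The error in the question (found b'\n\n\n<!DOC') suggests the server returned an HTML page rather than a spreadsheet. A minimal sketch for checking what a download really contains, based on its first bytes. The helper name sniff_format is hypothetical, and the byte string in the example call is the one quoted in the error message, not a fresh download:

```python
# Well-known file signatures ("magic bytes").
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 container)
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx is a zip archive


def sniff_format(first_bytes):
    """Guess the real content type from the first bytes (hypothetical helper)."""
    if first_bytes.startswith(OLE2_MAGIC):
        return "xls"
    if first_bytes.startswith(ZIP_MAGIC):
        return "xlsx"
    # Markup usually starts with '<' after optional leading whitespace.
    if first_bytes.lstrip()[:1] == b"<":
        return "html-or-xml"
    return "unknown"


# The prefix reported by XLRDError in the question:
print(sniff_format(b"\n\n\n<!DOCTYPE html>"))  # prints "html-or-xml"

# For a real URL, the first bytes could be fetched with, for example:
#   import urllib.request
#   first_bytes = urllib.request.urlopen(url).read(16)
```

If the result is "html-or-xml", the link probably leads to a landing or redirect page, and the direct file URL has to be found before pd.read_excel can parse it.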


%% Cell magic tag not working in Jupyter notebook?

I'm new to Jupyter notebook and I'm trying to set one up with Python and R, using rpy2. I have the line
%%R -i df
which gives me the error SyntaxError: invalid syntax
However when I use just one %, such as
%R require(ggplot2)
this works fine. How can I fix this issue? I am using Python 2.7.
The % prefix is for a line magic, whereas the %% prefix is for a cell magic.
%%R # <-- must be the only instruction on this line
{body of cell in R code}
whereas:
%R {one line of R code}
I don't have R installed, but I think you may have wanted to call a bash command on R; in that case, use ! to call the command:
!R -i df
for instance, if I type !python -i, I get info about my current python environment:
Python 3.6.2 |Anaconda custom (x86_64)| (default, Jul 20 2017, 13:14:59)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
This is an old question, but I ran into the same problem today. Reblochon Masque's answer no longer applies. %%R -i df in the middle of a cell throws invalid syntax. Also, !R -i df throws WARNING: unknown option '-i' ARGUMENT 'df' __ignored__.
Here is what I needed to do to get the R cell magic to work:
Follow this instruction to install the R packages RJSONIO and httr (the package name is indeed lowercase) and the Python package rpy2. No further configuration is needed.
Put just one line, %load_ext rpy2.ipython, in a cell to load rpy2.
%%R has to be at the very beginning of a cell, so I put this line in the next cell.
The body of the cell cannot be empty after %%R.
With these steps, %%R -i df works as intended.

How can I get the file names in directory with Chinese correctly in python?

How can I get the names list in the current working directory with Chinese appropriately in Python?
For example, in my demo folder, I have four files: "folder_中文" "folder_a" "folder_b" "folder_c"
in R I can use the following command to achieve this:
Sys.setlocale(category = "LC_ALL", locale = "zh_cn.utf-8")
setwd("~/desktop/example")
filenames=list.files()
filenames
"folder_中文" "folder_a" "folder_b" "folder_c"
But I failed to achieve this in Python with Anaconda. If I don't assign the output to filenames, it looks fine (see below), but the Chinese is not displayed correctly in filenames.
# -*- coding: utf-8 -*-
import os
os.chdir('/Users/../Desktop/example')
! ls
filenames = ! ls
filenames
folder_a folder_b folder_c folder_中文
['folder_a', 'folder_b', 'folder_c', 'folder_\xe4\xb8\xad\xe6\x96\x87']
But if I continue and type
print(filenames)
print(filenames[3])
The Chinese can be observed if I extract this specific element and print it directly.
['folder_a', 'folder_b', 'folder_c', 'folder_\xe4\xb8\xad\xe6\x96\x87']
folder_中文
The last thing I want to highlight is that if I type Chinese directly, it is displayed correctly only when I call print explicitly. So using print or not makes a big difference.
print('中文')
'中文'
中文
Out[65]: '\xe4\xb8\xad\xe6\x96\x87'
My OS is Mac El Capitan (10.11.5) and the version of Anaconda is:
2.7.13 |Anaconda custom (x86_64)| (default, Dec 20 2016, 23:05:08)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
I recall an issue I helped with regarding Chinese characters in Python 2. Python 3 doesn't have the issue. I believe you need to add the following to your ~/.bash_profile:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
Then:
source ~/.bash_profile
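The escaped output in the question ('folder_\xe4\xb8\xad\xe6\x96\x87') looks like the repr of a UTF-8 encoded byte string, which is why print shows the characters while the bare list does not. A minimal Python 3 sketch, assuming the names are UTF-8 encoded as in the question, showing that decoding recovers the text:

```python
# Byte strings as they appear in the question's list output.
raw_names = [
    b"folder_a",
    b"folder_b",
    b"folder_c",
    b"folder_\xe4\xb8\xad\xe6\x96\x87",
]

# Decode each name from UTF-8 bytes into text.
decoded_names = [name.decode("utf-8") for name in raw_names]

print(decoded_names[3])  # prints "folder_中文"
```

In Python 3, os.listdir() called with a str path already returns str names, so this decoding step is normally not needed there.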

iPython: DLL load failed: The specified module could not be found; plain Python fine

I keep getting this (well known) error in iPython. Yet, the same import works fine in plain Python. (Python 3.3.5, see details below)
iPython:
Python 3.3.5 (v3.3.5:62cf4e77f785, Mar 9 2014, 10:37:12) [MSC v.1600 32 bit (Intel)]
Type "copyright", "credits" or "license" for more information.
IPython 2.0.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import test1
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-7-ddb30f03c287> in <module>()
----> 1 import test1
ImportError: DLL load failed: The specified module could not be found.
Python (not only it loads fine, it also works):
$ python
Python 3.3.5 (v3.3.5:62cf4e77f785, Mar 9 2014, 10:37:12) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import test1
>>>
Now, Dependency Walker on test1.pyd shows this
[ ? ] LIBGCC_S_DW2-1.DLL Error opening file. The system cannot find the file specified (2).
[ ? ] LIBSTDC++-6.DLL Error opening file. The system cannot find the file specified (2).
[ ? ] PYTHON33.DLL Error opening file. The system cannot find the file spec
I even overwrote sys.path in iPython with the one from plain Python. The file test1.pyd is in C:\Test.
['c:\\Test',
'c:\\WinPython-32bit-3.3.5.0\\python-3.3.5\\python33.zip',
'c:\\WinPython-32bit-3.3.5.0\\python-3.3.5\\DLLs',
'c:\\WinPython-32bit-3.3.5.0\\python-3.3.5\\lib',
'c:\\WinPython-32bit-3.3.5.0\\python-3.3.5',
'c:\\WinPython-32bit-3.3.5.0\\python-3.3.5\\lib\\site-packages',
'c:\\WinPython-32bit-3.3.5.0\\python-3.3.5\\lib\\site-packages\\FontTools',
'c:\\WinPython-32bit-3.3.5.0\\python-3.3.5\\lib\\site-packages\\win32',
'c:\\WinPython-32bit-3.3.5.0\\python-3.3.5\\lib\\site-packages\\win32\\lib',
'c:\\WinPython-32bit-3.3.5.0\\python-3.3.5\\lib\\site-packages\\Pythonwin']
Why would the import work in plain Python but not in iPython?
I encountered the same problem. After hours of searching and thinking, I found the cause: the environment variables differ between interpreters (plain Python, IPython, PyCharm, etc.). I think you can use %env in IPython to check the environment variables. In plain Python, use (works in Python 3.7):
import os
os.environ
Then, if there are differences, set the right values before you run your code.
There are multiple ways to set environment variables. For example:
os.environ['key']='value' #Both key and value are strings
or
os.putenv('key', 'value')
Here key is the name of the environment variable, and value is the value you want to set it to.
Hope this helps.
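A minimal sketch of how the two environments could be compared once each interpreter's os.environ has been captured as a dict (for example by dumping dict(os.environ) to a file from each one). The function names diff_env and path_entries and the sample values are hypothetical:

```python
import os


def diff_env(env_a, env_b):
    """Return {name: (value_in_a, value_in_b)} for every variable that differs."""
    names = set(env_a) | set(env_b)
    return {
        name: (env_a.get(name), env_b.get(name))
        for name in sorted(names)
        if env_a.get(name) != env_b.get(name)
    }


def path_entries(env):
    """Split the PATH variable into its individual directories."""
    return [p for p in env.get("PATH", "").split(os.pathsep) if p]


# Made-up sample environments for illustration only.
plain = {"PATH": os.pathsep.join(["dirA", "dirB"]), "HOME": "h"}
ipy = {"PATH": "dirA", "HOME": "h"}

print(diff_env(plain, ipy))
print(set(path_entries(plain)) - set(path_entries(ipy)))  # prints {'dirB'}
```

On Windows, a DLL that loads in one interpreter but not another is often found through a PATH directory that only one of them has, so the PATH difference is usually the first thing to look at. Note also that, according to the Python documentation, os.putenv() does not update os.environ, so assigning to os.environ is the preferable way to set a variable.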

Astroquery: AttributeError("'bytes' object has no attribute 'encode'")

I would like to run the NED query of the astroquery package.
from astroquery.ned import Ned
result_table = Ned.query_object("NGC 224")
This is the example given in the documentation.
I got the following issue:
TableParseError: Failed to parse NED result! The raw response can be found in
self.response, and the error in self.table_parse_error.
In self.table_parse_error I find:
AttributeError("'bytes' object has no attribute 'encode'")
I have no idea what has gone wrong.
Here are my versions:
Python 3.4.1 (default, May 19 2014, 17:23:49) [GCC 4.9.0 20140507 (prerelease)]
SciPy 0.14.0
Cython 0.20.1
OS posix [linux]
Numpy 1.8.1
IPython 2.1.0
This was raised as an issue on astroquery and is a confirmed bug:
https://github.com/astropy/astroquery/pull/343
It should be fixed shortly.
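For context, this kind of AttributeError is typical of code written for Python 2 running on Python 3, where bytes objects have decode() but no encode(). The following is only an illustration of that general class of error, not the actual astroquery code or its fix; the helper to_text is hypothetical:

```python
def to_text(value, encoding="utf-8"):
    """Return str for either bytes or str input (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


payload = b"NGC 224"

# Calling .encode() on bytes reproduces the error class seen in the question.
try:
    payload.encode("utf-8")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# Normalising to text first avoids the problem.
print(to_text(payload))  # prints "NGC 224"
```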

python import cx_Oracle error on cygwin

I tried to install cx_Oracle from pypi source since there is no available port for it in cygwin. I did make some changes as suggested in http://permalink.gmane.org/gmane.comp.python.db.cx-oracle/2492 and modified my setup.py. However, I still get the following error :-
$ python
Python 2.7.3 (default, Dec 18 2012, 13:50:09)
[GCC 4.5.3] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cx_Oracle
/usr/lib/python2.7/site-packages/cx_Oracle-5.1.3-py2.7-cygwin-1.7.24-i686.egg/cx_Oracle.py:3: UserWarning: Module cx_Oracle was already imported from /usr/lib/python2.7/site-packages/cx_Oracle-5.1.3-py2.7-cygwin-1.7.24-i686.egg/cx_Oracle.pyc, but /home/zerog/cx_Oracle-5.1.3 is being added to sys.path
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.cygwin-1.7.24-i686/egg/cx_Oracle.py", line 7, in <module>
File "build/bdist.cygwin-1.7.24-i686/egg/cx_Oracle.py", line 6, in __bootstrap__
ImportError: Exec format error
>>>
Can someone please help me fix this?
TIA.
I fixed this by adding the path to instantclient to PATH, as below:
$ export PATH=$PATH:/cygdrive/d/Tools/instantclient_11_2
(Other, possibly important stuff) :
$ echo $LD_LIBRARY_PATH
/cygdrive/d/Tools/instantclient_11_2
$ echo $ORACLE_HOME
/cygdrive/d/Tools/instantclient_11_2
Now, I get :-
$ python
Python 2.7.3 (default, Dec 18 2012, 13:50:09)
[GCC 4.5.3] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cx_Oracle
>>>
It's hard to pin down from the error message alone, but I am guessing that you have two different copies of cx_Oracle in your sys.path. The warning is complaining that a different copy of the same module had already been imported.
Presumably the pristine upstream version is installed system-wide in /usr/lib/python2.7/site-packages/cx_Oracle-5.1.3-py2.7-cygwin-1.7.24-i686.egg, and your modified version in /home/zerog/cx_Oracle-5.1.3.
Does it work if you pare down sys.path so that only the original, or only your modified version, is included?
(You might want to use virtualenv if you need to switch back and forth between two versions frequently.)
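A minimal sketch for spotting duplicate copies of a module along sys.path. The helper find_module_candidates is hypothetical and only matches by file or directory name, so it is a rough check rather than a reimplementation of Python's import machinery:

```python
import os
import sys


def find_module_candidates(module_name, search_paths=None):
    """List entries under each search path whose name suggests module_name."""
    paths = sys.path if search_paths is None else search_paths
    hits = []
    for entry in paths:
        directory = entry or os.getcwd()  # '' in sys.path means the current directory
        if not os.path.isdir(directory):
            continue  # skip zip files and missing directories
        try:
            items = sorted(os.listdir(directory))
        except OSError:
            continue  # unreadable directory
        for item in items:
            # Match "name", "name.py", "name.pyd", "name-1.2.3.egg", and similar.
            if item == module_name or item.startswith((module_name + ".", module_name + "-")):
                hits.append(os.path.join(directory, item))
    return hits


for hit in find_module_candidates("cx_Oracle"):
    print(hit)
```

More than one hit in different directories would be consistent with the "already imported" warning shown in the question.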
I ran into this error "Exec format error."
For me, this was likely caused by a mismatch: cygwin was installed as 64-bit, but the instant client was installed as 32-bit. Double-check that everything (Oracle, cygwin) is either 32-bit or 64-bit.
What fixed my issue:
Since my cygwin is 64-bit (run uname -a and look for x86_64), I downloaded the 64-bit instant client from Oracle's website and unzipped it.
I set the environment variables in .profile to point to where it was unzipped:
export ORACLE_HOME=/cygdrive/c/oracle/instantclient_x64_11_2
export LD_LIBRARY_PATH=$ORACLE_HOME
export DYLD_LIBRARY_PATH=$ORACLE_HOME
export TNS_ADMIN='//optional/path/to/your/oracle/tns/files/'
source ~/.profile
To test, you should now be able to run this python command with no error:
import cx_Oracle
To verify the path is correct, if you run ls, you should see something like
ls $ORACLE_HOME
adrci.exe genezi.exe oci.sym ociw32.dll ojdbc6.jar
oraocci11.dll oraociei11.sym uidrvci.exe vc9
adrci.sym genezi.sym ocijdbc11.dll ociw32.sym orannzsbb11.dll
oraocci11.sym orasql11.dll uidrvci.sym xstreams.jar
BASIC_README oci.dll ocijdbc11.sym ojdbc5.jar orannzsbb11.sym
oraociei11.dll orasql11.sym vc8
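Since the fix above hinges on matching 32-bit and 64-bit components, here is a minimal sketch for checking the bitness of the Python interpreter itself, which should match the instant client as well. It uses only the standard library:

```python
import platform
import struct
import sys

# Size of a C pointer in bits: 32 on a 32-bit build, 64 on a 64-bit build.
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize exceeds 2**32 only on 64-bit builds.
is_64bit = sys.maxsize > 2**32

print("Interpreter:", pointer_bits, "bit")
print("Machine:", platform.machine())
```

If the interpreter reports 32 bit while the instant client is a 64-bit build (or the other way around), an import failure like the "Exec format error" described above would not be surprising.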
