I am having trouble importing tika in a Python file. I have spent a lot of time googling and have been unable to find anything. Here is the IPython command, import tika, and the subsequent stack trace.
It occurs to me that there could be a problem with a module on which tika depends, such as requests or urllib3. However, when I try to install those with pip, it says the requirement is already satisfied. I have also double-checked the PYTHONHOME directory, and I'm 99% sure it's correct.
$ ipython
Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 17:54:52) [MSC v.1900 32 bit (Intel)]
Type "copyright", "credits" or "license" for more information.
IPython 4.2.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
WARNING: Readline services not available or not loaded.
WARNING: Proper color support under MS Windows requires the pyreadline library.
You can find it at:
http://ipython.org/pyreadline.html
Defaulting color scheme to 'NoColor'
In [1]: import tika
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
C:\cygwin64\lib\python3.6\site-packages\requests\packages\__init__.py in <module>()
26 try:
---> 27 from . import urllib3
28 except ImportError:
ImportError: cannot import name 'urllib3'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-1-9f3de0ba3e70> in <module>()
----> 1 import tika
C:\cygwin64\lib\python3.6\site-packages\tika\tika.py in <module>()
18
19 try:
---> 20 __import__('pkg_resources').declare_namespace(__name__)
21 except ImportError:
22 from pkgutil import extend_path
C:\cygwin64\lib\python3.6\site-packages\pkg_resources\__init__.py in declare_namespace(packageName)
2161 # Ensure all the parent's path items are reflected in the child,
2162 # if they apply
-> 2163 _handle_ns(packageName, path_item)
2164
2165 finally:
C:\cygwin64\lib\python3.6\site-packages\pkg_resources\__init__.py in _handle_ns(packageName, path_item)
2096 path = module.__path__
2097 path.append(subpath)
-> 2098 loader.load_module(packageName)
2099 _rebuild_mod_path(path, packageName, module)
2100 return subpath
C:\cygwin64\lib\python3.6\site-packages\tika\tika.py in <module>()
89 open = codecs.open
90
---> 91 import requests
92 import socket
93 import tempfile
C:\cygwin64\lib\python3.6\site-packages\requests\__init__.py in <module>()
50 # Attempt to enable urllib3's SNI support, if possible
51 try:
---> 52 from .packages.urllib3.contrib import pyopenssl
53 pyopenssl.inject_into_urllib3()
54 except ImportError:
C:\cygwin64\lib\python3.6\site-packages\requests\packages\__init__.py in <module>()
27 from . import urllib3
28 except ImportError:
---> 29 import urllib3
30 sys.modules['%s.urllib3' % __name__] = urllib3
31
C:\cygwin64\lib\python3.6\site-packages\urllib3\__init__.py in <module>()
6 import warnings
7
----> 8 from .connectionpool import (
9 HTTPConnectionPool,
10 HTTPSConnectionPool,
C:\cygwin64\lib\python3.6\site-packages\urllib3\connectionpool.py in <module>()
9
10
---> 11 from .exceptions import (
12 ClosedPoolError,
13 ProtocolError,
C:\cygwin64\lib\python3.6\site-packages\urllib3\exceptions.py in <module>()
1 from __future__ import absolute_import
----> 2 from .packages.six.moves.http_client import (
3 IncompleteRead as httplib_IncompleteRead
4 )
5 # Base Exceptions
ValueError: source code string cannot contain null bytes
In case anyone else is looking at this, here is how I finally solved my issue.
I had mistakenly assumed that the python-tika module was a fully packaged, ready-to-run version of Tika. In fact, you need to download the Java Tika server from Apache, and it must be running when you use python-tika (you can simply run the server on localhost).
The python-tika module then lets you make requests to this server from your Python code. I probably should have known this, but for some reason I didn't pick it up from the documentation.
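Since python-tika depends on a running server, a quick reachability check can rule that part out before debugging anything else. The following is a minimal sketch using only the standard library. The function name tika_server_reachable is hypothetical, and port 9998 is assumed to be the port your Tika server listens on (adjust it if you started the server differently). It only confirms that something accepts TCP connections there, not that it is actually Tika.

```python
import socket


def tika_server_reachable(host="localhost", port=9998, timeout=2.0):
    """Return True if something accepts TCP connections at host:port."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if tika_server_reachable():
        print("A server is listening on localhost:9998")
    else:
        print("Nothing is listening on localhost:9998; start the Tika server first")
```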
Are you sure you have that module installed?
If not, just go to the Command Prompt and type pip install tika
I think this is a good approach for installing tika on Windows:
First, install Java SE from this link:
https://www.oracle.com/ca-en/java/technologies/javase-downloads.html
Second, install a specific version of tika:
pip install tika==1.23
Third, download and run the tika server and tika app files from apache:
https://archive.apache.org/dist/tika/tika-server-1.23.jar
https://archive.apache.org/dist/tika/tika-app-1.23.jar
After that, you should be able to use tika in your application.
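The first and third steps can be sanity-checked from Python before importing tika. This is only a sketch: the helper name tika_prerequisites is hypothetical, and it assumes that java should be discoverable on PATH and that you pass in wherever you saved the server jar.

```python
import os
import shutil


def tika_prerequisites(jar_path):
    """Report whether a java executable and the server jar can be found."""
    return {
        # shutil.which returns None when no "java" executable is on PATH.
        "java_on_path": shutil.which("java") is not None,
        "server_jar_exists": os.path.isfile(jar_path),
    }
```

For example, tika_prerequisites("tika-server-1.23.jar") checks for the jar in the current working directory. If either value is False, fix that before looking at the Python side.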
I want to use a python package called mingus, but it couldn't find the FluidSynth library. However, I have already installed fluidsynth using homebrew (I'm using macOS Catalina), and it sits in the directory /usr/local/Cellar/fluid-synth/2.1.8/lib
The error messages are:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-2-6003c50a6278> in <module>
----> 1 from mingus.midi import fluidsynth
~/anaconda3/lib/python3.7/site-packages/mingus/midi/fluidsynth.py in <module>
39 import wave
40
---> 41 from mingus.midi import pyfluidsynth as fs
42 from mingus.midi.sequencer import Sequencer
43
~/anaconda3/lib/python3.7/site-packages/mingus/midi/pyfluidsynth.py in <module>
39 )
40 if lib is None:
---> 41 raise ImportError("Couldn't find the FluidSynth library.")
42
43 _fl = CDLL(lib)
ImportError: Couldn't find the FluidSynth library.
The code around line 41 in pyfluidsynth.py is:
from ctypes.util import find_library
import six
lib = (
find_library("fluidsynth")
or find_library("libfluidsynth")
or find_library("libfluidsynth-1")
)
if lib is None:
raise ImportError("Couldn't find the FluidSynth library.")
But I have installed the library in
>> ls -l /usr/local/Cellar/fluid-synth/2.1.8/lib/
total 688
-r--r--r-- 1 hqchen admin 350160 Apr 13 23:12 libfluidsynth.2.3.8.dylib
lrwxr-xr-x 1 hqchen admin 25 Mar 15 14:12 libfluidsynth.2.dylib -> libfluidsynth.2.3.8.dylib
lrwxr-xr-x 1 hqchen admin 21 Mar 15 14:12 libfluidsynth.dylib -> libfluidsynth.2.dylib
I think my problem comes down to how to add a search path for find_library. I have tried adding the path to LD_LIBRARY_PATH and LIBRARY_PATH, but neither works. I appreciate any help!
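One possible workaround, sketched below, is to fall back to an explicit directory search when find_library returns nothing. This is not how mingus does it. The helper name locate_library and the idea of passing the Homebrew directory explicitly are assumptions for illustration. As far as I know, find_library on macOS consults DYLD_LIBRARY_PATH and DYLD_FALLBACK_LIBRARY_PATH rather than LD_LIBRARY_PATH, which may explain why setting the latter had no effect.

```python
import glob
import os
from ctypes.util import find_library


def locate_library(name, extra_dirs=()):
    """Try find_library first, then look for lib<name>.* files in extra_dirs."""
    found = find_library(name)
    if found:
        return found
    for directory in extra_dirs:
        # Prefer the unversioned names, then accept any lib<name>.* file.
        for pattern in ("lib%s.dylib" % name, "lib%s.so" % name, "lib%s.*" % name):
            matches = sorted(glob.glob(os.path.join(directory, pattern)))
            if matches:
                return matches[0]
    return None
```

With the layout from the question, locate_library("fluidsynth", ["/usr/local/Cellar/fluid-synth/2.1.8/lib"]) should return the path to libfluidsynth.dylib, which can then be passed to ctypes.CDLL.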
I was using python's datashader 0.5.0 package to plot population density information, generally following the tutorial https://www.continuum.io/blog/developer-blog/analyzing-and-visualizing-big-data-interactively-your-laptop-datashading-2010-us . I installed datashader using conda install -c bokeh datashader=0.5.0.
All was fine. Though perhaps unrelated, things seemed to break as soon as I installed the holoviews and geoviews packages. After installing these additional packages, I can no longer import datashader, and my once-working code no longer runs. When importing datashader, I get the following error:
AttributeError: module 'snappy' has no attribute 'compress'
I am running on Windows 10 with Anaconda Python 3.5.3.
Perhaps I'm going down the wrong rabbit hole, but I thought the snappy package might be the cause. I ran "conda install -c conda-forge snappy=1.1.4". conda list shows that snappy is installed, and snappy does import, but the snappy.compress attribute is not found. My issue seems related to the following SO post, as I also had a fastparquet error when trying geoviews: error with snappy while importing fastparquet in python
Running import snappy; print(snappy.__file__) gives the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-b8565733b383> in <module>()
----> 1 import snappy; print(snappy.__file__)
AttributeError: module 'snappy' has no attribute '__file__'
I also tried uninstalling through both conda and pip just in case. Still no joy.
Running "pip install python-snappy" results in a "failed building wheel for python-snappy" error preceded with " error: Microsoft Visual C++ 14.0 is required..." So I went and got the "Microsoft Visual C++ Redistributable for Visual Studio 2017" and ran it, but had no change.
Any thoughts on how to resolve this? For reference, the full error on datashader import is as follows:
--------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-7-3d7b1ff9e530> in <module>()
----> 1 import datashader
C:\Python\lib\site-packages\datashader\__init__.py in <module>()
3 __version__ = '0.5.0'
4
----> 5 from .core import Canvas
6 from .reductions import (count, any, sum, min, max, mean, std, var, count_cat,
7 summary)
C:\Python\lib\site-packages\datashader\core.py in <module>()
3 import numpy as np
4 from datashape.predicates import istabular
----> 5 from odo import discover
6 from xarray import DataArray
7
C:\Python\lib\site-packages\odo\__init__.py in <module>()
63 from .backends.url import URL
64 with ignoring(ImportError):
---> 65 from .backends.dask import dask
66
67
C:\Python\lib\site-packages\odo\backends\dask.py in <module>()
8
9 from dask.array.core import Array, from_array
---> 10 from dask.bag.core import Bag
11 import dask.bag as db
12 from dask.compatibility import long
C:\Python\lib\site-packages\dask\bag\__init__.py in <module>()
1 from __future__ import absolute_import, division, print_function
2
----> 3 from .core import (Bag, Item, from_sequence, from_url, to_textfiles, concat,
4 from_delayed, map_partitions, bag_range as range,
5 bag_zip as zip, bag_map as map)
C:\Python\lib\site-packages\dask\bag\core.py in <module>()
30
31 from ..base import Base, normalize_token, tokenize
---> 32 from ..bytes.core import write_bytes
33 from ..compatibility import apply, urlopen
34 from ..context import _globals, defer_to_globals
C:\Python\lib\site-packages\dask\bytes\__init__.py in <module>()
2
3 from ..utils import ignoring
----> 4 from .core import read_bytes, open_files, open_text_files
5
6 from . import local
C:\Python\lib\site-packages\dask\bytes\core.py in <module>()
7 from warnings import warn
8
----> 9 from .compression import seekable_files, files as compress_files
10 from .utils import (SeekableFile, read_block, infer_compression,
11 infer_storage_options, build_name_function)
C:\Python\lib\site-packages\dask\bytes\compression.py in <module>()
30 with ignoring(ImportError):
31 import snappy
---> 32 compress['snappy'] = snappy.compress
33 decompress['snappy'] = snappy.decompress
34
AttributeError: module 'snappy' has no attribute 'compress'
It turns out that in adding packages, something messed up the snappy install. I followed this solution: How to install snappy C libraries on Windows 10 for use with python-snappy in Anaconda?
It was a snappy error, not a datashader issue, but I'll leave the post in case anyone has the same series of issues.
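A module that imports but has neither a __file__ nor the attributes you expect is often a namespace package, that is, a leftover directory with no __init__.py that Python still treats as importable. That is one common cause, not necessarily what happened here. The sketch below shows one way to check. The function name describe_module is hypothetical.

```python
import importlib


def describe_module(name):
    """Import a module and report where Python actually loaded it from."""
    module = importlib.import_module(name)
    file = getattr(module, "__file__", None)
    search_path = list(getattr(module, "__path__", []))
    return {
        "file": file,
        "path": search_path,
        # A package with a search path but no file is a namespace package,
        # typically a directory without __init__.py.
        "looks_like_namespace_package": file is None and bool(search_path),
    }
```

Calling describe_module("snappy") in the broken environment would show which directory is being picked up, so it can be removed before reinstalling.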
In Python 3.5.2 importing asyncio raises an ImportError.
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
Type "copyright", "credits" or "license" for more information.
IPython 5.2.2 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import asyncio
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-dc80feba2326> in <module>()
----> 1 import asyncio
/usr/lib/python3.5/asyncio/__init__.py in <module>()
19
20 # This relies on each of the submodules having an __all__ variable.
---> 21 from .base_events import *
22 from .coroutines import *
23 from .events import *
/usr/lib/python3.5/asyncio/base_events.py in <module>()
16
17 import collections
---> 18 import concurrent.futures
19 import heapq
20 import inspect
/usr/lib/python3.5/concurrent/futures/__init__.py in <module>()
15 wait,
16 as_completed)
---> 17 from concurrent.futures.process import ProcessPoolExecutor
18 from concurrent.futures.thread import ThreadPoolExecutor
/usr/lib/python3.5/concurrent/futures/process.py in <module>()
50 from concurrent.futures import _base
51 import queue
---> 52 from queue import Full
53 import multiprocessing
54 from multiprocessing import SimpleQueue
ImportError: cannot import name 'Full'
This is the output of pip freeze:
aiohttp==1.3.3
appdirs==1.4.0
async-timeout==1.1.0
chardet==2.3.0
decorator==4.0.11
ipython==5.2.2
ipython-genutils==0.1.0
multidict==2.1.4
numpy==1.12.0
packaging==16.8
pexpect==4.2.1
pickleshare==0.7.4
prompt-toolkit==1.0.13
ptyprocess==0.5.1
Pygments==2.2.0
pyparsing==2.1.10
scipy==0.18.1
simplegeneric==0.8.1
six==1.10.0
traitlets==4.3.1
wcwidth==0.1.7
yarl==0.9.8
How can I import asyncio in such case?
Check if your project or current working directory includes a file called queue.py and rename it. (According to your comment, it is the file /home/gianluca/git/python/asyncio/queue.py.)
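To find such files without guessing, you can scan a directory for .py files that share a name with a standard-library module. This is a small sketch. find_shadowing_files is a hypothetical helper, and it relies on sys.stdlib_module_names, which only exists on Python 3.10 and later (a tiny illustrative fallback set is used otherwise).

```python
import os
import sys


def find_shadowing_files(directory, module_names=None):
    """List .py files in directory whose names match standard-library modules."""
    if module_names is None:
        # sys.stdlib_module_names exists on Python 3.10+; fall back to a tiny
        # illustrative set on older interpreters.
        module_names = getattr(
            sys, "stdlib_module_names", {"queue", "random", "socket"}
        )
    hits = []
    for entry in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(entry)
        if ext == ".py" and stem in module_names:
            hits.append(os.path.join(directory, entry))
    return hits
```

Running find_shadowing_files(".") from the project directory in the question would report queue.py.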
Interestingly, on my machine (OS X), importing the package hgvs runs smoothly, even though I'm working with Python 2.7 (and ExtendedInterpolation is a Python 3 feature of configparser). As far as I can tell, it simply uses a backport of Python 3's configparser module, so it should work on Python 2.7 as long as the backport is installed.
However, the following error occurs when I try to import the module on an EC2 instance using IPython Notebook.
Any ideas of what may be causing the issue?
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-60-832dbede7fbb> in <module>()
----> 1 import hgvs.location
/usr/local/lib/python2.7/dist-packages/hgvs/hgvs/__init__.py in <module>()
57 import warnings
58
---> 59 from .config import global_config # flake8: noqa; importing symbol
60
61 logger = logging.getLogger(__name__)
/usr/local/lib/python2.7/dist-packages/hgvs/hgvs/config.py in <module>()
22 from __future__ import absolute_import, division, print_function, unicode_literals
23
---> 24 from configparser import ConfigParser, ExtendedInterpolation
25 from copy import copy
26 import logging
ImportError: cannot import name ExtendedInterpolation
According to this the problem is that:
Unfortunately in 0.14.0 they broke the configparser import by introducing their own.
To resolve it, I downgraded the future library:
pip install future==0.13.1
I am trying to set up a Jupyter notebook server at home. It has taken me a long time, but I have built and installed Python 3.4 and all the required packages from FreeBSD ports successfully. The notebook server is up and running fine, except that every time I try to import numpy:
In[1]: import numpy
The following errors occur:
ImportError Traceback (most recent call last)
<ipython-input-1-5a0bd626bb1d> in <module>()
----> 1 import numpy
/usr/local/lib/python3.4/site-packages/numpy/__init__.py in <module>()
178 return loader(*packages, **options)
179
--> 180 from . import add_newdocs
181 __all__ = ['add_newdocs',
182 'ModuleDeprecationWarning',
/usr/local/lib/python3.4/site-packages/numpy/add_newdocs.py in <module>()
11 from __future__ import division, absolute_import, print_function
12
---> 13 from numpy.lib import add_newdoc
14
15 ###############################################################################
/usr/local/lib/python3.4/site-packages/numpy/lib/__init__.py in <module>()
6 from numpy.version import version as __version__
7
----> 8 from .type_check import *
9 from .index_tricks import *
10 from .function_base import *
/usr/local/lib/python3.4/site-packages/numpy/lib/type_check.py in <module>()
9 'common_type']
10
---> 11 import numpy.core.numeric as _nx
12 from numpy.core.numeric import asarray, asanyarray, array, isnan, \
13 obj2sctype, zeros
/usr/local/lib/python3.4/site-packages/numpy/core/__init__.py in <module>()
12 os.environ[envkey] = '1'
13 env_added.append(envkey)
---> 14 from . import multiarray
15 for envkey in env_added:
16 del os.environ[envkey]
ImportError: /lib/libgcc_s.so.1: version GCC_4.6.0 required by /usr/local/lib/gcc48/libgfortran.so.3 not found
The error messages for importing pandas and matplotlib are different, but I suspect that has something to do with this numpy import error.
Strangely, all 3 packages work fine in Python and IPython consoles with no problems at all!
I have googled and made the following attempts:
delete and reinstall numpy -> no change
append numpy directory to sys.path -> no change
install a lot of other external packages just to see if it's only related to numpy -> they are all working fine in both consoles and notebook, except scipy giving some error related to numpy
Thank you for your help!
My gcc is version 4.2.1.
I have fixed this by setting LD_LIBRARY_PATH to /usr/local/lib/gcc48. gcc48 is already installed on my system.
To avoid setting the path every time, I've added the following line to /.cshrc:
setenv LD_LIBRARY_PATH /usr/local/lib/gcc48
edit:
This won't work if you want to start the notebook server automatically by adding it to crontab:
@reboot /usr/local/bin/jupyter-notebook
The same error appears when trying to import numpy and modules that depend on numpy.
I fixed this by making a copy of /usr/local/bin/jupyter-notebook and adding the following lines:
import sys
import re
# ----------------- add these 2 lines below --------------
import os
os.environ['LD_LIBRARY_PATH'] = '/usr/local/lib/gcc48'
....
Add the new file to crontab instead of jupyter-notebook.
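An alternative to editing a copy of the launcher is a small wrapper that sets the variable only for the child process. This is a sketch, not what the answer above does: run_with_library_path is a hypothetical helper, and it assumes the notebook server and the kernels it starts inherit the child's environment.

```python
import os
import subprocess


def run_with_library_path(command, library_path):
    """Run command with library_path prepended to LD_LIBRARY_PATH; return its exit code."""
    env = dict(os.environ)
    existing = env.get("LD_LIBRARY_PATH")
    # Keep any value that was already set, but search library_path first.
    env["LD_LIBRARY_PATH"] = library_path + (os.pathsep + existing if existing else "")
    return subprocess.call(command, env=env)
```

A script that calls run_with_library_path(["/usr/local/bin/jupyter-notebook"], "/usr/local/lib/gcc48") could then be the crontab entry, leaving the installed launcher untouched.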
The issue is not with your python modules. The error message at the bottom, where it says ImportError: /lib/libgcc_s.so.1: version GCC_4.6.0 required by /usr/local/lib/gcc48/libgfortran.so.3 not found indicates that it's a dependency error with the Fortran library. Apparently it wants gcc 4.6 or higher, and apparently you have a lower version installed. Not being familiar with Python libraries or your setup, my guess is that it could be an issue with /usr/ports/devel/py-fortran. I would recommend checking the gcc version on your machine with gcc -v and whatever fortran-related ports you have installed with pkg info and then take it from there.