I am trying to use the tika package to parse files. Tika is installed successfully, and I started tika-server-1.18.jar from cmd with java -jar tika-server-1.18.jar
My code in Jupyter is:
import tika
from tika import parser
parsed = parser.from_file('')
However, I receive the error below:
2018-07-25 10:20:13,325 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:18,329 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:23,332 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:28,340 [MainThread ] [ERROR] Tika startup log message not received after 3 tries.
2018-07-25 10:20:28,340 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.
RuntimeError: Unable to start Tika Server.
According to Apache Tika's site, all new versions of the tika-server.jar will require Java 8.
24 April 2018: Apache Tika Release
Apache Tika 1.18 has been released! This release includes bug fixes (e.g. extraction from grouped shapes in PPT), security fixes and upgrades to dependencies. PLEASE NOTE: The next versions will require Java 8. Please see the CHANGES.txt file for the full list of changes in the release and have a look at the download page for more information on how to obtain Apache Tika 1.18.
The current docs for the tika Python library are outdated: they claim that Java 7 is needed, but Java 8 must now be installed. This is because the current version of tika-server.jar is downloaded automatically at runtime if it is not found in your temp folder.
After installing Java 8, my basic test code launched the server and worked without error.
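To check which Java the tika package will find, something like the sketch below may help. It is only an illustration: the function names are made up for this example, and it assumes `java` is on your PATH and prints its usual `version "..."` line.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    major = int(match.group(1))
    # Legacy numbering: "1.8.0_171" means Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version` and parse it; returns None if java is not found."""
    try:
        result = subprocess.run(
            ["java", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,  # java -version writes to stderr
            universal_newlines=True,
        )
    except FileNotFoundError:
        return None
    return java_major_version(result.stderr + result.stdout)
```

If `installed_java_major()` returns None or a number below 8, that would be consistent with the startup failure described above.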
After you import tika, you need to initialize the Java server:
import tika
tika.initVM()
from tika import parser
parsed = parser.from_file('')  # file name should be here
Download Java. If you already have a version of Java installed, try updating it to the latest version. The version that works for me is 1.18.
You have not passed an argument (specified a file) in your line:
parsed = parser.from_file('')
Give it a file to chew on e.g.,
parsed = parser.from_file('myfile.txt')
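A small guard can catch the empty file name before anything is sent to Tika. This is just a sketch; `require_file` is a hypothetical helper, not part of the tika package.

```python
from pathlib import Path


def require_file(path):
    """Return path as a Path, or raise if it is empty or not an existing file."""
    if not str(path).strip():
        raise ValueError("no file name given")
    candidate = Path(path)
    if not candidate.is_file():
        raise FileNotFoundError(str(candidate))
    return candidate


# Usage (needs the tika package and Java):
# from tika import parser
# parsed = parser.from_file(str(require_file('myfile.txt')))
```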
The server didn't start, and presumably that is what triggers the "no log" warning; see line 644 in the source on GitHub.
then another error message tells you it ain't going to play...
I faced a similar issue. I tried all the steps mentioned here, but nothing helped.
How I solved it:
I checked the log files of tika and tika-server.
On Windows, you can find them inside C:/Users/your_user_name/AppData/Local/Temp/
I found that the tika-server log mentioned a "port already in use" error.
See the log snippet below:
INFO: Setting the server's publish address to be http://localhost:9998/
WARNING: FAILED SelectChannelConnector#localhost:9998: java.net.BindException: Address already in use: bind
java.net.BindException: Address already in use: bind
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Unknown Source)
at sun.nio.ch.Net.bind(Unknown Source)
at sun.nio.ch.ServerSocketChannelImpl.bind(Unknown Source)
at sun.nio.ch.ServerSocketAdaptor.bind(Unknown Source)
at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.eclipse.jetty.server.Server.doStart(Server.java:293)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.apache.cxf.transport.http_jetty.JettyHTTPServerEngine.addServant(JettyHTTPServerEngine.java:417)
at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.activate(JettyHTTPDestination.java:179)
at org.apache.cxf.transport.AbstractObservable.setMessageObserver(AbstractObservable.java:49)
at org.apache.cxf.binding.AbstractBindingFactory.addListener(AbstractBindingFactory.java:95)
at org.apache.cxf.jaxrs.JAXRSBindingFactory.addListener(JAXRSBindingFactory.java:88)
at org.apache.cxf.endpoint.ServerImpl.start(ServerImpl.java:123)
at org.apache.cxf.jaxrs.JAXRSServerFactoryBean.create(JAXRSServerFactoryBean.java:206)
at org.apache.tika.server.TikaServerCli.main(TikaServerCli.java:213)
This clearly indicated that another process was already running on the same port. So I just needed to kill the java process running on port 9998 (which I assumed might have been defunct).
Once I killed the process in Task Manager, I reran the Python script and it worked correctly.
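Before rerunning the script, you can check from Python whether the port is still taken. The sketch below simply tries to bind to it, which mirrors the BindException in the log; `port_is_free` is a name invented for this example, and 9998 is the port from the log above.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use": another process holds the port.
            return False
        return True


# Usage:
# if not port_is_free(9998):
#     print("Something is already listening on 9998; find and stop that process.")
```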
To cross-check, you can also run the tika-server.jar file found in the same path (C:/Users/your_user_name/AppData/Local/Temp/) with the command below and see whether it fails or runs correctly: java -jar tika-server.jar
Hope this will be helpful to someone in future.
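The log check described in this answer can be scripted roughly as follows. It is a sketch under assumptions: the helper names are invented, and it assumes the log files sit in the system temp directory with names matching tika*.log (for example tika.log and tika-server.log).

```python
import glob
import os
import tempfile


def find_tika_logs(directory=None):
    """Return sorted paths of files named tika*.log (default: the temp dir)."""
    directory = directory or tempfile.gettempdir()
    return sorted(glob.glob(os.path.join(directory, "tika*.log")))


def lines_mentioning(path, needle="Address already in use"):
    """Return the lines of a log file that contain the given text."""
    with open(path, errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if needle in line]


# Usage:
# for log in find_tika_logs():
#     for line in lines_mentioning(log):
#         print(log, "->", line)
```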
If you are using Ubuntu 20.01 (and 18.04) like me, the solution is to install Oracle JDK 17. Do the following:
sudo add-apt-repository ppa:linuxuprising/java
sudo apt update
sudo apt install oracle-java17-installer
Type java -version on the terminal. You should see the following print-out:
java version "17.0.1" 2021-10-19 LTS
Java(TM) SE Runtime Environment (build 17.0.1+12-LTS-39)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.1+12-LTS-39, mixed mode, sharing)
tika should then be able to extract text from your pdf in python.
parser.from_file(<your pdf file>)
I am getting this error:
An error occurred initializing the application server: Failed to locate pgAdmin4.py, terminating server thread.
When it fails, it prompts me to adjust the Python and application paths. I read an answer on Stack Overflow where someone said that deleting the path worked for him, so I did the same, but it still gave me the same error and I no longer see the prompt.
So I went to the official pgAdmin site, only to read that if it fails I must enter the Python and application paths. How can I configure the paths for pgAdmin? I am using Fedora 27.
Try to just delete the config file. You may have an old one from a previous install.
rm ~/.config/pgadmin/pgadmin4.conf
When it fails, it prompts me to adjust the Python and application paths. I read an answer on Stack Overflow where someone said that deleting the path worked for him, so I did the same, but it still gave me the same error and I no longer see the prompt.
Probably your first error was actually
An error occurred initialising the application server:
Failed to launch the application server, server thread exiting.
Most likely you are missing a dependency such as python3-flask-babelex.
E.g. on Fedora, install it with:
sudo dnf install python3-flask-babelex
You see the following error (the one you mentioned) when your user config file is misconfigured. That file was created when you edited the default values from the prompt.
An error occurred initializing the application server:
Failed to locate pgAdmin4.py, terminating server thread.
This error can be solved by either fixing your config or deleting it to use the default values.
E.g. on Fedora, check that your user config is correct:
vi ~/.config/pgadmin/pgadmin4.conf
Primarily check that the path variables in the [General] section are correct.
# example
[General]
ApplicationPath=/usr/lib/python3.6/site-packages/pgadmin4-web/
PythonPath=/usr/lib/python3.6/site-packages:/usr/lib64/python3.6/site-packages
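The same check can be done with a short script. This is a hedged sketch: `check_pgadmin_conf` is a hypothetical helper, and it assumes that pgAdmin4.py is expected directly inside ApplicationPath, which is what the error message and the example above suggest.

```python
import configparser
import os


def check_pgadmin_conf(conf_path):
    """Return a list of problems found in the [General] section of the config."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (ApplicationPath, PythonPath)
    if not parser.read(conf_path):
        return ["cannot read %s" % conf_path]
    if not parser.has_section("General"):
        return ["no [General] section"]
    problems = []
    app_path = parser.get("General", "ApplicationPath", fallback="")
    # Assumption: pgAdmin4.py lives directly in ApplicationPath.
    if not os.path.isfile(os.path.join(app_path, "pgAdmin4.py")):
        problems.append("pgAdmin4.py not found in ApplicationPath=%r" % app_path)
    python_path = parser.get("General", "PythonPath", fallback="")
    for entry in python_path.split(os.pathsep):
        if entry and not os.path.isdir(entry):
            problems.append("PythonPath entry is not a directory: %r" % entry)
    return problems


# Usage:
# print(check_pgadmin_conf(os.path.expanduser("~/.config/pgadmin/pgadmin4.conf")))
```

An empty list means both paths look plausible; it does not prove that pgAdmin will start.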
For me, the solution was to run sudo dnf remove pgadmin4*, then sudo find / -iname "*pgadmin4*" and delete any scraps lying around, then sudo dnf install pgadmin4*. Everything is now working fine.
I've set up Apache Mesos 0.21.2 on a virtual machine. The installation was performed by downloading the sources, compiling them and running make install.
On another virtual machine I've copied the build directory in order to use it as a slave system.
I wanted to start with a small 'hello world' framework as shown in http://jamesporter.me/2014/11/15/hello-mesos.html
However, when I run the Python framework with
python hello_mesos.py
I get the following log:
I1227 19:16:02.790803 1678 sched.cpp:137] Version: 0.21.1
2015-12-27 19:16:02,790:1678(0x7f6b1e3de700):ZOO_INFO#log_env#712: Client environment:zookeeper.version=zookeeper C client 3.4.5
...
2015-12-27 19:17:09,526:1678(0x7f6b1bf7e700):ZOO_ERROR#handle_socket_error_msg#1697: Socket [127.0.0.1:2181] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
What can be the source that triggers this error? Is there any other way to get more information?
Thank you in advance for any hints and
with best regards
I'd recommend that you use a recent Mesos version; 0.21.2 is quite old now that 0.26.0 is out. There are also precompiled packages available.
Concerning your actual problem, it appears that either
ZooKeeper is not started on the host you're trying to execute your framework on, or
you're trying to reach ZK on another host, in which case you need to set that host's actual IP instead of 127.0.0.1.
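To tell these two cases apart, a plain TCP probe of the ZooKeeper address from the log is often enough. The sketch below is an illustration only; `probe_tcp` is an invented name, and 127.0.0.1:2181 is taken from the error message in the question.

```python
import os
import socket


def probe_tcp(host="127.0.0.1", port=2181, timeout=2.0):
    """Return (errno, message); errno 0 means the TCP connection succeeded."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns an errno value instead of raising on failure.
        code = sock.connect_ex((host, port))
    return code, ("connected" if code == 0 else os.strerror(code))


# Usage:
# print(probe_tcp())                 # the address from the log
# print(probe_tcp("192.0.2.10"))     # placeholder IP of the host that runs ZooKeeper
```

A non-zero code on the local address, matching the "Connection refused" in the log, would suggest that nothing is listening there.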
I am trying to control Tor on Ubuntu Linux using Python's stem library, as instructed on Tor's website. However, when I ran the suggested Python code
from stem.control import Controller
with Controller.from_port(port = 9051) as controller:
    controller.authenticate()  # provide the password here if you set one
    bytes_read = controller.get_info("traffic/read")
    bytes_written = controller.get_info("traffic/written")
    print("My Tor relay has read %s bytes and written %s." % (bytes_read, bytes_written))
I get the error:
Traceback (most recent call last):
  File "littleRelay.py", line 5, in <module>
    bytes_read = controller.get_info("traffic/read")
  File "/usr/local/lib/python2.7/dist-packages/stem/control.py", line 852, in get_info
    raise exc
stem.InvalidArguments: GETINFO request contained unrecognized keywords: traffic/read
So how can I get Tor relay info via python+stem on linux?
I think Tor is running fine because I started tor from the terminal and it says
[notice] Tor has successfully opened a circuit. Looks like client functionality is working.
[notice] Bootstrapped 100%: Done.
Furthermore, when I run the above python code, the terminal says
[notice] New control connection opened.
P.S. I have also tried the code on a windows pc and it worked. I'm really puzzled now.
That error indicates that Tor doesn't support the 'GETINFO traffic/read' query. This is odd - that is a feature I added to Tor back in 2011. Perhaps your copy of Tor is very, very out of date?
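One way to confirm an outdated Tor is to ask the controller for its version when the query is unsupported. The sketch below relies on stem's `get_info` accepting a default value and on `get_version`; `traffic_report` itself is a made-up helper for illustration.

```python
def traffic_report(controller):
    """Describe relay traffic, or report the Tor version if the query fails."""
    # With a default, get_info returns it instead of raising when the query fails.
    bytes_read = controller.get_info("traffic/read", None)
    bytes_written = controller.get_info("traffic/written", None)
    if bytes_read is None or bytes_written is None:
        return "traffic/* queries failed; Tor version is %s" % controller.get_version()
    return "My Tor relay has read %s bytes and written %s." % (bytes_read, bytes_written)


# Usage (needs the stem package and a running Tor with its control port open):
# from stem.control import Controller
# with Controller.from_port(port=9051) as controller:
#     controller.authenticate()
#     print(traffic_report(controller))
```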
Problem Solved! Thank you Damian!
I uninstalled Tor on Ubuntu and installed it again by following the detailed guide here. Now Tor works with the Python code.
I'm not sure exactly how the problem arose, but I suppose it had to do with installing Tor on Ubuntu by naively using
sudo apt-get install tor
I'm using the Arelle project to validate my XBRL files.
http://arelle.org/documentation/api-web-services/
When I try to start the web server that I can call from my code, I receive the following error.
I've been looking up how to fix this, and everything points to disabling my antivirus. I disabled it and I still get this error. Arelle is a Python project.
This should start a web service that I can reach on www.localhost:8082/rest/xbrl
Apparently this was an issue with the previous release of the project.
I installed the latest version, 2013-07-25, and no longer had this socket error.
After upgrading GAE to 1.7.6 on OS X Lion, I'm getting an error I can't resolve when I run dev_appserver.py. It was working fine in the previous version. Initially the error said I needed to install PyObjC and PIL, which I did, using pip. Now, it says can't open file '/usr/local/bin/_python_runtime.py': [Errno 2] No such file or directory. Here is the full error:
INFO 2013-04-01 23:01:15,091 sdk_update_checker.py:244] Checking for updates to the SDK.
INFO 2013-04-01 23:01:15,660 sdk_update_checker.py:272] The SDK is up to date.
INFO 2013-04-01 23:01:15,705 api_server.py:152] Starting API server at: http://localhost:50096
INFO 2013-04-01 23:01:15,721 dispatcher.py:98] Starting server "default" running at: http://localhost:8080
INFO 2013-04-01 23:01:15,759 admin_server.py:117] Starting admin server at: http://localhost:8000
/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python: can't open file '/usr/local/bin/_python_runtime.py': [Errno 2] No such file or directory
ERROR 2013-04-01 23:01:15,785 http_runtime.py:221] unexpected port response from runtime ['']; exiting the development server
INFO 2013-04-01 23:01:16,775 api_server.py:517] Applying all pending transactions and saving the datastore
INFO 2013-04-01 23:01:16,775 api_server.py:520] Saving search indexes
Exception in thread Thread-1 (most likely raised during interpreter shutdown)
I found a similar post about this here, but it was on a Windows 7 machine and it doesn't appear he's found a solution (or perhaps he did and didn't follow up). Any ideas?
Edit: It works with the GoogleAppEngineLauncher GUI but not the command line. Not sure why.
From my response to Fat Lotus, here's what worked for me:
I updated GAE Launcher again which recreated the symlinks and now it works fine. The current symlink that /usr/local/bin/_python_runtime.py links to is /Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/_python_runtime.py
I've been having this problem as well (related to a Homebrew install); I've managed to get things working by using the following:
ln -s /usr/local/Cellar/google-app-engine/1.7.5/share/google-app-engine/_python_runtime.py /usr/local/bin/_python_runtime.py
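Before recreating the link, it can help to see what is currently at that path. The following is a minimal sketch; `symlink_status` is a hypothetical helper, and the path in the usage comment is the one from the error message.

```python
import os


def symlink_status(path):
    """Classify path as 'ok', 'broken symlink', 'regular file' or 'missing'."""
    if os.path.islink(path):
        # os.path.exists follows the link, so False here means it dangles.
        return "ok" if os.path.exists(path) else "broken symlink"
    return "regular file" if os.path.exists(path) else "missing"


# Usage:
# print(symlink_status("/usr/local/bin/_python_runtime.py"))
```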
I saw the same error about _python_runtime.py not being found. It was caused by running GoogleAppEngineLauncher without first copying the application to the local drive.
Make sure you read the error messages carefully, as I didn't read all of them at first. Running the installer from the local disk resolved this issue, at least for me.