Trying to pass extracted text from Tesseract OCR to custom google search

I'm having some trouble with a project and hope someone can help. I'm trying to take text extracted by Tesseract OCR and use it as the query for a Google search in Chrome. My shell script can extract the text and launch Chrome, but I can't figure out how to send the text to Chrome's search bar. Below are my scripts. I'm extremely new to coding, so any help is appreciated.
Shell script
echo "Realtime Screen OCR"
while true
do
    echo "Waiting for trigger"
    read
    screencapture -R31,205,420,420 screens.png
    tesseract screens.png ocr
    OCR=`cat ocr.txt`
    python3 launch1.py $OCR
    ##/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome $OCR
    echo "Opened Chrome...waiting for next question"
done
Python Script
import urllib.parse
search_query = input("enter search query")
query_encoded = urllib.parse.quote_plus(search_query)
google_search_url = "http://www.google.com/search?q=" + format(query_encoded)
import webbrowser
webbrowser.open(google_search_url)

It appears to me that the only step you're missing is reading the text in your Python code and passing it into the search query. You can do this with sys.argv. In this example the input() prompt is replaced by the command-line arguments, assuming that this is what you're trying to do. Because the shell script passes $OCR unquoted, the OCR text arrives as several separate arguments, so they are joined back into one query.
import sys
import urllib.parse
import webbrowser
# Join all arguments so a multi-word OCR result becomes one query string
search_query = " ".join(sys.argv[1:])
query_encoded = urllib.parse.quote_plus(search_query)
google_search_url = "http://www.google.com/search?q=" + query_encoded
webbrowser.open(google_search_url)
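Tesseract output often contains line breaks and stray spaces, so it may help to normalise the text before encoding it. A minimal sketch, assuming a helper named ocr_text_to_search_url (a made-up name, not part of any library):

```python
import urllib.parse


def ocr_text_to_search_url(text):
    """Turn raw OCR text into a Google search URL (hypothetical helper)."""
    # split() with no arguments collapses newlines, tabs and repeated spaces
    query = " ".join(text.split())
    return "https://www.google.com/search?q=" + urllib.parse.quote_plus(query)
```

With a helper like this, the Python script could read ocr.txt itself, for example ocr_text_to_search_url(open("ocr.txt").read()), instead of relying on shell arguments.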

Related

Passing a value from python file to the iOS app

I need to create an iOS app that is basically a mobile GitHub repository browser. A user enters a query and gets the matching repositories as search results.
I have tried approaching it from the basic HTTP point of view, but to no avail (check out this question):
How to print my http query results in Xcode?
And now, with support from @HedgeHog, I have found a piece of code that does exactly what the backend of my app should do, but it is written in Python:
import requests, sys, webbrowser, bs4
print('Your GitHub repository search query:')
userInput = input()
results = requests.get('https://github.com/search?q=' + userInput + '&type=repositories'
+ ' '.join(sys.argv[1:]))
results.raise_for_status()
soup = bs4.BeautifulSoup(results.text, 'html.parser')
linkList = ['https://github.com/'+a['href'] for a in soup.select('.repo-list-item .f4 a[href]')]
As I have also learned, PythonKit doesn't support iOS apps.
Is there any way I could put a *.py file in the Xcode project so that Xcode could run it as a function (pass an input from Swift to the Python file, then receive the output back in Swift)?

how to open html file automatically using webbrowser package in python

I have a Python script that uses pandas to create a DataFrame from a CSV file. I want to display information about the DataFrame using the pandas-profiling package and show the report in the browser as soon as the user runs the function.
But the system does not open the browser and displays this error:
ValueError: startfile: filepath too long for Windows
Code:
def displayDfInfo(self, df):
    profile = pp.ProfileReport(df)
    html_profile = profile.to_html()
    webbrowser.open(html_profile, new=1)
Where is the error, and how do I fix it?

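webbrowser.open expects a URL or a file path, but profile.to_html() returns the HTML markup itself, so Windows treats the whole document as one very long file path. Save the report to a file first, then open that file. I would simplify it to this: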
import webbrowser
html = "myhtml.html"
webbrowser.open(html)
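To connect this with the question's code, the HTML string can be written to disk before it is opened. A minimal sketch, assuming helper names save_html and show_html and the file name report.html (all made up for illustration):

```python
import tempfile
import webbrowser
from pathlib import Path


def save_html(html_text, directory=None):
    """Write an HTML string to report.html and return its file:// URL."""
    # Use a fresh temporary directory unless the caller supplies one
    directory = Path(directory or tempfile.mkdtemp())
    path = directory / "report.html"
    path.write_text(html_text, encoding="utf-8")
    return path.resolve().as_uri()


def show_html(html_text):
    """Save the HTML and open the saved file in the default browser."""
    return webbrowser.open(save_html(html_text), new=1)
```

In the question's function this would be called as show_html(profile.to_html()).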

Weird json value urllib python

I'm trying to manipulate a dynamic JSON from this site:
http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do
It has three elements: imagem (a base64 string), labelValorCaptcha (just a message), and uuidCaptcha (a value passed as a parameter to play a sound at the link below):
http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha=sajcaptcha_e7b072e1fce5493cbdc46c9e4738ab8a
When I open the first URL in a browser and paste its uuidCaptcha after the equals sign ("..uuidCaptcha=") in the second link, the sound plays normally. I wrote some simple code to fetch these elements.
import urllib, json
url = "http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do"
response = urllib.urlopen(url)
data = json.loads(response.read())
urlSound = "http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha="
print urlSound + data['uuidCaptcha']
But I don't know what's happening: the uuidCaptcha value fetched by the script doesn't work. It opens an error web page.
Does anyone know why?
Thanks!

It works for me.
$ cat a.py
#!/usr/bin/env python
# encoding: utf-8
import urllib, json
url = "http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do"
response = urllib.urlopen(url)
data = json.loads(response.read())
urlSound = "http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha="
print urlSound + data['uuidCaptcha']
$ python a.py
http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha=sajcaptcha_efc8d4bc3bdb428eab8370c4e04ab42c
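The snippets above are Python 2 (urllib.urlopen and the print statement). For reference, a Python 3 sketch of the URL-building step might look like this; the function name build_sound_url and the sample payload in the usage line are made up, and the network fetch is left as a comment:

```python
import json
import urllib.parse

SOUND_URL = "http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do"


def build_sound_url(captcha_json, timestamp):
    """Build the sound URL from the JSON text returned by imagemCaptcha.do."""
    data = json.loads(captcha_json)
    params = {"timestamp": timestamp, "uuidCaptcha": data["uuidCaptcha"]}
    # urlencode escapes the values and joins them with '&'
    return SOUND_URL + "?" + urllib.parse.urlencode(params)


# In Python 3 the JSON text would come from urllib.request.urlopen(url).read()
```

For example, build_sound_url('{"uuidCaptcha": "sajcaptcha_abc"}', 1455996420264) produces a URL of the same shape as the output above.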

As I said to @Charlie Harding, the best way is to download the page and get the JSON values from it, because this JSON is dynamic and needs an open web link to exist.
More info here.

webbrowser script executes without error, but nothing happens?

I'm writing a script that is supposed to open different browsers with given URLs.
When I run it in Eclipse, the script runs without errors, but no browsers open. :/
import webbrowser as wb
url_mf = ['https://www.thatsite.com/','http://thatothersite.org/']
url_gc = ['https://www.thatsite.com/','http://thatothersite.org/']
chrome = wb.get('/usr/bin/google-chrome %s')
firefox = wb.get('fierfox %s')
chrome.open(url_gc[1], new=1)
firefox.open(url_mf[1], new=1)
I also have a script using the IEC.py module to open Internet explorer (I need to enter login info and, later, extract horribly unformatted db queries from a site - mechanize & selenium seemed a bit over the top for that?), and that works just fine. But I'm guessing that's like comparing apples and oranges?
import iec
ie= iec.IEController()
ie.Navigate(url_ie[1])
Any help is very much appreciated.

The first thing I noticed is the typo on line 5: it should be firefox instead of fierfox. Second, I ran your code in Sublime Text 2 and had no problems; I only changed the paths because I'm on a Windows machine.
The code below opened both Firefox and Chrome.
import webbrowser as wb
url_mf = ['https://www.thatsite.com/','http://www.google.ie/']
url_gc = ['https://www.thatsite.com/','http://www.google.ie/']
chrome = wb.get('"C:/Program Files (x86)/Google/Chrome/Application/chrome.exe" %s')
firefox = wb.get('"C:/Program Files (x86)/Mozilla Firefox/firefox.exe" %s')
chrome.open(url_gc[1], new=1)
firefox.open(url_mf[1], new=1)
Do you really need to specify which browser the program uses? If not, I'd suggest using
import webbrowser as wb
urls = ["http://www.google.ie/","http://www.gametrailers.com/"]
for url in urls:
    wb.open(url, new=2, autoraise=True)
This would just get your default browser and open each of the links in new tabs.
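One possible reason the original script stays silent: as far as I can tell, webbrowser.get() with a command string containing %s does not verify that the executable exists, and open() returns False rather than raising when the launch fails. A small sketch that checks first, assuming a helper named open_in (made up for illustration):

```python
import shlex
import shutil
import webbrowser


def open_in(command_template, url):
    """Open url with an explicit browser command and report failures."""
    # The first token of the template is the browser executable
    executable = shlex.split(command_template)[0]
    if shutil.which(executable) is None:
        print("browser executable not found:", executable)
        return False
    # open() returns False if the browser could not be launched
    return webbrowser.get(command_template).open(url, new=1)
```

Calling open_in('fierfox %s', url) would then print a message instead of silently doing nothing.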

Python script for "Google search by image"

I have checked Google's Search APIs, and it seems they have not released an API for searching by image. So I was wondering whether there is a Python script or library with which I can automate the "search by image" feature.

This was annoying enough to figure out that I thought I'd throw a comment on the first python-related stackoverflow result for "script google image search". The most annoying part of all this is setting up your proper application and custom search engine (CSE) in Google's web UI, but once you have your api key and CSE, define them in your environment and do something like:
#!/usr/bin/env python
# save top 10 google image search results to current directory
# https://developers.google.com/custom-search/json-api/v1/using_rest
import requests
import os
import sys
import re
import shutil
url = 'https://www.googleapis.com/customsearch/v1?key={}&cx={}&searchType=image&q={}'
apiKey = os.environ['GOOGLE_IMAGE_APIKEY']
cx = os.environ['GOOGLE_CSE_ID']
q = sys.argv[1]
i = 1
for result in requests.get(url.format(apiKey, cx, q)).json()['items']:
    link = result['link']
    image = requests.get(link, stream=True)
    if image.status_code == 200:
        m = re.search(r'[^\.]+$', link)
        filename = './{}-{}.{}'.format(q, i, m.group())
        with open(filename, 'wb') as f:
            image.raw.decode_content = True
            shutil.copyfileobj(image.raw, f)
        i += 1
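One detail the script above skips is URL-encoding: a query containing characters such as & or # is placed into the URL as-is. A hedged sketch of building the same request URL with urllib.parse.urlencode; the function name build_image_search_url is made up, and the parameter names are the ones already used in the script:

```python
import urllib.parse

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_image_search_url(api_key, cx, query):
    """Return the Custom Search request URL with every value URL-encoded."""
    params = {"key": api_key, "cx": cx, "searchType": "image", "q": query}
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)
```

The result can be passed to requests.get() in place of url.format(apiKey, cx, q).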

There is no API available, but you can parse the page and imitate a browser. I don't know how much data you need to parse, though, because Google may limit or block access.
You can imitate a browser simply by using urllib and setting the correct headers. If you think parsing complex web pages from Python may be difficult, you can use a headless browser such as PhantomJS directly; inside a browser it is trivial to get the right elements using JavaScript and the DOM.
Note: before trying any of this, check Google's TOS.
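Setting headers with urllib might look like the sketch below. The helper name browser_like_request and the User-Agent string are placeholders chosen for illustration, and the request is only built here, not sent:

```python
import urllib.request


def browser_like_request(url, user_agent="Mozilla/5.0 (X11; Linux x86_64)"):
    """Build a Request that carries browser-style headers (placeholder values)."""
    headers = {
        "User-Agent": user_agent,
        "Accept-Language": "en-US,en;q=0.8",
    }
    return urllib.request.Request(url, headers=headers)


# Sending it would be: urllib.request.urlopen(browser_like_request(url)).read()
```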

You can try this:
https://developers.google.com/image-search/v1/jsondevguide#json_snippets_python
It's deprecated, but seems to work.
