I am trying to use Python to get the first result Google shows after I perform a query, where that result is not a URL.
Example: I type "Arbroath uk lon lat" into Google search and it returns the coordinates directly in an answer box.
I found a piece of code that does fetch results, but I do not know how to capture the part with the coordinates:
try:
    from googlesearch import search
except ImportError:
    print("No module named 'google' found")

# to search
query = "Arbroath uk lon lat"
for j in search(query, num=1, stop=10, pause=2):
    print(j)
The above piece of code fetches these results:
https://latitude.to/map/gb/united-kingdom/cities/arbroath
https://latitude.to/articles-by-country/gb/united-kingdom/9331/arbroath
https://www.countrycoordinate.com/city-arbroath-united-kingdom/
https://www.distancesto.com/coordinates/gb/arbroath-latitude-longitude/history/8049.html
https://geloky.com/geocoding/place/Arbroath+United+Kingdom
http://www.longitude-latitude-maps.com/city/226_28,Arbroath,Scotland,United+Kingdom
https://ftp.latitudelongitude.org/gb/arbroath/
https://time.is/Arbroath
https://en.wikipedia.org/wiki/Arbroath
https://www.travelmath.com/cities/Arbroath,+United+Kingdom
whereas I actually want to capture only: "56.5591° N, 2.5915° W"
Any ideas?
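For what it's worth, once the text of a result page or snippet is in hand, a coordinate string in that degree format can be captured with a regular expression. This is only a sketch: it assumes the fetched text contains the plain pattern shown above.

```python
import re

# Matches strings like "56.5591° N, 2.5915° W" (decimal degrees plus hemisphere letters)
COORD_RE = re.compile(r'(\d+\.\d+)\s*°\s*([NS]),\s*(\d+\.\d+)\s*°\s*([EW])')

def extract_coordinates(text):
    """Return the first 'lat° N, lon° W' style match found in text, or None."""
    match = COORD_RE.search(text)
    return match.group(0) if match else None
```

This only solves the extraction step; getting the answer-box text itself still requires fetching a page that actually contains it.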
I followed this tutorial from this website
in order to learn how to extract the first YouTube link for a given query. I have implemented the code in a function like so:
import urllib.request
import re

def GetBestYoutubeLink(MusicRequest):
    MusicSearchLink = MusicRequest.replace(" ", "+")
    MusicSearchLink = "https://www.youtube.com/results?search_query=" + MusicSearchLink
    HTMLContent = urllib.request.urlopen(MusicSearchLink)
    SearchResults = re.findall(r'href=\"\/watch\?v=(.{11})', HTMLContent.read().decode())
    print(SearchResults)
    BestLink = "http://www.youtube.com/embed/" + SearchResults[0]
    return BestLink
A query is passed into the function and it should print the first/best URL. The problem I am facing is that most of the time the SearchResults list is empty when printed, so I cannot get the first URL. It is not that the queries are uncommon: I tried popular songs and videos, and it still comes back empty, although it does sometimes work and prints the best link correctly. To work around this I added the following between the print of the SearchResults list and the definition of the BestLink variable:
if SearchResults == []:
    print(SearchResults)
    MusicPlayer(MusicRequest)
This reruns the function whenever the SearchResults list is empty. However, it sometimes reruns and prints an empty list 20 to 30 times, which is not at all efficient. I would like to understand why my list comes back empty most of the time but is sometimes populated, and how I can fix this.
My current python version is 3.6 and I am running on macOS Catalina.
I think the format of the query result has changed since this tutorial was written. If you print HTMLContent.read().decode() you can see that the URLs are now in the form "url":"/watch?v=0755SXCTCN0".
I changed your code; you also had a search_results[0], which doesn't exist.
import urllib.request
import re

def GetBestYoutubeLink(MusicRequest):
    MusicSearchLink = MusicRequest.replace(" ", "+")
    MusicSearchLink = "https://www.youtube.com/results?search_query=" + MusicSearchLink
    HTMLContent = urllib.request.urlopen(MusicSearchLink)
    SearchResults = re.findall(r'/watch\?v=(.{11})', HTMLContent.read().decode())
    print(SearchResults)
    BestLink = "http://www.youtube.com/embed/" + SearchResults[0]
    return BestLink
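The extraction step itself can be checked offline. Given a chunk of the results page in the "url":"/watch?v=..." form described above, the same pattern pulls out the 11-character video IDs (the sample snippet here is made up for illustration):

```python
import re

def extract_video_ids(html):
    """Pull 11-character YouTube video IDs out of results-page markup."""
    return re.findall(r'/watch\?v=(.{11})', html)

# Hypothetical fragment in the shape the results page now uses:
sample = '"url":"/watch?v=0755SXCTCN0","title":{"runs":[...]}'
```

Testing the regex against a saved copy of the page is a quick way to confirm whether the markup has changed again, which is the usual cause of an empty list.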
I am using Python 2.7 with the wikipedia package to retrieve the text from multiple random Wikipedia pages, as explained in the docs.
I use the following code:
def get_random_pages_summary(pages=0):
    import wikipedia
    page_names = [wikipedia.random(1) for i in range(pages)]
    return [[p, wikipedia.page(p).summary] for p in page_names]

text = get_random_pages_summary(50)
and get the following error:
File "/home/user/.local/lib/python2.7/site-packages/wikipedia/wikipedia.py", line 393, in __load
    raise DisambiguationError(getattr(self, 'title', page['title']), may_refer_to)
wikipedia.exceptions.DisambiguationError: "Priuralsky" may refer to:
Priuralsky District
Priuralsky (rural locality)
What I am trying to do is get the text from random pages in Wikipedia, and I need it to be just regular text, without any markup.
I assume that the problem is getting a random name that has more than one match when searching for a Wikipedia page.
When I use it to get a single Wikipedia page, it works well.
Thanks
As you're doing it for random articles and through a Wikipedia API (not pulling the HTML directly with other tools), my suggestion would be to catch the DisambiguationError and pick a new random article when this happens:
def random_page():
    random = wikipedia.random(1)
    try:
        result = wikipedia.page(random).summary
    except wikipedia.exceptions.DisambiguationError as e:
        result = random_page()
    return result
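The recursion above can in principle hit Python's recursion limit if disambiguation pages come back many times in a row. The same idea can be written as a loop; this is a generic sketch where a fetch callable stands in for the wikipedia calls:

```python
def fetch_until_ok(fetch, is_retryable, max_tries=25):
    """Call fetch() until it succeeds, retrying on exceptions that
    is_retryable() approves; give up after max_tries attempts."""
    for _ in range(max_tries):
        try:
            return fetch()
        except Exception as e:
            if not is_retryable(e):
                raise
    raise RuntimeError("no usable result after %d tries" % max_tries)
```

With the wikipedia package, fetch would be something like lambda: wikipedia.page(wikipedia.random(1)).summary, and is_retryable would check for DisambiguationError.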
According to the documentation (http://wikipedia.readthedocs.io/en/latest/quickstart.html), the error carries the multiple page candidates, so you need to search those candidates again:
try:
    wikipedia.summary("Priuralsky")
except wikipedia.exceptions.DisambiguationError as e:
    for page_name in e.options:
        print(page_name)
        print(wikipedia.page(page_name).summary)
You can improve your code like this.
import wikipedia

def get_page_summaries(page_name):
    try:
        return [[page_name, wikipedia.page(page_name).summary]]
    except wikipedia.exceptions.DisambiguationError as e:
        return [[p, wikipedia.page(p).summary] for p in e.options]

def get_random_pages_summary(pages=0):
    ret = []
    page_names = [wikipedia.random(1) for i in range(pages)]
    for p in page_names:
        for page_summary in get_page_summaries(p):
            ret.append(page_summary)
    return ret

text = get_random_pages_summary(50)
I am not a Python geek, but I have tried to solve this problem using information from several answers to similar questions; none seem to really work in my case. Here it is:
I am calling a function from a Python script. Here is the function:
def getsom(X):
    #some codes
    try:
        st = get data from site 1 using X
    except:
        print "not available from site 1, getting from site 2"
        st = get data from site 2 using X
    #some codes that depend on st
I am calling this from a Python script as such:
#some codes
for yr in range(min_yr, max_yr+1):
    day = 1
    while day < max_day:
        st1 = getsom(X)
        #some code that depends on st1
        day += 1
This works fine when data is available on either site 1 or site 2 for a particular day, but breaks down when it is unavailable on both sites.
I want to be able to move on and check the next day whenever data is unavailable on both sites for a particular day. I have tried different configurations of try and except with no success and would appreciate any help on the most efficient way to do this.
Thanks!
***Edits
Final version that worked:
in the function part:
def getsom(X):
    #some codes
    try:
        st = get data from site 1 using X
    except:
        print "not available from site 1, getting from site 2"
        try:
            st = get data from site 2 using X
        except:
            print "data not available from sites 1 and 2"
            st = None
    if st is not None:
        #some codes that depend on st
In order to iterate to the next day on the script side, I had to handle the None case from the function with another try/except block:
#some codes
for yr in range(min_yr, max_yr+1):
    day = 1
    while day < max_day:
        try:
            st = getsom(X)
        except:
            st = None
        if st is not None:
            #some codes that depend on st
        day += 1
As mentioned in the comments, it seems you want to catch the exception inside the first-level exception handler. You can do it like this:
def getsom(X):
    #some codes
    try:
        st = get data from site 1 using X
    except:
        print "not available from site 1, getting from site 2"
        try:
            st = get data from site 2 using X
        except:
            print "Not available from site 2 as well."
            # Here you can assign some arbitrary value to your variable
            # (like None, for example) or return from the function.
    #some codes that depend on st
If the data is not available on either of the sites, you can assign some arbitrary value to your variable st (such as None) or simply return from the function.
Is this what you are looking for?
Also, you shouldn't write a bare except without specifying the type of exception you expect - look here for more info: Should I always specify an exception type in `except` statements?
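As a minimal illustration of why the exception type matters: a bare except also swallows typos and unrelated bugs, while a specific one only catches the failure you planned for. The function name here is made up for the example:

```python
def parse_day(value):
    """Return value as an int day number, or None if it isn't numeric."""
    try:
        return int(value)
    except ValueError:  # only the expected failure; a NameError etc. would still surface
        return None
```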
Edit to answer the problem in comment:
If you have no data about certain day you can just return None and handle it like this:
#some codes
for yr in range(min_yr, max_yr+1):
    day = 1
    while day < max_day:
        st1 = getsom(X)
        if st1 is not None:
            #some code that depends on st1
        day += 1
Why don't you create a separate function for it?
def getdata(X):
    for site in [site1, site2]:  # possibly more
        try:
            return get_data_from_site_using_X()
        except:
            print "not available in %s" % site
    print "couldn't find data anywhere"
Then getsom becomes:
def getsom(X):
    #some codes
    st = getdata(X)
    #some codes that depend on st
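The same fallback-over-sources idea can be written as fully runnable code; this sketch uses stand-in (name, fetch) pairs, since the real site fetchers are pseudocode above:

```python
def get_data(fetchers):
    """Try each (name, fetch) pair in order; return the first result
    that comes back without raising, or None if every source fails."""
    for name, fetch in fetchers:
        try:
            return fetch()
        except Exception:
            print("not available from %s" % name)
    print("couldn't find data anywhere")
    return None
```

Returning None on total failure lets the calling loop simply test `if st is not None:` and move on to the next day.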
I have been trying to use xgoogle to search for PDFs on the internet. The problem I am having is that if I search for "Medicine:pdf", the first page returned to me is not the first page Google returns if I actually use google.com. I don't know what's wrong. Here is my code:
try:
    page = 0
    gs = GoogleSearch(searchfor)
    gs.results_per_page = 100
    results = []
    while page < 2:
        gs.page = page
        results += gs.get_results()
        page += 1
except SearchError, e:
    print "Search failed: %s" % e

for res in results:
    print res.desc
If I actually use the Google website to search for the query, the first result Google displays for me is:
Title : Medicine - British Council
Desc :United Kingdom medical training has a long history of excellence and of ... Leaders in medicine throughout the world have received their medical education.
Url : http://www.britishcouncil.org/learning-infosheets-medicine.pdf
But if I use my Python xgoogle search I get this output:
Descrip:UCM175757.pdf
Title:Medicines in My Home: presentation for students - Food and Drug ...
Url:http://www.fda.gov/downloads/Drugs/ResourcesForYou/Consumers/BuyingUsingMedicineSafely/UnderstandingOver-the-CounterMedicines/UCM175757.pdf
I noticed there is a difference between using xgoogle and using Google in the browser. I have no idea why, but you could try the Google Custom Search API. It may give you results closer to the browser's, with no risk of being banned from Google (if you use xgoogle too many times in a short period, you get an error instead of search results).
First you have to register and enable your custom search in Google to get a key and a cx:
https://www.google.com/cse/all
The API format is:
'https://www.googleapis.com/customsearch/v1?key=yourkey&cx=yourcx&alt=json&q=yourquery'
customsearch is the Google service you are calling; in your case it is customsearch
v1 is the version of the API
yourkey and yourcx are provided by Google; you can find them on your dashboard
yourquery is the term you want to search; in your case it is "Medicine:pdf"
json is the return format
Example: return the first 3 pages of Google Custom Search results:
import urllib2
import urllib
import simplejson

def googleAPICall():
    userInput = urllib.quote("global warming")
    KEY = "##################"  # get yours
    CX = "###################"  # get yours
    for i in range(0, 3):
        index = i * 10 + 1
        url = ('https://www.googleapis.com/customsearch/v1?'
               'key=%s'
               '&cx=%s'
               '&alt=json'
               '&q=%s'
               '&num=10'
               '&start=%d') % (KEY, CX, userInput, index)
        request = urllib2.Request(url)
        response = urllib2.urlopen(request)
        results = simplejson.load(response)
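The URL assembly is the easy part to get wrong (note the host must be www.googleapis.com). A small helper using urllib's own encoding keeps the query string correct; the key and cx values here are placeholders:

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

def build_search_url(key, cx, query, start=1, num=10):
    """Build a Google Custom Search API URL for one page of results."""
    params = urlencode({
        'key': key, 'cx': cx, 'alt': 'json',
        'q': query, 'num': num, 'start': start,
    })
    return 'https://www.googleapis.com/customsearch/v1?' + params
```

Letting urlencode handle the escaping avoids malformed queries when the search term contains spaces or punctuation such as "Medicine:pdf".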
I use the twitter-python search API to retrieve search results like below:
import twitter

api = twitter.Api()
i = 1
result = api.GetSearch("Avatar", page=i)
print [s.text for s in result]
The code above gets the results on the first page returned. I tried multiple assignments of i and they all work, but I don't know what the maximum value of i is that I can assign. Any idea?
The maximum value is going to depend on the query, and how many pages twitter feels like giving you.
What about using try/except?
import twitter

api = twitter.Api()

# try a large page number:
i = 181
try:
    result = api.GetSearch("Avatar", page=i)
    print [s.text for s in result]
except:
    print 'page is out of range'
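Rather than guessing at the largest valid page, a common pattern is to page until the API returns an empty batch. This is a sketch with a stand-in search callable, since twitter-python's real paging behaviour (and rate limits) may differ:

```python
def fetch_all_pages(search, max_pages=50):
    """Call search(page) for page = 1, 2, ... and collect results,
    stopping at the first empty page or at max_pages."""
    results = []
    for page in range(1, max_pages + 1):
        batch = search(page)
        if not batch:
            break
        results.extend(batch)
    return results
```

With the real API, search would be something like lambda p: api.GetSearch("Avatar", page=p), and max_pages keeps a misbehaving endpoint from looping forever.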