How can I use Celery here so that the task's expire time is 60 seconds?
from bs4 import BeautifulSoup
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

def weather(cities):
    # Scrape the current temperature for each city from Google's weather widget
    results = []
    for city in cities:
        res = requests.get(f'https://www.google.com/search?q={city} weather&oq={city} weather&aqs=chrome.0.35i39l2j0l4j46j69i60.6128j1j7&sourceid=chrome&ie=UTF-8', headers=headers)
        soup = BeautifulSoup(res.text, 'html.parser')
        temperature = soup.select('#wob_tm')[0].getText().strip()
        results.append({city: temperature})
    return results

cities = ["tehran", "Mashhad", "Shiraaz", "Semirom", "Ahvaz", "zahedan", "baghdad", "van", "herat", "sari"]
weather_data = weather(cities)
print(weather_data)

def temporary_city(city):
    # Fetch the raw search response for a single city
    res = requests.get(f'https://www.google.com/search?q={city} weather&oq={city} weather&aqs=chrome.0.35i39l2j0l4j46j69i60.6128j1j7&sourceid=chrome&ie=UTF-8', headers=headers)
    return res
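The question does not include any Celery code, so here is a minimal sketch of one way to do it. The Redis broker/result-backend URL and the task name fetch_weather are assumptions, not from the post: result_expires removes stored results after 60 seconds, and expires on apply_async discards the task message if no worker starts it within 60 seconds.

# Minimal sketch (assumed broker/backend URL and task name)
from celery import Celery

app = Celery('tasks',
             broker='redis://localhost:6379/0',    # assumed broker URL
             backend='redis://localhost:6379/0')   # assumed result backend
app.conf.result_expires = 60  # stored task results are removed after 60 seconds

@app.task
def fetch_weather(cities):
    # Reuses the weather() scraper defined above
    return weather(cities)

# Calling side: expires=60 drops the message if no worker starts it within 60 s
# result = fetch_weather.apply_async(args=[cities], expires=60)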
In an HTML file, I have a tag that includes <source type="audio/mpeg" src="/us/media in it. How can I extract the src attribute from it using bs4?
The following produces the desired output:
from bs4 import BeautifulSoup
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148'}
res = requests.get('https://dictionary.cambridge.org/us/dictionary/english/vulnerable', headers=headers)
soup = BeautifulSoup(res.content, 'html.parser')
# CSS attribute selector: every <source> tag whose src contains "us/media"
srcs = soup.select('source[src*="us/media"]')
for src in srcs:
    try:
        print(src['src'])
    except KeyError:
        pass
Output:
/us/media/english/us_pron/v/vul/vulne/vulnerable.mp3
/us/media/english/us_pron_ogg/v/vul/vulne/vulnerable.ogg
/us/media/english/uk_pron/u/ukv/ukvor/ukvorte027.mp3
/us/media/english/uk_pron_ogg/u/ukv/ukvor/ukvorte027.ogg
/us/media/english/us_pron/v/vul/vulne/vulnerable.mp3
/us/media/english/us_pron_ogg/v/vul/vulne/vulnerable.ogg
/us/media/english/us_pron/e/eus/eus74/eus74904.mp3
/us/media/english/us_pron_ogg/e/eus/eus74/eus74904.ogg
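The printed paths are relative. As a small follow-up sketch (not part of the original answer, and assuming the Cambridge site as the base URL), they can be turned into absolute, downloadable URLs with urljoin:

from urllib.parse import urljoin

base = 'https://dictionary.cambridge.org'  # assumed base URL for the relative paths
for src in srcs:
    print(urljoin(base, src['src']))  # e.g. https://dictionary.cambridge.org/us/media/...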
I am trying to scrape the Houzz website.
In the browser dev tools it shows HTML content, but when I scrape it with BeautifulSoup it returns something else together with some of the HTML. I do not have much knowledge of this.
A little part of what I get is as follows.
</div><style data-styled="true" data-styled-version="5.2.1">.fzynIk.fzynIk{box-sizing:border-box;margin:0;overflow:hidden;}/*!sc*/
.eiQuKK.eiQuKK{box-sizing:border-box;margin:0;margin-bottom:4px;}/*!sc*/
.chJVzi.chJVzi{box-sizing:border-box;margin:0;margin-left:8px;}/*!sc*/
.kCIqph.kCIqph{box-sizing:border-box;margin:0;padding-top:32px;padding-bottom:32px;border-top:1px solid;border-color:#E6E6E6;}/*!sc*/
.dIRCmF.dIRCmF{box-sizing:border-box;margin:0;display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-box-pack:justify;-webkit-justify-content:space-between;-ms-flex-pack:justify;justify-content:space-between;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;margin-bottom:16px;}/*!sc*/
.kmAORk.kmAORk{box-sizing:border-box;margin:0;margin-bottom:24px;}/*!sc*/
.bPERLb.bPERLb{box-sizing:border-box;margin:0;margin-bottom:-8px;}/*!sc*/
What should I do with this? Is this not achievable with BeautifulSoup?
Developer Tools operate on a live browser DOM; what you see when inspecting the page is not the original HTML but a modified version produced after the browser's clean-up and after executing JavaScript code.
Requests does not execute JavaScript, so the content can deviate slightly, but you can still scrape - just take a deeper look into your soup.
Example (project titles)
from bs4 import BeautifulSoup
import requests

url_news = "https://www.houzz.com.au/professionals/home-builders/turrell-building-pty-ltd-pfvwau-pf~1099128087"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}

response = requests.get(url_news, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")
# Project titles sit under the #projects section as <h3> elements
print([title.text for title in soup.select('#projects h3')])
Output
['Major Renovation & Master Wing',
'"The Italian Village" Private Residence',
'Country Classic',
'Residential Resort',
'Resort Style Extension, Stone and Timber',
'Old Northern Rd Estate']
I want to write Python code so that when the user enters search words in the IDE, it takes the user to the Google search results in the browser.
Try:
import webbrowser  # the module used to open a URL in the browser

sear = input("Enter your search: ").strip().replace(' ', '+')  # build the query string: spaces become +
URL = 'https://google.com/search?q=' + sear  # create the search URL
webbrowser.open_new_tab(URL)  # open the URL in a new browser tab
Sentences that include special characters, like 'Hams&Eggs', tend to cause problems, though. Here we use quote() from urllib to percent-encode the query:
import webbrowser  # the module used to open a URL in the browser
from urllib.parse import quote  # percent-encodes the query string

sear = input("Enter your search: ").strip()
URL = 'https://google.com/search?q=' + quote(sear)  # create the search URL with the query safely encoded
webbrowser.open_new_tab(URL)  # open the URL in a new browser tab
Now, to open the first result of the Google search:
import requests, webbrowser
from bs4 import BeautifulSoup

sear = input("Enter your search: ").strip().replace(' ', '+')
URL = 'https://google.com/search?q=' + sear
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
headers = {"user-agent": USER_AGENT}

resp = requests.get(URL, headers=headers)
soup = BeautifulSoup(resp.content, "html.parser")
# Each result sits in a div with class "r" (Google's markup may change over time)
webbrowser.open_new_tab(soup.find_all('div', class_='r')[0].find('a')['href'])
Now to open all result links:
import requests, webbrowser
from bs4 import BeautifulSoup

sear = input("Enter your search: ").strip().replace(' ', '+')
URL = 'https://google.com/search?q=' + sear
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
headers = {"user-agent": USER_AGENT}

resp = requests.get(URL, headers=headers)
soup = BeautifulSoup(resp.content, "html.parser")
for i in soup.find_all('div', class_='r'):
    webbrowser.open_new_tab(i.find('a')['href'])
import webbrowser

url = 'https://www.google.com/search?q='  # base search query URL
search_words = input("Search your words: ")  # the user's search terms
webbrowser.open(url + search_words)  # open the search in the default web browser
A URL returns this JSON, which is not in standard format:
{}&& {identifier:'ID', label:'As at 08-03-2018 5:06 PM',items:[{ID:0,N:'2ndChance W200123',SIP:'',NC:'CDWW',R:'',I:'',M:'',LT:0.009,C:0.000,VL:108.200,BV:2149.900,B:'0.008',S:'0.009',SV:7218.300,O:0.009,H:0.009,L:0.008,V:873.700,SC:'5',PV:0.009,P:0.000,BL:'100',P_:'X',V_:''},{ID:1,N:'3Cnergy',SIP:'',NC:'502',R:'',I:'',M:'t',LT:0,C:0,VL:0.000,BV:50.000,B:'0.022',S:'0.025',SV:36.000,O:0,H:0,L:0,V:0.000,SC:'2',PV:0.021,P:0,BL:'100',P_:'X',V_:''},{ID:2,N:'3Cnergy W200528',SIP:'',NC:'1E0W',R:'',I:'',M:'t',LT:0,C:0,VL:0.000,BV:0,B:'',S:'0.004',SV:50.000,O:0,H:0,L:0,V:0.000,SC:'5',PV:0.002,P:0,BL:'100',P_:'X',V_:''}`
I want to put all the data into a list or a pandas DataFrame, starting from ID.
The prefix {}&& {identifier:'ID', label:'As at 08-03-2018 5:06 PM',items: is not wanted. This is how I request the URL:
import requests
from lxml import html

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
url = 'http://www.sgx.com/JsonRead/JsonstData?qryId=RAll'
page = requests.get(url, headers=headers)
alldata = html.fromstring(page.content)
However, I am unable to continue because the JSON format is not standard. How can I correct it?
import requests
import execjs

url = 'http://www.sgx.com/JsonRead/JsonstData?qryId=RAll'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

page = requests.get(url, headers=headers)
# Strip the leading "{}&& " so only the JavaScript object literal remains
content = page.text[len('{}&& '):] if page.text.startswith('{}&& ') else page.text
# Evaluate the object literal with a JavaScript runtime and get a Python dict back
data = execjs.get().eval(content)
print(data)
The data is a JavaScript object in literal notation, not strict JSON.
We can use PyExecJS to evaluate it and get the corresponding Python dict.
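From there, a minimal follow-up sketch (not part of the original answer) shows one way to get the rows into pandas, assuming the evaluated object keeps the items key seen in the sample data:

import pandas as pd

# Assumes `data` is the dict returned by execjs above and that it contains
# an 'items' list of row objects, as in the sample in the question
df = pd.DataFrame(data['items'])
print(df.head())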