I am very new to Python and I am stuck on a recent task. I want to refresh a stock price automatically as it changes. I am scraping the nasdaq.com website for the current intraday price.
Here is my current code:
import bs4 as bs
import urllib.request  # plain "import urllib" does not reliably expose urllib.request

tiker = input("Enter ticker: ")
url = urllib.request.urlopen("http://www.nasdaq.com/symbol/" + tiker + "/real-time")
stranka = url.read()
soup = bs.BeautifulSoup(stranka, 'lxml')
print(tiker.upper())
for each in soup.find('div', attrs={'id': 'qwidget_lastsale'}):
    print(each.string)
I was only able to make an infinite loop with while True, but every price is printed on a new line, whereas I want to update a single line as the price changes.
Thank you very much for your advice.
You can achieve this by writing "\b" (backspace) characters to erase the previously printed string and then writing the new value on the same line:
import bs4 as bs
import urllib.request
import time
import sys

tiker = input("Enter ticker: ")
print(tiker.upper())
written_string = ''
while True:
    url = urllib.request.urlopen("http://www.nasdaq.com/symbol/" + tiker + "/real-time")
    stranka = url.read()
    soup = bs.BeautifulSoup(stranka, 'lxml')
    for each in soup.find('div', attrs={'id': 'qwidget_lastsale'}):
        # Erase the previously written price, one backspace per character
        for i in range(len(written_string)):
            sys.stderr.write("\b")
        sys.stderr.write(each.string)
        written_string = each.string
    # Wait a second before fetching the page again
    time.sleep(1)
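Another way to do the same thing, offered only as a sketch and not taken from the answer above, is to write a carriage return "\r" so the cursor jumps back to the start of the line, then pad with spaces in case the new text is shorter than the old one. The helper name overwrite_line and its parameters are made up for illustration:

```python
import sys


def overwrite_line(text, previous_len, stream=sys.stdout):
    """Rewrite the current terminal line with text and return its length.

    previous_len is the length of the text written by the previous call,
    so leftover characters from a longer old value get blanked out.
    """
    padding = " " * max(0, previous_len - len(text))
    stream.write("\r" + text + padding)
    stream.flush()  # make the update visible immediately
    return len(text)
```

Inside the while True loop you would keep the returned length and pass it to the next call, for example last_len = overwrite_line(each.string, last_len).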
I am facing an issue: the email and phone number are printed twice, and the output also includes the "tel:" and "mailto:" prefixes. How can I trim those when scraping? I tried but failed.
Please help me with this.
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
suit=[]
url ="https://www.allenovery.com/en-gb/global/people/Shruti-Ajitsaria"
r= requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
#time.sleep(1)
content = soup.find_all('div', class_ = 'hyperlinks')
#print(content)
for property in content:
    link = property.find('a', {'class': 'tel'})['href']
    email = property.find_next('a', {'class': 'mail'})['href']
    print(link,email)
Add this to your code to see how many times your loop is going to run:
print(len(content))
If you do so, you will find that content has 2 elements, so the loop body runs twice. Because print(link,email) is inside the loop, the result is printed two times. To fix that, remove the indentation of print(link,email) so that it sits outside the loop.
The email and telephone number are duplicated because they appear more than once on the page.
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
suit=[]
url ="https://www.allenovery.com/en-gb/global/people/Shruti-Ajitsaria"
r= requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
#time.sleep(1)
tel = soup.select('div.hyperlinks > ul > li > a')[0].get('href')
email= soup.select('div.hyperlinks > ul > li > a')[1].get('href')
print(tel,email)
Output:
tel:+44 20 3088 1831 mailto:shruti.ajitsaria@allenovery.com
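The question also asked how to trim the "tel:" and "mailto:" prefixes, which the answers above leave in place. A minimal sketch, assuming the href values always start with one of those two schemes (the helper name strip_scheme is made up for illustration):

```python
def strip_scheme(href):
    """Return href without a leading 'tel:' or 'mailto:' prefix."""
    for prefix in ("tel:", "mailto:"):
        if href.startswith(prefix):
            return href[len(prefix):]
    # Anything else is returned unchanged
    return href


print(strip_scheme("tel:+44 20 3088 1831"))
print(strip_scheme("mailto:someone@example.com"))
```

You could then call strip_scheme(tel) and strip_scheme(email) on the values extracted with soup.select(...).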
I wrote some code that constantly updates currency values from around the world.
Currently it only displays the euro value, when it should display USD, Euro, Rupees, etc. Could anyone please tell me why it displays only one value?
import time
import os
import requests
from bs4 import BeautifulSoup
def refresh():
    URL = "https://www.x-rates.com/table/?from=USD&amount=1"
    r = requests.get(URL)
    soup = BeautifulSoup(r.content, 'html.parser')
    ratelist = soup.findAll("table", {"class": "ratesTable"})[0].findAll("tbody")
    for tableVal in ratelist:
        trList = tableVal.findAll('tr')
        for trVal in trList[:6]:
            print(trVal.text)
            time.sleep(5)
            os.system('cls')
            refresh()

refresh()
Have a good day,
Bipolar Sheep
I would say your last refresh() call should be outside of the for loops.
In your code, refresh() is called right after the first trVal has been printed, so the function starts over before the remaining rows are shown. Please try this version:
import time
import os
import requests
from bs4 import BeautifulSoup
def refresh():
    URL = "https://www.x-rates.com/table/?from=USD&amount=1"
    r = requests.get(URL)
    soup = BeautifulSoup(r.content, 'html.parser')
    ratelist = soup.findAll("table", {"class": "ratesTable"})[0].findAll("tbody")
    for tableVal in ratelist:
        trList = tableVal.findAll('tr')
        for trVal in trList[:6]:
            print(trVal.text)
    time.sleep(5)
    os.system('cls')
    refresh()

refresh()
EDIT: I also chose to move os.system('cls') out of the loops.
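One thing to keep in mind: because refresh() calls itself, every refresh adds a frame to the call stack, and a long-running script would eventually raise RecursionError. A plain loop avoids that. The following is only a sketch: the names poll, fetch, show and max_rounds are made up for illustration, and fetch stands in for the requests/BeautifulSoup code above.

```python
import time


def poll(fetch, show, interval=5.0, max_rounds=None, sleep=time.sleep):
    """Call fetch() and pass the result to show(), pausing between rounds.

    Runs forever when max_rounds is None; otherwise stops after that
    many rounds. Returns the number of rounds completed.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        show(fetch())
        rounds += 1
        sleep(interval)
    return rounds
```

For example, poll(get_rates, print) would print fresh rates every five seconds, where get_rates is a function of yours that returns the text of the first six table rows.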
I want to crawl all the movie reviews on this page.
(Screenshot: the part circled in red.)
I tried to crawl them with this code. (I used Jupyter Notebook with Anaconda3.)
import requests
from bs4 import BeautifulSoup
test_url = "https://movie.naver.com/movie/bi/mi/pointWriteFormList.nhn?code=174903&type=after&page=1"
resp = requests.get(test_url)
soup = BeautifulSoup(resp.content, 'html.parser')
soup
score_result = soup.find('div', {'class': 'score_result'})
lis = score_result.findAll('li')
lis[:3]
from urllib.request import urljoin #When I ran this block and next block it didn't save any reviews.
review_text=[]
#review_text = lis[0].find('p').getText()
list_soup =soup.find_all('li', 'p')
for item in list_soup:
    review_text.append(item.find('p').get_text())
review_text[:5] #Nothing was saved.
As I noted in the third and fourth blocks, nothing was saved. What is the problem?
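Your list is empty because soup.find_all('li', 'p') searches for li tags whose CSS class is "p", not for p tags inside li tags. This will get what you want. Tested in Python within Jupyter Notebook (latest).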
import requests
from bs4 import BeautifulSoup
from bs4.element import NavigableString
test_url = "https://movie.naver.com/movie/bi/mi/pointWriteFormList.nhn?code=174903&type=after&page=1"
resp = requests.get(test_url)
soup = BeautifulSoup(resp.content, 'html.parser')
movie_lst = soup.select_one('div.score_result')
ul_movie_lst = movie_lst.ul
for movie in ul_movie_lst:
    # Skip the whitespace text nodes between the li elements
    if isinstance(movie, NavigableString):
        continue
    score = movie.select_one('div.star_score em').text
    name = movie.select_one('div.score_reple p span').text
    review = movie.select_one('div.score_reple dl dt em a span').text
    print(score + "\t" + name)
    print("\t" + review)
I am trying to get a product price using BeautifulSoup in Python.
But I keep getting errors, no matter what I try.
(Screenshot of the site I am trying to scrape.)
I want to get the 19,90 value.
I have already written code to get all the product names, and now I need their prices.
import requests
from bs4 import BeautifulSoup
url = 'https://www.zattini.com.br/busca?nsCat=Natural&q=amaro&searchTermCapitalized=Amaro&page=1'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
price = soup.find('span', itemprop_='price')
print(price)
A less ideal option is to parse out the JSON containing the prices:
import requests
import json
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://www.zattini.com.br/busca?nsCat=Natural&q=amaro&searchTermCapitalized=Amaro&page=1'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'lxml')
scripts = [script.text for script in soup.select('script') if 'var freedom = freedom ||' in script.text]
pricesJson = scripts[0].split('"items":')[1].split(']')[0] + ']'
prices = [item['price'] for item in json.loads(pricesJson)]
names = [name.text for name in soup.select('#item-list [itemprop=name]')]
results = list(zip(names,prices))
df = pd.DataFrame(results)
print(df)
span[itemprop='price'] is generated by JavaScript. The original value is stored in div[data-final-price] as a value like 1990, and you can format it as 19,90 with a regex.
import re
...
soup = BeautifulSoup(page.text, 'html.parser')
prices = soup.select('div[data-final-price]')
for price in prices:
    # Insert a comma before the last two digits: 1990 -> 19,90
    price = re.sub(r'(\d\d$)', r',\1', price['data-final-price'])
    print(price)
Results:
19,90
134,89
29,90
119,90
104,90
59,90
....
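If you would rather avoid the regex, here is a small sketch that does the same formatting with integer arithmetic. It assumes data-final-price always holds a whole number of cents, which is only inferred from the sample values above; the function name format_cents is made up for illustration. Unlike the regex, it also pads values below 100 with a leading zero.

```python
def format_cents(raw):
    """Format a price given in cents, e.g. '1990', as '19,90'."""
    cents = int(raw)
    whole, fraction = divmod(cents, 100)
    # Comma as the decimal separator, always two fractional digits
    return f"{whole},{fraction:02d}"


for sample in ("1990", "13489", "5"):
    print(format_cents(sample))
```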
I'm trying to scrape a box score for an NBA game from ESPN. I tried to get the player names first, but I'm having a difficult time getting rid of the HTML tags.
I've tried using
get_text(), .text(), .string_strip()
but they keep giving me errors.
Here's the code I'm working with right now.
from bs4 import BeautifulSoup
import requests
url= "http://scores.espn.com/nba/boxscore?gameId=400900407"
r = requests.get(url)
soup = BeautifulSoup(r.text,"html.parser")
name = []
for row in soup.find_all('tr')[1:]:
    player_name = row.find('td', attrs={'class': 'name'})
    name.append(player_name)
print(name)
Using player_name.text should work, but the problem is that row.find('td', attrs={'class': 'name'}) sometimes returns None. Try it like this:
if player_name:
    name.append(player_name.text)
I solved it like this:
from bs4 import BeautifulSoup
import requests
url= "http://scores.espn.com/nba/boxscore?gameId=400900407"
r = requests.get(url)
soup = BeautifulSoup(r.text,"html.parser")
name = []
for row in soup.find_all('tr')[1:]:
    try:
        player_name = row.select('td.name span')[0].text
        name.append(player_name)
    except IndexError:
        # Rows without a td.name span (headers, totals) are skipped
        pass
print(name)
Here is my code, for your reference:
import requests
from pyquery import PyQuery as pyq
url= "http://scores.espn.com/nba/boxscore?gameId=400900407"
r = requests.get(url)
doc = pyq(r.content)
print([h.text() for h in doc('.abbr').items()])