I am trying to generate some links.
NOTE: There is a problem with return vs print.
When I write the code with return, it only returns one link.
Run this code:
import requests
import re
wikiurl = 'https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States'
state_pat = re.compile(r'title=\"(\w+)\">')
def get_page_content(url):
    response = requests.get(url)
    return response.text

def link_generator(wikiurl):
    content = get_page_content(wikiurl)
    names = state_pat.findall(content)
    for i in names:
        return 'https://www.local.com/business/results/listing.cfm?s=tile+and+grout+cleaning&ar=' + i + '%2CNY&gsp=ZFZWU1RaU09zWGNYdjFEV1l2ZHFLNVZUUFRPT3c3a21lbFVCbERQOU5VS3p6ai9DRXNMa29PcVZ0ZVV0TXZLM01wUVFUUHZYK2lrMnB5VGJyMHZJeUNoK1dXaUoxZ1NKT3AxbVlJOGN1aVBEb1NRMzlCemdDVHh5aGd3eU5DYUpKWDRtNFVQR0llOFJibUhQR3pSV3ppWFR4ekJoRVltL29UdFQ0MW9KUS9IenJrcjVBMUt3bkErRnlSVnFjRnZ0TjhRWEdET0FuZWRVUGNkemdxUlkzOUYyUjZXbHBzQWRMY3hEUTY4WmtnYkRsSkEvazBrVVY5d0NmSVVMaWp0WnNDNmFsZFNzMitWeHZDYTg2YmJwRGQzSisvOUJaYWNBaFdUd21LaWJpNk9veS9OT1N1VE5DV3RUNDIxdkY5NmZ4bWFVcWtLc1BlVkNRNlEvSG4ydER1T1ZkcXk4Um5BWU5kUU9UZnVOUE9BPQ%253D%253D&lwfilter=&wsrt=&wpn='
a = link_generator(wikiurl)
print(a)
And if I run this code with a print inside the function instead, it prints all the links. Why? I need all the links with return.
Run this code and you will see the difference:
import requests
import re
wikiurl = 'https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States'
state_pat = re.compile(r'title=\"(\w+)\">')
def get_page_content(url):
    response = requests.get(url)
    return response.text

def link_generator(wikiurl):
    content = get_page_content(wikiurl)
    names = state_pat.findall(content)
    for i in names:
        print('https://www.local.com/business/results/listing.cfm?s=tile+and+grout+cleaning&ar=' + i + '%2CNY&gsp=ZFZWU1RaU09zWGNYdjFEV1l2ZHFLNVZUUFRPT3c3a21lbFVCbERQOU5VS3p6ai9DRXNMa29PcVZ0ZVV0TXZLM01wUVFUUHZYK2lrMnB5VGJyMHZJeUNoK1dXaUoxZ1NKT3AxbVlJOGN1aVBEb1NRMzlCemdDVHh5aGd3eU5DYUpKWDRtNFVQR0llOFJibUhQR3pSV3ppWFR4ekJoRVltL29UdFQ0MW9KUS9IenJrcjVBMUt3bkErRnlSVnFjRnZ0TjhRWEdET0FuZWRVUGNkemdxUlkzOUYyUjZXbHBzQWRMY3hEUTY4WmtnYkRsSkEvazBrVVY5d0NmSVVMaWp0WnNDNmFsZFNzMitWeHZDYTg2YmJwRGQzSisvOUJaYWNBaFdUd21LaWJpNk9veS9OT1N1VE5DV3RUNDIxdkY5NmZ4bWFVcWtLc1BlVkNRNlEvSG4ydER1T1ZkcXk4Um5BWU5kUU9UZnVOUE9BPQ%253D%253D&lwfilter=&wsrt=&wpn=')
a = link_generator(wikiurl)
print(a)
When you issue a return statement in a function, it doesn't execute any further lines and returns to its caller. If you want to return items iteratively, you can replace return with yield, which turns the function into a generator. Alternatively, collect the results in a list and return the list.
You then need to change your final line when you're calling this to:
a = list(link_generator(wikiurl))
to unpack your generator
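For example, a minimal yield-based version of the function above might look like this (the long query string from the original URL is trimmed here for readability):

def link_generator(wikiurl):
    content = get_page_content(wikiurl)
    names = state_pat.findall(content)
    for i in names:
        # yield hands back one link at a time instead of exiting the function
        yield 'https://www.local.com/business/results/listing.cfm?s=tile+and+grout+cleaning&ar=' + i + '%2CNY'

# list() drains the generator, so you get every link at once
links = list(link_generator(wikiurl))
for link in links:
    print(link)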
Related
I am trying to print LTP data for more than one crypto in the live market, but it prints only one crypto.
import pandas as pd
import requests
import json
ltp_data= []
crypto = {"BTCUSDT", "LTCUSDT", "DOGEUSDT"}
def live_ltp():
    for i in crypto:
        key = "https://api.binance.com/api/v3/ticker/price?symbol="
        url = key+i
        response = requests.get(url)
        Ltp = response.json()
        ltp_data.append(Ltp)
        return Ltp

while True:
    print(str(live_ltp()))
return will exit your loop as soon as it is hit. If you bring your return statement outside of the loop, and have it return ltp_data (instead of the "LTP" json object) you should be able to get the items in the list you appear to be populating.
ltp_data = []
crypto = {"BTCUSDT", "LTCUSDT", "DOGEUSDT"}

def live_ltp():
    for i in crypto:
        key = "https://api.binance.com/api/v3/ticker/price?symbol="
        url = key+i
        response = requests.get(url)
        Ltp = response.json()
        ltp_data.append(Ltp)
    return ltp_data

crypto_ltps = live_ltp()
print(crypto_ltps)
You have added the return statement inside the loop, which is why it executes only once and returns only one item. Instead:
import pandas as pd
import requests
import json
ltp_data= []
crypto = {"BTCUSDT", "LTCUSDT", "DOGEUSDT"}
def live_ltp():
    responses = []
    for i in crypto:
        key = "https://api.binance.com/api/v3/ticker/price?symbol="
        url = key+i
        response = requests.get(url)
        Ltp = response.json()
        ltp_data.append(Ltp)
        responses.append(Ltp)
    return responses

while True:
    print(str(live_ltp()))
This will solve the problem.
Hope this helps!
Please feel free to comment if you get any error with this, and mark the answer as correct if it worked.
You have a return Ltp in the for loop, so you will always get just a single response, for the first item in the set of crypto ids. You could instead do return ltp_data after the loop ends. But that creates a new problem: since you are updating a global list, it will just keep growing and growing.
Instead, write your function to take input parameters and return a locally generated list.
import pandas as pd
import requests
import json
def live_ltp(crypto_ids):
    ltp_data = []
    for i in crypto_ids:
        key = "https://api.binance.com/api/v3/ticker/price?symbol="
        url = key+i
        response = requests.get(url)
        Ltp = response.json()
        ltp_data.append(Ltp)
    return ltp_data

crypto = {"BTCUSDT", "LTCUSDT", "DOGEUSDT"}

while True:
    print(str(live_ltp(crypto)))
Here is a solution with a DataFrame in place.
You will need to pass an empty DataFrame to the function: live_ltp(df_frame).
I would also use .json_normalize to lay the table out properly.
import pandas as pd
import requests
import json
ltp_data = pd.DataFrame() # empty dataframe (not list) which will be updated in the function below
crypto = {"BTCUSDT", "LTCUSDT", "DOGEUSDT"}
def live_ltp(df_frame):
    for i in crypto:
        key = "https://api.binance.com/api/v3/ticker/price?symbol="
        url = key+i
        response = requests.get(url)
        Ltp = response.json()
        ltp_df = pd.json_normalize(Ltp)
        ltp_df['time'] = pd.Timestamp.now()
        df_frame = pd.concat([df_frame, ltp_df], axis=0)
    return df_frame

while True:
    final_df = live_ltp(ltp_data)  # passing empty dataframe to function
    final_df.to_excel('test.xlsx', index=False)
    print(final_df)
I used code from a YouTube video :
import json
import re
import requests
class Helper:
    def __init__(self):
        pass

    def id_from_url(self, url: str):
        return url.rsplit("/", 1)[1]

class YouTubeStats:
    def __init__(self, url: str):
        #self.json_url = urllib.request.urlopen(url)
        self.json_url = requests.get(url)
        self.data = json.loads(self.json_url.text)

    def print_data(self):
        print(self.data)

    def get_video_title(self):
        return self.data["items"][0]["snippet"]["title"]

    def get_video_description(self):
        return self.data["items"][0]["snippet"]["description"]

api_key = "never-gonna-let-you-know"
link_file = "links.csv"

with open(link_file, "r") as f:
    content = f.readlines()

content = list(map(lambda s: s.strip(), content))
content = list(map(lambda s: s.strip(','), content))

helper = Helper()

for youtube_url in content:
    video_id = helper.id_from_url(youtube_url)
    url = f"https://www.googleapis.com/youtube/v3/search?part=snippet&channelId={video_id}&maxResults=1&order=date&type=video&key={api_key}"
    yt_stats = YouTubeStats(url)
    title = yt_stats.get_video_title()
    description = yt_stats.get_video_description()
    print(title)
Whenever I use it, it keeps showing HTML-escaped characters (such as &#39;) instead of an apostrophe.
Note: this might get updated and fix itself since it's an API, but please use the above as a reference; also, my API key might break.
Well, never mind, I figured it out: I just needed to use html.unescape() to convert the text.
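For reference, a minimal sketch of that fix (the example title value below is made up for illustration):

import html

title = "It&#39;s a demo title"   # HTML-escaped text as returned by the API
print(html.unescape(title))       # -> It's a demo title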
I have posted a similar question before, but after reworking the project, I've gotten here:
With two csv files (new.csv, scrapers.csv) -
new.csv contains a single column:
'urls' = whole URLs
scrapers.csv contains two columns:
'scraper_dom' = A simplification of specific URL domains
'scraper_id' = An associated scraper_id that is used to import URLs to a separately managed database
Question
My goal here is to iterate through new.csv (parsing out fnetloc using urlparse) and perform a lookup on scrapers.csv to return a set of matching 'scraper_id' given a set of 'urls' (the way a VLOOKUP would work, or a JOIN in SQL), once urlparse does its thing to isolate the netloc within the URL (the result of fnetloc).
My next big issue is that urlparse does not parse the URLs (from new.csv) to the exact simplification found in the scrapers.csv file, so I'd be reliant on a sort of partial match until I can figure out the regular expressions to use for that part of it.
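To illustrate the mismatch (the domain values below are made up for illustration):

from urllib.parse import urlparse

# urlparse keeps the full host, including any subdomain
print(urlparse('https://www.example.com/some/listing').netloc)   # www.example.com
# whereas scrapers.csv might only hold the simplified domain, e.g. 'example.com',
# so an exact-match lookup would miss it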
I've imported pandas because previous attempts found me creating DataFrames and performing a pd.merge but I couldn't get that to work either...
My current code is below; the commented-out bits at the bottom are failed attempts, I just thought I'd include what I've tried thus far.
(## are just intermediate print lines I put in to check output of the program)
import pandas as pd, re
from urllib.parse import urlparse
import csv
sd = {}
sid = {}
#INT = []
def fnetloc(any):
    try:
        p = urlparse(any)
        return p.netloc
    except IndexError:
        return 'Error'

def dom(any):
    try:
        r = any.split(',')
        return r[0]
    except IndexError:
        return 'Error'

def ids(any):
    try:
        e = any.split(',')
        return e[0]
    except IndexError:
        return 'Error'

with open('scrapers.csv',encoding='utf-8',newline='') as s:
    reader = enumerate(csv.reader(s))
    s.readline()
    for j, row in reader:
        dict1 = dict({'scraper_dom':dom(row[0]), 'scraper_id':ids(row[1])})
        sid[j + 1] = dict1

for di in sid.keys():
    id = di
    ##print(sid[di]['scraper_dom'],sid[di]['scraper_id'])

with open('new.csv',encoding='UTF-8',newline='') as f:
    reader = enumerate(csv.reader(f))
    f.readline()
    for i, row in reader:
        dict2 = dict({'scraper_domain': fnetloc(row[0])})
        sd[i + 1] = dict2

for d in sd.keys():
    id = d
    ##print(sd[d]['scraper_domain'])

#def tryme( ):
#    return filter(sd.has_key, sid)

#print(list(filter(sid, sd.keys())))
Sample of desired output.
You just need a procedure that can take a fnetloc and a list of scrapers and check to see if there is a scraper that matches that fnetloc:
def fnetloc_to_scraperid(fnetloc: str, scrapers: List[Scraper]) -> str:
    try:
        return next(x.scraper_id for x in scrapers if x.matches(fnetloc))
    except:
        return "[no scraper id found]"
I also recommend that you use some classes instead of keeping everything in csv row objects--it reduces errors in your code, in the long run, and greatly advances your sanity.
This script worked on the sample data I fed it:
import csv
from urllib.parse import urlparse
from typing import List
def fnetloc(any) -> str:
    try:
        p = urlparse(any)
        return p.netloc
    except IndexError:
        return 'Error'

class Scraper:
    def __init__(self, scraper_dom: str, scraper_id: str):
        self.scraper_dom = scraper_dom
        self.scraper_id = scraper_id

    def matches(self, fnetloc: str) -> bool:
        return fnetloc.endswith(self.scraper_dom)

class Site:
    def __init__(self, url: str):
        self.url = url
        self.fnetloc = fnetloc(url)

    def get_scraperid(self, scrapers: List[Scraper]) -> str:
        try:
            return next(x.scraper_id for x in scrapers if x.matches(self.fnetloc))
        except:
            return "[no scraper id found]"

sites = [Site(row[0]) for row in csv.reader(open("new.csv"))]
scrapers = [Scraper(row[0], row[1]) for row in csv.reader(open("scrapers.csv"))]

for site in sites:
    print(site.url, site.get_scraperid(scrapers), sep="\t")
Now I can get the code to execute, but when I call the function and pass the parameter it tells me invalid syntax. I tried with '10.1.1.27' and "10.1.1.27" as well as the code below, but I can't get it to work. Any advice is appreciated.
download_permitted(10.1.1.27)
Below is the function in its entirety:
from urllib.request import urlopen
def download_permitted(address):
    f = urlopen("http://" + address + "/config?action=get&paramid=eParamID_MediaState")
    response = f.read()
    if (response.find('"value":"1"') > -1):
        return True
    f = urlopen("http://" + address + "/config?action=set&paramid=eParamID_MediaState&value=1")
This is because there is no such method as urllib.urlopen(). Instead, try the following.
from urllib.request import urlopen
def download_permitted(address):
    f = urlopen("http://" + address + "/config?action=get&paramid=eParamID_MediaState")
    ...
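As a usage note (a minimal sketch; it assumes a device at that address actually answers), the address also has to be passed as a string, since 10.1.1.27 on its own is not valid Python:

from urllib.request import urlopen

# address quoted as a string, not a bare 10.1.1.27 literal
if download_permitted('10.1.1.27'):
    print('download permitted')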
I wrote the following function:
from bs4 import BeautifulSoup

def CiteParser():
    with open("/tmp/content.txt") as myfile:
        soup = BeautifulSoup(myfile)
        for cite in soup.find_all('cite'):
            print(cite.string)
Now I want to call it in my program like this:
result = open("/tmp/result.txt", "a+")
res = CiteParser()
result.write(str(res))
result.close()
I also have another function that appends URL content to /tmp/content.txt, and I put CiteParser into a loop.
But it always returns the same result for me.
Am I calling CiteParser correctly? If not, how should it be done?
Thank you
Instead of printing the strings, you need to return them.
def CiteParser():
    with open("/tmp/content.txt") as myfile:
        soup = BeautifulSoup(myfile)
        result = []
        for cite in soup.find_all('cite'):
            result.append(cite.string)
        return '\n'.join(result)
Otherwise, the function returns nothing; it implicitly returns None.
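As with the first answer above, a yield-based variant is another option if you would rather stream the cites one at a time (a sketch, assuming BeautifulSoup is imported as in the question):

def CiteParser():
    with open("/tmp/content.txt") as myfile:
        soup = BeautifulSoup(myfile)
        for cite in soup.find_all('cite'):
            # each iteration of the caller's loop gets one cite string
            yield cite.string

with open("/tmp/result.txt", "a+") as result:
    for res in CiteParser():
        result.write(str(res) + '\n')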