I'm a rookie Python developer working on small projects to improve my skills. One of them is a Pinterest bot: it simply pins the images in a folder to an account through the Pinterest API. The API allows a maximum of 10 image uploads per hour, and I don't want to limit how many images can be in the folder. I've tried a few approaches, but I can't find one that runs without errors; since I'm inexperienced, I suspect I'm missing something I can't see. I would appreciate any ideas.
I wrote a simple if/else loop that waits one hour with time.sleep after uploading ten images. The API gave a timeout error.
I changed the wait in that loop to 7 minutes. The API still gave a timeout error.
I lowered time.sleep to one minute. That works, but after ten images the API rate limit becomes the problem again.
I wrapped the code that calls the API in a function with def and called it from the loop, thinking the API connection would be re-established after the sleep in the else branch. It pinned ten images without any issues, but after the sleep, back at the start of the loop, the API gave a timeout error.
Version with the loop:
import glob
import time

api = pinterest.Pinterest(token="")
board = ''
note = ''
link = ''

image_list = []
images = open("images.txt", "w")
for filename in glob.glob('images/*.jpg'):
    image_list.append(filename)

i = 0
p = 0
while i < len(image_list):
    if p <= 9 and image_list[i] not in images:
        api.pin().create(board, note, link, image_list[i])
        i += 1
        p += 1
        images.write(image_list[i])
    else:
        time.sleep(3600)
        p = 0
        continue
Version with def:
def dude():
    i = 0
    api = pinterest.Pinterest(token="")
    board = ''
    note = ''
    link = ''
    api.pin().create(board, note, link, image_list[i])
    time.sleep(420)

i = 0
while i < len(image_list):
    dude()
    i += 1
    print(i)
After trying a lot of things, I was able to solve the problem with the retrying library. First I installed the library with the following command:
$ pip3 install retrying
After the installation, I changed my code as follows, and the bot started working properly without any API or timeout errors:
from retrying import retry
import glob
import time

image_list = []
images = open("images.txt", "w")
for filename in glob.glob('images/*.jpg'):
    image_list.append(filename)

@retry
def dude():
    api = pinterest.Pinterest(token="")
    board = ''
    note = ''
    link = ''
    api.pin().create(board, note, link, image_list[i])

i = 0
while i < len(image_list):
    dude()
    i += 1
    time.sleep(420)
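For what it's worth, retrying also lets you set the wait and the number of attempts explicitly, so the decorator handles the backoff instead of a separate time.sleep. A minimal sketch along the same lines as the dude() function above, assuming board, note, and link are defined as before (the 7-minute wait and the attempt cap are illustrative values, not numbers from Pinterest's documentation):

from retrying import retry

# wait_fixed is in milliseconds: 420000 ms = 7 minutes between attempts;
# give up on an image after 10 failed tries (illustrative values)
@retry(wait_fixed=420000, stop_max_attempt_number=10)
def pin_image(path):
    api = pinterest.Pinterest(token="")
    api.pin().create(board, note, link, path)

for path in image_list:
    pin_image(path)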
Related
Is there a way to restart a download if it gets stuck at XX%? I'm scraping and downloading quite a lot of files with the code below. It handles connection errors, but it won't restart a download that gets stuck.
for element in elements:
    for attempt in range(100):
        try:
            wget.download(element.get_attribute("href"), path)
        except:
            print("attempt error, retry " + str(attempt))
        else:
            break
It seems there is no built-in feature to resume or restart a download. I looked at many examples of this package (https://www.programcreek.com/python/example/83386/wget.download); the manual page is gone, and the pypi.org page doesn't have any info about a feature like this.
However, you can implement the restart yourself by retrying in a loop until the download succeeds or you give up. Something like this should work for you.
# Retry each download until it succeeds; give up after 5 failed attempts
for element in elements:
    downloaded = False
    attempts = 0
    while not downloaded and attempts < 5:
        try:
            wget.download(element.get_attribute("href"), path)
            # Set downloaded flag to end the loop for this element
            downloaded = True
        except:
            attempts += 1
            print("attempt error, retry " + str(attempts))
Another approach is to use the requests library, which is more popular:
import requests

def proceed_files():
    # Try each file up to 5 times; move on after a success or 5 failures
    file_urls = ['list', 'of', 'file urls']
    for url in file_urls:
        downloaded = False
        attempts = 0
        while not downloaded and attempts < 5:
            if download_file(url):
                downloaded = True
            else:
                attempts += 1

def download_file(url):
    try:
        request = requests.get(url, allow_redirects=True)
        file_name = url.split('/')[-1]  # last path segment as the file name
        open(file_name, 'wb').write(request.content)
        return True
    except:
        return False
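Since the original problem was downloads that stall rather than fail outright, it may also be worth giving requests a timeout and streaming the body in chunks, so a stuck transfer raises an exception and falls back into the retry loop. A rough sketch, not tested against your particular files (the timeout and chunk size values are arbitrary):

import requests

def download_file(url):
    # connect timeout of 5 s, read timeout of 60 s: a stalled read raises an exception
    try:
        with requests.get(url, stream=True, timeout=(5, 60), allow_redirects=True) as r:
            r.raise_for_status()
            file_name = url.split('/')[-1]
            with open(file_name, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
        return True
    except requests.exceptions.RequestException:
        return False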
I've got a Python program on Windows 10 that runs a loop thousands of times with multiple functions that can sometimes hang and stop the program execution. Some are IO functions, and some are selenium webdriver functions. I'm trying to build a mechanism that will let me run a function, then kill that function after a specified number of seconds and try it again if that function didn't finish. If the function completes normally, let the program execution continue without waiting for the timeout to finish.
I've looked at at least a dozen different solutions and can't find one that fits my requirements. Many rely on signals, which are not available on Windows. Some spawn processes or threads whose resources can't easily be released, which is a problem when I'm going to run these functions thousands of times. Some work for very simple functions but fail when a function calls another function.
The situations this must work for:
Must run on Windows 10
A "driver.get" command for selenium webdriver to read a web page
A function that reads from or writes to a text file
A function that runs an external command (like checking my IP address or connecting to a VPN server)
I need to be able to specify a different timeout for each of these situations. A file write should take < 2 seconds, whereas a VPN server connection may take 20 seconds.
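To make that concrete, the call pattern I'm after looks roughly like the sketch below. I've written it against func-timeout's interface only as an illustration; read_city_file and connect_vpn are placeholder names, not functions from my real program:

from func_timeout import func_timeout, FunctionTimedOut

def run_with_timeout(seconds, func, *args, **kwargs):
    # Run func, abandon it after `seconds`, and report whether it finished in time
    try:
        return True, func_timeout(seconds, func, args=args, kwargs=kwargs)
    except FunctionTimedOut:
        return False, None

ok, cities = run_with_timeout(2, read_city_file, 'citylist.csv')   # file I/O: short limit
ok, _ = run_with_timeout(20, connect_vpn, 'us-west')               # VPN connect: longer limit
if not ok:
    ok, _ = run_with_timeout(20, connect_vpn, 'us-west')           # retry once if it hung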
I've tried the following function libraries:
timeout-decorator 0.5.0
wrapt-timeout-decorator 1.3.1
func-timeout 4.3.5
Here is a trimmed version of my program that includes the functions I need to wrap in a timeout function:
import csv
import time
from datetime import date
from selenium import webdriver
import urllib.request

cities = []
total_cities = 0
city = ''
city_counter = 0
results = []
temp = ''
temp2 = 'IP address not found'
driver = None

if __name__ == '__main__':

    # Read city list
    with open('citylist.csv') as csvfile:
        readCity = csv.reader(csvfile, delimiter='\n');
        for row in csvfile:
            city = row.replace('\n','')
            cities.append(city.replace('"',''))

    # Get my IP address
    try:
        temp = urllib.request.urlopen('http://checkip.dyndns.org')
        temp = str(temp.read())
        found = temp.find(':')
        found2 = temp.find('<',found)
    except:
        pass
    if (temp.find('IP Address:') > -1):
        temp2 = temp[found+2:found2]
    print(' IP: [',temp2,']\n',sep='')

    total_cities = len(cities)

    ## Open browser for automation
    try: driver.close()
    except AttributeError: driver = None
    options = webdriver.ChromeOptions()
    options.add_experimental_option("excludeSwitches", ["enable-logging"])
    driver = webdriver.Chrome(options=options)

    # Search for links
    while (city_counter < total_cities):
        city = cities[city_counter]
        searchTerm = 'https://www.bing.com/search?q=skate park ' + city
        ## Perform search using designated search term
        driver.get(searchTerm)
        haystack = driver.page_source
        driver.get(searchTerm)
        found = 0
        found2 = 0
        while (found > -1):
            found = haystack.find('<a href=',found2)
            found2 = haystack.find('"',found+10)
            if (haystack[found+9:found+13] == 'http'):
                results.append(haystack[found+9:found2])
        city_counter += 1

    driver.close()

    counter = 0
    while counter < len(results):
        print(counter,': ',results[counter],sep='')
        counter += 1
The citylist.csv file:
"Oakland, CA",
"San Francisco, CA",
"San Jose, CA"
I'm trying to build an API tool for creating 100+ campaigns at a time, but so far I keep running into timeout errors. I have a feeling it's because I'm not doing this as a batch/async request, but I can't seem to find straightforward instructions specifically for batch creating campaigns in Python. Any help would be GREATLY appreciated!
I have all the campaign details prepped and ready to go in a Google sheet, which my script then reads (using pygsheets) and attempts to create the campaigns. Here's what it looks like so far:
from facebookads.adobjects.campaign import Campaign
from facebookads.adobjects.adaccount import AdAccount
from facebookads.api import FacebookAdsApi
from facebookads.exceptions import FacebookRequestError
import time
import pygsheets

FacebookAdsApi.init(access_token=xxx)

gc = pygsheets.authorize(service_file='xxx/client_secret.json')
sheet = gc.open('Campaign Prep')
tab1 = sheet.worksheet_by_title('Input')
tab2 = sheet.worksheet_by_title('Output')

# gets range size, offsetting it by 1 to account for the range starting on row 2
row_range = len(tab1.get_values('A1', 'A', returnas='matrix', majdim='ROWS', include_empty=False)) + 1

# finds first empty row in the output sheet
start_row = len(tab2.get_values('A1', 'A', returnas='matrix', majdim='ROWS', include_empty=False))

def create_campaigns(row):
    campaign = Campaign(parent_id=row[6])
    campaign.update({
        Campaign.Field.name: row[7],
        Campaign.Field.objective: row[9],
        Campaign.Field.buying_type: row[10],
    })
    c = campaign.remote_create(params={'status': Campaign.Status.active})
    camp_name = c['name']
    camp_id = 'cg:' + c['id']
    return camp_name, camp_id

r = start_row
# there's a header so I have the range starting at 2
for x in range(2, int(row_range)):
    r += 1
    row = tab1.get_row(x)
    camp_name, camp_id = create_campaigns(row)
    # pastes the generated campaign ID, campaign name and account id back into the sheet
    tab2.update_cells('A' + str(r) + ':C' + str(r), [[camp_id, camp_name, row[6].rsplit('_', 1)[1]]])
I've tried wrapping this in a try/except that catches FacebookRequestError, does time.sleep(5), and keeps trying, but I'm still running into timeout errors every 5-10 rows it loops through. When it doesn't time out it does work; I guess I just need to figure out a way to make this handle big batches of campaigns more efficiently.
Any thoughts? I'm new to the Facebook API and I'm still a relative newb at Python, but I find this stuff so much fun! If anyone has any advice for how this script could be better (as well as general Python advice), I'd love to hear it! :)
Can you post the actual error message?
It sounds like you are hitting the rate limits after making a certain number of calls. If so, time.sleep(5) won't be enough: the rate score decays over time and resets after 5 minutes (https://developers.facebook.com/docs/marketing-api/api-rate-limiting). In that case I would suggest sleeping between each call instead. A better option, though, would be to upgrade your API access level. If you hit the rate limits this fast, I assume you are on the Developer tier; try upgrading first to Basic and then Standard and you should not have these problems. https://developers.facebook.com/docs/marketing-api/access
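If you do stay on your current tier for now, one way to survive the throttling is to catch FacebookRequestError around the create call and back off for progressively longer before retrying. A rough sketch built around the create_campaigns function from your script (the retry count and sleep times are arbitrary, not values from Facebook's documentation):

import time
from facebookads.exceptions import FacebookRequestError

def create_with_backoff(row, max_retries=5):
    # Retry the campaign creation, doubling the wait each time we get throttled
    wait = 30  # seconds before the first retry; arbitrary starting point
    for _ in range(max_retries):
        try:
            return create_campaigns(row)
        except FacebookRequestError:
            time.sleep(wait)
            wait *= 2
    raise RuntimeError('Campaign creation failed after %d attempts' % max_retries)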
Also, as you mention, utilizing Facebook's batch request API could be a good idea. https://developers.facebook.com/docs/marketing-api/asyncrequests/v2.11
Here is a thread with examples of the Batch API working with the Python SDK: https://github.com/facebook/facebook-python-ads-sdk/issues/116
I'm pasting the code snippet below (copied from the last link that @reaktard posted), credit to GitHub user @williardx. It helped me a lot in my own development.
# ----------------------------------------------------------------------------
# Helper functions

def generate_batches(iterable, batch_size_limit):
    # This function can be found in examples/batch_utils.py
    batch = []
    for item in iterable:
        if len(batch) == batch_size_limit:
            yield batch
            batch = []
        batch.append(item)
    if len(batch):
        yield batch

def success_callback(response):
    batch_body_responses.append(response.body())

def error_callback(response):
    # Error handling here
    pass

# ----------------------------------------------------------------------------

batches = []
batch_body_responses = []
api = FacebookAdsApi.init(your_app_id, your_app_secret, your_access_token)

for ad_set_list in generate_batches(ad_sets, batch_limit):
    next_batch = api.new_batch()
    requests = [ad_set.get_insights(pending=True) for ad_set in ad_set_list]
    for req in requests:
        next_batch.add_request(req, success_callback, error_callback)
    batches.append(next_batch)

for batch_request in batches:
    batch_request.execute()
    time.sleep(5)

print(batch_body_responses)
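To adapt that pattern to the campaign creation in the question, the idea is to queue each remote_create on a batch instead of sending it immediately. My understanding is that the SDK's remote_create accepts a batch argument for exactly this, but treat the following as an untested sketch rather than working code:

api = FacebookAdsApi.init(access_token=xxx)
batch = api.new_batch()

for x in range(2, int(row_range)):
    row = tab1.get_row(x)
    campaign = Campaign(parent_id=row[6])
    campaign.update({
        Campaign.Field.name: row[7],
        Campaign.Field.objective: row[9],
        Campaign.Field.buying_type: row[10],
    })
    # Queue the create call on the batch instead of sending it right away
    campaign.remote_create(params={'status': Campaign.Status.active}, batch=batch)

batch.execute()

Keep in mind that the Graph API caps a batch at 50 requests, so for 100+ campaigns you would chunk the rows, for example with the generate_batches helper above, and execute one batch per chunk.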
I am trying to write a Python script to crawl images from Google search. I want to collect the URLs of the images and then store the images on my computer. I found some code to do so, but it only collects 60 URLs, and after that a timeout message appears. Is it possible to collect more than 60 images?
My code:
def crawl_images(query, path):
    BASE_URL = 'https://ajax.googleapis.com/ajax/services/search/images?'\
               'v=1.0&q=' + query + '&start=%d'
    BASE_PATH = os.path.join(path, query)
    if not os.path.exists(BASE_PATH):
        os.makedirs(BASE_PATH)

    counter = 1
    urls = []
    start = 0 # Google's start query string parameter for pagination.
    while start < 60: # Google will only return a max of 56 results.
        r = requests.get(BASE_URL % start)
        for image_info in json.loads(r.text)['responseData']['results']:
            url = image_info['unescapedUrl']
            print url
            urls.append(url)
            image = urllib.URLopener()
            try:
                image.retrieve(url, "model runway/image_" + str(counter) + ".jpg")
                counter += 1
            except IOError, e:
                # Throw away some gifs...blegh.
                print 'could not save %s' % url
                continue
        print start
        start += 4 # 4 images per page.
        time.sleep(1.5)

crawl_images('model runway', '')
Have a look at the Documentation: https://developers.google.com/image-search/v1/jsondevguide
You should get up to 64 results:
Note: The Image Searcher supports a maximum of 8 result pages. When combined with subsequent requests, a maximum total of 64 results are available. It is not possible to request more than 64 results.
Another note: you can restrict the file type, so you don't need to ignore gifs etc.
And as an additional note, please keep in mind that this API should only be used for user operations and not for automated searches!
Note: The Google Image Search API must be used for user-generated searches. Automated or batched queries of any kind are strictly prohibited.
You can try the icrawler package. Extremely easy to use. I've never had problems with the number of images to be downloaded.
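For reference, the basic usage is only a few lines; a minimal sketch with icrawler's built-in Google crawler (the keyword, output directory, and max_num here are just example values):

from icrawler.builtin import GoogleImageCrawler

# Download up to 200 images for the keyword into ./model_runway
google_crawler = GoogleImageCrawler(storage={'root_dir': 'model_runway'})
google_crawler.crawl(keyword='model runway', max_num=200)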
I'm trying to write a simple script to download call detail information from Twilio using the Python helper library. So far, it seems that my only option is to use the .iter() method to get every call ever made for the subaccount, which could be a very large number.
If I use the .list() resource, it doesn't seem to give me a page count anywhere, so I don't know how long to keep paging to get all calls for the time period. What am I missing?
Here are the docs with code samples:
http://readthedocs.org/docs/twilio-python/en/latest/usage/basics.html
It's not very well documented at the moment, but you can use the following API calls to page through the list:
import twilio.rest

client = twilio.rest.TwilioRestClient(ACCOUNT_SID, AUTH_TOKEN)

# iterating vars
remaining_messages = client.calls.count()
current_page = 0
page_size = 50 # any number here up to 1000, although paging may be slow...

while remaining_messages > 0:
    calls_page = client.calls.list(page=current_page, page_size=page_size)
    # do something with the calls_page object...
    remaining_messages -= page_size
    current_page += 1
You can pass in page and page_size arguments to the list() function to control which results you see. I'll update the documentation today to make this more clear.
As mentioned in the comment, the above code did not work because remaining_messages = client.calls.count() always returns 50, making it absolutely useless for paging.
Instead, I ended up just requesting the next page until the request fails, which is fairly hacky. The library should really expose numpages in the list resource for paging.
import twilio.rest
import csv

account = <ACCOUNT_SID>
token = <ACCOUNT_TOKEN>
client = twilio.rest.TwilioRestClient(account, token)

csvout = open("calls.csv", "wb")
writer = csv.writer(csvout)

current_page = 0
page_size = 50
started_after = "20111208"

test = True
while test:
    try:
        calls_page = client.calls.list(page=current_page, page_size=page_size, started_after=started_after)
        for calls in calls_page:
            writer.writerow((calls.sid, calls.to, calls.duration, calls.start_time))
        current_page += 1
    except:
        test = False
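One small tightening of the loop above: catch the specific Twilio error instead of a bare except, so an unrelated bug (a CSV error, a typo) doesn't silently end the paging, and close the output file when you're done. This keeps the same "page until it fails" approach; I'm assuming the old library exposes the exception as twilio.TwilioRestException, so adjust the import if your version names it differently:

from twilio import TwilioRestException  # exception name assumed; adjust for your twilio-python version

current_page = 0
while True:
    try:
        calls_page = client.calls.list(page=current_page, page_size=page_size, started_after=started_after)
    except TwilioRestException:
        break  # ran past the last page
    for call in calls_page:
        writer.writerow((call.sid, call.to, call.duration, call.start_time))
    current_page += 1

csvout.close()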