How do you get random Flickr images using the API and Python?
I used the following Flickr API:
flickr.photos.search(text,page,per_page,extras)
# where:
# text = "flower" (also with other words the results are very disappointing about the randomness)
# per_page = 1 (I have set 1 Image per page)
# page = In the vast majority of cases, the number of pages found per word exceeds 100000. Therefore I set a random number between 1 and 100000
# extras = "url_sq,url_t,url_s,url_q,url_m,url_n,url_z,url_c,url_l,url_o"
When I launch my application, which displays an image every 20 seconds, the results are very disappointing: out of every 20 images displayed, about 16 are the same image.
The entire code is below:
def update_flickrImage(self):
    FLICKR_PUBLIC = 'XXXXXXXXXXXXXXXXXX'
    FLICKR_SECRET = 'XXXXXXXXXXX'
    flickr = FlickrAPI(FLICKR_PUBLIC,FLICKR_SECRET,format='parsed-json')
    random.seed()
    rand_page = random.randrange(1,100000,1)
    extras = 'url_sq,url_t,url_s,url_q,url_m,url_n,url_z,url_c,url_l,url_o'
    cats = flickr.photos.search(text="flower", page=rand_page, per_page=1, extras=extras)
    photos = cats['photos']
    pprint(photos)
    print("Page: ",rand_page)
    for image in photos['photo']:
        self.title = image['title']
        try:
            url = image['url_o']
            width = image['width_o']
            height = image['height_o']
        except:
            try:
                url = image['url_l']
                width = image['width_l']
                height = image['height_l']
            except:
                try:
                    url = image['url_c']
                    width = image['width_c']
                    height = image['height_c']
                except:
                    pass
        try:
            r = requests.get(url)
            self.pic = r.content
        except:
            pass
I reproduced your code as closely as I could. When I ran the test with 100 calls, I only got back 3 different links.
When I reduced the upper bound in the randrange call to 4000, I got 98 unique URLs out of 100. The whole code is below (with my public key and secret redacted):
import flickrapi as fa
import random
# import pprint as pp
import time as ti
def update_flickrImage(self):
    FLICKR_PUBLIC = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
    FLICKR_SECRET = 'XXXXXXXXXXXXXXXX'
    flickr = fa.FlickrAPI(FLICKR_PUBLIC,FLICKR_SECRET,format='parsed-json')
    random.seed()
    rand_page = random.randrange(1,4000,1)
    extras = 'url_sq,url_t,url_s,url_q,url_m,url_n,url_z,url_c,url_l,url_o'
    cats = flickr.photos.search(text="flower",
                                page=rand_page,
                                per_page=1,
                                extras=extras)
    photos = cats['photos']
    # pp.pprint(photos)
    print("Page: ",rand_page)
    for image in photos['photo']:
        title = image['title']
        try:
            url = image['url_o']
            width = image['width_o']
            height = image['height_o']
        except:
            try:
                url = image['url_l']
                width = image['width_l']
                height = image['height_l']
            except:
                try:
                    url = image['url_c']
                    width = image['width_c']
                    height = image['height_c']
                except:
                    pass
    self['title'] = title
    self['url'] = url
    self['width'] = width
    self['height'] = height
    return url
imgobj = {'title':'A','url':'https','width':'0','height':'0'}
for i in range(100):
    imgurl = update_flickrImage(imgobj)
    print( imgurl)
    ti.sleep(2)
The Flickr search API returns at most 4000 records per search query. With per_page=1, that would explain why page numbers above 4000 keep handing back the same few images.
In my Flickr account I have over 13,000 photos. I can download 100 at a time for up to 1400 pages when I need to make a local searchable database in MySQL. I do most of my Flickr work with PHP.
Yeah, you have to play with it. A search for "flower" reports 295,805 pages. That's too many to sample from directly.
Also, in my version of your code, I had to comment out the pretty print. Titles containing certain Unicode characters would crash it. I just wanted to see unique URLs.
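To make the 4000-record limit explicit rather than hard-coding it into randrange, the page choice can be sketched as a small helper. This is only an illustration: pick_random_page and MAX_SEARCH_RESULTS are hypothetical names, and the cap value is the one reported in this answer, not something read from the API at runtime.

```python
import random

# Assumption: the per-query cap reported in the answer above.
MAX_SEARCH_RESULTS = 4000


def pick_random_page(per_page, reported_pages, rng=random):
    """Return a 1-based page number that stays inside the result cap.

    reported_pages is the page count the search response claims to have;
    it is clamped to the number of pages the cap can actually cover.
    """
    reachable_pages = max(1, MAX_SEARCH_RESULTS // per_page)
    last_page = max(1, min(reported_pages, reachable_pages))
    return rng.randint(1, last_page)
```

With per_page=1 this draws from pages 1 to 4000; with per_page=100 it draws from pages 1 to 40, and a random photo can then be picked from within that page.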
Related
I'm trying to get the latest 100 posts from my Giphy user.
It works for accounts like "giphy" and "spongebob"
But not for "jack0_o"
import requests
def get_user_gifs(username):
    api_key = "API_KEY"
    limit = 25 # The number of GIFs to retrieve per request (max 25)
    offset = 0
    # Set a flag to indicate when all GIFs have been retrieved
    done = False
    # Keep making requests until all GIFs have been retrieved
    while not done:
        # Make the request to the Giphy API
        endpoint = f"https://api.giphy.com/v1/gifs/search?api_key={api_key}&q={username}&limit={limit}&offset={offset}&sort=recent"
        response = requests.get(endpoint)
        data = response.json()
        # Extract the GIF URLs from the data and print them one per line
        for gif in data["data"]:
            print(gif["url"])
        # Update the starting index for the next batch of GIFs
        offset += limit
        # Check if there are more GIFs to retrieve
        if len(data["data"]) < limit or offset >= 100:
            done = True
get_user_gifs("spongebob") #WORKS
get_user_gifs("jack0_o") #does not work
Already tried adding ratings with "pg", "r", "g"
I'm trying to get all the tracks from 2 playlists into a CSV file. However, in both playlists, even though I increase the offset parameter by 100 in each query, the first 100 songs of both playlists are returned. So the page is never changed. What could be the problem?
import spotipy, json, csv
from spotipy.oauth2 import SpotifyClientCredentials
client_credentials_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
data_file = open('data.csv', 'w')
writer = csv.writer(data_file)
writer.writerow(['track_num', 'track_id', 'track_name', 'first_artist'] + ['liked'])
playlist_ids = [
    'xxxxxxxxxxxxxxxxxxxxxxx', # playlist 1
    'yyyyyyyyyyyyyyyyyyyyyyy' # playlist 2
]
for playlist_id in playlist_ids:
    offset_n = 0
    total = 100
    while offset_n < total:
        tracks_response = sp.playlist_tracks(playlist_id, offset=offset_n)
        tracks_json = json.dumps(tracks_response)
        tracks_data = json.loads(tracks_json)
        if offset_n == 0:
            total = tracks_data['tracks']['total']
        for track in tracks_data['tracks']['items']:
            track_id = track['track']['id']
            track_name = track['track']['name']
            first_artist = track['track']['artists'][0]['name']
            if playlist_id == playlist_ids[0]:
                writer.writerow([row_num, track_id, track_name, first_artist] + [1])
            else:
                writer.writerow([row_num, track_id, track_name, first_artist] + [0])
        offset_n += 100
data_file.close()
The playlist_tracks method returns a paginated result with details of the tracks of a playlist.
So you need to iterate over all pages to get the full data.
You can use this example as a reference:
def get_all_tracks_from_playlist(playlist_id):
    # Fetch the first page, then follow the "next" links until there are none left
    tracks_response = sp.playlist_tracks(playlist_id)
    tracks = tracks_response["items"]
    while tracks_response["next"]:
        tracks_response = sp.next(tracks_response)
        tracks.extend(tracks_response["items"])
    return tracks
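If you would rather keep the explicit offset from the question, the same idea can be sketched as a generator. This is a rough sketch, not the library's prescribed pattern: iter_playlist_tracks is a hypothetical helper, the client is passed in as a parameter, and it assumes each page is a dict with "items" and "next" keys, as in the example above.

```python
def iter_playlist_tracks(client, playlist_id, page_size=100):
    """Yield every playlist item, requesting one page per call.

    client is assumed to expose playlist_tracks(playlist_id, limit=..., offset=...)
    and to return a dict with "items" and "next" keys.
    """
    offset = 0
    while True:
        page = client.playlist_tracks(playlist_id, limit=page_size, offset=offset)
        items = page["items"]
        yield from items
        # Advance by what was actually returned, not by the requested size
        offset += len(items)
        if not items or page.get("next") is None:
            break
```

Usage would look like `for item in iter_playlist_tracks(sp, playlist_id): ...`, reading item["track"]["id"] and so on inside the loop.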
Regarding the ReadTimeout exception you mentioned in the comments:
The Spotify client accepts requests_timeout and retries as arguments. According to the documentation, the default values are requests_timeout=5 and retries=3.
You can raise them to reduce the chance of getting the ReadTimeout exception.
As a start, you can double the request timeout to 10 seconds and change the retries to 5:
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager, requests_timeout=10, retries=5)
My task is to write a function that gets the number of jobs for a given technology.
Note: The API gives a maximum of 50 jobs per page.
If you get 50 jobs on a page, there could be more job listings available, so you should make another API call for the next page to check for more jobs.
If you get fewer than 50 jobs on a page, you can take that as the final count.
Following is my code:
baseurl = "https://jobs.github.com/positions.json"
def get_number_of_jobs(technology):
    number_of_jobs = 0
    tech = technology
    page= 0
    PARAMS = {'technology':tech , 'page': page}
    jobs=requests.get(url=baseurl,params = PARAMS )
    if jobs.ok:
        listings = jobs.json()
        number_of_jobs=len(listings)
        if number_of_jobs==50:
            page= page+1
            PARAMS = {'technology':tech , 'page': page}
            jobs=requests.get(url=baseurl,params = PARAMS )
            if jobs.ok:
                listings2 = jobs.json()
                number_of_jobs= number_of_jobs + len(listings2)
    return technology,number_of_jobs
Now I cannot figure out how to do the pagination in this function. That is, how do I check whether there are more than 50 job postings for a specific technology and, if so, run the request again to get those postings as well?
I print the output as
print(get_number_of_jobs('python'))
('python', 100)
Can someone please help?
Many thanks in advance!
Please let me know if this works:
import requests
baseurl = 'https://jobs.github.com/positions.json'
total_job = 0
def get_number_of_jobs(technology, page):
    global total_job
    PARAMS = {'technology':technology , 'page': page}
    jobs=requests.get(url=baseurl,params = PARAMS )
    total_job += len(jobs.json()) if jobs.ok else 0
    return len(jobs.json()) if jobs.ok else 0
def get_jobs(technology):
    page = 0
    while get_number_of_jobs(technology, page) >= 50:
        page+=1
    return total_job
print(get_jobs('python'))
import requests
baseurl = 'https://jobs.github.com/positions.json'
def get_number_of_jobs(technology):
    number_of_jobs = 0
    page = 0
    while True:
        payload = {"description":technology,"page":page}
        r = requests.get(baseurl,params=payload)
        if not r.ok:
            # Stop on an HTTP error instead of retrying the same page forever
            break
        data = r.json()
        # Add this page's listings to the running total
        number_of_jobs += len(data)
        if len(data) < 50:
            # A short page means there are no further pages
            break
        page += 1
    return technology,number_of_jobs
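The rule from the question (a full page means "ask again", a short page means "done") can also be separated from the HTTP call, which makes it easy to check without hitting the API. A minimal sketch, assuming a hypothetical fetch_page callable that takes a page number and returns the list of listings for that page:

```python
def count_items(fetch_page, page_size=50):
    """Count items across pages until a page comes back short.

    fetch_page(page) is a caller-supplied function (hypothetical here)
    that returns the list of items on the given zero-based page.
    """
    total = 0
    page = 0
    while True:
        items = fetch_page(page)
        total += len(items)
        if len(items) < page_size:
            # Fewer than a full page: this was the last one
            return total
        page += 1
```

Note that when the total is an exact multiple of 50, one extra request is made and returns an empty page, which is what ends the loop.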
I'm struggling to get a Lambda function working. I have a Python script that accesses the Twitter API, pulls information, and exports that information into an Excel sheet. I'm trying to move the Python script over to AWS Lambda, and I'm having a lot of trouble.
What I've done so far: created an AWS account, set up an S3 bucket, and poked around trying to get things to work.
I think the main area I'm struggling with is how to turn a Python script that I run from a local CLI into Lambda-capable code. I'm not sure I understand how the lambda_handler function works, what the event and context arguments actually mean (despite watching half a dozen tutorial videos), or how to integrate my existing functions into Lambda through lambda_handler. I'm very confused and hoping someone can help me get some clarity!
Code that I'm using to pull Twitter data (just a sample):
import time
import datetime
import keys
import pandas as pd
from twython import Twython, TwythonError
import pymysql
def lambda_handler(event, context):
    def oauth_authenticate():
        twitter_oauth = Twython(keys.APP_KEY, keys.APP_SECRET, oauth_version=2)
        ACCESS_TOKEN = twitter_oauth.obtain_access_token()
        twitter = Twython(keys.APP_KEY, access_token = ACCESS_TOKEN)
        return twitter
    def get_username():
        """
        Prompts for the screen name of targetted account
        """
        username = input("Enter the Twitter screenname you'd like information on. Do not include '#':")
        return username
    def get_user_followers(username):
        """
        Returns data on all accounts following the targetted user.
        WARNING: The number of followers can be huge, and the data isn't very valuable
        """
        #username = get_username()
        #import pdb; pdb.set_trace()
        twitter = oauth_authenticate()
        datestamp = str(datetime.datetime.now().strftime("%Y-%m-%d"))
        target = twitter.lookup_user(screen_name = username)
        for y in target:
            target_id = y['id_str']
        next_cursor = -1
        index = 0
        followersdata = {}
        while next_cursor:
            try:
                get_followers = twitter.get_followers_list(screen_name = username,
                                                           count = 200,
                                                           cursor = next_cursor)
                for x in get_followers['users']:
                    followersdata[index] = {}
                    followersdata[index]['screen_name'] = x['screen_name']
                    followersdata[index]['id_str'] = x['id_str']
                    followersdata[index]['name'] = x['name']
                    followersdata[index]['description'] = x['description']
                    followersdata[index]['date_checked'] = datestamp
                    followersdata[index]['targeted_account_id'] = target_id
                    index = index + 1
                next_cursor = get_followers["next_cursor"]
            except TwythonError as e:
                print(e)
                remainder = (float(twitter.get_lastfunction_header(header = 'x-rate-limit-reset')) \
                             - time.time())+1
                print("Rate limit exceeded. Waiting for:", remainder/60, "minutes")
                print("Current Time is:", time.strftime("%I:%M:%S"))
                del twitter
                time.sleep(remainder)
                twitter = oauth_authenticate()
                continue
        followersDF = pd.DataFrame.from_dict(followersdata, orient = "index")
        followersDF.to_excel("%s-%s-follower list.xlsx" % (username, datestamp),
                             index = False, encoding = 'utf-8')
Looking at the example provided by wordpresslib, it's very straightforward to upload images to the media library. However, attaching images to a post looks like it was never finished. Has anyone successfully attached images?
#!/usr/bin/env python
"""
Small example script that publish post with JPEG image
"""
# import library
import wordpresslib
print 'Example of posting.'
print
url = raw_input('Wordpress URL (xmlrpc.php will be added):')
user = raw_input('Username:')
password = raw_input('Password:')
# prepare client object
wp = wordpresslib.WordPressClient(url+"xmlrpc.php", user, password)
# select blog id
wp.selectBlog(0)
# upload image for post
# imageSrc = wp.newMediaObject('python.jpg')
# FIXME if imageSrc:
# create post object
post = wordpresslib.WordPressPost()
post.title = 'Test post'
post.description = '''
Python is the best programming language in the earth !
No image BROKEN FIXME <img src="" />
'''
#post.categories = (wp.getCategoryIdFromName('Python'),)
# Add tags
post.tags = ["python", "snake"]
# do not publish post
idNewPost = wp.newPost(post, False)
print
print 'Posting successfull! (Post has not been published though)'
WordPressPost class:
class WordPressPost:
    """Represents post item
    """
    def __init__(self):
        self.id = 0
        self.title = ''
        self.date = None
        self.permaLink = ''
        self.description = ''
        self.textMore = ''
        self.excerpt = ''
        self.link = ''
        self.categories = []
        self.user = ''
        self.allowPings = False
        self.allowComments = False
        self.tags = []
        self.customFields = []
    def addCustomField(self, key, value):
        kv = {'key':key, 'value':value}
        self.customFields.append(kv)
WordPress saves images as website.com/wp-content/uploads/YEAR/MONTH/FILENAME
where YEAR is the current year in 4-digit format (e.g. 2015),
MONTH is the current month with a leading zero (e.g. 01, 02, ... 12),
and FILENAME is the file name submitted via imageSrc = wp.newMediaObject('python.jpg').
Example file name: website.com/wp-content/uploads/2015/06/image.jpg
Adding a simple image tag with a URL in the above format to post.description displays the image in the post.
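The URL pattern above can be sketched as a small helper. This is only an illustration: upload_url is a hypothetical function, the http:// scheme is an assumption, and it presumes the site keeps the year/month folder layout described here.

```python
from datetime import datetime


def upload_url(site, filename, when=None):
    """Build the expected media URL for a file uploaded at time `when`.

    Assumes the uploads/YEAR/MONTH/FILENAME layout described above;
    `when` defaults to the current time.
    """
    when = when or datetime.now()
    return "http://{}/wp-content/uploads/{:%Y}/{:%m}/{}".format(
        site.strip("/"), when, when, filename
    )
```

For example, upload_url("website.com", "image.jpg", datetime(2015, 6, 1)) gives "http://website.com/wp-content/uploads/2015/06/image.jpg", which can then go into the src attribute of the image tag.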
Here is how I posted my image:
import wordpresslib
from datetime import datetime
# Use a separate name so the timestamp does not shadow a module
now = datetime.now()
h = now.strftime('%H')
m = now.strftime('%M')
s = now.strftime('%S')
mo = now.strftime('%m')
yr = now.strftime('%Y')
url = 'WORDPRESSURL.xmlrpc.php'
wp = wordpresslib.WordPressClient(url,'USERNAME','PASSWORD')
wp.selectBlog(0)
# Timestamped file name, so that images posted under the same base name are unlikely to overwrite each other
imageSrc = wp.newMediaObject('testimage'+h+m+s+'.jpg')
img = 'http://WORDPRESSURL/wp-content/uploads/'+yr+'/'+mo+'/testimage'+h+m+s+'.jpg'
post=wordpresslib.WordPressPost()
post.title='title'
post.description='<img src="'+img+'"/>'
# True publishes the post immediately
idPost=wp.newPost(post,True)