How to make Python download an image from a URL, but skip it if the image is already downloaded - python

Say I have a random API, for example api.example.com. Every time you hit it, it picks a random image and returns the JSON for it, like {"url": "api.example.com/img1.png"}. After parsing the JSON, how can I download the image and save it in some folder, but skip the download if the image is already there (i.e. the file name is already taken)?
Edit: here is the code I have so far.
import requests
import urllib.request

url = "https://nekos.life/api/v2/img/neko"
response = requests.get(url)
response.raise_for_status()
jsonResponse = response.json()
urll = jsonResponse["url"]
urllib.request.urlretrieve(urll, "neko.png")

As said in this article, I think [os.path][1] can do the job pretty well.
Just check
os.path.exists(photo_path)
before downloading; that should be it.
[1]: https://linuxize.com/post/python-check-if-file-exists/
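For completeness, here is a rough sketch combining the code from the question with that check (the "images" folder name is just an assumption; the local file name is taken from the image URL so a repeat of the same image can be detected):
import os
import requests

url = "https://nekos.life/api/v2/img/neko"
response = requests.get(url)
response.raise_for_status()
img_url = response.json()["url"]

# derive the local file name from the URL, e.g. ".../img1.png" -> "img1.png"
os.makedirs("images", exist_ok=True)
filename = os.path.join("images", os.path.basename(img_url))

if os.path.exists(filename):
    print(f"{filename} already downloaded, skipping")
else:
    img = requests.get(img_url)
    img.raise_for_status()
    with open(filename, "wb") as f:
        f.write(img.content)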

Related

Why downloading Facebook images with requests.get() gives corrupted files?

I am very new to Python and the Facebook Graph API and hope you can help me out with this:
I have written (in Python) a piece of code that uploads images to a page on Facebook (in a post, so it contains some text too), and this works as expected. Now I am trying to write a piece of code capable of downloading the image inside a post (given post_id). Unfortunately I always get a "file corrupted" error.
Here is the code I use to download the image:
# this function uploads an image from a web url
ret = upload_img_to_fb(url)
# decode JSON from the request
post_id = json.loads(ret)
ret = get_json_from_fb_postID(post_id['post_id'])
perm_url = json.loads(ret)
print('Perm url = ' + perm_url['permalink_url'] + '\n')
img_response = requests.get(perm_url['permalink_url'])
image = open("foto4.jpg","wb")
image.write(img_response.content)
image.close()
Now, the print will print the following:
Perm url = https://www.facebook.com/102956292308044/photos/a.103716555565351/107173241886349/?type=3
which, according to what I understand, is the problem: it is not a picture, even though a picture is displayed on the screen. So I right-clicked the picture, opened its link, and got:
https://scontent.fbzo1-2.fna.fbcdn.net/v/t39.30808-6/273008252_107171558553184_3697853178128736286_n.jpg?_nc_cat=103&ccb=1-5&_nc_sid=9267fe&_nc_ohc=d0ZvWSTzIogAX-PsuzN&_nc_ht=scontent.fbzo1-2.fna&oh=00_AT8GWh0wDHgB6tGCzHuPE2VZFus9EgWhllaJfVkZ-Nqtow&oe=620465E4
and if I pass this last link as the parameter to img_response = requests.get(), it works.
How do I get around this?
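As a rough illustration of what is happening (just a sketch): the permalink serves an HTML page, so checking the Content-Type of the response before writing it shows why the saved file is corrupted; only a direct image URL such as the scontent... link above returns an image/... type. Here image_url is a placeholder for whichever URL you end up requesting:
import requests

image_url = "..."  # placeholder: the URL you are downloading from
img_response = requests.get(image_url)
content_type = img_response.headers.get("Content-Type", "")

if content_type.startswith("image/"):
    with open("foto4.jpg", "wb") as f:
        f.write(img_response.content)
else:
    # the permalink URL lands here: it returns text/html, not the picture itself
    print(f"Not an image, got Content-Type: {content_type}")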

How to check file type for an image stored as url?

I have a list of urls which look more or less like this:
'https://myurl.com/images/avatars/cb55-f14b-455d1-9ac4w20190416075520341'
I'm trying to validate the image behind the url and check what image type (png, jpeg or other) it has and write back the image type into a new dataframe column imgType.
My code so far:
import pandas as pd
import requests

df = pd.read_csv('/path/to/allLogo.csv')
urls = df.T.values.tolist()[4]

for x in urls:
    # i'm stuck here... as the content doesn't seem to give me the image type.
    s = requests.get(x, verify=False).content
    df["imgType"] =
df.to_csv('mypath/output.csv')
Could someone help me with this? Thanks in advance.
One possibility is to check the response headers for 'Content-Type' - but it depends on the server which headers are sent back to the client (without knowing the real URL it is hard to tell):
import requests
url = 'https://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png'
response = requests.get(url)
# uncomment this to print all response headers:
# print(response.headers)
print(response.headers['Content-Type'])
Prints:
image/png
check what image type (png, jpeg or other)
If you manage to download it, either to disk (a file) or to memory (as bytes, the .content of requests' response), then you can use the Python built-in module imghdr, in the following way:
import imghdr
imgtype = imghdr.what("path/to/image.png") # testing file on disk
or
import requests
r = requests.get("url_of_image")
imgtype = imghdr.what(None, h=r.content) # testing bytes in memory
Keep in mind that imghdr recognizes only a limited set of image file formats (see the linked docs); however, it should suffice if you are only interested in detecting png vs jpeg vs other.
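To write the detected type back into the dataframe from the question, a rough sketch combining the loop with imghdr might look like this (the column selection, verify=False and CSV paths are kept from the asker's code; imghdr.what returns None when it cannot recognise the format):
import imghdr
import pandas as pd
import requests

df = pd.read_csv('/path/to/allLogo.csv')
urls = df.T.values.tolist()[4]          # same column the asker selected

img_types = []
for x in urls:
    try:
        content = requests.get(x, verify=False, timeout=10).content
        img_types.append(imghdr.what(None, h=content) or "unknown")
    except requests.RequestException:
        img_types.append("error")

df["imgType"] = img_types
df.to_csv('mypath/output.csv', index=False)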

Discord bot - issue saving a text file after hosting

OK, I have been trying to think of or find a solution myself for quite some time, but everything I attempt either turns out not to be a solution or is too complex for me to try without knowing it will work.
I have a Discord bot, made in Python. The bot's purpose is to parse a blog for HTML links; when a new HTML link is posted, it posts that link into Discord.
I am using a text file to save the latest link, and then parsing the website every 30 seconds to check if a new link has been posted, by comparing the link at position 0 in the array to the link in the text file.
Now, I have managed to host my bot on Heroku with some success; however, I have since learned that Heroku cannot modify my text file, since it pulls the code from GitHub and any changes are reverted after ~24 hours.
Since learning this I have attempted to host the text file on an AWS S3 bucket, but I have now learned that it can add and delete files, not modify existing ones, and can only write new files from files already on my system. That means if I could do this, I wouldn't need to, since I would already be able to modify the file on my system and not need to host it anywhere.
I am looking for hopefully simple solutions/suggestions.
I am open to changing the hosting/whatever is needed, however I cannot pay for hosting.
Thanks in advance.
EDIT
So, I am editing this because I have a working solution thanks to a suggestion commented below.
The solution is to get my Python bot to commit the new file to GitHub, and then use that committed file's content as the reference.
import base64
import os
from github import Github
from github import InputGitTreeElement

user = os.environ.get("GITHUB_USER")
password = os.environ.get("GITHUB_PASSWORD")
g = Github(user, password)
repo = g.get_user().get_repo('YOUR REPO NAME HERE')

# local files to commit, and the paths they should have in the repo
file_list = [
    'last_update.txt'
]
file_names = [
    'last_update.txt',
]

def git_commit():
    commit_message = 'News link update'
    master_ref = repo.get_git_ref('heads/master')
    master_sha = master_ref.object.sha
    base_tree = repo.get_git_tree(master_sha)
    element_list = list()
    for i, entry in enumerate(file_list):
        with open(entry) as input_file:
            data = input_file.read()
        if entry.endswith('.png'):
            # binary files would need base64 encoding; the text file above does not
            data = base64.b64encode(data)
        element = InputGitTreeElement(file_names[i], '100644', 'blob', data)
        element_list.append(element)
    # build a new tree and commit on top of master, then move the branch ref
    tree = repo.create_git_tree(element_list, base_tree)
    parent = repo.get_git_commit(master_sha)
    commit = repo.create_git_commit(commit_message, tree, [parent])
    master_ref.edit(commit.sha)
I then have a method called 'check_latest_link' which fetches the raw version of the file from my GitHub repo and returns its contents as a string, which I assign to my variable 'last_saved_link'.
import requests

def check_latest_link():
    res = requests.get('[YOUR GITHUB PAGE LINK - RAW FORMAT]')
    content = res.text
    return content
Then in my main method I have the following:
@client.event
async def task():
    await client.wait_until_ready()
    print('Running')
    while True:
        channel = discord.Object(id=channel_id)
        # parse_links() is a method to parse HTML links from a website
        news_links = parse_links()
        last_saved_link = check_latest_link()
        print('Running')
        await asyncio.sleep(5)
        # below compares the parsed HTML to the saved reference;
        # if they are not the same then there is a new link to post.
        if last_saved_link != news_links[0]:
            # the 3 methods below (read_file, delete_contents and save_to_file)
            # simply do what they suggest to a text file specified elsewhere
            read_file()
            delete_contents()
            save_to_file(news_links[0])
            # then we have the git_commit previously shown.
            git_commit()
            # after git_commit, I was having an issue with the GitHub reference
            # not updating for a few minutes, so the bot posts the message and
            # then sleeps for 500 seconds; this stops the bot from posting
            # duplicate messages. Because this is an async function,
            # it will not stop other async functions from executing.
            await client.send_message(channel, news_links[0])
            await asyncio.sleep(500)
I am posting this so I can close the thread with an "Answer" - please refer to post edit.

Weird json value urllib python

I'm trying to manipulate a dynamic JSON from this site:
http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do
It has 3 elements: imagem, a base64 image; labelValorCaptcha, just a message; and uuidCaptcha, a value to pass as a parameter to play a sound at the link below:
http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha=sajcaptcha_e7b072e1fce5493cbdc46c9e4738ab8a
When I open the first site in a browser and put the uuidCaptcha into the second link after the equals sign ("...uuidCaptcha="), the sound plays normally. I wrote some simple code to catch these elements.
import urllib, json
url = "http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do"
response = urllib.urlopen(url)
data = json.loads(response.read())
urlSound = "http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha="
print urlSound + data['uuidCaptcha']
But I don't know what's happening: the captured value of uuidCaptcha doesn't work; it opens an error web page.
Does anyone know why?
Thanks!
It works for me.
$ cat a.py
#!/usr/bin/env python
# encoding: utf-8
import urllib, json
url = "http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do"
response = urllib.urlopen(url)
data = json.loads(response.read())
urlSound = "http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha="
print urlSound + data['uuidCaptcha']
$ python a.py
http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha=sajcaptcha_efc8d4bc3bdb428eab8370c4e04ab42c
As @Charlie Harding said, the best way is to download the page and get the JSON values from it, because this JSON is dynamic and needs an open web session to exist.
More info here.
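For reference, a rough Python 3 version of the same flow using requests (a sketch, assuming the endpoints still behave as in the original code); a single Session is used so any cookies set by the first request are reused for the second:
import requests

session = requests.Session()
data = session.get("http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do").json()

sound_url = ("http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do"
             "?timestamp=1455996420264&uuidCaptcha=" + data["uuidCaptcha"])
print(sound_url)

# fetch the sound in the same session, right after getting the uuid
sound = session.get(sound_url)
print(sound.status_code, sound.headers.get("Content-Type"))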

Upload image to facebook using the Python API

I have searched the web far and wide for a still-working example of uploading a photo to Facebook through the Python API (Python for Facebook). Questions like this have been asked on Stack Overflow before, but none of the answers I have found work anymore.
What I got working is:
import facebook as fb

cfg = {
    "page_id": "my_page_id",
    "access_token": "my_access_token"
}
api = get_api(cfg)
msg = "Hello world!"
status = api.put_wall_post(msg)
where I have defined the get_api(cfg) function as this:
def get_api(cfg):
    graph = fb.GraphAPI(cfg['access_token'], version='2.2')
    # Get page token to post as the page. You can skip
    # the following if you want to post as yourself.
    resp = graph.get_object('me/accounts')
    page_access_token = None
    for page in resp['data']:
        if page['id'] == cfg['page_id']:
            page_access_token = page['access_token']
    graph = fb.GraphAPI(page_access_token)
    return graph
And this does indeed post a message to my page.
However, if I instead want to upload an image everything goes wrong.
# Upload a profile photo for a Page.
api.put_photo(image=open("path_to/my_image.jpg", 'rb').read(), message="Here's my image")
I get the dreaded GraphAPIError: (#324) Requires upload file, for which none of the solutions on Stack Overflow work for me.
If I instead issue the following command
api.put_photo(image=open("path_to/my_image.jpg",'rb').read(), album_path=cfg['page_id'] + "/picture")
I get GraphAPIError: (#1) Could not fetch picture for which I haven't been able to find a solution either.
Could someone out there please point me in the right direction or provide me with a currently working example? It would be greatly appreciated, thanks!
A 324 Facebook error can result from a few things depending on how the photo upload call was made
a missing image
an image not recognised by Facebook
incorrect directory path reference
A raw cURL call looks like
curl -F 'source=@my_image.jpg' 'https://graph.facebook.com/me/photos?access_token=YOUR_TOKEN'
As long as the above call works, you can be sure the photo is acceptable to Facebook's servers.
An example of how a 324 error can occur
touch meow.jpg
curl -F 'source=@meow.jpg' 'https://graph.facebook.com/me/photos?access_token=YOUR_TOKEN'
This can also occur for corrupted image files as you have seen.
Using .read() will dump the actual data
Empty File
>>> image=open("meow.jpg",'rb').read()
>>> image
''
Image File
>>> image=open("how.png",'rb').read()
>>> image
'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00...
Neither of these will work with the api.put_photo call: as you have seen, and as Klaus D. mentioned, the call should be made without read().
So this call
api.put_photo(image=open("path_to/my_image.jpg", 'rb').read(), message="Here's my image")
actually becomes
api.put_photo('\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00...', message="Here's my image")
which is just a string, not what is wanted.
One needs the image reference <open file 'how.png', mode 'rb' at 0x1085b2390>
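So, based on the explanation above, the call from the question would then pass the open file object instead of its bytes (same path and message as before):
# pass the open file object itself; do not call .read() on it
with open("path_to/my_image.jpg", "rb") as img:
    api.put_photo(image=img, message="Here's my image")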
I know this is old and doesn't answer the question with the specified API, however, I came upon this via a search and hopefully my solution will help travelers on a similar path.
Using requests and tempfile
A quick example of how I do it using the tempfile and requests modules.
Download an image and upload to Facebook
The script below should grab an image from a given URL, save it to a file within a temporary directory, and automatically clean up when finished.
In addition, I can confirm this works running as a Flask service on Google Cloud Run, which comes with the container runtime contract that lets us keep the file in memory.
import tempfile
import requests

def download_and_upload():
    # wrapped in a function (the name is illustrative) so the return
    # statements below are valid
    # setup stuff - certainly change this
    filename = "your-desired-filename"
    image_url = "your-image-url"
    act_id = "your account id"
    access_token = "your access token"

    # create the temporary directory and build the file path inside it
    temp_dir = tempfile.TemporaryDirectory()
    directory = temp_dir.name
    filepath = f"{directory}/{filename}"

    # stream the image bytes
    res = requests.get(image_url, stream=True)
    # write them to your filename at your temporary directory
    # assuming this works
    # add logic for non 200 status codes
    with open(filepath, "wb+") as f:
        f.write(res.content)

    # prep the payload for the facebook call
    files = {
        "filename": open(filepath, "rb"),
    }
    url = f"https://graph.facebook.com/v10.0/{act_id}/adimages?access_token={access_token}"

    # send the POST request
    res = requests.post(url, files=files)
    res.raise_for_status()
    if res.status_code == 200:
        # get your image data back
        image_upload_data = res.json()
        temp_dir.cleanup()
        if "images" in image_upload_data:
            return image_upload_data["images"][filepath.split("/")[-1]]
        return image_upload_data
    temp_dir.cleanup()  # paranoid: just in case an error isn't raised
