Iterating over links using Requests and adding to Pandas DF - python

I have been stuck on a problem for a day and would appreciate the community's help.
I have prepared a Pandas DataFrame which looks like this:
What I need to do:
Create URL links using the 'name' and 'namespace' columns of the df
Request each URL
Save parameters from the page if the code is 200 and there is data; otherwise find the error code and save it to a new column of the df called 's2s_status'
What I have so far:
start_slice = 1
total_end = 20
while start_slice <= total_end:
    end_slice = start_slice + 1
    if end_slice > total_end:
        end_slice = total_end
    var_name = df.loc[df.index == start_slice, 'name'].values[0]
    var_namespace = df.loc[df.index == start_slice, 'namespace'].values[0]
    url = f"http://{var_name}.{var_namespace}.prod.s.o3.ru:84/config"
    r = requests.get(url, timeout=(12600, 1))
    data = r.json()['Values']['s2s_auth_requests_sign_grpc']
    if r.status_code == 404:
        df['s2s_status'] = "404 error"
    elif r.status_code == 500:
        df['s2s_status'] = "500 Error"
    elif r.status_code == 502:
        df['s2s_status'] = "502 Error"
    elif r.status_code == 503:
        df['s2s_status'] = "503 Error"
    else:
        data = r.json()['Values']['s2s_auth_requests_sign_grpc']
        df['s2s_status'] = "sign"
    if end_slice == total_end:
        break
    else:
        start_slice = end_slice
print(r)
print(url)
print(df)
This code iterates over the first 20 records, but:
It reports the wrong errors; e.g. a page like 'http://exteca-auth.eea.prod.s.o3.ru:84/config' cannot be reached at all, yet the code gives me a 404 error.
Less important, but still: I can't figure out how to handle the case where the page returns nothing (a 200 code with no data, rather than a 404/500/502/503 error).
Thank you in advance.
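A sketch of one way to restructure this loop (not from the original post; the helper names and the "no data" label are illustrative, and the JSON layout is assumed from the question): check the status code before calling .json(), catch connection failures separately so an unreachable host is not mislabeled as a 404, and fill the column row by row instead of overwriting it on every iteration.

```python
import requests

def classify(status_code, payload):
    # Map one response to the value stored in the 's2s_status' column.
    # Assumes the layout from the question: {'Values': {'s2s_auth_requests_sign_grpc': ...}}
    if status_code != 200:
        return f"{status_code} error"
    if not payload or 's2s_auth_requests_sign_grpc' not in payload.get('Values', {}):
        return "no data"          # 200 but nothing useful in the body
    return "sign"

def check_service(name, namespace):
    url = f"http://{name}.{namespace}.prod.s.o3.ru:84/config"
    try:
        r = requests.get(url, timeout=(5, 30))  # (connect, read) in seconds
    except requests.exceptions.RequestException as exc:
        # DNS failure, connection refused, timeout, etc.
        return f"request failed: {type(exc).__name__}"
    try:
        payload = r.json()
    except ValueError:
        payload = None            # body was not JSON
    return classify(r.status_code, payload)

# Fill the column one row at a time:
# df['s2s_status'] = [check_service(n, ns) for n, ns in zip(df['name'], df['namespace'])]
```

With this split, an unreachable host shows up as a request failure rather than a spurious 404, and a 200 response with an empty or malformed body gets its own "no data" label.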

The script is not saving my bad_responses correctly

I have brainstormed a lot of possibilities and created a bulk API call so I can import some products into my store. It works fine and the products are imported correctly; however, I have trouble saving the bad responses to a CSV file.
Maybe I am doing something wrong or my indentation is not correct. Please point me in the right direction, or offer any advice for avoiding similar mistakes in the future.
This is the code:
df = pd.read_csv('edited_csv.csv')
bad_responses_list = []
for i in range(len(df)):
    endpoint = f"{base_url}/products"
    data = {
        "product_id": int(df['product_id'][i]),
        "title": df['title'][i],
        "discount_type": "percentage",
        "discount": 5,
    }
    response = requests.post(endpoint, json.dumps(data), headers=headers)
    status_code = response.status_code
    if status_code != 200 or status_code != 201:
        bad_responses_list.append([df['product_id'][i], response.status_code])
df_bad_responses = pd.DataFrame(bad_responses_list, columns=['product_id', 'status_code'])
df_bad_responses.to_csv('products_with_bad_responses.csv')
Now, when I run this, it creates a CSV with both good and bad responses, something like this:
product_id, status_code
7262783, 201
9458389, 201
0493788, 422
7273628, 422
7263728, 201
Thank you in advance!
The or in this line:
if status_code != 200 or status_code != 201:
needs to be an and: every status code is unequal to at least one of 200 and 201, so the condition is true for every response, including the successful ones.

How to pass multiple API calls?

I have a simple validation API call like this:
client = Client(
    token='{{YOUR_TOKEN_HERE}}',
    key='{{YOUR_KEY}}',
    environment='prod'
)
lookup_api = client.validations
result = lookup_api.list(number="{{NUMBER}}")
if result['status'] == 200:
    print(result['data'])
else:
    print("An error occurred." + str(result['status']))
    print(result['data'])
I want to pass multiple different tokens and multiple numbers. How should I do it?
One token with multiple numbers worked, but I have been stuck for hours on pairing multiple tokens with multiple numbers.
Here was my attempt:
tokens = ['112233', '223344']
key = '10000-000'
environment = 'prod'
clients = [Client(tokens=token, key=key, environment=environment) for token in tokens]
lookup_api = [list(clients=x).validations for x in clients]
results = [lookup_api.list(number=x) for x in numbers]
for result in results:
    if result['status'] == 200:
        print(result['data'])
    else:
        print("An error occurred." + str(result['status']))
        print(result['data'])
Any suggestion or help would be greatly appreciated!
list(clients=x) isn't valid syntax.
If you want to call the list method of client.validations as you did before, you'd want this:
results = [c.validations.list(number=n) for n in numbers for c in clients]
Otherwise, use a regular loop:
for c in clients:
    for n in numbers:
        result = c.validations.list(number=n)
        status = result['status']
        data = result['data']
        if status != 200:
            print("An error occurred. " + str(status))
        print(data)
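The nested loop can also be flattened with itertools.product, which pairs every client with every number. The `Client` below is a minimal stub standing in for the real SDK (whose internals the question doesn't show), just to make the shape of the loop concrete:

```python
from itertools import product

class Client:
    """Minimal stand-in for the SDK client in the question."""
    def __init__(self, token, key, environment):
        self.token = token
        self.validations = self   # so c.validations.list(...) works as in the question

    def list(self, number):
        # The real call would hit the API; the stub just echoes its inputs.
        return {'status': 200, 'data': f'{self.token}:{number}'}

tokens = ['112233', '223344']
numbers = ['555-0100', '555-0101']
clients = [Client(t, key='10000-000', environment='prod') for t in tokens]

# One result per (client, number) pair: 2 tokens x 2 numbers = 4 calls.
results = [c.validations.list(number=n) for c, n in product(clients, numbers)]
```

product(clients, numbers) yields the pairs in the same order as the nested loop above, so the two spellings are interchangeable.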

Apply a Function to a Dictionary in Python

I have an if/elif check to determine whether or not a website exists. It works, but it is incredibly slow. I'd like to create a dictionary and apply the check to my list of URLs, ideally getting the outputs into a new table listing each URL and the result of the check.
I'm creating a dictionary for the potential outputs outlined in the elif statement posted below:
check = {401: 'web site exists, permission needed', 404: 'web site does not exist'}
for row in df['sp_online']:
    r = requests.head(row)
    if r.status_code == 401:
        print('web site exists, permission needed')
    elif r.status_code == 404:
        print('web site does not exist')
    else:
        print('other')
How can I get the results of the confirmation function to show each url's result as a new column in the dataframe?
I think you should try a thread or multiprocessing approach: instead of requesting one site at a time, you can pool n websites and wait for their responses. With ThreadPool you can achieve this with a few extra lines. Hope this is of use to you!
import requests
from multiprocessing.pool import ThreadPool

list_sites = ['https://www.wikipedia.org/', 'https://youtube.com', 'https://my-site-that-does-not-exist.com.does.not']

def get_site_status(site):
    try:
        response = requests.get(site)
    except requests.exceptions.ConnectionError:
        print("Connection refused")
        return 1
    if response.status_code == 401:
        print('web site exists, permission needed')
    elif response.status_code == 404:
        print('web site does not exist')
    else:
        print('other')
    return 0

pool = ThreadPool(processes=len(list_sites))  # one worker per site, so the requests run concurrently
results = pool.map_async(get_site_status, list_sites)
print('Results: {}'.format(results.get()))
I think you are looking for Series.map:
df = pd.DataFrame({'status': [401, 404, 500]})
check = {401:'web site exists, permission needed', 404:'web site does not exist'}
print(df['status'].map(check))
prints
0    web site exists, permission needed
1               web site does not exist
2                                   NaN
Name: status, dtype: object
Assign to a new column in the normal way
df['new_col'] = df['status'].map(check)
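The two answers combine naturally (a sketch; `status_of` and the column names are illustrative, not from the original): fetch each status code once, then translate it with Series.map, using fillna for the 'other' case the elif handled.

```python
import pandas as pd
import requests

check = {401: 'web site exists, permission needed', 404: 'web site does not exist'}

def status_of(url):
    # One HEAD request per URL; any network failure maps to None (-> NaN -> 'other').
    try:
        return requests.head(url, timeout=5).status_code
    except requests.exceptions.RequestException:
        return None

# The request step, commented out since it needs live URLs:
# df['status'] = df['sp_online'].map(status_of)

# The translation step on its own, with sample status codes:
df = pd.DataFrame({'status': [401, 404, 500]})
df['result'] = df['status'].map(check).fillna('other')
```

Codes missing from the dictionary (500 here) become NaN under map, which fillna then turns into 'other', matching the else branch of the original check.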

how to send an array in python post data

I am trying to send an array as postData = {'WIFI_CLONE', 'keyword#2'} in a Python POST request as follows, and I'm running into the exception "too many values to unpack". How do I fix this?
def AddKeywordToProblem(self, problemID=None, keyword=""):
    if self._checklogin():
        problemID = '37040553'
        postData = ['WIFI_CLONE', 'keyword#2']
        logger.info(postData)
        r = requests.post(self._baseurl + 'problems/' + problemID + '/keywords',
                          headers=self._headers, data=postData, timeout=DEFAULT_REQUESTS_TIMEOUT)
        if r.status_code != 201:
            logger.warning('Error: Unable to get data. Server came back with:')
            logger.warning(r.text)
            return False
        return r.json()
Exception
too many values to unpack
Serialize the list to JSON before sending it: requests.post(..., data=json.dumps(postData))
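Some context on why the original fails (the request line is left commented out since it needs the class from the question): requests treats a plain list passed to `data=` as a sequence of key/value pairs, so a flat list of strings makes it try to unpack each string as a 2-tuple, which raises "too many values to unpack". Serializing the list to JSON sends it as a single body instead:

```python
import json

postData = ['WIFI_CLONE', 'keyword#2']
body = json.dumps(postData)   # a single JSON string, not a pair sequence

# r = requests.post(self._baseurl + 'problems/' + problemID + '/keywords',
#                   headers=self._headers, data=body,
#                   timeout=DEFAULT_REQUESTS_TIMEOUT)
```

If the server expects JSON, `json=postData` does the serialization and sets the Content-Type header in one step.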

Super-performatic comparison

I have a Python script that recovers information from an HTTP API using the requests module. This code is run over and over again with an interval of a few milliseconds between calls.
The HTTP API which I'm calling can send me 3 different responses, which can be:
text 'EMPTYFRAME' with HTTP status 200
text 'CAMERAUNAVAILABLE' with HTTP status 200
JPEG image with HTTP status 200
This is part of the code which handles this situation:
try:
    r = requests.get(url,
                     auth=(username, pwd),
                     params={
                         'camera': camera_id,
                         'ds': int((datetime.now() - datetime.utcfromtimestamp(0)).total_seconds())
                     })
    if r.text == 'CAMERAUNAVAILABLE':
        raise CameraManager.CameraUnavailableException()
    elif r.text == 'EMPTYFRAME':
        raise CameraManager.EmptyFrameException()
    else:
        return r.content
except ConnectionError:
    pass  # handles the error - not important here
The critical part is the if/elif/else section: this comparison takes far too long to complete. If I remove it entirely and simply return r.content, I get the performance I want, but checking for these two responses other than the image is important for the application flow.
I also tried this:
if len(r.text) == len('CAMERAUNAVAILABLE'):
    raise CameraManager.CameraUnavailableException()
elif len(r.text) == len('EMPTYFRAME'):
    raise CameraManager.EmptyFrameException()
else:
    return r.content
And:
if r.text[:17] == 'CAMERAUNAVAILABLE':
    raise CameraManager.CameraUnavailableException()
elif r.text[:10] == 'EMPTYFRAME':
    raise CameraManager.EmptyFrameException()
else:
    return r.content
That made it faster, but still not as fast as I think this can get.
So, is there a way to optimize this comparison?
EDIT
With the accepted answer, the final code looks like this:
if r.headers['content-type'] == 'image/jpeg':
    return r.content
elif len(r.text) == len('CAMERAUNAVAILABLE'):
    raise CameraManager.CameraUnavailableException()
elif len(r.text) == len('EMPTYFRAME'):
    raise CameraManager.EmptyFrameException()
Checking the response's Content-Type header provided a much faster way to confirm an image was received.
Comparing the whole r.text (which may contain the JPEG bytes decoded as text) is probably what's slow.
You could instead compare the Content-Type header the server should set:
ct = r.headers['content-type']
if ct == "text/plain":
    ...  # check for CAMERAUNAVAILABLE or EMPTYFRAME
else:
    ...  # this is a JPEG
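The reason this is fast: reading the header never touches the response body, while every access to r.text decodes the entire payload (possibly a whole JPEG) into a string. The dispatch logic can be isolated into a small pure function (a sketch; the names and return labels are illustrative):

```python
def dispatch(content_type, body):
    # Decide on the Content-Type header alone; only decode the body
    # when it is known to be one of the two short text responses.
    if content_type == 'image/jpeg':
        return 'frame'
    text = body.decode('utf-8', errors='replace')
    if text == 'CAMERAUNAVAILABLE':
        return 'unavailable'
    if text == 'EMPTYFRAME':
        return 'empty'
    return 'other'

# In the handler: dispatch(r.headers['content-type'], r.content)
```

The JPEG branch returns before any decoding happens, so the common case costs one string comparison on a short header value.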