How to iterate through a DataFrame and use the values in requests? - python

Amazing Pythoners,
I am hoping to get some help with the scenario below.
I have a list of centers, and I want to extract employee data for each center. Earlier I was using the method below and it was working beautifully.
row[0] in the CSV file held the whole URL, which looked something like this:
https://api.test.com/v1/centers/96d901bd-2fcc-4f59-91d7-de18f0b0aa90/employees?page=1&size=100
import csv
import requests

FilePath = open("Employees.csv")
CSV_File = csv.reader(FilePath)
header = next(CSV_File)
for row in CSV_File:
    url2 = row[0]
    CenterCode = row[1]
    try:
        payload = {}
        headers = {'Authorization': 'apikey'}
        response = requests.request("GET", url2, headers=headers, data=payload)
        EmployeesData = response.json()
        for i in EmployeesData['employees']:
            print(i['employee_id'], CenterCode, sep=',')
    except Exception as e:
        print(e)
import requests
import pandas as pd
import json

## AA is my DataFrame
AA = AA[['id', 'code']]
#print(AA)
CID = AA['id']
CID2 = CID.to_string(index=False)
#print(CID2)

for index in range(len(AA)):
    #print(AA.loc[index, 'id'], AA.loc[index, 'code'])
    try:
        url2 = f"https://api.test.com/v1/centers/{CID2}/employees?page=1&size=100"
        print(url2)
        payload = {}
        files = []
        headers = {'Authorization': 'apikey'}
        response = requests.request("GET", url2, headers=headers, data=payload, files=files)
        data = response.json()
        print('Employee Guid', '|', 'Employee Code', '|', CID2)
    except Exception as e:
        print(e)
I have now included the URL in the new code above and replaced only the Center ID using an f-string. I am extracting the Center ID from a pandas DataFrame. However, when I run the code I get the error "Expecting value: line 1 column 1 (char 0)". I guessed that it must be due to the URL, so I printed the URL and found the result below.
Output:
https://api.zenoti.com/v1/centers/ee2395cb-e714-41df-98d2-66a69d38c556
96d901bd-2fcc-4f59-91d7-de18f0b0aa90/employees?page=1&size=100
Expecting value: line 1 column 1 (char 0)
https://api.zenoti.com/v1/centers/ee2395cb-e714-41df-98d2-66a69d38c556
96d901bd-2fcc-4f59-91d7-de18f0b0aa90/employees?page=1&size=100
Expecting value: line 1 column 1 (char 0)
[Finished in 4.6s]
What is happening in the above output: I have two rows to test my code, each containing a unique Center ID, but both IDs are being joined together and substituted into the URL by the f-string, hence the error šŸ˜“
Any suggestions on what could be done differently here?
Thanks in advance.

If I understand correctly, the Center ID is this: CID = AA['id']
Try iterating through the id column this way:
for CID2 in AA['id']:
    try:
        url2 = f"https://api.test.com/v1/centers/{CID2}/employees?page=1&size=100"
        print(url2)
    except Exception as e:
        print(e)
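If you also need the center code from the same row, as the original CSV version did with row[1], DataFrame.itertuples() yields both columns at once. A minimal sketch, assuming the same AA DataFrame (with 'id' and 'code' columns) and a placeholder API key:

import requests

# iterate over rows of AA; each row is a namedtuple with .id and .code
for row in AA.itertuples(index=False):
    url2 = f"https://api.test.com/v1/centers/{row.id}/employees?page=1&size=100"
    response = requests.get(url2, headers={'Authorization': 'apikey'})  # placeholder key
    for emp in response.json().get('employees', []):
        print(emp['employee_id'], row.code, sep=',')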

Related

Timestring passed into URL to output JSON file - Python API Call

I'm getting the following error for my python scraper:
import requests
import json

symbol_id = 'COINBASE_SPOT_BTC_USDT'
time_start = '2022-11-20T17:00:00'
time_end = '2022-11-21T05:00:00'
limit_levels = 100000000
limit = 100000000

url = 'https://rest.coinapi.io/v1/orderbooks/{symbol_id}/history?time_start={time_start}limit={limit}&limit_levels={limit_levels}'
headers = {'X-CoinAPI-Key': 'XXXXXXXXXXXXXXXXXXXXXXX'}
response = requests.get(url, headers=headers)
print(response)

with open('raw_coinbase_ob_history.json', 'w') as json_file:
    json.dump(response.json(), json_file)

with open('raw_coinbase_ob_history.json', 'r') as handle:
    parsed = json.load(handle)

with open('coinbase_ob_history.json', 'w') as coinbase_ob:
    json.dump(parsed, coinbase_ob, indent=4)
<Response [400]>
And my written JSON file contains:
{"error": "Wrong format of 'time_start' parameter."}
I assumed a string goes into a URL, so I flattened the timestamp to a string. I don't understand why this doesn't work. This is the documentation for the CoinAPI call I'm trying to make with the time string: https://docs.coinapi.io/?python#historical-data-get-4
Incorrect syntax for Python. To concatenate strings you can stick them together like this:
a = 'a' + 'b' + 'c'
but your string formatting is invalid: without the f prefix, placeholders like {symbol_id} are never substituted and get sent literally. You also need a & between the different URL params.
# python3
url = f"https://rest.coinapi.io/v1/orderbooks/{symbol_id}/history?time_start={time_start}&limit={limit}&limit_levels={limit_levels}"
# python 2
url = "https://rest.coinapi.io/v1/orderbooks/{symbol_id}/history?time_start={time_start}&limit={limit}&limit_levels={limit_levels}".format(symbol_id=symbol_id, time_start=time_start, limit=limit, limit_levels=limit_levels)
https://docs.python.org/3/tutorial/inputoutput.html
https://docs.python.org/2/tutorial/inputoutput.html
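As a side note, requests can build the query string for you via its params argument, which avoids this class of bug entirely. A minimal sketch, assuming the same variables as above:

import requests

# requests URL-encodes the values and joins them with '&' for you
url = f'https://rest.coinapi.io/v1/orderbooks/{symbol_id}/history'
params = {'time_start': time_start, 'limit': limit, 'limit_levels': limit_levels}
response = requests.get(url, params=params, headers=headers)
print(response.status_code)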

Pass row data from CSV to API and append results to an empty CSV column

I'm attempting to write code in Python that will take the rows from a CSV file and pass them to an API call. If a successful match is returned, I'd like to append yes to the match column that I added. If no data is returned, append no instead.
This is the current code to return the matching results of the first row:
headers = {
    'Authorization': {token},
    'Content-Type': 'application/json; charset=utf-8',
}
data = '[{"name": "Company 1", "email_domain": "email1.com", "url": "https://www.url1.com"}]'
response = requests.post(
    'https://{base_url}/api/match',
    headers=headers,
    data=data
)
This code works for each row if I manually pass the data into the API call, but since there are hundreds of rows, I'd like to iterate through each row, pass it to the API call, and append yes or no to the match column that I created. Writing for loops is not my strong suit, but I believe that's the way to attack this, and I would love input from anyone who has done something similar.
Since you want to iterate over your CSV file, you will have to use a for loop. csv.DictReader converts a CSV file into a sequence of dictionaries, which is exactly what you need:
import csv
import json

with open('filename.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        data = json.dumps([row])  # json.dumps, not str([row]), so the payload is valid JSON
alternatively you can use pandas.DataFrame.to_json(orient="index")
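For completeness, a minimal sketch of that pandas alternative, assuming the same hypothetical filename.csv; note that orient="records" produces a JSON list of row objects, which matches the [...] payload shape this API expects, while orient="index" keys each row by its index instead:

import pandas as pd

df = pd.read_csv('filename.csv')
data = df.to_json(orient='records')  # e.g. '[{"name": "Company 1", ...}, ...]'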
Assuming you have a src.csv file with the following contents:
company_id,company_name,url,company_domain,match
1,Company 1,https://www.url1.com,email1.com,
2,Company 2,https://www.url2.com,email2.io,
3,Company 3,https://www.url3.com,email3.com,
The following code snippet will read it and create a new tgt.csv file with the match column of each row set to yes or no, based on the result of requests.post() (you will need to adjust it for your logic, though):
import csv
import json
import requests

token = 'Your API Token Here'
base_url = 'https://some.base.url'

headers = {
    'Authorization': token,
    'Content-Type': 'application/json; charset=utf-8',
}

with open('src.csv') as src, open('tgt.csv', 'w', newline='') as tgt:
    reader = csv.reader(src)
    writer = csv.writer(tgt)
    columns = next(reader)
    writer.writerow(columns)
    for company_id, company_name, url, company_domain, match in reader:
        data = json.dumps([{
            'name': company_name,
            'email_domain': company_domain,
            'url': url
        }])
        response = requests.post(
            f'{base_url}/api/match',
            headers=headers,
            data=data
        )
        if response.ok:
            match = 'yes'
        else:
            match = 'no'
        writer.writerow((company_id, company_name, url, company_domain, match))

Request Status Code 500 when running Python Script

This is what I am supposed to do:
List all files in the data/feedback folder
Scan all the files and build a nested dictionary with Title, Name, Date & Feedback (each file has Title, Name, Date & Feedback on separate lines, which is why I use the rstrip function)
Post the dictionary to the given URL
Following is my code:
#!/usr/bin/env python3
import os
import os.path
import requests
import json

src = '/data/feedback/'
entries = os.listdir(src)
Title, Name, Date, Feedback = 'Title', 'Name', 'Date', 'Feedback'
inputDict = {}

for i in range(len(entries)):
    fileName = entries[i]
    completeName = os.path.join(src, fileName)
    with open(completeName, 'r') as f:
        line = f.readlines()
        line_tuple = (line[0], line[1], line[2], line[3])
        inputDict[fileName] = {}
        inputDict[fileName][Title] = line_tuple[0].rstrip()
        inputDict[fileName][Name] = line_tuple[1].rstrip()
        inputDict[fileName][Date] = line_tuple[2].rstrip()
        inputDict[fileName][Feedback] = line_tuple[3].rstrip()

x = requests.get("http://website.com/feedback")
print(x.status_code)
r = requests.post("http://Website.com/feedback", data=inputDict)
print(r.status_code)
After I run it, the GET returns a 200 status code but the POST returns 500.
I just want to know whether my script is causing the error or not.
r = requests.post("http://Website.com/feedback", data=inputDict)
If your REST API endpoint is expecting JSON data, then the line above is not sending that; it is sending the dictionary inputDict form-encoded, as though you were submitting a form on an HTML page.
You can either use the json parameter in the post function, which sets the content-type in the headers to application/json:
r = requests.post ("http://Website.com/feedback", json=inputDict)
or set the header manually:
headers = {'Content-type': 'application/json'}
r = requests.post("http://Website.com/feedback", data=json.dumps(inputDict), headers=headers)

passing value from pandas dataframe to http request

I'm not sure how I should ask this question. I'm looping through a CSV file using pandas (at least I think so). As I loop through the rows, I want to pass a value from a specific column to an HTTP request for each row.
Here is my code so far:
def api_request(request):
    fs = gcsfs.GCSFileSystem(project=PROJECT)
    with fs.open('gs://project.appspot.com/file.csv') as f:
        df = pd.read_csv(f)
        value = df[['ID']].to_string(index=False)
        print(value)
        response = requests.get(
            REQUEST_URL + value,
            headers={'accept': 'application/json', 'ClientToken': TOKEN}
        )
        json_response = response.json()
        print(json_response)
As you can see, I'm going through the CSV file to get the ID and pass it to my request URL.
I'm not sure I understand the issue, but looking at the console log it seems that print(value) is in the loop while the request is not. In other words, in the console log I see all the IDs printed, but only one HTTP request, which is empty (probably because the ID is not correctly passed to it).
I'm running my script with cloud functions.
Actually, forgo the Pandas library and simply iterate with the csv module:
import csv

def api_request(request):
    fs = gcsfs.GCSFileSystem(project=PROJECT)
    with fs.open('gs://project.appspot.com/file.csv') as f:
        reader = csv.reader(f)
        next(reader, None)  # SKIP HEADERS
        for row in reader:  # LOOP THROUGH GENERATOR (NOT PANDAS SERIES)
            value = row[0]  # SELECT FIRST COLUMN (ASSUMED ID)
            response = requests.get(
                REQUEST_URL + value,
                headers={'accept': 'application/json', 'ClientToken': TOKEN}
            )
            json_response = response.json()
            print(json_response)
Give this a try instead:
def api_request(request):
    fs = gcsfs.GCSFileSystem(project=PROJECT)
    with fs.open('gs://project.appspot.com/file.csv') as f:
        df = pd.read_csv(f)
        for value in df['ID']:
            response = requests.get(
                REQUEST_URL + value,
                headers={'accept': 'application/json', 'ClientToken': TOKEN}
            )
            json_response = response.json()
            print(json_response)
As mentioned in my comment, you haven't actually iterated through the data. What you are seeing is just the string representation of the whole column, line breaks included (which may be why you mistakenly thought you were looping).
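To make the difference concrete, a minimal sketch with a toy DataFrame (hypothetical values):

import pandas as pd

df = pd.DataFrame({'ID': ['abc', 'def']})
print(df[['ID']].to_string(index=False))  # one multi-line string, header included
for value in df['ID']:                    # two separate values: 'abc', then 'def'
    print(value)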

Check response using urllib2

I am trying to access pages by incrementing a page counter using the opencorporates API. The problem is that sometimes there is no useful data. For example, for jurisdiction_code = ae_az I get a webpage showing just this:
{"api_version":"0.2","results":{"companies":[],"page":1,"per_page":26,"total_pages":0,"total_count":0}}
which is technically empty. How do I check for such data and skip over it to move on to the next jurisdiction?
This is my code
import urllib2
import json, os

f = open('codes', 'r')
for line in f.readlines():
    id = line.strip('\n')
    url = 'http://api.opencorporates.com/v0.2/companies/search?q=&jurisdiction_code={0}&per_page=26&current_status=Active&page={1}?api_token=ab123cd45'
    i = 0
    directory = id
    os.makedirs(directory)
    while True:
        i += 1
        req = urllib2.Request(url.format(id, i))
        print url.format(id, i)
        try:
            response = urllib2.urlopen(url.format(id, i))
        except urllib2.HTTPError, e:
            break
        content = response.read()
        fo = str(i) + '.json'
        OUTFILE = os.path.join(directory, fo)
        with open(OUTFILE, 'w') as f:
            f.write(content)
Interpret the response you get back (you already know it's JSON) and check whether the data you want is there.
...
content = response.read()
data = json.loads(content)
if not data.get('results', {}).get('companies'):
    break
...
Here's your code rewritten with Requests, using the answer here. It is nowhere near as robust or clean as it should be, but it demonstrates the path you might want to take. The rate limit is a guess and doesn't seem to work. Remember to put in your actual API key.
import json
import os
from time import sleep

import requests

url = 'http://api.opencorporates.com/v0.2/companies/search'
token = 'ab123cd45'
rate = 20  # seconds to wait after being rate limited

with open('codes') as f:
    codes = [l.strip('\n') for l in f]

def get_page(code, page, **kwargs):
    params = {
        # 'api_token': token,
        'jurisdiction_code': code,
        'page': page,
    }
    params.update(kwargs)
    while True:
        r = requests.get(url, params=params)
        try:
            data = r.json()
        except ValueError:
            return None
        if 'error' in data:
            print data['error']['message']
            sleep(rate)
            continue
        return data['results']

def dump_page(code, page, data):
    with open(os.path.join(code, str(page) + '.json'), 'w') as f:
        json.dump(data, f)

for code in codes:
    try:
        os.makedirs(code)
    except os.error:
        pass
    data = get_page(code, 1)
    if data is None:
        continue
    dump_page(code, 1, data['companies'])
    for page in xrange(1, int(data.get('total_pages', 1))):
        data = get_page(code, page)
        if data is None:
            break
        dump_page(code, page, data['companies'])
I think this example is actually not "technically empty." It contains data and is therefore technically not empty; the data just does not include any fields that are useful to you. :-)
If you want your code to skip over responses with uninteresting data, just check whether the JSON has the necessary fields before writing anything:
content = response.read()
try:
    json_content = json.loads(content)
    if json_content['results']['total_count'] > 0:
        fo = str(i) + '.json'
        OUTFILE = os.path.join(directory, fo)
        with open(OUTFILE, 'w') as f:
            f.write(content)
except KeyError:
    break
except ValueError:
    break
etc. You might want to report the ValueError or the KeyError, but that's up to you.
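If you do want to report them, a minimal sketch of those two except clauses (Python 2 syntax, to match the code above):

except KeyError as e:
    print 'missing expected field:', e
    break
except ValueError as e:
    print 'response was not valid JSON:', e
    break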
