Change API call in for loop - python

I am trying to call several API datasets within a for loop, changing the call each time, and then append those datasets together into a larger dataframe.
I have written the code below, which fetches the first dataset fine but then raises this error on the next call:
`url = base + "max=" + maxrec + "&" "type=" + item + "&" + "freq=" + freq + "&" + "px=" +px + "&" + "ps=" + str(ps) + "&" + "r="+ r + "&" + "p=" + p + "&" + "rg=" +rg + "&" + "cc=" + cc + "&" + "fmt=" + fmt
TypeError: must be str, not Response`
Here is my current code:
import requests
import pandas as pd

base = "http://comtrade.un.org/api/get?"
maxrec = "50000"
item = "C"
freq = "A"
px = "H0"
ps = "all"
r = "all"
p = "0"
rg = "2"
cc = "AG2"
fmt = "json"

comtrade = pd.DataFrame(columns=[])
for year in range(1991, 2018):
    ps = "{}".format(year)
    url = base + "max=" + maxrec + "&" "type=" + item + "&" + "freq=" + freq + "&" + "px=" + px + "&" + "ps=" + str(ps) + "&" + "r=" + r + "&" + "p=" + p + "&" + "rg=" + rg + "&" + "cc=" + cc + "&" + "fmt=" + fmt
    r = requests.get(url)
    x = r.json()
    new = pd.DataFrame(x["dataset"])
    comtrade = comtrade.append(new)

Let requests assemble the URL for you.
common_params = {
    "max": maxrec,
    "type": item,
    "freq": freq,
    # etc
}

for year in range(1991, 2018):
    # dict(common_params, ps=...) copies the shared params and adds this year's ps
    response = requests.get(base, params=dict(common_params, ps=str(year)))
    response_data = response.json()
    new = pd.DataFrame(response_data["dataset"])
    comtrade = comtrade.append(new)
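With params, requests URL-encodes each value and appends the whole thing to base as the query string, so none of the manual "&" stitching or str() conversion from the question is needed.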

Disclaimer: the other answer is correct and you should use it.
However, your actual problem stems from the fact that you are overwriting r here:
r = requests.get(url)
x = r.json()
During the next iteration, r is still that Response object, not the string you initialized it with in the first place. You could simply rename the variable to result to avoid that problem. Better to let the requests library do the work, though.
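For completeness, a minimal sketch of that rename, keeping the hand-built URL from the question:

for year in range(1991, 2018):
    ps = str(year)
    url = (base + "max=" + maxrec + "&type=" + item + "&freq=" + freq + "&px=" + px +
           "&ps=" + ps + "&r=" + r + "&p=" + p + "&rg=" + rg + "&cc=" + cc + "&fmt=" + fmt)
    result = requests.get(url)  # result, not r: the query parameter r keeps its string value
    new = pd.DataFrame(result.json()["dataset"])
    comtrade = comtrade.append(new)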

how to force the code to parse specific POST call response

I'm making three POST calls one by one and getting three responses. My code always takes the last response and passes it on. The question is how to force the code to parse the first or second response, not the last one.
This is my method with a for loop over range(3):
def create_card_on_the_board_call(self, id_list, card_count, card_name):
    global create_card
    for i in range(card_count):
        create_card = self.rest_api_helper.post_call(
            self.base_url + '/cards?' + 'key=' + test_config[
                'key'] + '&' + 'token=' + test_config[
                'token'] + '&' + f'idList={id_list}' + '&' + f'name={card_name}', headers=None)
    return create_card.json()
A response looks like this: received response {"id":"5f9dcffaa96f144d311deaa5","checkItemStates":[]}
I need the id from the first or second response, not the third. I also still need all three calls to happen.
This is how I'm calling the method:
card_count = 3
card_name = 'someCard'
create_card_on_the_board = api_board.create_card_on_the_board_call(
    id_list=create_list_on_the_board["id"], card_count=card_count, card_name=card_name)
Now my method looks like this:
def create_card_on_the_board_call(self, id_list, card_count, card_name):
    global create_card
    responses = []
    for i in range(card_count):
        create_card = self.rest_api_helper.post_call(
            self.base_url + '/cards?' + 'key=' + test_config[
                'key'] + '&' + 'token=' + test_config[
                'token'] + '&' + f'idList={id_list}' + '&' + f'name={card_name}', headers=None)
        responses.append(create_card)
    return responses[2].json()
With this I can select which card to return, but I still don't know how to use it as a proper argument in the next call, e.g.:
new_card_name = 'NewCardName2'
update_card = api_board.update_card_call(card_id=create_card_on_the_board['id'],
                                         new_card_name=new_card_name)

def update_card_call(self, card_id, new_card_name):
    update_card = self.rest_api_helper.put_call(
        self.base_url + '/cards/' + f'{card_id}' + '?' + 'key=' + test_config[
            'key'] + '&' + 'token=' + test_config[
            'token'] + '&' + f'name={new_card_name}', headers=None)
    return update_card.json()
Here I want to use a different create_card_on_the_board, but I don't know how; it still takes the same card as in return responses[2].json(). I tried card_id=create_card_on_the_board[1]['id'],
but that doesn't work, because the return value is not a list.
You could put all of the responses in a list so you could get them all and then decide what to do with them. Here's how you'd do that:
def create_card_on_the_board_call(self, id_list, card_count, card_name):
    responses = []
    for i in range(card_count):
        create_card = self.rest_api_helper.post_call(
            self.base_url + '/cards?' + 'key=' + test_config[
                'key'] + '&' + 'token=' + test_config[
                'token'] + '&' + f'idList={id_list}' + '&' + f'name={card_name}', headers=None)
        responses.append(create_card)
    # Logic here determines which response to return and sets x = 0, 1 or 2
    return responses[x].json()
So now you have a list named responses after your loop completes, and you can do whatever you want with it. The comment marks roughly the spot your question is asking about. You could also just return the whole list with return responses and let the caller decide which response to use.
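For example, a minimal sketch of the return-the-whole-list variant (same helper and config as above; the caller then picks any card):

def create_card_on_the_board_call(self, id_list, card_count, card_name):
    responses = []
    for i in range(card_count):
        create_card = self.rest_api_helper.post_call(
            self.base_url + '/cards?' + 'key=' + test_config['key'] +
            '&' + 'token=' + test_config['token'] +
            '&' + f'idList={id_list}' + '&' + f'name={card_name}', headers=None)
        responses.append(create_card)
    return [r.json() for r in responses]  # caller picks any card

cards = api_board.create_card_on_the_board_call(
    id_list=create_list_on_the_board["id"], card_count=3, card_name='someCard')
update_card = api_board.update_card_call(card_id=cards[1]['id'],  # second card this time
                                         new_card_name='NewCardName2')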

Automatic labeling of LDA generated topics

I'm trying to categorize customer feedback. I ran an LDA in Python and got the following output for 10 topics:
(0, u'0.559*"delivery" + 0.124*"area" + 0.018*"mile" + 0.016*"option" + 0.012*"partner" + 0.011*"traffic" + 0.011*"hub" + 0.011*"thanks" + 0.010*"city" + 0.009*"way"')
(1, u'0.397*"package" + 0.073*"address" + 0.055*"time" + 0.047*"customer" + 0.045*"apartment" + 0.037*"delivery" + 0.031*"number" + 0.026*"item" + 0.021*"support" + 0.018*"door"')
(2, u'0.190*"time" + 0.127*"order" + 0.113*"minute" + 0.075*"pickup" + 0.074*"restaurant" + 0.031*"food" + 0.027*"support" + 0.027*"delivery" + 0.026*"pick" + 0.018*"min"')
(3, u'0.072*"code" + 0.067*"gps" + 0.053*"map" + 0.050*"street" + 0.047*"building" + 0.043*"address" + 0.042*"navigation" + 0.039*"access" + 0.035*"point" + 0.028*"gate"')
(4, u'0.434*"hour" + 0.068*"time" + 0.034*"min" + 0.032*"amount" + 0.024*"pay" + 0.019*"gas" + 0.018*"road" + 0.017*"today" + 0.016*"traffic" + 0.014*"load"')
(5, u'0.245*"route" + 0.154*"warehouse" + 0.043*"minute" + 0.039*"need" + 0.039*"today" + 0.026*"box" + 0.025*"facility" + 0.025*"bag" + 0.022*"end" + 0.020*"manager"')
(6, u'0.371*"location" + 0.110*"pick" + 0.097*"system" + 0.040*"im" + 0.038*"employee" + 0.022*"evening" + 0.018*"issue" + 0.015*"request" + 0.014*"while" + 0.013*"delivers"')
(7, u'0.182*"schedule" + 0.181*"please" + 0.059*"morning" + 0.050*"application" + 0.040*"payment" + 0.026*"change" + 0.025*"advance" + 0.025*"slot" + 0.020*"date" + 0.020*"tomorrow"')
(8, u'0.138*"stop" + 0.110*"work" + 0.062*"name" + 0.055*"account" + 0.046*"home" + 0.043*"guy" + 0.030*"address" + 0.026*"city" + 0.025*"everything" + 0.025*"feature"')
Is there a way to automatically label them? I do have a CSV file with manually labeled feedback, but I do not want to supply those labels myself. I want the model to create the labels. Is that possible?
The comments here link to another SO answer that links to a paper. Let's say you wanted to do the minimum to try to make this work. Here is an MVP-style solution that has worked for me: search Google for the terms, then look for keywords in the response.
Here is some working, though hacky, code:
pip install cssselect
then
import re
from collections import Counter

from lxml.html import fromstring
from requests import get

def get_srp_text(search_term):
    # fetch the Google search results page for the supplied terms
    raw = get(f"https://www.google.com/search?q={search_term}").text
    page = fromstring(raw)
    blob = ""
    for result in page.cssselect("a"):
        for res in result.findall("div"):
            blob += ' '
            blob += res.text if res.text else " "
            blob += ' '
    return blob

def blob_cleaner(blob):
    # replace separator punctuation with spaces, then keep only
    # alphanumeric and whitespace characters
    clean_blob = re.sub(r'[/(),:_\-]', ' ', blob)
    return ''.join(e for e in clean_blob if e.isalnum() or e.isspace())

def get_name_from_srp_blob(clean_blob):
    # drop short tokens, then name the topic after the two most common ones
    blob_tokens = list(filter(bool, map(lambda x: x if len(x) > 2 else '', clean_blob.split(' '))))
    c = Counter(blob_tokens)
    most_common = c.most_common(10)
    name = f"{most_common[0][0]}-{most_common[1][0]}"
    return name

pipeline = lambda x: get_name_from_srp_blob(blob_cleaner(get_srp_text(x)))
Then you can just get the topic words from your model, e.g.
topic_terms = "delivery area mile option partner traffic hub thanks city way"
name = pipeline(topic_terms)
print(name)
>>> City-Transportation
and
topic_terms = "package address time customer apartment delivery number item support door"
name = pipeline(topic_terms)
print(name)
>>> Parcel-Package
You could improve this a lot. For example, you could use POS tags to keep only the most common nouns, then use those for the name (see the sketch just below). Or find the most common adjective and noun, and make the name "Adjective Noun". Even better, you could fetch the text from the linked sites, then run YAKE to extract keywords.
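As a rough illustration of the POS-tag idea (a sketch assuming NLTK and its tagger data are installed; get_noun_name_from_blob is a hypothetical replacement for get_name_from_srp_blob above):

import nltk
from collections import Counter

# one-time setup: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

def get_noun_name_from_blob(clean_blob):
    # keep only noun tokens (tags NN, NNS, ...), then name the topic
    # after the two most common ones
    tokens = [t for t in clean_blob.split() if len(t) > 2]
    nouns = [word for word, tag in nltk.pos_tag(tokens) if tag.startswith('NN')]
    most_common = Counter(nouns).most_common(2)
    return f"{most_common[0][0]}-{most_common[1][0]}"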
Regardless, this demonstrates a simple way to automatically name clusters, without directly using machine learning (though, Google is most certainly using it to generate the search results, so you are benefitting from it).

preserving text structure information - pyparsing

Using pyparsing, is there a way to extract the context you are in during recursive descent? Let me explain what I mean. I have the following code:
import pyparsing as pp

openBrace = pp.Suppress(pp.Literal("{"))
closeBrace = pp.Suppress(pp.Literal("}"))
ident = pp.Word(pp.alphanums + "_" + ".")
comment = pp.Literal("//") + pp.restOfLine
messageName = ident
messageKw = pp.Suppress(pp.Keyword("msg"))
text = pp.Word(pp.alphanums + "_" + "." + "-" + "+")
otherText = ~messageKw + pp.Suppress(text)
messageExpr = pp.Forward()
messageExpr << (messageKw + messageName + openBrace +
                pp.ZeroOrMore(otherText) + pp.ZeroOrMore(messageExpr) +
                pp.ZeroOrMore(otherText) + closeBrace).ignore(comment)
testStr = "msg msgName1 { some text msg msgName2 { some text } some text }"
print(messageExpr.parseString(testStr))
which produces this output: ['msgName1', 'msgName2']
In the output, I would like to keep track of the structure of embedded matches. For example, with the test string above I would like the following output: ['msgName1', 'msgName1.msgName2'], to keep track of the hierarchy in the text. However, I am new to pyparsing and have yet to find a way to extract the fact that "msgName2" is embedded in the structure of "msgName1".
Is there a way to use the setParseAction() method of ParserElement to do this, or maybe a way using results names?
Helpful advice would be appreciated.
Thanks to Paul McGuire for his sage advice. Here are the additions/changes I made, which solved the problem:
msgNameStack = []

def pushMsgName(str, loc, tokens):
    msgNameStack.append(tokens[0])
    tokens[0] = '.'.join(msgNameStack)

def popMsgName(str, loc, tokens):
    msgNameStack.pop()

closeBrace = pp.Suppress(pp.Literal("}")).setParseAction(popMsgName)
messageName = ident.setParseAction(pushMsgName)
And here is the complete code:
import pyparsing as pp

msgNameStack = []

def pushMsgName(str, loc, tokens):
    # on every message name, record the full dotted path down to this name
    msgNameStack.append(tokens[0])
    tokens[0] = '.'.join(msgNameStack)

def popMsgName(str, loc, tokens):
    # leaving a message body: drop its name from the stack
    msgNameStack.pop()

openBrace = pp.Suppress(pp.Literal("{"))
closeBrace = pp.Suppress(pp.Literal("}")).setParseAction(popMsgName)
ident = pp.Word(pp.alphanums + "_" + ".")
comment = pp.Literal("//") + pp.restOfLine
messageName = ident.setParseAction(pushMsgName)
messageKw = pp.Suppress(pp.Keyword("msg"))
text = pp.Word(pp.alphanums + "_" + "." + "-" + "+")
otherText = ~messageKw + pp.Suppress(text)
messageExpr = pp.Forward()
messageExpr << (messageKw + messageName + openBrace +
                pp.ZeroOrMore(otherText) + pp.ZeroOrMore(messageExpr) +
                pp.ZeroOrMore(otherText) + closeBrace).ignore(comment)
testStr = "msg msgName1 { some text msg msgName2 { some text } some text }"
print(messageExpr.parseString(testStr))
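With the stack-based parse actions in place, the same test string should now print the qualified names: ['msgName1', 'msgName1.msgName2']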

Adding the values of two strings using Python and XML path

My script writes wallTime and setupwalltime into a .dat file, which has the following format:
24000 4 0
81000 17 0
192000 59 0
648000 250 0
1536000 807 0
3000000 2144 0
6591000 5699 0
I would like to know how to add the two values (i.e. wallTime and setupwalltime) together. Can someone give me a hint? I tried converting them to float, but it doesn't seem to work.
import libxml2
import os.path
from numpy import *
from cfs_utils import *

np = [1, 2, 3, 4, 5, 6, 7, 8]
n = [20, 30, 40, 60, 80, 100, 130]
solver = ["BiCGSTABL_iluk", "BiCGSTABL_saamg", "BiCGSTABL_ssor", "CG_iluk", "CG_saamg", "CG_ssor"]  # , "cholmod", "ilu"]
file_list = ["eval_BiCGSTABL_iluk_default", "eval_BiCGSTABL_saamg_default", "eval_BiCGSTABL_ssor_default", "eval_CG_iluk_default", "eval_CG_saamg_default", "eval_CG_ssor_default"]  # "simp_cholmod_solver_3D_evaluate", "simp_ilu_solver_3D_evaluate"]

for cnt_np in np:
    i = 0
    for sol in solver:
        # open write_file = "Graphs/" + "Np" + cnt_np + "/CG_iluk.dat"
        # "Graphs/Np1/CG_iluk.dat"
        write_file = open("Graphs/" + "Np" + str(cnt_np) + "/" + sol + ".dat", "w")
        print("Reading " + "Graphs/" + "Np" + str(cnt_np) + "/" + sol + ".dat" + "\n")
        # loop through different unknowns
        for cnt_n in n:
            # open file "cfs_calculations_" + cnt_n + "np" + cnt_np + "/" + file_list(i) + "_default.info.xml"
            read_file = "cfs_calculations_" + str(cnt_n) + "np" + str(cnt_np) + "/" + file_list[i] + ".info.xml"
            print("File list " + file_list[i] + " value of i " + str(i) + "\n")
            print("Reading " + " cfs_calculations_" + str(cnt_n) + "np" + str(cnt_np) + "/" + file_list[i] + ".info.xml")
            # read wall and cpu time and write
            if os.path.exists(read_file):
                doc = libxml2.parseFile(read_file)
                xml = doc.xpathNewContext()
                walltime = xpath(xml, "//cfsInfo/sequenceStep/OLAS/mechanic/solver/summary/solve/timer/@wall")
                setupwalltime = xpath(xml, "//cfsInfo/sequenceStep/OLAS/mechanic/solver/summary/setup/timer/@wall")
                # cputime = xpath(xml, "//cfsInfo/sequenceStep/OLAS/mechanic/solver/summary/solve/timer/@cpu")
                # setupcputime = xpath(xml, "//cfsInfo/sequenceStep/OLAS/mechanic/solver/summary/solve/timer/@cpu")
                unknowns = 3 * cnt_n * cnt_n * cnt_n
                write_file.write(str(unknowns) + "\t" + walltime + "\t" + setupwalltime + "\n")
                print("Writing_point" + str(unknowns) + "%f", float(setupwalltime))
                doc.freeDoc()
                xml.xpathFreeContext()
        write_file.close()
        i = i + 1
In Java you can add strings and floats. What I understand is that you need to add the values and then display them. That would work (stringifying the sum):
write_file.write(str(unknowns) + "\t" + str(float(walltime) + float(setupwalltime)) + "\n")
You are trying to add a str to a float. That doesn't work. If you want to use string concatenation, first coerce all of the values to str. Try this:
write_file.write(str(unknowns) + "\t" + str(float(walltime) + float(setupwalltime)) + "\n")
Or, perhaps more readably:
totalwalltime = float(walltime) + float(setupwalltime)
write_file.write("{}\t{}\n".format(unknowns, totalwalltime))

optimize python processing json retrieved from the fb-graph-api

I'm getting JSON data from the Facebook Graph API about:
my relationship with my friends
my friends' relationships with each other.
Right now my program looks like this (in Python pseudo-code; please note some variables have been changed for privacy):
import json
import requests

# protected
_accessCode = "someAccessToken"
_accessStr = "?access_token=" + _accessCode
_myID = "myIDNumber"

r = requests.get("https://graph.facebook.com/" + _myID + "/friends/" + _accessStr)
raw = json.loads(r.text)
terminate = len(raw["data"])

# list used to store the friend/friend relationships
a = list()

for j in range(0, terminate + 1):
    # calculate terminating displacement:
    term_displacement = terminate - (j + 1)
    print("Currently processing: " + str(j) + " of " + str(terminate))
    for dj in range(1, term_displacement + 1):
        # construct urls based on the raw data:
        url = "https://graph.facebook.com/" + raw["data"][j]["id"] + "/friends/" + raw["data"][j + dj]["id"] + "/" + _accessStr
        # visit site *THIS IS THE BOTTLENECK*:
        reqTemp = requests.get(url)
        rawTemp = json.loads(reqTemp.text)
        if len(rawTemp["data"]) != 0:
            # data dumps to list which dumps to file
            a.append(str(raw["data"][j]["id"]) + "," + str(rawTemp["data"][0]["id"]))

outputFile = "C:/Users/franklin/Documents/gen/friendsRaw.csv"
output = open(outputFile, "w")
# write all me/friend relationships to file
for k in range(0, terminate):
    output.write(_myID + "," + raw["data"][k]["id"] + "\n")
# write all friend/friend relationships to file
for i in range(0, len(a)):
    output.write(a[i])
output.close()
So what it's doing is: first it calls my page and gets my friend list (this is allowed through the Facebook API using an access_token). Calling a friend's friend list is NOT allowed, but I can work around that by requesting a relationship between a friend on my list and another friend on my list. So in part two (indicated by the double for loop) I'm making another request to see if some friend, a, is also a friend of b (both of which are on my list); if so, there will be a JSON object of length one with friend a's name.
But with about 357 friends there are literally thousands of page requests that need to be made; in other words, the program spends most of its time just waiting around for the JSON responses.
My question is: can this be rewritten to be more efficient? Currently, due to security restrictions, calling a friend's friend list attribute is disallowed, and it doesn't look like the API will allow it. Are there any Python tricks that can make this run faster? Maybe parallelism?
Update: modified code is pasted below in the answers section.
Update: this is the solution I came up with. Thanks @DMCS for the FQL suggestion, but I just decided to use what I had. I will post the FQL solution up when I get a chance to study the implementation. As you can see, this method just makes use of more condensed API calls.
Incidentally, for future reference, the API call limit is 600 calls per 600 seconds, per token and per IP, so for every unique IP address with a unique access token, the number of calls is limited to 1 call per second. I'm not sure what that means for asynchronous calling @Gerrat, but there is that.
import json
import requests

# protected
_accessCode = "someaccesscode"
_accessStr = "?access_token=" + _accessCode
_myID = "someidnumber"

r = requests.get("https://graph.facebook.com/"
                 + _myID + "/friends/" + _accessStr)
raw = json.loads(r.text)
terminate = len(raw["data"])
a = list()

for k in range(0, terminate - 1):
    friendID = raw["data"][k]["id"]
    friendName = raw["data"][k]["name"]
    url = ("https://graph.facebook.com/me/mutualfriends/"
           + friendID + _accessStr)
    req = requests.get(url)
    temp = json.loads(req.text)
    print("Processing: " + str(k + 1) + " of " + str(terminate))
    for j in range(0, len(temp["data"])):
        a.append(friendID + "," + temp["data"][j]["id"] + ","
                 + friendName + "," + temp["data"][j]["name"])

# dump contents to file:
outputFile = "C:/Users/franklin/Documents/gen/friendsRaw.csv"
output = open(outputFile, "w")
print("Dumping to file...")
# write all me/friend relationships to file
for k in range(0, terminate):
    output.write(_myID + "," + raw["data"][k]["id"]
                 + ",me," + str(raw["data"][k]["name"].encode("utf-8", "ignore")) + "\n")
# write all friend/friend relationships to file
for i in range(0, len(a)):
    output.write(str(a[i].encode("utf-8", "ignore")) + "\n")
output.close()
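Given that 1-call-per-second limit, a simple throttle (a sketch, not part of the original solution) could be dropped in front of each requests.get:

import time

def throttled_get(url, min_interval=1.0, _last=[0.0]):
    # block until at least min_interval seconds have passed since the previous call
    wait = _last[0] + min_interval - time.time()
    if wait > 0:
        time.sleep(wait)
    _last[0] = time.time()
    return requests.get(url)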
This isn't likely optimal, but I tweaked your code a bit to use Requests async method (untested):
import json
import requests
from requests import async

# protected
_accessCode = "someAccessToken"
_accessStr = "?access_token=" + _accessCode
_myID = "myIDNumber"

r = requests.get("https://graph.facebook.com/" + _myID + "/friends/" + _accessStr)
raw = json.loads(r.text)
terminate = len(raw["data"])

# list used to store the friend/friend relationships
a = list()

def add_to_list(reqTemp):
    rawTemp = json.loads(reqTemp.text)
    if len(rawTemp["data"]) != 0:
        # data dumps to list which dumps to file
        a.append(str(raw["data"][j]["id"]) + "," + str(rawTemp["data"][0]["id"]))

async_list = []
for j in range(0, terminate + 1):
    # calculate terminating displacement:
    term_displacement = terminate - (j + 1)
    print("Currently processing: " + str(j) + " of " + str(terminate))
    for dj in range(1, term_displacement + 1):
        # construct urls based on the raw data:
        url = "https://graph.facebook.com/" + raw["data"][j]["id"] + "/friends/" + raw["data"][j + dj]["id"] + "/" + _accessStr
        req = async.get(url, hooks={'response': add_to_list})
        async_list.append(req)

# gather up all the results
async.map(async_list)

outputFile = "C:/Users/franklin/Documents/gen/friendsRaw.csv"
output = open(outputFile, "w")
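As a side note, the standalone async module was later removed from requests (it lives on as the separate grequests package); a similar fan-out can be sketched with the standard library's concurrent.futures instead, reusing the names from the question:

from concurrent.futures import ThreadPoolExecutor

def fetch_pair(pair):
    # one friend/friend relationship request; returns a CSV row or None
    j, dj = pair
    url = ("https://graph.facebook.com/" + raw["data"][j]["id"] + "/friends/"
           + raw["data"][j + dj]["id"] + "/" + _accessStr)
    rawTemp = requests.get(url).json()
    if len(rawTemp["data"]) != 0:
        return str(raw["data"][j]["id"]) + "," + str(rawTemp["data"][0]["id"])
    return None

pairs = [(j, dj) for j in range(terminate) for dj in range(1, terminate - (j + 1) + 1)]
with ThreadPoolExecutor(max_workers=10) as pool:
    a = [row for row in pool.map(fetch_pair, pairs) if row is not None]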
