Loop and add function component as index - python

I would like to change the index in the following code. Instead of having 'close' as the index, I want the corresponding x from the loop. Sometimes, as in this example, even if I provide 4 symbols in curr only 3 are available, so I cannot simply assign the list as the index after looping because the sizes differ. I should add that even with set_index(x) the index remains 'close'. Thank you for your help.
The function daily_price_historical retrieves prices from a public API. It returns exactly 7 columns, from which I select the first one (close).
The function:
def daily_price_historical(symbol, comparison_symbol, all_data=False, limit=1, aggregate=1, exchange=''):
    url = 'https://min-api.cryptocompare.com/data/histoday?fsym={}&tsym={}&limit={}&aggregate={}'\
        .format(symbol.upper(), comparison_symbol.upper(), limit, aggregate)
    if exchange:
        url += '&e={}'.format(exchange)
    if all_data:
        url += '&allData=true'
    page = requests.get(url)
    data = page.json()['Data']
    df = pd.DataFrame(data)
    df.drop(df.index[-1], inplace=True)
    return df
The code:
curr = ['1WO', 'ABX', 'ADH', 'ALX']
d_price = []
for x in curr:
    try:
        close = daily_price_historical(x, 'JPY', exchange='CCCAGG').close
        d_price.append(close).set_index(x)
    except:
        pass
d_price = pd.concat(d_price, axis=1)
d_price = d_price.transpose()
print(d_price)
The output:
0
close 2.6100
close 0.3360
close 0.4843

The function daily_price_historical returns a DataFrame, so daily_price_historical(x, 'JPY', exchange='CCCAGG').close is a pandas Series. The label you see in the output is the Series' name, and you can change it with rename. So you want:
...
close = daily_price_historical(x, 'JPY', exchange='CCCAGG').close
d_price.append(close.rename(x))
...
In your original code, d_price.append(close).set_index(x) raised an AttributeError: 'NoneType' object has no attribute 'set_index', because append on a list returns None. That exception was then silently swallowed by the catch-all except: pass.
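A quick illustration (a throwaway example, not your actual data) of why the chained call fails:
s = pd.Series([2.61], name='close')
d_price = []
print(d_price.append(s))   # prints None: list.append mutates the list in place and returns nothing
print(d_price)             # the list itself now holds the Series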
What to remember from this: never use the very dangerous
try:
    ...
except:
    pass
which hides any error.
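If you do need to skip symbols that fail, a minimal sketch (assuming the requests-based call above) is to catch specific exceptions and report them instead of hiding everything:
d_price = []
for x in curr:
    try:
        close = daily_price_historical(x, 'JPY', exchange='CCCAGG').close
        d_price.append(close.rename(x))
    except (requests.RequestException, KeyError) as exc:
        # the symbol is skipped, but the reason stays visible
        print('skipping {}: {}'.format(x, exc))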

Try this small piece of code:
import pandas as pd
import requests

curr = ['1WO', 'ABX', 'ADH', 'ALX']

def daily_price_historical(symbol, comparison_symbol, all_data=False, limit=1, aggregate=1, exchange=''):
    url = 'https://min-api.cryptocompare.com/data/histoday?fsym={}&tsym={}&limit={}&aggregate={}'\
        .format(symbol.upper(), comparison_symbol.upper(), limit, aggregate)
    if exchange:
        url += '&e={}'.format(exchange)
    if all_data:
        url += '&allData=true'
    page = requests.get(url)
    data = page.json()['Data']
    df = pd.DataFrame(data)
    df.drop(df.index[-1], inplace=True)
    return df

d_price = []
labels_ind = []
for idx, x in enumerate(curr):
    try:
        close = daily_price_historical(x, 'JPY', exchange='CCCAGG').close
        d_price.append(close[0])
        labels_ind.append(x)
    except:
        pass

d_price = pd.DataFrame(d_price, columns=["0"])
d_price.index = labels_ind
print(d_price)
Output
0
1WO 2.6100
ADH 0.3360
ALX 0.4843

Related

Problem with for loop while querying PubAG API with python

Here is my code; I am trying to get 100 unique publications for each search term:
import requests
from time import sleep
import pandas as pd

search_terms = ['grapes', 'cotton', 'apple', 'onion', 'cucumber']
res_data = []
pag_data = []
for i in search_terms[0:5]:
    try:
        api_key = "DEMO_KEY"
        endpoint = 'https://api.nal.usda.gov/pubag/rest/search?query=abstract:' + str(i)
        print(endpoint)
        q_params = {"api_key": api_key, "per_page": 100}
        response = requests.get(endpoint, params=q_params)
        print(response)
        r = response.json()
        res_data.append(r)
        print(len(res_data))
        a = res_data[0]['response']
        pag_data.append(a)
        print(len(pag_data))
        sleep(5)
    except:
        print(i)
        continue
Then, to create the DataFrames:
pag_dfs = [pd.DataFrame(i['docs']) for i in pag_data]
df = pd.concat(pag_dfs, axis=0, ignore_index=True)
I am getting the 500 publications, but only 100 of them are unique. If I do it one term at a time it works. How can I improve my for loop to get unique records for each search term?
Docs: https://pubag.nal.usda.gov/apidocs
Edit:
I got the for loop to work by adding another for loop over the second list. This adds a lot of duplicates, but I then processed the result with pandas to get the unique publications for each term.
res_data = []
pag_data = []
for i in search_terms:
    try:
        api_key = "DEMO_KEY"
        endpoint = 'https://api.nal.usda.gov/pubag/rest/search?query=abstract:' + str(i)
        print(endpoint)
        q_params = {"api_key": api_key, "per_page": 100}
        response = requests.get(endpoint, params=q_params)
        print(response)
        r = response.json()
        res_data.append(r)
        print(len(res_data))
        for x in res_data:
            a = x['response']
            pag_data.append(a)
            print(len(pag_data))
        #sleep(2)
    except:
        print(i)
        continue
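For that pandas step, a minimal sketch (assuming the same pag_dfs/pd.concat construction as above, and assuming the "id" column uniquely identifies a publication) could be:
pag_dfs = [pd.DataFrame(i['docs']) for i in pag_data]
df = pd.concat(pag_dfs, axis=0, ignore_index=True)
df = df.drop_duplicates(subset="id")   # "id" assumed unique per publication
print(len(df))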
As I mentioned in the comments, there are things that can be done to improve your code quality.
for i in search_terms[0:5] is equivalent to for i in search_terms
The loop variable name i is poor; it does not say what it is. Use something better, for example for term in search_terms
You don't need str(i), since i is already a string
The continue in your except block is useless; the for loop will continue without it.
You are also doing unnecessary things, such as creating the res_data list. Note too that a = res_data[0]['response'] always reads the first stored response, which is where your duplicates come from.
Now for a potential solution. Given the lack of detail about exactly what output you want, I have made an assumption or two. With that said, here is a solution that yields the following results, which might be what you need:
nalt_all ... pmcid_url
0 [adipose tissue, anthocyanins, aorta, body wei... ... NaN
2 [aquaculture, data collection, eco-efficiency,... ... NaN
7 [ribosomal DNA, host plants, Xylella fastidios... ... NaN
8 [bioactive properties, canola oil, droplet siz... ... NaN
10 [Aphidoidea, Bactrocera, Citrus, adults, birds... ... NaN
.. ... ... ...
472 [Cucumis sativus, Escherichia coli, biosynthes... ... NaN
480 [biochemistry, cucumbers, fungicides, hydropho... ... NaN
491 [Cucumber mosaic virus, Eisenia fetida, Tobacc... ... NaN
494 [Cucumber mosaic virus, RNA libraries, correla... ... NaN
499 [Aphidoidea, Capsicum annuum, Cucumber mosaic ... ... NaN
[141 rows x 28 columns]
With columns:
nalt_all, subject, author_lastname, last_modified_date, language, title, startpage, usda_authored_publication, id, text_availability, doi_url, issue, author, format, endpage, journal_name, abstract, pmid_url, url, volume, publication_year, usda_funded_publication, issn, page, author_primary, doi, handle_url, pmcid_url
import requests
from time import sleep
import pandas as pd

api_key = "DEMO_KEY"
q_params = {"api_key": api_key, "per_page": 100}

# Fetch data from the API
search_terms = ['grapes', 'cotton', 'apple', 'onion', 'cucumber']
publications = []
for i, search_term in enumerate(search_terms):
    try:
        endpoint = 'https://api.nal.usda.gov/pubag/rest/search?query=abstract:' + search_term
        response = requests.get(endpoint, params=q_params)
        response.raise_for_status()  # raise requests.HTTPError on a 4xx/5xx response
        r = response.json()
        # Here I used the list destructuring operator *
        publications = [*publications, *r["response"]["docs"]]
        # `publications += r["response"]["docs"]` is an equivalent way of merging the lists
        sleep(5)  # Important, don't spam the API
    except requests.HTTPError as e:
        print("=" * 50)
        print(f"Failed: {i}, {search_term}, {e}")
        print("=" * 50)

# Merge all the publications (each publication is a dictionary)
pubs = {}
for i, pub in enumerate(publications):
    pubs.update({i: pub})

"""
The creation of the Pandas dataframe
1) Create the dataframe and specify that we want the keys of the `pubs` dictionary to be the rows of the dataframe by specifying `orient="index"`
2) We drop all duplicates with respect to the column "issn" from the dataframe. The ISSN is a unique identifier for each publication.
3) We drop all columns that have all "NaN" because they are useless
4) We drop the first "date" column as it too seems useless, it only contains "0000".
"""
df = pd.DataFrame.from_dict(pubs, orient="index").drop_duplicates(subset="issn").dropna(axis=1, how='all').drop("date", axis=1)
print(df)
EDIT: You can perform the merging of the publication dictionaries in the first loop if you want to; this makes the script faster because it avoids unnecessary work. I skipped it in the solution above so as not to complicate it too much. Here is how the code would look if you do that.
import requests
from time import sleep
import pandas as pd

api_key = "DEMO_KEY"
q_params = {"api_key": api_key, "per_page": 100}

# Fetch data from the API
search_terms = ['grapes', 'cotton', 'apple', 'onion', 'cucumber']
publications = dict()
j = 0
for i, search_term in enumerate(search_terms):
    try:
        endpoint = 'https://api.nal.usda.gov/pubag/rest/search?query=abstract:' + search_term
        response = requests.get(endpoint, params=q_params)
        response.raise_for_status()  # raise requests.HTTPError on a 4xx/5xx response
        r = response.json()
        for publication in r["response"]["docs"]:
            publications.update({j: publication})
            j += 1
        sleep(5)  # Important, don't spam the API
    except requests.HTTPError as e:
        print("=" * 50)
        print(f"Failed: {i}, {search_term}, {e}")
        print("=" * 50)

"""
The creation of the Pandas dataframe
1) Create the dataframe and specify that we want the keys of the `publications` dictionary to be the rows of the dataframe by specifying `orient="index"`
2) We drop all duplicates with respect to the column "issn" from the dataframe. The ISSN is a unique identifier for each publication.
3) We drop all columns that have all "NaN" because they are useless
4) We drop the first "date" column as it too seems useless, it only contains "0000".
"""
df = pd.DataFrame.from_dict(publications, orient="index").drop_duplicates(subset="issn").dropna(axis=1, how='all').drop("date", axis=1)
print(df, end="\n\n")

for col in df.columns:
    print(col, end=", ")
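As a small aside, the final loop only lists the column names; a one-line sketch that prints the same information is:
print(", ".join(df.columns))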

Python sql returning list

I've got some functions with SQL statements. My first function is fine because I get only 1 result.
My second function returns a large list of error codes, and I don't know how to get them back into the response.
TypeError: <sqlalchemy.engine.result.ResultProxy object at 0x7f98b85ef910> is not JSON serializable
I've tried everything; I need help.
My Code:
def topalarms():
    customer_name = request.args.get('customer_name')
    machine_serial = request.args.get('machine_serial')
    #ts = request.args.get('ts')
    #ts_start = request.args.get('ts')

    if (customer_name is None) or (machine_serial is None):
        return missing_param()

    # def form_response(response, session):
    #     response['customer'] = customer_name
    #     response['serial'] = machine_serial
    # return do_response(customer_name, form_response)

    def form_response(response, session):
        result_machine_id = machine_id(session, machine_serial)
        if not result_machine_id:
            response['Error'] = 'Seriennummer nicht vorhanden/gefunden'
            return
        #response[''] = result_machine_id[0]["id"]
        machineid = result_machine_id[0]["id"]
        result_errorcodes = error_codes(session, machineid)
        response['ErrorCodes'] = result_errorcodes

    return do_response(customer_name, form_response)

def machine_id(session, machine_serial):
    stmt_raw = '''
        SELECT
            id
        FROM
            machine
        WHERE
            machine.serial = :machine_serial_arg
    '''
    utc_now = datetime.datetime.utcnow()
    utc_now_iso = pytz.utc.localize(utc_now).isoformat()
    utc_start = datetime.datetime.utcnow() - datetime.timedelta(days=30)
    utc_start_iso = pytz.utc.localize(utc_start).isoformat()
    stmt_args = {
        'machine_serial_arg': machine_serial,
    }
    stmt = text(stmt_raw).columns(
        #ts_insert = ISODateTime
    )
    result = session.execute(stmt, stmt_args)
    ts = utc_now_iso
    ts_start = utc_start_iso
    ID = []
    for row in result:
        ID.append({
            'id': row[0],
            'ts': ts,
            'ts_start': ts_start,
        })
    return ID

def error_codes(session, machineid):
    stmt_raw = '''
        SELECT
            name
        FROM
            identifier
        WHERE
            identifier.machine_id = :machineid_arg
    '''
    stmt_args = {
        'machineid_arg': machineid,
    }
    stmt = text(stmt_raw).columns(
        #ts_insert = ISODateTime
    )
    result = session.execute(stmt, stmt_args)
    errors = []
    for row in result:
        errors.append(result)
        #({'result': [dict(row) for row in result]})
    #errors = {i: result[i] for i in range(0, len(result))}
    #errors = dict(result)
    return errors
My problem is in the function error_codes; something is wrong with my result.
My output should look like this:
ABCNormal
ABCSafety
Alarm_G01N01
Alarm_G01N02
Alarm_G01N03
Alarm_G01N04
Alarm_G01N05
I think you need to take a closer look at what you are doing correctly with your working function and compare that to your non-working function.
Firstly, what do you think this code does?
for row in result:
    errors.append(result)
This adds to errors one copy of the result object for each row in result. So if you have six rows in result, errors contains six copies of result. I suspect this isn't what you are looking for. You want to be doing something with the row variable.
Taking a closer look at your working function, you are taking the first value out of the row, using row[0]. So, you probably want to do the same in your non-working function:
for row in result:
    errors.append(row[0])
I don't have SQLAlchemy set up so I haven't tested this: I have provided this answer based solely on the differences between your working function and your non-working function.
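Equivalently, as a small sketch of the same fix, the loop can be collapsed into a list comprehension:
errors = [row[0] for row in result]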
You need a JSON serializer. I suggest using Marshmallow: https://marshmallow.readthedocs.io/en/stable/
There are some great tutorials online on how to do this.
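For example, a minimal sketch (the schema name and field are assumptions, not taken from your code) of serializing the rows from error_codes with Marshmallow could look like this:
from marshmallow import Schema, fields

class ErrorCodeSchema(Schema):
    name = fields.Str()   # the query above only selects the `name` column

rows = [dict(row) for row in result]                 # plain dicts from the ResultProxy
error_list = ErrorCodeSchema(many=True).dump(rows)   # dump() returns JSON-serializable data (marshmallow 3)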

adding multiple columns to a dataframe using df.apply and a lambda function

I am trying to add multiple columns to an existing dataframe with df.apply and a lambda function. I can add the columns one by one, but I am not able to do it for all of them together.
My code:
def get_player_stats(player_name):
    print(player_name)
    resp = requests.get(player_id_api + player_name)
    if resp.status_code != 200:
        # This means something went wrong.
        print('Error {}'.format(resp.status_code))
    result = resp.json()
    player_id = result['data'][0]['pid']
    resp_data = requests.get(player_data_api + str(player_id))
    if resp_data.status_code != 200:
        # This means something went wrong.
        print('Error {}'.format(resp_data.status_code))
    result_data = resp_data.json()
    check1 = len(result_data.get('data', None).get('batting', None))
    # print(check1)
    check2 = len(result_data.get('data', {}).get('batting', {}).get('ODIs', {}))
    # check2 = result_data.get(['data']['batting']['ODIs'],None)
    # print(check2)
    if check1 > 0 and check2 > 0:
        total_6s = result_data['data']['batting']['ODIs']['6s']
        total_4s = result_data['data']['batting']['ODIs']['4s']
        average = result_data['data']['batting']['ODIs']['Ave']
        total_innings = result_data['data']['batting']['ODIs']['Inns']
        total_catches = result_data['data']['batting']['ODIs']['Ct']
        total_stumps = result_data['data']['batting']['ODIs']['St']
        total_wickets = result_data['data']['bowling']['ODIs']['Wkts']
        print(average, total_innings, total_4s, total_6s, total_catches, total_stumps, total_wickets)
        return np.array([average, total_innings, total_4s, total_6s, total_catches, total_stumps, total_wickets])
    else:
        print('No data for player')
        return '', '', '', '', '', '', ''

cols = ['Avg', 'tot_inns', 'tot_4s', 'tot_6s', 'tot_cts', 'tot_sts', 'tot_wkts']
for col in cols:
    players_available[col] = ''
players_available[cols] = players_available.apply(lambda x: get_player_stats(x['playerName']), axis=1)
I have tried adding the columns explicitly to the dataframe, but I am still getting an error:
ValueError: Must have equal len keys and value when setting with an iterable
Can someone help me with this?
It's tricky, since the pandas apply method has evolved across versions.
In my version (0.25.3), and also in other recent versions, it works if the function returns a pd.Series object.
In your code, you could try changing the return values in the function:
return pd.Series([average, total_innings, total_4s, total_6s,
                  total_catches, total_stumps, total_wickets])

return pd.Series(['', '', '', '', '', '', ''])
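As a minimal, self-contained sketch (toy data and a fake stats function, not the original cricket API) of why returning a pd.Series makes the multi-column assignment work:
import pandas as pd

players_available = pd.DataFrame({'playerName': ['A', 'B']})
cols = ['Avg', 'tot_inns']

def fake_stats(name):
    # returning a Series lets apply build one column per element
    return pd.Series([50.0, 100])

players_available[cols] = players_available.apply(lambda x: fake_stats(x['playerName']), axis=1)
print(players_available)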

List index out of range error for loop on Json

I'm trying to request a JSON object, loop through it with a for loop, pull out the data I need, and save it to a model in Django.
I only want the first two attributes, runner_1_name and runner_2_name, but in my JSON object the number of runners varies inside each list, and I keep getting a list index out of range error. I have tried try/except, but then when I try to save to the model it says my variables are referenced before assignment. What is the best way of ignoring the list index out of range error, or fixing the list so the indexes are correct? I also want the code to run really fast, as I will be using this function as a background task polling every two seconds.
@shared_task()
def mb_get_events():
    mb = APIClient('username', 'pass')
    tennis_events = mb.market_data.get_events()
    for data in tennis_events:
        id = data['id']
        event_name = data['name']
        sport_id = data['sport-id']
        start_time = data['start']
        is_ip = data['in-running-flag']
        par = data['event-participants']
        event_id = par[0]['event-id']
        cat_id = data['meta-tags'][0]['id']
        cat_name = data['meta-tags'][0]['name']
        cat_type = data['meta-tags'][0]['type']
        url_name = data['meta-tags'][0]['type']
        try:
            runner_1_name = data['markets'][0]['runners'][0]['name']
        except IndexError:
            pass
        try:
            runner_2_name = data['markets'][0]['runners'][1]['name']
        except IndexError:
            pass
        run1_par_id = data['markets'][0]['runners'][0]['id']
        run2_par_id = data['markets'][0]['runners'][1]['id']
        run1_back_odds = data['markets'][0]['runners'][0]['prices'][0]['odds']
        run2_back_odds = data['markets'][0]['runners'][1]['prices'][0]['odds']
        run1_lay_odds = data['markets'][0]['runners'][0]['prices'][3]['odds']
        run2_lay_odds = data['markets'][0]['runners'][1]['prices'][3]['odds']
        te, created = MBEvent.objects.update_or_create(id=id)
        te.id = id
        te.event_name = event_name
        te.sport_id = sport_id
        te.start_time = start_time
        te.is_ip = is_ip
        te.event_id = event_id
        te.runner_1_name = runner_1_name
        te.runner_2_name = runner_2_name
        te.run1_back_odds = run1_back_odds
        te.run2_back_odds = run2_back_odds
        te.run1_lay_odds = run1_lay_odds
        te.run2_lay_odds = run2_lay_odds
        te.run1_par_id = run1_par_id
        te.run2_par_id = run2_par_id
        te.cat_id = cat_id
        te.cat_name = cat_name
        te.cat_type = cat_type
        te.url_name = url_name
        te.save()
Quick Fix:
try:
    runner_1_name = data['markets'][0]['runners'][0]['name']
except IndexError:
    runner_1_name = ''  # don't just pass here

try:
    runner_2_name = data['markets'][0]['runners'][1]['name']
except IndexError:
    runner_2_name = ''
It is giving you "variable referenced before assignment" because in your except block you just pass, so if the try fails, runner_1_name or runner_2_name is never defined. Then, when you try to use those variables, you get an error because they were never assigned. So in the except block, either set the value to a blank string or to some other placeholder like 'Runner does not exist'.
Now if you want to totally avoid try/except and IndexError you can use if statements to check the length of markets and runners. Something like this:
runner_1_name = ''
runner_2_name = ''

# Make sure 'markets' exists in data, its length is greater than 0, and 'runners' exists in the first market
if 'markets' in data and len(data['markets']) > 0 and 'runners' in data['markets'][0]:
    runners = data['markets'][0]['runners']
    # get runner 1
    if len(runners) > 0 and 'name' in runners[0]:
        runner_1_name = runners[0]['name']
    else:
        runner_1_name = 'Runner 1 does not exist'
    # get runner 2
    if len(runners) > 1 and 'name' in runners[1]:
        runner_2_name = runners[1]['name']
    else:
        runner_2_name = 'Runner 2 does not exist'
As you can see, this gets long, and it's not the recommended way to do things.
You should just assume the data is alright, try to get the names, and use try/except to catch any errors, as suggested above in my first code snippet.
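If the same lookup is needed for several fields, a minimal sketch (a hypothetical helper, not part of the original code) that keeps the try/except idea in one place could be:
def safe_runner_field(data, runner_index, field, default=''):
    # return the requested field of the given runner, or `default` if anything is missing
    try:
        return data['markets'][0]['runners'][runner_index][field]
    except (IndexError, KeyError):
        return default

runner_1_name = safe_runner_field(data, 0, 'name')
runner_2_name = safe_runner_field(data, 1, 'name')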
I had this issue with a list of comments that can be empty or contain an unknown number of comments.
My solution was to initialize a counting variable at 0 and run a while loop on a boolean.
In the loop I try to get comments[count]; if that fails with an IndexError, I set the boolean to False to stop the otherwise infinite loop.
count = 0
condition_continue = True
while condition_continue:
    try:
        detailsCommentDict = comments[count]
        ....
        count += 1  # move on to the next comment
    except IndexError:
        # no comment at all or no more comments
        condition_continue = False
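For what it's worth, a plain for loop over the list avoids the manual counter and the IndexError entirely; a minimal sketch of the same idea:
for count, detailsCommentDict in enumerate(comments):
    # process detailsCommentDict here; the loop simply ends when the list runs out
    ...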

Field is present in output, but getting KeyError

Below is my output. As you can see, children is clearly a (dictionary) field in my response.
This code works perfectly, but it keeps any nested fields (lists or dictionaries) as is:
user = ""
password = getattr(config, 'password')
url = ''
req = requests.post(url = url, auth=(user, password))
print('Authentication succesful!/n')
ans = req.json()
#Transform resultList into Pandas DF
solr_df = pd.DataFrame.from_dict(json_normalize(ans['resultList']), orient='columns')
I instead would like to normalize the "children" field, so I did the following instead of the last row above:
solr_df = pd.DataFrame()
for record in ans['resultList']:
    df = pd.DataFrame(record['children'])
    df['contactId'] = record['contactId']
    solr_df = solr_df.append(df)
However, I am getting a KeyError: 'children'.
Can anyone suggest what I am doing wrong?
One of your records is probably missing the 'children' key so catch that exception and continue processing the rest of the output.
solr_df = pd.DataFrame()
for record in ans['resultList']:
    try:
        df = pd.DataFrame(record['children'])
        df['contactId'] = record['contactId']
        solr_df = solr_df.append(df)
    except KeyError as e:
        print("Record {} triggered {}".format(record, e))
Since the message is KeyError: 'children', the only plausible reason for the error is that the children key is missing in one of the dicts. You can avoid the exception by using a try/except block, or you can pass a default value for the key, like:
solr_df = pd.DataFrame()
for record in ans['resultList']:
    df = pd.DataFrame(record.get('children', {}))
    df['contactId'] = record.get('contactId')
    solr_df = solr_df.append(df)
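As a side note on the design: appending to a DataFrame inside a loop copies the data on every iteration (and DataFrame.append has since been removed from pandas). A minimal sketch of the same logic that collects the pieces and concatenates once might look like this:
frames = []
for record in ans['resultList']:
    children = record.get('children', [])
    if children:   # skip records without a children list
        df = pd.DataFrame(children)
        df['contactId'] = record.get('contactId')
        frames.append(df)

solr_df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()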
