Iterating on a list based on different parameters - python

I'm once again asking for help on iterating over a list. This is the problem that eludes me this time:
I have a table that contains various combinations of countries with their respective trade flows.
Since trade goes both ways, my list has, for example, one value for ALB-ARM (how much Albania traded with Armenia that year) and then, further down the list, another value for ARM-ALB (the other way around).
I want to sum these two trade values for every pair of countries. I've been experimenting with some code, but I quickly realised that all my approaches are wrong.
How do I even set it up? I feel like it's too hard with a loop and that it will be easy with some function I don't even know exists.
Example data in Table format:
from astropy.table import Table
country1 = ["ALB","ALB","ARM","ARM","AZE","AZE"]
country2 = ["ARM","AZE","ALB","AZE","ALB","ARM"]
flow = [500,0,200,300,90,20]
t = Table([country1,country2,flow],names=["1","2","flow"],meta={"Header":"Table"})
and the expected output would be:
trade = [700,90,700,320,90,320]
result = Table([country1,country2,flow,trade],names=["1","2","flow","trade"],meta={"Header":"Table"})
Thank you all in advance.

Maybe this could help:
country1 = ["ALB","ALB","ARM","ARM","AZE","AZE"]
country2 = ["ARM","AZE","ALB","AZE","ALB","ARM"]
flow = [500,0,200,300,90,20]
trade = []
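# Map each "A-B" pair to its flow, e.g. "ALB-ARM" -> 500
pairs = map(lambda t: '-'.join(t), zip(country1, country2))
flow_map = dict(zip(pairs, flow))
for left_country, right_country in zip(country1, country2):
    # Add the flow in both directions; .get(..., 0) avoids a KeyError if the reverse pair is missing
    trade.append(flow_map['-'.join((left_country, right_country))] + flow_map.get('-'.join((right_country, left_country)), 0))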
print(trade)
outputs:
[700, 90, 700, 320, 90, 320]
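Another way to set it up, without building "A-B" strings, is to key the totals on an unordered pair: frozenset(("ALB", "ARM")) is equal to frozenset(("ARM", "ALB")), so both directions land in the same bucket. This is a sketch using the plain lists from the question; note that if the table contained duplicate rows for the same direction, those would be summed as well.

```python
from collections import defaultdict

country1 = ["ALB", "ALB", "ARM", "ARM", "AZE", "AZE"]
country2 = ["ARM", "AZE", "ALB", "AZE", "ALB", "ARM"]
flow = [500, 0, 200, 300, 90, 20]

# Sum the flows per unordered pair: frozenset({"ALB", "ARM"}) covers
# both ALB-ARM and ARM-ALB.
pair_totals = defaultdict(int)
for a, b, f in zip(country1, country2, flow):
    pair_totals[frozenset((a, b))] += f

# Look the pair total back up for every row, keeping the original order.
trade = [pair_totals[frozenset((a, b))] for a, b in zip(country1, country2)]
print(trade)  # [700, 90, 700, 320, 90, 320]
```

With the astropy table from the question, the list can then be attached as a new column with t["trade"] = trade.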

Related

How to save results (and recall them when needed) of a simulation in Python?

Based on the idea shown in this model, I started an actuarial project in Python in which I want to simulate, adding some degree of internal randomness (as done here: https://github.com/Saurabh0503/Financial-modelling-and-valuationn/blob/main/Dynamic%20Salary%20Retirement%20Model%20Internal%20Randomness.ipynb), how long it will take an individual to retire with a certain amount of wealth and a certain annual salary, by making a certain annual payment (calculated as the desired cash divided by the number of years needed to retire). In my variation of the model, users can define their own parameters, which makes the model more flexible and user-friendly; there is also a function that calculates the desired retirement cash based on the individual's propensity both to save and to spend.
The problem is that I want to summarize the output of the model (by taking the mean, max, min and standard deviation of wealth, salary and years to retirement), so I have to save the results and recall them when needed, but I have no idea how to accomplish this.
I tried saving the simulation's output in a pandas DataFrame. In particular, I wrote this function:
def get_salary_wealth_year_case_df(data):
    all_ytrs = []
    salary = []
    wealth = []
    annual_payments = []
    for i in range(data.n_iter):
        ytr = years_to_retirement(data, print_output=False)
        sal = salary_at_year(data, year, case, print_output=False)
        wlt = wealth_at_year(data, year, prior_wealth, case, print_output=False)
        pmt = annual_pmts_case_df(wealth_at_year, year, case, print_output=False)
        all_ytrs.append(ytr)
        salary.append(sal)
        annual_payments.append(pmt)
    df = pd.DataFrame()
    df['Years to Retirement'] = all_ytrs
    df['Salary'] = sal
    df['Wealth'] = wlt
    df['Annual Payments'] = pmt
    return df
I need feedback on what I'm doing. Am I doing it right? If so, are there more efficient ways to do it? If not, what should I do? Thanks in advance!
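Given the inputs used by the function, I assume your code, as it is, will do just fine in terms of computation speed. One thing to fix, though: the DataFrame columns should be filled with the collected lists (salary, wealth, annual_payments), not with the last scalar values (sal, wlt, pmt), which would repeat the final iteration's value in every row; also, wlt is never appended to wealth.
As suggested, you can add a saving option to your function so that the returned results are also stored in a .csv file: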
import pandas as pd

def get_salary_wealth_year_case_df(data, path):
    all_ytrs = []
    salary = []
    wealth = []
    annual_payments = []
    # year, case and prior_wealth are assumed to be defined elsewhere, as in the question
    for i in range(data.n_iter):
        ytr = years_to_retirement(data, print_output=False)
        sal = salary_at_year(data, year, case, print_output=False)
        wlt = wealth_at_year(data, year, prior_wealth, case, print_output=False)
        pmt = annual_pmts_case_df(wealth_at_year, year, case, print_output=False)
        all_ytrs.append(ytr)
        salary.append(sal)
        wealth.append(wlt)
        annual_payments.append(pmt)
    # Build every column from the collected lists: one row per iteration
    df = pd.DataFrame({
        'Years to Retirement': all_ytrs,
        'Salary': salary,
        'Wealth': wealth,
        'Annual Payments': annual_payments,
    })
    # Save the dataframe to a given path inside your workspace; keeping the
    # header lets pd.read_csv(path) restore the column names later
    df.to_csv(path, index=False)
    return df
After saving, returning the DataFrame is optional; it depends on whether you are going to use it later in your code.
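To cover the "recall and summarize" part of the question, here is a minimal sketch of the full round trip: save, read back, and compute the mean, max, min and standard deviation of every column. The random numbers are hypothetical stand-ins for the real years_to_retirement / salary_at_year / wealth_at_year outputs, and the file name simulation_results.csv is arbitrary.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for the simulation output: random values replace
# the real years_to_retirement / salary_at_year / wealth_at_year results.
rng = np.random.default_rng(seed=0)
n_iter = 1000
df = pd.DataFrame({
    "Years to Retirement": rng.integers(20, 40, size=n_iter),
    "Salary": rng.normal(50_000, 5_000, size=n_iter),
    "Wealth": rng.normal(200_000, 30_000, size=n_iter),
})

# Save once...
path = os.path.join(tempfile.gettempdir(), "simulation_results.csv")
df.to_csv(path, index=False)

# ...and recall later, e.g. in another session.
recalled = pd.read_csv(path)

# Mean, max, min and standard deviation of every column.
summary = recalled.agg(["mean", "max", "min", "std"])
print(summary)
```

If you want the exact dtypes back without going through text, df.to_pickle(path) and pd.read_pickle(path) are an alternative to CSV.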

Iterate through multiple lists of dictionaries

I would like to iterate through a list of dictionaries in order to get a specific value, but I can't figure it out.
I've made a simplified version of what I've been working with. The real lists are much longer, with more dictionaries in them, but for the sake of an example I hope this shortened dataset will be enough.
listOfResults = [{"29":2523,"30":626,"10":0,"32":128},{"29":2466,"30":914,"10":0,"32":69}]
For example, I need the values of the key "30" from the dictionaries above. I've managed to get those and stored them in a list of integers. ( [626, 914] )
These integers are basically IDs. After this, I need to look up these IDs in another list of dictionaries.
listOfTrack = [{"track_length": 1.26,"track_id": 626,"track_name": "Rainbow Road"},{"track_length": 6.21,"track_id": 914,"track_name": "Excalibur"}]
I would like to print/store the track_names and track_lengths of the IDs I've got from the listOfResults earlier. Unfortunately, I've ended up in a complete mess of for loops.
You want something like this:
ids = [626, 914]
result = { track for track in list_of_tracks if track.get("track_id") in ids }
Unfortunately, I can't comment on the answer given by Nathaniel Ford because I'm a new user, so I'm sharing this as an answer instead.
His answer is basically correct, but I believe you need to replace the curly braces with square brackets; otherwise you will get this error, because dictionaries can't be stored in a set: TypeError: unhashable type: 'dict'
The answer should look like:
ids = [626, 914]
result = [track for track in listOfTrack if track.get("track_id") in ids]
listOfResults = [{"29":2523,"30":626,"10":0,"32":128},{"29":2466,"30":914,"10":0,"32":69}]
ids = [x.get('30') for x in listOfResults]
listOfTrack = [{"track_length": 1.26,"track_id": 626,"track_name": "Rainbow Road"},{"track_length": 6.21,"track_id": 914,"track_name": "Excalibur"}]
out = [x for x in listOfTrack if x.get('track_id') in ids]
Alternatively, it may be time to learn a new library if you're going to be doing a lot of this.
import pandas as pd
results_df = pd.DataFrame(listOfResults)
track_df = pd.DataFrame(listOfTrack)
These look like:
# results_df
     29   30  10   32
0  2523  626   0  128
1  2466  914   0   69
# track_df
   track_length  track_id    track_name
0          1.26       626  Rainbow Road
1          6.21       914     Excalibur
Now we can answer your question:
# Creates a mask of rows where this is True.
mask = track_df['track_id'].isin(results_df['30'])
# Specifies that we want just those two columns.
cols = ['track_length', 'track_name']
out = track_df.loc[mask, cols]
print(out)
# Or we can make it back into a dictionary:
print(out.to_dict('records'))
Output:
   track_length    track_name
0          1.26  Rainbow Road
1          6.21     Excalibur
[{'track_length': 1.26, 'track_name': 'Rainbow Road'}, {'track_length': 6.21, 'track_name': 'Excalibur'}]
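If you need the names and lengths in the order of listOfResults, a dictionary keyed on track_id avoids scanning listOfTrack once per ID. A minimal sketch using the sample data from the question; silently skipping IDs that have no matching track is an assumption about what you want in that case.

```python
listOfResults = [{"29": 2523, "30": 626, "10": 0, "32": 128},
                 {"29": 2466, "30": 914, "10": 0, "32": 69}]
listOfTrack = [{"track_length": 1.26, "track_id": 626, "track_name": "Rainbow Road"},
               {"track_length": 6.21, "track_id": 914, "track_name": "Excalibur"}]

# Index the tracks by ID once, so each lookup is a dict access instead of a scan.
tracks_by_id = {track["track_id"]: track for track in listOfTrack}

# Walk the results in order and pull out the name and length for each "30" ID.
found = []
for result in listOfResults:
    track = tracks_by_id.get(result.get("30"))
    if track is not None:  # assumption: skip IDs with no matching track
        found.append((track["track_name"], track["track_length"]))

for name, length in found:
    print(f"{name}: {length}")
```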

Dataframe compare EQ - position doesn't matter

I read this article (https://datatofish.com/compare-values-dataframes/). While helpful, it doesn't cover my use case, in which I have multiple prices per product, i.e. product1 = computer has prices 350, 850.
The DataFrame I compare against has product2 = computer with prices 850, 350.
When I compare these two, it says they do not match, I think because the order matters. How can I compare them regardless of order?
df.product_series.eq(other=df.warehouse_series)
Sample DataFrame (as CSV):
,Unnamed: 0,product,testfrom1,testfrom2,seriesfrom1,seriesfrom2
0,0,hi this is me,1703.0|1144.0|2172.0|735.0,,"['1703.0', '1144.0', '2172.0', '735.0']",
1,1,abc543,1120.0|637.0|2026.0|1599.0,,"['1120.0', '637.0', '2026.0', '1599.0']",
2,2,thisisus,2663.0|859.0|2281.0|1487.0,,"['2663.0', '859.0', '2281.0', '1487.0']",
3,3,abc123,1407.0|1987.0|696.0,,"['1407.0', '1987.0', '696.0']",
4,4,thing2,1392.0|1971.0|552.0,,"['1392.0', '1971.0', '552.0']",
5,5,thing1,1025.0|1566.0|581.0,,"['1025.0', '1566.0', '581.0']",
In the sample above I compare seriesfrom1 and seriesfrom2, but again I think the differing order is throwing things off.
You can't compare them directly using eq; use a custom function like this one instead:
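import ast

def custom_compare_eq(series, other):
    # Compare two Series element by element, ignoring order inside lists
    for a, b in zip(series.values, other.values):
        # Lists are often stored as strings such as "['1703.0', '1144.0']";
        # parse them with ast.literal_eval, which is safer than eval
        try:
            r1 = ast.literal_eval(str(a))
            r2 = ast.literal_eval(str(b))
        except (ValueError, SyntaxError, TypeError):
            # Not a Python literal (e.g. NaN or plain text): compare as-is
            yield a == b
            continue
        if type(r1) != type(r2):
            yield False
        elif isinstance(r1, list):
            yield set(r1) == set(r2)
        else:
            yield r1 == r2

result = list(custom_compare_eq(df.column1, df.column2))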
This treats two lists that contain the same elements in a different order as equal.
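A more compact variant, sketched under the assumption that both columns hold stringified lists as in the sample: parse each cell and compare the items as multisets with collections.Counter. Unlike set(), a Counter also notices a price that appears twice on one side only. The DataFrame below is hypothetical test data.

```python
import ast
from collections import Counter

import pandas as pd

# Hypothetical data: the same prices for "computer" in a different order,
# and a genuine mismatch for "phone".
df = pd.DataFrame({
    "product": ["computer", "phone"],
    "seriesfrom1": ["['350', '850']", "['100', '200']"],
    "seriesfrom2": ["['850', '350']", "['100', '300']"],
})

def same_items(a, b):
    # Parse the stringified lists safely, then compare them as multisets:
    # order is ignored, but duplicate entries still count.
    return Counter(ast.literal_eval(a)) == Counter(ast.literal_eval(b))

df["match"] = [same_items(a, b) for a, b in zip(df["seriesfrom1"], df["seriesfrom2"])]
print(df[["product", "match"]])
```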

More efficient way of looping over a list of names in Python

I'm playing around with Bittrex's API to get the current price of a coin (e.g. BTC-LTC). In this case, the request looks like:
import pandas
import requests

r = requests.get('https://bittrex.com/api/v1.1/public/getticker?market=BTC-LTC').json()
df = pandas.DataFrame(r)
print(df)
To get the current price of, say, 50 or 200 different coins, I wrote a loop that replaces BTC-LTC with each market name (the list of market names comes from another Bittrex API endpoint):
for i in marketnames:
    r = requests.get('https://bittrex.com/api/v1.1/public/getticker?market={names}'.format(names=i)).json()
    df = pandas.DataFrame(r)
    print(df)
The problem with this loop is that it goes through the list of coin names one by one, making 200 sequential requests to get the prices.
Is there a more efficient way of doing this?
Was there a typo in your code? If you iterate through the marketnames list, you should use i in your code, as below:
for i in marketnames:
    r = requests.get('https://bittrex.com/api/v1.1/public/getticker?market={names}'.format(names=i)).json()
    df = pandas.DataFrame(r)
    print(df)
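Since most of the time in that loop is spent waiting on the network, one common option is to send the requests concurrently from a thread pool (concurrent.futures.ThreadPoolExecutor in the standard library). This is a sketch, not a tested client: the URL is the one from the question, max_workers=20 is an arbitrary choice, and the API may rate-limit bursts of requests. The fetch function is a parameter so the concurrency logic can be tried without network access.

```python
from concurrent.futures import ThreadPoolExecutor

TICKER_URL = "https://bittrex.com/api/v1.1/public/getticker?market={}"

def fetch_ticker(market):
    # Real network call; imported here so the rest of the sketch runs
    # even where the third-party `requests` package is not installed.
    import requests
    return requests.get(TICKER_URL.format(market), timeout=10).json()

def fetch_all(markets, fetch=fetch_ticker, max_workers=20):
    # Run up to max_workers requests at the same time;
    # pool.map returns the results in the same order as `markets`.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(markets, pool.map(fetch, markets)))
```

Usage would be tickers = fetch_all(marketnames), after which each response can be turned into a DataFrame as in the question.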

Computing aggregates by creating a nested dictionary on the fly

I'm new to Python and could really use your help and guidance. I am trying to read a CSV file with three columns and do some computation based on the first and second columns, i.e.:
A spent 100
A earned 60
B earned 48
B earned 180
A spent 40
...
I want totals such as "A spent 2040", where 2040 would be the sum of all amounts for 'A' and 'spent'. My code below doesn't raise an error, but it's not logically correct:
for row in rows:
    cols = row.split(",")
    truck = cols[0]
    if (truck != 'A' and truck != 'B'):
        continue
    record = cols[1]
    if(record != "earned" and record != "spent"):
        continue
    amount = int(cols[2])
    #print(truck+" "+record+" "+str(amount))
    if truck in entries:
        #entriesA[truck].update(record)
        if record in records:
            records[record].append(amount)
        else:
            records[record] = [amount]
    else:
        entries[truck] = records
        if record in records:
            records[record].append(amount)
        else:
            entries[truck][record] = [amount]
print(entries)
I am aware that this part is incorrect, because I add the same inner dictionary to the outer dictionary for every truck, but I'm not sure how to go from there:
entries[truck] = records
if record in records:
    records[record].append(amount)
However, I'm not sure of the syntax to create a new dictionary on the fly instead of reusing 'records'.
I am getting:
{'B': {'earned': [60, 48], 'spent': [100]}, 'A': {'earned': [60, 48], 'spent': [100]}}
But hoping to get:
{'B': {'earned': [48]}, 'A': {'earned': [60], 'spent': [100]}}
Thanks.
For the kind of calculation you are doing here, I highly recommend Pandas.
Assuming in.csv looks like this:
truck,type,amount
A,spent,100
A,earned,60
B,earned,48
B,earned,180
A,spent,40
You can do the totalling with three lines of code:
import pandas
df = pandas.read_csv('in.csv')
totals = df.groupby(['truck', 'type']).sum()
totals now looks like this:
            amount
truck type
A     earned      60
      spent      140
B     earned     228
You will find that Pandas allows you to think on a much higher level and avoid fiddling with lower level data structures in cases like this.
if truck not in entries:
    entries[truck] = {}  # a new, separate inner dictionary for each truck
if record in entries[truck]:
    entries[truck][record].append(amount)
else:
    entries[truck][record] = [amount]
I believe this is what you want. Each truck now gets its own empty dictionary the first time it appears, and we access that truck's records directly instead of checking the shared local dictionary called records.
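Without pandas, collections.defaultdict can create the missing inner dictionary for each truck automatically, which sidesteps the shared records problem. This sketch totals the amounts directly; the column names truck, type and amount are taken from the in.csv example above, and the data is read from a string so the snippet runs on its own. If you want lists of amounts as in the question rather than totals, use defaultdict(list) for the inner level and append instead of adding.

```python
import csv
import io
from collections import defaultdict

# Sample input in the same shape as in.csv above (read from a string here).
data = io.StringIO(
    "truck,type,amount\n"
    "A,spent,100\n"
    "A,earned,60\n"
    "B,earned,48\n"
    "B,earned,180\n"
    "A,spent,40\n"
)

# entries[truck][record] -> running total; a fresh inner dict is created
# automatically the first time each truck is seen.
entries = defaultdict(lambda: defaultdict(int))
for row in csv.DictReader(data):
    entries[row["truck"]][row["type"]] += int(row["amount"])

# Convert to plain dicts for printing.
totals = {truck: dict(records) for truck, records in entries.items()}
print(totals)  # {'A': {'spent': 140, 'earned': 60}, 'B': {'earned': 228}}
```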
