I need to calculate a loss/profit carried forward for various years for various mappings. The test data looks like the following:
import pandas as pd
data = {'combined_line': {0: 'COMB', 1: 'COMB', 2: 'COMB', 3: 'COMB', 4: 'COMB', 5: 'COMB', 6: 'COMB', 7: 'COMB', 8: 'COMB', 9: 'COMB', 10: 'COMB', 11: 'COMB', 12: 'COMB', 13: 'COMB', 14: 'COMB', 15: 'COMB', 16: 'COMB', 17: 'COMB', 18: 'COMB', 19: 'COMB', 20: 'COMB', 21: 'COMB', 22: 'COMB', 23: 'COMB', 24: 'COMB', 25: 'COMB', 26: 'COMB', 27: 'COMB', 28: 'COMB', 29: 'COMB', 30: 'COMB', 31: 'COMB', 32: 'COMB', 33: 'COMB', 34: 'COMB', 35: 'COMB', 36: 'COMB', 37: 'COMB', 38: 'COMB', 39: 'COMB', 40: 'COMB', 41: 'COMB', 42: 'COMB', 43: 'COMB', 44: 'COMB', 45: 'COMB', 46: 'COMB', 47: 'COMB', 48: 'COMB', 49: 'COMB', 50: 'COMB', 51: 'COMB', 52: 'COMB', 53: 'COMB', 54: 'COMB', 55: 'COMB', 56: 'COMB', 57: 'COMB', 58: 'COMB', 59: 'COMB', 60: 'COMB', 61: 'COMB', 62: 'COMB', 63: 'COMB'}, 'line': {0: 'HWNK', 1: 'HWNK', 2: 'HWNK', 3: 'HWNK', 4: 'HWNK', 5: 'HWNK', 6: 'HWNK', 7: 'HWNK', 8: 'PGIB',
9: 'PGIB', 10: 'PGIB', 11: 'PGIB', 12: 'PGIB', 13: 'PGIB', 14: 'PGIB', 15: 'PGIB', 16: 'UIGZ', 17: 'UIGZ', 18: 'UIGZ', 19: 'UIGZ', 20: 'UIGZ', 21: 'UIGZ', 22: 'UIGZ', 23: 'UIGZ', 24: 'JVSM', 25: 'JVSM', 26: 'JVSM', 27: 'JVSM', 28: 'JVSM', 29: 'JVSM', 30: 'JVSM', 31: 'JVSM', 32: 'IALH', 33: 'IALH', 34: 'IALH', 35: 'IALH', 36: 'IALH', 37: 'IALH', 38: 'IALH', 39: 'IALH', 40: 'GUER', 41: 'GUER', 42: 'GUER', 43: 'GUER', 44: 'GUER', 45: 'GUER', 46: 'GUER', 47: 'GUER', 48: 'UGQC', 49: 'UGQC', 50: 'UGQC', 51: 'UGQC', 52: 'UGQC', 53: 'UGQC', 54: 'UGQC', 55: 'UGQC', 56: 'ZBZA', 57: 'ZBZA', 58: 'ZBZA', 59: 'ZBZA', 60: 'ZBZA', 61: 'ZBZA', 62: 'ZBZA', 63: 'ZBZA'},
'Underwriting Year': {0: 2006, 1: 2007, 2: 2008, 3: 2009, 4: 2010, 5: 2011, 6: 2012, 7: 2013, 8: 2006, 9: 2007, 10: 2008, 11: 2009, 12: 2010, 13: 2011, 14: 2012, 15: 2013, 16: 2006, 17: 2007, 18: 2008, 19: 2009, 20: 2010, 21: 2011, 22: 2012, 23: 2013, 24: 2006, 25: 2007, 26: 2008, 27: 2009, 28: 2010, 29: 2011, 30: 2012, 31: 2013, 32: 2006, 33: 2007, 34: 2008, 35: 2009, 36: 2010, 37: 2011, 38: 2012, 39: 2013, 40: 2006, 41: 2007, 42: 2008, 43: 2009, 44: 2010, 45: 2011, 46: 2012, 47: 2013, 48: 2006, 49: 2007, 50: 2008, 51: 2009, 52: 2010, 53: 2011, 54: 2012, 55: 2013, 56: 2006, 57: 2007, 58: 2008, 59: 2009, 60: 2010, 61: 2011, 62: 2012, 63: 2013}, 'Loss Carried Forward Years': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4, 5: 4, 6: 4, 7: 4, 8: 4, 9: 4, 10: 4, 11: 4, 12: 4, 13: 4, 14: 4, 15: 4, 16: 4, 17: 4, 18: 4, 19: 4, 20: 4, 21: 4, 22: 4, 23: 4, 24: 4, 25: 4, 26: 4, 27: 4, 28: 4, 29: 4, 30: 4, 31: 4, 32: 4, 33: 4, 34: 4, 35: 4, 36: 4, 37: 4, 38: 4, 39: 4, 40: 4, 41: 4, 42: 4, 43: 4, 44: 4, 45: 4, 46: 4, 47: 4, 48: 4, 49: 4, 50: 4, 51: 4, 52: 4, 53: 4, 54: 4, 55: 4, 56: 4, 57: 4, 58: 4, 59: 4, 60: 4, 61: 4, 62: 4, 63: 4}, 'Result': {0: 1.7782623338664507, 1: 573.5652911310642, 2: -757.5452321102866, 3: 109.5149916578, 4: -255.67441806846205, 5: -687.5363404984247, 6: -237.72375990073272, 7: 377.0590732628068, 8: 195.06552059019327, 9: 253.9139354887218, 10: -199.3089719508628, 11: -613.0298155777073, 12: 579.0530926295057, 13: 29.428579932476623, 14: 138.8491336480481, 15: 169.5509712778246, 16: -678.0475161337745, 17: 143.8572792017776, 18: 582.0521770196842, 19: 999.6608185859805, 20: 617.653356833144, 21: 324.507583333668, 22: -659.8006551374211, 23: 504.40968855532833, 24: -233.0400805626533, 25: -216.2984964245977, 26: -867.441337711643, 27:
837.8986975605346, 28: 701.1722485951575, 29: 430.6209772769762, 30: 949.027900642678, 31: 153.92299033433596, 32: 839.6369570865697, 33: -453.5140989578259, 34: -58.89747070779697, 35: -530.522608203202, 36: -463.6972938418005, 37: -468.78369264516937, 38: -541.2808912223624, 39: 330.6903172253092, 40: -638.0156450384441, 41: -304.1122851963345, 42: 437.2797841418076, 43: 561.7387061220729, 44: -503.2740733067485, 45: 433.5804400240565, 46: 475.2435623884169, 47: -405.59364491545136, 48: -415.5501796978929, 49: -935.0663192223606, 50: 171.69580433209808, 51: -554.0056030900487, 52: 45.388394682329135, 53: -440.7714651883558, 54: 59.27169133875464, 55: 40.29995988400401, 56: -812.8599999277563, 57: 86.19303814647606, 58: 655.1887822922679, 59: 62.82680301860228, 60: 22.36985316764265, 61: -964.6910496383512, 62: -830.95126121312, 63: -808.1019400083396}}
df = pd.DataFrame(data)
I need to calculate a profit/loss carried forward on the combined and individual level.
On a combined level only a loss can be carried forward and only carriable for the Loss Carried Forward Years column value (so after 4 years a loss expires). On a combined level the loss carried forward looks like the following.
╒════╤═════════════════════╤══════════╤════════════════════════╤═════════════════════════════════════╤═════════════════╕
│ │ Underwriting Year │ Result │ Loss Carried Forward │ Result After Loss Carried Forward │ combined_line │
╞════╪═════════════════════╪══════════╪════════════════════════╪═════════════════════════════════════╪═════════════════╡
│ 0 │ 2006 │ -1741.03 │ 0.00 │ -1741.03 │ COMB │
├────┼─────────────────────┼──────────┼────────────────────────┼─────────────────────────────────────┼─────────────────┤
│ 1 │ 2007 │ -851.46 │ -1741.03 │ -2592.49 │ COMB │
├────┼─────────────────────┼──────────┼────────────────────────┼─────────────────────────────────────┼─────────────────┤
│ 2 │ 2008 │ -36.98 │ -2592.49 │ -2629.47 │ COMB │
├────┼─────────────────────┼──────────┼────────────────────────┼─────────────────────────────────────┼─────────────────┤
│ 3 │ 2009 │ 874.08 │ -2629.47 │ -1755.39 │ COMB │
├────┼─────────────────────┼──────────┼────────────────────────┼─────────────────────────────────────┼─────────────────┤
│ 4 │ 2010 │ 742.99 │ -1755.39 │ -1012.40 │ COMB │
├────┼─────────────────────┼──────────┼────────────────────────┼─────────────────────────────────────┼─────────────────┤
│ 5 │ 2011 │ -1343.64 │ -888.44 │ -2232.08 │ COMB │
├────┼─────────────────────┼──────────┼────────────────────────┼─────────────────────────────────────┼─────────────────┤
│ 6 │ 2012 │ -647.36 │ -1380.62 │ -2027.99 │ COMB │
├────┼─────────────────────┼──────────┼────────────────────────┼─────────────────────────────────────┼─────────────────┤
│ 7 │ 2013 │ 362.24 │ -1991.01 │ -1628.77 │ COMB │
╘════╧═════════════════════╧══════════╧════════════════════════╧═════════════════════════════════════╧═════════════════╛
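For reference, one interpretation that appears consistent with the table above is a FIFO "loss layer" scheme: each year's loss is kept as its own layer, profits absorb the oldest layers first, and a layer is dropped once it is more than Loss Carried Forward Years old. A minimal sketch of that interpretation (the function and list-based layers are my own, not from the original code):

```python
def loss_carried_forward(results, expiry_years):
    """FIFO loss layers: profits absorb the oldest losses first;
    a loss layer expires once it is more than `expiry_years` years old."""
    layers = []  # each layer is [origin_index, remaining_loss], remaining_loss <= 0
    lcf_col, after_col = [], []
    for i, result in enumerate(results):
        # drop layers that have expired by year i
        layers = [l for l in layers if i - l[0] <= expiry_years]
        lcf = sum(l[1] for l in layers)
        lcf_col.append(lcf)
        after_col.append(result + lcf)
        if result > 0:
            # a profit absorbs the oldest loss layers first
            remaining = result
            for l in layers:
                absorb = min(remaining, -l[1])
                l[1] += absorb
                remaining -= absorb
                if remaining <= 0:
                    break
            layers = [l for l in layers if l[1] < 0]
        else:
            layers.append([i, result])
    return lcf_col, after_col

# e.g. a loss in year 0 is usable for two more years, then expires:
lcf, after = loss_carried_forward([-100, 50, -30, 200, 10], expiry_years=2)
```

With expiry_years=4 on the yearly combined results, the remainder of the 2006 loss drops out in 2011, which is why the carried-forward amount in the table falls from -1012.40 to -888.44 rather than rolling the whole balance forward.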
The problem I am having is calculating the individual lines' profit/loss carried forward. To get those values, you will need to carry profit and losses forward to balance with the combined level.
I have written a test data creator:
import itertools
import random
import string
from typing import Any, Dict, List

def generate_data(combined_line: str) -> List[Dict[str, Any]]:
    data: List[Dict[str, Any]] = []
    # Underwriting Years
    end_uwy: int = random.randint(2001, 2022)
    start_uwy: int = random.randint(2000, end_uwy - 1)
    uwy_list = [i for i in range(start_uwy, end_uwy)]
    lines = random.sample(range(1, 456976), random.randint(2, 10))
    alphabets_list: List[str] = list(string.ascii_uppercase)
    keywords = [''.join(i) for i in itertools.product(alphabets_list, repeat=4)]
    lines_list: List[str] = [keywords[i] for i in lines]
    loss_carried_forwards_years: int = random.randint(3, 10)
    for line in lines_list:
        for uw_year in uwy_list:
            data_dict: Dict[str, Any] = {
                "combined_line": combined_line,
                "line": line,
                "Underwriting Year": uw_year,
                "Loss Carried Forward Years": loss_carried_forwards_years,
                "Result": random.uniform(-1000, 1000)
            }
            data.append(data_dict)
    return data
To check that the results balance, I do the following:
grouped_df = indiv_df.groupby(by=["Underwriting Year", "combined_line"]).sum().reset_index()
assert_frame_equal(combined_df, grouped_df)
I can't get the calculation right going back from the combined level to the individual level, so that grouping and summing the individual level equals the combined level.
The problem is that the grouped data shows the aggregate loss or profit after netting all the individual lines together. Calculating the result after loss carried forward independently at both levels and then trying to equate them will, in most cases, not work.
This is because some lines will have had a negative result, which is carried forward, while others will have had a positive one, which is not. If for a given year the positive values outweigh the negative ones, the grouped data does not carry the negative individual results forward, which causes the difference.
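A two-line toy year (hypothetical numbers) makes the mismatch concrete:

```python
import pandas as pd

# Toy example: in the same year, line A makes a profit and line B a loss.
indiv = pd.DataFrame({
    "line": ["A", "B"],
    "Result": [100.0, -40.0],
})

# Individually, only B's negative result would be carried forward.
indiv_lcf = indiv["Result"].where(indiv["Result"] < 0, 0.0).sum()

# Grouped first, the year nets to +60, so nothing is carried forward.
grouped_result = indiv["Result"].sum()
grouped_lcf = min(grouped_result, 0.0)

print(indiv_lcf, grouped_lcf)  # -40.0 vs 0.0: the two levels disagree
```

Grouping first nets the loss away, so the two levels can only agree if something is also carried forward on the individual level to offset this.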
Here is the code I wrote that calculates the two variants; although the code is nearly identical, a difference is inevitable because of when the aggregation occurs.
import numpy as np
import pandas as pd

# data from the generator above
df = pd.DataFrame(generate_data("COMB"))

""" Creating the individual data """
individ_df = pd.DataFrame()
# for each individual "line" in each "combined_line"
for grp, dat in df.groupby(["combined_line", "line"]):
    # sort values by underwriting year
    dat = dat.sort_values(by="Underwriting Year")
    # loss carried forward is a shifted 4-year rolling sum of the negative results
    # (.to_numpy() avoids index misalignment when assigning back to dat)
    dat["Loss Carried Forward"] = pd.Series(np.where(dat["Result"] < 0, dat["Result"], 0)).rolling(4, min_periods=1).sum().shift(1).fillna(0).to_numpy()
    # result after loss carried forward is result plus loss carried forward
    dat["Result After Loss Carried Forward"] = dat["Result"] + dat["Loss Carried Forward"]
    # concatenate this result to the dataframe
    individ_df = pd.concat([individ_df, dat], axis=0)

""" Grouped calculations """
# This is exactly the same, but grouped by combined_line, not by individual line
grouped_df = df.groupby(by=["Underwriting Year", "combined_line"]).sum(numeric_only=True).reset_index()
grouped_df["Loss Carried Forward"] = pd.Series(np.where(grouped_df["Result"] < 0, grouped_df["Result"], 0)).rolling(4, min_periods=1).sum().shift(1).fillna(0)
grouped_df["Result After Loss Carried Forward"] = grouped_df["Result"] + grouped_df["Loss Carried Forward"]

""" Checking the results of the "Result After Loss Carried Forward" """
# individuals grouped
individ_df.groupby(["combined_line", "Underwriting Year"])["Result After Loss Carried Forward"].sum()
# grouped_df
grouped_df["Result After Loss Carried Forward"]
I generate a plot with the following code:
data = {'BestFit_rej_ratio': {0: 0.1975987994, 1: 0.2006003002, 2: 0.1790895448, 3: 0.2216108054, 4: 0.1785892946, 5: 0.1890945473, 6: 0.1780890445, 7: 0.1780890445, 8: 0.2016008004, 9: 0.1900950475, 10: 0.1985992996, 11: 0.2031015508, 12: 0.2046023012, 13: 0.2071035518, 14: 0.1750875438, 15: 0.2166083042, 16: 0.1725862931, 17: 0.188094047, 18: 0.1870935468, 19: 0.1895947974, 20: 0.004502251126, 21: 0.006503251626, 22: 0.005002501251, 23: 0.006503251626, 24: 0.008004002001, 25: 0.006003001501, 26: 0.00300150075, 27: 0.005502751376, 28: 0.0100050025, 29: 0.005002501251, 30: 0.006003001501, 31: 0.005502751376, 32: 0.007503751876, 33: 0.005502751376, 34: 0.005502751376, 35: 0.005502751376, 36: 0.007503751876, 37: 0.005002501251, 38: 0.004002001001, 39: 0.009004502251, 40: 0.4172086043, 41: 0.4322161081, 42: 0.4017008504, 43: 0.4247123562, 44: 0.4292146073, 45: 0.4077038519, 46: 0.4282141071, 47: 0.4637318659, 48: 0.4392196098, 49: 0.4172086043, 50: 0.4187093547, 51: 0.4057028514, 52: 0.4287143572, 53: 0.4242121061, 54: 0.4347173587, 55: 0.4307153577, 56: 0.4102051026, 57: 0.4437218609, 58: 0.4212106053, 59: 0.4172086043}, 'MDP_rej_ratio': {0: 0.1660830415, 1: 0.1605802901, 2: 0.152076038, 3: 0.1885942971, 4: 0.152076038, 5: 0.1565782891, 6: 0.1445722861, 7: 0.1570785393, 8: 0.1705852926, 9: 0.1605802901, 10: 0.1740870435, 11: 0.1670835418, 12: 0.1805902951, 13: 0.1740870435, 14: 0.1460730365, 15: 0.1810905453, 16: 0.1425712856, 17: 0.1580790395, 18: 0.1455727864, 19: 0.1590795398, 20: 0.001500750375, 21: 0.00300150075, 22: 0.002501250625, 23: 0.002501250625, 24: 0.0020010005, 25: 0.002501250625, 26: 0.0020010005, 27: 0.001500750375, 28: 0.004002001001, 29: 0.00300150075, 30: 0.0020010005, 31: 0.0, 32: 0.004002001001, 33: 0.0005002501251, 34: 0.0020010005, 35: 0.0, 36: 0.004502251126, 37: 0.002501250625, 38: 0.001500750375, 39: 0.004002001001, 40: 0.3851925963, 41: 0.3851925963, 42: 0.4097048524, 43: 0.3756878439, 44: 0.4112056028, 45: 0.4212106053, 46: 0.3791895948, 
47: 0.4127063532, 48: 0.4432216108, 49: 0.4152076038, 50: 0.3871935968, 51: 0.4197098549, 52: 0.3896948474, 53: 0.4107053527, 54: 0.4062031016, 55: 0.4252126063, 56: 0.4112056028, 57: 0.3931965983, 58: 0.4372186093, 59: 0.4157078539}, 'Q-Learning_rej_ratio': {0: 0.1790895448, 1: 0.1645822911, 2: 0.1545772886, 3: 0.1905952976, 4: 0.1510755378, 5: 0.1595797899, 6: 0.148074037, 7: 0.1575787894, 8: 0.1715857929, 9: 0.1590795398, 10: 0.1690845423, 11: 0.168084042, 12: 0.180090045, 13: 0.1785892946, 14: 0.1495747874, 15: 0.1815907954, 16: 0.1435717859, 17: 0.1685842921, 18: 0.1505752876, 19: 0.1670835418, 20: 0.001500750375, 21: 0.00300150075, 22: 0.002501250625, 23: 0.002501250625, 24: 0.0020010005, 25: 0.002501250625, 26: 0.0020010005, 27: 0.001500750375, 28: 0.004002001001, 29: 0.00300150075, 30: 0.0020010005, 31: 0.0, 32: 0.004002001001, 33: 0.0005002501251, 34: 0.0020010005, 35: 0.0, 36: 0.004502251126, 37: 0.002501250625, 38: 0.001500750375, 39: 0.004002001001, 40: 0.3856928464, 41: 0.4167083542, 42: 0.3786893447, 43: 0.4187093547, 44: 0.4157078539, 45: 0.392196098, 46: 0.4032016008, 47: 0.4452226113, 48: 0.4217108554, 49: 0.3876938469, 50: 0.4192096048, 51: 0.388194097, 52: 0.4122061031, 53: 0.4152076038, 54: 0.4172086043, 55: 0.4137068534, 56: 0.3956978489, 57: 0.4342171086, 58: 0.4082041021, 59: 0.4032016008}, 'Parametrized_factor': {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 0.2, 22: 0.2, 23: 0.2, 24: 0.2, 25: 0.2, 26: 0.2, 27: 0.2, 28: 0.2, 29: 0.2, 30: 0.2, 31: 0.2, 32: 0.2, 33: 0.2, 34: 0.2, 35: 0.2, 36: 0.2, 37: 0.2, 38: 0.2, 39: 0.2, 40: 2.0, 41: 2.0, 42: 2.0, 43: 2.0, 44: 2.0, 45: 2.0, 46: 2.0, 47: 2.0, 48: 2.0, 49: 2.0, 50: 2.0, 51: 2.0, 52: 2.0, 53: 2.0, 54: 2.0, 55: 2.0, 56: 2.0, 57: 2.0, 58: 2.0, 59: 2.0}}
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data2 = pd.DataFrame(data)
# figure size
plt.figure(figsize=(12, 8))
ax = sns.pointplot(y="BestFit_rej_ratio", x="Parametrized_factor", data=data2, linestyles='-.', color='g', capsize=.1, scale=.2, errwidth=.5)
ax = sns.pointplot(y="MDP_rej_ratio", x="Parametrized_factor", data=data2, linestyles='-', color='r', capsize=.12, scale=.2, errwidth=.5)
ax = sns.pointplot(y="Q-Learning_rej_ratio", x="Parametrized_factor", data=data2, linestyles=':', color='k', capsize=.15, scale=.5, errwidth=.5)
# one legend call (a second ax.legend() would replace the first and
# discard its bbox_to_anchor placement)
labels = ax.legend(['BestFit', 'MDP', 'Q-Learning'],
                   bbox_to_anchor=(1.15, 1), loc='upper left')
colors = ['green', 'red', 'black']
for text, color in zip(labels.get_texts(), colors):
    text.set_color(color)
# for legend text
plt.setp(ax.get_legend().get_texts(), fontsize='12')
ax.set_ylabel('Rejection ratio')
ax.set_xlabel('Parametrized factor')
plt.show()
The problem is that the plot does not respect the scale of the x-axis values (Parametrized_factor). How can I solve it?
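What is likely happening: sns.pointplot treats the x variable as categorical, so 0.2, 1.0 and 2.0 are placed at equal spacing regardless of their numeric values. A sketch of a workaround using a true numeric axis with plain matplotlib on long-form data (the small data2 stand-in here is invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# small stand-in for data2
data2 = pd.DataFrame({
    "BestFit_rej_ratio": [0.19, 0.005, 0.42],
    "MDP_rej_ratio": [0.16, 0.002, 0.40],
    "Q-Learning_rej_ratio": [0.17, 0.002, 0.41],
    "Parametrized_factor": [1.0, 0.2, 2.0],
})

# long form: one row per (factor, algorithm) pair
long = data2.melt(id_vars="Parametrized_factor",
                  var_name="algorithm", value_name="rej_ratio")

fig, ax = plt.subplots(figsize=(12, 8))
for name, grp in long.groupby("algorithm"):
    grp = grp.sort_values("Parametrized_factor")
    # plt.plot keeps the x axis numeric, so 0.2/1.0/2.0 get their true spacing
    ax.plot(grp["Parametrized_factor"], grp["rej_ratio"], marker="o", label=name)
ax.set_xlabel("Parametrized factor")
ax.set_ylabel("Rejection ratio")
ax.legend()
```

sns.lineplot on the same long-form frame would also work, since it keeps the x axis numeric.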
Problem:
I have two pandas dataframes of ~3 GB each that I need to join on 1) postcode and 2) every combination of the house_identifier variables, until either a row has found a join (in a for loop) or all variable-to-variable combinations have failed for that row.
Once a row has joined on two columns in the loop, it is appended to a separate list and removed from the dataframes. postcode is a non-unique index.
Table columns (not hierarchical)
dataset_1 has these variables:
postcode, house_identifier_1, house_identifier_2, house_identifier_3, id
dataset_2 has these variables:
postcode, house_identifier_a, house_identifier_b, id_2
Column combinations to join in a loop:
table_1_variables = ['number_x', 'number_y', 'number_z']
table_2_variables = ['number_a', 'number_b']
for i in table_1_variables:
    for j in table_2_variables:
        ...
To join the tables efficiently, one strategy seems to be to first join on the indexes (postcodes) and then on the non-indexed columns. However, this appears to create a very large intermediate join, which would push an 8 GB machine over its memory limit, and the syntax for combining (left_index=True, right_index=True) with (left_on=, right_on=) is also unclear.
Meanwhile, indexing/re-indexing and then sorting the index inside a loop seems very inefficient.
Is there a better way to join or merge these efficiently?
Example of intersects:
{'id': {27: '{582D0636-8DEF-8F22-E053-6C04A8C01BAC}',
41: '{D9E869FE-7B55-4C36-AC43-695B9033A13B}',
33: '{93E6821E-554E-40FD-E053-6B04A8C0C1DF}',
1: '{288DCE29-0589-E510-E050-A8C06205480E}',
48: '{3A23DDD5-A0E8-41D2-A514-5B09385C301F}',
52: '{CEB16957-F7FA-4D1B-B45F-A390214735BC}',
13: '{404A5AF3-9B20-CD2B-E050-A8C063055C7B}',
16: '{64342BFD-FD07-422C-E053-6C04A8C0FB8A}',
57: '{29A8E769-8A10-4477-9494-FF55EF5FAE4B}',
10: '{404A5AF3-0B58-CD2B-E050-A8C063055C7B}',
21: '{55BDCAE6-0C10-521D-E053-6B04A8C0DD7A}',
31: '{5C676A02-1781-4152-950C-6E5CA2CBC487}',
7: '{68FEB20B-142E-38DA-E053-6C04A8C051AE}',
45: '{8F1B26BD-673F-53DB-E053-6C04A8C03649}',
12: '{2F115F7A-8F81-4124-9FD4-FB76E742B2C1}',
36: '{344AB2D7-4B59-4AB4-8F52-75B29BE8C509}',
20: '{965B6D91-D4B6-95E4-E053-6C04A8C07729}',
56: '{59872FD9-F39D-4BB9-95F6-91E002D948B1}',
22: '{6141DFF0-973F-4FEC-A582-7F310B566031}'},
'id_2': {27: 10002277489,
41: 64023255,
33: 10007367447,
1: 22229221,
48: 10033235735,
52: 100062162615,
13: 50103744,
16: 10022903998,
57: 12015624,
10: 12154940,
21: 10024247587,
31: 100041193990,
7: 10008230730,
45: 10091640210,
12: 202107394,
36: 5062293,
20: 48114659,
56: 10001311242,
22: 10000443154},
'postcode': {27: 'lu72la',
41: 'cf626nt',
33: 'hr40aq',
1: 'bn32pd',
48: 'sg13ae',
52: 'gu97jx',
13: 'ct202ef',
16: 'bh14rn',
57: 'ub24af',
10: 'w55bu',
21: 'po302dp',
31: 'tq148aq',
7: 'e82ag',
45: 'ch47ew',
12: 'ha90ae',
36: 'nw34tt',
20: 'sw192rw',
56: 'so143hw',
22: 'se218hp'},
'house_identifier_1': {27: '76',
41: 'flat6',
33: '49',
1: 'flat10',
48: '145',
52: '31',
13: 'flat19',
16: 'flat7',
57: '76',
10: 'flat1',
21: 'flat1',
31: 'flat43',
7: 'flata',
45: '8',
12: '42',
36: 'flat9',
20: 'flat43',
56: 'flat156',
22: 'flat2'},
'house_identifier_2': {27: 'eastdock',
41: 'courtlands',
33: 'watkinscourt',
1: 'ascothouse',
48: 'monumentcourt',
52: 'sumnercourt',
13: '22-24',
16: '77',
57: 'osterleyviews',
10: '55-59',
21: '138',
31: 'leandercourt',
7: '130',
45: 'greenbankhall',
12: 'danescourt',
36: 'holmefieldcourt',
20: 'bennetscourtyard',
56: 'oceanaboulevard',
22: '124f'},
'house_identifier_3': {27: 'eastdock',
41: 'courtlands',
33: 'watkinscourt',
1: 'ascothouse',
48: 'monumentcourt',
52: 'sumnercourt',
13: None,
16: None,
57: 'osterleyviews',
10: None,
21: None,
31: 'leandercourt',
7: None,
45: 'greenbankhall',
12: 'danescourt',
36: 'holmefieldcourt',
20: 'bennetscourtyard',
56: 'oceanaboulevard',
22: None},
'house_identifier_a': {27: None,
41: None,
33: None,
1: '18-20',
48: None,
52: None,
13: '22-24',
16: '77',
57: None,
10: '55-59',
21: '138',
31: None,
7: '130',
45: None,
12: None,
36: None,
20: None,
56: None,
22: '124f'},
'house_identifier_b': {27: '76',
41: 'flat6',
33: '49',
1: 'flat10',
48: '145',
52: '31',
13: 'flat19',
16: 'flat7',
57: '76',
10: 'flat1',
21: 'flat1',
31: 'flat43',
7: 'flata',
45: '8',
12: '42',
36: 'flat9',
20: 'flat43',
56: 'flat156',
22: 'flat2'}}
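One way to structure the loop without a giant postcode-only intermediate is to merge on both keys at once (postcode plus one identifier pair per iteration) and drop the matched rows before the next attempt. A sketch with tiny invented tables (the values and column subsets are illustrative, not your data):

```python
import pandas as pd

# hypothetical mini versions of the two tables
d1 = pd.DataFrame({
    "postcode": ["ab12cd", "ab12cd", "ef34gh"],
    "house_identifier_1": ["flat1", "12", "7"],
    "house_identifier_2": ["oakhouse", "main", "elm"],
    "id": [1, 2, 3],
})
d2 = pd.DataFrame({
    "postcode": ["ab12cd", "ef34gh"],
    "house_identifier_a": ["oakhouse", None],
    "house_identifier_b": ["flat1", "7"],
    "id_2": [101, 103],
})

matches = []
remaining = d1
for c1 in ["house_identifier_1", "house_identifier_2"]:
    for c2 in ["house_identifier_a", "house_identifier_b"]:
        # merge on postcode AND one identifier pair in a single step,
        # so the intermediate is never larger than the actual matches
        m = remaining.merge(d2, left_on=["postcode", c1],
                            right_on=["postcode", c2], how="inner")
        if not m.empty:
            matches.append(m[["id", "id_2"]])
            # drop rows of d1 that have already found a match
            remaining = remaining[~remaining["id"].isin(m["id"])]

result = pd.concat(matches, ignore_index=True)
```

Merging on ['postcode', identifier] directly avoids the postcode-only join that multiplies out every household within a postcode before filtering.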
I am making a romaji-to-hiragana translator and am getting an error when I try this concatenation. I made a list of keys and am using a for loop to build a dictionary, relying on the sequential nature of Unicode.
combos = {}
for hexy in range(12363, 12435):
    combos[sounds[12363 - hexy]] = ('\u%s' % str(chr(hex(hexy))))
I get the error
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
I have tried doubling the backslashes, separating the u and backslash, using a + instead of value insertion, and checking out when other people got similar errors.
If you want the character, just use chr(hexy).
If you want an escape code and assuming Python 3.6+, use f'\\u{hexy:04x}'.
For example (I dropped sounds since it wasn't defined):
combos = {}
for i, hexy in enumerate(range(12363, 12435)):
    combos[i] = chr(hexy)
print(combos)
Result:
{0: 'か', 1: 'が', 2: 'き', 3: 'ぎ', 4: 'く', 5: 'ぐ', 6: 'け', 7: 'げ', 8: 'こ', 9: 'ご', 10: 'さ', 11: 'ざ', 12: 'し', 13: 'じ', 14: 'す', 15: 'ず', 16: 'せ', 17: 'ぜ', 18: 'そ', 19: 'ぞ', 20: 'た', 21: 'だ', 22: 'ち', 23: 'ぢ', 24: 'っ', 25: 'つ', 26: 'づ', 27: 'て', 28: 'で', 29: 'と', 30: 'ど', 31: 'な', 32: 'に', 33: 'ぬ', 34: 'ね', 35: 'の', 36: 'は', 37: 'ば', 38: 'ぱ', 39: 'ひ', 40: 'び', 41: 'ぴ', 42: 'ふ', 43: 'ぶ', 44: 'ぷ', 45: 'へ', 46: 'べ', 47: 'ぺ', 48: 'ほ', 49: 'ぼ', 50: 'ぽ', 51: 'ま', 52: 'み', 53: 'む', 54: 'め', 55: 'も', 56: 'ゃ', 57: 'や', 58: 'ゅ', 59: 'ゆ', 60: 'ょ', 61: 'よ', 62: 'ら', 63: 'り', 64: 'る', 65: 'れ', 66: 'ろ', 67: 'ゎ', 68: 'わ', 69: 'ゐ', 70: 'ゑ', 71: 'を'}
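To see the two options side by side (a quick check using a code point from the range above):

```python
# chr() turns the code point into the character itself;
# the f-string builds the literal six-character escape text instead.
hexy = 12363
char = chr(hexy)           # 'か'
escape = f"\\u{hexy:04x}"  # the text '\u304b', not the character
```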
I'd like to be able to assign the following keys to these values in Python:
Numbers 01 - 10 : 5.01
Numbers 11 - 20 : 7.02
Numbers 21 - 30 : 9.03
Numbers 31 - 40 : 11.04
Numbers 41 - 50 : 15.00
Numbers 51 - 60 : 17.08
Numbers 61 - 70 : 19.15
I know that this is possible:
rates = dict.fromkeys(range(1, 11), 5.01)
rates.update(dict.fromkeys(range(11, 21), 7.02))
# ...etc
and that's okay. However, is there a way to do this in one line or one initializer list in Python?
Use a dictionary comprehension and an initial mapping:
numbers = {1: 5.01, 11: 7.02, 21: 9.03, 31: 11.04, 41: 15.0, 51: 17.08, 61: 19.15}
numbers = {k: v for start, v in numbers.items() for k in range(start, start + 10)}
Demo:
>>> from pprint import pprint
>>> numbers = {1: 5.01, 11: 7.02, 21: 9.03, 31: 11.04, 41: 15.0, 51: 17.08, 61: 19.15}
>>> numbers = {k: v for start, v in numbers.items() for k in range(start, start + 10)}
>>> pprint(numbers)
{1: 5.01,
2: 5.01,
3: 5.01,
4: 5.01,
5: 5.01,
6: 5.01,
7: 5.01,
8: 5.01,
9: 5.01,
10: 5.01,
11: 7.02,
12: 7.02,
13: 7.02,
14: 7.02,
15: 7.02,
16: 7.02,
17: 7.02,
18: 7.02,
19: 7.02,
20: 7.02,
21: 9.03,
22: 9.03,
23: 9.03,
24: 9.03,
25: 9.03,
26: 9.03,
27: 9.03,
28: 9.03,
29: 9.03,
30: 9.03,
31: 11.04,
32: 11.04,
33: 11.04,
34: 11.04,
35: 11.04,
36: 11.04,
37: 11.04,
38: 11.04,
39: 11.04,
40: 11.04,
41: 15.0,
42: 15.0,
43: 15.0,
44: 15.0,
45: 15.0,
46: 15.0,
47: 15.0,
48: 15.0,
49: 15.0,
50: 15.0,
51: 17.08,
52: 17.08,
53: 17.08,
54: 17.08,
55: 17.08,
56: 17.08,
57: 17.08,
58: 17.08,
59: 17.08,
60: 17.08,
61: 19.15,
62: 19.15,
63: 19.15,
64: 19.15,
65: 19.15,
66: 19.15,
67: 19.15,
68: 19.15,
69: 19.15,
70: 19.15}
The dictionary expression produces both a key and a value for each iteration of the loops. There are two loops in that expression, and you need to read them from left to right as nested in that order. Written out as a non-comprehension set of loops, you'd get:
numbers = {1: 5.01, 11: 7.02, 21: 9.03, 31: 11.04, 41: 15.0, 51: 17.08, 61: 19.15}
output = {}
# loop over the (key, value) pairs in the numbers dictionary
for start, v in numbers.items():
    for k in range(start, start + 10):
        output[k] = v
numbers = output
Essentially the keys in the original numbers dictionary are turned into ranges to form 10 new keys in the output dictionary, all with the same value.
I have printed an output in the Python shell like:
>>>{1: 117.33282674772036, 2: 119.55324074074075, 3: 116.45497076023392, 4: 113.77561475409836, 5: 112.93896713615024, 6: 114.23583333333333, 7: 124.92402972749794, 8: 121.40603448275863, 9: 116.4946452476573, 10: 112.89107142857142, 11: 122.33312577833125, 12: 116.57083333333334, 13: 122.2856334841629, 14: 125.26688815060908, 15: 129.13817204301074, 16: 128.78991596638656, 17: 127.54600301659126, 18: 133.65972222222223, 19: 127.28315789473685, 20: 125.07205882352942, 21: 124.79464285714286, 22: 131.36170212765958, 23: 130.17974002689377, 24: 138.37055555555557, 25: 132.72380952380954, 26: 138.44230769230768, 27: 134.82251082251082, 28: 147.12448979591838, 29: 149.86879730866275, 30: 145.04521072796936, 31: 143.72442396313363, 32: 148.12940140845072, 33: 140.06355218855219, 34: 145.44537815126051, 35: 146.50366300366301, 36: 146.2173611111111, 37: 152.36319881525361, 38: 156.42249459264599, 39: 154.6977564102564, 40: 155.47647058823529, 41: 158.72357723577235, 42: 162.23746031746032, 43: 149.30991931656382, 44: ........
It represents adjacent neighbors. How can I save this output to a text file in Python, line by line,
like:
1:117.3328268788
2:119.5532822788
Something like this:
with open('some_file.txt', 'w') as f:
    for k in sorted(your_dic):
        f.write("{}:{}\n".format(k, your_dic[k]))