Create a hierarchical tree with node names as str - Python

I need to create a tree showing the calculation of a formula at each level. I receive the data as dictionaries:
level1={'Operating Cash Flow':['Total Revenue','Operating Expenses']}
level2={'Total Revenue':['Net Income','Change in Asset'],'Operating Expenses':['Non Cash Expense','XYZ']}
And so on. The idea is to create complex trackers using publicly available data, and then output them (in Excel) in a hierarchical format that shows the calculation. Something like this:
+-------------------------------------------------------+
|                  Operating Cash Flow                   |
+-------------------------------------------------------+
|                           60                           |
+------------------------------+------------------------+
|        Total Revenue         |   Operating Expenses   |
+------------------------------+------------------------+
|              10              |           50           |
+------------+-----------------+------------------+-----+
| Net Income | Change in Asset | Non Cash Expense | XYZ |
+------------+-----------------+------------------+-----+
|     20     |       -10       |        40        | 10  |
+------------+-----------------+------------------+-----+
So I tried to make a tree-like structure for this.
class Node(object):
    def __init__(self):
        self.score = 0       # numeric value for this node
        self.fields = []     # field name(s) attached to this node
        self.weights = 1     # intended to track the node's column span
        self.children = []   # child Node objects

    def add_child(self, obj):
        self.children.append(obj)
        self.weights = self.weights + obj.weights - 1

    def enter_score(self, value):
        self.score = value
where Node.fields will store the field names and Node.children will point to the child nodes. Since the field names in the dicts are str, I am not sure how to build the complete tree. Any help on how to output this to Excel once the tree is done would also be appreciated.
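A minimal sketch of one way to do it, not the only design: keep a dict mapping each name to its node so the string keys resolve to objects, then walk the tree and write one merged cell per node. It assumes openpyxl for the Excel output (any Excel writer works) and uses its own small Node with a name attribute:

from openpyxl import Workbook

class Node:
    def __init__(self, name, score=0):
        self.name = name
        self.score = score
        self.children = []

    def n_leaves(self):
        # A node spans as many columns as it has leaf descendants.
        return sum(c.n_leaves() for c in self.children) if self.children else 1

def build_tree(*levels):
    nodes = {}

    def get(name):
        return nodes.setdefault(name, Node(name))

    for level in levels:
        for parent, children in level.items():
            for child in children:
                get(parent).children.append(get(child))
    # Roots are names that never appear as anyone's child.
    child_names = {c for lvl in levels for cs in lvl.values() for c in cs}
    return [n for name, n in nodes.items() if name not in child_names]

def write_excel(root, path):
    wb = Workbook()
    ws = wb.active

    def emit(node, row, col):
        # Write name and score, merging across the node's leaf span.
        span = node.n_leaves()
        ws.cell(row=row, column=col, value=node.name)
        ws.cell(row=row + 1, column=col, value=node.score)
        if span > 1:
            for r in (row, row + 1):
                ws.merge_cells(start_row=r, start_column=col,
                               end_row=r, end_column=col + span - 1)
        c = col
        for child in node.children:
            emit(child, row + 2, c)
            c += child.n_leaves()

    emit(root, 1, 1)
    wb.save(path)

roots = build_tree(level1, level2)
roots[0].score = 60  # scores would come from your data
write_excel(roots[0], 'tracker.xlsx')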

Related

Adding a filter menu on folium heatmap with time

I'm working with a car accidents dataset, and I want to create a folium heatmap with time, that shows all the casualties with their location. Here's a sample of the data called df_weather_casualties:
| Date2      | Latitude  | Longitude | Number_of_Casualties | Weather_Details       |
|------------|-----------|-----------|----------------------|-----------------------|
| 2005-01-04 | 51.489096 | -0.191170 | 1                    | Raining no high winds |
| 2005-01-05 | 51.520075 | -0.211708 | 1                    | Fine no high winds    |
| 2005-01-06 | 51.525301 | -0.206458 | 1                    | Fine no high winds    |
| 2005-01-06 | 51.482442 | -0.173862 | 1                    | Fine no high winds    |
| 2005-01-10 | 51.495752 | -0.156618 | 1                    | Fine no high winds    |
Because there are duplicate dates (as there are multiple casualties in a day) I've created a time index with:
time_index = list(df_weather_casualties['Date2'].sort_values().astype('str').unique())
I've created a list of lists, where each element contains another list holding the Latitude, Longitude, Number_of_Casualties, and Weather_Details for each date, with the following code:
df_weather_casualties = df_weather_casualties.sort_values('Date2')
weather_casualties_data = []
for _, d in df_weather_casualties.groupby('Date2'):
    weather_casualties_data.append(
        [[row['Latitude'], row['Longitude'], row['Number_of_Casualties'], row['Weather_Details']]
         for _, row in d.iterrows()])
So the first element looks like this:
[[51.516575, -0.08126, 1, 'Fine no high winds'],
[51.512515, -0.130576, 1, 'Fine no high winds'],
[51.542651, -0.148234, 1, 'Raining no high winds']]
I've created a folium map with the following code:
import folium
from folium.plugins import HeatMapWithTime

hmt = folium.Map(location=[55.000, -2.0000],
                 tiles='cartodbdark_matter',
                 zoom_start=5,
                 control_scale=True)

HeatMapWithTime(
    weather_casualties_data,
    index=time_index,
    auto_play=False,
    blur=1.0,
    radius=8,
    max_opacity=0.4
).add_to(hmt)

hmt
How can I add a filter menu to the map that filters through the different "Weather_Details" values, only showing the accidents belonging to each weather detail? I've seen people do it, but I've been unable to figure it out.
This code does what I want with a normal map, but I'm trying to get it to work with the time element:
from folium.plugins import HeatMap

hm = folium.Map(location=[55.000, -2.0000],
                tiles='cartodbdark_matter',
                zoom_start=5,
                control_scale=True)

for weather in df_weather_casualties["Weather_Details"].unique():
    weather_group = folium.FeatureGroup(name=weather)
    HeatMap(data=df_weather_casualties[df_weather_casualties["Weather_Details"] == weather][["Latitude", "Longitude"]],
            blur=1.0, radius=8, max_opacity=0.4).add_to(weather_group)
    weather_group.add_to(hm)

folium.LayerControl().add_to(hm)
hm
Any help or pointers would be great.
Thanks.
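Not a verified answer, but one direction worth trying is combining the two snippets: give each weather value its own FeatureGroup containing its own HeatMapWithTime, built from a per-weather time index and data list, then add LayerControl. How well the time slider cooperates with LayerControl can depend on the folium version, so treat this as a sketch. Note that HeatMapWithTime expects [lat, lon] or [lat, lon, weight] entries, so Weather_Details is used here only to split the data:

import folium
from folium.plugins import HeatMapWithTime

hmt = folium.Map(location=[55.000, -2.0000],
                 tiles='cartodbdark_matter',
                 zoom_start=5,
                 control_scale=True)

for weather in df_weather_casualties['Weather_Details'].unique():
    subset = df_weather_casualties[df_weather_casualties['Weather_Details'] == weather]
    index = list(subset['Date2'].sort_values().astype('str').unique())
    data = [
        [[row['Latitude'], row['Longitude'], row['Number_of_Casualties']]
         for _, row in day.iterrows()]
        for _, day in subset.groupby('Date2')
    ]
    group = folium.FeatureGroup(name=weather, show=False)
    HeatMapWithTime(data, index=index, auto_play=False,
                    blur=1.0, radius=8, max_opacity=0.4).add_to(group)
    group.add_to(hmt)

folium.LayerControl().add_to(hmt)
hmt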

How to split a 2d-list based on a condition in python?

I have a 2d list, which is an accumulation of an unknown number of tables.
| Table | Value  | Name     |
|-------|--------|----------|
| "1"   | "0.1"  | "part_1" |
| "1"   | "0.11" | "part_1" |
| "2"   | "2.0"  | "e_1"    |
| ...   | ...    | ...      |
| "139" | "20.0" | "kf_1"   |
Now I want to split this list into another structure (a list?) where I can access the different tables directly, like:
tables[0][0][1] = "0.1"
tables[1][0][2] = "e_1"
In total this amounts to more than 50 million data rows, so RAM efficiency is important.
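A minimal sketch, assuming the rows arrive grouped by their Table column as in the sample: itertools.groupby splits the list lazily, and a plain list of lists adds little overhead beyond the rows themselves.

from itertools import groupby
from operator import itemgetter

rows = [
    ["1", "0.1", "part_1"],
    ["1", "0.11", "part_1"],
    ["2", "2.0", "e_1"],
    ["139", "20.0", "kf_1"],
]

# Group consecutive rows that share the same value in column 0.
tables = [list(group) for _, group in groupby(rows, key=itemgetter(0))]

print(tables[0][0][1])  # "0.1"
print(tables[1][0][2])  # "e_1"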

PySpark - How to loop through the dataframe and match against another common value in another dataframe

This is a recommender system. I have a DataFrame containing about 10 recommended items for each user (recommendation_df), and another DataFrame containing the recent purchases of each user (recent_df).
I am trying to code out this task, but I can't seem to get the syntax and the manipulation right.
I am implementing a hit/miss ratio: for every new_party_id in recent_df, if any of its merch_store_code values matches a merch_store_code for the same party_id in recommendation_df, count += 1 (a hit).
Then I calculate the hit/miss ratio as count divided by the total user count.
(In recent_df each user might have multiple recent purchases, but if any of those purchases is in the recommendation list for the same user, it counts as a single hit, count += 1.)
recommendation_df
+--------------+----------------+-----------+----------+
|party_id_index|merch_store_code| rating| party_id|
+--------------+----------------+-----------+----------+
| 148| 900000166| 0.4021678|G18B00332C|
| 148| 168339566| 0.27687865|G18B00332C|
| 148| 168993309| 0.15999989|G18B00332C|
| 148| 168350313| 0.1431974|G18B00332C|
| 148| 168329726| 0.13634883|G18B00332C|
| 148| 168351967|0.120235085|G18B00332C|
| 148| 168993312| 0.11800903|G18B00332C|
| 148| 168337234|0.116267696|G18B00332C|
| 148| 168993256| 0.10836013|G18B00332C|
| 148| 168339482| 0.10341005|G18B00332C|
| 463| 168350313| 0.93455887|K18M926299|
| 463| 900000072| 0.8275664|K18M926299|
| 463| 700012303| 0.70220494|K18M926299|
| 463| 700012180| 0.23209469|K18M926299|
| 463| 900000157| 0.1727839|K18M926299|
| 463| 700013689| 0.13854747|K18M926299|
| 463| 900000166| 0.12866624|K18M926299|
| 463| 168993284|0.107065596|K18M926299|
| 463| 168993269| 0.10272527|K18M926299|
| 463| 168339566| 0.10256036|K18M926299|
+--------------+----------------+-----------+----------+
recent_df
+------------+---------------+----------------+
|new_party_id|recent_purchase|merch_store_code|
+------------+---------------+----------------+
| A11275842R| 2022-05-21| 168289403|
| A131584211| 2022-06-01| 168993311|
| A131584211| 2022-06-01| 168349493|
| A131584211| 2022-06-01| 168350192|
| A182P3539K| 2022-03-26| 168341707|
| A182V2883F| 2022-05-26| 168350824|
| A183B5482P| 2022-05-10| 168993464|
| A183C6900K| 2022-05-14| 168338795|
| A183D56093| 2022-05-20| 700012303|
| A183J5388G| 2022-03-18| 700013650|
| A183U8880P| 2022-04-01| 900000072|
| A183U8880P| 2022-04-01| 168991904|
| A18409762L| 2022-05-10| 168319352|
| A18431276J| 2022-05-14| 168163905|
| A18433684M| 2022-03-21| 168993324|
| A18433978F| 2022-05-20| 168341876|
| A184410389| 2022-05-04| 900000166|
| A184716280| 2022-04-06| 700013653|
| A18473797O| 2022-05-24| 168330339|
| A18473797O| 2022-05-24| 168350592|
+------------+---------------+----------------+
Here is my current coding logic:
count = 0
def hitratio(recommendation_df, recent_df):
    for i in recent_df['new_party_id']:
        for j in recommendation_df['party_id']:
            if (i = j) & i.merch_store_code == j.merch_store_code:
                count += 1
    return (count / recent_df.count())
In Spark, refrain from looping over rows. Spark does not work like that; you need to think in terms of whole columns, not row-by-row operations.
You need to join both tables on the user and the store code, then select the matched users without duplicates (distinct):
from pyspark.sql import functions as F

df_distinct_matches = (
    recent_df
    .join(
        recommendation_df,
        (recent_df['new_party_id'] == recommendation_df['party_id'])
        & (recent_df['merch_store_code'] == recommendation_df['merch_store_code'])
    )
    .select('party_id')
    .distinct()
)
hit = df_distinct_matches.count()
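To finish the ratio under the question's per-user definition (an assumption; swap in whichever denominator you prefer), divide by the number of distinct users in recent_df:

# hit users with at least one matching purchase / all users with recent purchases
total_users = recent_df.select('new_party_id').distinct().count()
hit_ratio = hit / total_users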
Assumption: I am taking the total row count of recent_df as the denominator for calculating the hit/miss ratio; you can change the formula.
from pyspark.sql import functions as F

matching_cond = (
    (recent_df['merch_store_code'] == recommendation_df['merch_store_code'])
    & (recommendation_df['party_id'].isNotNull())
)

df_recent_fnl = (
    recent_df
    .join(recommendation_df,
          recent_df['new_party_id'] == recommendation_df['party_id'],
          'left')
    .select(recent_df['*'],
            recommendation_df['merch_store_code'],
            recommendation_df['party_id'])
    .withColumn('hit', F.when(matching_cond, F.lit(True)).otherwise(F.lit(False)))
)

hit_miss_ratio = df_recent_fnl.filter(F.col('hit')).count() / recent_df.count()
Do let me know if you have any questions around this.
If you like my solution, you can upvote.

Django: get the latest record matching another column value

I have a model with several columns, among them equipment_id (a CharField) and date_saved (a DateTimeField).
I have multiple rows with the same equipment_id but different date_saved values (each time the user saves the record, I store the current date and time).
I want to retrieve the record that has a specific equipment_id and is the latest saved, i.e.:
| Equipment_id | Date_saved           |
|--------------|----------------------|
| 1061a        | 26-DEC-2020 10:10:23 |
| 1061a        | 26-DEC-2020 10:11:52 |
| 1061a        | 26-DEC-2020 10:22:03 |
| 1061a        | 26-DEC-2020 10:31:15 |
| 1062a        | 21-DEC-2020 10:11:52 |
| 1062a        | 25-DEC-2020 10:22:03 |
| 1073a        | 20-DEC-2020 10:31:15 |
I want to retrieve, for example, the latest record for equipment_id='1061a'.
I have tried various approaches without success:
prg = Program.objects.filter(equipment_id=id)
program = Program.objects.latest('date_saved')
When I use program, I get the latest record overall, with no relation to the previous filter.
You can chain the filtering:
result = Program.objects.filter(equipment_id=id).latest('date_saved')
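A usage note, as a short sketch: latest() raises Program.DoesNotExist when nothing matches the filter, so guard the lookup if the equipment_id may be absent.

try:
    program = Program.objects.filter(equipment_id='1061a').latest('date_saved')
except Program.DoesNotExist:
    program = None  # no record saved yet for this equipment_id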

Making Table without using Texttable

I am writing Python code to show items in a store. As I am still learning, I want to know how to make a table that looks exactly like a table made using Texttable.
My code is
Goods = ['Book', 'Gold']
Itemid = [711001, 711002]
Price = [200, 50000]
Count = [100, 2]
Category = ['Books', 'Jewelry']

titles = ['', 'Item Id', 'Price', 'Count', 'Category']
data = [titles] + list(zip(Goods, Itemid, Price, Count, Category))

for i, d in enumerate(data):
    line = '|'.join(str(x).ljust(12) for x in d)
    print(line)
    if i == 0:
        print('=' * len(line))
My output:
            |Item Id     |Price       |Count       |Category
================================================================
Book        |711001      |200         |100         |Books
Gold        |711002      |50000       |2           |Jewelry
Output I want:
+------+---------+-------+-------+-----------+
| | Item Id | Price | Count | Category |
+======+=========+=======+=======+===========+
| Book | 711001 | 200 | 100 | Books |
+------+---------+-------+-------+-----------+
| Gold | 711002 | 50000 | 2 | Jewelry |
+------+---------+-------+-------+-----------+
Your code is building the output by hand using str.join(). You can do it that way, but it is very tedious. Use string formatting instead.
To help you along, here is one line:
content_format = "| {Goods:4.4s} | {ItemId:<7d} | {Price:<5d} | {Count:<5d} | {Category:9s} |"
output_line = content_format.format(Goods="Book",ItemId=711001,Price=200,Count=100,Category="Books")
Texttable adjusts its cell widths to fit the data. If you want to do the same, you will have to put computed field widths in content_format instead of the numeric literals I used in the example above. Again, here is an example to get you going:
content_format = "| {Goods:4.4s} | {ItemId:<7d} | {Price:<5d} | {Count:<5d} | {Category:{CategoryWidth}s} |"
output_line = content_format.format(Goods="Book",ItemId=711001,Price=200,Count=100,Category="Books",CategoryWidth=9)
But if you already know how to do this using Texttable, why not use it? Your comment says it's not available in Python: not true; I just downloaded version 0.9.0 using pip.
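To make the computed-widths idea concrete, here is a minimal sketch (my own illustration, not part of the original answer) that measures each column and emits the Texttable-style borders by hand:

Goods = ['Book', 'Gold']
Itemid = [711001, 711002]
Price = [200, 50000]
Count = [100, 2]
Category = ['Books', 'Jewelry']

titles = ['', 'Item Id', 'Price', 'Count', 'Category']
rows = [titles] + [list(map(str, r)) for r in zip(Goods, Itemid, Price, Count, Category)]

# Each column is as wide as its widest cell.
widths = [max(len(row[i]) for row in rows) for i in range(len(titles))]

def border(ch):
    return '+' + '+'.join(ch * (w + 2) for w in widths) + '+'

print(border('-'))
for i, row in enumerate(rows):
    print('| ' + ' | '.join(cell.ljust(w) for cell, w in zip(row, widths)) + ' |')
    print(border('=' if i == 0 else '-'))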
