Inverse line graph year count matplotlib pandas python


I'm trying to create a line plot of the counts of three device groups (desktop, mobile and tablet), with the years 2014, 2015 and 2016 on the x axis, but I am getting an error.
My code is currently:
#year-by-year change
desktop14 = od.loc[(od.Account_Year_Week >= 201401) & (od.Account_Year_Week <= 201453) & (od.online_device_type_detail == "DESKTOP"), "Gross_Demand_Pre_Credit"]
desktop15 = od.loc[(od.Account_Year_Week >= 201501) & (od.Account_Year_Week <= 201553) & (od.online_device_type_detail == "DESKTOP"), "Gross_Demand_Pre_Credit"]
desktop16 = od.loc[(od.Account_Year_Week >= 201601) & (od.Account_Year_Week <= 201653) & (od.online_device_type_detail == "DESKTOP"), "Gross_Demand_Pre_Credit"]
mobile14 = od.loc[(od.Account_Year_Week >= 201401) & (od.Account_Year_Week <= 201453) & (od.online_device_type_detail == "MOBILE"), "Gross_Demand_Pre_Credit"]
mobile15 = od.loc[(od.Account_Year_Week >= 201501) & (od.Account_Year_Week <= 201553) & (od.online_device_type_detail == "MOBILE"), "Gross_Demand_Pre_Credit"]
mobile16 = od.loc[(od.Account_Year_Week >= 201601) & (od.Account_Year_Week <= 201653) & (od.online_device_type_detail == "MOBILE"), "Gross_Demand_Pre_Credit"]
tablet14 = od.loc[(od.Account_Year_Week >= 201401) & (od.Account_Year_Week <= 201453) & (od.online_device_type_detail == "TABLET"), "Gross_Demand_Pre_Credit"]
tablet15 = od.loc[(od.Account_Year_Week >= 201501) & (od.Account_Year_Week <= 201553) & (od.online_device_type_detail == "TABLET"), "Gross_Demand_Pre_Credit"]
tablet16 = od.loc[(od.Account_Year_Week >= 201601) & (od.Account_Year_Week <= 201653) & (od.online_device_type_detail == "TABLET"), "Gross_Demand_Pre_Credit"]
devicedata = [["Desktop", desktop14.count(), desktop15.count(), desktop16.count()], ["Mobile", mobile14.count(), mobile15.count(), mobile16.count()], ["Tablet", tablet14.count(), tablet15.count(), tablet16.count()]]
df = pd.DataFrame(devicedata, columns=["Device", "2014", "2015", "2016"]).set_index("Device")
plt.show()
I want each line to represent a device type, with the x axis showing the change from year to year. How do I do this (essentially swapping the axes)?
Any help is greatly appreciated.

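Transpose the DataFrame before plotting, so that the years become the index (the x axis) and each device becomes a column (one line per device):
df.transpose().plot()
plt.show()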
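As a side note, the nine separate .loc selections in the question can probably be collapsed into a single groupby. The following is only a sketch: the small od frame is made-up stand-in data, and it assumes Account_Year_Week is an integer of the form YYYYWW, as the comparisons in the question suggest.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `od` frame; all values are made up.
od = pd.DataFrame({
    "Account_Year_Week": [201401, 201430, 201502, 201515, 201540,
                          201601, 201452, 201553, 201620],
    "online_device_type_detail": ["DESKTOP", "MOBILE", "DESKTOP", "DESKTOP", "TABLET",
                                  "MOBILE", "DESKTOP", "MOBILE", "TABLET"],
    "Gross_Demand_Pre_Credit": [10.0, 5.0, 7.5, 3.0, 2.0, 8.0, 1.0, 4.0, 6.0],
})

# Assumption: the column is an integer YYYYWW, so integer division by 100 gives the year.
year = (od["Account_Year_Week"] // 100).rename("Year")

# Count non-null demand values per (year, device), then pivot devices into columns.
counts = (
    od.groupby([year, "online_device_type_detail"])["Gross_Demand_Pre_Credit"]
      .count()
      .unstack(fill_value=0)
)
print(counts)
```

Because the years end up in the index and the devices in the columns, calling counts.plot() followed by plt.show() should draw one line per device without a transpose.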

Related

xarray .where() function is too slow over datasets

I am using the .where() function to select matching times and apply certain criteria to xarray datasets.
import numpy as np
import xarray as xr
ds1 = xr.open_dataset('COD.nc')
ds2 = xr.open_dataset('CDNC.nc')
ds3 = xr.open_dataset('LWP.nc')
ds4 = xr.open_dataset('CTT.nc')
ds5 = xr.open_dataset('CTP.nc')
ds6 = xr.open_dataset('CER.nc')
ds11 = ds1.where((ds1.time == ds2.time))
ds22 = ds2.where((ds2.time == ds11.time))
ds33 = ds3.where((ds3.time == ds2.time))
ds44 = ds4.where((ds4.time == ds2.time))
ds55 = ds5.where((ds5.time == ds2.time))
ds66 = ds6.where((ds6.time == ds2.time))
COD = ds11.Cloud_Optical_Thickness
CDNC= ds22.Cloud_Droplet_Concentration
LWP = ds33.Cloud_Water_Path
CTT = ds44.Cloud_Top_Temperature
CTP = ds55.Cloud_Top_Pressure
CER = ds66.Cloud_Effective_Radius
cod = COD.where((CTT >= 273.0) & (CTP > 680.0) & (CER > 4) & (COD > 4))
lwp = LWP.where((CTT >= 273.0) & (CTP > 680.0) & (CER > 4) & (COD > 4))
cdnc = CDNC.where((CTT >= 273.0) & (CTP > 680.0) & (CER > 4) & (COD > 4))
but it is too slow, even for a small dataset.
The dimensions of each dataset are (time: 7555, lat: 35, lon: 71). It has been running for more than two hours.
Is there any way to speed this up? Thanks!

Create new column based on conditions of others

I have this df:
Segnale Prezzo Prezzo_exit
0 Long 44645 43302
1 Short 41169 44169
2 Long 44322 47093
3 Short 45323 42514
sample code to generate it:
tbl2 = {
"Segnale" : ["Long", "Short", "Long", "Short"],
"Prezzo" : [44645, 41169, 44322, 45323],
"Prezzo_exit" : [43302, 44169, 47093, 42514]}
df = pd.DataFrame(tbl2)
I need to create a new column named "esito" with these conditions:
if df["Segnale"] =="Long" and df["Prezzo"] < df["Prezzo_exit"] #row with "target"
if df["Segnale"] =="Long" and df["Prezzo"] > df["Prezzo_exit"] #row with "stop"
if df["Segnale"] =="Short" and df["Prezzo"] < df["Prezzo_exit"] #row with "stop"
if df["Segnale"] =="Short" and df["Prezzo"] > df["Prezzo_exit"] #row with "target"
So the final result will be:
Segnale Prezzo Prezzo_exit esito
0 Long 44645 43302 stop
1 Short 41169 44169 stop
2 Long 44322 47093 target
3 Short 45323 42514 target
I tried with no success:
df.loc[(df['Segnale'].str.contains('Long') & df['Prezzo'] < df['Prezzo_exit']), 'Esito'] = 'Target'
df.loc[(df['Segnale'].str.contains('Long') & df['Prezzo'] > df['Prezzo_exit']), 'Esito'] = 'Stop'
df.loc[(df['Segnale'].str.contains('Short') & df['Prezzo'] > df['Prezzo_exit']), 'Esito'] = 'Target'
df.loc[(df['Segnale'].str.contains('Short') & df['Prezzo'] > df['Prezzo_exit']), 'Esito'] = 'Stop'

This will do what your question asks:
df.loc[(df.Segnale=='Long') & (df.Prezzo < df.Prezzo_exit), 'esito'] = 'target'
df.loc[(df.Segnale=='Long') & (df.Prezzo > df.Prezzo_exit), 'esito'] = 'stop'
df.loc[(df.Segnale=='Short') & (df.Prezzo < df.Prezzo_exit), 'esito'] = 'stop'
df.loc[(df.Segnale=='Short') & (df.Prezzo > df.Prezzo_exit), 'esito'] = 'target'
Output:
Segnale Prezzo Prezzo_exit esito
0 Long 44645 43302 stop
1 Short 41169 44169 stop
2 Long 44322 47093 target
3 Short 45323 42514 target
UPDATE:
You could also do this:
df['esito'] = ( pd.Series(['stop']*len(df)).where(
((df.Segnale=='Long') & (df.Prezzo > df.Prezzo_exit)) | ((df.Segnale=='Short') & (df.Prezzo < df.Prezzo_exit)),
'target') )
... or this:
df['esito'] = ( np.where(
((df.Segnale=='Long') & (df.Prezzo > df.Prezzo_exit)) | ((df.Segnale=='Short') & (df.Prezzo < df.Prezzo_exit)),
'stop', 'target') )

You need to add parentheses around each comparison, because & binds more tightly than < and >:
(df['Prezzo'] < df['Prezzo_exit'])
To simplify, you can use np.select to pair each condition with its choice in a single statement.
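The np.select suggestion could look roughly like the sketch below. It uses the sample frame from the question. The empty-string default for rows where Prezzo equals Prezzo_exit is an assumption, since the question does not say what should happen on a tie.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Segnale": ["Long", "Short", "Long", "Short"],
    "Prezzo": [44645, 41169, 44322, 45323],
    "Prezzo_exit": [43302, 44169, 47093, 42514],
})

is_long = df["Segnale"] == "Long"
is_short = df["Segnale"] == "Short"
price_rose = df["Prezzo"] < df["Prezzo_exit"]
price_fell = df["Prezzo"] > df["Prezzo_exit"]

# Each comparison is already a boolean Series, so combining them with & and | is safe.
conditions = [
    (is_long & price_rose) | (is_short & price_fell),   # trade went the right way
    (is_long & price_fell) | (is_short & price_rose),   # trade went the wrong way
]
choices = ["target", "stop"]

# Assumption: rows matching neither condition (equal prices) get an empty string.
df["esito"] = np.select(conditions, choices, default="")
print(df)
```

Naming the intermediate masks keeps the parentheses manageable and makes it harder to mistype one of the four cases.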

excel if and logic to data frame

I have a lot of Excel files that I am trying to convert to Python code, and I need some help :)
I have a data frame like this:
Date STD-3 STD-25 STD-2 STD-15 STD-1 Data STD1 STD15 STD2 STD25 STD3
11.05.2022 -0,057406797 -0,047838998 -0,038271198 -0,028703399 -0,019135599 0,021233631 0,019135599 0,028703399 0,038271198 0,047838998 0,057406797
I need to check for this logic:
"Data" < "STD1" and "Data" > "STD-1" = 0
"Data" > "STD1" and "Data" < "STD15" = 1
"Data" > "STD15" and "Data" < "STD2" = 1,5
"Data" > "STD2" and "Data" < "STD25" = 2
"Data" > "STD25" and "Data" < "STD3" = 2,5
"Data" > "STD3" = 3
"Data" < "STD-1" and "Data" > "STD-15" = -1
"Data" < "STD-15" and "Data" > "STD-2" = -1,5
"Data" < "STD-2" and "Data" > "STD-25" = -2
"Data" < "STD-25" and "Data" > "STD-3" = -2,5
"Data" < "STD-3" = -3
And add the output to a new column.

import numpy as np

# Column names follow the question's header row ('Data', not 'DATA').
conditions = [
    (df['Data'] < df['STD1']) & (df['Data'] > df['STD-1']),      # 0
    (df['Data'] > df['STD1']) & (df['Data'] < df['STD15']),      # 1
    (df['Data'] > df['STD15']) & (df['Data'] < df['STD2']),      # 1.5
    (df['Data'] > df['STD2']) & (df['Data'] < df['STD25']),      # 2
    (df['Data'] > df['STD25']) & (df['Data'] < df['STD3']),      # 2.5
    df['Data'] > df['STD3'],                                     # 3
    (df['Data'] < df['STD-1']) & (df['Data'] > df['STD-15']),    # -1
    (df['Data'] < df['STD-15']) & (df['Data'] > df['STD-2']),    # -1.5
    (df['Data'] < df['STD-2']) & (df['Data'] > df['STD-25']),    # -2
    (df['Data'] < df['STD-25']) & (df['Data'] > df['STD-3']),    # -2.5
    df['Data'] < df['STD-3'],                                    # -3
]
results = [0, 1, 1.5, 2, 2.5, 3, -1, -1.5, -2, -2.5, -3]
# Rows that match no condition (for example, Data exactly equal to a boundary) become NaN.
df['RESULT'] = np.select(conditions, results, default=np.nan)

How to filter a pandas DataFrame?

Example code:
x7 = ['Spammer','Suspicious','Normal','Micro Influencer','Influencer']
rasio_real_spammer = df[(df['Rasio Followers/Followings'] < 0.5) & (df['fake'] == 0)].count()
temp = df[(df['Rasio Followers/Followings'] > 0.5) & (df['Rasio Followers/Followings'] < 1.0)].count()
rasio_real_suspicious = temp & (df['fake'] == 0).count()
temp2=df[(df['Rasio Followers/Followings'] >= 1.0) & (df['Rasio Followers/Followings'] < 2.0)].count()
rasio_real_normal = temp2 & (df['fake'] == 0).count()
temp3=df[(df['Rasio Followers/Followings'] >= 2.0) & (df['Rasio Followers/Followings'] < 10.0)].count()
rasio_real_micro = temp3 & (df['fake'] == 0).count()
rasio_real_influencer = df[(df['Rasio Followers/Followings'] >= 10.0 ) & (df['fake'] == 0)].count()
plt.bar(x7[0], rasio_real_spammer , color='red',label='Spammer')
plt.bar(x7[1], rasio_real_suspicious, color='yellow',label='Suspicious')
plt.bar(x7[2], rasio_real_normal, color='blue',label='Normal')
plt.bar(x7[3], rasio_real_micro, color='green',label='Micro Influencer')
plt.bar(x7[4], rasio_real_influencer, color='gray',label='Influencer')
plt.title("Distribution Rasio on Real Class")
plt.legend()
plt.show()
When I check manually, the results of rasio_real_spammer and rasio_real_influencer are correct, but the other results are not. Maybe there is an error in how I filter the classes. Any solutions?

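Put the fake == 0 test inside the same boolean mask as the ratio bounds. In the original, temp & (df['fake'] == 0).count() does not filter any rows: it bitwise-ANDs the per-column counts in temp with a single integer.
x7 = ['Spammer','Suspicious','Normal','Micro Influencer','Influencer']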
rasio_real_spammer = df[(df['Rasio Followers/Followings'] < 0.5) & (df['fake'] == 0)].count()
rasio_real_suspicious = df[(df['Rasio Followers/Followings'] > 0.5) & (df['Rasio Followers/Followings'] < 1.0) & (df['fake'] == 0)].count()
rasio_real_normal=df[(df['Rasio Followers/Followings'] >= 1.0) & (df['Rasio Followers/Followings'] < 2.0) & (df['fake'] == 0)].count()
rasio_real_micro =df[(df['Rasio Followers/Followings'] >= 2.0) & (df['Rasio Followers/Followings'] < 10.0) & (df['fake'] == 0)].count()
rasio_real_influencer = df[(df['Rasio Followers/Followings'] >= 10.0 ) & (df['fake'] == 0)].count()
plt.bar(x7[0], rasio_real_spammer , color='red',label='Spammer')
plt.bar(x7[1], rasio_real_suspicious, color='yellow',label='Suspicious')
plt.bar(x7[2], rasio_real_normal, color='blue',label='Normal')
plt.bar(x7[3], rasio_real_micro, color='green',label='Micro Influencer')
plt.bar(x7[4], rasio_real_influencer, color='gray',label='Influencer')
plt.title("Distribution Rasio on Real Class")
plt.legend()
plt.show()
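An alternative that avoids writing one mask per class is to bin the ratio with pd.cut. This is only a sketch on made-up data. Note one difference from the code above: with right=False the "Suspicious" bin is [0.5, 1.0), so a ratio of exactly 0.5 is counted there, whereas the original conditions leave it in no class.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are taken from the question.
df = pd.DataFrame({
    "Rasio Followers/Followings": [0.2, 0.7, 1.5, 3.0, 12.0, 0.3, 0.8, 5.0],
    "fake": [0, 0, 0, 0, 0, 1, 0, 1],
})

labels = ["Spammer", "Suspicious", "Normal", "Micro Influencer", "Influencer"]
edges = [-np.inf, 0.5, 1.0, 2.0, 10.0, np.inf]   # left-closed bins: [edge_i, edge_i+1)

# Keep only the real accounts first, then assign each one to a class.
real = df[df["fake"] == 0]
classes = pd.cut(real["Rasio Followers/Followings"], bins=edges, labels=labels, right=False)

# One scalar count per class, in the same order as `labels`.
counts = {label: int((classes == label).sum()) for label in labels}
print(counts)
```

The resulting dictionary holds one plain integer per class, so it can be passed to a single call such as plt.bar(list(counts), list(counts.values())).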

numpy.where makes code slow

I have the following block of code:
def hasCleavage(tags, pair, fragsize):
    limit = int(fragsize["mean"] + fragsize["sd"] * 4)
    if pair.direction == "F1R2" or pair.direction == "R2F1":
        x1 = np.where((tags[pair.chr_r1] >= pair.r1["pos"]) & (tags[pair.chr_r1] <= pair.r1["pos"]+limit))[0]
        x2 = np.where((tags[pair.chr_r2] <= pair.r2["pos"]+pair.frside) & (tags[pair.chr_r2] >= pair.r2["pos"]+pair.frside-limit))[0]
    elif pair.direction == "F1F2" or pair.direction == "F2F1":
        x1 = np.where((tags[pair.chr_r1] >= pair.r1["pos"]) & (tags[pair.chr_r1] <= pair.r1["pos"]+limit))[0]
        x2 = np.where((tags[pair.chr_r2] >= pair.r2["pos"]) & (tags[pair.chr_r2] <= pair.r2["pos"]+limit))[0]
    elif pair.direction == "R1R2" or pair.direction == "R2R1":
        x1 = np.where((tags[pair.chr_r1] <= pair.r1["pos"]+pair.frside) & (tags[pair.chr_r1] >= pair.r1["pos"]+pair.frside-limit))[0]
        x2 = np.where((tags[pair.chr_r2] <= pair.r2["pos"]+pair.frside) & (tags[pair.chr_r2] >= pair.r2["pos"]+pair.frside-limit))[0]
    else: #F2R1 or R1F2
        x1 = np.where((tags[pair.chr_r2] >= pair.r2["pos"]) & (tags[pair.chr_r2] <= pair.r2["pos"]+limit))[0]
        x2 = np.where((tags[pair.chr_r1] <= pair.r1["pos"]+pair.frside) & (tags[pair.chr_r1] >= pair.r1["pos"]+pair.frside-limit))[0]
    if x1.size > 0 and x2.size > 0:
        return True
    else:
        return False
My script takes 16 minutes to finish. It calls hasCleavage millions of times, once per row read from a file. When I put a return True above the limit assignment (so that np.where is never called), the script takes 5 minutes.
tags is a dictionary containing numpy arrays with ascending numbers.
Do you have any suggestions to improve performance?
EDIT:
tags = {'JH584302.1': array([ 351, 1408, 2185, 2378, 2740, 2904, 3364, 3657,
4240, 5324, 5966, 5977, 5986, 6488, 6531, 6847,
6961, 6973, 6991, 7107, 7383, 7395, 7557, 7569,
9178, 10077, 10456, 10471, 11271, 11466, 12311, 12441,
12598, 13051, 13123, 13859, 14167, 14672, 15156, 15252,
15268, 15273, 15694, 15786, 16361, 17073, 17293, 17454])
}
fragsize = {'sd': 130.29407997430428, 'mean': 247.56636}
And pair is an object of a custom class
<__main__.Pair object at 0x17129ad0>
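Since the arrays in tags are in ascending order, one option that may be worth trying is np.searchsorted. It finds the range boundaries by binary search instead of building two boolean arrays over the whole array on every call. The sketch below assumes only that each array is sorted; has_tag_in_range is a hypothetical helper name, and whether it recovers most of the lost time would need to be measured on the real data.

```python
import numpy as np

def has_tag_in_range(sorted_tags, low, high):
    """Return True if some value v in sorted_tags satisfies low <= v <= high.

    Assumes sorted_tags is a 1-D array in ascending order.
    """
    first = np.searchsorted(sorted_tags, low, side="left")       # first index with v >= low
    past_last = np.searchsorted(sorted_tags, high, side="right") # first index with v > high
    return bool(first < past_last)

# First few values of the example array from the question.
arr = np.array([351, 1408, 2185, 2378, 2740, 2904, 3364, 3657])

# Cross-check against the np.where formulation used in hasCleavage.
for low, high in [(1400, 1500), (400, 1400), (351, 351), (4000, 5000)]:
    expected = np.where((arr >= low) & (arr <= high))[0].size > 0
    assert has_tag_in_range(arr, low, high) == expected

print(has_tag_in_range(arr, 1400, 1500))
print(has_tag_in_range(arr, 400, 1400))
```

Inside hasCleavage, each np.where(...)[0] together with its .size > 0 check could then be replaced by one call with the matching lower and upper bounds.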
