Inverse line graph year count matplotlib pandas python - python

I'm trying to create a line plot of the counts of three device groups (desktop, mobile and tablet), with the years 2014, 2015 and 2016 on the x axis, but I am getting an error.
My code is currently:
#year-by-year change
desktop14 = od.loc[(od.Account_Year_Week >= 201401) & (od.Account_Year_Week <= 201453) & (od.online_device_type_detail == "DESKTOP"), "Gross_Demand_Pre_Credit"]
desktop15 = od.loc[(od.Account_Year_Week >= 201501) & (od.Account_Year_Week <= 201553) & (od.online_device_type_detail == "DESKTOP"), "Gross_Demand_Pre_Credit"]
desktop16 = od.loc[(od.Account_Year_Week >= 201601) & (od.Account_Year_Week <= 201653) & (od.online_device_type_detail == "DESKTOP"), "Gross_Demand_Pre_Credit"]
mobile14 = od.loc[(od.Account_Year_Week >= 201401) & (od.Account_Year_Week <= 201453) & (od.online_device_type_detail == "MOBILE"), "Gross_Demand_Pre_Credit"]
mobile15 = od.loc[(od.Account_Year_Week >= 201501) & (od.Account_Year_Week <= 201553) & (od.online_device_type_detail == "MOBILE"), "Gross_Demand_Pre_Credit"]
mobile16 = od.loc[(od.Account_Year_Week >= 201601) & (od.Account_Year_Week <= 201653) & (od.online_device_type_detail == "MOBILE"), "Gross_Demand_Pre_Credit"]
tablet14 = od.loc[(od.Account_Year_Week >= 201401) & (od.Account_Year_Week <= 201453) & (od.online_device_type_detail == "TABLET"), "Gross_Demand_Pre_Credit"]
tablet15 = od.loc[(od.Account_Year_Week >= 201501) & (od.Account_Year_Week <= 201553) & (od.online_device_type_detail == "TABLET"), "Gross_Demand_Pre_Credit"]
tablet16 = od.loc[(od.Account_Year_Week >= 201601) & (od.Account_Year_Week <= 201653) & (od.online_device_type_detail == "TABLET"), "Gross_Demand_Pre_Credit"]
devicedata = [["Desktop", desktop14.count(), desktop15.count(), desktop16.count()], ["Mobile", mobile14.count(), mobile15.count(), mobile16.count()], ["Tablet", tablet14.count(), tablet15.count(), tablet16.count()]]
df = pd.DataFrame(devicedata, columns=["Device", "2014", "2015", "2016"]).set_index("Device")
plt.show()
I want each line to represent a device type, with the x axis showing the change from year to year. How do I do this (essentially swapping the axes)?
Any help is greatly appreciated.

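Transpose the DataFrame before plotting, so that the years become the index (the x axis) and each device becomes a column (one line each):
df.transpose().plot()
The result has one line per device type, with the years on the x axis.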
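As a rough sketch of the same idea, the nine hand-written filters could be replaced by a cross-tabulation whose rows are years and whose columns are device types, which is already the shape that plot() expects. The sample od frame below is made up for illustration; only the column names come from the question, and deriving the year by integer division of Account_Year_Week by 100 is an assumption about that column's format. Note that crosstab counts rows, whereas the original .count() counts non-null Gross_Demand_Pre_Credit values.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
od = pd.DataFrame({
    "Account_Year_Week": [201401, 201402, 201510, 201511,
                          201512, 201601, 201420, 201530],
    "online_device_type_detail": ["DESKTOP", "MOBILE", "DESKTOP", "DESKTOP",
                                  "TABLET", "MOBILE", "DESKTOP", "MOBILE"],
    "Gross_Demand_Pre_Credit": [10.0, 12.5, 8.0, 9.5, 7.0, 11.0, 6.0, 5.5],
})

# Assumption: Account_Year_Week is an integer of the form YYYYWW.
year = (od["Account_Year_Week"] // 100).rename("Year")

# Rows are years, columns are device types, values are row counts.
counts = pd.crosstab(year, od["online_device_type_detail"])
print(counts)

# counts has the same orientation as df.transpose() above, so
# counts.plot(marker="o") followed by plt.show() would draw one line per device.
```

Because the years end up in the index, no transpose is needed before plotting.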

Related

xarray .where() function is too slow over datasets

I am using the .where() function to select matching times and apply certain criteria to xarray datasets.
import numpy as np
import xarray as xr
ds1 = xr.open_dataset('COD.nc')
ds2 = xr.open_dataset('CDNC.nc')
ds3 = xr.open_dataset('LWP.nc')
ds4 = xr.open_dataset('CTT.nc')
ds5 = xr.open_dataset('CTP.nc')
ds6 = xr.open_dataset('CER.nc')
ds11 = ds1.where((ds1.time == ds2.time))
ds22 = ds2.where((ds2.time == ds11.time))
ds33 = ds3.where((ds3.time == ds2.time))
ds44 = ds4.where((ds4.time == ds2.time))
ds55 = ds5.where((ds5.time == ds2.time))
ds66 = ds6.where((ds6.time == ds2.time))
COD = ds11.Cloud_Optical_Thickness
CDNC= ds22.Cloud_Droplet_Concentration
LWP = ds33.Cloud_Water_Path
CTT = ds44.Cloud_Top_Temperature
CTP = ds55.Cloud_Top_Pressure
CER = ds66.Cloud_Effective_Radius
cod = COD.where((CTT >= 273.0) & (CTP > 680.0) & (CER > 4) & (COD > 4))
lwp = LWP.where((CTT >= 273.0) & (CTP > 680.0) & (CER > 4) & (COD > 4))
cdnc = CDNC.where((CTT >= 273.0) & (CTP > 680.0) & (CER > 4) & (COD > 4))
but it is too slow, even for a small dataset.
The dimensions of each dataset are (time: 7555, lat: 35, lon: 71). It has been running for more than two hours.
Is there any way to speed this up? Thanks!

Create new column based on conditions of others

I have this df:
  Segnale  Prezzo  Prezzo_exit
0    Long   44645        43302
1   Short   41169        44169
2    Long   44322        47093
3   Short   45323        42514
Sample code to generate it:
import pandas as pd

tbl2 = {
    "Segnale": ["Long", "Short", "Long", "Short"],
    "Prezzo": [44645, 41169, 44322, 45323],
    "Prezzo_exit": [43302, 44169, 47093, 42514],
}
df = pd.DataFrame(tbl2)
I need to create a new column named "esito" with these conditions:
if df["Segnale"] =="Long" and df["Prezzo"] < df["Prezzo_exit"] #row with "target"
if df["Segnale"] =="Long" and df["Prezzo"] > df["Prezzo_exit"] #row with "stop"
if df["Segnale"] =="Short" and df["Prezzo"] < df["Prezzo_exit"] #row with "stop"
if df["Segnale"] =="Short" and df["Prezzo"] > df["Prezzo_exit"] #row with "target"
So the final result will be:
  Segnale  Prezzo  Prezzo_exit   esito
0    Long   44645        43302    stop
1   Short   41169        44169    stop
2    Long   44322        47093  target
3   Short   45323        42514  target
I tried with no success:
df.loc[(df['Segnale'].str.contains('Long') & df['Prezzo'] < df['Prezzo_exit']), 'Esito'] = 'Target'
df.loc[(df['Segnale'].str.contains('Long') & df['Prezzo'] > df['Prezzo_exit']), 'Esito'] = 'Stop'
df.loc[(df['Segnale'].str.contains('Short') & df['Prezzo'] > df['Prezzo_exit']), 'Esito'] = 'Target'
df.loc[(df['Segnale'].str.contains('Short') & df['Prezzo'] > df['Prezzo_exit']), 'Esito'] = 'Stop'
This will do what your question asks:
df.loc[(df.Segnale=='Long') & (df.Prezzo < df.Prezzo_exit), 'esito'] = 'target'
df.loc[(df.Segnale=='Long') & (df.Prezzo > df.Prezzo_exit), 'esito'] = 'stop'
df.loc[(df.Segnale=='Short') & (df.Prezzo < df.Prezzo_exit), 'esito'] = 'stop'
df.loc[(df.Segnale=='Short') & (df.Prezzo > df.Prezzo_exit), 'esito'] = 'target'
Output:
  Segnale  Prezzo  Prezzo_exit   esito
0    Long   44645        43302    stop
1   Short   41169        44169    stop
2    Long   44322        47093  target
3   Short   45323        42514  target
UPDATE:
You could also do this:
df['esito'] = pd.Series(['stop'] * len(df)).where(
    ((df.Segnale == 'Long') & (df.Prezzo > df.Prezzo_exit))
    | ((df.Segnale == 'Short') & (df.Prezzo < df.Prezzo_exit)),
    'target')
... or this:
df['esito'] = np.where(
    ((df.Segnale == 'Long') & (df.Prezzo > df.Prezzo_exit))
    | ((df.Segnale == 'Short') & (df.Prezzo < df.Prezzo_exit)),
    'stop', 'target')
You need to add parentheses around each comparison, because & binds more tightly than < and >:
(df['Prezzo'] < df['Prezzo_exit'])
To simplify, you can use np.select to pair each condition with its choice in one statement.
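A minimal sketch of the np.select approach mentioned above, using the sample data from the question. The intermediate mask names (is_long, went_up and so on) are illustrative, and the empty-string default for rows that match neither condition (for example, equal prices) is an assumption rather than something the question specifies.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Segnale": ["Long", "Short", "Long", "Short"],
    "Prezzo": [44645, 41169, 44322, 45323],
    "Prezzo_exit": [43302, 44169, 47093, 42514],
})

# Naming each mask avoids the operator-precedence trap described above.
is_long = df["Segnale"] == "Long"
is_short = df["Segnale"] == "Short"
went_up = df["Prezzo"] < df["Prezzo_exit"]
went_down = df["Prezzo"] > df["Prezzo_exit"]

conditions = [
    (is_long & went_up) | (is_short & went_down),   # target
    (is_long & went_down) | (is_short & went_up),   # stop
]
choices = ["target", "stop"]

# Assumption: rows matching neither condition get an empty string.
df["esito"] = np.select(conditions, choices, default="")
print(df)
```

np.select evaluates the conditions in order and takes the first match per row, so the two lists must stay in the same order.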

excel if and logic to data frame

I have a lot of Excel files that I am trying to convert to Python code, and I need some help :)
I have a data frame like this:
Date STD-3 STD-25 STD-2 STD-15 STD-1 Data STD1 STD15 STD2 STD25 STD3
11.05.2022 -0,057406797 -0,047838998 -0,038271198 -0,028703399 -0,019135599 0,021233631 0,019135599 0,028703399 0,038271198 0,047838998 0,057406797
I need to check for this logic:
"Data" < "STD1" and "Data" > "STD-1" = 0
"Data" > "STD1" and "Data" < "STD15" = 1
"Data" > "STD15" and "Data" < "STD2" = 1,5
"Data" > "STD2" and "Data" < "STD25" = 2
"Data" > "STD25" and "Data" < "STD3" = 2,5
"Data" > "STD3" = 3
"Data" < "STD-1" and "Data" > "STD-15" = -1
"Data" < "STD-15" and "Data" > "STD-2" = -1,5
"Data" < "STD-2" and "Data" > "STD-25" = -2
"Data" < "STD-25" and "Data" > "STD-3" = -2,5
"Data" < "STD-3" = -3
And add the output to a new column.
# Conditions are checked in order; np.select uses the first one that matches.
condition = [
    (df['Data'] < df['STD1']) & (df['Data'] > df['STD-1']),
    (df['Data'] > df['STD1']) & (df['Data'] < df['STD15']),
    (df['Data'] > df['STD15']) & (df['Data'] < df['STD2']),
    (df['Data'] > df['STD2']) & (df['Data'] < df['STD25']),
    (df['Data'] > df['STD25']) & (df['Data'] < df['STD3']),
    df['Data'] > df['STD3'],
    (df['Data'] < df['STD-1']) & (df['Data'] > df['STD-15']),
    (df['Data'] < df['STD-15']) & (df['Data'] > df['STD-2']),
    (df['Data'] < df['STD-2']) & (df['Data'] > df['STD-25']),
    (df['Data'] < df['STD-25']) & (df['Data'] > df['STD-3']),
    df['Data'] < df['STD-3'],
]
result = [0, 1, 1.5, 2, 2.5, 3, -1, -1.5, -2, -2.5, -3]
# Rows matching no condition (for example, Data exactly equal to a threshold) get None.
df['RESULT'] = np.select(condition, result, None)

how to do filter on pandas dataframe?

Example code:
x7 = ['Spammer','Suspicious','Normal','Micro Influencer','Influencer']
rasio_real_spammer = df[(df['Rasio Followers/Followings'] < 0.5) & (df['fake'] == 0)].count()
temp = df[(df['Rasio Followers/Followings'] > 0.5) & (df['Rasio Followers/Followings'] < 1.0)].count()
rasio_real_suspicious = temp & (df['fake'] == 0).count()
temp2=df[(df['Rasio Followers/Followings'] >= 1.0) & (df['Rasio Followers/Followings'] < 2.0)].count()
rasio_real_normal = temp2 & (df['fake'] == 0).count()
temp3=df[(df['Rasio Followers/Followings'] >= 2.0) & (df['Rasio Followers/Followings'] < 10.0)].count()
rasio_real_micro = temp3 & (df['fake'] == 0).count()
rasio_real_influencer = df[(df['Rasio Followers/Followings'] >= 10.0 ) & (df['fake'] == 0)].count()
plt.bar(x7[0], rasio_real_spammer , color='red',label='Spammer')
plt.bar(x7[1], rasio_real_suspicious, color='yellow',label='Suspicious')
plt.bar(x7[2], rasio_real_normal, color='blue',label='Normal')
plt.bar(x7[3], rasio_real_micro, color='green',label='Micro Influencer')
plt.bar(x7[4], rasio_real_influencer, color='gray',label='Influencer')
plt.title("Distribution Rasio on Real Class")
plt.legend()
plt.show()
When I check the results manually, rasio_real_spammer and rasio_real_influencer are correct, but the other results are not. Maybe there is an error in how I filter the class. Any solutions?
Put the fake == 0 condition inside the same boolean mask as the ratio conditions, instead of combining it with the counts afterwards:
x7 = ['Spammer','Suspicious','Normal','Micro Influencer','Influencer']
rasio_real_spammer = df[(df['Rasio Followers/Followings'] < 0.5) & (df['fake'] == 0)].count()
rasio_real_suspicious = df[(df['Rasio Followers/Followings'] > 0.5) & (df['Rasio Followers/Followings'] < 1.0) & (df['fake'] == 0)].count()
rasio_real_normal=df[(df['Rasio Followers/Followings'] >= 1.0) & (df['Rasio Followers/Followings'] < 2.0) & (df['fake'] == 0)].count()
rasio_real_micro =df[(df['Rasio Followers/Followings'] >= 2.0) & (df['Rasio Followers/Followings'] < 10.0) & (df['fake'] == 0)].count()
rasio_real_influencer = df[(df['Rasio Followers/Followings'] >= 10.0 ) & (df['fake'] == 0)].count()
plt.bar(x7[0], rasio_real_spammer , color='red',label='Spammer')
plt.bar(x7[1], rasio_real_suspicious, color='yellow',label='Suspicious')
plt.bar(x7[2], rasio_real_normal, color='blue',label='Normal')
plt.bar(x7[3], rasio_real_micro, color='green',label='Micro Influencer')
plt.bar(x7[4], rasio_real_influencer, color='gray',label='Influencer')
plt.title("Distribution Rasio on Real Class")
plt.legend()
plt.show()
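As an alternative sketch, the five ratio classes could be assigned in one step with pd.cut and then counted. The sample data is hypothetical; only the column names and class labels come from the question. Note one deliberate difference: with right=False a ratio of exactly 0.5 falls into "Suspicious", whereas the code above excludes it from every class, so treat that boundary as an assumption.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "Rasio Followers/Followings": [0.1, 0.7, 1.5, 3.0, 12.0, 0.3, 1.0, 25.0],
    "fake": [0, 0, 0, 0, 0, 1, 0, 1],
})

labels = ["Spammer", "Suspicious", "Normal", "Micro Influencer", "Influencer"]
edges = [float("-inf"), 0.5, 1.0, 2.0, 10.0, float("inf")]

# Keep only the real accounts, then bin their ratios.
real = df[df["fake"] == 0]

# right=False makes each bin closed on the left: [0.5, 1.0), [1.0, 2.0), ...
classes = pd.cut(real["Rasio Followers/Followings"],
                 bins=edges, labels=labels, right=False)

# One count per class, in label order, with 0 for empty classes.
counts = classes.astype(str).value_counts().reindex(labels, fill_value=0)
print(counts)
```

counts.index and counts.values can then be passed to a single plt.bar call instead of five separate ones.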

numpy.where makes code slow

I have the following block of code:
def hasCleavage(tags, pair, fragsize):
    limit = int(fragsize["mean"] + fragsize["sd"] * 4)
    if pair.direction == "F1R2" or pair.direction == "R2F1":
        x1 = np.where((tags[pair.chr_r1] >= pair.r1["pos"]) & (tags[pair.chr_r1] <= pair.r1["pos"]+limit))[0]
        x2 = np.where((tags[pair.chr_r2] <= pair.r2["pos"]+pair.frside) & (tags[pair.chr_r2] >= pair.r2["pos"]+pair.frside-limit))[0]
    elif pair.direction == "F1F2" or pair.direction == "F2F1":
        x1 = np.where((tags[pair.chr_r1] >= pair.r1["pos"]) & (tags[pair.chr_r1] <= pair.r1["pos"]+limit))[0]
        x2 = np.where((tags[pair.chr_r2] >= pair.r2["pos"]) & (tags[pair.chr_r2] <= pair.r2["pos"]+limit))[0]
    elif pair.direction == "R1R2" or pair.direction == "R2R1":
        x1 = np.where((tags[pair.chr_r1] <= pair.r1["pos"]+pair.frside) & (tags[pair.chr_r1] >= pair.r1["pos"]+pair.frside-limit))[0]
        x2 = np.where((tags[pair.chr_r2] <= pair.r2["pos"]+pair.frside) & (tags[pair.chr_r2] >= pair.r2["pos"]+pair.frside-limit))[0]
    else: #F2R1 or R1F2
        x1 = np.where((tags[pair.chr_r2] >= pair.r2["pos"]) & (tags[pair.chr_r2] <= pair.r2["pos"]+limit))[0]
        x2 = np.where((tags[pair.chr_r1] <= pair.r1["pos"]+pair.frside) & (tags[pair.chr_r1] >= pair.r1["pos"]+pair.frside-limit))[0]
    if x1.size > 0 and x2.size > 0:
        return True
    else:
        return False
My script takes 16 minutes to finish. It calls hasCleavage millions of times, once per row while reading a file. If I add a return True above the limit assignment (so that np.where is never called), the script takes 5 minutes.
tags is a dictionary containing numpy arrays with ascending numbers.
Do you have any suggestions to improve performance?
EDIT:
tags = {'JH584302.1': array([ 351, 1408, 2185, 2378, 2740, 2904, 3364, 3657,
4240, 5324, 5966, 5977, 5986, 6488, 6531, 6847,
6961, 6973, 6991, 7107, 7383, 7395, 7557, 7569,
9178, 10077, 10456, 10471, 11271, 11466, 12311, 12441,
12598, 13051, 13123, 13859, 14167, 14672, 15156, 15252,
15268, 15273, 15694, 15786, 16361, 17073, 17293, 17454])
}
fragsize = {'sd': 130.29407997430428, 'mean': 247.56636}
And pair is an object of a custom class
<__main__.Pair object at 0x17129ad0>
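One possible direction, sketched under the assumption stated in the question that every array in tags is sorted in ascending order: each np.where call only answers "is there any tag inside this window?", and on a sorted array that can be checked with a binary search instead of scanning the whole array twice. The helper name any_in_range is hypothetical, and the array below is a shortened copy of the sample from the question.

```python
import numpy as np

def any_in_range(sorted_arr, lo, hi):
    """Return True if any element x of sorted_arr satisfies lo <= x <= hi.

    Assumes sorted_arr is a 1-D array in ascending order.
    """
    # Index of the first element >= lo, found by binary search in O(log n).
    i = np.searchsorted(sorted_arr, lo, side="left")
    # That element (if it exists) is the smallest candidate; test it against hi.
    return bool(i < sorted_arr.size and sorted_arr[i] <= hi)

tags = {"JH584302.1": np.array([351, 1408, 2185, 2378, 2740, 2904, 3364, 3657])}
arr = tags["JH584302.1"]

print(any_in_range(arr, 2000, 2200))  # 2185 lies inside the window
print(any_in_range(arr, 400, 1400))   # no tag between 400 and 1400
```

In hasCleavage, each pair of x1/x2 lines would then become two any_in_range calls with the same lower and upper bounds, and the function could return as soon as the first one is False. Whether this removes most of the extra 11 minutes would need to be measured on the real data.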
