I am trying to extract the 10 largest values from my CSV, look up the matching rows by column 'id', and plot them together in subplots (one subplot per id).
import matplotlib.pyplot as plt

# data     : contains the original csv
# data_big : contains the refined data
data_big = data_big.nlargest(10, 'a')  # first 10 largest entries from 'a'
fig, ax = plt.subplots(nrows=5, ncols=2, figsize=(12, 16))
fig.subplots_adjust(hspace=0.5)
# match data_big['id'] and data['id']
for i in data_big['id']:
    if i in data['id']:
        data_big1 = data[data['id'] == i]
        # count unique values in column id
        count = data_big1['id'].value_counts()
        # print(count)
        for k in range(5):
            for n in range(2):
                data_big1.plot(x='TIMESTEP', y='a', ax=ax[k,n], label='big_{}'.format(i))
                ax[k,n].set_xlabel('TIMESTEP')
                ax[k,n].set_ylabel('a')
                ax[k,n].legend()
plt.show()
This code generates a grid of subplots with 5 rows and 2 columns, but all 10 ids are drawn in every subplot.
What I want is one id per subplot (not 10 ids in each subplot).
Can I get some help figuring out what I missed?
Thanks
You have to enumerate data_big['id'] and use the running number to calculate k and n.
Then plot each id once, without the nested for-loops over k and n.
for number, i in enumerate(data_big['id']):
    # map the running number 0..9 onto the 5x2 grid: row k, column n
    k = number // 2
    n = number % 2
    # `i in data['id']` would test the index labels, so compare against the values
    if i in data['id'].values:
        data_big1 = data[data['id'] == i]
        # count unique values in column id
        count = data_big1['id'].value_counts()
        # print(count)
        # plot it only once - without using `for`-loops
        data_big1.plot(x='TIMESTEP', y='a', ax=ax[k,n], label='big_{}'.format(i))
        ax[k,n].set_xlabel('TIMESTEP')
        ax[k,n].set_ylabel('a')
        ax[k,n].legend()
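To make the index arithmetic easier to check, here is a minimal sketch of how the running number maps onto a grid position. The helper name grid_positions is made up for illustration, and divmod is just a compact way of writing the `//` and `%` pair used above.

```python
def grid_positions(n_items, ncols):
    """Return the (row, column) pair for each running number 0..n_items-1."""
    # divmod(number, ncols) is the same as (number // ncols, number % ncols)
    return [divmod(number, ncols) for number in range(n_items)]


# 10 ids on a grid with 2 columns, as in the 5x2 layout above
positions = grid_positions(10, 2)
for number, (k, n) in enumerate(positions):
    print(number, "->", (k, n))
```

Each running number gets its own (k, n) pair, so every id lands in a different subplot. Since ax is a NumPy array of axes, iterating over ax.flat together with the ids should also work and avoids the arithmetic altogether.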
I am trying to produce a box plot using matplotlib with data from nested dictionaries. Below is a rough outline of the structure of the dictionary in question.
m_data = {scenario: {variable: {'model_name': value, 'model_name': value, ...}}}
One issue is that I want to look at the change in the models' output between the two different scenarios (scenario 1 [VAR1] - scenario 2 [VAR2]) and then plot this difference in a box plot.
I have managed to do this. However, I want to be able to label the outliers with the model name. My current method separates the keys from the values, so the outlier data point no longer has a name associated with it.
import matplotlib.pyplot as plt

# BOXPLOT
# set up blank lists
future_rain = []
past_rain = []
future_temp = []
past_temp = []
# single out the values for each model from the nested dictionaries
for key, val in m_data[FUTURE_SCENARIO][VAR1].items():
    future_rain.append(val)
for key, val in m_data[FUTURE_SCENARIO][VAR2].items():
    future_temp.append(val)
for key, val in m_data['historical'][VAR1].items():
    past_rain.append(val)
for key, val in m_data['historical'][VAR2].items():
    past_temp.append(val)
# blanks for final data
bx_plt_rain = []
bx_plt_temp = []
# allow for the subtraction of two lists
zip_object = zip(future_temp, past_temp)
for future_temp_i, past_temp_i in zip_object:
    bx_plt_temp.append(future_temp_i - past_temp_i)
zip_object = zip(future_rain, past_rain)
for future_rain_i, past_rain_i in zip_object:
    bx_plt_rain.append(future_rain_i - past_rain_i)
# colour outliers red
c = 'red'
outlier_col = {'flierprops': dict(color=c, markeredgecolor=c)}
# plot
bp = plt.boxplot(bx_plt_rain, patch_artist=True, showmeans=True, vert=False, meanline=True, **outlier_col)
bp['boxes'][0].set(facecolor='lightgrey')
plt.show()
If anyone knows of a workaround for this I would be extremely grateful.
As a bit of a hack you could create a function that looks through the dict for the outlier value and returns the key.
def outlier_name(outlier_val, inner_dict):
    for key, value in inner_dict.items():
        if value == outlier_val:
            return key
This could be pretty intensive if your data sets are large.
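A variation on the same idea, as a rough sketch: if the differences are kept in a dict keyed by model name, the names never get separated from the values, and the subtraction no longer relies on both dictionaries listing the models in the same order. The function names and the numbers below are made up for illustration; math.isclose is used instead of == only as a precaution for float comparisons.

```python
import math


def scenario_difference(future, past):
    """Subtract per model name, so the result does not depend on dict order."""
    return {name: future[name] - past[name] for name in future if name in past}


def outlier_names(outlier_vals, diffs, rel_tol=1e-9):
    """Return the model names whose difference matches one of the outlier values."""
    return [
        name
        for name, value in diffs.items()
        if any(math.isclose(value, out, rel_tol=rel_tol) for out in outlier_vals)
    ]


# Hypothetical values, made up for illustration only.
future_rain = {"model_a": 5.0, "model_b": 5.5, "model_c": 12.0}
past_rain = {"model_c": 3.0, "model_a": 4.5, "model_b": 5.25}

diffs = scenario_difference(future_rain, past_rain)
print(diffs)
print(outlier_names([9.0], diffs))
```

For the box plot itself you would pass list(diffs.values()). As far as I know, the outlier values can then be read back from the flier artist, e.g. bp['fliers'][0].get_xdata() for a horizontal plot, and fed into outlier_names.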
I have a dataframe (portbase) that contains multiple signals (signalname) and their returns.
I want to subset every signal, calculate the cumulative return and then plot them all in a single figure.
I have done it step by step with one signal as an example:
ChInvIA = portbase[portbase['signalname'] == 'ChInvIA']
cum_perf_ChInvIA = ChInvIA['return'].cumsum() + 100
cum_perf_ChInvIA.plot()
plt.show()
With multiple signals this would take me way too long, so I've tried to loop over my dataframe.
for i in signals:
    i = portbase[portbase['signalname'] == 'i']
    cum_perf_i = i['return'].cumsum() + 100
    cum_perf_i.plot()
plt.show()
It doesn't work, and I haven't been able to find a solution.
You are calling both the looping variable and a variable in the loop by the name i, and comparing signalname to a string containing i ('i') instead of the variable itself. You should do something like this instead:
for i in signals:
    signal_i = portbase[portbase['signalname'] == i]
    cum_perf_i = signal_i['return'].cumsum() + 100
    cum_perf_i.plot()
plt.show()
To have all the plots in the same figure, you should use matplotlib's subplots function:
fig, ax = plt.subplots(len(signals))
for ind, i in enumerate(signals):
    signal_i = portbase[portbase['signalname'] == i]
    cum_perf_i = signal_i['return'].cumsum() + 100
    cum_perf_i.plot(ax=ax[ind])
plt.show()
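If you only need the cumulative returns and not the intermediate subsets, a groupby can compute them for all signals at once. This is just a sketch on a tiny made-up dataframe; the column name cum_perf is my own choice, and I am assuming the rows are already in time order within each signal.

```python
import pandas as pd

# Hypothetical data, made up for illustration only.
portbase = pd.DataFrame({
    "signalname": ["ChInvIA", "ChInvIA", "Other", "ChInvIA", "Other"],
    "return": [1.0, -0.5, 2.0, 0.25, 1.0],
})

# Running sum of the returns within each signal, on top of a base of 100.
portbase["cum_perf"] = portbase.groupby("signalname")["return"].cumsum() + 100

print(portbase)
```

Each signal's cum_perf values can then be plotted in a loop over portbase.groupby("signalname"), passing the same ax to every plot call if you want all lines on one axes.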
I am working on the Global Terrorism Database and trying to draw a bar plot of the various target types of terrorists.
I want to assign a code to every x label and print a legend on the graph showing which code corresponds to which target type.
So far I have not found a solution to this problem.
Link to the dataset in consideration.
df['targtype1_txt'].value_counts().plot(kind = 'bar')
plt.title("Favorite Target types of Terrorist")
plt.xlabel("Target Type")
plt.ylabel("Value_Counts")
Link to output of the above code.
I would like to map the x label values to numerical codes and show the mapping as a list at the side of the plot.
You can do something like this:
c = df.targtype1_txt.astype('category')
d = dict(enumerate(c.cat.categories))
df['targtype1_txt_code'] = c.cat.codes
df['targtype1_txt_code'].value_counts().plot(kind = 'bar')
plt.title("Favorite Target types of Terrorist")
plt.xlabel("Target Type")
plt.xticks(rotation=0)
plt.ylabel("Value_Counts")
s ='\n'.join(['%s: %s' % (key, value) for (key, value) in d.items()])
plt.text(2.6, 1, s, fontsize=14)
Create a new column with a code for each value and plot that column. Then add a text box on the side using the dictionary that has been used to code the values.
Output (with only a few values): [figure not included]
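Separately from the plot, here is a plain-Python sketch of what the coding step amounts to, as I understand it: the sorted unique labels are numbered from 0, each label is replaced by its number, and the legend text is built from the same mapping. The sample labels are made up for illustration.

```python
# Hypothetical sample of target types, made up for illustration only.
target_types = ["Military", "Business", "Police", "Business", "Military", "Business"]

# Sorted unique labels get consecutive integer codes, starting at 0.
code_to_label = dict(enumerate(sorted(set(target_types))))
label_to_code = {label: code for code, label in code_to_label.items()}

# Replace each label by its code (these are the values that end up on the x axis).
codes = [label_to_code[label] for label in target_types]

# Multi-line legend text: one "code: label" entry per line.
legend_text = "\n".join(f"{code}: {label}" for code, label in code_to_label.items())

print(codes)
print(legend_text)
```

The legend_text string plays the same role as s in the answer above, so it could be handed to plt.text in the same way.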
The data in the dataset consists purely of chars. For example:
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
A complete copy of the data can be found in agaricus-lepiota.data in the UCI Machine Learning Repository mushroom dataset.
Are there ways to visualise char data via matplotlib directly (instead of having to convert the dataset to numeric)?
Any sort of visualisation would do, e.g.:
import pandas as pd

filename = 'mushrooms.csv'
df_mushrooms = pd.read_csv(filename, names = ["Classes", "Cap-Shape", "Cap-Surface", "Cap-Colour", "Bruises", "Odor", "Gill-Attachment", "Gill-Spacing", "Gill-Size", "Gill-Colour", "Stalk-Shape", "Stalk-Root", "Stalk-Surface-Above-Ring", "Stalk-Surface-Below-Ring", "Stalk-Colour-Above-Ring", "Stalk-Colour-Below-Ring", "Veil-Type", "Veil-Colour", "Ring-Number", "Ring-Type", "Spore-Print-Colour", "Population", "Habitat"])
# If there are any entries (rows) with missing values/NaNs, drop the row.
df_mushrooms.dropna(axis = 0, how = 'any', inplace = True)
df_mushrooms.plot.scatter(x = 'Classes', y = 'Cap-Shape')
It is possible to do this, but with this approach it doesn't really make sense from a graphical point of view. If you were to do what you asked for, it would look like this:
I know I shouldn't tread into the territory of telling someone how to present their graphs, but this doesn't convey any information to me. The issue is that using the Classes and Cap-Shape fields for your x and y values will always put the same letter in the same place; there is no variability. Perhaps there is some other field you could use as the index and then use Cap-Shape as your marker, but as it stands this doesn't add any value. Again, that is just my personal view.
To use a string as a marker you can use the "$...$" marker described in matplotlib.markers, but I must add the caveat that plotting like this is much slower than the usual approach, because you have to iterate over the rows of your dataframe.
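import matplotlib.pyplot as plt

df = df_mushrooms  # the DataFrame loaded in the question
fig, ax = plt.subplots()
# Classes only has 'p' and 'e' as unique values so we will map them as 1 and 2 on the index
df['Class_Id'] = df.Classes.map(lambda x: 1 if x == 'p' else 2)
# map each cap-shape letter to its position in the alphabet ('a' -> 1, ..., 'z' -> 26)
df['Cap_Val'] = df['Cap-Shape'].map(lambda x: ord(x) - 96)
for idx, row in df.iterrows():
    # color= avoids the ambiguity of passing a single RGBA tuple as c=
    ax.scatter(x=row.Class_Id, y=row.Cap_Val, marker=r"$ {} $".format(row['Cap-Shape']), color=plt.cm.nipy_spectral(row.Cap_Val / 26))
ax.set_xticks([0, 1, 2, 3])
ax.set_xticklabels(['', 'p', 'e', ''])
# fix the tick positions so the letter labels line up with them
ax.set_yticks([0, 5, 10, 15, 20, 25])
ax.set_yticklabels(['', 'e', 'j', 'o', 't', 'y'])
plt.show()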
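In case the y tick labels look arbitrary: they are just the letters whose alphabet positions fall on the ticks 0, 5, 10, 15, 20 and 25. A small sketch of the mapping in both directions, with helper names made up for illustration:

```python
def letter_to_number(letter):
    """Map 'a'..'z' to 1..26, mirroring the `ord(x) - 96` expression above."""
    return ord(letter) - 96


def number_to_letter(number):
    """Inverse mapping; numbers outside 1..26 have no letter and get an empty label."""
    return chr(number + 96) if 1 <= number <= 26 else ""


ticks = [0, 5, 10, 15, 20, 25]
tick_labels = [number_to_letter(t) for t in ticks]
print(tick_labels)  # ['', 'e', 'j', 'o', 't', 'y']
```

Generating the labels from the tick positions like this keeps the two in sync if you later change the ticks.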