I have two datasets of the same shape, (395088, 12), named y_test and y_pred. To plot one against the other, I have been doing the following:
plt.scatter(y_test[:,4], y_pred[:,4])
plt.show()
plt.scatter(y_test[:,5], y_pred[:,5])
plt.show()
However, I want to loop through all 12 columns of the two datasets and get 12 subplots without writing each one by hand. Any ideas would be really appreciated.
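One way to do this is to create the grid of axes up front with plt.subplots and loop over the flattened array of axes. This is only a sketch: the random arrays stand in for the real y_test and y_pred, and the 3x4 layout, marker size, and titles are my own choices, not something from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data with 12 columns, as in the question (the values are made up).
rng = np.random.default_rng(0)
y_test = rng.normal(size=(1000, 12))
y_pred = y_test + rng.normal(scale=0.3, size=(1000, 12))

# 3 rows x 4 columns gives one subplot per data column.
fig, axes = plt.subplots(3, 4, figsize=(16, 10))

# axes is a 3x4 array; ravel() flattens it so one index covers axes and columns.
for i, ax in enumerate(axes.ravel()):
    ax.scatter(y_test[:, i], y_pred[:, i], s=2)
    ax.set_title(f"column {i}")
    ax.set_xlabel("y_test")
    ax.set_ylabel("y_pred")

fig.tight_layout()
# Call plt.show() here when running interactively.
```

With the full 395088 rows per column, a small marker size or some transparency (alpha) keeps the points from merging into a solid blob.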
I have a lot of columns (variables), but I only want to see the correlation between some of them, because all of them at once is too much. Maybe I can split the data into five parts. I use code like this:
plt.figure(figsize=(20, 20))
cmap = sns.diverging_palette(222, 10, as_cmap=True)
_ = sns.heatmap(df_new.corr(), annot=True, vmax=.8, square=True, cmap=cmap)
and the output is one heatmap with all 120 variables. Isn't that too much?
How can I split the data so that each heatmap shows about 25 columns, instead of one correlation heatmap with all 120 variables?
Thank you for helping me!
Create a new DataFrame that includes only the columns whose correlations you want to plot, then pass its .corr() result as the data argument of sns.heatmap().
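If you want to go through all the columns 25 at a time, something like the following might work. It is a sketch under assumptions: the random df_new stands in for the real data, the variable names var0..var119 are invented, and I left out annot=True to keep the plots readable.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Stand-in for df_new: 120 numeric columns with made-up names and values.
rng = np.random.default_rng(0)
df_new = pd.DataFrame(rng.normal(size=(200, 120)),
                      columns=[f"var{i}" for i in range(120)])

chunk_size = 25
cmap = sns.diverging_palette(222, 10, as_cmap=True)
figures = []

# Step through the columns in blocks of chunk_size; the last block may be smaller.
for start in range(0, df_new.shape[1], chunk_size):
    subset = df_new.iloc[:, start:start + chunk_size]
    fig, ax = plt.subplots(figsize=(12, 10))
    sns.heatmap(subset.corr(), vmax=.8, square=True, cmap=cmap, ax=ax)
    ax.set_title(f"columns {start} to {start + subset.shape[1] - 1}")
    figures.append(fig)
```

Note that each heatmap only shows correlations within its own block of columns. Correlations between columns in different blocks are not plotted, so if you care about particular pairs, select those columns by name instead, e.g. df_new[["var3", "var80"]].corr().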
I am very new to Python, but I want to plot data from a .csv file. The file has 8 columns. Plotting with one y-axis is no problem for me, but I need three:
One y-axis for column #2 with axis limits 0 and 100.
One y-axis for columns #1 and #6 with axis limits 0 and 20000.
And one y-axis for the remaining 5 columns with axis limits 0 and 1000.
Is there an easy way to make such a plot?
I hope my English is understandable.
Best regards!
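One possible approach is matplotlib's twinx(), which creates an extra y-axis sharing the same x-axis. Calling it twice gives three y-axes; the third one has to be moved outward so it does not sit on top of the second. The sketch below assumes the columns are numbered from 1 and uses invented names col1..col8 and random data in place of the CSV, so adjust it to your file.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for pd.read_csv(...): 8 columns with hypothetical names col1..col8.
rng = np.random.default_rng(0)
scales = [20000, 100, 1000, 1000, 1000, 20000, 1000, 1000]
df = pd.DataFrame({f"col{i + 1}": rng.uniform(0, s, size=50)
                   for i, s in enumerate(scales)})

fig, ax_small = plt.subplots(figsize=(10, 5))
fig.subplots_adjust(right=0.8)          # leave room for the extra axis on the right
ax_large = ax_small.twinx()             # second y-axis, drawn on the right
ax_mid = ax_small.twinx()               # third y-axis, also on the right
ax_mid.spines["right"].set_position(("axes", 1.12))  # push it outward

# (axis, columns drawn on it, y-limits) as described in the question.
groups = [
    (ax_small, ["col2"], (0, 100)),
    (ax_large, ["col1", "col6"], (0, 20000)),
    (ax_mid, ["col3", "col4", "col5", "col7", "col8"], (0, 1000)),
]

colors = iter(plt.cm.tab10.colors)      # one distinct color per line
handles = []
for ax, cols, limits in groups:
    for col in cols:
        (line,) = ax.plot(df.index, df[col], color=next(colors), label=col)
        handles.append(line)
    ax.set_ylim(*limits)
    ax.set_ylabel(", ".join(cols))

# One combined legend, since the lines live on three different axes.
ax_small.legend(handles=handles, loc="upper left", ncol=4, fontsize="small")
```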
I am plotting a histogram with this data.
dict_values([2.5039286220812003e-18, 8.701119009863531e-17, 9.181036322384948e-17, 8.972473923736572e-17, 9.160265320730097e-17, 8.826609291023463e-17, 8.888913336226638e-17, 8.993242948900264e-17, 9.556623462346049e-17, 8.847279448923369e-17, 8.86804710730486e-17, 8.806035948033239e-17])
This is my code:
print(len(new_dictonary.values()))
plt.figure(figsize=(15, 5))
plt.hist(new_dictonary.values())
plt.show()
I expect to get 12 bars, but I get only two. I have to use plt.hist.
How can I correct my code to get the right picture?
Edited answer: The problem is that your values are very small in magnitude and 11 out of 12 are very close to each other and the remaining one is far away. So to have each value plotted individually as a separate bar, you need a large number of bins. Now if you limit your x-axis to show the 11 similar values out of 12, you will see that having bins=1000 (a large number) shows 11 bars.
plt.hist(list(new_dictonary.values()), bins=1000, edgecolor='k')
plt.xlim(0.8e-16, 1e-16)
If you show them all, you will see how far apart they are. I don't know how you plan to fit a distribution to such data.
plt.hist(list(new_dictonary.values()), bins=1000, edgecolor='k')
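To check the binning without looking at a plot, you can count the non-empty bins with np.histogram, which is what plt.hist uses to compute the bins. This is just a sketch with the 12 values from the question copied into an array; the variable names are mine.

```python
import numpy as np

# The 12 values printed in the question.
values = np.array([
    2.5039286220812003e-18, 8.701119009863531e-17, 9.181036322384948e-17,
    8.972473923736572e-17, 9.160265320730097e-17, 8.826609291023463e-17,
    8.888913336226638e-17, 8.993242948900264e-17, 9.556623462346049e-17,
    8.847279448923369e-17, 8.86804710730486e-17, 8.806035948033239e-17,
])

# Default binning (10 equal-width bins between min and max).
counts_default, _ = np.histogram(values)
# Fine binning, as suggested above.
counts_fine, _ = np.histogram(values, bins=1000)

# Number of bins that actually contain at least one value.
nonempty_default = int(np.count_nonzero(counts_default))
nonempty_fine = int(np.count_nonzero(counts_fine))
print(nonempty_default, nonempty_fine)
```

With 10 bins the single small value lands in the first bin and the other 11 all land in the last one, which is why only two bars appear.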
I already asked about labeling the axes and got an answer that worked for the data I had at the time. But now I'm trying to plot the dataframe with kind=line to better see how my values change. The pandas methods I'm using don't behave the same way for kind=line as for kind=bar, and so the axis labels don't show up. My dataframe:
name Homework_note Class_note Behavior_note
Alice Ji 7 6 6
Eleonora LI 2 5 4
Mike The 6 5 3
Helen Wo 5 3 5
The script I use:
df=pd.read_csv(os.path.join(path,'class_notes.csv'), sep='\t|,', engine='python')
df.columns=['name', 'Homework_note', 'Class_note', 'Behavior_note']
ax=df.plot(kind='line', x='name', color=['red','blue', 'green'], figsize=(400,100))
ax.set_xlabel("Names", fontsize=56)
ax.set_ylabel("Notes", fontsize=56)
ax.set_title("Notes evaluation", fontsize=79)
plt.legend(loc=2,prop={'size':60})
plt.savefig(os.path.join(path,'notes_names.png'), bbox_inches='tight', dpi=100)
What else can I add to put labels on the axes (both x and y)? I prefer to stay with these pandas methods because I find them more comfortable for working with dataframes, but I haven't found a way to add the labels with this type of line plot.
The labels will show up with a smaller figsize (it is given in inches!). For the legend, take a look here
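For reference, figsize=(400,100) at dpi=100 asks for an image of 40,000 by 10,000 pixels, so a 56-point label is tiny relative to the canvas. Here is a sketch of the same plot with a more usual size; the figsize and font sizes are just values I picked, and the dataframe is typed in by hand instead of read from class_notes.csv.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# The table from the question, entered directly instead of read from the CSV.
df = pd.DataFrame({
    "name": ["Alice Ji", "Eleonora LI", "Mike The", "Helen Wo"],
    "Homework_note": [7, 2, 6, 5],
    "Class_note": [6, 5, 5, 3],
    "Behavior_note": [6, 4, 3, 5],
})

# figsize is in inches: 10 x 4 inches at dpi=100 is a 1000 x 400 pixel image.
ax = df.plot(kind="line", x="name", color=["red", "blue", "green"], figsize=(10, 4))
ax.set_xlabel("Names", fontsize=14)
ax.set_ylabel("Notes", fontsize=14)
ax.set_title("Notes evaluation", fontsize=16)
ax.legend(loc="upper left", prop={"size": 12})
# ax.figure.savefig("notes_names.png", bbox_inches="tight", dpi=100)
```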
I am struggling to set xlim for each histogram and to arrange the graphs in one column so that the x-axis ticks are aligned. Being new to pandas, I am unsure how to apply the answer from: Overlaying multiple histograms using pandas.
>from pandas import DataFrame, read_csv
>import matplotlib.pyplot as plt
>import pandas as pd
>df=DataFrame({'score0':[0.047771,0.044174,0.044169,0.042892,0.036862,0.036684,0.036451,0.035530,0.034657,0.033666],
'score1':[0.061010,0.054999,0.048395,0.048327,0.047784,0.047387,0.045950,0.045707,0.043294,0.042243]})
>print(df)
score0 score1
0 0.047771 0.061010
1 0.044174 0.054999
2 0.044169 0.048395
3 0.042892 0.048327
4 0.036862 0.047784
5 0.036684 0.047387
6 0.036451 0.045950
7 0.035530 0.045707
8 0.034657 0.043294
9 0.033666 0.042243
>df.hist()
>plt.xlim(-1.0,1.0)
The result sets the x-axis bounds to [-1,1] on only one of the plots.
I'm very familiar with ggplot in R and am just trying out pandas/matplotlib in Python. I'm open to suggestions for better plotting ideas. Any help would be greatly appreciated.
update #1 (#ct-zhu):
I have tried the following, but setting xlim on each subplot does not rescale the bin widths to the new x-axis range. As a result, the graph now has odd bin widths and still has more than one column of graphs:
for array in df.hist(bins=10):
    for subplot in array:
        subplot.set_xlim((-1,1))
update #2:
Getting closer with the use of layout, but the width of the bins does not equal the interval length divided by the bin count. In the example below, I set bins=10. Hence, the width of each bin over the interval [-1,1] should be 2/10=0.20; however, the graph does not have any bins with a width of 0.20.
for array in df.hist(layout=(2,1),bins=10):
    for subplot in array:
        subplot.set_xlim((-1,1))
There are two subplots, and you can access each of them and modify them separately:
ax_list=df.hist()
ax_list[0][0].set_xlim((0,1))
ax_list[0][1].set_xlim((0.01, 0.07))
Calling plt.xlim changes the limits of the current working axes only. In this case, that is the second plot, which was the most recently generated.
Edit:
To arrange the plots in 2 rows and 1 column, use the layout argument. To make the bin edges align, pass explicit edges through the bins argument. Setting the x limits to (-1, 1) is probably not a good idea, since your numbers are all smallish.
import numpy as np
ax_list=df.hist(layout=(2,1),bins=np.histogram(df.values.ravel())[1])
ax_list[0][0].set_xlim((0.01, 0.07))
ax_list[1][0].set_xlim((0.01, 0.07))
Or specify exactly 10 bins over (-1,1), which takes 11 bin edges:
ax_list=df.hist(layout=(2,1),bins=np.linspace(-1,1,11))
ax_list[0][0].set_xlim((-1,1))
ax_list[1][0].set_xlim((-1,1))
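Putting the pieces together, here is a self-contained sketch. The bin range 0.03 to 0.07 is my own choice to fit the sample data, and sharex=True is an optional extra that keeps the two x-axes tied together. Note that np.linspace(a, b, 11) returns 11 edges, which define 10 equal-width bins.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "score0": [0.047771, 0.044174, 0.044169, 0.042892, 0.036862,
               0.036684, 0.036451, 0.035530, 0.034657, 0.033666],
    "score1": [0.061010, 0.054999, 0.048395, 0.048327, 0.047784,
               0.047387, 0.045950, 0.045707, 0.043294, 0.042243],
})

# 11 edges give 10 equal-width bins; the range is chosen to cover both columns.
bins = np.linspace(0.03, 0.07, 11)

# One column of two plots, identical bin edges, shared x-axis.
ax_list = df.hist(layout=(2, 1), bins=bins, sharex=True, figsize=(6, 6))

# Apply the same limits to every subplot instead of only the current one.
for ax in ax_list.ravel():
    ax.set_xlim(bins[0], bins[-1])
```

Because both histograms use the same edges, the bars line up vertically and each bin is 0.004 wide.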