In Python, I have a 2D array, e.g.:
1.3 5.7 3.2
5.6 2.3 9.5
1.1 4.1 5.2
I then used 'imshow' to get what I needed - I essentially had a plot where the x axis was:
(column) 0 (column) 1 (column) 2 ....
and the y axis was:
.
.
(row) 2
(row) 1
(row) 0
and then the actual values (5.6 or 2.3 or whatever) were represented by colours, which was just what I wanted.
But then later, instead of the x axis just being column 0, column 1, column 2, etc., I wanted the x axis to show the date that corresponds to each column. This information was stored in a different list, say "date_info[]".
So instead of an arbitrary indexing scheme on the bottom, I want the x values of the imshow to correspond to the values of the date_info[] list - instead of the number 2 for example, I wanted date_info[2] on the x axis.
Now with the help of this forum, I was able to do this using:
plt.xticks(mjdaxis,[int(np.floor(date_info[i])) for i in mjdaxis])
which was sufficient for a while, but I am just changing the labels of the x axis here, right, rather than what is being plotted? Now, when I try to lay another plot (just a regular curve) on top of my original, the x axis scaling gets messed up, and my columns get bunched up at (1, 2, 3, ...) again instead of at their corresponding date_info values (55500, 55530, 55574, ...).
If anyone can make any sense of what I am saying, that would be great!!
For reference, here is the code that I am now trying:
fig = plt.figure()
ax1 = fig.add_subplot(111)
mjdaxis=np.linspace(0,date_info[0]-1,20).astype('int')
ax1.set_xticks(mjdaxis,[int(np.floor(date_info[i])) for i in mjdaxis])
ax1.imshow(residuals, aspect="auto")
ax2 = ax1.twinx()
ax2.plot(pdot[8:,0],pdot[8:,1])
plt.show()
If I understand you correctly, you should be able to just add the line ax2.set_xticks([]) before your plt.show(). You might also want to read up on the kwarg hold.
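If the goal is for the image columns to sit at their date values (so that a curve given in dates lines up), one option is the extent argument of imshow, which places the image in data coordinates instead of column indices. Here is a minimal sketch, not a drop-in fix: the arrays are made-up stand-ins for residuals, date_info and pdot, and it assumes the dates are evenly spaced (imshow cannot represent unevenly spaced columns).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the question's data
rng = np.random.default_rng(0)
date_info = np.linspace(55500.0, 55800.0, 30)    # one date per column (assumed evenly spaced)
residuals = rng.random((50, date_info.size))     # rows x columns
curve_x = date_info                              # curve given in the same date units
curve_y = np.sin((date_info - date_info[0]) / 50.0)

# Pixel edges in date units: half a column to either side of the first/last date
half_step = (date_info[1] - date_info[0]) / 2.0
left, right = date_info[0] - half_step, date_info[-1] + half_step
bottom, top = residuals.shape[0] - 0.5, -0.5     # keeps imshow's default row orientation

fig, ax1 = plt.subplots()
ax1.imshow(residuals, aspect="auto", extent=(left, right, bottom, top))

ax2 = ax1.twinx()                                # shares the x axis (dates) with the image
ax2.plot(curve_x, curve_y, color="k")
ax1.set_xlim(left, right)                        # pin the shared x range to the image
```

With this the tick labels are real dates, so no relabelling via xticks is needed.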
I am trying to plot a tri/quad mesh along with results on that mesh. I am plotting results of a CFD simulation.
I am using matplotlib.collections.PolyCollection to plot because it handles non-tri elements, where other methods only support tri elements.
My current code works fine, but when I try to plot results where some cells have no water (I have them set to np.nan right now), the plotting crashes and the contour colors get all screwed up.
My current code is:
ax = plt.subplot(111)
cmap = matplotlib.cm.jet
polys = element_coords  # list of Nx2 np.arrays containing the coordinates of each element polygon
facecolors = element_values #np array of values at each element, same length as polys
pc = matplotlib.collections.PolyCollection(polys, cmap=cmap)
pc.set_array(facecolors)
ax.add_collection(pc)
ax.plot()
When element_values does not contain any nan values, it works fine and looks something like this:
However, when element_values does contain nan values, it crashes and I get this error:
C:\Users\deden\AppData\Local\Continuum\anaconda3\envs\test\lib\site-packages\matplotlib\colors.py:527: RuntimeWarning: invalid value encountered in less
xa[xa < 0] = -1
I played around with element_values and can confirm this only happens when nan values are present.
I initially tried to ignore the nan values by doing this just to make them clear:
pc.cmap.set_bad(color='white',alpha=0)
But I still get the same error.
So... I tried setting all the nan values to -999 then trying to cut off the colormap like this:
vmin = np.nanmin(facecolors)
vmax = np.nanmax(facecolors)
facecolors[np.isnan(facecolors)] = -999
pc.cmap.set_under(color='white',alpha=0)
then tried to set the limits of the colormap based on other stack questions I've seen..like:
pc.cmap.set_clim(vmin,vmax)
but then I get:
AttributeError: 'ListedColormap' object has no attribute 'set_clim'
I'm out of ideas here...can anyone help me? I just want to NOT COLOR any element where the value is nan.
To reproduce my error..you can try using this dummy data:
polys = [np.array([[223769.2075899 , 1445713.24572239],
                   [223769.48419606, 1445717.09102757],
                   [223764.48282055, 1445714.84782264]]),
         np.array([[223757.9584215 , 1445716.57576502],
                   [223764.48282055, 1445714.84782264],
                   [223762.05868674, 1445720.48031478]])]
facecolors = np.array([np.nan, 1]) #will work if you replace np.nan with a number
SIDE NOTE - if anyone knows how I can plot this mesh+data without polycollections that'd be great..it includes 3 and 4 sided mesh elements
Matplotlib's colormapping mechanics come from a time when numpy.nan wasn't around. Instead it works with masked arrays.
facecolors = np.ma.array(facecolors, mask=np.isnan(facecolors))
Concerning the other error you get, note that set_clim is a method of the mappable (here the PolyCollection, so pc.set_clim(vmin, vmax)), not of the colormap.
Finally, if your mesh contained only 3-sided elements, you could use tripcolor, but that won't work with 4-sided meshes.
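Putting the masked-array suggestion together with the dummy data from the question, a minimal sketch could look like this. The colour limits 0 and 2 are arbitrary values picked for illustration, and it relies on the colormap's default "bad" colour being transparent, so masked elements are simply not filled.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.collections import PolyCollection

# Dummy data from the question: two triangles, the first one has no value
polys = [np.array([[223769.2075899, 1445713.24572239],
                   [223769.48419606, 1445717.09102757],
                   [223764.48282055, 1445714.84782264]]),
         np.array([[223757.9584215, 1445716.57576502],
                   [223764.48282055, 1445714.84782264],
                   [223762.05868674, 1445720.48031478]])]
values = np.array([np.nan, 1.0])

# Mask the NaNs so the colormap treats them as "bad" values
masked_values = np.ma.masked_invalid(values)

fig, ax = plt.subplots()
pc = PolyCollection(polys, cmap="jet")
pc.set_array(masked_values)
pc.set_clim(0.0, 2.0)      # colour limits are set on the collection (illustrative values)
ax.add_collection(pc)
ax.autoscale_view()        # fit the axes to the polygons
fig.canvas.draw()          # force a render to confirm nothing raises
```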
I need to plot one categorical variable over multiple numeric variables.
My DataFrame looks like this:
party media_user business_user POLI mass
0 Party_a 0.513999 0.404201 0.696948 0.573476
1 Party_b 0.437972 0.306167 0.432377 0.433618
2 Party_c 0.519350 0.367439 0.704318 0.576708
3 Party_d 0.412027 0.253227 0.353561 0.392207
4 Party_e 0.479891 0.380711 0.683606 0.551105
And I would like a scatter plot with different colors for the different variables; eg. one plot per party per [media_user, business_user, POLI, mass] each in different color.
So like this just with scatters instead of bars:
The closest I've come is this
sns.catplot(x="party", y="media_user", jitter=False, data=sns_df, height = 4, aspect = 5);
producing:
By messing around with some other graphs I found that by simply adding linestyle = '' I could remove the line and add markers. Hope this may help somebody else!
sim_df.plot(figsize = (15,5), linestyle = '', marker = 'o')
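For anyone who wants to try this end to end, here is a small self-contained sketch. It assumes sim_df has the same layout as the DataFrame in the question, and it moves party into the index so the x axis shows the party names; the explicit tick calls are only there to make sure every party gets a label.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# DataFrame rebuilt from the table in the question
sim_df = pd.DataFrame({
    "party": ["Party_a", "Party_b", "Party_c", "Party_d", "Party_e"],
    "media_user": [0.513999, 0.437972, 0.519350, 0.412027, 0.479891],
    "business_user": [0.404201, 0.306167, 0.367439, 0.253227, 0.380711],
    "POLI": [0.696948, 0.432377, 0.704318, 0.353561, 0.683606],
    "mass": [0.573476, 0.433618, 0.576708, 0.392207, 0.551105],
})

# One marker series per numeric column, each in its own colour, no connecting lines
ax = sim_df.set_index("party").plot(figsize=(15, 5), linestyle="", marker="o")
ax.set_xticks(range(len(sim_df)))
ax.set_xticklabels(sim_df["party"])
```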
I'm trying to correlate two measures (DD and DRE) from a data set that contains many more columns. I created a data frame and called it 'Data'.
Within this data frame, I want to create a scatter plot of DD (x axis) against DRE (y axis), including only DD values between 0 and 100.
Please help me with the first line of my code, which should select the rows where DD is between 0 and 100.
Also, when I plot the scatter plot, I get dots beyond 100% (the y axis is DRE in %), though I don't have any value > 100%.
Data1= Data[ Data['DD']<100]
plt.scatter(Data1.DD,Data1.DRE)
tick_val = [0,10,20,30,40,50,60,70,80,90,100]
tick_lab = ['0%','10%','20%','30%','40%','50%','60%','70%','80%','90%','100%']
plt.yticks(tick_val,tick_lab)
plt.show()
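For the "DD between 0 and 100" condition, one possible sketch uses the pandas Series.between method, which includes both end points by default. The numbers below are invented purely for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the 'Data' frame in the question
Data = pd.DataFrame({"DD": [-5, 0, 20, 100, 150],
                     "DRE": [10, 35, 60, 90, 95]})

# Keep only the rows where 0 <= DD <= 100
Data1 = Data[Data["DD"].between(0, 100)]
print(Data1)
```

The same selection can be written with two comparisons, Data[(Data['DD'] >= 0) & (Data['DD'] <= 100)], if strict inequalities are needed on one side.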
I have dataframes with columns containing x,y coordinates for multiple points. One row can consist of several points.
I'm trying to find out an easy way to be able to plot lines between each point generating a curve for each row of data.
Here is a simplified example where two lines are represented by two points each.
line1 = {'p1_x':1, 'p1_y':10, 'p2_x':2, 'p2_y':11 }
line2 = {'p1_x':2, 'p1_y':9, 'p2_x':3, 'p2_y':12 }
df = pd.DataFrame([line1,line2])
df.plot(y=['p1_y','p2_y'], x=['p1_x','p2_x'])
When I plot them, I expect line 1 to start at x=1 and line 2 to start at x=2.
Instead, the x axis contains two value pairs, (1,2) and (2,3), and both lines have the same start and end point on the x axis.
How do I get around this problem?
Edit:
If using matplotlib, the following hardcoded values generate the plot I'm interested in:
plt.plot([[1,2],[2,3]],[[10,9],[11,12]])
While I'm sure that there should be a more succinct way using pure pandas, here's a simple approach using matplotlib and some DataFrames derived from the original df. (I hope I understood the question correctly.)
Assumption: In df, you place x values in even columns and y values in odd columns
Obtain x values
x = df.loc[:, df.columns[::2]]
x
p1_x p2_x
0 1 2
1 2 3
Obtain y values
y = df.loc[:, df.columns[1::2]]
y
p1_y p2_y
0 10 11
1 9 12
Then plot using a for loop
for i in range(len(df)):
    plt.plot(x.iloc[i,:], y.iloc[i,:])
One does not need to create additional data frames. One can loop through the rows to plot these lines:
line1 = {'p1_x':1, 'p1_y':10, 'p2_x':2, 'p2_y':11 }
line2 = {'p1_x':2, 'p1_y':9, 'p2_x':3, 'p2_y':12 }
df = pd.DataFrame([line1,line2])
for i in range(len(df)):    # for each row:
    # plt.plot([list of Xs], [list of Ys])
    plt.plot([df.iloc[i,0],df.iloc[i,2]],[df.iloc[i,1],df.iloc[i,3]])
plt.show()
The lines will be drawn in different colors. To get lines of same color, one can add option c='k' or whatever color one wants.
plt.plot([df.iloc[i,0],df.iloc[i,2]],[df.iloc[i,1],df.iloc[i,3]], c='k')
I generally don't use the pandas plotting because I think it is rather limited. If using matplotlib is not an issue, the following code works:
from matplotlib import pyplot as plt
# each column of the 2D inputs is one line: row 0 holds the p1 values, row 1 the p2 values
plt.plot(df[['p1_x','p2_x']].values.T, df[['p1_y','p2_y']].values.T)
plt.show()
If you have more points per line, you can build the two column lists with a for loop.
I have an array of numbers, lets say 100 members of that array.
I know how to draw the cdf function, but my problem is, that I want the cdf-value of each member of the array.
How can I iterate through an array and give me back the according cdf-value of a member of that array?
cumsum() and hist() could solve my problem, but I didn't find any library that gives me back the value. norm.cdf() is not working for me (for some reason).
For example
import matplotlib.pyplot as plt
import numpy as np
# create some randomly distributed data:
data = np.random.randn(10000)
# sort the data:
data_sorted = np.sort(data)
# calculate the proportional values of samples
p = 1. * np.arange(len(data)) / (len(data) - 1)
# plot the sorted data:
fig = plt.figure()
ax1 = fig.add_subplot(121)
ax1.plot(p, data_sorted)
ax1.set_xlabel('$p$')
ax1.set_ylabel('$x$')
ax2 = fig.add_subplot(122)
ax2.plot(data_sorted, p)
ax2.set_xlabel('$x$')
ax2.set_ylabel('$p$')
This draws a line (or two lines) representing the CDF. How can I get the values out of that? I mean, there are values behind the graph; how can I get the CDF value for a given x?
But in my opinion, it's not completely right: it just divides the range by the number of rows, and it doesn't account for repeated values :/
Thanks in advance
EDIT
What would you say about that:
cur.execute("Select AGE From **** ")
output = []
for row in cur:
    output.append(float(row[0]))
data_sorted = np.sort(output)
length = len(data_sorted)
yvals = np.arange(length) / float(length)
print(yvals)
plt.plot(data_sorted, yvals)
plt.show()
The result is that the array is 5 members long, so each member gets a step of 1/5 = 0.2.
That leads to:
[ 1 2 2 9 58]
[ 0. 0.2 0.4 0.6 0.8]
But it should be: 1 is 0.2; 2 is 0.6 (because 2 appears twice, so 3 out of 5 values are 2 or less).
How do I get the 0.6?
I mean, I could write that in a view and sum it up after a GROUP BY AGE, but I don't know, I would prefer to do it in Python...
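One way to get the "fraction of values less than or equal to x" for every member, ties included, is to count with np.searchsorted on the sorted data. This is only a sketch of the empirical CDF under that definition, using the five example ages from above rather than a database query.

```python
import numpy as np

# The five example values from the question
data = np.array([1, 2, 2, 9, 58], dtype=float)
data_sorted = np.sort(data)

# side="right" returns, for each value, how many sorted entries are <= that value
counts = np.searchsorted(data_sorted, data, side="right")
cdf_values = counts / float(len(data))

for value, cdf in zip(data, cdf_values):
    print(value, cdf)
```

Repeated values get the same CDF value (both 2s map to 0.6), and the largest value maps to 1.0.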