matplotlib y-adjustment of texts using adjustText - python

That is a plot I generated using pyplot; I attempted to adjust the text using the adjustText library, which I also found here.
As you can see, it gets pretty crowded in the region where 0 < x < 0.1. I was thinking that there is still ample space in 0.8 < y < 1.0, so the labels could all fit there and still mark the points clearly.
My attempt was:
plt.plot(df.fpr, df.tpr, marker='.', ls='-')
texts = [plt.text(df.fpr[i], df.tpr[i], str(df.thr1[i])) for i in df.index]
adjust_text(texts,
            expand_text=(2, 2),
            expand_points=(2, 2),
            expand_objects=(2, 2),
            force_objects=(2, 20),
            force_points=(0.1, 0.25),
            lim=150000,
            arrowprops=dict(arrowstyle='-', color='red'),
            autoalign='y',
            only_move={'points': 'y', 'text': 'y'}
            )
where my df is a pandas DataFrame, which can be found here.
From what I understood in the docs, I tried varying the bounding boxes and the y-force by making them larger, thinking that it would push the labels further up, but that does not seem to be the case.

I'm the author of adjustText; sorry, I only just noticed this question. You are having this problem because you have a lot of overlapping texts with exactly the same y-coordinate. It's easy to solve by adding a tiny random shift along y to the labels (and you do need to increase the force for texts, otherwise it works very slowly along one dimension), like so:
np.random.seed(0)
f, ax = plt.subplots(figsize=(12, 6))
plt.plot(df.fpr,df.tpr,marker='.',ls='-')
texts = [plt.text(df.fpr[i], df.tpr[i]+np.random.random()/100, str(df.thr1[i])) for i in df.index]
plt.margins(y=0.125)
adjust_text(texts,
            force_text=(2, 2),
            arrowprops=dict(arrowstyle='-', color='red'),
            autoalign='y',
            only_move={'points': 'y', 'text': 'y'},
            )
Also notice that I increased the margins along the y axis; it helps a lot with the corners. The result is not quite perfect, since limiting the algorithm to just one axis makes life more difficult, but it's OK-ish already.
I have to mention that the size of the figure is very important, and I don't know what yours was.
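The jitter step is the key part of the fix above, so here it is in isolation as a minimal sketch. The helper name jitter_y and its scale and seed parameters are made up for illustration and are not part of adjustText; it uses only the standard library so it can be run without a plot.

```python
import random


def jitter_y(values, scale=0.01, seed=0):
    """Return a copy of values with a small random upward shift on each entry.

    Hypothetical helper (not part of adjustText): it mimics the
    np.random.random()/100 term used in the answer.
    """
    rng = random.Random(seed)  # seeded so the layout is reproducible
    return [v + rng.random() * scale for v in values]


# Several labels that share exactly the same y-coordinate.
tpr = [1.0, 1.0, 1.0, 1.0, 1.0]
shifted = jitter_y(tpr)

# After jittering, no two labels start from the same y position.
print(len(set(tpr)), "distinct y before,", len(set(shifted)), "distinct y after")
```

The shifted values would then be used as the y argument of plt.text before calling adjust_text.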

Related

How to Convert Color-Code Legends from Logarithmic Scale to Actual Values?

What is the best way to display actual values in the color-code legend when using logarithmic-scale color coding in plotly.figure_factory.create_choropleth?
Here is the sample code:
import plotly.figure_factory as ff
fips = df['fips']
values = np.log10(df['values'])
endpts = list(np.linspace(0, 4, len(colorscale) - 1))
fig = ff.create_choropleth(fips=fips, values=values, scope = ['usa'], binning_endpoints = endpts)
Here is what I have currently:
Here is what I wish to have:
Exactly the same as the map above, except that the legend displays actual numbers instead of log10(values). For example, instead of 0.0-0.5 and 0.5-1.0 (meaning 10^0 to 10^(1/2), and 10^(1/2) to 10^1), I would like to see 1-3, 4-10 and so forth.
I am not familiar with the Plotly API, and since you do not provide a minimal working example it is hard for me to test, but I am quite confident that you could specify a colormap. If so, then you could just convert the colormap to a logarithmic scale while feeding in the numbers on a linear scale.
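One way to read the suggestion above is to keep the binning in log space but build human-readable labels by converting each bin edge back with 10**x. The sketch below only builds the label strings. log_bins_to_labels is a hypothetical helper, and how (or whether) create_choropleth accepts custom legend labels is not confirmed here, so treat that part as an assumption to check against the Plotly documentation.

```python
def log_bins_to_labels(log_endpoints):
    """Turn consecutive log10 bin edges into labels in actual values.

    Hypothetical helper for illustration; edges are rounded to integers.
    """
    edges = [round(10 ** e) for e in log_endpoints]
    return [f"{lo}-{hi}" for lo, hi in zip(edges, edges[1:])]


# Bin edges in log10 space, as passed to binning_endpoints in the question.
log_endpoints = [0.0, 0.5, 1.0, 1.5, 2.0]
labels = log_bins_to_labels(log_endpoints)
print(labels)
```

Adjacent labels share an edge here (for example 1-3 and 3-10); shifting the lower edge by one to get 4-10 is a presentation choice left out of the sketch.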

How to Order Coordinates in Matplotlib (xticks and yticks)

Alright, so I was working on a simple program to pull coordinates out of a text file and then plot them on a graph. I thought it would be pretty simple, but I am VERY new to matplotlib, so I still don't fully understand it. I got most of the code working, but the one thing that is not working is that when I put the values in the graph, they come out all out of order. I want to order the xticks and yticks so that it actually looks like a real line graph you'd see in math, with the lower coordinates below the higher coordinates, and vice versa. Here is my code:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
def split(word):
    return list(word)
fileIWant = open('C:/Users/JustA/Desktop/Python Shenanigans/Converting Coordinates in a .txt to a Graph/Coordinates.txt', 'r');
stopwords = ['\n']
array = fileIWant.readlines()
array = [array.replace('\n', '') for array in array if array not in stopwords]
fileIWant.close()
editFile = open('C:/Users/JustA/Desktop/Python Shenanigans/Converting Coordinates in a .txt to a Graph/Coordinates.txt', 'w')
array_length = len(array)
x = []
y = []
for i in range(array_length):
    dataSplit = array[i].split()
    getCoordinateX = dataSplit[1]
    getCoordinateY = dataSplit[3]
    x.append(getCoordinateX)
    y.append(getCoordinateY)
    plt.scatter(x, y)
    plt.plot(x, y) #Add this line in if you want to show lines.
plt.title('Your Coordinate Graph')
plt.xlabel('X Coordinates')
plt.ylabel('Y Coordinates')
#plt.xticks([-100,-80,-60,-40,-20,0,20,40,60,80,100])
#plt.yticks([-100,-80,-60,-40,-20,0,20,40,60,80,100])
plt.show()
editFile.close()
I commented out what I put for the ticks, because it was not working at all. With those commented out, it looks okay, but it is very confusing. I think it just puts them in the order they are at in the .txt, when I want them to order themselves in the code. Here is what it is outputting right now:
Sorry if this is so simple that it has never been asked before, like I said, very new to matplotlib, and numpy if I have to use that at all. I imported it because I thought I may have to, but I don't think I really used it as of yet. Also, I am going to rewrite the coordinates into the graph in order, but I think I can do that myself later.
The problem is that your coordinates are strings, which means matplotlib is just plotting strings against strings ("categorical" axis labels). To fix, you simply have to convert your strings to numbers, e.g. x.append(int(getCoordinateX)).
Note that you also don't have to put plt.scatter/plt.plot in the loop - you only have to call one of those once on the full array. That'll probably make things a little faster too.
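As a minimal sketch of the fix, the parsing can be separated from the plotting so that the values are converted to numbers once. The line layout "X: 3 Y: -7" is an assumption inferred from dataSplit[1] and dataSplit[3] in the question, parse_coordinates is a hypothetical helper, and sorting by x is an optional extra that keeps a line plot running left to right.

```python
def parse_coordinates(lines):
    """Parse lines such as 'X: 3 Y: -7' into numeric x and y lists.

    The token positions (1 and 3) are an assumption based on the
    question's code; adjust them for other file layouts.
    """
    points = []
    for line in lines:
        parts = line.split()
        if len(parts) < 4:  # skip blank or malformed lines
            continue
        points.append((float(parts[1]), float(parts[3])))
    points.sort()  # optional: order by x so the line runs left to right
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return xs, ys


sample = ["X: 40 Y: -20", "", "X: -100 Y: 60", "X: 0 Y: 5"]
x, y = parse_coordinates(sample)
print(x, y)
```

With numeric lists, plt.scatter(x, y) and plt.plot(x, y) only need to be called once, after the loop, and matplotlib orders the axis ticks numerically.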

How to hack this Bokeh HexTile plot to fix the coords, label placement and axes?

Below is Bokeh 1.4.0 code that tries to draw a HexTile map of the input dataframe, with axes, and tries to place labels on each hex.
I've been stuck on this for two days solid, reading bokeh doc, examples and github known issues, SO, Bokeh Discourse and Red Blob Games's superb tutorial on Hexagonal Grids, and trying code. (I'm less interested in raising Bokeh issues for the future, and far more interested in pragmatic workarounds to known limitations to just get my map code working today.) Plot is below, and code at bottom.
Here are the issues, in rough decreasing order of importance (it's impossible to separate the root-cause and tell which causes which, due to the way Bokeh handles glyphs. If I apply one scale factor or coord transform it fixes one set of issues, but breaks another, 'whack-a-mole' effect):
The label placement is obviously wrong, but I can't seem to hack up any variant of either (x,y) coords or (q,r) coords to work. (I tried combinations of figure(..., match_aspect=True), I tried 1/sqrt(2) scaling of the (x,y)-coords, and I tried HexTile(... size, scale) params as per redblobgames, e.g. size = 1/sqrt(3) ~ 0.57735.)
Bokeh forces the origin to be top left, and y-coords to increase as you go down, however the default axis labels show y or r as being negative. I found I still had to use p.text(q, -r, .... I suppose I have to manually patch the auto-supplied yaxis labels or TickFormatter to be positive.
I use np.mgrid to generate the coord grid, but I still seem to have to assign q-coords right-to-left: np.mgrid[0:8, (4+1):0:-1]. Still no matter what I do, the hexes are flipped L-to-R
(Note: empty '' counties are placeholders to get the desired shape, hence the boolean mask [counties!=''] on grid coords. This works fine and I want to leave it as-is)
The source (q,r) coords for the hexes are integers, and I use 'odd-r' offset coords (not axial or hexagonal coords). No matter what HexTile(..., size, scale) args I use, one or both dimensions in the plot is wrong or squashed. Or whether I include the 1/sqrt(2) factor in coord transform.
My +q-axis is east and my +r-axis should be 120° SSE
Ideally I'd like to have my origin at bottom-left (math plot style, not computer graphics). But Bokeh apparently doesn't support that, I can live without that. However defaulting the y-axis labels to negative, while requiring a mix of positive and negative coords, is confusing. Anyway, how to hack an automatic fix to that with minimum grief? (manual p.yrange = Range1d(?, ?)?)
Bokeh's approach to attaching (hex) glyphs to plots is a hard idiom to use. Ideally I simply want to reference (q,r)-coords everywhere for hexes, labels, axes. I never want to see (x,y)-coords appearing on axes, label coords, tick-marks, etc. but seems Bokeh won't allow you. I guess you have to manually hack the axes and ticks later. Also, the plot<->glyph interface doesn't allow you to expose a (q,r) <-> (x,y) coord transform function, certainly not a bidirectional one.
The default axes don't seem to have any accessors to automatically find their current extent/limits; p.yaxis.start/end are empty unless you specified them. The result from p.yaxis.major_tick_in,p.yaxis.major_tick_out is also wrong, for this plot it gives (2,6) for both x and y, seems to be clipping those to the interior multiples of 2(?). How to automatically get the axes' extent?
My current plot:
My code:
import pandas as pd
import numpy as np
from math import sqrt
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource
from bokeh.models.glyphs import HexTile
from bokeh.io import show
# Data source is a list of county abbreviations, in (q,r) coords...
counties = np.array([
['TE','DY','AM','DN', ''],
['DL','FM','MN','AH', ''],
['SO','LM','CN','LH', ''],
['MO','RN','LD','WH','MH'],
['GA','OY','KE','D', ''],
['', 'CE','LS','WW', ''],
['LC','TA','KK','CW', ''],
['KY','CR','WF','WX', ''],
])
#counties = counties[::-1] # UNUSED: flip so origin is at bottom-left
# (q,r) Coordinate system is “odd/even-r” horizontal Offset coords
r, q = np.mgrid[0:8, (4+1):0:-1]
q = q[counties!='']
r = r[counties!='']
sqrt3 = sqrt(3)
# Try to transform odd-r (q,r) offset coords -> (x,y). Per Red Blob Games' tutorial.
x = q - (r//2) # this may be slightly dubious
y = r
counties_df = pd.DataFrame({'q': q, 'r': r, 'abbrev': counties[counties!=''], 'x': x, 'y': y })
counties_ds = ColumnDataSource(ColumnDataSource.from_df(counties_df)) # ({'q': q, 'r': r, 'abbrev': counties[counties != '']})
p = figure(tools='save,crosshair') # match_aspect=True?
glyph = HexTile(orientation='pointytop', q='x', r='y', size=0.76, fill_color='#f6f699', line_color='black') # q,r,size,scale=??!?!!? size=0.76 is an empirical hack.
p.add_glyph(counties_ds, glyph)
p.xaxis.minor_tick_line_color = None
p.yaxis.minor_tick_line_color = None
print(f'Axes: x={p.xaxis.major_tick_in}:{p.xaxis.major_tick_out} y={p.yaxis.major_tick_in}:{p.yaxis.major_tick_out}')
# Now can't manage to get the right coords for text labels
p.text(q, -r, text=["(%d, %d)" % (q,r) for (q, r) in zip(q, r)], text_baseline="middle", text_align="center")
# Ideally I ultimately want to fix this and plot `abbrev` column as the text label
show(p)
There is an axial_to_cartesian function that will just compute the hex centers for you. You can then attach the labels in a variety of orientations and anchoring from these.
Bokeh does not force the origin to be anywhere. There is one axial to cartesian mapping Bokeh uses, exactly what is given by axial_to_cartesian. The position of the Hex tiles (and hence the cartesian coordinates that the axes display) follows from this. If you want different ticks, Bokeh affords lots of control points over both tick location and tick labelling.
There is more than one convention for axial coords. Bokeh picked the one that has the r-axis tilted "up and to the left", i.e. the one explicitly shown here:
https://docs.bokeh.org/en/latest/docs/user_guide/plotting.html#hex-tiles
Bokeh expects up-and-to-the-left axial coords. You will need to convert whatever coordinate system you have to that. For "squishing" you will need to set match_aspect=True to ensure the "data space" aspect ratio matches the "pixel space" aspect ratio 1-1.
Alternatively, if you don't or can't use auto-ranging you will need to set the plot size carefully and also control the border sizes with min_border_left etc to make sure the borders are always big enough to accommodate any tick labels you have (so that the inner region will not be resized)
I don't really understand this question, but you have absolute control over what ticks visually appear, regardless of the underlying tick data. Besides the built-in formatters, there is FuncTickFormatter that lets you format ticks any way you want with a snippet of JS code. [1] (And you also have control of where ticks are located, if you want that.)
[1] Please note the CoffeeScript and from_py_func options are both deprecated and are being removed in the next 2.0 release.
Again, you'll want to use axial_to_cartesian to position anything other than Hex tiles. No other glyphs in Bokeh understand axial coordinates (which is why we provide the conversion function).
You misunderstood what major_tick_in and major_tick_out are for. They are literally how far the ticks visually extend inside and outside the plot frame, in pixels.
Auto-ranging (with DataRange1d) is only computed in the browser, in JavaScript, which is why the start/end are not available on the "Python" side. If you need to know the start/end, you will need to explicitly set the start/end yourself. Note, however, that match_aspect=True only functions with DataRange1d. If you explicitly set start/end manually, Bokeh will assume you know what you want, and will honor what you ask for, regardless of what it does to the aspect.
Below are my solution and plot. Mainly per @bigreddot's advice, but there's still some coordinate hacking needed:
Expecting users to pass input coords as axial instead of offset coords is a major limitation. I work around this. There's no point in creating an offset_to_cartesian() because we need to negate r in two out of three places:
My input is even-r offset coords. I still need to manually apply the offset: q = q + (r+1)//2
I need to manually negate r in both the axial_to_cartesian() call and the datasource creation for the glyph. (But not in the text() call).
The call needs to be: axial_to_cartesian(q, -r, size=2/3, orientation='pointytop')
Need p = figure(match_aspect=True ...) to prevent squishing
I need to manually create my x,y axes to get the range right
Solution:
import pandas as pd
import numpy as np
from math import sqrt
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, Range1d
from bokeh.models.glyphs import HexTile
from bokeh.io import curdoc, show
from bokeh.util.hex import cartesian_to_axial, axial_to_cartesian
counties = np.array([
['DL','DY','AM','', ''],
['FM','TE','AH','DN', ''],
['SO','LM','CN','MN', ''],
['MO','RN','LD','MH','LH'],
['GA','OY','WH','D' ,'' ],
['' ,'CE','LS','KE','WW'],
['LC','TA','KK','CW','' ],
['KY','CR','WF','WX','' ]
])
counties = np.flip(counties, (0)) # Flip UD for bokeh
# (q,r) Coordinate system is “odd/even-r” horizontal Offset coords
r, q = np.mgrid[0:8, 0:(4+1)]
q = q[counties!='']
r = r[counties!='']
# Transform for odd-r offset coords; +r-axis goes up
q = q + (r+1)//2
#r = -r # cannot globally negate 'r', see comments
# Transform odd-r offset coords (q,r) -> (x,y)
x, y = axial_to_cartesian(q, -r, size=2/3, orientation='pointytop')
counties_df = pd.DataFrame({'q': q, 'r': -r, 'abbrev': counties[counties!=''], 'x': x, 'y': y })
counties_ds = ColumnDataSource(ColumnDataSource.from_df(counties_df)) # ({'q': q, 'r': r, 'abbrev': counties[counties != '']})
p = figure(match_aspect=True, tools='save,crosshair')
glyph = HexTile(orientation='pointytop', q='q', r='r', size=2/3, fill_color='#f6f699', line_color='black') # q,r,size,scale=??!?!!?
p.add_glyph(counties_ds, glyph)
p.x_range = Range1d(-2,6)
p.y_range = Range1d(-1,8)
p.xaxis.minor_tick_line_color = None
p.yaxis.minor_tick_line_color = None
p.text(x, y, text=["(%d, %d)" % (q,r) for (q, r) in zip(q, r)],
text_baseline="middle", text_align="center")
show(p)
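The offset-to-axial step in the solution above can be checked in isolation, without Bokeh. The sketch below is pure Python; offset_to_axial restates the solution's q = q + (r+1)//2 with r negated, and pointytop_center is an illustrative pointy-top layout formula that is assumed (not verified here) to agree with what axial_to_cartesian computes. Both function names are hypothetical; real plots should call bokeh.util.hex.axial_to_cartesian.

```python
from math import sqrt


def offset_to_axial(col, row):
    """Offset (col, row) -> axial (q, r), following the solution above:
    q = col + (row + 1) // 2, with r negated so that rows go upward."""
    return col + (row + 1) // 2, -row


def pointytop_center(q, r, size):
    """Centre of a pointy-top hex for axial (q, r).

    Illustrative formula (an assumption, not copied from Bokeh).
    """
    x = size * sqrt(3) * (q + r / 2)
    y = -size * 1.5 * r
    return x, y


size = 2 / 3  # makes the row spacing 1.5 * size equal to 1
for row in range(3):
    q, r = offset_to_axial(0, row)
    print(row, (q, r), pointytop_center(q, r, size))
```

With size=2/3 the y value of each hex centre equals its row number, and every other row is shifted right by half a hex width, which is the offset layout the input grid describes.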

Matplotlib -- UserWarning: Attempting to set identical left == right == 737342.0 results in singular transformations;

Using Matplotlib, I am trying to create a line chart, but I am facing the issue below. Here is the code. Can someone help me with any suggestions?
Head = ['Date','Count1','Count2','Count3']
df9 = pd.DataFrame(Data, columns=Head)
df9.set_index('Date',inplace=True)
fig,ax = plt.subplots(figsize=(15,10))
df9.plot(ax=ax)
ax.xaxis.set_major_locator(mdates.WeekdayLocator(SATURDAY))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
plt.legend()
plt.xticks(fontsize= 15)
plt.yticks(fontsize= 15)
plt.savefig('Line.png')
i am getting below error
Error: Matplotlib UserWarning: Attempting to set identical left == right == 737342.0 results in singular transformations; automatically expanding (ax.set_xlim(left, right))
Sample Data:
01-10-2010, 100, 0 , 100
X Axis: I am trying to display date on base of date every saturdays
Y Axis: all other 3 counts
Can someone please help me understand what this issue is and how I can fix it?
The issue is caused by the fact that somehow, pandas.DataFrame.plot explicitly sets the x- and y- limits of your plot to the limits of your data. This is normally fine, and no one notices. In fact, I had a lot of trouble finding references to your warning anywhere at all, much less the Pandas bug list.
The workaround is to set your own limits manually in your call to DataFrame.plot:
if len(df9) == 1:
    delta = pd.Timedelta(days=1)
    lims = [df9.index[0] - delta, df9.index[0] + delta]
else:
    lims = [None, None]
df9.plot(ax=ax, xlim=lims)
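The same idea can be sketched with only the standard library, which makes the rule easy to test on its own. padded_xlim is a hypothetical helper, the one-day padding is taken from the workaround above, and the dates are made-up sample values. It checks for a single distinct date rather than a single row, which is a slight generalization.

```python
from datetime import date, timedelta


def padded_xlim(dates, pad=timedelta(days=1)):
    """Return (left, right) limits that are never identical.

    Hypothetical helper: with a single distinct date it pads on each
    side; otherwise it returns (None, None) so the plotting library
    can autoscale.
    """
    lo, hi = min(dates), max(dates)
    if lo == hi:
        return lo - pad, hi + pad
    return None, None


print(padded_xlim([date(2010, 10, 1)]))
print(padded_xlim([date(2010, 10, 1), date(2010, 10, 8)]))
```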
This issue can also arise in a trickier situation, when you have more than one point but only one of them can be drawn on your plot: typically, when only one point is > 0 and the y-scale of your plot is logarithmic.
One should always set limits on a log scale when there are 0 values, because there is no way for the program to decide on a good lower limit for the scale.
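A minimal sketch of picking explicit log-axis limits from the positive values only is shown below. log_axis_limits is a hypothetical helper, and the amount of padding (one decade below the smallest positive value, one above the largest) is an arbitrary assumption, not a matplotlib default.

```python
def log_axis_limits(values, decades_below=1):
    """Pick explicit (bottom, top) limits for a log axis.

    Zeros and negatives are ignored because they cannot be placed
    on a logarithmic scale.
    """
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("a log axis needs at least one positive value")
    bottom = min(positive) / 10 ** decades_below
    top = max(positive) * 10
    return bottom, top


counts = [0, 0, 100, 0]  # only one value can be drawn on a log scale
limits = log_axis_limits(counts)
print(limits)
```

The resulting pair could then be passed to ax.set_ylim(bottom, top) so the axis never collapses to a single value.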

Setting different errors for pandas plot bar

I'm trying to plot a probability distribution using a pandas.Series and I'm struggling to set different yerr for each bar. In summary, I'm plotting the following distribution:
It comes from a Series and it is working fine, except for the yerr. It cannot overpass 1 or 0. So, I'd like to set different errors for each bar. Therefore, I went to the documentation, which is available here and here.
According to them, I have 3 options for either yerr or xerr:
scalar: Symmetric +/- values for all data points.
shape(N,): Symmetric +/- values for each data point.
shape(2,N): Separate - and + values for each bar. The first row contains the lower errors, the second row contains the upper errors.
The case I need is the last one. In this case, I can use a DataFrame, Series, array-like, dict and str. Thus, I set the arrays for each yerr bar, however it's not working as expected. Just to replicate what's happening, I prepared the following examples:
First I set a pandas.Series:
import pandas as pd
se = pd.Series(data=[0.1,0.2,0.3,0.4,0.4,0.5,0.2,0.1,0.1],
index=list('abcdefghi'))
Then, I'm replicating each case:
This works as expected:
err1 = [0.2]*9
se.plot(kind="bar", width=1.0, yerr=err1)
This works as expected:
err2 = err1
err2[3] = 0.5
se.plot(kind="bar", width=1.0, yerr=err1)
Now the problem: This doesn't work as expected!
err_up = [0.3]*9
err_low = [0.1]*9
err3 = [err_low, err_up]
se.plot(kind="bar", width=1.0, yerr=err3)
It's not setting different errors for low and up. I found an example here and a similar SO question here, although they are using matplotlib instead of pandas, it should work here.
I'm glad if you have any solution about that.
Thank you.
Strangely, plt.bar works as expected:
err_up = [0.3]*9
err_low = [0.1]*9
err3 = [err_low, err_up]
fig, ax = plt.subplots()
ax.bar(se.index, se, width=1.0, yerr=err3)
plt.show()
Output:
A bug/feature/design-decision of pandas maybe?
Based on @Quanghoang's comment, I started to think it was a bug. So I tried changing the shape of yerr and, surprisingly, the following code worked:
err_up = [0.3]*9
err_low = [0.1]*9
err3 = [[err_low, err_up]]
print (err3)
se.plot(kind="bar", width=1.0, yerr=err3)
Observe that I included a new axis in err3. Now it's a (1,2,N) array. However, the documentation says it should be (2,N).
In addition, a possible workaround that I found was to call ax.set_ylim(0, 1). It doesn't solve the problem, but it plots the graph correctly.
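Since the original goal was error bars that never go below 0 or above 1, the lower and upper errors can also be computed per bar. This is a minimal sketch using plain lists; clipped_errors is a hypothetical helper and is not part of pandas or matplotlib.

```python
def clipped_errors(values, err, lower_bound=0.0, upper_bound=1.0):
    """Return (lower, upper) error lists so that value - lower >= lower_bound
    and value + upper <= upper_bound for every bar."""
    lower = [min(err, v - lower_bound) for v in values]
    upper = [min(err, upper_bound - v) for v in values]
    return lower, upper


probs = [0.1, 0.5, 0.9]
err_low, err_up = clipped_errors(probs, err=0.2)
print(err_low)
print(err_up)
```

The two lists could then be passed as yerr=[err_low, err_up] to ax.bar, or wrapped once more as yerr=[[err_low, err_up]] for Series.plot, following the workaround above.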
