Unreadable values in heatmap - python

I have a dataframe df containing numeric values such as 1432, 22390, 43223 and so on.
When I try to plot it as a heatmap like this: sns.heatmap(df[cols].transpose(), annot=True)
I get unreadable values such as 2.2e+04, 1.7e+03, etc.
The strange thing is that in another notebook I'm using the same code and it works perfectly.
So what is the problem?

When you set annot=True, the default parameter fmt='.2g' is applied (i.e., each number is rounded to 2 significant digits and the result is then shown in either fixed-point or scientific notation, depending on its magnitude).
To get the general format instead, simply change fmt='.2g' to fmt='g'.
More information about format specifiers such as 'g' can be found in the Python documentation on the Format Specification Mini-Language.
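A quick way to see the difference is to apply both format specifiers to the numbers from the question with Python's built-in format(). seaborn builds each annotation from fmt in essentially the same way ('{:' + fmt + '}'.format(value)), so this sketch mimics the heatmap labels without needing seaborn. Note that 'g' keeps 6 significant digits by default, so values of one million or more switch back to scientific notation.

```python
values = [1432, 22390, 43223]  # numbers from the question

# fmt='.2g' (the default when annot=True): round to 2 significant digits
two_sig = [format(v, ".2g") for v in values]

# fmt='g': general format with the default 6 significant digits
general = [format(v, "g") for v in values]

print(two_sig)  # ['1.4e+03', '2.2e+04', '4.3e+04']
print(general)  # ['1432', '22390', '43223']
```

In the heatmap call this corresponds to sns.heatmap(df[cols].transpose(), annot=True, fmt='g').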

Related

Painting numbers used as markers in different colors in matplotlib

I want to represent a dataset of dots, each dot having an x and y coordinate and a single-digit value, and each of the values should be represented by a particular color. What I managed so far is something like this, which looks nice enough:
# x, y and digits are one-dimensional np.arrays of the same shape
plt.scatter(x, y, marker='o', c=digits)
# set the default colormap to viridis
plt.viridis()
Now I want to show the digits themselves instead of just colored dots. As I understand it, the marker argument cannot be an array, so I decided that something like this would work:
for i in range(len(digits)):
    plt.scatter(x[i], y[i], c=digits[i], marker=('${}$'.format(digits[i])))
This almost works, but 'c=digits[i]' doesn't seem to, because a single digit doesn't actually encode any particular color. I think (correct me if I'm wrong) that the first code works because Python somehow understands that by c=digits I didn't mean actual colors but just wanted to tell the dots apart.
So the question is:
What is the easiest way, in the second case, to set the colors of the digits without stating them explicitly but using a default colormap? (Ideally I would like to get something identical to what the first code does, but with digits instead of dots.)
It seems I have found a solution, so in case somebody has a similar question:
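# plt.cm.get_cmap works on older Matplotlib; it was removed in 3.9, where plt.get_cmap(...) takes its place
cmap = plt.cm.get_cmap('name_of_demanded_colormap', neededColorsNum).colors
for i in range(len(digits)):
    # neededColorsNum must exceed the largest digit, because the digits index the rows of cmap
    plt.scatter(x[i], y[i],
                c=np.array([cmap[digits[i]]]),
                marker='${}$'.format(digits[i]))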
Some "explanations":
- cm is pyplot's handle on the matplotlib.cm (colormap) module, needed for gods know what reason;
- get_cmap creates the specified colormap with neededColorsNum entries, which is unexpectedly not an array but a Colormap object;
- colors is an attribute of that colormap (available on listed colormaps such as viridis) which finally gives something to work with: an array whose rows are RGBA colors;
- the code is so natural that matplotlib (not Jupyter itself) demands the color row be two-dimensional, because a flat row of 3 or 4 numbers could be misinterpreted as values to be mapped through the colormap.
Whatever the complaints, it works as intended, so good enough.
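A likely reason the per-digit loop with c=digits[i] does not work: each plt.scatter call sees only one value, so the color normalization is computed from that single value and every digit ends up with the same color. Below is a sketch of one way to reproduce the colors of the first snippet: build one shared Normalize from the whole digits array, look up all RGBA colors at once, and give each point its row as a 2-D (1, 4) array. The random x, y and digits arrays are hypothetical stand-ins, and matplotlib.colormaps requires Matplotlib 3.5 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

# Hypothetical sample data: x, y and single-digit values of the same length
rng = np.random.default_rng(0)
x = rng.random(20)
y = rng.random(20)
digits = rng.integers(0, 10, size=20)

# One normalization for all points, like scatter(c=digits): min -> 0, max -> 1
norm = Normalize(vmin=digits.min(), vmax=digits.max())
cmap = matplotlib.colormaps["viridis"]
colors = cmap(norm(digits))  # shape (len(digits), 4): one RGBA row per point

fig, ax = plt.subplots()
for i, d in enumerate(digits):
    # colors[i:i + 1] keeps the row 2-D (shape (1, 4)), so matplotlib reads it
    # as one explicit RGBA color instead of values to be colormapped
    ax.scatter(x[i], y[i], c=colors[i:i + 1], marker=f"${d}$", s=100)
# call plt.show() or fig.savefig(...) to look at the result
```

Alternatively, passing c=[d] together with cmap="viridis" and the same shared norm to each call should let matplotlib do the color lookup itself; precomputing the colors just makes the mapping explicit.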

How do I use multiple data points from one Excel column in Python?

I am doing some image processing in Python and need to crop an area of the image. However, my pixel coordinate data is arranged as three values in one Excel column, separated by commas, as follows:
[1345.83,1738,44.26] (i.e. [x,y,r]) - this is exactly how it appears in the Excel cell, square brackets and all.
Any idea how I can read this into my script and start cropping images according to the pixel coordinate values? Is there a function that can separate them and treat them as three independent values?
Thanks, Rhod
My understanding is that if you use pandas.read_excel(), you will get a column of strings in this situation. There are lots of options, but here is what I would do, assuming your column name is xyr:
# clean up strings to remove the square brackets on either side
data['xyr_clean'] = data['xyr'].str.lstrip('[').str.rstrip(']')
# split on commas (the cells have no space after them) and convert the parts to floats
data[['x', 'y', 'r']] = (
    data['xyr_clean'].str.split(',', expand=True).astype(float)
)
The key thing to know is that pandas string columns have a .str attribute that contains adapted versions of all or most of Python's built-in string methods. Then you can search for "pandas convert string column to float" to get the last bit!
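To go from the parsed numbers to actual crops, here is a sketch under a few assumptions: the image is a NumPy array (as most image libraries return), x is the column and y the row, and the crop is the square bounding box of the circle, rounded to whole pixels and clipped to the image. The DataFrame and the blank image are hypothetical stand-ins for your Excel data and picture, and crop_circle_bbox is a hypothetical helper. Note that str.strip('[]') removes both brackets in one call.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel(...): one column 'xyr' of strings
data = pd.DataFrame({"xyr": ["[1345.83,1738,44.26]", "[20,30,5]"]})

# Strip the brackets, split on commas, convert the three parts to floats
data[["x", "y", "r"]] = (
    data["xyr"].str.strip("[]").str.split(",", expand=True).astype(float)
)


def crop_circle_bbox(image, x, y, r):
    """Return the square region around the circle centred at (x, y) with radius r.

    Assumes x is the column and y the row; bounds are rounded to whole
    pixels and clipped to the image size.
    """
    h, w = image.shape[:2]
    top = max(int(round(y - r)), 0)
    bottom = min(int(round(y + r)), h)
    left = max(int(round(x - r)), 0)
    right = min(int(round(x + r)), w)
    return image[top:bottom, left:right]


image = np.zeros((2000, 2000), dtype=np.uint8)  # placeholder image
crops = [crop_circle_bbox(image, row.x, row.y, row.r) for row in data.itertuples()]
```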

Subtract 2 dataframe columns and get the result without weird rounding (floating point arithmetic)

I have 2 pandas dataframes with thousands of values. I load them from a CSV file with pandas' read_csv function.
I need to subtract a column ("open") of the second one from a column of the first, and I do it like this:
subtraction = shiftedDataset.open - dataset.open
And I get a Series with the results.
The problem is that the results come with the weird rounding that comes from floating-point arithmetic
(e.g. a value that should be -0.00003 is -2.999999999997449e-05).
How can I get the correct results? I can manipulate the dataframe before the subtraction or the values after it, I don't care, but I need the best performance possible.
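This is just scientific notation: -2.999999999997449e-05 means -0.00002999999999997449, and the tiny difference from -0.00003 is ordinary floating-point error. Keeping the full-precision value is probably better if you want to do more calculations. If this is purely for display purposes, look at this post.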
Example:
v = -2.999999999997449e-05
print('%f' % v)
# prints: -0.000030
Some of the answers there are about formatting the output (turning your value into a string, which might not be what you want), but there is also a pandas setting you can use (in the same post, scroll down a bit).
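A sketch of the two pandas routes mentioned above: round the result itself (vectorized, so fast even on thousands of rows), or keep the values untouched and only change how pandas prints floats. The small price data and the 5-decimal precision are assumptions for illustration.

```python
import pandas as pd

# Hypothetical prices standing in for the two 'open' columns read with read_csv
dataset = pd.DataFrame({"open": [1.10000, 1.10003, 1.10010]})
shiftedDataset = pd.DataFrame({"open": [1.10003, 1.10010, 1.10007]})

# float64 subtraction: results carry tiny binary rounding errors
subtraction = shiftedDataset.open - dataset.open

# Option 1: round the values (vectorized); 5 decimals is an assumed precision
rounded = subtraction.round(5)

# Option 2: keep full precision and only change how pandas prints floats
with pd.option_context("display.float_format", "{:.5f}".format):
    print(subtraction)
```

Rounding still stores binary floats, but the nearest float to 0.00003 prints as 3e-05. Exact decimal arithmetic would need something like Python's decimal module, at a large cost in speed.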

How to get h2o to return results in non-scientific notation

I wrote Python code to return the count of each level of a feature of an H2O dataframe, but the result always comes back in scientific notation. How do I get it to display in decimal?
Code I used:
print(all_propensity["HasLoss"].table())
What it returns:
HasLoss Count
0 1.46457e+07
1 35277
What I want it to return:
HasLoss Count
0 14,645,700
1 35,277
In R you would use options(digits=12), or something like that, so that it does not switch to scientific notation until that number of digits. But in Python there seems to be no way to override the global default (which I think is 6 digits), and all the answers I found were about doing the formatting yourself.
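But in IPython/Jupyter you can control the precision IPython uses when it displays plain floats (and NumPy arrays) with the following; note that it does not change print() output or pandas tables: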
%precision 12
(See https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-precision )
Or, assuming you have pandas imported, the table H2O returns is actually a pandas table, so the pandas formatting options apply. I think pd.options.display.float_format = '{:.0f}'.format would do it. Or change the column data type to int64, as suggested here: https://stackoverflow.com/a/49910142/841830
All the pandas options are listed here: https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html#available-options, or search for ways to format data in pandas. (Just remember that H2O gives you a pandas data set, so once you have the data in Python it is a pandas question.)
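A sketch of both pandas suggestions, assuming the counts are already in a pandas DataFrame (if you still have an H2OFrame, its as_data_frame() method converts it). The numbers are taken from the desired output in the question; the ',' in the format spec adds the thousands separators shown there.

```python
import pandas as pd

# Hypothetical stand-in for the counts table after conversion to pandas
counts = pd.DataFrame({"HasLoss": [0, 1], "Count": [14645700.0, 35277.0]})

# Option 1: temporarily format floats with thousands separators and no decimals
with pd.option_context("display.float_format", "{:,.0f}".format):
    print(counts)

# Option 2: counts are whole numbers, so store them as int64,
# which pandas never prints in scientific notation
counts["Count"] = counts["Count"].astype("int64")
print(counts)
```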

Naming columns by mathematical symbols in pandas dataframe

I want to add the units of my parameters next to each parameter in the column names of my dataframe. I also need statistical symbols such as μ and σ² in some column names.
I tried the following code, based on the r"$...$" syntax for mathematical symbols in Python, but it does not work for a dataframe:
P[r"Infiltration rate ($1/\h^-1$)"]=r['ACH_Base']
in order to give the Infiltration rate parameter the unit (1/h^-1).
In my code I have already created a new dataframe "P" and I am adding the ACH_Base column in "r" dataframe to P.
How can I add mathematical symbols for naming the columns in dataframes?
Thanks!!
It should work, but how it looks depends on what renders the dataframe: the column name itself is just a string. For instance, matplotlib can render LaTeX in plots.
Here is an example:
https://matplotlib.org/users/usetex.html#text-rendering-with-latex
LaTeX can also be rendered in Jupyter notebooks, but only in Markdown cells, not in Python code:
http://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html?highlight=latex#LaTeX-equations
"\h" is an unknown symbol.
Does P[r"Infiltration rate ($1/h^-1$)"]=r['ACH_Base'] display what you want?
What unit do you wish to display? You can refer to https://matplotlib.org/users/mathtext.html and https://matplotlib.org/users/usetex.html#usetex-tutorial for more information on how to render text with LaTeX.
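A small sketch of the point above, with hypothetical data standing in for r['ACH_Base']: the column name is stored as an ordinary string, and matplotlib interprets the $...$ part only when it draws the name, for example in a legend. Note the braces in h^{-1}: in TeX notation, ^-1 would raise only the minus sign. Unicode characters such as μ, σ² or h⁻¹ are an alternative that looks the same in print(), Jupyter tables and plots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the 'r' dataframe from the question
r = pd.DataFrame({"ACH_Base": [0.5, 0.7, 0.6]})
P = pd.DataFrame(index=r.index)

# Mathtext in the name: shown literally in tables, rendered by matplotlib
mathtext_col = r"Infiltration rate ($h^{-1}$)"
P[mathtext_col] = r["ACH_Base"]

# Plain Unicode in the name: looks the same everywhere
unicode_col = "Infiltration rate (h⁻¹)"
P[unicode_col] = r["ACH_Base"]

ax = P.plot(y=mathtext_col)  # the legend label is drawn with mathtext
ax.figure.canvas.draw()      # force a draw so the mathtext is actually parsed
plt.close(ax.figure)
```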
