Syntax for float format string in pandas? - python

I want to use the to_csv() function in pandas to create a csv string from a dataframe. The function has a float_format parameter to control how floats are formatted in the output. However, I cannot find any documentation about how to use this parameter.
The pandas documentation helpfully only says "Format string for floating point numbers". I have tried searching the whole pandas documentation for "float_format" but found only references to the term and a few examples, such as in IO tools or options and settings, no explanation or definition. It is used in many other functions as well but it does not seem to be documented at all.
Can anyone point me to a documentation of the float_format parameter in pandas?

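You can find some information about the values that float_format can take in the Python docs rather than the pandas docs. When float_format is given as a string such as '%.2f', it uses the old printf-style (%) string formatting syntax. The closely related syntax used by str.format, for example "{:.2f}".format, is described in the section on the Format Specification Mini-Language. It is not the pandas docs, but I hope this helps.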
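As a minimal sketch of the above (the DataFrame and its column names are made up for illustration), this passes a printf-style string to to_csv and compares it with the equivalent str.format spec in plain Python:

```python
import pandas as pd

# Hypothetical example data, not taken from the question.
df = pd.DataFrame({"x": [1.0, 2.5], "y": [3.14159, 1234.5]})

# A string float_format is applied printf-style, like "%.2f" % value.
csv_text = df.to_csv(index=False, float_format="%.2f")
print(csv_text)

# The same precision written in both formatting syntaxes.
printf_style = "%.2f" % 3.14159
format_spec = "{:.2f}".format(3.14159)
print(printf_style, format_spec)
```

Both syntaxes share the precision and type characters (.2 and f here); only the surrounding notation differs.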

Related

Pandas Styles removing default table format

I am trying to format a pandas DataFrame value representation.
Basically, all I want is to get the "Thousand" separator on my values.
I managed to do it using the DataFrame.style.format function (df.style.format). It does the job, but it also "breaks" the original design of my table.
Here is an example of what is going on:
Is there anything I can do to avoid doing it? I want to keep the original table format, only changing the format of the value.
PS: Don't know if it makes any difference, but I am using Google Colab.
In case anyone else runs into the same problem in Colab, I have found a solution:
.set_table_attributes('class="dataframe"') seems to solve the problem.
More information can be found here: https://github.com/googlecolab/colabtools/issues/1687
For this case you could do:
pdf.assign(a=pdf['a'].map("{:,.0f}".format))
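A small runnable sketch of the map-based approach above, assuming a made-up DataFrame named pdf with a numeric column a. Note that the formatted column holds strings, so this is best done as a last step before display:

```python
import pandas as pd

# Hypothetical data; the name `pdf` and column `a` follow the snippet above.
pdf = pd.DataFrame({"a": [1234567.891, 1000.0], "b": ["x", "y"]})

# "{:,.0f}" means: thousands separator, zero decimals, fixed-point.
formatted = pdf.assign(a=pdf["a"].map("{:,.0f}".format))
print(formatted)

# The original frame is untouched; the new column contains str values.
print(formatted["a"].tolist())
```

Because no Styler is involved, the notebook renders the result with its default table design.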

Python data type type codes comprehensive table or resource

Today, and on several other occasions, I received an error like this:
{TypeError}ufunc subtract cannot use operands with types dtype('<M8[us]') and dtype('O').
On other days, I'd want to do some printf type command and be at a loss for which character stood for some obtuse data type (e.g. signed octal value).
I always had a hard time finding the definitions of what I now know are called "type codes" or "array-protocol type strings" (as in the first example), not to be confused with "printf-style string formatting conversion characters" (as in the latter case). They are single characters in string-literal quotes, so Googling them is a mess, and I was left hunting for synonyms of a term I didn't know. Maybe I'm just bad at RegEx and can't navigate man pages well enough, but I wanted to post a possibly self-answered question in order to tag a bunch of synonyms for the things I was trying to find; in the end I landed on "type code". I knew I was looking for python or numpy data types, and I scoured the internet for dtype('<M8[us]') for the longest time, so I thought I'd help those who end up in a similar situation by providing a would-be online bookmark.
I had already read about various data types and this syntax in the past from various sources, knowing about the little-endian symbol '<', that '8' had something to do with the size, but would change depending on the dtype, but I had no idea what 'M' or '[us]' was defining. In my late night stupidity I looked over the numpy and python docs, but both for an earlier version than I had in my current env, and it looks like this 'M' did not appear until recently so I was left thinking all the tables in the docs were non-exhaustive and there was some other Unix or C based definition of all these type codes (which I still have not ruled out, but assume this is not the case now that I've found 'M' in my current Numpy version doc).
I will put the various resources that I've located regarding these various type codes in python and associated libraries here, but I'm sure there are plenty more, so would welcome others' additions/edits. I'll add all my links as an answer, and who knows, if others also found themselves in this situation, maybe I'll make a type code cheat sheet or something as a general resource online somewhere. Anyways, I think they'd be helpful to gather in a place tagged by a bunch of keywords that I was using trying to find them, to no avail like: python numpy data type shorthand definitions, python numpy dtype abbreviations, python array dtype codes, etc. If you have any other words that came to your mind when labeling these un-googleable terms, feel free to edit and add.
General notes:
Make sure you are reading the doc for the right version of python, numpy, etc.
The codes used depend on the use case (i.e. numpy array-protocol type strings are different than those used to define the types in general python arrays)
Even worse, some of the same characters are used to mean different things depending on the use case ('b' and 'B' for example if you compare numpy and python arrays, or 'd' if comparing python printf and array codes).
Numpy 1.17: Array-protocol type strings and the 'M' type
Python 3.8.0: printf conversion types
Python 3.8.0: Array type codes. Edit: This class is not used often, but I wanted it here for comparative and exhaustive reference.
Python 3.8.0 string formatter "mini language" syntax, aka "presentation types"
I won't go to the trouble of reiterating the docs, even though my answer is primarily links, since I don't expect the docs to go down anytime soon. As for the main point of how I got here: 'M' stands for a datetime type in numpy, and '[us]' means microsecond resolution.
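As a quick way to decode such a type string without hunting through tables, numpy itself can be asked what it means. This is only a sketch; the variable names are illustrative:

```python
import numpy as np

dt = np.dtype("<M8[us]")

kind = dt.kind                       # single-character kind code
size = dt.itemsize                   # size of one element in bytes
name = dt.name                       # human-readable dtype name
unit, count = np.datetime_data(dt)   # time unit and step count

print(kind, size, name, unit, count)

# dtype('O') from the error message is the generic Python object type.
obj_kind = np.dtype("O").kind
print(obj_kind)
```

Here the kind is 'M' (datetime), the size is 8 bytes, and the name is datetime64[us], which matches the reading above.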

How to modify the default mapper of the StringConverter class?

I am trying to read a csv file in Python3 using the numpy genfromtxt function. In my csv file I have a field which is a string that looks like the following: "0x30375107333f3333".
I need to use the "dtype=None" option because I need this section of code to work with many different csv files, only some of which have such a field. Unfortunately numpy interprets this as a float128, which is a pain because 1) it is not a float and 2) I cannot find a way to convert it to an int after it has been read as a float128 (without losing precision).
What I would like to do is instead interpret this as a string because it is enough for me. I found on the Numpy documentation that there is a way of getting around this, but they give cryptic instructions:
This behavior may be changed by modifying the default mapper of the StringConverter class.
Unfortunately whenever I Google something related to this I fall back to this documentation page.
I would greatly appreciate either an explanation of what they mean in the above quoted text or a solution to my above stated problem.

Pandas - Decimal format when writing to_csv instead of scientific

I've just started using Pandas and I'm trying to export my dataset using the to_csv function on my dataframe fp_df
One column (fp_df['Amount Due']) has values with many decimal places (for example, 0.000042), but to_csv writes them in scientific notation, which the receiving system will be unable to read. The value needs to be output as '0.000042'.
What is the easiest way to do this? The other answers I've found seem overly complex and I don't understand how or why they work.
(Apologies if any of my terminology is off, I'm still learning)
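Check the documentation for to_csv(); you'll find a parameter called float_format:
df.to_csv(..., float_format='%.6f')
The string is a printf-style format, so '%.6f' means fixed-point notation with six decimal places. The precision and type characters are the same ones described in the Format Specification Mini-Language.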
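A minimal sketch of the difference, using a made-up frame with the same name and column as in the question:

```python
import pandas as pd

# Hypothetical stand-in for the asker's fp_df.
fp_df = pd.DataFrame({"Amount Due": [0.000042, 12.5]})

# Without float_format, small floats are written in scientific notation.
default_csv = fp_df.to_csv(index=False)
print(default_csv)

# With float_format, every float is written with six fixed decimals.
fixed_csv = fp_df.to_csv(index=False, float_format="%.6f")
print(fixed_csv)
```

Note that float_format applies to every float column, so 12.5 is written as 12.500000 as well.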

Finding median with pandas transform

I needed to find the median for a pandas dataframe and used a piece of code from this previous SO answer: How I do find median using pandas on a dataset?.
I used the following code from that answer:
data['metric_median'] = data.groupby('Segment')['Metric'].transform('median')
It seemed to work well, so I'm happy about that, but I had a question: how is it that the transform method took the argument 'median' without any prior specification? I've been reading the documentation for transform but didn't find any mention of using it to find a median.
Basically, the fact that .transform('median') worked seems like magic to me, and while I have no problem with magic and fancy myself a young Tony Wonder, I'm curious about how it works.
I'd recommend diving into the source code to see exactly why this works (and I'm mobile so I'll be terse).
When you pass the argument 'median' to transform, pandas converts it behind the scenes via getattr to the appropriate method, then behaves as if you had passed it a function.
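A rough illustration of that idea, not the actual pandas internals: the string is just a method name that can be looked up on the grouped object, and the result matches passing an equivalent function. The data is made up, with the column names from the question:

```python
import pandas as pd

data = pd.DataFrame({
    "Segment": ["a", "a", "b", "b", "b"],
    "Metric": [1.0, 3.0, 2.0, 4.0, 9.0],
})
grouped = data.groupby("Segment")["Metric"]

# Passing the method name as a string.
by_name = grouped.transform("median")

# Looking the same name up by hand gives the groupby median method.
median_method = getattr(grouped, "median")
per_group = median_method()  # one median per Segment

# Passing a function that does the same thing per group.
by_callable = grouped.transform(lambda s: s.median())

print(by_name.tolist())
print(per_group.to_dict())
```

Any method name that exists on the grouped object, such as 'mean' or 'sum', can be passed the same way.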
