Pandas - Decimal format when writing to_csv instead of scientific - python

I've just started using Pandas and I'm trying to export my dataset using the to_csv function on my DataFrame fp_df.
One column (fp_df['Amount Due']) has multiple decimal places (the value is 0.000042), but when I use to_csv it is written in scientific notation, which the receiving system cannot read. It needs to be output as '0.000042'.
What is the easiest way to do this? The other answers I've found seem overly complex and I don't understand how or why they work.
(Apologies if any of my terminology is off, I'm still learning)

Check the documentation for to_csv(): it has a parameter called float_format.
df.to_csv(..., float_format='%.6f')
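The string uses printf-style (%) formatting, so '%.6f' writes every float in fixed-point notation with six decimal places. The precision and type codes (f, e, g) mean the same as in the Format Specification Mini-Language.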
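A minimal sketch of the float_format fix, assuming a one-column DataFrame standing in for fp_df (the sample value is the one from the question). It writes to a string instead of a file so both outputs can be compared:

```python
import pandas as pd

# Stand-in for fp_df from the question, holding the value that caused trouble
fp_df = pd.DataFrame({"Amount Due": [0.000042]})

# Default output: the float is written in its shortest form, which is 4.2e-05
default_csv = fp_df.to_csv(index=False)

# float_format='%.6f' forces fixed-point notation with six decimal places
fixed_csv = fp_df.to_csv(index=False, float_format="%.6f")

print(default_csv)
print(fixed_csv)
```

Note that '%.6f' rounds anything beyond the sixth decimal place, so choose the precision to match the smallest values the column can hold.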

Related

Pandas Styles removing default table format

I am trying to format a pandas DataFrame value representation.
Basically, all I want is to get the "Thousand" separator on my values.
I managed to do it using the DataFrame.style.format function. It does the job, but it also "breaks" my table's original design.
Here is an example of what is going on:
Is there anything I can do to avoid this? I want to keep the original table format and only change how the values are formatted.
PS: Don't know if it makes any difference, but I am using Google Colab.
In case anyone using Colab has the same problem, I have found a solution:
.set_table_attributes('class="dataframe"') seems to solve the problem.
More information can be found here: https://github.com/googlecolab/colabtools/issues/1687
For this case you could do:
pdf.assign(a=pdf['a'].map("{:,.0f}".format))
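A runnable version of the assign/map answer, assuming a DataFrame pdf with a numeric column a (names taken from the answer; the values are made up for illustration). The formatted column holds strings, so it is meant for display rather than further arithmetic:

```python
import pandas as pd

# Hypothetical sample data; 'pdf' and column 'a' follow the answer above
pdf = pd.DataFrame({"a": [1234567.0, 89.4, 1000.0]})

# "{:,.0f}".format inserts a comma as thousands separator and rounds to 0 decimals.
# assign returns a new DataFrame, so the original pdf keeps its float column.
formatted = pdf.assign(a=pdf["a"].map("{:,.0f}".format))

print(formatted)
```

Because the result is an ordinary DataFrame rather than a Styler, the notebook renders it with its default table style.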

excel converting the string to scientific notation

Is there a way to prevent data exported from Python from being converted into scientific notation in Excel?
ID
1E1
2E9
3E4
After exporting to CSV format, I am getting:
ID
1.00E+01
2.00E+09
3.00E+04
I found a similar thread; however, none of the answers had a clear explanation, or the links were broken.
This is not an issue with Python writing the wrong value to the CSV file. If you open the CSV file in a text editor, you will see the value is written in the correct format. If that is not the case, please provide your code and sample data.
Assuming Python writes it correctly to the CSV, look into converting the values in Excel from scientific notation to text or number.
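To check the answer's claim, here is a small sketch assuming the IDs are stored as strings (the values are from the question). It writes the CSV to an in-memory buffer and shows the raw text, which is what the spreadsheet program later reinterprets:

```python
import io

import pandas as pd

# IDs from the question, kept as strings (object dtype), not numbers
df = pd.DataFrame({"ID": ["1E1", "2E9", "3E4"]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)
raw = buffer.getvalue()

# The file holds the literal text 1E1, 2E9, 3E4; a 1.00E+01 display comes
# from the spreadsheet parsing those strings as numbers when opening the file
print(raw)
```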

Syntax for float format string in pandas?

I want to use the to_csv() function in pandas to create a csv string from a dataframe. The function has a float_format parameter to control how floats are formatted in the output. However, I cannot find any documentation about how to use this parameter.
The pandas documentation helpfully only says "Format string for floating point numbers". I have searched the whole pandas documentation for "float_format" but found only mentions of the term and a few examples, such as in IO tools or Options and settings, with no explanation or definition. It is used in many other functions as well, but it does not seem to be documented anywhere.
Can anyone point me to a documentation of the float_format parameter in pandas?
You can find some information about the values float_format can take in the Python docs, more specifically in the section on the Format Specification Mini-Language. It is not the pandas docs, but I hope this helps.
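In practice, when float_format is a string, to_csv applies it with Python's % operator, so the "printf-style String Formatting" section of the Python docs is the closest exact reference; the precision and type codes are the same as in the Mini-Language. A sketch comparing a few format strings on a made-up value, checked against the % operator:

```python
import pandas as pd

# Made-up value, used only to compare format strings
value = 1234.5678
df = pd.DataFrame({"x": [value]})

results = {}
for fmt in ("%.2f", "%.3e", "%g"):
    # Second line of the CSV output is the formatted value (first is the header)
    results[fmt] = df.to_csv(index=False, float_format=fmt).splitlines()[1]

print(results)  # {'%.2f': '1234.57', '%.3e': '1.235e+03', '%g': '1234.57'}
```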

Python pandas.read_csv rounding errors on import?

I have a 10000 x 250 dataset in a csv file. When I use the command
data = pd.read_csv('pool.csv', delimiter=',',header=None)
from the correct working directory, the values are imported.
First I get a DataFrame. Since I want to work with NumPy, I need to convert it to an array of values using
data = data.values
And this is where it gets weird. At position [9999,0] the file contains the value -0.3839. However, after importing and calculating with it, I noticed that Python (or NumPy) does something strange while importing.
Accessing data[9999,0] SHOULD give the expected -0.3839, but gives something like -0.383899892....
I have already imported the file in other languages like MATLAB, and there was no issue with rounding those values. I also tried using the .to_csv command from pandas instead of .values, but the problem is exactly the same.
The last 10 elements of the first column are
-0.2716
0.3711
0.0487
-1.518
0.5068
0.4456
-1.753
-0.4615
-0.5872
-0.3839
Is there any import routine, which does not have those rounding errors?
Passing float_precision='round_trip' should solve this issue:
data = pd.read_csv('pool.csv',delimiter=',',header=None,float_precision='round_trip')
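A self-contained sketch of that answer, using a small in-memory CSV built from the last values in the question instead of pool.csv. As I understand it, 'round_trip' makes the C parser use Python's own string-to-float conversion, so the stored float64 is the one closest to the decimal text:

```python
import io

import pandas as pd

# Two values from the question's first column, standing in for pool.csv
csv_text = "-0.5872\n-0.3839\n"

data = pd.read_csv(
    io.StringIO(csv_text),
    delimiter=",",
    header=None,
    float_precision="round_trip",  # use the round-trip float converter
)
values = data.to_numpy()  # same result as data.values

# The parsed float equals the float Python builds from the literal -0.3839
print(values[1, 0] == -0.3839)
```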
That's a floating-point error. It happens because computers store numbers in binary, and most decimal fractions cannot be represented exactly. (You can look it up if you really want to know how it works.) Don't be bothered by it; it is very small.
If you really need exact precision (for example, because you are testing for exact values), you can look at Python's decimal module, but your program will be a lot slower (probably something like 100 times slower).
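One way to apply the decimal suggestion (a swapped-in technique, not part of the answer) is to let read_csv build decimal.Decimal objects straight from the field text through its converters parameter, so the value never passes through a binary float. The column becomes object dtype, which is where the slowdown comes from: NumPy's vectorized arithmetic no longer applies.

```python
import io
from decimal import Decimal

import pandas as pd

csv_text = "-0.5872\n-0.3839\n"

# converters maps column 0 to Decimal, which receives the raw field text
data = pd.read_csv(io.StringIO(csv_text), header=None, converters={0: Decimal})

value = data.iloc[1, 0]
print(type(value).__name__, value)      # Decimal -0.3839
print(value * 3 == Decimal("-1.1517"))  # True: exact decimal arithmetic
```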
You can read more here: https://docs.python.org/3/tutorial/floatingpoint.html
You should know that all languages have this problem; some are just better at hiding it. (Also note that Python 3 has improved this "hiding" of floating-point error.)
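A quick illustration of both the error and Python 3's "hiding": since Python 3.1, repr() prints the shortest decimal string that converts back to the same float. The tolerance comparison with math.isclose is one common way to live with the error:

```python
import math

# 0.1 and 0.2 have no exact binary representation, so their sum is slightly off
total = 0.1 + 0.2
print(repr(total))               # 0.30000000000000004
print(total == 0.3)              # False

# repr() shows the shortest string that maps back to the same float
print(repr(-0.3839))             # -0.3839

# Compare with a tolerance instead of exact equality
print(math.isclose(total, 0.3))  # True
```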
Since there is no ideal solution to this problem, it is up to you to choose the one most appropriate for your situation.
I don't know about 'round_trip' and its limitations, but it may help you. Another solution would be to use float_format in the to_csv method (https://docs.python.org/3/library/string.html#format-specification-mini-language).
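For the float_format suggestion, a sketch assuming the source data has four decimal places, as in the question's sample. Formatting on output rounds away a tiny representation error when the data is written back to CSV; the error added below is hypothetical, only to simulate the effect:

```python
import pandas as pd

# Simulate a value carrying a tiny floating-point error (hypothetical)
df = pd.DataFrame([[-0.3839 + 1e-12]])

# '%.4f' writes four decimals, matching the precision of the source file
out = df.to_csv(header=False, index=False, float_format="%.4f")
print(out)  # -0.3839
```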

How to modify the default mapper of the StringConverter class?

I am trying to read a csv file in Python3 using the numpy genfromtxt function. In my csv file I have a field which is a string that looks like the following: "0x30375107333f3333".
I need to use the "dtype=None" option because this section of code has to work with many different CSV files, only some of which have such a field. Unfortunately, NumPy interprets this as a float128, which is a pain because 1) it is not a float and 2) I cannot find a way to convert it to an int after it has been read as a float128 (without losing precision).
What I would like to do is instead interpret this as a string because it is enough for me. I found on the Numpy documentation that there is a way of getting around this, but they give cryptic instructions:
This behavior may be changed by modifying the default mapper of the StringConverter class.
Unfortunately whenever I Google something related to this I fall back to this documentation page.
I would greatly appreciate either an explanation of what they mean in the above quoted text or a solution to my above stated problem.
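This does not modify StringConverter's default mapper; it is a simpler workaround that I believe fits the problem: a per-column converter passed through genfromtxt's converters argument, which fixes the type of that column while dtype=None still infers the others. The two-column layout and the column index 1 are assumptions for illustration. encoding=None makes the converter receive str rather than bytes, although int(s, 16) accepts either:

```python
import io

import numpy as np

# Hypothetical file: a float column and a hex string like the one in the question
text = "1.5,0x30375107333f3333\n2.5,0x00000000000000ff\n"

arr = np.genfromtxt(
    io.StringIO(text),
    delimiter=",",
    dtype=None,                            # keep automatic type detection
    encoding=None,                         # pass str, not bytes, to converters
    converters={1: lambda s: int(s, 16)},  # parse column 1 as a base-16 integer
)

# Mixed column types give a structured array with default field names f0, f1
print(arr.dtype)
print(arr["f1"])
```

If the value is only needed as text, the same converters entry could return the string unchanged instead of calling int.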
