Pandas Dataframe.to_csv decimal=',' doesn't work - python

In Python, I'm writing my Pandas Dataframe to a csv file and want to change the decimal delimiter to a comma (,). Like this:
results.to_csv('D:/Data/Kaeashi/BigData/ProcessMining/Voorbeelden/Voorbeeld/CaseEventsCel.csv', sep=';', decimal=',')
But the decimal delimiter in the csv file is still a period (.).
Why? What am I doing wrong?

If the decimal parameter doesn't work, it may be because the dtype of the column is object. (Check the dtype value in the last line of the output when you evaluate df[column_name].)
That can happen if some rows contain values that couldn't be parsed as numbers.
You can force the column to a numeric type (see "Change data type of columns in Pandas"),
but that can make you lose the non-numeric data in that column.
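A minimal sketch of that check and conversion, assuming a hypothetical column `x` that was read in as text. `pd.to_numeric(..., errors="coerce")` turns every value it cannot parse into NaN, which is the data loss mentioned above:

```python
import pandas as pd

# Hypothetical data: the "n/a" entry prevents column x from being numeric.
df = pd.DataFrame({"id": [1, 2, 3], "x": ["1.5", "2.25", "n/a"]})
print(df["x"].dtype)

# Text values are written unchanged, so decimal=',' has no effect on them.
before = df.to_csv(sep=";", decimal=",", index=False)

# Coerce to float; values that cannot be parsed become NaN.
df["x"] = pd.to_numeric(df["x"], errors="coerce")
after = df.to_csv(sep=";", decimal=",", index=False)
print(after)
```

After the conversion the column is float64 and the comma is applied when writing.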

This functionality wasn't added until pandas 0.16.0. From the release notes:
Added decimal option in to_csv to provide formatting for non-‘.’ decimal separators (GH781)
Upgrade pandas to a more recent version and it will work. The code below follows the "10 Minutes to pandas" tutorial and uses pandas version 0.18.1:
>>> import pandas as pd
>>> import numpy as np
>>> dates = pd.date_range('20130101', periods=6)
>>> df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
>>> df
A B C D
2013-01-01 -0.157833 1.719554 0.564592 -0.228870
2013-01-02 -0.316600 1.545763 -0.206499 0.793412
2013-01-03 1.905803 1.172803 0.744010 1.563306
2013-01-04 -0.142676 -0.362548 -0.554799 -0.086404
2013-01-05 1.708246 -0.505940 -1.135422 0.810446
2013-01-06 -0.150899 0.794215 -0.628903 0.598574
>>> df.to_csv("test.csv", sep=';', decimal=',')
This creates a "test.csv" file that looks like this:
;A;B;C;D
2013-01-01;-0,157833276159;1,71955439009;0,564592278787;-0,228870244247
2013-01-02;-0,316599953358;1,54576303958;-0,206499307398;0,793411528039
2013-01-03;1,90580284184;1,17280324924;0,744010110291;1,56330623177
2013-01-04;-0,142676406494;-0,36254842687;-0,554799190671;-0,0864039782679
2013-01-05;1,70824597265;-0,50594004498;-1,13542154086;0,810446051841
2013-01-06;-0,150899136973;0,794214730009;-0,628902891897;0,598573645748

When a column holds objects rather than plain floats, for example Python decimal.Decimal values, the decimal parameter is not applied to it. Convert the values to float first, then write the CSV file:
import pandas as pd
from decimal import Decimal
data_frame = pd.DataFrame(data={'col1': [1.1, 2.2], 'col2': [Decimal(3.3), Decimal(4.4)]})
# col2 has dtype object here, so its values are written with a dot
data_frame.to_csv('report_decimal_dot.csv', sep=';', decimal=',', float_format='%.2f')
# convert Decimal to float (applymap is deprecated since pandas 2.1 in favour of DataFrame.map)
data_frame = data_frame.applymap(lambda x: float(x) if isinstance(x, Decimal) else x)
data_frame.to_csv('report_decimal_comma.csv', sep=';', decimal=',', float_format='%.2f')

Somehow I can't get this to work either. I always end up using the following script to fix it. It's dirty, but it works for my purposes:
for col in df.columns:
    try:
        # "1.234,56" -> "1234.56" -> 1234.56
        df[col] = df[col].apply(lambda x: float(x.replace('.','').replace(',','.')))
    except (AttributeError, ValueError):
        # column is not a string column, or holds non-numeric text
        pass
EDIT: I misread the question. You could use the same tactic the other way around by changing all your floats to strings :). Then again, you should probably just figure out why it's not working. Do post it if you get it to work.
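The "other way around" tactic could look roughly like this. It is only a sketch with made-up column names and a fixed two-decimal format, and it turns the float columns into text, so it is best done on a copy right before writing:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["a", "b"], "value": [1234.5, 0.25]})

# Format every float column as text with a comma as the decimal mark.
for col in df.columns:
    if pd.api.types.is_float_dtype(df[col]):
        df[col] = df[col].map(lambda x: f"{x:.2f}".replace(".", ","))

csv_text = df.to_csv(sep=";", index=False)
print(csv_text)
```

Note that missing values would come out as the text "nan" with this approach, so they may need separate handling.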

This example should work (it works for me):
import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(10))
with open('Data/out.csv', 'w') as f:
    s.to_csv(f, index=True, header=True, decimal=',', sep=';', float_format='%.3f')
out.csv:
;0
0;0,091
1;-0,009
2;-1,427
3;0,022
4;-1,270
5;-1,134
6;-0,965
7;-1,298
8;-0,854
9;0,150
I don't see exactly why your code doesn't work, but anyway, try to use the above example to your needs.

Related

Rounding columns in a dataframe with file.dtypes.iteritems()

I'm making a function on rounding decimals, I wrote the following already:
for colname, coltype in file.dtypes.iteritems():
    if coltype in lst:
        file[colname] = file[colname].round(desired_decimals)
However, it does not round the columns it should round.
Two files are created: the first holds the values in their own format, the second holds them formatted as strings.
import pandas as pd
df = pd.DataFrame({'A':[5.291752, 8.920429, 10.0]})
pd.options.display.float_format = '{:,.6f}'.format
df.to_csv('string_no.csv', index=False, header=['A'])
with open('string_yes.csv', 'w') as file:
    file.writelines(df.to_string(index=False, header=['A']))
Output ordinary
Output string format
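For the rounding loop in the question itself, here is a hedged sketch of one way to write it on current pandas. The DataFrame and `desired_decimals` are made up for illustration, and the dtype test replaces the question's `coltype in lst` check, whose contents are not shown; if `lst` never matches the actual dtypes, no column would be rounded:

```python
import pandas as pd

# Hypothetical data: one float column, one text column, one integer column.
df = pd.DataFrame({"a": [1.23456, 2.34567], "b": ["x", "y"], "c": [1, 2]})
desired_decimals = 2

# Series.items() is the current name; iteritems() was removed in pandas 2.0.
for colname, coltype in df.dtypes.items():
    if pd.api.types.is_float_dtype(coltype):
        df[colname] = df[colname].round(desired_decimals)

print(df)
```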

Pandas to_csv with extra zeroes

I am having some issues reading a csv to a dataframe, then when I convert to csv it will have extra decimals in it.
Currently using pandas 1.0.5 and python 3.7
For example consider the simple example below:
from io import StringIO
import pandas as pd
d = """ticker,open,close
aapl,108.922,108.583
aapl,109.471,110.25
aapl,113.943,114.752
aapl,117.747,118.825
"""
df = pd.read_csv(StringIO(d), sep=",", header=0, index_col=0)
print(df)
print("\n", df.to_csv())
The output is:
open close
ticker
aapl 108.922 108.583
aapl 109.471 110.250
aapl 113.943 114.752
aapl 117.747 118.825
ticker,open,close
aapl,108.92200000000001,108.583
aapl,109.471,110.25
aapl,113.943,114.75200000000001
aapl,117.74700000000001,118.825
As you can see, extra digits are added in the to_csv() output (for example 108.92200000000001). If I change the read_csv call to use dtype=str, like df = pd.read_csv(StringIO(d), sep=",", dtype=str, header=0, index_col=0), then I get my desired output, but I want pandas to decide the dtype (int64 or float, depending on the column values) instead of forcing everything to object/str.
Is there a way to eliminate these extra digits without forcing the dtype to str?
You can use the float_format argument of to_csv:
d = """ticker,open,close
aapl,108.922,108.583
aapl,109.471,110.25
aapl,113.943,114.752
aapl,117.747,118.825
"""
df = pd.read_csv(StringIO(d), sep=",", header=0, index_col=0)
df.to_csv('output.csv',float_format='%.3f')
#This is how the output.csv file looks:
ticker,open,close
aapl,108.922,108.583
aapl,109.471,110.250
aapl,113.943,114.752
aapl,117.747,118.825
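The extra digits appear when the parsed float is not the closest double to the decimal text in the file, so its shortest exact representation is longer than the original. As an alternative to float_format, and assuming the default C parser is in use, read_csv accepts float_precision="round_trip", which parses each value to the nearest double. A sketch on the same data:

```python
from io import StringIO
import pandas as pd

d = """ticker,open,close
aapl,108.922,108.583
aapl,109.471,110.25
aapl,113.943,114.752
aapl,117.747,118.825
"""

# round_trip parsing is slower but converts each number to the nearest float64.
df = pd.read_csv(StringIO(d), index_col=0, float_precision="round_trip")
out = df.to_csv()
print(out)
```

Unlike float_format='%.3f', this keeps values such as 110.25 as written instead of padding them to 110.250.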

pandas read csv is confused when commas within quotes

col1, col2, geometry
11.54000000,0.00000000,"{"type":"Polygon","coordinates":[[[-61.3115751786311,-33.83968838375797],[-61.29737019968823,-33.83207774370677],[-61.29443049860791,-33.83592770721248],[-61.29241347742871,-33.83489393774538],[-61.28994584513501,-33.83806650089736],[-61.292499308117186,-33.83938539699006],[-61.28958106470898,-33.8431993873636],[-61.29307859612687,-33.84495487100211],[-61.295256567865046,-33.846135537383866],[-61.296388484054326,-33.84676149889543],[-61.296747927196776,-33.84651421268175],[-61.297498943449426,-33.84670133707654],[-61.297992472179686,-33.847120134589964],[-61.299741220055196,-33.84901812154847],[-61.3012164422457,-33.85018089588664],[-61.3015892874819,-33.850566250375365],[-61.30284190607861,-33.85079121660985],[-61.30496105223345,-33.848193766906206],[-61.306084952130036,-33.84682375029292],[-61.30707604410075,-33.845532812572294],[-61.30672627175046,-33.84527169005647],[-61.306290670206494,-33.845188781884744],[-61.304604048903514,-33.847304098561025],[-61.30309763921784,-33.84654473836309],[-61.30013213880613,-33.84478736144466],[-61.30110629620797,-33.8431690707163],[-61.303046037678854,-33.844170576767105],[-61.30433047221653,-33.84266156764314],[-61.30484242472771,-33.842899106713375],[-61.30696068650711,-33.844104878773436],[-61.306418212892446,-33.84505221083753],[-61.307163201216696,-33.845464893960255],[-61.30760172622554,-33.84490909256552],[-61.307932962646014,-33.844513681420494],[-61.309176116985405,-33.84280834206188],[-61.30596211112515,-33.841126948963954],[-61.3056475423994,-33.841449215098756],[-61.30526859890979,-33.841557611902374],[-61.30483601097522,-33.84149669494795],[-61.30448925534122,-33.84120408616046],[-61.30410688411086,-33.840609953572034],[-61.30400151682434,-33.839925243738094],[-61.30240379835875,-33.83889223688216],[-61.30188418287129,-33.838444480832685],[-61.301130848179525,-33.83943255499186],[-61.30078636095504,-33.83996223583909],[-61.30059265818967,-33.84016469670277],[-61.30048478527255,-33.8404384478
48506],[-61.300252198180424,-33.84026774340676],[-61.29876711207748,-33.839489883020924],[-61.29799408649143,-33.840597902688785],[-61.297669258508,-33.84103160870988],[-61.297566592962134,-33.84112444052047],[-61.29748538503245,-33.841083604060834],[-61.297140578061956,-33.84134946797752],[-61.29709617977233,-33.84160419097128],[-61.297170540239335,-33.84168254110631],[-61.297341460506956,-33.84179653572337],[-61.297243418161194,-33.84197105818567],[-61.29699517169225,-33.84200300239938],[-61.29680176950715,-33.84179064473802],[-61.29691703393983,-33.8416707218475],[-61.297053755769845,-33.841604265738546],[-61.29707920124143,-33.84154875978832],[-61.29709391784669,-33.84147543150246],[-61.29711262215961,-33.84133768608576],[-61.296951411710374,-33.84119216012805],[-61.297262269660294,-33.84089514360839],[-61.297626491077864,-33.84051497848962],[-61.29865532547658,-33.83935363544152],[-61.30027710358755,-33.84011486145675],[-61.30046658230606,-33.83996490243917],[-61.30063460268783,-33.83979712050095],[-61.300992098665965,-33.8393813535522],[-61.301799802937595,-33.83832425565103],[-61.30135527704997,-33.837671541923235],[-61.30082030025984,-33.83731962483044],[-61.299512855628244,-33.83689640801839],[-61.29879550338594,-33.8363083288346],[-61.29831419490918,-33.835559835856905],[-61.298360098160686,-33.83408067231082],[-61.29976541168753,-33.83467181800819],[-61.30104200723692,-33.83586895614681],[-61.30133434017162,-33.83606352507277],[-61.30153415160492,-33.836339043812224],[-61.30164813329583,-33.83657891551336],[-61.30124575062752,-33.83743146168004],[-61.30195917352424,-33.83831965157767],[-61.30196183786503,-33.83843401993221],[-61.30250094586367,-33.83890484694379],[-61.304002690127376,-33.83984352469762],[-61.30473149692381,-33.8397514189025],[-61.3054487998093,-33.839941491549894],[-61.30582354557356,-33.84016574092716],[-61.30604808932503,-33.84046128014441],[-61.306143888278996,-33.840801374736316],[-61.30598219492593,-33.841088001849094],[-61.307572399
40571,-33.841967156609876],[-61.30920555104759,-33.84277500140921],[-61.3115751786311,-33.83968838375797],[-61.3115751786311,-33.83968838375797]]]}"
How do I read a csv with syntax like above?
I am doing:
import pandas as pd
df = pd.read_csv('file.csv')
However, read_csv gets confused with the , within "{"type":"Polygon","coordinates": I want it to ignore the , within the quotes.
Your csv file contains a MultiIndex, which is causing your read and split issues.
I have tried multiple methods to read your file correctly. The best method that I have found so far is using the Python engine with an advanced separator in the read_csv function.
import pandas as pd
# these are for viewing the output
pd.set_option('display.max_columns', 30)
pd.set_option('display.max_rows', 100)
pd.set_option('display.width', 120)
# The separator matches the format of the string that you provided.
# I'm sure that it can be modified to be more efficient.
df = pd.read_csv('test.csv', skiprows=1, sep=r'(\d{1,2}\.\d{1,8}),(\d{1,2}\.\d{1,8}),("{"type":.*)', engine="python")
# some cleanup
df = df.drop(df.columns[0], axis=1)
# I had to save the processed file
df.to_csv('test_01.csv')
# read in the new file
df = pd.read_csv('test_01.csv', header=None, index_col=0)
print(df.to_string(index=False))
11.54 0.0 "{"type":"Polygon","coordinates":[[[-61.3115751786311,-33.83968838375797],[-61.29737019968823,-33.83207774370677],[-61.29443049860791,-33.83592770721248],[-61.29241347742871,-33.83489393774538],[-61.28994584513501,-33.83806650089736],[-61.292499308117186,-33.83938539699006],[-61.28958106470898,-33.8431993873636],[-61.29307859612687,-33.84495487100211],[-61.295256567865046,-33.846135537383866],[-61.296388484054326,-33.84676149889543],[-61.296747927196776,-33.84651421268175],[-61.297498943449426,-33.84670133707654],[-61.297992472179686,-33.847120134589964],[-61.299741220055196,-33.84901812154847],[-61.3012164422457,-33.85018089588664],[-61.3015892874819,-33.850566250375365],[-61.30284190607861,-33.85079121660985],[-61.30496105223345,-33.848193766906206],[-61.306084952130036,-33.84682375029292],[-61.30707604410075,-33.845532812572294],[-61.30672627175046,-33.84527169005647],[-61.306290670206494,-33.845188781884744],[-61.304604048903514,-33.847304098561025],[-61.30309763921784,-33.84654473836309],[-61.30013213880613,-33.84478736144466],[-61.30110629620797,-33.8431690707163],[-61.303046037678854,-33.844170576767105],[-61.30433047221653,-33.84266156764314],[-61.30484242472771,-33.842899106713375],[-61.30696068650711,-33.844104878773436],[-61.306418212892446,-33.84505221083753],[-61.307163201216696,-33.845464893960255],[-61.30760172622554,-33.84490909256552],[-61.307932962646014,-33.844513681420494],[-61.309176116985405,-33.84280834206188],[-61.30596211112515,-33.841126948963954],[-61.3056475423994,-33.841449215098756],[-61.30526859890979,-33.841557611902374],[-61.30483601097522,-33.84149669494795],[-61.30448925534122,-33.84120408616046],[-61.30410688411086,-33.840609953572034],[-61.30400151682434,-33.839925243738094],[-61.30240379835875,-33.83889223688216],[-61.30188418287129,-33.838444480832685],[-61.301130848179525,-33.83943255499186],[-61.30078636095504,-33.83996223583909],[-61.30059265818967,-33.84016469670277],[-61.30048478527255,-33.840438447848506],[-61.3
00252198180424,-33.84026774340676],[-61.29876711207748,-33.839489883020924],[-61.29799408649143,-33.840597902688785],[-61.297669258508,-33.84103160870988],[-61.297566592962134,-33.84112444052047],[-61.29748538503245,-33.841083604060834],[-61.297140578061956,-33.84134946797752],[-61.29709617977233,-33.84160419097128],[-61.297170540239335,-33.84168254110631],[-61.297341460506956,-33.84179653572337],[-61.297243418161194,-33.84197105818567],[-61.29699517169225,-33.84200300239938],[-61.29680176950715,-33.84179064473802],[-61.29691703393983,-33.8416707218475],[-61.297053755769845,-33.841604265738546],[-61.29707920124143,-33.84154875978832],[-61.29709391784669,-33.84147543150246],[-61.29711262215961,-33.84133768608576],[-61.296951411710374,-33.84119216012805],[-61.297262269660294,-33.84089514360839],[-61.297626491077864,-33.84051497848962],[-61.29865532547658,-33.83935363544152],[-61.30027710358755,-33.84011486145675],[-61.30046658230606,-33.83996490243917],[-61.30063460268783,-33.83979712050095],[-61.300992098665965,-33.8393813535522],[-61.301799802937595,-33.83832425565103],[-61.30135527704997,-33.837671541923235],[-61.30082030025984,-33.83731962483044],[-61.299512855628244,-33.83689640801839],[-61.29879550338594,-33.8363083288346],[-61.29831419490918,-33.835559835856905],[-61.298360098160686,-33.83408067231082],[-61.29976541168753,-33.83467181800819],[-61.30104200723692,-33.83586895614681],[-61.30133434017162,-33.83606352507277],[-61.30153415160492,-33.836339043812224],[-61.30164813329583,-33.83657891551336],[-61.30124575062752,-33.83743146168004],[-61.30195917352424,-33.83831965157767],[-61.30196183786503,-33.83843401993221],[-61.30250094586367,-33.83890484694379],[-61.304002690127376,-33.83984352469762],[-61.30473149692381,-33.8397514189025],[-61.3054487998093,-33.839941491549894],[-61.30582354557356,-33.84016574092716],[-61.30604808932503,-33.84046128014441],[-61.306143888278996,-33.840801374736316],[-61.30598219492593,-33.841088001849094],[-61.30757239940571,-33.841
967156609876],[-61.30920555104759,-33.84277500140921],[-61.3115751786311,-33.83968838375797],[-61.3115751786311,-33.83968838375797]]]}"
Try this:
pd.read_csv('file.csv',quotechar='"',skipinitialspace=True)
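Because the quotes inside the geometry field are not doubled, the file is not valid quoted CSV, and a standard parser may still split the field. A hedged alternative is to split each line by hand. The sketch below assumes the geometry is always the last column, that the first two columns never contain commas, and it uses a shortened, made-up polygon in place of the real one:

```python
import json
import pandas as pd

# Shortened sample in the same layout as the question's file (coordinates are illustrative).
sample = (
    'col1, col2, geometry\n'
    '11.54000000,0.00000000,"{"type":"Polygon","coordinates":'
    '[[[-61.31,-33.83],[-61.29,-33.83],[-61.31,-33.83]]]}"\n'
)

lines = sample.splitlines()
header = [name.strip() for name in lines[0].split(",")]

rows = []
for line in lines[1:]:
    # Split on the first two commas only; the rest of the line is the geometry.
    col1, col2, geometry = line.split(",", 2)
    geometry = geometry.strip()
    if geometry.startswith('"') and geometry.endswith('"'):
        geometry = geometry[1:-1]  # drop the outer quotes
    rows.append((float(col1), float(col2), json.loads(geometry)))

df = pd.DataFrame(rows, columns=header)
print(df)
```

For a real file, the lines would come from open('file.csv') instead of the sample string.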

Can I output large numeric to csv as a string

I have a txt file that has several columns, some with large numbers. When I read it in through Python and output it to a csv, the numbers change and I lose important info. Example of txt file:
Identifier
12450006300638672
12450006300638689
12450006300638693
Example csv output:
Identifier Changed_format_in_csv
1.245E+16 12450006300638600
1.245E+16 12450006300638600
1.245E+16 12450006300638600
Is there a way I can get the file to output to a csv without it changing the large numbers? I have a lot of other columns that are a mix of string and numeric data types, but I was thinking that if I could output everything as a string it would be fine.
This is what I've tried:
import pandas as pd
file1 = 'file.txt'
df = pd.read_csv(file1, sep="|", names=['Identifier'], index_col=False, dtype=str)
df.to_csv('file_new.csv', index=False)
I want the csv file to output like the txt file looks. Was hoping setting dtype=str would help, but it doesn't. Any help would be appreciated.
Short story:
I think this problem is related to the data type pandas uses when interpreting the content of 'file.txt'.
You could try:
df = df.assign(Identifier=lambda x: x['Identifier'].astype(int))
Long story:
I created file.txt with this content:
12450006300638672
12450006300638689
12450006300638693
Using pandas v0.23.3, I couldn't reproduce your problem with your displayed code, as shown here:
>>> import pandas as pd
>>> df = pd.read_csv('file.txt', sep="|", names=['Identifier'], index_col=False, dtype=str)
>>> df.to_csv('file_new.csv', index=False)
>>> print(df)
Identifier
0 12450006300638672
1 12450006300638689
2 12450006300638693
>>> exit()
$ cat file_new.csv
Identifier
12450006300638672
12450006300638689
12450006300638693
But I could reproduce your problem using pd.read_csv(..., dtype=float) instead:
>>> import pandas as pd
>>> df = pd.read_csv('file.txt', sep="|", names=['Identifier'], index_col=False, dtype=float)
>>> df.to_csv('file_new.csv', index=False)
>>> print(df)
Identifier
0 1.245001e+16
1 1.245001e+16
2 1.245001e+16
>>> exit()
$ cat file_new.csv
Identifier
1.2450006300638672e+16
1.2450006300638688e+16
1.2450006300638692e+16
That seems to be what is happening in your case: integers are being interpreted as floats.
If for some reason you can't interpret them as integers, you could do as follows:
>>> import pandas as pd
>>> df = pd.read_csv('file.txt', sep="|", names=['Identifier'], index_col=False, dtype=float)
>>> print(df)
Identifier
0 1.245001e+16
1 1.245001e+16
2 1.245001e+16
>>> df = df.assign(Identifier=lambda x: x['Identifier'].astype(int))
>>> print(df)
Identifier
0 12450006300638672
1 12450006300638688
2 12450006300638692
>>> df.to_csv('file_new.csv', index=False)
>>> exit()
$ cat file_new.csv
Identifier
12450006300638672
12450006300638688
12450006300638692
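The reason the last digits change is that float64 has a 53-bit significand, so above 2**53 not every integer can be represented. A small sketch on the same three identifiers, comparing dtype=str with dtype=float:

```python
from io import StringIO
import pandas as pd

text = "12450006300638672\n12450006300638689\n12450006300638693\n"

as_str = pd.read_csv(StringIO(text), names=["Identifier"], dtype=str)
as_float = pd.read_csv(StringIO(text), names=["Identifier"], dtype=float)

# These identifiers are larger than 2**53, so odd values cannot survive as float64.
print(2**53)
roundtrip = as_float["Identifier"].astype("int64").tolist()
print(roundtrip)

# Read as text, the identifiers are written back unchanged.
csv_text = as_str.to_csv(index=False)
print(csv_text)
```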
It's not pandas that's changing the large numbers, it's the app you're using to view the CSV. To hint to CSV apps that those numbers should be treated as strings, make sure that they're quoted in the output:
import csv
df.to_csv('file_new.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)
It should look like this:
"Identifier"
"12450006300638672"
"12450006300638689"
"12450006300638693"

How to read complex numbers from file with NumPy?

I need to read columns of complex numbers in the format:
# index; (real part, imaginary part); (real part, imaginary part)
1 (1.2, 0.16) (2.8, 1.1)
2 (2.85, 6.9) (5.8, 2.2)
NumPy seems great for reading in columns of data with only a single delimiter, but the parenthesis seem to ruin any attempt at using numpy.loadtxt().
Is there a clever way to read in the file with Python, or is it best to just read the file, remove all of the parenthesis, then feed it to NumPy?
This will need to be done for thousands of files so I would like an automated way, but maybe NumPy is not capable of this.
Here's a more direct way than @Jeff's answer, telling loadtxt to load it straight into a complex array, using a helper function parse_pair that maps (1.2,0.16) to 1.20+0.16j:
>>> import re
>>> import numpy as np
>>> pair = re.compile(r'\(([^,\)]+),([^,\)]+)\)')
>>> def parse_pair(s):
...     return complex(*map(float, pair.match(s).groups()))
>>> s = '''1 (1.2,0.16) (2.8,1.1)
2 (2.85,6.9) (5.8,2.2)'''
>>> from io import StringIO  # cStringIO only exists in Python 2
>>> f = StringIO(s)
>>> np.loadtxt(f, delimiter=' ', dtype=complex,  # np.complex was removed in NumPy 1.24
...            converters={1: parse_pair, 2: parse_pair})
array([[ 1.00+0.j , 1.20+0.16j, 2.80+1.1j ],
[ 2.00+0.j , 2.85+6.9j , 5.80+2.2j ]])
Or in pandas:
>>> import pandas as pd
>>> f.seek(0)
>>> pd.read_csv(f, delimiter=' ', index_col=0, names=['a', 'b'],
... converters={1: parse_pair, 2: parse_pair})
a b
1 (1.2+0.16j) (2.8+1.1j)
2 (2.85+6.9j) (5.8+2.2j)
Since this issue is still not resolved in pandas, let me add another solution. You could modify your DataFrame with a one-liner after reading it in:
import pandas as pd
df = pd.read_csv('data.csv')
df = df.apply(lambda col: col.apply(lambda val: complex(val.strip('()'))))
If your file only has 5 columns like you've shown, you could feed it to pandas with a regex for conversion, replacing the parentheses with commas on every line. After that, you could combine them as suggested in this SO answer to get complex numbers.
Pandas makes it easier, because you can pass a regex as the delimiter to its read_csv method, which the numpy version does not allow. That lets you write clearer code and use a converter like this:
import pandas as pd
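from io import StringIO
f_str = "1 (2, 3) (5, 6)\n2 (3, 4) (4, 8)\n3 (0.2, 0.5) (0.6, 0.1)"
buf = StringIO(f_str)
def complex_converter(txt):
    # "2, 3" -> "2+3j", "2, -3" -> "2-3j"; also drops the trailing ")" of the last column
    txt = txt.strip("()").replace(", ", "+").replace("+-", "-") + "j"
    return complex(txt)
# a regex delimiter needs the Python engine; header=None keeps the first row as data
df = pd.read_csv(buf, delimiter=r" \(|\) \(", engine="python", header=None, converters={1: complex_converter, 2: complex_converter}, index_col=0)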
EDIT: Looks like @Dougal came up with this just before I posted. It really just depends on how you want to handle the complex numbers. I like being able to avoid the explicit use of the re module.
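For completeness, here is a sketch that avoids both loadtxt converters and pandas: it pulls every "(real, imag)" pair out of each line with re.findall and builds a NumPy array from the result. It assumes comment lines start with "#" and drops the leading index column; the names PAIR and parse_line are made up for this example:

```python
import re
import numpy as np

# Matches "(real, imag)" and captures the two numbers.
PAIR = re.compile(r"\(([^,()]+),([^,()]+)\)")

def parse_line(line):
    """Return the complex numbers found in one line of the file."""
    return [complex(float(real), float(imag)) for real, imag in PAIR.findall(line)]

text = """# index; (real part, imaginary part); (real part, imaginary part)
1 (1.2, 0.16) (2.8, 1.1)
2 (2.85, 6.9) (5.8, 2.2)
"""

rows = [
    parse_line(line)
    for line in text.splitlines()
    if line.strip() and not line.startswith("#")
]
arr = np.array(rows, dtype=complex)
print(arr)
```

For thousands of files, the same loop can be run over the lines of each opened file instead of the sample string.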
