How to stop to_csv from using comma as separator - python

I have a dataframe in which one of the columns contains list object like [1,2].
I am trying to export to csv with the following line
df.to_csv('df.csv', sep = ';')
However, in the resulting csv each row does not stay in a single cell. Instead, the row is split at the comma inside the list object, so I get something like
Column A
Column B
0;xxx;xxx;[1
2];xxx;xxx;xx
Can someone help? Thanks!
What I want is
Column A
0;xxx;xxx;[1,2];xxx;xxx;xx
Updates:
I have tried filling the column with strings like
"[1,2,3]" or "100,000,000", but it still splits at the comma.

You’ll probably need to surround the list objects with double quotes to make them strings. Then you can use something like this to transform each string back into a list:
import ast
ast.literal_eval(list_as_string)

This problem can be solved by using quotes:
import csv
df.to_csv('df.csv', sep = ',', quoting=csv.QUOTE_ALL)
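A minimal sketch of the full round trip, assuming a small made-up DataFrame (the column names name and nums are illustrative, not from the question). It writes with quoting=csv.QUOTE_ALL so the string form of each list stays inside one quoted field, then uses ast.literal_eval as a converter to turn the strings back into lists on read:

```python
import ast
import csv
import io

import pandas as pd

# Hypothetical data: one column holds Python lists.
df = pd.DataFrame({"name": ["a", "b"], "nums": [[1, 2], [3, 4, 5]]})

# Write to an in-memory buffer; every field is wrapped in double quotes.
buf = io.StringIO()
df.to_csv(buf, sep=";", quoting=csv.QUOTE_ALL, index=False)
text = buf.getvalue()

# Read back, converting the quoted "[1, 2]" strings into real lists.
restored = pd.read_csv(
    io.StringIO(text),
    sep=";",
    converters={"nums": ast.literal_eval},
)
```

Whatever opens the file afterwards (a spreadsheet, for example) also has to be told that the separator is ; and that double quotes delimit fields, otherwise it may still split on the comma.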

Related

pandas read csv with extra commas and quotations in column

I'm reading a basic csv file where the columns are separated by commas. However, the body column is a string which may contain commas and quotations.
For example, there are some cells like "Bahamas\", The" and "Germany, West"
I have tried
text = pd.read_table("input.txt", encoding = 'utf-16', quotechar='"', sep = ',')
and
text = pd.read_table("input.txt", encoding = 'utf-16', quotechar='"', delimiter = ',')
but neither of them works.
Is there a way to go around this problem?
Are you able to regenerate the csv? If yes, change the delimiter to a pipe, i.e. |. If not, you may be forced to take the long route, because there is no general way for code to figure out which characters are delimiting/quoting and which are part of the value when both commas and quotes are lurking inside the value.
A workaround could rely on the column position where this problem occurs: first isolate the columns to the left of the troubled column, then isolate all columns to the right; the characters that remain are your troubled column. Can you post a few example rows? It would be good to see a few rows that have this issue, and a few that work fine.
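The positional workaround can be sketched roughly as follows. The helper name split_row and the sample line are hypothetical; the sketch assumes only one column per row can contain stray separators and that you know how many clean columns sit on either side of it:

```python
def split_row(line, n_left, n_right, sep=","):
    """Split a line where only one middle column may contain `sep`.

    n_left  -- number of clean columns before the troubled column
    n_right -- number of clean columns after the troubled column
    """
    parts = line.split(sep)
    cut = len(parts) - n_right
    left = parts[:n_left]
    right = parts[cut:]
    # Everything between the clean columns belongs to the troubled column.
    middle = sep.join(parts[n_left:cut])
    return left + [middle] + right


# Made-up example row: id, country (may contain commas), year.
row = split_row("1,Germany, West,1990", n_left=1, n_right=1)
```

This does not strip or interpret quote characters; it only recovers the column boundaries.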

How to give double quotes to column with strings that have commas in csv

I have a csv file with a column of strings that contain commas. When I read the csv using pandas, it treats the extra commas as extra columns, which gives me an error about having more columns than expected. I thought of using double quotes around the strings as a solution to the problem.
This is how the csv currently looks
lead,Chat.Event,Role,Data,chatid
lead,x,Lead,Hello, how are you,1
How it should look
lead,Chat.Event,Role,Data,chatid
lead,x,Lead,"Hello, how are you",1
Is using double quotes around the strings the best solution? If yes, how do I do that? And if not, what other solution can you recommend?
If you have the original file / database from which you generated the csv, you should generate it again using a different separator (the default is a comma), one that does not occur within your strings, such as "|" (vertical bar).
Then, when reading the csv with pandas, you can just pass the argument:
pd.read_csv(file_path, sep="your separator symbol here")
Hope that helps.
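As a rough sketch of both options, assuming the data can be regenerated from a DataFrame (the frame below just rebuilds the single example row from the question): writing with sep="|" leaves the comma inside the text harmless, while writing with the default comma separator makes pandas add the double quotes on its own.

```python
import io

import pandas as pd

# The example row from the question, rebuilt as a DataFrame.
df = pd.DataFrame({
    "lead": ["lead"],
    "Chat.Event": ["x"],
    "Role": ["Lead"],
    "Data": ["Hello, how are you"],
    "chatid": [1],
})

# Option 1: use a separator that never occurs in the strings.
pipe_buf = io.StringIO()
df.to_csv(pipe_buf, sep="|", index=False)
pipe_text = pipe_buf.getvalue()
restored = pd.read_csv(io.StringIO(pipe_text), sep="|")

# Option 2: keep the comma; fields containing it are quoted automatically.
comma_buf = io.StringIO()
df.to_csv(comma_buf, index=False)
comma_text = comma_buf.getvalue()
```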

On python, how do I get rid of quotations after joining a list of floats?

Apologies if this has been asked before, I couldn't find the same question.
I'm trying to write 3 things to a CSV file in one line, productcode, amountentered and changecoins.
The changecoins is a list of floats which I have joined together using changecoins=",".join(map(str,changecoins)). This works fine except I still have quotations around the values which are then written into my csv file.
I've tried using strip and replace but they don't seem to work.
I've attached my code and output in the csv file below, does anyone know how to fix this?
changecoins=",".join(map(str,changecoins)).replace('"', '')
changeline=(productcode, amountentered, changecoins)
changewriter.writerow(changeline)
Output
01,1.0,"0.1,0.1"
01,2.0,"0.5,0.5,0.1,0.1"
04,1.0,"0.1,0.1,0.1,0.1"
Why not use extend?
result = [productcode, amountentered]
result.extend(changecoins)
changewriter.writerow(result)
if you want to get even more slick, you can just do:
result = [productcode, amountentered] + changecoins
changewriter.writerow(result)
or even just:
changewriter.writerow([productcode, amountentered] + changecoins)
It seems you're unnecessarily joining the floats...You already have a list of floats, just tack it on to the other two guys and then pass that to your csv writer.
This is most likely because you're using "," to join your values when the CSV delimiter is also a ,. Python is wrapping the column in quotes so the "," inside the cell value isn't confused for a delimiter.
If you change to joining with a different character than "," or change the delimiter for the file, the quotes will go away.
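A small sketch comparing the three variants discussed above, using the standard csv module and in-memory buffers instead of a real file. The values are made up to match the first output row in the question:

```python
import csv
import io

productcode = "01"
amountentered = 1.0
changecoins = [0.1, 0.1]

# Variant 1: append the floats as separate fields (no join, no quotes).
flat = io.StringIO()
csv.writer(flat).writerow([productcode, amountentered] + changecoins)

# Variant 2: join with "," so the field contains the delimiter and gets quoted.
joined = io.StringIO()
csv.writer(joined).writerow(
    [productcode, amountentered, ",".join(map(str, changecoins))]
)

# Variant 3: join with a character that is not the delimiter (no quotes).
semi = io.StringIO()
csv.writer(semi).writerow(
    [productcode, amountentered, ";".join(map(str, changecoins))]
)
```

Which variant fits depends on whether the coins should end up as separate columns (variant 1) or stay together in one cell (variant 3).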

Tuple and CSV Reader in Python

Attempting something relatively simple.
First, I have dictionary with tuples as keys as follows:
(0,1,1,0): "Index 1"
I'm reading in a CSV file which has a corresponding set of fields with various combinations of those zeroes and ones. So for example, the row in the CSV may read 0,1,1,0 without any quoting. I'm trying to match the combination of zeroes and ones in the file to the keys of the dictionary. Using the standard CSV module for this
However the issue is that the zeroes and ones are being read in as strings with single quotes rather than integers. In other words, the tuple created from each row is structured as ('0','1','1','0') which does not match (0,1,1,0)
Can anyone shed some light on how to bring the CSV in and remove the single quotes? Tuple matching and CSV reading seem to work -- just need to straighten out the data format.
Thanks!
tuple(int(x) for x in ('0','1','1','0')) # returns (0,1,1,0)
So, if your CSV reader object is called csv_reader, you just need a loop like this:
for row in csv_reader:
    tup = tuple(int(x) for x in row)
    # ...
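Put together with the dictionary lookup, a self-contained sketch could look like this. The CSV content is made up and read from an in-memory string rather than a file, and the name lookup for the dictionary is an assumption:

```python
import csv
import io

# Dictionary keyed by tuples of integers, as in the question.
lookup = {(0, 1, 1, 0): "Index 1"}

# Made-up CSV content; the second row has no matching key.
data = "0,1,1,0\n1,0,0,1\n"

matches = []
for row in csv.reader(io.StringIO(data)):
    # csv.reader yields lists of strings, so convert each field to int.
    key = tuple(int(x) for x in row)
    matches.append(lookup.get(key))
```

Using lookup.get returns None for rows without a matching key instead of raising KeyError.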
When you read in the CSV file, depending on what libraries you're using, you can specify the delimiter.
Typically, the comma is interpreted as the delimiter. Perhaps you can specify the delimiter to be something else, e.g. '-', so that the set of digits is read together as one string, which you can then convert to a tuple using a variety of methods, such as ast.literal_eval, mentioned in "converting string to tuple".
Hope that helps!

Replacing part of string in python pandas dataframe

I have a similar problem to the one posted here:
Pandas DataFrame: remove unwanted parts from strings in a column
I need to remove newline characters from within a string in a DataFrame. Basically, I've accessed an api using python's json module and that's all ok. Creating the DataFrame works amazingly, too. However, when I want to finally output the end result into a csv, I get a bit stuck, because there are newlines that are creating false 'new rows' in the csv file.
So basically I'm trying to turn this:
'...this is a paragraph.
And this is another paragraph...'
into this:
'...this is a paragraph. And this is another paragraph...'
I don't care about preserving any kind of '\n' or any special symbols for the paragraph break. So it can be stripped right out.
I've tried a few variations:
misc['product_desc'] = misc['product_desc'].strip('\n')
AttributeError: 'Series' object has no attribute 'strip'
here's another
misc['product_desc'] = misc['product_desc'].str.strip('\n')
TypeError: wrapper() takes exactly 1 argument (2 given)
misc['product_desc'] = misc['product_desc'].map(lambda x: x.strip('\n'))
misc['product_desc'] = misc['product_desc'].map(lambda x: x.strip('\n\t'))
There is no error message, but the newline characters don't go away, either. Same thing with this:
misc = misc.replace('\n', '')
The write to csv line is this:
misc_id.to_csv(r'C:\Users\jlalonde\Desktop\misc_w_id.csv', sep=' ', na_rep='', index=False, encoding='utf-8')
Version of Pandas is 0.9.1
Thanks! :)
strip only removes the specified characters at the beginning and end of the string. If you want to remove all \n, you need to use replace.
misc['product_desc'] = misc['product_desc'].str.replace('\n', '')
You could use regex parameter of replace method to achieve that:
misc['product_desc'] = misc['product_desc'].replace(to_replace='\n', value='', regex=True)
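A short sketch of the difference between strip and replace, assuming a reasonably recent pandas where Series.str.replace accepts the regex argument (the sample text is adapted from the question). It replaces each newline, together with any surrounding whitespace, by a single space so the two paragraphs do not run together:

```python
import pandas as pd

# strip only touches the ends of a string; the inner newline survives.
stripped = "\nfirst\nsecond\n".strip("\n")

# Sample data adapted from the question.
misc = pd.DataFrame({
    "product_desc": [
        "...this is a paragraph.\nAnd this is another paragraph...",
        "no newline here",
    ]
})

# Replace every newline (plus adjacent whitespace) with one space.
misc["product_desc"] = misc["product_desc"].str.replace(
    r"\s*\n\s*", " ", regex=True
)
```

Replacing with an empty string instead, as in the answers above, removes the newline but leaves no space between the joined paragraphs.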
