Tuple and CSV Reader in Python

Attempting something relatively simple.
First, I have a dictionary with tuples as keys, as follows:
(0,1,1,0): "Index 1"
I'm reading in a CSV file which has a corresponding set of fields with various combinations of those zeroes and ones. So, for example, a row in the CSV may read 0,1,1,0 without any quoting. I'm trying to match the combination of zeroes and ones in the file to the keys of the dictionary, using the standard csv module.
However, the issue is that the zeroes and ones are being read in as strings (shown with single quotes) rather than integers. In other words, the tuple created from each row is ('0','1','1','0'), which does not match (0,1,1,0).
Can anyone shed some light on how to bring the CSV in and remove the single quotes? Tuple matching and CSV reading seem to work -- just need to straighten out the data format.
Thanks!

tuple(int(x) for x in ('0','1','1','0')) # returns (0,1,1,0)
So, if your CSV reader object is called csv_reader, you just need a loop like this:
for row in csv_reader:
    tup = tuple(int(x) for x in row)
    # ...
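For completeness, here is a minimal sketch of the whole lookup, assuming a dictionary like the one in the question. The names index_by_pattern and sample_file and the second dictionary entry are made up for illustration, and io.StringIO stands in for the real file so the snippet runs on its own.

```python
import csv
import io

# Hypothetical lookup table keyed by tuples of ints, as in the question.
index_by_pattern = {
    (0, 1, 1, 0): "Index 1",
    (1, 0, 0, 1): "Index 2",
}

# io.StringIO stands in for an open CSV file so the sketch is self-contained.
sample_file = io.StringIO("0,1,1,0\n1,0,0,1\n1,1,1,1\n")

matches = []
for row in csv.reader(sample_file):
    # csv.reader always yields strings, so convert each field to int first.
    key = tuple(int(x) for x in row)
    # dict.get returns None when the pattern is not a key.
    matches.append(index_by_pattern.get(key))

print(matches)
```

Using dict.get avoids a KeyError for rows whose pattern is not in the dictionary; whether a missing pattern should be skipped or treated as an error depends on your data.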

When you read in the CSV file, you can usually specify the delimiter, depending on which library you're using.
Typically, the comma is interpreted as the delimiter. You could instead specify a delimiter that does not occur in the data, e.g. '-', so that each set of digits is read as a single string such as '0,1,1,0', and then convert that string to a tuple, for example with ast.literal_eval (mentioned in "converting string to tuple").
Hope that helps!
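A rough sketch of that idea with the standard csv module, assuming '-' never appears in the data (sample_file is a stand-in for the real file):

```python
import ast
import csv
import io

# Stand-in for the real file; each line is one pattern of zeroes and ones.
sample_file = io.StringIO("0,1,1,0\n1,0,0,1\n")

# With '-' as the (assumed unused) delimiter, each line arrives as one field,
# e.g. ['0,1,1,0'], which ast.literal_eval parses as a tuple of ints.
patterns = [
    ast.literal_eval(row[0])
    for row in csv.reader(sample_file, delimiter="-")
]

print(patterns)
```

This only works while every line is a valid Python literal; the explicit int() conversion is more robust if the file may contain anything else.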

Related

How to stop to_csv from using comma as separator

I have a dataframe in which one of the columns contains list objects like [1,2].
I am trying to export to csv with the following line
df.to_csv('df.csv', sep = ';')
However, the resulting CSV, instead of having each row in a single cell, is split at the comma inside the list object, so I get something like
Column A
Column B
0;xxx;xxx;[1
2];xxx;xxx;xx
Can someone help? Thanks!
What I want is
Column A
0;xxx;xxx;[1,2];xxx;xxx;xx
Updates:
I have tried filling the column with strings like
"[1,2,3]" or "100,000,000", but it still splits at the comma.
You’ll probably need to surround the list objects with double quotes to make them strings. Then you can use something like this to transform each string back into a list:
import ast
ast.literal_eval(list_as_string)
This problem can be solved by using quotes:
import csv
df.to_csv('df.csv', sep = ',', quoting=csv.QUOTE_ALL)
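To see what the quoting does without needing a dataframe, here is a small sketch using only the standard csv module. The rows are made-up sample data; csv.QUOTE_ALL is the same constant passed to to_csv above.

```python
import ast
import csv
import io

# Made-up rows; the last field of each row is a list object.
rows = [[0, "xxx", [1, 2]], [1, "yyy", [3, 4, 5]]]

# Write with ';' as separator and every field wrapped in double quotes.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", quoting=csv.QUOTE_ALL)
for row in rows:
    writer.writerow(row)  # non-string values are written via str()

text = buffer.getvalue()
print(text)

# Read it back: the quoted list stays in one field and can be parsed again.
restored = [
    [int(idx), label, ast.literal_eval(values)]
    for idx, label, values in csv.reader(io.StringIO(text), delimiter=";")
]
```

If the file still looks split in a spreadsheet program, check the separator and quote settings of its import dialog, since those are independent of how the file was written.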

Processing a CSV with colon separated pairs in fields

I have a CSV in the format of
Fruit:Apple,Seeds:Yes,Colour:Red or Green
Fruit:Orange,Seeds:No,Colour:Orange
Fruit:Pear,Seeds:Yes,Colour:Green,Shape:Odd
Fruit:Banana,Seeds:No,Colour:Yellow,Shape:Also Odd
and I want to be able to create a JSON object for these values that looks something like
{"requestdata":{
"testdata":"example",
"testcategory":"category",
"fruits":{
"Fruit":{
"value":"Apple"
"type":"string"},
"Seeds":{
"value":"Yes"
"type":"bool"}
}
etc
I know I can load the CSV with a delimiter of my choosing, but how would I specify the second delimiter? Or should I try and build a dictionary instead for each cell of data and treat it as a string to split?
You should just split on the comma and use a string split to process the remaining elements, building a dictionary, and then have the json module produce JSON from the dictionary. It is fairly easy to create malformed JSON when trying to be clever with text processing, such as:
Forgetting to quote keys.
Quoting values you didn't mean to.
Not escaping JSON special characters.
Building the dictionary and then having the module do its thing will make your code much more maintainable and less error-prone.
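A minimal sketch of that approach, using two of the sample lines from the question. The wrapper keys follow the desired output in the question; marking every value as "string" is an assumption, since the question does not say how types such as "bool" should be detected.

```python
import csv
import io
import json

# Two sample lines from the question; io.StringIO stands in for the file.
sample = io.StringIO(
    "Fruit:Apple,Seeds:Yes,Colour:Red or Green\n"
    "Fruit:Pear,Seeds:Yes,Colour:Green,Shape:Odd\n"
)

fruits = []
for row in csv.reader(sample):  # first delimiter: the comma
    record = {}
    for cell in row:
        # Second delimiter: split on the first colon only, so a value
        # that itself contains a colon stays intact.
        key, value = cell.split(":", 1)
        # Assumption: every value is labelled "string".
        record[key] = {"value": value, "type": "string"}
    fruits.append(record)

payload = {
    "requestdata": {
        "testdata": "example",
        "testcategory": "category",
        "fruits": fruits,
    }
}

# json.dumps takes care of quoting and escaping.
print(json.dumps(payload, indent=2))
```

Rows with a different number of pairs (like the one with Shape) need no special handling, because each row becomes its own dictionary.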

How to give double quotes to column with strings that have comma's in csv

I have a CSV file with a column of strings that contain commas. When I read the CSV using pandas, it treats the extra commas as extra columns, which gives me an error about there being more fields than expected. I thought of putting double quotes around the strings as a solution to the problem.
This is how the csv currently looks
lead,Chat.Event,Role,Data,chatid
lead,x,Lead,Hello, how are you,1
How it should look like
lead,Chat.Event,Role,Data,chatid
lead,x,Lead,"Hello, how are you",1
Is using double quotes around the strings the best solution? If so, how do I do that? And if not, what other solution can you recommend?
If you still have the original file or database from which the CSV was generated, you should export it again using a different separator (the default is the comma), one that does not occur within your strings, such as "|" (vertical bar).
Then, when reading the CSV with pandas, you can just pass the argument:
pd.read_csv(file_path, sep="your separator symbol here")
Hope that helps.
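If the export is done from Python, both options can be sketched with the standard csv module. This is only an illustration using the sample row from the question, not the exporter actually used; the variable names are made up.

```python
import csv
import io

header = ["lead", "Chat.Event", "Role", "Data", "chatid"]
record = ["lead", "x", "Lead", "Hello, how are you", 1]

# Option 1: keep the comma. By default csv.writer puts double quotes
# around only those fields that contain the delimiter.
quoted = io.StringIO()
csv.writer(quoted).writerows([header, record])

# Option 2: use a separator that never occurs in the text.
piped = io.StringIO()
csv.writer(piped, delimiter="|").writerows([header, record])

print(quoted.getvalue())
print(piped.getvalue())

# Reading the piped version back yields five fields per row again.
parsed = list(csv.reader(io.StringIO(piped.getvalue()), delimiter="|"))
```

Either output should load cleanly with pd.read_csv: the first with the default settings, the second with sep="|".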

Write to CSV file after re.sub Python

I have done a regex substitution on a CSV file that prints the following output:
H1,H2,H3
A1,GG,98
B3,KLK,Oe
But when I write it to a CSV file, it writes each complete line into one cell (the commas are not used as delimiters, even though specified). I used writer.writerow(row.split("\n")) to write, where row is the data obtained after re.sub (i.e. the output posted above).
From the docs:
A row must be a sequence of strings or numbers
You are passing a list of rows, not individual values. You have to split each row by commas:
for line in row.split('\n'):
    writer.writerow(line.split(','))
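Here is a self-contained sketch of the difference. The input text and the substitution are invented for illustration, since the question does not show the original regex, and io.StringIO replaces the output file.

```python
import csv
import io
import re

# Hypothetical input and substitution; the real regex is not shown in the question.
raw = "H1;H2;H3\nA1;GG;98\nB3;KLK;Oe"
text = re.sub(";", ",", raw)

# What the question did: one writerow call whose "fields" are whole lines.
wrong = io.StringIO()
csv.writer(wrong).writerow(text.split("\n"))

# The fix: one writerow call per line, with that line split into its values.
right = io.StringIO()
writer = csv.writer(right)
for line in text.split("\n"):
    writer.writerow(line.split(","))

print(wrong.getvalue())
print(right.getvalue())
```

In the first output each line is wrapped in quotes and treated as one cell, which is exactly the symptom described in the question.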

Remove unwanted commas from CSV using Python

I need some help. I have a CSV file that contains an address field, and whoever entered the data into the original database used commas to separate different parts of the address, for example:
Flat 5, Park Street
When I try to use the CSV file it treats this one entry as two separate fields when in fact it is a single field. I have used Python to strip commas out where they are between inverted commas as it is easy to distinguish them from a comma that should actually be there, however this problem has me stumped.
Any help would be gratefully received.
Thanks.
You can define the separating and quoting characters with Python's CSV reader. For example:
With this CSV:
1,`Flat 5, Park Street`
And this Python:
import csv
# Python 3: open in text mode with newline='' as the csv docs recommend
with open('14144315.csv', newline='') as csvfile:
    rowreader = csv.reader(csvfile, delimiter=',', quotechar='`')
    for row in rowreader:
        print(row)
You will see this output:
['1', 'Flat 5, Park Street']
This uses commas to separate values, and inverted commas (here the backtick) to quote fields that themselves contain commas.
The CSV file was not generated properly. CSV files should have some form of escaping of text, usually using double-quotes:
1,John Doe,"City, State, Country",12345
Some CSV exports do this to all fields (this is an option when exporting from Excel/LibreOffice), but ambiguous fields (such as those including commas) must be escaped.
Either fix this manually or properly regenerate the CSV. Naturally, this cannot be fixed programmatically in general, since an unescaped comma inside a field looks exactly like a delimiter.
Edit: I just noticed something about "inverted commas" being used for escaping - if that is the case see Jason Sperske's answer, which is spot on.
