Add special key as a delimiter for CSV file? - python

In my csv file the data is separated by a special character. When I view in Notepad++ it shows 'SOH'.
ATT_AI16601A.PV01-Apr-2014 05:02:192.94752310FalseFalseFalse
ATT_AI16601A.PV[]01-Apr-2014 05:02:19[]2.947523[]1[]0[]False[]False[]False[]
It is present in the data but not visible. I have put markers in the second string where those characters are.
My point is that I need to read that data in Python delimited by these markers. How can I use these special characters as delimiters while reading data?

You can use Python's csv module and pass the delimiter explicitly, like this:
import csv
reader = csv.reader(file, delimiter='what ever is your delimiter')
In your case
reader = csv.reader(file, delimiter='\x01')
This works because SOH (start of heading) is the ASCII control character with code point 1, which Python writes as '\x01'.
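As a minimal sketch of the approach above, the sample record from the question can be parsed from an in-memory string. The `io.StringIO` object stands in for a real file handle, and dropping the empty last field is an assumption based on the trailing marker shown in the question's second sample line.

```python
import csv
import io

# Sample record from the question, with SOH (\x01) between and after the fields.
# io.StringIO stands in for a real file opened with open(path, newline="").
sample = (
    "ATT_AI16601A.PV\x0101-Apr-2014 05:02:19\x012.947523"
    "\x011\x010\x01False\x01False\x01False\x01\n"
)

rows = list(csv.reader(io.StringIO(sample), delimiter="\x01"))

# The trailing SOH on each line produces an empty last field; drop it.
cleaned = [row[:-1] if row and row[-1] == "" else row for row in rows]
print(cleaned[0])
```

If the real file has no trailing SOH, the clean-up step is simply a no-op.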

Related

Inconsistent quotes on .csv file

I have a comma delimited file which also contains commas in the actual field values, something like this:
foo,bar,"foo, bar"
This file is very large, so I am wondering if there is a way in Python to either put double quotes around every field:
eg: "foo","bar","foo, bar"
or just change the delimiter overall?
eg: foo|bar|foo, bar
End goal:
The goal is ultimately to load this file into SQL Server. Given the size of the file, bulk insert is the only feasible approach for loading, but I cannot specify a text qualifier/field quote due to the version of SSMS I have.
This leads me to believe the only remaining approach is to do some preprocessing on the source file.
Changing the delimiter just requires parsing and re-encoding the data.
import csv

# newline="" lets the csv module handle line endings itself
with open("data.csv", newline="") as input, open("new_data.csv", "w", newline="") as output:
    r = csv.reader(input, delimiter=",", quotechar='"')
    w = csv.writer(output, delimiter="|")
    w.writerows(r)
Given that your input file is a fairly standard version of CSV, you don't even need to specify the delimiter and quote arguments to reader; the defaults will suffice.
r = csv.reader(input)
These are not inconsistent quotes. If a value in a CSV file contains a comma or a newline, quotes are added around it. It shouldn't be a problem, because all standard CSV readers can read it properly.
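The other option raised in the question, quoting every field, can be sketched with the csv module's `csv.QUOTE_ALL` setting. The in-memory `source` and `target` buffers are stand-ins for the real input and output files.

```python
import csv
import io

# Stand-ins for the real input and output files.
source = io.StringIO('foo,bar,"foo, bar"\n')
target = io.StringIO(newline="")

reader = csv.reader(source)
# QUOTE_ALL wraps every field in the quote character on output.
writer = csv.writer(target, quoting=csv.QUOTE_ALL)
writer.writerows(reader)  # rows are consumed one at a time

quoted = target.getvalue()
print(quoted)
```

Because the reader is consumed row by row, the same pattern should also suit a large file without loading it all into memory.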

Read csv file containing escape characters in Python [duplicate]

Hi and many thanks in advance!
I'm working on a Python script that handles utf-8 strings and replaces specific characters. To do that I use msgText.replace(thePair[0], thePair[1]) while looping through a list which defines unicode characters and their desired replacement, as shown below.
theList = [
('\U0001F601', '1f601.png'),
('\U0001F602', '1f602.png'), ...
]
Up to here everything works fine. But now consider a csv file which contains the characters to be replaced, as shown below.
\U0001F601;1f601.png
\U0001F602;1f602.png
...
I miserably failed in reading the csv data into the list due to the escape characters. I read the data using the csv module like this:
with open('Data.csv', newline='', encoding='utf-8-sig') as theCSV:
    theList = [tuple(line) for line in csv.reader(theCSV, delimiter=';')]
This results in pairs like ('\\U0001F601', '1f601.png'), in which the escape sequences are left as literal text rather than being interpreted (note the double backslash). I tried several ways of modifying the string and other ways of reading the csv data, but I was not able to solve my problem.
How could I accomplish my goal to read csv data into pairs which contain escape characters?
I'm adding the solution for reading csv data containing escape characters for the sake of completeness. Consider a file Data.csv defining the replacement pattern:
\U0001F601;1f601.png
\U0001F602;1f602.png
Short version (using list comprehensions):
import csv
# define replacement list (short version)
with open('Data.csv', newline='', encoding='utf-8-sig') as csvFile:
    replList = [(line[0].encode().decode('unicode-escape'), line[1])
                for line in csv.reader(csvFile, delimiter=';') if line]
# no explicit close() is needed: the with block closes the file
Longer version (probably easier to understand):
import csv
# define replacement list (step by step)
replList = []
with open('Data.csv', newline='', encoding='utf-8-sig') as csvFile:
    for line in csv.reader(csvFile, delimiter=';'):
        if line:  # skip blank lines
            replList.append((line[0].encode().decode('unicode-escape'), line[1]))
# no explicit close() is needed: the with block closes the file
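The decoding step can be checked on in-memory data, as in the sketch below. The `unescape` helper is a hypothetical name, and it uses a slightly different encoding step than the answer above: encoding to Latin-1 with `backslashreplace` is a variant chosen here so that any non-ASCII characters already present in the text survive the `unicode_escape` round trip.

```python
import csv
import io

# Stand-in for Data.csv: literal backslash-U sequences, one blank line at the end.
data = io.StringIO("\\U0001F601;1f601.png\n\\U0001F602;1f602.png\n\n")


def unescape(text):
    """Interpret backslash escapes in text (hypothetical helper).

    Characters above U+00FF are first turned into escapes themselves, so the
    unicode_escape decoder, which reads bytes as Latin-1, does not mangle them.
    """
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


repl_list = [
    (line[0], line[1])
    for line in (
        (unescape(raw[0]), raw[1])
        for raw in csv.reader(data, delimiter=";")
        if raw  # skip blank lines
    )
]
print(len(repl_list))
```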

Importing file format similar to csv file with | delimiters into Python

I have a data format that appears similar to a csv file, but has vertical bars around character strings and none around Boolean fields. For example:
|2000|,|code_no|,|first name, last name|,,,0,|word string|,0
|2000|,|code_no|,|full name|,,,0,|word string|,0
I'm not sure what format this is (it is saved as a txt file). What format is this, and how would I import it into Python?
For reference, I had been trying to use:
with open(csv_file, 'rb') as f:
    r = unicodecsv.reader(f)
And then stripping out the | from the start and end of the fields. This works OK, with the exception of fields which have a comma in them (e.g. |first name, last name|), where the field gets split because of the comma.
It looks like the pipes are being used as quote characters, not delimiters. Have you tried initializing the reader to use pipe ('|') as the quote character, and perhaps to use csv.QUOTE_NONNUMERIC as the quoting rules?
csv.reader(f, quotechar='|', quoting=csv.QUOTE_NONNUMERIC)
Have you tried .reader(f, delimiter=',', quotechar='|') ?
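A minimal sketch of the quote-character suggestion, using the standard csv module on Python 3 rather than unicodecsv, and an in-memory copy of the two sample rows from the question:

```python
import csv
import io

# The two sample rows from the question; io.StringIO stands in for the txt file.
sample = io.StringIO(
    "|2000|,|code_no|,|first name, last name|,,,0,|word string|,0\n"
    "|2000|,|code_no|,|full name|,,,0,|word string|,0\n"
)

# Treat the pipe as the quote character, so commas inside |...| stay in the field.
rows = list(csv.reader(sample, delimiter=",", quotechar="|"))
print(rows[0])
```

With this setting every field comes back as a string, including the unquoted 0 values; converting them is left to the caller.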

CSV writing strings of text that need a unique delimiter

I wrote an HTML parser in python used to extract data to look like this in a csv file:
itemA, itemB, itemC, Sentence that might contain commas, or colons: like this,\n
so I used a delimiter ":::::" thinking that it wouldn't be mined in the data
itemA, itemB, itemC, ::::: Sentence that might contain commas, or colons: like this,::::\n
This works for most of the thousands of lines, however, apparently a colon : offset this when I imported the csv in Calc.
My question is, what is the best or a unique delimiter to use when creating a csv with many variations of sentences that need to be separated with some delimiter? Am I understanding delimiters correctly in that they separate the values within a CSV?
As I suggested informally in a comment, unique just means you need to use some character that won't be in the data; chr(255) might be a good choice. For example:
Note: The code shown is for Python 2.x. On Python 3, open the files in text mode with newline='' and call print as a function.
import csv
DELIMITER = chr(255)
data = ["itemA", "itemB", "itemC",
        "Sentence that might contain commas, colons: or even \"quotes\"."]
with open('data.csv', 'wb') as outfile:
    writer = csv.writer(outfile, delimiter=DELIMITER)
    writer.writerow(data)
with open('data.csv', 'rb') as infile:
    reader = csv.reader(infile, delimiter=DELIMITER)
    for row in reader:
        print row
Output:
['itemA', 'itemB', 'itemC', 'Sentence that might contain commas, colons: or even "quotes".']
If you're not using the csv module and instead are writing and/or reading the data manually, then it would go something like this:
with open('data.csv', 'wb') as outfile:
    outfile.write(DELIMITER.join(data) + '\n')
with open('data.csv', 'rb') as infile:
    row = infile.readline().rstrip().split(DELIMITER)
print row
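For readers on Python 3, the csv-module version of this round trip might look like the sketch below. It writes to an in-memory buffer instead of data.csv; with a real file, the assumption is that it would be opened in text mode with newline='' and an explicit encoding such as utf-8.

```python
import csv
import io

DELIMITER = chr(255)  # a character assumed never to occur in the data
data = ["itemA", "itemB", "itemC",
        'Sentence that might contain commas, colons: or even "quotes".']

# In-memory stand-in for open('data.csv', 'w', newline='', encoding='utf-8').
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, delimiter=DELIMITER)
writer.writerow(data)

# Rewind and read the row back with the same delimiter.
buffer.seek(0)
rows = list(csv.reader(buffer, delimiter=DELIMITER))
print(rows[0])
```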
Yes, delimiters separate values within each line of a CSV file. There are two strategies to delimiting text that has a lot of punctuation marks. First, you can quote the values, e.g.:
Value 1, Value 2, "This value has a comma, <- right there", Value 4
The second strategy is to use tabs (i.e., '\t').
Python's built-in CSV module can both read and write CSV files that use quotes. Check out the example code under the csv.reader function. The built-in csv module will handle quotes correctly, e.g. it will escape quotes that are in the value itself.
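A short sketch of that quoting behaviour, using an in-memory buffer in place of a file. The row values are illustrative only.

```python
import csv
import io

row = ["Value 1", "Value 2", 'This value has a comma, and "quotes"', "Value 4"]

buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(row)  # default dialect: quote only when needed
line = buffer.getvalue()
print(line)

# Reading the line back recovers the original values.
buffer.seek(0)
parsed = next(csv.reader(buffer))
```

Only the field that needs it is quoted, and the quotes inside it are doubled rather than backslash-escaped, which is the module's default.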
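CSV files usually use double quotes " to wrap fields that might contain a field separator like a comma. If the field itself contains a double quote, it is normally escaped by doubling it (""), which is also the default in Python's csv module; some dialects use a backslash (\") instead, which the csv module supports through its escapechar option.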

Multiple Lines in a single Excel cell

What is the easiest method for writing multiple lines into a single cell within Excel using Python? I've tried the csv module without success.
import csv
with open('xyz.csv', 'wb') as outfile:
    w = csv.writer(outfile)
    w.writerow(['stringa','string_multiline',])
Also, each of the multiline strings contains a number of characters which are typically used in CSVs, i.e. commas.
Any help would be really appreciated.
To figure this out, I created a file in Excel with a single multiline cell.
Then I saved it as CSV and opened it up in a text editor:
"a^Mb"
It looks like Excel interprets Ctrl-M characters as newlines.
Let’s try that with Python:
#!/usr/bin/env python2.7
import csv
with open('xyz.csv', 'wb') as outfile:
    w = csv.writer(outfile)
    w.writerow(['stringa','multiline\015string',])
Yup, that worked!
You need to put extra double quotes (") at the start and end of the string, and separate the different lines of the cell using the newline character (\n),
e.g. "line1\nline2\nline3"
with open("filename.csv", "w") as f:
    f.write("\"line1\nline2\nline3\"")
The code creates a CSV file containing a single quoted field that spans three lines.
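On Python 3 the csv module adds those quotes itself whenever a field contains a newline or a comma, so a sketch of the same idea could look like this. The cell text is made up for illustration, the buffer stands in for a file opened with newline='', and how Excel displays the result was not verified here.

```python
import csv
import io

# Illustrative cell content with embedded newlines and a comma.
multiline = "line1\nline2, with a comma\nline3"

# In-memory stand-in for open('xyz.csv', 'w', newline='').
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(["stringa", multiline])
text = buffer.getvalue()
print(text)

# Reading it back yields one row with two fields, newlines intact.
buffer.seek(0)
rows = list(csv.reader(buffer))
```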
