How to include escaped quotes with regex python [duplicate] - python

Anyone know of a simple library or function to parse a csv encoded string and turn it into an array or dictionary?
I don't think I want the built in csv module because in all the examples I've seen that takes filepaths, not strings.

You can convert a string to a file object using io.StringIO and then pass that to the csv module:
from io import StringIO
import csv
scsv = """text,with,Polish,non-Latin,letters
1,2,3,4,5,6
a,b,c,d,e,f
gęś,zółty,wąż,idzie,wąską,dróżką,
"""
f = StringIO(scsv)
reader = csv.reader(f, delimiter=',')
for row in reader:
    print('\t'.join(row))
A simpler version splits the string on newlines:
reader = csv.reader(scsv.split('\n'), delimiter=',')
for row in reader:
    print('\t'.join(row))
You could also split() the string into lines on \n and then split() each line into values yourself, but then you have to handle quoting on your own (for example, commas inside quoted fields), so the csv module is preferred.
On Python 2 you have to import StringIO as
from StringIO import StringIO
instead.
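To make the quoting caveat concrete, here is a minimal sketch that compares a naive split(',') with csv.reader on a single line. The sample line is made up for illustration; it contains a quoted comma and a doubled (escaped) quote.

```python
import csv
from io import StringIO

# Hypothetical sample line: the second field contains a comma,
# the third contains doubled (escaped) quotes.
line = 'id,"Smith, John","said ""hi"""'

naive = line.split(',')                    # also splits inside the quoted field
parsed = next(csv.reader(StringIO(line)))  # follows CSV quoting rules

print(naive)   # 4 pieces, with the quote characters still attached
print(parsed)  # ['id', 'Smith, John', 'said "hi"']
```

The naive split yields four pieces instead of three fields, which is why the csv module is the safer choice as soon as quoting can occur.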

Simple - the csv module works with lists, too:
>>> a=["1,2,3","4,5,6"] # or a = "1,2,3\n4,5,6".split('\n')
>>> import csv
>>> x = csv.reader(a)
>>> list(x)
[['1', '2', '3'], ['4', '5', '6']]

The official documentation for csv.reader() (https://docs.python.org/2/library/csv.html) is very helpful. It says:
file objects and list objects are both suitable
import csv
text = """1,2,3
a,b,c
d,e,f"""
lines = text.splitlines()
reader = csv.reader(lines, delimiter=',')
for row in reader:
    print('\t'.join(row))
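The question also asks for a dictionary. As a small sketch along the same lines, csv.DictReader accepts the same kind of line list and yields one dict per row, keyed by the header line. The column names and values here are invented for the example.

```python
import csv

# Hypothetical data with a header line.
text = """name,age
john,32
bob,45"""

# DictReader takes the first line as the header and yields one dict per row.
rows = list(csv.DictReader(text.splitlines()))

print(rows[0]["name"], rows[0]["age"])  # john 32
```

All values come back as strings, so convert them (for example with int()) if you need numbers.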

Per the documentation:
And while the module doesn’t directly support parsing strings, it can easily be done:
import csv
for row in csv.reader(['one,two,three']):
    print(row)
Just turn your string into a single element list.
Importing StringIO seems a bit excessive to me when this example is explicitly in the docs.

As others have already pointed out, Python includes a module to read and write CSV files. On Python 2 it works well as long as the input characters stay within ASCII limits; processing other encodings takes more work. (On Python 3 the csv module works on Unicode strings directly.)
The Python 2 documentation for the csv module includes a UnicodeReader recipe, an extension of csv.reader that uses the same interface but can handle other encodings and returns unicode strings. Just copy and paste the code from the documentation. After that, you can process a CSV file like this:
with open("some.csv", "rb") as csvFile:
    for row in UnicodeReader(csvFile, encoding="iso-8859-15"):
        print(row)
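For readers on Python 3, a rough equivalent needs no extra class: the encoding is passed to open() and csv.reader yields str values. The sketch below first writes a small ISO-8859-15 file into a temporary directory so that it is self-contained; the file name and contents are assumptions for illustration only.

```python
import csv
import os
import tempfile

# Create a small ISO-8859-15 encoded file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "some.csv")
with open(path, "w", encoding="iso-8859-15", newline="") as f:
    csv.writer(f).writerow(["café", "10 €"])

# On Python 3, pass the encoding to open(); csv.reader then yields str values.
with open(path, encoding="iso-8859-15", newline="") as csv_file:
    rows = list(csv.reader(csv_file))

print(rows)  # [['café', '10 €']]
```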

Not a generic CSV parser but usable for simple strings with commas.
>>> a = "1,2"
>>> a
'1,2'
>>> b = a.split(",")
>>> b
['1', '2']
To parse a CSV file:
f = open("file.csv", "r")
lines = f.read().split("\n")  # "\r\n" if needed
for line in lines:
    if line != "":  # add other needed checks to skip titles
        cols = line.split(",")
        print(cols)
f.close()

https://docs.python.org/2/library/csv.html?highlight=csv#csv.reader
csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called
Thus, a StringIO.StringIO(), str.splitlines() or even a generator are all good.
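One difference between these inputs is worth a sketch: when a quoted field contains a line break, str.splitlines() strips the line endings before the reader sees them, while StringIO (or splitlines(keepends=True)) keeps them. The input string below is a made-up example.

```python
import csv
from io import StringIO

# Hypothetical input: the quoted middle field contains a line break.
text = 'a,"line1\nline2",b\n'

via_stringio = list(csv.reader(StringIO(text)))
via_keepends = list(csv.reader(text.splitlines(keepends=True)))
via_plain = list(csv.reader(text.splitlines()))

print(via_stringio)  # [['a', 'line1\nline2', 'b']]
print(via_keepends == via_stringio)
print(via_plain)     # same record, but the line break inside the field is gone
```

If your data never contains multi-line fields, all three variants behave the same.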

Use this to load a CSV file into a list:
import csv
csvfile = open(myfile, 'r')  # myfile holds the path to a tab-separated file
reader = csv.reader(csvfile, delimiter='\t')
my_list = list(reader)
csvfile.close()
print(my_list)
Output:
[['1st_line', '0'], ['2nd_line', '0']]

Here's an alternative solution:
>>> import pyexcel as pe
>>> text="""1,2,3
... a,b,c
... d,e,f"""
>>> s = pe.load_from_memory('csv', text)
>>> s
Sheet Name: csv
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| a | b | c |
+---+---+---+
| d | e | f |
+---+---+---+
>>> s.to_array()
[[u'1', u'2', u'3'], [u'a', u'b', u'c'], [u'd', u'e', u'f']]
Here's the documentation

For anyone still looking for a reliable way of converting a standard CSV str to a list[str] as well as in reverse, here are two functions I put together from some of the answers in this and other SO threads:
import csv
from io import StringIO

def to_line(row: list[str]) -> str:
    with StringIO() as line:
        csv.writer(line).writerow(row)
        return line.getvalue().strip()

def from_line(line: str) -> list[str]:
    return next(csv.reader([line]))
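A short round-trip check of these two helpers might look like the sketch below. The functions are repeated (without type hints) so that the block runs on its own, and the sample row is invented to exercise a comma and embedded quotes.

```python
import csv
from io import StringIO

def to_line(row):
    # Serialize one row to a CSV line, dropping the trailing line terminator.
    with StringIO() as line:
        csv.writer(line).writerow(row)
        return line.getvalue().strip()

def from_line(line):
    # Parse one CSV line back into a list of strings.
    return next(csv.reader([line]))

row = ["a", "b,c", 'say "hi"']  # hypothetical sample row
encoded = to_line(row)

print(encoded)             # a,"b,c","say ""hi"""
print(from_line(encoded))  # ['a', 'b,c', 'say "hi"']
```

Note that strip() removes all surrounding whitespace, so an unquoted last field that ends in spaces would not survive the round trip.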

Related


write output with delimiter in python

I want to save my output as a csv file with a custom name and delimiter.
I tried this code, but it does not work for me.
out = open('out.csv', 'w')
for row in l:
    for column in row:
        out.write('%d;' % column)
    out.write('\n')
out.close()
Here is my data
100A7E54111FB143
100D11CF822BBBDB
1014120EE9CCB1E0
10276825CD5B4A26
10364F56076B46B7
103D1DDAD3064A66
103F4F66EEB54308
104310B0280E4F20
104E80752424B1C3
106BE9DBB186BEC5
10756F745D8A4123
107966C82D8BAD8
I want to save like this
input.csv
input_id data
number 107966C82D8BAD8 | 10756F745D8A4123 | 106BE9DBB186BEC5
The delimiter would be '|'. The data is in dtype: object.
Any help will be appreciated.
Use a csv writer object instead:
import csv
with open('out.csv', 'w', newline='') as out:
    spamwriter = csv.writer(out, delimiter='|')
    spamwriter.writerow(row)  # row is one sequence of values from your loop
I have omitted your for loop; call writerow() once for each row you want to write.
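As a minimal, self-contained sketch of the writer approach, the block below writes three of the IDs from the question as one '|'-delimited row. io.StringIO stands in for the output file and lineterminator="\n" is chosen only to keep the printed result simple; both are assumptions for the example.

```python
import csv
import io

# A few of the IDs from the question; io.StringIO stands in for the output file.
ids = ["107966C82D8BAD8", "10756F745D8A4123", "106BE9DBB186BEC5"]

buf = io.StringIO()
writer = csv.writer(buf, delimiter="|", lineterminator="\n")
writer.writerow(ids)   # one call per output row

print(buf.getvalue())  # 107966C82D8BAD8|10756F745D8A4123|106BE9DBB186BEC5
```

To write to disk instead, replace the buffer with open('out.csv', 'w', newline='').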
One simple way is to use the print() function of Python 3.x. If you are using Python 2.x, import print_function from __future__:
from __future__ import print_function #Needed in python 2.x
print(value, ..., sep=' ', end='\n', file=sys.stdout)
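Applied to the data in the question, the print() signature above could be used roughly as follows. This is only a sketch: io.StringIO stands in for a real file object, and unlike csv.writer this does no quoting or escaping of the values.

```python
import io

ids = ["107966C82D8BAD8", "10756F745D8A4123", "106BE9DBB186BEC5"]

buf = io.StringIO()             # any writable text file object works here
print(*ids, sep="|", file=buf)  # unpack the values and join them with '|'

print(buf.getvalue(), end="")
```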
You can use this snippet as an example. I did something similar in my program, creating a text file with a comma separator, no csv module involved. I am not sure about newlines; I only used it for one line in my case.
cachef_h = [a, b, c, d]  # a, b, c, d are strings defined elsewhere
f = open('cachef_h.txt', 'x')  # 'x' fails if the file already exists
f.write(cachef_h[0])
for column_headers in cachef_h[1:]:
    f.write(',' + column_headers)
f.close()

How to write a number as text while writing in csv file in python

import csv
a = ['679L', 'Z60', '033U', '0003']
z = csv.writer(open("test1.csv", "wb"))
z.writerow(a)
Consider the code above.
Output:
679L Z60 33U 3
I need to get it in text format, as
679L Z60 033U 0003
How can I do that?
The Python csv module does not treat strings as numbers when writing the file:
>>> import csv
>>> from StringIO import StringIO
>>> a = ['679L', 'Z60', '033U', '0003']
>>> out = StringIO()
>>> z = csv.writer(out)
>>> z.writerow(a)
>>> out.getvalue()
'679L,Z60,033U,0003\r\n'
If you are seeing 3 in some other tool when reading you need to fix that tool; Python is not at fault here.
You can instruct the csv.writer() to put quotes around anything that is not a number; this could make it clearer to whatever reads your CSV that the column is not numeric. Set quoting to csv.QUOTE_NONNUMERIC:
>>> out = StringIO()
>>> z = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC)
>>> z.writerow(a)
>>> out.getvalue()
'"679L","Z60","033U","0003"\r\n'
but this won't prevent Excel from treating the column as numeric anyway.
If you are loading this into Excel then don't use the Open feature. Instead create a new empty worksheet and use the Import feature instead. This will let you designate a column as Text rather than General.
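The interpreter session above is Python 2. A Python 3 sketch of the same check, using io.StringIO in place of the StringIO module, shows that the leading zeros are written unchanged in both quoting modes:

```python
import csv
import io

a = ['679L', 'Z60', '033U', '0003']

plain = io.StringIO()
csv.writer(plain).writerow(a)

quoted = io.StringIO()
csv.writer(quoted, quoting=csv.QUOTE_NONNUMERIC).writerow(a)

print(repr(plain.getvalue()))   # '679L,Z60,033U,0003\r\n'
print(repr(quoted.getvalue()))  # '"679L","Z60","033U","0003"\r\n'
```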

Remove character between two characters in Python

My input string looks like this:
"1,724,741","24,527,465",14.00,14.35,14.00,14.25
I want the output to look like this:
1724741,24527465,14.00,14.35,14.00,14.25
I played with re.sub but still couldn't figure out.
Any help would be appreciated.
The csv module handles the quoting nicely:
>>> s = '"1,724,741","24,527,465",14.00,14.35,14.00,14.25'
>>> import csv
>>> r = csv.reader([s])
>>> for row in r:
...     print(','.join(x.replace(",", "") for x in row))
...
1724741,24527465,14.00,14.35,14.00,14.25
A quite hacky solution is to use ast.literal_eval():
>>> from ast import literal_eval
>>> s = '"1,724,741","24,527,465",14.00,14.35,14.00,14.25'
>>> print(",".join(x.replace(",", "") if isinstance(x, str) else str(x)
...                for x in literal_eval(s)))
1724741,24527465,14.0,14.35,14.0,14.25
Note that this also reformats the floating point numbers.
Edit: Since you are apparently dealing with a CSV file and integers with thousands separators, a cleaner solution might be
import csv
import locale
locale.setlocale(locale.LC_ALL, 'en_GB.UTF8')
converters = [locale.atoi] * 2 + [locale.atof] * 4
# Python 2 file modes; on Python 3 open both files in text mode with newline=''
with open("input.csv", "rb") as f, open("output.csv", "wb") as g:
    out = csv.writer(g)
    for row in csv.reader(f):
        out.writerow([conv(x) for conv, x in zip(converters, row)])
You will need to substitute en_GB.UTF8 by a locale supported by your machine (and having comma as a thousands separator).
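Since the question mentions re.sub, here is one possible sketch of that route as well. It is not taken from the answers above, and it assumes quoted fields never contain escaped quotes (""); for general CSV input the csv module remains the safer tool.

```python
import re

s = '"1,724,741","24,527,465",14.00,14.35,14.00,14.25'

# For each double-quoted run, keep its contents minus the commas.
# Assumes quoted fields never contain escaped quotes ("").
result = re.sub(r'"([^"]*)"', lambda m: m.group(1).replace(",", ""), s)

print(result)  # 1724741,24527465,14.00,14.35,14.00,14.25
```

Because the values stay strings throughout, the floating point numbers are not reformatted.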

Map over csv in python

I'm trying to use "map" on a csv file in python.
However, the line map(lambda x: x, reseller_csv) gives nothing.
I've tried iterating over the csv object, and it works fine and can print the rows.
Here's the code.
# imports
import csv
# Opens files
ifile = open('C:\Users\josh.SCL\Desktop\Records.csv', 'r')
ofile = open('C:\Users\josh.SCL\Desktop\RecordsNew.csv', 'w')
resellers_file = open('C:\Users\josh.SCL\Desktop\Reseller.csv', 'r')
# Setup CSV objects
csvfile = csv.DictReader(ifile, delimiter=',')
reseller_csv = csv.DictReader(resellers_file, delimiter=',')
# Get names only in resellers
resellers = map(lambda x: x.get('Reseller'), reseller_csv)
A csv.DictReader is a use-once gadget. You probably ran it a second time.
>>> import csv
>>> iterable = ['Reseller,cost', 'fred,100', 'joe,99']
>>> reseller_csv = csv.DictReader(iterable)
>>> map(lambda x: x.get('Reseller'), reseller_csv)
['fred', 'joe']
>>> map(lambda x: x.get('Reseller'), reseller_csv)
[]
>>>
While we're here:
(1) [Python 2.x] Always open csv files in BINARY mode.
[Python 3.x] Always open csv files in text mode (the default), and use newline=''
(2) If you insist on hardcoding file paths in Windows, use r"...." instead of "...", or use forward slashes -- otherwise \n and \t will be interpreted as control characters.
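On Python 3, where map() is lazy, the use-once behaviour can be sketched like this. If the rows are needed more than once, one option is to materialize them into a list first; the data is the same made-up sample as in the session above.

```python
import csv

iterable = ['Reseller,cost', 'fred,100', 'joe,99']
reseller_csv = csv.DictReader(iterable)

# Materialize the rows once; the list can be traversed any number of times.
rows = list(reseller_csv)
names = [row['Reseller'] for row in rows]
costs = [row['cost'] for row in rows]

leftover = list(reseller_csv)  # the reader itself is now exhausted

print(names)     # ['fred', 'joe']
print(costs)     # ['100', '99']
print(leftover)  # []
```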
The following works for me:
>>> data = ["name,age", "john,32", "bob,45"]
>>> list(map(lambda x: x.get("name"), csv.DictReader(data))) # Python 3 so using list to see values.
['john', 'bob']
Are you sure you get any data at all from your DictReader? Do you read any data from it prior to that, exhausting the reader perhaps?
First, on your specific problem: check whether there is actually a key named 'Reseller'. Chances are it's there with different capitalization or an extra space. To see the list of all keys (assuming a non-exhausted DictReader):
>>> csvfile.next().keys()  # Python 2; on Python 3 use next(csvfile).keys()
Otherwise the map() should work fine. But I'd argue it's more readable (and faster!) done like this:
resellers = [x['Reseller'] for x in reseller_csv]
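The header check suggested above can also be done without consuming a data row, through the DictReader's fieldnames attribute. The sketch below uses invented data with a stray trailing space in the header and normalizes the names before reading the rows.

```python
import csv

# Hypothetical data whose header has a stray trailing space after "Reseller".
iterable = ['Reseller ,cost', 'fred,100', 'joe,99']
reseller_csv = csv.DictReader(iterable)

print(reseller_csv.fieldnames)  # ['Reseller ', 'cost']

# Normalize the header names before reading any rows.
reseller_csv.fieldnames = [name.strip() for name in reseller_csv.fieldnames]
resellers = [row['Reseller'] for row in reseller_csv]

print(resellers)  # ['fred', 'joe']
```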
