Remove character between two characters in Python - python

My input string looks like this:
"1,724,741","24,527,465",14.00,14.35,14.00,14.25
I want the output to look like this:
1724741,24527465,14.00,14.35,14.00,14.25
I played with re.sub but still couldn't figure it out.
Any help would be appreciated.

The csv module handles the quoting nicely:
>>> s = '"1,724,741","24,527,465",14.00,14.35,14.00,14.25'
>>> import csv
>>> r = csv.reader([s])
>>> for row in r:
...     print(','.join(x.replace(",", "") for x in row))
...
1724741,24527465,14.00,14.35,14.00,14.25
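Since the question mentions re.sub, here is a minimal regex sketch of the same idea. It assumes quoted fields never contain escaped quotes (""); for such input the csv approach above is the safer choice. The function name strip_quoted_commas is made up for this illustration.

```python
import re

def strip_quoted_commas(line):
    """Remove commas inside double-quoted fields and drop the quotes."""
    # Assumption: no escaped quotes ("") inside a field; use csv for those.
    return re.sub(r'"([^"]*)"', lambda m: m.group(1).replace(",", ""), line)

s = '"1,724,741","24,527,465",14.00,14.35,14.00,14.25'
print(strip_quoted_commas(s))  # 1724741,24527465,14.00,14.35,14.00,14.25
```

Unlike the literal_eval variant, this leaves the unquoted numbers exactly as they were written.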

A quite hacky solution is to use ast.literal_eval():
>>> from ast import literal_eval
>>> s = '"1,724,741","24,527,465",14.00,14.35,14.00,14.25'
>>> print(",".join(x.replace(",", "") if isinstance(x, str) else str(x)
...                for x in literal_eval(s)))
1724741,24527465,14.0,14.35,14.0,14.25
Note that this also reformats the floating point numbers.
Edit: Since you are apparently dealing with a CSV file and integers with thousands separators, a cleaner solution might be
import csv
import locale
locale.setlocale(locale.LC_ALL, 'en_GB.UTF8')
# The first two columns are integers, the remaining four are floats.
converters = [locale.atoi] * 2 + [locale.atof] * 4
# Python 3: open CSV files in text mode with newline="" (on Python 2 use "rb" and "wb").
with open("input.csv", newline="") as f, open("output.csv", "w", newline="") as g:
    out = csv.writer(g)
    for row in csv.reader(f):
        out.writerow([conv(x) for conv, x in zip(converters, row)])
You will need to replace en_GB.UTF8 with a locale that your machine supports and that uses a comma as the thousands separator.

Related

How to include escaped quotes with regex python [duplicate]

Anyone know of a simple library or function to parse a csv encoded string and turn it into an array or dictionary?
I don't think I want the built-in csv module, because all the examples I've seen take file paths, not strings.
You can convert a string to a file object using io.StringIO and then pass that to the csv module:
from io import StringIO
import csv
scsv = """text,with,Polish,non-Latin,letters
1,2,3,4,5,6
a,b,c,d,e,f
gęś,zółty,wąż,idzie,wąską,dróżką,
"""
f = StringIO(scsv)
reader = csv.reader(f, delimiter=',')
for row in reader:
    print('\t'.join(row))
A simpler version splits the string on newlines:
reader = csv.reader(scsv.split('\n'), delimiter=',')
for row in reader:
    print('\t'.join(row))
Alternatively, you can split() the string into lines on \n yourself and then split() each line into values, but then you have to handle quoting on your own, so the csv module is the better choice.
On Python 2 you have to import StringIO as
from StringIO import StringIO
instead.
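One difference between the two variants is worth a quick look: a quoted field may itself contain a newline. The sketch below uses made-up sample data (Python 3) and compares StringIO with a plain split('\n').

```python
import csv
from io import StringIO

# Hypothetical sample: the second field of the data row spans two lines.
text = 'id,comment\n1,"first line\nsecond line"'

# StringIO hands csv.reader the original line endings, so the quoted
# newline stays inside the field.
rows_io = list(csv.reader(StringIO(text)))

# Splitting on "\n" first cuts the quoted field apart before csv sees it.
rows_split = list(csv.reader(text.split("\n")))

print(rows_io)     # [['id', 'comment'], ['1', 'first line\nsecond line']]
print(rows_split)  # the embedded newline is not preserved here
```

If the input can contain multi-line fields, the StringIO route is the one that keeps them intact.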
Simple - the csv module works with lists, too:
>>> a=["1,2,3","4,5,6"] # or a = "1,2,3\n4,5,6".split('\n')
>>> import csv
>>> x = csv.reader(a)
>>> list(x)
[['1', '2', '3'], ['4', '5', '6']]
The official documentation for csv.reader() (https://docs.python.org/2/library/csv.html) is helpful here. It says:
file objects and list objects are both suitable
import csv
text = """1,2,3
a,b,c
d,e,f"""
lines = text.splitlines()
reader = csv.reader(lines, delimiter=',')
for row in reader:
    print('\t'.join(row))
Per the documentation:
And while the module doesn’t directly support parsing strings, it can easily be done:
import csv
for row in csv.reader(['one,two,three']):
    print(row)
Just turn your string into a single element list.
Importing StringIO seems a bit excessive to me when this example is explicitly in the docs.
As others have already pointed out, Python includes a module to read and write CSV files. On Python 2 it works well as long as the input characters stay within ASCII; other encodings take more work.
The Python 2 documentation for the csv module includes a recipe, UnicodeReader, that wraps csv.reader with the same interface but handles other encodings and returns unicode strings. Copy the code from the documentation; after that, you can process a CSV file like this:
with open("some.csv", "rb") as csvFile:
    for row in UnicodeReader(csvFile, encoding="iso-8859-15"):
        print(row)
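On Python 3 the extra reader class should not be needed, because open() decodes the file and csv.reader yields str values. A minimal sketch, assuming Python 3; the temporary file and its contents are invented so the example can run on its own:

```python
import csv
import os
import tempfile

# Write a small ISO-8859-15 file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "some.csv")  # hypothetical file
with open(path, "w", encoding="iso-8859-15", newline="") as f:
    csv.writer(f).writerow(["café", "10€"])

# open() performs the decoding; csv.reader only ever sees text.
with open(path, encoding="iso-8859-15", newline="") as f:
    rows = list(csv.reader(f))

print(rows)  # [['café', '10€']]
```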
Not a generic CSV parser but usable for simple strings with commas.
>>> a = "1,2"
>>> a
'1,2'
>>> b = a.split(",")
>>> b
['1', '2']
To parse a CSV file:
f = open("file.csv", "r")
lines = f.read().split("\n")  # "\r\n" if needed
for line in lines:
    if line != "":  # add other needed checks to skip titles
        cols = line.split(",")
        print(cols)
f.close()
https://docs.python.org/2/library/csv.html?highlight=csv#csv.reader
csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called
Thus, a StringIO.StringIO(), str.splitlines() or even a generator are all good.
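To illustrate the generator case, here is a small sketch (Python 3) that filters lines before csv.reader sees them. The helper name non_comment_lines and the sample data are invented for this example.

```python
import csv

def non_comment_lines(lines):
    """Yield only lines that are not blank and do not start with '#'."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield line

raw = ["# exported data", "name,age", "", "john,32", "bob,45"]
rows = list(csv.reader(non_comment_lines(raw)))
print(rows)  # [['name', 'age'], ['john', '32'], ['bob', '45']]
```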
Use this to have a csv loaded into a list
import csv
csvfile = open(myfile, 'r')
reader = csv.reader(csvfile, delimiter='\t')
my_list = list(reader)
print(my_list)
# Output:
# [['1st_line', '0'], ['2nd_line', '0']]
Here's an alternative solution:
>>> import pyexcel as pe
>>> text="""1,2,3
... a,b,c
... d,e,f"""
>>> s = pe.load_from_memory('csv', text)
>>> s
Sheet Name: csv
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| a | b | c |
+---+---+---+
| d | e | f |
+---+---+---+
>>> s.to_array()
[[u'1', u'2', u'3'], [u'a', u'b', u'c'], [u'd', u'e', u'f']]
Here's the documentation
For anyone still looking for a reliable way of converting a standard CSV str to a list[str] as well as in reverse, here are two functions I put together from some of the answers in this and other SO threads:
import csv
from io import StringIO
def to_line(row: list[str]) -> str:
    with StringIO() as line:
        csv.writer(line).writerow(row)
        return line.getvalue().strip()
def from_line(line: str) -> list[str]:
    return next(csv.reader([line]))
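A short round trip shows what these two helpers do with commas and quotes. The functions are repeated here (without type hints) so the snippet runs on its own; the sample rows are made up.

```python
import csv
from io import StringIO

def to_line(row):
    with StringIO() as buf:
        csv.writer(buf).writerow(row)
        # strip() removes the "\r\n" that csv.writer appends
        return buf.getvalue().strip()

def from_line(line):
    return next(csv.reader([line]))

row = ["a", "b,c", 'say "hi"']
line = to_line(row)
print(line)             # a,"b,c","say ""hi"""
print(from_line(line))  # ['a', 'b,c', 'say "hi"']

# Caveat: strip() also trims whitespace that belongs to the last field.
print(from_line(to_line(["x", "y "])))  # ['x', 'y']
```

If trailing whitespace in the last field matters, rstrip("\r\n") in place of strip() would be one way to keep it.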

Python Add a New Line without \n

I have this
f = open(os.path.join(r"/path/to/file/{}.txt").format(userid), "w")
f.write(str(points))
f.write(str(level))
f.write(str(prevtime))
f.close()
I know about with open(blah) as f: and prefer it, but with this code the values are not written on separate lines unless I append "\n" to each one, even if I write the file first and then reopen it in append mode. The reason \n is a problem is that when I go to read the data back using
f = open(os.path.join(r"blah\{}.txt").format(userid), "r")
lines = f.readlines()
points = float(lines[0])
I'll get an error telling me it can't interpret (for example: 500\n) as a float because it reads the \n. Any idea what I can do?
EDIT
I ended up fixing it by not converting to float, but now I get ValueError: unconverted data remains. These issues only happen because of the line in the txt file that should contain a time in the format %H:%M.
EDIT 2019
I see lots of people searching for the same question. My problem actually came down to my ignoring, and clearly not understanding, types in Python.
To answer the question that I imagine many people are searching for when they view this: \n is Python's newline character. An alternative (if using print()) would be to call print() with no arguments, which outputs a blank line.
So, you have something like this
>>> f = open("test.txt", "w")
>>> f.write(str(4))
>>> f.write(str(20))
>>> f.close()
>>> f = open("test.txt")
>>> f.readlines()
['420']
But, you need to write newlines, so just do so
>>> f = open("test.txt", "w")
>>> f.write("{}\n".format(4))
>>> f.write("{}\n".format(20))
>>> f.close()
>>> f = open("test.txt")
>>> f.readlines()
['4\n', '20\n']
>>> f.close()
If you need no newline characters, try read().splitlines()
>>> f = open("test.txt")
>>> f.read().splitlines()
['4', '20']
EDIT
As far as the time value is concerned, here's an example.
>>> from datetime import datetime
>>> time_str = datetime.now().strftime("%H:%M")
>>> time_str
'18:26'
>>> datetime.strptime(time_str, "%H:%M")
datetime.datetime(1900, 1, 1, 18, 26)
To print without a trailing newline, use sys.stdout.write(). Without newlines, though, you may need some other separator, such as a space, between your values:
>>> import sys
>>> sys.stdout.write('hello world')
hello world>>>
If you keep the newlines, you can use rstrip() to strip them off when reading the data back:
lines = f.readlines()
points = float(lines[0].rstrip())
Alternatively, I prefer the more Pythonic way below:
lines = f.read().splitlines()
points = float(lines[0])
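Putting the pieces together, here is a minimal Python 3 sketch of the whole round trip: one value per line on the way out, splitlines() on the way back. The file name and the sample values are placeholders. As far as I know, float() tolerates surrounding whitespace, whereas datetime.strptime() rejects a trailing newline, which would explain the "unconverted data remains" error on the time line.

```python
import os
import tempfile
from datetime import datetime

path = os.path.join(tempfile.mkdtemp(), "user.txt")  # hypothetical file name
points, level, prevtime = 500.0, 3, "18:26"          # made-up sample values

# One value per line; the "\n" is only a separator in the file.
with open(path, "w") as f:
    for value in (points, level, prevtime):
        f.write("{}\n".format(value))

# splitlines() drops the separators again when reading.
with open(path) as f:
    lines = f.read().splitlines()

points = float(lines[0])
level = int(lines[1])
prevtime = datetime.strptime(lines[2], "%H:%M")
print(points, level, prevtime.time())  # 500.0 3 18:26:00
```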

Storing a random byte string in Python

For my project, I need to be able to store random byte strings in a file and read the byte string again later. For example, I want to store randomByteString from the following code:
>>> from os import urandom
>>> randomByteString=urandom(8)
>>> randomByteString
b'zOZ\x84\xfb\xceM~'
What would be the proper way to do this?
Edit: Forgot to mention that I also want to store 'normal' string alongside the byte strings.
Code like:
>>> fh = open("e:\\test","wb")
>>> fh.write(randomByteString)
8
>>> fh.close()
Open the file in binary mode. It is also tidier to use a with block when the file operations sit close together (thanks to @Blender):
>>> with open("e:\\test","wb") as fh:
...     fh.write(randomByteString)
Update: if you want to store normal strings, you can encode them and then write them like this:
>>> "test".encode()
b'test'
>>> fh.write("test".encode())
Here fh is the same file handle that was opened previously.
Works just fine. You can't expect the output to make much sense though.
>>> import os
>>> with open("foo.txt", "wb") as fh:
...     fh.write(os.urandom(8))
...
>>> fh.close()
>>> with open("foo.txt", "r") as fh:
...     for line in fh.read():
...         print(line)
...
^J^JM-/
^O
R
M-9
J
~G
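For the edit in the question (storing normal strings next to the byte strings and getting both back), one possible layout is to hex-encode the bytes and keep everything as text lines. This is only a sketch of that one option, assuming Python 3.5+ for bytes.hex(); the path and label are invented.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "store.txt")  # hypothetical location
random_bytes = os.urandom(8)
label = "session key"                                  # made-up text record

# Hex-encode the bytes so they cannot contain a newline, then store
# both records as ordinary text lines.
with open(path, "w", encoding="utf-8") as fh:
    fh.write(label + "\n")
    fh.write(random_bytes.hex() + "\n")

with open(path, encoding="utf-8") as fh:
    stored_label, stored_hex = fh.read().splitlines()

restored = bytes.fromhex(stored_hex)
print(stored_label, restored == random_bytes)  # session key True
```

The trade-off is size: hex doubles the length of the stored bytes, in exchange for a file that can be split on newlines safely.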

Map over csv in python

I'm trying to use "map" on a csv file in python.
However, the line map(lambda x: x, reseller_csv) gives nothing.
I've tried iterating over the csv object, and it works fine and can print the rows.
Here's the code.
# imports
import csv
# Opens files
ifile = open('C:\Users\josh.SCL\Desktop\Records.csv', 'r')
ofile = open('C:\Users\josh.SCL\Desktop\RecordsNew.csv', 'w')
resellers_file = open('C:\Users\josh.SCL\Desktop\Reseller.csv', 'r')
# Setup CSV objects
csvfile = csv.DictReader(ifile, delimiter=',')
reseller_csv = csv.DictReader(resellers_file, delimiter=',')
# Get names only in resellers
resellers = map(lambda x: x.get('Reseller'), reseller_csv)
A csv.DictReader is a use-once gadget. You probably ran it a second time.
>>> import csv
>>> iterable = ['Reseller,cost', 'fred,100', 'joe,99']
>>> reseller_csv = csv.DictReader(iterable)
>>> map(lambda x: x.get('Reseller'), reseller_csv)
['fred', 'joe']
>>> map(lambda x: x.get('Reseller'), reseller_csv)
[]
>>>
While we're here:
(1) [Python 2.x] Always open csv files in BINARY mode.
[Python 3.x] Always open csv files in text mode (the default), and use newline=''
(2) If you insist on hardcoding file paths in Windows, use r"...." instead of "...", or use forward slashes -- otherwise \n and \t will be interpreted as control characters.
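The use-once behaviour is easy to reproduce on Python 3 as well. The sketch below reuses the sample data from the session above and keeps the rows in a list when they are needed more than once.

```python
import csv

data = ["Reseller,cost", "fred,100", "joe,99"]
reader = csv.DictReader(data)

first = [row["Reseller"] for row in reader]
second = [row["Reseller"] for row in reader]  # reader is already exhausted
print(first, second)  # ['fred', 'joe'] []

# Keep the rows in a list if they are needed more than once.
rows = list(csv.DictReader(data))
names = [row["Reseller"] for row in rows]
costs = [int(row["cost"]) for row in rows]
print(names, costs)  # ['fred', 'joe'] [100, 99]
```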
The following works for me:
>>> data = ["name,age", "john,32", "bob,45"]
>>> list(map(lambda x: x.get("name"), csv.DictReader(data))) # Python 3 so using list to see values.
['john', 'bob']
Are you sure you get any data at all from your DictReader? Do you read any data from it prior to that, exhausting the reader perhaps?
First, on your specific problem: check whether there is actually a key named 'Reseller'; chances are it's there with different capitalization or an extra space. To see the list of all keys (assuming a non-exhausted DictReader):
>>> next(reseller_csv).keys()
Otherwise the map() should work fine. But I'd argue it's more readable (and faster!) done like this:
resellers = [x['Reseller'] for x in reseller_csv]
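If the header does turn out to contain stray spaces, one option is to inspect and normalise DictReader.fieldnames before reading any rows. A minimal sketch with made-up data (Python 3):

```python
import csv

# Hypothetical file contents with a padded header name.
data = ["Name, Reseller ,Cost", "Acme,fred,100", "Bolt,joe,99"]
reader = csv.DictReader(data)
print(reader.fieldnames)  # ['Name', ' Reseller ', 'Cost']

# Normalise the header names before any row is read.
reader.fieldnames = [name.strip() for name in reader.fieldnames]
resellers = [row["Reseller"] for row in reader]
print(resellers)  # ['fred', 'joe']
```

Unlike calling next() on the reader, reading fieldnames only consumes the header, so no data row is lost.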
