Printing list elements and strings in Python have different results - python

I have a .csv file which is encoded in UTF-8.
I am working with Python 2.7.
Something intereseting happens on Ubuntu.
When I print out the results of the file like this:
with open("file.csv", "r") as file:
myFile = csv.reader(file, delimiter = ",")
for row in myFile:
print row
I get signs like \xc3\x, \xa1\, .... Note that row is a list and all the elements in my list are marked as strings by '' in the output.
When I print out the results like this:
with open("file.csv", "r") as file:
myFile = csv.reader(file, delimiter = ",")
for row in myFile:
print ",".join(row)
Everything is decoded fine. Note that every row from my original file is one big string here.
Why is that?

This is because in the case of printing a list, Python is using repr(), but when printing a string it is using str(). Example:
unicode_str = 'åäö'
unicode_str_list = [unicode_str, unicode_str]
print 'unwrapped:', unicode_str
print 'in list:', unicode_str_list
print 'repr:', repr(unicode_str)
print 'str:', str(unicode_str)
Produces:
unwrapped: åäö
in list: ['\xc3\xa5\xc3\xa4\xc3\xb6', '\xc3\xa5\xc3\xa4\xc3\xb6']
repr: '\xc3\xa5\xc3\xa4\xc3\xb6'
str: åäö

Related

Parsing csv file into Boolean expression in Python

I've got a csv file such as:
cutsets
x1
x3,x5
x2
x4,x6
x5,x7
x6,x8
x7,x9
x6,x8,x10
I run the following Py script:
import csv
# Reads Boolean expression from cutsets file
expr = []
with open("MCS_overlap.csv", "r") as csv_file:
csv_reader = csv.reader(csv_file)
# skip the first row
next(csv_reader)
for lines in csv_reader:
expr = expr + lines + ['|']
del expr[-1]
final_expr=str(''.join(expr)).replace(",","&")
print("The Boolean expression is")
print(final_expr)
and get the output:
The Boolean expression is
x1|x3x5|x2|x4x6|x5x7|x6x8|x7x9|x6x8x10
With final_expr=str(''.join(expr)).replace(",","&") I was hoping to get a "&" between any two variables enclosed by a "|", e.g. "x4&x6","x6&x8&x10". But as can be seen the variables were simply concatenated. How do I accomplish insert "&" given I cannot change the format of the input file?
Thanks
Gui
Here you go:
expr =[]
f = open('MCS_overlap.csv')
expr.append(f.read())
final_expr = expr[0].replace('\n', '|').replace(',', '&')
print(final_expr)
Prints:
'x1|x3&x5|x2|x4&x6|x5&x7|x6&x8|x7&x9|x6&x8&x10'
Because you are using csv module, lines is a list and as a result expr is a list with elements being all x-es and some pipes |. You can print to see for yourself. When you do ''.join(expr) it just concatenates all elements, no commas (i.e. nothing to replace).
this should do
import csv
# Reads Boolean expression from cutsets file
with open("MCS_overlap.csv", "r") as csv_file:
csv_reader = csv.reader(csv_file)
# skip the first row
next(csv_reader)
lines = ('&'.join(line) for line in csv_reader)
final_expr = '|'.join(lines)
print(final_expr)
of course, you can do without csv module
with open("MCS_overlap.csv", "r") as csv_file:
next(csv_file)
lines = (line.strip().replace(',', "&") for line in csv_file)
final_expr = '|'.join(lines)
print(final_expr)
Note, both snippets not tested, but I expect to do the task for you.

Why are the Escape Characters of a String are Printed when Added to a List? [duplicate]

I have a .csv file which is encoded in UTF-8.
I am working with Python 2.7.
Something intereseting happens on Ubuntu.
When I print out the results of the file like this:
with open("file.csv", "r") as file:
myFile = csv.reader(file, delimiter = ",")
for row in myFile:
print row
I get signs like \xc3\x, \xa1\, .... Note that row is a list and all the elements in my list are marked as strings by '' in the output.
When I print out the results like this:
with open("file.csv", "r") as file:
myFile = csv.reader(file, delimiter = ",")
for row in myFile:
print ",".join(row)
Everything is decoded fine. Note that every row from my original file is one big string here.
Why is that?
This is because in the case of printing a list, Python is using repr(), but when printing a string it is using str(). Example:
unicode_str = 'åäö'
unicode_str_list = [unicode_str, unicode_str]
print 'unwrapped:', unicode_str
print 'in list:', unicode_str_list
print 'repr:', repr(unicode_str)
print 'str:', str(unicode_str)
Produces:
unwrapped: åäö
in list: ['\xc3\xa5\xc3\xa4\xc3\xb6', '\xc3\xa5\xc3\xa4\xc3\xb6']
repr: '\xc3\xa5\xc3\xa4\xc3\xb6'
str: åäö

python parsing string to csv format

I have a file containing a line with the following format
aaa=A;bbb=B;ccc=C
I want to convert it to a csv format so the literals on the equation sides will be columns and the semicolon as a row separator. I tried doing something like this
f = open("aaa.txt", "r")
with open("ccc.csv", 'w') as csvFile:
writer = csv.writer(csvFile)
rows = []
if f.mode == 'r':
single = f.readline()
lns = single.split(";")
for item in lns:
rows.append(item.replace("=", ","))
writer.writerows(rows)
f.close()
csvFile.close()
but I am getting each letter as a column so the result looks like :
a,a,a,",",A
b,b,b,",",B
c,c,c,",",C,"
The expected result should look like
aaa,A
bbb,B
ccc,C
The following 1 line change worked for me:
rows.append(item.split('='))
instead of the existing code
rows.append(item.replace("=", ",")).
That way, I was able to create a list of lists which can easily be read by the writer so that the row list looks like [['aaa', 'A'], ['bbb', 'B'], ['ccc', 'C']]instead of ['aaa,A', 'bbb,B', 'ccc,C']
Just write the strings into the target file line by line:
import os
f = open("aaa.txt", "r")
with open("ccc.csv", 'w') as csvFile:
single = f.readline()
lns = single.split(";")
for item in lns:
csvFile.write(item.replace("=", ",") + os.linesep)
f.close()
The output would be:
aaa,A
bbb,B
ccc,C
It helps to interactively execute the commands and print the values, or add debug print in the code (that will be removed or commented when everything works). Here you could have seen that rows is ['aaa,A', 'bbb,B', 'ccc,C'] that is 3 strings when it should be three sequences.
As a string is a (read only) sequence of chars writerows uses each char as a field.
So you do not want to replace the = with a comma (,), but want to split on the equal sign:
...
for item in lns:
rows.append(item.split("=", 1))
...
But the csv module requires for proper operation the output file to be opened with newline=''.
So you should have:
with open("ccc.csv", 'w', newline='') as csvFile:
...
The parameter to writer.writerows() must be an iterable of rows, which must in turn be iterables of strings or numbers. Since you pass it a list of strings, characters in the strings are treated as separate fields. You can obtain the proper list of rows by splitting the line first on ';', then on '=':
import csv
with open('in.txt') as in_file, open('out.csv', 'w') as out_file:
writer = csv.writer(out_file)
line = next(in_file).rstrip('\n')
rows = [item.split('=') for item in line.split(';')]
writer.writerows(rows)

Python: replace a string in a CSV file

I am a beginner and I have an issue with a short code. I want to replace a string from a csv to with another string, and put out a new
csv with an new name. The strings are separated with commas.
My code is a catastrophe:
import csv
f = open('C:\\User\\Desktop\\Replace_Test\\Testreplace.csv')
csv_f = csv.reader(f)
g = open('C:\\Users\\Desktop\\Replace_Test\\Testreplace.csv')
csv_g = csv.writer(g)
findlist = ['The String, that should replaced']
replacelist = ['The string that should replace the old striong']
#the function ?:
def findReplace(find,replace):
s = f.read()
for item, replacement in zip(findlist,replacelist):
s = s.replace(item,replacement)
g.write(s)
for row in csv_f:
print(row)
f.close()
g.close()
You can do this with the regex package re. Also, if you use with you don't have to remember to close your files, which helps me.
EDIT: Keep in mind that this matches the exact string, meaning it's also case-sensitive. If you don't want that then you probably need to use an actual regex to find the strings that need replacing. You would do this by replacing find_str in the re.sub() call with r'your_regex_here'.
import re
# open your csv and read as a text string
with open(my_csv_path, 'r') as f:
my_csv_text = f.read()
find_str = 'The String, that should replaced'
replace_str = 'The string that should replace the old striong'
# substitute
new_csv_str = re.sub(find_str, replace_str, my_csv_text)
# open new file and save
new_csv_path = './my_new_csv.csv' # or whatever path and name you want
with open(new_csv_path, 'w') as f:
f.write(new_csv_str)

Replace multiple cell in csv python module

I've a large csv file(comma delimited). I would like to replace/rename few random cell with the value "NIL" to an empty string "".
I tried this to find the keyword "NIL" and replace with '' empty
string. But it's giving me an empty csv file
ifile = open('outfile', 'rb')
reader = csv.reader(ifile,delimiter='\t')
ofile = open('pp', 'wb')
writer = csv.writer(ofile, delimiter='\t')
findlist = ['NIL']
replacelist = [' ']
s = ifile.read()
for item, replacement in zip(findlist, replacelist):
s = s.replace(item, replacement)
ofile.write(s)
From seeing you code i fell you directly should
read the file
with open("test.csv") as opened_file:
data = opened_file.read()
then use regex to change all NIL to "" or " " and save back the data to the file.
import re
data = re.sub("NIL"," ",data) # this code will replace NIL with " " in the data string
NOTE: you can give any regex instead of NIL
for more info see re module.
EDIT 1: re.sub returns a new string so you need to return it to data.
A few tweaks and your example works. I edited your question to get rid of some indenting errors - assuming those were a cut/paste problem. The next problem is that you don't import csv ... but even though you create a reader and writer, you don't actually use them, so it could just be removed. So, opening in text instead of binary mode, we have
ifile = open('outfile') # 'outfile' is the input file...
ofile = open('pp', 'w')
findlist = ['NIL']
replacelist = [' ']
s = ifile.read()
for item, replacement in zip(findlist, replacelist):
s = s.replace(item, replacement)
ofile.write(s)
We could add 'with' clauses and use a dict to make replacements more clear
replace_this = { 'NIL': ' '}
with open('outfile') as ifile, open('pp', 'w') as ofile:
s = ifile.read()
for item, replacement in replace_this.items:
s = s.replace(item, replacement)
ofile.write(s)
The only real problem now is that it also changes things like "NILIST" to "IST". If this is a csv with all numbers except for "NIL", that's not a problem. But you could also use the csv module to only change cells that are exactly "NIL".
with open('outfile') as ifile, open('pp', 'w') as ofile:
reader = csv.reader(ifile)
writer = csv.writer(ofile)
for row in reader:
# row is a list of columns. The following builds a new list
# while checking and changing any column that is 'NIL'.
writer.writerow([c if c.strip() != 'NIL' else ' '
for c in row])

Categories