Parsing csv file into Boolean expression in Python - python

I've got a csv file such as:
cutsets
x1
x3,x5
x2
x4,x6
x5,x7
x6,x8
x7,x9
x6,x8,x10
I run the following Py script:
import csv
# Reads Boolean expression from cutsets file
expr = []
with open("MCS_overlap.csv", "r") as csv_file:
    csv_reader = csv.reader(csv_file)
    # skip the first row
    next(csv_reader)
    for lines in csv_reader:
        expr = expr + lines + ['|']
del expr[-1]
final_expr=str(''.join(expr)).replace(",","&")
print("The Boolean expression is")
print(final_expr)
and get the output:
The Boolean expression is
x1|x3x5|x2|x4x6|x5x7|x6x8|x7x9|x6x8x10
With final_expr=str(''.join(expr)).replace(",","&") I was hoping to get a "&" between any two variables enclosed by "|" separators, e.g. "x4&x6" and "x6&x8&x10". But as can be seen, the variables were simply concatenated. How do I insert the "&", given that I cannot change the format of the input file?
Thanks
Gui

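Here you go:
with open('MCS_overlap.csv') as f:
    f.readline()  # skip the "cutsets" header row
    # strip() drops the trailing newline so the expression does not end with '|'
    final_expr = f.read().strip().replace('\n', '|').replace(',', '&')
print(final_expr)
Prints:
x1|x3&x5|x2|x4&x6|x5&x7|x6&x8|x7&x9|x6&x8&x10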

Because you are using the csv module, each value of lines is a list of fields, so expr ends up as a flat list containing the variable names and some "|" strings. You can print it to see for yourself. ''.join(expr) then just concatenates all the elements; the commas were already consumed by the csv reader, so there is nothing left to replace.
This should do:
import csv
# Reads Boolean expression from cutsets file
with open("MCS_overlap.csv", "r") as csv_file:
    csv_reader = csv.reader(csv_file)
    # skip the first row
    next(csv_reader)
    lines = ('&'.join(line) for line in csv_reader)
    final_expr = '|'.join(lines)
print(final_expr)
Of course, you can do without the csv module:
with open("MCS_overlap.csv", "r") as csv_file:
    next(csv_file)
    lines = (line.strip().replace(',', "&") for line in csv_file)
    final_expr = '|'.join(lines)
print(final_expr)
Note: neither snippet has been tested, but I expect both to do the task for you.
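For anyone who wants to check the join-based approach without touching a file, here is a minimal sketch. The function name cutsets_to_expr and the in-memory sample are assumptions made for illustration; they are not part of the original question. It also skips blank rows, which the snippets above do not.

```python
import csv
import io


def cutsets_to_expr(lines):
    """Build an OR-of-ANDs expression from an iterable of CSV lines.

    The first row is assumed to be a header (e.g. "cutsets") and is skipped.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row, if there is one
    # Join the fields of each row with '&'; blank rows come back as [] and are dropped.
    terms = ['&'.join(cell.strip() for cell in row) for row in reader if row]
    # Join the per-row terms with '|'.
    return '|'.join(terms)


# Hypothetical sample mirroring the file layout in the question.
sample = io.StringIO("cutsets\nx1\nx3,x5\nx2\nx6,x8,x10\n")
print(cutsets_to_expr(sample))  # x1|x3&x5|x2|x6&x8&x10
```

With a real file, the same function can be given the open file object, e.g. cutsets_to_expr(open("MCS_overlap.csv", newline="")).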

Related

python parsing string to csv format

I have a file containing a line with the following format
aaa=A;bbb=B;ccc=C
I want to convert it to CSV so that the two sides of each equation become columns and each semicolon acts as a row separator. I tried something like this:
f = open("aaa.txt", "r")
with open("ccc.csv", 'w') as csvFile:
    writer = csv.writer(csvFile)
    rows = []
    if f.mode == 'r':
        single = f.readline()
        lns = single.split(";")
        for item in lns:
            rows.append(item.replace("=", ","))
    writer.writerows(rows)
f.close()
csvFile.close()
but I am getting each letter as a column, so the result looks like:
a,a,a,",",A
b,b,b,",",B
c,c,c,",",C,"
The expected result should look like
aaa,A
bbb,B
ccc,C
The following one-line change worked for me:
rows.append(item.split('='))
instead of the existing code
rows.append(item.replace("=", ","))
That way, rows becomes a list of lists, which the writer can consume directly: [['aaa', 'A'], ['bbb', 'B'], ['ccc', 'C']] instead of ['aaa,A', 'bbb,B', 'ccc,C'].
Just write the strings into the target file line by line:
with open("aaa.txt", "r") as f, open("ccc.csv", "w") as csvFile:
    # strip() removes the newline that readline() keeps at the end
    single = f.readline().strip()
    for item in single.split(";"):
        # in text mode "\n" is translated to the platform line ending
        csvFile.write(item.replace("=", ",") + "\n")
The output would be:
aaa,A
bbb,B
ccc,C
It helps to execute the commands interactively and print the values, or to add debug prints to the code (to be removed or commented out once everything works). Here you would have seen that rows is ['aaa,A', 'bbb,B', 'ccc,C'], that is, three strings where there should be three sequences of fields.
Since a string is a (read-only) sequence of characters, writerows uses each character as a field.
So you do not want to replace the = with a comma (,), but to split on the equals sign:
...
        for item in lns:
            rows.append(item.split("=", 1))
...
In addition, the csv module requires the output file to be opened with newline='' to work properly.
So you should have:
with open("ccc.csv", 'w', newline='') as csvFile:
    ...
The parameter to writer.writerows() must be an iterable of rows, which must in turn be iterables of strings or numbers. Since you pass it a list of strings, characters in the strings are treated as separate fields. You can obtain the proper list of rows by splitting the line first on ';', then on '=':
import csv
with open('in.txt') as in_file, open('out.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    line = next(in_file).rstrip('\n')
    rows = [item.split('=') for item in line.split(';')]
    writer.writerows(rows)
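The difference between passing strings and passing lists to writerows can be seen without any files. The sketch below writes into an in-memory buffer; the helper name pairs_to_csv and the lineterminator="\n" setting are choices made for this illustration only.

```python
import csv
import io


def pairs_to_csv(line):
    """Turn 'aaa=A;bbb=B' into CSV text with one key,value row per pair."""
    # Split on ';' for rows, then on the first '=' for the two fields.
    rows = [item.split("=", 1) for item in line.strip().split(";") if item]
    buffer = io.StringIO()
    # lineterminator is set explicitly so the result is the same on every platform.
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


print(pairs_to_csv("aaa=A;bbb=B;ccc=C\n"))

# Passing plain strings instead of lists reproduces the problem from the
# question: each character becomes a field, and the comma gets quoted.
broken = io.StringIO()
csv.writer(broken, lineterminator="\n").writerows(["aaa,A"])
print(broken.getvalue())
```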

Printing matching Regex(re class) not working while looping through a CSV file? (Using Python 3.6 on Win 10)

I am trying to use regex to extract Canadian postal codes from each line of a CSV file.
Environment: Python 3.6 on Win 10. Code tested through Jupyter Notebook and through the Win 10 CLI prompt.
The problem is that I can't seem to get the matching string back when using a for loop over a CSV file.
Using re on a list works fine:
import re
address = ['H1T3R9',
           '/a/b/c/la_seg_x005_y003.npy',
           'H1K 3H3',
           'F2R2V2',
           'H1L 3W6',
           'j1r 4v5',
           '/y',
           'h2r 2x8',
           'J9R 5V9',
           'Non disponible, h2r 2x8, montreal']
# I also tried this one at some point:
# r'^((\d{5}-\d{4})|(\d{5})|([AaBbCcEeGgHhJjKkLlMmNnPpRrSsTtVvXxYy]\d[A-Za-z]\s?\d[A-Za-z]\d))$))
regex = re.compile(r'\b[a-z]\d[a-z]\s\d[a-z]\d\b')
goodPostalCode = filter(regex.search, address)
print(*goodPostalCode)
Output:
j1r 4v5 h2r 2x8 Non disponible, h2r 2x8, montreal
But when adding the CSV component it seems to break.
import re
import csv
with open('data.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        #print(row)
        regex = re.compile(r'\b[a-z]\d[a-z]\s\d[a-z]\d\b')
        postcode = filter(regex.search, row[7])
        print(postcode)
Output:
<filter object at 0x000001E4FA70D908>
A filter object seems to be printed on every iteration.
My understanding was that I could loop through a CSV file, since each line comes back as a list, and then use re to find matching patterns in the string in a specific column by its index.
Where do I go wrong here?
You shouldn't need to use filter in the loop, since the value of row[7] is a string, not a list of strings.
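import csv
import re
regex = re.compile(r'\b[a-z]\d[a-z]\s\d[a-z]\d\b')
codes = []
with open('data.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # row[7] is a single string, so search it directly
        if regex.search(row[7]):
            codes.append(row[7])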
Alternatively, you could create a list of the lines first, and then run filter:
with open('data.csv', newline='') as f:
    reader = csv.reader(f)
    lines = [row[7] for row in reader]
regex = re.compile(r'\b[a-z]\d[a-z]\s\d[a-z]\d\b')
goodPostalCode = filter(regex.search, lines)
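To make the two points above concrete, here is a small sketch that runs on an in-memory list instead of data.csv. The name extract_postal_codes, the optional space (\s?) and the re.IGNORECASE flag are assumptions added for the illustration; the pattern in the question is lowercase-only and requires the space. In Python 3, filter returns a lazy object, which is why printing it shows "<filter object ...>" rather than the matches.

```python
import re

# Assumed variant of the question's pattern: case-insensitive, space optional.
POSTAL_CODE = re.compile(r'\b[a-z]\d[a-z]\s?\d[a-z]\d\b', re.IGNORECASE)


def extract_postal_codes(values):
    """Return every postal-code-like substring found in an iterable of strings."""
    codes = []
    for value in values:
        codes.extend(match.group(0) for match in POSTAL_CODE.finditer(value))
    return codes


# Hypothetical column values, taken from the list in the question.
column = ['H1T3R9', '/y', 'j1r 4v5', 'Non disponible, h2r 2x8, montreal']
print(extract_postal_codes(column))  # ['H1T3R9', 'j1r 4v5', 'h2r 2x8']

# filter() over a single string iterates its characters one at a time,
# so no single character can ever match the pattern.
per_character = list(filter(POSTAL_CODE.search, 'h2r 2x8'))
print(per_character)  # []
```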

python 3 csv reader + Ignore empty records [duplicate]

This is my code. I am able to print each line, but when a blank line appears it prints ; because of the CSV file format, so I want to skip blank lines.
import csv
import time
ifile = open (r"C:\Users\BKA4ABT\Desktop\Test_Specification\RDBI.csv", "rb")
for line in csv.reader(ifile):
    if not line:
        empty_lines += 1
        continue
    print line
If you want to skip all whitespace lines, you should use this test: ' '.isspace().
Since you may want to do something more complicated than just printing the non-blank lines to the console (there is no need for the csv module for that), here is an example that involves a DictReader:
#!/usr/bin/env python
# Tested with Python 2.7
# I prefer this style of importing - hides the csv module
# in case you do from this_file import * inside of __init__.py
import csv as _csv
# Real comments are more complicated ...
def is_comment(line):
    return line.startswith('#')
# Kind of silly wrapper
def is_whitespace(line):
    return line.isspace()
def iter_filtered(in_file, *filters):
    for line in in_file:
        if not any(fltr(line) for fltr in filters):
            yield line
# A disadvantage of this approach is that it requires storing rows in RAM
# However, the largest CSV files I worked with were all under 100 MB
def read_and_filter_csv(csv_path, *filters):
    with open(csv_path, 'rb') as fin:
        iter_clean_lines = iter_filtered(fin, *filters)
        reader = _csv.DictReader(iter_clean_lines, delimiter=';')
        return [row for row in reader]
# Stores all processed lines in RAM
def main_v1(csv_path):
    for row in read_and_filter_csv(csv_path, is_comment, is_whitespace):
        print(row)  # Or do something else with it
# Simpler, less refactored version, does not use with
def main_v2(csv_path):
    # open before try, so that finally never sees an unassigned fin
    fin = open(csv_path, 'rb')
    try:
        reader = _csv.DictReader((line for line in fin if not
                                  line.startswith('#') and not line.isspace()),
                                 delimiter=';')
        for row in reader:
            print(row)  # Or do something else with it
    finally:
        fin.close()
if __name__ == '__main__':
    csv_path = r"C:\Users\BKA4ABT\Desktop\Test_Specification\RDBI.csv"
    main_v1(csv_path)
    print('\n'*3)
    main_v2(csv_path)
Instead of
if not line:
This should work:
if not ''.join(line).strip():
My suggestion would be to just use the csv reader, which splits the file into rows. That way you can simply check whether a row is empty and, if so, continue.
import csv
with open('some.csv', 'r') as csvfile:
    # the delimiter depends on how your CSV separates values
    csvReader = csv.reader(csvfile, delimiter='\t')
    for row in csvReader:
        # check if row is empty
        if not row:
            continue
You can always check the number of separated values in each row, which is a simple and cheap test.
When you iterate over the reader, each row is a list of the comma-separated values. So if the list has no elements (a blank line), we can skip it.
import csv
with open(filename) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    for row in csv_reader:
        if len(row) == 0:
            continue
You can test each row with any(): it is false both for an empty row and for a row that contains only empty strings, so such rows are skipped.
import csv
with open('userlist.csv') as f:
    reader = csv.reader(f)
    user_header = next(reader)       # Add this line if the file has a header
    user_list = []                   # Create a new user list for input
    for row in reader:
        if any(row):                 # Pick up the non-blank rows
            print(row)               # Just for verification
            user_list.append(row)    # Collect the remaining data in the list
This example just prints each row as a list while skipping the empty lines:
import csv
file = open("data.csv", "r")
data = csv.reader(file)
for line in data:
    if line:
        print(line)
file.close()
I find it much clearer than the other provided examples.
import csv
ifile = csv.reader(open(r'C:\Users\BKA4ABT\Desktop\Test_Specification\RDBI.csv', 'rb'), delimiter=';')
for line in ifile:
    # skip rows that are empty or contain only empty cells
    if set(line) <= {''}:
        pass
    else:
        for cell_value in line:
            print(cell_value)
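Since the title asks about Python 3, here is a hedged Python 3 sketch that combines the ideas from the answers above. The generator name iter_nonblank_rows and the sample data are made up for the illustration. It treats a row as blank when it is empty, when it holds only empty cells (a line such as ";;"), or when its cells hold only whitespace.

```python
import csv
import io


def iter_nonblank_rows(lines, delimiter=';'):
    """Yield only the rows that contain at least one non-whitespace cell."""
    for row in csv.reader(lines, delimiter=delimiter):
        # any() is False for [] and for rows whose cells are all '' or spaces.
        if any(cell.strip() for cell in row):
            yield row


# Hypothetical sample: a truly blank line, a delimiter-only line,
# and a line of spaces between delimiters.
sample = io.StringIO("a;b;c\n\n;;\n  ; ;\n1;2;3\n")
for row in iter_nonblank_rows(sample):
    print(row)
```

In Python 3 a real file should be opened in text mode with newline='', for example open(path, newline=''), rather than with 'rb'.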

how to find specific string with a substring python

I have a similar problem to the one in this question: find position of a substring in a string
The difference is that I don't know what my "mystr" is. I know my substring, but the string in the input file could be any number of words in any order; I only know that one of those words contains the substring cola.
For example, a csv file with the header fanta,coca_cola,sprite, in any order.
If my substring is "cola", how can I write code that says
mystr.find('cola')
or
match = re.search(r"[^a-zA-Z](cola)[^a-zA-Z]", mystr)
or
if "cola" in mystr
when I don't know what my "mystr" is?
This is my code:
import csv
with open('first.csv', 'rb') as fp_in, open('second.csv', 'wb') as fp_out:
    reader = csv.DictReader(fp_in)
    rows = [row for row in reader]
    writer = csv.writer(fp_out, delimiter = ',')
    writer.writerow(["new_cola"])
    def headers1(name):
        if "cola" in name:
            return row.get("cola")
    for row in rows:
        writer.writerow([headers1("cola")])
and the first.csv:
fanta,cocacola,banana
0,1,0
1,2,1
so it prints out
new_cola
""
""
when it should print out
new_cola
1
2
Here is a working example:
import csv
with open("first.csv", "rb") as fp_in, open("second.csv", "wb") as fp_out:
    reader = csv.DictReader(fp_in)
    writer = csv.writer(fp_out, delimiter = ",")
    writer.writerow(["new_cola"])
    def filter_cola(row):
        for k,v in row.iteritems():
            if "cola" in k:
                yield v
    for row in reader:
        writer.writerow(list(filter_cola(row)))
Notes:
rows = [row for row in reader] is unnecessary and inefficient (it turns the lazy reader into a list, which consumes a lot of memory for huge data).
Instead of return row.get("cola") you probably meant return row.get(name); either way the lookup returns None, because no column is named exactly "cola".
In the statement return row.get("cola") you access a variable (row) from outside the function's scope.
You can also use the unix tool cut. For example:
cut -d "," -f 2 < first.csv > second.csv
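The working example above is Python 2 (binary file modes and iteritems). A rough Python 3 equivalent is sketched below on in-memory data; the function name select_columns is an assumption for the illustration. It looks up the matching header names once and then pulls those columns from every row.

```python
import csv
import io


def select_columns(lines, substring):
    """Return, for each data row, the values of columns whose header contains substring."""
    reader = csv.DictReader(lines)
    # fieldnames is read from the first row; it is None for empty input.
    matching = [name for name in (reader.fieldnames or []) if substring in name]
    return [[row[name] for name in matching] for row in reader]


# Sample data copied from first.csv in the question.
sample = io.StringIO("fanta,cocacola,banana\n0,1,0\n1,2,1\n")
print(select_columns(sample, "cola"))  # [['1'], ['2']]
```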
