CSV to TXT Python conversion script leaves file with 0 bytes

I have an SSIS loop package that calls a Python script multiple times.
The intent:
There is a folder of CSV files. I need them converted to pipe-delimited text files. Some of the files have bad rows in them. The Python script converts the CSV files into the pipe-delimited files while removing the bad records.
The Python code:
import csv
import sys

if len(sys.argv) != 4:
    print(sys.argv)
    sys.exit("usage: python csvtopipe.py <<SOURCE.csv>> <<TARGET.txt>> <<number of columns>>")

source = sys.argv[1]
target = sys.argv[2]
colcount = sys.argv[3]

file_comma = open(source, "r", encoding="unicode_escape")
reader_comma = csv.reader(file_comma, delimiter=',')
file_pipe = open(target, 'w', encoding="utf-8")
writer_pipe = csv.writer(file_pipe, delimiter='|', lineterminator='\n')

for row in reader_comma:
    if len(row) == int(colcount):
        print("write this..")
        writer_pipe.writerow(row)

file_pipe.close()
file_comma.close()
The SSIS Package:
The python call from SSIS:
python csvtopipe.py <<SOURCE.csv>> <<TARGET.txt>> <<number of columns>>
The problem:
The loop works correctly, but when an individual call finishes, the file is re-written to 0 bytes. I can't tell if it's an SSIS problem or a Python problem.
Thanks!
UPDATE 1
This is the original version of the code. Same result:
import csv
import sys

if len(sys.argv) != 4:
    print(sys.argv)
    sys.exit("usage: python csvtopipe.py <<SOURCE.csv>> <<TARGET.txt>> <<number of columns>>")

source = sys.argv[1]
target = sys.argv[2]
colcount = sys.argv[3]

with open(source, "r", encoding="unicode_escape") as file_comma:
    reader_comma = csv.reader(file_comma, delimiter=',')
    with open(target, 'w', encoding="utf-8") as file_pipe:
        writer_pipe = csv.writer(file_pipe, delimiter='|', lineterminator='\n')
        for row in reader_comma:
            if len(row) == int(colcount):
                print("write")
                writer_pipe.writerow(row)

Firstly, I would switch to using with open()... rather than separate open() and close() calls. This helps ensure that the file is automatically closed in the event of a problem.
As the script is being invoked multiple times, I would add a timestamp to your output filename. This helps ensure that each run produces a different file.
Lastly, you could add a test to ensure that only one copy of the script executes at a time. For Windows-based applications this can be done with a Windows mutex; on Linux, a file lock can be used. This approach is sometimes referred to as the singleton pattern.
import win32event
import win32api
from winerror import ERROR_ALREADY_EXISTS
from datetime import datetime
import csv
import sys
import os
import time

if len(sys.argv) != 4:
    print(sys.argv)
    sys.exit("usage: python csvtopipe.py <<SOURCE.csv>> <<TARGET.txt>> <<number of columns>>")

# Wait up to 30 seconds for another copy of the script to stop running
windows_mutex = win32event.CreateMutex(None, False, 'CSV2PIPE')
win32event.WaitForSingleObject(windows_mutex, 30000)

source = sys.argv[1]
target = sys.argv[2]
colcount = sys.argv[3]

# Add a timestamp to the output filename
path, ext = os.path.splitext(target)
timestamp = datetime.now().strftime("%Y_%m_%d %H%M_%S")
target = f'{path}_{timestamp}{ext}'

with open(source, "r", encoding="unicode_escape") as file_comma, \
        open(target, 'w', encoding="utf-8") as file_pipe:
    reader_comma = csv.reader(file_comma, delimiter=',')
    writer_pipe = csv.writer(file_pipe, delimiter='|', lineterminator='\n')
    for row in reader_comma:
        if len(row) == int(colcount):
            print("write this..")
            writer_pipe.writerow(row)
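For the Linux file-lock variant mentioned above, a minimal sketch using fcntl.flock (the lock-file path and the exit message are illustrative, not part of the original script):

```python
import fcntl
import sys

# Take an exclusive, non-blocking lock on a well-known lock file.
# If another copy of the script already holds the lock, bail out
# rather than writing to the same output concurrently.
lock_file = open('/tmp/csv2pipe.lock', 'w')
try:
    fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
except BlockingIOError:
    sys.exit('another copy of csvtopipe.py is already running')

# ... conversion work goes here; the lock is released automatically
# when the process exits and the file is closed.
```

Dropping LOCK_NB would instead make the second copy block until the first one finishes, mirroring the 30-second wait in the Windows version.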

Read from file while it is being written to in Python?

I followed the solution proposed here
In order to test it, I used two programs, writer.py and reader.py respectively.
# writer.py
import time

with open('pipe.txt', 'w', encoding='utf-8') as f:
    i = 0
    while True:
        f.write('{}'.format(i))
        print('I wrote {}'.format(i))
        time.sleep(3)
        i += 1
# reader.py
import time, os

# Set the filename and open the file
filename = 'pipe.txt'
file = open(filename, 'r', encoding='utf-8')

# Find the size of the file and move to the end
st_results = os.stat(filename)
st_size = st_results[6]
file.seek(st_size)

while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        time.sleep(1)
        file.seek(where)
    else:
        print(line)
But when I run:
> python writer.py
> python reader.py
the reader will print the lines after the writer has exited (when I kill the process)
Is there another way to read the contents while they are being written?
[EDIT]
The program that actually writes to the file is an .exe application and I don't have access to the source code.
You need to flush your writes/prints to files, or they'll default to being block-buffered (so you'd have to write several kilobytes before the user-mode buffer would actually be sent to the OS for writing).
The simplest solution is to call .flush() after write calls:
f.write('{}'.format(i))
f.flush()
There are two different problems here:
The OS and file system must allow concurrent access to a file. If you get no error, that is the case, but on some systems it could be disallowed.
The writer must flush its output for it to reach the disk so that the reader can find it. If you do not, the output stays in an in-memory buffer until those buffers are full, which can require several kilobytes.
So the writer should become:
# writer.py
import time

with open('pipe.txt', 'w', encoding='utf-8') as f:
    i = 0
    while True:
        f.write('{}'.format(i))
        f.flush()
        print('I wrote {}'.format(i))
        time.sleep(3)
        i += 1

Why does my application close immediately after launching

I am a beginner to the Python language. I have two questions.
When I run auto-py-to-exe, my application does not work correctly.
Up to line 8 of my code, I want the user to select the .csv file to upload to the program. After running auto-py-to-exe, the application closes and does not let the user select the file to upload.
What am I missing in my code to have the application stay open until the user selects the required file, then run the rest of the code?
When running this .py file in PyCharm it works as it should.
This next question is unrelated to this topic but...
How do I pass a filename dynamically?
In line 33 of my code I will be writing a specially formatted .csv as result.csv to a local dir. How do I pass the variable "filename" as a filename in the path statement in line 33?
df.to_csv('C:\WriteFilesHere\ result.csv')
When I run the program it always overwrites "result.csv" with the new file selected in line 8 of my code. I want a new file created with a different name every time I run the program.
from tkinter import *
import pandas as pd
import csv
from tkinter.filedialog import askopenfilename

Tk().withdraw()
filename = askopenfilename()
print(filename)

with open('preliminary.csv', 'w') as output:
    with open(filename) as csv_file:
        output_data = csv.writer(output, delimiter=",")
        data = csv.reader(csv_file, delimiter=',')
        line_count = 0
        for row in data:
            if line_count == 0:
                output_data.writerow(row)
                line_count += 1
            else:
                output_data.writerow(row)
                line_count += 1

with open("preliminary.csv") as readfile:
    reader1 = csv.reader(readfile)
    read = []
    for row in reader1:
        if len(row) != 0:
            read = read + [row]
readfile.close()

pd.set_option('display.max_rows', 4000)
df = pd.DataFrame(read)
print(df)
df.to_csv('C:\WriteFilesHere\ result.csv')
About the exe: you can declare a variable and add this at the end of your program:
DonotClose = input("press close to exit")
It will pause the program until the user presses a key; you can use any variable name you like.
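For the second question (avoiding the overwrite of result.csv), one option is to derive the output name from the selected file. A minimal sketch, where the helper name, the timestamp format, and the C:\WriteFilesHere directory are illustrative choices:

```python
import os
from datetime import datetime

def build_output_path(selected_file, out_dir=r'C:\WriteFilesHere'):
    # Combine the chosen file's stem with a timestamp so each run
    # produces a new file instead of overwriting result.csv.
    stem = os.path.splitext(os.path.basename(selected_file))[0]
    stamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    return os.path.join(out_dir, f'{stem}_{stamp}_result.csv')

# e.g. df.to_csv(build_output_path(filename)) instead of a hard-coded path
print(build_output_path('sales.csv', out_dir='.'))
```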

Terminal in PyCharm not showing me an output

This is my test code, but I have a more elaborate one - neither works. In Python 3.x.
import sys

def main():
    inputfile = 'hi'
    print(inputfile)

if __name__ == '__main__':
    main()
EDIT: This is what I want to use the terminal for (and for syntax errors - same problem):
import csv
import sys
import json

inputfile = sys.argv[1]
outputfile = sys.argv[2]

# reading the csv
with open(inputfile, 'r') as inhandle:  # r is reading while w is writing
    reader = csv.DictReader(inhandle)
    data = []
    for row in reader:
        data.append(row)
    print(data)

# writing the json
with open(outputfile, "W") as outhandle:
    json.dump(data, outhandle, indent=2)
As far as I understand from the code you've attached, hi must be written as 'hi'. In your original code, hi is treated as another variable being assigned to inputfile, but it was never defined.
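To make the difference concrete, a quick sketch:

```python
# Quoted 'hi' is a string literal; bare hi would be a variable lookup.
inputfile = 'hi'
print(inputfile)  # hi

try:
    other = hi  # no variable named hi has been defined
except NameError as err:
    print('NameError:', err)
```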

Exit part of a script in Spyder

I am working on a simple task of appending multiple CSV files and adding an extra column to each.
The following code works perfectly in Python prompt shell:
import csv
import glob
import os

data_path = "C:/Users/mmorenozam/Documents/Python Scripts/peptidetestivory/"
outfile_path = "C:/Users/mmorenozam/Documents/Python Scripts/peptidetestivory/alldata.csv"
filewriter = csv.writer(open(outfile_path, 'wb'))

file_counter = 0
for input_file in glob.glob(os.path.join(data_path, '*.csv')):
    with open(input_file, 'rU') as csv_file:
        filereader = csv.reader(csv_file)
        name, ext = os.path.splitext(input_file)
        ident = name[-29:-17]
        for i, row in enumerate(filereader):
            row.append(ident)
            filewriter.writerow(row)
    file_counter += 1
However, when I run this code in Spyder, in order to get the desired .csv file I have to add
exit()
or type "%reset" in the IPython console.
Is there a better way to finish this part of the script? The following parts of my code work with the .csv file generated here, and using the options above is annoying.
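A likely cause is that the open() wrapped inside csv.writer(...) is never closed, so the rows sit in a buffer until the interpreter exits (which is what exit() or %reset forces). A minimal sketch of the same loop with the output file managed by a with-block; the temp-dir demo data, the stem-based ident, and the Python 3 file modes are my substitutions for the original paths, the name[-29:-17] slice, and the Python 2 'wb'/'rU' modes:

```python
import csv
import glob
import os
import tempfile

# Stand-in for the peptidetestivory directory: a temp folder with two CSVs
data_path = tempfile.mkdtemp()
for name in ('a.csv', 'b.csv'):
    with open(os.path.join(data_path, name), 'w', newline='') as f:
        csv.writer(f).writerow(['x', 'y'])

outfile_path = os.path.join(data_path, 'alldata.csv')

# Opening the output in a with-block guarantees it is flushed and closed
# as soon as the loop finishes, so no exit() or %reset is needed.
with open(outfile_path, 'w', newline='') as outfile:
    filewriter = csv.writer(outfile)
    for input_file in glob.glob(os.path.join(data_path, '*.csv')):
        if input_file == outfile_path:
            continue  # don't read the output file we are writing
        ident = os.path.splitext(os.path.basename(input_file))[0]
        with open(input_file, newline='') as csv_file:
            for row in csv.reader(csv_file):
                row.append(ident)
                filewriter.writerow(row)

with open(outfile_path, newline='') as f:
    rows = sorted(csv.reader(f))
print(rows)
```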

Add file name as last column of CSV file

I have a Python script which modifies a CSV file to add the filename as the last column:
import sys
import glob

for filename in glob.glob(sys.argv[1]):
    file = open(filename)
    data = [line.rstrip() + "," + filename for line in file]
    file.close()
    file = open(filename, "w")
    file.write("\n".join(data))
    file.close()
Unfortunately, it also adds the filename to the header (first) row of the file. I would like the string "ID" added to the header instead. Can anybody suggest how I could do this?
Have a look at the official csv module.
Here are a few minor notes on your current code:
It's a bad idea to use file as a variable name, since that shadows the built-in type.
You can close the file objects automatically by using the with syntax.
Don't you want to add an extra column in the header line, called something like Filename, rather than just omitting a column in the first row?
If your filenames have commas (or, less probably, newlines) in them, you'll need to make sure that the filename is quoted - just appending it won't do.
That last consideration would incline me to use the csv module instead, which will deal with the quoting and unquoting for you. For example, you could try something like the following code:
import glob
import csv
import sys

for filename in glob.glob(sys.argv[1]):
    data = []
    with open(filename) as finput:
        for i, row in enumerate(csv.reader(finput)):
            to_append = "Filename" if i == 0 else filename
            data.append(row + [to_append])
    with open(filename, 'wb') as foutput:
        writer = csv.writer(foutput)
        for row in data:
            writer.writerow(row)
That may quote the data slightly differently from your input file, so you might want to play with the quoting options for csv.reader and csv.writer described in the documentation for the csv module.
As a further point, you might have good reasons for taking a glob as a parameter rather than just the files on the command line, but it's a bit surprising - you'll have to call your script as ./whatever.py '*.csv' rather than just ./whatever.py *.csv. Instead, you could just do:
for filename in sys.argv[1:]:
... and let the shell expand your glob before the script knows anything about it.
One last thing - the current approach you're taking is slightly dangerous, in that if anything fails when writing back to the same filename, you'll lose data. The standard way of avoiding this is to instead write to a temporary file, and, if that was successful, rename the temporary file over the original. So, you might rewrite the whole thing as:
import csv
import sys
import tempfile
import shutil

for filename in sys.argv[1:]:
    tmp = tempfile.NamedTemporaryFile(delete=False)
    with open(filename) as finput:
        with open(tmp.name, 'wb') as ftmp:
            writer = csv.writer(ftmp)
            for i, row in enumerate(csv.reader(finput)):
                to_append = "Filename" if i == 0 else filename
                writer.writerow(row + [to_append])
    shutil.move(tmp.name, filename)
You can try:
data = [file.readline().rstrip() + ",id"]
data += [line.rstrip() + "," + filename for line in file]
You can try changing your code, but using the csv module is recommended. This should give you the result you want:
import sys
import glob
import csv

filename = glob.glob(sys.argv[1])[0]
yourfile = csv.reader(open(filename, 'r'))

csv_output = []
for row in yourfile:
    if len(csv_output) != 0:  # skip the header
        row.append(filename)
    csv_output.append(row)

yourfile = csv.writer(open(filename, 'w'), delimiter=',')
yourfile.writerows(csv_output)
Use the CSV module that comes with Python.
import csv
import sys

def process_file(filename):
    # Read the contents of the file into a list of lines.
    with open(filename, 'r') as f:
        contents = f.readlines()

    # Use a CSV reader to parse the contents.
    reader = csv.reader(contents)

    # Open the output and create a CSV writer for it.
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)

        # Process the header.
        header = next(reader)
        header.append('ID')
        writer.writerow(header)

        # Process each row of the body.
        for row in reader:
            row.append(filename)
            writer.writerow(row)

# Run the function on all command-line arguments. Note that this does no
# checking for things such as file existence or permissions.
for filename in sys.argv[1:]:
    process_file(filename)
You can run this as follows:
blair#blair-eeepc:~$ python csv_add_filename.py file1.csv file2.csv
You can use fileinput to do in-place editing:
import sys
import glob
import fileinput

for filename in glob.glob(sys.argv[1]):
    for line in fileinput.FileInput(filename, inplace=1):
        if fileinput.lineno() == 1:
            print(line.rstrip() + ",ID")
        else:
            print(line.rstrip() + "," + filename)

Categories