I am trying to read the first column of my CSV, run a web service on this column, take the output from it, and append it to my CSV. I'd like to do this on a line-by-line basis.
Here is what I have come up with so far:
loadData = lambda f: np.genfromtxt(open(f,'r'), delimiter='\n')
with open('FinalCSV.csv','rb') as tsvin, open('FinalCSV.csv', 'a+b') as csvout:
    tsvin = list(np.array(p.read_table('train.tsv'))[:,0])
    writer = csv.writer(csvout)
    count = 0

    for row in csvout:
        sep = '|'
        row = row.split(sep, 1)[0]
        cmd = subprocess.Popen("python GetJustAlexaRanking.py " + row,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               shell=True)
        (output, err) = cmd.communicate()
        exit_code = cmd.wait()
        outlist = output.split('\r\n')
        try:
            outrank1 = outlist[1][outlist[1].index(':')+1:]
        except ValueError:
            outrank1 = "?"
        row.append(str(outrank1).rstrip()) #writing,error here
        print [str(outlist[0]).rstrip(), str(outrank1).rstrip()]
        count += 1
However, this gives me the error:
Traceback (most recent call last):
File "File.py", line 28, in <module>
row.append(str(outrank1).rstrip()) #writing,error here
AttributeError: 'str' object has no attribute 'append'
How can I accomplish what I wish to do?
Edit:
loadData = lambda f: np.genfromtxt(open(f,'r'), delimiter='\n')
with open('FinalCSV.csv','rb') as tsvread, open('FinalCSVFin.csv', 'wb') as csvout:
    tsvin = list(np.array(p.read_table('train.tsv'))[:,0])
    writer = csv.writer(csvout)
    count = 0

    for row in tsvread:
        sep = '|'
        row = row.split(sep, 1)[0]
        cmd = subprocess.Popen("python GetJustAlexaRanking.py " + row,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               shell=True)
        (output, err) = cmd.communicate()
        exit_code = cmd.wait()
        outlist = output.split('\r\n')
        try:
            outrank1 = outlist[1][outlist[1].index(':')+1:]
        except ValueError:
            outrank1 = "?"
        row = [row, outrank1.rstrip()]
        writer.writerow(row)
        print [str(outlist[0]).rstrip(), str(outrank1).rstrip()]
        count += 1
Your row is not a list, but a string:
row = row.split(sep, 1)[0]
You then use that string in a subprocess command.
You'll need to make it a list again; instead of append, use:
row = [row, outrank1.rstrip()]
where outrank1 is always a string anyway, so there is no need to call str() on it.
Note that if you are trying to both read from and write to the csvout file handle, you'll have to be very careful about your read/write position. You cannot simply write to a file handle and expect it to replace existing data in place, for example. It is best to write to a separate, new file and then replace the old file by moving the new one over it.
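A minimal sketch of that write-then-replace pattern (Python 3 syntax; get_rank() here is a hypothetical stand-in for the external GetJustAlexaRanking.py lookup):

import csv
import os

def get_rank(url):
    # hypothetical placeholder for the external ranking lookup
    return "?"

with open('FinalCSV.csv', 'r') as src, open('FinalCSV.csv.tmp', 'w', newline='') as dst:
    writer = csv.writer(dst)
    for line in src:
        url = line.split('|', 1)[0].strip()
        writer.writerow([url, get_rank(url)])

# swap the new file into place only after it has been written completely
os.replace('FinalCSV.csv.tmp', 'FinalCSV.csv')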
Related
I am writing Python code to remove all the text after a specific word, but some lines are missing from the output. I have a Unicode text file with 3 lines:
my name is test1
my name is
my name is test 2
What I want is to remove the text after the word "test", so I get the output below:
my name is test
my name is
my name is test
I have written code that does the task, but it also removes the second line, "my name is".
My code is below:
txt = ""
with open(r"test.txt", 'r') as fp:
    for line in fp.readlines():
        splitStr = "test"
        index = line.find(splitStr)
        if index > 0:
            txt += line[:index + len(splitStr)] + "\n"

with open(r"test.txt", "w") as fp:
    fp.write(txt)
It looks like the index becomes -1 when the keyword is not found, so you are skipping the lines without the keyword. I would modify your if by adding a branch, as follows:
txt = ""
with open(r"test.txt", 'r') as fp:
    for line in fp.readlines():
        splitStr = "test"
        index = line.find(splitStr)
        if index > 0:
            txt += line[:index + len(splitStr)] + "\n"
        elif index < 0:
            txt += line

with open(r"test.txt", "w") as fp:
    fp.write(txt)
No need to add \n because the line already contains it.
Your code does not append the line if splitStr is not found in it.
txt = ""
with open(r"test.txt", 'r') as fp:
    for line in fp.readlines():
        splitStr = "test"
        index = line.find(splitStr)
        if index != -1:
            txt += line[:index + len(splitStr)] + "\n"
        else:
            txt += line

with open(r"test.txt", "w") as fp:
    fp.write(txt)
In my solution I simulate the input file via io.StringIO. Compared to your code, my solution removes the else branch and uses only one += operator. Also, splitStr is set only once instead of on each iteration. This makes the code clearer and reduces possible error sources.
import io
# simulates a file for this example
the_file = io.StringIO("""my name is test1
my name is
my name is test 2""")
txt = ""
splitStr = "test"
with the_file as fp:
    # each line
    for line in fp.readlines():
        # cut something?
        if splitStr in line:
            # find index
            index = line.find(splitStr)
            # cut after 'splitStr' and add newline
            line = line[:index + len(splitStr)] + "\n"
        # append line to output
        txt += line

print(txt)
When handling files in Python 3, it is recommended to use pathlib, like this:
import pathlib
file_path = pathlib.Path("test.txt")
# read from the file
with file_path.open('r') as fp:
    # do something

# write back to the file
with file_path.open('w') as fp:
    # do something
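For example, a sketch of the same truncate-after-"test" task written with pathlib (this just combines the pathlib pattern above with the fix already shown; the file name and keyword come from the question):

import pathlib

file_path = pathlib.Path("test.txt")
splitStr = "test"

# read all lines from the file
with file_path.open('r') as fp:
    lines = fp.readlines()

# cut each line after the keyword; keep lines without it unchanged
out = []
for line in lines:
    index = line.find(splitStr)
    if index != -1:
        line = line[:index + len(splitStr)] + "\n"
    out.append(line)

# write the result back to the same file
with file_path.open('w') as fp:
    fp.writelines(out)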
Suggestion:
for line in fp.readlines():
    i = line.find('test')
    if i != -1:
        line = line[:i + len('test')] + '\n'
I am working on a program to concatenate rows within a file. Each file has a header, data rows labeled DAT001 to DAT113, and a trailer. Each concatenated line will have DAT001 to DAT100, and 102-113 are optional. I need to print the header, concatenate DAT001-113, and when the file hits a row with DAT001, start a new line and concatenate DAT001-113 again. After that is all done, I will print the trailer. I have an if statement started, but it only writes the header and skips all the other logic. I apologize that this is very basic, but I am struggling with reading rows over and over again without knowing how long the file might be.
I have tried the code below, but it won't read or print anything after the header.
import pandas as pd
destinationFile = "./destination-file.csv"
sourceFile = "./TEST.txt"
header = "RHR"
data = "DPSPOS"
beg_data = "DAT001"
data2 = "DAT002"
data3 = "DAT003"
data4 = "DAT004"
data5 = "DAT005"
data6 = "DAT006"
data7 = "DAT007"
data8 = "DAT008"
data100 = "DAT100"
data101 = "DAT101"
data102 = "DAT102"
data103 = "DAT103"
data104 = "DAT104"
data105 = "DAT105"
data106 = "DAT106"
data107 = "DAT107"
data108 = "DAT108"
data109 = "DAT109"
data110 = "DAT110"
data111 = "DAT111"
data112 = "DAT112"
data113 = "DAT113"
req_data = ''
opt101 = ''
opt102 = ''
with open(sourceFile) as Tst:
    for line in Tst.read().split("\n"):
        if header in line:
            with open(destinationFile, "w+") as dst:
                dst.write(line)
        elif data in line:
            if beg_data in line:
                req_data = line+line+line+line+line+line+line+line+line
            if data101 in line:
                opt101 = line
            if data102 in line:
                opt102 = line
            new_line = pd.concat(req_data,opt101,opt102)
            with open(destinationFile, "w+") as dst:
                dst.write(new_line)
        else:
            if trailer in line:
                with open(destinationFile, "w+") as dst:
                    dst.write(line)
Just open the output file once for the whole loop, not every time through the loop.
Check whether the line begins with DAT001. If it does, write the trailer to end the current line and start a new line by writing the header.
Then, for every line that begins with DAT, write it to the file on the current line.
first_line = True
with open(sourceFile) as Tst, open(destinationFile, "w+") as dst:
    for line in Tst.read().split("\n"):
        # start a new line when reading DAT001
        if line.startswith(beg_data):
            if not first_line:  # need to end the current line
                dst.write(trailer + '\n')
            first_line = False
            dst.write(header)
        # copy all the lines that begin with `DAT`
        if line.startswith('DAT'):
            dst.write(line)
    # end the last line
    dst.write(trailer + '\n')
See if the following code helps you make progress. It was not tested, because no minimal runnable example was provided.
with open(destinationFile, "a") as dst:
    # The above will keep the file open until after all the indented code runs
    with open(sourceFile) as Tst:
        # The above will keep the file open until after all the indented code runs
        for line in Tst.read().split("\n"):
            if header in line:
                dst.write(line)
            elif data in line:
                if beg_data in line:
                    req_data = line + line + line + line + line + line + line + line + line
                if data101 in line:
                    opt101 = line
                if data102 in line:
                    opt102 = line
                # plain string concatenation; pd.concat expects pandas objects, not strings
                new_line = req_data + opt101 + opt102
                dst.write(new_line)
            else:
                if trailer in line:
                    dst.write(line)
# with is a context manager which will automatically close the files.
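If you prefer, the two files can also be opened in a single with statement (a small sketch, using the same names as above):

with open(sourceFile) as Tst, open(destinationFile, "a") as dst:
    for line in Tst.read().split("\n"):
        ...  # same processing as above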
I have a script that outputs a text file (Mod_From_SCRSTXT.txt). I need to delete the first line of that file.
I have tried changing the last line of the find function shown below. The first line still gets printed in the new file, even with the changes.
def find(substr, infile, outfile):
    with open(infile) as a, open(outfile, 'a') as b:
        for line in a:
            if substr in line:
                b.write(line[1:])
srcn_path1 = input(" Enter Path. Example: U:\...\...\SRCNx\SCRS.TXT\n" +
                   " Enter SRCS.TXT's Path: ")
print ()

scrNumber1 = input(' Enter SCR number: ')
print ()

def find(substr, infile, outfile):
    with open(infile) as a, open(outfile, 'a') as b:
        for line in a:
            if substr in line:
                b.write(line) # or (line + '\n')

# action station:
find(scrNumber1, srcn_path1, 'Mod_From_SCRSTXT.txt')
Actual result:
VSOAU-0004 16999
VSOAU-0004
VSOAU-0004
VSOAU-0004
VSOAU-0004
Expected result:
VSOAU-0004
VSOAU-0004
VSOAU-0004
VSOAU-0004
You'll want to make a minor adjustment.
You can either count the lines in the file first and then iterate by line number, skipping the first one:

numberOfLines = 0
for line in file:
    numberOfLines += 1

for lineNumber in range(2, numberOfLines + 1):
    # process lines 2 through numberOfLines

Or you can ignore the first line in any of several ways, this being a simple one:

ignoredLine = 0
for line in file:
    if not ignoredLine:
        ignoredLine = 1
    else:
        # do stuff with the other lines
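A sketch of how that flag idea could be applied to the find function from the question, so that the first matching line is skipped (this assumes the first match is the unwanted line):

def find(substr, infile, outfile):
    skipped_first = False
    with open(infile) as a, open(outfile, 'a') as b:
        for line in a:
            if substr in line:
                if not skipped_first:
                    skipped_first = True  # drop the first matching line
                    continue
                b.write(line)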
import pathlib
import os
import copy
import io
def delete_first_line(read_path):
    try:
        read_path = pathlib.Path(str(read_path))
        write_path = str(copy.copy(read_path)) + ".temp"
        while os.path.exists(write_path):
            write_path = write_path + ".temp"
        with open(read_path, mode="r") as inf:
            with open(write_path, mode="w") as outf:
                it_inf = iter(inf)
                next(it_inf)  # discard first line
                for line in it_inf:
                    # line already ends with '\n', so suppress print's own newline
                    print(line, end="", file=outf)
        os.remove(read_path)
        os.rename(write_path, read_path)
    except StopIteration:
        with io.StringIO() as string_stream:
            print(
                "Cannot remove first line from an empty file",
                read_path,
                file=string_stream,
                sep="\n"
            )
            msg = string_stream.getvalue()
        raise ValueError(msg)
    except FileNotFoundError:
        with io.StringIO() as string_stream:
            print(
                "Cannot remove first line from a non-existent file",
                read_path,
                file=string_stream,
                sep="\n"
            )
            msg = string_stream.getvalue()
        raise ValueError(msg)
    finally:
        pass
    return
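Usage would then be a single call on the file produced by the script (the file name comes from the question):

delete_first_line("Mod_From_SCRSTXT.txt")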
I am writing a Python script to read in a file, go through that file line by line, and parse data from it out to another text file according to the user's command-line arguments. Right now, I am able to read the input file line by line and parse out the data via the command-line arguments. However, the output file that I am writing to ends up with everything on a single line rather than broken up line by line.
temp.log:
06 May 19 03:40:35 3 abCodeClearTrap Error Clear Trap (agent: 12367a12,
chassis:12367a12, ErrIdText: ERROR ID TEXT, csssi: EXTIFG, clearedID:
0x089088394)
06 May 19 03:44:35 3 abCodeErrorTrap Error Trap (agent: 12368a15, chassis:
12368a15, ErrIdText: Skip this item, csssi: SSRSSR, clearedID:
0x089088394)
My code:
import re, sys
with open('temp.log') as f:
    lines = f.readlines()

with open('output.txt') as o:
    data = []
    for line in lines:
        if 'date' in sys.argv:
            try:
                date = re.match(r'\date{2} \w+ \date{2}', line).group()
                row.append(date)
            except:
                date = 'date'
        if 'agent' in sys.argv:
            try:
                agent = re.search(r'agent:\s(.*?),', line).group()
                row.append(agent)
            except:
                agent = 'agent:'
        if 'err' in sys.argv:
            try:
                errID = re.search(r'ErrIdText:\s(.*?),', line).group()
                row.append(errID)
            except:
                errID = 'ErrIdText:'
        if 'clear' in sys.argv:
            try:
                clear = re.search(r'clearedID:\s(.*?)\)', line).group()
                row.append(clear)
            except:
                clear = 'clearedID:'
        row = []
        data.append(row)
    for row in data:
        lines = o.writelines(row)
        print(row)
    o.close()
There is no error message, but I want my output.txt file to be written line by line.
For example:
If the user runs:
python export.py agent chassis
I expect output.txt to contain:
['agent: 12367a12,', 'chassis:12367a12,']
['agent: 12368a15,', 'chassis:12368a15,']
But the output in output.txt is:
agent:12367a12, chassis:12367a12, agent:12368a15, chassis:12368a15,
Here you go :)
for row in data:
    lines = o.writelines(row)
    lines = o.writelines("\n")
    print(row)
or
for row in data:
    row.append("\n")
    lines = o.writelines(row)
    print(row)
By the way, I am surprised that this code works at all, because you define row after you use it:
# ( * )
if something:
    try:
        date = re.match(r'\date{2} \w+ \date{2}', line).group()
        row.append(date)  # should crash
    except:
        date = 'date'
if 'agent' in sys.argv:
    try:
        agent = re.search(r'agent:\s(.*?),', line).group()
        row.append(agent)  # should crash
    except:
        agent = 'agent:'
if 'err' in sys.argv:
    try:
        errID = re.search(r'ErrIdText:\s(.*?),', line).group()
        row.append(errID)  # should crash
    except:
        errID = 'ErrIdText:'
if 'clear' in sys.argv:
    try:
        clear = re.search(r'clearedID:\s(.*?)\)', line).group()
        row.append(clear)  # should crash
    except:
        clear = 'clearedID:'
row = []  # this should be defined where I put the star ( * )
data.append(row)  # always appends an empty row ( [] )
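A sketch of the corrected structure, with row initialized at the top of the loop (the ( * ) position) and a newline written after each row; the chassis pattern here is an assumption based on the sample log, since the original code has no chassis branch:

import re
import sys

with open('temp.log') as f:
    lines = f.readlines()

data = []
for line in lines:
    row = []  # start a fresh row for each input line (the ( * ) position)
    if 'agent' in sys.argv:
        m = re.search(r'agent:\s(.*?),', line)
        if m:
            row.append(m.group())
    if 'chassis' in sys.argv:
        m = re.search(r'chassis:\s*(.*?),', line)  # assumed pattern
        if m:
            row.append(m.group())
    if row:
        data.append(row)

with open('output.txt', 'w') as o:
    for row in data:
        o.writelines(row)
        o.write("\n")  # one output line per row
        print(row)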
I am doing text processing using the readline() function as follows:
ifd = open(...)
for line in ifd:
    while (condition):
        # do something...
        line = ifd.readline()
        condition = ....
    # Here, when the condition becomes false, I need to rewind the pointer so that the 'for' loop reads the same line again.
ifd.seek() followed by readline() gives me just a '\n' character. How can I rewind the pointer so that the whole line is read again?
>>> ifd.seek(-1,1)
>>> line = ifd.readline()
>>> line
'\n'
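In essence, what I am attempting is to remember the offset before reading a line and jump back to it when the condition fails, something like this simplified sketch of my real code below:

pos = ifd.tell()        # remember where the line starts
line = ifd.readline()
# ... decide the line has to be read again ...
ifd.seek(pos)           # rewind to the start of that line
line = ifd.readline()   # the same full line again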
Here is my code
labtestnames = sorted(tmp)

#Now read each line in the inFile and write into outFile
ifd = open(inFile, "r")
ofd = open(outFile, "w")

#read the header
header = ifd.readline() #Do nothing with this line. Skip

#Write header into the output file
nl = "mrn\tspecimen_id\tlab_number\tlogin_dt\tfluid"
offset = len(nl.split("\t"))
nl = nl + "\t" + "\t".join(labtestnames)
ofd.write(nl+"\n")
lenFields = len(nl.split("\t"))
print "Reading the input file and converting into modified file for further processing (correlation analysis etc..)"
prevTup = (0,0,0)
rowComplete = 0
k=0
for line in ifd:
    k=k+1
    if (k==200): break
    items = line.rstrip("\n").split("\t")
    if((items[0] =='')):
        continue
    newline= list('' for i in range(lenFields))
    newline[0],newline[1],newline[3],newline[2],newline[4] = items[0], items[1], items[3], items[2], items[4]
    ltests = []
    ltvals = []
    while(cmp(prevTup, (items[0], items[1], items[3])) == 0): # If the same mrn, lab_number and specimen_id then fill the same row. else create a new row.
        ltests.append(items[6])
        ltvals.append(items[7])
        pos = ifd.tell()
        line = ifd.readline()
        prevTup = (items[0], items[1], items[3])
        items = line.rstrip("\n").split("\t")
        rowComplete = 1
    if (rowComplete == 1): #If the row is completed, prepare newline and write into outfile
        indices = [labtestnames.index(x) for x in ltests]
        j=0
        ifd.seek(pos)
        for i in indices:
            newline[i+offset] = ltvals[j]
            j=j+1
    if (rowComplete == 0): #
        currTup = (items[0], items[1], items[3])
        ltests = items[6]
        ltvals = items[7]
        pos = ifd.tell()
        line = ifd.readline()
        items = line.rstrip("\n").split("\t")
        newTup = (items[0], items[1], items[3])
        if(cmp(currTup, newTup) == 0):
            prevTup = currTup
            ifd.seek(pos)
            continue
        else:
            indices = labtestnames.index(ltests)
            newline[indices+offset] = ltvals
    ofd.write(newline+"\n")
The problem can be handled more simply using itertools.groupby. groupby can cluster all the contiguous lines that deal with the same mrn, specimen_id, and lab_num.
The code that does this is
for key, group in IT.groupby(reader, key = mykey):
where reader iterates over the lines of the input file, and mykey is defined by
def mykey(row):
    return (row['mrn'], row['specimen_id'], row['lab_num'])
Each row from reader is passed to mykey, and all rows with the same key are clustered together in the same group.
While we're at it, we might as well use the csv module to read each line into a dict (which I call row). This frees us from having to deal with low-level string manipulation like line.rstrip("\n").split("\t") and instead of referring to columns by index numbers (e.g. row[3]) we can write code that speaks in higher-level terms such as row['lab_num'].
import itertools as IT
import csv
inFile = 'curious.dat'
outFile = 'curious.out'
def mykey(row):
    return (row['mrn'], row['specimen_id'], row['lab_num'])

fieldnames = 'mrn specimen_id date lab_num Bilirubin Lipase Calcium Magnesium Phosphate'.split()

with open(inFile, 'rb') as ifd:
    reader = csv.DictReader(ifd, delimiter = '\t')
    with open(outFile, 'wb') as ofd:
        writer = csv.DictWriter(
            ofd, fieldnames, delimiter = '\t', lineterminator = '\n', )
        writer.writeheader()
        for key, group in IT.groupby(reader, key = mykey):
            new = {}
            row = next(group)
            for key in ('mrn', 'specimen_id', 'date', 'lab_num'):
                new[key] = row[key]
            new[row['labtest']] = row['result_val']
            for row in group:
                new[row['labtest']] = row['result_val']
            writer.writerow(new)
yields
mrn specimen_id date lab_num Bilirubin Lipase Calcium Magnesium Phosphate
4419529 1614487 26.2675 5802791G 0.1
3319529 1614487 26.2675 5802791G 0.3 153 8.1 2.1 4
5713871 682571 56.0779 9732266E 4.1
This seems to be a perfect use case for yield expressions. Consider the following example that prints lines from a file, repeating some of them at random:
def buflines(fp):
    r = None
    while True:
        r = yield r or next(fp)
        if r:
            yield None

from random import randint

with open('filename') as fp:
    buf = buflines(fp)
    for line in buf:
        print line
        if randint(1, 100) > 80:
            print 'ONCE AGAIN::'
            buf.send(line)
Basically, if you want to process an item once again, you send it back to the generator. On the next iteration you will be reading the same item once again.
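For instance, here is a sketch (Python 2, matching the example above) of applying the same trick to group consecutive tab-separated lines that share a key, which is essentially what your loop does; io.BytesIO stands in for the real input file:

import io

fp = io.BytesIO(b"a\t1\na\t2\nb\t3\nb\t4\n")   # stand-in for the real file

buf = buflines(fp)
for line in buf:
    current = line.split('\t')[0]
    group = [line]
    for nxt in buf:
        if nxt.split('\t')[0] == current:
            group.append(nxt)
        else:
            buf.send(nxt)   # push the line back; the outer loop sees it next
            break
    print group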