I am writing a program that concatenates rows within a file. Each file has a header, data rows labeled DAT001 to DAT113, and a trailer. Each concatenated line will contain DAT001 to DAT100; rows 102-113 are optional. I need to print the header, then concatenate DAT001-113 onto one line, and whenever the program finds another DAT001 row, start a new line and concatenate DAT001-113 again. When that is all done, I print the trailer. I have started an IF statement, but it only writes the header and skips all the other logic. I apologize that this is very basic, but I am struggling with reading rows over and over without knowing how long the file might be.
I have tried the code below, but it does not read or print anything after the header.
import pandas as pd
destinationFile = "./destination-file.csv"
sourceFile = "./TEST.txt"
header = "RHR"
data = "DPSPOS"
beg_data = "DAT001"
data2 = "DAT002"
data3 = "DAT003"
data4 = "DAT004"
data5 = "DAT005"
data6 = "DAT006"
data7 = "DAT007"
data8 = "DAT008"
data100 = "DAT100"
data101 = "DAT101"
data102 = "DAT102"
data103 = "DAT103"
data104 = "DAT104"
data105 = "DAT105"
data106 = "DAT106"
data107 = "DAT107"
data108 = "DAT108"
data109 = "DAT109"
data110 = "DAT110"
data111 = "DAT111"
data112 = "DAT112"
data113 = "DAT113"
req_data = ''
opt101 = ''
opt102 = ''
with open(sourceFile) as Tst:
    for line in Tst.read().split("\n"):
        if header in line:
            with open(destinationFile, "w+") as dst:
                dst.write(line)
        elif data in line:
            if beg_data in line:
                req_data = line+line+line+line+line+line+line+line+line
            if data101 in line:
                opt101 = line
            if data102 in line:
                opt102 = line
            new_line = pd.concat(req_data,opt101,opt102)
            with open(destinationFile, "w+") as dst:
                dst.write(new_line)
        else:
            if trailer in line:
                with open(destinationFile, "w+") as dst:
                    dst.write(line)
Just open the output file once for the whole loop, not every time through the loop.
Check whether the line begins with DAT001. If it does, write the trailer to end the current line and start a new line by writing the header. Note that trailer must be assigned first; the code in the question never defines it.
Then write every line that begins with DAT to the current line of the file.
first_line = True
with open(sourceFile) as Tst, open(destinationFile, "w+") as dst:
    for line in Tst.read().split("\n"):
        # start a new line when reading DAT001
        if line.startswith(beg_data):
            if not first_line: # need to end the current line
                dst.write(trailer + '\n')
            first_line = False
            dst.write(header)
        # copy all the lines that begin with `DAT`
        if line.startswith('DAT'):
            dst.write(line)
    # end the last line
    dst.write(trailer + '\n')
See if the following code helps you make progress. It has not been tested because no minimal reproducible example was provided.
with open(destinationFile, "a") as dst:
    # The above will keep the file open until after all the indented code runs
    with open(sourceFile) as Tst:
        # The above will keep the file open until after all the indented code runs
        for line in Tst.read().split("\n"):
            if header in line:
                dst.write(line)
            elif data in line:
                if beg_data in line:
                    req_data = line + line + line + line + line + line + line + line + line
                if data101 in line:
                    opt101 = line
                if data102 in line:
                    opt102 = line
                # plain string concatenation; pd.concat only accepts pandas objects, not strings
                new_line = req_data + opt101 + opt102
                dst.write(new_line)
            else:
                if trailer in line:
                    dst.write(line)
# With is a context manager which will automatically close the files.
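The record-joining idea from the question can also be tried out on a plain list of lines, without touching any files. The following is only a sketch under assumptions: the name join_records and its parameters are made up for illustration, every data row is assumed to contain the text DAT, a row containing DAT001 is assumed to start a new record, and header and trailer lines are assumed not to contain DAT.

```python
def join_records(lines, start_tag="DAT001", data_tag="DAT"):
    """Join consecutive data rows into one output line per record.

    Hypothetical helper: the tag values are assumptions taken from the
    labels mentioned in the question, not from a known file format.
    """
    out = []
    record = []  # rows of the record currently being assembled
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if data_tag in line:
            # a start row closes the previous record, if there is one
            if start_tag in line and record:
                out.append("".join(record))
                record = []
            record.append(line)
        else:
            # header or trailer: flush any open record, then copy the line through
            if record:
                out.append("".join(record))
                record = []
            out.append(line)
    if record:
        out.append("".join(record))
    return out
```

Each returned string can then be written to the destination file followed by "\n", with the file opened once before the loop.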
Related
My problem is the following.
I have one file that contains more than 1000 rows. I mostly get the expected output; the problem is that some lines are skipped and not appended to the output file. I have tried but failed to find the issue.
def grouping():
    global date,os
    try:
        output = []
        temp = []
        currIdLine = ""
        with( open ('usergroups.csv', 'r')) as f:
            for lines in f.readlines():
                line = lines.strip()
                if not line:
                    continue
                if line.startswith('uuid'):
                    output.append(line)
                    continue
                if line.startswith('id:'):
                    if temp:
                        #print(temp)
                        output.append(currIdLine + ";" + ','.join(temp))
                        temp.clear()
                    currIdLine = line
                else:
                    temp.append(line)
        output.append(currIdLine + ";" + ','.join(temp))
        with open('usergroup.csv', 'w') as f1:
            for row in output:
                f1.write(row + '\n')
        print("Emails appended to Previous line")
        CSV = 'usergroups.csv'
        if(os.path.exists(CSV) and os.path.isfile(CSV)):
            os.remove(CSV)
    except:
        print("Emails appended - Failed")
my sample source file:
uuid;UserGroup;Name;Description;Owner;Visibility;Members ----> header
id:;group1;raji;xyzabc;ramya;public;
abc
def
geh
id:;group2;ram;xyzabc;mitu;public; ---> This line not appended to output file
id:;group3;ram;xyzabc;mitu;public; ---> This line not appended to output file
id:;group4;raji;rtyui;ramya;private
cvb
nmh
poi
output of the above code
uuid;UserGroup;Name;Description;Owner;Visibility;Members ----> header of the file
id:;group1;raji;xyzabc;ramya;public;abcdefgeh
id:;group4;raji;rtyui;ramya;private
my desired output:
uuid;UserGroup;Name;Description;Owner;Visibility;Members ----> header of the file
id:;group1;raji;xyzabc;ramya;public;abcdefgeh
id:;group2;ram;xyzabc;mitu;public;
id:;group3;ram;xyzabc;mitu;public;
id:;group4;raji;rtyui;ramya;private
I finally found the mistake. The missing piece is shown here:
if temp:
    #print(temp)
    output.append(currIdLine + ";" + ','.join(temp))
    temp.clear()
else: # <-- this block is needed
    output.append(currIdLine)
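The same grouping logic can be sketched as a small function that works on a list of lines, which makes the flush step easier to see. This is an illustration, not the original poster's final code: the name group_lines is hypothetical, and the output format (a ";" followed by the comma-joined members) simply follows the code in the question. It flushes the previous id: line whether or not it collected any members, and it tracks "no id: line seen yet" with None so that no empty row is written.

```python
def group_lines(lines):
    """Attach the lines that follow an 'id:' line to that line."""
    output = []
    current = None  # the 'id:' line being built, or None before the first one
    members = []    # lines collected for the current 'id:' line

    def flush():
        # write the pending 'id:' line, with or without members
        if current is None:
            return
        if members:
            output.append(current + ";" + ",".join(members))
        else:
            output.append(current)

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("uuid"):
            output.append(line)  # header line is copied through
        elif line.startswith("id:"):
            flush()
            current, members = line, []
        else:
            members.append(line)
    flush()  # do not forget the last group
    return output
```

Because flush runs for every id: line, consecutive id: lines with no members (group2 and group3 in the sample) are kept.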
I am writing Python code that removes all the text after a specific word, but lines are missing from the output. I have a Unicode text file with 3 lines:
my name is test1
my name is
my name is test 2
What I want is to remove text after word "test" so I could get the output as below
my name is test
my name is
my name is test
I have written code that does the task, but it also removes the second line, "my name is".
My code is below:
txt = ""
with open(r"test.txt", 'r') as fp:
    for line in fp.readlines():
        splitStr = "test"
        index = line.find(splitStr)
        if index > 0:
            txt += line[:index + len(splitStr)] + "\n"
with open(r"test.txt", "w") as fp:
    fp.write(txt)
It looks like the index becomes -1 when the keyword is not found,
so you are skipping the lines without the keyword.
I would modify your if by adding a condition as follows:
txt = ""
with open(r"test.txt", 'r') as fp:
    for line in fp.readlines():
        splitStr = "test"
        index = line.find(splitStr)
        if index >= 0:  # >= so a line that starts with the keyword is kept too
            txt += line[:index + len(splitStr)] + "\n"
        elif index < 0:
            txt += line
with open(r"test.txt", "w") as fp:
    fp.write(txt)
No need to add \n because the line already contains it.
Your code does not append the line if splitStr is not found in it.
txt = ""
with open(r"test.txt", 'r') as fp:
    for line in fp.readlines():
        splitStr = "test"
        index = line.find(splitStr)
        if index != -1:
            txt += line[:index + len(splitStr)] + "\n"
        else:
            txt += line
with open(r"test.txt", "w") as fp:
    fp.write(txt)
In my solution I simulate the input file via io.StringIO. Compared to your code, my solution removes the else branch and uses only one += operator. Also, splitStr is set only once rather than on each iteration. This makes the code clearer and reduces possible sources of error.
import io
# simulates a file for this example
the_file = io.StringIO("""my name is test1
my name is
my name is test 2""")
txt = ""
splitStr = "test"
with the_file as fp:
    # each line
    for line in fp.readlines():
        # cut something?
        if splitStr in line:
            # find index
            index = line.find(splitStr)
            # cut after 'splitStr' and add newline
            line = line[:index + len(splitStr)] + "\n"
        # append line to output
        txt += line
print(txt)
When working with files in Python 3, it is recommended to use pathlib, like this:
import pathlib
file_path = pathlib.Path("test.txt")
# read from the file
with file_path.open('r') as fp:
    txt = fp.read()  # do something with the content
# write back to the file
with file_path.open('w') as fp:
    fp.write(txt)  # write the (modified) content
Suggestion:
for line in fp.readlines():
    i = line.find('test')
    if i != -1:
        # keep the keyword itself and restore the newline that the slice drops
        line = line[:i + len('test')] + '\n'
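As an alternative to find plus slicing, the cut can be sketched with str.partition, which returns the text before the keyword, the keyword itself, and the rest; when the keyword is missing, the whole line comes back as the first part, so lines without the keyword survive unchanged. The helper name cut_after is made up for this illustration.

```python
def cut_after(line, keyword):
    """Return the line truncated right after the first occurrence of keyword.

    Lines that do not contain the keyword are returned unchanged,
    apart from always ending in exactly one newline.
    """
    head, sep, _rest = line.rstrip("\n").partition(keyword)
    # sep is the keyword when found, or an empty string when not found
    return head + sep + "\n"


lines = ["my name is test1\n", "my name is\n", "my name is test 2"]
result = "".join(cut_after(line, "test") for line in lines)
```

Note that this sketch also adds a newline to a last line that had none, which may or may not be wanted.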
Say I have an input file like this (splitfile.txt):
INPUT
HEADER
OF A TXT FILE
line 1
line 2
line 3
line 4
line 5
line 6
I want to split these files and keep the three header lines like this:
INPUT
HEADER
OF A TXT FILE
line 1
line 2
INPUT
HEADER
OF A TXT FILE
line 3
line 4
INPUT
HEADER
OF A TXT FILE
line 5
line 6
My Python code so far only splits up the text file:
lines_per_file = 2
s = None
with open('splitfile.txt') as split:
    for lineno, line in enumerate(split):
        if lineno % lines_per_file == 0:
            if s:
                s.close()
            sfilename = 'step_{}.txt'.format(lineno + lines_per_file)
            s = open(sfilename, "w")
        s.write(line)
    if s:
        s.close()
How can I do this?
You can read the header and save it in a variable, then write it to each new file you create.
lines_per_file = 2
s = None
with open('a.txt') as f:
    lines = f.readlines()
headers, lines = lines[:3], lines[3:]
for lineno, line in enumerate(lines):
    if lineno % lines_per_file == 0:
        if s:
            s.close()
        sfilename = f'step_{lineno + lines_per_file}.txt'
        s = open(sfilename, "w")
        s.writelines(headers)
    s.write(line)
if s:
    s.close()
A cleaner answer:
LINES_PER_FILE = 2
def writer_to_file(name, headers, lines):
    with open(name, "w") as f:
        print(headers + lines)
        f.writelines(headers + lines)
with open('a.txt') as f:
    lines = f.readlines()
headers, lines = lines[:3], lines[3:]
# a plain loop instead of a list comprehension used only for its side effects
for i in range(0, len(lines), LINES_PER_FILE):
    writer_to_file(f'step_{i + LINES_PER_FILE}.txt', headers, lines[i: i+LINES_PER_FILE])
I prefer this one because there is no global variable s, and with the with statement there is no need to worry about closing the file.
It is also better to name constants in UPPER_CASE.
I changed part of the data in a column, but after the replacement the columns in the new text file are no longer aligned.
I am sending you the files (input and output); you can also see the following image.
Thank you for your answer!
Note: I read two columns from datar.csv.
You can see the files: https://www.shorturl.at/luFX7
import pandas as pd
df = pd.read_csv('datar.csv')
j=0
#input file
fin = open("input.txt", "rt")
#output file to write the result to
fout = open("output.txt", "wt")
#for each line in the input file
for line in fin:
    ori= df["data_o"][j]
    ori=str(ori)
    ree= df["data_r"][j]
    ree=str(ree)
    fout.write(line.replace(ori,ree))
    j=j+1
fin.close()
fout.close()
I want it to be like this picture
Maybe add a default width:
# e.g.: 6 characters long
# [...]
line = line.replace(ori, ree) # get line after convert
line_original_width = len(line) # get length of line
if line_original_width < 6: # check if fill is necessary
    fill = 6 - line_original_width # get amount of necessary fill
    line = (" " * fill) + line # fill the left with space
fout.write(line) # write all in the file
# [...]
Otherwise, you can use an f-string to do the same thing, but I can't show you an example because I'm not very good with them.
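For reference, the f-string variant of the padding above could look like the sketch below. The function name pad_left and the default width of 6 are only illustrative; the format spec ">" right-aligns the text and pads it on the left with spaces, which is the same result that str.rjust gives.

```python
def pad_left(text, width=6):
    """Right-align text in a field of the given width (pads on the left)."""
    # ">" means right-aligned; the nested {width} fills in the field width
    return f"{text:>{width}}"


# str.rjust produces the same result without a format spec
same = "42".rjust(6)
```

Text that is already wider than the field is returned unchanged, so nothing is truncated.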
Ok,
There's a more complicated way for your code, but I'm sure it will work.
This program will make a default width for each column.
import pandas as pd
df = pd.read_csv('datar.csv')
with open("input.txt", "rt") as fino:
    fin = fino.read().split("\n") # !!!
out = [] # !!! one list of columns per input line ([[]] * n would share a single list)
# --- decode part ---
for j, line in enumerate(fin):
    if j < len(df): # only lines that have a replacement row in datar.csv
        ori = str(df["data_o"][j])
        ree = str(df["data_r"][j])
        line = line.replace(ori, ree)
    columns = line.split() # split on whitespace, dropping empty strings
    if columns: # skip blank lines
        out.append(columns)
# --- padding part ---
column_count = max((len(row) for row in out), default=0)
for column in range(column_count):
    # widest entry in this column, plus one space as separator
    maxl = max(len(row[column]) for row in out if column < len(row)) + 1
    for row in out:
        if column < len(row):
            row[column] = row[column].rjust(maxl)
# --- writing part ---
with open("output.txt", "wt") as fout:
    for row in out:
        fout.write("".join(row) + "\n")
Watch-out, I modified the top part of your original code.
Mention me if you publish.
I want to read specific data from an input file. How can I read it?
For example my file has data like:
this is my first line
this is my second line.
So I just want to read first from the first line and secon from the second line.
Try the following code for your needs but please read the comments above.
# ----------------------------------------
# open text file and write reduced lines
# ----------------------------------------
#this is my first line
#this is my second line.
pathnameIn = "D:/_working"
filenameIn = "foobar.txt"
pathIn = pathnameIn + "/" + filenameIn
pathnameOut = "D:/_working"
filenameOut = "foobar_reduced.txt"
pathOut = pathnameOut + "/" + filenameOut
fileIn = open(pathIn,'r')
fileOut = open(pathOut,'w')
print(fileIn)
print(fileOut)
i = 0
# Save all reduced lines to a file.
for lineIn in fileIn.readlines():
    i += 1 # number of lines read
    lineOut = lineIn[11:16]
    fileOut.write(lineOut + "\n")
print("*********************************")
print("lines read: " + str(i))
print("*********************************")
fileIn.close()
fileOut.close()
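The fixed slice [11:16] above always takes five characters, so it yields "first" for the first sample line but only "secon" for the second. If the goal is the whole fourth word of each line, one possible sketch is to split on whitespace instead. This assumes the wanted text is always the fourth whitespace-separated word, which the question does not state explicitly; the name pick_word is hypothetical.

```python
def pick_word(line, position=3):
    """Return the word at the given zero-based position, or None if the line is too short."""
    words = line.split()  # split on any run of whitespace
    if position < len(words):
        return words[position]
    return None


sample = ["this is my first line", "this is my second line."]
picked = [pick_word(line) for line in sample]
```

Here picked holds one entry per input line, which could be written out with fileOut.write in the same loop as above.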