Python reading data from input file - python

I want to read specific data from an input file. How can I read it?
For example my file has data like:
this is my first line
this is my second line.
So I just want to read first from the first line and secon from the second line.

Try the following code for your needs but please read the comments above.
# ----------------------------------------
# open text file and write reduced lines
# ----------------------------------------
#this is my first line
#this is my second line.
pathnameIn = "D:/_working"
filenameIn = "foobar.txt"
pathIn = pathnameIn + "/" + filenameIn
pathnameOut = "D:/_working"
filenameOut = "foobar_reduced.txt"
pathOut = pathnameOut + "/" + filenameOut
fileIn = open(pathIn,'r')
fileOut = open(pathOut,'w')
print(fileIn)
print(fileOut)
i = 0
# Save all reduced lines to a file.
for lineIn in fileIn.readlines():
    i += 1 # number of lines read
    lineOut = lineIn[11:16]
    fileOut.write(lineOut + "\n")
print("*********************************")
print("lines read: " + str(i))
print("*********************************")
fileIn.close()
fileOut.close()
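The fixed slice [11:16] only works while every line has the same layout. As a hedged alternative, here is a minimal sketch that picks a word by its position instead; pick_word and the sample lines are hypothetical names used only for illustration, and it assumes the words are separated by whitespace.

```python
def pick_word(line, position):
    """Return the whitespace-separated word at `position` (0-based), or '' if the line is too short."""
    words = line.split()
    return words[position] if position < len(words) else ""

# Sample data taken from the question; in practice these would come from the input file.
sample_lines = ["this is my first line", "this is my second line."]
picked = [pick_word(line, 3) for line in sample_lines]
print(picked)
```

The same function can be applied to each line of an open file in place of the slice.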

Related

Lines missing in python

I am writing Python code that removes all the text after a specific word, but lines are missing from the output. I have a Unicode text file with 3 lines:
my name is test1
my name is
my name is test 2
What I want is to remove text after word "test" so I could get the output as below
my name is test
my name is
my name is test
I have written code that does the task, but it also removes the second line, "my name is".
My code is below
txt = ""
with open(r"test.txt", 'r') as fp:
    for line in fp.readlines():
        splitStr = "test"
        index = line.find(splitStr)
        if index > 0:
            txt += line[:index + len(splitStr)] + "\n"
with open(r"test.txt", "w") as fp:
    fp.write(txt)
It looks like when the keyword is not found, the index becomes -1.
So you are skipping the lines without the keyword.
I would modify your if by adding the condition as follows:
txt = ""
with open(r"test.txt", 'r') as fp:
    for line in fp.readlines():
        splitStr = "test"
        index = line.find(splitStr)
        if index >= 0:  # >= 0 so a line that starts with the keyword is also cut
            txt += line[:index + len(splitStr)] + "\n"
        elif index < 0:
            txt += line
with open(r"test.txt", "w") as fp:
    fp.write(txt)
No need to add \n because the line already contains it.
Your code does not append the line if splitStr is not found in it.
txt = ""
with open(r"test.txt", 'r') as fp:
    for line in fp.readlines():
        splitStr = "test"
        index = line.find(splitStr)
        if index != -1:
            txt += line[:index + len(splitStr)] + "\n"
        else:
            txt += line
with open(r"test.txt", "w") as fp:
    fp.write(txt)
In my solution I simulate the input file via io.StringIO. Compared to your code, my solution removes the else branch and uses only one += operator. Also, splitStr is set only once rather than on each iteration. This makes the code clearer and reduces possible sources of error.
import io
# simulates a file for this example
the_file = io.StringIO("""my name is test1
my name is
my name is test 2""")
txt = ""
splitStr = "test"
with the_file as fp:
    # each line
    for line in fp.readlines():
        # cut something?
        if splitStr in line:
            # find index
            index = line.find(splitStr)
            # cut after 'splitStr' and add newline
            line = line[:index + len(splitStr)] + "\n"
        # append line to output
        txt += line
print(txt)
When handling files in Python 3, it is recommended to use pathlib, like this:
import pathlib
file_path = pathlib.Path("test.txt")
# read from file
with file_path.open('r') as fp:
    ...  # do something
# write back to the file
with file_path.open('w') as fp:
    ...  # do something
Suggestion:
for line in fp.readlines():
    i = line.find('test')
    if i != -1:
        # keep the keyword itself and drop only what follows it
        line = line[:i + len('test')] + "\n"
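As a side note to the answers above, str.partition avoids the index arithmetic altogether. This is only a sketch: cut_after is a hypothetical helper name, and it assumes you want to keep the keyword itself and preserve each line's original line ending.

```python
def cut_after(line, keyword):
    """Keep everything up to and including the first `keyword`; return other lines unchanged."""
    head, found, _tail = line.partition(keyword)
    if not found:
        # keyword absent: partition returns (line, '', ''), so keep the line as it is
        return line
    # re-attach the newline only if the original line had one
    return head + found + ("\n" if line.endswith("\n") else "")

# Sample lines from the question (the last one has no trailing newline).
lines = ["my name is test1\n", "my name is\n", "my name is test 2"]
result = [cut_after(line, "test") for line in lines]
print("".join(result))
```

Because the keyword-free case returns the line untouched, the second line is no longer dropped.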

Reading a Python File to EOF while performing if statments

I am working on creating a program to concatenate rows within a file. Each file has a header, datarows labeled DAT001 to DAT113 and a trailer. Each line of concatenated rows will have DAT001 to DAT100 and 102-113 is optional. I need to print the header, concatenating DAT001-113 and when the file finds a row with DAT001 I need to start a new line concatenating DAT001-113 again. After that is all done, I will print the trailer. I have an IF statement started but it only writes the header and skips all other logic. I apologize that this is very basic - but I am struggling with reading rows over and over again without knowing how long the file might be.
I have tried the below code but it won't read or print after the header.
import pandas as pd
destinationFile = "./destination-file.csv"
sourceFile = "./TEST.txt"
header = "RHR"
data = "DPSPOS"
beg_data = "DAT001"
data2 = "DAT002"
data3 = "DAT003"
data4 = "DAT004"
data5 = "DAT005"
data6 = "DAT006"
data7 = "DAT007"
data8 = "DAT008"
data100 = "DAT100"
data101 = "DAT101"
data102 = "DAT102"
data103 = "DAT103"
data104 = "DAT104"
data105 = "DAT105"
data106 = "DAT106"
data107 = "DAT107"
data108 = "DAT108"
data109 = "DAT109"
data110 = "DAT110"
data111 = "DAT111"
data112 = "DAT112"
data113 = "DAT113"
req_data = ''
opt101 = ''
opt102 = ''
with open(sourceFile) as Tst:
    for line in Tst.read().split("\n"):
        if header in line:
            with open(destinationFile, "w+") as dst:
                dst.write(line)
        elif data in line:
            if beg_data in line:
                req_data = line+line+line+line+line+line+line+line+line
            if data101 in line:
                opt101 = line
            if data102 in line:
                opt102 = line
            new_line = pd.concat(req_data,opt101,opt102)
            with open(destinationFile, "w+") as dst:
                dst.write(new_line)
        else:
            if trailer in line:
                with open(destinationFile, "w+") as dst:
                    dst.write(line)
Just open the output file once for the whole loop, not every time through the loop.
Check whether the line begins with DAT001 (beg_data). If it does, write the trailer to end the current line and start a new line by writing the header.
Then for every line that begins with DAT, write it to the file in the current line.
first_line = True
with open(sourceFile) as Tst, open(destinationFile, "w+") as dst:
    for line in Tst.read().split("\n"):
        # start a new line when reading DAT001
        if line.startswith(beg_data):
            if not first_line: # need to end the current line
                dst.write(trailer + '\n')
            first_line = False
            dst.write(header)
        # copy all the lines that begin with `DAT`
        if line.startswith('DAT'):
            dst.write(line)
    # end the last line
    dst.write(trailer + '\n')
See if the following code helps make progress. It was not tested because no minimal runnable example was provided.
with open(destinationFile, "a") as dst:
    # The above will keep the file open until after all the indented code runs
    with open(sourceFile) as Tst:
        # The above will keep the file open until after all the indented code runs
        for line in Tst.read().split("\n"):
            if header in line:
                dst.write(line)
            elif data in line:
                if beg_data in line:
                    req_data = line + line + line + line + line + line + line + line + line
                if data101 in line:
                    opt101 = line
                if data102 in line:
                    opt102 = line
                # plain string concatenation; pd.concat only accepts pandas objects, not strings
                new_line = req_data + opt101 + opt102
                dst.write(new_line)
            else:
                if trailer in line:
                    dst.write(line)
# With is a context manager which will automatically close the files.
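Since the real file layout was not posted, the following is only a rough sketch of the grouping step on its own. It assumes every data row starts with its DATnnn label and that a row starting with DAT001 opens a new record; concatenate_records and the sample rows are hypothetical. Header and trailer handling is left to the caller.

```python
def concatenate_records(lines, start_tag="DAT001", row_prefix="DAT"):
    """Join consecutive data rows into one string per record.

    Assumption: each data row begins with its DATnnn label, and a row
    beginning with `start_tag` marks the start of a new record.
    """
    records = []
    current = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(row_prefix):
            continue  # header and trailer rows are not part of any record
        if line.startswith(start_tag) and current:
            # a new record starts, so close the one collected so far
            records.append("".join(current))
            current = []
        current.append(line)
    if current:
        records.append("".join(current))  # close the last record at end of file
    return records

# Made-up rows, only to show the shape of the result.
sample = ["RHR header", "DAT001a", "DAT002b", "DAT001c", "DAT101d", "TRL trailer"]
joined = concatenate_records(sample)
print(joined)
```

Iterating over the file object directly (for line in Tst) works the same way and does not need to know the file length in advance.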

Python Replace String in File in With clause

I am trying to replace a string in a file.
The code below only modifies certain substrings within the line read from the file. Any ideas on how I can actually replace line with current_line in the file itself?
from sys import *
import os
import re
import datetime
import fileinput
script, filename = argv
userhome = os.path.expanduser('~')
username = os.path.split(userhome)[-1]
print "\n"
print "User: " + username
today = datetime.date.today().strftime("%Y/%m/%d")
time = datetime.datetime.now().strftime("%H:%M:%S")
print "Date: " + str(today)
print "Current time: " + str(time)
print "Filename: %s\n" % filename
def replace_string():
    found = False
    with open(filename, 'r+') as f:
        for line in f:
            if re.search("CVS Header", line):
                print line
                ####################################################################################
                # Below logic: #
                # if length of revision number is 4 characters (e.g. 1.15) then increment by 0.01 #
                # else if it is 3 characters (e.g. 1.5) then increment by 0.1 #
                ####################################################################################
                if len(line.split("$Revision: ")[1].split()[0]) == 4:
                    new_line = str.replace(line, line.split("$Revision: ")[1].split()[0], str(float(line.split("$Revision: ")[1].split()[0]) + 0.01))
                elif len(line.split("$Revision: ")[1].split()[0]) == 3:
                    new_line = str.replace(line, line.split("$Revision: ")[1].split()[0], str(float(line.split("$Revision: ")[1].split()[0]) + 0.1))
                ###
                ###
                newer_line = str.replace(new_line, line.split("$Author: ")[1].split()[0], username)
                newest_line = str.replace(newer_line, line.split("$Date: ")[1].split()[0], today)
                current_line = str.replace(newest_line, line.split("$Date: ")[1].split()[1], time)
                print current_line
                found = True
    if not found:
        print "No CVS Header exists in %s" % filename
if __name__ == "__main__":
    replace_string()
I tried adding something like..
f.write(f.replace(line, current_line))
but this just clears all the contents out of the file and leaves it blank so obviously that is incorrect.
The fileinput module provides a way to edit a file in place. If you use the inplace parameter, the file is moved to a backup file and standard output is redirected to the input file.
import fileinput
def clause(line):
    return len(line) < 5
for line in fileinput.input('file.txt', inplace=1):
    if clause(line):
        print '+ ' + line[:-1]
fileinput.close()
Trying to apply this idea to your example, it could be something like this:
def replace_string():
    found = False
    for line in fileinput.input(filename, inplace=1): # <-
        if re.search("CVS Header", line):
            #print line
            ####################################################################################
            # Below logic: #
            # if length of revision number is 4 characters (e.g. 1.15) then increment by 0.01 #
            # else if it is 3 characters (e.g. 1.5) then increment by 0.1 #
            ####################################################################################
            if len(line.split("$Revision: ")[1].split()[0]) == 4:
                new_line = str.replace(line, line.split("$Revision: ")[1].split()[0], str(float(line.split("$Revision: ")[1].split()[0]) + 0.01))
            elif len(line.split("$Revision: ")[1].split()[0]) == 3:
                new_line = str.replace(line, line.split("$Revision: ")[1].split()[0], str(float(line.split("$Revision: ")[1].split()[0]) + 0.1))
            ###
            ###
            newer_line = str.replace(new_line, line.split("$Author: ")[1].split()[0], username)
            newest_line = str.replace(newer_line, line.split("$Date: ")[1].split()[0], today)
            current_line = str.replace(newest_line, line.split("$Date: ")[1].split()[1], time)
            print current_line[:-1] # <-
            found = True
        else:
            print line[:-1] # <- keep original line otherwise
    fileinput.close() # <-
    if not found:
        print "No CVS Header exists in %s" % filename
The solution proposed by user2040251 is the correct way, and the way used by all text editors I know. The reason is that if a major problem occurs while writing the file, the previous version stays unmodified until the new version is ready.
But of course you can edit in place if you accept the risk of completely losing the file in case of a crash. That can be acceptable for a file under version control, since you can always get the previous committed version.
The principle is then read before write, ensuring that you never overwrite something you have not yet read.
At the simplest level, you load everything into memory with readlines, replace the line, rewind the file to the correct position (or to the beginning), and write it back.
Edit : here is a simple implementation when all lines can be loaded in memory :
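fd = open(filename, "r+")
lines = fd.readlines()
for i, line in enumerate(lines):
    # test whether this line is the searched line (`found` stands for that test)
    if found :
        lines[i] = replacement_line
        break
fd.seek(0)
fd.writelines(lines)
fd.truncate() # drop leftover bytes if the new content is shorter than the old
fd.close()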
It could be done even for a big file using readlines(16384) for example instead of readlines() to read by chunks of little more than 16K, and always reading one chunk before writing previous, but it is really much more complicated and anyway you should use a backup file when processing big files.
You can create another file and write the output to it. After that, you can just remove the original file and rename the new file.
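The "write to another file, then rename" approach can be sketched as below. Note that this is Python 3 (os.replace does not exist in Python 2, which the code above uses); rewrite_file and transform are hypothetical names, and the temporary file is created in the same directory so the final rename stays on one filesystem.

```python
import os
import tempfile

def rewrite_file(path, transform):
    """Write transform(line) for every line to a temporary file, then swap it in for `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
            "w", dir=directory, delete=False) as tmp:
        for line in src:
            tmp.write(transform(line))
    # The original is only replaced once the new content is fully written.
    os.replace(tmp.name, path)
```

A call such as rewrite_file(filename, lambda line: line.replace("old", "new")) leaves the original untouched if an exception occurs while the lines are being written, although the temporary file would then need to be cleaned up by hand.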

Pick parts from a txt file and copy to another file with python

I'm in trouble here. I need to read a .txt file that contains a sequence of records, check for the records I want, and copy them to a new file.
The file content is like this (this is just an example, the original file has more than 30 000 lines):
AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460
99999|35|436 #end register
00000|46|316 #begin register
03000|SP|467
99999|33|130 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file
The records that begin with 03000 and have the characters 'TO' must be written to a new file. Based on the example, the file should look like this:
AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460
99999|35|436 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file
Code:
file = open("file.txt",'r')
newFile = open("newFile.txt","w")
content = file.read()
file.close()
# here I need to check whether the 03000 record contains the characters 'TO'; if it does, copy the record set 00000-99999 to the new file.
I did multiple searches and found nothing to help me.
Thank you!
with open("file.txt",'r') as inFile, open("newFile.txt","w") as outFile:
    outFile.writelines(line for line in inFile
                       if line.startswith("03000") and "TO" in line)
If you need the previous and the next line, then you have to iterate inFile in triads. First define:
def gen_triad(lines, prev=None):
    after = current = next(lines)
    for after in lines:
        yield prev, current, after
        prev, current = current, after
And then do like before:
    outFile.writelines(''.join(triad) for triad in gen_triad(inFile)
                       if triad[1].startswith("03000") and "TO" in triad[1])
import re
# raw strings keep the regex backslashes from being read as string escapes
pat = (r'^00000\|\d+\|\d+.*\n'
       r'^03000\|TO\|\d+.*\n'
       r'^99999\|\d+\|\d+.*\n'
       r'|'
       r'^AAAAA\|\d+\|\d+.*\n'
       r'|'
       r'^ZZZZZ\|\d+\|\d+.*')
rag = re.compile(pat,re.MULTILINE)
with open('fifi.txt','r') as f,\
     open('newfifi.txt','w') as g:
    g.write(''.join(rag.findall(f.read())))
For files with additional lines between lines beginning with 00000, 03000 and 99999, I didn't find simpler code than this one:
import re
# raw strings keep the regex backslashes from being read as string escapes
pat = (r'(^00000\|\d+\|\d+.*\n'
       r'(?:.*\n)+?'
       r'^99999\|\d+\|\d+.*\n)'
       r'|'
       r'(^AAAAA\|\d+\|\d+.*\n'
       r'|'
       r'^ZZZZZ\|\d+\|\d+.*)')
rag = re.compile(pat,re.MULTILINE)
pit = (r'^00000\|.+?^03000\|TO\|\d+.+?^99999\|')
rig = re.compile(pit,re.DOTALL|re.MULTILINE)
def yi(text):
    for g1,g2 in rag.findall(text):
        if g2:
            yield g2
        elif rig.match(g1):
            yield g1
with open('fifi.txt','r') as f,\
     open('newfifi.txt','w') as g:
    g.write(''.join(yi(f.read())))
file = open("file.txt",'r')
newFile = open("newFile.txt","w")
content = file.readlines()
file.close()
newFile.writelines(filter(lambda x:x.startswith("03000") and "TO" in x,content))
newFile.close()
This seems to work. The other answers seem to only be writing out records that contain '03000|TO|' but you have to write out the record before and after that as well.
import sys
# ---------------------------------------------------------------
# ---------------------------------------------------------------
# import file
file_name = sys.argv[1]
file_path = 'C:\\DATA_SAVE\\pick_parts\\' + file_name
file = open(file_path,"r")
# ---------------------------------------------------------------
# create output files
output_file_path = 'C:\\DATA_SAVE\\pick_parts\\' + file_name + '.out'
output_file = open(output_file_path,"w")
# create output files
# ---------------------------------------------------------------
# process file
temp = ''
temp_out = ''
good_write = False
bad_write = False
for line in file:
    if line[:5] == 'AAAAA':
        temp_out += line
    elif line[:5] == 'ZZZZZ':
        temp_out += line
    elif good_write:
        temp += line
        temp_out += temp
        temp = ''
        good_write = False
    elif bad_write:
        bad_write = False
        temp = ''
    elif line[:5] == '03000':
        if line[6:8] != 'TO':
            temp = ''
            bad_write = True
        else:
            good_write = True
            temp += line
            temp_out += temp
            temp = ''
    else:
        temp += line
output_file.write(temp_out)
output_file.close()
file.close()
Output:
AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460
99999|35|436 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file
Does it have to be python? These shell commands would do the same thing in a pinch.
head -1 inputfile.txt > outputfile.txt
grep -C 1 "03000|TO" inputfile.txt >> outputfile.txt
tail -1 inputfile.txt >> outputfile.txt
# Whenever I have to parse text files I prefer to use regular expressions
# You can also customize the matching criteria if you want to
import re
what_is_being_searched = re.compile("^03000.*TO")
# don't use "file" as a variable name since it is (was?) a builtin
# function
with open("file.txt", "r") as source_file, open("newFile.txt", "w") as destination_file:
    for this_line in source_file:
        if what_is_being_searched.match(this_line):
            destination_file.write(this_line)
and for those who prefer a more compact representation:
import re
with open("file.txt", "r") as source_file, open("newFile.txt", "w") as destination_file:
    destination_file.writelines(this_line for this_line in source_file
                                if re.match("^03000.*TO", this_line))
code:
fileName = '1'
fil = open(fileName,'r')
import string
##step 1: parse the file.
parsedFile = []
for i in fil:
    ##tuple1 = (1,2,3)
    firstPipe = i.find('|')
    secondPipe = i.find('|',firstPipe+1)
    tuple1 = (i[:firstPipe],\
              i[firstPipe+1:secondPipe],\
              i[secondPipe+1:i.find('\n')])
    parsedFile.append(tuple1)
fil.close()
##search criteria:
searchFirst = '03000'
searchString = 'TO' ##can be changed if and when required
##step 2: used the parsed contents to write the new file
filout = open('newFile','w')
stringToWrite = parsedFile[0][0] + '|' + parsedFile[0][1] + '|' + parsedFile[0][2] + '\n'
filout.write(stringToWrite) ##to write the first entry
for i in range(1,len(parsedFile)):
    if parsedFile[i][1] == searchString and parsedFile[i][0] == searchFirst:
        for j in range(-1,2,1):
            stringToWrite = parsedFile[i+j][0] + '|' + parsedFile[i+j][1] + '|' + parsedFile[i+j][2] + '\n'
            filout.write(stringToWrite)
stringToWrite = parsedFile[-1][0] + '|' + parsedFile[-1][1] + '|' + parsedFile[-1][2] + '\n'
filout.write(stringToWrite) ##to write the last entry
filout.close()
I know that this solution may be a bit long. But it is quite easy to understand. And it seems an intuitive way to do it. And I have already checked this with the Data that you have provided and it works perfectly.
Please tell me if you need some more explanation on the code. I will definitely add the same.
The tips (from Beasley and Joran elyase) are very interesting, but they only get the contents of line 03000. I would like to get the contents from line 00000 through line 99999.
I even managed to do it here, but I am not satisfied; I wanted something cleaner.
See how I did it:
file = open(url,'r')
newFile = open("newFile.txt",'w')
lines = file.readlines()
file.close()
i = 0
lineTemp = []
for line in lines:
    lineTemp.append(line)
    if line[0:5] == '03000':
        state = line[21:23]
    if line[0:5] == '99999':
        if state == 'TO':
            newFile.writelines(lineTemp)
        lineTemp = [] # start collecting the next record
    i = i+1
newFile.close()
Suggestions...
Thanks to all!
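For what it's worth, the "collect a record, then decide" idea from the last post can be sketched a little more compactly. This is an untested-on-real-data sketch: filter_records is a hypothetical name, and it assumes that records always run from a 00000 line to the next 99999 line and that lines outside a record (the AAAAA and ZZZZZ lines) should always be kept.

```python
def filter_records(lines, wanted="03000|TO|"):
    """Keep lines outside records, plus every 00000..99999 record that has a row starting with `wanted`."""
    kept = []
    record = None  # None while we are outside a 00000..99999 record
    for line in lines:
        if line.startswith("00000"):
            record = [line]  # a new record begins
        elif record is not None:
            record.append(line)
            if line.startswith("99999"):
                # record is complete: keep it only if it contains the wanted row
                if any(row.startswith(wanted) for row in record):
                    kept.extend(record)
                record = None
        else:
            kept.append(line)  # file header / trailer lines
    return kept

# Sample data from the question, without the trailing comments.
sample = [
    "AAAAA|12|120\n",
    "00000|46|150\n", "03000|TO|460\n", "99999|35|436\n",
    "00000|46|316\n", "03000|SP|467\n", "99999|33|130\n",
    "00000|46|778\n", "03000|TO|478\n", "99999|33|457\n",
    "ZZZZZ|15|111\n",
]
filtered = filter_records(sample)
print("".join(filtered))
```

Passing an open file object instead of the list works too, so the 30 000-line file never has to be loaded in one piece: outFile.writelines(filter_records(inFile)).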

Python CSV module, add column to the side, not the bottom

I am new in python, and I need some help. I made a python script that takes two columns from a file and copies them into a "new file". However, every now and then I need to add columns to the "new file". I need to add the columns on the side, not the bottom. My script adds them to the bottom. Someone suggested using CSV, and I read about it, but I can't make it in a way that it adds the new column to the side of the previous columns. Any help is highly appreciated.
Here is the code that I wrote:
import sys
import re
filetoread = sys.argv[1]
filetowrite = sys.argv[2]
newfile = str(filetowrite) + ".txt"
openold = open(filetoread,"r")
opennew = open(newfile,"a")
rline = openold.readlines()
number = int(len(rline))
start = 0
for i in range (len(rline)) :
    if "2theta" in rline[i] :
        start = i
for line in rline[start + 1 : number] :
    words = line.split()
    word1 = words[1]
    word2 = words[2]
    opennew.write (word1 + " " + word2 + "\n")
openold.close()
opennew.close()
Here is the second code I wrote, using CSV:
import sys
import re
import csv
filetoread = sys.argv[1]
filetowrite = sys.argv[2]
newfile = str(filetowrite) + ".txt"
openold = open(filetoread,"r")
rline = openold.readlines()
number = int(len(rline))
start = 0
for i in range (len(rline)) :
    if "2theta" in rline[i] :
        start = i
words1 = []
words2 = []
for line in rline[start + 1 : number] :
    words = line.split()
    word1 = words[1]
    word2 = words[2]
    words1.append([word1])
    words2.append([word2])
with open(newfile, 'wb') as file:
    writer = csv.writer(file, delimiter= "\n")
    writer.writerow(words1)
    writer.writerow(words2)
These are some samples of input files:
https://dl.dropbox.com/u/63216126/file5.txt
https://dl.dropbox.com/u/63216126/file6.txt
My first script works "almost" great, except that it writes the new columns at the bottom and I need them at side of the previous columns.
The proper way to use writerow is to give it a single list that contains the data for all the columns.
words.append(word1)
words.append(word2)
writer.writerow(words)
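To make the "one list per row" point concrete, here is a minimal Python 3 sketch that puts several columns side by side. write_columns and the sample values are hypothetical, and a space delimiter is assumed to match the output of the first script.

```python
import csv
import io

def write_columns(out, *columns):
    """Write the given columns side by side, one row per index."""
    writer = csv.writer(out, delimiter=" ")
    # zip(*columns) turns a set of columns into rows: (col1[0], col2[0], ...), ...
    writer.writerows(zip(*columns))

# io.StringIO stands in for the output file in this example.
buffer = io.StringIO()
write_columns(buffer, ["10.0", "10.1"], ["250", "260"], ["300", "310"])
print(buffer.getvalue())
```

Note that zip stops at the shortest column. To add a column to a file written earlier, one option is to read the existing rows back with csv.reader, append the new value to each row, and write all rows out again.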
