Replace Line with New Line in Python

I am reading a text file and searching data line by line; based on some condition, I change some values in the line and write it back into another file. The new file should not contain the old line. I have tried the following, but it did not work. I think I am missing a very basic thing.
In C++ we can increment the line, but in Python I am not sure how to achieve this. So as of now, I am writing the old line and then the new line; but in the new file, I want only the new line.
Example:
M0 38 A 19 40 DATA2 L=4e-08 W=3e-07 nf=1 m=1 $X=170 $Y=140 $D=8
M0 VBN A 19 40 TEMP2 L=4e-08 W=3e-07 nf=1 m=1 $X=170 $Y=140 $D=8
The code which I tried is the following:
def parsefile():
    fp = open("File1", "rb+")
    update_file = "File1" + "_update"
    fp_latest = open(update_file, "wb+")
    for line in fp:
        if line.find("DATA1") == -1:
            fp_latest.write(line)
        if line.find("DATA1") != -1:
            line = line.split()
            pin_name = find_pin_order(line[1])
            update_line = "DATA " + line[1] + " " + pin_name
            fp_latest.write(update_line)
            line = ''.join(line)
        if line.find("DATA2") != -1:
            line_data = line.split()
            line_data[1] = "TEMP2"
            line_data = ' '.join(line_data)
            fp_latest.write(line_data)
        if line.find("DATA3") != -1:
            line_data = line.split()
            line_data[1] = "TEMP3"
            line_data = ' '.join(line_data)
            fp_latest.write(line_data)
    fp_latest.close()
    fp.close()

The main problem with your current code is that your first if block, which checks for "DATA1" and writes the line out if it is not found, also runs when "DATA2" or "DATA3" is present. Since those have their own blocks, the line ends up being duplicated in two different forms.
Here's a minimal modification of your loop that should work:
for line in fp:
    if line.find("DATA1") != -1:
        data = line.split()
        pin_name = find_pin_order(data[1])
        line = "DATA " + data[1] + " " + pin_name
    if line.find("DATA2") != -1:
        data = line.split()
        data[1] = "TEMP2"
        line = ' '.join(data)
    if line.find("DATA3") != -1:
        data = line.split()
        data[1] = "TEMP3"
        line = ' '.join(data)
    fp_latest.write(line)
This ensures that only one line is written because there's only a single write() call in the code. The special cases simply modify the line that is to be written. I'm not sure I understand the modifications you want to have done in those cases, so there may be more bugs there.
One thing that might help would be to make the second and third if statements into elif statements instead. This would ensure that only one of them would be run (though if you know your file will never have multiple DATA entries on a single line, this may not be necessary).
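A minimal sketch of that elif variant (note that split() discards the trailing newline, so this sketch re-appends it, which the loop above does not yet do):
for line in fp:
    if line.find("DATA1") != -1:
        data = line.split()
        pin_name = find_pin_order(data[1])
        line = "DATA " + data[1] + " " + pin_name + "\n"  # re-append the newline split() dropped
    elif line.find("DATA2") != -1:
        data = line.split()
        data[1] = "TEMP2"
        line = ' '.join(data) + "\n"
    elif line.find("DATA3") != -1:
        data = line.split()
        data[1] = "TEMP3"
        line = ' '.join(data) + "\n"
    fp_latest.write(line)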

If you want to write a new line to a file, replacing the old content that was read last, you can use the file.seek() method to move around in the file. Here is an example.
with open("myFile.txt", "r+") as f:
offset = 0
lines = f.readlines()
for oldLine in lines:
... calculate the new line value ...
f.seek(offset)
f.write(newLine)
offset += len(newLine)
f.seek(offset)
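Note that overwriting in place like this is only safe when each newLine is exactly the same length as the line it replaces; if the lengths differ, leftover bytes from the old content will remain after the last write. When lengths change, write to a separate file as in the answer above, or call f.truncate() at the final offset after the loop.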


How to loop to dictionary in dictionary to organize data from CSV in specified way

I made a script that:
- takes data from a CSV file
- sorts it by identical values in the first column of the data file
- inserts the sorted data at a specified line in a different template text file
- saves the file in as many copies as there are distinct values in the first column of the data file
(A picture in the original post illustrated this.)
But there are two more things I need to do. When, in the separate files produced above, some of the values from the second column of the data file are the same, the file should insert the value from the third column instead of repeating the same value from the second column. (A second picture showed how this should look.)
I also need to add, somewhere, the value of the first column from the data file, separated by "_".
Here is the data file:
111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK
113_0,2137,TRE
113_0,2137,OMG
and here is the code I made:
import shutil

with open("data.csv") as f:
    contents = f.read()
contents = contents.splitlines()
values_per_baseline = dict()
for line in contents:
    key = line.split(',')[0]
    values = line.split(',')[1:]
    if key not in values_per_baseline:
        values_per_baseline[key] = []
    values_per_baseline[key].append(values)
for file in values_per_baseline.keys():
    x = 3
    shutil.copyfile("of.txt", (f"of_%s.txt" % file))
    filename = f"of_%s.txt" % file
    for values in values_per_baseline[file]:
        with open(filename, "r") as f:
            contents = f.readlines()
        contents.insert(x, '      o = ' + values[0] + '\n          ' + 'a = ' + values[1] + '\n')
        with open(filename, "w") as f:
            contents = "".join(contents)
            f.write(contents)
            f.close()
I have been trying to make something like a dictionary of dictionaries of lists, but I can't implement it in the correct way to make it work.
When I run your code, I get this error:
contents.insert(x, ' o = ' + values[0] + '\n ' + 'a = ' + values[3] +'\n')
IndexError: list index out of range
Let's think about where this error is coming from. It is an IndexError on a list. The only list used on this line is values, so that seems like a good place to start looking.
To debug, you can consider adding something like this before the line that is spitting the error:
print(values)
print(values[0])
print(values[3])
which gives
['3005', 'QWE']
3005
Traceback (most recent call last):
  File "qqq.py", line 25, in <module>
    print(values[3])
IndexError: list index out of range
So the problem is with values[3], which makes sense since len(values)==2 and so the indices need to be 0 and 1. If we change values[3] to values[1] then I think you get what you want. e.g.:
$ cat of_111_0.txt
line
line
line
o = 3006
a = LFR
o = 3006
a = SDE
o = 3005
a = QWE
line
line
line
line
line
To get to the next step in your problem, I would suggest you change your first loop to:
for line in contents:
    key = line.split(',')[0]
    values = line.split(',')[1:]
    if key not in values_per_baseline:
        values_per_baseline[key] = {}
    if values[0] not in values_per_baseline[key]:
        values_per_baseline[key][values[0]] = values[1]
    else:
        values_per_baseline[key][values[0]] += '<COMMA>' + values[1]
That gives you the dictionary:
{'111_0': {'3005': 'QWE',
           '3006': 'SDE<COMMA>LFR'},
 '111_1': {'3005': 'QWE',
           '5345': 'JTR'},
 '112_0': {'3103': 'JPP',
           '3343': 'PDK'},
 '113_0': {'2137': 'TRE<COMMA>OMG'}}
Then when writing to the file, you would need to change your loop to:
for key in values_per_baseline[file]:
    # sp is assumed to be a single space string, e.g. sp = ' '
    contents.insert(x, f'{6*sp}o = {key}\n{10*sp}a = {values_per_baseline[file][key]}\n')
And your file now looks like:
line
line
line
o = 3006
a = SDE<COMMA>LFR
o = 3005
a = QWE
line
line
line
line
line
Other things you could do
Now, there are a couple of things you can do to streamline your code while keeping it readable.*
On lines 10 and 11, there is no need to use line.split twice. Just add a line like split_line = line.split(',') and then have key = split_line[0] and values = split_line[1:]. (You could do away with key and values altogether and just reference split_line[0] and split_line[1:], but that would make your code less readable.)
On line 17, you are defining x in every loop. Just take it out of the loop.
On lines 12 and 13, you are first using (f"of_%s.txt" % file) and then defining filename on the next line. I suggest you define filename first and then just have shutil.copyfile("of.txt", filename). Also, you are mixing f-strings with %-formatting; you could just write filename = f"of_{file}.txt".
On line 23, you could change your insert command to an f-string (if you find it more readable). For example: contents.insert(x, f'{6*sp}o = {values[0]}\n{10*sp}a = {values[1]}\n')
At the end, in your for file in values_per_baseline.keys() loop, you are opening and closing files far more often than you need to. You can reorder your operations:
with open(filename, "r") as f:
    contents = f.readlines()
for values in values_per_baseline[file]:
    contents.insert(x, '      o = ' + values[0] + '\n          ' + 'a = ' + values[1] + '\n')
with open(filename, "w") as f:
    contents = "".join(contents)
    f.write(contents)
*For a short script like this, I would argue that making sure it is readable is more important than making sure it is efficient, since you will want to be able to come back in 3 weeks or 3 years and understand what you did. For that reason, I would also recommend you comment what you did.

Remove linebreak in csv

I have a CSV file that has errors. The most common one is a premature line break.
But now I don't know how best to remove it. If I read the file line by line with
with open("test.csv", "r") as reader:
    test = reader.read().splitlines()
the wrong structure is already in my variable. Is this still the right approach? If so, do I loop over test and build a copy, or can I manipulate the test variable directly while iterating over it?
I can identify the corrupt lines by the semicolon: some rows end with a ; and others start with it. So maybe counting fields would be an alternative way to solve it.
EDIT:
I replaced reader.read().splitlines() with reader.readlines() so I could handle the rows which end with a ;:
for line in lines:
    if("Foobar" in line):
        line = line.replace("Foobar", "")
    if(";\n" in line):
        line = line.replace(";\n", ";")
The only thing that remains is the rows that begin with a ;, since for those I need to go back one entry in the list.
Example:
Col_a;Col_b;Col_c;Col_d
2021;Foobar;Bla
;Blub
Blub belongs in the row above.
Here's a simple Python script to merge lines until you have the desired number of fields.
import sys

sep = ';'
fields = 4

collected = []
for line in sys.stdin:
    new = line.rstrip('\n').split(sep)
    if collected:
        collected[-1] += new[0]
        collected.extend(new[1:])
    else:
        collected = new
    if len(collected) < fields:
        continue
    print(';'.join(collected))
    collected = []
This simply reads from standard input and prints to standard output. If the last line is incomplete, it will be lost.
The separator and the number of fields can be edited into the variables at the top; exposing them as command-line parameters is left as an exercise.
If you wanted to keep the newlines, it would not be too hard to only strip a newline from the last fields, and use csv.writer to write the fields back out as properly quoted CSV.
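Assuming the script is saved as merge_lines.py (the name here is just an assumption), it runs as a filter:
python merge_lines.py < broken.csv > fixed.csv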
This is how I deal with this. The following function fixes a line if there are more columns than needed or if there is a line break in the middle.
The parameters of the function are:
message - content of the file (reader.read() in your case)
columns - number of expected columns
filename - the file name (I use it for logging)
def pre_parse(message, columns, filename):
    parsed_message = []
    i = 0
    temp_line = ''
    for line in message.splitlines():
        #print(line)
        split = line.split(',')
        if len(split) == columns:
            parsed_message.append(line)
        elif len(split) > columns:
            print(f'Line {i} has been truncated in file {filename} - too many columns')
            split = split[:columns]
            line = ','.join(split)
            parsed_message.append(line)
        elif len(split) < columns and temp_line == '':
            temp_line = line.replace('\n', '')
            print(temp_line)
        elif temp_line != '':
            line = temp_line + line
            if line.count(',') == columns - 1:
                print(f'Line {i} has been fixed in file {filename} - extra line feed')
                parsed_message.append(line)
                temp_line = ''
            else:
                temp_line = line.replace('\n', '')
        i += 1
    return parsed_message
Make sure you use the proper split character and the proper line feed character.
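A minimal usage sketch, assuming a 4-column file named test.csv (both the name and the column count are assumptions):
with open("test.csv", "r") as reader:
    fixed = pre_parse(reader.read(), 4, "test.csv")
with open("test_fixed.csv", "w") as writer:
    writer.write('\n'.join(fixed))  # write the repaired rows back out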

[Help]Extract from csv file and output txt(python)[Help]

I read the file 'average-latitude-longitude-countries.csv' and want the country names of the countries in the Southern Hemisphere written to the file 'result.txt'.
Question:
I want the output fixed so that it is printed according to the image in the original post (one country name per line).
infile = open("average-latitude-longitude-countries.csv","r")
outfile = open("average-latitude-longitude-countries.txt","w")
joined = []
infile.readline()
for line in infile:
splited = line.split(",")
if len(splited) > 4:
if float(splited[3]) < 0:
joined.append(splited[2])
outfile.write(str(joined) + "\n")
else:
if float(splited[2]) < 0:
joined.append(splited[1])
outfile.write(str(joined) + '\n')
It's hard without the head/first few lines of the CSV posted.
However, assuming your code works and the countries list is successfully populated, then
you can replace the line
outfile.write(str(joined) + '\n')
with:
outfile.write("\n".join(joined))
OR
with these 2 lines:
for country in joined:
    outfile.write("%s\n" % country)
Keep in mind, these approaches just do the job; they are not optimal.
Extra hints:
You can have a look at the csv module of standard Python; it can make your parsing easier.
Also, splited = line.split(",") can lead to wrong output if there is a quoted field that contains ",",
like this: field1_value,"field 2,value",field3,field4,...
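For illustration, a minimal sketch using the csv module (the filename is taken from the question; the loop body is elided):
import csv

with open("average-latitude-longitude-countries.csv", "r") as infile:
    reader = csv.reader(infile)
    next(reader)  # skip the header row
    for splited in reader:
        # splited is already a list of fields, with quoted commas handled correctly
        print(splited)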
Update:
Now I get you. First of all, you are dumping the whole aggregated array to the file for each line you read.
You should keep adding to the array inside the loop, then after the whole loop, dump it once (joining the accumulated array as above).
Here is your code slightly modified:
infile = open("average-latitude-longitude-countries.csv","r")
outfile = open("average-latitude-longitude-countries.txt","w")
joined = []
infile.readline()
for line in infile:
splited = line.split(",")
if len(splited) > 4:
if float(splited[3]) < 0:
joined.append(splited[2])
#outfile.write(str(joined) + "\n")
else:
if float(splited[2]) < 0:
joined.append(splited[1])
#outfile.write(str(joined) + '\n')
outfile.write("\n".join(joined))

First of two blocks of code of reading from a file not executing in python

As part of an assignment I'm writing an assembler in Python that takes simplified assembly language and outputs binary machine language. Part of my code is below, where I read the assembly code in two passes. The first pass (the first with open(filename, "r") as asm_file block) doesn't seem to be executing: the print command in it doesn't output anything. The second one executes fine, though it's not outputting the correct binary, because the first block doesn't seem to be running correctly or at all. Am I using with open(filename, "r") as file: correctly? What am I missing? Thanks in advance.
For completeness an input file is given below the code:
if __name__ == "__main__":
    #fill Symbol Table and C instruction Tables
    symbol_table = symbolTable()
    symbol_table.initialiseTable()
    comp_table = compTable()
    comp_table.fillTable()
    dest_table = destTable()
    dest_table.fillTable()
    jump_table = jumpTable()
    jump_table.fillTable()
    #import the file given in the command line
    filename = sys.argv[-1]
    #open output_file
    output_file = open('output.hack', 'w')
    #open said file and work on contents line by line
    with open(filename, "r") as asm_file:  ##### This one doesn't seem to run because
        #1st pass of input file            ##### the print command below doesn't output anything
        num_instructions = -1
        for line in asm_file:
            #ignoring whitespace and comments
            if line != '\n' and not line.startswith('//'):
                num_instructions += 1
                #remove in-line comments
                if '//' in line:
                    marker, line = '//', line
                    line = line[:line.index(marker)].strip()
                    #search for beginning of pseudocommand
                    if line.startswith('('):
                        num_instructions -= 1
                        label = line.strip('()')
                        address = num_instructions + 1
                        symbol_table.addLabelAddresses(label, address)
                        print(num_instructions)  ###### This print command doesn't output anything
    with open(filename, "r") as asm_file:
        #2nd pass of input file
        for line in asm_file:
            #ignoring whitespace and comments
            if line != '\n' and not line.startswith('//') and not line.startswith('('):
                #remove in-line comments
                if '//' in line:
                    marker, line = '//', line
                    line = line[:line.index(marker)].strip()
                #send each line to parse function to unpack into its underlying fields
                instruction = parseLine(line.strip(' \n'))
                inst = Instruction(instruction)
                binary_string = inst.convertToBin()
                #write to output file
                output_file.write(binary_string + '\n')
    output_file.close()
An input file example:
// This file is part of www.nand2tetris.org
// and the book "The Elements of Computing Systems"
// by Nisan and Schocken, MIT Press.
// File name: projects/06/max/Max.asm
// Computes R2 = max(R0, R1) (R0,R1,R2 refer to RAM[0],RAM[1],RAM[2])
   @R0
   D=M              // D = first number
   @R1
   D=D-M            // D = first number - second number
   @OUTPUT_FIRST
   D;JGT            // if D>0 (first is greater) goto output_first
   @R1
   D=M              // D = second number
   @OUTPUT_D
   0;JMP            // goto output_d
(OUTPUT_FIRST)
   @R0
   D=M              // D = first number
(OUTPUT_D)
   @R2
   M=D              // M[2] = D (greatest number)
(INFINITE_LOOP)
   @INFINITE_LOOP
   0;JMP            // infinite loop
Your problem seems to be that your code checks if a line starts with a (, but in the assembly there is whitespace before an instruction, so that check doesn't work. You should probably do a line = line.strip() after your first if statement, like so:
with open(filename, "r") as asm_file:
num_of_instructions = -1
for line in asm_file
if line != "\n":
line.strip()
#rest of code
Incidentally, is the print statement supposed to execute every time it finds a line? Because if it isn't, you should put it after the for loop. That is part of why it is not outputting anything.
Edit: As @TimPeters says, the print statement will also only execute if the line starts with an open bracket and has a comment in it.
In the first with, starting with
#search for beginning of pseudocommand
are you quite sure you don't want that and the following lines dedented a level?
As is, the only way to get to your print is if a line satisfies both
if '//' in line:
and
if line.startswith('('):
There are no lines in your input file satisfying both, so the print never executes.
In the second with, there are only two indented lines after its
if '//' in line:
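Putting both answers together, here is a sketch of the first pass with those lines dedented a level and the whitespace stripped (a sketch of the likely intent, not the assignment's required structure; symbol_table comes from the question's code):
with open(filename, "r") as asm_file:
    #1st pass of input file
    num_instructions = -1
    for line in asm_file:
        line = line.strip()
        #ignoring whitespace and comments
        if line != '' and not line.startswith('//'):
            num_instructions += 1
            #remove in-line comments
            if '//' in line:
                line = line[:line.index('//')].strip()
            #search for beginning of pseudocommand
            if line.startswith('('):
                num_instructions -= 1
                label = line.strip('()')
                address = num_instructions + 1
                symbol_table.addLabelAddresses(label, address)
    print(num_instructions)  # now runs once, after the loop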

Why are there two different versions of my list in my code?

I am parsing a file into memory, editing it, removing multiple entries, newlines, etc., then writing it to a new file.
For some reason, though, the line mystatement = parsedoc[i]==parsedoc[j] always returns False. It should check the next 20 available lines (without reaching outside the list), and if they match, it should remove the duplicate. However, when I do print parsedoc[i], parsedoc[j], parsedoc[j] still has the newline at the end, which I should have removed on a previous line, and which also does not show up in parsedoc[i]. I can rearrange my code to avoid this, but why is it happening?
Code:
#print "What file would you like to open?" #comment this and the next line back in
filename = "97_03_10.log" #raw_input("? ")
f = open(filename,'r')
filelines = f.readlines()
filedata = [len(filelines)]
parsedoc = []
del f
for line in filelines:
parsedoc.append(line.split("\t")[1:])
#del filelines
for i in range(20):#len(parsedoc)-1): #this is where the magic happens
if (not parsedoc[i]):
print True
continue
parsedoc[i][1] = parsedoc[i][1].replace("\n","")
if (parsedoc[i][1]==""):#remove empty entries
parsedoc[i] = []
continue
for j in range(i+1,i+(20 if (20+i<len(parsedoc)) else (len(parsedoc)-i-1))):
mystatement = parsedoc[i]==parsedoc[j]
print parsedoc[i],parsedoc[j]
if mystatement:
parsedoc[j] = []
#for line in parsedoc:
# print line
parsedoc = filter(None,parsedoc)
filedata.append(len(parsedoc))
print "Originally",
print filedata[0],
print "lines."
print "Currently",
print filedata[1],
print "lines."
for line in parsedoc[:20]:
print line
Output: just a heads up, these are raw search results. There are swear words, and the usual suspects you'd get if you took everyone's search results and compiled them.
http://pastebin.com/KBMudX7f
The first 40 or so lines of my input file, for testing: again, there are swear words and other undesirable words. NSFW.
http://pastebin.com/AgxnBMtF
You're removing the newline characters inside the loop on the ith elements, and since j starts at i+1, when you compare the elements at indices i and j, one will be stripped, and one won't.
Changing your initialization of parsedoc to:
for line in filelines:
    parsedoc.append(line.strip().split("\t")[1:])
will strip the newline from every line, before the for i / for j loops.
This also means you can get rid of parsedoc[i][1] = parsedoc[i][1].replace("\n","").
With this edit, you'll get:
Originally 49 lines.
Currently 44 lines.
Edit: You can use the csv package to re-write your code as:
import csv

#print "What file would you like to open?" #comment this and the next line back in
filename = "97_03_10.log" #raw_input("? ")
filedata = []
# Read file into parsedoc
parsedoc = []
with open(filename, 'rb') as f:
    reader = csv.reader(f, delimiter='\t')
    for line in reader:
        parts = line[1:]
        if parts[1] == '': continue
        parsedoc.append(parts)
        print parts
filedata.append(len(parsedoc))
# "Filter" parsedoc
for i, pdi in enumerate(parsedoc[0:20]):            # Slice notation won't raise an
    for j, pdj in enumerate(parsedoc[i+1:i+1+20]):  # IndexError for out-of-bounds
        #print pdi, pdj
        if pdi == pdj:
            print("Element match found at i=%d, j=%d: %s" % (i, i+1+j, pdi))
            del parsedoc[i+1+j]  # delete by absolute index; j alone indexes the slice, not the list
filedata.append(len(parsedoc))
print("Originally %d lines." % filedata[0])
print("Currently %d lines." % filedata[1])
