Concatenate text file lines with condition in python

I have a text file in this format:
0.jpg 12,13,14,15,16
0.jpg 13,14,15,16,17
1.jpg 1,2,3,4,5
1.jpg 2,3,4,5,6
I want to check if the image name is the same and then concatenate those lines into one line with the following format:
0.jpg 12,13,14,15,16 13,14,15,16,17
1.jpg 1,2,3,4,5 2,3,4,5,6
I have tried something like this, but I don't know how to do the actual comparison, and I also don't quite know what logic to apply, since the first line_elements[0] would have to be taken and compared with every other line's line_elements[0]:
with open("file.txt", "r") as input:  # Read all data lines.
    data = input.readlines()

with open("out_file.txt", "w") as output:  # Create output file.
    for line in data:  # Iterate over data lines.
        line_elements = line.split()  # Split line by spaces.
        line_updated = [line_elements[0]]  # Initialize fixed line (without undesired patterns) with image's name.
        if line_elements[0] = (next line's line_elements[0])???:
            for i in line_elements[1:]:  # Iterate over groups of numbers in current line.
                tmp = i.split(',')  # Split current group by commas.
                if len(tmp) == 5:
                    line_updated.append(','.join(tmp))
            if len(line_updated) > 1:  # If the fixed line is valid, write it to output file.
                output.write(f"{' '.join(line_updated)}\n")
Could be something like:

for i in range(len(data)):
    if line_elements[0] in line[i] == line_elements[0] in line[i+1]:
        line_updated = [line_elements[0]]
        for i in line_elements[1:]:  # Iterate over groups of numbers in current line.
            tmp = i.split(',')  # Split current group by commas.
            if len(tmp) == 5:
                line_updated.append(','.join(tmp))
        if len(line_updated) > 1:  # If the fixed line is valid, write it to output file.
            output.write(f"{' '.join(line_updated)}\n")

Save the first field of the line in a variable. Then check if the first field of the current line is equal to the saved value. If it is, append the new values to it; otherwise write out the saved line and start a new output line.
current_name = None
with open("out_file.txt", "w") as output:
    for line in data:
        name, values = line.split()
        if name == current_name:
            current_values += ' ' + values
            continue
        if current_name:
            output.write(f'{current_name} {current_values}\n')
        current_name, current_values = name, values
    # Write the last block.
    if current_name:
        output.write(f'{current_name} {current_values}\n')
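For what it's worth, the same grouping can be done with itertools.groupby, assuming lines with the same image name are adjacent, as they are in the sample file (the file names follow the code above); a minimal sketch:

from itertools import groupby

with open("file.txt") as infile, open("out_file.txt", "w") as outfile:
    # Split each non-empty line into (name, rest-of-line).
    pairs = (line.split(maxsplit=1) for line in infile if line.strip())
    # Group consecutive lines that share the same name.
    for name, group in groupby(pairs, key=lambda p: p[0]):
        values = ' '.join(p[1].strip() for p in group)
        outfile.write(f"{name} {values}\n")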

Related

Remove text lines and strip lines with condition in python

I have a text file in this format:
000000.png 712,143,810,307,0
000001.png 599,156,629,189,3 387,181,423,203,1 676,163,688,193,5
000002.png 657,190,700,223,1
000003.png 614,181,727,284,1
000004.png 280,185,344,215,1 365,184,406,205,1
I want to remove the lines that don't have a [number1,number2,number3,number4,1] or [number1,number2,number3,number4,5] ending, and also strip each remaining line by removing the blocks -> [number1,number2,number3,number4,number5] that don't fulfill this condition.
The above text file should look like this in the end:
000001.png 387,181,423,203,1 676,163,688,193,5
000002.png 657,190,700,223,1
000003.png 614,181,727,284,1
000004.png 280,185,344,215,1 365,184,406,205,1
My code:
import os

with open("data.txt", "r") as input:
    with open("newdata.txt", "w") as output:
        # Iterate over all lines from the file.
        for line in input:
            # If the substring is contained in the line, then don't write it.
            if ",0" or ",2" or ",3" or ",4" or ",6" not in line.strip("\n"):
                output.write(line)
I have tried something like this, and it obviously didn't work.
No need for Regex, this might help you:
with open("data.txt", "r") as input:  # Read all data lines.
    data = input.readlines()

with open("newdata.txt", "w") as output:  # Create output file.
    for line in data:  # Iterate over data lines.
        line_elements = line.split()  # Split line by spaces.
        line_updated = [line_elements[0]]  # Initialize fixed line (without undesired patterns) with image's name.
        for i in line_elements[1:]:  # Iterate over groups of numbers in current line.
            tmp = i.split(',')  # Split current group by commas.
            if len(tmp) == 5 and (tmp[-1] == '1' or tmp[-1] == '5'):
                line_updated.append(i)  # If the pattern is OK, append group to fixed line.
        if len(line_updated) > 1:  # If the fixed line is valid, write it to output file.
            output.write(f"{' '.join(line_updated)}\n")
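Equivalently, the group test can be pulled into a small comprehension; a sketch using the same file names:

with open("data.txt") as infile, open("newdata.txt", "w") as outfile:
    for line in infile:
        if not line.strip():
            continue  # Skip blank lines.
        name, *groups = line.split()
        # Keep only the 5-number groups whose last number is 1 or 5.
        kept = [g for g in groups
                if len(g.split(',')) == 5 and g.split(',')[-1] in ('1', '5')]
        if kept:
            outfile.write(f"{name} {' '.join(kept)}\n")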

How can I edit several numbers/words in a txt file using python?

I want to rewrite an existing file with things like:
Tom A
Mike B
Jim C
to
Tom 1
Mike 2
Jim 3
The letters A, B, C can also be something else. Basically I want to keep the spacing between the names and what comes after, but change the letters to numbers. Does someone have an idea, please? Thanks a lot for your help.
I assume your first and second columns are separated by a tab (i.e. \t)?
If so, you can do this by reading the file into a list, using the split function to split each line into components, editing the second component of each line, concatenating the two components back together with a tab separator, and finally rewriting to the file.
For example, if test.txt is your input file:
# Create a list that holds the desired output.
output = [1, 2, 3]

# Open the file to be overwritten.
with open('test.txt', 'r') as f:
    # Read file into a list of strings (one string per line).
    text = f.readlines()

# Open the file for writing (FYI this CLEARS the file, as we specify 'w').
with open('test.txt', 'w') as f:
    # Loop over lines (i.e. elements) in `text`.
    for i, item in enumerate(text):
        # Split line into elements based on whitespace (default for `split`).
        line = item.split()
        # Concatenate the name and desired output with a tab separator and write to the file.
        f.write("%s\t%s\n" % (line[0], output[i]))
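If the replacement values are simply the 1-based line numbers, as in the example above, the prepared output list can be dropped in favour of enumerate's counter; a minimal sketch under that assumption:

with open('test.txt', 'r') as f:
    text = f.readlines()

with open('test.txt', 'w') as f:
    # enumerate(..., start=1) yields 1, 2, 3, ... alongside each line.
    for i, item in enumerate(text, start=1):
        f.write("%s\t%d\n" % (item.split()[0], i))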
I assumed your first and second columns were separated by spaces in the file.
You can read the file contents into a list, then use the function replace_end(line, newline) to replace the end of each line with the text you pass in. Then you can just write the changed list back out to the file.

""" rewrite an existing file """

def main():
    """ main """
    filename = "update_me.txt"
    count = 0
    lst = []
    with open(filename, "r", encoding="utf-8") as filestream:
        _lines = filestream.readlines()
        for line in _lines:
            lst.insert(count, line.strip())
            count += 1
            # print(f"Line {count} {line.strip()}")
    count = 0
    # Change the list.
    for line in lst:
        lst[count] = replace_end(line, "ABC")
        count += 1
    with open(filename, "w", encoding="utf-8") as filestream:
        for line in lst:
            filestream.write(line + "\n")

def replace_end(line, newline):
    """ replace the end of a line """
    return line[:-len(newline)] + newline

if __name__ == '__main__':
    main()
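One thing to note about replace_end: it overwrites the last len(newline) characters of the line, whatever they are, so the length of the replacement matters. A quick check:

print(replace_end("Tom A", "1"))    # -> 'Tom 1'  (the single char 'A' is replaced)
print(replace_end("Mike B", "23"))  # -> 'Mike23' (the space is consumed as well)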

Remove linebreak in csv

I have a CSV file that contains errors. The most common one is a premature line break.
But now I don't know how to remove it cleanly. If I read the file line by line

with open("test.csv", "r") as reader:
    test = reader.read().splitlines()

the wrong structure is already in my variable. Is this still the right approach? Do I loop over test and create a copy, or can I manipulate test directly while iterating over it?
I can identify the corrupt lines by the semicolon: some rows end with a ; and others start with one. So maybe counting fields would be an alternative way to solve it?
EDIT:
I replaced reader.read().splitlines() with reader.readlines() so I could handle the rows which end with a ;:

for line in lines:
    if "Foobar" in line:
        line = line.replace("Foobar", "")
    if ";\n" in line:
        line = line.replace(";\n", ";")
The only thing that remains is the rows that begin with a ;, since for those I need to go back one entry in the list.
Example:
Col_a;Col_b;Col_c;Col_d
2021;Foobar;Bla
;Blub
Blub belongs in the row above.
Here's a simple Python script to merge lines until you have the desired number of fields.
import sys

sep = ';'
fields = 4

collected = []
for line in sys.stdin:
    new = line.rstrip('\n').split(sep)
    if collected:
        collected[-1] += new[0]
        collected.extend(new[1:])
    else:
        collected = new
    if len(collected) < fields:
        continue
    print(sep.join(collected))
    collected = []
This simply reads from standard input and prints to standard output. If the last line is incomplete, it will be lost.
The separator and the number of fields can be edited in the variables at the top; exposing them as command-line parameters is left as an exercise.
If you wanted to keep the newlines, it would not be too hard to only strip the newline from the last field, and use csv.writer to write the fields back out as properly quoted CSV.
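A sketch of that variant, keeping the embedded newline inside the merged field and letting csv.writer handle the quoting (same assumptions: ';' separator, 4 fields, stdin to stdout):

import csv
import sys

sep = ';'
fields = 4

writer = csv.writer(sys.stdout, delimiter=sep)
collected = []
for line in sys.stdin:
    new = line.rstrip('\n').split(sep)
    if collected:
        # Put the stripped newline back so the merged field keeps it;
        # csv.writer quotes the field because it contains a newline.
        collected[-1] += '\n' + new[0]
        collected.extend(new[1:])
    else:
        collected = new
    if len(collected) < fields:
        continue
    writer.writerow(collected)
    collected = []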
This is how I deal with it: the function below fixes a line if it has more columns than expected or a line break in the middle.
Parameters of the function are:
message - content of the file - reader.read() in your case
columns - number of expected columns
filename - filename (I use it for logging)
def pre_parse(message, columns, filename):
    parsed_message = []
    i = 0
    temp_line = ''
    for line in message.splitlines():
        # print(line)
        split = line.split(',')
        if len(split) == columns:
            parsed_message.append(line)
        elif len(split) > columns:
            print(f'Line {i} has been truncated in file {filename} - too many columns')
            split = split[:columns]
            line = ','.join(split)
            parsed_message.append(line)
        elif len(split) < columns and temp_line == '':
            temp_line = line.replace('\n', '')
            print(temp_line)
        elif temp_line != '':
            line = temp_line + line
            if line.count(',') == columns - 1:
                print(f'Line {i} has been fixed in file {filename} - extra line feed')
                parsed_message.append(line)
                temp_line = ''
            else:
                temp_line = line.replace('\n', '')
        i += 1
    return parsed_message
Make sure you use the proper split character and the proper line feed character.
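A minimal usage sketch, assuming a 4-column file like the one in the question (note that pre_parse splits on ',', so adjust it for ';'-separated data; test_fixed.csv is a hypothetical output name):

with open("test.csv", "r") as reader:
    fixed_rows = pre_parse(reader.read(), 4, "test.csv")

with open("test_fixed.csv", "w") as writer:
    writer.write("\n".join(fixed_rows) + "\n")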

Delete paragraph containing string in python

I have a file that contains blocks of information beginning and ending with the same phrase:
# Info block
Info line 1
Info line 2
Internal problem
ENDOFPARAMETERPOINT
I am trying to write a python code that deletes the entire block beginning with # Info block and ending with ENDOFPARAMETERPOINT once it detects the phrase Internal problem.
import re

finds = '# Info block\nInfo line 1\nInfo line 2\nInternal problem\nENDOFPARAMETERPOINT'

with open(filename, "r+") as fp:
    pattern = re.compile(r'[,\s]+' + re.escape(finds) + r'[\s]+')
    textdata = fp.read()
    line = re.sub(pattern, '', textdata)
    fp.seek(0)
    fp.write(line)
This code only works for one line but not the entire paragraph. Any suggestions are appreciated.
EDIT:
The code that works now is:
with open(filename, "r+") as fp:
    pattern = re.compile(re.escape(finds))
    textdata = fp.read()
    line = re.sub(pattern, '', textdata)
    fp.seek(0)
    fp.write(line)
    fp.truncate()
Why can't you just use pattern = re.compile(re.escape(finds))?
You can use two lists, start_indexes and stop_indexes, which contain respectively the start indexes to remove from and the end indexes to remove to. Then you can merge the two lists with zip to get pairs, where each pair holds the start and end index of the lines to be removed. For each pair, you can build a list of the lines in that index range and then remove those values from the original list.
In this example the text to be processed divided into lines is stored in vals.
vals = ['string', '#blabla', 'ciao', 'miao', 'bau', 'ENDOFPARAMETERPOINT', 'as']

start_indexes = []
stop_indexes = []
for index, line in enumerate(vals):
    if line[0] == '#':
        start_indexes.append(index)
    elif line == 'ENDOFPARAMETERPOINT':
        stop_indexes.append(index)

for start, stop in zip(start_indexes, stop_indexes):
    values_to_remove = [vals[x] for x in range(start, stop + 1)]
    for v in values_to_remove:
        vals.remove(v)
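One caveat with this approach: vals.remove(v) deletes the first occurrence of a value, so if the same line text appears both inside and outside a block (or in two blocks), the wrong entry can be removed. Deleting by index avoids that; a sketch using the same start_indexes/stop_indexes:

# Delete each block by index slice, walking backwards so that the
# earlier start/stop indexes remain valid after each deletion.
for start, stop in reversed(list(zip(start_indexes, stop_indexes))):
    del vals[start:stop + 1]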

Remove entire row from CSV (if blank)

I have a data file from an instrument that outputs as a CSV. Reading the file and the corresponding columns is no issue; however, due to a slight change in instrumentation, the data file has changed and I'm not sure how to adapt my code to still read it.
import numpy as np

f = open('Rotator_050816.dat')
lines = f.readlines()

i = 0
while lines[i] != "[Data]\n":
    i += 1
i = i + 2

Temp = []; Field = []; Resistance1 = []; Resistance2 = []
while i < len(lines):
    data = lines[i].split(",")
    Temp.append(float(data[3]))
    Field.append(float(data[4]))
    Resistance1.append(float(data[12]))
    Resistance2.append(float(data[13]))
    i += 1

Temp = np.array(Temp)
Field_T = np.array(Field) / 10000.
Resistance1 = np.array(Resistance1)
Resistance2 = np.array(Resistance2)
This is an MWE from previous usage. It has no issue if the CSV file has no blank entries; however, if there are blank entries, then len(Resistance1) ≠ len(Temp) and the two cannot be plotted against each other. So my data file now looks like this:
Example Data File
So I need to add code that checks whether a row's Res. Ch1 or Res. Ch2 entry is empty and, if so, skips that entire row for all variables before appending to the final data set. This way len(Resistance1) = len(Temp) and each Res. Ch1 measurement matches up with the right temperature.
1) Open the file in read-only mode and get all the lines:

lines_in_my_file = []
with open("my_file.csv", "r") as my_file:
    lines_in_my_file = my_file.readlines()

2) Open the file again, this time in write mode, and write all non-blank lines back into the file:

with open("my_file.csv", "w") as my_file:
    for line in lines_in_my_file:
        if line.strip().strip(",") != "":
            my_file.write(line)
Keep in mind, this will remove any line that's made up of just spaces, tabs, or commas. So any rows that look like these:
,,,, (this line has only commas)
(this line has only spaces)
\n (this line is just a newline character)
...will be deleted.
Here is my working solution that I have implemented:
while i < len(lines):
    data = lines[i].split(",")
    if float(data[4]) > 30000 and float(data[4]) < 50000:
        Temp_II.append(float(data[3]))      # As K
        Field_II.append(float(data[4]))     # As Oe
        Position_II.append(float(data[5]))  # As degree
        # Resistivity1 column cleanup.
        if data[12] != '':
            Resistivity1_II.append(float(data[12]))
            Temp1_II.append(float(data[3]))
        # Resistivity2 column cleanup.
        if data[13] != '':
            Resistivity2_II.append(float(data[13]))
            Temp2_II.append(float(data[3]))
    i += 1
Basically, this pairs up the Resistivity1 entries that are not blank with the corresponding Temperature entries, and does the same for Resistivity2.
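For reference, the same blank-entry skipping reads naturally with the csv module. A minimal sketch assuming the column layout from the question (temperature in column 3, Res. Ch1 in column 12); the list names temps/res1 are illustrative, and non-numeric metadata rows are skipped with a try/except:

import csv

temps, res1 = [], []
with open('Rotator_050816.dat') as f:
    for row in csv.reader(f):
        # Skip short rows and rows whose Res. Ch1 entry is blank.
        if len(row) < 13 or row[12].strip() == '':
            continue
        try:
            t, r = float(row[3]), float(row[12])
        except ValueError:
            continue  # Non-numeric row (e.g. the column header line).
        temps.append(t)
        res1.append(r)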
