Changing a text file and making a bigger text file in Python

I have a tab-separated text file like this example:
infile:
chr1 + 1071396 1271396 LOC
chr12 + 1101483 1121483 MIR200B
I want to divide the interval between columns 3 and 4 of infile into 100 equal parts, producing 100 rows for each row of infile, and write them to a new file named newfile.
The final file should be tab-separated with 6 columns. The first 5 columns are as in infile, except that columns 3 and 4 hold the start and end of each part; the 6th column is the 5th column followed by _part and the part number (1 to 100).
This is the expected output file:
expected output:
chr1 + 1071396 1073396 LOC LOC_part1
chr1 + 1073396 1075396 LOC LOC_part2
.
.
.
chr1 + 1269396 1271396 LOC LOC_part100
chr12 + 1101483 1101683 MIR200B MIR200B_part1
chr12 + 1101683 1101883 MIR200B MIR200B_part2
.
.
.
chr12 + 1121283 1121483 MIR200B MIR200B_part100
I wrote the following code to get the expected output but it does not return what I expect.
file = open('infile.txt', 'rb')
cont = []
for line in file:
    cont.append(line)
newfile = []
for i in cont:
    percent = (i[3]-i[2])/100
    for j in percent:
        newfile.append(i[0], i[1], i[2], i[2]+percent, i[4], i[4]_'part'percent[j])
with open('output.txt', 'w') as f:
    for i in newfile:
        for j in i:
            f.write(i + '\n')
Do you know how to fix the problem?

Try this:
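with open('infile.txt', 'r') as file:  # text mode, not binary
    cont = []
    for line in file:
        # split() with no argument splits on any run of whitespace (tabs or spaces)
        fields = line.split()
        if fields:  # skip blank lines
            cont.append(fields)
newfile = []
for i in cont:
    start, end = int(i[2]), int(i[3])
    diff = (end - start) // 100  # integer width of each part
    left = start
    right = start + diff
    for j in range(100):
        # one output row as a list of strings; part numbers run from 1 to 100
        newfile.append([i[0], i[1], str(left), str(right), i[4], i[4] + '_part' + str(j + 1)])
        left = right
        right = right + diff
with open('output.txt', 'w') as f:
    for i in newfile:
        f.write('\t'.join(i) + '\n')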
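In your code, each i in cont is a whole line rather than a list of columns, so i[2] and i[3] are single characters (single byte values in Python 3, since the file was opened with 'rb'), not the numbers in columns 3 and 4.
To fix that, split each line into its fields first, and convert the coordinate columns to int before doing arithmetic on them.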
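A quick way to see the difference is to index a raw line versus its split fields. This sketch uses the first sample row from the question, assuming the columns are tab-separated:

```python
# first sample row from the question, assuming tab-separated columns
line = "chr1\t+\t1071396\t1271396\tLOC\n"

# indexing the raw string gives single characters, not columns
print(line[2], line[3])  # r 1

# splitting first gives whole columns
fields = line.split()
print(fields[2], fields[3])  # 1071396 1271396

# the columns must still be converted to int before subtracting
print(int(fields[3]) - int(fields[2]))  # 200000
```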

Here are some suggestions:
When you open the file, open it as a text file, not a binary file:
open('infile.txt','r')
Now, when you read it line by line, strip the newline character at the end with strip(). Then split the line on tabs with split('\t'), so that you get a list of strings instead of one long string containing the whole line:
line.strip().split('\t')
Now you have:
file = open('infile.txt', 'r')
cont = []
for line in file:
    cont.append(line.strip().split('\t'))
Now cont is a list of lists, where each inner list contains the tab-separated fields of one line, e.g.
cont[1][0] is 'chr12'.
You should be able to take it from here.
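Taking it from there, one way to finish is to turn each split row into 100 sub-rows. This is a minimal sketch, assuming the five-column layout from the question; split_row is a hypothetical helper name, and the literal cont below stands in for the list built from infile.txt:

```python
def split_row(row, parts=100):
    """Split a [chrom, strand, start, end, name] row into `parts` consecutive sub-rows."""
    chrom, strand, start, end, name = row[:5]
    start, end = int(start), int(end)
    step = (end - start) // parts  # width of each part (integer division)
    out = []
    for k in range(parts):
        left = start + k * step
        # the last part always ends at the original end, even if the width doesn't divide evenly
        right = end if k == parts - 1 else left + step
        out.append([chrom, strand, str(left), str(right), name, f"{name}_part{k + 1}"])
    return out


# the two sample rows from the question, as the split lists in cont would hold them
cont = [["chr1", "+", "1071396", "1271396", "LOC"],
        ["chr12", "+", "1101483", "1121483", "MIR200B"]]

with open("newfile", "w") as f:
    for row in cont:
        for sub in split_row(row):
            f.write("\t".join(sub) + "\n")
```

Pinning the last part to the original end coordinate is a design choice: when the interval length is not a multiple of 100, integer division alone would leave the final part short of column 4.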

Others have answered your question with respect to your own code; I thought I would leave my own attempt at solving your problem here.
import os
directory = "C:/Users/DELL/Desktop/"
filename = "infile.txt"
path = os.path.join(directory, filename)
out_path = os.path.join(directory, "outfile.txt")
with open(path, "r") as f_in, open(out_path, "w") as f_out:  # open input and output files
    for line in f_in:
        contents = line.rstrip().split("\t")  # split the line into a list of column strings
        start, end = int(contents[2]), int(contents[3])
        diff = (end - start) / 100
        for i in range(100):
            part_start = int(start + diff * i)
            part_end = int(start + diff * (i + 1))
            temp = f"{contents[0]}\t{contents[1]}\t{part_start}\t{part_end}\t{contents[4]}\t{contents[4]}_part{i+1}"
            f_out.write(temp + "\n")
This code doesn't follow Python style conventions closely (the long lines, for example), but it works. The temp = ... line uses f-strings (formatted string literals, available since Python 3.6) to build the output line conveniently.
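For readers new to f-strings, here is the same kind of line built for a single row; the values are taken from the question's first expected output line:

```python
# values for one output row, taken from the question's expected output
chrom, strand, name = "chr1", "+", "LOC"
part_start, part_end, i = 1071396, 1073396, 0

# each {...} is evaluated and converted to text; \t inserts a tab
temp = f"{chrom}\t{strand}\t{part_start}\t{part_end}\t{name}\t{name}_part{i+1}"
print(temp.split("\t"))  # ['chr1', '+', '1071396', '1073396', 'LOC', 'LOC_part1']
```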

Related

Python function to multiply individual lines in text file

I am trying to write a function that takes every individual line in a txt file and multiplies it by 2, so that each integer in the text file is doubled. So far I was able to get the code to print. However, since I added the code (reading & reading_int) to convert the strings to integers, the function no longer works. No error is raised to tell me what I am doing wrong, and I am not sure what is wrong with reading and reading_int that makes my function not work.
def mult_num3():
    data=[]
    w = open('file3.txt', 'r')
    with w as f:
        reading = f.read()
        reading_int = [int(x) for x in reading.split()]
        for line in f:
            currentline = line[:-1]
            data.append(currentline)
        for i in data:
            w.write(int(i)*2)
    w.close()
file3.txt:
1
2
3
4
5
6
7
8
9
10
Desired output:
2
4
6
8
10
12
14
16
18
20
Problems with the original code:
def mult_num3():
    data=[]
    w = open('file3.txt', 'r')  # only opened for reading, not writing
    with w as f:
        reading = f.read()  # reads whole file
        reading_int = [int(x) for x in reading.split()]  # unused variable
        for line in f:  # file is empty now
            currentline = line[:-1]  # not executed
            data.append(currentline)  # not executed
        for i in data:  # data is empty, so...
            w.write(int(i)*2)  # not executed, can't write an int if it did
                               # and file isn't writable.
    w.close()  # not necessary, 'with' will close it
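The "file is empty now" comment is the crux: f.read() leaves the file position at the end, so a following for line in f yields nothing. This sketch uses io.StringIO as a stand-in for the opened file, and shows that seek(0) rewinds it:

```python
import io

# io.StringIO behaves like a file opened in text mode
f = io.StringIO("1\n2\n3\n")

whole = f.read()       # consumes everything; the position is now at the end
after_read = list(f)   # iterating from the end yields no lines
f.seek(0)              # rewind to the start
after_seek = list(f)   # now every line is read again

print(repr(whole))     # '1\n2\n3\n'
print(after_read)      # []
print(after_seek)      # ['1\n', '2\n', '3\n']
```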
Note that int() ignores leading and trailing whitespace, so there is no need for .split() if there is only one number per line, and a format string (f-string) can produce each output line by converting and doubling the value and adding a newline:
with open('file3.txt', 'r') as f:
    data = [f'{int(line)*2}\n' for line in f]
with open('file3.txt', 'w') as f:
    f.writelines(data)
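The one-liner above relies on int() tolerating whitespace. This small check uses made-up in-memory lines instead of file3.txt, and also shows the one case it does not cover, a blank line:

```python
# int() ignores surrounding whitespace, so the newline on each line is harmless
print(int(" 7\n"))  # 7

# the same conversion the list comprehension performs, on sample lines
doubled = [f"{int(line) * 2}\n" for line in ["1\n", "2\n", "10\n"]]
print(doubled)  # ['2\n', '4\n', '20\n']

# a line containing only whitespace is not a number at all
try:
    int("\n")
    blank_ok = True
except ValueError:
    blank_ok = False  # int() rejects a string that is only whitespace
print(blank_ok)  # False
```

So the one-liner assumes file3.txt has no blank lines; a trailing empty line would raise ValueError.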
I added a try/except to handle non-integer data. I don't know what your data looks like, but maybe it helps you.
def mult_num3():
    # avoid naming a variable input, which shadows the built-in input()
    with open('file3.txt', 'r') as f, open('script_out.txt', 'w') as output:
        for line in f:
            for value in line.split():
                try:
                    output.write(str(int(value) * 2) + " ")
                except ValueError:  # raised by int() for non-integer text
                    output.write("(" + value + ": is not an integer) ")
            output.write("\n")

How to output ONLY new additions between files with Python Difflib?

I am comparing two text files using Difflib like so:
import difflib
new_file = open(file_name, "r")
old_file = open(old_file_name, "r")
file_difference = difflib.ndiff(old_file.readlines(), new_file.readlines())
My goal is to ONLY output additions. I do not want to know about changes to existing lines. However, I've run into a problem: all changes/additions are marked with "+ ", and all subtractions are marked with "- ". I've done a lot of searching, and it appears there's no way to differentiate a line that has been changed from a line that is brand new. I am confused about how to proceed.
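The "? " lines in ndiff output are what make the distinction possible. ndiff emits them only when it pairs a removed line with an added line it judges similar (it uses an internal similarity cutoff), to mark the changed characters; a brand-new line shows up as a lone "+ " line. The sample lists below are made up for illustration:

```python
import difflib

old = ["apple\n", "banana\n"]
new = ["apples\n", "banana\n", "cherry\n"]

diff = list(difflib.ndiff(old, new))
for line in diff:
    print(repr(line))
# "- apple\n" and "+ apples\n" are followed by a "? " hint line (a change),
# "  banana\n" is unchanged, and "+ cherry\n" has no hint (a pure addition)
```

Dissimilar replacements are not paired, so they come out as plain "- " and "+ " lines with no hint and look exactly like a deletion plus an addition.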
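import csv
# read the old file's rows into a set for fast membership tests
with open(old_file_name, "r", newline='') as f1:
    old_rows = {tuple(row) for row in csv.reader(f1)}
# append every row of the new file that does not appear in the old file
# (iterating over f1.read() directly would loop over single characters)
with open(file_name, "r", newline='') as f2, open(output_path, 'a', newline='') as out:
    writer = csv.writer(out)
    for row in csv.reader(f2):
        if tuple(row) not in old_rows:
            writer.writerow(row)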
A great friend of mine provided a code snippet that answered my question:
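import difflib

# Open the files for comparison
with open(file_name, "r") as new_file, open(old_file_name, "r") as old_file:
    # Find the differences between the two files
    file_difference = tuple(difflib.ndiff(old_file.readlines(), new_file.readlines()))

new_lines = []
idx = 0
fdiff_size = len(file_difference)
while idx < fdiff_size:
    line = file_difference[idx]
    if line.startswith("- "):
        # a changed line is emitted as "- old", optional "? hint", "+ new", optional "? hint"
        nxt = idx + 1
        if nxt < fdiff_size and file_difference[nxt].startswith("? "):
            idx = nxt + 1  # skip the hint for the old line
            if idx < fdiff_size and file_difference[idx].startswith("+ "):
                idx += 1  # skip the changed version of the line
            if idx < fdiff_size and file_difference[idx].startswith("? "):
                idx += 1  # skip its hint, if there is one
            continue
        if (nxt + 1 < fdiff_size and file_difference[nxt].startswith("+ ")
                and file_difference[nxt + 1].startswith("? ")):
            # no hint for the old line, but the "+ " line has one: still a change
            idx = nxt + 2
            continue
    elif line.startswith("+ "):
        new_lines.append(line)
    # always iterate after new item or no change
    idx += 1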
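An alternative that avoids parsing ndiff's text markers is to work with difflib.SequenceMatcher opcodes directly. This is a different technique from the snippet above, sketched under one assumption: only 'insert' opcodes count as additions, and 'replace' blocks (lines rewritten in place) are treated as changes and skipped. added_lines is a hypothetical helper name:

```python
import difflib


def added_lines(old_lines, new_lines):
    """Return the lines of new_lines that were inserted between lines of old_lines.

    Only 'insert' opcodes count as additions; 'replace' blocks are treated as changes.
    """
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    added = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            added.extend(new_lines[j1:j2])
    return added


old = ["a\n", "b\n", "c\n"]
new = ["a\n", "x\n", "b\n", "C\n"]  # "x" is new, "c" was changed to "C"
print(added_lines(old, new))  # ['x\n']
```

One limitation: a new line that lands directly next to a rewritten line is grouped into the same 'replace' block and will not be reported.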

Counting to 100,000 and writing that to a file

I haven't used Python for a while, but today I decided to write a program to help me with some work. I am trying to create a program that writes the numbers 1-100,000 with the symbol | after each one, but I can't seem to clean up the file after I create it so that it looks like this: 1|2|3|4.
My Code:
a = 0
b = "|"
while a < 100000:
    a += 1 # Same as a = a + 1
    new = (a,b)
    f = open("export.txt","a") #opens file with name of "export.txt"
    f.write(str(new))
    f.close()
infile = "export.txt"
outfile = "newfile.txt"
delete_list = ["(","," "'"]
fin = open(infile)
fout = open(outfile, "w+")
for line in fin:
    for word in delete_list:
        line = line.replace(word, "")
    fout.write(line)
fin.close()
fout.close()
It looks like you're doing a lot of work unnecessarily.
If all you want is a file that has the numbers 1-100000 with | after each, you could do:
delim = "|"
with open('export.txt', 'w') as f:
    for a in range(1, 100001):
        f.write("%d%s" % (a, delim))
I'm not sure what the purpose of the second file is, but in general, to open one file for reading and a second one for writing, you could do:
with open('export.txt', 'r') as fi:
    with open('newfile.txt', 'w') as fo:
        for line in fi:
            for word in line.split('|'):
                print(word)
                fo.write(word)
Note that there are no newlines in the original file, so for line in fi actually reads the entire contents of "export.txt" as a single line, which can use a lot of memory if the file is large.
Try this for writing your file:
numbers = []
for x in range(1,100001):
    numbers.append(str(x))
f = open('export.txt', 'w')
f.write('|'.join(numbers))
f.close()
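The same idea in a more compact form: str.join only places the delimiter between items, which gives exactly the 1|2|3|4 layout from the question with no trailing |. numbers_text is a hypothetical helper name:

```python
def numbers_text(n=100_000, delim="|"):
    # str.join puts the delimiter only between items, so there is no trailing "|"
    return delim.join(str(x) for x in range(1, n + 1))


print(numbers_text(4))  # 1|2|3|4

with open("export.txt", "w") as f:
    f.write(numbers_text())
```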

Remove Space from Particular line In Textfile Python

I have textfiles that have the date stored on line 7 of each file, formatted as such:
Date: 1233PM 14 MAY 00
I would like to search through each file and get the new line 7 to be formatted as such:
Date: 1233PM 14 MAY 2000
So, basically, I just need to stick a '20' in front of the last two digits in line seven.
Probably not the most difficult problem, but I have been having difficulty as textfile.readlines() reads everything into the first (textfile[0]) position.
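A likely cause of that symptom, assuming the file uses old Mac-style \r line endings (which the first answer's split('\r') suggests), is reading it without universal newlines. In Python 3, open() in text mode with the default newline=None accepts \n, \r and \r\n as line endings and translates them to \n, so readlines() returns one element per line. A sketch under that assumption; add_century is a hypothetical helper, and it assumes the year is the last whitespace-separated token on line 7:

```python
import os
import tempfile


def add_century(path, line_no=7, century="20"):
    # text mode with the default newline=None: '\r', '\n' and '\r\n' all end a line
    with open(path, "r") as f:
        lines = f.readlines()
    idx = line_no - 1  # line 7 is index 6
    vals = lines[idx].split()
    if len(vals[-1]) == 2:  # only touch a two-digit year
        vals[-1] = century + vals[-1]
    lines[idx] = " ".join(vals) + "\n"
    # writing in text mode turns each '\n' into os.linesep, so the '\r' endings are not preserved
    with open(path, "w") as f:
        f.writelines(lines)


# demo: a throwaway file with '\r' line endings and the date on line 7
demo_lines = [f"header {k}" for k in range(1, 7)] + ["Date: 1233PM 14 MAY 00"]
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("\r".join(demo_lines).encode() + b"\r")

add_century(demo_path)
with open(demo_path) as f:
    result = f.read().splitlines()
os.remove(demo_path)
print(result[6])  # Date: 1233PM 14 MAY 2000
```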
You can read the whole file, change the specified line, then save it again:
with open('file_name.txt') as f:
    # splitlines() splits on \r, \n and \r\n alike
    arc = f.read().splitlines()
# Do what you want with the 7th line, i.e. arc[6]
with open('file_name.txt', 'w') as new_arc:
    for line in arc:
        new_arc.write(line + '\n')
Maybe this:
with open(filename, 'r') as f:
    lines = f.readlines()
with open(filename, 'w') as f:
    for idx, line in enumerate(lines):
        if idx == 6:  # line 7 of the file, counting from 0
            vals = line.split()
            if len(vals[-1]) == 2:
                vals[-1] = '20' + vals[-1]
            line = ' '.join(vals) + '\n'  # split() dropped the newline, so add it back
        f.write(line)
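Try this:
# open the file for reading and writing in text mode;
# universal newlines turn \r, \n and \r\n endings into \n
f = open("file.txt", 'r+')
lines = f.readlines()
# modify line 7: insert "20" before the last two characters, keeping the newline
date_line = lines[6].rstrip('\n')
lines[6] = date_line[:-2] + "20" + date_line[-2:] + "\n"
# return file pointer to the top so we can rewrite the file
f.seek(0)
f.truncate()
# write the file with new content
f.write(''.join(lines))
f.close()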

Python take the last number of each line in a text file and make it into a list

I just want the last number of each line.
with open(home + "/Documents/stocks/" + filePath , newline='') as f:
    stockArray = (line.split(',') for line in f.readlines())
    for line in stockArray:
        List = line.pop()
        #print(line.pop())
        #print(', '.join(line))
    else:
        print("Finished")
I tried using line.pop() to take the last element, but it only keeps the value from one line. How can I get it from each line and store it in a list?
You probably just want something like:
last_col = [line.rstrip('\n').split(',')[-1] for line in f]
For more complicated csv files, you might want to look into the csv module in the standard library as that will properly handle quoting of fields, etc.
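Following that suggestion, a minimal sketch with csv.reader; last_column is a hypothetical helper, and the in-memory sample stands in for a stocks file:

```python
import csv
import io


def last_column(f):
    # csv.reader handles quoted fields such as "d,e" that contain commas
    return [row[-1] for row in csv.reader(f) if row]  # skip blank lines


sample = io.StringIO('a,b,1.5\nc,"d,e",2\n\n')
values = last_column(sample)
print(values)  # ['1.5', '2']
```

With a real file, open it with newline='' as the csv documentation recommends, e.g. with open(path, newline='') as f: numbers = [float(v) for v in last_column(f)], assuming the last field on every line is numeric.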
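my_list = []
with open(home + "/Documents/stocks/" + filePath , newline='') as f:
    for line in f:
        my_list.append(line.rstrip('\r\n')[-1]) # adds the last character (ignoring the line ending) to the list
That should do it if the value at the end of each line is a single character.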
If you want to add the last comma-separated element of each line from the file:
my_list = []
with open(home + "/Documents/stocks/" + filePath , newline='') as f:
    for line in f:
        my_list.append(line.rstrip('\r\n').split(',')[-1]) # adds the last field to the list
