I want to find and replace floats with integers in several text files.
There is one float value per text file that I want to convert. It always comes after a specific keyword and has to be multiplied by 10,000.
e.g. the float 1.5 should be turned into the integer 15000.
The other floats after 1.5 don't have to be changed, though.
def edit(file):
    with open(file, 'r') as f:
        filedata = f.read()
    for line in filedata:
        if "keyword" in line:
            filedata = filedata.replace(re.search(r"\d+\.\d+", line).group(), str(10000*re.search(r"\d+\.\d+", line).group()))
    with open(file, 'w') as f:
        f.write(filedata)
I was trying to replace the float using a regex, but this doesn't work.
EXAMPLE FILE EXTRACT
abcdef 178 211 208 220
ghijkl 0 0 0 0
keyword 1.50 1.63 1.56 1.45
You can iterate over lines with lines = filedata.split("\n"). Be careful: filedata is one big string containing the whole file, so when you wrote for line in filedata, you iterated over every character of the file, not over its lines.
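To make that difference concrete, here is a small sketch. The sample text is copied from the file extract in the question; the helper names count_iterations and lines_with are made up for this illustration.

```python
# Sample text copied from the question's file extract.
SAMPLE = "abcdef 178 211 208 220\nghijkl 0 0 0 0\nkeyword 1.50 1.63 1.56 1.45"

def count_iterations(text):
    """Return (loop count over the string itself, loop count over its lines)."""
    per_char = sum(1 for _ in text)              # a str yields one character at a time
    per_line = sum(1 for _ in text.split("\n"))  # a list of lines yields whole lines
    return per_char, per_line

def lines_with(text, keyword):
    """Return the lines of text that contain keyword."""
    return [line for line in text.split("\n") if keyword in line]

print(count_iterations(SAMPLE))
print(lines_with(SAMPLE, "keyword"))
```

Looping over the string directly can never find "keyword", because no single character contains a seven-letter word.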
I also used another way (without regex) to find numbers and change them.
def edit(file):
    with open(file, "r") as f:
        filedata = f.read()
    lines = filedata.split("\n") # list of lines
    for index, line in enumerate(lines):
        if "keyword" in line:
            words = line.split() # ['keyword', '1.50', '1.63', '1.56', '1.45']
            for i, w in enumerate(words):
                try:
                    # transform number to float, multiply by 10000
                    # then round to the nearest integer, then back to string
                    # (int() alone truncates: 1.63*10000 would give 16299)
                    new_word = str(int(round(float(w)*10000)))
                    words[i] = new_word
                except ValueError: # not a number (e.g. the keyword itself)
                    pass
            lines[index] = " ".join(words)
    new_data = "\n".join(lines) # store new data to overwrite file
    with open(file, "w") as f: # open file with write permission
        f.write(new_data) # overwrite the file with our modified data
edit("myfile.txt")
Output :
# myfile.txt
abcdef 178 211 208 220
ghijkl 0 0 0 0
keyword 15000 16300 15600 14500
EDIT : More Compact way
def edit(file):
    with open(file, "r") as f:
        filedata = f.read()
    line = [x for x in filedata.split("\n") if "keyword" in x][0]
    new_line = line
    for word in line.split():
        try:
            new_line = new_line.replace(word, str(int(round(float(word)*10000))))
        except ValueError: # word is not a number
            pass
    with open(file, "w") as f: # open file with write permission
        f.write(filedata.replace(line, new_line)) # overwrite the file with our modified data
edit("myfile.txt")
When you find yourself using a regex inside a loop, you should compile it outside of the loop.
Next, if you want to replace a value in a line, you should not search for it in the whole file.
Finally, you must cast a string to a numeric type to operate on it. If you do not, you will just repeat the string ('10' * 2 is '1010', not 20 nor '20').
Here is a possible improvement of your code:
import re

def edit(file):
    rx = re.compile(r"\d+\.\d+")  # compile the regex only once
    with open(file, 'r') as f:
        filedata = f.readlines()  # get a list of the lines of the file
    for i, line in enumerate(filedata):  # and enumerate them
        if "keyword" in line:
            val = rx.search(line).group()  # first float on the line
            newval = str(int(round(float(val) * 10000)))
            filedata[i] = line.replace(val, newval, 1)  # replace only the first match, in the current line
            break  # no need to proceed further
    with open(file, 'w') as f:
        f.writelines(filedata)  # filedata is a list of lines, so use writelines()
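For reference, the same idea can be sketched as a pure function that works on a string, which is easy to try without touching any file. The name scale_first_float and its default arguments are assumptions made for this sketch, not part of the question. It changes only the first float on the keyword line and leaves the others alone.

```python
import re

# Matches a float such as 1.50; compiled once, outside any loop.
FLOAT_RX = re.compile(r"\d+\.\d+")

def scale_first_float(text, keyword="keyword", factor=10000):
    """Return text with the first float on the first keyword line scaled to an int."""
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if keyword in line:
            # count=1 limits the substitution to the first match on this line
            lines[i] = FLOAT_RX.sub(
                lambda m: str(int(round(float(m.group()) * factor))),
                line,
                count=1,
            )
            break  # only one keyword line per file is expected
    return "\n".join(lines)

sample = "abcdef 178 211 208 220\nghijkl 0 0 0 0\nkeyword 1.50 1.63 1.56 1.45"
print(scale_first_float(sample))
```

Reading the file into a string, passing it through this function and writing the result back would then give the in-place edit.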
I am trying to write a function that can take every individual line in a txt file and multiply that line by 2 so that each integer in the text file is doubled. So far I was able to get the code to print. However, when I added the code (reading & reading_int) to convert the strings to integers the function is now not working. There are no errors in the code to tell me what I am doing wrong. I am not sure what is wrong with reading and reading_int that is making my function not work.
def mult_num3():
    data=[]
    w = open('file3.txt', 'r')
    with w as f:
        reading = f.read()
        reading_int = [int(x) for x in reading.split()]
        for line in f:
            currentline = line[:-1]
            data.append(currentline)
        for i in data:
            w.write(int(i)*2)
    w.close()
file3.txt:
1
2
3
4
5
6
7
8
9
10
Desired output:
2
4
6
8
10
12
14
16
18
20
Problems with original code:
def mult_num3():
    data=[]
    w = open('file3.txt', 'r')  # only opened for reading, not writing
    with w as f:
        reading = f.read()  # reads whole file
        reading_int = [int(x) for x in reading.split()]  # unused variable
        for line in f:  # file position is at the end now, nothing left to read
            currentline = line[:-1]  # not executed
            data.append(currentline)  # not executed
        for i in data:  # data is empty, so...
            w.write(int(i)*2)  # not executed, can't write an int if it did
                               # and file isn't writable.
    w.close()  # not necessary, 'with' will close it
Note that int() ignores leading and trailing whitespace so no need for .split() if only one number per line, and a format string (f-string) can format each line as needed by converting and doubling the value and adding a newline.
with open('file3.txt', 'r') as f:
    data = [f'{int(line)*2}\n' for line in f]
with open('file3.txt', 'w') as f:
    f.writelines(data)
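The same conversion can be sketched as a function that takes any iterable of lines, which makes it easy to check without a file. The name double_lines is made up here, and skipping blank lines is an assumption of this sketch (the original one-liner would raise ValueError on a blank line).

```python
def double_lines(lines):
    """Return each integer line doubled, as newline-terminated strings."""
    doubled = []
    for line in lines:
        if not line.strip():  # assumption: blank lines are skipped, not errors
            continue
        # int() tolerates surrounding whitespace, including the trailing "\n"
        doubled.append(f"{int(line) * 2}\n")
    return doubled

print(double_lines(["1\n", " 2 \n", "\n", "10"]))
```

The returned list can be passed straight to f.writelines().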
I added a try/except to check for non-integer data. I don't know your data, but maybe it helps.
def mult_num3():
    # write the result to a separate file so the input is left untouched
    with open('file3.txt', 'r') as infile, open('script_out.txt', 'w') as outfile:
        for line in infile:
            for value in line.split():
                try:
                    outfile.write(str(int(value) * 2) + " ")
                except ValueError:
                    outfile.write("(" + value + ": is not an integer) ")
            outfile.write("\n")
I have a tab-separated text file like this example:
infile:
chr1 + 1071396 1271396 LOC
chr12 + 1101483 1121483 MIR200B
I want to divide the interval between columns 3 and 4 of each row in infile into 100 equal parts, so that every input row produces 100 output rows, and write the result to a new file named newfile.
The final tab-separated file should have 6 columns. The first 5 columns are as in infile (with columns 3 and 4 holding the start and end of each part), and the 6th column is (5th column)_part(number), where number runs from 1 to 100.
This is the expected output file:
expected output:
chr1 + 1071396 1073396 LOC LOC_part1
chr1 + 1073396 1075396 LOC LOC_part2
.
.
.
chr1 + 1269396 1271396 LOC LOC_part100
chr12 + 1101483 1101683 MIR200B MIR200B_part1
chr12 + 1101683 1101883 MIR200B MIR200B_part2
.
.
.
chr12 + 1121283 1121483 MIR200B MIR200B_part100
I wrote the following code to get the expected output but it does not return what I expect.
file = open('infile.txt', 'rb')
cont = []
for line in file:
    cont.append(line)
newfile = []
for i in cont:
    percent = (i[3]-i[2])/100
    for j in percent:
        newfile.append(i[0], i[1], i[2], i[2]+percent, i[4], i[4]_'part'percent[j])
with open('output.txt', 'w') as f:
    for i in newfile:
        for j in i:
            f.write(i + '\n')
Do you know how to fix the problem?
Try this:
cont = []
with open('infile.txt', 'r') as file:  # text mode, not binary
    for line in file:
        fields = line.split()  # split on whitespace (tabs included)
        if fields:  # skip blank lines
            cont.append(fields)
newfile = []
for i in cont:
    diff = (int(i[3]) - int(i[2])) // 100  # width of one part
    left = int(i[2])
    for j in range(1, 101):
        right = left + diff
        # append() takes a single argument, so collect the columns in a list
        newfile.append([i[0], i[1], str(left), str(right), i[4], i[4] + '_part' + str(j)])
        left = right
with open('output.txt', 'w') as f:
    for row in newfile:
        f.write('\t'.join(row) + '\n')
In your code, for i in cont loops over the raw lines, so i[2] and i[3] are single characters, not columns.
To fix that, I split each line into its fields.
Here are some suggestions:
When you open the file, open it as a text file, not a binary file:
open('infile.txt','r')
When you read it line by line, you should strip the newline character at the end with strip(). Then split each line on tabs with split('\t'), so that you get a list of strings instead of one long string containing the whole line:
line.strip().split('\t')
Now you have:
file = open('infile.txt', 'r')
cont = []
for line in file:
    cont.append(line.strip().split('\t'))
Now cont is a list of lists, where each inner list contains the tab-separated fields of one row, e.g.
cont[1][0] == 'chr12'
You will probably be able to take it from here.
Others have answered your question with respect to your own code, I thought I would leave my attempt at solving your problem here.
import os
directory = "C:/Users/DELL/Desktop/"
filename = "infile.txt"
path = os.path.join(directory, filename)
out_path = os.path.join(directory, "outfile.txt")
with open(path, "r") as f_in, open(out_path, "w") as f_out: # open input and output files
    for line in f_in:
        contents = line.rstrip().split("\t") # split line into a list of fields, 'contents'
        start = int(contents[2])
        diff = (int(contents[3]) - start)/100 # width of one part
        for i in range(100):
            left = int(start + diff*i) # start of part i+1
            right = int(start + diff*(i+1)) # end of part i+1
            temp = (f"{contents[0]}\t{contents[1]}\t{left}\t{right}\t{contents[4]}\t{contents[4]}_part{i+1}")
            f_out.write(temp+"\n")
This code doesn't follow Python style conventions well (excessively long lines, for example), but it works. The line temp = ... uses an f-string to format the output string conveniently.
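The splitting step on its own can also be sketched as a small function that takes the fields of one row and returns the 100 output rows. The name split_row and the parts parameter are made up for this sketch. It uses integer arithmetic on the whole span, so the last part always ends exactly at the original end column, even when the span is not divisible by 100.

```python
def split_row(fields, parts=100):
    """Split one row [chrom, strand, start, end, name] into `parts` rows."""
    chrom, strand, name = fields[0], fields[1], fields[4]
    start, end = int(fields[2]), int(fields[3])
    rows = []
    for k in range(parts):
        # boundaries are computed from the full span to avoid accumulated rounding
        left = start + (end - start) * k // parts
        right = start + (end - start) * (k + 1) // parts
        rows.append([chrom, strand, str(left), str(right), name, f"{name}_part{k + 1}"])
    return rows

rows = split_row("chr1\t+\t1071396\t1271396\tLOC".split("\t"))
print("\t".join(rows[0]))
print("\t".join(rows[-1]))
```

Each returned row can be written out with "\t".join(row) + "\n".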
I have a text file containing these lines
wbwubddwo 7::a number1 234 **
/// 45daa;: number2 12
time 3:44
I am trying to print a value: for example, if the program finds the string number1, it should print 234.
I started with the simple script below, but it did not print what I wanted.
with open("test.txt", "rb") as f:
    lines = f.read()
    word = ["number1", "number2", "time"]
    if any(item in lines for item in word):
        val1 = lines.split("number1 ", 1)[1]
        print val1
This returns the following result:
234 **
/// 45daa;: number2 12
time 3:44
Then I tried changing f.read() to f.readlines(), but this time it did not print anything.
Does anyone know another way to do this? Eventually I want to get the value from each line (for example 234, 12 and 3:44) and store it in a database.
Thank you for your help. I really appreciate it.
Explanations given below:
with open("test.txt", "r") as f:
    lines = f.readlines()
    stripped_lines = [line.strip() for line in lines]
    words = ["number1", "number2", "time"]
    for a_line in stripped_lines:
        for word in words:
            if word in a_line:
                number = a_line.split()[1]
                print(number)
1) First of all, 'rb' gives a bytes object, i.e. something like b'number1 234' would be returned. Use 'r' to get a string object.
2) The lines you read will look something like this, stored in a list:
['number1 234\r\n', 'number2 12\r\n', '\r\n', 'time 3:44']
Notice the \r\n: those mark the line endings. To remove them, use strip().
3) Take each line from stripped_lines and each word from words, and check whether that word is present in that line using in.
4) a_line would be number1 234, but we only want the number part. So call split(); its output would be ['number1','234'], and split()[1] is the element at index 1 (the 2nd element).
5) You can also check whether the string consists only of digits using your_string.isdigit().
UPDATE: Since you updated your question and input file this works:
import time
def isTimeFormat(input):
    try:
        time.strptime(input, '%H:%M')
        return True
    except ValueError:
        return False
with open("test.txt", "r") as f:
    lines = f.readlines()
    stripped_lines = [line.strip() for line in lines]
    words = ["number1", "number2", "time"]
    for a_line in stripped_lines:
        for word in words:
            if word in a_line:
                parts = a_line.split()
                # use the last token if it is a number or a time, otherwise the one before it
                number = parts[-1] if (parts[-1].isdigit() or isTimeFormat(parts[-1])) else parts[-2]
                print(number)
Why this isTimeFormat() function?
def isTimeFormat(input):
    try:
        time.strptime(input, '%H:%M')
        return True
    except ValueError:
        return False
To check whether values such as 3:44 or 4:55 are in a time format, since you are treating them as values too.
Final output:
234
12
3:44
After some trial and error, I found the solution below. It is based on the answer provided by @s_vishnu.
with open("test.txt", "r") as f:
    lines = f.readlines()
    stripped_lines = [line.strip() for line in lines]
    for item in stripped_lines:
        if "number1" in item:
            # take the first token after the keyword
            getval = item.split("number1 ")[1].split(" ")[0]
            print(getval)
        if "number2" in item:
            getval2 = item.split("number2 ")[1].split(" ")[0]
            print(getval2)
        if "time" in item:
            getval3 = item.split("time ")[1].split(" ")[0]
            print(getval3)
output
234
12
3:44
This way, I can also do other things for example saving each data to a database.
I am open to any suggestion to further improve my answer.
You're overthinking this. Assuming you don't have those two asterisks at the end of the first line and you want to print out lines containing a certain value(s), you can just read the file line by line, check if any of the chosen values match and print out the last value (value between a space and the end of the line) - no need to parse/split the whole line at all:
search_values = ["number1", "number2", "time"]  # values to search for
with open("test.txt", "r") as f:  # open your file
    for line in f:  # read it line by line
        if any(value in line for value in search_values):  # check for search_values in line
            print(line[line.rfind(" ") + 1:].rstrip())  # print the last value after space
Which will give you:
234
12
3:44
If you do have asterisks you have to more precisely define your file format as splitting won't necessarily yield you your desired value.
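One possible way to pin the format down, assuming the wanted value is always the first whitespace-separated token right after the keyword, is a regular expression. This is only a sketch under that assumption; the names KEYWORDS, VALUE_RX and extract_values are made up for the example.

```python
import re

# Keyword list taken from the question.
KEYWORDS = ["number1", "number2", "time"]
# A keyword, some whitespace, then the next run of non-space characters.
VALUE_RX = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\s+(\S+)")

def extract_values(lines):
    """Return (keyword, value) pairs for every line that contains a keyword."""
    found = []
    for line in lines:
        match = VALUE_RX.search(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found

sample = ["wbwubddwo 7::a number1 234 **", "/// 45daa;: number2 12", "time 3:44"]
print(extract_values(sample))
```

Because the value is taken relative to the keyword rather than the end of the line, the trailing asterisks no longer matter.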
I want to create a text file that contains positive and negative numbers separated by ','.
I want to read this file and put its contents in data = []. I have written the code below and I think it works well.
I would like to ask whether you know a better way to do it, or whether it is well written as is.
Thanks, all.
#!/usr/bin/python
if __name__ == "__main__":
    #create new file
    fo = open("foo.txt", "w")
    fo.write("111,-222,-333")
    fo.close()
    #read the file
    fo = open("foo.txt", "r")
    tmp = []
    data = []
    count = 0
    tmp = fo.read() #read all the file
    for i in range(len(tmp)): #len is 13 in this case
        if (tmp[i] != ','):
            count+=1
        else:
            data.append(tmp[i-count : i])
            count = 0
    data.append(tmp[i+1-count : i+1])#append the last -333
    print data
    fo.close()
You can use the split method with a comma as the separator:
data = []
with open('foo.txt') as fin:
    for line in fin:
        data.extend(line.strip().split(','))  # strip() drops the trailing newline
Instead of looping through, you can just use split:
#!/usr/bin/python
if __name__ == "__main__":
    #create new file
    fo = open("foo.txt", "w")
    fo.write("111,-222,-333")
    fo.close()
    #read the file
    with open('foo.txt', 'r') as file:
        data = [line.strip().split(',') for line in file.readlines()]
    print(data)
Note that this gives back a list of lists, with each list being from a separate line. In your example you only have one line. If your files will always only have a single line, you can just take the first element, data[0]
To get the whole file content (positive and negative numbers) into a list, you can use replace and split:
file_obj = fo.read() # read your content into a string
list_numbers = file_obj.replace('\n',',').split(',') # split on ',' and newline
print(list_numbers)
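Note that all of the approaches above produce a list of strings. If actual integers are wanted, a sketch along the following lines may help; the function name parse_numbers is made up for this example, and it assumes every non-empty token is a valid integer.

```python
def parse_numbers(text):
    """Parse comma- and newline-separated integers from text into a list of int."""
    tokens = text.replace("\n", ",").split(",")
    # skip empty tokens, e.g. the one produced by a trailing newline
    return [int(tok) for tok in tokens if tok.strip()]

print(parse_numbers("111,-222,-333\n"))
```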
If I have a text file that contains all the English letters, each with a corresponding value, like the following:
A 0.00733659550399
B 0.00454138879023
C 0.00279849519224
D 0.00312734304092
.
.
.
I want to assign these numeric values to each line I'm reading from another txt file.
L = open(os.path.join(dir, file), "r").read()
line = L.rstrip()
tokens = line.split()
for word in tokens:
    for char in word:
        # find the value for char
Create a dictionary from the first file like this:
with open('values.txt') as f:
    values = {k:v for k,v in (line.split() for line in f)}
Then iterate over each character of the data file and replace it with the corresponding value:
with open('A.txt') as infile, open('output.txt', 'w') as outfile:
    for line in infile:
        for c in line.rstrip():
            print(values.get(c.upper(), '0'), file=outfile)
This code (assumes Python 3 or import of print function in Python 2) will write to output.txt the numeric values corresponding to the input characters, one per line. If there is no value for a character, 0 is output (that can be changed to whatever you want). Note that the incoming characters are converted to upper case because your sample looks like it might comprise upper case letters only. If there are separate values for lower case letters, then you can remove the call to upper().
If you would prefer the values to remain on the same line then you can alter the print() function call:
with open('A.txt') as infile, open('output.txt', 'w') as outfile:
    for line in infile:
        print(*(values.get(c.upper(), '0') for c in line.rstrip()), file=outfile)
Now the values will be space separated.
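The lookup itself can be tried without any files. The following sketch uses in-memory lines instead; build_table and encode are hypothetical names, and the two sample entries are taken from the question.

```python
def build_table(lines):
    """Map each key to its value string from lines like 'A 0.0073'."""
    return {key: value for key, value in (line.split() for line in lines if line.strip())}

def encode(text, table, default="0"):
    """Return the space-separated values for every character of text."""
    # upper() mirrors the answer above: the sample table has upper-case keys only
    return " ".join(table.get(char.upper(), default) for char in text)

table = build_table(["A 0.00733659550399", "B 0.00454138879023"])
print(encode("abz", table))
```

Characters without an entry in the table (here "z") fall back to the default, just like values.get(c.upper(), '0') does.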
Is this what you're looking for ?
input.txt
AAB BBC ABC
keyvalue.txt
A 123
B 456
C 789
script.py
def your_func(input_file):
    char_value = {}
    with open('keyvalue.txt', 'r') as f:
        for row in f:
            key, value = row.split()[0], row.split()[1]
            char_value[key] = value
    res = []
    with open(input_file) as f:
        for row in f:
            for word in row.split():
                for c in word:
                    # append only if the key exists
                    if c in char_value:
                        res.append(char_value[c])
    return '*'.join(res)
print(your_func("input.txt"))
# >>> 123*123*456*456*456*789*123*456*789