python: find and replace numbers < 1 in text file - python

I'm pretty new to Python programming and would appreciate some help with a problem I have...
Basically I have multiple text files which contain velocity values as such:
0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00
etc for many lines...
What I need to do is convert all the values in the text file that are less than 1 (e.g. 0.137865E+00 above) to an arbitrary value of 0.100000E+01. While it seems pretty simple to replace specific values with the 'replace()' method and a while loop, how do you do this if you want to replace a range?
thanks

I think when you are beginning programming, it's useful to see some examples; and I assume you've tried this problem on your own first!
Here is a break-down of how you could approach this:
contents='0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00'
The split method works on strings. It returns a list of strings. By default, it splits on whitespace:
string_numbers=contents.split()
print(string_numbers)
# ['0.259515E+03', '0.235095E+03', '0.208262E+03', '0.230223E+03', '0.267333E+03', '0.217889E+03', '0.156233E+03', '0.144876E+03', '0.136187E+03', '0.137865E+00']
The map function applies its first argument (the function float) to each of the elements of its second argument (the list string_numbers). The float function converts each string into a floating-point object. In Python 3, map returns an iterator, so wrap it in list() to get a list you can print and reuse.
float_numbers=list(map(float,string_numbers))
print(float_numbers)
# [259.515, 235.095, 208.262, 230.223, 267.333, 217.889, 156.233, 144.876, 136.187, 0.137865]
You can use a list comprehension to process the list, converting numbers less than 1 into the number 1. The conditional expression (1 if num<1 else num) equals 1 when num is less than 1, otherwise, it equals num.
processed_numbers=[(1 if num<1 else num) for num in float_numbers]
print(processed_numbers)
# [259.515, 235.095, 208.262, 230.223, 267.333, 217.889, 156.233, 144.876, 136.187, 1]
This is the same thing, all in one line:
processed_numbers=[(1 if num<1 else num) for num in map(float,contents.split())]
To generate a string out of the elements of processed_numbers, you could use the str.join method:
comma_separated_string=', '.join(map(str,processed_numbers))
# '259.515, 235.095, 208.262, 230.223, 267.333, 217.889, 156.233, 144.876, 136.187, 1'

A typical technique would be:
read file line by line
split each line into a list of strings
convert each string to the float
compare converted value with 1
replace when needed
write back to the new file
As I don't see you having any code yet, I hope that this would be a good start
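The steps above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the helper names clamp_line and clamp_file and the temporary file names are hypothetical, and the sketch assumes every whitespace-separated token on a line parses as a float. Tokens that are not replaced are written back verbatim, so their original formatting is preserved.

```python
import os
import tempfile

REPLACEMENT = "0.100000E+01"  # the replacement value requested in the question

def clamp_line(line):
    """Replace every token whose numeric value is below 1; keep other tokens verbatim."""
    return " ".join(
        REPLACEMENT if float(token) < 1.0 else token
        for token in line.split()
    )

def clamp_file(src_path, dst_path):
    """Read src_path line by line and write the processed lines to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(clamp_line(line) + "\n")

# Small demonstration on a temporary file (made-up sample data).
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "in.txt")
    dst_path = os.path.join(tmp, "out.txt")
    with open(src_path, "w") as f:
        f.write("0.259515E+03 0.137865E+00\n0.500000E-02 0.144876E+03\n")
    clamp_file(src_path, dst_path)
    with open(dst_path) as f:
        result = f.read()

print(result)
```

Writing to a new file instead of overwriting the input keeps the original data intact if something goes wrong halfway through.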

def float_filter(text):
    # Yield each token unchanged unless its value is below 1
    for number in text.split():
        if float(number) < 1.0:
            yield "0.100000E+01"
        else:
            yield number
text = "0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00"
print(" ".join(float_filter(text)))

import numpy as np
a = np.genfromtxt('file.txt') # read file
a[a<1] = 1.0 # replace (0.100000E+01 is 1.0)
np.savetxt('converted.txt', a) # save to file

You could use regular expressions for parsing the string. I'm assuming here that the mantissa is never larger than 1 (ie, begins with 0). This means that for the number to be less than 1, the exponent must be either 0 or negative. The following regular expression matches '0', '.', unlimited number of decimal digits (at least 1), 'E' and either '+00' or '-' and two decimal digits.
0\.\d+E(-\d\d|\+00)
Assuming that you have the file read into variable 'text' and have imported the re module, you can use the regexp with the following Python code:
result = re.sub(r"0\.\d+E(-\d\d|\+00)", "0.100000E+01", text)
Edit: Just realized that the description doesn't limit the valid range of input numbers to positive numbers. Negative numbers can be matched with the following regexp:
-0\.\d+E[-+]\d\d
This can be alternated with the first one using the (pattern1|pattern2) syntax which results in the following Python code:
result = re.sub(r"(0\.\d+E(-\d\d|\+00)|-0\.\d+E[-+]\d\d)", "0.100000E+01", text)
Also, if there's a chance that the exponent goes past 99, the regexp can be further modified by adding a '+' sign after the '\d\d' patterns. This allows matching exponents of two OR MORE digits.
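If the assumption about the mantissa does not hold, one possible variation is to let the regular expression only locate the numbers and leave the comparison to float(). The sketch below is an illustration under that approach, not the method described above: the pattern NUMBER and the callback clamp are hypothetical names, and the pattern assumes every number is written in E notation with an uppercase E.

```python
import re

# Matches an optionally signed number in E notation, e.g. 0.137865E+00 or -0.5E-03.
NUMBER = re.compile(r"[-+]?\d*\.?\d+E[-+]?\d+")

def clamp(match):
    """Return the replacement value for numbers below 1, else the matched text unchanged."""
    token = match.group(0)
    return "0.100000E+01" if float(token) < 1.0 else token

text = "0.259515E+03 -0.235095E+03 0.137865E+00 0.999000E-100"
result = NUMBER.sub(clamp, text)
print(result)
```

Because re.sub accepts a function as the replacement, negative numbers and exponents of any length are handled by the same comparison instead of by separate patterns.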

I've got the script working as I want now...thanks people.
When writing the list to a new file I used the replace method to get rid of the brackets and commas - is there a simpler way?
ftext = open("C:\\Users\\hhp06\\Desktop\\out.grd", "r")
otext = open("C:\\Users\\hhp06\\Desktop\\out2.grd", "w+")
for line in ftext:
    stringnum = line.split()
    floatnum = map(float, stringnum)
    procnum = [(1.0 if num<1 else num) for num in floatnum]
    stringproc = str(procnum)
    s = (stringproc).replace(",", " ").replace("[", " ").replace("]", "")
    otext.writelines(s + "\n")
otext.close()
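One simpler way to avoid the brackets and commas is to never build the list's string representation at all, and instead format each number and join the pieces with spaces. The sketch below assumes that ordinary E notation is acceptable in the output; format_line is a hypothetical helper name. Note that "%.6E" writes 259.515 as 2.595150E+02, not in the 0.259515E+03 style of the input file.

```python
def format_line(numbers):
    """Join numbers with single spaces, using E notation with six decimals."""
    return " ".join("%.6E" % num for num in numbers)

procnum = [259.515, 235.095, 1.0]
line = format_line(procnum)
print(line)  # 2.595150E+02 2.350950E+02 1.000000E+00
```

If the exact input format matters, keeping the original string tokens and replacing only the ones below 1 avoids reformatting altogether.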

Related

Python: split line by comma, then by space

I'm using Python 3 and I need to parse a line like this
-1 0 1 0 , -1 0 0 1
I want to split this into two lists using Fraction so that I can also parse entries like
1/2 17/12 , 1 0 1 1
My program uses a structure like this
from sys import stdin
...
functions'n'stuff
...
for line in stdin:
and I'm trying to do
for line in stdin:
    X = [str(elem) for elem in line.split(" , ")]
    num = [Fraction(elem) for elem in X[0].split()]
    den = [Fraction(elem) for elem in X[1].split()]
but all I get is a list index out of range error: den = [Fraction(elem) for elem in X[1].split()]
IndexError: list index out of range
I don't get it. I get a string from line. I split that string into two strings at " , " and should get one list X containing two strings. These I split at the whitespace into two separate lists while converting each element into Fraction. What am I missing?
I also tried adding X[-1] = X[-1].strip() to get rid of \n that I get from ending the line.
The problem is that your file has a line without a " , " in it, so the split doesn't return 2 elements.
I'd use split(',') instead, and then use strip to remove the leading and trailing blanks. Note that str(...) is redundant, split already returns strings.
X = [elem.strip() for elem in line.split(",")]
You might also have a blank line at the end of the file, which would still only produce one result for split, so you should have a way to handle that case.
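A minimal sketch of that handling, assuming each valid line contains exactly one comma (parse_line is a hypothetical helper name, not part of the original program):

```python
from fractions import Fraction

def parse_line(line):
    """Return (num, den) as lists of Fractions, or None if the line lacks exactly one comma."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 2:
        return None  # blank or malformed line: let the caller decide what to do
    num = [Fraction(tok) for tok in parts[0].split()]
    den = [Fraction(tok) for tok in parts[1].split()]
    return num, den

print(parse_line("1/2 17/12 , 1 0 1 1\n"))
print(parse_line("\n"))
```

Inside the loop over stdin, the caller can then skip any line for which parse_line returns None.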
With valid input, your code actually works.
You probably get an invalid line, with too much space or even an empty line or so. So first thing inside the loop, print line. Then you know what's going on, you can see right above the error message what the problematic line was.
Or maybe you're not using stdin right. Write the input lines in a file, make sure you only have valid lines (especially no empty lines). Then feed it into your script:
python myscript.py < test.txt
How about this one:
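from fractions import Fraction
pairs = [line.split(",") for line in stdin]
# Each half still holds several space-separated entries, so split it again before converting
num = [[Fraction(x) for x in elem[0].split()] for elem in pairs if len(elem) == 2]
den = [[Fraction(x) for x in elem[1].split()] for elem in pairs if len(elem) == 2]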

Python - Concatenate a variable into string format

I'm trying to retrieve the number from a file, and determine the padding of it, so I can apply it to the new file name, but with an added number. I'm basically trying to do a file saver sequencer.
Ex.:
fileName_0026
0026 = 4 digits
add 1 to the current number and keep the same amount of digit
The result should be 0027 and on.
What I'm trying to do is retrieve the padding number from the file and use the '%04d'%27 string formatting. I've tried everything I know (my knowledge is very limited), but nothing works. I've looked everywhere to no avail.
What I'm trying to do is something like this:
O=fileName_0026
P=Retrieved padding from original file (4)
CN=retrieve current file number (26)
NN=add 1 to current file number (27)
'%0 P d' % NN
Result=fileName_0027
I hope this is clear enough, I'm having a hard time trying to articulate this.
Thanks in advance for any help.
Cheers!
There's a few things going on here, so here's my approach and a few comments.
import os
def get_next_filename(existing_filename):
    base, extension = os.path.splitext(existing_filename) # extension is '' when the name has none
    prefix, int_string = base.rsplit("_", 1) # split at the last _ to get the prefix and the number as a string
    int_new = int(int_string) + 1 # integer value of new filename
    digits = len(int_string) # number of characters/padding in name
    formatter = "%0"+str(digits)+"d" # creates a format such as "%04d" that pads with leading zeros
    int_string_new = formatter % (int_new,) # applies that format
    new_filename = prefix+"_"+int_string_new+extension # put it all together
    return new_filename
# since we only want to do this when the file already exists, check if it exists and execute function if so
our_filename = 'file_0026.txt'
while os.path.isfile(our_filename):
    our_filename = get_next_filename(our_filename) # loop until a unique filename found
Here are some hints to achieve that, although it's unclear exactly what you want to achieve.
import re
fname = "fileName_0026.txt" #Name of the existing file
name = re.split(r"[_.]", fname) #Output:: ['fileName', '0026', 'txt']
n = str(int(name[1])+1) #'27'
s = n.zfill(len(name[1])) #'0027'
newName = "_".join([name[0],s])+"."+name[2] #"fileName_0027.txt"
fh = open(newName,"w") #Write a new file
Use the rjust function from string
O=fileName_0026
P=Retrieved padding from original file (4)
CN=retrieve current file number (26)
NN=add 1 to current file number (27)
new_padding = str(NN).rjust(P, '0')
Result=fileName_ + new_padding
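The pseudocode above could be turned into runnable Python roughly as follows. This is a sketch that assumes the number always follows the last underscore and that the name has no extension; next_name is a hypothetical function name.

```python
def next_name(name):
    """Increment the number after the last underscore, keeping its zero padding."""
    prefix, number = name.rsplit("_", 1)
    padding = len(number)         # P: width of the original number
    new_number = int(number) + 1  # NN: current number plus one
    return prefix + "_" + str(new_number).rjust(padding, "0")

print(next_name("fileName_0026"))  # fileName_0027
```

rjust only pads and never truncates, so a number that outgrows its padding (for example 99 becoming 100) simply gets wider.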
import re
m = re.search(r".*_(0*)(\d*)", "fileName_0023")
print(m.groups())
print("fileName_{0:04d}".format(int(m.groups()[1])+1))
{0:04d} means pad out to four digits wide with leading zeros.
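As you can see, there are a few ways to do this that are quite similar. One thing the other answers haven't mentioned is the leading zeroes in the file's number string. int() with the default base 10 accepts them, so int('0026') is 26; the octal interpretation only applies to integer literals in source code (026 is octal in Python 2 and a syntax error in Python 3) or to int(s, 0). Stripping the zeroes before converting is therefore a precaution rather than a requirement.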
edit
I just realised that my previous code crashes if the file number is zero! :embarrassed:
Here's a better version that also copes with a missing file number and names with multiple or no underscores.
#! /usr/bin/env python
def increment_filename(s):
    parts = s.split('_')
    #Handle names without a number after the final underscore
    if not parts[-1].isdigit():
        parts.append('0')
    tail = parts[-1]
    try:
        n = int(tail.lstrip('0'))
    except ValueError:
        #Tail was all zeroes
        n = 0
    parts[-1] = str(n + 1).zfill(len(tail))
    return '_'.join(parts)
def main():
    for s in (
        'fileName_0026',
        'data_042',
        'myfile_7',
        'tricky_99',
        'myfile_0',
        'bad_file',
        'worse_file_',
        '_lead_ing_under_score',
        'nounderscore',
    ):
        print("'%s' -> '%s'" % (s, increment_filename(s)))
if __name__ == "__main__":
    main()
output
'fileName_0026' -> 'fileName_0027'
'data_042' -> 'data_043'
'myfile_7' -> 'myfile_8'
'tricky_99' -> 'tricky_100'
'myfile_0' -> 'myfile_1'
'bad_file' -> 'bad_file_1'
'worse_file_' -> 'worse_file__1'
'_lead_ing_under_score' -> '_lead_ing_under_score_1'
'nounderscore' -> 'nounderscore_1'
Some additional refinements possible:
An optional arg to specify the number to add to the current file number,
An optional arg to specify the minimum width of the file number string,
Improved handling of names with weird number / position of underscores.
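The first two refinements might look like the sketch below. The parameter names step and min_width are assumptions made for illustration, and the sketch relies on int() accepting leading zeroes rather than stripping them first.

```python
def increment_filename(name, step=1, min_width=0):
    """Add step to the number after the final underscore, padding to at least min_width digits."""
    parts = name.split("_")
    # Names without a number after the final underscore start counting from 0
    if not parts[-1].isdigit():
        parts.append("0")
    tail = parts[-1]
    width = max(len(tail), min_width)  # never shrink the existing padding
    parts[-1] = str(int(tail) + step).zfill(width)
    return "_".join(parts)

print(increment_filename("data_042", step=10))     # data_052
print(increment_filename("myfile_7", min_width=3)) # myfile_008
```

With the default arguments the sketch returns the same results as the listing above for the sample names.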

Codeeval Challenge not returning correct output. (Python)

So I started doing the challenges in codeeval and I'm stuck at an easy challenge called "word to digit"
This is the challenge description:
Having a string representation of a set of numbers you need to print
this numbers.
All numbers are separated by semicolon. There are up to 20 numbers in one line. The numbers are "zero" to "nine"
input sample:
zero;two;five;seven;eight;four
three;seven;eight;nine;two
output sample:
025784
37892
I have tested my code and it works, but in codeeval the output is always missing the last number from each line of words in the input file.
This is my code:
import sys
def WordConverter(x):
    test=str()
    if (x=="zero"):
        test="0"
    elif (x=="one"):
        test="1"
    elif (x=="two"):
        test="2"
    elif (x=="three"):
        test="3"
    elif (x=="four"):
        test="4"
    elif (x=="five"):
        test="5"
    elif (x=="six"):
        test="6"
    elif (x=="seven"):
        test="7"
    elif (x=="eight"):
        test="8"
    elif (x=="nine"):
        test="9"
    return (test)
t=str()
string=str()
test_cases=open(sys.argv[1],'r')
for line in test_cases:
    string=line.split(";")
    for i in range(0,len(string)):
        t+=WordConverter(string[i])
    print (t)
    t=str()
Am I doing something wrong? Or is it a Codeeval bug?
You just need to remove the newline char from the input. Replace:
string=line.split(";")
With
string=line.strip().split(";")
However, using string as the variable name is not a good decision...
When iterating over the lines of a file with for line in test_cases:, each value of line will include the newline at the end of the line (if any). This results in the last element of string having a newline at the end, and so this value won't compare equal to anything in WordConverter, causing an empty string to be returned. You need to remove the newline from the string at some point.
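As an illustration of the fix, here is a compact sketch that strips the newline before splitting and replaces the if/elif chain with a dictionary lookup. The names DIGITS and words_to_digits are hypothetical, and the sketch assumes every word on a line is one of "zero" to "nine".

```python
# Lookup table from word to digit character
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def words_to_digits(line):
    """Convert 'zero;two;five' style input to '025', ignoring the trailing newline."""
    return "".join(DIGITS[word] for word in line.strip().split(";"))

print(words_to_digits("zero;two;five;seven;eight;four\n"))  # 025784
print(words_to_digits("three;seven;eight;nine;two\n"))      # 37892
```

Unlike the if/elif version, an unknown word raises a KeyError here instead of silently producing an empty string, which makes a leftover newline easy to spot.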

Extracting columns having values >= 90%

I wrote this script to extract values from my .txt file that have >= 90 % identity. However, this program does not take into consideration values higher than 100.00 for example 100.05, why?
import re
output=open('result.txt','w')
f=open('file.txt','r')
lines=f.readlines()
for line in lines:
    new_list=re.split(r'\t+',line.strip())
    id_per=new_list[2]
    if id_per >= '90':
        new_list.append(id_per)
        output.writelines(line)
f.close()
output.close()
Input file example
A 99.12
B 93.45
C 100.00
D 100.05
E 87.5
You should compare them as floats not strings. Something as follows:
import re
output=open('result.txt','w')
f=open('file.txt','r')
lines=f.readlines()
for line in lines:
    new_list=re.split(r'\t+',line.strip())
    id_per=new_list[2]
    if float(id_per) >= 90.0:
        new_list.append(id_per)
        output.writelines(line)
f.close()
output.close()
This is because Python interprets the values as strings, even though you want them treated as numbers. For strings, Python compares character by character using the ASCII or Unicode code points. This is why your code does not throw an error, yet does not behave as you expect: it applies string rules rather than float rules.
As an alternative to @sshashank124's answer, you could use simple string manipulation if your lines have a simple format:
output=open('result.txt','w')
f=open('file.txt','r')
for line in f:
    words = line.split()
    num_per=words[1]
    if float(num_per) >= 90:
        output.writelines(line)
f.close()
output.close()
Python is a dynamically but strongly typed language. Therefore 90 and '90' are completely different things: one is an integer and the other is a string.
You're comparing strings, and in string comparison '90' is "greater" than '100.05' (strings are compared character by character and '9' is greater than '1').
So what you need to do is:
convert id_per to number (you'll want probably floats, as you care about decimal places)
compare it to number, i.e., 90, not a '90'
In code:
id_per = float(new_list[2])
if id_per >= 90:
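The difference between the two comparisons can be seen on the sample values from the question. This is a small illustrative sketch; the list names are hypothetical.

```python
values = ["99.12", "93.45", "100.00", "100.05", "87.5"]

# String comparison: '1' sorts before '9', so both values starting with "100" are dropped
as_strings = [v for v in values if v >= "90"]

# Numeric comparison: convert first, then compare with a number
as_floats = [v for v in values if float(v) >= 90]

print(as_strings)  # ['99.12', '93.45']
print(as_floats)   # ['99.12', '93.45', '100.00', '100.05']
```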
You are using string comparison: lexically, '100' is less than '90'. I bet that it works for 950...
Get rid of the quotes around the '90' and convert id_per to a number with float() before comparing.

convert string into int()

I have a dataset that looks like this:
0 _ _ 23.0186E-03
10 _ _51.283E-03
20 _ _125.573E-03
where the numbers are lined up line by line (the underscores represent spaces).
The numbers in the right hand column are currently part of the line's string. I am trying to convert the numbers on the right into numerical values (0.0230186 etc). I can convert them with int() once they are in a simple numerical form, but I need to change the "E"s to get there. If you know how to change it for any value of E such as E-01, E-22 it would be very helpful.
Currently my code looks like so:
fin = open( 'stringtest1.txt', "r" )
fout = open("stringtest2.txt", "w")
while 1:
    x=fin.readline()
    a=x[5:-1]
    ##conversion code should go here
    if not x:
        break
fin.close()
fout.close()
I would suggest the following for the conversion:
float(x.split()[-1])
str.split() will split on white space when no arguments are provided, and float() will convert the string into a number, for example:
>>> '20 125.573E-03'.split()
['20', '125.573E-03']
>>> float('20 125.573E-03'.split()[-1])
0.125573
You should use context managers, and file handles are iterable:
with open('test1.txt') as fhi, open('test2.txt', 'w') as fho:
    for line in fhi:
        f = float(line.split()[-1])
        fho.write(str(f) + "\n")  # one value per line
If I understand what you want to do correctly, there's no need to do anything with the E's: in python float('23.0186E-03') returns 0.0230186, which I think is what you want.
All you need is:
fout = open("stringtest2.txt", "w")
for line in open('stringtest1.txt', "r"):
    x = float(line.strip().split()[1])
    fout.write("%f\n"%x)
fout.close()
Using %f in the output string will make sure the output will be in decimal notation (no E's). If you just use str(x), you may get E's in the output depending on the original value, so the correct conversion method depends on which output you want:
>>> str(float('23.0186E-06'))
'2.30186e-05'
>>> "%f"%float('23.0186E-06')
'0.000023'
>>> "%.10f"%float('23.0186E-06')
'0.0000230186'
You can add any number to %f to specify the precision. For more about string formatting with %, see http://rgruet.free.fr/PQR26/PQR2.6.html#stringMethods (scroll down to the "String formatting with the % operator" section).
float("20 _ _125.573E-03".split()[-1].strip("_"))
