Iterating on a file and comparing values using Python

I have a section of code that opens files containing information with wavenumber and intensity like this:
500.21506 -0.00134
500.45613 0.00231
500.69720 -0.00187
500.93826 0.00129
501.17933 -0.00049
501.42040 0.00028
501.66147 0.00114
501.90253 -0.00036
502.14360 0.00247
My code attempts to extract the data between two given wavenumbers, lowwav and highwav. I would like to print only the intensities of the wavenumbers that fall between lowwav and highwav. My entire code looks like this:
import datetime
import glob
path = '/Users/140803/*'
files = glob.glob(path)
for line in open('sfit4.ctl', 'r'):
    x = line.strip()
    if x.startswith('band.1.nu_start'):
        a,b = x.split('=')
        b = float(b)
        b = "{0:.3f}".format(b)
        lowwav = b
    if x.startswith('band.1.nu_stop'):
        a,b = x.split('=')
        b = float(b)
        b = "{0:.3f}".format(b)
        highwav = b
with open('\\_spec_final.t15', 'w') as f:
    with open('info.txt', 'rt') as infofile:
        for count, line in enumerate(infofile):
            lat = float(line[88:94])
            lon = float(line[119:127])
            year = int(line[190:194])
            month = int(line[195:197])
            day = int(line[198:200])
            hour = int(line[201:203])
            minute = int(line[204:206])
            second = int(line[207:209])
            dur = float(line[302:315])
            numpoints = float(line[655:660])
            fov = line[481:497] # field of view?
            sza = float(line[418:426])
            snr = 0.0000
            roe = 6396.2
            res = 0.5000
            lowwav = float(lowwav)
            highwav = float(highwav)
            spacebw = (highwav - lowwav)/ numpoints
            d = datetime.datetime(year, month, day, hour, minute, second)
            f.write('{:>12.5f}{:>12.5f}{:>12.5f}{:>12.5f}{:>8.1f}'.format(sza,roe,lat,lon,snr)) # line 1
            f.write("\n")
            f.write('{:>10d}{:>5d}{:>5d}{:>5d}{:>5d}{:>5d}'.format(year,month,day,hour,minute,second)) # line 2
            f.write("\n")
            f.write( ('{:%Y/%m/%d %H:%M:%S}'.format(d)) + "UT Solar Azimuth:" + ('{:>6.3f}'.format(sza)) + " Resolution:" + ('{:>6.4f}'.format(res)) + " Duration:" + ('{:>6.2f}'.format(dur))) # line 3
            f.write("\n")
            f.write('{:>21.13f}{:>26.13f}{:>24.17e}{:>12f}'.format(lowwav,highwav,spacebw,numpoints)) # line 4
            f.write("\n")
            with open(files[count], 'r') as g:
                for line in g:
                    wave_no, tensity = [float(item) for item in line.split()]
                    if lowwav <= wave_no <= highwav :
                        f.write(str(tensity) + '\n')
            g.close()
        f.close()
    infofile.close()
Right now, everything works fine except the last part where I compare wavelengths and print out the intensities corresponding to wavelengths between lowwav and highwav. No intensities are printing into the output file.

The problem is that when you iterate over the file g, you are effectively moving its "file pointer". So the second loop finds the file pointer already at the end of the file and doesn't produce any value.
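To see the effect in isolation, here is a small sketch that uses io.StringIO as a stand-in for an open file (an assumption made only so the example runs on its own). The second pass over the same object comes back empty, and seek(0) rewinds it:

```python
import io

# io.StringIO stands in for an open text file: both are iterators over lines
g = io.StringIO("500.21506 -0.00134\n500.45613 0.00231\n")

first_pass = [line for line in g]   # reads every line, leaving the pointer at the end
second_pass = [line for line in g]  # nothing left to read

g.seek(0)                           # rewind to the beginning
third_pass = [line for line in g]

print(len(first_pass), len(second_pass), len(third_pass))  # 2 0 2
```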
Secondly, you are producing all these nums lists, but every iteration of the loop overwrites the previous value, making it unreachable.
Either collect all the values first and then iterate over them:
with open(files[count], 'r') as g:
    all_nums = []
    for line in g:
        all_nums.append([float(item) for item in line.split()])
    for nums in all_nums:
        if (lowwav - nums[0]) < 0 or (highwav - nums[0]) > 0 :
            f.write(str(nums[1]))
            f.write('\n')
        else: break
Or just do everything inside the first loop (this should be more efficient):
with open(files[count], 'r') as g:
    for line in g:
        nums = [float(item) for item in line.split()]
        if (lowwav - nums[0]) < 0 or (highwav - nums[0]) > 0 :
            f.write(str(nums[1]))
            f.write('\n')
        else: break
Also note that the break statement will stop processing the values the first time the condition is false; you probably want to remove it.
That said, note that your code prints every value whose nums[0] is either bigger than lowwav or smaller than highwav, which means that if lowwav < highwav, every value will be printed. You probably want to use and in place of or if you want to check whether nums[0] is between lowwav and highwav. Moreover, in Python you can simply write lowwav < nums[0] < highwav for this.
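A quick sketch of the difference, using a few wavenumbers from the question and bounds picked only for the demo (500.5 and 501.5 are assumptions, not values from the original control file):

```python
lowwav, highwav = 500.5, 501.5  # example bounds chosen only for this demo

wavenumbers = [500.21506, 500.93826, 502.14360]

# "bigger than lowwav OR smaller than highwav" is true for every value when lowwav < highwav
with_or = [w for w in wavenumbers if (lowwav - w) < 0 or (highwav - w) > 0]

# a chained comparison keeps only the values strictly between the bounds
chained = [w for w in wavenumbers if lowwav < w < highwav]

print(with_or)  # [500.21506, 500.93826, 502.1436]
print(chained)  # [500.93826]
```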
I would personally use the following:
with open(files[count], 'r') as g:
    for line in g:
        wave_no, intensity = [float(item) for item in line.split()]
        if lowwav < wave_no < highwav:
            f.write(str(intensity)+'\n')
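The same filter can also be packaged as a small generator that accepts either an open file or any iterable of lines. This is only a sketch: the name intensities_in_range is hypothetical, the bounds are inclusive to match the question's <= test, and blank or malformed lines are skipped rather than raising a ValueError during unpacking:

```python
import io

def intensities_in_range(lines, lowwav, highwav):
    """Yield the intensity from every line whose wavenumber lies in [lowwav, highwav].

    `lines` may be an open file or any iterable of "wavenumber intensity" strings.
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 2:  # skip blank or malformed lines instead of failing to unpack
            continue
        wave_no, intensity = map(float, parts)
        if lowwav <= wave_no <= highwav:
            yield intensity

sample = io.StringIO("500.21506 -0.00134\n500.45613 0.00231\n500.69720 -0.00187\n\n")
result = list(intensities_in_range(sample, 500.4, 500.7))
print(result)  # [0.00231, -0.00187]
```

Inside the question's loop this would be used as `for intensity in intensities_in_range(g, lowwav, highwav): f.write(str(intensity) + '\n')`.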

Split each line on whitespace and unpack the resulting list into two names, wavelength and intensity.
[line.split() for line in r] turns
500.21506 -0.00134
500.45613 0.00231
into
[['500.21506', '-0.00134'], ['500.45613', '0.00231']]
This list comprehension, [(wavelength, intensity) for wavelength, intensity in lol if low <= float(wavelength) <= high], returns
[('500.21506', '-0.00134'), ('500.45613', '0.00231')]
If you join them back, [' '.join((w, i)) for w, i in [('500.21506', '-0.00134'), ('500.45613', '0.00231')]], you get ['500.21506 -0.00134', '500.45613 0.00231']
Use a list comprehension to filter by wavelength, then join each wavelength and intensity back into a string and write it to the file.
with open('data.txt', 'r') as r, open('\\_spec_final.t15', 'w') as w:
    # low and high are the bounds to keep (lowwav and highwav in the question)
    lol = (line.split() for line in r)
    intensities = (' '.join((wavelength, intensity)) for wavelength, intensity in lol if low <= float(wavelength) <= high)
    # writelines() does not add line endings, so append one to each string
    w.writelines(s + '\n' for s in intensities)
If you want to output to the terminal, do print(list(intensities)) instead of the w.writelines(...) call.
Contents of data.txt:
500.21506 -0.00134
500.45613 0.00231
500.69720 -0.00187
500.93826 0.00129
501.17933 -0.00049
501.42040 0.00028
501.66147 0.00114
501.90253 -0.00036
502.14360 0.00247
Output when low is 500 and high is at least 500.45613 but below 500.69720:
['500.21506 -0.00134', '500.45613 0.00231']
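The pipeline can be checked end to end without touching the disk. In this sketch io.StringIO stands in for both data.txt and the output file, and high = 500.5 is an assumed value picked to reproduce the output above. intensities is built as a list here so it can be both printed and written:

```python
import io

low, high = 500, 500.5  # assumed bounds; any high in [500.45613, 500.69720) gives the same result

data = io.StringIO(
    "500.21506 -0.00134\n500.45613 0.00231\n500.69720 -0.00187\n"
    "500.93826 0.00129\n501.17933 -0.00049\n"
)
out = io.StringIO()  # stands in for the output file

lol = (line.split() for line in data)
intensities = [' '.join((wavelength, intensity))
               for wavelength, intensity in lol
               if low <= float(wavelength) <= high]

out.writelines(s + '\n' for s in intensities)  # writelines() adds no newlines itself
print(intensities)  # ['500.21506 -0.00134', '500.45613 0.00231']
print(out.getvalue())
```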

Related

Python - Read file I/O - Find average of each day temperature records

I have to write a Python function which records temperatures for different days. The temperatures for the same day are stored on the same line. The first day is considered to be day 1, and each subsequent line of the file records the following days in sequential order (e.g. the 3rd line of data is collected from the 3rd day). If no data was collected for a given day, then the entire line will be blank. For example, this text file contains the following inputs for 6 days:
23 24.5
25
22.25 22.5

23.4
25.2 20.0
This file contains data collected for 6 days.
I am to define a function temp_record which takes a filename as a parameter. It reads the data from the parameter file and analyses the temperatures. The function should return a list of average temperatures per day. For example, the function returns the following list for the above text file:
[23.75, 25.0, 22.375, 0, 23.4, 22.6]
I wrote some code, but it doesn't seem to work for all cases, and I'm not sure what went wrong. Can someone help?
Here is the code I wrote:
def temp_record(filename):
    input_file = open(filename,'r')
    contents = input_file.read().split("\n")
    sum_val = 0
    lis = []
    for string in contents:
        split_str = string.split(" ")
        for i in range(len(split_str)):
            if split_str[i] == '':
                split_str[i] = 0
            else:
                split_str[i] = float(split_str[i])
        ans = (sum(split_str)/len(split_str))
        if ans == 0.0:
            ans = 0
        lis.append(ans)
    return lis
When you do contents = input_file.read().split("\n"), you get an additional empty string at the end of the contents list (if the file ends with a newline, as most text files do), and that empty element is averaged to 0.
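A minimal illustration of where the extra element comes from. str.splitlines() does not produce the trailing empty string, while it still keeps a genuinely blank line in the middle, which matters here because a blank line is a day with no data:

```python
text = "23 24.5\n25\n\n23.4\n"  # ends with a newline, like most text files

print(text.split("\n"))   # ['23 24.5', '25', '', '23.4', '']
print(text.splitlines())  # ['23 24.5', '25', '', '23.4']
```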
You can fix this like this:
def temp_record(filename):
    input_file = open(filename, 'r')
    # read all lines
    contents = input_file.readlines()
    sum_val = 0
    lis = []
    for string in contents:
        # lines end in \n use rstrip to remove it
        split_str = string.rstrip().split(" ")
        for i in range(len(split_str)):
            if split_str[i] == '':
                split_str[i] = 0
            else:
                split_str[i] = float(split_str[i])
        ans = (sum(split_str) / len(split_str))
        if ans == 0.0:
            ans = 0
        lis.append(ans)
    return lis
but this can be much shorter:
def temp_record(filename):
    result = []
    with open(filename, 'r') as fp:
        for line in fp:
            temps = line.split()
            avg_temp = sum(map(float, temps)) / len(temps) if temps else 0
            # turn 0.0 into 0 as the original code does; negative averages are kept as-is
            result.append(avg_temp if avg_temp != 0 else 0)
    return result
or even shorter, if you want to play code golf:
def temp_record2(filename):
    with open(filename, 'r') as fp:
        return list(map(lambda x: x if x != 0 else 0, [sum(map(float, line.split())) / len(line.split()) if line.split() else 0 for line in fp]))
Perhaps the hidden test that fails is with an input like:
-1 1
0
30
The first two days do have recorded temperatures, but their average is 0. Following the format of using floats for all other averages, the average should be 0.0, not 0 (as that would imply no temperature was collected for the day, when in fact one was).
If this is the issue, it could be fixed like this:
def temp_record(filename):
    input_file = open(filename,'r')
    contents = input_file.read().split("\n")
    sum_val = 0
    lis = []
    for string in contents:
        split_str = string.split(" ")
        for i in range(len(split_str)):
            if split_str[i] == '':
                split_str[i] = 0
            else:
                split_str[i] = float(split_str[i])
        ans = (sum(split_str)/len(split_str))
        if string == '':
            ans = 0
        lis.append(ans)
    return lis
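Putting the pieces together, here is a sketch that returns the int 0 only for blank lines and a float (including 0.0) for every day that has readings, following the reasoning in the answer above. daily_averages is a hypothetical helper that takes any iterable of lines, so it can be tried with io.StringIO; temp_record simply wraps it for a filename:

```python
import io

def daily_averages(lines):
    """Average the readings on each line; a blank line (no data) gives the int 0.

    A day whose readings average to zero still gets the float 0.0.
    """
    result = []
    for line in lines:
        temps = [float(t) for t in line.split()]
        result.append(sum(temps) / len(temps) if temps else 0)
    return result

def temp_record(filename):
    with open(filename, 'r') as fp:
        return daily_averages(fp)

sample = io.StringIO("23 24.5\n25\n22.25 22.5\n\n23.4\n25.2 20.0\n")
print(daily_averages(sample))  # [23.75, 25.0, 22.375, 0, 23.4, 22.6]
```

Iterating over the file object directly also avoids the trailing empty element that read().split("\n") produces.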

Check if duplicate exists and then append a unique digit to line?

INPUT.TXT looks like this -
pr-ec2_1034
pr-ec2_1023
pr-ec2_1099
I want to write a Python script which will read this file, add 1 to the line with the highest number, and then print that line.
Desired output -
pr-ec2_1100
Right now I am able to add +1 to all lines like -
def increment_digits(string):
    return ''.join([x if not x.isdigit() else str((int(x) + 1) % 10) for x in string])
with open('INPUT.txt', 'r') as file:
    data = file.read()
    print(increment_digits(data))
Output-
pr-ec3_2145
pr-ec3_2134
pr-ec3_2100
But this is not what I want. I want to find the line with the largest ending number (the part after the last underscore) in input.txt and add 1 only to that one line.
pr-ec2_1100 is what I want
Something like this:
with open('input.txt') as f:
    lines = [l.strip() for l in f if l.strip()]  # drop newlines and skip blank lines
# the number is the part after the last underscore
numbers = [int(l.rsplit('_', 1)[1]) for l in lines]
_max = max(numbers)
result = _max + 1
print('pr-ec2_{}'.format(result))
output
pr-ec2_1100
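The answer hardcodes the pr-ec2_ prefix. If the prefix can vary, a sketch like the following keeps the prefix of whichever line has the largest number and splits only on the last underscore via str.rpartition (increment_highest is a hypothetical name):

```python
def increment_highest(lines):
    """Return the line with the largest number after its last underscore, plus 1."""
    names = [l.strip() for l in lines if l.strip()]  # drop newlines and blank lines
    # compare numerically on the part after the last underscore
    best = max(names, key=lambda name: int(name.rpartition('_')[2]))
    prefix, sep, number = best.rpartition('_')
    return prefix + sep + str(int(number) + 1)

print(increment_highest(["pr-ec2_1034\n", "pr-ec2_1023\n", "pr-ec2_1099\n"]))  # pr-ec2_1100
```

With a real file, call it as `increment_highest(open('input.txt'))`. Leading zeros are not preserved (a suffix of 0099 becomes 100).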

joining every 4th line in csv-file

I'd like to join every 4th line together so I thought something like this would work:
import csv
filename = "mycsv.csv"
f = open(filename, "rb")
new_csv = []
count = 1
for i, line in enumerate(file(filename)):
    line = line.rstrip()
    print line
    if count % 4 == 0:
        new_csv.append(old_line_1 + old_line_2 + old_line_3+line)
    else:
        old_line_1 = line[i-2]
        old_line_2 = line[i-1]
        old_line_3 = line
    count += 1
print new_csv
But line[i-1] and line[i-2] do not give me the lines before the current one, as I thought they would. So how can I access the lines at positions i-1 and i-2?
The variable line contains only the line for the current iteration, so line[i-1] only gives you one character within the current line. The other answer is probably the tersest way to do it, but building on your code, you could do something like this instead:
filename = "mycsv.csv"
with open(filename, "r") as f:
    new_csv = []
    lines = []
    # iterate over the raw lines; csv.reader would yield lists, which have no rstrip()
    for i, line in enumerate(f):
        line = line.rstrip()
        lines.append(line)
        # after every 4th line, join the collected group and start a new one
        if (i + 1) % 4 == 0:
            new_csv.append("".join(lines))
            lines = []
print(new_csv)
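The loop above silently drops any trailing lines when the total is not a multiple of 4. If those should be kept, a sketch along these lines works on any iterable of lines (join_every_n is a hypothetical name, and keeping the partial group is a design choice, not something the question asks for):

```python
def join_every_n(lines, n=4):
    """Join each consecutive group of n lines into one string (trailing whitespace stripped).

    A final incomplete group is kept rather than dropped.
    """
    joined, group = [], []
    for line in lines:
        group.append(line.rstrip())
        if len(group) == n:
            joined.append(''.join(group))
            group = []
    if group:  # leftover lines when the count isn't a multiple of n
        joined.append(''.join(group))
    return joined

print(join_every_n(['a\n', 'b\n', 'c\n', 'd\n', 'e\n']))  # ['abcd', 'e']
```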
This should do what you require:
join_every_n = 4
# file() is the Python 2 built-in; open() works in both Python 2 and 3
with open(filename) as f:
    all_lines = [line.rstrip() for line in f]
transposed_lines = zip(*[all_lines[n::join_every_n] for n in range(join_every_n)])
joined = [''.join([l1,l2,l3,l4]) for (l1,l2,l3,l4) in transposed_lines]
Instead of that last line, you could also do
joined = list(map(''.join, transposed_lines))
Explanation
This returns every i-th element of your_list, starting at offset n:
your_list[n::i]
Doing this for each offset in range(4) gives four lists, one for each position within a group of 4 lines:
[[line0, line4, ...], [line1, line5, ...], [line2, line6, ...], [line3, line7, ...]]
zip(*...) then transposes this array so that it becomes
[[line0, line1, line2, line3], [line4, line5, line6, line7], ...]
Now you can simply unpack and join each individual element.
Example
all_lines = list(map(str, range(100)))  # list() so it can be sliced in Python 3
transposed_lines = zip(*[all_lines[n::4] for n in range(4)])
joined = [''.join([l1,l2,l3,l4]) for (l1,l2,l3,l4) in transposed_lines]
gives
['0123',
'4567',
'891011',
...
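The slicing approach and the common zip(*[iter(lines)] * n) grouping idiom produce the same groups, and both stop at the last complete group. itertools.zip_longest with a fillvalue (the "grouper" recipe from the itertools documentation) keeps the remainder instead. A small sketch comparing them on ten lines:

```python
from itertools import zip_longest

lines = [str(n) for n in range(10)]

# slicing approach from the answer: one list per offset, transposed with zip
by_slices = [''.join(group) for group in zip(*[lines[n::4] for n in range(4)])]

# one iterator repeated 4 times: zip pulls 4 consecutive items for each tuple
it = iter(lines)
by_iter = [''.join(group) for group in zip(*[it] * 4)]

# zip_longest pads the last, incomplete group instead of dropping it
it = iter(lines)
padded = [''.join(group) for group in zip_longest(*[it] * 4, fillvalue='')]

print(by_slices)  # ['0123', '4567']
print(by_iter)    # ['0123', '4567']
print(padded)     # ['0123', '4567', '89']
```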

Index Error: Index out of bounds when using numpy in python

I have code that works fine with small CSVs but errors out when I run large CSVs through it. In essence, this code is supposed to place 3 CSVs' worth of data into 3 separate dictionaries, combine those dictionaries into a master dictionary, and then perform arithmetic operations on that dictionary. The input CSVs look something like this:
time A B C D
0 3 4 6 4
.001 4 6 7 8
.002 4 6 7 3
The code I am using is shown below. The error occurs at lines 47 and 65 of my script, where I try to perform arithmetic with the dictionary. Any explanation as to why this is going on is greatly appreciated.
import numpy
Xcoord = {}
time = []
with open ('Nodal_QuardnetsX2.csv', 'r') as f:
    f.readline() # Skips first line
    for line in f:
        values = [s.strip()for s in line.split(',')]
        Xcoord[values[0]] = map(float, values[1:])
        time.append(values[0])
Ycoord = {}
with open ('Nodal_QuardnetsY2.csv', 'r') as f:
    f.readline() # Skips first line
    for line in f:
        values = [s.strip()for s in line.split(',')]
        Ycoord[values[0]] = map(float, values[1:])
Zcoord = {}
with open ('Nodal_QuardnetsZ2.csv', 'r') as f:
    f.readline() # Skips first line
    for line in f:
        values = [s.strip()for s in line.split(',')]
        Zcoord[values[0]] = map(float, values[1:])
# Create a master dictionary of the form {'key':[[x, y, z], [x, y, z]}
CoordCombo = {}
for key in Xcoord.keys():
    CoordnateList = zip(Xcoord[key], Ycoord[key], Zcoord[key])
    CoordCombo[key] = CoordnateList
counter = 0
keycount1 = 0
keycount2 = 0.001
difference = []
NodalDisplacements = {}
#Find the difference between the x, y, and z quardnets relative to that point in time
while keycount2 <= float(values[0]):
    Sub = numpy.subtract(CoordCombo[str(keycount2)][counter], CoordCombo[str(keycount1)][counter])
    counter = counter + 1
    difference.append(Sub)
    NodalDisplacements[keycount1] = Sub
    keycount1 = keycount1 + 0.001
    keycount2 = keycount2 + 0.001
counter = 0
keycount3 = 0
keycount4 = 0.001
Sum = []
breakpoint = float(values[0])-0.001
while keycount4 <= breakpoint:
    Add = numpy.sum(NodalDisplacements[keycount4][counter], NodalDisplacements[keycount3][counter])
    Sum.append(Add)
    keycount3 = keycount3 + 0.001
    keycount4 = keycount4 + 0.001
    counter = counter + 1
    if counter == 2:
        counter = 0
print Sum
Probably a line of your CSV file does not contain 5 elements, or the line is empty.
In your logic I would suggest using
N_COLS = 5  # time plus the four data columns in your sample
for line in f:
    line = line.strip()
    if not line: continue
    values = [s.strip() for s in line.split(',')]
    if len(values) != N_COLS: continue # or raise an error...
    # other ...
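A sketch of the same idea using the standard csv module, run on an in-memory sample shaped like the one in the question. N_COLS = 5 and the helper name load_coords are assumptions. The values are built as a list rather than with map(), because in Python 3 map() returns a lazy iterator that cannot be indexed:

```python
import csv
import io

N_COLS = 5  # time plus four value columns, as in the question's sample (assumption)

def load_coords(f):
    """Map each time string to a list of floats, skipping blank or short rows."""
    coords = {}
    reader = csv.reader(f)
    next(reader, None)  # skip the header row
    for row in reader:
        values = [s.strip() for s in row]
        if len(values) != N_COLS:  # blank rows arrive as [] and are skipped too
            continue
        coords[values[0]] = [float(v) for v in values[1:]]
    return coords

sample = io.StringIO("time,A,B,C,D\n0,3,4,6,4\n\n.001,4,6,7,8\n.002,4,6\n")
coords = load_coords(sample)
print(coords)  # {'0': [3.0, 4.0, 6.0, 4.0], '.001': [4.0, 6.0, 7.0, 8.0]}
```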

Counting the number of character differences between two files

I have two somewhat large (~20 MB) txt files which are essentially just long strings of integers (only 0, 1, or 2). I would like to write a Python script which iterates through the files and compares them integer by integer. At the end of the day I want the number of integers that are different and the total number of integers in the files (they should be exactly the same length). I have done some searching and it seems like difflib may be useful, but I am fairly new to Python and I am not sure if anything in difflib will count the differences or the number of entries.
Any help would be greatly appreciated! What I am trying right now is the following, but it only looks at one entry and then terminates, and I don't understand why.
f1 = open("file1.txt", "r")
f2 = open("file2.txt", "r")
fileOne = f1.readlines()
fileTwo = f2.readlines()
f1.close()
f2.close()
correct = 0
x = 0
total = 0
for i in fileOne:
    if i != fileTwo[x]:
        correct +=1
    x += 1
    total +=1
if total != 0:
    percent = (correct / total) * 100
    print "The file is %.1f %% correct!" % (percent)
    print "%i out of %i symbols were correct!" % (correct, total)
Not tested at all, but here is something a lot simpler (and more Pythonic):
from itertools import izip
with open("file1.txt", "r") as f1, open("file2.txt", "r") as f2:
    data=[(1, x==y) for x, y in izip(f1.read(), f2.read())]
# percentage of positions where the characters match (Python 2)
print sum(1.0 for t in data if t[1]) / len(data) * 100
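For two ~20 MB files it may be preferable not to read either file into memory in one go. This Python 3 sketch (count_differences is a hypothetical helper, and the 1 MiB chunk size is an arbitrary choice) compares the files chunk by chunk and returns the number of differing characters together with the total number compared:

```python
import io

def count_differences(f1, f2, chunk_size=1 << 20):
    """Return (number of differing characters, total characters compared).

    Reads both files in fixed-size chunks so neither file is held in memory at once.
    """
    different = total = 0
    while True:
        a = f1.read(chunk_size)
        b = f2.read(chunk_size)
        if not a and not b:
            break
        if len(a) != len(b):
            raise ValueError("files have different lengths")
        different += sum(x != y for x, y in zip(a, b))
        total += len(a)
    return different, total

print(count_differences(io.StringIO("0120121"), io.StringIO("0110122"), chunk_size=3))  # (2, 7)
```

With real files: `with open("file1.txt") as f1, open("file2.txt") as f2: different, total = count_differences(f1, f2)`. A trailing newline counts as one compared character.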
You can use enumerate to find the characters in your strings that don't match.
If all strings are guaranteed to be the same length:
with open("file1.txt","r") as f:
    l1 = f.readlines()
with open("file2.txt","r") as f:
    l2 = f.readlines()
non_matches = 0.
total = 0.
for i,j in enumerate(l1):
    non_matches += sum([1 for k,l in enumerate(j) if l2[i][k]!= l]) # add 1 for each non match
    total += len(j.split(","))
print non_matches,total*2
print non_matches / (total * 2) * 100. # if strings are all same length just mult total by 2
6 40
15.0
