Python - Put new line into file at largest indent - python

Okay so my question involves using PYTHON language please - Nothing else.
Basically, I have a file with a layout such as below:
    X
Y
Z
    A
B
C
    1
2
3
And I would look at that and say X, A and 1 mark a new step! (because their indent is larger than the previous line basically). So I want to put in a new line with text Step 1, Step 2 etc, for each new step.
Note: I care more about how to put the new line in at the correct place, than how to increase the value of N with each Step.
Note 2: The file will vary in row count, so I cannot simply use at line 3, 6, 9 etc.. That's useless to me unfortunately.

You may want code something like this:
start = 1
lines = ['Step ' + str(start) + ':\n']
with open('file.txt', 'r') as inF:
    prevspace = -1
    for line in inF:
        # width of the leading whitespace on this line
        lspaces = len(line) - len(line.lstrip())
        # a larger indent than the previous line marks a new step
        if lspaces > prevspace and prevspace != -1:
            start = start + 1
            lines.append('Step ' + str(start) + ':\n')
        lines.append(line)
        prevspace = lspaces
# the with statement closes each file, so no explicit close() is needed
with open('newfile.txt', 'w') as outF:
    for line in lines:
        outF.write(line)
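The same idea can be sketched as a small function over an in-memory list of lines, which is easier to try out than the file version. This is only an illustration: the name add_step_headers is made up here, and it assumes (as the question describes) that the very first line also starts a step and that blank lines should be passed through untouched.

```python
def add_step_headers(lines):
    """Insert 'Step N:' before each line indented deeper than the previous one.

    Hypothetical helper, not part of the answer above.
    """
    out = []
    step = 0
    prev_indent = -1  # smaller than any real indent, so the first line starts a step
    for line in lines:
        if not line.strip():
            # blank line: copy it and leave the remembered indent alone
            out.append(line)
            continue
        indent = len(line) - len(line.lstrip())
        if indent > prev_indent:
            step += 1
            out.append('Step {}:\n'.format(step))
        out.append(line)
        prev_indent = indent
    return out


# Sample shaped like the question's layout, assuming X and A are indented
sample = ['    X\n', 'Y\n', 'Z\n', '    A\n', 'B\n', 'C\n']
print(''.join(add_step_headers(sample)), end='')
```

To use it on a file, read the lines with readlines(), pass them through the function, and write the result with writelines().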

Related

Extract the index of largest number in different lines

I am writing a code for extracting specific lines from my file and then look for the maximum number, more specifically for its position (index).
So I start my code looking for the lines:
with open(filename, 'r') as f:
    lines = f.readlines()
for index, line in enumerate(lines):
    if 'a ' in line:
        x = lines[index]
        print(x)
So here from my code I got the lines I was looking for:
a 3 4 5
a 6 3 2
Then the rest of my code is looking for the maximum between the numbers and prints the index:
y = [float(item) for item in x.split()]
z=y.index(max(y[1:3]))
print(z)
Now the code finds the index of the largest number in each line separately (so for 5 in the first line and 6 in the second):
3
1
But I also want my code to compare the numbers across the two lines (so the largest of 3, 4, 5, 6, 3, 2), and to output both the index of the line in the file that contains the largest number (for example line 300) and its position within that line (1).
Can you suggest to me some possible solutions?
You can try something like this.
max_value is a list that holds the maximum number, its line, and its position:
max_value = [0, 0, 0]  # value, line, position
with open(filename, 'r') as f:
    lines = f.readlines()
for index, line in enumerate(lines):
    if 'a ' in line:
        # take the fields after the leading 'a'; split() with no argument
        # also drops the trailing newline, so the last number is not skipped
        line_data = line.split()[1:]
        # if an element is a number bigger than the current max, save it
        for el_index, element in enumerate(line_data):
            if element.isdigit() and int(element) > max_value[0]:
                max_value = [int(element), index, el_index]
print(max_value)
Input data
a 3 4 5
a 6 3 2
Output data
# 6 - is max, 1 - line, 0 - position
[6, 1, 0]
You should iterate over every line and keep track of the line number together with the position of the items in that line. By the way, run this with Python 3.6+ (because of the f-string in the final print).
with open(filename) as f:
    lines = [line.rstrip() for line in f]
max_ = 0
line_and_position = (0, 0)
for i, line in enumerate(lines):
    if line.startswith('a '):
        # building list of integers for finding the maximum
        list_ = [int(n) for n in line.split()[1:]]
        for item in list_:
            if item > max_:
                max_ = item
                # setting the line number and the character offset in that line
                # (find() returns the first occurrence of the digits)
                line_and_position = i, line.find(str(item))
print(f'maximum number {max_} is in line {line_and_position[0] + 1} at index {line_and_position[1]}')
Input :
a 3 4 5
a 6 3 2
a 1 31 4
b 2 3 2
a 7 1 8
Output:
maximum number 31 is in line 3 at index 4
You can do it like below. I commented each line for explanation. This method differs from the others in that, using a regex, we get the current number and its character position from one source. In other words, there is no going back into the line to find data after the fact. Everything we need comes on every iteration of the loop. Also, all the lines are filtered as they are received. Between the two, a stack of conditions is eliminated. We end up with 2 loops that get directly to the point and one condition to see if the stored data needs to be updated.
import re
# compile the number pattern once
number = re.compile(r'(?P<n>\d+)')
with open(filename, 'r') as f:
    # prime data: (value, line, position)
    data = (0, 0, 0)
    # keep every line that starts with 'a', or a blank line if it doesn't
    for L, ln in enumerate(ln if ln.startswith('a') else '' for ln in f):
        # get each number together with its line index and character position
        for m in number.finditer(ln):
            res = (int(m.group('n')), L, m.start())
            # store the new properties if the number is greater than the current max
            if res[0] > data[0]:
                data = res
# print final
print('Max: {}, Line: {}, Position: {}'.format(*data))
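For comparison, here is a compact sketch that lets the built-in max() do the bookkeeping. It is not taken from any of the answers above: find_max is a made-up name, the function works on any iterable of lines, and it assumes the values are non-negative integers, like the isdigit() check in the first answer.

```python
def find_max(lines, prefix='a '):
    """Return (value, line_index, token_index) for the largest integer found
    on lines starting with prefix, or None if there is no such number.

    Hypothetical helper; token_index counts the fields after the prefix.
    """
    candidates = (
        (int(tok), line_no, pos)
        for line_no, line in enumerate(lines)
        if line.startswith(prefix)
        for pos, tok in enumerate(line.split()[1:])
        if tok.isdigit()
    )
    # only the value is compared, so the first of several equal maxima wins
    return max(candidates, key=lambda c: c[0], default=None)


sample = ['a 3 4 5\n', 'a 6 3 2\n', 'a 1 31 4\n', 'b 2 3 2\n', 'a 7 1 8\n']
print(find_max(sample))  # (31, 2, 1)
```

Because an open file is itself an iterable of lines, find_max(f) inside a with open(filename) as f: block works without reading the whole file into a list first.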

python: how to count number in one file?

I need to write a Python program to read the values in a file, one per line, such as file: test.txt
1
2
3
4
5
6
7
8
9
10
Denoting these as j1, j2, j3, ... jn,
I need to sum the differences of consecutive values:
a=(j2-j1)+(j3-j2)+...+(jn-j[n-1])
I have example source code
a=0
for(j=2;j<=n;j++){
a=a+(j-(j-1))
}
print a
and the output is
9
If I understand correctly, you want to evaluate the following expression:
a = (j2-j1) + (j3-j2) + ... + (jn-j[n-1])
The code below iterates over the file, subtracts the value on the previous line from the value on the current line, and adds up all those differences.
a = 0
with open("test.txt", "r") as f:
    previous = next(f).strip()
    for line in f:
        line = line.strip()
        if not line:
            continue
        a = a + (int(line) - int(previous))
        previous = line
print(a)
Solution (Python 3)
res = 0
with open("test.txt", "r") as fp:
    lines = list(map(int, fp.readlines()))
for i in range(1, len(lines)):
    res += lines[i] - lines[i-1]
print(res)
Output: 9
test.txt contains:
1
2
3
4
5
6
7
8
9
10
I'm not even sure if I understand the question, but here's my best attempt at solving what I think is your problem:
To read values from a file, use "with open()" in read mode ('r'):
with open('test.txt', 'r') as f:
    # your code here
"as f" means that "f" will represent your file anywhere inside that block.
So, to read all the lines and store them in a list, do this:
all_lines = f.readlines()
You can now do whatever you want with the data.
If you look at the function you're trying to solve, a=(j2-j1)+(j3-j2)+...+(jn-j[n-1]), you'll notice that many of the values cancel out, e.g. (j2-j1)+(j3-j2) = j3-j1. Thus, the entire function boils down to jn-j1, so all you need is the first and last number.
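The cancellation can be sketched in code as follows. This is a minimal illustration, not the asker's program: telescoped_sum is a made-up name, it takes any iterable of lines (an open file works), and it assumes every non-blank line holds one integer.

```python
def telescoped_sum(lines):
    """Sum of consecutive differences, computed as last value minus first.

    Hypothetical helper: only the first and last numbers are remembered.
    """
    first = last = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        last = int(line)
        if first is None:
            first = last
    # with fewer than two values there are no differences to add up
    return 0 if first is None else last - first


numbers = ['{}\n'.format(n) for n in range(1, 11)]  # same values as test.txt
print(telescoped_sum(numbers))  # 9
```

With the real file this would be called as telescoped_sum(f) inside the with open('test.txt', 'r') as f: block.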
Edit: That being said, please try and search this forum first before asking any questions. As someone who's been in your shoes before, I decided to help you out, but you should learn to reference other people's questions that are identical to your own.
The correct answer is 9:
with open("data.txt") as f:
    # set prev to first number in the file
    prev = int(next(f))
    sm = 0
    # iterate over the remaining numbers
    for j in f:
        j = int(j)
        sm += j - prev
        # update prev
        prev = j
print(sm)
Or using itertools.tee and zip:
from itertools import tee
with open("data.txt") as f:
    a,b = tee(f)
    next(b)
    print(sum(int(j) - int(i) for i,j in zip(a, b)))

Iterating on a file and comparing values using python

I have a section of code that opens files containing information with wavenumber and intensity like this:
500.21506 -0.00134
500.45613 0.00231
500.69720 -0.00187
500.93826 0.00129
501.17933 -0.00049
501.42040 0.00028
501.66147 0.00114
501.90253 -0.00036
502.14360 0.00247
My code attempts to parse the information between two given wavelengths: lowwav and highwav. I would like to print only the intensities of the wavenumbers that fall between lowwav and highwav. My entire code looks like:
import datetime
import glob
path = '/Users/140803/*'
files = glob.glob(path)
for line in open('sfit4.ctl', 'r'):
    x = line.strip()
    if x.startswith('band.1.nu_start'):
        a,b = x.split('=')
        b = float(b)
        b = "{0:.3f}".format(b)
        lowwav = b
    if x.startswith('band.1.nu_stop'):
        a,b = x.split('=')
        b = float(b)
        b = "{0:.3f}".format(b)
        highwav = b
with open('\\_spec_final.t15', 'w') as f:
    with open('info.txt', 'rt') as infofile:
        for count, line in enumerate(infofile):
            lat = float(line[88:94])
            lon = float(line[119:127])
            year = int(line[190:194])
            month = int(line[195:197])
            day = int(line[198:200])
            hour = int(line[201:203])
            minute = int(line[204:206])
            second = int(line[207:209])
            dur = float(line[302:315])
            numpoints = float(line[655:660])
            fov = line[481:497] # field of view?
            sza = float(line[418:426])
            snr = 0.0000
            roe = 6396.2
            res = 0.5000
            lowwav = float(lowwav)
            highwav = float(highwav)
            spacebw = (highwav - lowwav)/ numpoints
            d = datetime.datetime(year, month, day, hour, minute, second)
            f.write('{:>12.5f}{:>12.5f}{:>12.5f}{:>12.5f}{:>8.1f}'.format(sza,roe,lat,lon,snr)) # line 1
            f.write("\n")
            f.write('{:>10d}{:>5d}{:>5d}{:>5d}{:>5d}{:>5d}'.format(year,month,day,hour,minute,second)) # line 2
            f.write("\n")
            f.write( ('{:%Y/%m/%d %H:%M:%S}'.format(d)) + "UT Solar Azimuth:" + ('{:>6.3f}'.format(sza)) + " Resolution:" + ('{:>6.4f}'.format(res)) + " Duration:" + ('{:>6.2f}'.format(dur))) # line 3
            f.write("\n")
            f.write('{:>21.13f}{:>26.13f}{:>24.17e}{:>12f}'.format(lowwav,highwav,spacebw,numpoints)) # line 4
            f.write("\n")
            # the with statements close g, infofile and f; no close() calls needed
            with open(files[count], 'r') as g:
                for line in g:
                    wave_no, tensity = [float(item) for item in line.split()]
                    if lowwav <= wave_no <= highwav :
                        f.write(str(tensity) + '\n')
Right now, everything works fine except the last part where I compare wavelengths and print out the intensities corresponding to wavelengths between lowwav and highwav. No intensities are printing into the output file.
The problem is that when you iterate over the file g you are effectively moving its "file pointer". So the second loop finds the file pointer already at the end of the file and doesn't produce any value.
Secondly, you are producing all these nums lists, but every iteration of the loop overwrites the previous value, making it unreachable.
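The file-pointer behaviour is easy to see in isolation. In this small sketch an io.StringIO object stands in for the real file g (an assumption made so the example needs no file on disk); a file opened with open() behaves the same way when iterated.

```python
import io

# io.StringIO stands in for an open text file here
g = io.StringIO("500.21506 -0.00134\n500.45613 0.00231\n")

first_pass = [line for line in g]    # reads both lines
second_pass = [line for line in g]   # pointer is at the end: nothing left
g.seek(0)                            # rewind to the start of the file
third_pass = [line for line in g]    # reads both lines again

print(len(first_pass), len(second_pass), len(third_pass))  # 2 0 2
```

So a second loop over the same file object needs either a seek(0) first or, more simply, the values collected during the first pass.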
Either collect all the values first and then iterate over them:
with open(files[count], 'r') as g:
    all_nums = []
    for line in g:
        all_nums.append([float(item) for item in line.split()])
    for nums in all_nums:
        if (lowwav - nums[0]) < 0 or (highwav - nums[0]) > 0 :
            f.write(str(nums[1]))
            f.write('\n')
        else: break
Or just do everything inside the first loop (this should be more efficient):
with open(files[count], 'r') as g:
    for line in g:
        nums = [float(item) for item in line.split()]
        if (lowwav - nums[0]) < 0 or (highwav - nums[0]) > 0 :
            f.write(str(nums[1]))
            f.write('\n')
        else: break
Also note that the break statement stops the processing at the first value for which the condition is false; you probably want to remove it.
That said, note that your code prints every value whose nums[0] is either bigger than lowwav or smaller than highwav, which means that if lowwav < highwav every value will be printed. You probably want to use and in place of or if you want to check whether the value lies between lowwav and highwav. Moreover, in Python you can simply write lowwav < nums[0] < highwav for this.
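A quick way to convince yourself of the difference is to run the three conditions on a few wavenumbers from the question. The bounds below are made up for illustration; they are not the values from the asker's sfit4.ctl.

```python
lowwav, highwav = 500.4, 500.8   # hypothetical bounds, for illustration only
wavenumbers = [500.21506, 500.45613, 500.69720, 500.93826]

# "or": true for everything as long as lowwav < highwav
with_or = [w for w in wavenumbers if (lowwav - w) < 0 or (highwav - w) > 0]
# "and": true only for values strictly between the bounds
with_and = [w for w in wavenumbers if (lowwav - w) < 0 and (highwav - w) > 0]
# chained comparison: same meaning as the "and" version, easier to read
chained = [w for w in wavenumbers if lowwav < w < highwav]

print(with_or)
print(with_and)
print(chained)
```

The first list contains all four wavenumbers, while the other two contain only 500.45613 and 500.6972.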
I would personally use the following:
with open(files[count], 'r') as g:
    for line in g:
        wave_no, intensity = [float(item) for item in line.split()]
        if lowwav < wave_no < highwav:
            f.write(str(intensity)+'\n')
Split each line by white space, unpack the split list to two names wavelength and intensity.
[line.split() for line in r] makes
500.21506 -0.00134
500.45613 0.00231
to
[['500.21506', '-0.00134'], ['500.45613', '0.00231']]
This listcomp [(wavelength, intensity) for wavelength,intensity in lol if low <= float(wavelength) <= high] returns
[('500.21506', '-0.00134'), ('500.45613', '0.00231')]
If you join them back with [' '.join((w, i)) for w,i in [('500.21506', '-0.00134'), ('500.45613', '0.00231')]] you get ['500.21506 -0.00134', '500.45613 0.00231']
Use listcomp to filter out wavelength. And join wavelength and intensity back to string and write to file.
with open('data.txt', 'r') as r, open('\\_spec_final.t15', 'w') as w:
    lol = (line.split() for line in r)
    intensities = (' '.join((wavelength, intensity)) for wavelength,intensity in lol if low <= float(wavelength) <= high)
    # writelines() adds no line endings, so append '\n' to each entry
    w.writelines(entry + '\n' for entry in intensities)
If you want to output to the terminal, do print(list(intensities)) instead of the w.writelines(...) call.
Contents of data.txt:
500.21506 -0.00134
500.45613 0.00231
500.69720 -0.00187
500.93826 0.00129
501.17933 -0.00049
501.42040 0.00028
501.66147 0.00114
501.90253 -0.00036
502.14360 0.00247
Output when low is 500 and high is 50`;
['500.21506 -0.00134', '500.45613 0.00231']

Improving a python code reading files

I wrote a python script to treat text files.
The input is a file with several lines. At the beginning of each line, there is a number (1, 2, 3... , n). Then an empty line and the last line on which some text is written.
I need to read through this file to delete some lines at the beginning and some in the end (say number 1 to 5 and then number 78 to end). I want to write the remaining lines on a new file (in a new directory) and renumber the first numbers written on these lines (in my example, 6 would become 1, 7 2 etc.)
I wrote the following:
def treatFiles(oldFile, newFile, firstF, startF, lastF):
    # firstF is simply an index
    # startF corresponds to the first line I want to keep
    # lastF corresponds to the last line I want to keep
    numberFToDeleteBeginning = int(startF) - int(firstF)
    with open(oldFile) as old, open(newFile, 'w') as new:
        countLine = 0
        for line in old:
            countLine += 1
            if countLine <= numberFToDeleteBeginning:
                pass
            elif countLine > int(lastF) - int(firstF):
                pass
            elif line.split(',')[0] == '\n':
                newLineList = line.split(',')
                new.write(line)
            else:
                newLineList = [str(countLine - numberFToDeleteBeginning)] + line.split(',')
                del newLineList[1]
                newLine = str(newLineList[0])
                for k in range(1, len(newLineList)):
                    newLine = newLine + ',' + str(newLineList[k])
                new.write(newLine)
if __name__ == '__main__':
    from sys import argv
    import os
    os.makedirs('treatedFiles')
    newFile = 'treatedFiles/' + argv[1]
    treatFiles(argv[1], newFile, argv[2], argv[3], argv[4])
My code works correctly but is far too slow (I have files of about 10Gb to treat and it's been running for hours).
Does anyone know how I can improve it?
I would get rid of the for loop in the middle and the expensive .split():
from itertools import islice
def treatFiles(old_file, new_file, index, start, end):
    with open(old_file, 'r') as old, open(new_file, 'w') as new:
        # lazily skip the leading lines and stop after the last wanted one
        sliced_file = islice(old, start - index, end - index)
        for line_number, line in enumerate(sliced_file, start=1):
            # split off only the leading number; partition() never raises
            number, sep, rest = line.partition(',')
            if not sep:
                # no comma (for example an empty line): copy it unchanged
                new.write(line)
            else:
                new.write(str(line_number) + ',' + rest)
Also, convert your three numerical arguments to integers before passing them into the function (which takes five arguments in total):
treatFiles(argv[1], newFile, int(argv[2]), int(argv[3]), int(argv[4]))
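To see what islice and enumerate(start=1) do together, here is a tiny in-memory sketch. The helper name renumber and the sample lines are invented for the demonstration; the skip and stop arguments play the role of start - index and end - index above.

```python
from itertools import islice


def renumber(lines, skip, stop):
    """Lazily yield lines[skip:stop] with the leading number replaced by 1, 2, 3...

    Hypothetical helper illustrating islice + enumerate(start=1).
    """
    for new_number, line in enumerate(islice(lines, skip, stop), start=1):
        number, sep, rest = line.partition(',')
        # lines without a comma (such as an empty line) pass through unchanged
        yield line if not sep else str(new_number) + sep + rest


sample = ['{},x{}\n'.format(n, n) for n in range(1, 11)]  # '1,x1', '2,x2', ...
print(''.join(renumber(sample, 5, 8)), end='')
```

This prints the original lines 6 to 8 renumbered as 1 to 3. Since islice consumes its input one item at a time, passing an open file instead of a list never holds more than one line in memory, which matters for a file of about 10 GB.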

Is there a way to use "for loop" in the specific range with strings?

Hi stackoverflow Users,
I am wondering how to use for loop with string.
For example,
There is a file (file.txt) like,
=====================
Initial Value
1 2 3
3 4 5
5 6 7
Middle Value <---From Here
3 5 6
5 8 8
6 9 8 <---To Here
Last Value
5 8 7
6 8 7
5 5 7
==================
I want to modify the section of the file only in "Middle Value" and write an output file
after modifying.
I think that if I use "if and for" statements, that might be solved.
I have thought of code like
with open('file.txt') as f, open('out.txt', 'w') as f2:
    for line in f:
        sp1 = line.split()
        line = " ".join(sp1) + '\n'
        if line == 'Middle':
            "Do something until line == 'Last'"
I am stuck with "Do something until line == 'Last'" part.
Any comments are appreciated.
Thanks.
There are three basic approaches.
The first is to use a state machine. You could build a real state machine, but in this case the states and transitions are so trivial that it's simpler to fake it by just using a flag:
state = 0
for line in f:
    sp1 = line.split()
    line = " ".join(sp1) + '\n'
    if state == 0:
        if line == 'Middle\n':
            state = 1
    elif state == 1:
        if line == 'Last\n':
            state = 2
        else:
            # Thing you do until line == 'Last\n'
            pass
    else:
        # nothing to do after Last, so you could leave it out
        pass
Note that I checked for 'Middle\n', not 'Middle'. If you look at the way you build line above, there's no way it could match the latter, because you always add '\n'. But also note that in your sample data, the line is 'Middle Value\n', not 'Middle', so if that's true in your real data, you have to deal with that here. Whether that's line == 'Middle Value\n', line.startswith('Middle'), or something else depends on your actual data, which only you know about.
Alternatively, you can just break it into loops:
for line in f:
    sp1 = line.split()
    line = " ".join(sp1) + '\n'
    if line == 'Middle\n':
        break
for line in f:
    sp1 = line.split()
    line = " ".join(sp1) + '\n'
    if line == 'Last\n':
        break
    else:
        # Thing you do until line == 'Last\n'
        pass
for line in f:
    # Nothing to do here, so you could leave the loop out
    pass
There are variations on this one as well. For example:
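from itertools import dropwhile, takewhile
lines = (" ".join(line.split()) + '\n' for line in f)
lines = dropwhile(lambda line: line != 'Middle\n', lines)
next(lines, None)  # skip the 'Middle\n' header line itself
middle = takewhile(lambda line: line != 'Last\n', lines)
for line in middle:
    # Thing you want to do
    pass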
Finally, you can split up the file before turning it into lines, instead of after. This is harder to do iteratively, so let's just read the whole file into memory to show the idea:
contents = f.read()
_, _, rest = contents.partition('\nMiddle\n')
middle, _, _ = rest.partition('\nLast')
for line in middle.splitlines():
    # Thing you want to do
    pass
If reading the whole file into memory wastes too much space or takes too long before you get going, mmap is your friend.
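A rough sketch of the mmap variant, under a few assumptions: the markers are the exact byte strings 'Middle Value\n' and 'Last Value\n' from the sample file, the file is not empty (mapping an empty file raises ValueError), and middle_section is a name invented for this example. The demo writes a throwaway temporary file so the block can run on its own.

```python
import mmap
import os
import tempfile


def middle_section(path, start=b'Middle Value\n', end=b'Last Value\n'):
    """Return the bytes between the two markers, searching a memory map
    instead of calling read() on the whole file (hypothetical helper)."""
    with open(path, 'rb') as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        begin = mm.find(start)
        if begin == -1:
            return b''              # start marker not present
        begin += len(start)         # skip past the marker line itself
        stop = mm.find(end, begin)
        if stop == -1:
            stop = len(mm)          # no end marker: take the rest of the file
        return mm[begin:stop]       # slicing copies only this part into memory


# Demo on a temporary file shaped like the question's sample
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'Initial Value\n1 2 3\nMiddle Value\n3 5 6\n5 8 8\n'
              b'Last Value\n5 8 7\n')
section = middle_section(tmp.name)
os.remove(tmp.name)
print(section.decode().splitlines())  # ['3 5 6', '5 8 8']
```

Only the slice between the markers is copied into a bytes object; the rest of the file is paged in by the operating system as find() scans it.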
I would just code the process as a simple FSM (a Finite-State Machine or more specifically an event-driven Finite-state machine):
with open('file.txt') as f, open('out.txt', 'w') as f2:
    state = 1
    for line in f:
        if line == 'Middle Value\n':
            state = 2
            continue  # unless there's something to do upon entering the state
        elif line == 'Last Value\n':  # might want to just test for a blank line '\n'
            state = 3
            continue  # unless there's something to do upon entering the state
        # otherwise process the line based on the current value of "state"
        if state == 1:    # before 'Middle Value' has been seen
            pass
        elif state == 2:  # after 'Middle Value' has been seen
            pass
        else:             # after 'Last Value' (or a blank line after
            pass          # 'Middle Value') has been seen
Just replace the pass statements with whatever is appropriate to do at that point of reading the input file.
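As one possible way of filling in those branches, the sketch below copies every line to the output and applies a caller-supplied function only to the lines between the two headers. Everything named here (transform_middle, double, the sample list) is made up for illustration, and the header texts are assumed to match the sample file in the question.

```python
def transform_middle(lines, transform, start='Middle Value', end='Last Value'):
    """Yield every line, applying transform() only to the lines that lie
    between the start and end header lines (hypothetical helper)."""
    in_middle = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(start):
            in_middle = True        # entering the section; header passes through
        elif stripped.startswith(end):
            in_middle = False       # leaving the section
        elif in_middle:
            line = transform(line)  # a data line inside the section
        yield line


def double(line):
    # example modification: double each integer on the line
    return ' '.join(str(2 * int(n)) for n in line.split()) + '\n'


sample = ['Initial Value\n', '1 2 3\n', 'Middle Value\n', '3 5 6\n',
          'Last Value\n', '5 8 7\n']
print(''.join(transform_middle(sample, double)), end='')
```

With real files this becomes f2.writelines(transform_middle(f, double)) inside the with open('file.txt') as f, open('out.txt', 'w') as f2: block.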
In your if line == 'Middle': you could flip a boolean flag that allows you to enter another if inMiddle and line != 'Last' statement, where you can then modify your numbers.
You can replace your for loop with this.
inMiddle = False
for line in f:
    sp1 = line.split()
    line = " ".join(sp1) + '\n'
    if line.startswith('Middle'):
        inMiddle = True
        continue  # skip the header line itself
    if line.startswith('Last'):
        inMiddle = False
    if inMiddle:
        # MODIFY YOUR NUMBERS HERE
        pass
Forgive me as I access files a bit differently
with open('file.txt') as f:
    file_string = f.read()
middle_to_end = file_string.split('Middle Value\n')[-1]
just_middle = middle_to_end.split('Last Value\n')[0]
middle_lines = just_middle.splitlines()
for line in middle_lines:
    pass  # do something with line
Basically you are setting a flag to say you are 'in' the section. Below I optionally set a different flag when finished. You could bail out when the flag is 2, for example.
with open('file.txt') as f, open('out.txt', 'w') as f2:
    section = 0
    for line in f:
        if line.startswith("Middle"):
            section = 1
        elif line.startswith("Last"):
            section = 2
        if section == 1:
            # collect digits and output to other file
            f2.write(line)
        elif section == 2:
            # the with statement closes both files, so just stop reading
            break
