Parsing a file by the first character in each line - Python

I'm trying to group a file by the first character in each line of the file.
For example, the file:
s/1/1/2/3/4/5///6
p/22/LLL/GP/1/3//
x//-/-/-/1/5/-/-/
s/1/1/2/3/4/5///6
p/22/LLL/GP/1/3//
x//-/-/-/1/5/-/-/
I need to group everything starting with the first s/ up to the next s/. I don't think split() will work since it would remove the delimiter.
Desired end result:
s/1/1/2/3/4/5///6
p/22/LLL/GP/1/3//
x//-/-/-/1/5/-/-/
s/1/1/2/3/4/5///6
p/22/LLL/GP/1/3//
x//-/-/-/1/5/-/-/
I'd prefer to do this without the re module if possible (is it?)
Edit: Attempts:
The following gets me the values in groups using list comprehension:
with open('/file/path', 'r') as f:
    content = f.read()

groups = ['s/' + group for group in content.split('s/')[1:]]
Since s/ sits right at the start of the file, I use the [1:] to avoid having an element of just 's/' in groups[0].
Is there a better way? Or is this the best?

Assuming the first line of the file starts with 's/' you could try something like this:
groups = []
with open('test.txt', 'r') as f:
    for line in f:
        if line.startswith('s/'):
            groups.append('')
        groups[-1] += line
To deal with files that don't start with 's/', letting the first element collect all lines up to the first 's/', we can make a small change and also append an empty string on the very first line:
groups = []
with open('test.txt', 'r') as f:
    for line in f:
        if line.startswith('s/') or not groups:
            groups.append('')
        groups[-1] += line
Alternatively, if we want to skip lines until the first 's/', we can do the following:
groups = []
with open('test.txt', 'r') as f:
    for line in f:
        if line.startswith('s/'):
            groups.append('')
        if groups:
            groups[-1] += line
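For completeness, here is a sketch of yet another way that also avoids re, using itertools.accumulate and itertools.groupby (my own variation, not part of the snippets above); test.txt stands in for your file:
from itertools import accumulate, groupby

with open('test.txt') as f:
    lines = f.readlines()

# running count of 's/' lines seen so far; lines that belong to the same
# record share the same count
counts = accumulate(int(line.startswith('s/')) for line in lines)
groups = [''.join(line for _, line in grp)
          for _, grp in groupby(zip(counts, lines), key=lambda pair: pair[0])]
Lines before the first 's/' get count 0 and end up in a group of their own, much like the second variant above.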


delete all rows up to a specific row

How can I delete lines in a text document up to a certain line?
I find the line number using the code:
#!/usr/bin/env python
lookup = '00:00:00'
filename = "test.txt"

with open(filename) as text_file:
    for num, line in enumerate(text_file, 1):
        if lookup in line:
            print(num)
print(num) outputs the number of the matching line, for example 66.
How do I delete all the lines up to line 66, i.e. up to the line found by the keyword?
As proposed here, with a small modification for your case:
1. Read all lines of the file.
2. Iterate the lines list until you reach the keyword.
3. Write all remaining lines.
with open("yourfile.txt", "r") as f:
lines = iter(f.readlines())
with open("yourfile.txt", "w") as f:
for line in lines:
if lookup in line:
f.write(line)
break
for line in lines:
f.write(line)
That's easy.
filename = "test.txt"
lookup = '00:00:00'
with open(filename,'r') as text_file:
lines = text_file.readlines()
res=[]
for i in range(0,len(lines),1):
if lookup in lines[i]:
res=lines[i:]
break
with open(filename,'w') as text_file:
text_file.writelines(res)
Do you know what lines you want to delete?
#!/usr/bin/env python
lookup = '00:00:00'
filename = "test.txt"

with open(filename) as text_file, open('okfile.txt', 'w') as ok:
    lines = text_file.readlines()
    ok.writelines(lines[4:])
This will delete the first 4 lines and store them in a different document in case you wanna keep the original.
Remember to close the files when you're done with them :)
Providing three alternate solutions. All begin with the same first part - reading:
filename = "test.txt"
lookup = '00:00:00'
with open(filename) as text_file:
lines = text_file.readlines()
The variations for the second parts are:
Using itertools.dropwhile, which discards items from the iterator until the predicate (condition) returns False (i.e. it discards while the predicate is True); from that point on, it yields all the remaining items without re-checking the predicate:
import itertools

with open(filename, 'w') as text_file:
    text_file.writelines(itertools.dropwhile(lambda line: lookup not in line, lines))
Note that it says not in, so all the lines before the lookup is found are discarded.
Bonus: If you wanted to do the opposite - write lines until you find the lookup and then stop, replace itertools.dropwhile with itertools.takewhile.
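For instance, a sketch of that bonus variant, reusing filename and lines from the reading step above:
import itertools

with open(filename, 'w') as text_file:
    # keep writing lines while the lookup has not been seen, then stop
    text_file.writelines(itertools.takewhile(lambda line: lookup not in line, lines))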
Using a flag-value (found) to determine when to start writing the file:
with open(filename, 'w') as text_file:
    found = False
    for line in lines:
        if not found and lookup in line:  # 2nd expression not checked once `found` is True
            found = True  # value remains True for all remaining iterations
        if found:
            text_file.write(line)
Similar to #c yj's answer, with some refinements - use enumerate instead of range, and then use the last index (idx) to write the lines from that point on; with no other intermediate variables needed:
for idx, line in enumerate(lines):
    if lookup in line:
        break

with open(filename, 'w') as text_file:
    text_file.writelines(lines[idx:])

Python: Delete lines from a file except certain criteria

I am trying to delete lines from a file using specific criteria.
The script I have seems to work, but I have to add too many or statements.
Is there a way I can make a variable that holds all the criteria I would like to remove from the files?
Example code
with open("AW.txt", "r+", encoding='utf-8') as f:
new_f = f.readlines()
f.seek(0)
for line in new_f:
if "PPL"not in line.split() or "PPLX"not in line.split() or "PPLC"not in line.split():
f.write(line)
f.truncate()
I was thinking more of something like this, but it fails when I add multiple criteria:
output = []
with open('AW.txt', 'r+', encoding='utf-8') as f:
    lines = f.readlines()
    criteria = 'PPL'
    output = [line for line in lines if criteria not in line]
    f.writelines(output)
Regards
You can use regular expressions to the rescue, which will reduce the number of statements and checks in the code. If you have a list of criteria, which can be dynamic, let's call it crit_list; then the code would look like this:
import re

with open("AW.txt", "r+", encoding='utf-8') as f:
    new_f = f.readlines()
    crit_list = ['PPL', 'PPLC', 'PPLX']  # can use any number of criteria
    obj = re.compile(r'%s' % ('|'.join(crit_list)))
    out_lines = [line for line in new_f if not obj.search(line)]
    f.truncate(0)
    f.seek(0)
    f.writelines(out_lines)
The use of regex makes it look different from what the OP had posted, so let me explain the two lines containing the regex:
obj = re.compile(r'%s' % ('|'.join(crit_list)))
This line creates a regex object with the regular expression 'PPL|PPLC|PPLX', which matches at least one of these strings in a given line; it stands in for chaining as many or checks as there are criteria.
out_lines = [line for line in new_f if not obj.search(line)]
This statement searches each line for any of the criteria and keeps the line only if none of them is found.
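One possible refinement (my suggestion, not part of the answer above): if a criterion could contain regex metacharacters, or you want whole-word matches like the line.split() check in the question, escape each criterion and add word boundaries:
import re

crit_list = ['PPL', 'PPLC', 'PPLX']
# re.escape guards against special characters; \b restricts matches to whole words
obj = re.compile(r'\b(?:%s)\b' % '|'.join(map(re.escape, crit_list)))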
Hope that clears your doubts.
import re

with open('AW.txt', 'r+', encoding='utf-8') as f:
    content = f.read()
    # remove every line that contains Crit1, Crit2 or Crit3
    output = re.sub(r"^.*(Crit1|Crit2|Crit3).*\n?", "", content, flags=re.MULTILINE)
    f.seek(0)
    f.write(output)
    f.truncate()
This will remove the matching lines, so they are never written back out.
Your question was a little fuzzy, asking for lines to be deleted but then trying to write them out.
You can add as many criteria as you want like this:
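For example, a small sketch of building that alternation from a list (Crit1, Crit2, ... are placeholders for your own criteria):
# join however many criteria you have into one alternation pattern
criteria = ["Crit1", "Crit2", "Crit3", "Crit4"]
pattern = r"^.*(%s).*\n?" % "|".join(criteria)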
You can compare each line with each criterion and keep only the lines that contain none of the criteria, i.e. the lines that pass every check.
For example, this can be done like this (EDITED CODE):
with open('AW.txt', 'r+') as f:
    lines = f.readlines()
    criterias = ["PPL", "PPLX", "PPLC"]
    conditioned_lines = [[line for criteria in criterias if criteria not in line] for line in lines]
    output = [criteria_lines[0] for criteria_lines in conditioned_lines if len(criteria_lines) == len(criterias)]
    f.truncate(0)
    f.seek(0)
    f.write(''.join(output))

I want to replace words in a file by line number using Python; I have a list of line numbers

if I have a file like:
Flower
Magnet
5001
100
0
and I have a list containing the line numbers which I have to change:
list = [2, 3]
How can I do this using Python? The output I expect is:
Flower
Most
Most
100
0
Code that I've tried:
f = open("your_file.txt","r")
line = f.readlines()[2]
print(line)
if line=="5001":
print "yes"
else:
print "no"
but it is not able to match.
I want to overwrite the file which I am reading.
You may simply loop through the list of indices that you have to replace in your file (my original answer needlessly looped through all lines in the file):
with open('test.txt') as f:
    data = f.read().splitlines()

replace = {1, 2}
for i in replace:
    data[i] = 'Most'

print('\n'.join(data))
Output:
Flower
Most
Most
100
0
To overwrite the file you have opened with the replacements, you may use the following:
with open('test.txt', 'r+') as f:
    data = f.read().splitlines()
    replace = {1, 2}
    for i in replace:
        data[i] = 'Most'
    f.seek(0)
    f.write('\n'.join(data))
    f.truncate()
The reason that you're having this problem is that when you take a line from a file opened in Python, you also get the newline character (\n) at the end. To solve this, you could use the str.strip() method, which will automatically remove these characters.
E.g.
f = open("your_file.txt","r")
line = f.readlines()
lineToCheck = line[2].strip()
if(lineToCheck == "5001"):
print("yes")
else:
print("no")

How can I convert surname:name to name:surname? [duplicate]

In Python, calling e.g. temp = open(filename,'r').readlines() results in a list in which each element is a line from the file. However, these strings have a newline character at the end, which I don't want.
How can I get the data without the newlines?
You can read the whole file and split lines using str.splitlines:
temp = file.read().splitlines()
Or you can strip the newline by hand:
temp = [line[:-1] for line in file]
Note: this last solution only works if the file ends with a newline, otherwise the last line will lose a character.
This assumption is true in most cases (especially for files created by text editors, which often do add an ending newline anyway).
If you want to avoid this you can add a newline at the end of file:
with open(the_file, 'r+') as f:
    f.seek(-1, 2)  # go to the last character of the file
    if f.read(1) != '\n':
        # add missing newline if not already present
        f.write('\n')
        f.flush()
    f.seek(0)
    lines = [line[:-1] for line in f]
Or a simpler alternative is to strip the newline instead:
[line.rstrip('\n') for line in file]
Or even, although pretty unreadable:
[line[:-(line[-1] == '\n') or len(line)+1] for line in file]
Which exploits the fact that the return value of or isn't a boolean, but the object that was evaluated true or false.
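A quick illustration of that behaviour:
print(1 or 5)        # 1    (first operand is truthy, so it is returned)
print(0 or 5)        # 5    (first operand is falsy, so the second is returned)
print('' or 'text')  # text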
The readlines method is actually equivalent to:
def readlines(self):
    lines = []
    for line in iter(self.readline, ''):
        lines.append(line)
    return lines

# or equivalently

def readlines(self):
    lines = []
    while True:
        line = self.readline()
        if not line:
            break
        lines.append(line)
    return lines
Since readline() keeps the newline also readlines() keeps it.
Note: for symmetry to readlines() the writelines() method does not add ending newlines, so f2.writelines(f.readlines()) produces an exact copy of f in f2.
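As a small illustration of that note (in.txt and out.txt are just placeholder names):
# readlines() keeps the newlines and writelines() adds none,
# so out.txt ends up as an exact copy of in.txt
with open('in.txt') as f, open('out.txt', 'w') as f2:
    f2.writelines(f.readlines())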
temp = open(filename,'r').read().split('\n')
Reading the file one row at a time and removing unwanted characters from the end of the string with str.rstrip(chars).
with open(filename, 'r') as fileobj:
    for row in fileobj:
        print(row.rstrip('\n'))
See also str.strip([chars]) and str.lstrip([chars]).
I think this is the best option.
temp = [line.strip() for line in file.readlines()]
temp = open(filename,'r').read().splitlines()
My preferred one-liner -- if you don't count from pathlib import Path :)
lines = Path(filename).read_text().splitlines()
It also auto-closes the file, so there's no need for with open().
Added in Python 3.5.
https://docs.python.org/3/library/pathlib.html#pathlib.Path.read_text
Try this:
u=open("url.txt","r")
url=u.read().replace('\n','')
print(url)
To get rid of trailing end-of-line (\n) characters and of empty list values (''), try:
f = open(path_sample, "r")
lines = [line.rstrip('\n') for line in f.readlines() if line.strip() != '']
You can read the file as a list easily using a list comprehension
with open("foo.txt", 'r') as f:
lst = [row.rstrip('\n') for row in f]
my_file = open("first_file.txt", "r")
for line in my_file.readlines():
if line[-1:] == "\n":
print(line[:-1])
else:
print(line)
my_file.close()
This script takes the lines from file and saves each line, without the newline and with ,0 appended, into file2.
file = open("temp.txt", "+r")
file2 = open("res.txt", "+w")
for line in file:
file2.writelines(f"{line.splitlines()[0]},0\n")
file2.close()
If you look at line, its value is data\n, so we use splitlines()
to turn it into a list and [0] to pick out just the word data.
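A tiny illustration of what that does to a single line read from the file:
line = "data\n"
print(line.splitlines())     # ['data']
print(line.splitlines()[0])  # data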
import csv

with open(filename) as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        print(line[0])

Generating a list by reading from a file

I want to generate a list of server addresses and credentials by reading from a file, as a single list split on the newlines in the file.
The file is in this format:
login:username
pass:password
destPath:/directory/subdir/
ip:10.95.64.211
ip:10.95.64.215
ip:10.95.64.212
ip:10.95.64.219
ip:10.95.64.213
The output I want is in this manner:
[['login:username', 'pass:password', 'destPath:/directory/subdirectory', 'ip:10.95.64.211;ip:10.95.64.215;ip:10.95.64.212;ip:10.95.64.219;ip:10.95.64.213']]
I tried this:
with open('file') as f:
    credentials = [x.strip().split('\n') for x in f.readlines()]
and this returns lists within a list:
[['login:username'], ['pass:password'], ['destPath:/directory/subdir/'], ['ip:10.95.64.211'], ['ip:10.95.64.215'], ['ip:10.95.64.212'], ['ip:10.95.64.219'], ['ip:10.95.64.213']]
I am new to Python. How can I split by the newline character and create a single list? Thank you in advance.
You could do it like this
with open('servers.dat') as f:
    L = [[line.strip() for line in f]]

print(L)
Output
[['login:username', 'pass:password', 'destPath:/directory/subdir/', 'ip:10.95.64.211', 'ip:10.95.64.215', 'ip:10.95.64.212', 'ip:10.95.64.219', 'ip:10.95.64.213']]
Just use a list comprehension to read the lines. You don't need to split on \n as the regular file iterator reads line by line. The double list is a bit unconventional, just remove the outer [] if you decide you don't want it.
I just noticed you wanted the list of IP addresses joined in one string. It's not clear, as it's off the screen in the question, and you make no attempt to do it in your own code.
To do that, read the first three lines individually using next, then join up the remaining lines using ; as your delimiter.
def reader(f):
    yield next(f)
    yield next(f)
    yield next(f)
    yield ';'.join(ip.strip() for ip in f)

with open('servers.dat') as f:
    L2 = [[line.strip() for line in reader(f)]]
For which the output is
[['login:username', 'pass:password', 'destPath:/directory/subdir/', 'ip:10.95.64.211;ip:10.95.64.215;ip:10.95.64.212;ip:10.95.64.219;ip:10.95.64.213']]
It does not match your expected output exactly as there is a typo 'destPath:/directory/subdirectory' instead of 'destPath:/directory/subdir' from the data.
This should work
def read_servers():
    arr = []
    with open('file') as f:
        for line in f:
            arr.append(line)
    return [arr]
You could just treat the file as a list and iterate through it with a for loop:
arr = []
with open('file', 'r') as f:
    for line in f:
        arr.append(line.strip('\n'))
