1.) I'm trying to take input from a file specified as the first argument on the command line. (working)
2.) Remove all lines starting with "#". (working)
3.) Sort the remaining lines A-Z. (not working)
4.) Write the output to a file named for the input file with the current time appended. (working)
How can I get point 3 working?
import sys
from datetime import *

arg = sys.argv[1]
out_file = str(arg) + "." + datetime.now().strftime("%H%M")

with open(sys.argv[1], 'r') as fin, open(out_file, 'w') as fout:
    for i, line in enumerate(fin):
        if i == 0 or not line.lstrip().startswith('#'):
            # line = sorted(out_file())
            fout.write(line)
It should work like this:
import sys
from datetime import *

arg = sys.argv[1]
out_file = str(arg) + "." + datetime.now().strftime("%H%M")

result = []
with open(sys.argv[1], 'r') as fin, open(out_file, 'w') as fout:
    for i, line in enumerate(fin):
        if i == 0 or not line.lstrip().startswith('#'):
            result.append(line)
    result = sorted(result)
    for line in result:
        fout.write(line)
What went wrong in your example is that you tried to sort a single line instead of all the lines.
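To see the difference, note that sorted() applied to a single string sorts its characters, not lines:

>>> sorted("banana\n")
['\n', 'a', 'a', 'a', 'b', 'n', 'n']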
I suggest using either the method fin.readlines() or the expression list(fin), both of which return a list of all the lines in the file. Then you can use the sorted() built-in to sort this list (or filter it however you want) before writing to the output file.
Here is an example solution, very close to your original code:
import sys
from datetime import *

arg = sys.argv[1]
out_file = str(arg) + "." + datetime.now().strftime("%H%M")

with open(sys.argv[1], 'r') as fin, open(out_file, 'w') as fout:
    lines = sorted(list(fin))
    for i, line in enumerate(lines):
        # note: after sorting, i == 0 is the alphabetically first line,
        # not the original first line of the file
        if i == 0 or not line.lstrip().startswith('#'):
            fout.write(line)
I can also suggest this solution:
import sys
from datetime import *

arg = sys.argv[1]
out_file = str(arg) + "." + datetime.now().strftime("%H%M")

with open(sys.argv[1], 'r') as fin, open(out_file, 'w') as fout:
    lines = list(fin)          # get the list of all lines
    fout.write(lines.pop(0))   # write the 1st line (and remove it from the list)
    lines = sorted(lines)      # sort A-Z
    lines = [s for s in lines if not s.lstrip().startswith('#')]  # remove lines starting with '#'
    fout.write(''.join(lines)) # write the remaining lines
This version of the code:

- writes the first line by advancing the fin iterator
- uses the filter built-in function to remove the lines starting with '#'¹
- leaves the first line out of the filter, because the fin iterator has already moved past it
- sorts the filtered lines, to minimise the amount of sorting to be done²
with open(sys.argv[1], 'r') as fin, open(out_file, 'w') as fout:
    fout.write(next(fin))
    filtered = filter(lambda x: not x.lstrip().startswith('#'), fin)
    lines = sorted(filtered)
    fout.writelines(lines)
¹ You could use a generator expression or a list comprehension here instead of filter, as in the sketch below.
² The code assumes that every line ends with a newline.
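For instance, the filter call could be swapped for an equivalent generator expression (a minimal sketch, reusing the same out_file setup as above):

import sys
from datetime import *

arg = sys.argv[1]
out_file = str(arg) + "." + datetime.now().strftime("%H%M")

with open(sys.argv[1], 'r') as fin, open(out_file, 'w') as fout:
    fout.write(next(fin))  # copy the first line through unchanged
    filtered = (line for line in fin if not line.lstrip().startswith('#'))
    fout.writelines(sorted(filtered))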
Say I have a file my_file, and I want to search for a certain word x on every line of the file, and if the word exists, attach my variable y to the left and right side of the word. Then I want to replace the old line with the new, modified line in my_new_file. How do I do this? So far I have:
output = open(my_new_file, "w")
for line in open(my_file):
    if (" " + x + "") in line:
You can try this:
y = "someword"
x = "target_string"
lines = [i.strip('\n') for i in open('filename.txt')]
final_lines = ["{}{}{}".format(y, i, y) if x in i else i for i in lines]
f = open(omy_new_file, "w")
for i in final_lines:
f.write("{}\n".format(i))
f.close()
with open('inputfile.txt', 'r') as infile:
    with open('outfile.txt', 'w') as outfile:
        for line in infile.readlines():
            outfile.write(line.replace('string', y + 'string' + y))
Try this:

with open("my_file", "r") as my_file:
    raw_data = my_file.read()  # read your file

new_data = raw_data.split("\n")
for line in new_data:
    if x in line:  # x is the word you are searching for
        my_new_line = y + line + y
        raw_data = raw_data.replace(line, my_new_line)
print(raw_data)
It's tough to replace a line in a file while reading it, for the same reason that it's tough to safely modify a list as you iterate over it.
It's much better to read through the file, collect a list of lines, then overwrite the original. If the file is particularly large (such that it would be infeasible to hold it all in memory at once), you can write to disk twice.
import tempfile

y = "***"
your_word = "Whatever you're filtering by"

with tempfile.TemporaryFile(mode="w+") as tmpf:
    with open(my_file, 'r') as f:
        for line in f:
            if your_word in line:
                line = f"{y}{line.strip()}{y}\n"
            tmpf.write(line)  # write to the temp file

    tmpf.seek(0)  # move back to the beginning of the tempfile
    with open(my_file, 'w') as f:
        for line in tmpf:  # reading from the tempfile now
            f.write(line)
I'm trying to delete a specific line (line 10884121) in a text file that is about 30 million lines long. This is the method I first attempted; however, when I execute it, it runs for about 20 seconds and then gives me a "memory error". Is there a better way to do this? Thanks!
import fileinput
import sys

f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'

with open(f_in, 'r') as fin:
    with open(f_out, 'w') as fout:
        linenums = [10884121]
        s = [y for x, y in enumerate(fin) if x not in [line - 1 for line in linenums]]
        fin.seek(0)
        fin.write(''.join(s))
        fin.truncate(fin.tell())
First of all, you were not using the imports; you were trying to write to the input file, and your code read the whole file into memory.
Something like this might do the trick with less hassle: we read line by line, use enumerate to count the line numbers, and write each line to the output only if its number is not in the list of ignored lines:
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'

ignored_lines = [10884121]

with open(f_in, 'r') as fin, open(f_out, 'w') as fout:
    for lineno, line in enumerate(fin, 1):
        if lineno not in ignored_lines:
            fout.write(line)
Please try the following:
import fileinput

f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'

f = open(f_out, 'w')
counter = 0
for line in fileinput.input([f_in]):
    counter = counter + 1
    if counter != 10884121:
        f.write(line)  # Python will convert \n to os.linesep; check whether you need to add os.linesep yourself
f.close()  # you can omit this in most cases, as the destructor will call it
There is a high chance that you run out of memory since you are trying to store the whole file in a list. Try the below:
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'

_fileOne = open(f_in, 'r')
_fileTwo = open(f_out, 'w')

linenums = set([10884121])

for lineNumber, line in enumerate(_fileOne, 1):  # count lines starting from 1
    if lineNumber not in linenums:
        _fileTwo.write(line)

_fileOne.close()
_fileTwo.close()
Here we are reading the file line by line and excluding the lines that are not needed, so this should not run out of memory. You can also try reading the file through a larger buffer. Hope this helps.
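For instance, open() accepts a buffering argument, so you can ask for a bigger read buffer while still iterating line by line (a minimal sketch reusing the f_in/f_out paths above; the 1 MiB buffer size is an arbitrary choice):

ignored = {10884121}

# 1 MiB read buffer; iteration is still line by line, so memory use stays low
with open(f_in, 'r', buffering=1024 * 1024) as fin, open(f_out, 'w') as fout:
    for lineno, line in enumerate(fin, 1):
        if lineno not in ignored:
            fout.write(line)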
How about a generic file filter function?
def file_filter(file_path, condition=None):
    """Yield lines from a file if condition(n, line) is true.

    The condition parameter is a callback that receives two
    parameters: the line number (first line is 1) and the
    line content."""
    if condition is None:
        condition = lambda n, line: True
    with open(file_path) as source:
        for n, line in enumerate(source):
            if condition(n + 1, line):
                yield line

with open(f_out, 'w') as destination:
    condition = lambda n, line: n != 10884121
    for line in file_filter(f_in, condition):
        destination.write(line)
So the text file I have is formatted something like this:
a

b

c
I know how to use strip() and rstrip(), but I want to get rid of the empty lines.
I want to make it shorter like this:
a
b
c
You could remove all blank lines (lines that contain only whitespace) from stdin and/or the files given at the command line using the fileinput module:
#!/usr/bin/env python
import sys
import fileinput

for line in fileinput.input(inplace=True):
    if line.strip():  # preserve non-blank lines
        sys.stdout.write(line)
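If you save this as, say, strip_blanks.py (a name chosen here for illustration), then python strip_blanks.py notes.txt rewrites notes.txt in place: with inplace=True, fileinput redirects standard output into the file being processed, so whatever you write to sys.stdout replaces the original contents.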
You can use regular expressions:

import re

txt = """a

b

c"""
print(re.sub(r'\n+', '\n', txt))  # replace one or more consecutive \n with a single one
However, lines that contain only spaces won't be removed. A better solution is:
re.sub(r'(\n[ \t]*)+', '\n', txt)
This way, you will also remove leading spaces.
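Applied to a file rather than an inline string, the same substitution might look like this (a sketch; the file names are placeholders):

import re

with open('in_example.txt') as infile:
    text = infile.read()

with open('out_example.txt', 'w') as outfile:
    outfile.write(re.sub(r'(\n[ \t]*)+', '\n', text))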
Simply remove any line that only equals "\n":
in_filename = 'in_example.txt'
out_filename = 'out_example.txt'

with open(in_filename) as infile, open(out_filename, "w") as outfile:
    for line in infile.readlines():
        if line != "\n":
            outfile.write(line)
If you want to simply update the same file, close and reopen it to overwrite it with the new data:
filename = 'in_example.txt'
filedata = ""

with open(filename, "r") as infile:
    for line in infile.readlines():
        if line != "\n":
            filedata += line

with open(filename, "w") as outfile:
    outfile.write(filedata)
I have two different files and I want to compare their contents line by line and write their common contents to a different file. Note that both of them contain some blank lines.
Here is my pseudocode:
file1 = open('some_file_1.txt', 'r')
file2 = open('some_file_2.txt', 'r')
FO = open('some_output_file.txt', 'w')

for line1 in file1:
    for line2 in file2:
        if line1 == line2:
            FO.write("%s\n" % (line1))

FO.close()
file1.close()
file2.close()
However, by doing this I got lots of blank lines in my FO file. It seems the common blank lines are also written. I want to write only the text part. Can somebody please help me?
For example: my first file (file1) contains data:
Config:
Hostname = TUVALU
BT:
TS_Ball_Update_Threshold = 0.2
BT:
TS_Player_Search_Radius = 4
BT:
Ball_Template_Update = 0
while my second file (file2) contains data:
Pole_ID = 2
Width = 1280
Height = 1024
Color_Mode = 0
Sensor_Scale = 1
Tracking_ROI_Size = 4
Ball_Template_Update = 0
If you notice, the last two lines of each file are the same; hence, I want to write these lines to my FO file. But the problem with my approach is that it also writes the common blank lines. Should I use regex for this problem? I do not have experience with regex.
This solution reads both files in one pass, excludes blank lines, and prints common lines regardless of their position in the file:
with open('some_file_1.txt', 'r') as file1:
    with open('some_file_2.txt', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
Yet another example...
from __future__ import print_function  # only needed on Python 2

with open('file1.txt') as f1, open('file2.txt') as f2, open('outfile.txt', 'w') as outfile:
    for line1, line2 in zip(f1, f2):
        if line1 == line2:
            print(line1, end='', file=outfile)
And if you want to eliminate common blank lines, just change the if statement to:
if line1.strip() and line1 == line2:
.strip() removes all leading and trailing whitespace, so if that's all that's on a line, it will become an empty string "", which is considered false.
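A quick check in the interpreter shows the idea:

>>> bool("  \t \n".strip())
False
>>> bool("text  \n".strip())
True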
If you are specifically looking to get the difference between two files, this might help:
with open('first_file', 'r') as file1:
    with open('second_file', 'r') as file2:
        difference = set(file1).difference(file2)

difference.discard('\n')

with open('diff.txt', 'w') as file_out:
    for line in difference:
        file_out.write(line)
If order is preserved between files, you might also prefer difflib. Although Robᵩ's result is the bona fide standard for intersections, you might actually be looking for a rough diff:
from difflib import Differ

with open('cfg1.txt') as f1, open('cfg2.txt') as f2:
    differ = Differ()
    for line in differ.compare(f1.readlines(), f2.readlines()):
        if line.startswith(" "):
            print(line[2:], end="")
That said, this has a different behaviour to what you asked for (order is important) even though in this instance the same output is produced.
Once the file object has been iterated, it is exhausted.
>>> f = open('1.txt', 'w')
>>> f.write('1\n2\n3\n')
>>> f.close()
>>> f = open('1.txt', 'r')
>>> for line in f: print line
...
1
2
3
# exhausted; another iteration does not produce anything
>>> for line in f: print line
...
>>>
Use file.seek (or close/open the file) to rewind the file:
>>> f.seek(0)
>>> for line in f: print line
...
1
2
3
Try this:
from __future__ import with_statement

filename1 = "G:\\test1.TXT"
filename2 = "G:\\test2.TXT"

with open(filename1) as f1:
    with open(filename2) as f2:
        file1list = f1.read().splitlines()
        file2list = f2.read().splitlines()
        list1length = len(file1list)
        list2length = len(file2list)
        if list1length == list2length:
            for index in range(len(file1list)):
                if file1list[index] == file2list[index]:
                    print file1list[index] + "==" + file2list[index]
                else:
                    print file1list[index] + "!=" + file2list[index] + " Not-Equal"
        else:
            print "difference in the size of the file and number of lines"
I have just been faced with the same challenge, but I thought "why program this in Python if you can solve it with a simple grep?", which led to the following Python code:
import subprocess
from subprocess import PIPE

try:
    output1, errors1 = subprocess.Popen(["c:\\cygwin\\bin\\grep", "-Fvf", "c:\\file1.txt", "c:\\file2.txt"], shell=True, stdout=PIPE, stderr=PIPE).communicate()
    output2, errors2 = subprocess.Popen(["c:\\cygwin\\bin\\grep", "-Fvf", "c:\\file2.txt", "c:\\file1.txt"], shell=True, stdout=PIPE, stderr=PIPE).communicate()
    if len(output1) + len(output2) + len(errors1) + len(errors2) > 0:
        print("Compare result : There are differences:")
        if len(output1) + len(output2) > 0:
            print("  Output differences :")
            print(output1)
            print(output2)
        if len(errors1) + len(errors2) > 0:
            print("  Errors :")
            print(errors1)
            print(errors2)
    else:
        print("Compare result : Both files are equal")
except Exception as ex:
    print("Compare result : Exception during comparison")
    print(ex)
    raise
The trick behind this is the following:
grep -Fvf file1.txt file2.txt verifies whether all entries in file2.txt are present in file1.txt. By doing this in both directions, we can see whether the contents of both files are "equal". I put "equal" in quotes because duplicate lines are disregarded in this way of working.
Obviously, this is just an example: you can replace grep with any command-line file comparison tool.
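If you would rather stay in pure Python, a rough equivalent of the two-direction grep check can be sketched with sets (like the grep approach, this disregards duplicate lines; the file paths follow the example above):

with open('c:\\file1.txt') as f1, open('c:\\file2.txt') as f2:
    lines1, lines2 = set(f1), set(f2)

if lines1 != lines2:
    print("Compare result : There are differences:")
    print(''.join(lines1 - lines2))  # lines only in file1.txt
    print(''.join(lines2 - lines1))  # lines only in file2.txt
else:
    print("Compare result : Both files are equal")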
difflib is well worth the effort, with nice condensed output.
from pathlib import Path
import difflib

mypath = '/Users/x/lib/python3'
file17c = Path(mypath, 'oop17c.py')
file18c = Path(mypath, 'oop18c.py')

with open(file17c) as file_1:
    file1 = file_1.readlines()
with open(file18c) as file_2:
    file2 = file_2.readlines()

for line in difflib.unified_diff(
        file1, file2, fromfile=str(file17c), tofile=str(file18c), lineterm=''):
    print(line)
output
+ ... unique stuff present in file18c
- ... stuff absent in file18c but present in file17c
I just want the last number of each line.
with open(home + "/Documents/stocks/" + filePath, newline='') as f:
    stockArray = (line.split(',') for line in f.readlines())
    for line in stockArray:
        List = line.pop()
        # print(line.pop())
        # print(', '.join(line))
    else:
        print("Finished")
I tried using line.pop() to take the last element, but it only seems to take it from one line. How can I get it from each line and store the values in a list?
You probably just want something like:
last_col = [line.split(',')[-1] for line in f]
For more complicated csv files, you might want to look into the csv module in the standard library as that will properly handle quoting of fields, etc.
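For instance, a minimal sketch with csv.reader (the file name is a placeholder), which correctly handles quoted fields that may themselves contain commas:

import csv

with open('stocks.csv', newline='') as f:
    last_col = [row[-1] for row in csv.reader(f)]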
my_list = []
with open(home + "/Documents/stocks/" + filePath, newline='') as f:
    for line in f:
        my_list.append(line[-1])  # adds the last character of the line to the list
That should do it.
If you want the last comma-separated element from each line:

my_list = []
with open(home + "/Documents/stocks/" + filePath, newline='') as f:
    for line in f:
        my_list.append(line.split(',')[-1])  # adds the last field to the list