How to fix issues with comparing the text of 2 files - python

I have a script that gets the list of distribution switches from device manager into "Master-EDR-List.txt". It then grabs another txt file, "New-EDR-List.txt", from a different server. The Master list is fairly static, while the New list may contain additional EDRs that the Master list is missing.
I would like to compare these 2 files and save any EDR that is in the New list but not in the Master list. I did write a compare script, but it is not reliable. I added some test EDRs to the New list and I am getting unexpected results depending on where I place them in the list. I always get the new ones, but sometimes I also get an EDR that is in both lists, and sometimes I get both new EDRs on the same line with no space between them.
Here is my code:
old_lines = set((line.strip() for line in open('Master-EDR-List.txt', 'r+')))
file_new = open('New-EDR-List.txt', 'r+')
#file_diff = open('file_diff.txt', 'w')
#Open Master File
with open('Master-EDR-List.txt', 'r') as f:
    d = set(f.readlines())
#Open New File
with open('New-EDR-List.txt', 'r') as f:
    e = set(f.readlines())
#Open Diff file to store differences
open('file_diff.txt', 'w').close()
with open('file_diff.txt', 'a') as f:
    for line in list(e - d):
        f.write(line)
Here are my lists I am using for testing:
Master List:
rts41d-an28edr1.rt.tst.com
rts41d-an28edr2.rt.tst.com
rts41d-an32edr1.rt.tst.com
rts41d-an32edr2.rt.tst.com
rts41d-as19edr1.rt.tst.com
rts41d-as19edr2.rt.tst.com
rts41d-as21edr1.rt.tst.com
rts41d-as21edr2.rt.tst.com
rts12a-ah46edr2.rt.tst.com
rts12a-al46edr2.rt.tst.com
rts12a-as46edr1.rt.tst.com
rts12a-as46edr2.rt.tst.com
rts12a-as46edr2.rt.tst.com
rts12a-aw46edr1.rt.tst.com
rts12a-aw46edr2.rt.tst.com
rts12a-aw46edr2.rt.tst.com
rts12b-as46edr1.rt.tst.com
rts12b-ax46edr1.rt.tst.com
New List:
rts41d-an28edr1.rt.tst.com
rts41d-an28edr2.rt.tst.com
rts41d-an32edr1.rt.tst.com
rts41d-an32edr2.rt.tst.com
rts41d-as19edr1.rt.tst.com
rts41d-as19edr2.rt.tst.com
rt511-sps5.rt.tst.com
rts41d-as21edr1.rt.tst.com
rts41d-as21edr2.rt.tst.com
rts12a-ah46edr2.rt.tst.com
rts12a-al46edr2.rt.tst.com
rts12a-as46edr1.rt.tst.com
rts12a-as46edr2.rt.tst.com
rt511-sps6.rt.tst.com
rts12a-as46edr2.rt.tst.com
rts12a-aw46edr1.rt.tst.com
rts12a-aw46edr2.rt.tst.com
rts12a-aw46edr2.rt.tst.com
rts12b-as46edr1.rt.tst.com
rts12b-ax46edr1.rt.tst.com
rt511-sps7.rt.tst.com
I added 3 test entries, rt511-sps5, 6 and 7, to the list and instead of only getting these 3 items, I am getting this in my Diff file:
Diff File:
rt511-sps7.rt.tst.comrt511-sps5.rt.tst.com
rt511-sps6.rt.tst.com
rts12b-ax46edr1.rt.tst.com
As you can see, sps7 and sps5 are on the same line for some reason, and "rts12b-ax46edr1" should not be there, as it is already in both files.
Does anyone know why this is happening and how I can fix it? The New list can discover any new distribution switch and put it anywhere in the list depending on its name. I would like this script to only print out any new EDR in the list that the Master does not have.
Thanks

I modified your script. Please use the code below to fulfil your requirement. Do not forget to close all the open files.
with open('Master-EDR-List.txt', 'r') as f:
    d = f.readlines()
#Open New File
with open('New-EDR-List.txt', 'r') as f:
    e = f.readlines()
out = open('file_diff.txt', 'a')
for newline in e:
    found = False
    for oldline in d:
        if newline.strip(' \t\n\r') == oldline.strip(' \t\n\r'):
            found = True
            break
    if not found:
        if newline != '\n':
            out.writelines(newline)
out.close()
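A likely cause of both symptoms, by the way: readlines() keeps the '\n' on every line except a file's last one. Master's last entry "rts12b-ax46edr1.rt.tst.com" has no trailing newline, so it differs from the same entry followed by '\n' in the New file; and New's last entry "rt511-sps7.rt.tst.com" is written to the diff file without a newline, which is why it runs into the next entry. A sketch that sidesteps this by stripping before comparing (the helper name new_entries is my own):

```python
def new_entries(master_lines, new_lines):
    """Entries present in new_lines but not in master_lines, compared
    with surrounding whitespace stripped, so a missing final newline
    cannot create a phantom difference."""
    master = {line.strip() for line in master_lines if line.strip()}
    new = {line.strip() for line in new_lines if line.strip()}
    return sorted(new - master)

# Usage with the files from the question:
# with open('Master-EDR-List.txt') as m, open('New-EDR-List.txt') as n:
#     diff = new_entries(m, n)
# with open('file_diff.txt', 'w') as out:
#     out.write('\n'.join(diff) + '\n')
```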

python how to put difference of 2 files in a list?

I have 2 files with email addresses in them; some of these addresses are the same and some aren't. I need to see which of the email addresses in file1 aren't in file2. How can I do that? Also, it would be great if I could put them in a list too.
here's what I got:
'file1 = open("competitor_accounts.txt")
file2 = open("accounts.txt")'
I know it ain't much, but I need help getting started
I thought maybe using a for loop with if statements? but I just don't know how.
You can read each file's contents to a separate list and then compare the lists to each other like so
with open('accounts.txt') as f:
    accounts = [line for line in f]
with open('competitor_accounts.txt') as f:
    competitors = [line for line in f]
accounts_not_competitors = [line for line in accounts if line not in competitors]
competitors_not_accounts = [line for line in competitors if line not in accounts]
You can use open with readlines() as well, but using with is generally preferred, since you don't need to explicitly close() the file after you're done reading it.
file_a = open('accounts.txt')
accounts = file_a.readlines()
file_a.close()
The two list comprehensions at the end generate a new list based on matches in the existing lists. They can be written out in a longer form:
accounts_not_competitors = []
for line in accounts:
    if line not in competitors:
        accounts_not_competitors.append(line)
I believe this should be enough to get you started with the syntax and functionality in case you wanted to do some other comparisons between the two.
Assuming that each line in each file holds exactly one email address:
First, save each file in a list and create another list to store the difference in.
Then loop through the file1 list and check whether each item is present in the file2 list; if not, add that item to the diff list.
f1_list = []
f2_list = []
diff = []
with open(file1name, 'r', encoding='utf-8') as f1:
    for line in f1:
        f1_list.append(line)
with open(file2name, 'r', encoding='utf-8') as f2:
    for line in f2:
        f2_list.append(line)
for email in f1_list:
    if email not in f2_list:
        diff.append(email)
print(diff)
You can use set:
with open('competitor_accounts.txt', 'r') as file:
    competitor_accounts = set([mail for mail in file])
with open('accounts.txt', 'r') as file:
    accounts = set([mail for mail in file])
result = list(competitor_accounts - accounts)
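Note that all of these approaches compare raw lines, so "mail@x.com\n" and "mail@x.com" (a last line without a trailing newline) would count as different addresses. A stripped variant, sketched here with a hypothetical helper name:

```python
def unique_mails(file_a_lines, file_b_lines):
    """Addresses in file_a but not in file_b, ignoring surrounding
    whitespace so trailing newlines don't affect the comparison."""
    a = {mail.strip() for mail in file_a_lines}
    b = {mail.strip() for mail in file_b_lines}
    return a - b

# Usage sketch:
# with open('competitor_accounts.txt') as f1, open('accounts.txt') as f2:
#     result = unique_mails(f1, f2)
```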

Find unique entries in files

I guess you have a solution concerning the following issue:
I want to compare two lists for common entries (on the basis of column 10) and write the common entries to one file and the entries unique to the first list into another file. The code I wrote is:
INFILE1 = open ("c:\\python\\test\\58962.filtered.csv", "r")
INFILE2 = open ("c:\\python\\test\\83887.filtered.csv", "r")
OUTFILE1 = open ("c:\\python\\test\\58962_vs_83887.common.csv", "w")
OUTFILE2 = open ("c:\\python\\test\\58962_vs_83887.unique.csv", "w")
for line in INFILE1:
    line = line.rstrip().split(",")
    if line[11] in INFILE2:
        OUTFILE1.write(line)
    else:
        OUTFILE2.write(line)
INFILE1.close()
INFILE2.close()
OUTFILE1.close()
OUTFILE2.close()
The following error appears:
8 OUTFILE1.write(line)
9 else:
---> 10 OUTFILE2.write(line)
11 INFILE1.close()
TypeError: write() argument must be str, not list
Does somebody know how to fix this?
Best
This line
line = line.rstrip().split(",")
replaces the line you read from the file with its split list. You then try to write that list to your file - that's not how the write method works, and the error tells you exactly that.
Change it to:
for line in INFILE1:
    lineList = line.rstrip().split(",")  # don't overwrite line, use lineList
    if lineList[11] in INFILE2:          # used lineList
        OUTFILE1.write(line)             # corrected indentation
    else:
        OUTFILE2.write(line)
You could have easily found this yourself by printing out the line before and after splitting, or just before writing.
Please read How to debug small programs (#1) and follow it - it's easier to find and fix bugs yourself than to post questions here.
You have another problem at hand, though:
Files are stream based; they start at position 0 in the file. The position advances as you access parts of the file. When it is at the end, you won't get anything more from INFILE2.read() or other methods.
So if you want to repeatedly check whether some line's column of file1 is somewhere in file2, you need to read file2 into a list (or another data structure) so your repeated checks work. In other words, this:
if lineList[11] in INFILE2:
might work once; then the file is consumed and it will return False all the time.
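The one-pass behaviour is easy to demonstrate; this small sketch writes a throwaway file, reads it twice, and shows that the second read returns nothing until the position is rewound:

```python
import os
import tempfile

# create a throwaway two-line file
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('a,b\nc,d\n')
    path = tmp.name

f = open(path)
first = f.read()   # whole file content
second = f.read()  # empty string: the position is already at EOF
f.seek(0)          # rewind explicitly...
third = f.read()   # ...and the content is available again
f.close()
os.unlink(path)
```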
You also might want to change from:
f = open(...., ...)
# do something with f
f.close()
to
with open(name, "r") as f:
    # do something with f, no close needed, closed when leaving block
as it is safer and will close the file even if exceptions happen.
To solve that, try this (untested) code:
with open("c:\\python\\test\\83887.filtered.csv", "r") as file2:
    infile2 = file2.readlines()  # read in all lines as list
with open("c:\\python\\test\\58962.filtered.csv", "r") as INFILE1:
    # next 2 lines are 1 line, \ at end signifies line continues
    with open("c:\\python\\test\\58962_vs_83887.common.csv", "w") as OUTFILE1, \
         open("c:\\python\\test\\58962_vs_83887.unique.csv", "w") as OUTFILE2:
        for line in INFILE1:
            lineList = line.rstrip().split(",")
            if any(lineList[11] in x for x in infile2):  # check if any stored
                                                         # line contains lineList[11]
                OUTFILE1.write(line)
            else:
                OUTFILE2.write(line)
# all files are autoclosed here
Links to read:
the-with-statement
any() and other built-ins

Using same code for multiple text files and generate multiple text files as output using python

I have more than 30 text files. I need to do some processing on each text file and save them again as text files with different names.
Example-1: precise_case_words.txt ---- processing ---- precise_case_sentences.txt
Example-2: random_case_words.txt ---- processing ---- random_case_sentences.txt
I need to do this for all the text files.
present code:
new_list = []
with open('precise_case_words.txt') as inputfile:
    for line in inputfile:
        new_list.append(line)
final = open('precise_case_sentences.txt', 'w+')
for item in new_list:
    final.write("%s\n" % item)
I am manually copy-pasting this code every time and manually changing the names. Please suggest a solution to avoid this manual job using Python.
Suppose you have all your *_case_words.txt files in the present dir:
import glob

in_file = glob.glob('*_case_words.txt')
prefix = [i.split('_')[0] for i in in_file]
for i, ifile in enumerate(in_file):
    data = []
    with open(ifile, 'r') as f:
        for line in f:
            data.append(line)
    with open(prefix[i] + '_case_sentences.txt', 'w') as f:
        f.writelines(data)  # write() expects a string; writelines takes the list
This should give you an idea about how to handle it:
def rename(name, suffix):
    """renames a file with one . in it by splitting and inserting suffix before the ."""
    a, b = name.split('.')
    return ''.join([a, suffix, '.', b])  # recombine parts including suffix in it

def processFn(name):
    """Open file 'name', process it, save it under another name"""
    # scramble data by sorting and writing anew to renamed file
    with open(name, "r") as r, open(rename(name, "_mang"), "w") as w:
        for line in r:
            scrambled = ''.join(sorted(line.strip("\n"))) + "\n"
            w.write(scrambled)

# list of filenames, see link below for how to get them with os.listdir()
names = ['fn1.txt', 'fn2.txt', 'fn3.txt']

# create demo data
for name in names:
    with open(name, "w") as w:
        for i in range(12):
            w.write("someword" + str(i) + "\n")

# process files
for name in names:
    processFn(name)
For file listings: see How do I list all files of a directory?
I chose to read/write line by line; you can also read in one file fully, process it, and output it again in one block, to your liking.
fn1.txt:
someword0
someword1
someword2
someword3
someword4
someword5
someword6
someword7
someword8
someword9
someword10
someword11
into fn1_mang.txt:
0demoorsw
1demoorsw
2demoorsw
3demoorsw
4demoorsw
5demoorsw
6demoorsw
7demoorsw
8demoorsw
9demoorsw
01demoorsw
11demoorsw
I happened just today to be writing some code that does this.
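The "read in one file fully" variant mentioned above could look like this; scramble_text is a hypothetical pure helper doing the same per-line character sort as processFn:

```python
def scramble_text(text):
    """Whole-file variant of the scrambling: sort the characters of
    every line in one go and return the new file content."""
    return '\n'.join(''.join(sorted(line)) for line in text.splitlines()) + '\n'

# Usage sketch: read the file at once, transform, write at once.
# with open('fn1.txt') as r:
#     content = scramble_text(r.read())
# with open('fn1_mang.txt', 'w') as w:
#     w.write(content)
```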

Loop list and open file if found

Not sure where to start with this... I know how to read in a csv file, but if I have a heap of files in the same directory, how can I read them in according to whether they are in a list? For example, a list such as...
l = [['file1.csv','title1','1'], ['file2.csv','title2','1'], ['file3.csv','title3','1']]
How can I get just those 3 files even though I have up to 'file20.csv' in the directory?
Can I somehow loop through the list and use an if-statement to check the filenames and open the file if found?
for filedesc in l:  # go over each sublist in l
    fname, ftitle, _ = filedesc  # unpack the information contained in it
    with open(fname) as f:  # open the file with the appropriate name
        reader = csv.reader(f)  # create reader of that file
        # go about business
An updated post, because I've gotten so close with this...
lfiles = []
csvfiles = []
for row in l:
    lfiles = row[0]  # This reads in just the filenames from list 'l'
    with open(lfiles) as x:
        inread = csv.reader(x)
        for i in x:
            print i
That prints everything in the files that were read in, but now I want to append 'csvfiles' (an empty list) with a row if a particular column equals something.
Probably like this...????
for i in x:
    for line in i:
        if line= 'ThisThingAppears5Times':
            csvfiles.append(line)  # and now the 5 lines are in a 2d list
Of course that doesn't work but close??
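Close, but two things get in the way: `if line= ...` needs `==`, and iterating over a line of the file walks over its characters, not its columns. A sketch of what this seems to aim for, with a hypothetical helper; the column index and marker value are placeholders, adjust both to the real data:

```python
import csv

def matching_rows(lines, column, value):
    """Rows from csv-formatted lines whose given column equals value."""
    return [row for row in csv.reader(lines)
            if len(row) > column and row[column] == value]

# Usage sketch with the list 'l' from the question:
# for fname, ftitle, _ in l:
#     with open(fname, newline='') as f:
#         csvfiles.extend(matching_rows(f, 0, 'ThisThingAppears5Times'))
```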

writing lines group by group in different files

I've got a little script which is not working nicely for me; I hope you can help me find the problem.
I have two starting files:
traveltimes: contains the lines I need; it's a single-column file (every row has just a number). The groups of lines I need are separated by a line which starts with 11 whitespaces.
header lines: contains three header lines.
output: I want to get 29 files (STA%s). What's inside? Every file will contain the same header lines, after which I want to append the group of lines contained in the traveltimes file (a different group of lines for every file). Every group of lines is made of 74307 rows (1 column).
So far this script creates 29 files with the same header lines, but then it mixes everything up; I mean it writes something, but it's not what I want.
Any idea????
def make_station_files(traveltimes, header_lines):
    """Gives the STAxx.tgrid files required by loc3d"""
    sta_counter = 1
    with open(header_lines, 'r') as file_in:
        data = file_in.readlines()
    for i in range(29):
        with open('STA%s' % (sta_counter), 'w') as output_files:
            sta_counter += 1
            for i in data[0:3]:
                values = i.strip()
                output_files.write("%s\n\t1\n" % (values))
            with open(traveltimes, 'r') as times_file:
                #collector = []
                for line in times_file:
                    if line.startswith(" "):
                        break
                    output_files.write("%s" % (line))
Suggestions:
Read the header rows first. Make sure this works before proceeding. None of the rest of the code needs to be indented under this.
Consider writing a separate function to group the traveltimes file into a list of lists.
Once you have a working traveltimes reader and grouper, only then create a new STA file, print the headers to it, and then write the time groups to it.
Build your program up step by step, making sure it does what you expect at each step. Don't try to do it all at once, because then you won't easily be able to track down where the issue lies.
My quick edit of your script uses itertools.groupby() as the grouper. It is a little advanced because the grouping function is stateful and tracks its state in a mutable list:
from itertools import groupby

def make_station_files(traveltimes, header_lines):
    'Gives the STAxx.tgrid files required by loc3d'
    with open(header_lines, 'r') as f:
        headers = f.readlines()

    def station_counter(line, cnt=[1]):
        'Stateful station counter -- keeps the count in a mutable list'
        if line.strip() == '':
            cnt[0] += 1
        return cnt[0]

    with open(traveltimes, 'r') as times_file:
        for station, group in groupby(times_file, station_counter):
            with open('STA%s' % (station), 'w') as output_file:
                for header in headers[:3]:
                    output_file.write('%s\n\t1\n' % (header.strip()))
                for line in group:
                    if not line.startswith(' '):
                        output_file.write('%s' % (line))
This code is untested because I don't have sample data. Hopefully, you'll get the gist of it.
