Script to compare a string in two different files - python

I am brand new to stackoverflow and to scripting. I was looking for help to get started in a script, not necessarily looking for someone to write it.
Here's what I have:
File1.csv - contains some information, I am only interested in MAC addresses.
File2.csv - has some different information, but also contains MAC address.
I need a script that parses the MAC addresses from file1.csv and logs a report if any MAC address shows up in file2.csv.
The questions:
Any tips on the language I use, preferably perl, python or bash?
Can anyone suggest some structure for the logic needed (even if just in pseudo-code)?
update
Using @Adam Wagner's approach, I am really close!
import csv

# Need to strip out NUL values from the .csv file to make python happy
class FilteredFile(file):
    def next(self):
        return file.next(self).replace('\x00', '').replace('\xff\xfe', '')

reader = csv.reader(FilteredFile('wifi_clients.csv', 'rb'), delimiter=',', quotechar='|')
s1 = set(rec[0] for rec in reader)

inventory = csv.reader(FilteredFile('inventory.csv', 'rb'), delimiter=',')
s2 = set(rec[6] for rec in inventory)

shared_items = s1.intersection(s2)
print shared_items
This always outputs the following (even if I doctor the .csv files to have matching MAC addresses):
set([])
Contents of the csv files
wifi_clients.csv
macNames, First time seen, Last time seen,Power, # packets, BSSID, Probed ESSIDs
inventory.csv
Name,Manufacturer,Device Type,Model,Serial Number,IP Address,MAC Address,...

Here's the approach I'd take:
Iterate over each csv file (python has a handy csv module for accomplishing this), capturing the mac-address and placing it in a set (one per file). And once again, python has a great builtin set type. Here's a good example of using the csv module and, of course, the docs.
Next, you can get the intersection of set1 (file1) and set2 (file2). This will show you mac-addresses that exist in both files one and two.
Example (in python):
s1 = set([1,2,3]) # You can add things incrementally with "s1.add(value)"
s2 = set([2,3,4])
shared_items = s1.intersection(s2)
print shared_items
Which outputs:
set([2, 3])
Logging these shared items could be done with anything from printing (then redirecting output to a file), to using the logging module, to saving directly to a file.
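For instance, here is a minimal sketch of the logging-module route (the file name shared_macs.log and the sample MAC are just placeholders):
import logging

# write shared MAC addresses to a log file (the name is arbitrary)
logging.basicConfig(filename='shared_macs.log', level=logging.INFO,
                    format='%(asctime)s %(message)s')

shared_items = {'00:11:22:33:44:55'}  # placeholder; use the intersection computed above

for mac in shared_items:
    logging.info('MAC address found in both files: %s', mac)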
I'm not sure how in-depth of an answer you were looking for, but this should get you started.
Update: CSV/Set usage example
Assuming you have a file "foo.csv", that looks something like this:
bob,123,127.0.0.1,mac-address-1
fred,124,127.0.0.1,mac-address-2
The simplest way to build the set, would be something like this:
import csv

set1 = set()
for record in csv.reader(open('foo.csv', 'rb')):
    user, machine_id, ip_address, mac_address = record
    set1.add(mac_address)
    # or simply "set1.add(record[3])", if you don't need the other fields.
Obviously, you'd need something like this for each file, so you may want to put this in a function to make life easier (a sketch of that follows the next snippet).
Finally, if you want to go the less-verbose-but-cooler-python-way, you could also build the set like this:
csvfile = csv.reader(open('foo.csv', 'rb'))
set1 = set(rec[3] for rec in csvfile) # Assuming mac-address is the 4th column.
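Putting the pieces together, a hedged sketch of the whole comparison wrapped in a function (the column indexes and file names here are only examples; adjust them to your actual files):
import csv

def macs_from_csv(path, column):
    """Return the set of values found in the given column of a csv file."""
    with open(path, 'rb') as f:
        return set(rec[column] for rec in csv.reader(f))

set1 = macs_from_csv('file1.csv', 3)   # e.g. mac address in the 4th column
set2 = macs_from_csv('file2.csv', 6)   # e.g. mac address in the 7th column
print(set1.intersection(set2))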

I strongly recommend python to do this.
Since you didn't give the structure of the csv file, I can only show a framework:
def get_MAC_from_file1():
    ...  # parse the file to get MAC
    return a_MAC_list

def get_MAC_from_file2():
    ...  # parse the file to get MAC
    return a_MAC_list

def log_MACs():
    MAC_list1, MAC_list2 = get_MAC_from_file1(), get_MAC_from_file2()
    for a_MAC in MAC_list1:
        if a_MAC in MAC_list2:
            ...  # write your logs
If the data set is large, use a dict or set instead of the list, together with the intersection operation; a set-based variant is sketched below. But as these are MAC addresses, I guess your dataset is not that large, so keeping the script easy to read is the most important thing.
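For reference, a minimal set-based variant of the same framework, assuming the two parsing functions above return lists of MAC addresses:
def log_MACs():
    macs1 = set(get_MAC_from_file1())
    macs2 = set(get_MAC_from_file2())
    for a_MAC in macs1.intersection(macs2):
        print(a_MAC)   # or write to your log file here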

Awk is perfect for this
{
    mac = $1                                # assuming the mac addresses are in the first column
    do_grep = "grep " mac " otherfilename"  # we'll use grep to check if the mac address is in the other file
    do_grep | getline mac_in_other_file     # pipe the output of the grep command into a new variable
    close(do_grep)                          # close the pipe
    if (mac_in_other_file != "") {          # if grep found the mac address in the other file
        print mac > "naughty_macs.log"      # append the mac address to the log file
    }
}
Then you'd run that on the first file:
awk -f logging_script.awk mac_list.txt
(this code is untested and I'm not the greatest awk hacker, but it should give the general idea)

For the purpose of the example, generate 2 files that look like yours.
File1:
for i in `seq 100`; do
    echo -e "user$i\tmachine$i\t192.168.0.$i\tmac$i";
done > file1.csv
File2 (contains random entries of "mac addresses" numbered from 1-200)
for j in `seq 100`; do
    i=$(($RANDOM % 200));
    echo -e "mac$i\tmachine$i\tuser$i";
done > file2.csv
Simplest approach would be to use join command and do a join on the appropriate field. This approach has the advantage that fields from both files would be available in the output.
Based on the example files above, the command would look like this:
join -1 4 -2 1 <(sort -k4 file1.csv) <(sort -k1 file2.csv)
join needs the input to be sorted by the field you are matching; that's why the sort is there (-k tells which column to use).
The command above matches rows from file1.csv with rows from file2.csv if column 4 in the first file is equal with column 1 from the second file.
If you only need specific fields, you can specify the output format to the join command:
join -1 4 -2 1 -o 1.4,1.2 <(sort -k4 file1.csv) <(sort -k1 file2.csv)
This would print only the mac address and the machine field from the first file.
If you only need a list of matching mac addresses, you can use uniq or sort -u. Since the join output will be sorted by mac, uniq is faster. But if you need a unique list of another field, sort -u is better.
If you only need the mac addresses that match, grep can accept patterns from a file, and you can use cut to extract only the fourth field.
fgrep -f<(cut -f4 file1.csv) file2.csv
The above would list all the lines in file2.csv that contain a mac address from file1
Note that I'm using fgrep, which does fixed-string matching rather than pattern matching. Also, if file1 is big, this may be slower than the first approach. Also, it assumes that the mac is present only in field 1 of file2 and the other fields don't contain mac addresses.
If you only need the mac, you can use the -o option of fgrep (though some grep variants don't have it), or you can pipe the output through cut and then sort -u:
fgrep -f<(cut -f4 file1.csv) file2.csv | cut -f1 | sort -u
This would be the bash way.
Python and awk hints have been shown above; I will take a stab at perl:
#!/usr/bin/perl -w
use strict;

open F1, $ARGV[0];
my %searched_mac_addresses = map { chomp; (split /\t/)[3] => 1 } <F1>;
close F1;

open F2, $ARGV[1];
while (<F2>) {
    print if $searched_mac_addresses{(split "\t")[0]};
}
close F2;
First you create a dictionary containing all the mac addresses from the first file:
my %searched_mac_addresses = map {chomp; (split /\t/)[3] => 1 } <F1>;
reads all the lines from file1
chomp removes the end of line
split splits the line based on tab, you can use a more complex regexp if needed
() around split force an array context
[3] selects the fourth field
map runs a piece of code for all elements of the array
=> generates a dictionary (hash in perl's terminology) element instead of an array
Then you read line by line the second file, and check if the mac exists in the above dictionary:
while (<F2>) {
    print if $searched_mac_addresses{(split "\t")[0]};
}
while (<F2>) will read the file F2 and put each line in the $_ variable
print without any parameters prints the default variable $_
if can be used as a postfix condition on an instruction
dictionary elements can be accessed via {}
split by default splits the $_ default variable
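For comparison, here is a rough Python sketch of the same stream-and-lookup idea (assuming, as above, tab-separated files with the MAC in the fourth column of the first file and the first column of the second):
import sys

# build the lookup set from the 4th column of the first file
with open(sys.argv[1]) as f1:
    searched_mac_addresses = set(line.rstrip('\n').split('\t')[3] for line in f1)

# stream the second file and print the lines whose 1st column is in the set
with open(sys.argv[2]) as f2:
    for line in f2:
        if line.split('\t')[0] in searched_mac_addresses:
            sys.stdout.write(line)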


How to extract a user-defined region from a fasta file with a list in another file

I have a multi-fasta sequence file: test.fasta
>Ara_001
MGIKGLTKLLADNAPSCMKEQKFESYFGRKIAVDASMSIYQFLIVVGRTGTEMLTNEAGE
VTSHLQGMFNRTIRLLEAGIKPVYVFDGKPPELKRQELAKRYSKRADATADLTGAIEAGN
>Ara_002
MGIKGLTKLLADNAPSCMKEQKFESYFGRKIAVDASMSIYQFLIVVGRTGTEMLTNEAGE
VTSHLQGMFNRTIRLLEAGIKPVYVFDGKPPELKRQELAKRYSKRADATADLTGAIEAGN
>Ara_003
MGIKGLTKLLAEHAPRAAAQRRVEDYRGRVIAIDASLSIYQFLVVVGRKGTEVLTNEAEG
LTVDCYARFVFDGEPPDLKKRELAKRSLRRDDASEDLNRAIEVGDEDSIEKFSKRTVKIT
I have another list file with a range: range.txt
Ara_001 3 60
Ara_002 10 80
Ara_003 20 50
I want to extract the defined region.
My expected output would be:
>Ara_001
KGLTKLLADNAPSCMKEQKFESYFGRKIAVDASMSIYQFLIVVGRTGTEMLTNEAGE
VT
>Ara_002
ADNAPSCMKEQKFESYFGRKIAVDASMSIYQFLIVVGRTGTEMLTNEAGE
VTSHLQGMFNRTIRLLEAGIKPVYVFDGKP
>Ara_003
RRVEDYRGRVIAIDASLSIYQFLVVVGRKG
I tried:
#!/bin/bash
lines=$(awk 'END {print NR}' range.txt)
for ((a=1; a<=$lines; a++))
do
    number=$(awk -v lines=$a 'NR == lines' range.txt)
    grep -v ">" test.fasta | awk -v lines=$a 'NR == lines' | cut -c$number
done
Do not reinvent the wheel. Use standard bioinformatics tools written for this purpose and used widely. In your example, use bedtools getfasta. Reformat your regions file to be in 3-column bed format, then:
bedtools getfasta -fi test.fasta -bed range.bed
Install bedtools suite, for example, using conda, specifically miniconda, like so:
conda create --name bedtools bedtools
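As for the reformatting step mentioned above, here is a quick hedged sketch in python, assuming the ranges in range.txt are 1-based and inclusive (BED uses a 0-based start and an exclusive end; adjust if your coordinates follow a different convention):
# convert "Ara_001 3 60" style ranges to 3-column BED (0-based start, exclusive end)
with open('range.txt') as src, open('range.bed', 'w') as dst:
    for line in src:
        seq_id, start, stop = line.split()
        dst.write('%s\t%d\t%s\n' % (seq_id, int(start) - 1, stop))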
Using biopython:
# read ranges as a dictionary
with open('range.txt') as f:
    ranges = {ID: (int(start), int(stop)) for ID, start, stop in map(lambda s: s.strip().split(), f)}
# {'Ara_001': (3, 60), 'Ara_002': (10, 80), 'Ara_003': (20, 50)}

# load the input fasta and slice
from Bio import SeqIO
with open('test.fasta') as handle:
    out = [r[slice(*ranges[r.id])] for r in SeqIO.parse(handle, 'fasta')]

# export the sliced sequences
with open('output.fasta', 'w') as handle:
    SeqIO.write(out, handle, 'fasta')
Output file:
>Ara_001
KGLTKLLADNAPSCMKEQKFESYFGRKIAVDASMSIYQFLIVVGRTGTEMLTNEAGE
>Ara_002
ADNAPSCMKEQKFESYFGRKIAVDASMSIYQFLIVVGRTGTEMLTNEAGEVTSHLQGMFN
RTIRLLEAGI
>Ara_003
RRVEDYRGRVIAIDASLSIYQFLVVVGRKG
NB: with this quick code there must be an entry for each sequence id in range.txt; it is, however, quite easy to modify it to fall back to a default behavior when an id is missing.
When I wrote my initial answer I misunderstood the question. Your situation does not really depend on any particular programming language. You seem to need the utility in Timur Shtatland's answer. Either install that utility or use mozway's python code snippet, which handles both of your files.
I read the format of your question as if you had 4 individual files, not two.
With Biotite, a package I am developing, this can be done with:
import biotite.sequence.io.fasta as fasta

input_file = fasta.FastaFile.read("test.fasta")
output_file = fasta.FastaFile()
with open("range.txt") as file:
    for line in file.read().splitlines():
        seq_id, start, stop = line.split()
        start = int(start)
        stop = int(stop)
        output_file[seq_id] = input_file[seq_id][start:stop]
output_file.write("path/to/output.fasta")

Read a python variable in a shell script?

My python file has these 2 variables:
week_date = "01/03/16-01/09/16"
cust_id = "12345"
How can I read these into a shell script that takes in these 2 variables?
My current shell script requires manual editing of "dt" and "id". I want to read the python variables into the shell script so I can just edit my python parameter file and not so many files.
shell file:
#!/bin/sh
dt="01/03/16-01/09/16"
cust_id="12345"
In a new python file I could just import the parameter python file.
Consider something akin to the following:
#!/bin/bash
# ^^^^ NOT /bin/sh, which doesn't have process substitution available.
python_script='
import sys
d = {}                                       # create a context for variables
exec(open(sys.argv[1], "r").read()) in d     # execute the Python code in that context
for k in sys.argv[2:]:
    print "%s\0" % str(d[k]).split("\0")[0]  # ...and extract your strings NUL-delimited
'
read_python_vars() {
    local python_file=$1; shift
    local varname
    for varname; do
        IFS= read -r -d '' "${varname#*:}"
    done < <(python -c "$python_script" "$python_file" "${@%%:*}")
}
You might then use this as:
read_python_vars config.py week_date:dt cust_id:id
echo "Customer id is $id; date range is $dt"
...or, if you didn't want to rename the variables as they were read, simply:
read_python_vars config.py week_date cust_id
echo "Customer id is $cust_id; date range is $week_date"
Advantages:
Unlike a naive regex-based solution (which would have trouble with some of the details of Python parsing -- try teaching sed to handle both raw and regular strings, and both single and triple quotes without making it into a hairball!) or a similar approach that used newline-delimited output from the Python subprocess, this will correctly handle any object for which str() gives a representation with no NUL characters that your shell script can use.
Running content through the Python interpreter also means you can determine values programmatically -- for instance, you could have some Python code that asks your version control system for the last-change-date of relevant content.
Think about scenarios such as this one:
start_date = '01/03/16'
end_date = '01/09/16'
week_date = '%s-%s' % (start_date, end_date)
...using a Python interpreter to parse Python means you aren't restricting how people can update/modify your Python config file in the future.
Now, let's talk caveats:
If your Python code has side effects, those side effects will obviously take effect (just as they would if you chose to import the file as a module in Python). Don't use this to extract configuration from a file whose contents you don't trust.
Python strings are Pascal-style: They can contain literal NULs. Strings in shell languages are C-style: They're terminated by the first NUL character. Thus, some variables can exist in Python than cannot be represented in shell without nonliteral escaping. To prevent an object whose str() representation contains NULs from spilling forward into other assignments, this code terminates strings at their first NUL.
Now, let's talk about implementation details.
${@%%:*} is an expansion of $@ which trims all content after and including the first : in each argument, thus passing only the Python variable names to the interpreter. Similarly, ${varname#*:} is an expansion which trims everything up to and including the first : from the variable name passed to read. See the bash-hackers page on parameter expansion.
Using <(python ...) is process substitution syntax: The <(...) expression evaluates to a filename which, when read, will provide output of that command. Using < <(...) redirects output from that file, and thus that command (the first < is a redirection, whereas the second is part of the <( token that starts a process substitution). Using this form to get output into a while read loop avoids the bug mentioned in BashFAQ #24 ("I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?").
The IFS= read -r -d '' construct has a series of components, each of which makes the behavior of read more true to the original content:
Clearing IFS for the duration of the command prevents whitespace from being trimmed from the end of the variable's content.
Using -r prevents literal backslashes from being consumed by read itself rather than represented in the output.
Using -d '' sets the first character of the empty string '' to be the record delimiter. Since C strings are NUL-terminated and the shell uses C strings, that character is a NUL. This ensures that variables' content can contain any non-NUL value, including literal newlines.
See BashFAQ #001 ("How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?") for more on the process of reading record-oriented data from a string in bash.
Other answers give a way to do exactly what you ask for, but I think the idea is a bit crazy. There's a simpler way to satisfy both scripts - move those variables into a config file. You can even preserve the simple assignment format.
Create the config file itself (ini-style):
dt="01/03/16-01/09/16"
cust_id="12345"
In python:
config_vars = {}
with open('the/file/path', 'r') as f:
    for line in f:
        if '=' in line:
            k, v = line.split('=', 1)
            # strip whitespace and the surrounding quotes from the value
            config_vars[k.strip()] = v.strip().strip('"')

week_date = config_vars['dt']
cust_id = config_vars['cust_id']
In bash:
source "the/file/path"
And you don't need to do crazy source parsing anymore. Alternatively, you can just use JSON for the config file, then use the json module in python and jq in shell for parsing.
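A minimal sketch of the JSON route on the Python side (config.json is a placeholder name; in shell the same file could be read with something like jq -r '.week_date' config.json):
import json

# config.json would contain: {"week_date": "01/03/16-01/09/16", "cust_id": "12345"}
with open('config.json') as f:
    config = json.load(f)

week_date = config['week_date']
cust_id = config['cust_id']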
I would do something like this. You may want to modify it a little to include/exclude quotes, as I haven't really tested it for your scenario:
#!/bin/sh
exec <$python_filename
while read line
do
    match=`echo $line|grep "week_date ="`
    if [ $? -eq 0 ]; then
        dt=`echo $line|cut -d '"' -f 2`
    fi
    match=`echo $line|grep "cust_id ="`
    if [ $? -eq 0 ]; then
        cust_id=`echo $line|cut -d '"' -f 2`
    fi
done

Getting (row count - 1) of CSV files

Mainly have 2 questions in respect to this topic:
I'm looking to get the row counts of a few CSV files. In Bash, I know I can do wc -l < filename.csv. How do I do this and subtract 1 from it (because of headers)?
For anyone familiar with CSV files and possible issues with grabbing a raw line count: how plausible is it that a row is wrapped across multiple lines? I know that this is a very possible scenario, but I want to be able to say that this never happens. In the event that it is a possibility, would using Python's csv package be better? Does it read lines based on delimiters and other column wrappers?
As Barmar points out, (1) it is quite possible for CSV files to have wrapped lines and (2) CSV programming libraries can handle this well. As an example, here is a python program which uses python's CSV module to count the number of lines in file.csv minus 1:
python -c 'import csv; print( sum(1 for line in csv.reader(open("file.csv")))-1 )'
The -c arg option tells python to treat the arg string as a program to execute. In this case, we make the csv module available with the "import" statement. Then, we print out the number of rows minus one. The construct sum(1 for line in csv.reader(open("file.csv"))) counts the CSV records (not raw lines) one at a time.
If your csv file has a non-typical format, you will need to set options. This might be the delimiter or quoting character. See the documentation for details.
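For instance, a hypothetical semicolon-delimited file quoted with | could be counted like this:
import csv

# hypothetical: a semicolon-delimited file that uses | as the quote character
with open("other.csv") as f:
    print(sum(1 for row in csv.reader(f, delimiter=";", quotechar="|")) - 1)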
Example
Consider this test file:
$ cat file.csv
First name,Last name,Address
John,Smith,"P O Box 1234
Somewhere, State"
Jane,Doe,"Unknown"
This file has two rows plus a header. One of the rows is split over two lines. Python's csv module correctly understands this:
$ python -c 'import csv; print( sum(1 for line in csv.reader(open("file.csv")))-1 )'
2
gzipped files
To open gzip files in python, we use the gzip module:
$ python -c 'import csv, gzip; print( sum(1 for line in csv.reader(gzip.GzipFile("file.csv.gz")))-1 )'
2
For getting the line count, just subtract 1 from the value returned by wc using an arithmetic expression
count=$(($(wc -l < filename.csv) - 1))
CSV format allows fields to contain newlines, by surrounding the field with quotes, e.g.
field1,field2,"field3 broken
across lines",field4
Dealing with this in a plain bash script would be difficult (indeed, any CSV processing that needs to handle quoted fields is tricky). If you need to deal with the full generality of CSV, you should probably use a programming language with a CSV library.
But if you know that your CSV files will never be like this, you can ignore it.
As an alternative to subtracting one from the total row count, you can discard the first line from the file before counting:
row_count=$( { read; wc -l; } < filename.csv )
(This is in no way better than simply using $(($(wc -l < filename.csv) - 1)); it's just a useful trick to know.)

Python or Bash - Iterate all words in a text file over itself

I have a text file that contains thousands of words, e.g:
laban
labrador
labradors
lacey
lachesis
lacy
ladoga
ladonna
lafayette
lafitte
lagos
lagrange
lagrangian
lahore
laius
lajos
lakeisha
lakewood
I want to iterate every word over itself so I get:
labanlaban
labanlabrador
labanlabradors
labanlacey
labanlachesis
etc...
In bash i can do the following, but it is extremely slow:
#!/bin/bash
( cat words.txt | while read word1; do
    cat words.txt | while read word2; do
        echo "$word1$word2" >> doublewords.txt
    done
done )
Is there a faster and more efficient way to do this?
Also, how would I iterate two different text files in this manner?
If you can fit the list into memory:
import itertools

with open(words_filename, 'r') as words_file:
    words = [word.strip() for word in words_file]

for words in itertools.product(words, repeat=2):
    print(''.join(words))
(You can also do a double-for loop, but I was feeling itertools tonight.)
I suspect the win here is that we can avoid re-reading the file over and over again; the inner loop in your bash example will cat the file once for each iteration of the outer loop. Also, I think Python just tends to execute faster than bash, IIRC.
You could certainly pull this trick with bash (read the file into an array, write a double-for loop), it's just more painful.
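For the two-file case from the question, the same idea applies; a rough sketch (the file names are placeholders):
import itertools

# read both word lists (the file names are placeholders)
with open('words1.txt') as f1, open('words2.txt') as f2:
    words1 = [w.strip() for w in f1]
    words2 = [w.strip() for w in f2]

for w1, w2 in itertools.product(words1, words2):
    print(w1 + w2)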
It looks like sed is pretty efficient at appending text to each line.
I propose:
#!/bin/bash
for word in $(< words.txt)
do
    sed "s/$/$word/" words.txt
done > doublewords.txt
(Don't confuse $, which means end of line for sed, with $word, which is a bash variable.)
For a 2000 line file, this runs in about 20 s on my computer, compared to ~2 min for your solution.
Remark: it also looks like you are slightly better off redirecting the standard output of the whole program instead of forcing a write on each loop iteration.
(Warning, this is a bit off topic and personal opinion!)
If you are really going for speed, you should consider using a compiled language such as C++. For example:
#include <fstream>
#include <string>
#include <vector>
using namespace std;

int main() {
    vector<string> words;
    ifstream infile("words.dat");
    for (string line; std::getline(infile, line); )
        words.push_back(line);
    infile.close();

    ofstream outfile("doublewords.dat");
    for (const auto& word1 : words)
        for (const auto& word2 : words)
            outfile << word1 << word2 << "\n";
    outfile.close();
}
You need to understand that both bash and python are bad at double for loops: that's why you use tricks (@Thanatos) or predefined commands (sed). Recently, I came across a double for loop problem (given a set of 10000 points in 3d, compute all the distances between pairs) and I successfully solved it using C++ instead of python or Matlab.
If you have GHC available, Cartesian products are a cinch!
Q1: One file
-- words.hs
import Control.Applicative
main = interact f
  where f = unlines . g . words
        g x = map (++) x <*> x
This splits the file into a list of words, and then appends each word to each other word with the applicative <*>.
Compile with GHC,
ghc words.hs
and then run with IO redirection:
./words <words.txt >out
Q2: Two files
-- words2.hs
import Control.Applicative
import Control.Monad
import System.Environment
main = do
    ws <- mapM ((liftM words) . readFile) =<< getArgs
    putStrLn $ unlines $ g ws
  where g (x:y:_) = map (++) x <*> y
Compile as before and run with the two files as arguments:
./words2 words1.txt words2.txt > out
Bleh, compiling?
Want the convenience of a shell script and the performance of a compiled executable? Why not do both?
Simply wrap the Haskell program you want in a wrapper script which compiles it in /var/tmp, and then replaces itself with the resulting executable:
#!/bin/bash
# wrapper.sh
cd /var/tmp
cat > c.hs <<CODE
-- replace this comment with the haskell code
CODE
ghc c.hs >/dev/null
cd - >/dev/null
exec /var/tmp/c "$@"
This handles arguments and IO redirection as though the wrapper didn't exist.
Results
Timing against some of the other answers with two 2000 word files:
$ time ./words2 words1.txt words2.txt >out
3.75s user 0.20s system 98% cpu 4.026 total
$ time ./wrapper.sh words1.txt words2.txt > words2
4.12s user 0.26s system 97% cpu 4.485 total
$ time ./thanatos.py > out
4.93s user 0.11s system 98% cpu 5.124 total
$ time ./styko.sh
7.91s user 0.96s system 74% cpu 11.883 total
$ time ./user3552978.sh
57.16s user 29.17s system 93% cpu 1:31.97 total
You can do this in a Pythonic way by creating a tempfile, writing data to it while reading the existing file, and finally removing the original file and moving the new file to the original path.
from os import remove
from shutil import move
from tempfile import mkstemp

def data_redundent(source_file_path):
    fh, target_file_path = mkstemp()
    with open(target_file_path, 'w') as target_file:
        with open(source_file_path, 'r') as source_file:
            for line in source_file:
                target_file.write(line.replace('\n', '') + line)
    remove(source_file_path)
    move(target_file_path, source_file_path)

data_redundent('test_data.txt')
I'm not sure how efficient this is, but a very simple way, using the Unix tool specifically designed for this sort of thing, would be
paste -d'\0' <file> <file>
The -d option specifies the delimiter to be used between the concatenated parts, and \0 indicates a NULL character (i.e. no delimiter at all).

Automating a directory diff while ignoring some particular lines in files

I need to compare two directories, and produce some sort of structured output (text file is fine) of the differences. That is, the output might looks something like this:
file1 exists only in directory2
file2 exists only in directory1
file3 is different between directory1 and directory2
I don't care about the format, so long as the information is there. The second requirement is that I need to be able to ignore certain character sequences when diffing two files. Araxis Merge has this ability: you can type in a Regex and any files whose only difference is in character sequences matching that Regex will be reported as identical.
That would make Araxis Merge a good candidate, BUT, as of yet I have found no way to produce a structured output of the diff. Even when launching consolecompare.exe with command-line arguments, it just opens an Araxis GUI window showing the differences.
So, does either of the following exist?
A way to get Araxis Merge to print a diff result to a text file?
Another utility that does a diff while ignoring certain character sequences, and produces structured output?
Extra credit if such a utility exists as a module or plugin for Python. Please keep in mind this must be done entirely from a command line / python script - no GUIs.
To some extent, the plain old diff command can do just that, i.e. compare directory contents while ignoring changes that match a certain regex pattern (using the -I option).
From man diff:
-I regexp
Ignore changes that just insert or delete lines that match regexp.
Quick demo:
[me@home]$ diff images/ images2
Only in images2: x
Only in images/: y
diff images/z images2/z
1c1
< zzz
---
> zzzyy2
[me@home]$ # a less verbose version
[me@home]$ diff -q images/ images2
Only in images2: x
Only in images/: y
Files images/z and images2/z differ
[me@home]$ # ignore diffs on lines that contain "zzz"
[me@home]$ diff -q -I ".*zzz.*" images/ images2/
Only in images2/: x
Only in images/: y
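If you'd rather stay in Python (per the "extra credit" in the question), a rough sketch of the same idea using only the standard library (filecmp for the directory comparison, plus a re-check that drops lines matching an ignore pattern) might look like this; it is non-recursive, and the pattern and directory names are placeholders:
import filecmp
import os
import re

IGNORE = re.compile(r".*zzz.*")   # placeholder pattern; lines matching it are ignored

def filtered_lines(path):
    with open(path) as f:
        return [line for line in f if not IGNORE.search(line)]

def report(dir1, dir2):
    comparison = filecmp.dircmp(dir1, dir2)
    for name in comparison.left_only:
        print("%s exists only in %s" % (name, dir1))
    for name in comparison.right_only:
        print("%s exists only in %s" % (name, dir2))
    for name in comparison.diff_files:
        # re-compare, ignoring lines that match the pattern
        if filtered_lines(os.path.join(dir1, name)) != filtered_lines(os.path.join(dir2, name)):
            print("%s is different between %s and %s" % (name, dir1, dir2))

report("directory1", "directory2")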
