Open two files pairwise out of many

Hey guys, I'm a rookie in Python and need some help.
My problem: I have a folder full of text files (each containing a list), where the files belong together in pairs and need to be read and compared.
Folder with many files: File1_in.xlo, File1_out.xlo, File2_in.xlo, File2_out.xlo, ...
--> so File1_in.xlo and File1_out.xlo belong together and need to be compared.
I can already append the lists from all the 'in' files (or 'out' files) and then compare them, but since there are many files the lists become really long (thousands and thousands of entries), so the idea is to compare the files, or rather the lists, pairwise.
My first try looks like this:
import os

for filename in sorted(os.listdir('path')):
    if filename.endswith('in.xlo'):
        with open(os.path.join('path', filename)) as inn:
            lines = inn.readlines()
            for x in lines:
                temperatureIn = x.split()[4]
    if filename.endswith('out.xlo'):
        with open(os.path.join('path', filename)) as outt:
            lines = outt.readlines()
            for x in lines:
                temperatureOut = x.split()[4]  # temperature is the column at index 4
So the problem is, as you can see, that the 'temperatureIn' values are always overwritten before I can compare them with the 'temperatureOut' values. I think/hope there must be a way to open both files at once and compare the list entries.
I hope you can understand my problem and that someone can help me.
Thanks

Use zip to access in-Files and out-Files in pairs
files = sorted(os.listdir('path'))
in_files = [fname for fname in files if fname.endswith('in.xlo')]
out_files = [fname for fname in files if fname.endswith('out.xlo')]

for in_file, out_file in zip(in_files, out_files):
    with open(os.path.join('path', in_file)) as inn, open(os.path.join('path', out_file)) as outt:
        # Do whatever you want
        pass
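Putting the pieces together, a minimal sketch of the pairwise comparison (assuming, as in the question, that the temperature is the whitespace-separated column at index 4 and that every in-file has a matching out-file):
import os

path = 'path'
files = sorted(os.listdir(path))
in_files = [f for f in files if f.endswith('in.xlo')]
out_files = [f for f in files if f.endswith('out.xlo')]

for in_file, out_file in zip(in_files, out_files):
    with open(os.path.join(path, in_file)) as inn, \
         open(os.path.join(path, out_file)) as outt:
        # read the temperature column (index 4) from both files of the pair
        temps_in = [line.split()[4] for line in inn]
        temps_out = [line.split()[4] for line in outt]
    # compare the pair entry by entry
    for t_in, t_out in zip(temps_in, temps_out):
        if t_in != t_out:
            print('%s / %s: %s != %s' % (in_file, out_file, t_in, t_out))
Since the lists are sorted, File1_in.xlo sorts directly before File1_out.xlo, so the two filtered lists stay aligned.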

Add them to a list created just before your for loop:
temps_in = []
for x in lines:
    temperatureIn = x.split()[4]
    temps_in.append(temperatureIn)
Do the same thing for the temperatures out, then compare your two lists.
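The comparison itself can then be a single loop over both lists in parallel (assuming they line up entry for entry):
for t_in, t_out in zip(temps_in, temps_out):
    if t_in != t_out:
        print('%s != %s' % (t_in, t_out))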

Related

How to modify iteration list?

Consider the following scenario of traversing a directory structure:
"Build a complete dir tree with files, but if files in a single dir are similar in name, list only a single entity."
Example tree (let's assume the entries are not sorted):
- rootDir
  - dirA
      fileA_01
      fileA_03
      fileA_05
      fileA_06
      fileA_04
      fileA_02
      fileA_...
      fileAB
      fileAC
  - dirB
      fileBA
      fileBB
      fileBC
Expected output:
- rootDir
  - dirA
      fileA_01 - fileA_06 ...
      fileAB
      fileAC
  - dirB
      fileBA
      fileBB
      fileBC
I already wrote a simple findSimilarNames function that, for fileA_01 (or any fileA_ file), returns the list [fileA_01 ... fileA_06].
Now I'm inside os.walk, looping over the files, and every file gets checked against similar filenames. So when I reach e.g. fileA_03, I get the rest of the group [fileA_01 - fileA_06], and I want to modify the list I iterate over so that the items from findSimilarNames are skipped, without another loop or ifs inside.
I searched here and people suggest avoiding modifying the list you iterate over, but by modifying it I would avoid iterating over every file.
Pseudocode:
for root, dirs, files in os.walk(path):
    for file in files:
        similarList = findSimilarNames(file)
        # OVERWRITE ITERATION LIST SOMEHOW
        files = set(files) - set(similarList)
        # DEAL WITH ELEMENT
What I'm trying to avoid is the version below, which checks every file even though it may already have been covered by findSimilarNames:
for root, dirs, files in os.walk(path):
    filteredbysimilar = files[:]
    for file in files:
        similar = findSimilarNames(file)
        filteredbysimilar = list(set(filteredbysimilar) - set(similar))
    # --
    for filteredFile in filteredbysimilar:
        # DEAL WITH ELEMENT
        # OVERWRITE ITERATION LIST SOMEHOW
You can get this effect by using a while-loop style of iteration. Since you want to do set subtraction to remove the similar groups anyway, the natural approach is to start with a set of all the filenames and repeatedly remove groups until nothing is left. Thus:
unprocessed = set(files)
while unprocessed:
    f = unprocessed.pop()  # removes and returns an arbitrary element
    group = findSimilarNames(f)
    unprocessed -= group  # it is not an error that `f` has already been removed
    doSomethingWith(group)  # i.e., "DEAL WITH ELEMENT" :)
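Plugged into the question's os.walk loop, that might look like this (a sketch; findSimilarNames and doSomethingWith are the question's own placeholders, and findSimilarNames is assumed to return a set):
import os

for root, dirs, files in os.walk(path):
    unprocessed = set(files)
    while unprocessed:
        f = unprocessed.pop()
        group = findSimilarNames(f)
        unprocessed -= group  # skip the rest of the group entirely
        doSomethingWith(group)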
How about building up a list of files that aren't similar?
unsimilar = set()
for f in files:
    if len(findSimilarNames(f).intersection(unsimilar)) == 0:
        unsimilar.add(f)
This assumes findSimilarNames returns a set.
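To then act once per group, you could iterate over the survivors (same set assumption, reusing the other answer's doSomethingWith placeholder):
for f in unsimilar:
    doSomethingWith(findSimilarNames(f))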

Looping over different python dictionaries - wrong results?

I am using Python 2.7, by the way.
Let's say I have a couple of directories that I want to create dictionaries for. The files in each of the directories are named YYYYMMDD.hhmmss and are all different, and the size of each directory is different:
path1 = '/path/to/folders/to/make/dictionaries'
dir1 = os.listdir(path1)
I also have another, static directory that will have some files to compare:
gpath1 = '/path/to/static/files'
gdir1 = os.listdir(gpath1)
dir1_file_list = [datetime.strptime(g, '%Y%m%d.%H%M%S') for g in gdir1]
So I have a static directory of files in gdir1, and I now want to loop through each directory in dir1 and create a unique dictionary. This is the code:
for i in range(0, len(dir1)):
    path2 = path1 + "/" + dir1[i]
    dir2 = os.listdir(path2)
    dir2_file_list = [datetime.strptime(r, '%Y%m%d.%H%M%S') for r in dir2]
    # Define a dictionary, and initialize comparisons
    dict_gr = []
    dict_gr = dict()
    for dir1_file in dir1_file_list:
        dict_gr[str(dir1_file)] = []
        # Look for instances within the last 5 minutes
        for dir2_file in dir2_file_list:
            if 0 <= (dir1_file - dir2_file).total_seconds() <= 300:
                dict_gr[str(dir1_file)].append(str(dir2_file))
    # Sort the dictionaries
    for key, value in sorted(dict_gr.iteritems()):
        dir2_lib.append(key)
        dir1_lib.append(sorted(value))
The issue is that path2 and dir2 both properly point to the different folders and grab the necessary filenames, and building dict_gr works well. However, when I get to the part of the script where I sort the dictionaries, the results for the 2nd directory that has been looped over also contain the contents of the first directory, the 3rd looped directory contains the contents of the 1st and 2nd, and so on. In other words, they are not matched uniquely with each directory.
Any thoughts?
I had overlooked the appending to dir2_lib and dir1_lib; these needed to be initialized inside the loop.
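In other words, the fix is roughly this (a sketch: resetting both lists at the top of each directory's iteration so earlier results don't accumulate):
for i in range(0, len(dir1)):
    dir2_lib = []  # reset for every directory, otherwise earlier results accumulate
    dir1_lib = []
    # ... build dict_gr exactly as above, then:
    for key, value in sorted(dict_gr.iteritems()):
        dir2_lib.append(key)
        dir1_lib.append(sorted(value))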

Python - comparing multiple files from different folders and generating diff files

I want to automate the scenario below in Python.
Actual:
cc0023-base.txt
cc9038.final.txt
Expected:
base.txt
final.txt
"Actual" and "Expected" are two different folders under the same directory. I want to compare the "base" and "final" files of both folders and generate the diff files in another folder.
Diff:
base-diff.txt
final-diff.txt
How do I do this in Python? Below is the sample code I have written, but it generates diff files for all possible combinations. I need base to be compared only with base, and final only with final, across the two folders.
expected_files = os.listdir('expected/path')
actual_files = os.listdir('actual/path')
diff_files = os.listdir('diff/path')
cr = ['base.txt', 'final.txt']
i = 0
for files in expected_files:
    tst = os.path.join('expected/path', files)
    with open(tst, 'r') as Expected:
        for actualfile in actual_files:
            actualpath = os.path.join('actual/path', actualfile)
            with open(actualpath, 'r') as actual:
                diff = difflib.unified_diff(Expected.readlines(),
                                            actual.readlines(),
                                            fromfile=Expected,
                                            tofile=actual,)
                diffpath = os.path.join('diff/path', cr[i])
                diff_file = open(diffpath, 'w')
                for line in diff:
                    diff_file.write(line)
                diff_file.close()
                i = i + 1
Please help, as I am new to Python.
The issue in your code is in this section:
i = 0
diffpath = os.path.join('diff/path', cr[i])
diff_file = open(diffpath, 'w')
for line in diff:
    diff_file.write(line)
diff_file.close()
i = i + 1
Since you are always setting i to 0 before accessing cr[i], it will always be cr[0]. Move the i = 0 to before the loop whose counter you want to start at 0.
I think you want something like this:
expected_files = os.listdir('expected/path')
actual_files = os.listdir('actual/path')
diff_files = os.listdir('diff/path')
cr = ['base.txt', 'final.txt']

for files in expected_files:
    tst = os.path.join('expected/path', files)
    with open(tst, 'r') as Expected:
        expected_lines = Expected.readlines()  # read once, not once per inner iteration
    for i, actualfile in enumerate(actual_files):
        actualpath = os.path.join('actual/path', actualfile)
        with open(actualpath, 'r') as actual:
            diff = difflib.unified_diff(expected_lines,
                                        actual.readlines(),
                                        fromfile=tst,        # labels should be strings, not file objects
                                        tofile=actualpath)
        diffpath = os.path.join('diff/path', cr[i])
        with open(diffpath, 'w') as diff_file:
            for line in diff:
                diff_file.write(line)
Some explanation: enumerate(actual_files) gives you an index i alongside each element actualfile, so you don't have to do the incrementing yourself. (Worth noting that this will break if there are more than two files in the directory, since cr only has two entries!) You can also use the with open(...) as foo: syntax for the write, as shown, so the diff file is closed automatically. Reading the expected file's lines once also matters: calling readlines() again on the same handle inside the loop would return an empty list.
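If the folders can list their files in any order, it may be safer to pair by name instead of by position. A sketch (assuming, per the question's example, that each actual filename ends with the corresponding expected filename, e.g. cc0023-base.txt vs base.txt):
import os
import difflib

cr = ['base.txt', 'final.txt']  # the names that identify a pair
for name in cr:
    # find the actual file whose name ends with e.g. 'base.txt' (cc0023-base.txt)
    actual_name = next(f for f in os.listdir('actual/path') if f.endswith(name))
    with open(os.path.join('expected/path', name)) as exp, \
         open(os.path.join('actual/path', actual_name)) as act:
        diff = difflib.unified_diff(exp.readlines(), act.readlines(),
                                    fromfile=name, tofile=actual_name)
    # base.txt -> base-diff.txt, final.txt -> final-diff.txt
    out_name = name.replace('.txt', '-diff.txt')
    with open(os.path.join('diff/path', out_name), 'w') as out:
        for line in diff:
            out.write(line)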

Python: 'paste' multiple (unknown) csvs together

What I am essentially looking for is the 'paste' command in bash, but in Python 2. Suppose I have a csv file:
a1,b1,c1,d1
a2,b2,c2,d2
a3,b3,c3,d3
And another such:
e1,f1
e2,f2
e3,f3
I want to pull them together into this:
a1,b1,c1,d1,e1,f1
a2,b2,c2,d2,e2,f2
a3,b3,c3,d3,e3,f3
This is the simplest case, where I have a known number of files, and only two. What if I want to do this with an arbitrary number of files, without knowing how many I have?
I am thinking along the lines of using zip with a list of csv.reader iterables. There will be some unpacking involved, but it seems like that much Python-foo is above my IQ level ATM. Can someone suggest how to implement this idea, or something completely different?
I suspect this should be doable with a short snippet. Thanks.
file1 = open("file1.csv", "r")
file2 = open("file2.csv", "r")
for line in file1:
    # strip the newline (and any stray trailing comma) before joining the rows;
    # print adds its own newline, so don't append another
    print(line.strip().strip(",") + "," + file2.readline().strip())
file1.close()
file2.close()
This is extendable to as many files as you wish; just keep adding to the print statement. Instead of printing you can also append to a list, or whatever you wish. You may have to worry about the lengths of the files; I did not, since you did not specify.
Assuming the number of files is unknown, that all the files are properly formatted as CSV, and that they all have the same number of lines:
files = ['csv1', 'csv2', 'csv3']
fs = map(open, files)
done = False
while not done:
    chunks = []
    for f in fs:
        try:
            l = next(f).strip()
            chunks.append(l)
        except StopIteration:
            done = True
            break
    if not done:
        print ','.join(chunks)
for f in fs:
    f.close()
There seems to be no easy way of using context managers with a variable list of files, at least in Python 2 (see a comment in the accepted answer here), so the files have to be closed manually, as above.
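The zip idea from the question also works, and is shorter, when stopping at the shortest file is acceptable (a Python 2 sketch using csv.reader, as the question suggested):
import csv

files = ['csv1', 'csv2', 'csv3']
fs = [open(f) for f in files]
readers = [csv.reader(f) for f in fs]
# zip stops at the shortest file; each `rows` is a tuple with one row per file
for rows in zip(*readers):
    print ','.join(sum(rows, []))  # flatten the tuple of row lists and join
for f in fs:
    f.close()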
You could try pandas.
In your case, the group [a,b,c,d] and the group [e,f] can each be treated as a DataFrame in pandas, and joining them is easy because pandas has a function called concat.
import pandas as pd

# define group [a-d] as df1; header=None because the files have no header row
df1 = pd.read_csv('1.csv', header=None)
# define group [e-f] as df2
df2 = pd.read_csv('2.csv', header=None)
# concat takes a list of DataFrames; axis=1 joins them column-wise
pd.concat([df1, df2], axis=1)
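To write the combined frame back out as plain rows (hypothetical output filename), something like:
pd.concat([df1, df2], axis=1).to_csv('combined.csv', header=False, index=False)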

How to match a Python format string to elements in a list?

I use Python, and there's a list of file names of different file types. Text files may look like these:
01.txt
02.txt
03.txt
...
Let's assume the text files are all numbered in this manner. Now I want to get all the text files with numbers ranging from 1 to 25. So I would like to provide a format string like %02i.txt via a GUI in order to identify all the matching file names.
My solution so far is a nested for loop. The outer loop iterates over the whole list, and the inner loop counts from 1 to 25 for every file:
fmt = '%02i.txt'
for f in files:
    for i in range(1, 25+1):
        if f == fmt % i:
            pass  # do stuff
This nested loop doesn't look very pretty, and the complexity is O(n²), so it could take a while on very long lists. Is there a smarter/more pythonic way of doing this?
Well, yes, I could use a regular expression like ^\d{2}\.txt$, but a format string with % is way easier to type.
You can use a set:
fmt = '%02i.txt'
targets = {fmt % i for i in range(1, 25+1)}
then
for f in files:
    if f in targets:
        pass  # do stuff
A more pythonic way to iterate through the files is through use of the glob module.
>>> import glob
>>> for f in glob.iglob('[0-9][0-9].txt'):
...     print f
01.txt
02.txt
03.txt
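Note that the glob pattern matches any two-digit name, not just 01 through 25. If the exact range matters, the targets set from the first answer can filter the glob results (a small sketch combining the two answers, in the same Python 2 print style):
import glob

fmt = '%02i.txt'
targets = {fmt % i for i in range(1, 25 + 1)}
for f in glob.iglob('[0-9][0-9].txt'):
    if f in targets:  # keep only 01.txt .. 25.txt
        print f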
