Python: How to iterate through set of files based on file names? - python

I have a set of files named like this:
qd-p64-dZP-d64-z8-8nn.q
qd-p8-dPZ-d8-z1-1nn.q qq-p8-dZP-d8-z1-2nn.q
qd-p8-dPZ-d8-z1-2nn.q qq-p8-dZP-d8-z1-4nn.q
qd-p8-dPZ-d8-z1-4nn.q qq-p8-dZP-d8-z16-1nn.q
qd-p8-dPZ-d8-z16-1nn.q qq-p8-dZP-d8-z16-2nn.q
qd-p8-dPZ-d8-z16-2nn.q qq-p8-dZP-d8-z16-4nn.q
qd-p8-dPZ-d8-z16-4nn.q qq-p8-dZP-d8-z16-8nn.q
qd-p8-dPZ-d8-z16-8nn.q qq-p8-dZP-d8-z1-8nn.q
qd-p8-dPZ-d8-z1-8nn.q qq-p8-dZP-d8-z2-1nn.q
qd-p8-dPZ-d8-z2-1nn.q qq-p8-dZP-d8-z2-2nn.q
qd-p8-dPZ-d8-z2-2nn.q qq-p8-dZP-d8-z2-4nn.q
qd-p8-dPZ-d8-z2-4nn.q qq-p8-dZP-d8-z2-8nn.q
qd-p8-dPZ-d8-z2-8nn.q qq-p8-dZP-d8-z32-1nn.q
qd-p8-dPZ-d8-z32-1nn.q qq-p8-dZP-d8-z32-2nn.q
qd-p8-dPZ-d8-z32-2nn.q qq-p8-dZP-d8-z32-4nn.q
qd-p8-dPZ-d8-z32-4nn.q qq-p8-dZP-d8-z32-8nn.q
qd-p8-dPZ-d8-z32-8nn.q qq-p8-dZP-d8-z4-1nn.q
qd-p8-dPZ-d8-z4-1nn.q qq-p8-dZP-d8-z4-2nn.q
qd-p8-dPZ-d8-z4-2nn.q qq-p8-dZP-d8-z4-4nn.q
The information to iterate is given in the file names, for example:
Fix dZP, 1nn, z2, and vary d with values {d8, d16, d32, d64}. Then increase the z value to get dZP, 1nn, z4, and vary d again over {d8, d16, d32, d64}.
Once I'm able to iterate like this I need to do some information processing from the files.

Looks like a good task for a generator. I just did it for d, z, and n, but it should be easy enough to generalize to all of your filename fields:
def filename_generator():
    l1 = ['d8', 'd16', 'd32', 'd64']
    l2 = ['z1', 'z2', 'z4', 'z8', 'z16', 'z32']
    l3 = ['1nn', '2nn', '4nn', '8nn']
    for n in l3:
        for z in l2:
            for d in l1:
                yield '%s-%s-%s.q' % (d, z, n)
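The same generator can be written with itertools.product, which scales more easily if you later add the remaining filename fields; a sketch with the same field values as above:

```python
from itertools import product

def filename_generator():
    ds = ['d8', 'd16', 'd32', 'd64']
    zs = ['z1', 'z2', 'z4', 'z8', 'z16', 'z32']
    ns = ['1nn', '2nn', '4nn', '8nn']
    # product varies the rightmost iterable fastest, so d cycles innermost,
    # exactly like the nested loops above
    for n, z, d in product(ns, zs, ds):
        yield '%s-%s-%s.q' % (d, z, n)
```
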

You could do something like the following. It may not be exactly what you want, since you've left some important details out of your question, but I've attempted to write it in a way that makes it easy for you to change as necessary depending on what you really want.
In a nutshell, it uses the re module to break each filename up into "fields" and extract the numeric value found in each. These values are assigned corresponding names in a temporary dictionary, which is then used to create a namedtuple of the values with the desired field precedence. Other parts of the filename are ignored.
The initial filename list can be obtained from the file system using os.listdir() or glob.glob().
from collections import namedtuple
import re
filenames = ['qd-p64-dZP-d64-z8-8nn.q', 'qd-p8-dPZ-d8-z1-1nn.q',
'qd-p8-dPZ-d8-z1-2nn.q', 'qd-p8-dPZ-d8-z1-4nn.q',
'qd-p8-dPZ-d8-z16-1nn.q', 'qd-p8-dPZ-d8-z16-2nn.q',
'qd-p8-dPZ-d8-z16-4nn.q', 'qd-p8-dPZ-d8-z16-8nn.q',
'qd-p8-dPZ-d8-z1-8nn.q', 'qd-p8-dPZ-d8-z2-1nn.q',
'qd-p8-dPZ-d8-z2-2nn.q', 'qd-p8-dPZ-d8-z2-4nn.q',
'qd-p8-dPZ-d8-z2-8nn.q', 'qd-p8-dPZ-d8-z32-1nn.q',
'qd-p8-dPZ-d8-z32-2nn.q', 'qd-p8-dPZ-d8-z32-4nn.q',
'qd-p8-dPZ-d8-z32-8nn.q', 'qd-p8-dPZ-d8-z4-1nn.q',
'qd-p8-dPZ-d8-z4-2nn.q', 'qq-p8-dZP-d8-z1-2nn.q',
'qq-p8-dZP-d8-z1-4nn.q', 'qq-p8-dZP-d8-z16-1nn.q',
'qq-p8-dZP-d8-z16-2nn.q', 'qq-p8-dZP-d8-z16-4nn.q',
'qq-p8-dZP-d8-z16-8nn.q', 'qq-p8-dZP-d8-z1-8nn.q',
'qq-p8-dZP-d8-z2-1nn.q', 'qq-p8-dZP-d8-z2-2nn.q',
'qq-p8-dZP-d8-z2-4nn.q', 'qq-p8-dZP-d8-z2-8nn.q',
'qq-p8-dZP-d8-z32-1nn.q', 'qq-p8-dZP-d8-z32-2nn.q',
'qq-p8-dZP-d8-z32-4nn.q', 'qq-p8-dZP-d8-z32-8nn.q',
'qq-p8-dZP-d8-z4-1nn.q', 'qq-p8-dZP-d8-z4-2nn.q',
'qq-p8-dZP-d8-z4-4nn.q']
filename_order = ('p', 'd', 'z', 'nn') # order fields occur in the filenames
fieldname_order = ('z', 'd', 'p', 'nn') # desired field sort order
OrderedTuple = namedtuple('OrderedTuple', fieldname_order)
def keyfunc(filename):
    values = [int(value) for value in re.findall(r'-\D*(\d+)', filename)]
    parts = dict(zip(filename_order, values))
    return OrderedTuple(**parts)

filenames.sort(key=keyfunc)  # sort filename list in-place
Resulting order of filenames in list:
['qd-p8-dPZ-d8-z1-1nn.q', 'qd-p8-dPZ-d8-z1-2nn.q', 'qq-p8-dZP-d8-z1-2nn.q',
'qd-p8-dPZ-d8-z1-4nn.q', 'qq-p8-dZP-d8-z1-4nn.q', 'qd-p8-dPZ-d8-z1-8nn.q',
'qq-p8-dZP-d8-z1-8nn.q', 'qd-p8-dPZ-d8-z2-1nn.q', 'qq-p8-dZP-d8-z2-1nn.q',
'qd-p8-dPZ-d8-z2-2nn.q', 'qq-p8-dZP-d8-z2-2nn.q', 'qd-p8-dPZ-d8-z2-4nn.q',
'qq-p8-dZP-d8-z2-4nn.q', 'qd-p8-dPZ-d8-z2-8nn.q', 'qq-p8-dZP-d8-z2-8nn.q',
'qd-p8-dPZ-d8-z4-1nn.q', 'qq-p8-dZP-d8-z4-1nn.q', 'qd-p8-dPZ-d8-z4-2nn.q',
'qq-p8-dZP-d8-z4-2nn.q', 'qq-p8-dZP-d8-z4-4nn.q',
'qd-p64-dZP-d64-z8-8nn.q', 'qd-p8-dPZ-d8-z16-1nn.q',
'qq-p8-dZP-d8-z16-1nn.q', 'qd-p8-dPZ-d8-z16-2nn.q',
'qq-p8-dZP-d8-z16-2nn.q', 'qd-p8-dPZ-d8-z16-4nn.q',
'qq-p8-dZP-d8-z16-4nn.q', 'qd-p8-dPZ-d8-z16-8nn.q',
'qq-p8-dZP-d8-z16-8nn.q', 'qd-p8-dPZ-d8-z32-1nn.q',
'qq-p8-dZP-d8-z32-1nn.q', 'qd-p8-dPZ-d8-z32-2nn.q',
'qq-p8-dZP-d8-z32-2nn.q', 'qd-p8-dPZ-d8-z32-4nn.q',
'qq-p8-dZP-d8-z32-4nn.q', 'qd-p8-dPZ-d8-z32-8nn.q',
'qq-p8-dZP-d8-z32-8nn.q']
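With the list sorted, the files can then be processed group by group with itertools.groupby; a minimal sketch keyed on the z value only, using a few of the filenames from the question:

```python
import re
from itertools import groupby

filenames = ['qd-p8-dPZ-d8-z1-1nn.q', 'qq-p8-dZP-d8-z1-2nn.q',
             'qd-p8-dPZ-d8-z2-1nn.q', 'qq-p8-dZP-d8-z2-2nn.q']

def z_value(filename):
    # pull the number that follows '-z' out of the filename
    return int(re.search(r'-z(\d+)-', filename).group(1))

groups = {}
for z, names in groupby(sorted(filenames, key=z_value), key=z_value):
    groups[z] = list(names)  # do the per-group information processing here
```
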

Related

Python - Get newest dict value where string = string

I have this code and it works. But I want to get two different files.
file_type returns either NP or KL. So I want to get the NP file with the max value and I want to get the KL file with the max value.
The dict looks like
{"Blah_Blah_NP_2022-11-01_003006.xlsx": "2022-03-11",
"Blah_Blah_KL_2022-11-01_003006.xlsx": "2022-03-11"}
This is my code and right now I am just getting the max date without regard to time. Since the date is formatted how it is and I don't care about time, I can just use max().
I'm having trouble expanding the below code to give me the greatest NP file and the greatest KL file. Again, file_type returns the NP or KL string from the file name.
file_dict = {}
file_path = Path(r'\\place\Report')
for file in file_path.iterdir():
    if file.is_file():
        path_object = Path(file)
        filename = path_object.name
        stem = path_object.stem
        file_type = stem.split("_")[2]
        file_date = stem.split("_")[3]
        file_dict.update({filename: file_date})
newest = max(file_dict, key=file_dict.get)
return newest
I basically want newest where file_type = NP and also newest where file_type = KL
You could filter the dictionary into two dictionaries (or however many you need if there are more types) and then get the max date for each of those.
But the whole operation can be done efficiently in only a few lines:
from pathlib import Path
from datetime import datetime

def get_newest():
    maxs = {}
    for file in Path(r'./examples').iterdir():
        if file.is_file():
            *_, t, d, _ = file.stem.split('_')
            d = datetime(*map(int, d.split('-')))
            maxs[t] = d if t not in maxs else max(d, maxs[t])
    return maxs

print(get_newest())
This:
collects the maximum date for each type into a dict maxs
loops over the files like you did (but in a location where I created some examples following your pattern)
only looks at the files, like your code
assumes the files all meet your pattern, and splits them over '_', only keeping the next to last part as the date and the part before it as the type
converts the date into a datetime object
keeps whichever is greater, the new date or a previously stored one (if any)
Result:
{'KL': datetime.datetime(2023, 11, 1, 0, 0), 'NP': datetime.datetime(2022, 11, 2, 0, 0)}
The files in the folder:
Blah_Blah_KL_2022-11-01_003006.txt
Blah_Blah_KL_2023-11-01_003006.txt
Blah_Blah_NP_2022-11-02_003051.txt
Blah_Blah_NP_2022-11-01_003006.txt
Blah_Blah_KL_2021-11-01_003006.txt
In the comments you asked
no idea how the above code it getting the diff file types and the max. Is it just looing for all the diff types in general? It's hard to know what each piece is with names like s, d, t, etc. Really lost on *_, t, d, _ = and also d = datetime(*map(int, d.split('-')))
That's a fair point. I prefer short names when I think the meaning is clear, but a descriptive name might have been better. t is for type (and type itself would be a bad name, since it shadows the built-in type, so perhaps file_type). d is for date, or dt for datetime might have been better. I don't see an s?
The *_, t, d, _ = is called 'extended tuple unpacking': it takes all the results from what follows and keeps only the third-from-last and second-from-last, as t and d respectively, and throws the rest away. The _ takes up a position, but the underscore indicates we "don't care" about whatever is in that position. And the *_ similarly gobbles up all values at the start, as explained in the linked PEP article.
The d = datetime(*map(int, d.split('-'))) is best read from the inside out. d.split('-') just takes a date string like '2022-11-01' and splits it. The map(int, ...) that's applied to the result applies the int() function to every part of that result - so it turns ('2022', '11', '01') into (2022, 11, 1). The * in front of map() spreads the results as parameters to datetime - so, datetime(2022, 11, 1) would be called in this example.
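Both constructs can be tried in isolation:

```python
from datetime import datetime

# extended unpacking: keep only the third-from-last and second-from-last parts
*_, t, d, _ = 'Blah_Blah_KL_2022-11-01_003006'.split('_')
print(t, d)  # KL 2022-11-01

# split the date string, convert each piece to int, spread into datetime()
dt = datetime(*map(int, d.split('-')))
print(dt)  # 2022-11-01 00:00:00
```
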
This is what I both like and hate about Python - as you get better at it, there are very concise (and arguably beautiful - user #ArtemErmakov seems to agree) ways to write clean solutions. But they become hard to read unless you know most of the basics of the language. They're not easy to understand for a beginner, which is arguably a bad feature of a language.
To answer the broader question: since the loop takes each file, gets the type (like 'KL') from it and gets the date, it can then check the dictionary, add the date if the type is new, or if the type was already in the dictionary, update it with the maximum of the two, which is what this line does:
maxs[t] = d if t not in maxs else max(d, maxs[t])
I would recommend you keep asking questions - and whenever you see something like this code, try to break it down into all its small parts, and see what specific parts you don't understand. Python is a powerful language.
As a bonus, here is the same solution, but written a bit more clearly to show what is going on:
from pathlib import Path
from datetime import datetime

def get_newest_too():
    maximums = {}
    for file_path in Path(r'./examples').iterdir():
        if file_path.is_file():
            split_file = file_path.stem.split('_')
            file_type = split_file[-3]
            date_time_text = split_file[-2]
            date_time_parts = (int(part) for part in date_time_text.split('-'))
            date_time = datetime(*date_time_parts)  # spreading is just right here
            if file_type in maximums:
                maximums[file_type] = max(date_time, maximums[file_type])
            else:
                maximums[file_type] = date_time
    return maximums

print(get_newest_too())
Edit: From the comments, it became clear that you had trouble selecting the actual file of each specific type for which the date was the maximum for that type.
Here's how to do that:
from pathlib import Path
from datetime import datetime
def get_newest():
maxs = {}
for file in Path(r'./examples').iterdir():
if file.is_file():
*_, t, d, _ = file.stem.split('_')
d = datetime(*map(int, d.split('-')))
maxs[t] = (d, file) if t not in maxs else max((d, file), maxs[t])
return {f: d for _, (d, f) in maxs.items()}
print(get_newest())
Result:
{WindowsPath('examples/Blah_Blah_KL_2023-11-01_003006.txt'): datetime.datetime(2023, 11, 1, 0, 0), WindowsPath('examples/Blah_Blah_NP_2022-11-02_003051.txt'): datetime.datetime(2022, 11, 2, 0, 0)}
You could construct another dict containing only the items you need:
file_dict_NP = {key:value for key, value in file_dict.items() if 'NP' in key}
And then do the same thing on it:
newest_NP = max(file_dict_NP, key=file_dict_NP.get)
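Put together on a made-up file_dict (filenames assumed to follow the pattern in the question):

```python
file_dict = {
    "Blah_Blah_NP_2022-11-01_003006.xlsx": "2022-11-01",
    "Blah_Blah_NP_2022-11-02_003051.xlsx": "2022-11-02",
    "Blah_Blah_KL_2022-11-01_003006.xlsx": "2022-11-01",
}

# keep only the NP entries, then take the key with the greatest date value
file_dict_NP = {key: value for key, value in file_dict.items() if 'NP' in key}
newest_NP = max(file_dict_NP, key=file_dict_NP.get)
print(newest_NP)  # Blah_Blah_NP_2022-11-02_003051.xlsx
```
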

Finding all possible permutations of a hash when given list of grouped elements

Best way to show what I'm trying to do:
I have a list of different hashes that consist of ordered elements separated by an underscore. Each element may or may not have other possible replacement values. I'm trying to generate a list of all possible combinations of a hash, after taking into account replacement values.
Example:
grouped_elements = [["1", "1a", "1b"], ["3", "3a"]]
original_hash = "1_2_3_4_5"
I want to be able to generate a list of the following hashes:
[
"1_2_3_4_5",
"1a_2_3_4_5",
"1b_2_3_4_5",
"1_2_3a_4_5",
"1a_2_3a_4_5",
"1b_2_3a_4_5",
]
The challenge is that this'll be needed on large dataframes.
So far here's what I have:
def return_all_possible_hashes(df, grouped_elements):
    rows_to_append = []
    for grouped_element in grouped_elements:
        for index, row in df[
            df["hash"].str.contains("|".join(grouped_element))
        ].iterrows():
            (element_used_in_hash,) = set(grouped_element) & set(row["hash"].split("_"))
            hash_used = row["hash"]
            replacement_elements = set(grouped_element) - set([element_used_in_hash])
            for replacement_element in replacement_elements:
                row["hash"] = hash_used.replace(
                    element_used_in_hash, replacement_element
                )
                rows_to_append.append(row)
    return df.append(rows_to_append)
But the problem is that this will only append hashes with all combinations of a given grouped_element, and not all combinations of all grouped_elements at the same time. So using the example above, my function would return:
[
"1_2_3_4_5",
"1a_2_3_4_5",
"1b_2_3_4_5",
"1_2_3a_4_5",
]
I feel like I'm not far from the solution, but I also feel stuck, so any help is much appreciated!
If you make a list of the original hash value's elements and replace each element with a list of all its possible variations, you can use itertools.product to get the Cartesian product across these sublists. Transforming each element of the result back to a string with '_'.join() will get you the list of possible hashes:
from itertools import product

def possible_hashes(original_hash, grouped_elements):
    hash_list = original_hash.split('_')
    variations = list(set().union(*grouped_elements))
    var_list = hash_list.copy()
    for i, h in enumerate(hash_list):
        if h in variations:
            for g in grouped_elements:
                if h in g:
                    var_list[i] = g
                    break
        else:
            var_list[i] = [h]
    return ['_'.join(h) for h in product(*var_list)]

possible_hashes("1_2_3_4_5", [["1", "1a", "1b"], ["3", "3a"]])
['1_2_3_4_5',
'1_2_3a_4_5',
'1a_2_3_4_5',
'1a_2_3a_4_5',
'1b_2_3_4_5',
'1b_2_3a_4_5']
To use this function on various original hash values stored in a dataframe column, you can do something like this:
df['hash'].apply(lambda x: possible_hashes(x, grouped_elements))
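The core of the approach, in isolation: replace each position with its list of alternatives (a singleton list if there are none) and take the Cartesian product:

```python
from itertools import product

# each sublist holds all possible values for that position in the hash
var_list = [['1', '1a', '1b'], ['2'], ['3', '3a'], ['4'], ['5']]
hashes = ['_'.join(p) for p in product(*var_list)]
print(hashes[0])    # 1_2_3_4_5
print(len(hashes))  # 6 (= 3 * 1 * 2 * 1 * 1)
```
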

How to sort a list of strings delimited by '.' with also numbers in the middle?

I have a list of strings that contain commands separated by a dot . like this:
DeviceA.CommandA.1.Hello,
DeviceA.CommandA.2.Hello,
DeviceA.CommandA.11.Hello,
DeviceA.CommandA.3.Hello,
DeviceA.CommandB.1.Hello,
DeviceA.CommandB.1.Bye,
DeviceB.CommandB.What,
DeviceA.SubdeviceA.CommandB.1.Hello,
DeviceA.SubdeviceA.CommandB.2.Hello,
DeviceA.SubdeviceB.CommandA.1.What
And I would want to order them in natural order:
The order must prioritize by field index (e.g. the commands that start with DeviceA will always go before DeviceB, etc.)
Order the string fields alphabetically
When a field is a number, sort it numerically in ascending order
Therefore, the sorted output should be:
DeviceA.CommandA.1.Hello,
DeviceA.CommandA.2.Hello,
DeviceA.CommandA.3.Hello,
DeviceA.CommandA.11.Hello,
DeviceA.CommandB.1.Bye,
DeviceA.CommandB.1.Hello,
DeviceA.SubdeviceA.CommandB.1.Hello,
DeviceA.SubdeviceA.CommandB.2.Hello,
DeviceA.SubdeviceB.CommandA.1.What,
DeviceB.CommandB.What
Also note that the length of the command fields is dynamic, the number of fields separated by dot can be any size.
So far I tried this without luck (the numbers are ordered alphabetically; for example, 11 goes before 5):
list = [
"DeviceA.CommandA.1.Hello",
"DeviceA.CommandA.2.Hello",
"DeviceA.CommandA.11.Hello",
"DeviceA.CommandA.3.Hello",
"DeviceA.CommandB.1.Hello",
"DeviceA.CommandB.1.Bye",
"DeviceB.CommandB.What",
"DeviceA.SubdeviceA.CommandB.1.Hello",
"DeviceA.SubdeviceA.CommandB.2.Hello",
"DeviceA.SubdeviceB.CommandA.1.What"
]
sorted_list = sorted(list, key=lambda x: x.split('.'))
EDIT: Corrected typo error.
Something like this should get you going.
from pprint import pprint
data_list = [
"DeviceA.CommandA.1.Hello",
"DeviceA.CommandA.2.Hello",
"DeviceA.CommandA.3.Hello",
"DeviceA.CommandB.1.Hello",
"DeviceA.CommandB.1.Bye",
"DeviceB.CommandB.What",
"DeviceA.SubdeviceA.CommandB.1.Hello",
"DeviceA.SubdeviceA.CommandB.15.Hello", # added test case to ensure numbers are sorted numerically
"DeviceA.SubdeviceA.CommandB.2.Hello",
"DeviceA.SubdeviceB.CommandA.1.What",
]
def get_sort_key(s):
    # Turning the pieces into integers would fail some comparisons (1 vs "What"),
    # so instead pad them on the left to a suitably long string
    return [
        bit.rjust(30, "0") if bit.isdigit() else bit
        for bit in s.split(".")
    ]

# Note the key function must be passed as a kwarg.
sorted_list = sorted(data_list, key=get_sort_key)
pprint(sorted_list)
The output is
['DeviceA.CommandA.1.Hello',
'DeviceA.CommandA.2.Hello',
'DeviceA.CommandA.3.Hello',
'DeviceA.CommandB.1.Bye',
'DeviceA.CommandB.1.Hello',
'DeviceA.SubdeviceA.CommandB.1.Hello',
'DeviceA.SubdeviceA.CommandB.2.Hello',
'DeviceA.SubdeviceA.CommandB.15.Hello',
'DeviceA.SubdeviceB.CommandA.1.What',
'DeviceB.CommandB.What']
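An alternative to left-padding is a key of (kind, value) tuples, so numeric fields compare as integers and always sort ahead of word fields at the same position; a sketch:

```python
def natural_key(s):
    # (0, int) tuples sort before (1, str) tuples, and integers compare numerically
    return [(0, int(bit)) if bit.isdigit() else (1, bit) for bit in s.split('.')]

commands = ['DeviceA.CommandA.11.Hello',
            'DeviceA.CommandA.2.Hello',
            'DeviceA.CommandA.1.Hello']
print(sorted(commands, key=natural_key))
```
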
Specifying a key in sorted seems to achieve what you want:
import re

def my_key(s):
    n = re.search(r"\d+", s)
    return (s[:n.span()[0]], int(n[0])) if n else (s,)

print(sorted(l, key=my_key))
Output:
['DeviceA.CommandA.1.Hello', 'DeviceA.CommandA.2.Hello', 'DeviceA.CommandA.3.Hello', 'DeviceA.CommandA.11.Hello', 'DeviceA.CommandB.1.Hello', 'DeviceA.CommandB.1.Bye', 'DeviceA.SubdeviceA.CommandB.1.Hello', 'DeviceA.SubdeviceA.CommandB.2.Hello', 'DeviceA.SubdeviceB.CommandA.1.What', 'DeviceB.CommandB.What']
There are many ways to achieve this. Here's one that doesn't rely on importing any additional modules:
LOS = ['DeviceA.CommandA.1.Hello',
'DeviceA.CommandA.2.Hello',
'DeviceA.CommandA.11.Hello',
'DeviceA.CommandA.3.Hello',
'DeviceA.CommandB.1.Hello',
'DeviceA.CommandB.1.Bye',
'DeviceB.CommandB.What',
'DeviceA.SubdeviceA.CommandB.1.Hello',
'DeviceA.SubdeviceA.CommandB.2.Hello',
'DeviceA.SubdeviceB.CommandA.1.What']
def func(s):
    tokens = s.split('.')
    for i, token in enumerate(tokens):
        try:
            v = int(token)
            return ('.'.join(tokens[0:i]), v)
        except ValueError:
            pass
    return (s, 0)

print(sorted(LOS, key=func))

How to zip generated lists when random lists return empty, using itertools and fillvalue?

I am trying to zip together multiple lists that contain 10 values. The lists are created by an iterator. Sometimes, the lists generated contain less than 10 values or even 0 values. Thus, I sometimes run into the problem of trying to zip together a list of 10 values with a list of 0 values, or even a list of 0 values with another list of 0 values. I am trying to get python to recognize a list with 0 values and to then fill that list with 0s. This is what I have (the second URL is the problem):
import grequests
import json
import time
import itertools
urls3 = [
#'https://api.livecoin.net/exchange/order_book?currencyPair=RBIES/BTC&depth=5',
'https://api.livecoin.net/exchange/order_book?currencyPair=REE/BTC&depth=5',
#'https://api.livecoin.net/exchange/order_book?currencyPair=RLT/BTC&depth=5',
]
requests = (grequests.get(u) for u in urls3)
responses = grequests.map(requests)
#CellRange("B28:DJ48").clear()
def make_column(catalog_response, name):
    column = []
    catalog1 = list(itertools.izip_longest(catalog_response.json()[name][0:5], fillvalue='0 '))
    #catalog1 = catalog_response.json()[name][0:5]
    print(catalog1)
    #quantities1, rates1 = list(itertools.izip_longest(*catalog1, fillvalue='0.0001'))  # uncomment for print #2
    #quantities1, rates1 = zip(*catalog1)  # uncomment for print #2
    print(quantities1)
Printing out catalog1 for only the second link results in the following output:
[]
[([u'0.00000001', u'9907729.00000000'],), ([u'0.00000001', u'44800.00000000'],), ([u'0.00000002', u'8463566.49169284'],), ([u'0.00000002', u'3185222.59932121'],), ([u'0.00000002', u'25000.00000000'],)]
As you can see, the first array prints [], meaning it's empty. This does not make sense to me. I did a trial run with a simpler example of what I am trying to attempt, and it worked just fine:
import itertools
list1 = ['a', 'b', 'c', 'd', 'e']
list2 = []
print list(itertools.izip_longest(list1,list2, fillvalue='0'))
This output the following:
[('a', '0'), ('b', '0'), ('c', '0'), ('d', '0'), ('e', '0')]
I thought that maybe running
column = []
catalog1 = list(itertools.izip_longest(catalog_response.json()[name][0:5], fillvalue='0 '))
#catalog1 = catalog_response.json()[name][0:5]
#print(catalog1)
quantities1, rates1 = list(itertools.izip_longest(*catalog1, fillvalue='0'))  # uncomment for print #2
#quantities1, rates1 = zip(*catalog1)  # uncomment for print #2
print(quantities1)
might fix the issue. But it returns the following error: ValueError: need more than 0 values to unpack. I cannot seem to figure out why the empty array is not being filled with zeros like my simpler example. In reality, any method that would populate the empty array with a tupled list of zeros would work for me. I apologize if this is unclear, I am brand new to coding, and I have spent a considerable amount of time on this project and feel like I am getting lost in the weeds. Any help is appreciated.
Note: This question is directly related to my other question at How do I get my DataNitro table to either skip over failed iterations or print none to the table? but I felt like the two questions, though sharing the same end, are distinct.
lst2 = [[u'0', u'0'], [u'0', u'0'], [u'0', u'0'], [u'0', u'0'], [u'0', u'0']]
catalog1 = catalog_response.json()[name][0:5]
S = catalog1 + lst2
quantities1, rates1 = zip(*S)
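The idea behind that answer, padding the (possibly empty) list out to a fixed length before unpacking, can be sketched on its own with made-up order-book rows; note the snippet above does not truncate, so this version adds a [:5] slice to keep exactly five pairs:

```python
padding = [['0', '0']] * 5              # five placeholder [rate, quantity] pairs
catalog1 = [['0.00000001', '9907729.0'],
            ['0.00000002', '44800.0']]  # pretend the API returned only two rows

padded = (catalog1 + padding)[:5]       # always exactly five pairs, even if catalog1 == []
quantities1, rates1 = zip(*padded)
print(quantities1)  # ('0.00000001', '0.00000002', '0', '0', '0')
```
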

difflib.SequenceMatcher not returning unique ratio

I am trying to compare 2 street networks, and when I run this code it returns a ratio of 0.253529... I need it to compare each row to get a unique value so I can query out the streets that don't match. What can I do to get it to return unique ratio values per row?
# Set local variables
inFeatures = gp.GetParameterAsText(0)
fieldName = gp.GetParameterAsText(1)
fieldName1 = gp.GetParameterAsText(2)
fieldName2 = gp.GetParameterAsText(3)
expression = difflib.SequenceMatcher(None,fieldName1,fieldName2).ratio()
# Execute CalculateField
arcpy.CalculateField_management(inFeatures, fieldName, expression, "PYTHON_9.3")
If you know both files always have the exact same number of lines, a simple approach like this would work:
import difflib

ratios = []
with open('fieldName1', 'r') as f1, open('fieldName2', 'r') as f2:
    for l1, l2 in zip(f1, f2):
        R = difflib.SequenceMatcher(None, l1, l2).ratio()
        ratios.append((l1, l2, R))
This will produce a list of tuples like this:
[("aa", "aa", 1), ("aa", "ab", 0.5), ...]
If your files are different sizes, you'll need to find some way to match up the lines, or otherwise handle the mismatch.
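The same per-row idea works on in-memory lists of street names (made-up data), which may be closer to the arcpy field setup than reading files:

```python
import difflib

streets1 = ['MAIN ST', 'OAK AVE', '1ST ST']
streets2 = ['MAIN ST', 'OAK AVENUE', '2ND ST']

# one ratio per row instead of one ratio for the whole field
ratios = [(a, b, difflib.SequenceMatcher(None, a, b).ratio())
          for a, b in zip(streets1, streets2)]
mismatched = [(a, b) for a, b, r in ratios if r < 1.0]
print(mismatched)  # the street pairs that don't match exactly
```
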
