I am trying to write Python code that solves a Sudoku puzzle. My code starts by making a list of each row/column combination, i.e. the coordinates of each box. Next, I want a way to reference each box's location.
This is my current code:
boxes = []
for i in range(1, 10):
    for x in range(1, 10):
        boxes = boxes + ['r' + str(i) + 'c' + str(x)]
for box in boxes:
Next, I was going to create a dictionary for each one, but I would want each to be named by the list item. The dictionaries would be, for example, r1c1 = {'row': '1', 'Column': 1}.
What is the best way to separate and store this information?
You don't need to create all those dictionaries. You already have your coordinates, just don't lock them up in strings:
boxes = []
for i in range(1, 10):
    for x in range(1, 10):
        boxes.append((i, x))
would create a list of (row, column) tuples instead, and you then wouldn't have to map them back.
Even if you needed to associate strings with data, you could do so in a nested dictionary:
coordinates = {
    'r1c1': {'row': 1, 'column': 1},
    # ...
}
but you could also parse that string and extract the numbers after r and c to produce the row and column numbers again.
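For example, a minimal sketch of that parsing (the helper name parse_box is my own, and the regex assumes keys shaped like 'r1c1'):

import re

def parse_box(name):
    # extract the numbers after 'r' and 'c'; handles multi-digit numbers too
    row, col = map(int, re.match(r"r(\d+)c(\d+)", name).groups())
    return row, col

print(parse_box('r1c1'))  # (1, 1)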
In fact, I once wrote a Sudoku checker on the same principles; in the following code, block_indices, per9() and zip(*per9(s)) produce the indices for each block, row or column of a puzzle, letting you verify that you have 9 unique values in each. The only difference is that instead of a matrix I used one long list to represent a puzzle, with all rows concatenated in sequence:
from itertools import product

block_indices = [[x + y + s for s in (0, 1, 2, 9, 10, 11, 18, 19, 20)]
                 for x, y in product(range(0, 81, 27), range(0, 9, 3))]

def per9(iterable):
    # group iterable in chunks of 9
    return zip(*([iter(iterable)] * 9))

def is_valid_sudoku(s):
    return (
        # rows
        all(len(set(r)) == 9 for r in per9(s)) and
        # columns
        all(len(set(c)) == 9 for c in zip(*per9(s))) and
        # blocks
        all(len(set(s[i] for i in ix)) == 9 for ix in block_indices)
    )
So row 1, column 4 (counting rows and columns from 0) is 1 * 9 + 4 = index 13 in the flat list.
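If it helps, a small sketch of converting between (row, column) and flat indices under that 0-based convention (the helper names are mine):

def to_index(row, col):      # 0-based row and column
    return row * 9 + col

def to_coords(index):
    return divmod(index, 9)  # (row, column)

print(to_index(1, 4))  # 13
print(to_coords(13))   # (1, 4)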
While Martijn's answer is probably better from a "what you should do" perspective, for completeness you could build that structure pretty easily using a dictionary comprehension. The annotated code below will output your desired data structure:
boxes = {
    "r%sc%s" % (i, j):           # build the keys in the form "r1c2"
        {'row': i, "column": j}  # build the dictionary of values - {'row': 1, 'column': 2}
    for i in range(1, 10)        # first level of loop
    for j in range(1, 10)        # second level of loop
}
print(boxes)
This will output in your desired format:
{ 'r1c1': { 'column': 1, 'row': 1},
'r1c2': { 'column': 2, 'row': 1},
'r1c3': { 'column': 3, 'row': 1},
'r1c4': { 'column': 4, 'row': 1},
....
}
I have a dictionary whose keys look something like this:
CC-1A
CC-1B
CC-1C
CC-3A
CC-3B
CC-5A
CC-7A
CC-7B
CC-7D
SS-1A
SS-1B
SS-1C
SS-3A
SS-3B
SS-5A
SS-5B
lst = ['CC-1A', 'CC-1B', 'CC-1C', 'CC-3A', 'CC-3B', 'CC-5A', 'CC-7A', 'CC-7B',
       'CC-7D', 'SS-1A', 'SS-1B', 'SS-1C', 'SS-3A', 'SS-3B', 'SS-5A', 'SS-5B']
d = dict.fromkeys(lst)
(Not exactly in this order; in fact the keys are placed in the dictionary in arbitrary order.)
Now, I want to sort them. If I use the built-in sorted function on the dictionary, it sorts all the keys according to the order given above.
However, I want the dictionary to be first sorted based upon the values after the - sign (i.e. 1A, 1B, 1C etc.) and then based upon the first two characters.
So, for the values given above, following would be my sorted list,
CC-1A
CC-1B
CC-1C
SS-1A
SS-1B
SS-1C
CC-3A
CC-3B
SS-3A
SS-3B
CC-5A
and so on
First, sorting is done based upon the "4th" character in the keys. (that is, 1, 3, etc.)
Then sorting is done based upon the last character (i.e. A, B etc.)
Then sorting is done based upon the first two characters of the keys (i.e. CC, SS etc.)
Is there any way to achieve this?
Your "wanted" and your sorting description deviate.
Your "wanted" can be achieved by
di = {"CC-1A":"value1","CC-1A":"value2","CC-1B":"value3",
"CC-1C":"value4","CC-3A":"value5","CC-3B":"value6",
"CC-5A":"value7","CC-7A":"value8","CC-7B":"value9",
"CC-7D":"value0","SS-1A":"value11","SS-1B":"value12",
"SS-1C":"value13","SS-3A":"value14","SS-3B":"value15",
"SS-5A":"value16","SS-5B":"value17"}
print(*((v,di[v]) for v in sorted(di, key= lambda x: (x[3], x[:2], x[4]) )),
sep="\n")
to get
('CC-1A', 'value1')
('CC-1B', 'value3')
('CC-1C', 'value4')
('SS-1A', 'value11')
('SS-1B', 'value12')
('SS-1C', 'value13')
('CC-3A', 'value5')
('CC-3B', 'value6')
('SS-3A', 'value14')
('SS-3B', 'value15')
('CC-5A', 'value7')
('SS-5A', 'value16')
('SS-5B', 'value17')
('CC-7A', 'value8')
('CC-7B', 'value9')
('CC-7D', 'value0')
which sorts by number (position 4, 1-based), then by prefix (positions 1 and 2), then by letter (position 5),
but that conflicts with
First, sorting is done based upon the "4th" character in the keys.
(that is, 1, 3, etc.)
Then sorting is done based upon the last character (i.e. A, B etc.)
Then sorting is done based upon the first two characters of the keys
(i.e. CC, SS etc.)
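If the quoted description is what is actually wanted, a sketch of a key for that order (number, then letter, then prefix), reusing the di dictionary above, would be:

# hypothetical alternative: number (pos 4), letter (pos 5), prefix (pos 1-2)
print(*sorted(di, key=lambda x: (x[3], x[4], x[:2])), sep="\n")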
One suggestion is to use a nested dictionary, so instead of:
my_dict = {'CC-1A1': 2,
           'CC-1A2': 3,
           'CC-1B': 1,
           'CC-1C': 5,
           'SS-1A': 33,
           'SS-1B': 23,
           'SS-1C': 31,
           'CC-3A': 55,
           'CC-3B': 222,
           }
you would have something like:
my_dict = {'CC': {'1A1': 2, '1A2': 3, '1B': 1, '1C': 5, '3A': 55, '3B': 222},
           'SS': {'1A': 33, '1B': 23, '1C': 31}
           }
which would allow you to sort first based on the leading number/characters and then by group. (Actually I think you want this concept reversed based on your question).
Then you can create two lists with your sorted keys/values by doing something like:
top_keys = sorted(my_dict)
keys_sorted = []
values_sorted = []
for key in top_keys:
    keys_sorted.append([f"{key}-{k}" for k in my_dict[key].keys()])
    values_sorted.append([v for v in my_dict[key].values()])
flat_keys = [key for sublist in keys_sorted for key in sublist]
flat_values = [value for sublist in values_sorted for value in sublist]
Otherwise, you'd have to implement a custom sorting key based first on the characters after the - and subsequently on the initial characters.
You can write a function to build a sorting key that will make the required decomposition of the key strings and return a tuple to sort by. Then use that function as the key= parameter of the sorted function:
D = {'CC-1A': 0, 'CC-1B': 1, 'CC-1C': 2, 'CC-3A': 3, 'CC-3B': 4,
'CC-5A': 5, 'CC-7A': 6, 'CC-7B': 7, 'CC-7D': 8, 'SS-1A': 9,
'SS-1B': 10, 'SS-1C': 11, 'SS-3A': 12, 'SS-3B': 13, 'SS-5A': 14,
'SS-5B': 15}
def sortKey(s):
    L, R = s.split("-", 1)
    return (R[:-1], L)

D = {k: D[k] for k in sorted(D.keys(), key=sortKey)}
print(D)
{'CC-1A': 0,
'CC-1B': 1,
'CC-1C': 2,
'SS-1A': 9,
'SS-1B': 10,
'SS-1C': 11,
'CC-3A': 3,
'CC-3B': 4,
'SS-3A': 12,
'SS-3B': 13,
'CC-5A': 5,
'SS-5A': 14,
'SS-5B': 15,
'CC-7A': 6,
'CC-7B': 7,
'CC-7D': 8}
If you expect the numbers to eventually go beyond 9 and want a numerical order, then right justify the R part in the tuple: e.g. return (R[:-1].rjust(10),L)
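An alternative sketch (my variant, not part of the original answer) that avoids guessing a justification width by comparing the numeric part as an integer:

def sortKeyNumeric(s):
    L, R = s.split("-", 1)
    return (int(R[:-1]), L)  # e.g. "CC-12A" -> (12, "CC")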
You could use a custom function that implements your rule as sorting key:
def get_order(tpl):
    s = tpl[0].split('-')
    return (s[1][0], s[0], s[1][1])

out = dict(sorted(d.items(), key=get_order))
Output:
{'CC-1A': None, 'CC-1B': None, 'CC-1C': None, 'SS-1A': None, 'SS-1B': None, 'SS-1C': None, 'CC-3A': None, 'CC-3B': None, 'SS-3A': None, 'SS-3B': None, 'CC-5A': None, 'SS-5A': None, 'SS-5B': None, 'CC-7A': None, 'CC-7B': None, 'CC-7D': None}
Given the following function, what would be the correct and Pythonic way of achieving the same (and faster) result?
My code is not efficient, and I believe I'm missing something that is staring right at me.
The idea is to find a pattern that is [[A,B],[A,C],[C,B]] without having to generate additional permutations (since this will result in a higher processing time for the comparisons).
The length of the dictionary fed into find_path in real-life would be approximately 10,000, so having to iterate over that amount with the current code version below is not efficient.
from time import perf_counter
from typing import List, Generator, Dict
def find_path(data: Dict) -> Generator:
    for first_pair in data:
        pair1: List[str] = first_pair.split("/")
        for second_pair in data:
            pair2: List[str] = second_pair.split("/")
            if pair2[0] == pair1[0] and pair2[1] != pair1[1]:
                for third_pair in data:
                    pair3: List[str] = third_pair.split("/")
                    if pair3[0] == pair2[1] and pair3[1] == pair1[1]:
                        amount_pair_1: int = data.get(first_pair)["amount"]
                        id_pair_1: int = data.get(first_pair)["id"]
                        amount_pair_2: int = data.get(second_pair)["amount"]
                        id_pair_2: int = data.get(second_pair)["id"]
                        amount_pair_3: int = data.get(third_pair)["amount"]
                        id_pair_3: int = data.get(third_pair)["id"]
                        yield (
                            pair1, amount_pair_1, id_pair_1,
                            pair2, amount_pair_2, id_pair_2,
                            pair3, amount_pair_3, id_pair_3,
                        )
raw_data = {
    "EZ/TC": {"id": 1, "amount": 9},
    "LM/TH": {"id": 2, "amount": 8},
    "CD/EH": {"id": 3, "amount": 7},
    "EH/TC": {"id": 4, "amount": 6},
    "LM/TC": {"id": 5, "amount": 5},
    "CD/TC": {"id": 6, "amount": 4},
    "BT/TH": {"id": 7, "amount": 3},
    "BT/TX": {"id": 8, "amount": 2},
    "TX/TH": {"id": 9, "amount": 1},
}
processed_data = list(find_path(raw_data))
for i in processed_data:
    print(("The path to traverse is:", i))
>> ('The path to traverse is:', (['CD', 'TC'], 4, 6, ['CD', 'EH'], 7, 3, ['EH', 'TC'], 6, 4))
>> ('The path to traverse is:', (['BT', 'TH'], 3, 7, ['BT', 'TX'], 2, 8, ['TX', 'TH'], 1, 9))
>> ('Time to complete', 5.748599869548343e-05)
# Timing for a simple ref., as mentioned above, the raw_data is a dict containing about 10,000 keys
You can't do that efficiently with this representation of the graph; this algorithm has O(|E|^3) time complexity. It is a better idea to store the graph as an adjacency structure: for each vertex, a list of only its adjacent vertices. Then it is easy to do what you need. Fortunately, you can re-represent the graph in O(|E|) time.
How to do that
We will store the graph as an array of vertices (but here, because the vertex values are strings, we use a dictionary). We want to be able to reach all neighbours of a vertex, so for each vertex we store the list of all its neighbours.
Now we just need to construct our structure from the set of edges (aka raw_data).
How do we add an edge to the graph? Easy! We find the "from" vertex in our structure and append the "to" vertex to the list of its neighbours.
So, the construct_graph function could be like:
from collections import defaultdict

def construct_graph(raw_data):  # here we change the representation
    graph = defaultdict(list)   # our graph
    for pair in raw_data:       # go through every edge
        u, v = pair.split("/")  # get the "from" and "to" vertices
        graph[u].append(v)      # and record this edge in our structure
    return graph                # return the new graph to other functions
How to find a path of length 2
We will use DFS (depth-first search) on our graph.
def dfs(g, u, dist):               # a simple recursive dfs function
    if dist == 2:                  # we are 'dist' edges from our start
        return [u]                 # if we reached the target depth, return the path end
    for v in g.get(u, []):         # otherwise check all neighbours of the current vertex
        ans = dfs(g, v, dist + 1)  # run dfs on every neighbour with dist + 1
        if ans:                    # and if that dfs found something,
            ans.append(u)          # add the current vertex to the answer
            return ans             # and return it
    return []                      # otherwise we found nothing
And then we just try it for every vertex.
def main():
    graph = construct_graph(raw_data)
    for v in graph.keys():              # try to find a path from every vertex
        ans = dfs(graph, v, 0)          # starting with dist 0
        if ans:                         # and if we found something,
            print(list(reversed(ans)))  # print it (the answer comes back reversed)
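For completeness, a sketch of running this on the question's raw_data (the exact lines printed assume Python 3.7+ insertion-ordered dicts):

if __name__ == "__main__":
    main()

# prints, for the raw_data above:
# ['CD', 'EH', 'TC']
# ['BT', 'TX', 'TH']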
I have a function to find the common and uncommon items and their rates between a given list (one list) and other lists (60,000 lists) for each user (4,000 users). Running the loop below takes too long and uses too much memory; it crashes partway through constructing the list. I think this is due to the long returned list and its heavy elements (tuples), so I divided it into two functions as below, but the problem seems to be in appending the list items in the tuple, [(user, [items], rate), (user, [items], rate), ....]. I want to create dataframes from the returned values.
What should I change in the algorithm to get around this and reduce memory usage?
I am using Python 3.7, Windows 10 64-bit, 8 GB RAM.
common items function:
def common_items(user, list1, list2):
    com_items = list(set(list1).intersection(set(list2)))
    com_items_rate = len(com_items) / len(set(list1).union(set(list2)))
    return user, com_items, com_items_rate
uncommon items function:
def uncommon_items(user, list1, list2):
    com_items = list(set(list1).intersection(set(list2)))
    com_items_rate = len(com_items) / len(set(list1).union(set(list2)))
    uncom_items = list(set(list2) - set(com_items))  # uncommon items that belong to list2
    uncom_items_rate = len(uncom_items) / len(set(list1).union(set(list2)))
    return user, com_items_rate, uncom_items, uncom_items_rate  # common_items_rate is also needed
Constructing the list:
common_item_rate_tuple_list = []
for usr in users:  # users.shape = 4,000
    list1 = get_user_list(usr)  # a function to get list1, it takes 0:00:00.015632 or less for a user
    # print(usr, len(list1))
    for list2 in df["list2"]:  # df.shape = 60,000
        common_item_rate_tuple = common_items(usr, list1, list2)
        common_item_rate_tuple_list.append(common_item_rate_tuple)
print(len(common_item_rate_tuple_list))  # 4,000 * 60,000 = 240,000,000 items

# sample of common_item_rate_tuple_list:
# [(1, [2, 5, 8], 0.676), (1, [7, 4], 0.788), ...., (4000, [1, 5, 7, 9], 0.318), (4000, [8, 9, 6], 0.521)]
I looked at (Memory errors and list limits?) and (Memory error when appending to list in Python); they deal with the constructed list. And I could not make the suggested answer for (Python list memory error) work.
There are a couple things you should consider for speed and memory management with data this big.
You are, or should be, working only with sets here, because order has no meaning in your lists and you are doing a lot of intersecting of sets. So, can you change your get_user_list() function to return a set instead of a list? That will prevent all of the unnecessary conversions you are doing. Same for list2: just make a set right away.
In your loop for "uncommon items" you should just use the symmetric difference operator on the sets: much faster, and many fewer list -> set conversions (see the short sketch after this list).
At the end of your loop, do you really want to create a list of 240M sub-lists? That is probably your memory explosion. I would suggest a dictionary keyed by user name, and you only need to create an entry in it if there are common items. If the matches are "sparse", you will get a very much smaller data container.
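As a quick illustration of those set operators (a minimal sketch with made-up values):

s1 = {2, 5, 8}
s2 = {4, 5, 7, 8}
print(s1 & s2)  # intersection: {8, 5}
print(s1 ^ s2)  # symmetric difference: {2, 4, 7}
print(s2 - s1)  # one-sided difference, as in the original uncommon_items: {4, 7}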
--- Edit w/ example
So I think your hope of keeping it in a data frame is too big. Perhaps you can do what is needed without storing it in a data frame. Dictionary makes sense. You may even be able to compute things "on the fly" and not store the data. Anyhow. Here is a toy example that shows the memory problem using 4K users and 10K "other lists". Of course the size of the intersected sets may make this vary, but it is informative:
import sys
import pandas as pd

# create list of users by index
users = list(range(4000))

match_data = list()
size_list2 = 10_000
for user in users:
    for t in range(size_list2):
        match_data.append((user, (1, 5, 6, 9), 0.55))  # 4 dummy matches and fake percentage

print(match_data[:4])
print(f'size of match: {sys.getsizeof(match_data)/1_000_000} MB')
df = pd.DataFrame(match_data)
print(df.head())
print(f'size of dataframe {sys.getsizeof(df)/1_000_000} MB')
This yields the following:
[(0, (1, 5, 6, 9), 0.55), (0, (1, 5, 6, 9), 0.55), (0, (1, 5, 6, 9), 0.55), (0, (1, 5, 6, 9), 0.55)]
size of match: 335.072536 MB
   0             1     2
0  0  (1, 5, 6, 9)  0.55
1  0  (1, 5, 6, 9)  0.55
2  0  (1, 5, 6, 9)  0.55
3  0  (1, 5, 6, 9)  0.55
4  0  (1, 5, 6, 9)  0.55
size of dataframe 3200.00016 MB
You can see that a nutshell of your idea for only 10K other lists is 3.2GB in a dataframe. This will be unmanageable.
Here is an idea for a data structure just to use dictionaries all the way.
del df

# just keep it in a dictionary
data = {}  # intended format: key= (usr, other_list) : value= [common elements]

# some fake data
user_items = {1: {2, 3, 5, 7, 99},
              2: {3, 5, 88, 790},
              3: {2, 4, 100}}

# some fake "list 2" data
list2 = [{1, 2, 3, 4, 5},
         {88, 100},
         {11, 13, 200}]

for user in user_items.keys():
    for idx, other_set in enumerate(list2):  # using enumerate to get the index of the other list
        common_elements = user_items.get(user) & other_set  # set intersection
        if common_elements:  # only put it into the dictionary if it is not empty
            data[(user, idx)] = common_elements

# try a couple data pulls
print(f'for user 1 and other list 0: {data.get((1, 0))}')
print(f'for user 2 and other list 2: {data.get((2, 2))}')  # use .get() to be safe. It will return None if no entry
The output here is:
for user 1 and other list 0: {2, 3, 5}
for user 2 and other list 2: None
Your other alternative if you are going to be working with this data a lot is just to put these tables into a database like sqlite which is built in and won't bomb out your memory.
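As a rough sketch of that last idea, reusing the data dictionary from above (the matches.db filename and table layout are made up for illustration):

import sqlite3

conn = sqlite3.connect("matches.db")  # file-backed, so it won't exhaust RAM
conn.execute("CREATE TABLE IF NOT EXISTS matches (user INTEGER, list_idx INTEGER, common TEXT)")
# store each non-empty match as one row; the common set is serialized as text here
conn.executemany("INSERT INTO matches VALUES (?, ?, ?)",
                 [(user, idx, repr(sorted(common)))
                  for (user, idx), common in data.items()])
conn.commit()
print(conn.execute("SELECT * FROM matches WHERE user = 1").fetchall())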
Problem
Given a sequence (list or numpy array) of 1's and 0's, how can I count the lengths of the contiguous runs of each value? I want to return a JSON-like dictionary of dictionaries.
Example
[0, 0, 1, 1, 0, 1, 1, 1, 0, 0] would return
{
    0: {
        1: 1,
        2: 2
    },
    1: {
        2: 1,
        3: 1
    }
}
Tried
This is the function I have so far
def foo(arr):
    prev = arr[0]
    count = 1
    lengths = dict.fromkeys(arr, {})
    for i in arr[1:]:
        if i == prev:
            count += 1
        else:
            if count in lengths[prev].keys():
                lengths[prev][count] += 1
            else:
                lengths[prev][count] = 1
            prev = i
            count = 1
    return lengths
It is outputting identical dictionaries for 0 and 1 even though their runs in the list are different, and the function isn't picking up the last run. How can I improve and fix it? Also, does numpy offer any quicker way to solve my problem if my data is in a numpy array? (maybe using np.where(...))
You're suffering from Ye Olde Replication Error. Let's instrument your function to show the problem, adding one line to check the object ID of each dict in the list:
lengths = dict.fromkeys(arr, {})
print(id(lengths[0]), id(lengths[1]))
Output:
140130522360928 140130522360928
{0: {2: 2, 1: 1, 3: 1}, 1: {2: 2, 1: 1, 3: 1}}
The problem is that you gave the same dict as initial value for each key. When you update either of them, you're changing the one object to which they both refer.
Replace it with an explicit loop -- not a mutable function argument -- that will create a new object for each dict entry:
for key in lengths:
    lengths[key] = {}
print(id(lengths[0]), id(lengths[1]))
Output:
139872021765576 139872021765288
{0: {2: 1, 1: 1}, 1: {2: 1, 3: 1}}
Now you have separate objects.
If you want a one-liner, use a dict comprehension:
lengths = {key: {} for key in lengths}
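For completeness, here is a sketch of the function with both fixes applied: fresh dicts per key, plus recording the final run after the loop, which the original missed. The dict.get(count, 0) idiom replaces the explicit membership test:

def foo(arr):
    prev = arr[0]
    count = 1
    lengths = {key: {} for key in arr}  # a distinct dict per key
    for i in arr[1:]:
        if i == prev:
            count += 1
        else:
            lengths[prev][count] = lengths[prev].get(count, 0) + 1
            prev = i
            count = 1
    lengths[prev][count] = lengths[prev].get(count, 0) + 1  # record the last run
    return lengths

print(foo([0, 0, 1, 1, 0, 1, 1, 1, 0, 0]))
# {0: {2: 2, 1: 1}, 1: {2: 1, 3: 1}}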
I have 4 pairs of lists, a and b, with integer values, such as list_1a = [1,2,3,...] and list_1b = [8,11,15,...]. The idea is that the integer values in list_1a are now represented by the integer values in list_1b, and likewise for list_2a and list_2b, etc.
Now I have a 4-column list final_list containing integer values corresponding to the a lists. I want to map the values in final_list to the values in the b lists. What is the quickest way to do this in Python?
Is there a quicker way than using lists?
Edit:
To clarify the question, take the following example:
list_1a = [1,2,3]
list_1b = [8,11,15]
list_2a = [5,6,7,8]
list_2b = [22,26,30,34]
list_3a = [11,12,13,14,18]
list_3b = [18,12,25,28,30]
list_4a = [51,61,72,82]
list_4b = [73,76,72,94]
Note that some of these lists can contain more than a million entries (So maybe memory can be an issue)
The lists do not have the same length
All of the integer values in these lists are unique to their lists, i.e. list_1a + list_1b will never have a repeating integer value.
final_list should look like final_list_b after the mapping occurs
final_list_a = [[1,6,11,51],[3,6,14,72]]
final_list_b = [[8,26,18,73],[15,26,28,72]]
To put things into perspective, this question is for a database application where these "lists" contain auto-generated key values
I think what you want is a dictionary, which associates keys with values. Unless I'm confused about what you want to do here.
So if I make 4 short example lists.
list_1a = [1,2,3,4]
list_1b = [8,11,15,18]
list_2a = [5,6,7,8]
list_2b = [22,26,30,34]
and make them into a big list of all "a" values and all "b" values.
a_list = list_1a + list_2a
b_list = list_1b + list_2b
I can then use zip to merge the lists into a dictionary
my_dict = dict(zip(a_list, b_list))
print(my_dict)
See:
how to merge 2 list as a key value pair in python
for some other ways to do this last bit.
result:
{1: 8, 2: 11, 3: 15, 4: 18, 5: 22, 6: 26, 7: 30, 8: 34}
Now your "a" list makes up the keys of this dictionary.. while the "b" list make up the values. You can access the values by using the keys. here's some examples.
print(my_dict.keys())
print(my_dict.values())
print(my_dict[5])
gives me:
[1, 2, 3, 4, 5, 6, 7, 8]
[8, 11, 15, 18, 22, 26, 30, 34]
22
Is this what you want?
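To get all the way to the question's final_list_b, a sketch reusing the example lists from the question: build one mapping from every a-list to its b-list, then translate final_list_a row by row.

mapping = dict(zip(list_1a + list_2a + list_3a + list_4a,
                   list_1b + list_2b + list_3b + list_4b))
final_list_a = [[1, 6, 11, 51], [3, 6, 14, 72]]
final_list_b = [[mapping[x] for x in row] for row in final_list_a]
print(final_list_b)  # [[8, 26, 18, 73], [15, 26, 28, 72]]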
EDIT: I feel that I should note that while my dictionary has printed in order, dictionaries (before Python 3.7) are not ordered like lists. You might want to look into collections.OrderedDict or sorted if this is important to you.
Update:
For what you want to do, maybe consider nested dictionaries. You can make a dictionary whose values are dictionaries, also note that when 1a and 1b don't match in length, zip doesn't care and just excludes 60:
list_1a = [1,2,3,4]
list_1b = [8,11,15,18,60]
list_2a = [5,6,7,8]
list_2b = [22,26,30,34]
a_dict = dict(zip(list_1a, list_2a))
b_dict = dict(zip(list_1b, list_2b))
my_dict = {"a" : a_dict, "b" : b_dict}
print(my_dict)
Result:
{'a': {1: 5, 2: 6, 3: 7, 4: 8}, 'b': {8: 22, 18: 34, 11: 26, 15: 30}}
Now you can access the inner values in a different way:
print(my_dict["a"].keys())
print(my_dict["a"].values())
print(my_dict["a"][4])
Result:
[1, 2, 3, 4]
[5, 6, 7, 8]
8