I am trying to implement a multimap in Python. Each record has three fields:
SerialNo, Name, Food
1 John Apple
2 Bill Orange
3 Josh Apple
Here, SerialNo and Name will never be duplicated, but Food can be.
I can insert a key/value pair into my hashmap and query on it. But how do I relate three values? I want to run queries like:
SerialNo s where Food='Apple'
Name where Food='Apple'
Food where Name='Bill'
Get the all stored data (SerialNo, Name, Food)
I can build only one index, but how do I query on each field?
This is my hashmap for inserting data:
class HashMap:
    def __init__(self):
        self.store = [None for _ in range(16)]
        self.size = 0

    def put(self, key, value):
        p = Node(key, value)
        key_hash = self._hash(key)
        index = self._position(key_hash)
        if not self.store[index]:
            self.store[index] = [p]
            self.size += 1
        else:
            list_at_index = self.store[index]
            if p not in list_at_index:
                list_at_index.append(p)
                self.size += 1
            else:
                for i in list_at_index:
                    if i == p:
                        i.value = value
                        break
I can't use dict; I prefer to build the functionality from scratch for learning purposes.
It sounds like you're trying to implement a database table. This is most naturally modeled as a set of rows, but since dicts aren't hashable they can't go in a set, so the examples below use a list of dicts. (There are also libraries with more efficient implementations, like sqlite3 and pandas.)
table = [
    dict(zip(('SerialNo', 'Name', 'Food'), row))
    for row in [
        (1, 'John', 'Apple'),
        (2, 'Bill', 'Orange'),
        (3, 'Josh', 'Apple'),
    ]
]
You can do your queries as list comprehensions.
# SerialNo s where Food='Apple'
[row['SerialNo'] for row in table if row['Food'] == 'Apple']
# Name where Food='Apple'
[row['Name'] for row in table if row['Food'] == 'Apple']
# Food where Name='Bill'
[row['Food'] for row in table if row['Name'] == 'Bill']
# Get the all stored data (SerialNo, Name, Food)
table
More complex queries are possible. For large tables, you can make queries more efficient by building an external index in advance, just like a database engine does: use a dictionary keyed by your lookup values, pointing at the row dicts in the table.
name_index = {row['Name']: row for row in table}
name_index['John'] # {'SerialNo': 1, 'Name': 'John', 'Food': 'Apple'}
You could also try a set of named tuples as your table rows instead of dicts for more efficiency. Unlike dicts, named tuples are hashable, so they can live in a real set, and they also let you use dot notation for cell access.
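A minimal sketch of that variant, using collections.namedtuple with the field names from your data:
from collections import namedtuple

Row = namedtuple('Row', ['SerialNo', 'Name', 'Food'])

table = {
    Row(1, 'John', 'Apple'),
    Row(2, 'Bill', 'Orange'),
    Row(3, 'Josh', 'Apple'),
}

[row.SerialNo for row in table if row.Food == 'Apple']  # e.g. [1, 3]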
New requirement:
Hi, I can't use dict. I prefer to build it myself for learning purposes.
OK, so implement a normal hash table and use that instead of the native Python dict. If you need multiple indexes, use multiple hash tables as above.
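For instance, a get method to complement your put might look like this sketch; it reuses your _hash/_position helpers and assumes your Node stores the key in a key attribute (that part isn't shown in your snippet):
def get(self, key, default=None):  # goes inside class HashMap
    # find the bucket for this key, then scan its chain
    index = self._position(self._hash(key))
    bucket = self.store[index]
    if bucket:
        for node in bucket:
            if node.key == key:  # assumes Node exposes .key
                return node.value
    return default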
But I have duplicate Food values; how do I index on that?
Think about what a query using this index would return. It's the collection of rows with that food, rather than a single row, right? So that collection is what you use as the value in the index hashtable. (Since dict rows aren't hashable, a list stands in for the set of rows here.)
{food: [row for row in table if row['Food'] == food]
 for food in {row['Food'] for row in table}}
You could probably do that a bit more efficiently with a for loop.
food_index = {}
for row in table:
    food_index.setdefault(row['Food'], []).append(row)
It may simplify your logic if the unique indexes also return lists of rows instead of a single row. (The list would contain a single row in that case.)
name_index = {row['Name']: [row] for row in table}
I'm still using dicts to demonstrate the approach concisely, but there's no reason you couldn't implement all these features yourself. Even comprehensions can be done with a generator expression inside a function call, e.g.
{k:v for k, v in foo}
dict((k, v) for k, v in foo)
Those two lines above are functionally equivalent. Of course, if foo already contains pairs (the result of a zip call, for example), it could be simplified to
dict(foo)
You can use your own hash table class in place of dict.
Your multimap implementation can create and store these indexes upon initialization, and update the row collections in each index appropriately when rows are added to or removed from the table. Mutating a row dict could invalidate the indexes, so it may be more appropriate to make each row an immutable type such as a named tuple (implementing it yourself if you must). Then altering a row in the table is just adding a new row and deleting the old one.
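Putting it together, here is a minimal sketch of such a multimap. It uses namedtuple rows (so they are hashable) and plain dicts and sets for brevity; the class and attribute names are illustrative, and you'd substitute your own hash table class for the dicts:
from collections import namedtuple

Row = namedtuple('Row', ['SerialNo', 'Name', 'Food'])

class Table:
    def __init__(self, *rows):
        self.rows = set()
        self.by_name = {}  # unique index: Name -> {row}
        self.by_food = {}  # non-unique index: Food -> {row, ...}
        for row in rows:
            self.add(row)

    def add(self, row):
        self.rows.add(row)
        self.by_name.setdefault(row.Name, set()).add(row)
        self.by_food.setdefault(row.Food, set()).add(row)

    def remove(self, row):
        self.rows.discard(row)
        self.by_name[row.Name].discard(row)
        self.by_food[row.Food].discard(row)

t = Table(Row(1, 'John', 'Apple'), Row(2, 'Bill', 'Orange'))
t.by_food['Apple']  # {Row(SerialNo=1, Name='John', Food='Apple')}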
Related
I'm trying to write a dynamic query that takes my dict and writes it into the database.
A dict can be of any size, with any number of keys, and can look like this:
data = {
    'name': 'foo',
    'date': '2008',
    'genre': ['bar', 'baz'],
}
The problem is that the values in the dictionary can be either a string or a list of strings. With the function I already have, I can generate queries of dynamic length, but only for values that are strings, not lists of strings.
def insert_data(metadata):
    conn = sql.connect(utils.get_db_path())
    _insert_data(conn,
                 'albums',
                 name=metadata['album'],
                 date=metadata['date'],
                 tracktotal=metadata['tracktotal'],
                 disctotal=metadata['disctotal'])

def _insert_data(conn, table_name, **metadata):
    c = conn.cursor()
    query = f'INSERT INTO {table_name}({", ".join(metadata.keys())}) VALUES({",".join(["?"] * len(metadata))})'
    c.execute(query, tuple(metadata.values()))
A query generated by this code would look like this:
INSERT INTO albums(name, date, tracktotal, disctotal) VALUES(?,?,?,?)
However, if the values are lists of strings, I need to generate several queries, which gets even more complicated when several values are lists rather than just one (for example, if both date and genre are lists, I need 2^2 = 4 queries).
What would be the way to do this, or is there a different approach that doesn't require that many queries?
EDIT1: The table in the database for the aforementioned dict would look like this:
TABLE albums
id name date
1 foo 2008
TABLE genres
id name
1 bar
2 baz
TABLE albumgenres
id album_id genre_id
1 1 1
2 1 2
Writing into albums is easy because there are no duplicates there, but the code to call the function for genres would look like this:
_insert_data(conn,
             'genres',
             name=['bar', 'baz'])
and would no longer work properly.
Upon further research I believe that this can be achieved by making a list of dicts based on the provided dict using a Cartesian product.
See Combine Python Dictionary Permutations into List of Dictionaries
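For reference, a minimal sketch of that expansion (assuming values that aren't lists should be treated as single-element lists):
from itertools import product

def expand(d):
    keys = list(d)
    # wrap scalar values so every field contributes one axis to the product
    value_lists = [v if isinstance(v, list) else [v] for v in d.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]

expand({'name': 'foo', 'date': '2008', 'genre': ['bar', 'baz']})
# [{'name': 'foo', 'date': '2008', 'genre': 'bar'},
#  {'name': 'foo', 'date': '2008', 'genre': 'baz'}]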
This is more of a best-practice question. What I have created works perfectly, but I am curious whether there is a shorter method for creating a dictionary out of my current data structure.
I am reading tables out of a SQLite database; the data is returned as a list of tuples, e.g.
[(49, u'mRec49', u'mLabel49', 1053, 1405406806822606L, u'1405406906822606'),
(48, u'mRec48', u'mLabel48', 1330, 1405405806822606L, u'1405405906822606'),
(47, u'mRec47', u'mLabel47', 1220, 1405404806822606L, u'1405404906822606')...
]
I want to take each column of the list-of-tuples structure, make it into a list, get the column name from the database, and use that as the key holding the list. Later I turn my dictionary into JSON.
Here is the function I scratched up. It does the job; I just can't help wondering if there is a better way to do this.
def make_dict(columns, list_o_tuples):
    anary = {}
    for j, column in enumerate(columns):
        place = []
        for row in list_o_tuples:
            place.append(row[j])
        anary[column] = place
    return anary

make_dict(mDBTable.columns, mDBTable.get_table())
Note: the function shouldn't care about the table it's presented with, or the number of rows and columns in the table.
It seems that you want to transpose the list_o_tuples:
transpose = zip(*list_o_tuples)
And then zip that up with the column names:
return dict(zip(columns, transpose))
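Put together (column names here are illustrative; note that zip yields tuples, so wrap each column in list() if you need lists, e.g. for JSON serialization):
columns = ['id', 'rec', 'label', 'num', 't1', 't2']
list_o_tuples = [
    (49, 'mRec49', 'mLabel49', 1053, 1405406806822606, '1405406906822606'),
    (48, 'mRec48', 'mLabel48', 1330, 1405405806822606, '1405405906822606'),
]
dict(zip(columns, zip(*list_o_tuples)))
# {'id': (49, 48), 'rec': ('mRec49', 'mRec48'), ...}
{col: list(vals) for col, vals in zip(columns, zip(*list_o_tuples))}
# {'id': [49, 48], 'rec': ['mRec49', 'mRec48'], ...}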
You can simply unzip the list_o_tuples and then, using a dictionary comprehension, create a new dictionary pairing each column header with the corresponding column data:
columns = ["num1", "str1", "num2", "num3", "str2", "str3"]
print {columns[idx]:row for idx, row in enumerate(zip(*list_o_tuples))}
I am trying to sort a table, but I would like to exclude given columns, by name, from the sort. In other words, the given columns should remain where they were before sorting. This is aimed at dealing with columns like "Don't know", "NA", etc.
The API I'm using is unique and company-specific, but it uses Python.
A table in this API is an object which is a list of rows, where each row is a list of cells and each cell is a list of cell values.
I currently have a working function that sorts a table, but I would like to modify it to exclude a given column by its name, and I am struggling to find a way.
FYI - "Matrix" can be thought of as the table itself.
def SortColumns(byRow=0, usingCellValue=0, descending=True):
    """
    :param byRow: Use the values in this row to determine the sort order of
        the columns.
    :param usingCellValue: When there are multiple values within a cell, use
        this to control which value row within each cell is used for sorting
        (zero-based).
    :param descending: Determines the order in which the values should be
        sorted.
    """
    for A in range(0, Matrix.Count):
        for B in range(0, Matrix.Count):
            if A == B:
                continue  # do not compare a column against itself
            valA = Matrix[byRow][A][usingCellValue].NumericValue if Matrix[byRow][A].Count > usingCellValue else None
            valB = Matrix[byRow][B][usingCellValue].NumericValue if Matrix[byRow][B].Count > usingCellValue else None
            if descending:
                if valB < valA:
                    Matrix.SwitchColumns(A, B)
            else:
                if valA < valB:
                    Matrix.SwitchColumns(A, B)
I am thinking of adding a new parameter which takes a list of column names, and use this to bypass these columns.
Something like:
def SortColumns(fixedcolumns, byRow=0, usingCellValue=0, descending=True):
While iterating through the columns, you can use the continue statement to skip over columns that you don't want to move. Put these conditions at the start of your two loops:
for A in range(0, Matrix.Count):
    a_name = ???  # somehow get the name of column A
    if a_name in fixedcolumns:
        continue
    for B in range(0, Matrix.Count):
        b_name = ???  # somehow get the name of column B
        if b_name in fixedcolumns:
            continue
        if A == B:
            continue
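Since your table API is company-specific, here is the same pattern illustrated with plain Python lists (all names hypothetical): collect the positions of movable columns, sort only their values, and write them back, so the fixed columns keep their places.
def sort_excluding(names, values, fixed, descending=True):
    # positions whose column names are not pinned
    movable = [i for i, n in enumerate(names) if n not in fixed]
    ordered = sorted((values[i] for i in movable), reverse=descending)
    result = list(values)
    for i, v in zip(movable, ordered):
        result[i] = v
    return result

sort_excluding(['a', 'NA', 'b', 'c'], [3, 9, 1, 2], fixed={'NA'})
# [3, 9, 2, 1] - the 'NA' column stays in place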
I have a problem with manipulating a list of dictionaries into something more digestible for writing to a CSV. For example, I have a list of dictionaries like so:
dict_example = [{'id':1,'key1':'value1','key2':'value2'},{'id':1,'key1':'value3','key2':'value4'}]
Ideally, I would like a csv out of this that would be:
id,key1,key2,key1,key2
1,value1,value2,value3,value4
Basically, I would like to find the easiest way to do this. The list I am working with is much larger, and for each 'id' there are 4 dictionaries representing different values, which I would like all in one row per 'id'.
Any thoughts? I can think of ways to do this by extracting the values into other structures, but not by leaving them in the dictionaries and then writing them to CSV.
EDIT:
I now need to figure out which data format would be most useful. For each 'id' there are different 'stages' where the values for 'key1' and 'key2' differ. The id is persistent.
What would be a useful dict to store this in?
Example as it exists now, with more clarity:
dict_example = [{'id':1,'stage':'stage1','key2':'value1'},{'id':1,'stage':'stage2','key2':'value2'}]
You can use itertools.groupby to group on certain criteria - in your case, the 'id':
import itertools

d = [{'id': 1, 'key1': 'value1', 'key2': 'value2'},
     {'id': 1, 'key1': 'value3', 'key2': 'value4'},
     {'id': 1, 'key4': 'value5'},
     {'id': 2, 'key1': 'value3', 'key2': 'value4'}]

# groupby only groups consecutive items, so make sure rows sharing
# an id are adjacent (sort by id first if they aren't)
d.sort(key=lambda x: x['id'])

for id, group in itertools.groupby(d, lambda x: x['id']):
    key_line = 'id'
    values_line = str(id)
    for g in group:
        for key in sorted(g.keys()):
            if key == 'id':
                continue
            key_line += ',{0}'.format(key)
            values_line += ',{0}'.format(g[key])
    print(key_line)
    print(values_line)
This will output
id,key1,key2,key1,key2,key4
1,value1,value2,value3,value4,value5
id,key1,key2
2,value3,value4
...etc
for each id in your data. I'm still not sure it's a very usable output, though; you might want to consider reorganizing how you store your data in the first place.
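For the edited requirement (a persistent id with per-stage values), one possible reorganization (purely illustrative) is a nested dict keyed by id and then by stage:
data = {
    1: {
        'stage1': {'key2': 'value1'},
        'stage2': {'key2': 'value2'},
    },
}

data[1]['stage2']['key2']  # 'value2'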
I have written a simple script that prints out the name of each table and its associated column headings, and appends them to a Python list:
b = []
for table in arcpy.ListTables():
    for field in arcpy.ListFields(table):
        b.append(field.name + "," + table)
print(b)
Each table has a number of column headings, and there are many instances where two or more tables contain the same column heading. I want to build a sort of reverse mapping: a Python dictionary instead of a list, where the keys are the column headings and the values are the table names. The idea is to find all the tables that each column heading appears in.
I've been playing around all afternoon and I think I am overthinking this, so I came here for some help. If anyone can suggest how I can accomplish this, I would appreciate it.
Thanks,
Mike
Try this:
result = {}
for table in arcpy.ListTables():
    for field in arcpy.ListFields(table):
        result.setdefault(field.name, []).append(table)
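A lookup afterwards (with a hypothetical column name) then gives the list of tables that contain it:
result.get('LastName', [])  # e.g. ['Customers', 'Employees'] (hypothetical)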
If I understand correctly, you want to map from a column name to a list of tables that have columns with that name. That should be easy enough to do with a defaultdict:
from collections import defaultdict

header_to_table_dict = defaultdict(list)
for table in arcpy.ListTables():
    for field in arcpy.ListFields(table):
        header_to_table_dict[field.name].append(table.name)
I'm not sure if table.name is what you want to save, exactly, but this should get you on the right track.
You want to create a dictionary in which each key is a field name, and each value is a list of table names:
# initialize the dictionary
col_index = {}
for table in arcpy.ListTables():
    for field in arcpy.ListFields(table):
        if field.name not in col_index:
            # this is a field name we haven't seen before,
            # so initialize a dictionary entry with an empty list
            # as the corresponding value
            col_index[field.name] = []
        # add the table name to the list of tables for this field name
        col_index[field.name].append(table.name)
And then, if you want a list of tables that contain the field LastName:
list_of_tables = col_index['LastName']
If you're using a database that is case-insensitive with respect to column names, you might want to convert field.name to upper case before testing the dictionary.
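For example, a sketch of that case-insensitive variant, upper-casing the key both when building the index and when querying it:
# when building the index:
col_index.setdefault(field.name.upper(), []).append(table.name)

# when querying it:
list_of_tables = col_index['LastName'.upper()]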