Change List of Tuples into Dictionary (pythonic way)

This is more of a best practice question. What I have created is working perfectly, but I am curious if there is a shorter method for creating a dictionary out of my current data structure.
I am reading tables out of a SQLite database, the data is returned as a list of tuples.
eg
[(49, u'mRec49', u'mLabel49', 1053, 1405406806822606L, u'1405406906822606'),
(48, u'mRec48', u'mLabel48', 1330, 1405405806822606L, u'1405405906822606'),
(47, u'mRec47', u'mLabel47', 1220, 1405404806822606L, u'1405404906822606')...
]
I want to take each column of the list-tuple structure, make it into a list, get the column name from the database and use that as the key holding the list. Later I turn my dictionary into JSON.
Here is the function I scratched up; it does the job, I just can't help wondering if there is a better way to do this.
def make_dict(columns, list_o_tuples):
    anary = {}
    for j, column in enumerate(columns):
        place = []
        for row in list_o_tuples:
            place.append(row[j])
        anary[column] = place
    return anary
make_dict(mDBTable.columns, mDBTable.get_table())
Note: the function shouldn't care about the table it's presented with, or the number of rows and columns in the table.

It seems that you want to transpose the list_o_tuples:
transpose = zip(*list_o_tuples)
And then zip that up with the column names:
return dict(zip(columns, transpose))
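Put together, the whole function becomes a one-liner; a minimal sketch (note that in Python 3, zip returns an iterator, so each column comes out as a tuple; wrap it in list() if you need lists before dumping to JSON):
def make_dict(columns, list_o_tuples):
    # transpose the rows into columns, then pair each column with its name
    return dict(zip(columns, zip(*list_o_tuples)))

make_dict(mDBTable.columns, mDBTable.get_table())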

You can simply unzip list_o_tuples and then, using a dictionary comprehension, create a new dictionary pairing each column header with the corresponding column data:
columns = ["num1", "str1", "num2", "num3", "str2", "str3"]
print {columns[idx]:row for idx, row in enumerate(zip(*list_o_tuples))}
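This uses Python 2's print statement; an equivalent sketch for Python 3, where print is a function and zip returns an iterator:
print({columns[idx]: list(row) for idx, row in enumerate(zip(*list_o_tuples))})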

Related

Remove rows from dataframe whose text does not contain items from a list

I am importing data from a table with inconsistent naming conventions. I have created a list of manufacturer names that I would like to use as a basis of comparison against the imported names. Ideally, I will delete all rows from the dataframe that do not align with the manufacturer list. I am trying to create an index vector using a for loop that iterates through each element of the dataframe column and compares it against the list. If the text is there, the index vector is updated to true; if not, to false. Finally, I want to use the index vector to drop rows from the original data frame.
I have tried generators and sets, but to no avail. I thought a for loop would be less elegant but ultimately work, yet I'm still stuck. My code is below.
meltdat.Products is my dataframe column that contains the imported data
mfgs is my list of manufacturer names
prodex is my index vector
meltdat = pd.DataFrame(
    {"Location": ["S1", "S1", "S1", "S1", "S1"],
     "Date": ["1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020"],
     "Products": ['CC304RED', 'COHoney', 'EtainXL', 'Med467', 'MarysTop'],
     "Sold": [1, 3, 0, 1, 2]})
mfgs = ['CC', 'Etain', 'Marys']
for prods in meltdat.Products:
    if any(mfg in meltdat.Products[prods] for mfg in mfgs):
        prodex[prods] = TRUE
    else:
        prodex[prods] = FALSE
I added example data in the dataframe that mirrors my imported data.
You can use pd.DataFrame.apply:
meltdat[meltdat.Products.apply(lambda x: any(m in x for m in mfgs))]
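As an alternative sketch (assuming the manufacturer names contain no regex metacharacters), pandas' vectorized str.contains can build the same boolean mask without a Python-level loop:
import pandas as pd

meltdat = pd.DataFrame(
    {"Location": ["S1", "S1", "S1", "S1", "S1"],
     "Date": ["1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020"],
     "Products": ['CC304RED', 'COHoney', 'EtainXL', 'Med467', 'MarysTop'],
     "Sold": [1, 3, 0, 1, 2]})
mfgs = ['CC', 'Etain', 'Marys']

# '|'.join(mfgs) builds the pattern 'CC|Etain|Marys'; rows matching any name are kept.
prodex = meltdat.Products.str.contains('|'.join(mfgs))
print(meltdat[prodex])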

I have a list where I want each element of the list to be a single row

I have a list of lists and I want to assign each of the lists to a specific column; I have created the columns of the DataFrame. But in each column, the elements are coming out as a single list. I want each element of that list to be a separate row in that particular column.
Here's what I did:
df = pd.DataFrame([np.array(dataset).T], columns=list1)
print(df)
Attached screenshot for the output.
I want each element of that list to be a row, as my output.
This should do the job for you:
import pandas as pd
Fasteners = ['Screws & Bolts', 'Threaded Rods & Studs', 'Eyebolts', 'U-Bolts']
Adhesives_and_Tape = ['Adhesives','Tape','Hook & Loop']
Weld_Braz_Sold = ['Electrodes & Wire','Gas Regulators','Welding Gloves','Welding Helmets & Glasses','Protective Screens']
df = pd.DataFrame({'Fastener': pd.Series(Fasteners), 'Adhesives_and_Tape': pd.Series(Adhesives_and_Tape), 'Weld_Braz_Sold': pd.Series(Weld_Braz_Sold)})
print(df)
Please provide the structure of the database you are starting from, or the structure of the respective lists; I can then give you a more focused answer to your specific problem.
If the structure gets larger, you can also iterate through all the lists when generating the data frame. This is just the basic process for solving your question.
Feel free to comment for further help.
EDIT
If you want to loop through a collection of lists, use the following code additionally:
for i in range(len(list1)):
    df.iloc[:, i] = pd.Series(dataset[i])
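Putting it together, a minimal sketch (list1 and dataset are the question's own names; the sample values here are hypothetical, with unequal lengths to show the padding behaviour):
import pandas as pd

list1 = ['colA', 'colB']           # hypothetical column labels
dataset = [[1, 2, 3], ['x', 'y']]  # hypothetical list of lists

# pd.Series pads shorter columns with NaN, so each list element gets its own row.
df = pd.DataFrame({list1[i]: pd.Series(dataset[i]) for i in range(len(list1))})
print(df)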

Implement a multimap in Python

I am trying to implement a multimap in Python. I have three fields of each record.
SerialNo, Name, Food
1 John Apple
2 Bill Orange
3 Josh Apple
Here, SerialNo and Name will not be duplicated, but Food may be.
I can insert one key-value pair and query on it in my hashmap. But how do I make relations between three values? I want to query like:
SerialNo s where Food='Apple'
Name where Food='Apple'
Food where Name='Bill'
Get the all stored data (SerialNo, Name, Food)
I can make only one index, but how do I query on each field?
This is my hashmap for inserting data:
class HashMap:
    def __init__(self):
        self.store = [None for _ in range(16)]
        self.size = 0

    def put(self, key, value):
        p = Node(key, value)
        key_hash = self._hash(key)
        index = self._position(key_hash)
        if not self.store[index]:
            self.store[index] = [p]
            self.size += 1
        else:
            list_at_index = self.store[index]
            if p not in list_at_index:
                list_at_index.append(p)
                self.size += 1
            else:
                for i in list_at_index:
                    if i == p:
                        i.value = value
                        break
I can't use dict; I prefer to build the function from scratch for learning purposes.
It sounds like you're trying to implement a database table. This is most naturally implemented as a collection of row dicts; a list is used here, since dicts aren't hashable and can't be stored in a set. (But there are libraries with more efficient implementations, like sqlite and pandas.)
table = [
    dict(zip(('SerialNo', 'Name', 'Food'), row))
    for row in [
        (1, 'John', 'Apple'),
        (2, 'Bill', 'Orange'),
        (3, 'Josh', 'Apple'),
    ]
]
You can do your queries as list comprehensions.
# SerialNo s where Food='Apple'
[row['SerialNo'] for row in table if row['Food'] == 'Apple']
# Name where Food='Apple'
[row['Name'] for row in table if row['Food'] == 'Apple']
# Food where Name='Bill'
[row['Food'] for row in table if row['Name'] == 'Bill']
# Get all the stored data (SerialNo, Name, Food)
table
More complex queries are possible. For large tables, you can make queries more efficient by creating an external index in advance, just like a database engine does. Use a dictionary with your lookup keys and point them to the row dicts in the table.
name_index = {row['Name']: row for row in table}
name_index['John'] # {'SerialNo': 1, 'Name': 'John', 'Food': 'Apple'}
You could also try named tuples as your table rows instead of dicts for more efficiency; being hashable, they can also be kept in sets. This also lets you use dot notation for cell access.
New requirement:
Hi, I can't use dict. I prefer to build the function for learning purposes.
OK, so implement a normal hashtable and use that instead of the native Python dict. If you need multiple indexes use multiple hashtables as above.
But I have duplicate Food values; how do I index on them?
Think about what a query using this index would return. It's a set of rows that use that food instead of a single row, right? So that's what you use as the value to look up in the index hashtable.
{food: [row for row in table if row['Food'] == food]
 for food in {row['Food'] for row in table}}
You could probably do that a bit more efficiently with a for loop.
food_index = {}
for row in table:
    food_index.setdefault(row['Food'], []).append(row)
It may simplify your logic if the unique indexes also return lists of rows instead of a single row. (The list would contain a single row in that case.)
name_index = {row['Name']: [row] for row in table}
I'm still using dicts to demonstrate the approach concisely, but there's no reason you couldn't implement all these features yourself. Even comprehensions can be done with a generator expression inside a function call, e.g.
{k:v for k, v in foo}
dict((k, v) for k, v in foo)
Those two lines above are functionally equivalent. Of course, if foo already contains pairs (the result of a zip call, for example), it could be simplified to
dict(foo)
You can use your own hash table class in place of dict.
Your multimap implementation can create and store these indexes upon initialization and update the row sets in each index appropriately when rows are added or removed from the table. Mutating a row dict could invalidate the indexes, so it might be more appropriate to make each row an immutable type like a named tuple or the like (if you must, implement it yourself). Then altering a row in the table is just adding a new row and deleting the old one.
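As a rough end-to-end sketch of how those pieces could fit together (illustrative names throughout; plain dicts and sets stand in for the hashtable the asker would implement themselves):
from collections import namedtuple

# Immutable rows are hashable, so they can live in sets and be indexed safely.
Row = namedtuple('Row', ['SerialNo', 'Name', 'Food'])

class MultiMap:
    def __init__(self):
        self.rows = set()   # the table itself
        self.by_name = {}   # unique index: Name -> {row}
        self.by_food = {}   # non-unique index: Food -> {rows}

    def add(self, serial_no, name, food):
        row = Row(serial_no, name, food)
        self.rows.add(row)
        self.by_name.setdefault(name, set()).add(row)
        self.by_food.setdefault(food, set()).add(row)

mm = MultiMap()
mm.add(1, 'John', 'Apple')
mm.add(2, 'Bill', 'Orange')
mm.add(3, 'Josh', 'Apple')
print(sorted(r.SerialNo for r in mm.by_food['Apple']))  # [1, 3]
print([r.Food for r in mm.by_name['Bill']])             # ['Orange']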

Writing a whole list in a CSV row-Python/IronPython

Right now I have several long lists: one called variable_names.
Let's say variable_names = [Velocity, Density, Pressure, ...] (length is 50+).
I want to write a row that reads every index of the list, leaves about 5 empty cells, then writes the next value, and keeps doing this until the list is done.
As shown in the row 1 sample picture.
The thing is, I can't use xlrd due to compatibility issues with IronPython, and I need to dynamically write each row in the new CSV: load data from the old CSV, then append that data to the new CSV. The old CSV keeps changing once I append the data to the new one, so I need to iterate over all the values in the lists every time I write a row, because appending columns to a CSV is much more difficult.
What I basically want to do is:
with open('data.csv', 'a') as f:
    sWriter = csv.writer(f)
    sWriter.writerow([Value_list[i], Value_list[i+1], Value_list[i+2], ..., Value_list[end]])
But I can't seem to think of a way to do this with iteration
Because the writerow method takes a list argument, you can first construct the list and then write it, so everything in the list ends up in one row.
Like,
with open('data.csv', 'a') as f:
    sWriter = csv.writer(f)
    listOfColumns = []
    for i in range(start, stop):  # append elements from Value_list ('from' is a Python keyword, so the bounds are renamed)
        listOfColumns.append(Value_list[i])
    for i in range(0, 2):  # or you may want some columns left blank
        listOfColumns.append("")
    for i in range(anotherStart, anotherStop):  # append more elements from Value_list
        listOfColumns.append(Value_list[i])
    # Here listOfColumns will be [Value_list[start], ..., Value_list[stop-1], "", "", Value_list[anotherStart], ..., Value_list[anotherStop-1]]
    sWriter.writerow(listOfColumns)
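For the specific layout in the question (each value followed by about five empty cells), a minimal sketch, assuming variable_names holds the values to be spaced out:
import csv

variable_names = ['Velocity', 'Density', 'Pressure']  # hypothetical short sample

row = []
for name in variable_names:
    row.append(name)
    row.extend([''] * 5)  # leave five empty cells after each value

with open('data.csv', 'a') as f:
    sWriter = csv.writer(f)
    sWriter.writerow(row)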

Adding Keys and Values to Python Dictionary in Reverse Order

I have written a simple script that prints out and adds the name of a table and its associated column headings to a Python list:
for table in arcpy.ListTables():
    for field in arcpy.ListFields(table):
        b.append(field.name + "," + fc)
print b
In each table there are a number of column headings, and there are many instances where one or more tables contain the same column headings. I want to do a bit of a reverse Python dictionary instead of a list, where the keys are the column headings and the values are the table names. My idea is to find all the tables that each column heading lies within.
I've been playing around all afternoon and I think I am overthinking this, so I came here for some help. If anyone can suggest how I can accomplish this, I would appreciate it.
Thanks,
Mike
Try this:
result = {}
for table in arcpy.ListTables():
    for field in arcpy.ListFields(table):
        result.setdefault(field.name, []).append(table)
If I understand correctly, you want to map from a column name to a list of tables that have columns with that name. That should be easy enough to do with a defaultdict:
from collections import defaultdict

header_to_table_dict = defaultdict(list)
for table in arcpy.ListTables():
    for field in arcpy.ListFields(table):
        header_to_table_dict[field.name].append(table.name)
I'm not sure if table.name is what you want to save, exactly, but this should get you on the right track.
You want to create a dictionary in which each key is a field name, and each value is a list of table names:
# initialize the dictionary
col_index = {}
for table in arcpy.ListTables():
    for field in arcpy.ListFields(table):
        if field.name not in col_index:
            # this is a field name we haven't seen before,
            # so initialize a dictionary entry with an empty list
            # as the corresponding value
            col_index[field.name] = []
        # add the table name to the list of tables for this field name
        col_index[field.name].append(table.name)
And then, if you want a list of tables that contain the field LastName:
list_of_tables = col_index['LastName']
If you're using a database that is case-insensitive with respect to column names, you might want to convert field.name to upper case before testing the dictionary.
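A quick usage sketch of the defaultdict approach (the table and field names here are hypothetical stand-ins for the arcpy results):
from collections import defaultdict

# Hypothetical stand-in: table name -> column headings.
tables = {
    'Employees': ['LastName', 'FirstName', 'Dept'],
    'Customers': ['LastName', 'FirstName', 'City'],
}

header_to_table_dict = defaultdict(list)
for table_name, fields in tables.items():
    for field_name in fields:
        header_to_table_dict[field_name].append(table_name)

print(sorted(header_to_table_dict['LastName']))  # ['Customers', 'Employees']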
