I am reading a .csv called courses. Each row corresponds to a course which has an id, a name, and a teacher. They are to be stored in a Dict. An example:
list_courses = {
1: {'id': 1, 'name': 'Biology', 'teacher': 'Mr. D'},
...
}
While iterating the rows using enumerate(file_csv.readlines()) I am performing the following:
list_courses={}
for idx, row in enumerate(file_csv.readlines()):
    # Skip blank rows.
    if row.isspace(): continue
    # If we're using the row, turn it into a list.
    row = row.strip().split(",")
    # If it's the header row, take note of the header. Use these values for the dictionaries' keys.
    # As of 3.7 a Dict remembers the order in which the keys were inserted.
    # Since the order is constant, simply load each other row into the corresponding key.
    if not idx:
        sheet_item = dict.fromkeys(row)
        continue
    # Loop through the keys in sheet_item. Assign the value found in the row, converting to int where necessary.
    for idx, key in enumerate(list(sheet_item)):
        sheet_item[key] = int(row[idx].strip()) if key == 'id' or key == 'mark' else row[idx].strip()
    # Course list
    print("ADDING COURSE WITH ID {} TO THE DICTIONARY:".format(sheet_item['id']))
    list_courses[sheet_item['id']] = sheet_item
    print("\tADDED: {}".format(sheet_item))
    print("\tDICT : {}".format(list_courses))
Thus, the list_courses dictionary is printed after each sheet_item is added to it.
Now comes the issue - when reading in two courses, I expect that list_courses should read:
list_courses = {
1: {'id': 1, 'name': 'Biology', 'teacher': 'Mr. D'},
2: {'id': 2, 'name': 'History', 'teacher': 'Mrs. P'}
}
However, the output of my print statements (substantiated by errors later in my program) is:
ADDING COURSE WITH ID 1 TO THE DICTIONARY:
ADDED: {'id': 1, 'name': 'Biology', 'teacher': 'Mr. D'}
DICT : {1: {'id': 1, 'name': 'Biology', 'teacher': 'Mr. D'}}
ADDING COURSE WITH ID 2 TO THE DICTIONARY:
ADDED: {'id': 2, 'name': 'History', 'teacher': 'Mrs. P'}
DICT : {1: {'id': 2, 'name': 'History', 'teacher': 'Mrs. P'}, 2: {'id': 2, 'name': 'History', 'teacher': 'Mrs. P'}}
Thus, the id under which sheet_item is added to list_courses is correct (1 or 2); however, the assignment for the second course appears to overwrite the value stored under key 1. I'm not even sure how this is possible. Please let me know your thoughts.
You're using the same dictionary for both the header and all the rows. You never create any new dictionaries after the header. Key assignments are overwriting previous ones, because there are no new dictionaries to write to.
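Here is a minimal sketch of the effect, using made-up rows rather than your CSV. Storing the same dict object under two keys means both keys see every later change, while creating a fresh dict per row keeps the entries independent. The names rows, shared and separate are only for illustration.

```python
# Hypothetical rows standing in for the parsed CSV lines.
rows = [(1, "Biology"), (2, "History")]

# Reusing one dict: every key in `shared` points at the same object.
shared = {}
item = {}
for course_id, name in rows:
    item["id"] = course_id
    item["name"] = name
    shared[course_id] = item

# Creating a new dict per row: each key gets its own object.
separate = {}
for course_id, name in rows:
    separate[course_id] = {"id": course_id, "name": name}

print(shared[1] is shared[2])      # True
print(separate[1] is separate[2])  # False
print(shared[1]["name"])           # History
```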
Store the keys in a list, and create a new sheet_item for every row, just before the inner for loop:
list_courses={}
keys = None # Let Python know this is defined
for idx, row in enumerate(file_csv.readlines()):
    # Skip blank rows.
    if row.isspace(): continue
    # If we're using the row, turn it into a list.
    row = row.strip().split(",")
    # If it's the header row, take note of the header. Use these values for the dictionaries' keys.
    # As of 3.7 a Dict remembers the order in which the keys were inserted.
    # Since the order is constant, simply load each other row into the corresponding key.
    if not idx:
        keys = row
        continue
    # Start a fresh dictionary for this row.
    sheet_item = {}
    # Loop through the header keys. Assign the value found in the row, converting to int where necessary.
    for col_idx, key in enumerate(keys):
        sheet_item[key] = int(row[col_idx].strip()) if key == 'id' or key == 'mark' else row[col_idx].strip()
    # Course list
    print("ADDING COURSE WITH ID {} TO THE DICTIONARY:".format(sheet_item['id']))
    list_courses[sheet_item['id']] = sheet_item
    print("\tADDED: {}".format(sheet_item))
    print("\tDICT : {}".format(list_courses))
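As an alternative sketch (not part of the fix above), the standard library's csv.DictReader creates a new dict for every row and takes the keys from the header line, and it skips fully empty lines. The sample text, the io.StringIO stand-in for file_csv, and the INT_COLUMNS name are assumptions for illustration.

```python
import csv
import io

# Stand-in for the opened CSV file; the contents are made up.
file_csv = io.StringIO("id,name,teacher\n1,Biology,Mr. D\n\n2,History,Mrs. P\n")

INT_COLUMNS = {"id", "mark"}  # columns to convert to int, as in the question

list_courses = {}
for row in csv.DictReader(file_csv):
    # DictReader yields a fresh dict per row, so nothing is shared between rows.
    course = {
        key: int(value.strip()) if key in INT_COLUMNS else value.strip()
        for key, value in row.items()
    }
    list_courses[course["id"]] = course

print(list_courses)
```

With a real file you would pass the open file object to csv.DictReader instead of the StringIO stand-in.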
My dictionary looks like the one below, and I am following this link to update the values under the "Column_Type" key. Basically, I would like to replace "String" with "VARCHAR(256)", "Date" with "NUMBER(4,0)", and both "Int" and "Numeric" with "NUMBER". Whenever I run the code below, the values in my dictionary are not updated. My desired output for the updated dictionary is given after the code.
Please note: the position of each column type might vary as well. For example, "String" is currently at position 1, but it might be at position 3 later on.
{'Column_name': ['Name', 'Salary', 'Date', 'Phone'], 'Column_Type': ['String', 'Numeric', 'Date', 'Int']}
Code:
for key1, key2 in my_dict.items():
    if key2 == 'String':
        my_dict[key2] = "VARCHAR(256)"
print(my_dict)
Desired Output:
{'Column_name': ['Name', 'Salary', 'Date', 'Phone'], 'Column_Type': ['VARCHAR(256)', 'NUMBER', 'NUMBER(4,0)', 'NUMBER']}
In your example, your keys are "Column_name" and "Column_Type". There is no key named "String" in your dict. Both values in your dict are lists, so neither of them is equal to the string "String" either.
What you want is to replace a specific value in a list.
Try like this:
for index, value in enumerate(my_dict["Column_Type"]):
    if value == "String":
        my_dict["Column_Type"][index] = "VARCHAR(256)"
This replaces the value in the list, not the dict. That is what you want.
If you need to replace multiple values you can use a dict, like @Jeremy suggested:
type_strs = {
    'String': 'VARCHAR(256)',
    'Numeric': 'NUMBER',
    'Date': 'NUMBER(4,0)',
    'Int': 'NUMBER'
}
for index, value in enumerate(my_dict["Column_Type"]):
    my_dict["Column_Type"][index] = type_strs.get(value, value)
Here, the .get() function on a dict returns the value corresponding to the key given by the first argument, or the second argument if no such key exists.
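A tiny sketch of that behaviour, in case it helps. The "Blob" type is made up and is only there to show what happens for a key that is missing from the mapping.

```python
type_strs = {"String": "VARCHAR(256)", "Int": "NUMBER"}

print(type_strs.get("String", "String"))  # VARCHAR(256)
print(type_strs.get("Blob", "Blob"))      # Blob (no such key, so the default is returned)
print(type_strs.get("Blob"))              # None (the default when no second argument is given)
```

Passing the value itself as the default is what leaves unknown types unchanged in the loop above.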
type_strs = {
'String': 'VARCHAR(256)',
'Numeric': 'NUMBER',
'Date': 'NUMBER(4,0)',
'Int': 'NUMBER'
}
my_dict['Column_Type'] = [type_strs[t] for t in my_dict['Column_Type']]
I would recommend a dictionary instead of if statements for translating the type strings. Note that type_strs[t] raises a KeyError for any type string that is missing from the mapping.
In the line if key2 == 'String': you are comparing a whole list with a single element of that list.
While you iterate over the dictionary, key2 holds the list ['String', 'Numeric', 'Date', 'Int'], so you need to go through the items of that list and compare each one. You can do that with a for loop.
The program is as follows:
my_dict={'Column_name': ['Name', 'Salary', 'Date', 'Phone'], 'Column_Type': ['String', 'Numeric', 'Date', 'Int']}
# Keep track of the position of the current element
position=0
# Walk through the list of column types
for i in my_dict['Column_Type']:
    # If the element is equal to the string
    if i == 'String':
        # Assign the new value at that position
        my_dict['Column_Type'][position]="VARCHAR(256)"
    # Add one to the position on every iteration
    position+=1
print(my_dict)
Output
{'Column_name': ['Name', 'Salary', 'Date', 'Phone'], 'Column_Type': ['VARCHAR(256)', 'Numeric', 'Date', 'Int']}
You can use dict.update() to add or overwrite key-value pairs in a dictionary.
Example:
# Dictionary of strings to ints
word_freq = {
"Hello": 56,
"at": 23,
"test": 43,
"this": 43
}
# Adding a new key value pair
word_freq.update({'before': 23})
print(word_freq)
As you can see I have a column that contains an array of dictionaries. I need to see if a key in any item has a certain value and return the row if it does.
0 [{'id': 473172988, 'node_id': 'MDU6TGFiZWw0NzM...
1 [{'id': 473172988, 'node_id': 'MDU6TGFiZWw0NzM...
2 [{'id': 473172988, 'node_id': 'MDU6TGFiZWw0NzM...
3 [{'id': 473173351, 'node_id': 'MDU6TGFiZWw0NzM...
Is there a straightforward approach for this?
The datatype of the column is an object.
You would need to give the exact format of your dictionary, but on the general principle you should loop over the elements:
key = 'xxx'
value = 'yyy'
out = [any(d.get(key) == value for d in l) for l in df['your_column']]
# slicing rows
df[out]
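To see what the mask contains, here is a small sketch that runs the same any(...) expression on a plain list standing in for df['your_column']. The label dictionaries and the 'name' / 'bug' key and value are made up for illustration.

```python
# Each inner list plays the role of one cell of the column.
column = [
    [{"id": 1, "name": "bug"}, {"id": 2, "name": "docs"}],
    [{"id": 3, "name": "question"}],
    [],
]

key = "name"
value = "bug"

# True for a row if any dictionary in its list has key == value.
out = [any(d.get(key) == value for d in cell) for cell in column]
print(out)  # [True, False, False]
```

Passing such a list of booleans to df[...] keeps only the rows marked True.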
I have a csv file that will contain a frequently updated (overwritten) dataframe with a few rows of purchase orders, something like this:
uniqueId item action quantity price
123 widget1 buy 10 99.44
234 widget2 sell 15 19.99
345 widget3 buy 2 999.99
This csv file will be passed to my python code by another program; my code will check for its presence every few minutes. Once it appears, the code will read it. I'm not including the code for that, since that's not the issue.
The idea is to turn this purchase order dataframe into something that I can pass to my (already written) place-the-order code. I want to iterate through each row in order (enumerate?), and assign the values from that row to variables that I use in the order code, then reassign the new values to the same variable for the next row after the order from that row has been placed.
As I understand it, itertuples are probably the way to go for iterating through it, but I'm new enough to python that I can't figure out the actual mechanism/syntax of using it to do what I want. All my trial-and-error tests for assigning the values to reusable variables result in syntax errors.
I'm having a mental block on what is probably very basic python! I know how to iterate through the rows and print 'em out--plenty of examples out there show me how to do that--but not how to turn the data into something I can use elsewhere. Can someone walk me through an example or two that actually applies to what I'm trying to do?
Like you said, you can quite easily iterate over a dataframe with .itertuples()
Here's how I would go about it (df is your dataframe; for it I used the data from your example):
Code:
for row in df.itertuples():
    print(row)
Output:
Pandas(Index=0, uniqueId=123, item='widget1', action='buy', quantity=10, price=99.44)
Pandas(Index=1, uniqueId=234, item='widget2', action='sell', quantity=15, price=19.99)
Pandas(Index=2, uniqueId=345, item='widget3', action='buy', quantity=2, price=999.99)
If you want to get specific entries of the tuples you need to use the position in the tuple as index:
Code:
for row in df.itertuples():
    uniqueID = row[1]
    print(uniqueID)
Output:
123
234
345
I'm not sure what the rest of your code looks like. If you have the place-the-order code inside a function (called place_the_order here, since Python names can't contain hyphens), you could just call the function in the for loop after assigning the variables to your liking:
for row in df.itertuples():
    uniqueID = row[1]
    item = row[2]
    action = row[3]
    quantity = row[4]
    price = row[5]
    place_the_order(uniqueID, item, action, quantity, price)
(You could even skip assigning the variables and just call place_the_order(row[1], row[2], ...). In my opinion it is more readable to assign the variables.)
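Since itertuples() yields namedtuples, the fields can also be read by column name instead of by position. The sketch below makes a few assumptions: it builds the rows with collections.namedtuple so it runs without pandas, and place_the_order is a hypothetical stand-in for your order code. With a real dataframe the loop header would be for row in df.itertuples(): instead.

```python
from collections import namedtuple

# Stand-in for the rows that df.itertuples() would yield.
Row = namedtuple("Pandas", ["Index", "uniqueId", "item", "action", "quantity", "price"])
rows = [
    Row(0, 123, "widget1", "buy", 10, 99.44),
    Row(1, 234, "widget2", "sell", 15, 19.99),
    Row(2, 345, "widget3", "buy", 2, 999.99),
]

placed = []

def place_the_order(unique_id, item, action, quantity, price):
    # Hypothetical placeholder: record the order instead of sending it anywhere.
    placed.append((unique_id, item, action, quantity, price))

for row in rows:
    # Fields are read by name; the same names are reused on every iteration.
    place_the_order(row.uniqueId, row.item, row.action, row.quantity, row.price)

print(placed[0])  # (123, 'widget1', 'buy', 10, 99.44)
```

Reading by name keeps working if the column order in the csv changes, which positional indexing does not.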
If your place-the-order code is not in a function I would recommend using a nested dictionary with the row index as key and a dictionary of the content of the row as value. The row index is easily accessible as it is the first item in the tuple.
content_of_rows = {}
for row in df.itertuples():
    index = row[0]
    uniqueID = row[1]
    item = row[2]
    action = row[3]
    quantity = row[4]
    price = row[5]
    content_of_rows[index] = {"uniqueID": uniqueID, "item": item, "action": action, "quantity": quantity, "price": price}
print(content_of_rows)
Output:
{0: {'uniqueID': 123, 'item': 'widget1', 'action': 'buy', 'quantity': 10, 'price': 99.44},
1: {'uniqueID': 234, 'item': 'widget2', 'action': 'sell', 'quantity': 15, 'price': 19.99},
2: {'uniqueID': 345, 'item': 'widget3', 'action': 'buy', 'quantity': 2, 'price': 999.99}}
With this approach you don't reuse one set of variables for every row; the nested dictionary is, generally speaking, just a different way of writing the dataframe. You can iterate over a dictionary in much the same way as over tuples, but instead of numerical indices you have to use the keys.
for row in content_of_rows:
    # row is the key, so in the first iteration it would be 0, in the second iteration it would be 1, and so on
    print(content_of_rows[row])
Output:
{'uniqueID': 123, 'item': 'widget1', 'action': 'buy', 'quantity': 10, 'price': 99.44}
{'uniqueID': 234, 'item': 'widget2', 'action': 'sell', 'quantity': 15, 'price': 19.99}
{'uniqueID': 345, 'item': 'widget3', 'action': 'buy', 'quantity': 2, 'price': 999.99}
If you want to get the uniqueID of the rows, you would do something like this:
Code:
for row in content_of_rows:
    print(content_of_rows[row]["uniqueID"]) # just put the second key you're looking for right after the first, also in []
Output:
123
234
345
It's usually best to put different parts of your code into functions, so if you haven't done that already I'd recommend you do so. That way you can use the same variables for each row.
I hope this (kinda long, sorry about that) answer could help you. Greetings from Bavaria!
Here is the dictionary form
abc = {
'if1': {'name': 'data', 'date': '80980'},
'if2': {'name': 'data_1', 'date': '9886878'},
'if3': {'name': 'data', 'date': '0987667'},
'if4': {'name': 'data__5', 'date': '0987667'},
'if5': {'date': '0987667'}
}
I am trying to filter it by the 'name' value, where the filter is given as a list:
list_item = ['data','data_1']
It should give me the matching dates as output, as follows:
{
data:['80980', '0987667'],
data_1:['9886878']
}
please help me to resolve this issue.
For the resulting dictionary we create a defaultdict with an empty list as the default value. Then we loop over all values of 'abc' and check whether the entry has the key 'name' and whether the corresponding value is in list_item. If so, we use the name as a key in the resulting dictionary and append the value stored under the key 'date'.
abc = {
'if1': {'name': 'data', 'date': '80980'},
'if2': {'name': 'data_1', 'date': '9886878'},
'if3': {'name': 'data', 'date': '0987667'},
'if4': {'name': 'data__5', 'date': '0987667'},
'if5': {'date': '0987667'}
}
list_item = ['data','data_1']
import collections
result = collections.defaultdict(list)
for item in abc.values():
    if item.get('name', None) in list_item:
        result[item['name']].append(item['date'])
print(result)
Another approach is to loop over the values in 'list_item'.
result = {}
for key in list_item:
    result[key] = [item['date'] for item in abc.values() if item.get('name', None) == key]
print(result)
Using a dictionary comprehension you can transform the last solution into a one-liner (but I prefer a more readable style):
result = {key:[item['date'] for item in abc.values() if item.get('name', None) == key] for key in list_item}
This dict is so complex you can't just filter it, you have to convert it to something more useful.
abc= { 'if1':{'name':'data','date':'80980'}, 'if2':{'name':'data_1','date':'9886878'}, 'if3':{'name':'data','date':'0987667'}, 'if4':{'name':'data__5','date':'0987667'}}
[The dictionary in the question was malformed. I assumed the second if4 was a typo, so I deleted it when I copied the text.]
First, let's flatten it by removing the inside dictionary, making date our dict's value:
formatted1 = {(key, subdict['name']): subdict['date'] for key, subdict in abc.items() if 'name' in subdict}
I kept the original key as part of the new key because otherwise we'd overwrite our data entry.
Our new dict looks like this:
{('if1', 'data'): '80980', ('if2', 'data_1'): '9886878', ('if3', 'data'): '0987667', ('if4', 'data__5'): '0987667'}
Now it's easier to work with. Let's do a simple loop to format it further:
formatted2 = {}
for (_, key), value in formatted1.items(): # we'll be skipping first part of the key, hence I didn't give it a meaningful name
    elem = formatted2.get(key, [])
    elem.append(value)
    formatted2[key] = elem
Our even newer dict now looks like this:
{'data': ['80980', '0987667'], 'data_1': ['9886878'], 'data__5': ['0987667']}
And now it's finally in a form we can easily filter!
list_item = ['data','data_1']
result = {k: formatted2[k] for k in formatted2 if k in list_item}
Result:
{'data': ['80980', '0987667'], 'data_1': ['9886878']}
My data looks as follows:
Application WorkflowStep
0 WF:ACAA-CR (auto) Manager
1 WF:ACAA-CR (auto) Access Responsible
2 WF:ACAA-CR (auto) Automatic
3 WF:ACAA-CR-AccResp (auto) Manager
4 WF:ACAA-CR-AccResp (auto) Access Responsible
5 WF:ACAA-CR-AccResp (auto) Automatic
6 WF:ACAA-CR-IT-AccResp[AUTO] Group
7 WF:ACAA-CR-IT-AccResp[AUTO] Access Responsible
8 WF:ACAA-CR-IT-AccResp[AUTO] Automatic
In addition to these two columns, I want to add a third value showing the sum of all WorkflowSteps.
The dictionary should look like the following (or similar):
{'WF:ACAA-CR (auto)':
[{'Workflow': ['Manager', 'Access Responsible','Automatic'], 'Summary': 3}],
'WF:ACAA-CR-AccResp (auto)':
[{'Workflow': ['Manager','Access Responsible','Automatic'], 'Summary': 3}],
'WF:ACAA-CR-IT-AccResp[AUTO]':
[{'Workflow': ['Group','Access Responsible','Automatic'], 'Summary': 3}]
}
My code to create a dictionary out of the two above columns works fine.
for i in range(len(df)):
    currentid = df.iloc[i,0]
    currentvalue = df.iloc[i,1]
    dict.setdefault(currentid, [])
    dict[currentid].append(currentvalue)
The code to create the sum of the WorkflowSteps is as follows and also works fine:
for key, values in dict.items():
    val = values
    match = ["Manager", "Access Responsible", "Automatic", "Group"]
    c = Counter(val)
    sumofvalues = 0
    for m in match:
        if c[m] == 1:
            sumofvalues += 1
My initial idea was to adjust my first code so that the Application is the top-level key and Workflow and Summary are keys of a sub-dictionary.
for i in range(len(df)):
    currentid = df.iloc[i,0]
    currentvalue = df.iloc[i,1]
    dict.setdefault(currentid, [])
    dict[currentid].append({"Workflow": [currentvalue], "Summary": []})
The result of this is however unsatisfactory because it does not add currentvalue to the already existing Workflow key but recreates them after every iteration.
Example
{'WF:ACAA-CR (auto)': [{'Workflow': ['Manager'], 'Summary': []},
{'Workflow': ['Access Responsible'], 'Summary': []},
{'Workflow': ['Automatic'], 'Summary': []}]
}
How can I create a dictionary similar to what I wrote above?
IIUC, this should help:
val = df.groupby('Application')['WorkflowStep'].unique()
{app: [{'WorkflowStep': list(steps), 'Summary': len(steps)}] for app, steps in val.items()}
which results in:
{'WF:ACAA-CR (auto)': [{'WorkflowStep': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3}],
'WF:ACAA-CR-AccResp (auto)': [{'WorkflowStep': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3}],
'WF:ACAA-CR-IT-AccResp[AUTO]': [{'WorkflowStep': ['Group', 'Access Responsible', 'Automatic'], 'Summary': 3}]}
I think meW's answer is a much better way of doing things, and it takes advantage of the power of dataframes, but for reference, if you wanted to do it the way you were trying, I think this will work:
import pandas as pd

# Create the data for testing.
d = {'Application': ["WF:ACAA-CR (auto)", "WF:ACAA-CR (auto)", "WF:ACAA-CR (auto)",
                     "WF:ACAA-CR-AccResp (auto)", "WF:ACAA-CR-AccResp (auto)", "WF:ACAA-CR-AccResp (auto)"],
     'WorkflowStep': ["Manager", "Access Responsible","Automatic","Manager","Access Responsible", "Automatic"]}
df = pd.DataFrame(d)
new_dict = dict()
# Iterate through the rows of the data frame.
for index, row in df.iterrows():
    # Get the values for the current row.
    current_application_id = row['Application']
    current_workflowstep = row['WorkflowStep']
    # Set the default values if not already set.
    new_dict.setdefault(current_application_id, {'Workflow': [], 'Summary' : 0})
    # Add the new values.
    new_dict[current_application_id]['Workflow'].append(current_workflowstep)
    new_dict[current_application_id]['Summary'] += 1
print(new_dict)
Which gives an output of:
{'WF:ACAA-CR (auto)': {'Workflow': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3},
'WF:ACAA-CR-AccResp (auto)': {'Workflow': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3}}
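If you need the exact shape from the question, where each value is a one-element list, a small variation of the same idea should work. This is only a sketch under some assumptions: it uses a plain list of (application, step) pairs in place of the dataframe rows, and it counts each distinct step once.

```python
# Hypothetical (application, step) pairs standing in for the dataframe rows.
rows = [
    ("WF:ACAA-CR (auto)", "Manager"),
    ("WF:ACAA-CR (auto)", "Access Responsible"),
    ("WF:ACAA-CR (auto)", "Automatic"),
    ("WF:ACAA-CR-IT-AccResp[AUTO]", "Group"),
    ("WF:ACAA-CR-IT-AccResp[AUTO]", "Access Responsible"),
]

result = {}
for application, step in rows:
    # One list holding a single dict per application, created on first sight.
    entry = result.setdefault(application, [{"Workflow": [], "Summary": 0}])[0]
    if step not in entry["Workflow"]:
        entry["Workflow"].append(step)
        entry["Summary"] = len(entry["Workflow"])

print(result)
```

With a dataframe, the pairs could come from zip(df['Application'], df['WorkflowStep']).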