I’m working with a CSV file that looks as follows:
POS,Transaction id,Product,Quantity,Customer,Date
1,E100,TV,1,Test Customer,9/19/2022
2,E100,Laptop,3,Test Customer,9/20/2022
3,E200,TV,1,Test Customer,9/21/2022
4,E300,Smartphone,2,Test Customer,9/22/2022
5,E300,Laptop,5,New Customer,9/23/2022
6,E300,TV,1,New Customer,9/23/2022
7,E400,TV,2,ABC,9/24/2022
8,E500,Smartwatch,4,ABC,9/25/2022
To pull individual fields out of it, I read each row with csv.DictReader and bind each value to a name:
with open(obj.file_name.path, 'r') as f:
    rdr = csv.DictReader(f)
    for row in rdr:
        pos = row['POS']
        product = row['Product']
        transaction_id = row['Transaction id']
        quantity = row['Quantity']
        customer = row['Customer']
        date = row['Date']
        try:
            product_obj = Product.objects.get(name__iexact=product)
        except Product.DoesNotExist:
            product_obj = None
For example, to unpack the values of each row into those names and print the row, I can now write:
pos, transaction_id, product, quantity, customer, date = row.values()
print(row)
Resulting in this terminal output:
file is being uploaded
{'POS': '1', 'Transaction id': 'E100', 'Product': 'TV', 'Quantity': '1', 'Customer': 'Test Customer', 'Date': '9/19/2022'}
{'POS': '2', 'Transaction id': 'E100', 'Product': 'Laptop', 'Quantity': '3', 'Customer': 'Test Customer', 'Date': '9/20/2022'}
{'POS': '3', 'Transaction id': 'E200', 'Product': 'TV', 'Quantity': '1', 'Customer': 'Test Customer', 'Date': '9/21/2022'}
{'POS': '4', 'Transaction id': 'E300', 'Product': 'Smartphone', 'Quantity': '2', 'Customer': 'Test Customer', 'Date': '9/22/2022'}
{'POS': '5', 'Transaction id': 'E300', 'Product': 'Laptop', 'Quantity': '5', 'Customer': 'New Customer', 'Date': '9/23/2022'}
{'POS': '6', 'Transaction id': 'E300', 'Product': 'TV', 'Quantity': '1', 'Customer': 'New Customer', 'Date': '9/23/2022'}
{'POS': '7', 'Transaction id': 'E400', 'Product': 'TV', 'Quantity': '2', 'Customer': 'ABC', 'Date': '9/24/2022'}
{'POS': '8', 'Transaction id': 'E500', 'Product': 'Smartwatch', 'Quantity': '4', 'Customer': 'ABC', 'Date': '9/25/2022'}
So it works. What I'm struggling with, however, is how to access one particular dictionary (say, the one containing POS: 1), or how to see all the dictionaries containing the product TV. How would I go about this?
Edit:
Even though extracting product works, the lookup that assigns product_obj always ends up as None. Does anyone know what the reason for this might be?
rdr = csv.DictReader(...) is (probably) creating the rows lazily as it reads them.
Either collect them all into a list() and index the row you're after with [index], or break when you find the row with the content you want:
for row in csv.DictReader(...):
    if row.get(search_key) == search_value:
        break  # or return from a function
else:
    raise ValueError(f"failed to find {search_key}")
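If you need repeated lookups, here is a minimal sketch of the list approach instead (assuming the CSV above is at a hypothetical path data.csv):

import csv

with open('data.csv') as f:   # hypothetical path
    rows = list(csv.DictReader(f))

pos_1 = next(r for r in rows if r['POS'] == '1')   # the dictionary with POS 1
tvs = [r for r in rows if r['Product'] == 'TV']    # all dictionaries with Product TV

Note that DictReader yields every field as a string, which is why the comparisons use '1' rather than 1.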
I wasn't able to run your code on my computer, but there are a lot of ways to do what you're attempting. I would recommend:
https://blog.finxter.com/how-to-filter-a-dictionary-in-python/
If possible, I would recommend using pandas.
import pandas as pd

file = "Your file path here"
df = pd.read_csv(file)

# To filter for POS == 1
print(df[df['POS'] == 1])

# To filter for Product == 'TV'
print(df[df['Product'] == 'TV'])
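Note that read_csv infers column types, so POS is parsed as an integer; that is why the filter compares against 1 rather than '1', while Product remains a string.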
I'm trying to collect specific key values from a list of dictionaries. I believe my code is not flattening out the dictionaries: when I call chunkdata.extend(...), chunkdata ends up containing the whole first, second, and third dictionaries, whereas I want something like the 'name' key/value pair from each of the dictionaries that come back in the response.
chunkdata = []
for chunk in chunklist:
    url3 = "some URL"
    headers = {'accept': 'application/json'}
    response = requests.request("GET", url3, headers=headers)
    time.sleep(5)
    print(response.text)
    pythondict = json.loads(response.text)
    print(pythondict)
    chunkdata.extend(pythondict['name']['age']['date'])
pythondict output
[{'data': {'name': 'jon', 'age': '30', 'date': '2020-01-05', 'time': '1', 'color': 'blue'}}, {'data': {'name': 'phil', 'age': '33', 'date': '2020-01-05', 'time': '1', 'color': 'blue'}}, {'data': {'name': 'ted', 'age': '25', 'date': '2020-01-05', 'time': '1', 'color': 'blue'}}]
Traceback (most recent call last):
File line 84, in <module>
chunkdata.extend(pythondict['name']['age']['date'])
TypeError: list indices must be integers or slices, not str
Use response.json() for parsing instead of json.loads(response.text); it is more reliable and convenient.
Note: .json() raises an error if the body isn't valid JSON, so the endpoint must actually return JSON (normally signalled by a Content-Type: application/json response header).
The JSON you're getting back is awkwardly shaped here: each record is wrapped in a 'data' key, and I couldn't see why that extra level is necessary.
It would be easier to work with in the following form:
python_dict=[{'name': 'jon', 'age': '30', 'date': '2020-01-05', 'time': '1', 'color': 'blue'}, {'name': 'phil', 'age': '33', 'date': '2020-01-05', 'time': '1', 'color': 'blue'}, {'name': 'ted', 'age': '25', 'date': '2020-01-05', 'time':'1', 'color': 'blue'}]
Modify the relevant part of the code as follows:
chunkdata = []
for x in range(len(python_dict)):
    temp_list = [python_dict[x]['name'], python_dict[x]['age'], python_dict[x]['date'], python_dict[x]['time'], python_dict[x]['color']]
    chunkdata.append(temp_list)
print(chunkdata)
chunkdata will be a list of lists that you can keep appending into. The output for chunkdata is as follows:
[['jon', '30', '2020-01-05', '1', 'blue'], ['phil', '33',
'2020-01-05', '1', 'blue'], ['ted', '25', '2020-01-05', '1', 'blue']]
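If you'd rather not reshape the response, here is a minimal sketch that works directly on the original structure (each record nested under a 'data' key), pulling out just the fields you need:

chunkdata = [[d['data']['name'], d['data']['age'], d['data']['date']] for d in pythondict]

This produces [['jon', '30', '2020-01-05'], ['phil', '33', '2020-01-05'], ['ted', '25', '2020-01-05']].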
I am declaring a method named add_to_cart(db, itemid, quantity). Whenever the method is called, it looks in the database for the session data. The session data contains a list of dictionaries. The purpose of this method is to create a new entry (dictionary) in the list or update the value of an existing entry.
Each dictionary has the following keys: id, quantity
So far I have developed the following code. After fetching the data from the database, I match the itemid against the dictionary key 'id'. If the itemid does not match any of the values in the dictionaries, a new dictionary is appended to the list.
def add_to_cart(db, itemid, quantity):
    # ......
    row = cursor.fetchone()
    if row is not None:
        cart = json.loads(row['data'])
        for dic in cart:
            if str(dic.get("id")) == str(itemid):
                dic['quantity'] = int(dic['quantity']) + quantity
                data = json.dumps(cart)
                # update the 'data' to the database
                break
            else:
                if counter == len(cart):  # note: counter is never defined or incremented in this snippet
                    item = {
                        'id': itemid,
                        'quantity': quantity
                    }
                    cart.append(item)
                    data = json.dumps(cart)
                    # update the 'data' to the database
                    break
Suppose the initial cart is:
[{'id': '40', 'quantity': '2'}, {'id': '41', 'quantity': '5'}]
When I add 1 more of item 40 to the cart, this should become:
[{'id': '40', 'quantity': '3'}, {'id': '41', 'quantity': '5'}]
but I am getting :
[{'id': '40', 'quantity': '2'}, {'id': '41', 'quantity': '5'}, {'id': '40', 'quantity': '1'}]
You are adding a new dictionary to the list when you do cart.append(item),
hence the list
[{'id': '40', 'quantity': '2'}, {'id': '41', 'quantity': '5'}]
ends up being
[{'id': '40', 'quantity': '2'}, {'id': '41', 'quantity': '5'}, {'id': '40', 'quantity': '1'}]
But you want to find the matching id in that list of dictionaries, and add to the quantity for that dictionary.
So the code will look like this:
li = [{'id': '40', 'quantity': '2'}, {'id': '41', 'quantity': '5'}]

def add_elem(li, id, to_add):
    # Iterate over the dictionaries
    for item in li:
        # If the id is found (compare against the 'id' key specifically,
        # so a quantity with the same value can't match by accident)
        if item.get('id') == str(id):
            # Increment the quantity, keeping it a string
            item['quantity'] = str(int(item['quantity']) + to_add)
    # Return the updated list
    return li

print(add_elem(li, 40, 1))
The output will be
[{'id': '40', 'quantity': '3'}, {'id': '41', 'quantity': '5'}]
The problem is that you are simply adding a new dict to the list (cart) through append. You need to go through the list, find the dict with the matching itemid, and then add to its quantity.
Try this:
for dic in cart:
    if dic['id'] == str(itemid):
        # convert to int before adding, otherwise '2' + '1' gives '21'
        dic['quantity'] = str(int(dic['quantity']) + quantity)
        break
else:
    # the else only runs if no break happened, i.e. the item wasn't found
    item = {
        'id': itemid,
        'quantity': quantity
    }
    cart.append(item)
I have a set of dictionary objects with a structure that looks like this:
{'android_id': 'ds cgethcvwrzvbjezrzve',
'app': 'hndbfhjdfhf bnmhjknuihklmmkbghjbtfgjnkluilnkkfbnjtkjzn',
'app_ver': '10.0.1_0',
'at': '2016-02-02 23:59:47',
'birth_date': 1447896843,
'browser': 'Android 4',
'carrier': 'Comcast Cable',
'city_name': 'Jacksonville',
'country': 'us',
'custom': {'Action': 'Click',
'Campaign ID': '167713',
'Creative ID': '113961',
'Creative Type': 'Alert',
'Schema Version - Client': '3',
'Schema Version - Server': '1'},
'customer_ids': {'customer_id': '1234587612545464525441540341414'},
'data_conn': 'android_network_type_3',
'device_new': False,
}
My question is: how do I access the nested keys to produce columns in a Pandas DataFrame? I imported json_normalize from pandas.io.json
and tried json_normalize(dictionary), but the performance is quite bad because I have about 200,000 entries I would like to normalize. Any help on this is greatly appreciated.
You can un-nest the data and then construct your dataframe. Here is how to un-nest it:
df = {'android_id': 'ds cgethcvwrzvbjezrzve',
'app': 'hndbfhjdfhf bnmhjknuihklmmkbghjbtfgjnkluilnkkfbnjtkjzn',
'app_ver': '10.0.1_0',
'at': '2016-02-02 23:59:47',
'birth_date': 1447896843,
'browser': 'Android 4',
'carrier': 'Comcast Cable',
'city_name': 'Jacksonville',
'country': 'us',
'custom': {'Action': 'Click',
'Campaign ID': '167713',
'Creative ID': '113961',
'Creative Type': 'Alert',
'Schema Version - Client': '3',
'Schema Version - Server': '1'},
'customer_ids': {'customer_id': '1234587612545464525441540341414'},
'data_conn': 'android_network_type_3',
'device_new': False,
}
sub_df1 = df.pop('custom')
sub_df2 = df.pop('customer_ids')
df.update(sub_df1)
df.update(sub_df2)
# you can define a function to do this
def un_nest(df):
    sub_df1 = df.pop('custom')
    sub_df2 = df.pop('customer_ids')
    df.update(sub_df1)
    df.update(sub_df2)
    return df
This gives output like
{'Action': 'Click',
'Campaign ID': '167713',
'Creative ID': '113961',
'Creative Type': 'Alert',
'Schema Version - Client': '3',
'Schema Version - Server': '1',
'android_id': 'ds cgethcvwrzvbjezrzve',
'app': 'hndbfhjdfhf bnmhjknuihklmmkbghjbtfgjnkluilnkkfbnjtkjzn',
'app_ver': '10.0.1_0',
'at': '2016-02-02 23:59:47',
'birth_date': 1447896843,
'browser': 'Android 4',
'carrier': 'Comcast Cable',
'city_name': 'Jacksonville',
'country': 'us',
'customer_id': '1234587612545464525441540341414',
'data_conn': 'android_network_type_3',
'device_new': False}
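From there, constructing the DataFrame is one call; here is a minimal sketch assuming your 200,000 entries sit in a hypothetical list called records:

import pandas as pd

flat = [un_nest(d) for d in records]  # un_nest mutates each dict in place via pop/update
df = pd.DataFrame(flat)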
Pandas has a utility function, pd.io.json.json_normalize (available as pd.json_normalize in newer versions), to do this:
import pandas as pd
df = pd.io.json.json_normalize({'android_id': 'ds cgethcvwrzvbjezrzve',
'app': 'hndbfhjdfhf bnmhjknuihklmmkbghjbtfgjnkluilnkkfbnjtkjzn',
'app_ver': '10.0.1_0',
'at': '2016-02-02 23:59:47',
'birth_date': 1447896843,
'browser': 'Android 4',
'carrier': 'Comcast Cable',
'city_name': 'Jacksonville',
'country': 'us',
'custom': {'Action': 'Click',
'Campaign ID': '167713',
'Creative ID': '113961',
'Creative Type': 'Alert',
'Schema Version - Client': '3',
'Schema Version - Server': '1'},
'customer_ids': {'customer_id': '1234587612545464525441540341414'},
'data_conn': 'android_network_type_3',
'device_new': False,
})
df.columns
Output is
['android_id', 'app', 'app_ver', 'at', 'birth_date', 'browser',
'carrier', 'city_name', 'country', 'custom.Action',
'custom.Campaign ID', 'custom.Creative ID', 'custom.Creative Type',
'custom.Schema Version - Client', 'custom.Schema Version - Server',
'customer_ids.customer_id', 'data_conn', 'device_new']
Notice how the function created the nested columns you want. For example: custom.Action and custom.Campaign ID.
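If json_normalize itself is the bottleneck at 200,000 entries, flattening with a plain function first and building the DataFrame once is often faster. A sketch, assuming records is a hypothetical list of your dicts (note this drops the custom. column prefix that json_normalize adds; reinstate it if you need those exact names):

import pandas as pd

def flatten(d):
    # keep the top-level scalars, then splice in the two nested dicts
    out = {k: v for k, v in d.items() if k not in ('custom', 'customer_ids')}
    out.update(d.get('custom', {}))
    out.update(d.get('customer_ids', {}))
    return out

df = pd.DataFrame(flatten(d) for d in records)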
New to Python here. I've been pulling my hair out for hours and still can't figure this out.
I have a list of dictionaries:
[ {'FX0XST001.MID5': '195', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
  {'FX0XST001.MID13': '4929', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
  {'FX0XST001.MID6': '826', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
  .
  .
  .
  .
  {'FX0XST001.MID6': '125', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
  {'FX0XST001.MID25': '70', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
  {'FX0XST001.MID40': '40', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'} ]
I want to merge the dictionaries in the list based on their Type, Name, and Taxonomy ID
[ {'FX0XST001.MID5': '195', 'FX0XST001.MID13': '4929', 'FX0XST001.MID6': '826', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'}
.
.
.
.
{'FX0XST001.MID6': '125', 'FX0XST001.MID25': '70', 'FX0XST001.MID40': '40', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'}]
I have the data structured like this because I need to write it to CSV using csv.DictWriter later.
Would anyone kindly point me to the right direction?
You can use the groupby function for this:
http://docs.python.org/library/itertools.html#itertools.groupby
from itertools import groupby

keyfunc = lambda row: (row['Type'], row['Taxonomy ID'], row['Name'])

result = []
data = sorted(data, key=keyfunc)  # groupby only merges adjacent rows, so sort first
for k, g in groupby(data, keyfunc):
    # you can either merge the matching rows into one item, so you end up
    # with what you wanted:
    item = {}
    for row in g:
        item.update(row)
    result.append(item)

    # or you could add the matched rows as subitems of a parent dictionary,
    # which might come in handy if you need to work with just the parts
    # that are different (note g is consumed above, so use one approach or
    # the other, not both):
    #
    # item = {'Type': k[0], 'Taxonomy ID': k[1], 'Name': k[2], 'matches': []}
    # for row in g:
    #     del row['Type']
    #     del row['Taxonomy ID']
    #     del row['Name']
    #     item['matches'].append(row)
    # result.append(item)
Make some test data:
list_of_dicts = [
{"Taxonomy ID":1, "Name":"Bob", "Type":"M", "hair":"brown", "eyes":"green"},
{"Taxonomy ID":1, "Name":"Bob", "Type":"M", "height":"6'2''", "weight":200},
{"Taxonomy ID":2, "Name":"Alice", "Type":"F", "hair":"black", "eyes":"hazel"},
{"Taxonomy ID":2, "Name":"Alice", "Type":"F", "height":"5'7''", "weight":145}
]
I think this (below) is a neat trick using reduce that improves upon the other groupby solution.
import itertools
from functools import reduce

def key_func(elem):
    return (elem["Taxonomy ID"], elem["Name"], elem["Type"])

# groupby only merges adjacent rows, so the input must already be sorted
# by key_func (the test data above is)
output_list_of_dicts = [reduce(lambda x, y: x.update(y) or x, list(val))
                        for key, val in itertools.groupby(list_of_dicts, key_func)]
Then print the output:
for elem in output_list_of_dicts:
    print(elem)
This prints:
{'eyes': 'green', 'Name': 'Bob', 'weight': 200, 'Taxonomy ID': 1, 'hair': 'brown', 'height': "6'2''", 'Type': 'M'}
{'eyes': 'hazel', 'Name': 'Alice', 'weight': 145, 'Taxonomy ID': 2, 'hair': 'black', 'height': "5'7''", 'Type': 'F'}
FYI, Python Pandas is far better for this sort of aggregation, especially when dealing with file I/O to .csv or .h5 files, than the itertools stuff.
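For example, here is a minimal pandas version of the same merge (assuming the test data above; GroupBy.first() takes the first non-null value per column within each group, which is what makes the rows collapse correctly):

import pandas as pd

df = pd.DataFrame(list_of_dicts)
merged = df.groupby(['Taxonomy ID', 'Name', 'Type'], as_index=False).first()
print(merged)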
Perhaps the easiest thing to do would be to create a new dictionary, indexed by a (Type, Name, Taxonomy ID) tuple, and iterate over your list of dictionaries, merging entries under that tuple. Use a defaultdict to make this easier. For example:
from collections import defaultdict

grouped = defaultdict(dict)

# iterate over items and store:
for entry in list_of_dictionaries:
    grouped[(entry["Type"], entry["Name"], entry["Taxonomy ID"])].update(entry)

# now you have everything stored the way you want in values, and you don't
# need the dict anymore
grouped_entries = list(grouped.values())
This is a bit hackish, especially because you end up overwriting "Type", "Name", and "Taxonomy ID" every time you use update, but since your dict keys are variable, that might be the best you can do. This will get you at least close to what you need.
Even better would be to do this on your initial import and skip the intermediate steps (unless you actually need to transform the data beforehand). Plus, if you could get at the one varying field, you could change the update to just grouped[(type, name, taxonomy_id)][key] = value, where key and value are something like 'FX0XST001.MID5' and '195'.
from itertools import groupby

data = [{'FX0XST001.MID5': '195', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
        {'FX0XST001.MID13': '4929', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
        {'FX0XST001.MID6': '826', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
        {'FX0XST001.MID6': '125', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
        {'FX0XST001.MID25': '70', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
        {'FX0XST001.MID40': '40', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'}]

kk = ('Name', 'Taxonomy ID', 'Type')

def key(item):
    return tuple(item[k] for k in kk)

result = []
data = sorted(data, key=key)
for k, g in groupby(data, key):
    result.append(dict((i, j) for d in g for i, j in d.items()))
print(result)
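With the data above this should print something like the following (Acidobacteria sorts first; key order within each dict may vary):

[{'FX0XST001.MID6': '125', 'FX0XST001.MID25': '70', 'FX0XST001.MID40': '40', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'}, {'FX0XST001.MID5': '195', 'FX0XST001.MID13': '4929', 'FX0XST001.MID6': '826', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'}]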
I'm using Python to fetch issues from Jira over XML-RPC. It works well, except the returned dictionary is missing the 'Resolution' field, e.g. 'Fixed', 'Won't Fix', etc.
This is how I get the issue from Jira:
import xmlrpclib
s = xmlrpclib.ServerProxy('http://myjira.com/rpc/xmlrpc')
auth = s.jira1.login('user', 'pass')
issue = s.jira1.getIssue(auth, 'PROJ-28')
print issue.keys()
And this is the list of fields that I get back:
['status', 'project', 'attachmentNames', 'votes', 'updated',
'components', 'reporter', 'customFieldValues', 'created',
'fixVersions', 'summary', 'priority', 'assignee', 'key',
'affectsVersions', 'type', 'id', 'description']
The full content is:
{'affectsVersions': [{'archived': 'false',
'id': '11314',
'name': 'v3.09',
'released': 'false',
'sequence': '7'}],
'assignee': 'myuser',
'attachmentNames': '2011-08-17_attach.tar.gz',
'components': [],
'created': '2011-06-14 12:33:54.0',
'customFieldValues': [{'customfieldId': 'customfield_10040', 'values': ''},
{'customfieldId': 'customfield_10010',
'values': 'Normal'}],
'description': "Blah blah...\r\n",
'fixVersions': [],
'id': '28322',
'key': 'PROJ-28',
'priority': '3',
'project': 'PROJ',
'reporter': 'myuser',
'status': '1',
'summary': 'blah blah...',
'type': '1',
'updated': '2011-08-18 15:41:04.0',
'votes': '0'}
When I do:
resolutions = s.jira1.getResolutions(auth)
pprint.pprint(resolutions)
I get:
[{'description': 'A fix for this issue is checked into the tree and tested.',
'id': '1',
'name': 'Fixed'},
{'description': 'The problem described is an issue which will never be fixed.',
'id': '2',
'name': "Won't Fix"},
{'description': 'The problem is a duplicate of an existing issue.',
'id': '3',
'name': 'Duplicate'},
{'description': 'The problem is not completely described.',
'id': '4',
'name': 'Incomplete'},
{'description': 'All attempts at reproducing this issue failed, or not enough information was available to reproduce the issue. Reading the code produces no clues as to why this behavior would occur. If more information appears later, please reopen the issue.',
'id': '5',
'name': 'Cannot Reproduce'},
{'description': 'Code is checked in, and is, er, ready for build.',
'id': '6',
'name': 'Ready For Build'},
{'description': 'Invalid bug', 'id': '7', 'name': 'Invalid'}]
The Jira version is v4.1.1#522 and I using Python 2.7.
Any ideas why I don't get a field called 'resolution'?
Thanks!
The answer is that the getIssue method in JiraXmlRpcService.java calls makeIssueStruct with a RemoteIssue object. The RemoteIssue object contains the Resolution field, but makeIssueStruct copies only values that are set, so if the resolution is not set it won't appear in the returned Hashtable.
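In other words, a missing 'resolution' key just means the issue is unresolved. A minimal Python 2.7 sketch of resolving the name when the field is present (this assumes the field, when set, holds the resolution id as a string matching the ids from getResolutions, which is worth verifying against your instance):

res_by_id = dict((r['id'], r['name']) for r in s.jira1.getResolutions(auth))
resolution_name = res_by_id.get(issue.get('resolution'), 'Unresolved')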