I have a string that looks like this:
Standard,NonStandard=[Hybrid,Non-standard,Preferred],AnotherOne=[a, b, c]
and am looking into ways to convert it to this dictionary / json structure via Python.
[{
'value': 'Standard',
},
{
'value': 'NonStandard',
'sub': [ 'Hybrid', 'Non-standard', 'Preferred' ]
},
{
'value': 'AnotherOne',
'sub': [ 'a', 'b', 'c']
}]
I think I can do this via looping over strings and keeping track of the =[ and closing ] but was wondering if there is a more "pythonic" solution.
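One possible approach, as a minimal sketch rather than a definitive answer: a regular expression that matches a name optionally followed by =[...]. It assumes that names and sub-values never contain commas, equals signs or brackets, and that the bracketed lists are not nested. parse_items is a hypothetical helper name.

```python
import re

# A name, optionally followed by =[...] holding comma-separated sub-values.
ITEM = re.compile(r'([^,=\[\]]+)(?:=\[([^\]]*)\])?')

def parse_items(text):
    items = []
    for match in ITEM.finditer(text):
        entry = {'value': match.group(1).strip()}
        # group(2) is None when the item has no =[...] part
        if match.group(2) is not None:
            entry['sub'] = [s.strip() for s in match.group(2).split(',')]
        items.append(entry)
    return items

text = 'Standard,NonStandard=[Hybrid,Non-standard,Preferred],AnotherOne=[a, b, c]'
print(parse_items(text))
```

If the format can nest or quote values, a small hand-written parser is likely the safer choice.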
I have a list of dictionaries as below and I'd like to create a dictionary to store specific data from the list.
test_list = [
{
'id':1,
'colour':'Red',
'name':'Apple',
'edible': True,
'price':100
},
{
'id':2,
'colour':'Blue',
'name':'Blueberry',
'edible': True,
'price':200
},
{
'id':3,
'colour':'Yellow',
'name':'Crayon',
'edible': False,
'price':300
}
]
For instance, a new dictionary to store just the {id, name, price} of the various items.
I created several lists:
id_list = []
name_list = []
price_list = []
Then I added the data I want to each list:
for n in test_list:
    id_list.append(n['id'])
    name_list.append(n['name'])
    price_list.append(n['price'])
But I can't figure out how to create a dictionary (or a more appropriate structure?) to store the data in the {id, name, price} format I'd like. Appreciate help!
If you don't have too much data, you can use this nested list/dictionary comprehension:
keys = ['id', 'name', 'price']
result = {k: [x[k] for x in test_list] for k in keys}
That'll give you:
{
'id': [1, 2, 3],
'name': ['Apple', 'Blueberry', 'Crayon'],
'price': [100, 200, 300]
}
I think a list of dictionaries is still the right data format, so this:
test_list = [
{
'id':1,
'colour':'Red',
'name':'Apple',
'edible': True,
'price':100
},
{
'id':2,
'colour':'Blue',
'name':'Blueberry',
'edible': True,
'price':200
},
{
'id':3,
'colour':'Yellow',
'name':'Crayon',
'edible': False,
'price':300
}
]
keys = ['id', 'name', 'price']
limited = [{k: v for k, v in d.items() if k in keys} for d in test_list]
print(limited)
Result:
[{'id': 1, 'name': 'Apple', 'price': 100}, {'id': 2, 'name': 'Blueberry', 'price': 200}, {'id': 3, 'name': 'Crayon', 'price': 300}]
This is nice, because you can access its parts like limited[1]['price'].
However, your use case is perfect for pandas, if you don't mind using a third party library:
import pandas as pd
test_list = [
{
'id':1,
'colour':'Red',
'name':'Apple',
'edible': True,
'price':100
},
{
'id':2,
'colour':'Blue',
'name':'Blueberry',
'edible': True,
'price':200
},
{
'id':3,
'colour':'Yellow',
'name':'Crayon',
'edible': False,
'price':300
}
]
df = pd.DataFrame(test_list)
print(df['price'][1])
print(df)
The DataFrame is perfect for this stuff and selecting just the columns you need:
keys = ['id', 'name', 'price']
df_limited = df[keys]
print(df_limited)
The reason I'd prefer either of these to a dictionary of lists is that manipulating a dictionary of lists gets complicated and error-prone, and accessing a single record means accessing three separate lists. There aren't many advantages to that approach, except maybe that some operations on lists will be faster if you access a single attribute more often. But in that case, pandas wins handily.
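To illustrate the point about single records, here is a small sketch using the same data as above (record_from_lists and record_from_dicts are names made up for this illustration):

```python
# Dictionary of lists: one list per attribute
result = {
    'id': [1, 2, 3],
    'name': ['Apple', 'Blueberry', 'Crayon'],
    'price': [100, 200, 300],
}

# List of dictionaries: one dictionary per record
limited = [
    {'id': 1, 'name': 'Apple', 'price': 100},
    {'id': 2, 'name': 'Blueberry', 'price': 200},
    {'id': 3, 'name': 'Crayon', 'price': 300},
]

# Rebuilding the second record means indexing every list separately...
record_from_lists = {key: values[1] for key, values in result.items()}

# ...whereas the list of dictionaries already stores it as one object.
record_from_dicts = limited[1]

print(record_from_lists == record_from_dicts)
```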
In the comments you asked "Let's say I had item_names = ['Apple', 'Teddy', 'Crayon'] and I wanted to check if one of those item names was in the df_limited variable or I guess the df_limited['name'] - is there a way to do that, and if it is then print say the price, or manipulate the price?"
There are many ways, of course. I recommend looking into some online pandas tutorials, because it's a very popular library and there's excellent documentation and teaching material online.
However, just to show how easy it would be in both cases, retrieving the matching objects or just the prices for them:
item_names = ['Apple', 'Teddy', 'Crayon']
items = [d for d in test_list if d['name'] in item_names]
print(items)
item_prices = [d['price'] for d in test_list if d['name'] in item_names]
print(item_prices)
items = df[df['name'].isin(item_names)]
print(items)
item_prices = df[df['name'].isin(item_names)]['price']
print(item_prices)
Results:
[{'id': 1, 'colour': 'Red', 'name': 'Apple', 'edible': True, 'price': 100}, {'id': 3, 'colour': 'Yellow', 'name': 'Crayon', 'edible': False, 'price': 300}]
[100, 300]
id name price
0 1 Apple 100
2 3 Crayon 300
0 100
2 300
In the example with the dataframe there are a few things to note. It uses .isin(), since the in operator won't work with the fancy way dataframes allow you to select data, df[<some condition on df using df>], but there are fast and easy-to-use alternatives for all standard operations in pandas. More importantly, you can just do the work on the original df - it already has everything you need in there.
And let's say you wanted to double the prices for these products:
df.loc[df['name'].isin(item_names), 'price'] *= 2
This uses .loc for technical reasons (you can't modify just any view of a dataframe), but that's way too much to get into in this answer - you'll learn about it when looking into pandas. It's pretty clean and simple though, I'm sure you agree. (You could use .loc for the previous example as well.)
In this trivial example, both run instantly, but you'll find that pandas performs better for very large datasets. Also, try writing the same examples using the method you requested (as provided in the accepted answer) and you'll find that it's not as elegant, unless you start by zipping everything together again:
item_prices = [p for i, n, p in zip(*result.values()) if n in item_names]
Getting out a result that has the same structure as result is trickier still: it involves more zipping and unpacking, or requires you to go over the lists twice.
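As a rough sketch of what that looks like, assuming result is the dictionary of lists from the accepted answer (keys, name_index, rows and filtered are illustrative names):

```python
result = {
    'id': [1, 2, 3],
    'name': ['Apple', 'Blueberry', 'Crayon'],
    'price': [100, 200, 300],
}
item_names = ['Apple', 'Teddy', 'Crayon']

keys = list(result)
name_index = keys.index('name')

# Transpose the columns into rows and keep only the matching rows
rows = [row for row in zip(*result.values()) if row[name_index] in item_names]

# Transpose back into a dictionary of lists with the original keys
filtered = {key: [row[i] for row in rows] for i, key in enumerate(keys)}
print(filtered)
```

Compare this with the one-line selections on the list of dictionaries or the dataframe.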
Requirement
My requirement is to have Python code extract some records from a database, format them, and upload the formatted JSON to a sink.
Planned approach
1. Create JSON-like templates for each record. E.g.
json_template_str = '''{{
    "type": "section",
    "fields": [
        {{
            "type": "mrkdwn",
            "text": "Today *{total_val}* customers saved {percent_derived}%."
        }}
    ]
}}'''
2. Extract records from DB to a dataframe.
3. Loop over the dataframe and replace the {var} variables in bulk using something like .format(**locals())
Question
I haven't worked with dataframes before.
What would be the best way to accomplish Step 3 ? Currently I am
3.1 Looping over the dataframe objects 1 by 1 for i, df_row in df.iterrows():
3.2 Assigning
total_val= df_row['total_val']
percent_derived= df_row['percent_derived']
3.3 In the loop, formatting the string and adding it to a list: block.append(json.loads(json_template_str.format(**locals())))
I was trying to use the assign() method of the dataframe, but could not figure out how to use something like a lambda function to create a new column with my expected value.
As a novice in pandas, I feel there might be a more efficient way to do this (which may even involve changing the JSON template string - which I can totally do). It would be great to hear thoughts and ideas.
Thanks for your time.
I would not write a JSON string by hand, but rather create a corresponding python object and then use the json library to convert it into a string. With this in mind, you could try the following:
import copy
import pandas as pd
# some sample data
df = pd.DataFrame({
'total_val': [100, 200, 300],
'percent_derived': [12.4, 5.2, 6.5]
})
# template dictionary for a single block
json_template = {
"type": "section",
"fields": [
{"type": "mrkdwn",
"text": "Today *{total_val:.0f}* customers saved {percent_derived:.1f}%."
}
]
}
# a function that will insert data from each row
# of the dataframe into a block
def format_data(row):
    json_t = copy.deepcopy(json_template)
    text_t = json_t["fields"][0]["text"]
    json_t["fields"][0]["text"] = text_t.format(
        total_val=row['total_val'], percent_derived=row['percent_derived'])
    return json_t
# create a list of blocks
result = df.agg(format_data, axis=1).tolist()
The resulting list looks as follows, and can be converted into a JSON string if needed:
[{
'type': 'section',
'fields': [{
'type': 'mrkdwn',
'text': 'Today *100* customers saved 12.4%.'
}]
}, {
'type': 'section',
'fields': [{
'type': 'mrkdwn',
'text': 'Today *200* customers saved 5.2%.'
}]
}, {
'type': 'section',
'fields': [{
'type': 'mrkdwn',
'text': 'Today *300* customers saved 6.5%.'
}]
}]
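If a JSON string is needed, the standard json module can serialize such a list. A minimal sketch, with blocks standing in for the result list above (shortened to a single entry) and payload as an illustrative name:

```python
import json

# Stand-in for the list of blocks built above
blocks = [{
    'type': 'section',
    'fields': [{
        'type': 'mrkdwn',
        'text': 'Today *100* customers saved 12.4%.'
    }]
}]

# Serialize to a JSON string; indent is optional and only affects readability
payload = json.dumps(blocks, indent=2)
print(payload)
```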
I have a list of dictionaries called api_data, where each dictionary has this structure:
{
'location':
{
'indoor': 0,
'exact_location': 0,
'latitude': '45.502',
'altitude': '133.9',
'id': 12780,
'country': 'IT',
'longitude': '9.146'
},
'sampling_rate': None,
'id': 91976363,
'sensordatavalues':
[
{
'value_type': 'P1',
'value': '8.85',
'id': 197572463
},
{
'value_type': 'P2',
'value': '3.95',
'id': 197572466
},
{
'value_type': 'temperature',
'value': '20.80',
'id': 197572625
},
{
'value_type': 'humidity',
'value': '97.70',
'id': 197572626
}
],
'sensor':
{
'id': 24645,
'sensor_type':
{
'name': 'DHT22',
'id': 9,
'manufacturer':
'various'
},
'pin': '7'
},
'timestamp': '2020-04-18 18:37:50'
},
This structure is not complete for each dictionary, meaning that sometimes a dictionary, a list element or a key is missing.
I want to extract the value of one key when another key in the same dictionary is equal to a certain value.
For example, for the entries in the sensordatavalues list, I want the value of the key 'value' when 'value_type' is equal to 'P1'.
I have written the code below using for loops and if statements, but I bet it is heavily inefficient.
How can I do it in a quicker and more efficient way?
Please note that sensordatavalues always exists
for sensor in api_data:
    sensordatavalues = sensor['sensordatavalues']
    # L_sdv = len(sensordatavalues)
    for physical_quantity_recorded in sensordatavalues:
        if physical_quantity_recorded['value_type'] == 'P1':
            PM10_value = physical_quantity_recorded['value']
If you are confident that the value 'P1' is unique to the key you are searching, you can use the 'in' operator with dict.values().
It should be OK to omit this assignment: sensordatavalues = sensor['sensordatavalues']
for sensor in api_data:
    for physical_quantity_recorded in sensor['sensordatavalues']:
        if 'P1' in physical_quantity_recorded.values():
            PM10_value = physical_quantity_recorded['value']
You just need one for loop per dictionary (here api_data stands for a single dictionary from the list):
for x in api_data["sensordatavalues"]:
    if x["value_type"] == "P1":
        print(x["value"])
Output:
8.85
Use the dict.get() method: if the key does not exist, it returns the default value instead of raising a KeyError (api_data is a single dictionary here).
for physical_quantity_recorded in api_data['sensordatavalues']:
    if physical_quantity_recorded.get('value_type', 'default_value') == 'P1':
        PM10_value = physical_quantity_recorded.get('value', 'default_value')
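Since the question says that keys or list elements are sometimes missing, the same idea can be wrapped in a small function and applied to the whole list. This is only a sketch: find_value is a hypothetical helper, and the sample api_data below is a cut-down stand-in for the real records.

```python
def find_value(record, value_type, default=None):
    """Return the 'value' of the first entry whose 'value_type' matches."""
    for entry in record.get('sensordatavalues', []):
        if entry.get('value_type') == value_type:
            return entry.get('value', default)
    return default

# Cut-down stand-in data: one full record, one without P1, one empty
api_data = [
    {'sensordatavalues': [
        {'value_type': 'P1', 'value': '8.85'},
        {'value_type': 'P2', 'value': '3.95'},
    ]},
    {'sensordatavalues': [
        {'value_type': 'temperature', 'value': '20.80'},
    ]},
    {'sensordatavalues': []},
]

pm10_values = [find_value(record, 'P1') for record in api_data]
print(pm10_values)  # ['8.85', None, None]
```

Returning as soon as a match is found also avoids scanning the rest of the list, which is about as efficient as a plain-Python lookup over a list can get.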
This is an alternative: jmespath, which allows you to search and filter a nested dict/JSON.
A summary of jmespath: to access a key, use the . notation; if your values are in a list, you access them via the [] notation.
NB: the dict is stored in a variable called data.
import jmespath
# sensordatavalues is a key, so we can access it directly
# the values of sensordatavalues are wrapped in a list
# to access them we use the brackets ([])
# we are interested in the dict where value_type is P1
# in jmespath, a filter expression is introduced with a ? mark
# pass the filter
# and finally access the key we are interested in ... value
expression = jmespath.compile('sensordatavalues[?value_type==`P1`].value')
expression.search(data)
['8.85']
I am trying to pull multiple values from Consul.
After pulling data using the following code:
import consul
c = consul.Consul("consulServer")
index, data = c.kv.get("key", recurse=True)
print(data)
I am getting the following json in my data list:
[ {
'LockIndex': 0,
'ModifyIndex': 54,
'Value': '1',
'Flags': 0,
'Key': 'test/one',
'CreateIndex': 54
}, {
'LockIndex': 0,
'ModifyIndex': 69,
'Value': '2',
'Flags': 0,
'Key': 'test/two',
'CreateIndex': 69
}]
I want to transform this output into a key:value JSON file. For this example it should look like:
{
"one": "1",
"two": "2"
}
I have two questions:
1. Is there a better way to get multiple values from consul kv?
2. Assuming there is no better way, what is the best way to convert the json from the first example to the second one?
Thanks,
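For the second question, one possible sketch is a dictionary comprehension over the returned list. It assumes that the part after the last / is the wanted key, and it decodes Value in case the client returns bytes rather than str (an assumption; to_text is a hypothetical helper). The data list is copied from the output above.

```python
import json

data = [
    {'LockIndex': 0, 'ModifyIndex': 54, 'Value': '1',
     'Flags': 0, 'Key': 'test/one', 'CreateIndex': 54},
    {'LockIndex': 0, 'ModifyIndex': 69, 'Value': '2',
     'Flags': 0, 'Key': 'test/two', 'CreateIndex': 69},
]

def to_text(value):
    # Decode bytes to str; leave other values unchanged
    return value.decode('utf-8') if isinstance(value, bytes) else value

# Keep only the last path component of each key
flat = {item['Key'].rsplit('/', 1)[-1]: to_text(item['Value']) for item in data}
print(json.dumps(flat, indent=2))
```

If two keys share the same last path component, the later one overwrites the earlier one, so keep the full key in that case.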
In one of my models that contains a PolygonField I have a to_dict method that takes all the values and turns them into a readable dictionary.
I do a similar thing for a model that has a PointField and it looks like this:
'point': {
'type': 'Point',
'coordinates': [self.point.x, self.point.y]
},
For the PolygonField I have to loop through the points to put them in the dictionary. I tried this but, as expected, Django complained:
'polygon': {
'type': 'Polygon',
'coordinates': [
for point in self.path:
[point.x, point.y]
]
},
How do you add all the points from a PolygonField into a dictionary?
Figured it out!
'polygon': {
'type': 'Polygon',
'coordinates': [[point.x, point.y] for point in self.polygon]
},