Converting data from JSON to a string in Python

Could anybody please explain how to convert the following JSON data into a string in Python? It's very big, but I need your help.
You can see it at the following link: http://api.openweathermap.org/data/2.5/forecast/daily?q=delhi&mode=json&units=metric&cnt=7&appid=146f5f89c18a703450d3bd6737d4fc94
Please suggest a solution; it is important for my project :-)

You can decode a JSON string in Python like this:
import json
data = json.loads(json_string)  # json_string holds the raw JSON text
Source: https://docs.python.org/2/library/json.html
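Since the question asks for a string, note the direction of each call: json.loads goes from JSON text to Python objects, and json.dumps goes back from Python objects to JSON text. A minimal round-trip sketch; json_string below is a small hypothetical stand-in for the API response:

```python
import json

# Hypothetical, trimmed stand-in for the weather API's JSON text
json_string = '{"city": {"name": "Delhi"}, "cnt": 7}'

data = json.loads(json_string)    # JSON text -> Python dict
print(data["city"]["name"])       # prints Delhi

back_to_text = json.dumps(data)   # Python dict -> JSON text (a str)
pretty = json.dumps(data, indent=2, sort_keys=True)  # multi-line, human-readable str
print(pretty)
```

The indent argument is handy when the parsed data is large and a single-line dump is hard to read.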

import requests
url = 'http://api.openweathermap.org/data/2.5/forecast/daily?q=delhi&mode=json&units=metric&cnt=7&appid=146f5f89c18a703450d3bd6737d4fc94'
response = requests.get(url)
response.text    # the raw response body, as a string
response.json()  # the body parsed into a Python dict
s = "The City is {city[name]}, today's HIGH is {list[0][temp][max]}".format(**response.json())
print(s)
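The format call above relies on str.format indexing into nested data: {city[name]} reads data['city']['name'], and a field made of digits such as [0] is used as an integer list index. A sketch that runs without network access, using a hypothetical dict trimmed to the fields it reads; the f-string line is an equivalent alternative, not part of the original answer:

```python
# Hypothetical dict shaped like response.json(), trimmed to the fields used
data = {"city": {"name": "Delhi"},
        "list": [{"temp": {"max": 32, "min": 30.67}}]}

# str.format: non-numeric keys in [...] are looked up as strings, digits as list indexes
s = "The City is {city[name]}, today's HIGH is {list[0][temp][max]}".format(**data)
print(s)  # The City is Delhi, today's HIGH is 32

# Equivalent f-string with explicit indexing
s2 = f"The City is {data['city']['name']}, today's HIGH is {data['list'][0]['temp']['max']}"
```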

Some simple code that will read the JSON from your page and produce a Python dictionary follows. I have used the implicit concatenation of adjacent strings to improve the layout of the code.
import json
import urllib.request
f = urllib.request.urlopen(
    url="http://api.openweathermap.org/data/2.5/forecast/daily?"
        "q=delhi&mode=json&units=metric&"
        "cnt=7&appid=146f5f89c18a703450d3bd6737d4fc94")
content = f.read()
result = json.loads(content.decode("utf-8"))
print(result)
This gives me the following output (which I have not shown in code style as it would appear in a single long line):
{'city': {'coord': {'lat': 28.666668, 'lon': 77.216667}, 'country': 'IN', 'id': 1273294, 'population': 0, 'name': 'Delhi'}, 'cnt': 7, 'message': 0.0081, 'list': [{'dt': 1467093600, 'weather': [{'icon': '01n', 'id': 800, 'description': 'clear sky', 'main': 'Clear'}], 'humidity': 82, 'clouds': 0, 'pressure': 987.37, 'speed': 2.63, 'temp': {'max': 32, 'eve': 32, 'night': 30.67, 'min': 30.67, 'day': 32, 'morn': 32}, 'deg': 104}, {'dt': 1467180000, 'weather': [{'icon': '10d', 'id': 501, 'description': 'moderate rain', 'main': 'Rain'}], 'humidity': 74, 'clouds': 12, 'pressure': 989.2, 'speed': 4.17, 'rain': 9.91, 'temp': {'max': 36.62, 'eve': 36.03, 'night': 31.08, 'min': 29.39, 'day': 35.61, 'morn': 29.39}, 'deg': 126}, {'dt': 1467266400, 'weather': [{'icon': '02d', 'id': 801, 'description': 'few clouds', 'main': 'Clouds'}], 'humidity': 71, 'clouds': 12, 'pressure': 986.56, 'speed': 3.91, 'temp': {'max': 36.27, 'eve': 35.19, 'night': 30.87, 'min': 29.04, 'day': 35.46, 'morn': 29.04}, 'deg': 109}, {'dt': 1467352800, 'weather': [{'icon': '10d', 'id': 502, 'description': 'heavy intensity rain', 'main': 'Rain'}], 'humidity': 100, 'clouds': 48, 'pressure': 984.48, 'speed': 0, 'rain': 18.47, 'temp': {'max': 30.87, 'eve': 30.87, 'night': 28.24, 'min': 24.96, 'day': 27.16, 'morn': 24.96}, 'deg': 0}, {'dt': 1467439200, 'weather': [{'icon': '10d', 'id': 501, 'description': 'moderate rain', 'main': 'Rain'}], 'humidity': 0, 'clouds': 17, 'pressure': 983.1, 'speed': 6.54, 'rain': 5.31, 'temp': {'max': 35.48, 'eve': 32.96, 'night': 27.82, 'min': 27.82, 'day': 35.48, 'morn': 29.83}, 'deg': 121}, {'dt': 1467525600, 'weather': [{'icon': '10d', 'id': 501, 'description': 'moderate rain', 'main': 'Rain'}], 'humidity': 0, 'clouds': 19, 'pressure': 984.27, 'speed': 3.17, 'rain': 7.54, 'temp': {'max': 34.11, 'eve': 34.11, 'night': 27.88, 'min': 27.53, 'day': 33.77, 'morn': 27.53}, 'deg': 133}, {'dt': 1467612000, 'weather': [{'icon': '10d', 'id': 503, 'description': 'very heavy rain', 'main': 
'Rain'}], 'humidity': 0, 'clouds': 60, 'pressure': 984.82, 'speed': 5.28, 'rain': 54.7, 'temp': {'max': 33.12, 'eve': 33.12, 'night': 26.15, 'min': 25.78, 'day': 31.91, 'morn': 25.78}, 'deg': 88}], 'cod': '200'}
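Once the response is a dictionary like the one above, the daily forecasts live in result['list'], and each entry's 'dt' is a Unix timestamp in seconds. A sketch that turns them into short text lines; result here is a hypothetical two-day excerpt of the output above, and interpreting 'dt' as UTC is an assumption:

```python
from datetime import datetime, timezone

# Hypothetical two-day excerpt of the parsed response shown above
result = {"city": {"name": "Delhi"},
          "list": [{"dt": 1467093600, "temp": {"min": 30.67, "max": 32}},
                   {"dt": 1467180000, "temp": {"min": 29.39, "max": 36.62}}]}

summaries = []
for day in result["list"]:
    # 'dt' is seconds since the epoch; convert it to a UTC calendar date
    date = datetime.fromtimestamp(day["dt"], tz=timezone.utc).date()
    summaries.append(f"{result['city']['name']} {date}: "
                     f"min {day['temp']['min']}, max {day['temp']['max']}")

print("\n".join(summaries))
```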

Related

Error when I try to extract info in a json

I have this code:
import json
import urllib.request
api_key = "_________"
ciudad = input("put the city: ")
url = "https://api.openweathermap.org/data/2.5/forecast?q=" +ciudad+ "&appid=" + api_key
print(url)
data = urllib.request.urlopen(url).read().decode()
js = json.loads(data)
This works fine, but I need the max and min temperatures, so I tried this:
for res in js["list"][0]["main"]:
    print("the value of", res["main.temp_min"])
and the code gives me this error:
TypeError: string indices must be integers
The JSON looks like this:
{'cod': '200', 'message': 0, 'cnt': 40, 'list': [{'dt': 1669032000, 'main': {'temp': 288.99, 'feels_like': 288.35, 'temp_min': 286.43, 'temp_max': 288.99, 'pressure': 1012, 'sea_level': 1012, 'grnd_level': 1007, 'humidity': 66, 'temp_kf': 2.56}, 'weather': [{'id': 500, 'main': 'Rain', 'description': 'light rain', 'icon': '10d'}], 'clouds': {'all': 75}, 'wind': {'speed': 9.85, 'deg': 296, 'gust': 13.2}, 'visibility': 10000, 'pop': 1, 'rain': {'3h': 1.55}, 'sys': {'pod': 'd'}, 'dt_txt': '2022-11-21 12:00:00'}, {'dt': 1669042800, 'main': {'temp': 287.59, 'feels_like': 286.94, 'temp_min': 284.8, 'temp_max': 287.59, 'pressure': 1014, 'sea_level': 1014, 'grnd_level': 1008, 'humidity': 71, 'temp_kf': 2.79}, 'weather': [{'id': 500, 'main': 'Rain', 'description': 'light rain', 'icon': '10d'}], 'clouds': {'all': 78}, 'wind': {'speed': 9.77, 'deg': 314, 'gust': 14.1}, 'visibility': 10000, 'pop': 1, 'rain': {'3h': 2.28}, 'sys': {'pod': 'd'}, 'dt_txt': '2022-11-21 15:00:00'}, {'dt': 1669053600, 'main': {'temp': 286.12, 'feels_like': 285.14, 'temp_min': 284.68, 'temp_max': 286.12, 'pressure': 1016, 'sea_level': 1016, 'grnd_level': 1009, 'humidity': 64, 'temp_kf': 1.44}, 'weather': [{'id': 500, 'main': 'Rain', 'description': 'light rain', 'icon': '10n'}], 'clouds': {'all': 86}, 'wind': {'speed': 8.5, 'deg': 308, 'gust': 12.41}, 'visibility': 10000, 'pop': 1, 'rain': {'3h': 1.46}, 'sys': {'pod': 'n'}, 'dt_txt': '2022-11-21 18:00:00'}, {'dt': 1669064400, 'main': {'temp': 284.63, 'feels_like': 283.53, 'temp_min': 284.63, 'temp_max': 284.63, 'pressure': 1019, 'sea_level': 1019, 'grnd_level': 1010, 'humidity': 65, 'temp_kf': 0}, 'weather': [{'id': 500, 'main': 'Rain', 'description': 'light rain', 'icon': '10n'}], 'clouds': {'all': 100}, 'wind': {'speed': 7.04, 'deg': 300, 'gust': 10.08}, 'visibility': 10000, 'pop': 0.57, 'rain': {'3h': 0.42}, 'sys': {'pod': 'n'}, 'dt_txt': '2022-11-21 21:00:00'}, {'dt': 1669075200, 'main': {'temp': 284.82, 'feels_like': 283.95, 'temp_min': 284.82, 
'temp_max': 284.82, 'pressure': 1018, 'sea_level': 1018, 'grnd_level': 1009, 'humidity': 73, 'temp_kf': 0}
js["list"][0]["main"] is a dictionary:
{'temp': 288.99, 'feels_like': 288.35, 'temp_min': 286.43, 'temp_max': 288.99, 'pressure': 1012, 'sea_level': 1012, 'grnd_level': 1007, 'humidity': 66, 'temp_kf': 2.56}
for res in js["list"][0]["main"] iterates over the keys of that dictionary, so res is one of its keys. Those keys are strings, and indexing a string with another string (res["main.temp_min"]) is what raises the error. What you probably want is:
for l in js["list"]:
    print("the value of", l["main"]["temp_min"])
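Putting the fix together for both values the question asks for: each element of js["list"] carries a "main" dict, so look values up by key instead of iterating over it. A sketch with a hypothetical two-entry excerpt of the JSON above. The temperatures are in Kelvin because the question's URL passes no units parameter, and OpenWeatherMap's default ("standard") units report Kelvin:

```python
# Hypothetical two-entry excerpt of the forecast JSON shown above
js = {"list": [
    {"dt_txt": "2022-11-21 12:00:00", "main": {"temp_min": 286.43, "temp_max": 288.99}},
    {"dt_txt": "2022-11-21 15:00:00", "main": {"temp_min": 284.8, "temp_max": 287.59}},
]}

for item in js["list"]:
    main = item["main"]  # a dict: look values up by key, don't iterate over it
    print(item["dt_txt"], "min:", main["temp_min"], "max:", main["temp_max"])

# Overall extremes across all 3-hour slots, converted from Kelvin to Celsius
lowest_c = min(item["main"]["temp_min"] for item in js["list"]) - 273.15
highest_c = max(item["main"]["temp_max"] for item in js["list"]) - 273.15
print(f"lowest {lowest_c:.2f} C, highest {highest_c:.2f} C")
```

Adding &units=metric to the request URL would avoid the manual conversion.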

Python Request with Cookies - Content blocked by Cookie Banner

I'm trying to access a website via Python Requests. To avoid the iframe of the "Cookie Banner" I want to pass the cookie that handles the banner.
With Selenium I have already figured out which cookie it is, and there it works fine just by passing the key/value pair. I also found online that it is necessary to "get" the page before passing the cookie and then to refresh it with "get" again after adding the cookies.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
website = "https://www.myfitnesspal.com"
path = "path/to/your/chromedriver.exe"
service = Service(executable_path=path)
driver = webdriver.Chrome(service = service)
driver.get(website)
driver.add_cookie({'name': 'notice_preferences','value': '2:'})
driver.get(website)
So far so good. However, if I pass the same cookie that already worked in Selenium to a Python Request, the response.text that I receive still shows the content of the iframe and "Cookie Banner".
response = requests.get(website, cookies={"notice_preferences":"2:"})
Does anyone know why this is happening or if there is even a solution for this?
I don't think that page's content is blocked by the cookie banner; rather, it is blocked by the lack of a proper User-Agent header. The following code will return the page content as seen in a browser:
import requests
s = requests.Session()
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
url = 'https://www.myfitnesspal.com'
s.headers.update(headers)  # every request made by this session sends the browser-like User-Agent
# Note: cookies.set() takes (name, value), so the two calls below create cookies
# literally named "name" and "value", as the printed jar shows
s.cookies.set("name", "notice_preferences", domain="www.myfitnesspal.com/")
s.cookies.set("value", "2:", domain="www.myfitnesspal.com/")
r = s.get(url)
# print(r.text)
print(s.cookies)
Result printed in terminal:
<RequestsCookieJar[<Cookie split-id=e28e4968-c2e3-4145-9226-0d9db15bcffe for www.myfitnesspal.com/>, <Cookie name=notice_preferences for www.myfitnesspal.com//>, <Cookie value=2: for www.myfitnesspal.com//>]>
You can then navigate to other pages of that website, and the requests Session will preserve the headers and cookies. Also, print the response text to check whether the information you're looking for is there.
For more info on Requests, you can visit https://requests.readthedocs.io/en/latest/
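For reference, RequestsCookieJar.set takes the cookie name first and its value second, so setting the question's notice_preferences cookie on a session looks like the sketch below. The domain is written without a trailing slash; whether the site then suppresses the banner is not verified here:

```python
import requests

s = requests.Session()
# set(name, value, **kwargs): one call creates one cookie
s.cookies.set("notice_preferences", "2:", domain="www.myfitnesspal.com")

print(s.cookies.get("notice_preferences"))  # 2:
# The session sends this cookie on later requests to that domain, e.g.:
# r = s.get("https://www.myfitnesspal.com")
```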
EDIT: This is a classic XY problem; luckily the OP clarified it in the comments.
That data is being pulled via an XHR call from an API endpoint. To get the info you want, you need to scrape that endpoint. This is how you do it (after inspecting the Network tab in Dev Tools and finding that endpoint):
import requests
import pandas as pd
s = requests.Session()
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
s.headers.update(headers)  # apply the browser-like User-Agent to this session
url = 'https://www.myfitnesspal.com/api/nutrition?query=banane&page=1&offset=10'
r = s.get(url)
df = pd.DataFrame(r.json()['items'])
print(df)  # display(df) only exists inside Jupyter/IPython
This will print in terminal:
item tags type
0 {'country_code': 'US', 'deleted': False, 'description': 'Banane', 'id': 1873022840, 'nutritional_contents': {'calcium': 0.5, 'carbohydrates': 22.84, 'cholesterol': 0, 'energy': {'unit': 'calories', 'value': 89}, 'fat': 0.33, 'fiber': 2.6, 'iron': 1.44444, 'monounsaturated_fat': 0.032, 'polyunsaturated_fat': 0.073, 'potassium': 358, 'protein': 1.09, 'saturated_fat': 0.112, 'sodium': 1, 'sugar': 12.23, 'trans_fat': 0, 'vitamin_a': 4.26667, 'vitamin_c': 14.5, 'vitamin_d': 0}, 'public': True, 'serving_sizes': [{'id': '67628178485117', 'index': 0, 'nutrition_multiplier': 1.18, 'unit': 'medium', 'value': 1}, {'id': '67078422671357', 'index': 1, 'nutrition_multiplier': 1.36, 'unit': 'large', 'value': 1}, {'id': '67628178485245', 'index': 2, 'nutrition_multiplier': 1.5, 'unit': 'cup, sliced', 'value': 1}, {'id': '67078414315389', 'index': 3, 'nutrition_multiplier': 2.25, 'unit': 'cup, mashed', 'value': 1}, {'id': '67628170129277', 'index': 4, 'nutrition_multiplier': 0.01, 'unit': 'g', 'value': 1}, {'id': '67078414315517', 'index': 5, 'nutrition_multiplier': 0.283495, 'unit': 'oz', 'value': 1}, {'id': '67628170129405', 'index': 6, 'nutrition_multiplier': 0.81, 'unit': 'extra small', 'value': 1}, {'id': '67078422703997', 'index': 7, 'nutrition_multiplier': 1.52, 'unit': 'extra large', 'value': 1}, {'id': '67628178517885', 'index': 8, 'nutrition_multiplier': 4.53592, 'unit': 'lb(s)', 'value': 1}, {'id': '67078422704125', 'index': 9, 'nutrition_multiplier': 1e-05, 'unit': 'mg(s)', 'value': 1}, {'id': '67628178518013', 'index': 10, 'nutrition_multiplier': 10, 'unit': 'kg(s)', 'value': 1}, {'id': '67076304547197', 'index': 11, 'nutrition_multiplier': 0.00625, 'unit': 'mL, sliced ', 'value': 1}, {'id': '67626060361085', 'index': 12, 'nutrition_multiplier': 0.009375, 'unit': 'mL, mashed ', 'value': 1}, {'id': '67076304547325', 'index': 13, 'nutrition_multiplier': 6.25, 'unit': 'liter(s), sliced ', 'value': 1}, {'id': '67626060361213', 'index': 14, 'nutrition_multiplier': 9.375, 
'unit': 'liter(s), mashed ', 'value': 1}], 'type': 'food', 'user_id': '133476501057389', 'verified': True, 'version': '199432263862133'} [canonical, best_match] food
1 {'brand_name': 'Banane', 'country_code': 'FR', 'deleted': False, 'description': 'Une banane', 'id': 2007191148, 'nutritional_contents': {'calcium': 0, 'carbohydrates': 27, 'cholesterol': 0, 'energy': {'unit': 'calories', 'value': 105}, 'fat': 0.4, 'fiber': 2.1, 'iron': 0, 'monounsaturated_fat': 0, 'polyunsaturated_fat': 0, 'potassium': 0, 'protein': 1.3, 'saturated_fat': 0, 'sodium': 0, 'sugar': 12, 'trans_fat': 0, 'vitamin_a': 0, 'vitamin_c': 0}, 'public': True, 'serving_sizes': [{'id': '93902759513197', 'index': 0, 'nutrition_multiplier': 1, 'unit': 'fruit entier (120g)', 'value': 1}, {'id': '94452515327085', 'index': 1, 'nutrition_multiplier': 0.00833333, 'unit': 'gram', 'value': 1}], 'type': 'food', 'user_id': '160406080319149', 'verified': False, 'version': '198055450101605'} [] food
2 {'brand_name': 'Obst', 'country_code': 'DE', 'deleted': False, 'description': ' Banane ()', 'id': 1659839707, 'nutritional_contents': {'calcium': 0.625, 'carbohydrates': 22.84, 'energy': {'unit': 'calories', 'value': 90}, 'fat': 0.33, 'fiber': 2.6, 'iron': 1.857, 'potassium': 358, 'protein': 1.09, 'sodium': 1, 'sugar': 12.23, 'vitamin_a': 0.375, 'vitamin_c': 10.875}, 'public': True, 'serving_sizes': [{'id': '268297681372533', 'index': 0, 'nutrition_multiplier': 1, 'unit': 'g', 'value': 100}, {'id': '268297681372661', 'index': 1, 'nutrition_multiplier': 0.01, 'unit': 'g', 'value': 1}, {'id': '268847437186549', 'index': 2, 'nutrition_multiplier': 0.283495, 'unit': 'ounce', 'value': 1}, {'id': '268297673016693', 'index': 3, 'nutrition_multiplier': 0.992232, 'unit': 'ounce', 'value': 3.5}], 'type': 'food', 'user_id': '163850601983789', 'verified': False, 'version': '129355447387317'} [] food
3 {'brand_name': 'Obst', 'country_code': 'DE', 'deleted': False, 'description': 'Banane 1 Stück', 'id': 1887842011, 'nutritional_contents': {'calcium': 0, 'carbohydrates': 26.4, 'cholesterol': 0, 'energy': {'unit': 'calories', 'value': 115}, 'fat': 0.2, 'iron': 0, 'monounsaturated_fat': 0, 'polyunsaturated_fat': 0, 'potassium': 0, 'protein': 1.2, 'saturated_fat': 0, 'sodium': 0, 'trans_fat': 0, 'vitamin_a': 0, 'vitamin_c': 0}, 'public': True, 'serving_sizes': [{'id': '27521653231597', 'index': 0, 'nutrition_multiplier': 1, 'unit': 'g', 'value': 120}, {'id': '28071409045485', 'index': 1, 'nutrition_multiplier': 0.00833333, 'unit': 'g', 'value': 1}, {'id': '27521661620077', 'index': 2, 'nutrition_multiplier': 0.236246, 'unit': 'ounce', 'value': 1}], 'type': 'food', 'user_id': '234889390534445', 'verified': False, 'version': '53489009870261'} [] food
4 {'brand_name': 'Obst', 'country_code': 'DE', 'deleted': False, 'description': 'Banane', 'id': 227750309, 'nutritional_contents': {'calcium': 5, 'carbohydrates': 22.8, 'cholesterol': 0, 'energy': {'unit': 'calories', 'value': 89}, 'fat': 0.3, 'fiber': 2.6, 'iron': 0.3, 'monounsaturated_fat': 0.1, 'polyunsaturated_fat': 0.1, 'potassium': 358, 'protein': 1.1, 'saturated_fat': 0.1, 'sodium': 1, 'sugar': 12, 'trans_fat': 0, 'vitamin_a': 64, 'vitamin_c': 9}, 'public': True, 'serving_sizes': [{'id': '88267487686061', 'index': 0, 'nutrition_multiplier': 1, 'unit': 'g', 'value': 100}, {'id': '88817243499949', 'index': 1, 'nutrition_multiplier': 0.01, 'unit': 'g', 'value': 1}, {'id': '88267496074541', 'index': 2, 'nutrition_multiplier': 0.283495, 'unit': 'ounce', 'value': 1}], 'type': 'food', 'user_id': '134026256871405', 'verified': True, 'version': '230354056064301'} [] food
5 {'brand_name': 'Banane Ohne Schale', 'country_code': 'DE', 'deleted': False, 'description': 'Banane', 'id': 1889101676, 'nutritional_contents': {'calcium': 0.625, 'carbohydrates': 22.84, 'energy': {'unit': 'calories', 'value': 95}, 'fat': 0.33, 'fiber': 2.6, 'iron': 35.71, 'potassium': 358, 'protein': 1.09, 'sugar': 12.23, 'vitamin_a': 0.375, 'vitamin_c': 10.875}, 'public': True, 'serving_sizes': [{'id': '138151424970349', 'index': 0, 'nutrition_multiplier': 1, 'unit': 'g', 'value': 100}, {'id': '137601669156589', 'index': 1, 'nutrition_multiplier': 0.01, 'unit': 'g', 'value': 1}, {'id': '138151424970477', 'index': 2, 'nutrition_multiplier': 0.283495, 'unit': 'ounce', 'value': 1}], 'type': 'food', 'user_id': '278614430748141', 'verified': False, 'version': '53074667210277'} [] food
6 {'brand_name': 'Banane', 'country_code': 'FR', 'deleted': False, 'description': 'Banane Gebacken', 'id': 1349524295, 'nutritional_contents': {'calcium': 0, 'carbohydrates': 25, 'cholesterol': 0, 'energy': {'unit': 'calories', 'value': 157}, 'fat': 4, 'fiber': 0, 'iron': 0, 'monounsaturated_fat': 0, 'polyunsaturated_fat': 0, 'potassium': 0, 'protein': 4, 'saturated_fat': 0, 'sodium': 1, 'sugar': 12, 'trans_fat': 0, 'vitamin_a': 0, 'vitamin_c': 0}, 'public': True, 'serving_sizes': [{'id': '59106944525429', 'index': 0, 'nutrition_multiplier': 1, 'unit': 'g', 'value': 100}, {'id': '58557188711669', 'index': 1, 'nutrition_multiplier': 0.01, 'unit': 'g', 'value': 1}, {'id': '59106944525557', 'index': 2, 'nutrition_multiplier': 0.283495, 'unit': 'ounce', 'value': 1}, {'id': '58557197100149', 'index': 3, 'nutrition_multiplier': 0.992232, 'unit': 'ounce', 'value': 3.5}], 'type': 'food', 'user_id': '133324127170493', 'verified': False, 'version': '31634077001709'} [] food
7 {'brand_name': 'Banane', 'country_code': 'FR', 'deleted': False, 'description': 'Demi banane', 'id': 139561661, 'nutritional_contents': {'calcium': 0, 'carbohydrates': 14, 'cholesterol': 0, 'energy': {'unit': 'calories', 'value': 93}, 'fat': 2, 'fiber': 0, 'iron': 0, 'monounsaturated_fat': 0, 'polyunsaturated_fat': 0, 'potassium': 0, 'protein': 3, 'saturated_fat': 0, 'sodium': 0, 'sugar': 0, 'trans_fat': 0, 'vitamin_a': 0, 'vitamin_c': 0}, 'public': True, 'serving_sizes': [{'id': '124397058706493', 'index': 1, 'nutrition_multiplier': 1, 'unit': 'yaourt', 'value': 1}], 'type': 'food', 'user_id': '133476501057517', 'verified': False, 'version': '63530949537133'} [] food
8 {'brand_name': 'Banane', 'country_code': 'CA', 'deleted': False, 'description': 'Banane (Santé Canada)', 'id': 1568891032, 'nutritional_contents': {'calcium': 1, 'carbohydrates': 27, 'cholesterol': 0, 'energy': {'unit': 'calories', 'value': 105}, 'fat': 0, 'fiber': 3, 'iron': 2, 'monounsaturated_fat': 0, 'polyunsaturated_fat': 0, 'potassium': 487, 'protein': 1, 'saturated_fat': 0, 'sodium': 1, 'sugar': 14, 'trans_fat': 0, 'vitamin_a': 2, 'vitamin_c': 17}, 'public': True, 'serving_sizes': [{'id': '27092291822629', 'index': 0, 'nutrition_multiplier': 1, 'unit': 'banana 118g', 'value': 1}, {'id': '26542536008869', 'index': 1, 'nutrition_multiplier': 1, 'unit': 'med bananna', 'value': 1}], 'type': 'food', 'user_id': '199722294733869', 'verified': False, 'version': '264028726224173'} [] food
9 {'brand_name': 'Banane', 'country_code': 'CA', 'deleted': False, 'description': ' Une banane moyenne', 'id': 1484522768, 'nutritional_contents': {'calcium': 1.53, 'carbohydrates': 22.8, 'cholesterol': 0, 'energy': {'unit': 'calories', 'value': 89}, 'fat': 0.33, 'fiber': 2, 'iron': 0, 'monounsaturated_fat': 0, 'polyunsaturated_fat': 0.07, 'potassium': 0, 'protein': 1.1, 'saturated_fat': 0.11, 'sodium': 8, 'sugar': 12, 'trans_fat': 0, 'vitamin_a': 0, 'vitamin_c': 0}, 'public': True, 'serving_sizes': [{'id': '63099251926181', 'index': 0, 'nutrition_multiplier': 1, 'unit': 'g', 'value': 100}, {'id': '63649007740069', 'index': 1, 'nutrition_multiplier': 0.01, 'unit': 'g', 'value': 1}, {'id': '63099260314661', 'index': 2, 'nutrition_multiplier': 0.283495, 'unit': 'ounce', 'value': 1}, {'id': '63649016128549', 'index': 3, 'nutrition_multiplier': 0.992232, 'unit': 'ounce', 'value': 3.5}], 'type': 'food', 'user_id': '128659968929645', 'verified': False, 'version': '136640012748413'} [] food
You can drill further down into that JSON object (normalize it, etc.) to get the data in different shapes and forms.
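As one way to drill down, pd.json_normalize flattens the nested dicts inside each item into dotted column names. A sketch with two hypothetical, heavily trimmed items shaped like the rows above; the real endpoint's response contains more fields:

```python
import pandas as pd

# Hypothetical, trimmed items in the shape of r.json()['items'] above
items = [
    {"item": {"description": "Banane", "country_code": "US",
              "nutritional_contents": {"energy": {"unit": "calories", "value": 89},
                                       "sugar": 12.23}},
     "tags": ["canonical", "best_match"], "type": "food"},
    {"item": {"description": "Une banane", "country_code": "FR",
              "nutritional_contents": {"energy": {"unit": "calories", "value": 105},
                                       "sugar": 12}},
     "tags": [], "type": "food"},
]

# Nested dicts become columns such as 'item.nutritional_contents.energy.value';
# lists (tags) are left as-is in their cells.
flat = pd.json_normalize(items)
calories = flat[["item.description", "item.nutritional_contents.energy.value"]]
print(calories)
```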

How to make json_normalize build a DataFrame from an OpenWeather response

Hi, I'm struggling to extract the data from an OpenWeather response. I am using json_normalize to build the table, but how to construct the call is not clear to me. I managed to divide a piece of the data into smaller portions and normalize them, but I wonder if there is a nice and smooth way of doing it.
'daily': [{'dt': 1612432800, 'sunrise': 1612419552, 'sunset': 1612452288,'temp': {'day': -4.21, 'min': -10.24, 'max': -2.31, 'night': - 10.24, 'eve': -5.11, 'morn': -3.43},
'feels_like': {'day': -10.78, 'night': -13.48, 'eve': -9.52, 'morn': -11.35}, 'pressure': 1010, 'humidity': 96,
'dew_point': -5.84, 'wind_speed': 5.69, 'wind_deg': 13,
'weather': [{'id': 601, 'main': 'Snow', 'description': 'snow', 'icon': '13d'}], 'clouds': 100, 'pop': 1,
'snow': 10.24, 'uvi': 0.89}, {'dt': 1612519200, 'sunrise': 1612505843, 'sunset': 1612538809,
'temp': {'day': -3.7, 'min': -10.24, 'max': -2.6, 'night': -9.09, 'eve': -6.92,'morn': -8.96},
'feels_like': {'day': -8.01, 'night': -13.25, 'eve': -10.96, 'morn': -13.11},
'pressure': 1023, 'humidity': 98, 'dew_point': -4.64, 'wind_speed': 2.59,
'wind_deg': 273, 'weather': [{'id': 802, 'main': 'Clouds', 'description': 'scattered clouds','icon': '03d'}], 'clouds': 29,
'pop': 0.16, 'uvi': 0.91},{'dt': 1612605600, 'sunrise': 1612592132, 'sunset': 1612625330,
'temp': {'day': -8.27, 'min': -15.93, 'max': -7.49, 'night': -15.93, 'eve': -12.8, 'morn': -10.72},
'feels_like': {'day': -12.82, 'night': -20.74, 'eve': -17.38, 'morn': -14.93}, 'pressure': 1024,
'humidity': 92, 'dew_point': -11.71, 'wind_speed': 2.21, 'wind_deg': 32,
'weather': [{'id': 803, 'main': 'Clouds', 'description': 'broken clouds', 'icon': '04d'}], 'clouds': 67,
'pop': 0, 'uvi': 0.86}, {'dt': 1612692000, 'sunrise': 1612678420, 'sunset': 1612711851,
'temp': {'day': -11.72, 'min': -16.93, 'max': -9.81, 'night': -14.36, 'eve': -11.18,'morn': -16.76},
'feels_like': {'day': -17.5, 'night': -20.73, 'eve': -17.09, 'morn': -22},
'pressure': 1023, 'humidity': 94, 'dew_point': -13.77, 'wind_speed': 3.65,
'wind_deg': 81, 'weather': [{'id': 803, 'main': 'Clouds', 'description': 'broken clouds', 'icon': '04d'}], 'clouds': 54, 'pop': 0,'uvi': 0.98}, {'dt': 1612778400, 'sunrise': 1612764705, 'sunset': 1612798372,
'temp': {'day': -12.41, 'min': -15.94, 'max': -8.43, 'night': -11.33,'eve': -9.23, 'morn': -15.94},
'feels_like': {'day': -20.36, 'night': -19.04, 'eve': -17.44,'morn': -22.64}, 'pressure': 1015, 'humidity': 90,
'dew_point': -16.35, 'wind_speed': 6.64, 'wind_deg': 69, 'weather': [{'id': 804, 'main': 'Clouds', 'description': 'overcast clouds', 'icon': '04d'}], 'clouds': 97, 'pop': 0,'uvi': 1.01},{'dt': 1612864800, 'sunrise': 1612850989, 'sunset': 1612884894,'temp': {'day': -13.58, 'min': -14.7, 'max': -11.21, 'night': -11.4, 'eve': -11.26, 'morn': 13.48},'feels_like': {'day': -19.95, 'night': -17.27, 'eve': -17.3, 'morn': -20.35}, 'pressure': 1014, 'humidity': 94,'dew_point': -15.84, 'wind_speed': 4.33, 'wind_deg': 60,'weather': [{'id': 600, 'main': 'Snow', 'description': 'light snow', 'icon': '13d'}], 'clouds': 100,
'pop': 0.73, 'snow': 0.83, 'uvi': 0.98}, {'dt': 1612951200, 'sunrise': 1612937272, 'sunset': 1612971415,
'temp': {'day': -13.58, 'min': -17.87, 'max': -11.37,'night': -17.87, 'eve': -13.19, 'morn': -13.34},
'feels_like': {'day': -19.11, 'night': -23.19, 'eve': -18.44,'morn': -18.75}, 'pressure': 1021, 'humidity': 94,'dew_point': -15.74, 'wind_speed': 3.14, 'wind_deg': 54, 'weather': [{'id': 600, 'main': 'Snow', 'description': 'light snow', 'icon': '13d'}], 'clouds': 82, 'pop': 0.73,'snow': 0.78, 'uvi': 1},{'dt': 1613037600, 'sunrise': 1613023553, 'sunset': 1613057936,
'temp': {'day': -16.26, 'min': -20.28, 'max': -13.32, 'night': -19.55, 'eve': -14.36, 'morn': -19.46},
'feels_like': {'day': -22.12, 'night': -25.23, 'eve': -20, 'morn': -24.97}, 'pressure': 1028, 'humidity': 93,
'dew_point': -18.8, 'wind_speed': 3.41, 'wind_deg': 77,'weather': [{'id': 801, 'main': 'Clouds', 'description': 'few clouds', 'icon': '02d'}], 'clouds': 18, 'pop': 0,'uvi': 1}]}
day = temp_Json['daily']
data_frame_day = pd.json_normalize(day, 'weather', ['dt', 'sunrise', 'sunset', 'pressure', 'humidity', 'dew_point', 'wind_speed','wind_deg', 'clouds', 'pop', 'snow', 'uvi', ['temp', 'day'],['temp', 'min'],['temp', 'max'], ['temp', 'night'], ['temp', 'eve'], ['temp', 'morn'],['feels_like', 'day'], ['feels_like', 'night'], ['feels_like', 'eve'],['feels_like', 'morn']], errors='ignore')
The error is:
Traceback (most recent call last):
File "C:\Users\Jakub\PycharmProjects\Tests\main.py", line 263, in <module>
data_frame_day = pd.json_normalize(day, 'weather',
File "C:\Users\Jakub\PycharmProjects\Tests\venv\lib\site-packages\pandas\io\json\_normalize.py", line 336, in _json_normalize
_recursive_extract(data, record_path, {}, level=0)
File "C:\Users\Jakub\PycharmProjects\Tests\venv\lib\site-packages\pandas\io\json\_normalize.py", line 329, in _recursive_extract
raise KeyError(
KeyError: "Try running with errors='ignore' as key 'snow' is not always present"
This is how I would normalize the records:
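import pandas as pd
df = pd.DataFrame(day)
# since the weather column contains a list, we need to transform each element into a row
df = df.explode('weather').reset_index(drop=True)  # reset so rows line up with json_normalize's 0..n-1 index in concat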
# normalize columns
weather = pd.json_normalize(df['weather']).add_prefix('weather.')
feels_like = pd.json_normalize(df['feels_like']).add_prefix('feels_like.')
temp = pd.json_normalize(df['temp']).add_prefix('temp.')
# join columns together after normalization and drop original unnormalized columns
df_normalized = pd.concat([weather, temp, feels_like, df], axis=1).drop(columns=['weather', 'temp', 'feels_like'])
This will give you the normalized dataframe.
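A variant of the same idea: calling pd.json_normalize on the whole daily list already flattens the temp and feels_like dicts into dotted columns and fills keys missing from some days (such as snow) with NaN, so only the weather list needs exploding. A sketch with two hypothetical, trimmed days, the second without a snow key:

```python
import pandas as pd

# Two hypothetical days in the shape of the 'daily' list; the second has no 'snow'
day = [
    {"dt": 1612432800, "temp": {"day": -4.21, "min": -10.24},
     "feels_like": {"day": -10.78}, "snow": 10.24,
     "weather": [{"id": 601, "main": "Snow"}]},
    {"dt": 1612519200, "temp": {"day": -3.7, "min": -10.24},
     "feels_like": {"day": -8.01},
     "weather": [{"id": 802, "main": "Clouds"}]},
]

# Nested dicts -> dotted columns (temp.day, feels_like.day, ...); missing keys -> NaN
flat = pd.json_normalize(day)

# 'weather' is still a list per row: one row per element, then flatten the dicts
flat = flat.explode("weather").reset_index(drop=True)
weather = pd.json_normalize(flat["weather"].tolist()).add_prefix("weather.")
df_normalized = pd.concat([flat.drop(columns="weather"), weather], axis=1)
print(df_normalized)
```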

How to dynamically format a nested list of dicts with less latency

I need your expertise to ease the nested dictionary formatting. I have a list of input signals which need to be grouped on the u_id and on the timestamp field with minute precision, and converted to the respective output format. I have posted the formatting I have tried. I need to format and process it as fast as possible, because time complexity matters. Any help is highly appreciated.
Code snippet
import calendar
import itertools
import time
from datetime import datetime
final_output = []
sorted_signals = sorted(signals, key=lambda x: (x['u_id'], str(x['start_ts'])[0:8]))
data = itertools.groupby(sorted_signals, key=lambda x: (x['u_id'], calendar.timegm(time.strptime(datetime.utcfromtimestamp(x['start_ts']).strftime('%Y-%m-%d-%H:%M'),'%Y-%m-%d-%H:%M'))))
def format_signals(v):
    result = []
    for i in v:
        temp_dict = {}
        temp_dict.update({'timestamp_utc': i['start_ts']})
        for data in i['sign']:
            temp_dict.update({data['name'].split('.')[0]: data['val']})
        result.append(temp_dict)
    return result
for k, v in data:
    output_format = {'ui_id': k[0], 'minute_utc': datetime.fromtimestamp(int(k[1])), 'data': format_signals(v),
                     'processing_timestamp_utc': datetime.strptime(datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S"),"%Y-%m-%d %H:%M:%S")}
    final_output.append(output_format)
print(final_output)
Input
signals = [
{'c_id': '1234', 'u_id': 288, 'f_id': 331,
'sign': [{'name': 'speed', 'val': 9},
{'name': 'pwr', 'val': 1415}], 'start_ts': 1598440244,
'crt_ts': 1598440349, 'map_crt_ts': 1598440351, 'ca_id': 'AT123', 'c_n': 'demo',
'msg_cnt': 2, 'window': 'na', 'type': 'na'},
{'c_id': '1234', 'u_id': 288, 'f_id': 331,
'sign': [{'name': 'speed', 'val': 10},
{'name': 'pwr', 'val': 1416}], 'start_ts': 1598440243,
'crt_ts': 1598440349, 'map_crt_ts': 1598440351, 'ca_id': 'AT123', 'c_n': 'demo',
'msg_cnt': 2, 'window': 'na', 'type': 'na'},
{'c_id': '1234', 'u_id': 287, 'f_id': 331,
'sign': [{'name': 'speed', 'val': 10},
{'name': 'pwr', 'val': 1417}], 'start_ts': 1598440344,
'crt_ts': 1598440349, 'map_crt_ts': 1598440351, 'ca_id': 'AT123', 'c_n': 'demo',
'msg_cnt': 2, 'window': 'na', 'type': 'na'},
{'c_id': '1234', 'u_id': 288, 'f_id': 331,
'sign': [{'name': 'speed.', 'val': 8.2},
{'name': 'pwr', 'val': 925}], 'start_ts': 1598440345,
'crt_ts': 1598440349, 'map_crt_ts': 1598440351, 'ca_id': 'AT172', 'c_n': 'demo',
'msg_cnt': 2, 'window': 'na', 'type': 'na'}
]
Current output
[{
'ui_id': 287,
'minute_utc': datetime.datetime(2020, 8, 26, 16, 42),
'data': [{
'timestamp_utc': 1598440344,
'speed': 10,
'pwr': 1417
}],
'processing_timestamp_utc': datetime.datetime(2020, 8, 29, 19, 35, 46)
}, {
'ui_id': 288,
'minute_utc': datetime.datetime(2020, 8, 26, 16, 40),
'data': [{
'timestamp_utc': 1598440244,
'speed': 9,
'pwr': 1415
}, {
'timestamp_utc': 1598440243,
'speed': 10,
'pwr': 1416
}],
'processing_timestamp_utc': datetime.datetime(2020, 8, 29, 19, 35, 46)
}, {
'ui_id': 288,
'minute_utc': datetime.datetime(2020, 8, 26, 16, 42),
'data': [{
'timestamp_utc': 1598440345,
'speed': 8.2,
'pwr': 925
}],
'processing_timestamp_utc': datetime.datetime(2020, 8, 29, 19, 35, 46)
}]
Required Output
[{
'ui_id': 287,
'f_id': 311,
'c_id': 1234,
'minute_utc': datetime.datetime(2020, 8, 26, 16, 42),
'data': [{
'timestamp_utc': 1598440344,
'speed': 10,
'pwr': 1417
}],
'processing_timestamp_utc': datetime.datetime(2020, 8, 29, 19, 35, 46)
}, {
'ui_id': 288,
'f_id': 311,
'c_id': 1234,
'minute_utc': datetime.datetime(2020, 8, 26, 16, 40),
'data': [{
'timestamp_utc': 1598440244,
'speed': 9,
'pwr': 1415
}, {
'timestamp_utc': 1598440243,
'speed': 10,
'pwr': 1416
}],
'processing_timestamp_utc': datetime.datetime(2020, 8, 29, 19, 35, 46)
}, {
'ui_id': 288,
'f_id': 311,
'c_id': 1234,
'minute_utc': datetime.datetime(2020, 8, 26, 16, 42),
'data': [{
'timestamp_utc': 1598440345,
'speed': 8.2,
'pwr': 925
}],
'processing_timestamp_utc': datetime.datetime(2020, 8, 29, 19, 35, 46)
}]
So, let's define a simple function that extracts from each object the keys required for grouping:
def extract(obj):
    return obj['u_id'], obj['f_id'], obj['c_id'], obj['start_ts'] // 60 * 60
Note: to implement "minute precision" I've integer-divided the timestamp by 60 to cut off the seconds, then multiplied by 60 to get a valid timestamp back.
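A quick check of that arithmetic on the question's timestamps: 1598440244 and 1598440243 land in the same minute, while 1598440345 lands two minutes later.

```python
from datetime import datetime, timezone

timestamps = [1598440244, 1598440243, 1598440345]
# Floor division by 60 drops the seconds; multiplying by 60 restores a timestamp
buckets = {ts: ts // 60 * 60 for ts in timestamps}
for ts, minute_ts in buckets.items():
    print(ts, "->", minute_ts, datetime.fromtimestamp(minute_ts, tz=timezone.utc))
# 1598440244 -> 1598440200 2020-08-26 11:10:00+00:00
# 1598440243 -> 1598440200 2020-08-26 11:10:00+00:00
# 1598440345 -> 1598440320 2020-08-26 11:12:00+00:00
```

These are UTC times. The question's output shows 16:40 and 16:42 because its code uses datetime.fromtimestamp without a timezone, which converts to the machine's local time (apparently UTC+5:30 there).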
Then let's group objects and form final list:
from itertools import groupby
from datetime import datetime
...
final_output = []
for (uid, fid, cid, ts), ss in groupby(sorted(signals, key=extract), extract):
    obj = {
        'ui_id': uid,
        'f_id': fid,
        'c_id': int(cid),
        'minute_utc': datetime.utcfromtimestamp(ts),
        'data': [],
        'processing_timestamp_utc': datetime.utcnow()
    }
    for s in ss:
        obj['data'].append({
            'timestamp_utc': s['start_ts'],
            **{i['name']: i['val'] for i in s['sign']}
        })
    final_output.append(obj)
To print final_output in readable form we could use pprint:
from pprint import pprint
...
pprint(final_output, sort_dicts=False)
Maybe this helps you write the code in a more straightforward way. If you can go through the signals and organize them in a single loop, you may not need the sort and groupby, which may be heavier.
As you want to gather the signals by u_id, a dictionary is handy for getting a single entry per u_id. The following code does that much; you just need to add the creation of the output based on this organized dict of signals:
import pprint
organized = {}
for s in signals:
    u_id = s['u_id']
    entry = organized.get(u_id, None)
    if entry is None:
        entry = []
        organized[u_id] = entry
    entry.append(s)
pprint.pprint(organized)
It is runnable at https://repl.it/repls/ShallowQuintessentialInteger; the output is pasted below:
{287: [{'c_id': '1234',
'c_n': 'demo',
'ca_id': 'AT123',
'crt_ts': 1598440349,
'f_id': 331,
'map_crt_ts': 1598440351,
'msg_cnt': 2,
'sign': [{'name': 'speed', 'val': 10}, {'name': 'pwr', 'val': 1417}],
'start_ts': 1598440344,
'type': 'na',
'u_id': 287,
'window': 'na'}],
288: [{'c_id': '1234',
'c_n': 'demo',
'ca_id': 'AT123',
'crt_ts': 1598440349,
'f_id': 331,
'map_crt_ts': 1598440351,
'msg_cnt': 2,
'sign': [{'name': 'speed', 'val': 9}, {'name': 'pwr', 'val': 1415}],
'start_ts': 1598440244,
'type': 'na',
'u_id': 288,
'window': 'na'},
{'c_id': '1234',
'c_n': 'demo',
'ca_id': 'AT123',
'crt_ts': 1598440349,
'f_id': 331,
'map_crt_ts': 1598440351,
'msg_cnt': 2,
'sign': [{'name': 'speed', 'val': 10}, {'name': 'pwr', 'val': 1416}],
'start_ts': 1598440243,
'type': 'na',
'u_id': 288,
'window': 'na'},
{'c_id': '1234',
'c_n': 'demo',
'ca_id': 'AT172',
'crt_ts': 1598440349,
'f_id': 331,
'map_crt_ts': 1598440351,
'msg_cnt': 2,
'sign': [{'name': 'speed.', 'val': 8.2}, {'name': 'pwr', 'val': 925}],
'start_ts': 1598440345,
'type': 'na',
'u_id': 288,
'window': 'na'}]}
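Following the single-loop suggestion above, a dict keyed on everything that defines a group, including the minute bucket, can replace sort plus groupby: one O(n) pass instead of an O(n log n) sort. A sketch; group_signals is a hypothetical name, the output keys mirror the question's required output, and the name split on '.' is borrowed from the question's own code:

```python
from collections import defaultdict
from datetime import datetime, timezone


def group_signals(signals):
    """Hypothetical helper: group in one pass by (u_id, f_id, c_id, minute)."""
    groups = defaultdict(list)
    for s in signals:
        key = (s["u_id"], s["f_id"], s["c_id"], s["start_ts"] // 60 * 60)
        groups[key].append({
            "timestamp_utc": s["start_ts"],
            # split('.') as in the question, so 'speed.' becomes 'speed'
            **{i["name"].split(".")[0]: i["val"] for i in s["sign"]},
        })
    now = datetime.now(timezone.utc)  # one processing timestamp for the whole batch
    return [
        {"ui_id": uid, "f_id": fid, "c_id": int(cid),
         "minute_utc": datetime.fromtimestamp(ts, tz=timezone.utc),
         "data": rows, "processing_timestamp_utc": now}
        for (uid, fid, cid, ts), rows in groups.items()
    ]


# Trimmed version of the question's input (fields the grouping doesn't use are dropped)
signals = [
    {"c_id": "1234", "u_id": 288, "f_id": 331, "start_ts": 1598440244,
     "sign": [{"name": "speed", "val": 9}, {"name": "pwr", "val": 1415}]},
    {"c_id": "1234", "u_id": 288, "f_id": 331, "start_ts": 1598440243,
     "sign": [{"name": "speed", "val": 10}, {"name": "pwr", "val": 1416}]},
    {"c_id": "1234", "u_id": 287, "f_id": 331, "start_ts": 1598440344,
     "sign": [{"name": "speed", "val": 10}, {"name": "pwr", "val": 1417}]},
    {"c_id": "1234", "u_id": 288, "f_id": 331, "start_ts": 1598440345,
     "sign": [{"name": "speed.", "val": 8.2}, {"name": "pwr", "val": 925}]},
]
result = group_signals(signals)
```

Groups come out in first-seen order rather than sorted; if sorted output is needed, sorting the (much shorter) result list afterwards is cheaper than sorting every signal.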

DataFrame pop function removing wanted values in nested dictionary

I have a DataFrame that has a nested dict within a column. I am extracting the nested values and creating a column for each associated key. When I use the pop function, values I want are removed from pricings. I wish to keep the '1 color', '2 color', '3 color', '4 color', '5 color', '6 color' keys.
The nested dict in the variations column looks like this:
{'name': 'printing on a DARK shirt',
'pricings': {'1 color': [{'max': 47, 'min': 1, 'price': 100.0},
{'max': 71, 'min': 48, 'price': 40.25},
{'max': 143, 'min': 72, 'price': 2.8},
{'max': 287, 'min': 144, 'price': 2.5}],
'2 color': [{'max': 47, 'min': 1, 'price': 200.0},
{'max': 71, 'min': 48, 'price': 4.25},
{'max': 143, 'min': 72, 'price': 3.8},
{'max': 287, 'min': 144, 'price': 3.5}],
'3 color': [{'max': 47, 'min': 1, 'price': 300.0},
{'max': 71, 'min': 48, 'price': 5.25},
{'max': 143, 'min': 72, 'price': 4.8},
{'max': 287, 'min': 144, 'price': 4.5}],
'4 color': [{'max': 47, 'min': 1, 'price': 400.0},
{'max': 71, 'min': 48, 'price': 6.25},
{'max': 143, 'min': 72, 'price': 5.8},
{'max': 287, 'min': 144, 'price': 5.5}],
'5 color': [{'max': 47, 'min': 1, 'price': 500.0},
{'max': 71, 'min': 48, 'price': 7.5},
{'max': 143, 'min': 72, 'price': 7.0},
{'max': 287, 'min': 144, 'price': 6.6}],
'6 color': [{'max': 47, 'min': 1, 'price': 600.0},
{'max': 71, 'min': 48, 'price': 8.5},
{'max': 143, 'min': 72, 'price': 8.0},
{'max': 287, 'min': 144, 'price': 7.6}]}}
The code I'm using looks like this
df2 = (pd.concat({i: pd.DataFrame(x) for i, x in df1.pop('variations').items()})
.reset_index(level=1, drop=True)
.join(df1 , how='left', lsuffix='_left', rsuffix='_right')
.reset_index(drop=True))
The output is as follows, with the new column pricings added:
[{'max': 47, 'min': 1, 'price': 20.0},
{'max': 71, 'min': 48, 'price': 4.25},
{'max': 143, 'min': 72, 'price': 3.8},
{'max': 287, 'min': 144, 'price': 3.5}]
In case it's not clear: in the resulting DataFrame, the color labels ('1 color', '2 color', '3 color', '4 color', '5 color', '6 color') that the price ranges belong to have fallen off. This is the portion I want most. To be clear, the colors have not been turned into their own column.
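One likely cause, offered as a hypothesis: when pd.DataFrame receives a variation dict whose 'pricings' value is itself a dict, pandas uses the color labels as the row index, and .reset_index(level=1, drop=True) then discards exactly that index level. A sketch that keeps the level as a column instead; df1 is a hypothetical one-row frame with the pricings trimmed, and 'colors' is an arbitrary column name:

```python
import pandas as pd

# Hypothetical one-row df1 with a trimmed 'variations' dict
variation = {"name": "printing on a DARK shirt",
             "pricings": {"1 color": [{"max": 47, "min": 1, "price": 100.0}],
                          "2 color": [{"max": 47, "min": 1, "price": 200.0}]}}
df1 = pd.DataFrame({"id": [10], "variations": [variation]})

# pd.DataFrame(x) indexes its rows by the pricings keys ('1 color', ...);
# reset_index(level=1) without drop=True turns that level into a column.
df2 = (pd.concat({i: pd.DataFrame(x) for i, x in df1.pop("variations").items()})
         .reset_index(level=1)
         .rename(columns={"level_1": "colors"})
         .join(df1, how="left", lsuffix="_left", rsuffix="_right")
         .reset_index(drop=True))
print(df2)
```

Each row then carries its color label next to the list of price tiers; exploding or normalizing pricings from there keeps the label attached.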
