Why isn't the loop deleting my specific number? - python

Can someone help me? My code was working fine until I added a loop that checks each inner array and deletes it if its second element is "0.00000000". It doesn't work and sometimes raises "list index out of range". What's the problem? Thank you in advance; here is my code:
parse = json.loads(message)
sum = len(parse["b"])
for x in range(sum):
    if (parse["b"][x][1] == "0.00000000"):
        del parse["b"][x]
My json:
{
    "U": 26450991840,
    "u": 26450991976,
    "b": [
        ["20640.59000000", "0.00000000"],
        ["20640.15000000", "0.08415000"],
        ["20640.14000000", "0.05144000"],
        ["20640.13000000", "0.00519000"],
        ["20640.12000000", "0.00000000"],
        ["20640.11000000", "0.00000000"],
        ["20640.10000000", "0.00000000"]
    ]
}
I tried to make a script that checks all the json string converting it in dictionary by using python library and deleting all the arrays containing "0.00000000"

As the other answers have indicated, you cannot delete elements from the list you are currently iterating over - its length changes with every deletion.
Here's a solution that generates a completely new list:
parse["b"] = [ x for x in parse["b"] if x[1] != "0.00000000" ]
Above, we use a list comprehension to skip every element of parse["b"] that has "0.00000000" at index 1.
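For example, a minimal sketch against the sample message from the question (message is assumed to hold that JSON as a string):
import json

parse = json.loads(message)
parse["b"] = [x for x in parse["b"] if x[1] != "0.00000000"]
print(parse["b"])
# [['20640.15000000', '0.08415000'], ['20640.14000000', '0.05144000'], ['20640.13000000', '0.00519000']]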

Referring to the JSON that you've provided, when you run this code you set sum to 7, i.e. x in the for loop takes the values 0, 1, 2, 3, 4, 5, and 6.
However, before the loop finishes you modify parse["b"] by deleting some of its elements, so its length drops below 7. With your sample data the deletions happen at x = 0, 3 and 4, leaving only 4 elements (one "0.00000000" entry even survives, because the remaining elements shift left after each deletion). So when the loop reaches an index that is no longer present in the list, here x = 5, it throws an IndexError.
To better understand this, run:
parse = json.loads(message)
sum = len(parse["b"])
print(f"Original length of list: {sum}")
for x in range(sum):
    print(f"Current value of index (x): {x}")
    print(f"Current length of list (parse['b']): {len(parse['b'])}")
    if (parse["b"][x][1] == "0.00000000"):
        del parse["b"][x]

When the first del occurs, your list size changes and your indexes are then invalidated. You can overcome that problem by traversing the list in reverse:
myjson = {
    "U": 26450991840,
    "u": 26450991976,
    "b": [
        ["20640.59000000", "0.00000000"],
        ["20640.15000000", "0.08415000"],
        ["20640.14000000", "0.05144000"],
        ["20640.13000000", "0.00519000"],
        ["20640.12000000", "0.00000000"],
        ["20640.11000000", "0.00000000"],
        ["20640.10000000", "0.00000000"]
    ]
}

blist = myjson['b']
for i in range(len(blist)-1, -1, -1):
    if blist[i][1] == "0.00000000":
        del blist[i]
print(myjson)
Output:
{'U': 26450991840, 'u': 26450991976, 'b': [['20640.15000000', '0.08415000'], ['20640.14000000', '0.05144000'], ['20640.13000000', '0.00519000']]}
Of course that's an in situ modification of the original dictionary. If you want a new dictionary then:
mynewdict = {}
for k, v in myjson.items():
    if k != 'b':
        mynewdict[k] = v
    else:
        for e in v:
            if e[1] != "0.00000000":
                mynewdict.setdefault(k, []).append(e)
print(mynewdict)
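A more compact way to build the same new dictionary, sketched here as an alternative rather than taken from the answer above, is a comprehension that special-cases the 'b' key (note that, unlike the setdefault version, it keeps 'b' as an empty list if every entry is zero):
mynewdict = {
    k: ([e for e in v if e[1] != "0.00000000"] if k == 'b' else v)
    for k, v in myjson.items()
}
print(mynewdict)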

Related

Python - handle empty list when iterating through dict

I have a list of dicts and need to retrieve the events key, which is a list. However, that list is not always filled with data, depending on the case.
How do I iterate through them without getting a list index out of range error?
[-1] does work, but when events is an empty list, I get that error.
Sample input:
jobs = [
    {
        "JobName": "xyz",
        "JobRunState": "SUCCEEDED",
        "LogGroupName": "xyz",
        "Id": "xyz",
        "events": []
    },
    {
        "JobName": "xyz2",
        "JobRunState": "SUCCEEDED",
        "LogGroupName": "xyz",
        "Id": "xyz",
        "events": [
            {
                "timestamp": 1673596884835,
                "message": "....",
                "ingestionTime": 1673598934350
            },
            {
                "timestamp": 1673599235711,
                "message": "....",
                "ingestionTime": 1673599236353
            }
        ]
    }
]
Code:
success = [
    {
        "name": x["JobName"],
        "state": x["JobRunState"],
        "event": self.logs_client.get_log_events(
            logGroupName=x["LogGroupName"] + "/output",
            logStreamName=x["Id"],
        )["events"][-1]["message"],
    }
    for x in jobs
    if x["JobRunState"] in self.SUCCESS
]
Expected behavior: when ["events"] is empty, return "event" as an empty list.
[
    {'name': 'xyz', 'state': 'SUCCEEDED', 'event': []},
    {'name': 'xyz2', 'state': 'SUCCEEDED', 'event': "...."}
]
Error code:
"event": self.logs_client.get_log_events(
IndexError: list index out of range
If you actually wanted to get all the events and not just the last one, you could do:
success = [
    {"event": event["message"]}
    for x in jobs
    for event in self.logs_client.get_log_events(
        logGroupName=x["LogGroupName"] + "/output",
        logStreamName=x["Id"],
    )["events"]
]
which will simply handle empty lists by not producing a dictionary for those jobs.
If you really just want the last one, but still want to skip jobs with no events, modify the above code to iterate over a slice containing either the last event or nothing:
success = [
    {"event": last_event["message"]}
    for x in jobs
    for last_event in self.logs_client.get_log_events(
        logGroupName=x["LogGroupName"] + "/output",
        logStreamName=x["Id"],
    )["events"][-1:]
]
The useful difference is that the slice operation always gives you a list, rather than raising an IndexError on an empty list:
>>> [1, 2, 3][-1:]
[3]
>>> [][-1:]
[]
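If you specifically want the expected output from the question (an "event" that is an empty list when there are no events), here is a sketch along the same lines, not taken from the answer above: a trailing for-clause over a one-element list fetches the events once per job, and a conditional expression picks the last message or []:
success = [
    {
        "name": x["JobName"],
        "state": x["JobRunState"],
        "event": events[-1]["message"] if events else [],
    }
    for x in jobs
    if x["JobRunState"] in self.SUCCESS
    for events in [
        self.logs_client.get_log_events(
            logGroupName=x["LogGroupName"] + "/output",
            logStreamName=x["Id"],
        )["events"]
    ]
]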
The simple answer is to not try to do everything inside a list comprehension. Just make it a regular loop where you can add more complex logic and build your resulting list with append().
successes = list()
for job in jobs:
    if job["JobRunState"] in self.SUCCESS:
        success = dict()
        # do stuff to populate the success dict
        successes.append(success)
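For instance, a possible body for that loop, sketched against the sample jobs above rather than taken from the answer, with the empty-events case handled explicitly:
successes = list()
for job in jobs:
    if job["JobRunState"] in self.SUCCESS:
        events = self.logs_client.get_log_events(
            logGroupName=job["LogGroupName"] + "/output",
            logStreamName=job["Id"],
        )["events"]
        successes.append({
            "name": job["JobName"],
            "state": job["JobRunState"],
            # empty list when there are no events, last message otherwise
            "event": events[-1]["message"] if events else [],
        })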

Reducing list of tuples and generating mean values (being pythonic!)

I'm a Python beginner and I have written some code which works (shown at the end) but I'd prefer to learn a pythonic way to do this.
I have a list of lists of tuples, as below. There might be anywhere from 1 to 6 tuples in each list. I'd like to determine the mean of the numerical values in each of the lists, and end up with just one tuple in each list, i.e. something like the second snippet.
[
    [
        ("2022-02-21 20:30:00", None, 331.0),
        ("2022-02-21 21:00:00", None, 324.0),
        ("2022-02-21 21:30:00", None, 298.0),
    ],
    [
        ("2022-02-21 22:00:00", None, 190.0),
        ("2022-02-21 22:30:00", None, 221.0),
        ("2022-02-21 23:00:00", None, 155.0),
    ],
    [
        ("2022-02-21 23:30:00", None, 125.0),
        ("2022-02-22 00:00:00", None, 95.0),
        ("2022-02-22 00:30:00", None, 69.0),
    ],
]

[
    [("2022-02-21 20:30:00", None, 317.7)],
    [("2022-02-21 22:00:00", None, 188.7)],
    [("2022-02-21 23:30:00", None, 96.3)],
]
for li in data:
    li = [list(t) for t in li]
    sum = 0
    for t in li:
        sum = sum + t[tuple_idx]
    mean = sum / len(li)
    li[0][tuple_idx] = mean
    new_data.append(tuple(li[0]))
data = new_data
I wouldn't try to make it more pythonic or shorter, but more readable.
So I would keep your version with a few small changes.
First, I would use names that mean something.
Also, there is a built-in function sum(), so I wouldn't use that name as a variable.
new_data = []
value_idx = 2

for group in data:
    total_sum = sum(item[value_idx] for item in group)
    mean = total_sum / len(group)

    first_item = list(group[0])
    first_item[value_idx] = mean

    new_data.append([tuple(first_item)])

data = new_data
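For reference, a quick check of that loop against the sample data, with the means rounded for readability (this snippet is an addition, not part of the answer):
for group in data:
    print(group[0][0], round(group[0][2], 2))
# 2022-02-21 20:30:00 317.67
# 2022-02-21 22:00:00 188.67
# 2022-02-21 23:30:00 96.33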
This is probably the most pythonic you can get with this:
from statistics import fmean
result = [
    [(*x[0][:-1], fmean(map(lambda y: y[2], x)))] for x in inputs
]
You can flatten it to a list of tuples if you want by dropping the square brackets inside the comprehension.
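For example, a flattened sketch of the same idea (inputs is assumed to be the list of lists of tuples shown above):
from statistics import fmean

# one tuple per group instead of a one-element list per group
result = [(*x[0][:-1], fmean(y[2] for y in x)) for x in inputs]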

Structure JSON format to a specified data structure

Basically I have a list
data_list = [
'__att_names' : [
['id', 'name'], --> "__t_idx": 0
['location', 'address'] --> "__t_idx": 1
['random_key1', 'random_key2'] "__t_idx": 2
['random_key3', 'random_key4'] "__t_idx": 3
]
"__root": {
"comparables": [
"__g_id": "153564396",
"__atts": [
1, --> This would be technically __att_names[0][1]
'somerandomname',--> This would be technically __att_names[0][2]
{
"__atts": [
'location_value', --> This would be technically __att_names[1][1]
'address_value',--> This would be technically __att_names[1][2]
"__atts": [
]
"__t_idx": 1 --> It can keep getting nested.. further and further.
]
"__t_idx": 1
}
{
"__atts": [
'random_key3value'
'random_key3value'
]
"__t_idx": 3
}
{
"__atts": [
'random_key1value'
'random_key2value'
]
"__t_idx": 2
}
],
"__t_idx": 0 ---> This maps to the first item in __att_names
]
}
]
My desired output in this case would be
[
{
'id': 1,
'name': 'somerandomname',
'location': 'address_value',
'random_key1': 'random_key1value',
'random_key2': 'random_key2value',
'random_key3': 'random_key3value',
'random_key4': 'random_key4value',
}
]
I was able to get it working for the first few nested fields of __att_names, but my code was getting really long and wonky once the nesting got deeper, and it felt really repetitive.
I feel like there is a neater and recursive way to solve this.
This is my current approach:
As of now, the following code only takes care of the very first nested object:
payload_names = data_list['__att_names']
comparable_data = data_list['__root']['comparables']
output_arr = []
for items in comparable_data[:1]:
    output = {}
    index_number = items.get('__t_idx')
    attributes = items.get('__atts')
    if attributes:
        recursive_function(index_number, attributes, payload_names, output)
    output_arr.append(output)

def recursive_function(index, attributes, payload_names, output):
    category_location = payload_names[index]
    for index, categories in enumerate(category_location):
        output[categories] = attributes[index]
        if type(attributes[index]) == dict:
            has_nested_index = attributes[index].get('__t_idx')
            has_nested_attributes = attributes[index].get('__atts')
            if has_nested_attributes and has_nested_index:
                recursive_function(has_nested_index, has_nested_attributes, payload_names, output)
        else:
            continue
To further explain the given example:
[ {
'id': 1,
'name': 'somerandomname',
'location': 'address_value',
'random_key1': 'random_key1value',
'random_key2': 'random_key2value',
'random_key3': 'random_key3value',
'random_key4': 'random_key4value',
}
]
Specifically, for 'location': 'address_value': the value 'address_value' was derived from the comparables key, which holds an array of dictionaries with key-value pairs, i.e. __g_id, __atts and __t_idx. Note that some of them might not have __g_id, but whenever there is an __atts key there is also a __t_idx, which maps that __atts array to an entry in __att_names.
Overall:
__att_names basically holds all the different key names,
and the items within comparables -> __atts are the values for those key names in __att_names.
__t_idx lets us map the items of an __atts array to the right entry in __att_names and build key-value pairs in the output dictionary.
If you want to restructure a complex JSON object, my recommendation is to use jq.
Python package
Official website
The data you present is really confusing and obfuscated, so I'm not sure exactly what filtering your case would require. But from what I understand, your problem involves indefinitely nested data. So instead of a recursive function, you could write a loop that unnests the data into the plain structure you want. There's already a question on that topic.
You can traverse the structure while tracking the __t_idx key values that correspond to list elements that are not dictionaries:
data_list = {'__att_names': [['id', 'name'], ['location', 'address'], ['random_key1', 'random_key2'], ['random_key3', 'random_key4']], '__root': {'comparables': [{'__g_id': '153564396', '__atts': [1, 'somerandomname', {'__atts': ['location_value', 'address_value', {'__atts': [], '__t_idx': 1}], '__t_idx': 1}, {'__atts': ['random_key3value', 'random_key4value'], '__t_idx': 3}, {'__atts': ['random_key1value', 'random_key2value'], '__t_idx': 2}], '__t_idx': 0}]}}
def get_vals(d, f = False, t_idx = None):
    if isinstance(d, dict) and '__atts' in d:
        yield from [i for a, b in d.items() for i in get_vals(b, t_idx = d.get('__t_idx'))]
    elif isinstance(d, list):
        yield from [i for b in d for i in get_vals(b, f = True, t_idx = t_idx)]
    elif f and t_idx is not None:
        yield (d, t_idx)

result = []
for i in data_list['__root']['comparables']:
    new_d = {}
    for a, b in get_vals(i):
        new_d[b] = iter([*new_d.get(b, []), a])
    result.append({j: next(new_d[i]) for i, a in enumerate(data_list['__att_names']) for j in a})

print(result)
Output:
[
{'id': 1,
'name': 'somerandomname',
'location': 'location_value',
'address': 'address_value',
'random_key1': 'random_key1value',
'random_key2': 'random_key2value',
'random_key3': 'random_key3value',
'random_key4': 'random_key4value'
}
]

How to combine every nth dict element in python list?

Input:
list1 = [
{
"dict_a":"dict_a_values"
},
{
"dict_b":"dict_b_values"
},
{
"dict_c":"dict_c_values"
},
{
"dict_d":"dict_d_values"
}
]
Assuming n=2, every two elements have to be combined together.
Output:
list1 = [
    {
        "dict_a": "dict_a_values",
        "dict_c": "dict_c_values"
    },
    {
        "dict_b": "dict_b_values",
        "dict_d": "dict_d_values"
    }
]
Ideally, it'd be nicer if the output could look like something as follows with an extra layer of nesting:
[
    {"dict_combined_ac": {
        "dict_a": "dict_a_values",
        "dict_c": "dict_c_values"
    }},
    {"dict_combined_bd": {
        "dict_b": "dict_b_values",
        "dict_d": "dict_d_values"
    }}
]
But since this is really difficult to implement, I'd be more than satisfied with output looking something like the first example. Thanks in advance!
What I've tried so far:
[ ''.join(x) for x in zip(list1[0::2], list1[1::2]) ]
However, I know this doesn't work because I'm working with dict elements, not str elements, and when I wrap the items with str(), every two letters get combined instead. I'm also unsure how to adjust this to work for every n elements instead of just 2.
Given the original list, as in the question, the following should generate the required output:
result_list = list()
n = 2  # number of elements you want in each partition
seen_idx = set()

for i in range(len(list1)):  # iterate over all indices
    if i not in seen_idx:
        curr_idx_list = list()  # current partition
        for j in range(i, len(list1), n):  # generate indices for a combination partition
            seen_idx.add(j)  # keep record of seen indices
            curr_idx_list.append(j)  # store indices for current partition
        # At this point we have indices of a partition, now combine
        temp_dict = dict()  # temporary dictionary where we store combined values
        for j in curr_idx_list:  # iterate over indices of current partition
            temp_dict.update(list1[j])
        result_list.append(temp_dict)  # add to result list

print(result_list, '\n')

# Bonus: change result list into list of nested dictionaries
new_res_list = list()
for elem in result_list:  # for each (combined) dictionary in the list, we make new keys
    key_names = list(elem.keys())
    key_names = [e.split('_')[1] for e in key_names]
    new_key = 'dict_combined_' + ''.join(key_names)
    temp_dict = {new_key: elem}
    new_res_list.append(temp_dict)

print(new_res_list, '\n')
The output is as follows:
[{'dict_a': 'dict_a_values', 'dict_c': 'dict_c_values'}, {'dict_b': 'dict_b_values', 'dict_d': 'dict_d_values'}]
[{'dict_combined_ac': {'dict_a': 'dict_a_values', 'dict_c': 'dict_c_values'}}, {'dict_combined_bd': {'dict_b': 'dict_b_values', 'dict_d': 'dict_d_values'}}]
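A more compact sketch of the same grouping, not from the answer above, uses slicing to pick every nth dict and merges each slice into one dictionary:
n = 2
combined = [
    {k: v for d in list1[i::n] for k, v in d.items()}
    for i in range(n)
]
print(combined)
# [{'dict_a': 'dict_a_values', 'dict_c': 'dict_c_values'}, {'dict_b': 'dict_b_values', 'dict_d': 'dict_d_values'}]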

How can I sort or change keys (consisting of numbers) in dictionary, based on other key names?

I have a dictionary, loaded from JSON, like so:
data = {
    "1": ["data", "data"],
    "2": ["data", "data"],
    "3": ["data", "data"],
    "5": ["data", "data"]
}
In this instance, "4" is missing, since it has been deleted through another function.
I am trying to write a function that will re-organize/sort the dictionary. This involves fixing holes such as this one: the numbers should go from 1 to 4 with no gaps, which means changing the key name 5 to 4.
I have written some code to find out which number is missing:
nums = [1, 2, 3, 5]
missing = 0
for x in range(len(data)):
    if x not in nums and x is not 0:
        missing += x
Greatly appreciate some help. I am simply stuck on how to proceed.
PS: I realize it may not be an optimal data structure. It is like this so I can easily match integers given as system arguments to keys and thus find the corresponding values.
So I just figured out a very easy way of doing it... was a lot simpler than I thought it would be.
for x in range(len(data)):
    for k, v in data.items():
        data[x] = data.pop(k)
        break
You could use enumerate to get the new indices for the existing (sorted) keys:
>>> data = {"1": ["data11", "data12"],
... "2": ["data21", "data22"],
... "3": ["data31", "data32"],
... "5": ["data51", "data52"]}
...
>>> {i: data[k] for i, k in enumerate(sorted(data), start=1)}
{1: ['data11', 'data12'],
2: ['data21', 'data22'],
3: ['data31', 'data32'],
4: ['data51', 'data52']}
Or the same, in-place (the if i != k: is not really needed, but might be faster):
>>> for i, k in enumerate(sorted(data), start=1):
... if i != k:
... data[i] = data.pop(k)
...
>>> data
{1: ['data11', 'data12'],
2: ['data21', 'data22'],
3: ['data31', 'data32'],
4: ['data51', 'data52']}
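One caveat worth noting (an addition, not part of the answer above): the new keys here are ints while the originals were strings, and sorted() on string keys sorts lexicographically, so "10" would come before "2". If the data came from JSON and needs to keep string keys, a sketch applied to the original string-keyed dictionary that also sorts numerically:
>>> {str(i): data[k] for i, k in enumerate(sorted(data, key=int), start=1)}
{'1': ['data11', 'data12'],
 '2': ['data21', 'data22'],
 '3': ['data31', 'data32'],
 '4': ['data51', 'data52']}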
