Reducing a list of tuples and generating mean values (being Pythonic!)

I'm a Python beginner and I have written some code which works (shown at the end) but I'd prefer to learn a pythonic way to do this.
I have a list of lists of tuples, as below. There might be anywhere from 1 to 6 tuples in each inner list. I'd like to determine the mean of the numerical values (the third element of each tuple) in each inner list, and end up with just one tuple per list, so something like the second snippet.
[
[
("2022-02-21 20:30:00", None, 331.0),
("2022-02-21 21:00:00", None, 324.0),
("2022-02-21 21:30:00", None, 298.0),
],
[
("2022-02-21 22:00:00", None, 190.0),
("2022-02-21 22:30:00", None, 221.0),
("2022-02-21 23:00:00", None, 155.0),
],
[
("2022-02-21 23:30:00", None, 125.0),
("2022-02-22 00:00:00", None, 95.0),
("2022-02-22 00:30:00", None, 69.0),
],
]
[
[
("2022-02-21 20:30:00", None, 317.7),
],
[
("2022-02-21 22:00:00", None, 188.7),
],
[
("2022-02-21 23:30:00", None, 96.3),
],
]
new_data = []
tuple_idx = 2  # index of the numeric value in each tuple

for li in data:
    li = [list(t) for t in li]
    sum = 0
    for t in li:
        sum = sum + t[tuple_idx]
    mean = sum / len(li)
    li[0][tuple_idx] = mean
    new_data.append(tuple(li[0]))

data = new_data

I wouldn't try to make it more Pythonic or shorter, but more readable.
So I would keep your version with a few small changes.
First, I would use names that mean something.
Also, there is a built-in function sum(), so I wouldn't shadow it with a variable.
new_data = []
value_idx = 2

for group in data:
    total_sum = sum(item[value_idx] for item in group)
    mean = total_sum / len(group)
    first_item = list(group[0])
    first_item[value_idx] = mean
    new_data.append([tuple(first_item)])

data = new_data

This is probably the most pythonic you can get with this:
from statistics import fmean
result = [
    [(*x[0][:-1], fmean(map(lambda y: y[2], x)))] for x in inputs
]
You can flatten it to a list of tuples if you want by dropping the square brackets inside the comprehension.
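For reference, here is that comprehension run end-to-end on the first two sample groups (the variable name inputs matches the answer's assumption about the input data):

```python
from statistics import fmean

inputs = [
    [("2022-02-21 20:30:00", None, 331.0),
     ("2022-02-21 21:00:00", None, 324.0),
     ("2022-02-21 21:30:00", None, 298.0)],
    [("2022-02-21 22:00:00", None, 190.0),
     ("2022-02-21 22:30:00", None, 221.0),
     ("2022-02-21 23:00:00", None, 155.0)],
]

# Keep every field of the group's first tuple except the last,
# then append the mean of the numeric third fields.
result = [
    [(*x[0][:-1], fmean(map(lambda y: y[2], x)))] for x in inputs
]

print([round(g[0][2], 1) for g in result])  # [317.7, 188.7]
```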


Why isn't the loop deleting my specific number?

Can someone help me? My code was working fine until I added a loop that checks each entry of an array and deletes it if its second element is "0.00000000". It doesn't work and sometimes raises "list index out of range". What's the problem? Thank you in advance; here is my code:
parse = json.loads(message)
sum = len(parse["b"])
for x in range(sum):
    if (parse["b"][x][1] == "0.00000000"):
        del parse["b"][x]
My json:
{
    "U": 26450991840,
    "u": 26450991976,
    "b": [
        ["20640.59000000", "0.00000000"],
        ["20640.15000000", "0.08415000"],
        ["20640.14000000", "0.05144000"],
        ["20640.13000000", "0.00519000"],
        ["20640.12000000", "0.00000000"],
        ["20640.11000000", "0.00000000"],
        ["20640.10000000", "0.00000000"]
    ]
}
I tried to make a script that checks the whole JSON string, converting it into a dictionary with the json library and deleting all the arrays containing "0.00000000".
As the other answers have indicated, you cannot delete elements from the array you are currently iterating over - its length is modified.
Here's a solution that generates a completely new list:
parse["b"] = [ x for x in parse["b"] if x[1] != "0.00000000" ]
Above, we use a list comprehension to skip all elements of parse["b"] that have "0.00000000" at index 1.
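A minimal, self-contained check of that one-liner (the JSON here is a trimmed version of the question's message):

```python
import json

message = '{"b": [["20640.59000000", "0.00000000"], ["20640.15000000", "0.08415000"]]}'
parse = json.loads(message)

# Rebuild the list, keeping only rows whose second field is non-zero.
parse["b"] = [x for x in parse["b"] if x[1] != "0.00000000"]

print(parse["b"])  # [['20640.15000000', '0.08415000']]
```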
Referring to the JSON that you've provided: when you run this code, you set sum to 7, i.e., x in the for loop takes the values 0, 1, 2, 3, 4, 5, and 6.
However, before the execution of the for loop is completed, you modify parse["b"] and delete some values in the list, thereby reducing the size of it. It is now less than 7. So, when the loop reaches an index value that is no longer present in the list, it throws an IndexError.
To better understand this, run:
parse = json.loads(message)
sum = len(parse["b"])
print(f"Original length of list: {sum}")
for x in range(sum):
    print(f"Current value of index (x): {x}")
    print(f"Current length of list (parse['b']): {len(parse['b'])}")
    if (parse["b"][x][1] == "0.00000000"):
        del parse["b"][x]
When the first del occurs, your list size changes and your indexes are then invalidated. You can overcome that problem by traversing the list in reverse:
myjson = {
    "U": 26450991840,
    "u": 26450991976,
    "b": [
        ["20640.59000000", "0.00000000"],
        ["20640.15000000", "0.08415000"],
        ["20640.14000000", "0.05144000"],
        ["20640.13000000", "0.00519000"],
        ["20640.12000000", "0.00000000"],
        ["20640.11000000", "0.00000000"],
        ["20640.10000000", "0.00000000"]
    ]
}

blist = myjson['b']
for i in range(len(blist)-1, -1, -1):
    if blist[i][1] == "0.00000000":
        del blist[i]
print(myjson)
Output:
{'U': 26450991840, 'u': 26450991976, 'b': [['20640.15000000', '0.08415000'], ['20640.14000000', '0.05144000'], ['20640.13000000', '0.00519000']]}
Of course that's an in situ modification of the original dictionary. If you want a new dictionary then:
mynewdict = {}
for k, v in myjson.items():
    if k != 'b':
        mynewdict[k] = v
    else:
        for e in v:
            if e[1] != "0.00000000":
                mynewdict.setdefault(k, []).append(e)
print(mynewdict)
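The same filtered copy can also be sketched as a single dict comprehension; note this variant keeps the 'b' key even if every row is filtered out, which differs slightly from the setdefault version above:

```python
myjson = {
    "U": 26450991840,
    "u": 26450991976,
    "b": [
        ["20640.59000000", "0.00000000"],
        ["20640.15000000", "0.08415000"],
        ["20640.14000000", "0.05144000"],
    ],
}

# Copy every key unchanged except 'b', whose rows are filtered.
mynewdict = {
    k: ([e for e in v if e[1] != "0.00000000"] if k == "b" else v)
    for k, v in myjson.items()
}

print(mynewdict["b"])  # [['20640.15000000', '0.08415000'], ['20640.14000000', '0.05144000']]
```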

Python 2D list slicing where one list is full of empty strings

So say I have a 2D list like:
[
    ["9743", "user3"],
    ["435", "user2"],
    ["5426", "user8"],
    ["", ""],
    ["9743", "user9"]
]
Where the index of that list of empty strings is unknown. Is there an easy way to slice the list so that everything after and including that list with empty strings is removed just keeping the stuff before it?
If you are sure that the list contains ['', ''], try this:
lst = [[ "9743", "user3"],[ "435","user2"],["5426","user8"],["",""],["9743","user9"]]
lst[:lst.index(['',''])]
# Output
# [['9743', 'user3'], ['435', 'user2'], ['5426', 'user8']]
If you are not sure that ['', ''] is present, catch the ValueError:
lst = [[ "9743", "user3"],[ "435","user2"],["5426","user8"],["9743","user9"]]
try:
    out = lst[:lst.index(['',''])]
except ValueError:
    out = lst
Output:
[['9743', 'user3'], ['435', 'user2'], ['5426', 'user8'], ['9743', 'user9']]
Iterate through each item in the list, slice the list when the empty value is found and break.
for i in range(len(lst)):
    if lst[i][0] == "" and lst[i][1] == "":
        lst = lst[:i]
        break
You can use itertools.takewhile:
from itertools import takewhile
list(takewhile(lambda x: x != ['', ''], the_list))
NB. I named the list "the_list"
output:
[['9743', 'user3'], ['435', 'user2'], ['5426', 'user8']]
If you want to stop at any item containing '':
from itertools import takewhile
list(takewhile(lambda x: '' not in x, the_list))
Using two-argument iter with a sentinel value:
list(iter(iter(lst).__next__, ['', '']))
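To see why this works: iter(callable, sentinel) calls the callable repeatedly and stops as soon as it returns a value equal to the sentinel, so the empty pair itself is dropped. A quick check:

```python
lst = [["9743", "user3"], ["435", "user2"], ["5426", "user8"],
       ["", ""], ["9743", "user9"]]

# Two-argument iter calls the list iterator's __next__ repeatedly
# and stops as soon as the returned value equals the sentinel ['', ''].
out = list(iter(iter(lst).__next__, ["", ""]))

print(out)  # [['9743', 'user3'], ['435', 'user2'], ['5426', 'user8']]
```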
You can use a for-loop like this (with x being your list):
output = []
for pair in x:
    if pair != ['', '']:
        output.append(pair)
    else:
        break  # Break as soon as there is an empty pair.
The output is:
[['9743', 'user3'], ['435', 'user2'], ['5426', 'user8']]

Structure JSON format to a specified data structure

Basically I have a structure like this:
data_list = {
    '__att_names': [
        ['id', 'name'],                  # "__t_idx": 0
        ['location', 'address'],         # "__t_idx": 1
        ['random_key1', 'random_key2'],  # "__t_idx": 2
        ['random_key3', 'random_key4'],  # "__t_idx": 3
    ],
    "__root": {
        "comparables": [
            {
                "__g_id": "153564396",
                "__atts": [
                    1,                   # this is __att_names[0][0]
                    'somerandomname',    # this is __att_names[0][1]
                    {
                        "__atts": [
                            'location_value',  # this is __att_names[1][0]
                            'address_value',   # this is __att_names[1][1]
                            {
                                "__atts": [],
                                "__t_idx": 1   # it can keep getting nested further and further
                            },
                        ],
                        "__t_idx": 1
                    },
                    {
                        "__atts": [
                            'random_key3value',
                            'random_key4value'
                        ],
                        "__t_idx": 3
                    },
                    {
                        "__atts": [
                            'random_key1value',
                            'random_key2value'
                        ],
                        "__t_idx": 2
                    },
                ],
                "__t_idx": 0   # this maps to the first item in __att_names
            }
        ]
    }
}
My desired output in this case would be
[
    {
        'id': 1,
        'name': 'somerandomname',
        'location': 'address_value',
        'random_key1': 'random_key1value',
        'random_key2': 'random_key2value',
        'random_key3': 'random_key3value',
        'random_key4': 'random_key4value',
    }
]
I was able to get it working for the first few nested fields of __att_names, but my code was getting really long and wonky once the nesting got deeper, and it felt really repetitive.
I feel like there is a neater and recursive way to solve this.
This is my current approach:
As of now, the following code takes care of the very first nested object:

def recursive_function(index, attributes, payload_names, output):
    category_location = payload_names[index]
    for index, categories in enumerate(category_location):
        output[categories] = attributes[index]
        if type(attributes[index]) == dict:
            has_nested_index = attributes[index].get('__t_idx')
            has_nested_attributes = attributes[index].get('__atts')
            if has_nested_attributes and has_nested_index:
                recursive_function(has_nested_index, has_nested_attributes, payload_names, output)
            else:
                continue

payload_names = data_list['__att_names']
comparable_data = data_list['__root']['comparables']
output_arr = []
for items in comparable_data[:1]:
    output = {}
    index_number = items.get('__t_idx')
    attributes = items.get('__atts')
    if attributes:
        recursive_function(index_number, attributes, payload_names, output)
    output_arr.append(output)
To further explain the desired output above, specifically 'location': 'address_value': the value 'address_value' was derived from the comparables array, which holds dictionaries of key-value pairs, i.e. __g_id, __atts, and __t_idx. Note that some of them might not have __g_id, but wherever there is an __atts key there is also a __t_idx, which maps that array to the corresponding list in __att_names.
Overall:
__att_names holds all the different keys,
and all the items within comparables -> __atts are the values for the key names in __att_names.
__t_idx helps us map __atts array items to __att_names and create the key-value dictionary as the outcome.
If you want to restructure a complex JSON object, my recommendation is to use jq.
Python package
Official website
The data you present is really confusing and obfuscated, so I'm not sure what exact filtering your case would require. But from what I understand, your problem involves indefinitely nested data. So instead of a recursive function, you could write a loop that unnests the data into the plain structure you desire. There's already a question on that topic.
You can traverse the structure while tracking the __t_idx key values that correspond to list elements that are not dictionaries:
data_list = {'__att_names': [['id', 'name'], ['location', 'address'], ['random_key1', 'random_key2'], ['random_key3', 'random_key4']], '__root': {'comparables': [{'__g_id': '153564396', '__atts': [1, 'somerandomname', {'__atts': ['location_value', 'address_value', {'__atts': [], '__t_idx': 1}], '__t_idx': 1}, {'__atts': ['random_key3value', 'random_key4value'], '__t_idx': 3}, {'__atts': ['random_key1value', 'random_key2value'], '__t_idx': 2}], '__t_idx': 0}]}}
def get_vals(d, f = False, t_idx = None):
    if isinstance(d, dict) and '__atts' in d:
        yield from [i for a, b in d.items() for i in get_vals(b, t_idx = d.get('__t_idx'))]
    elif isinstance(d, list):
        yield from [i for b in d for i in get_vals(b, f = True, t_idx = t_idx)]
    elif f and t_idx is not None:
        yield (d, t_idx)

result = []
for i in data_list['__root']['comparables']:
    new_d = {}
    for a, b in get_vals(i):
        new_d[b] = iter([*new_d.get(b, []), a])
    result.append({j: next(new_d[i]) for i, a in enumerate(data_list['__att_names']) for j in a})
print(result)
Output:
[
    {'id': 1,
     'name': 'somerandomname',
     'location': 'location_value',
     'address': 'address_value',
     'random_key1': 'random_key1value',
     'random_key2': 'random_key2value',
     'random_key3': 'random_key3value',
     'random_key4': 'random_key4value'}
]

How to combine tuples with the same key value when combine lists

I'm trying to combine multiple lists into one list; values with the same tuple key must be added together.
For example:
A = [ (1,2),(5,2) ]
B = [ (1,2),(5,5),(11,2) ]
Expected result:
result = [ (1,4),(5,7),(11,2) ]
You can do this quite simply once you realise that keeping track of the first element is done well with a dict:
c = dict(A)
for key, value in B:
    c[key] = c.get(key, 0) + value
result = list(c.items())
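Putting it together with the example input (on Python 3.7+, dicts preserve insertion order, so keys appear in first-seen order):

```python
A = [(1, 2), (5, 2)]
B = [(1, 2), (5, 5), (11, 2)]

c = dict(A)             # seed the dict with A's key/value pairs
for key, value in B:    # fold in B, defaulting unseen keys to 0
    c[key] = c.get(key, 0) + value

result = list(c.items())
print(result)  # [(1, 4), (5, 7), (11, 2)]
```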
If the order is not important, using collections.Counter is another option:
In [21]: from collections import Counter
In [22]: A = [ (1,2),(5,2) ]
In [23]: B = [ (1,2),(5,5),(11,2) ]
In [24]: (Counter(dict(A)) + Counter(dict(B))).items() # list(...) for Python 3
Out[24]: [(1, 4), (11, 2), (5, 7)]

Fast way to get the label of an item x (in DATA) from a list of labels

I have a list DATA of n lists, and another list LABELS of n elements corresponding to the labels of the elements in DATA. What is the fastest way to get the label of some element x from DATA, i.e. a faster way than just doing:

def getLabel(x):
    return LABELS[DATA.index(x)]
DATA = [ [2,5,8], [2,4,3], [5,5,7], [9,8,4] ]
LABELS = [ "AAA", "BBB", "AAA", "CCC" ]
print(getLabel([5,5,7]))  # prints "AAA"
Use a dict here, with the item from DATA as key and corresponding item from LABELS as value.
Dicts provide O(1) lookup, while searching in lists is an O(N) operation.
>>> DATA = [ [2,5,8], [2,4,3], [5,5,7], [9,8,4] ]
>>> LABELS = [ "AAA", "BBB", "AAA", "CCC" ]
>>> get_labels = {tuple(x):y for x,y in zip(DATA,LABELS)}
>>> get_labels[5,5,7]
'AAA'
>>> get_labels[9,8,4]
'CCC'
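One detail worth noting: the tuple(x) conversion is required because lists are mutable and unhashable, so they cannot serve as dict keys. A small sketch, with a hypothetical get_label helper that also tolerates items missing from DATA:

```python
DATA = [[2, 5, 8], [2, 4, 3], [5, 5, 7], [9, 8, 4]]
LABELS = ["AAA", "BBB", "AAA", "CCC"]

# Convert each list to a tuple so it can serve as a dict key.
get_labels = {tuple(x): y for x, y in zip(DATA, LABELS)}

def get_label(x):
    # .get returns None instead of raising KeyError for unknown items
    return get_labels.get(tuple(x))

print(get_label([5, 5, 7]))  # AAA
print(get_label([1, 1, 1]))  # None
```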
