Hi, I have a list of numbers with some None values in them that I want to replace with other numbers from the list that are not None.
For example, for the list below:
listP = [ 2.5, 3, 4, None, 4, 8.5, None, 7.3]
I want the two None items to be replaced with random numbers from the list that are not themselves None. So in this example each None could be replaced by 2.5, 3, 4, 8.5 or 7.3.
Is there any way to do this in one line of code?
You'll need two steps: extract the non-None values for random.choice() to pick from, then use a list comprehension to actually pick the random values:
import random
numbers = [n for n in listP if n is not None]
result = [n if n is not None else random.choice(numbers) for n in listP]
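If you truly want a single expression, you could inline the filtering step, although this rebuilds the candidate list for every None, so it is only a sketch and not recommended for large lists:
result = [n if n is not None else random.choice([m for m in listP if m is not None]) for n in listP]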
You can filter the list first to construct a list of not-None values and then randomly choose from it using choice() (note that in Python 3, filter() returns an iterator, so wrap it in list() before handing it to random.choice()):
import random
listP = [2.5, 3, 4, None, 4, 8.5, None, 7.3]
listR = list(filter(lambda x: x is not None, listP))
print([l if l is not None else random.choice(listR) for l in listP])
One possible result (the replacements are random):
[2.5, 3, 4, 7.3, 4, 8.5, 4, 7.3]
Use a list comprehension, testing explicitly for None so that legitimate falsy values such as 0 are kept:
>>> [x if x is not None else 'WHATEVER' for x in [2.5, 3, 4, None, 4, 8.5, None, 7.3]]
[2.5, 3, 4, 'WHATEVER', 4, 8.5, 'WHATEVER', 7.3]
You could use the following:
import random
listP = [2.5, 3, 4, None, 4, 8.5, None, 7.3]
numbers = [num for num in listP if num is not None]
answer = [el if el is not None else random.choice(numbers) for el in listP]
print(answer)
Sample Output
[2.5, 3, 4, 3, 4, 8.5, 8.5, 7.3]
This creates a numbers list by filtering out the None values from listP. It then uses a list comprehension with random.choice() to fill None values with a random choice from numbers.
Get any value from the list except None; I use the max() function for this:
Check that the max value is not None.
Use a list comprehension to create a new list in which each None is replaced with the max value.
Demo:
>>> listP = [ 2.5, 3, 4, None, 4, 8.5, None, 7.3]
>>> l_max = max(listP)
>>> if l_max:
...     listP = [n if n is not None else l_max for n in listP]
...
>>> listP
[2.5, 3, 4, 8.5, 4, 8.5, 8.5, 7.3]
>>>
Here max() is a built-in function, and the replacement itself is a plain list comprehension.
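Note that in Python 3, max() raises a TypeError while the list still contains None, so a variant of the same idea (a sketch that filters before taking the max) would be:
l_max = max(n for n in listP if n is not None)
listP = [n if n is not None else l_max for n in listP]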
I have this code, which works fine and is minimal and reproducible. It uses lists and tuples. Given how slow lists and tuples can get on large amounts of data, I would like to change the whole setup and use dictionaries to speed up performance.
So I'd like to convert this block of code into something similar that uses dictionaries.
The purpose of the code is to create the variables x and y (calculations on the data) and add them to a list, using append and tuples. I then mine the numbers for certain purposes.
How can I introduce dictionaries where needed and replace the list/append code with them? Thank you!
VERSION WITH TUPLE AND LIST
mylist = {('Jack', 'Grace', 8, 9, '15:00'): [0, 1, 1, 5],
          ('William', 'Dawson', 8, 9, '18:00'): [1, 2, 3, 4],
          ('Natasha', 'Jonson', 8, 9, '20:45'): [0, 1, 1, 2]}
new = []
for key, value in mylist.items():
    # create variables and perform calculations
    calc_x = sum(value) / len(value)
    calc_y = (calc_x * 100) / 2
    # build an entry holding the key and both calculated values
    if calc_x > 0.1:
        new.append([[key], [calc_x], [calc_y]])
print(new)
print(" ")
# example of retrieving calc_x
print_x = [tuple(i[1]) for i in new]
print(print_x)
I was trying to write something like this, but I don't think it fits, so don't even look at it. I have two requests, if possible:
I would like sum(value) / len(value) and (calc_x * 100) / 2 to keep their own variables calc_x and calc_y, so that they can be used individually in the append, as you can see above.
From the new variable, I would like to be able to pull out the values when they are needed, as I do for example with print_x = [tuple(i[1]) for i in new]. Thank you.
If you really want to improve performance, you can use Pandas (or Numpy) to vectorize math operations:
import pandas as pd
# Transform your dataset to DataFrame
df = pd.DataFrame.from_dict(mylist, orient='index')
# Compute some operations
df['x'] = df.mean(axis=1)
df['y'] = df['x'] * 50
# Filter out and export
out = df.loc[df['x'] > 0.1, ['x', 'y']].to_dict('split')
new = dict(zip(out['index'], out['data']))
Output:
>>> new
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}
A numpy version:
import numpy as np
# transform keys to numpy array (special hack to keep tuples)
keys = np.empty(len(mylist), dtype=object)
keys[:] = tuple(mylist.keys())
# transform values to numpy array
vals = np.array(tuple(mylist.values()))
x = np.mean(vals, axis=1)
y = x * 50
# boolean mask to exclude some values
m = x > 0.1
out = np.vstack([x, y]).T
new = dict(zip(keys[m].tolist(), out[m].tolist()))
print(new)
# Output
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}
A python version:
new = {}
for k, v in mylist.items():
    x = sum(v) / len(v)
    y = x * 50
    if x > 0.1:
        new[k] = [x, y]
print(new)
# Output
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}
Update: How to extract x:
# Pandas
>>> df['x'].tolist() # or simply df['x'] to extract the column
[1.75, 2.5, 1.0]
# Python
>>> [v[0] for v in new.values()]
[1.75, 2.5, 1.0]
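For the NumPy version, assuming the same x array and boolean mask m as above, the kept x values can be pulled out directly:
# NumPy
>>> x[m].tolist()
[1.75, 2.5, 1.0]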
I have two unsorted lists as follows:
A = [1, 3, 1.75]
B = [0, 1.5, 2, 4]
I want to make a list that includes the numbers from A and B in sorted (e.g. ascending) order. However, I also want to keep the sequence from each individual list. The desired output would look like this:
AB = [0, 1, 1.5, 2, 3, 1.75, 4]
Do you have any ideas or hints on how to do this? The original problem involves 150 lists that need to be merged into one list like the above. Thanks in advance for your ideas!
This looks like a "merge" problem to me:
def merge(lists):
    iters = [iter(s) for s in lists]
    heads = [next(s) for s in iters]   # the current front value of each list
    res = []
    inf = float('inf')
    while True:
        # pick the smallest current head; ties go to the earlier list
        v, n = min((v, n) for n, v in enumerate(heads))
        if v == inf:
            return res
        res.append(v)
        try:
            heads[n] = next(iters[n])
        except StopIteration:
            heads[n] = inf             # mark exhausted lists so they never win

lists = [
    [1, 2, 3, 8],
    [1, 7, 4],
    [6, 9, 1, 2, 3],
]
print(merge(lists))
## [1, 1, 2, 3, 6, 7, 4, 8, 9, 1, 2, 3]
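For what it's worth, the standard library's heapq.merge() follows the same strategy of repeatedly yielding the smallest current head, so it appears to give the same result here as well as for the two lists in the question; worth verifying on your real data:
from heapq import merge as heapmerge

print(list(heapmerge(*lists)))
## [1, 1, 2, 3, 6, 7, 4, 8, 9, 1, 2, 3]

A = [1, 3, 1.75]
B = [0, 1.5, 2, 4]
print(list(heapmerge(A, B)))
## [0, 1, 1.5, 2, 3, 1.75, 4]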
I need to write a function called var_fun that returns the variance of a list, and I am testing it with 2 lists. Actually, I have to return the squared deviation of each element (see the expected output below).
list_1 = [8, 8, 3, 5, 5, 8, 1, 4, 8, 6, 3, 10, 9]
list_2 = [8, 12, 3, 5, 5, 8, 1, 4, 8, 3, 10, 9]
This is the code that I wrote, but it returns <function var_func at 0x7f462679ad08>.
How can I solve this?
def var_fun(x):
    for i in x:
        var = ((i - mean_fun(x))**2)
    return var_fun
print(var_fun(list_1))
print(var_fun(list_2))
This is my mean_fun:
def mean_fun(values):
    length = len(values)
    total_sum = 0
    for i in range(length):
        total_sum += values[i]
    average = (total_sum / length)
    return round(average, 2)
print(mean_fun(list_1))
print(mean_fun(list_2))
The output should look like this:
[25.0, 9.0, 9.0, 4.0, 1.0, 1.0, 0.0, 4.0, 4.0, 4.0, 4.0, 9.0, 16.0]
[28.41, 11.09, 11.09, 5.43, 1.77, 1.77, 2.79, 2.79, 2.79, 7.13, 13.47, 32.15]
In var_fun(), return var instead of var_fun. That should solve the immediate problem.
You see that output because you are returning the function object itself rather than a value.
Also, these lines don't make any sense. You can remove them:
mean_fun = list_1
mean_fun = list_2
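To also get one value per element, as in the expected output above, a minimal sketch (reusing your mean_fun) would collect the squared deviations in a list instead of overwriting var on every pass through the loop:
def var_fun(x):
    mean = mean_fun(x)  # compute the mean once
    # one squared deviation per element, rounded to two decimals like mean_fun
    return [round((i - mean) ** 2, 2) for i in x]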
Better Alternative:
https://numpy.org/doc/stable/reference/generated/numpy.var.html
https://numpy.org/doc/stable/reference/generated/numpy.mean.html
Have a look at the NumPy functions above, which directly calculate the mean and variance for you.
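For example, a brief sketch of that route (np.var gives the single variance figure; the second line gives the element-wise squared deviations):
import numpy as np

arr = np.array(list_1)
print(np.var(arr))                           # the variance as a single number
print(np.round((arr - arr.mean()) ** 2, 2))  # squared deviation of each element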
I have a frequency counter to iterate through a list of times and tell me how often each number comes up. First, I run every value through int() to remove the decimals. I check this with a print statement at the bottom, and it works fine. But for some reason the frequency counts still look wrong, even though the counting happens after I change the values with int(). Here is my code, and I'll give some output.
from itertools import groupby
times = [1.23, 1.23, 2.56, 1.23, 1.23, 1.23, 1.23, 1.5, 4.32, 5.3, 2.5, 5.7, 3.4, 8.9, 8.9, 8.9]
newtimes = []
lentimes = len(times)
for time in times:
    #Rounds down every time
    time = int(time)
    #Adds time to new list
    newtimes.append(time)
setTimes = list(set(newtimes))
freqlist = [len(list(group)) for key, group in groupby(newtimes)]
print(newtimes)
print(lentimes)
print(setTimes)
print("Freqlist is " + str(freqlist))
the output looks like:
[1, 1, 2, 1, 1, 1, 1, 1, 4, 5, 2, 5, 3, 8, 8, 8]
16
[1, 2, 3, 4, 5, 8]
Freqlist is [2, 1, 5, 1, 1, 1, 1, 1, 3]
It took me a while to figure out what was up with the freqlist output: it is doing everything right, but it looks like it is counting times rather than newtimes (where we drop the decimals), even though it runs after we drop the decimals. Any ideas? Thanks!
The problem is itertools.groupby works for consecutive similar items only. It requires a sorted input to work in the way you expect. You also don't need to create intermediary lists; instead, you can use sum with a generator expression:
freqlist = [sum(1 for _ in group) for key, group in groupby(sorted(newtimes))]
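With the sample newtimes above, that should give the counts in key order (1, 2, 3, 4, 5, 8):
print("Freqlist is " + str(freqlist))
# Freqlist is [7, 2, 1, 1, 2, 3]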
Sorting takes O(n log n) time. For an O(n) solution, you can use collections.Counter:
from collections import Counter
d = Counter(map(int, times))
Counter({1: 7, 2: 2, 4: 1, 5: 2, 3: 1, 8: 3})
Then, if you wish, extract the values into a list after sorting by key:
keys, values = zip(*sorted(d.items()))
print(values)
(7, 2, 1, 1, 2, 3)
I have one list
a = [1.0, 2.0, 2.1, 3.0, 3.1, 4.2, 5.1, 7.2, 9.2]
I want to compare this list with other lists, but I also want to extract information about the list content in numeric order. All the other lists have elements that are the same as those in a.
So I have tried this
a = [1.0, 2.0, 2.1, 3.0, 3.1, 4.2, 5.1, 7.2, 9.2]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print dict(zip(a,b))
a1=[2.1, 3.1, 4.2, 7.2]
I want to compare a1 with a and extract the corresponding dict values, [3, 5, 6, 8].
Just loop through a1 and see if there is a matching key in the dictionary you created:
mapping = dict(zip(a, b))
matches = [mapping[value] for value in a1 if value in mapping]
Demo:
>>> a = [1.0, 2.0, 2.1, 3.0, 3.1, 4.2, 5.1, 7.2, 9.2]
>>> b = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> a1 = [2.1, 3.1, 4.2, 7.2]
>>> mapping = dict(zip(a, b))
>>> [mapping[value] for value in a1 if value in mapping]
[3, 5, 6, 8]
However, take into account that you are using floating point numbers. You may not be able to match values exactly, since floating point numbers are binary approximations of decimal values; the value 2.999999999999999 (15 nines), for example, may be presented by the Python 2 str() function as 3.0, but it is not equal to 3.0:
>>> 2.999999999999999
2.999999999999999
>>> str(2.999999999999999)
'3.0'
>>> 2.999999999999999 == 3.0
False
>>> 2.999999999999999 in mapping
False
If your input list a is sorted, you could use the math.isclose() function (or a backport of it) together with the bisect module to keep the matching efficient:
import bisect
try:
    from math import isclose
except ImportError:
    def isclose(a, b, rel_tol=1e-09, abs_tol=0.0):
        # simplified backport, doesn't handle NaN or infinity.
        if a == b:
            return True
        return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

result = []
for value in a1:
    index = bisect.bisect(a, value)
    if index and isclose(a[index - 1], value):
        result.append(b[index - 1])
    elif index < len(a) and isclose(a[index], value):
        result.append(b[index])
This tests up to two values from a per input value: the one that is guaranteed to be equal or lower (at index - 1) and the next, higher value. For your sample a, the value 2.999999999999999 is bisected to index 3, between 2.1 and 3.0. Since isclose(3.0, 2.999999999999999) is true, that would still let you map that value to 4 in b.
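A quick illustration, reusing a, b and the loop above (the extra value in a1 is hypothetical, added only for this demo):
a1 = [2.1, 3.1, 4.2, 7.2, 2.999999999999999]  # hypothetical extra value for illustration
# running the loop above with this a1 gives:
# result == [3, 5, 6, 8, 4]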