I have a list (a) whose inner lists contain indices into another list (b). I am trying to create a third list (c) in which each entry is the sum of the values of (b) referenced at the corresponding index of (a). Below is an example to hopefully make it clearer.
I have a rather large data set, and this needs to be done frequently as part of an optimization process. Is there a way, other than nested for loops, to do this efficiently and automatically, without having to write out every entry of c by hand?
a = [[0],[0,3],[1,2],[3],[1,2,3]]
b = [10,20,30,40]
c = [b[0], b[0]+b[3], b[1]+b[2], b[3], b[1]+b[2]+b[3]]
Thanks in advance for any help, and sorry for potential mistakes in the post. It's my first and I'm trying to learn.
I'm not sure whether you need a pure Python solution, but if not, you can use the NumPy library:
>>> import numpy as np
>>> c = np.array(b)
>>> [int(c[i].sum()) for i in a]
[10, 50, 50, 40, 90]
You can do this with a list comprehension. It takes the indices from each inner list in a, looks up the corresponding values in b, sums them, and collects the results in a new list that is assigned to d.
a = [[0], [0, 3], [1, 2], [3], [1, 2, 3]]
b = [10, 20, 30, 40]
d = [sum(b[idx] for idx in indices) for indices in a]
print(d)
output:
[10, 50, 50, 40, 90]
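If the index lists in a stay the same while b changes between iterations (as the optimization loop in the question suggests), one possible approach is to build a 0/1 membership matrix once and then compute all the sums with a single matrix-vector product. This is only a sketch: the name mask is made up here, and it assumes that no index is repeated within an inner list (a repeated index would be counted only once).

```python
import numpy as np

a = [[0], [0, 3], [1, 2], [3], [1, 2, 3]]
b = np.array([10, 20, 30, 40])

# Build the membership matrix once: mask[i, j] == 1 when index j appears in a[i].
mask = np.zeros((len(a), len(b)), dtype=int)
for row, indices in enumerate(a):
    mask[row, indices] = 1

# Each entry of c is the dot product of one mask row with b,
# i.e. the sum of the b values selected by that row.
c = mask @ b
print(c.tolist())  # [10, 50, 50, 40, 90]
```

The loop that fills mask runs only once; when b changes, only the line c = mask @ b has to be repeated.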
To understand what I'm trying to do, first look at the code I've included below:
a = []
b = []
c = []
x = [a, b, c]
x[0] = [5, 10, 15]
x[1] = [30, 60, 90]
x[2] = [100, 200, 300]
Now, what I wish I could do is find a technique such that those last three lines would update the elements I had initially stored in x -- In other words, I wish that assigning a value to x[0] would directly update a, and assigning a value to x[1] would update b, and assigning a value to x[2] would update c. If this was the case, we'd be able to see the following outputs:
print(a)
>> [5, 10, 15]
print(b)
>> [30, 60, 90]
print(c)
>> [100, 200, 300]
I know this won't happen because assigning a value to x[i] just replaces the ith element in x, rather than "updating" the value of the ith element. And frankly, that makes much more sense and is obviously the way it should be. But, is there an alternate way to "update" variables in a list/array like this?
Here's a more detailed example of why I'm looking to do this:
import csv
import numpy as np

dataset = r'directory\data.csv'
writer = []
director = []
runtime = []
language = []
country = []
genre = []
budget = []
year = []
features = [writer, director, runtime, language, country, genre, budget, year]
cnt = 0
for cnt in range(0,len(features)):
    with open(dataset,'r',encoding="latin1") as f:
        reader = csv.reader(f,delimiter=",",quotechar='"')
        for row in reader:
            features[cnt].append(row[cnt])
    features[cnt] = np.array(features[cnt]) # <<- trying to convert each element from list to numpy array
In other words, I'm trying to read data from a .csv file and copy each column to one of several different lists, which I initialized between dataset and features. Then, I need to convert these lists to numpy arrays so that I can use the np.where() function, since .index() won't do what I need for this task.
Ideally, after the most recent snippet of code, I'd be able to see all the elements in features had been updated. For example, the following outputs would ideally be what we see after running the above code:
print(country)
>> ['USA', 'USA', 'UK', 'Germany', 'USA', 'Italy', ... , 'Germany']
print(language)
>> ['English', 'English', 'English', 'German', 'English', 'Italian', ... ,'German']
# etc. etc.
Problem is, in my actual code, I have many more lists (ie. many more .csv columns) I need to copy, so that's why I put them all in one features list/array. I was hoping to be able to iteratively copy a different column into each list, and then convert each list to a numpy array, all in one compact loop. This would eliminate the need for a separate loop for each column/list I want to copy. But unfortunately, I can't figure out how to "update" each features element rather than write something entirely different in its place.
Is there a way to accomplish this? Apologies for poor syntax/efficiency anywhere, I'm not very experienced with Python.
You can do what you want. a, b, and c are mutable lists, and x stores references to the same lists referenced by a, b, and c. Don't reassign the elements of x; mutate them:
a = []
b = []
c = []
x = [a, b, c]
print(f'{x=}')
print(f'{a=}')
print(f'{b=}')
print(f'{c=}')
x[0][:] = [5, 10, 15] # replace the entire content of the list at x[0]...
x[1][:] = [30, 60, 90]
x[2][:] = [100, 200, 300]
print(f'{x=}')
print(f'{a=}')
print(f'{b=}')
print(f'{c=}')
Output:
x=[[], [], []]
a=[]
b=[]
c=[]
x=[[5, 10, 15], [30, 60, 90], [100, 200, 300]]
a=[5, 10, 15]
b=[30, 60, 90]
c=[100, 200, 300]
x stores references to the same lists that a, b, and c refer to. The expression x[0] is a evaluates to True, so if you mutate x[0], then a reflects the change. The same holds for the rest.
You could also call x[0].append(item) or x[0].extend(list_of_items) to grow the list a.
But if you reassign x[0], then x[0] no longer references a. x[0] = [1,2,3] stores the reference to a new list in that location and no longer references the object a references.
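The difference between mutating and reassigning can be checked directly with the is operator. A minimal sketch, using a single list for brevity:

```python
a = []
x = [a]

# Mutation: x[0] and a still name the same list object.
x[0].extend([5, 10, 15])
print(x[0] is a, a)   # True [5, 10, 15]

# Reassignment: the slot in x now points at a brand-new list; a is unchanged.
x[0] = [1, 2, 3]
print(x[0] is a, a)   # False [5, 10, 15]
```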
I have a list of lists, where each inner list represents a row in a spreadsheet. With my current data structure, how can I perform an operation on the elements of the inner lists that share the same index (which basically amounts to performing operations down a column in a spreadsheet)?
Here is an example of what I am looking for (in terms of addition)
>>> lisolis = [[1,2,3], [4,5,6], [7,8,9]]
>>> sumindex = [1+4+7, 2+5+8, 3+6+9]
>>> sumindex = [12, 15, 18]
This problem can probably be solved with slicing, but I'm unable to see how to do that cleanly. Is there a nifty tool/library out there that can accomplish this for me?
Just use zip:
sumindex = [sum(elts) for elts in zip(*lisolis)]
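To see why this works, it can help to look at the intermediate result: zip(*lisolis) transposes the rows into columns. The sketch below also shows what happens with rows of unequal length (the ragged list is a made-up input, not from the question): zip stops at the shortest row, while itertools.zip_longest pads the missing cells with a fill value.

```python
from itertools import zip_longest

lisolis = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Unpacking the rows into zip groups the elements by position (column).
columns = list(zip(*lisolis))
print(columns)   # [(1, 4, 7), (2, 5, 8), (3, 6, 9)]

sumindex = [sum(col) for col in columns]
print(sumindex)  # [12, 15, 18]

# Hypothetical ragged input: zip truncates, zip_longest pads with 0.
ragged = [[1, 2, 3], [4, 5], [7]]
truncated = [sum(col) for col in zip(*ragged)]
padded = [sum(col) for col in zip_longest(*ragged, fillvalue=0)]
print(truncated)  # [12]
print(padded)     # [12, 7, 3]
```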
@tzaman has a good solution for lists, but since you have also put numpy in the tags, there's an even simpler solution if you have a numpy 2D array:
>>> import numpy
>>> a = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> a.sum(axis=0)
array([12, 15, 18])
This should be faster if you have large arrays.
>>> sumindex = numpy.array(lisolis).sum(axis=0).tolist()
>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3], [4,5,6], [7,8,9]], columns=['A','B','C'])
>>> df.sum()
A    12
B    15
C    18
dtype: int64
The list(), map(), zip(), and sum() functions make short work of this problem:
>>> list(map(sum, zip(*lisolis)))
[12, 15, 18]
First I'll fill out the lists of what we want to add up:
>>> lisolis = [[1,2,3], [4,5,6], [7,8,9]]
Then create an empty list for my sums:
>>> sumindex = []
Now I'll start a loop inside a loop. I'm going to add the numbers that are in same positions of each little list (that's the "y" loop) and I'm going to do that for each position (that's the "x" loop).
>>> for x in range(len(lisolis[0])):
...     z = 0
...     for y in range(len(lisolis)):
...         z += lisolis[y][x]
...     sumindex.append(z)
...
range(len(lisolis[0])) gives me the length of the little lists so I know how many positions there are to add up, and range(len(lisolis)) gives me the number of little lists so I know how many numbers need to be added up for any particular position in the list.
"z" is a placeholder. Each number in a particular list position is added to "z" until they're all summed up. After that, I put the value of "z" into the next slot in sumindex.
To see the results, I would then type:
>>> sumindex
which would give me:
[12, 15, 18]
I have a list which consists of two numpy arrays, the first one holding the indices and the second one holding the corresponding values. It looks a little like this:
x_glob = [[0, 2], [85, 30]]
A function is now receiving the following input:
x = [-10, 0, 77, 54]
My goal is to swap the values of x with the values from x_glob based on the given index array from x_glob. This example should result in something like this:
x_new = [85, 0, 30, 54]
I do have a solution using a loop, but I am pretty sure there is a more efficient and elegant way to solve this in Python.
Thank you!
NumPy arrays may be indexed with other arrays, which makes this replacement trivial.
All you need to do is index your array x with x_glob[0] and then assign x_glob[1] to that selection:
x[x_glob[0]] = x_glob[1]
To see how this works, just look at the result of the indexing:
>>> x[x_glob[0]]
array([-10, 77])
The result is an array containing the two values that we need to replace, which we then replace with another numpy array, x_glob[1], to achieve the desired result.
>>> import numpy as np
>>> x_glob = np.array([[0, 2], [85, 30]])
>>> x = np.array([-10, 0, 77, 54])
>>> x[x_glob[0]] = x_glob[1]
>>> x
array([85, 0, 30, 54])
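Note that the indexed assignment above changes x in place, and it needs x to be a NumPy array (a plain Python list cannot be indexed with a list of positions). If the function receives plain lists and should leave its input untouched, one option is to copy first. The helper name replace_at below is hypothetical, just a sketch of that idea:

```python
import numpy as np

def replace_at(x, x_glob):
    """Return a copy of x with the positions x_glob[0] set to the values x_glob[1]."""
    indices, values = x_glob
    out = np.array(x)                  # np.array copies, so the caller's x is not modified
    out[np.asarray(indices)] = values  # integer-array indexing on the left-hand side
    return out

x = [-10, 0, 77, 54]
x_new = replace_at(x, [[0, 2], [85, 30]])
print(x_new.tolist())  # [85, 0, 30, 54]
print(x)               # [-10, 0, 77, 54]
```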
For a non-numpy solution, you could create a dict mapping the indices from x_glob to the respective values and then use a list comprehension with that dict's get method:
>>> x_glob = [[0, 2], [85, 30]]
>>> x = [-10, 0, 77, 54]
>>> d = dict(zip(*x_glob))
>>> [d.get(i, n) for i, n in enumerate(x)]
[85, 0, 30, 54]
Or using map with multiple parameter lists (or without zip using itertools.starmap):
>>> list(map(d.get, *zip(*enumerate(x))))
[85, 0, 30, 54]
My solution also uses a for loop, but it's pretty short and elegant (I think), works in place, and is efficient because it does not have to iterate through the full x array, just through the list of replacements:
for k,v in zip(*x_glob):
    x[k] = v
As far as I can tell, this is officially not possible, but is there a "trick" to access arbitrary non-sequential elements of a list by slicing?
For example:
>>> L = range(0,101,10)
>>> L
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
Now I want to be able to do
a,b = L[2,5]
so that a == 20 and b == 50
One way besides two statements would be something silly like:
a,b = L[2:6:3][:2]
But that doesn't scale at all to irregular intervals.
Maybe with list comprehension using the indices I want?
[L[x] for x in [2,5]]
I would love to know what is recommended for this common problem.
Probably the closest to what you are looking for is operator.itemgetter:
>>> L = list(range(0, 101, 10)) # works in Python 2 or 3
>>> L
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
>>> from operator import itemgetter
>>> itemgetter(2, 5)(L)
(20, 50)
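A few details are worth knowing when using itemgetter this way. The sketch below shows unpacking straight into variables, building the getter from a list of indices, and the single-index case, which returns the bare item rather than a 1-tuple:

```python
from operator import itemgetter

L = list(range(0, 101, 10))

# Unpack directly into variables, as asked in the question.
a, b = itemgetter(2, 5)(L)
print(a, b)  # 20 50

# The indices can come from a list by unpacking it with *.
indices = [2, 5, 7]
subset = itemgetter(*indices)(L)
print(subset)  # (20, 50, 70)

# With exactly one index, the result is the item itself, not a tuple.
single = itemgetter(2)(L)
print(single)  # 20
```

The single-index behaviour matters when the list of indices is built dynamically and may have length one.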
If you can use numpy, you can do just that:
>>> import numpy
>>> the_list = numpy.array(range(0,101,10))
>>> the_indices = [2,5,7]
>>> the_subset = the_list[the_indices]
>>> print(the_subset, type(the_subset))
[20 50 70] <class 'numpy.ndarray'>
>>> print(the_subset.tolist())
[20, 50, 70]
A NumPy array is similar to a list, except that it supports more operations, such as element-wise mathematical operations and arbitrary index selection like we see here.
Just for completeness, the method from the original question is pretty simple. If L is the result of a function call, assign that result to a variable beforehand (or wrap the comprehension in a function) so the function doesn't get called repeatedly:
[L[x] for x in [2,5]]
Of course it would also work for a string...
["ABCDEF"[x] for x in [2,0,1]]
['C', 'A', 'B']
Something like this?
def select(lst, *indices):
    return (lst[i] for i in indices)
Usage:
>>> def select(lst, *indices):
...     return (lst[i] for i in indices)
...
>>> L = range(0,101,10)
>>> a, b = select(L, 2, 5)
>>> a, b
(20, 50)
The way the function works is by returning a generator object which can be iterated over similarly to any kind of Python sequence.
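One consequence of returning a generator is that the result can be consumed only once. A short sketch of that behaviour, reusing the select function from above:

```python
def select(lst, *indices):
    # Lazily yields lst[i] for each requested index.
    return (lst[i] for i in indices)

L = list(range(0, 101, 10))

picked = select(L, 2, 5)
first_pass = list(picked)
second_pass = list(picked)  # the generator is already exhausted here
print(first_pass)   # [20, 50]
print(second_pass)  # []
```

If the selected values are needed more than once, wrapping the call in list() or tuple() right away avoids this surprise.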
As @justhalf noted in the comments, the call syntax can be changed by the way you define the function parameters.
def select(lst, indices):
    return (lst[i] for i in indices)
And then you could call the function with:
select(L, [2, 5])
or any list of your choice.
Update: I now recommend using operator.itemgetter instead unless you really need the lazy evaluation feature of generators. See John Y's answer.
None of the other answers will work for multidimensional object slicing. IMHO this is the most general solution (uses numpy):
numpy.ix_ allows you to select arbitrary indices in all dimensions of an array simultaneously.
e.g.:
>>> import numpy as np
>>> a = np.arange(10).reshape(2, 5) # create an array
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> ixgrid = np.ix_([0, 1], [2, 4]) # create the slice-like grid
>>> ixgrid
(array([[0],
       [1]]), array([[2, 4]]))
>>> a[ixgrid] # use the grid to slice a
array([[2, 4],
       [7, 9]])
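The reason np.ix_ is needed becomes clearer when compared with passing the two index lists directly. A small sketch on the same array: plain list indexing pairs the indices element by element, whereas np.ix_ selects every combination of the given rows and columns.

```python
import numpy as np

a = np.arange(10).reshape(2, 5)

# Paired indexing: picks a[0, 2] and a[1, 4] only.
paired = a[[0, 1], [2, 4]]
print(paired.tolist())  # [2, 9]

# np.ix_ builds an open mesh: rows 0 and 1 crossed with columns 2 and 4.
grid = a[np.ix_([0, 1], [2, 4])]
print(grid.tolist())    # [[2, 4], [7, 9]]
```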