Construct a dictionary merging multiple lists - python

I have a list of objects (clusters), and each object has an attribute vertices, which is a list of numbers. I want to construct a dictionary (as a one-liner) such that each key is a vertex number and the value is the index of the corresponding cluster in the clusters list.
Ex:
clusters[0].vertices = [1,2]
clusters[1].vertices = [3,4]
Expected Output:
{1:0,2:0,3:1,4:1}
I came up with the following:
from functools import reduce  # reduce is no longer a builtin in Python 3

dict(reduce(lambda x, y: x.extend(y) or x,
            [list(dict(zip(vertices, [index] * len(vertices))).items())
             for index, vertices in enumerate([i.vertices for i in clusters])]))
It works... but is there a better way of doing this?
Also, could you comment on the efficiency of the above piece of code?
PS: The vertex lists are disjoint.

This is a fairly simple solution, using a nested for:
dict((vert, i) for (i, cl) in enumerate(clusters) for vert in cl.vertices)
This is also more efficient than the version in the question, since it doesn't build lots of intermediate lists while collecting the data for the dict.
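For what it's worth, on Python 2.7+ the same idea reads even more cleanly as a dict comprehension, equivalent to the generator version above:

{vert: i for i, cl in enumerate(clusters) for vert in cl.vertices}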

Related

How to structure nested for loops when trying to use a function on unique combinations of several lists in dataframe?

I have a dataframe that is made of several lists. I'm trying to write a nested for loop that will execute a function for each different combination of values in a few of the lists. None of those list values are needed by the function itself; I just want to calculate a variable for each combination of the following lists in a df:
list1 = ['sun', 'sun', 'sun', 'shade']
list2 = [0, 1, 0, 1]
list3 = ['PA', 'GA', 'CA', 'FL']

new_list = []
for p in range(len(df['list1'])):
    for q in range(len(df['list2'])):
        for t in range(len(df['list3'])):
            x = UnivariateSpline(params)
            new_list.append(x)
I've provided dummy code above, but the function I'm trying to run is UnivariateSpline from scipy. So I want to generate interpolated data using that function and append it to a new list, and I want to do this for each combination of values from lists 1-3. I.e., use the function if list1=sun, list2=0, and list3=PA... if list1=sun, list2=1, and list3=PA... etc.
I know how to do it by using the df.loc to subset the data in all combination types and then use the function. But I'm sure there is a more streamlined way to do it than that.
Can someone help me understand how I need to have this formatted? I'm a very novice coder (as you can likely tell). Thanks in advance!
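One common pattern, sketched here with hedges (the column names 'list1', 'list2', 'list3' and the spline inputs 'x_col', 'y_col' are placeholders, not names from the question), is to let pandas groupby enumerate the unique combinations rather than writing three nested index loops:

from scipy.interpolate import UnivariateSpline

new_list = []
# groupby visits each unique (list1, list2, list3) combination exactly once
for (v1, v2, v3), group in df.groupby(['list1', 'list2', 'list3']):
    sub = group.sort_values('x_col')  # UnivariateSpline requires increasing x
    spline = UnivariateSpline(sub['x_col'], sub['y_col'])
    new_list.append(spline)

Each iteration gives you the combination's key values (v1, v2, v3) plus the matching subset of rows, which replaces the df.loc subsetting by hand.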

Is there a better way to implement arrays in python

So here is my approach:
def transpose(m):
    output = [["null" for i in range(len(m))] for j in range(len(m[0]))]
    for i in range(len(m[0])):
        for j in range(len(m)):
            if i == j:
                output[i][j] = m[i][j]
            else:
                output[i][j] = m[j][i]
    return output
The above method creates an array/list as a placeholder so that new values can be added. I tried this approach because I am new to Python and was previously learning Java, which has built-in arrays; Python doesn't, and I found there was no easy way of indexing 2D lists the way we do in Java unless I predefined the list first (like in Java, but using these for loops). I know there are packages that implement arrays, but I am fairly new to the language, so I tried simulating them the way I was familiar with.
So my main question is: is there a better way to predefine lists of a restricted size (like arrays in Java) without these funky for loops? Or, even better, a way to have a predefined list that I can then easily index without needing to append lists inside lists and all that? It's really difficult for me because it doesn't behave the way I want.
Also I made a helper method for prebuilding lists like this:
def arraybuilder(r, c, jagged=None):  # builds an empty placeholder 2D array/list of the required size
    # default is None rather than a mutable [] default argument
    output = []
    if not jagged:
        output = [["null" for i in range(c)] for j in range(r)]
        return output
    else:
        noOfColumns = []
        for i in range(len(jagged)):
            noOfColumns.append(len(jagged[i]))
        for i in range(len(jagged)):
            row = []
            for j in range(noOfColumns[i]):
                row.append("null")
            output.append(row)
        return output, noOfColumns  # returns noOfColumns as well for iteration purposes
The typical transposition pattern for 2d iterables is zip(*...):
def transpose(m):
    return [*map(list, zip(*m))]
    # same as:
    # return [list(col) for col in zip(*m)]
zip(*m) unpacks the nested lists and zips (interleaves) them into column tuples. Since zip returns a lazy iterator over tuples, we consume it into a list while converting all the tuples into lists as well.
And if you want to be more explicit, a nested comprehension does the same thing:
def transpose(m):
    return [[row[c] for row in m] for c in range(len(m[0]))]
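For example, both versions turn rows into columns:

>>> transpose([[1, 2, 3], [4, 5, 6]])
[[1, 4], [2, 5], [3, 6]]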

Efficient data structure for storing N lists where N is very large

I will need to store N lists, where N is large (1 million). For example,
[2,3]
[4,5,6]
...
[4,5,6,7]
Each item is a list of about 0-10000 elements. I wanted to use a numpy array of lists, like
np.array([[2,3],[4,5,6]])
Then I ran into efficiency issues when trying to append to the lists inside the numpy array. I was also told here, in Efficiently append an element to each of the lists in a large numpy array, not to use a numpy array of lists.
What would be a good data structure for storing such data, in terms of memory and time efficiency?
Maybe use a dictionary:
d = {}
for i in range(N):
    d[i] = your_nth_list
and you can simply append to the k-th list with:
d[k].append(additional_items)
(It's efficient for 10,000,000 lists of 1,000 items each.)
Unless the elements you're storing follow some pattern, you must use a nested list, since there is no other way to get those elements out of the others.
In Python:
listOfLists = [[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]]
Then, whenever you want to operate on this list, you can use numpy functions (this works here because the example is rectangular; a ragged list of lists won't convert cleanly to a numpy array):
>>> np.mean(listOfLists)
5.0
>>> np.max(listOfLists)
9
Try a nested list:
nestedList = [[2,3],[4,5,6]]
You could use nested lists, but a dictionary keyed by index is another option:
d = {}
for i in range(number_of_lists):
    d[str(i)] = your_i_th_list
Then access the i-th list with d[str(i)], and append an element with d[str(i)].append(item).
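If the lists are keyed by consecutive integers anyway, a plain list of lists gives the same amortized O(1) appends with less overhead. A minimal sketch (N and the appended values are placeholders from the question):

N = 1000000
lists = [[] for _ in range(N)]  # N independent empty lists (not [[]] * N, which aliases one list)
lists[42].append(7)             # amortized O(1) append to the k-th list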

Put ordered data back into a dictionary

I have a (normal, unordered) dictionary that is holding my data and I extract some of the data into a numpy array to do some linear algebra. Once that's done I want to put the resulting ordered numpy vector data back into the dictionary with all of data. What's the best, most Pythonic, way to do this?
Joe Kington suggests in his answer to "Writing to numpy array from dictionary" that two solutions include:
Using Ordered Dictionaries
Storing the sorting order in another data structure, such as a dictionary
Here are some (possibly useful) details:
My data is in nested dictionaries. The outer one is for groups: {groupKey: groupDict}, and group keys start at 0 and count up in order to the total number of groups. groupDict contains information about items: {itemKey: itemDict}. itemDict has keys for the actual data, and these keys typically start at 0 but can skip numbers, as not all "item locations" are populated. itemDict keys include things like 'name', 'description', 'x', 'y', ...
Getting to the data is easy, dictionaries are great:
data[groupKey][itemKey]['x'] = 0.12
Then I put data such as x and y into a numpy vectors and arrays, something like this:
xVector = numpy.empty(xLength)
vectorIndex = 0
for groupKey, groupDict in dataDict.items():
    for itemKey, itemDict in groupDict.items():
        xVector[vectorIndex] = itemDict['x']
        vectorIndex += 1
Then I go off and do my linear algebra and calculate a z vector that I want to add back into dataDict. The issue is that dataDict is unordered, so I don't have any way of getting the proper index.
The Ordered Dict method would allow me to know the order and then index through the dataDict structure and put the data back in.
Alternatively, I could create another dictionary while inside the inner for loop above that stores the relationship between vectorIndex, groupKey and itemKey:
sortingDict[vectorIndex] = {'groupKey': groupKey, 'itemKey': itemKey}
Later, when it's time to put the data back, I could just loop through the vectors and add the data:
vectorIndex = 0
for z in numpy.nditer(zVector):
    dataDict[sortingDict[vectorIndex]['groupKey']][sortingDict[vectorIndex]['itemKey']]['z'] = z
    vectorIndex += 1
Both methods seem equally straightforward to me. I'm not sure if changing dataDict to an ordered dictionary would have any other effects elsewhere in my code, but probably not. Adding the sorting dictionary also seems pretty easy, as it would get created at the same time as the numpy arrays and vectors. Left on my own, I think I would go with the sortingDict method.
Is one of these methods better than the others? Is there a better way I'm not thinking of? My data structure works well for me, but if there's a way to change that to improve everything else I'm open to it.
I ended up going with option #2 and it works quite well.
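For reference, a minimal sketch of that option #2 round trip (assuming dataDict as described above and a 1-D zVector aligned with xVector; the linear algebra step is elided):

import numpy

# build the vector and the index map in one pass
sortingDict = {}
xValues = []
for groupKey, groupDict in dataDict.items():
    for itemKey, itemDict in groupDict.items():
        sortingDict[len(xValues)] = {'groupKey': groupKey, 'itemKey': itemKey}
        xValues.append(itemDict['x'])
xVector = numpy.array(xValues)

# ... linear algebra producing zVector ...

# write the results back using the saved (groupKey, itemKey) pairs
for vectorIndex, z in enumerate(zVector):
    keys = sortingDict[vectorIndex]
    dataDict[keys['groupKey']][keys['itemKey']]['z'] = float(z)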

Python: Finding corresponding indices for an intersection of two lists

This is somewhat related to a question I asked earlier today. I am taking the intersection of two lists as follows:
inter = set(NNSRCfile['datetimenew']).intersection(catdate)
The two components that I am taking the intersection of belong to two lengthy lists. Is it possible to get the indices of the intersected values? (The indices of the original lists that is).
I'm not quite sure where to start with this one.
Any help is greatly appreciated!
I would create a dictionary to hold the original indices:
ind_dict = dict((k,i) for i,k in enumerate(NNSRCfile['datetimenew']))
Now, build your sets as before:
inter = set(ind_dict).intersection(catdate)
Now, to get a list of indices:
indices = [ ind_dict[x] for x in inter ]
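If you also need the indices into catdate (so you have positions in both original lists), the same trick works in the other direction; catdate_ind here is just a second lookup table mirroring ind_dict:

catdate_ind = dict((k, i) for i, k in enumerate(catdate))
catdate_indices = [catdate_ind[x] for x in inter]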
