I want to analyze some data (one x-value, several y-values). Unfortunately, not every x-value has all y-values filled in; some values are missing. I want to put all values into lists, so that I have an x-value list ([1, 2, 3, 4]) and a y-value list ([[1, 2], [1, 4], [5, 2]]). But any element I add to a list has to be a number (since my lists are float lists). Later I want to use these lists to plot the data. So my problem is that I have to add a placeholder value to the list while parsing the data, but later I have to omit these values again for plotting, otherwise I get wrong results. My first idea was to simply add an empty space to the list so that the plotting program skips this value, but that is not allowed in Python.
What is the best way to circumvent my problem?
Create a DataPoint class that can hold x and y values, and put the instances in a single list. Then you can set the y value to None (or an empty list) for the points that have missing values.
This also ensures you have a valid set of points at all times. You could instead put an empty list wherever there is no y value, but with two separate lists you still run the risk of the x and y lists getting out of sync.
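A minimal sketch of that idea (the class name DataPoint and the use of None for a missing value are just illustrative choices):
class DataPoint:
    # One x-value together with its y-values; y is None when missing.
    def __init__(self, x, y=None):
        self.x = x
        self.y = y  # e.g. a list of floats, or None for a missing point

points = [DataPoint(1, [1.0, 2.0]), DataPoint(2, None), DataPoint(3, [5.0, 2.0])]

# For plotting, keep only the points that actually have y-values:
xs = [p.x for p in points if p.y is not None]
ys = [p.y for p in points if p.y is not None]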
I have a numpy array total_weights which is an IxI array of floats. Each row/column corresponds to one of I items.
During my main loop I acquire another real float array weights of size NxM (N, M < I) where each row/column also corresponds to one of the original I items (duplicates may also exist).
I want to add this array to total_weights. However, the sizes and order of the two arrays are not aligned. Therefore, I maintain a position map, a pandas Series called pos_df, that maps item IDs to their proper index/position in total_weights.
In order to properly make the addition I want I perform the following operation inside the loop:
candidate_pos = pos_df.loc[candidate_IDs] # don't worry about how I get these
rated_pos = pos_df.loc[rated_IDs] # ^^
total_weights[candidate_pos, :][:, rated_pos] += weights
Unfortunately, the above operation must be editing a copy of the original total_weights matrix rather than a view of it, since after the loop the total_weights array is still full of zeroes. How do I make it change the original data?
Edit:
I want to clarify that candidate_IDs are the N IDs of items and rated_IDs are the M IDs of items in the NxM array called weights. Through pos_df I can get their total order in all of I items.
Also, my guess as to the reason a copy is returned is that candidate_IDs and thus candidate_pos will probably contain duplicates e.g. [0, 1, 3, 1, ...]. So the same rows will sometimes have to be pulled into the new array/view.
Your first problem is in how you are using indexing. As candidate_pos is an array, total_weights[candidate_pos, :] is a fancy indexing operation that returns a new array. When you apply indexing again, i.e. ...[:, rated_pos], you are assigning elements to the newly created array rather than to total_weights.
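A minimal demonstration of the difference (the array shape and index values here are made up for illustration):
import numpy as np

a = np.zeros((4, 4))
rows = np.array([0, 2])
cols = np.array([1, 3])

# Chained fancy indexing: the first index creates a copy, so the
# += lands on a temporary array and a itself is never touched.
a[rows, :][:, cols] += 1
print(a.sum())  # 0.0

# np.ix_ builds an open mesh from the two index arrays, so this is a
# single indexing operation and the addition writes back into a.
a[np.ix_(rows, cols)] += 1
print(a.sum())  # 4.0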
The second problem, as you have already spotted, is in the actual logic you are trying to apply. If I understand your example correctly, you have an I x I matrix of weights, and you want to update the weights for a sequence of pairs ((Ix_1, Iy_1), ..., (Ix_N, Iy_N)), with repetitions, in a single line of code. This can't be done that way with the += operator: for a repeated pair (Ix_n, Iy_n), you would find yourself having added to total_weights[Ix_n, Iy_n] only the weight corresponding to the last time (Ix_n, Iy_n) appears in your sequence. You have to first merge all the repeating elements in your sequence of weight updates and then update your weights matrix with the new "unique" sequence of updates. Alternatively, you can collect your weights as an I x I matrix and directly sum it onto total_weights.
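As a side note, NumPy also has a built-in for exactly this duplicate-accumulation problem: np.add.at performs unbuffered in-place addition, so repeated indices each contribute their own update. A sketch of how it might apply here, reusing the names from the question (whether the shapes line up exactly is an assumption on my part):
import numpy as np

cand = np.asarray(candidate_pos)   # N row positions, duplicates allowed
rated = np.asarray(rated_pos)      # M column positions
# np.ix_ turns the two 1-D position arrays into an open mesh, and
# np.add.at accumulates every (row, column) pair, duplicates included.
np.add.at(total_weights, np.ix_(cand, rated), weights)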
After #rveronese pointed out that it's impossible to do it in one go because of the duplicates in candidate_pos, I believe I have managed to do what I want with a for-loop over them:
candidate_pos = pos_df.loc[candidate_IDs] # don't worry about how I get these
rated_pos = pos_df.loc[rated_IDs] # ^^
for i, c in enumerate(candidate_pos):
    total_weights[c, rated_pos] += weights[i, :]
In this case, the indexing does not create a copy and the assignment should be working as expected...
loss=[[ -137.70171527 -81408.95809899 -94508.84395371 -311.81198933 -294.08711874]]
When I print loss, it prints the sum of the numbers rather than the individual numbers. I want to change this to a list so that I can iterate over each individual number, and I don't know how. Please help.
I have tried:
result = map(tuple,loss)
However, it still prints the sum of the inner values. When I try to index it, it says there is only 1 element. It works if I put commas in between, but this matrix is output by other code, so I can't change or add to it.
You have a list of a list with numbers: the outer list contains exactly one element (namely the inner list). The inner list is your list of numbers over which you want to iterate. Hence, to iterate over the inner list, you first have to access it from the outer list using an index, for example like this:
for list_item in loss[0]:
    do_something_for_each_element(list_item)
Moreover, I think you wanted to have separate elements in the inner list rather than computing one single number, didn't you? If that is the case, you have to separate the elements using commas.
E.g.:
loss=[[-137.70171527, -81408.95809899, -94508.84395371, -311.81198933, -294.08711874]]
EDIT:
As you clarified in the comments, you want to iterate over a numpy matrix. One way to do so is by converting the matrix into an n-dimensional array (ndarray) and then iterating over that structure. This could, for example, look like this; other options are also presented in this answer (Iterate over a numpy Matrix rows):
import numpy as np
test_matrix=np.matrix([[1, 2], [3, 4]])
for row in test_matrix.A:
    print(row)
Note that the A attribute of a matrix object is its ndarray representation (https://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html).
I'm not sure if 'hierarchical' is the correct way to label this problem, but I have a series of lists of integers that I intend to keep in a 2D numpy array, and I need to keep them sorted in the following way:
array[0,:] = [1, 1, 1, 1, 2, 2, 2, 2, ...]
array[1,:] = [1, 1, 2, 2, 1, 1, 2, 2, ...]
array[2,:] = [1, 2, 1, 2, 1, 2, 1, 2, ...]
...
...
array[n,:] = [...]
So the first list is sorted; then the second list is broken into subsections whose elements all share the same value in the first list, and those subsections are sorted; and so on down all the lists.
Initially each list will contain only one integer, and I'll then receive new columns that I need to insert into the array in such a way that it remains sorted as discussed above.
The purpose of keeping the lists in this order is that if I'm given a new column of integers I need to check whether an exact copy of that column exists in the array or not as efficiently as possible, and I assume this ordering will help me do it. It may be that there is a better way to make that check than keeping the lists like this - if you have thoughts about that please mention them!
I assume the correct position for a new column can be found by a series of binary searches but my attempts have been messy - any thoughts on doing this in a tidy and efficient way?
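For concreteness, here is a sketch of what I mean by the binary-search idea, assuming each column is stored as a tuple so that the ordering above is plain lexicographic order (the names are just illustrative):
import bisect

columns = []  # kept sorted lexicographically, one tuple per column

def insert_if_new(col):
    # Find the insertion point with one binary search; tuple comparison
    # already implements the hierarchical ordering described above.
    i = bisect.bisect_left(columns, col)
    if i < len(columns) and columns[i] == col:
        return False  # an exact copy of this column already exists
    columns.insert(i, col)
    return True

insert_if_new((1, 2, 1))
insert_if_new((1, 1, 2))
print(insert_if_new((1, 2, 1)))  # False, duplicate
print(columns)                   # [(1, 1, 2), (1, 2, 1)]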
thanks!
If I understand your problem correctly, you have a bunch of sequences of numbers that you need to process, but you need to be able to tell if the latest one is a duplicate of one of the sequences you've processed before. Currently you're trying to insert the new sequences as columns in a numpy array, but that's awkward since numpy is really best with fixed-sized arrays (concatenating or inserting things is always going to be slow).
A much better data structure for your needs is a set. Membership tests and the addition of new items on a set are both very fast (amortized O(1) time complexity). The only limitation is that a set's items must be hashable (which is true for tuples, but not for lists or numpy arrays).
Here's the outline of some code you might be able to use:
seen = set()
for seq in sequences:
    tup = tuple(seq)  # you only need to make a tuple if seq is not already hashable
    if tup not in seen:
        seen.add(tup)
        # do whatever you want with seq here, it has not been seen before
    else:
        pass  # if you want to do something with duplicated sequences, do it here
You can also look at the unique_everseen recipe in the itertools documentation, which does basically the same as the above, but as a well-optimized generator function.
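For reference, that recipe looks roughly like this (adapted from the itertools documentation; check the docs for the current version). Its key argument is handy here, because key=tuple lets you pass lists or arrays directly:
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    # Yield unique elements, preserving order; remembers all elements ever seen.
    seen = set()
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen.add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen.add(k)
                yield element

# e.g. deduplicating lists of ints by hashing them as tuples:
for seq in unique_everseen([[1, 2], [3, 4], [1, 2]], key=tuple):
    print(seq)  # [1, 2] then [3, 4]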
I have the following task:
Two lists. The 1st list holds items, the 2nd list holds metadata (2 floats) for every item.
These lists get changed by various steps of an algorithm, but their lengths are kept equal, i.e. they grow, but they grow together. This way the index identifies which item the metadata refers to.
At one (repeated) step of the algorithm I shorten the item list by identifying duplicate items. Correspondingly, I have to adjust the metadata list.
I could implement this using plain Python lists, but at some point it overloads the memory. So I tried using np.array, but the issue there is that the dimensions have to be equal for every element, i.e. arr = np.array([1, 2, [3, [4]]], dtype=object) returns arr.ndim = 1, whereas what I need is for it to return arr.ndim = 3. I played around with it and discovered that the [3, [4]] element stays of type list and has nothing to do with np.array. Only when every element has equal dimensions does np.array return np types along every axis, say np.int32 or np.array.
The critical 3rd step: when I go through the list and collect metadata for identical items, I put the entries into meta_list under the same index, i.e. I create (or expand) a list of lists at that index. Example:
meta_list=[[1,2],[3,4],[5,6],[7,8]]
Then, say, the 1st and 3rd elements of item_list (counting from 0) are the same, so I have to combine their metadata. That yields this:
meta_list=[[1,2],[[3,4],[7,8]],[5,6]]
But I cannot wrap my head around how to implement this step using np.array while profiting from its storage efficiency, since that [[3, 4], [7, 8]] element will be of type list.
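For concreteness, the combining step can be expressed over plain lists like this (a sketch; the item values and the variable item_list are made up for illustration):
from collections import defaultdict

item_list = ['a', 'b', 'c', 'b']             # hypothetical items; 1st == 3rd (0-based)
meta_list = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Group the metadata by item, preserving first-seen order of the items.
grouped = defaultdict(list)
for item, meta in zip(item_list, meta_list):
    grouped[item].append(meta)

new_items = list(grouped)  # ['a', 'b', 'c']
new_meta = [ms[0] if len(ms) == 1 else ms for ms in grouped.values()]
print(new_meta)  # [[1, 2], [[3, 4], [7, 8]], [5, 6]]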
Would be very grateful for hints.
I am trying to implement some sort of local search. For this I have two arrays representing the list that I am trying to optimize.
So one array holds the current best order and the other holds the order I am currently analyzing.
At the beginning I shuffle the array and set the current best array to the shuffled array.
random.shuffle(self.matchOrder)
self.bestMatchOrder = self.matchOrder
Then I swap a random neighbouring pair in the array.
Now the problem I have is that when I swap the values in self.matchOrder, the values in self.bestMatchOrder get swapped too.
a = self.matchOrder[index]
self.matchOrder[index] = self.matchOrder[index + 1]
self.matchOrder[index + 1] = a
"index" is given to the function as a parameter, it is just randomly generated number.
I guess I did something wrong assigning the variables, but I can't figure out what. So what can I do to only assign the value of the array to the other array and not make it apply the same changes to it, too?
When you use self.bestMatchOrder = self.matchOrder, Python doesn't allocate a new memory location for self.bestMatchOrder; instead, both names point to the same memory location. Since lists are a mutable data type, any change made to self.matchOrder is reflected in self.bestMatchOrder. Hence, make a copy instead:
import copy
self.bestMatchOrder = copy.deepcopy(self.matchOrder)
However, if you are using flat (non-nested) lists, you can also use self.bestMatchOrder = self.matchOrder[:]. But if you are using nested lists, then deepcopy() is the correct choice.
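A small illustration of why the slice copy is not enough for nested lists (made-up data):
import copy

a = [[1, 2], [3, 4]]
b = a[:]               # shallow copy: the inner lists are still shared
b[0][0] = 99
print(a[0][0])         # 99, the change leaked into the original

c = copy.deepcopy(a)   # deep copy: the inner lists are duplicated too
c[0][0] = 0
print(a[0][0])         # still 99, the original is unaffected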
If you want to copy a list, you can use the slice operation:
list_a = [1, 2, 3]
list_b = list_a[:]
list_a[0] = 42
print(list_a)  # prints [42, 2, 3]
print(list_b)  # prints [1, 2, 3], i.e. the copied values are unaffected
But if you have more complex structures, you should use deepcopy, as #anmol_uppal advised above.