How to expand or "scale up" a 1D array? - python

I have a piece of C code that can only handle an array of size 20. The array that my instrument outputs is much smaller than what the function requires. Is there a numpy or math function that can "scale up" an array to any specific size while maintaining its structural integrity? For example:
I have an 8-element array that is basically a two-ramp "sawtooth", meaning its values are:
[1 2 3 4 1 2 3 4]
What I need for the C code is a 20-element array. So I can scale it by padding the original array with "0"s at regular intervals, like:
[1,0,0,2,0,0,3,0,0,4,0,0,1,0,0,2,0,0,3,0]
so it adds up to 20 elements. I would think this process is the opposite of "decimation". (I apologize, I'm simplifying this process so it will be a bit more understandable.)

Based on your example, I guess the following approach could be tweaked to do what you want:
upsample with 0s: upsampled_l = [[i, 0, 0] for i in l], with l being your initial list
flatten the array: flat_l = flatten(upsampled_l), using a method from How to make a flat list out of a list of lists? for instance
trim to the expected length: final_l = flat_l[:20]
For instance, the following code gives the output you gave in your example:
l = [1, 2, 3, 4, 1, 2, 3, 4]
upsampled_l = [[i, 0, 0] for i in l]
flat_l = [item for sublist in upsampled_l for item in sublist]
final_l = flat_l[:20]
However, the final element of the initial list (the second 4) is missing from the final list. Perhaps it's worth upsampling with only one 0 in between ([i, 0] instead of [i, 0, 0]) and finally doing final_l.extend([0 for _ in range(20 - len(final_l))]).
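A sketch of that variant, for completeness (same l as above; it keeps the trailing 4 and pads the tail with zeros to reach 20):
l = [1, 2, 3, 4, 1, 2, 3, 4]
upsampled_l = [[i, 0] for i in l]  # one zero between samples
flat_l = [item for sublist in upsampled_l for item in sublist]
final_l = flat_l[:20]
final_l.extend([0 for _ in range(20 - len(final_l))])  # pad to 20
# final_l -> [1, 0, 2, 0, 3, 0, 4, 0, 1, 0, 2, 0, 3, 0, 4, 0, 0, 0, 0, 0]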
Hope this helps!

You can manage it in a one-liner by adding zeros along another axis, then flattening (and trimming to 20 to match your example):
import numpy as np

sm = np.array([1, 2, 3, 4, 1, 2, 3, 4])
# two zeros after each sample gives 24 values; trim to the target 20
np.concatenate([np.reshape(sm, (8, 1)), np.zeros((8, 2))], axis=1).flatten()[:20]
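On the sample input that returns the 20-element target (as floats, since np.zeros defaults to float):
array([1., 0., 0., 2., 0., 0., 3., 0., 0., 4., 0., 0., 1., 0., 0., 2., 0.,
       0., 3., 0.])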

Related

Regarding direct multiplication of a scalar with a list

The snippet was supposed to store/print instantaneous voltage/current values generated by numpy's sin and arange functions. I first had tolist() after those functions, but multiplying by the magnitude (230 in the case of voltage, 5 in the case of current) had no effect on the result unless I removed the tolist(). Why does that occur?
V_magnitude = 230
I_magnitude = 5
voltage = V_magnitude*np.sin(np.arange(0,10,0.01)).tolist()
current = I_magnitude*np.sin(np.arange(-0.3,9.7,0.01))
What I've tried:
-> making both the magnitudes the second operand of the multiplication
-> with and without tolist()
When you multiply a list by X, you get a new list containing the original values repeated X times.
When you multiply a numpy array by X, you multiply the values inside the array by X.
Try it with a simple example:
import numpy as np

lst = [1, 2, 3]
print(lst * 3)            # [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(np.array(lst) * 3)  # [3 6 9]
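Given that, a likely fix for the snippet in the question is to multiply while the data is still an array and only convert at the end (a sketch; note that .tolist() binds tighter than *, so the original voltage line was multiplying a plain list, repeating it 230 times):
import numpy as np

V_magnitude = 230
# scale the ndarray first, then convert to a list
voltage = (V_magnitude * np.sin(np.arange(0, 10, 0.01))).tolist()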

How to calculate two different numpy arrays' values, then put the result in a third array

I have two numpy arrays that I need to combine to compute a third array.
To start, here are the first two arrays:
[[2 0 1 3 0 1]
[1 2 1 2 1 2] # ARRAY 1
[2 1 2 1 0 1]
[0 2 0 2 2 3]
[0 3 3 3 1 4]
[2 3 2 3 1 3]]
[[0.60961197 0.29067687 0.20701799 0.79897639 0.74822711 0.21928105]
[0.67683562 0.14261662 0.74655501 0.21529103 0.14347939 0.42190162]
[0.21116134 0.98618323 0.93882545 0.51422862 0.12715579 0.18808092] # ARRAY 2
[0.48570863 0.32068082 0.32335023 0.62634641 0.37418013 0.44860968]
[0.12498966 0.56458377 0.24902924 0.12992352 0.76903935 0.68230202]
[0.90349626 0.75727838 0.14188677 0.63082553 0.96360265 0.28694261]]
Here, array1[0][0] will be subtracted from the input value array3[0][0], and then array2[0][0] will multiply the result to give the new value array3[1][0] (in other words, these calculations WILL produce array3).
So for example, lets say the starting values of array3[0] are:
[[20,22,24,40,42,10],
....
For array3[0][0] (20), it needs to subtract 2 (coming from array1[0][0]), leaving 18. The 18 is then multiplied by 0.60961197 (array2[0][0]), giving a NEW VALUE of 10.97. 10.97 is now the new value of array3[1][0].
If you were to move on to the next column, the process would be the same. You would take 22 - 0 = 22, then 22 * 0.29067687, to create the new value for array3[1][1].
To provide a visual example, the completed process for the first two rows would look something like this:
[[20 22 24 40 42 10],
[10.97 6.39 4.76 29.56 31.43 1.97],
....
I am trying to get this process continuing for the entire length of the first array (and I guess the second, because they are the same). So for the next set, you would take (10.97 - 1) * 0.6768... = 6.74..., and so on for each index until it reaches the end.
I'm quite stuck on what to do for this. I tried a for loop, but I feel there may be a much more efficient way of doing this in numpy.
I sincerely appreciate the help, I know this isn't easy (or maybe it will be!). This will kick start what will be a fairly lengthy project for me.
Thank you very much!
Note: If numpy arrays are not a good way to solve this problem and, let's say, lists are better, I am more than willing to go that route. I'm just assuming that with most of numpy's functions this will be easier.
If I understood correctly you could do something like this:
import numpy as np
np.random.seed(42)
arr1 = np.array([[2, 0, 1, 3, 0, 1],
                 [1, 2, 1, 2, 1, 2],
                 [2, 1, 2, 1, 0, 1],
                 [0, 2, 0, 2, 2, 3],
                 [0, 3, 3, 3, 1, 4],
                 [2, 3, 2, 3, 1, 3]])
arr2 = np.array([[0.60961197, 0.29067687, 0.20701799, 0.79897639, 0.74822711, 0.21928105],
                 [0.67683562, 0.14261662, 0.74655501, 0.21529103, 0.14347939, 0.42190162],
                 [0.21116134, 0.98618323, 0.93882545, 0.51422862, 0.12715579, 0.18808092],
                 [0.48570863, 0.32068082, 0.32335023, 0.62634641, 0.37418013, 0.44860968],
                 [0.12498966, 0.56458377, 0.24902924, 0.12992352, 0.76903935, 0.68230202],
                 [0.90349626, 0.75727838, 0.14188677, 0.63082553, 0.96360265, 0.28694261]])
arr3 = np.random.randint(5, 30, size=(6, 6))
result = (arr3 - arr1) * arr2
print(result)
Output
[[ 5.48650773 6.97624488 3.72632382 9.58771668 8.97872532 5.2627452 ]
[ 6.7683562 2.99494902 19.41043026 2.79878339 2.00871146 10.96944212]
[ 4.85671082 6.90328261 9.3882545 13.88417274 0.89009053 4.702023 ]
[12.14271575 1.28272328 9.05380644 8.76884974 2.99344104 1.34582904]
[ 3.1247415 1.12916754 3.23738012 2.98824096 11.53559025 17.0575505 ]
[17.16642894 8.33006218 2.55396186 10.09320848 17.3448477 5.7388522 ]]
If applied to the data from your example, you get:
arr3 = np.array([20, 22, 24, 40, 42, 10])
result = (arr3 - arr1[0]) * arr2[0]
print(result)
Output
[10.97301546 6.39489114 4.76141377 29.56212643 31.42553862 1.97352945]
Note that in the second example I just use the first rows of arr1 and arr2.
Just expanding my comment to a full answer. The question involves two kinds of operations:
doable-in-parallel ones (column-wise, broadcastable)
not-doable-in-parallel ones (row-wise, iterative)
numpy handles broadcasting (i.e. the column direction) nicely, so just use a for loop in the row direction:
for i in range(len(array1)):
    array3[i+1] = (array3[i] - array1[i]) * array2[i]
Do note that array3 needs one more row than array1 and array2, otherwise this doesn't make sense.
EDIT
Oops, I didn't see you want to avoid a for loop. Technically you can do this problem without one, but you need to work out the linear algebra yourself:
If we name array1 a, array2 b, and the first row of array3 c for convenience, the rows of array3 would be:
c
(c - a0)*b0 = c*b0 - a0*b0
((c - a0)*b0 - a1)*b1 = c*b0*b1 - a0*b0*b1 - a1*b1
...
The final row of array3 can then be computed as:
B = b[::-1].cumprod(0)[::-1]         # B[i] = b[i] * b[i+1] * ... * b[-1]
final_c = c * B[0] - (B * a).sum(0)
If you want the whole array3, that's not really trivial to do without a for loop. You might be able to write it, but it would be painful both to read and to write, and the performance is questionable.
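A quick sanity check of that closed form against the plain loop (a sketch with random data; the shapes are my assumption):
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 4))   # plays the role of array1
b = rng.random((5, 4))   # plays the role of array2
c = rng.random(4)        # first row of array3

# iterative reference
row = c
for i in range(len(a)):
    row = (row - a[i]) * b[i]

# closed form for the final row
B = b[::-1].cumprod(0)[::-1]      # B[i] = b[i] * b[i+1] * ... * b[-1]
final_c = c * B[0] - (B * a).sum(0)

print(np.allclose(row, final_c))  # True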

Create array containing list of lists of n repeated items in Python

I'm trying to find a faster way to create such a list:
import numpy as np
values = [0,1,2]
repeat = [3,4,2]
list = np.empty(0, dtype=int)
for i in range(len(values)):
    list = np.append(list, np.full(repeat[i], values[i]))
print(list)
returns
[0 0 0 1 1 1 1 2 2]
Any idea? Thanks
You can save a lot of time using native Python lists instead of numpy arrays. When I ran your code using the timeit module, it took 16.87 seconds. The following code took 0.87 seconds.
list = []
for val, rep in zip(values, repeat):
    list.extend([val] * rep)
If you then convert list to a numpy array using list = np.array(list), that time goes up to 2.09 seconds.
Of course, because numpy is optimized for large amounts of data, this may not hold for very long lists of values with large numbers of repeats. In this case, one alternative would be to do your memory allocation all at once, instead of continually lengthening the array (which I believe covertly causes a copy to be made, which is slow). The example below completes in 4.44 seconds.
list = np.empty(sum(repeat), dtype=int)  # allocate the full length
i = 0                                    # start the index at 0
for val, rep in zip(values, repeat):
    list[i:i+rep] = [val] * rep          # replace the slice
    i += rep                             # update the index
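For what it's worth, numpy also ships a built-in that covers this exact pattern and may be worth timing against the versions above: np.repeat accepts a per-element repeat count.
import numpy as np

values = [0, 1, 2]
repeat = [3, 4, 2]
print(np.repeat(values, repeat))  # [0 0 0 1 1 1 1 2 2]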
You can try this. Multiply a one-element list of each value by its repeat count, for each pair of value and count.
You will get a list of lists:
L = [[i]*j for i, j in zip(values, repeat)]
print(L)
returns
[[0, 0, 0], [1, 1, 1, 1], [2, 2]]
Then make a flat list:
flat_L = [item for sublist in L for item in sublist]
print(flat_L)
[0, 0, 0, 1, 1, 1, 1, 2, 2]
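If you prefer a standard-library helper for the flattening step, itertools.chain does the same thing:
from itertools import chain

flat_L = list(chain.from_iterable(L))
print(flat_L)  # [0, 0, 0, 1, 1, 1, 1, 2, 2]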
I would do it like this:
a=[1,2,3]
b=[2,4,3]
x=[[y]*cnt_b for cnt_b,y in zip(b,a)]
Output:
[[1,1],[2,2,2,2],[3,3,3]]
In [8]: [i for i, j in zip(values, repeat) for _ in range(j)]
Out[8]: [0, 0, 0, 1, 1, 1, 1, 2, 2]
Here, we are zipping values and repeat together with zip to get a one-to-one correspondence between them (like [(0, 3), (1, 4), (2, 2)]). Then, in the list comprehension, I insert i (the value) and loop over range(j) to repeat it j times.

Splitting an array in Python

How do you split an array in Python in terms of the number of elements in the array? I'm doing kNN classification and I need to take into account the first k elements of the 2D array.
import numpy as np
x = np.array([1, 2, 4, 4, 6, 7])
print(x[range(0, 4)])
You can also split it up by taking the range of elements that you want to work with. You could store x[range(i, j)] in a variable and work with those particular elements of the array as well. The output, as you can see, splits the array up:
[1 2 4 4]
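For this particular case, plain slicing is an equivalent and more idiomatic spelling (a quick check):
import numpy as np

x = np.array([1, 2, 4, 4, 6, 7])
print(x[:4])  # [1 2 4 4]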
In numpy, there is a function numpy.split:
x = np.arange(9.0)
np.split(x, 3)
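For reference, running that returns three equal chunks:
[array([0., 1., 2.]), array([3., 4., 5.]), array([6., 7., 8.])]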

Is there any easy way to sparsely store a matrix with a redundant pattern in Python?

The type of matrix I am dealing with was created from a vector as shown below:
Start with a 1-d vector V of length L.
To create a matrix A from V with N rows, make the i'th column of A the first N entries of V, starting from the i'th entry of V, so long as there are enough entries left in V to fill up the column. This means A has L - N + 1 columns.
Here is an example:
V = [0, 1, 2, 3, 4, 5]
N = 3
A =
[0 1 2 3
1 2 3 4
2 3 4 5]
Representing the matrix this way requires more memory than my machine has. Is there any reasonable way of storing this matrix sparsely? I am currently storing N * (L - N + 1) values, when I only need to store L values.
You can take a view of your original vector as follows:
>>> import numpy as np
>>> from numpy.lib.stride_tricks import as_strided
>>>
>>> v = np.array([0, 1, 2, 3, 4, 5])
>>> n = 3
>>>
>>> a = as_strided(v, shape=(n, len(v)-n+1), strides=v.strides*2)
>>> a
array([[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 3, 4, 5]])
This is a view, not a copy of your original data, e.g.
>>> v[3] = 0
>>> v
array([0, 1, 2, 0, 4, 5])
>>> a
array([[0, 1, 2, 0],
[1, 2, 0, 4],
[2, 0, 4, 5]])
But you have to be careful not to do any operation on a that triggers a copy, since that would send your memory use through the ceiling.
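If your numpy is recent enough (1.20+), there is also a named helper that builds the same kind of read-only view without touching strides by hand; the transpose matches the (n, L - n + 1) layout above:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

v = np.array([0, 1, 2, 3, 4, 5])
n = 3
# windows come back as rows, so transpose to get the column layout above
a = sliding_window_view(v, n).T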
If you're already using numpy, use its strided or sparse arrays, as Jaime explained.
If you're not already using numpy, you may want to strongly consider using it.
If you need to stick with pure Python, there are three obvious ways to do this, depending on your use case.
For strided or sparse-but-clustered arrays, you could do effectively the same thing as numpy.
Or you could use a simple run-length-encoding scheme, plus maybe a higher-level list of runs for, or list of pointers to, every Nth element, or even a whole stack of such lists (one for every 100 elements, one for every 10000, etc.).
But for mostly-uniformly-dense arrays, the easiest thing is to simply store a dict or defaultdict mapping indices to values. Random-access lookups and updates are still O(1), albeit with a higher constant factor, and the storage you waste keeping (in effect) a hash, key, and value instead of just a value for each non-default element is more than made up for by not storing values for the default elements, as long as the density stays below about one third.
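A minimal sketch of that dict-based idea (using a plain dict with .get, since reading a missing key on a defaultdict would quietly insert it; the names are illustrative):
# only non-default entries occupy memory; anything else reads as 0
A = {}
A[(0, 3)] = 3            # set A[0][3]
A[(2, 1)] = 7            # set A[2][1]

print(A.get((0, 3), 0))  # 3
print(A.get((5, 5), 0))  # 0 -- the default, nothing stored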
