I have a 1x10 dataframe whose column names are the strings in a list.
I have a 1x10 row vector with values.
I would like to put this vector into the dataframe, so that the names in the list are the column names and the values of the vector form a single row.
How could I do that? The only way I found was appending everything into one column and renaming the index with the names in my list, but I want a 1x10 dataframe instead of a 10x1.
names = ['agmi', 'aglo', 'mwmi', 'mcha', 'mcmi', 'mclo', 'bkdr', 'mchi', 'melo', 'mwlo']
alt_GPS = np.array([[ 0. , 0. , 0.85253906, 6.12797546, 5.49960327,
11.00892639, 0. , 2.08251953, 0. , 2.4508667 ]])
alt = pd.DataFrame(columns=names)
Your alt_GPS array is already 2D, so isn't this simply what you want:
alt = pd.DataFrame(alt_GPS, columns=names)
Output:
agmi aglo mwmi mcha mcmi mclo bkdr mchi melo mwlo
0 0.0 0.0 0.852539 6.127975 5.499603 11.008926 0.0 2.08252 0.0 2.450867
If you want to create the empty dataframe first and add the row afterwards:
alt = pd.DataFrame(columns=names)
alt.loc[0] = alt_GPS[0]
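If the values ever arrive as a flat 1-D vector instead of the 2-D array above, reshaping it to a single row works the same way (a small assumed variant of the same data):

flat = alt_GPS.ravel()                                  # pretend the data came in 1-D
alt = pd.DataFrame(flat.reshape(1, -1), columns=names)  # one row, ten named columns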
I want to write code that goes through the lists within the vals array, one list for each unique digit_vals value. A digit_vals value gives the position (the nth number) in the expected output, so since the first value in digit_vals is 24, all the positions before it are filled with zeroes and the 24th position holds a value from vals. Since there are two 24s in digit_vals, the 2nd element of the first list of vals ([-3.3, -4.3, 23.05, 23.08, 23.88, 3.72]) supplies the 24th value in the expected output, which is -4.3. The 4th element of the 2nd list within vals supplies the value for the 27th position in digit_vals, and so on. The gaps between the digit_vals positions are filled with zeroes as well, so between 24 and 27 there are 2 zeroes for the 25th and 26th places respectively. How could I code a function that produces the expected output below?
import pandas as pd
import numpy as np
digit_vals = np.array([24, 24, 27, 27, 27, 27,
28, 28, 28, 31])
vals = np.array([list([-3.3, -4.3, 23.05, 23.08, 23.88, 3.72]),
list([2.3, 2.05, 3.08, -4.88, 4.72]),
list([5.3, 2.05, 6.08, -13.88, -17.2]),
list([9.05, 6.08, 3.88, -13.72])], dtype=object)
Expected Output:
array([ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
-4.3 , 0. , 0. , -4.88,
6.08, 0. , 9.05])
First off, if I understand your question correctly, your output array should be one element longer, with one more zero between the 6.08 value and the 9.05 value, because the 9.05 should be at index position 31 (the other values match the index positions specified in digit_vals).
The hardest part of this question is transforming the information in the digit_vals array into two arrays that correctly index into each vals list and into the correct position in the output array.
Because you're already using numpy, I think this is a reasonable approach:
val_ind = []
out_ind = []
for ind, cnt in enumerate(np.bincount(digit_vals)):
    if cnt > 0:
        val_ind.append(cnt-1)
        out_ind.append(ind)
Calculate the number of occurrences of each value in digit_vals and use that count (minus one, for zero-based indexing) as the index into the corresponding list within the vals array. Each unique number in digit_vals is identified by capturing the index of every value with a nonzero count, assuming digit_vals is ordered, as in the question's example.
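For the digit_vals in the question, those intermediate lists come out as follows (a quick illustrative check, assuming the loop above has been run):

# np.bincount(digit_vals) counts how often each value from 0 to 31 occurs;
# only positions 24, 27, 28 and 31 have nonzero counts (2, 4, 3 and 1), so:
print(val_ind)   # [1, 3, 2, 0]      -> count - 1, the index into each vals list
print(out_ind)   # [24, 27, 28, 31]  -> the target positions in the output array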
Once you have the index lists built, it is straightforward to build the output array:
out_arr = np.zeros(np.max(digit_vals)+1)
for r_ind, (v_ind, o_ind) in enumerate(zip(val_ind, out_ind)):
    out_arr[o_ind] = vals[r_ind][v_ind]
Again, the enumeration provides the row index for extracting the correct row's data from the vals array. I've confirmed this reproduces the output array you provided, including the fix noted above. Hopefully I understood your question correctly and made reasonable assumptions. If not, please update your question with a little more detail describing your assumptions, etc.
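For reference, a quick check of the tail of the result (assuming the snippets above have been run) shows the corrected output, with 9.05 landing at index 31:

print(out_arr[24:])
# roughly: [-4.3   0.    0.   -4.88  6.08  0.    0.    9.05]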
Short version: I have an array and need to create a matrix with name labels along the top and side, and export it as a csv like the example below. (Sorry if my wording is off.)
Long version:
I built a recommendation system self-taught and have a website ready after a year in quarantine of learning and troubleshooting here on SO; usually a few days of searching and I figure it out, but this has had me stuck for about 3 weeks now.
The recommendation system works in Python: I can put in a name and it spits out the recommended names, and I tweaked it until the results were acceptable. But none of the books, websites, tutorials, Udemy classes etc. cover how to take the Python code and wire it into a Django site.
This is what the output currently looks like:
# instantiating and generating the count matrix
count = CountVectorizer()
count_matrix = count.fit_transform(df['bag_of_words'])
# creating a Series for the name of the character so they are associated to an ordered numerical
# list I will use later to match the indexes
indices = pd.Series(df.index)
indices[:5]
0 ZZ Top
1 Zyan Malik
2 Zooey Deschanel
3 Ziggy Marley
4 ZHU
Name: name, dtype: object
# generating the cosine similarity matrix
cosine_sim = cosine_similarity(count_matrix, count_matrix)
cosine_sim
array([[1. , 0.11708208, 0.10192614, ..., 0. , 0. ,
0. ],
[0.11708208, 1. , 0.1682581 , ..., 0. , 0. ,
0. ],
[0.10192614, 0.1682581 , 1. , ..., 0. , 0. ,
0. ],
...,
[0. , 0. , 0. , ..., 1. , 1. ,
1. ],
[0. , 0. , 0. , ..., 1. , 1. ,
1. ],
[0. , 0. , 0. , ..., 1. , 1. ,
1. ]])
# I need to then export to csv which I understand
.to_csv('artist_similarities.csv')
Desired Exports
I am trying to have the array with the index names in what I think is called a matrix, like this example:
scores ZZ Top Zyan Malik Zooey Deschanel ZHU
0 ZZ Top 0 65.61249881 24.04163056 24.06241883
1 Zyan Malik 65.61249881 0 89.35882721 69.6634768
2 Zooey Deschanel 24.04163056 89.40917179 0 20.09975124
3 ZHU 7.874007874 69.6634768 20.09975124 0
# function that takes in the character name as input and returns the top 10 recommended characters
def recommendations(name, cosine_sim = cosine_sim):
    recommended_names = []
    # getting the index of the character that matches the name
    idx = indices[indices == name].index[0]
    # creating a Series with the similarity scores in descending order
    score_series = pd.Series(cosine_sim[idx]).sort_values(ascending = False)
    # getting the indexes of the 10 most similar characters
    top_10_indexes = list(score_series.iloc[1:11].index)
    # populating the list with the names of the best 10 matching characters
    for i in top_10_indexes:
        recommended_names.append(list(df.index)[i])
    return recommended_names
# working results, which for this dataset are pretty good
recommendations('Blues Traveler')
['G-Love & The Special Sauce',
'Phish',
'Spin Doctors',
'Grace Potter and the Nocturnals',
'Jason Mraz',
'Pearl Jam',
'Dave Matthews Band',
'Lukas Nelson & Promise of the Real ',
'Vonda Shepard',
'Goo Goo Dolls']
I'm not sure I understand what you're asking, and I can't comment, so I'm forced to write here. I assume you want to add column and index labels to the cosine_sim array. You could do something like this:
cos_sim_df = pd.DataFrame(cosine_sim, index=indices, columns=indices)
cos_sim_df.to_csv("artist_similarities.csv")
And then read the csv back like this:
cos_sim_df = pd.read_csv("artist_similarities.csv", header=0, index_col=0)
This makes sure pandas knows the first row and the first column contain the field names. I also assumed your column and row indices are the same; you can change them if you need to. Another thing: this won't be exactly like the desired export, because that csv has a "scores" field containing the artist names, even though the artists already appear as field names. If you want the exported csv to look exactly like the desired export, you can add the artists in a "scores" field like this:
cos_sim_df = pd.DataFrame(cosine_sim, columns=indices)
cos_sim_df["scores"] = indices
# make the scores field the first field
cos_sim_df = cos_sim_df[["scores", *indices]]
Lastly, a note on indexing: cos_sim_df["Zyan Malik"] selects a column, and it seems you visualized the fields as column labels. For this specific case, since your array is symmetric across the diagonal, it doesn't matter which axis is indexed, because the row and the column for "Zyan Malik" hold the same values anyway; keep this in mind if your array isn't symmetric.
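To illustrate the symmetry point, here is a small sketch (the DataFrame and artist name are the ones built above, so this assumes your data):

import numpy as np

col = cos_sim_df["Zyan Malik"]        # column lookup
row = cos_sim_df.loc["Zyan Malik"]    # row lookup
print(np.allclose(col.values, row.values))  # True, because the matrix is symmetric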
Why does a[:,[x]] create a column vector from an array? What does the [ ] represent?
Could anyone explain the principle to me?
a = np.random.randn(5,6)
a = a.astype(np.float32)
print(a)
c = torch.from_numpy(a[:,[1]])
[[-1.6919796 0.3160475 0.7606999 0.16881375 1.325092 0.71536326]
[ 1.217861 0.35804042 0.0285245 0.7097111 -2.1760604 0.992101 ]
[-1.6351479 0.6607222 0.9375339 0.5308735 -1.9699149 -2.002803 ]
[-1.1895325 1.1744579 -0.5980689 -0.8906375 -0.00494479 0.51751447]
[-1.7642071 0.4681248 1.3938268 -0.7519176 0.5987852 -0.5138923 ]]
###########################################
tensor([[0.3160],
[0.3580],
[0.6607],
[1.1745],
[0.4681]])
The [ ] means you are adding an extra dimension. Try the numpy shape attribute to see the difference.
a[:,1].shape
output :
(5,)
with [ ]:
a[:,[1]].shape
output :
(5, 1)
That syntax is for array slicing in numpy, where arrays are indexed as a[rows, columns, page, ... (higher-dimensions)]
Selecting for a specific row/column/page is done by giving a specific number or range of numbers. So when you use a[1,2], numpy gets the element from row 1, column 2.
You can select several specific indices by giving the dimension multiple values. So a[[1,3],1] gets you both elements (1,1) and (3,1).
The : tells numpy to get everything from that specific array dimension. So when you use a[:,1], numpy gets every row in column 1. Alternatively, a[1,:] gets every column in row 1.
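A short, self-contained sketch of those indexing forms, using a small assumed array rather than the one in the question:

import numpy as np

a = np.arange(12).reshape(3, 4)   # [[0 1 2 3], [4 5 6 7], [8 9 10 11]]

a[1, 2]        # element at row 1, column 2 -> 6
a[[1, 2], 1]   # elements (1,1) and (2,1) -> array([5, 9])
a[:, 1]        # every row in column 1 -> array([1, 5, 9]), shape (3,)
a[:, [1]]      # same values, but kept as a column -> shape (3, 1)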
Hey guys, I want to write a function that performs a z-score transformation on a single column of a 2D array and returns an array where the specified column is transformed and the other columns remain the same. The way I went about this: first I deleted the column I want to transform using np.delete(), then performed the transformation, and finally put the transformed column back into the array with the deleted column using np.insert(). However, every element in the transformed column comes out as 0. What can I do?
I have attached an image so you can view the incorrect output as well.
x1 = np.array([[4,3,12],[1,5,20],[1,2,3],[10,20,40],[7,2,44]])
def myfunc(array, scalar):
    total_result = np.delete(array, scalar, axis=1)
    z_score = ((array - array.mean())/array.std())[:,1]
    answer = np.insert(total_result, scalar, z_score, axis=1)
    return answer
myfunc(x1, 1)
Your array is of integer type and your z-scores are floats. When you insert floats into an integer array, they get cast to integers, hence all 0. You need to convert your array to float first. Also, deleting and re-inserting is not the right way to do it: simply assign your new values to the desired column, no delete/insert needed. Here is how to do it:
def myfunc(array, scalar):
    z_score = ((array - array.mean())/array.std())[:,scalar]
    array[:,scalar] = z_score
    return array

x1 = x1.astype(np.float64, copy=False)
myfunc(x1, 1)
output:
[[ 4. -0.64344154 12. ]
[ 1. -0.49380397 20. ]
[ 1. -0.71826033 3. ]
[10. 0.62847778 40. ]
[ 7. -0.71826033 44. ]]
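To see the integer-casting behaviour described above in isolation, here is a tiny illustrative example (not the OP's data):

import numpy as np

a = np.array([[1, 2], [3, 4]])    # integer dtype
col = np.array([0.5, -0.25])      # float values, like z-scores with |value| < 1

print(np.insert(a, 1, col, axis=1))
# [[1 0 2]
#  [3 0 4]]   <- the floats were cast to int, so the new column is all 0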
I want to combine two arrays which represent a curve where the variable is column 1, however the column 0 values do not always match:
import numpy as np
arr1= np.array([(12,1003),(17,900),(20,810)])
arr2= np.array([(10,1020),(17,902),(19,870),(21,750)])
I want to combine these into one array where the column 0 is combined and both column 1s are stacked with gaps where there is no value for the corresponding column 0 value, something like this:
arr3=np.array([((10,None,1020),(12,1003,None),(17,900,902),(19,None,870),(20,810,None),(21,None,750))])
The reason for this is that I want to be able to get mean values of the second column for each array but they are not at exactly the same column 0 value so the idea of creating this array is to then interpolate to replace all the None values, then create mean values from column 1 and 2 and have an extra column to represent that.
I have used NumPy for everything else so far, but have obviously got stuck with the np.column_stack function, as it needs lists of the same length and is also blind to stacking based on the values in column 0. Lastly, I do not want to fit the data, because the actual data is non-linear and possibly not consistent, so a fit will not work and interpolation seems like the most accurate method.
There may be an answer already but due to me not knowing how to describe it well I can't find it. Also I am relatively new to python so please don't make any assumptions about my knowledge other than it is very little.
Thank you.
Will this help?
import pandas
import numpy as np
arr1= np.array([(12,1003),(17,900),(20,810)])
arr2= np.array([(10,1020),(17,902),(19,870),(21,750)])
d1 = pandas.DataFrame(arr1)
d2 = pandas.DataFrame(arr2)
d1.columns = d2.columns = ['t','v']
d3 = d1.merge(d2, on='t', how='outer')
print(d3.values)
# use d3.to_numpy() to convert to a numpy array
output
[[ 12. 1003. nan]
[ 17. 900. 902.]
[ 20. 810. nan]
[ 10. nan 1020.]
[ 19. nan 870.]
[ 21. nan 750.]]
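Following the question's stated goal (interpolating the gaps and then averaging the two curves), a possible continuation might look like this; the column names v1/v2 and the use of method='index' are my assumptions, not part of the original answer:

import pandas as pd
import numpy as np

arr1 = np.array([(12, 1003), (17, 900), (20, 810)])
arr2 = np.array([(10, 1020), (17, 902), (19, 870), (21, 750)])

d1 = pd.DataFrame(arr1, columns=['t', 'v1'])
d2 = pd.DataFrame(arr2, columns=['t', 'v2'])

# outer merge on t, sorted so interpolation runs along the curve
d3 = d1.merge(d2, on='t', how='outer').sort_values('t').set_index('t')

# interpolate against the t values themselves (the spacing is uneven);
# limit_area='inside' leaves points outside a curve's range as NaN
d3 = d3.interpolate(method='index', limit_area='inside')

# mean of the two curves at every t (NaNs are skipped)
d3['mean'] = d3[['v1', 'v2']].mean(axis=1)
print(d3.reset_index().to_numpy())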