I'm trying to average values in columns E, F, G ... based on which key they have in column A.
I have an Excel formula that works, which I got by adapting this post: =AVERAGEIF(A:A,0,E:E). However, there are potentially n keys (up to 100,000) and m columns of values (up to 100).
My problem is that typing in that formula for each variation isn't practical. Is there a better way to do this, one that takes into account the varying groups and columns?
NOTE
The first 7 rows are junk (they contain info about the file)
The data is in CSV format, I am only using Excel to open it
I am currently looking into modifying this script, but as I am unfamiliar with Python, it may take some time
EXAMPLE
Column A is the group and column E holds the values to be averaged. Column F contains the output averages (this is a little messy because it doesn't show which group each average is for, but they are in ascending order, i.e. 0, 1, 2)
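Roughly, this is what I'm trying to do in pandas (a minimal sketch of my own; the file name data.csv and the assumption that the file has no header row are mine):

import pandas as pd

# skip the 7 junk rows at the top; header=None because the file has no header row
df = pd.read_csv('data.csv', skiprows=7, header=None)

# column 0 is the group key (column A); columns 4 onward are E, F, G, ...
means = df.groupby(0)[list(df.columns[4:])].mean()
print(means)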
If I understand you correctly, I would use a Pivot Table to summarize your data. You can group by column A and then get the means of the rest of the columns.
On the ribbon interface click "Insert" and select "Pivot Table". You'll then be prompted to select your data and set a location.
After that you should see a pane on the right asking for fields. Drag the Column A field to the Row Labels list, and the columns you want averages of to the Values list. You may need to change the value field settings from the default Sum to Average, which is what you want.
https://support.office.com/en-AU/Article/Create-a-PivotTable-to-analyze-worksheet-data-a9a84538-bfe9-40a9-a8e9-f99134456576
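If you end up doing this from Python instead, pandas has a direct equivalent of this pivot table; a rough sketch, reusing the df from your sketch above:

# group rows by the key column and average every value column;
# aggfunc='mean' mirrors switching the value field settings from Sum to Average
pivot = pd.pivot_table(df, index=0, values=list(df.columns[4:]), aggfunc='mean')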
I was trying to copy some filtered values from a column (using isin and a list of the values I want to filter by), but I don't know how to copy those filtered values from that column and paste them into another blank column. I also need the values to keep their row numbers.
I need help with this because I have been researching the topic but haven't found anything.
color_colors = ['Blanco negro rojo','Gris multicolor','Negro','Negro/Gris']
data_Color_colores = data[data['Color'].isin(color_colors)]
The code above is the filtering part. I really don't know how to copy the values that this code brings back into my new blank column, named "New_Color".
Updates:
The image below shows my dataframe. I have three untidy columns: Color, Size and Type. I want to filter each of these three columns by the values I provide in a list, then copy the filtered values, keeping their row numbers, into "New_Color" if the values are colors, "New_Size" if they are sizes, or "New_Type" if they are types.
To change these columns from untidy to tidy, I want to paste the respective values from these three columns into the new ones. My idea is to first filter column "Color" for possible color values and stick them into the new color column, then filter it for possible size values and stick those into the new size column, then do the same with possible type values. Then I would move on to the next column, "Size", and perform literally the same steps as for the previous column. With this method, when I finish, the three columns (Color, Size and Type) will be tidy in the new ones (New_Color, New_Size, New_Type); see the sketch below.
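A minimal sketch of the routing I have in mind, building on the isin code above (size_values and type_values are hypothetical placeholders for whatever those columns can contain):

color_values = ['Blanco negro rojo', 'Gris multicolor', 'Negro', 'Negro/Gris']
size_values = ['S', 'M', 'L']            # hypothetical placeholders
type_values = ['Camiseta', 'Pantalon']   # hypothetical placeholders

targets = [('New_Color', color_values),
           ('New_Size', size_values),
           ('New_Type', type_values)]

# scan each untidy column and copy every recognised value into the matching
# tidy column; .loc keeps each value on its original row
for untidy in ['Color', 'Size', 'Type']:
    for new_col, values in targets:
        mask = data[untidy].isin(values)
        data.loc[mask, new_col] = data.loc[mask, untidy]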
I am currently working with dataframes in pandas. In sum, I have a dataframe called "Claims" filled with customer claims data, and I want to parse all the rows in the dataframe based on the unique values found in the field 'Part ID'. I would then like to take each set of rows and append it, one at a time, to an empty dataframe called "emptydf". This dataframe has the same column headings as the "Claims" dataframe. Since the values in the 'Part ID' column change from week to week, I would like to find some way to do this dynamically, rather than comb through the dataframe manually each week.
I was thinking of somehow incorporating the df.where() expression and a for loop, but am at a loss as to how to put it all together. Any insight into how to go about this, or even some better methods, would be great! The code I have thus far is divided into two steps, as follows:
# Step 1: create an empty dataframe with the same columns as Claims
emptydf = Claims[0:0]

# Step 2: parse the dataframe by one Part ID and append it to the empty dataframe
Parse_Claims = Claims.query('Part_ID == 1009')
emptydf = emptydf.append(Parse_Claims)

As you can see, I can only hard-code one Part ID number at a time so far. This would take hours to complete manually, so I would love to figure out a way to iterate through the Part ID column and append the data dynamically.
Needless to say, I am super new to Python, so I definitely appreciate your patience in advance!
empty_df = list(Claims.groupby(Claims['Part_ID']))
This will create a list of tuples, one for each Part ID. Each tuple has two elements: the first is the Part ID and the second is the subset of the dataframe for that Part ID.
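For example, a quick sketch of consuming those tuples (assuming Claims as described above):

import pandas as pd

# each tuple is (part_id, subset_of_Claims_for_that_part_id)
for part_id, subset in Claims.groupby('Part_ID'):
    print(part_id, len(subset))

# or rebuild a single dataframe from all the subsets without appending in a loop
result = pd.concat(subset for _, subset in Claims.groupby('Part_ID'))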
I want to combine the different values/rows of a certain column. These values are texts, and I want to combine them together to perform a word count and find the most common words.
The dataframe is called df and is made of 30 columns. I want to combine all the rows of the first column (labeled 'text') into one row, one list, etc.; it doesn't matter, as long as I can perform FreqDist on it. I am not interested in grouping the values according to a certain value; I just want all the values in this column to become one block.
I looked around a lot and I couldn't find what I am looking for.
Thanks a lot.
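A minimal sketch of one way to do this, assuming the FreqDist you mention is nltk's and that whitespace splitting is an acceptable stand-in for a proper tokenizer:

from nltk import FreqDist

# join every row of the 'text' column into one string, then split it into words
all_text = ' '.join(df['text'].astype(str))
freq = FreqDist(all_text.lower().split())

print(freq.most_common(10))  # the ten most common words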
I have some data with 4 features of interest: account_id, location_id, date_from and date_to. Each entry corresponds to a period where a customer account was associated with a particular location.
There are some pairs of account_id and location_id which have multiple entries, with different dates. This means that the customer is associated with the location for a longer period, covered by multiple consecutive entries.
So I want to create an extra column with the total length of time that a customer was associated with a given location. I am able to use groupby and apply to calculate this for each pair (see code below). This works fine, but I don't understand how to then add this back into the original dataframe as a new column.
lengths = non_zero_df.groupby(['account_id','location_id'], group_keys=False).apply(lambda x: x.date_to.max() - x.date_from.min())
Thanks
I think Mephy is right that this should probably go to StackOverflow.
You're going to have a shape incompatibility because there will be fewer entries in the grouped result than in the original table. You'll need to do the equivalent of an SQL left outer join with the original table and the results, and you'll have the total length show up multiple times in the new column -- every time you have an equal (account_id, location_id) pair, you'll have the same value in the new column. (There's nothing necessarily wrong with this, but it could cause an issue if people are trying to sum up the new column, for example)
Check out pandas.DataFrame.join (you can also use merge). You'll want to join the old table with the results, on (account_id, location_id), as a left (or outer) join.
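A minimal sketch of that join, building on your groupby line (the column name total_length is just a placeholder):

# turn the grouped result into a named column, then left-join it back
lengths = (non_zero_df
           .groupby(['account_id', 'location_id'])
           .apply(lambda g: g.date_to.max() - g.date_from.min())
           .rename('total_length')    # placeholder name for the new column
           .reset_index())

non_zero_df = non_zero_df.merge(lengths, on=['account_id', 'location_id'], how='left')

If you don't actually need the intermediate lengths table, groupby followed by transform('max') and transform('min') on the two date columns would broadcast the result back to the original shape and skip the join entirely.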
I am exploring whether it is possible to create a calculation or total row which uses the column value based on matching a specified index value. I am quite new to Python, so I am not sure if it is possible using pivots. See the pivot I want to replicate below.
As you can see in the image above, I want the Ordered row to be the calculation row. This subtracts the Not Ordered row value of each column from the Grand Total.
Is it possible in Python to search the index, specifying criteria (e.g. "Not Ordered"), and loop through the columns to calculate the "Ordered" row?
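With the pivot in a pandas dataframe, label-based indexing does the search for you, and the subtraction is applied across all columns at once. A minimal sketch, assuming the index contains 'Not Ordered' and 'Grand Total' labels (adjust the labels to match your pivot):

# subtract the Not Ordered row from the Grand Total row, column by column
pivot.loc['Ordered'] = pivot.loc['Grand Total'] - pivot.loc['Not Ordered']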