I have a DataFrame in pandas and I am trying to find the easiest way to find the max value across each row and create a new column with that max value. See below for an example:
MA10D MA30D MA50D MA100D MA200D
19.838 17.197333 16.5896 16.5207 16.52065
19.296 17.015333 16.4758 16.4676 16.48300
18.722 16.833000 16.3680 16.4106 16.44475
So in the first row of the new column I would want 19.838, then 19.296, and then 18.722 (it is just by chance that in this example all the maxima fall under the MA10D column). Can someone help me find the best way to do this?
In pandas, the vast majority of operations apply down the rows, i.e. per column; this is axis=0. When it makes sense to apply an operation across the columns, i.e. per row, use axis=1.
Finding the maximum is a built-in operation on a DataFrame: df.max() is equivalent to df.max(axis=0) and gives one resulting row with the max of each column. For your case, use df.max(axis=1).
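A minimal sketch using the sample data from the question (the column name max_value is my own choice):

```python
import pandas as pd

# Sample data matching the question's moving-average columns
df = pd.DataFrame({
    "MA10D":  [19.838, 19.296, 18.722],
    "MA30D":  [17.197333, 17.015333, 16.833000],
    "MA50D":  [16.5896, 16.4758, 16.3680],
    "MA100D": [16.5207, 16.4676, 16.4106],
    "MA200D": [16.52065, 16.48300, 16.44475],
})

# axis=1 computes the max across the columns, i.e. one value per row
df["max_value"] = df.max(axis=1)
print(df["max_value"].tolist())  # [19.838, 19.296, 18.722]
```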
I read a CSV file into a DataFrame with hundreds of columns and thousands of rows. I need to apply a calculation that uses two columns as conditions, and then, for the rows that match those conditions, apply a calculation to those rows and columns (hundreds of rows and thousands of columns).
I couldn't find a way to do this in the DataFrame itself, so I split the DataFrame into separate DataFrames based on one of the two conditions, converted each to a dictionary, used the other condition on the dictionary, and applied the calculations to the dictionary values in a nested for loop. Then I converted the dictionaries back into DataFrames and merged them together.
If I had millions of rows this would be a slow process, so does it make sense to figure out how to change the values in the DataFrame itself, instead of converting the DataFrame into a dictionary (or list), applying the calculations, and converting back to a DataFrame?
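This kind of conditional calculation can usually be done in the DataFrame itself with a boolean mask and .loc, with no splitting or dict conversion. A sketch under assumed column names (flag and group stand in for the two condition columns, a and b for the columns in the calculation; all are hypothetical):

```python
import pandas as pd

# Hypothetical columns: 'flag' and 'group' are the two conditions,
# 'a' and 'b' feed the calculation
df = pd.DataFrame({
    "flag":  [1, 0, 1, 0],
    "group": ["x", "x", "y", "y"],
    "a":     [10.0, 20.0, 30.0, 40.0],
    "b":     [1.0, 2.0, 3.0, 4.0],
})

# Boolean mask combining both conditions
mask = (df["flag"] == 1) & (df["group"] == "y")

# Apply the calculation only to the matching rows, in place;
# non-matching rows get NaN in the new column
df.loc[mask, "result"] = df.loc[mask, "a"] * df.loc[mask, "b"]
print(df)
```

Because the whole operation is vectorized, it stays fast even with millions of rows.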
Let's say you have an insanely large DataFrame with many observations (rows) and labels/characteristics (columns), and the first thing you want to do is exclude all the columns that hold irrelevant information. For that, you first need to glance over the different values in the columns, but you can't really do that with head or tail.
Is there a function that returns all the non-repeated values of every column of a DataFrame, instead of doing it column by column? Thanks in advance.
I'm able to do it for a single column with the unique function. For example, df.color.unique() gives me the list of the different colors, but I want to do it directly for all 100 columns of my DataFrame.
You can use a for loop to print the unique values of each column:
for column in df.columns:
print(f"{column}: {df[column].unique()}")
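If you want the results in a structure rather than printed, a dict comprehension does the same thing in one line (a small sketch on made-up data):

```python
import pandas as pd

# Toy frame with two columns of repeated values
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size":  ["S", "S", "M"],
})

# One dict mapping each column name to its unique values,
# in order of first appearance
uniques = {col: df[col].unique().tolist() for col in df.columns}
print(uniques)  # {'color': ['red', 'blue'], 'size': ['S', 'M']}
```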
This is my first post on Stack Overflow, so apologies in advance for any mistakes in asking this question.
I am trying to pivot a DataFrame, but I am struggling to understand how it should be done properly, accounting for changes in values. I am a beginner in Python and pandas.
The dataset I am using can be found here: https://www.kaggle.com/szymonjanowski/internet-articles-data-with-users-engagement
I have processed this dataset to this point (screenshot of the article_data DataFrame).
What I would like to do next is pivot this DataFrame so that 'source_id' becomes the columns. I have done that using the pivot_table method, but I get a lot of NaN values. Here is a screenshot of the result I get (pivoted data).
Moreover, I am not sure whether the pivot accounts only for unique values in the 'source_id' column. To check that, I was trying to write a for loop that iterates through the unique values of source_id and stores them in the pivoted DataFrame, but I don't know how to write that code.
If you could give me some advice on what I am doing right and what I am doing wrong (and some ideas on how to fix it), I would be very thankful.
Since you have duplicate values in source_id, you would need to perform some sort of aggregation grouped by that column and then use .unstack(). That's not advisable here, though, since you have a lot of text data that cannot be aggregated.
You can try
df.set_index('source_id').T
Note that pandas does allow duplicate index labels, so the transpose will work, but selecting a duplicated column label afterwards will return several columns at once.
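A small sketch of that behavior on a toy frame (the values are made up; only the source_id column name comes from the question):

```python
import pandas as pd

# Toy stand-in for the article data, with a duplicated source_id
df = pd.DataFrame({
    "source_id": ["bbc", "cnn", "bbc"],
    "title":     ["t1", "t2", "t3"],
})

# Duplicate index labels are legal in pandas, so the transpose works,
# but the duplicated label appears twice among the resulting columns
transposed = df.set_index("source_id").T
print(transposed.columns.tolist())  # ['bbc', 'cnn', 'bbc']
```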
I'm trying to use pandas (or any built-in function) to display every tenth row of a DataFrame.
So, for example, if there are 50 records, I want to display the records whose IDs are multiples of 10, i.e. record IDs 0, 10, 20, 30, 40, 50.
Use iloc:
df.iloc[::10, :]
This method takes a row slice and a column slice, both based on integer position. More details from the documentation:
Purely integer-location based indexing for selection by position.
.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
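A minimal runnable sketch of that slice (the record_id column is my own illustration):

```python
import pandas as pd

# 51 rows so that IDs 0, 10, ..., 50 all exist
df = pd.DataFrame({"record_id": range(51)})

# ::10 is a step slice: every tenth row by integer position
every_tenth = df.iloc[::10, :]
print(every_tenth["record_id"].tolist())  # [0, 10, 20, 30, 40, 50]
```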
I have a large table with around 10 million rows. I need to take the numbers from two columns, apply some function, and then save the result into a third column.
Is there an efficient way of doing this? The only way I have managed so far is to run a query and save the result into a tuple. Then, in a second for loop, I iterate through the tuple where the result and a unique hash are stored, filter by hash, and update each row.
This is very, very slow though! Is there a better way to do this?
What about a single set-based UPDATE?
update t
set col3 = < some expression here on col1 and col2 >;