Looking to split the columns of this data frame into multiple data frames, each with the date column and one consecutive column. How do I write a function that can automate this? We would end up with n data frames, n being the number of columns in the original data frame minus 1 (the date column).
The first thing is to set the date column as the index (note that set_index returns a new frame, so assign it back):
df = df.set_index('Date')
Then, when you select a single column of the data frame you will get a Series object with the date as its index and your column of interest as its values:
e.g. df.P19245Y8E will give a Series of the second column.
I think this will do what you need, but if you really want a separate object for each column then you can just iterate through the columns (note that each df[col] is a Series, indexed by the date):
new_dfs = []
for col in df.columns:
    new_dfs.append(df[col])
or with list comprehension:
new_dfs = [df[col] for col in df.columns]
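If you want each piece back as a two-column data frame (the date plus one value column) rather than a Series, here is a minimal sketch of a function that automates this, assuming the date column is still a regular column named 'Date':

def split_by_column(df, date_col='Date'):
    # One data frame per non-date column, each keeping the date column
    return [df[[date_col, col]].copy() for col in df.columns if col != date_col]

# n data frames, n = number of columns minus the date column
new_dfs = split_by_column(df)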
I'm trying to create an Excel file with value counts and percentages. I'm almost finished, but when I run my for loop the percentage is added as a new df.to_frame with two extra columns, while I only want one. This is how it looks in Excel:
I want the blue square not to appear in the Excel file or the df, and the music percentage to sit next to the counts of the music column. I would also like to put the music percentage in percentage format instead, i.e. 0.81 --> 81%. Below is my code.
li = []
for i in range(0, len(df.columns)):
    value_counts = df.iloc[:, i].value_counts().to_frame().reset_index()
    value_percentage = df.iloc[:, i].value_counts(normalize=True).to_frame().reset_index()  #.style.format('{:.2%}')
    li.append(value_counts)
    li.append(value_percentage)
data = pd.concat(li, axis=1)
The .reset_index() function creates a column in your dataframe called index, so you are appending two-column dataframes each time, one of which is that index. You could add .drop(columns='index') after .reset_index() to drop the index column at each step, and therefore from your final dataframe as well.
However, depending on your application you may want to be careful with resetting the index, because it looks like you are concatenating in a way where your rows do not align (i.e. your index columns are not all the same).
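Concretely, each of the two lines inside the loop would become something like this (same for the normalized version), assuming your pandas version names the restored column 'index' as described above:

value_counts = df.iloc[:, i].value_counts().to_frame().reset_index().drop(columns='index')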
To change your dataframe values to strings with percentages you can use:
value_counts = (value_counts*100).astype(str)+'%'
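Putting both suggestions together, here is a sketch of the loop that puts the counts and a formatted percentage side by side in a single frame per source column, using a format string instead of the concatenation above so that 0.81 comes out exactly as '81%' (the column names 'count' and 'percentage' are just illustrative):

li = []
for i in range(0, len(df.columns)):
    counts = df.iloc[:, i].value_counts()
    percentages = df.iloc[:, i].value_counts(normalize=True)
    # counts and percentages share the same index (the unique values),
    # so the DataFrame constructor aligns them row by row
    combined = pd.DataFrame({
        'count': counts,
        'percentage': percentages.map('{:.0%}'.format),
    }).reset_index()
    li.append(combined)
data = pd.concat(li, axis=1)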
I am trying to reshape a data frame from 2 rows to 1 row, but I am encountering some issues. Do you have any idea how to do that? Here are the code and df:
Thanks!
If you are looking to convert two rows into one, you can do the following...
Stack the dataframe and reset the index at level=1, which converts the data and columns into a stack. This ends up with each of the column headers in a column (called level_1) and the data in another column (called 0).
Then set the index to level_1, which moves the column names into the index.
Remove the index name (level_1), then transpose the dataframe.
Code is shown below.
df3=df3.stack().reset_index(level=1).set_index('level_1')
df3.index.name = None
df3=df3.T
Output:
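For a concrete check, here is a minimal reproduction with a made-up two-row frame (the data is hypothetical; only the transformation matches the code above):

import pandas as pd

df3 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df3 = df3.stack().reset_index(level=1).set_index('level_1')
df3.index.name = None
df3 = df3.T
print(df3)
# one row, with the original column names repeated per source row:
#    a  b  a  b
# 0  1  3  2  4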
I want to average the data of one column in a pandas dataframe if they share the same 'id', which is stored in another column of the same dataframe. To make it simple, I have:
and I want:
Here it is clear that the elements of the 'nx' and 'ny' columns have been averaged whenever their value of 'nodes' was the same. The column 'maille', on the other hand, has to remain untouched.
I'm trying with groupby but so far couldn't manage to keep the column 'maille' as it is.
Any idea?
Use GroupBy.transform, specifying the column names in a list for the aggregation, and assign back:
cols = ['nx','ny']
df[cols] = df.groupby('nodes')[cols].transform('mean')
print (df)
Another idea with DataFrame.update:
df.update(df.groupby('nodes')[cols].transform('mean'))
print (df)
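A quick check of the first approach with made-up data (the frame below is hypothetical, shaped like the question describes):

import pandas as pd

df = pd.DataFrame({
    'nodes': [1, 1, 2, 2],
    'maille': ['m1', 'm2', 'm3', 'm4'],
    'nx': [0.0, 2.0, 1.0, 3.0],
    'ny': [10.0, 20.0, 30.0, 50.0],
})

cols = ['nx', 'ny']
df[cols] = df.groupby('nodes')[cols].transform('mean')
print(df)
# 'nx' and 'ny' are replaced by per-node means; 'maille' is untouched:
#    nodes maille   nx    ny
# 0      1     m1  1.0  15.0
# 1      1     m2  1.0  15.0
# 2      2     m3  2.0  40.0
# 3      2     m4  2.0  40.0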
I have a loop in which a new data frame is populated with values during each step. The number of rows in the new dataframe is different for each step in the loop. At the end of the loop, I want to compare the dataframes and in order to do so, they all need to be the same length. Is there a way I can resample the dataframe at each step to an arbitrary number (eg. 5618) of rows?
If your dataframe is too small by N rows, you can randomly sample N rows with replacement and append them to the end of your original dataframe. If your dataframe is too big, sample the desired number of rows from the original dataframe.
if len(df) < 5618:
    df1 = df.sample(n=5618-len(df), replace=True)
    df = pd.concat([df, df1])
if len(df) > 5618:
    df = df.sample(n=5618)
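Since this runs at every step of the loop, it may be convenient to wrap it in a helper; a sketch, where resample_to is a made-up name and 5618 the arbitrary target from the question:

import pandas as pd

def resample_to(df, n=5618):
    # Return df with exactly n rows: grow by sampling with replacement,
    # shrink by sampling without replacement.
    if len(df) < n:
        extra = df.sample(n=n - len(df), replace=True)
        return pd.concat([df, extra])
    if len(df) > n:
        return df.sample(n=n)
    return df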
The original file has multiple columns, but there are lots of blanks, and I want to rearrange it so that there is one nice column with the info. Starting with 910 rows, 51 cols (the newFile df) -> I want 910+x rows, 3 cols (the final df); at the moment my final df has 910 rows.
newFile sample
for i in range(0, len(newFile)):
    for j in range(0, 48):
        if pd.notnull(newFile.iloc[i, 3+j]):
            final = final.append(newFile.iloc[[i], [0, 1, 3+j]], ignore_index=True)
I have this piece of code to go through newFile and, if column 3+j is not null, copy columns 0, 1 and 3+j to a new row. I tried append(), but it adds not only rows but also a bunch of columns with NaNs again (like the original file).
Any suggestions?!
Your problem is that you are using a DataFrame and keeping the column names, so adding a new column with a value will fill that column with NaN for the rest of the dataframe.
Plus, your code is really inefficient given the double for loop.
Here is my solution using melt():
import numpy as np
import pandas as pd

# creating an example df: 100 rows, 51 columns
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 51)),
                  columns=list('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXY'))
# reshaping df to long format, keeping the first two columns as identifiers
df = df.melt(id_vars=df.columns[0:2])
# dropping the rows whose value is null
df.dropna(subset=['value'], inplace=True)
# stop here if you want to keep the information about which column
# each value came from; otherwise drop it:
df.drop(columns=['variable'], inplace=True)
print(df)
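Applied to the question's newFile (columns 0 and 1 kept, column 2 skipped, columns 3 onwards holding the values, as in the original loop), the same pattern would look roughly like:

final = (newFile.melt(id_vars=newFile.columns[0:2],
                      value_vars=newFile.columns[3:])
                .dropna(subset=['value'])
                .drop(columns=['variable']))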