Moving a row next to another row in a pandas data frame - python

I am trying to reshape a data frame from 2 rows to 1 row, but I am encountering some issues. Do you have any idea how to do that? Here are the code and df:
Thanks!

If you are looking to convert two rows into one, you can do the following:
Stack the dataframe and reset the index at level=1. Stacking turns the frame into a Series indexed by (row, column) pairs, and resetting level 1 moves the column headers into a column (called level_1), with the data in another column (called 0).
Then set level_1 as the index, which moves the column names into the index.
Remove the index name (level_1), then transpose the dataframe.
The code is shown below.
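df3 = df3.stack().reset_index(level=1).set_index('level_1')  # column names become the index, values sit in column 0
df3.index.name = None  # drop the leftover "level_1" index name
df3 = df3.T  # transpose into a single row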
Output
df3
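A self-contained sketch of the same stack/transpose steps, using a small made-up two-row frame (the column names a, b, c and the values are hypothetical, since the original df was not shown). Note that each original column name appears once per source row in the result:

```python
import pandas as pd

# Hypothetical two-row frame standing in for the question's df3
df3 = pd.DataFrame({"a": [1, 4], "b": [2, 5], "c": [3, 6]})

# stack() -> Series indexed by (row label, column name);
# reset_index(level=1) moves the column names into a column named "level_1"
long = df3.stack().reset_index(level=1)

# Use the column names as the index; the values sit in column 0
long = long.set_index("level_1")
long.index.name = None

# Transpose: a single row whose columns are a, b, c, a, b, c
one_row = long.T
print(one_row)
```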

Related

Split Data Frame into New Dataframe for each consecutive Column

I am looking to split the columns of this data frame into multiple data frames, each with the date column and one of the other columns. How do I write a function that can automate this? We would end up with n data frames, n being the number of columns in the original data frame minus 1 (the date column).
The first thing to do is set the date column as the index (set_index returns a new frame, so assign it back):
df = df.set_index('Date')
Then, when you select a single column, you get a Series object with the dates as its index and your column of interest as its values:
e.g. df.P19245Y8E will give a Series of the second column.
I think this will do what you need, but if you really want separate dataframes for each column, iterate through the columns and select each one with a list (df[[col]]), which returns a one-column DataFrame rather than a Series:
new_dfs = []
for col in df.columns:
    new_dfs.append(df[[col]])  # one-column DataFrame, still indexed by Date
or with a list comprehension:
new_dfs = [df[[col]] for col in df.columns]
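A sketch of the separate-dataframes route that leaves Date as an ordinary column instead of the index, so each piece keeps the date column plus one value column, as the question asks. The data and the OTHER_COL name are hypothetical; P19245Y8E is the column name from the answer. Storing the pieces in a dict keyed by column name is just one convenient choice:

```python
import pandas as pd

# Hypothetical data: a Date column followed by the value columns
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "P19245Y8E": [1.0, 2.0, 3.0],
    "OTHER_COL": [10.0, 20.0, 30.0],  # hypothetical column name
})

# One DataFrame per non-date column, each containing Date plus that column
frames = {
    col: df[["Date", col]]
    for col in df.columns
    if col != "Date"
}

for name, frame in frames.items():
    print(name, frame.shape)
```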

How to convert cells into columns in pandas? (python) [duplicate]

The problem is that when I transpose the DataFrame, the headers of the transposed DataFrame become the numerical index values rather than the values in the "id" column. See the original data below for an example:
Original data that I want to transpose (but keep the 0, 1, 2, ... index intact and change "id" to "id2" in the final transposed DataFrame).
DataFrame after I transpose; notice that the headers are the index values and NOT the "id" values (the "id" values are what I was expecting and needed).
Logic Flow
First, this helped me get rid of the numerical index that had been placed as the header: How to stop Pandas adding time to column title after transposing a datetime index?
Then this helped me get rid of the index numbers as the header, but now "id" and "index" got shuffled around: Reassigning index in pandas DataFrame & Reassigning index in pandas DataFrame
But now my id and index values got shuffled for some reason.
How can I fix this so the columns are [id2,600mpe, au565...]?
How can I do this more efficiently?
Here's my code:
DF = pd.read_table(data,sep="\t",index_col = [0]).transpose() #Add index_col = [0] to not have index values as own row during transposition
m, n = DF.shape
DF.reset_index(drop=False, inplace=True)
DF.head()
This didn't help much: Add indexed column to DataFrame with pandas
If I understand your example, what seems to be happening is that transposing turns your actual index (the 0...n sequence) into the column headers. First, if you want to preserve the numerical index, you can store it as id2:
DF['id2'] = DF.index
Now, if you want id to be the column headers, you must set it as the index, overriding the default one, and then transpose:
DF.set_index('id',inplace=True)
DF.T
I can't reproduce this without your data, but this should give you the values of id across the columns.
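A sketch on made-up data shaped like the question's (the id values and numbers are hypothetical; 600mpe and au565 are column names from the question), showing that setting "id" as the index before transposing puts the id values in the header, while id2 carries the old 0..n index along as a row:

```python
import pandas as pd

# Hypothetical data: one row per id, one column per sample
DF = pd.DataFrame({
    "id": ["geneA", "geneB", "geneC"],  # hypothetical ids
    "600mpe": [0.1, 0.2, 0.3],
    "au565": [1.1, 1.2, 1.3],
})

# Keep the original 0..n positional index as an ordinary column
DF["id2"] = DF.index

# Make "id" the index so that, after transposing, its values become the headers
result = DF.set_index("id").T
print(result)
```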

How to remove rows containing character in Pandas data frame?

I have a Pandas data frame, and I would like to remove all rows where there is a character "?" in column 6.
Assuming df is my data frame, I tried:
df2 = df[df[6].str.contains("\?")==False]
This, however, only seems to generate a view of my original frame (when I print df2, the rows I wanted to remove are gone, but the row indices skip the values of the removed rows...).
How can I obtain an independent new data frame df2 where the targeted rows are gone?
edit: the frame looks like this:
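Boolean indexing already returns a new DataFrame rather than a view; the gaps are just the original row labels being preserved. To renumber the rows, reset the index. Also match "?" literally with regex=False, because as a regular expression a bare "?" is an incomplete quantifier and raises an error:
df2 = df[~df[6].str.contains("?", regex=False)].reset_index(drop=True)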
df2
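A runnable sketch on a made-up frame (the values in columns 6 and 7 are hypothetical). It matches "?" literally with regex=False and also passes na=False, so that missing values count as "does not contain" instead of putting NaN into the mask, which would break the ~ negation:

```python
import pandas as pd

# Hypothetical frame; column 6 holds text that may contain "?" (or be missing)
df = pd.DataFrame({6: ["ok", "bad?", None, "?"], 7: [1, 2, 3, 4]})

# regex=False: "?" is a literal character; na=False: missing values -> False
mask = df[6].str.contains("?", regex=False, na=False)

# Boolean indexing returns a new DataFrame; reset_index renumbers the rows 0..n-1
df2 = df[~mask].reset_index(drop=True)
print(df2)
```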

python - append only select columns as rows

The original file has multiple columns, but there are lots of blanks, and I want to rearrange it so that there is one clean column with the info. Starting with 910 rows and 51 columns (newFile df), I want 910+x rows and 3 columns (final df); final df has 910 rows.
newFile sample
for i in range(0, len(newFile)):
    for j in range(0, 48):
        if pd.notnull(newFile.iloc[i, 3 + j]):
            final = final.append(newFile.iloc[[i], [0, 1, 3 + j]], ignore_index=True)
I have this piece of code to go through newFile and, if column 3+j is not null, copy columns 0, 1, and 3+j to a new row. I tried append(), but it adds not only rows but also a bunch of columns with NaNs again (like the original file).
Any suggestions?!
Your problem is that you are appending DataFrames that keep their original column names, so each new column name adds a new column to final, which is filled with NaN for the rest of the dataframe.
Plus, your code is really inefficient because of the double for loop.
Here is my solution using melt():
import numpy as np
import pandas as pd

# creating an example df with 51 columns, like newFile
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 51)), columns=list('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXY'))
# reshaping df into long format: columns 0 and 1 stay as identifiers, columns 3 onward are unpivoted (as in the loop)
df = df.melt(id_vars=list(df.columns[0:2]), value_vars=list(df.columns[3:]))
# dropping the values that are null
df.dropna(subset=['value'], inplace=True)
# if you want to keep the information about which column each value came from, stop here; otherwise:
df.drop(columns=['variable'], inplace=True)
print(df)
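A smaller sketch of the same melt-based reshape on a hand-made frame laid out like the question's newFile (all column names and values are hypothetical): columns 0 and 1 are kept as identifiers, column 2 is skipped as in the original loop, and columns 3 onward are unpivoted, so the blanks are easy to see disappearing:

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: two id columns, one column to ignore, then sparse value columns
newFile = pd.DataFrame({
    "id": [1, 2],
    "name": ["x", "y"],
    "skip": ["-", "-"],
    "v1": [10.0, np.nan],
    "v2": [np.nan, 20.0],
    "v3": [30.0, 40.0],
})

# Keep columns 0 and 1 as identifiers and unpivot columns 3 onward (like iloc[i, 3 + j])
final = newFile.melt(id_vars=list(newFile.columns[:2]),
                     value_vars=list(newFile.columns[3:]))

# Drop the blanks, then the column recording where each value came from
final = final.dropna(subset=["value"]).drop(columns=["variable"]).reset_index(drop=True)
print(final)
```

melt groups the output by source column (all v1 rows, then all v2 rows, ...) rather than row by row as the loop does; if that order matters, one option is a stable sort on the identifier columns afterwards.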

Python numpy stack rows into a single column

I am working on a data frame like the following and want to reshape them into a single column and create another column using the original index:
I want to convert the above data frame by stacking each row (indexed by "year") into a single column (named "value") and creating another column filled with each value's corresponding "year", to generate a new data frame with two columns (value, year) like the following:
How can I quickly achieve this using any of the numpy commands?
Thank you.
It just came to me that I can do this rather quickly with the following code
df['year'] = df.index
stacked = df.set_index('year').stack()
df = stacked.reset_index(name='value')
df.drop('level_1', axis=1, inplace=True)
This should do the trick. I should have given it more thought before posting this question, sorry.
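A compact variant of the same idea, on a made-up frame (the columns a and b and the years are hypothetical). Naming the index "year" up front and dropping the column level inside reset_index avoids the temporary column and the separate drop call; the final column selection puts the columns in the (value, year) order the question asks for:

```python
import pandas as pd

# Hypothetical wide frame indexed by year
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[2019, 2020])
df.index.name = "year"

# stack() lays each row out as consecutive (year, column) entries;
# reset_index(level=1, drop=True) discards the column label, keeping only the year
stacked = df.stack().reset_index(level=1, drop=True)

# Name the values and turn the year index back into a column
result = stacked.rename("value").reset_index()[["value", "year"]]
print(result)
```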
