Pandas: How to transpose a row to a column? - python

I have a csv file that I get from a specific piece of software. The csv file has 196 rows, and each row has a different number of values. The values are separated by a semicolon.
I want to have all values of the dataframe in one column. How do I do that?
dftest = pd.read_csv("test.csv", sep=';', header=None)
dftest
0
0 14,0;14,0;13,9;13,9;13,8;14,0;13,9;13,9;13,8;1...
1 14,0;14,0;13,9;14,0;14,0;13,9;14,0;14,0;13,8;1...
2 13,8;13,9;14,0;13,9;13,9;14,6;14,0;14,0;13,9;1...
3 14,5;14,4;14,2;14,1;13,9;14,1;14,1;14,2;14,1;1...
4 14,1;14,0;14,1;14,2;14,0;14,3;13,9;14,2;13,7;1...
5 14,5;14,1;14,1;14,1;14,5;14,1;13,9;14,0;14,1;1...
6 14,1;14,7;14,0;13,9;14,2;13,8;13,8;13,9;14,8;1...
7 14,7;13,9;14,2;14,7;15,0;14,5;14,0;14,3;14,0;1...
8 13,9;13,8;15,1;14,1;13,8;14,3;14,1;14,8;14,0;1...
9 15,0;14,4;14,4;13,7;15,0;13,8;14,1;15,0;15,0;1...
10 14,3;13,8;13,9;14,8;14,3;14,0;14,5;14,1;14,0;1...
11 14,5;15,5;14,0;14,1;14,0;13,8;14,2;14,0;15,9;1...
The output looks like this, but I want all the values in a single column.
I would like it to look like this:
0 14,0
1 14,0
2 13,9
.
.
.

If there is only one column 0 with values separated by ;, use Series.str.split with DataFrame.stack:
df = dftest[0].str.split(';', expand=True).stack().reset_index(drop=True)
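The result of that split is still text (e.g. "14,0"). If you also need numeric values, a small follow-up sketch, assuming the decimal commas shown in the sample, could be:
s = dftest[0].str.split(';', expand=True).stack().reset_index(drop=True)
# the entries are still strings like "14,0": swap the decimal comma and cast
s = s.str.replace(',', '.', regex=False).astype(float)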

You can also use numpy's ravel to flatten the values into a 1-D array:
df = pd.read_csv("test.csv", sep=';', header=None)
df = pd.DataFrame(df.values.ravel(), columns=['Name'])
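Because the rows contain different numbers of values, read_csv pads the shorter ones with NaN before ravel flattens them. A hedged variant that drops that padding afterwards (assuming sep=';' really splits the file into columns):
df = pd.read_csv("test.csv", sep=';', header=None)
# drop the NaN padding caused by the uneven row lengths, then renumber
out = pd.DataFrame(df.values.ravel(), columns=['Name']).dropna().reset_index(drop=True)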

Related

Stack the columns into one column, keeping the ids

I have a DataFrame with 100 columns (however I provide only three columns here) and I want to build a new DataFrame with two columns. Here is the DataFrame:
import pandas as pd
df = pd.DataFrame()
df['id'] = [1, 2, 3]
df['c1'] = [1, 5, 1]
df['c2'] = [-1, 6, 5]
df
I want to stack the values of all columns for each id into one column. For example, for id=1 I want to stack its values from columns c1 and c2 into one column, as in the expected output shown below.
Note: df.melt does not solve my problem, since I want to keep the ids as well.
Note 2: I already tried stack and reset_index, but it does not help:
df = df.stack().reset_index()
df.columns = ['id','c']
df
You could first set_index with "id", then stack, droplevel, and reset_index:
out = (df.set_index('id')
         .stack()
         .droplevel(1)
         .reset_index(name='c'))
Output:
   id  c
0   1  1
1   1 -1
2   2  5
3   2  6
4   3  1
5   3  5
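For reference, a self-contained sketch of this answer on the sample data:
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'c1': [1, 5, 1], 'c2': [-1, 6, 5]})
out = (df.set_index('id')
         .stack()            # one value per (id, original column) pair
         .droplevel(1)       # drop the original column names from the index
         .reset_index(name='c'))
print(out)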

Replace rows in Dataframe using index from another Dataframe

I have two dataframes with identical structures df and df_a. df_a is a subset of df that I need to reintegrate into df. Essentially, df_a has various rows (with varying indices) from df that have been manipulated.
Below is an example of the indices of df and df_a. Both have the same column structure, so all the columns are the same; it is only the rows and the index of the rows that differ.
>> df
index .. other_columns ..
0
1
2
3
. .
9999
10000
10001
[10001 rows x 20 columns]
>> df_a
index .. other_columns ..
5
12
105
712
. .
9824
9901
9997
[782 rows x 20 columns]
So, I want to overwrite only the rows in df that have the indices of df_a with the corresponding rows from df_a. I checked out "Replace rows in a Pandas df with rows from another df" and "replace rows in a pandas data frame", but neither of those explains how to use the indices of another dataframe to replace the values of the rows.
Something along the lines of:
df.loc[df_a.index, :] = df_a[:]
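For example, a small made-up sketch of that index-aligned assignment (assuming both frames share the same columns):
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40]}, index=[0, 1, 2, 3])
df_a = pd.DataFrame({'x': [99, 77]}, index=[1, 3])   # the manipulated subset

# only the rows whose labels appear in df_a.index are overwritten
df.loc[df_a.index] = df_a
print(df)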
I don't know if this is what you meant (you would need to be more specific), but if the first data frame was modified into a new data frame with different indexes, then you can use this code to reset the indexes:
import pandas as pd
df_a = pd.DataFrame({'a':[1,2,3,4],'b':[5,4,2,7]}, index=[2,55,62,74])
df_a.reset_index(inplace=True, drop=True)
print(df_a)
PRINTS:
a b
0 1 5
1 2 4
2 3 2
3 4 7

Convert first row of pandas dataframe to column name

I have a pandas dataframe
0 1 2
0 pass fail warning
1 50 12 34
I am trying to convert the first row into the column names, something like this:
pass fail warning
0 50 12 34
I am currently doing this by renaming the columns
newdf.rename(columns={0: 'pass', 1: 'fail', 2:'warning'})
and then deleting the first row.
Is there a better way to do it?
For the dataframe DF, the following line of code will set the first row as the column names of the dataframe:
DF.columns = DF.iloc[0]
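A minimal end-to-end sketch of that pattern on a made-up frame, including dropping the now-redundant first row:
import pandas as pd

DF = pd.DataFrame([['pass', 'fail', 'warning'], [50, 12, 34]])

DF.columns = DF.iloc[0]              # promote the first row to the header
DF = DF[1:].reset_index(drop=True)   # drop that row and renumber from 0
print(DF)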
I believe you need to add a parameter to read_html:
df = pd.read_html(url, header=1)[0]
Or:
df = pd.read_html(url, skiprows=1)[0]

My dataframe is taking first row of dataset as index of the dataset.

I opened the .log file as a dataframe (Python), but the first row of the dataset is taken as the column names. I tried to change the column names, but that deleted the first row. I need a solution.
You need the names parameter in read_csv:
df = pd.read_csv('filename.csv', names=['col1', 'col2', 'col3'])
Sample:
import pandas as pd
from io import StringIO
temp=u"""1,1,a
1,2,b
2,1,c
3,1,d"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
df = pd.read_csv(StringIO(temp), names=['col1','col2','col3'])
print (df)
col1 col2 col3
0 1 1 a
1 1 2 b
2 2 1 c
3 3 1 d
Try "header=None" option with the read_csv method. That will create a column index for you. If you need column names, do as in the answer from #jezrael.
pd.read_csv(csv_file, header=None)
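For example, a short sketch combining both answers (the file name here is hypothetical):
import pandas as pd

df = pd.read_csv('filename.csv', header=None)   # keep the first row as data
df.columns = ['col1', 'col2', 'col3']           # optionally assign names afterwards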

How to remove data from DataFrame permanently

After reading CSV data file with:
import pandas as pd
df = pd.read_csv('data.csv')
print(df.shape)
I get a DataFrame 99 rows long:
(99, 2)
To clean up the DataFrame I apply the dropna() method, which reduces it to 33 rows:
df = df.dropna()
print(df.shape)
which prints:
(33, 2)
Now when I iterate over a column it prints out all 99 rows as if they weren't dropped:
for index, value in df['column1'].items():
    print(index)
which gives me this:
0
1
2
.
.
.
97
98
99
It appears that dropna() simply made the data "hidden", and that hidden data comes back when I iterate over the DataFrame. How can I make sure the dropped data is actually removed from the DataFrame instead of just being hidden?
You're being confused by the fact that the row labels have been preserved so the last row label is still 99.
Example:
In [2]:
df = pd.DataFrame({'a': [0, 1, np.nan, np.nan, 4]})
df
Out[2]:
a
0 0
1 1
2 NaN
3 NaN
4 4
After calling dropna the index row labels are preserved:
In [3]:
df = df.dropna()
df
Out[3]:
a
0 0
1 1
4 4
If you want to reset so that they are contiguous then call reset_index(drop=True) to assign a new index:
In [4]:
df = df.reset_index(drop=True)
df
Out[4]:
a
0 0
1 1
2 4
Or you can use the inplace parameter; note that dropna(inplace=True) modifies df in place and returns None, so don't assign the result:
df.dropna(inplace=True)
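A small sketch of the in-place variant of both steps (each call modifies df and returns None):
df.dropna(inplace=True)
df.reset_index(drop=True, inplace=True)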
