I have a pandas dataframe
0 1 2
0 pass fail warning
1 50 12 34
I am trying to use the first row as the column names,
something like this:
pass fail warning
0 50 12 34
I am currently doing this by renaming the columns
newdf.rename(columns={0: 'pass', 1: 'fail', 2:'warning'})
and then deleting the first row.
Is there a better way to do it?
For the dataframe DF, the following line of code will set the first row as the column names of the dataframe:
DF.columns = DF.iloc[0]
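Note that assigning the columns on its own leaves the old header row in the data. Here is a minimal sketch of the whole step, assuming the frame from the question (the variable name df and the sample values are only for illustration):

```python
import pandas as pd

# Sample frame from the question: the real header sits in row 0.
df = pd.DataFrame([["pass", "fail", "warning"], [50, 12, 34]])

# Promote the first row to column names, then drop it from the data.
df.columns = df.iloc[0]
df = df.iloc[1:].reset_index(drop=True)

# The new column index inherits the old row label (0) as its name; clear it.
df.columns.name = None

print(df)
```

The columns stay object dtype because they originally mixed strings and numbers; pd.to_numeric or df.infer_objects() can convert them afterwards if needed.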
I believe you need to add a parameter to read_html:
df = pd.read_html(url, header=1)[0]
Or:
df = pd.read_html(url, skiprows=1)[0]
I have a DataFrame with 100 columns (however I provide only three columns here) and I want to build a new DataFrame with two columns. Here is the DataFrame:
import pandas as pd
df = pd.DataFrame()
df ['id'] = [1,2,3]
df ['c1'] = [1,5,1]
df ['c2'] = [-1,6,5]
df
I want to stack the values of all columns for each id and put them in one column. For example, for id=1 I want 1 and -1 in one column. Here is the DataFrame that I want.
Note: df.melt does not solve my problem, since I also want to keep the ids.
Note 2: I already tried stack and reset_index, and it did not help.
df = df.stack().reset_index()
df.columns = ['id','c']
df
You could first set_index with "id"; then stack + reset_index:
out = (df.set_index('id').stack()
.droplevel(1).reset_index(name='c'))
Output:
id c
0 1 1
1 1 -1
2 2 5
3 2 6
4 3 1
5 3 5
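As a side note, melt can keep the ids as well if they are passed as id_vars. This is only a sketch of an alternative, not the approach from the answer above; the stable sort is an assumption made here to reproduce the same row order:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "c1": [1, 5, 1], "c2": [-1, 6, 5]})

out = (
    df.melt(id_vars="id", value_name="c")   # id is repeated next to every value
      .sort_values("id", kind="mergesort")  # stable sort keeps c1 before c2 per id
      .drop(columns="variable")             # drop the original column labels
      .reset_index(drop=True)
)

print(out)
```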
As the title says, I would like to find a way to erase the values in a data frame from a given column to the end of the data frame, while keeping the columns themselves, but I can't find any way to do so.
I would like to start with
A B C
-----------
1 1 1
1 1 1
1 1 1
and get
A B C
-----------
1
1
1
I was trying with
df.drop(df.loc[:, 'B':].columns, axis = 1, inplace = True)
But this deletes the columns themselves too:
A
-
1
1
1
Am I missing something?
If you only know the column name that you want to keep:
import pandas as pd
new_df = pd.DataFrame(df["A"])
If you only know the column names that you want to drop:
new_df = df.drop(["B", "C"], axis=1)
For your case, to keep the columns, but remove the content, one possible way is:
new_df = pd.DataFrame(df["A"], columns=df.columns)
The resulting df keeps column "A" with its values; columns "B" and "C" are still present but contain no values (NaN instead).
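The same "keep the columns, blank the contents" result can also be sketched with reindex, which does not rely on how the DataFrame constructor treats a named Series. The sample frame below is just the one from the question:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1], "B": [1, 1, 1], "C": [1, 1, 1]})

# Select only "A", then reindex against the original column labels:
# columns missing from the selection come back filled with NaN.
new_df = df[["A"]].reindex(columns=df.columns)

print(new_df)
```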
I have a CSV file that I get from a specific piece of software. The file has 196 rows, and each row has a different number of values. The values are separated by semicolons.
I want to have all values of the dataframe in one column. How can I do that?
dftest = pd.read_csv("test.csv", sep=';', header=None)
dftest
0
0 14,0;14,0;13,9;13,9;13,8;14,0;13,9;13,9;13,8;1...
1 14,0;14,0;13,9;14,0;14,0;13,9;14,0;14,0;13,8;1...
2 13,8;13,9;14,0;13,9;13,9;14,6;14,0;14,0;13,9;1...
3 14,5;14,4;14,2;14,1;13,9;14,1;14,1;14,2;14,1;1...
4 14,1;14,0;14,1;14,2;14,0;14,3;13,9;14,2;13,7;1...
5 14,5;14,1;14,1;14,1;14,5;14,1;13,9;14,0;14,1;1...
6 14,1;14,7;14,0;13,9;14,2;13,8;13,8;13,9;14,8;1...
7 14,7;13,9;14,2;14,7;15,0;14,5;14,0;14,3;14,0;1...
8 13,9;13,8;15,1;14,1;13,8;14,3;14,1;14,8;14,0;1...
9 15,0;14,4;14,4;13,7;15,0;13,8;14,1;15,0;15,0;1...
10 14,3;13,8;13,9;14,8;14,3;14,0;14,5;14,1;14,0;1...
11 14,5;15,5;14,0;14,1;14,0;13,8;14,2;14,0;15,9;1...
That is what the output currently looks like. I want to have all values in one column,
like this:
0 14,0
1 14,0
2 13,9
.
.
.
If there is only one column 0 with values separated by ;, use Series.str.split with DataFrame.stack:
df = dftest[0].str.split(';', expand=True).stack().reset_index(drop=True)
You can also use NumPy's ravel to flatten the values into a 1-D array:
df = pd.read_csv("test.csv", sep=';', header=None)
df = pd.DataFrame(df.values.ravel(), columns=['Name'])
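For something that can be run without the file, here is a small sketch of the same split-and-flatten idea. It uses Series.explode instead of stack (a swapped-in technique, not the one from the answers), and the two short rows are made-up stand-ins for test.csv:

```python
import pandas as pd

# Hypothetical sample: two lines of different length, decimal commas.
raw = "14,0;14,0;13,9\n13,8;13,9\n"

lines = pd.Series(raw.splitlines())

values = (
    lines.str.split(";")                    # each row becomes a list of strings
         .explode()                         # one list element per row
         .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
         .astype(float)
         .reset_index(drop=True)
)

print(values)
```

If the values should stay as strings such as "14,0", the replace and astype steps can simply be left out.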
Hi, I want to get the counts of unique values in a dataframe. value_counts implements this; however, I want to use its output somewhere else. How can I convert the value_counts output to a pandas dataframe? Here is some example code:
import pandas as pd
df = pd.DataFrame({'a':[1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)
print(value_counts)
print(type(value_counts))
output is:
2 3
1 2
Name: a, dtype: int64
<class 'pandas.core.series.Series'>
What I need is a dataframe like this:
unique_values counts
2 3
1 2
Thank you.
Use rename_axis to name the column that will be created from the index, then reset_index:
df = value_counts.rename_axis('unique_values').reset_index(name='counts')
print (df)
unique_values counts
0 2 3
1 1 2
Or, if you need a one-column DataFrame, use Series.to_frame:
df = value_counts.rename_axis('unique_values').to_frame('counts')
print (df)
counts
unique_values
2 3
1 2
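Here is a self-contained sketch of the first variant. Naming both the index and the values explicitly means the result should not depend on what the installed pandas version calls the counted Series by default (counts_df is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2, 2]})

counts_df = (
    df["a"].value_counts()              # Series: index = unique values, values = counts
           .rename_axis("unique_values")  # name the index so it becomes a named column
           .reset_index(name="counts")    # move the index into a column, name the counts
)

print(counts_df)
```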
I just ran into the same problem, so here are my thoughts.
Warning
When you work with pandas data structures, you have to be aware of the return type.
Another solution here
As @jezrael mentioned before, pandas does provide the pd.Series.to_frame API.
Step 1
You can also wrap the pd.Series in a pd.DataFrame directly:
df_value_counts = pd.DataFrame(value_counts) # wrap pd.Series in pd.DataFrame
Then you have a pd.DataFrame with a column named 'a' (on pandas versions before 2.0; later versions name the column 'count' instead), and the unique values become the index:
Input: print(df_value_counts.index.values)
Output: [2 1]
Input: print(df_value_counts.columns)
Output: Index(['a'], dtype='object')
Step 2
What now?
If you want to give the columns new names, first turn the index back into a regular column with reset_index().
Then assign a list of new column names to df.columns:
df_value_counts = df_value_counts.reset_index()
df_value_counts.columns = ['unique_values', 'counts']
Then you have what you need:
Output:
unique_values counts
0 2 3
1 1 2
Full Answer here
import pandas as pd
df = pd.DataFrame({'a':[1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)
# solution here
df_val_counts = pd.DataFrame(value_counts)
df_value_counts_reset = df_val_counts.reset_index()
df_value_counts_reset.columns = ['unique_values', 'counts'] # change column names
I'll throw my hat in the ring as well. This is essentially the same as @wy-hsu's solution, but as a function:
def value_counts_df(df, col):
    """
    Returns pd.value_counts() as a DataFrame

    Parameters
    ----------
    df : Pandas Dataframe
        Dataframe on which to run value_counts(), must have column `col`.
    col : str
        Name of column in `df` for which to generate counts

    Returns
    -------
    Pandas Dataframe
        Returned dataframe will have a single column named "count" which contains the value_counts()
        for each unique value of df[col]. The index name of this dataframe is `col`.

    Example
    -------
    >>> value_counts_df(pd.DataFrame({'a':[1, 1, 2, 2, 2]}), 'a')
       count
    a
    2      3
    1      2
    """
    df = pd.DataFrame(df[col].value_counts())
    df.index.name = col
    df.columns = ['count']
    return df
pd.DataFrame(
df.groupby(['groupby_col'])['column_to_perform_value_count'].value_counts()
).rename(
columns={'old_column_name': 'new_column_name'}
).reset_index()
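For a concrete picture of the groupby pattern above, here is a small sketch on made-up data (data_df and its columns B and C are hypothetical). It passes name= to reset_index instead of renaming afterwards, which sidesteps having to know what the counts column is called by default:

```python
import pandas as pd

# Hypothetical sample data.
data_df = pd.DataFrame({
    "B": ["x", "x", "x", "y", "y"],
    "C": ["p", "p", "q", "p", "p"],
})

counts = (
    data_df.groupby("B")["C"]
           .value_counts()             # Series indexed by (B, C)
           .reset_index(name="Count")  # flatten to columns B, C, Count
)

print(counts)
```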
Here is an example of selecting a subset of columns from a dataframe, grouping, applying value_counts per group, naming the value_counts column Count, and displaying the first n groups.
# Select 5 columns (A..E) from a dataframe (data_df).
# Sort on A,B. groupby B. Display first 3 groups.
df = data_df[['A','B','C','D','E']].sort_values(['A','B'])
g = df.groupby(['B'])
for n, (k, gg) in enumerate(list(g)[:3]):  # display first 3 groups
    # display() is available in IPython/Jupyter sessions
    display(k, gg.value_counts().to_frame('Count').reset_index())
I'm creating a Pandas DataFrame to store data. Unfortunately, I can't know the number of rows of data that I'll have ahead of time. So my approach has been the following.
First, I declare an empty DataFrame.
df = DataFrame(columns=['col1', 'col2'])
Then, I append a row of missing values.
df = df.append([None] * 2, ignore_index=True)
Finally, I can insert values into this DataFrame one cell at a time. (Why I have to do this one cell at a time is a long story.)
df['col1'][0] = 3.28
This approach works perfectly fine, with the exception that the append statement inserts an additional column to my DataFrame. At the end of the process the output I see when I type df looks like this (with 100 rows of data).
<class 'pandas.core.frame.DataFrame'>
Data columns (total 2 columns):
0 0 non-null values
col1 100 non-null values
col2 100 non-null values
df.head() looks like this.
0 col1 col2
0 None 3.28 1
1 None 1 0
2 None 1 0
3 None 1 0
4 None 1 1
Any thoughts on what is causing this 0 column to appear in my DataFrame?
The append is treating the list you pass as a column to add to your dataframe. That column is unnamed and has two None/NaN elements in it, so pandas gives it the default column name 0.
For this to work, the column names of the data passed to append must match the column names of the existing data frame; otherwise new columns are created (by default).
# you need to explicitly name the columns of the incoming parameter in the append statement
# (DataFrame.append was removed in pandas 2.0, so these calls need an older version)
import numpy as np
from pandas import DataFrame, Series
df = DataFrame(columns=['col1', 'col2'])
print(df.append(Series([None]*2, index=['col1','col2']), ignore_index=True))
# as an aside
df = DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])
dfRowImproper = [1,2,3,4]
# dfRowProper = DataFrame(np.arange(4)+1, columns=['A','B','C','D'])  # will not work, because arange returns a vector, whereas DataFrame expects a matrix/array
dfRowProper = DataFrame([np.arange(4)+1], columns=['A','B','C','D'])  # will work
print(df.append(dfRowImproper))  # will make the 0 named column with 4 additional rows defined on this column
print(df.append(dfRowProper))  # will work as you would like, as the column names are consistent
print(df.append(DataFrame(np.random.randn(1,4))))  # will add four additional columns (0..3) to the df, with one additional row
print(df.append(Series(dfRowImproper, index=['A','B','C','D']), ignore_index=True))  # works as you want
You could use a Series for row insertion:
df = pd.DataFrame(columns=['col1', 'col2'])
df = df.append(pd.Series([None]*2, index=df.columns), ignore_index=True)
df.loc[0, "col1"] = 3.28
df looks like:
col1 col2
0 3.28 NaN
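Since DataFrame.append was removed in pandas 2.0, here is a rough sketch of the same workflow that avoids it. It adds the empty row by label-based enlargement with .loc; this is one possible replacement, not the method from the answers above:

```python
import pandas as pd

df = pd.DataFrame(columns=["col1", "col2"])

# Add one empty row at the next integer label. The list is matched
# against the existing columns, so no stray "0" column appears.
df.loc[len(df)] = [None, None]

# Fill a single cell; .loc avoids chained assignment like df["col1"][0].
df.loc[0, "col1"] = 3.28

print(df)
```

If the rows can be collected first, building a list of dicts and calling pd.DataFrame on it once at the end is usually faster than growing the frame row by row.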