Python Pandas print Dataframe.describe() in default format in JupyterNotebook


Output format without print function
If I run a JupyterNotebook cell with
dataframe.describe()
a pretty, formatted table is displayed, like this:
[Screenshot: VSCode Jupyter Notebook, dataframe.describe() run on its own in a cell]
Output format with print function
If I run a cell with more than one line of code, dataframe.describe() does not print anything. Therefore I need to call
print(dataframe.describe()).
This leads to a completely different formatting, though:
[Screenshot: VSCode Jupyter Notebook, dataframe.describe() printed with the print function]
Is there a way to print dataframe.describe() in the first format?

There are multiple things to say here:
1) Jupyter automatically displays only the value of the last expression in a cell (e.g. a bare dataframe on the last line). If you want, you can use one cell per command to get the rich format.
2) The print function always produces plain text: it prints the object's string representation, which for a DataFrame is essentially what to_string returns. This is plain Python behavior; in contrast, option 1) is Jupyter-specific.
3) If you don't want to use a separate cell and still want the rich formatting, there are several options. One is display:
from IPython.display import display
display(dataframe.describe())
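The display approach can be sketched as follows for a cell that does more than one thing. The try/except fallback to print is an addition of this sketch so that it also runs in plain Python (where it only shows text); inside Jupyter, display renders the same HTML table you get when describe() is the last line of a cell. The sample data is made up.

```python
import pandas as pd

try:
    # Available whenever IPython / Jupyter is installed
    from IPython.display import display
except ImportError:
    # Assumption of this sketch: plain-Python fallback, text output only
    display = print

dataframe = pd.DataFrame({"spesa": [23.4, 34.5, 99.2, 50.1]})
summary = dataframe.describe()

display(summary)                 # rendered as an HTML table inside Jupyter
print("rows:", len(dataframe))   # later statements in the same cell still run
```

Because display is an explicit call, it can appear anywhere in the cell and any number of times, unlike the implicit "last line" output.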

I assume you are using the same Jupyter notebook files in both environments.
If that is the case, the problem comes from the order of the statements in the cell: the output shown for a code cell is the value of its last line only.
Let me illustrate with an example.
Given the following data for a pandas dataframe:
import pandas as pd
data = {'id':[1,2,3,4],'nome':['Paolo','Pietro','Giovanni','Maria'],'spesa':[23.4,34.5,99.2,50.1]}
The output of the following cell differs between these two cases:
# Outputs the dataframe itself
dataframe1 = pd.DataFrame(data)
dataframe1.describe()
dataframe1
# Outputs the describe() function return value
dataframe1 = pd.DataFrame(data)
dataframe1
dataframe1.describe()
Both cells run the same statements on the dataframe, and neither changes its state; however, only the value of the last line is written to the cell output.
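To make the "last line wins" rule concrete, here is a simplified, hypothetical model of it built with Python's standard ast module: every statement in the cell is executed, but a value is handed back for display only when the final statement is a bare expression. This only mimics Jupyter's default behavior (InteractiveShell.ast_node_interactivity = "last_expr"); IPython's real implementation is more involved, and last_expression_value is an illustrative name, not part of IPython.

```python
import ast

import pandas as pd


def last_expression_value(source, namespace):
    """Hypothetical helper: run a cell's source and return what Jupyter
    would display by default (the value of a trailing bare expression)."""
    tree = ast.parse(source)
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Execute every statement except the last one; intermediate
        # expression values (like a bare describe()) are discarded.
        head = ast.Module(body=tree.body[:-1], type_ignores=[])
        exec(compile(head, "<cell>", "exec"), namespace)
        # Evaluate the final expression and return its value for display.
        last = ast.Expression(body=tree.body[-1].value)
        return eval(compile(last, "<cell>", "eval"), namespace)
    # The cell ends with a statement (e.g. an assignment): nothing is shown.
    exec(compile(tree, "<cell>", "exec"), namespace)
    return None


data = {'id': [1, 2, 3, 4], 'nome': ['Paolo', 'Pietro', 'Giovanni', 'Maria'],
        'spesa': [23.4, 34.5, 99.2, 50.1]}
ns = {"pd": pd, "data": data}

cell_a = "dataframe1 = pd.DataFrame(data)\ndataframe1.describe()\ndataframe1"
cell_b = "dataframe1 = pd.DataFrame(data)\ndataframe1\ndataframe1.describe()"

out_a = last_expression_value(cell_a, ns)  # the DataFrame itself
out_b = last_expression_value(cell_b, ns)  # the describe() summary
```

Swapping the last two lines is the only difference between the cells, and it alone decides which object comes back.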

Related

Make Excel Cell value Variable in Python Using Pandas

I have looked for a while on this one but can't seem to find out how to pick a specific cell value in an Excel worksheet and assign it to a variable in Python. I get a traceback error with the code below.
I have a number of work rules, stored in cells within an Excel worksheet, that I want to assign to variables in Python.
(work_rules[4][2] is how I am trying to make the cell value into a variable.)
Code:
work_rules = pd.read_excel(
'D:\\Personal Files\\Technical Development\\PycharmProjects\\Call Center Headcount Model\\Call Center Work Rules.xlsx',
sheet_name='Inputs')
historical_start_date = work_rules[4][2]
print(historical_start_date)

Found it:
Use iloc on the DataFrame returned by read_excel: work_rules.iloc[4, 2]. Note that iloc is indexed with square brackets, not called like a function.
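A small sketch of the difference, using a made-up DataFrame in place of the 'Inputs' sheet (pd.read_excel returns an ordinary DataFrame). Keep in mind that read_excel uses the first spreadsheet row as the header by default, so positions passed to iloc count from the first data row; passing header=None makes them count from the top of the sheet instead.

```python
import pandas as pd

# Hypothetical stand-in for the 'Inputs' sheet
work_rules = pd.DataFrame({
    "rule": ["a", "b", "c", "d", "e", "f"],
    "unit": ["x"] * 6,
    "value": [10, 20, 30, 40, 50, 60],
})

# work_rules[4] looks up a *column label* 4, which does not exist
try:
    work_rules[4]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

historical_start_date = work_rules.iloc[4, 2]  # row position 4, column position 2
same_value = work_rules.iat[4, 2]              # scalar-only positional accessor
print(historical_start_date)
```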

Why does jupyter sometimes print a DataFrame formatted and sometimes as text?

I have the following code in a Jupyter notebook:
# (1) How to add a new column?
test_csv['aggregator'] = None
print (test_csv)
# (2) How to remove a column?
test_csv.drop(columns=['platform'])
It prints the following:
Why is the result of the second statement formatted as a table (without a print statement), whereas the first one is plain text? Is there a way to force the DataFrame to print as the nicely formatted table?

Ran Cohen has already mentioned the "why" part of the print function destroying the nice formatting.
To get a nicely formatted dataframe (tabular format with grey and white colors), you can leave it at the end of the cell, but this works only for a single dataframe.
To display multiple nicely formatted dataframes, you can use either of these:
1) display:
from IPython.display import display
display(df1)  # displays nicely formatted dataframe1
display(df2)  # displays nicely formatted dataframe2
OR
2) make Jupyter display every bare expression in a cell:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
df1
df2
# displays both dataframes, nicely formatted
Note: the nice formatting is Jupyter Notebook rendering (styling and displaying) the HTML version of the dataframe, the same way a browser renders HTML code. The raw HTML table you get from df1.to_html() looks plainer when opened on its own in a browser, because Jupyter Notebook adds some extra styling.
Source:
Stackoverflow Article, same question, my answer attributed to the ones given there
Found Solution Here First
Stackoverflow Article, related to rendering
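The representations mentioned in the note about HTML rendering can be compared directly in pandas. The sketch also includes a hypothetical Badge class to show the general mechanism, assuming Jupyter's rich display protocol: an object that defines _repr_html_ is rendered as HTML when it is the last expression in a cell or passed to display, while print() keeps using its plain-text form.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

text = df1.to_string()    # plain text, as shown by print(df1)
html = df1.to_html()      # standalone HTML table, without Jupyter's extra styling
rich = df1._repr_html_()  # the HTML Jupyter requests when it renders df1


class Badge:
    """Hypothetical example object that opts into rich display."""

    def __init__(self, label):
        self.label = label

    def _repr_html_(self):
        # Used by Jupyter: display(badge) or badge as the last line of a cell
        return f"<b>{self.label}</b>"

    def __repr__(self):
        # Used by print() (no __str__ defined) and plain-text consoles
        return f"Badge({self.label!r})"


badge = Badge("hi")
print(badge)  # plain text; the HTML method is ignored
```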

Why does jupyter sometimes print a DataFrame formatted and sometimes as text?
If you use print, the DataFrame is printed as text, because print uses the object's plain-text representation (to_string).
When you "leave" the dataframe at the end of the cell, it is shown as a table, because that is one of the things Jupyter does.
test_csv.drop(columns=['platform']) returns a new DataFrame.
If you want the drop to apply to the DataFrame itself, use inplace=True or test_csv = test_csv.drop(columns=['platform']),
and then print the DataFrame.

test_csv.drop(columns=['platform']) does not actually drop the column from test_csv. It returns a new DataFrame without that column, and because it is the last line of the cell, Jupyter displays that returned DataFrame.
To actually drop the column:
test_csv.drop(columns=['platform'], inplace=True)
OR
test_csv.drop('platform', axis=1, inplace=True)
test_csv['aggregator'] = None changes the dataframe itself by assigning a new column to it. It is an assignment statement, not an expression, so it does not display anything.
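A minimal sketch of both behaviors with a made-up test_csv: the assignment mutates the DataFrame, while drop() leaves it untouched unless the result is kept. Reassignment is shown here as an alternative to inplace=True; both end with the column removed.

```python
import pandas as pd

test_csv = pd.DataFrame({"platform": ["web", "ios"], "users": [10, 20]})

# (1) Assignment mutates test_csv in place and produces no output
test_csv["aggregator"] = None

# (2) drop() returns a new DataFrame; test_csv itself is unchanged
preview = test_csv.drop(columns=["platform"])
cols_after_preview = list(test_csv.columns)

# Keep the result by reassigning it
test_csv = test_csv.drop(columns=["platform"])
print(test_csv)
```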

I think you are running both statements in the same cell of the Jupyter Notebook.
You can run the snippet below in one cell
# (1) How to add a new column?
test_csv['aggregator'] = None
test_csv # no need to use print
and in the other cell
# (2) How to remove a column?
test_csv.drop(columns=['platform'])

Command to print top 10 rows of python pandas dataframe without index?

head() prints the indexes.
dataframe.to_string(index=False,max_rows=10) prints the first 5 and last 5 rows.

You should try this:
print(df.head(n=10).to_string(index=False))
This works because df.head returns a DataFrame object, so you can call its to_string method with index=False to get rid of the index.
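A quick check of that approach on made-up data: the resulting string has one header line plus ten data lines, and no index column.

```python
import pandas as pd

df = pd.DataFrame({"x": range(20), "y": [v * v for v in range(20)]})

# head(10) keeps the first 10 rows; to_string(index=False) omits the index column
table = df.head(n=10).to_string(index=False)
print(table)
```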

If you like complicated solutions, you may use
[print(row) for row in df.head().to_string(index=False).split("\n")]
The explanation:
df.head().to_string(index=False) returns a string with "\n" as row delimiters,
split() method then returns a list of single rows,
[print(row) for row in ...] then prints every row.
It was a joke, of course, albeit giving you the desired result. Printing it as a whole string will give the same result (as every row ends with "\n"):
print(df.head().to_string(index=False))
If you work with Jupyter Notebook, you may use a nicer command:
df.head().style.hide_index()
(In pandas 1.4 and later, hide_index() is deprecated in favor of df.head().style.hide(axis="index"), and it was removed in pandas 2.0.)
Be careful!
No print(...), no df = .... The returned object is an object of the Styler class, not a dataframe.
Jupyter Notebook automatically calls its _repr_html_() method to render (display) your table.
See my other answer for details.

Data presentation difference in python

Hopefully a fairly simple answer to my issue.
When I run the following code:
print (data_1.iloc[1])
I get a nice, vertical presentation of the data, with each column header and its value on a separate row. This is very useful when comparing 2 sets of data and looking for discrepancies.
However, when I write the code as:
print (data_1.loc[data_1["Name"].isin(["John"])])
I get all the information arrayed across the screen, with the column header in 1 row, and the values in another row.
My question is:
Is there any way of using the second code, and getting the same vertical presentation of the data?

The difference is that data_1.iloc[1] returns a pandas Series, whereas data_1.loc[data_1["Name"].isin(["John"])] returns a DataFrame. Pandas has different representations for these two data types (i.e. they print differently).
iloc[1] gives you a Series because you indexed it with a scalar. If you do data_1.iloc[[1]], you get a DataFrame instead. Conversely, data_1["Name"].isin(["John"]) is a boolean mask, and selecting rows with a mask returns a DataFrame, even when only one row matches. If you want a Series instead, take the first matching row by position:
print(data_1.loc[data_1["Name"].isin(["John"])].iloc[0])
but only if you're sure you're getting exactly one row back.
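A sketch of these cases on a made-up data_1 (the Name/Age/City values are assumptions). Besides taking the first match with .iloc[0], transposing the filtered DataFrame with .T is another way to get a vertical layout, and it also works when several rows match.

```python
import pandas as pd

data_1 = pd.DataFrame({
    "Name": ["Ann", "John", "Mia"],
    "Age": [31, 42, 27],
    "City": ["Oslo", "Rome", "Lima"],
})

row_series = data_1.iloc[1]    # scalar position -> Series (prints vertically)
row_frame = data_1.iloc[[1]]   # list of positions -> one-row DataFrame

matches = data_1.loc[data_1["Name"].isin(["John"])]  # boolean mask -> DataFrame
first_match = matches.iloc[0]  # first matching row as a Series
vertical = matches.T           # one column per match, one row per field

print(first_match)
print(vertical)
```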

Python Type-error: string indices must be integers, creating a new column using existing columns in a data frame

I am trying to create an additional custom column from existing columns of a DataFrame, but the function I am using throws a TypeError during execution. I am very new to Python; can someone please help?
The dataframe used is as below
match_all = match[['country_id','league_id','season','stage','date',
'home_team_api_id','away_team_api_id','home_team_goal','away_team_goal']]
And the function I am using is as below
def goal_diff(matches):
    for i in matches:
        i['home_team_goal']-i['away_team_goal']
goal_diff(match_all)

Your function did not work because matches in your function is a DataFrame. When you do:
for i in matches:
    print(i)
you will see that the column names of your df are returned. This is how a for loop operates on a df. So in your function, when you use i in your subtraction:
i['home_team_goal'] - i['away_team_goal']
it is like doing
'country_id'['home_team_goal'] - 'country_id'['away_team_goal']
'league_id'['home_team_goal'] - 'league_id'['away_team_goal']
...
Indexing a string with another string is exactly what raises "TypeError: string indices must be integers". What you actually want is to select the columns from the df by name:
matches['home_team_goal'] - matches['away_team_goal']
Remember, matches is your function's input df. Lastly, in your for loop you are neither returning nor storing any value; you are just calling a subtraction on 2 columns. In your text editor or IDE you might see something printed to the screen, but you will probably want to use these values in the next step of your code. So in a function, we use return to have the function actually give us values when we call it.
In your case, if I wrote my function below without the return statement and then called it on my dataframe, the operation would complete, but no value would be "returned" to me; it would just be produced and disappear.
You do not need to create a loop for this, pandas will do it for you:
def goal_diff(matches):
    return matches['home_team_goal'] - matches['away_team_goal']
match_all['home_away_goal_diff'] = goal_diff(match_all)
This function takes an input df and uses the columns 'home_team_goal' and 'away_team_goal' to calculate the difference. You also don't need a function for this. If you wanted to create a new column in your existing match_all df you could do this:
match_all['home_away_goal_diff'] = match_all['home_team_goal'] - match_all['away_team_goal']
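A short sketch on made-up goal data that reproduces the original error and then adds the column the vectorized way. Only the two goal columns are included; the other match_all columns are omitted for brevity.

```python
import pandas as pd

match_all = pd.DataFrame({
    "home_team_goal": [1, 3, 0],
    "away_team_goal": [1, 0, 2],
})

# Iterating over a DataFrame yields its column labels (strings), not rows
labels = [col for col in match_all]

# What i['home_team_goal'] did inside the original loop: index a str with a str
error_message = None
try:
    labels[0]["home_team_goal"]
except TypeError as exc:
    error_message = str(exc)

# Vectorized: subtract whole columns at once, no loop needed
match_all["home_away_goal_diff"] = match_all["home_team_goal"] - match_all["away_team_goal"]
print(match_all)
```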
