Command to print top 10 rows of python pandas dataframe without index?

head() prints the indexes.
dataframe.to_string(index=False,max_rows=10) prints the first 5 and last 5 rows.

You should try this:
print(df.head(n=10).to_string(index=False))
This works because df.head returns a DataFrame object, so you can call its to_string method and get rid of the index.
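A minimal sketch of that call, assuming a made-up 25-row frame (the column names city and value are purely illustrative and not from the question):

```python
import pandas as pd

# Hypothetical sample data; any DataFrame behaves the same way.
df = pd.DataFrame({"city": [f"city_{i}" for i in range(25)],
                   "value": range(25)})

# head(10) returns a new DataFrame with the first 10 rows;
# to_string(index=False) renders it as text without the index column.
text = df.head(10).to_string(index=False)
print(text)

# The text holds one header line plus ten data rows.
line_count = len(text.split("\n"))
```

Because head(10) trims the frame before it is rendered, the max_rows truncation (first 5 and last 5 rows) never comes into play.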

If you like complicated solutions, you may use
[print(row) for row in df.head().to_string(index=False).split("\n")]
The explanation:
df.head().to_string(index=False) returns a string with "\n" as row delimiters,
split() method then returns a list of single rows,
[print(row) for row in ...] then prints every row.
It was a joke, of course, albeit giving you the desired result. Printing it as a whole string will give the same result (as every row ends with "\n"):
print(df.head().to_string(index=False))
If you work in a Jupyter Notebook, you can use a nicer command:
df.head().style.hide(axis="index")
(On pandas older than 1.4 the equivalent call is df.head().style.hide_index(); that method was deprecated in 1.4 and removed in 2.0.)
Be careful!
No print(...), no df = .... The returned object is an instance of the Styler class, not a DataFrame.
Jupyter Notebook automatically calls its ._repr_html_() method to render (display) your table.
See my other answer for details.

Related

Remove spaces from strings in pandas DataFrame not working

Trying to remove spaces from a column of strings in pandas dataframe. Successfully did it using this method in other section of code.
for index, row in summ.iterrows():
    row['TeamName'] = row['TeamName'].replace(" ", "")
summ.head() shows no change made to the column of strings after this operation, however no error as well.
I have no idea why this issue is happening considering I used this exact same method later in the code and accomplished the task successfully.
Why not use str.replace:
summ["TeamName"] = summ["TeamName"].str.replace(" ", "", regex=False)
I may be proven wrong here, but I suspect it's because you are modifying rows while iterating over them, and each row may be a copy, so the change never reaches the original data. The pandas.DataFrame.iterrows documentation says:
"You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect."
Just a thought; hope that helps.
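A small sketch of the difference, assuming a made-up summ frame (the Wins column is hypothetical and only there to give the frame mixed dtypes, the case in which iterrows() hands back copies):

```python
import pandas as pd

# Hypothetical data; with mixed column dtypes, iterrows() builds a new Series per row.
summ = pd.DataFrame({"TeamName": ["Red Sox", "Blue Jays"], "Wins": [10, 12]})

# Writing to `row` only changes the temporary Series, not `summ`.
for index, row in summ.iterrows():
    row["TeamName"] = row["TeamName"].replace(" ", "")
after_loop = summ["TeamName"].tolist()

# The vectorized string method assigns its result back to the column.
summ["TeamName"] = summ["TeamName"].str.replace(" ", "", regex=False)
after_str_replace = summ["TeamName"].tolist()

print(after_loop)
print(after_str_replace)
```

Whether the loop appears to work in some other part of a program depends on the column dtypes there, which is probably why the same pattern succeeded elsewhere.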

Python Pandas print Dataframe.describe() in default format in JupyterNotebook

Output format without print function
If I run a JupyterNotebook cell with
dataframe.describe()
a nicely formatted table is displayed, like this:
[Screenshot: VSCode Jupyter Notebook, dataframe.describe() alone in a cell, rendered as a table]
Output format with print function
If I run a cell with more than one line of code, dataframe.describe() does not print anything. Therefore I need to call
print(dataframe.describe()).
This leads to totally different formatting, though:
[Screenshot: VSCode Jupyter Notebook, dataframe.describe() output via the print function, shown as plain text]
Is there a way to print dataframe.describe() in the first format?
There are several things to say here:
1) Jupyter Notebook only renders one object per cell when you simply refer to it by name (e.g. dataframe): the value of the cell's last expression. If you want, you can use one cell per command to get the right format.
2) If you use print, the object is printed as plain text, because print converts whatever it receives with str(); for a DataFrame this gives the same kind of plain-text table as to_string(). This is ordinary Python behaviour; in contrast, option 1) is Jupyter-specific.
3) If you don't want to use a separate cell and still want the right formatting, there are several options; one is:
from IPython.display import display
display(dataframe.describe())
I assume you are using the same jupyter notebook files in both environments.
If that is the case, the problem is related to the order of the statements in the notebook cell: the output of a code cell is the value of the last line executed in that cell.
Let me illustrate with an example.
Given the following data for a pandas dataframe:
import pandas as pd
data = {'id':[1,2,3,4],'nome':['Paolo','Pietro','Giovanni','Maria'],'spesa':[23.4,34.5,99.2,50.1]}
The output of the following two cells differs:
# Outputs the dataframe itself
dataframe1 = pd.DataFrame(data)
dataframe1.describe()
dataframe1
# Outputs the describe() function return value
dataframe1 = pd.DataFrame(data)
dataframe1
dataframe1.describe()
Both cells evaluate the same two expressions on the dataframe without changing its internal state; however, only the value of the last line is written to the cell output.

Content info of 'row' in ( E.g for index, row in df.iterrows() ) for pandas

There is this code commonly used in python pandas "for index, row in df.iterrows()".
What is the difference between displaying these during the loop:
print(row)
print(row.index)
print(row.index[index])
print(row[index])
I tried printing them and can't comprehend what each one does or how it selects the content, and I can't find a well-explained source online.
For one, iterrows() is a concise way to loop over rows.
Secondly, you're only supposed to use it for displaying data rather than modifying it. According to the docs you may get unpredictable results (a concurrent modification thing, methinks).
As to how it selects the content, the docs also say it just returns the individual rows as pd.Series, with index being the label pandas uses to keep track of each row in the pd.DataFrame. I'd guess it's akin to using Python's zip() function on a list of int [0..n] and a list of pd.Series.
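To make the four expressions from the question concrete, here is a small sketch on a made-up two-row frame (the name and age columns are hypothetical). It uses row.iloc[index] where the question has row[index], because, as far as I know, recent pandas versions deprecate treating an integer key as a position when the Series is labelled by strings:

```python
import pandas as pd

# Hypothetical two-row frame with a default integer index (0, 1).
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]})

for index, row in df.iterrows():
    print(index)             # the row label: 0, then 1
    print(row)               # a Series with this row's values, labelled by column name
    print(row.index)         # the labels of that Series, i.e. the column names
    print(row.index[index])  # the column name at position `index`
    print(row.iloc[index])   # the value at position `index` within the row

# The same facts for the first row, stored for inspection.
first_index, first_row = next(df.iterrows())
row_labels = list(first_row.index)
```

Note that row.index[index] and row.iloc[index] only work here because the row labels happen to be small integers; with more rows than columns they would raise an IndexError.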

Why does jupyter sometimes print a DataFrame formatted and sometimes as text?

I have the following code in a Jupyter notebook:
# (1) How to add a new column?
test_csv['aggregator'] = None
print (test_csv)
# (2) How to remove a column?
test_csv.drop(columns=['platform'])
It prints the following:
Why is the second statement formatted tabularly (without a print statement) whereas the first one is just text data? Is there a way to force print-format the DataFrame with the nicely-formatted table applied?
Ran Cohen has already mentioned the "why" part of the print function destroying the nice formatting.
To get a nicely formatted dataframe (tabular format with grey and white colors), you can leave it at the end of the cell, but this works only for a single dataframe.
In case one wants to print multiple nicely-formatted dataframes one can use
1)
from IPython.display import display
display(df1) #displays nicely formatted dataframe1
display(df2) #displays nicely formatted dataframe2
OR
2)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
df1
df2
#displays both dataframes, nicely formatted
Note: The nice formatting is Jupyter Notebook rendering (styling and displaying) the HTML version of the dataframe, the same way HTML code is rendered by browsers. The HTML obtained from df1.to_html() looks plainer when rendered standalone in a browser, because Jupyter Notebook adds some extra styling of its own.
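The two representations can also be inspected outside a notebook. A rough sketch, assuming a small made-up frame; _repr_html_ is the hook that notebook front ends look up on displayed objects, and it returns HTML with pandas' default display options:

```python
import pandas as pd

# Hypothetical sample frame.
df = pd.DataFrame({"id": [1, 2, 3], "spesa": [23.4, 34.5, 99.2]})

# What print() shows: the plain-text representation.
as_text = str(df)

# What a notebook front end asks for: an HTML table.
as_html = df._repr_html_()

# A bare HTML table, without the styling a notebook adds on top.
plain_html = df.to_html()

print(as_text)
print(as_html[:60])
```

display(df) from IPython.display asks for the rich representation explicitly, which is why it keeps the table look even in the middle of a cell.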
If you use the print function, the dataframe is printed as text, because print converts any object it gets to a string with str(), which for a DataFrame gives the to_string-style text output.
When you leave the dataframe at the end of the cell, it is shown as a table, because rendering the value of the last expression is one of the things Jupyter does.
The call test_csv.drop(columns=['platform']) returns a new dataframe.
If you want the drop to take effect in the dataframe, you have to use inplace=True or reassign the result, e.g. df = df.drop(columns=[col]),
and then print the dataframe.
test_csv.drop(columns=['platform']) does not modify test_csv. It returns a new dataframe without the column, and Jupyter displays that returned value because it is the last expression in the cell.
To actually drop the column:
test_csv.drop(columns=['platform'], inplace=True)
OR
test_csv.drop('platform', axis=1, inplace=True)
test_csv['aggregator'] = None changes the state of the dataframe by assigning a new column to it. Hence, it does not print anything.
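A short sketch of that behaviour, using a made-up stand-in for test_csv (the clicks column and the values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for test_csv.
test_csv = pd.DataFrame({"platform": ["web", "ios"], "clicks": [3, 5]})

# drop() returns a new DataFrame; test_csv itself still has the column.
dropped = test_csv.drop(columns=["platform"])
columns_before = list(test_csv.columns)
columns_of_result = list(dropped.columns)

# Reassigning the result (or passing inplace=True) makes the change stick.
test_csv = test_csv.drop(columns=["platform"])
columns_after = list(test_csv.columns)

print(columns_before, columns_of_result, columns_after)
```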
I think you are running both the statements in the same cell of the Jupyter Notebook
You can run the below snippet in one cell
# (1) How to add a new column?
test_csv['aggregator'] = None
test_csv # no need to use print
and in the other cell
# (2) How to remove a column?
test_csv.drop(columns=['platform'])

Python Type-error: string indices must be integers, creating a new column using existing columns in a data frame

I am trying to create an additional custom column using existing column of a data-frame, however the function I am using throws the type error while execution. I am very new to python, can someone please help.
The dataframe used is as below
match_all = match[['country_id','league_id','season','stage','date',
'home_team_api_id','away_team_api_id','home_team_goal','away_team_goal']]
And the function I am using is as below
def goal_diff(matches):
    for i in matches:
        i['home_team_goal']-i['away_team_goal']
goal_diff(match_all)
The reason your function did not work is that matches in your function is a dataframe. When you do:
for i in matches:
    print(i)
you will see that the column names of your df are returned. This is how a for loop operates on a df. So in your function, when you use i in your subtraction:
i['home_team_goal'] - i['away_team_goal']
it is like doing
'country_id'['home_team_goal'] - 'country_id'['away_team_goal']
'league_id'['home_team_goal'] - 'league_id'['away_team_goal']
...
Here i is a string (a column name), and indexing a string with another string is what raises "string indices must be integers". What you actually want is to select the columns from the df itself:
matches['home_team_goal'] - matches['away_team_goal']
Remember, matches is your function's input df. Lastly, in your for loop you are neither returning nor storing any value; you are just subtracting two columns. In an interactive session you might see something printed to the screen, but in the future you will probably want to use these values in the next step of your code. So in a function, we use a return statement to have the function actually give us values when we call it.
In your case, if I wrote the function below without the return statement and then called it on my dataframe, the operation would complete but no value would be "returned" to me; it would just be produced and disappear.
Pre-edit answer.
You do not need to create a loop for this, pandas will do it for you:
def goal_diff(matches):
    return matches['home_team_goal'] - matches['away_team_goal']
match_all['home_away_goal_diff'] = goal_diff(match_all)
This function takes an input df and uses the columns 'home_team_goal' and 'away_team_goal' to calculate the difference. You also don't need a function for this. If you wanted to create a new column in your existing match_all df you could do this:
match_all['home_away_goal_diff'] = match_all['home_team_goal'] - match_all['away_team_goal']
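Putting the pieces together, a runnable sketch with made-up match data (the values are hypothetical, and the .copy() is my own addition so that adding a column to the subset does not trigger a SettingWithCopyWarning):

```python
import pandas as pd

# Hypothetical subset of the match table.
match = pd.DataFrame({
    "country_id": [1, 1, 2],
    "home_team_goal": [2, 0, 3],
    "away_team_goal": [1, 0, 4],
})
match_all = match[["country_id", "home_team_goal", "away_team_goal"]].copy()

# Iterating over a DataFrame yields its column names (strings).
columns_seen = [i for i in match_all]

# Indexing one of those strings with a string reproduces the error.
try:
    columns_seen[0]["home_team_goal"]
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

def goal_diff(matches):
    # Column-wise subtraction; pandas lines the rows up by index.
    return matches["home_team_goal"] - matches["away_team_goal"]

match_all["home_away_goal_diff"] = goal_diff(match_all)
diffs = match_all["home_away_goal_diff"].tolist()
print(error_message)
print(diffs)
```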
