Newline created after each iteration - python

I'm using tqdm combined with Pandas on Jupyter notebooks.
I have a Pandas dataframe df.
When I use df.progress_apply, a new line is printed on every update instead of a single bar being updated in place.
This is what I currently do:
from tqdm import tqdm
tqdm.pandas(desc="Computing MONTH...")
df["MONTH"] = df.progress_apply(compute_month, axis=1)
My question is not a duplicate of this question: tqdm in Jupyter Notebook, because the answer there is to use tqdm_notebook instead of tqdm.
I can't use tqdm_notebook because I need df.progress_apply.
I can't reproduce this issue in a minimal example because my code is too heavy.
Here is a related GitHub issue, but it didn't help me: https://github.com/tqdm/tqdm/issues/375

You should do something like this:
from tqdm import tqdm_notebook
# Call pandas() on the class itself; this avoids creating an unused progress-bar instance
tqdm_notebook.pandas()
And that will do it!
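A minimal sketch of the same fix with the current tqdm API, assuming a reasonably recent tqdm release: there, tqdm_notebook is deprecated in favour of tqdm.notebook.tqdm, and tqdm.auto.tqdm picks the notebook widget inside Jupyter (ipywidgets must be installed for the widget) and the plain text bar elsewhere. The DATE column and this compute_month are hypothetical stand-ins, since the question doesn't show the real function.

```python
import pandas as pd
from tqdm.auto import tqdm

# Adds progress_apply to pandas DataFrame and Series objects. Inside Jupyter,
# tqdm.auto uses the widget-based bar; elsewhere it falls back to the text bar.
tqdm.pandas(desc="Computing MONTH...")


def compute_month(row):
    # Hypothetical row function: the question's real compute_month is not shown.
    return row["DATE"].month


df = pd.DataFrame({"DATE": pd.to_datetime(["2021-01-15", "2021-03-02", "2021-12-31"])})
df["MONTH"] = df.progress_apply(compute_month, axis=1)
print(df)
```

Because the same import works in both environments, the notebook never ends up mixing a console bar with a widget bar.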

Related

Plotly Express doesn't load and refuses to connect

I have this simple program that should display a pie chart, but whenever I run it, it opens a page in Chrome that just keeps loading without displaying anything, and sometimes it refuses to connect. How do I solve this?
P.S.: I would like to use it offline, and I'm running it from cmd on Windows 10.
import pandas as pd
import numpy as np
from datetime import datetime
import plotly.express as px
def graph(dataframe):
    figure0 = px.pie(dataframe, values=dataframe['POPULATION'], names=dataframe['CONTINENT'])
    figure0.show()
df = pd.DataFrame({'POPULATION': [60, 17, 9, 13, 1], 'CONTINENT': ['Asia', 'Africa', 'Europe', 'Americas', 'Oceania']})
graph(df)
Disclaimer: I extracted this answer from the OP's question. Answers should not be contained in the question itself.
Answer provided by g_odim_3:
So instead of figure0.show(), I used figure0.write_html('first_figure.html', auto_open=True) and it worked:
import pandas as pd
import numpy as np
from datetime import datetime
import plotly.express as px
def graph(dataframe):
    figure0 = px.pie(dataframe, values=dataframe['POPULATION'], names=dataframe['CONTINENT'], title='Global Population')
    # figure0.show()
    figure0.write_html('first_figure.html', auto_open=True)
df = pd.DataFrame({'POPULATION': [60, 17, 9, 13, 1], 'CONTINENT': ['Asia', 'Africa', 'Europe', 'Americas', 'Oceania']})
graph(df)
I'm 99% sure that this is a version issue. It's been a long time since you needed an internet connection to build Plotly figures. Follow the instructions here on how to upgrade your system. I've tried your exact code on my end, and it produces the expected pie chart.
I'm on Plotly 5.2.2. Run import plotly and plotly.__version__ to check the version on your end.

Problem importing Excel data into a Jupyter notebook

I want to import Excel data into a Jupyter notebook with Python 3.7, but I get the following errors. Can anyone explain how to solve this problem? Thanks in advance for your response.
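Make sure the path to the Excel file is correct. Also, pay attention to the slashes: use '/', not '\'. In an ordinary Python string a backslash starts an escape sequence, so a literal like 'C:\Users' fails with a SyntaxError; alternatively, use a raw string such as r'C:\Users'.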
import pandas as pd
import os
my_path = 'C:/Users/user/Desktop/2nd Semester Research Topic'
df = pd.read_excel(os.path.join(my_path,'com.xlsx'))
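To illustrate the slash advice without needing the actual file, here is a standard-library-only sketch. It shows that a forward-slash path and a raw string with backslashes name the same Windows path, and that the same backslashes in an ordinary string literal do not even compile (the \U in C:\Users starts a Unicode escape). The folder name is the one from the answer; is_syntax_error is a hypothetical helper for the demonstration.

```python
from pathlib import PureWindowsPath

# Forward slashes and a raw string with backslashes describe the same path.
forward = PureWindowsPath('C:/Users/user/Desktop/2nd Semester Research Topic') / 'com.xlsx'
raw = PureWindowsPath(r'C:\Users\user\Desktop\2nd Semester Research Topic') / 'com.xlsx'


def is_syntax_error(source: str) -> bool:
    """Hypothetical helper: True if the given Python expression fails to compile."""
    try:
        compile(source, '<path literal>', 'eval')
    except SyntaxError:
        return True
    return False


# Source text of an ordinary (non-raw) literal containing single backslashes.
plain_literal = "'C:\\Users\\user\\Desktop'"

print(forward == raw)                      # both spellings are the same path
print(str(forward))                        # Windows form, with backslashes
print(is_syntax_error(plain_literal))      # '\U' is an invalid escape here
print(is_syntax_error('r' + plain_literal))  # the raw-string version compiles
```

On Windows, str(forward) can then be passed to pd.read_excel exactly as in the answer.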

For categorical data, how to show results in a table, instead of lists? (Jupyter Notebook / Python)

I'm an entry-level Python user who has just started teaching myself Python for data analytics. I'm currently practicing with a global suicide-rate dataset in a Jupyter Notebook on Kaggle, and I've run into problems formatting my results. How can I turn my results, which are several lists, into a well-formatted table?
For the following section of the code, I want to retrieve all countries present in min_year (1985) and in max_year (2016).
So what I expected as my output is something like this: (just an example)
Here is my code:
country_1985 = data[data['year'] == min_year].country.unique()
country_2016 = data[data['year'] == max_year].country.unique()
print([country_1985], [country_2016])
The result looks like this:
However, I don't want those in a list. I'd like it to be shown in a table format something like this:
I tried using pandas.DataFrame, but that didn't work either. Could anyone help me solve this?
Updated:
Thanks to @Code Pope for the code, and for the explanation and patience!
import pandas as pd
import numpy as np
country_1985 = data[data['year'] == min_year].country.unique()
country_2016 = data[data['year'] == max_year].country.unique()
country_1985 = pd.DataFrame(country_1985.categories)
country_2016 = pd.DataFrame(country_2016.categories)
# The following code is from @Code Pope
from IPython.display import display_html
def display_side_by_side(dataframe1, dataframe2):
    modified_HTML = dataframe1.to_html() + dataframe2.to_html()
    display_html(modified_HTML.replace('table', 'table style="display:inline"'), raw=True)
display_side_by_side(country_1985, country_2016)
Then it looks like this:
Updated Output
Since you're using Jupyter Notebook, you can modify the HTML of your DataFrames before displaying them. Use the following function:
from IPython.display import display_html
def display_side_by_side(dataframe1, dataframe2):
    modified_HTML = dataframe1.to_html() + dataframe2.to_html()
    display_html(modified_HTML.replace('table', 'table style="display:inline"'), raw=True)
# then call the function with your two DataFrames
display_side_by_side(country_1985, country_2016)
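If a single table is enough (rather than two tables shown side by side), a different approach is to put both arrays into one DataFrame. This is a sketch on hypothetical sample rows standing in for the Kaggle data; only the country and year columns are assumed. Wrapping each array in a pd.Series lets columns of different lengths share one table: pandas aligns them on their 0-based index and fills the shorter column with NaN.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle data: only 'country' and 'year' matter here.
data = pd.DataFrame({
    "country": ["Albania", "Austria", "Brazil", "Austria", "Brazil", "Chile", "Denmark"],
    "year": [1985, 1985, 1985, 2016, 2016, 2016, 2016],
})
min_year, max_year = data["year"].min(), data["year"].max()

country_1985 = data[data["year"] == min_year]["country"].unique()
country_2016 = data[data["year"] == max_year]["country"].unique()

# Each Series gets a 0..n-1 index; the DataFrame constructor aligns them
# and pads the shorter column with NaN.
table = pd.DataFrame({
    str(min_year): pd.Series(country_1985),
    str(max_year): pd.Series(country_2016),
})
print(table)
```

In Jupyter, leaving table (or table.fillna("")) as the last expression of a cell renders it as an HTML table.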

python - exporting multi-index pandas dataframe to excel

I'm trying the following example from this (closed) GitHub issue: https://github.com/pandas-dev/pandas/issues/2701
import pandas as pd
m = pd.MultiIndex.from_tuples([(1,1),(1,2)], names=['a','b'])
df = pd.DataFrame([[1,2],[3,4]], columns=m)
df.to_excel('test.xls')
When I open test.xls, there is a blank line on row 3:
The example image from GitHub doesn't have this blank line:
Is this a bug? And are there any workarounds for writing MultiIndex DataFrames to Excel? I'd rather not go the CSV route, as pandas does the merge-and-center for me.
Using pandas version 0.19.2 on Ubuntu 14.04 and Windows 10.
I can reproduce what you describe. This is most likely a bug.
There's no easy way around it other than deleting that row after reading the file back in. Please add this to the closed GitHub thread and ask for it to be reopened.
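A minimal sketch of that workaround (delete the blank row after writing), assuming openpyxl is installed. It writes .xlsx rather than .xls because pandas 2.0 removed the xlwt-based .xls writer. drop_first_blank_row is a hypothetical helper, not part of pandas; it simply deletes the first row whose cells are all empty and does nothing if there is none.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook


def drop_first_blank_row(path, sheet_name=None):
    """Hypothetical helper: delete the first row whose cells are all empty.

    Saves the workbook in place and returns the 1-based row number that was
    removed, or None if the sheet had no completely empty row.
    """
    wb = load_workbook(path)
    ws = wb[sheet_name] if sheet_name else wb.active
    for idx, row in enumerate(ws.iter_rows(min_row=1, values_only=True), start=1):
        if all(cell is None for cell in row):
            ws.delete_rows(idx)
            wb.save(path)
            return idx
    return None


# The example from the question, written to .xlsx instead of .xls.
m = pd.MultiIndex.from_tuples([(1, 1), (1, 2)], names=["a", "b"])
df = pd.DataFrame([[1, 2], [3, 4]], columns=m)
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
df.to_excel(path)
removed = drop_first_blank_row(path)
print("removed row:", removed)
```

Because it removes the first empty row it finds, check the output if your own data could contain an entirely empty row.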

Unable to write my dataframe using feather (strided data not supported)

When using the feather package (http://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/) to write a simple 20x20 DataFrame, I keep getting an error stating that strided data isn't supported yet. I don't believe my data is strided (or out of the ordinary), and I can run the sample code given on the website, but I can't get it to work with my own data. Here is some sample code:
import feather
import numpy as np
import pandas as pd
tempArr = np.reshape(np.arange(400), (20, 20))
df = pd.DataFrame(tempArr)
feather.write_dataframe(df, 'test.feather')
The last line returns the following error:
FeatherError: Invalid: no support for strided data yet
I am running this on Ubuntu 14.04. Am I perhaps misunderstanding something about how pandas dataframes are stored?
Please report this on GitHub: https://github.com/wesm/feather/issues/97
Bug reports do not belong on Stack Overflow.
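The thread doesn't explain the error itself. My assumption, not confirmed above, is that "strided" refers to how each column sits in memory when a DataFrame is built from a single 2-D NumPy array: a column of a row-major array is a strided view, not a contiguous block. This NumPy-only sketch shows that difference; whether a column-major (Fortran-ordered) copy actually avoids the error depends on how your feather version reads pandas columns, so treat it as a hypothesis to test. The Feather format is now implemented in pyarrow, and pandas exposes it as DataFrame.to_feather.

```python
import numpy as np

# The 20x20 array from the question, stored row-major (C order) by default.
arr = np.reshape(np.arange(400), (20, 20))

col = arr[:, 0]
# A column of a C-ordered 2-D array skips a whole row between elements:
# its stride is 20 * itemsize bytes, so it is not contiguous.
print(col.strides, col.flags["C_CONTIGUOUS"])

# A column-major (Fortran-ordered) copy makes every column contiguous.
farr = np.asfortranarray(arr)
fcol = farr[:, 0]
print(fcol.strides, fcol.flags["C_CONTIGUOUS"])
```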
