Read excel file in spyder but some data missing - python

I am new to Python and am trying to read my Excel file in Spyder (Anaconda). However, when I run the code, some rows are missing and replaced with '...'. I have seven columns and 100 rows in my Excel file. The column arrangement is also quite weird.
This is my code:
import pandas as pd
print(" Comparing within 100 Airline \n\n")
def view():
    airlines = pd.ExcelFile('Airline_final.xlsx')
    df1 = pd.read_excel("Airline_final.xlsx",sheet_name=2)
    print("\n\n 1: list of all Airlines \n")
    print(df1)
view()
Here is what I get:
18 #051 Cubana Cuba
19 #003 Aigle Azur France
20 #011 Air Corsica France
21 #012 Air France France
22 #019 Air Mediterranee France
23 #050 Corsair France
24 #072 HOP France
25 #087 Joon France
26 #006 Air Berlin Germany
27 #049 Condor Flugdienst Germany
28 #057 Eurowings Germany
29 #064 Germania Germany
.. ... ... ...
70 #018 Air Mandalay Myanmar
71 #020 Air KBZ Myanmar
72 #067 Golden Myanmar Airlines Myanmar
73 #017 Air Koryo North Korea
74 #080 Jetstar Asia Singapore
75 #036 Binter Canarias Spain
76 #040 Canaryfly Spain
77 #073 Iberia and Iberia Express Spain

The '...' rows are only pandas shortening the printed output; the data is still in the DataFrame. To print the whole DataFrame, use:
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df1)
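As a minimal sketch of the same idea on made-up data (the column names and the 100-row frame below are illustrative assumptions, not the asker's file), the truncation can be reproduced and then lifted. The `display.width` setting is added on the assumption that the odd column arrangement comes from line wrapping, and `to_string()` is an alternative that ignores the row limit.

```python
import pandas as pd

# Illustrative stand-in for the Excel sheet: 100 rows of made-up values.
df1 = pd.DataFrame({
    "code": [f"#{i:03d}" for i in range(100)],
    "airline": [f"Airline {i}" for i in range(100)],
})

# With a small row limit, pandas prints only the head and tail
# and puts '...' in between.
with pd.option_context("display.max_rows", 10):
    truncated = repr(df1)

# Lifting the limits shows every row; a wide display.width
# reduces the wrapping of columns onto separate blocks.
with pd.option_context("display.max_rows", None,
                       "display.max_columns", None,
                       "display.width", 200):
    full = repr(df1)

# to_string() renders the whole frame regardless of display.max_rows.
full_text = df1.to_string()

# The frame itself always holds all rows; only the printout was shortened.
print(len(df1))
```

If the full output is needed often, `pd.set_option('display.max_rows', None)` changes the setting for the whole session instead of a single `with` block.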

Related

Unable to do web scraping from URL using Python Alchemy

I have a script that tries to scrape data from a web page into a table, but I'm getting this error:
raise exc.with_traceback(traceback)
ValueError: No tables found
Script:
import pandas as pd
import logging
from sqlalchemy import create_engine
from urllib.parse import quote
# connection URL (placeholders)
db_connection = {mysql}://{username}:{quote'pwd'}#{DB:port}
ds_connection = create_engine(db_connection)
a = pd.read_html("https://www.centralbank.ae/en/forex-eibor/exchange-rates/")
df = pd.DataFrame(a[0])
df_final = df.loc[:,['Currency','Rate']]
df_final.to_sql('rate_table',db_connection,if_exists='append',index=False)
Can anyone suggest a fix?
One easy way to obtain those exchange rates is to call the endpoint that the page itself uses to load the data (check the Network tab in your browser's Dev Tools):
import pandas as pd
import requests
from io import StringIO
from bs4 import BeautifulSoup
headers = {'Accept-Language': 'en-US,en;q=0.9',
           'Referer': 'https://www.centralbank.ae/en/forex-eibor/exchange-rates/'
           }
r = requests.post('https://www.centralbank.ae/umbraco/Surface/Exchange/GetExchangeRateAllCurrency', headers=headers)
# wrap the response text in StringIO; newer pandas versions warn about literal HTML strings
dfs = pd.read_html(StringIO(r.text))
print(dfs[0].loc[:,['Currency','Rates']])
This returns:
    Currency                 Rates
0   US Dollar                3.6725
1   Argentine Peso           0.026993
2   Australian Dollar        2.52753
3   Bangladesh Taka          0.038508
4   Bahrani Dinar            9.74293
5   Brunei Dollar            2.64095
6   Brazilian Real           0.706549
7   Botswana Pula            0.287552
8   Belarus Rouble           1.45526
9   Canadian Dollar          2.82565
10  Swiss Franc              3.83311
11  Chilean Peso             0.003884
12  Chinese Yuan - Offshore  0.536978
13  Chinese Yuan             0.538829
14  Colombian Peso           0.000832
15  Czech Koruna             0.149763
16  Danish Krone             0.496304
17  Algerian Dinar           0.025944
18  Egypt Pound              0.191775
19  Euro                     3.69096
20  GB Pound                 4.34256
21  Hongkong Dollar          0.468079
22  Hungarian Forint         0.009112
23  Indonesia Rupiah         0.000248
24  Indian Rupee             0.045976
25  Iceland Krona            0.026232
26  Jordan Dinar             5.17472
27  Japanese Yen             0.026818
28  Kenya Shilling           0.030681
29  Korean Won               0.002746
30  Kuwaiti Dinar            11.9423
31  Kazakhstan Tenge         0.007704
32  Lebanon Pound            0.002418
33  Sri Lanka Rupee          0.010201
34  Moroccan Dirham          0.353346
35  Macedonia Denar          0.059901
36  Mexican Peso             0.181874
37  Malaysia Ringgit         0.820395
38  Nigerian Naira           0.008737
39  Norwegian Krone          0.37486
40  NewZealand Dollar        2.27287
41  Omani Rial               9.53921
42  Peru Sol                 0.952659
43  Philippine Piso          0.065562
44  Pakistan Rupee           0.017077
45  Polish Zloty             0.777446
46  Qatari Riyal             1.00254
47  Serbian Dinar            0.031445
48  Russia Rouble            0.06178
49  Saudi Riyal              0.977847
50  Sudanese Pound           0.006479
51  Swedish Krona            0.347245
52  Singapore Dollar         2.64038
53  Thai Baht                0.102612
54  Tunisian Dinar           1.1505
55  Turkish Lira             0.20272
56  Trin Tob Dollar          0.541411
57  Taiwan Dollar            0.121961
58  Tanzania Shilling        0.001575
59  Uganda Shilling          0.000959
60  Vietnam Dong             0.000157
61  Yemen Rial               0.01468
62  South Africa Rand        0.216405
63  Zambian Kwacha           0.227752
64  Azerbaijan manat         2.16157
65  Bulgarian lev            1.8873
66  Croatian kuna            0.491344
67  Ethiopian birr           0.069656
68  Iraqi dinar              0.002516
69  Israeli new shekel       1.12309
70  Libyan dinar             0.752115
71  Mauritian rupee          0.079837
72  Romanian leu             0.755612
73  Syrian pound             0.001462
74  Turkmen manat            1.05079
75  Uzbekistani som          0.000336
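One plausible reading of the original `No tables found` error (an assumption, since the page's markup is not shown here) is that the HTML returned for the page URL contains no `<table>` element, because the rates are loaded afterwards from a separate endpoint. The sketch below uses two made-up HTML strings to show how `pd.read_html` behaves in each case; `first_table` is a hypothetical helper, and `read_html` needs a parser such as lxml (or bs4 with html5lib) installed.

```python
from io import StringIO

import pandas as pd

# Made-up markup: a page shell with an empty placeholder and no <table>.
page_shell = "<html><body><div id='rates'></div></body></html>"

# Made-up markup: the kind of fragment a data endpoint might return.
table_fragment = (
    "<table>"
    "<tr><th>Currency</th><th>Rates</th></tr>"
    "<tr><td>US Dollar</td><td>3.6725</td></tr>"
    "</table>"
)


def first_table(html):
    """Return the first table found in html, or None if there is none."""
    try:
        return pd.read_html(StringIO(html))[0]
    except ValueError:
        # pandas raises ValueError("No tables found") when no <table> exists
        return None


missing = first_table(page_shell)
rates = first_table(table_fragment)
print(missing)
print(rates)
```

Checking for `None` before selecting columns makes it easier to tell a missing table apart from a wrong column name such as 'Rate' versus 'Rates'.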

Panda: how to assign colors to specific values on a pie chart? [duplicate]

I am working from a DataFrame called plot_df that looks like this:
Country Visual Format $
0 France DEFAULT 4.378900e+03
1 France DIGITAL3D 1.170000e+02
2 France IMAX3D 0.000000e+00
3 Hong Kong DIGITAL 1.061189e+07
4 Hong Kong DIGITAL3D 1.881850e+05
5 India DBOX 1.137234e+06
6 India DIGIMAX 2.653723e+06
7 India DIGITAL 3.283665e+07
8 Japan DEFAULT 5.819080e+07
9 Japan DIGIMAX 8.193800e+06
10 Kuwait DEFAULT 6.130250e+04
11 Kuwait DIGITAL3D 1.099000e+03
12 Kuwait IMAX3D 1.057550e+04
13 Kuwait MXP3D 8.736000e+03
14 Malaysia DIGIMAX 2.941200e+04
15 Malaysia DIGITAL 2.590491e+06
16 Malaysia MXP2D 9.478000e+03
17 Mexico 4D3D 3.806130e+06
18 Mexico DIGIMAX3D 0.000000e+00
19 Mexico DIGITAL 3.631979e+07
20 Mexico DIGITAL3D 7.510887e+06
21 Netherlands, The 4D3D 4.435451e+04
22 Netherlands, The DIGIMAX3D 7.488704e+04
23 Netherlands, The DIGITAL 3.350028e+04
24 Netherlands, The DIGITAL3D 2.521642e+05
25 Netherlands, The MXP3D 3.298899e+04
26 Peru DIGITAL 1.707998e+06
27 Peru DIGITAL3D 1.030680e+05
28 Peru MXP2D 3.961500e+04
29 Peru MXP3D 4.077950e+04
30 Peru PLF 1.310630e+05
31 Spain DIGIMAX3D 7.717070e+03
32 Spain DIGITAL 5.198949e+05
33 Spain DIGITAL3D 2.494451e+04
34 Spain MXP3D 1.025880e+04
35 Thailand DIGITAL 3.217920e+05
36 Turkey 4D3D 5.433525e+04
37 Turkey DIGITAL 2.693310e+05
38 Turkey DIGITAL3D 6.161560e+05
39 Turkey MXP3D 4.168149e+04
40 UK & Ireland DEFAULT 1.170058e+06
41 UK & Ireland DIGITAL3D 1.755717e+05
42 UK & Ireland IMAX3D 1.065599e+05
43 United Arab Emirates DEFAULT 4.317666e+06
44 United Arab Emirates DIGITAL3D 2.808751e+04
45 United Arab Emirates IMAX3D 6.832500e+04
I am trying to create a variable number of pie chart subplots, one per country. This is my code so far:
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=int(np.ceil(plot_df.index.get_level_values(0).nunique()/3)),
                         ncols=3,
                         figsize=(15,15))
fig.tight_layout()
axes_list = [item for sublist in axes for item in sublist]
for country in plot_df.index.get_level_values(0).unique():
    ax = axes_list.pop(0)
    plot_df.loc[(country, slice(None))].plot(kind='pie',
                                             subplots=True,
                                             legend=False,
                                             autopct='%1.1f%%',
                                             ax=ax)
    ax.set_title(country, fontweight='bold')
    ax.tick_params(
        bottom=False
    )
    ax.set_ylabel(ylabel=None)
for ax in axes_list:
    ax.remove()
My end result will look something like this:
My question has to do with the colors assigned to each visual format. Every country has a different set of formats and this leads to inconsistent assignment of colors to labels. (For example, DIGITAL is BLUE in Hong Kong but is GREEN in India).
Is there a way to create a dictionary, with visual formats as keys and colors as values, and assign this dictionary to the pandas plot color parameter? Thanks.
You can use the colors parameter for pie charts. Since this takes an array, you'll have to create an array that corresponds to your input data for each plot.
cdict = {'DIGITAL': 'r', 'DIGIMAX3D': 'y', 'DIGITAL3D': 'b', ...}
for country in plot_df.index.get_level_values(0).unique():
    ax = axes_list.pop(0)
    df = plot_df.loc[(country, slice(None))]
    colors = [cdict[x] for x in df.index]  # colors based on index of input data
    df.plot(kind='pie', colors=colors, subplots=True, legend=False, autopct='%1.1f%%', ax=ax)
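A self-contained sketch of the dictionary approach, under a few assumptions: it uses a handful of rows from the question with plain columns rather than a MultiIndex, builds the color dictionary automatically from matplotlib's `tab10` palette instead of hand-picked letters, and calls `ax.pie` directly rather than the pandas plot wrapper. The `wedge_colors` dictionary exists only to make the result easy to inspect.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A few rows taken from the question's data, as plain columns.
plot_df = pd.DataFrame({
    "Country": ["Hong Kong", "Hong Kong", "India", "India", "India"],
    "Visual Format": ["DIGITAL", "DIGITAL3D", "DBOX", "DIGIMAX", "DIGITAL"],
    "$": [1.061189e+07, 1.881850e+05, 1.137234e+06, 2.653723e+06, 3.283665e+07],
})

# One fixed color per visual format, shared by every chart.
formats = sorted(plot_df["Visual Format"].unique())
palette = plt.get_cmap("tab10")
cdict = {fmt: palette(i % 10) for i, fmt in enumerate(formats)}

countries = plot_df["Country"].unique()
fig, axes = plt.subplots(1, len(countries), figsize=(8, 4))

wedge_colors = {}  # country -> {format: color}, kept for inspection
for ax, country in zip(axes, countries):
    sub = plot_df[plot_df["Country"] == country]
    # Look up each wedge's color by its label, in the order the wedges are drawn.
    colors = [cdict[fmt] for fmt in sub["Visual Format"]]
    ax.pie(sub["$"], labels=sub["Visual Format"], colors=colors, autopct="%1.1f%%")
    ax.set_title(country, fontweight="bold")
    wedge_colors[country] = dict(zip(sub["Visual Format"], colors))
```

Because the color is looked up by label rather than by position, DIGITAL gets the same color in the Hong Kong and India charts. With more than ten formats the `tab10` palette would repeat, so a larger palette would be needed.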
Another thing you could do is to reshape the data and plot with pandas:
(df['$'].unstack(0,fill_value=0)
    .plot.pie(subplots=True, layout=(4,4), figsize=(12,12))
);
Output:

Filter and drop rows by proportion python

I have a DataFrame called wine that contains a bunch of rows I need to drop.
How do I drop all rows whose 'country' accounts for less than 1% of the whole?
Here are the proportions:
#proportion of wine countries in the data set
wine.country.value_counts() / len(wine.country)
US 0.382384
France 0.153514
Italy 0.100118
Spain 0.070780
Portugal 0.062186
Chile 0.056742
Argentina 0.042835
Austria 0.034767
Germany 0.028928
Australia 0.021434
South Africa 0.010233
New Zealand 0.009069
Israel 0.006133
Greece 0.004493
Canada 0.002526
Hungary 0.001755
Romania 0.001558
...
I got lazy and didn't include all of the results, but I think you catch my drift. I need to drop all rows with proportions less than .01
Here is the head of my dataframe:
country designation points price province taster_name variety year price_category
Portugal Avidagos 87 15.0 Douro Roger Voss Portuguese Red 2011.0 low
You can use something like this (assuming the proportions are stored in a column named proportion of a DataFrame df):
df = df[df.proportion >= .01]
From that dataset it should give you something like this:
US 0.382384
France 0.153514
Italy 0.100118
Spain 0.070780
Portugal 0.062186
Chile 0.056742
Argentina 0.042835
Austria 0.034767
Germany 0.028928
Australia 0.021434
South Africa 0.010233
Figured it out:
country_filter = wine.country.value_counts(normalize=True) > 0.01
country_index = country_filter[country_filter.values == True].index
wine = wine[wine.country.isin(list(country_index))]
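A more compact variant of the same filter, sketched on a made-up frame (the row counts below are illustrative, not the real dataset): map each row's country to that country's share of all rows, then keep the rows whose share reaches the threshold. The names `share` and `filtered` are introduced here only for illustration.

```python
import pandas as pd

# Made-up data: 200 rows, so one Romania row is a 0.5% share.
wine = pd.DataFrame({
    "country": ["US"] * 150 + ["France"] * 49 + ["Romania"] * 1,
})

# Share of the whole for each country, looked up row by row.
share = wine["country"].map(wine["country"].value_counts(normalize=True))

# Keep only rows whose country makes up at least 1% of the data.
filtered = wine[share >= 0.01]

print(filtered["country"].value_counts())
```

Note that this sketch uses `>=`, so a country at exactly 1% is kept, whereas the `> 0.01` comparison above would drop it.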

Python Pandas adding column values based on condition

I have a DataFrame (df) with following values:
Title
fintech_countries
US 60
UK 54
India 28
Australia 25
Germany 13
Singapore 11
Canada 10
I want to add all the countries with values < 25, and show them as 'Others' with their sum (34).
I have created a column name for countries through the following code:
df1 = df.rename_axis('fintech_countries').rename_axis("countries", axis="columns" , inplace=True)
countries Title
fintech_countries
US 60
UK 54
India 28
Australia 25
Germany 13
Singapore 11
Canada 10
Now, I have tried the following code based on another query on StackOverflow:
df1.loc[df1['Title'] < 25, "countries"].sum()
but am getting the following error:
KeyError: 'the label [countries] is not in the [columns]'
Can someone please help? I need the final output as:
countries Title
fintech_countries
US 60
UK 54
India 28
Australia 25
Others 34
TIA
Solution with loc for setting with enlargement and filtering by boolean indexing:
mask = df['Title'] < 25
print (mask)
fintech_countries
US False
UK False
India False
Australia False
Germany True
Singapore True
Canada True
Name: Title, dtype: bool
df1 = df[~mask].copy()
df1.loc['Others', 'Title'] = df.loc[mask, 'Title'].sum()
df1.Title = df1.Title.astype(int)
print (df1)
countries Title
fintech_countries
US 60
UK 54
India 28
Australia 25
Others 34
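The same steps can be wrapped in a small function for reuse. This is only a sketch: `lump_small` and its parameters are hypothetical names, and the frame is rebuilt from the values in the question.

```python
import pandas as pd


def lump_small(df, column, threshold, label="Others"):
    """Replace rows whose value is below threshold with one summed row."""
    mask = df[column] < threshold
    if not mask.any():
        # Nothing falls below the threshold, so no extra row is added.
        return df.copy()
    out = df[~mask].copy()
    # Setting with enlargement: .loc with a new label appends a row.
    out.loc[label, column] = df.loc[mask, column].sum()
    # Enlargement may upcast to float, so restore the original dtype.
    out[column] = out[column].astype(df[column].dtype)
    return out


df = pd.DataFrame(
    {"Title": [60, 54, 28, 25, 13, 11, 10]},
    index=pd.Index(
        ["US", "UK", "India", "Australia", "Germany", "Singapore", "Canada"],
        name="fintech_countries",
    ),
)

result = lump_small(df, "Title", 25)
print(result)
```

The comparison is strict, so Australia (exactly 25) stays as its own row, matching the desired output in the question.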
