Convert Flightradar24 API into pandas dataframe - python

I've tried to convert the flightradar24 API response, which gives a list of airlines, into a pandas DataFrame, without success. This is what I've done:
import flightradar24
import pandas as pd
fr = flightradar24.Api()
airlines = fr.get_airlines()
items = airlines.items()
list_items = list(items)
df = pd.DataFrame(list_items)
print(airlines)
print(df.head())
And this was the result:
0 1
0 version 1594656446
1 rows [{'Name': '21 Air', 'Code': '2I', 'ICAO': 'CSB...
That being said, could you please help me convert flightradar24 api into a pandas dataframe?
Thanks in advance.

This should do it. get_airlines() returns a dict with two keys, version and rows; the actual airline records are the list of dicts under rows, so flatten just that list:
fr = flightradar24.Api()
airlines = fr.get_airlines()
df = pd.json_normalize(airlines['rows'])
print(df)
Name Code ICAO
0 21 Air 2I CSB
1 40-Mile Air Q5 MLA
2 9 Air AQ JYH
3 ABX Air GB ABX
4 ACE Belgium Freighters X7 FRH
... ... ... ...
1337 Zambian Airways Q3 MBN
1338 Zanair B4 TAN
1339 Zimex Aviation XM IMX
1340 ZIPAIR ZG TZP
1341 Zorex ORZ
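Since each record under rows is already a flat dict, the plain DataFrame constructor should give the same result here; a minimal sketch (json_normalize only adds value when the records are nested):
import flightradar24
import pandas as pd

fr = flightradar24.Api()
airlines = fr.get_airlines()
# each record under 'rows' is a flat dict, so every key becomes a column
df = pd.DataFrame(airlines['rows'])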

Related

How to create a dataframe using a list of dictionaries that also consist of lists

I have a list of dictionaries that also consist of lists and would like to create a dataframe using this list. For example, the data looks like this:
lst = [{'France': [[12548, 'ABC'], [45681, 'DFG'], [45684, 'HJK']]},
       {'USA': [[84921, 'HJK'], [28917, 'KLESA']]},
       {'Japan': [[38292, 'ASF'], [48902, 'DSJ']]}]
And this is the dataframe I'm trying to create:
Country Amount Code
France 12548 ABC
France 45681 DFG
France 45684 HJK
USA 84921 HJK
USA 28917 KLESA
Japan 38292 ASF
Japan 48902 DSJ
As you can see, the dict keys become the values of the Country column, and the numbers and strings become the Amount and Code columns. I thought I could use something like the following, but it's not working:
df = pd.DataFrame(lst)
You probably need to transform the data into a format that Pandas can read.
Original data
data = [
    {"France": [[12548, "ABC"], [45681, "DFG"], [45684, "HJK"]]},
    {"USA": [[84921, "HJK"], [28917, "KLESA"]]},
    {"Japan": [[38292, "ASF"], [48902, "DSJ"]]},
]
Transforming the data
new_data = []
for country_data in data:
    for country, values in country_data.items():
        new_data += [{"Country": country, "Amount": amt, "Code": code} for amt, code in values]
Create the dataframe
df = pd.DataFrame(new_data)
Output
Country Amount Code
0 France 12548 ABC
1 France 45681 DFG
2 France 45684 HJK
3 USA 84921 HJK
4 USA 28917 KLESA
5 Japan 38292 ASF
6 Japan 48902 DSJ
Another option is to concatenate a DataFrame per country and reshape; note that the iteration has to be over lst, not the builtin list:
df = pd.concat([pd.DataFrame(elem) for elem in lst])
df = df.apply(lambda x: pd.Series(x.dropna().values)).stack()
df = df.reset_index(level=[0], drop=True).to_frame(name='vals')
df = pd.DataFrame(df["vals"].to_list(), index=df.index, columns=['Amount', 'Code']).sort_index()
print(df)
Output:
Amount Code
France 12548 ABC
USA 84921 HJK
Japan 38292 ASF
France 45681 DFG
USA 28917 KLESA
Japan 48902 DSJ
France 45684 HJK
Use a nested list comprehension to flatten the data, then pass it to the DataFrame constructor:
lst = [
    {"France": [[12548, "ABC"], [45681, "DFG"], [45684, "HJK"]]},
    {"USA": [[84921, "HJK"], [28917, "KLESA"]]},
    {"Japan": [[38292, "ASF"], [48902, "DSJ"]]},
]
L = [(country, *x) for country_data in lst
     for country, values in country_data.items()
     for x in values]
df = pd.DataFrame(L, columns=['Country','Amount','Code'])
print (df)
Country Amount Code
0 France 12548 ABC
1 France 45681 DFG
2 France 45684 HJK
3 USA 84921 HJK
4 USA 28917 KLESA
5 Japan 38292 ASF
6 Japan 48902 DSJ
Build a new dictionary that combines the individual dicts into one, before concatenating the dataframes:
new_dict = {}
for ent in lst:
    for key, value in ent.items():
        new_dict[key] = pd.DataFrame(value, columns=['Amount', 'Code'])
pd.concat(new_dict, names=['Country']).droplevel(1).reset_index()
Country Amount Code
0 France 12548 ABC
1 France 45681 DFG
2 France 45684 HJK
3 USA 84921 HJK
4 USA 28917 KLESA
5 Japan 38292 ASF
6 Japan 48902 DSJ

Find top n elements in pandas dataframe column by keeping the grouping

I am trying to find the top 5 elements of the column total_petitions, but keeping the ordered grouping I did.
df = df[['fy', 'EmployerState', 'total_petitions']]
table = df.groupby(['fy','EmployerState']).mean()
table.nlargest(5, 'total_petitions')
sample output:
fy EmployerState total_petitions
2020 WA 7039.333333
2016 MD 2647.400000
2017 MD 2313.142857
... TX 2305.541667
2020 TX 2081.952381
desired output:
fy EmployerState total_petitions
2016 AL 3.875000
AR 225.333333
AZ 26.666667
CA 326.056604
CO 21.333333
... ... ...
2020 VA 36.714286
WA 7039.333333
WI 43.750000
WV 8986086.08
WY 1.000000
with total_petitions showing, for each year, the 5 states with the highest means.
What you are looking for is a pivot table:
df = df.pivot_table(values='total_petitions', index=['fy','EmployerState'])
df = df.groupby(level='fy')['total_petitions'].nlargest(5).reset_index(level=0, drop=True).reset_index()
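A minimal, self-contained sketch with made-up numbers (toy data, not the asker's) to show the mechanics:
import pandas as pd

# toy data, values invented for illustration
df = pd.DataFrame({
    'fy': [2016, 2016, 2016, 2020, 2020, 2020],
    'EmployerState': ['AL', 'AR', 'AZ', 'VA', 'WA', 'WY'],
    'total_petitions': [3.9, 225.3, 26.7, 36.7, 7039.3, 1.0],
})

# mean per (fy, EmployerState), then the 5 largest means within each fy
table = df.pivot_table(values='total_petitions', index=['fy', 'EmployerState'])
top = (table.groupby(level='fy')['total_petitions']
            .nlargest(5)
            .reset_index(level=0, drop=True)  # drop the duplicated fy level
            .reset_index())
print(top)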

Function to move specific row to top or bottom of pandas dataframe

I have two functions which shift a row of a pandas dataframe to the top or bottom, respectively. After applying them more than once to a dataframe, they seem to work incorrectly.
These are the 2 functions to move the row to top / bottom:
def shift_row_to_bottom(df, index_to_shift):
    """Shift row, given by index_to_shift, to bottom of df."""
    idx = df.index.tolist()
    idx.pop(index_to_shift)
    df = df.reindex(idx + [index_to_shift])
    return df

def shift_row_to_top(df, index_to_shift):
    """Shift row, given by index_to_shift, to top of df."""
    idx = df.index.tolist()
    idx.pop(index_to_shift)
    df = df.reindex([index_to_shift] + idx)
    return df
Note: I don't want to reset_index for the returned df.
Example:
df = pd.DataFrame({'Country': ['USA', 'GE', 'Russia', 'BR', 'France'],
                   'ID': ['11', '22', '33', '44', '55'],
                   'City': ['New-York', 'Berlin', 'Moscow', 'London', 'Paris'],
                   'short_name': ['NY', 'Ber', 'Mosc', 'Lon', 'Pa']})
df =
Country ID City short_name
0 USA 11 New-York NY
1 GE 22 Berlin Ber
2 Russia 33 Moscow Mosc
3 BR 44 London Lon
4 France 55 Paris Pa
Now, apply the function for the first time and move the row with index 0 to the bottom:
df_shifted = shift_row_to_bottom(df,0)
df_shifted =
Country ID City short_name
1 GE 22 Berlin Ber
2 Russia 33 Moscow Mosc
3 BR 44 London Lon
4 France 55 Paris Pa
0 USA 11 New-York NY
The result is exactly what I want.
Now apply the function again, this time moving the row with index 2 to the bottom:
df_shifted = shift_row_to_bottom(df_shifted,2)
df_shifted =
Country ID City short_name
1 GE 22 Berlin Ber
2 Russia 33 Moscow Mosc
4 France 55 Paris Pa
0 USA 11 New-York NY
2 Russia 33 Moscow Mosc
Well, this is not what I was expecting. There must be a problem when I apply the function a second time. The problem is analogous for shift_row_to_top.
My question is:
What's going on here?
Is there a better way to shift a specific row to top / bottom of the dataframe? Maybe a pandas-function?
If not, how would you do it?
Your problem is these two lines:
idx = df.index.tolist()
idx.pop(index_to_shift)
idx is a list, and idx.pop(index_to_shift) removes the item at position index_to_shift of idx, whose value is not necessarily index_to_shift once the rows have been reordered, as in your second call.
Try this function:
def shift_row_to_bottom(df, index_to_shift):
    # filter by index label instead of popping by position
    idx = [i for i in df.index if i != index_to_shift]
    return df.loc[idx + [index_to_shift]]

# call the function twice
for i in range(2):
    df = shift_row_to_bottom(df, 2)
Output:
Country ID City short_name
0 USA 11 New-York NY
1 GE 22 Berlin Ber
3 BR 44 London Lon
4 France 55 Paris Pa
2 Russia 33 Moscow Mosc
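The same label-filtering fix carries over to the question's shift_row_to_top; a minimal counterpart sketch:
def shift_row_to_top(df, index_to_shift):
    # keep every index label except the one to move, then put it first
    idx = [i for i in df.index if i != index_to_shift]
    return df.loc[[index_to_shift] + idx]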

Generate random rows and keep an order between observations to create a dataframe in python

I'd like to randomly generate more than 100 rows and keep the link between the observations.
Below is my example:
There are 4 variables: Country, Category, Product and Price. Category and Product need to be linked together.
import random as rd
import pandas as pd
Country = []
Category = []
Product = []
Price = []
for i in range(1000):
    Country.append(rd.choice(['England','Germany','France','USA','China','Japan']))
    Category.append(rd.choice(['Electronics','home appliances','Computer','Food','Bedding']))
    Product.append(rd.choice(['Iphone 6S','Samsung Fridge','PC ASUS','Cheese','Bed']))
    Price.append(rd.randint(10,10000))
data = pd.DataFrame(data = {'Country':Country,'Category':Category,'Product':Product,'Price':Price})
When I execute the code above, the Category observations don't match their corresponding Product observations. For example, you could get a row with Electronics (Category) and Cheese (Product), which obviously makes no sense.
Any ideas would be appreciated
Thank you in advance
You can use Series.map to build the new Product column from a dictionary created by zipping the two lists, after generating the DataFrame without the Product column.
Appending to lists is also unnecessary; it is faster to use numpy.random.choice and numpy.random.randint:
import numpy as np
N = 10000
L0 = ['England','Germany','France','USA','China','Japan']
L1 = ['Electronics','home appliances','Computer','Food','Bedding']
L2 = ['Iphone 6S','Samsung Fridge','PC ASUS','Cheese','Bed']
d = dict(zip(L1, L2))
print (d)
{'Electronics': 'Iphone 6S', 'home appliances': 'Samsung Fridge',
'Computer': 'PC ASUS', 'Food': 'Cheese', 'Bedding': 'Bed'}
data = pd.DataFrame(data={'Country': np.random.choice(L0, size=N),
                          'Category': np.random.choice(L1, size=N),
                          'Price': np.random.randint(10, size=N)})
data['Product'] = data['Category'].map(d)
print (data)
Country Category Price Product
0 Germany Food 1 Cheese
1 England Food 6 Cheese
2 Japan Bedding 3 Bed
3 France Electronics 1 Iphone 6S
4 Japan home appliances 8 Samsung Fridge
... ... ... ...
9995 England Electronics 3 Iphone 6S
9996 China Electronics 1 Iphone 6S
9997 Germany Bedding 0 Bed
9998 USA Electronics 3 Iphone 6S
9999 Germany home appliances 6 Samsung Fridge
[10000 rows x 4 columns]
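If you prefer to keep the original loop style from the question, an alternative sketch is to draw the (Category, Product) pair as a single unit, so the link holds by construction:
import random as rd
import pandas as pd

# linked (Category, Product) pairs, taken from the question's lists
pairs = [('Electronics', 'Iphone 6S'), ('home appliances', 'Samsung Fridge'),
         ('Computer', 'PC ASUS'), ('Food', 'Cheese'), ('Bedding', 'Bed')]

rows = []
for _ in range(1000):
    category, product = rd.choice(pairs)  # pick both halves of the pair together
    rows.append({'Country': rd.choice(['England', 'Germany', 'France', 'USA', 'China', 'Japan']),
                 'Category': category,
                 'Product': product,
                 'Price': rd.randint(10, 10000)})

data = pd.DataFrame(rows)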

Geopy, checking cities, avoiding duplicates, pandas

I want to get the latitude of ~100k entries in a pandas dataframe. Since I can query geopy only with a one-second delay, I want to make sure I do not query duplicates (most should be duplicates, since there are not that many cities).
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="xxx")
import time

df['lat'] = 0
for x in range(1, len(df)):
    for y in range(1, x):
        if df['Location'][y] == df['Location'][x]:
            df['lat'][x] = df['lat'][y]
        else:
            location = geolocator.geocode(df['Location'][x])
            time.sleep(1.2)
            df.at[x, 'lat'] = location.latitude
The idea is to check whether the location is already in the list, and query geopy only if it is not. Somehow it is painfully slow and doesn't seem to be doing what I intended. Any help or tip is appreciated.
Prepare the initial dataframe:
import pandas as pd
df = pd.DataFrame({
    'some_meta': [1, 2, 3, 4],
    'city': ['london', 'paris', 'London', 'moscow'],
})
df['city_lower'] = df['city'].str.lower()
df
Out[1]:
some_meta city city_lower
0 1 london london
1 2 paris paris
2 3 London london
3 4 moscow moscow
Create a new DataFrame with unique cities:
df_uniq_cities = df['city_lower'].drop_duplicates().to_frame()
df_uniq_cities
Out[2]:
city_lower
0 london
1 paris
3 moscow
Run geopy's geocode on that new DataFrame:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="specify_your_app_name_here")
from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
df_uniq_cities['location'] = df_uniq_cities['city_lower'].apply(geocode)
# Or, instead, do this to get a nice progress bar:
# from tqdm import tqdm
# tqdm.pandas()
# df_uniq_cities['location'] = df_uniq_cities['city_lower'].progress_apply(geocode)
df_uniq_cities
Out[3]:
city_lower location
0 london (London, Greater London, England, SW1A 2DU, UK...
1 paris (Paris, Île-de-France, France métropolitaine, ...
3 moscow (Москва, Центральный административный округ, М...
Merge the initial DataFrame with the new one:
df_final = pd.merge(df, df_uniq_cities, on='city_lower', how='left')
df_final['lat'] = df_final['location'].apply(lambda location: location.latitude if location is not None else None)
df_final['long'] = df_final['location'].apply(lambda location: location.longitude if location is not None else None)
df_final
Out[4]:
some_meta city city_lower location lat long
0 1 london london (London, Greater London, England, SW1A 2DU, UK... 51.507322 -0.127647
1 2 paris paris (Paris, Île-de-France, France métropolitaine, ... 48.856610 2.351499
2 3 London london (London, Greater London, England, SW1A 2DU, UK... 51.507322 -0.127647
3 4 moscow moscow (Москва, Центральный административный округ, М... 55.750446 37.617494
The key to resolving your issue with timeouts is geopy's RateLimiter class. Check out the docs for more details: https://geopy.readthedocs.io/en/1.18.1/#usage-with-pandas
Imports
See the geopy documentation for how to instantiate the Nominatim geocoder:
import pandas as pd
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="specify_your_app_name_here") # specify your application name
Generate some data with locations
d = ['New York, NY', 'Seattle, WA', 'Philadelphia, PA',
     'Richardson, TX', 'Plano, TX', 'Wylie, TX',
     'Waxahachie, TX', 'Washington, DC']
df = pd.DataFrame(d, columns=['Location'])
print(df)
Location
0 New York, NY
1 Seattle, WA
2 Philadelphia, PA
3 Richardson, TX
4 Plano, TX
5 Wylie, TX
6 Waxahachie, TX
7 Washington, DC
Use a dict to geocode only the unique Locations, per this SO post, and extract all parameters simultaneously:
first, get lat and lon in the same step (as tuples in a single column of the DataFrame)
second, split the column of tuples into separate columns
locations = df['Location'].unique()
# Create dict of geocodings
d = dict(zip(locations,
             pd.Series(locations)
             .apply(geolocator.geocode, args=(10,))
             .apply(lambda x: (x.latitude, x.longitude))  # get tuple of latitude and longitude
             ))
# Map dict to `Location` column
df['city_coord'] = df['Location'].map(d)
# Split single column of tuples into multiple (2) columns
df[['lat','lon']] = pd.DataFrame(df['city_coord'].tolist(), index=df.index)
print(df)
Location city_coord lat lon
0 New York, NY (40.7308619, -73.9871558) 40.730862 -73.987156
1 Seattle, WA (47.6038321, -122.3300624) 47.603832 -122.330062
2 Philadelphia, PA (39.9524152, -75.1635755) 39.952415 -75.163575
3 Richardson, TX (32.9481789, -96.7297206) 32.948179 -96.729721
4 Plano, TX (33.0136764, -96.6925096) 33.013676 -96.692510
5 Wylie, TX (33.0151201, -96.5388789) 33.015120 -96.538879
6 Waxahachie, TX (32.3865312, -96.8483311) 32.386531 -96.848331
7 Washington, DC (38.8950092, -77.0365625) 38.895009 -77.036563
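Nominatim rate-limits requests, so for larger batches the RateLimiter wrapper from the previous answer should drop into this approach as well; a hedged sketch:
from geopy.extra.rate_limiter import RateLimiter

# space out consecutive geocode calls to respect Nominatim's usage policy
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
d = dict(zip(locations,
             pd.Series(locations)
             .apply(geocode)
             .apply(lambda x: (x.latitude, x.longitude))))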
