Get the live standings of Formula 2 drivers - Python

I have been trying to use Python to show the live championship standings in Formula 2, by taking the standings from before the race and adding to each driver the points that correspond to their current position. The problem is that I cannot get live updates from the Formula 2 live timing page.
I have been using BeautifulSoup to try to scrape the data from the F2 website.

Try this:
import time

import pandas as pd
from bs4 import BeautifulSoup
import lxml  # parser backend used by BeautifulSoup below
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://www.fiaformula2.com/livetiming/index.html'
options = Options()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get(url)
time.sleep(3)  # give the JavaScript-rendered table time to appear
page_source = driver.page_source
driver.quit()  # release the browser once the HTML has been captured
# --------------------------------------------
soup = BeautifulSoup(page_source, 'lxml')
reviews = []
cols = []
dv = soup.find('div', {"class": "scrollable"})
if dv:
    # The first CSS class of each header cell is used as the column name.
    heads = dv.find_all('th')
    for head in heads:
        cols.append(str(head["class"][0]))
    rows = dv.find_all('tr')
    for row in rows:
        tds = row.find_all('td')
        if tds:
            data = {}
            for i, td in enumerate(tds):
                data[cols[i]] = td.text
            reviews.append(data)
df = pd.DataFrame(reviews)
print(df)
This uses Selenium to load the URL, waits 3 seconds for the table to show up, and then parses the page source with BeautifulSoup (bs4).
Here is the result (a pandas DataFrame):
'''
position car-number driver-short-name driver-full-name gap ... best-lap sector1-time sector2-time sector3-time pit
0 1 11 DRU F.DRUGOVICH LAP ... 1:25.265 29.2 30.4 25.6 2
1 2 20 VER R.VERSCHOOR 2.4 ... 1:25.640 29.1 30.8 25.6 2
2 3 17 IWA A.IWASA 3.6 ... 1:26.022 29.2 31.0 25.7 2
3 4 1 HAU D.HAUGER 6.9 ... 1:25.647 29.1 30.8 25.5 2
4 5 22 FIT E.FITTIPALDI 7.9 ... 1:25.641 29.1 30.8 25.6 2
5 6 25 COR A.CORDEEL 10.8 ... 1:25.573 28.9 31.0 25.5 2
6 7 8 VIP J.VIPS 12.4 ... 1:25.909 29.1 31.1 25.6 2
7 8 24 BEC D.BECKMANN 13.1 ... 1:25.986 28.9 31.1 25.9 3
8 9 14 OCA O.CALDWELL 14.6 ... 1:25.834 29.2 30.9 25.5 2
9 10 10 POU T.POURCHAIRE 15.5 ... 1:25.494 28.8 30.8 25.8 2
10 11 2 DAR J.DARUVALA 17.2 ... 1:27.889 29.3 31.1 27.3 2
11 12 21 WIL C.WILLIAMS 18.4 ... 1:26.363 29.2 31.1 25.9 2
12 13 5 LAW L.LAWSON 20.2 ... 1:24.738 28.7 30.3 25.6 2
13 14 7 ARM M.ARMSTRONG 25.6 ... 1:24.474 28.7 30.4 25.2 2
14 15 16 NIS R.NISSANY 27.1 ... 1:24.314 28.6 30.5 25.1 2
15 16 9 VES F.VESTI 29.3 ... 1:24.348 28.8 30.1 25.3 2
16 17 15 BOS R.BOSCHUNG 50.6 ... 1:29.340 30.5 31.8 26.9 2
17 18 23 TCA T.CALDERON 17L ... RETIRED 2
18 19 12 NOV C.NOVALAK 17L ... RETIRED 3
19 20 3 DOO J.DOOHAN 18L ... STOP 40.4 39.9 STOP 2
20 21 4 SAT M.SATO 22L ... STOP STOP 2
21 22 6 SAR L.SARGEANT ... STOP STOP
'''

Alternatively, you can use a solution without Selenium. Install a WebSocket library, for example:
pip install websocket-client
To connect to the socket you need a token, which is simple to get:
import requests


def get_connection_token():
    url = "https://ltss.fiaformula2.com/streaming/negotiate?clientProtocol=2.1&connectionData=%5B%7B%22name%22%3A%22streaming%22%7D%5D"
    headers = {
        'Accept': 'text/plain, */*; q=0.01',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
    }
    response = requests.get(url, headers=headers)
    return response.json()['ConnectionToken']
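The token returned by the negotiate request is an opaque string that can contain characters such as '/', '+' and '=', which need percent-encoding before the token goes into a query string. As a minimal sketch, the standard library's urllib.parse.quote with safe="" encodes every reserved character in one call. The token value below is made up purely for illustration.

```python
from urllib.parse import quote

# Made-up token for illustration only; a real one comes from get_connection_token().
sample_token = "abc/def+ghi=="

# safe="" means no reserved character (not even "/") is left unescaped.
encoded_token = quote(sample_token, safe="")
print(encoded_token)  # abc%2Fdef%2Bghi%3D%3D
```

This could stand in for chained .replace() calls when building the websocket URL.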
Next, we need to substitute our token into the websocket path. We subscribe to the events of opening and receiving messages. When we open, we send a dictionary to the socket with the information we want to receive. As a result, we have a very informative JSON with which you can now do whatever you want.
import json

import websocket  # provided by the websocket-client package


def on_message(ws, message):
    json_obj = json.loads(message)
    # The reply to our request carries its payload under the 'R' key.
    if 'R' in json_obj:
        print(json_obj['R']['data'][2])
        ws.keep_running = False  # stop after the first snapshot


def on_open(ws):
    # Ask the "streaming" hub for the F2 feeds we are interested in.
    data = {"H": "streaming", "M": "GetData2", "A": ["F2",
            ["data", "statsfeed", "weatherfeed", "sessionfeed", "trackfeed",
             "commentaryfeed", "timefeed", "racedetailsfeed"]], "I": 1}
    ws.send(json.dumps(data))


if __name__ == "__main__":
    # Percent-encode the characters that are not safe in a query string.
    token = str(get_connection_token()).replace('/', '%2F').replace('+', '%2B')
    ws = websocket.WebSocketApp(f"wss://ltss.fiaformula2.com/streaming/connect?transport=webSockets&clientProtocol=2.1"
                                f"&connectionToken={token}"
                                f"&connectionData=%5B%7B%22name%22%3A%22streaming%22%7D%5D",
                                on_open=on_open,
                                on_message=on_message)
    ws.run_forever()
OUTPUT:
{'R': {'commentaryfeed': [['Good morning! The sun is shining, the breeze is blowing and the FIA Formula 2 Feature Race is about to get underway around the Circuit Zandvoort. ', "It's advantage Felipe Drugovich as the Championship leader lines up on pole. He'll have rookie Jack Doohan for company on the front row.", 'The formation lap is underway.', "It's a mixed start for our leading pack - Drugovich gets a strong start to hold off Doohan, while Logan Sargeant locks up and sails through the gravel into Turn 1.", 'SAFETY CAR!', "The Safety Car is deployed after Sargeant's Carlin finds the barriers.", 'Replays show that the American driver made contact with the rear end of Ralph Boschung, sending him into the barriers at Turn 7.', "Lap 3: As it stands, Drugovich leads ahead of Doohan. However, it's Dennis Hauger who's made the most of the opening lap to go from seventh to third.", 'RED FLAG!', 'The race has been suspended on Lap 4 of 40.', 'The cars proceed into the pit lane as barrier repairs are made. ', 'Off the back of his maiden podium in the Sprint Race yesterday, Clement Novalak has made moves - going from 10th to seventh.', 'Amaury Cordeel also improved on the opening lap, as the Van Amersfoort Racing driver now sits in ninth, having started 12th. ', 'After starting on the option tyres, Drugovich, Doohan, Hauger, Verschoor, Lawson and Cordeel have all switched to the medium compound under the Red Flag - they will all still need to make their mandatory stop later in the race. ', 'The race will resume in 10 minutes time.', 'The drivers are heading back out behind the Safety Car as Drugovich will get racing underway with a rolling start.', 'Lap 6: Drugovich floors the throttle early to pull away on the restart, as Novalak and Armstrong knock wheels on the exit of Turn 1. ', 'Lap 7: Fittipaldi goes brave around the outside of Vesti to move up in the top 10. 
', "Taking full advantage of the soft tyres, Novalak's on a charge and the MP Motorsport driver is putting Liam Lawson under immense pressure for sixth. ", 'Lap 8: Cordeel and Calan Williams become the first drivers to jump into the pits for their mandatory stops, bolting on the white-walled hard tyres. ', 'Lap 9: A huge lock-up for Doohan! What a nervy moment for the Virtuosi Racing driver. ', "Lap 10: Novalak comes into the pits - he'll need to stretch his tyres across the 30 laps remaining. ", 'Lap 11: Trying to undercut Drugovich and Doohan ahead, Hauger dives into the pits - can the PREMA Racing driver get the out lap he needs?', 'Lap 12: On their last legs, Doohan heads into the pits to swap out his heavily degraded tyres - Iwasa also heads in on the same lap. ', 'Lap 13: The overcut has worked for Iwasa, as the DAMS driver gets the jump on Hauger for 12th - a net podium position.', 'Lap 14: Time for Drugovich to pit and he heads back out comfortably ahead of Doohan. Verschoor assumes the race lead and is pushing all the way to his pit stop. ', "Lap 15: Verschoor pits and comes out smack bang in between Drugovich and Doohan - but the Virtuosi isn't waiting around as he goes around the outside of the banking at Turn 3 to take ninth. ", 'SAFETY CAR!', 'Lap 17: Marino Sato has found the barrier on the exit of Turn 2 on his exit from the pits.', 'Pit lane entry has been closed - the drivers are unable to make their mandatory pit stops behind the Safety Car this weekend. ', 'Lap 21: Racing to resume at the end of this lap. ', 'Lap 22: Lawson leaves it late, bunching the field up behind him all the way to the line. ', 'SAFETY CAR!', "It's a disaster for Doohan on the restart, as the Australian is in the wall after contact behind. 
Novalak's race is also over following damage at the start of the main straight.", 'Moments before the Safety Car was deployed, Theo Pourchaire got the move done on his teammate Frederik Vesti to move up to third.', 'Charouz Racing System;s Tatiana Calderon is another retiree from the race. ', 'The Safety Car is in at the end of this lap.', 'Lap 26: Lawson catches half the field napping on the restart to pull clear of Pourchaire - as Armstrong dives in to change his tyres once the pit window reopens. ', 'The race is now running to time - with 12 minutes left of the maximum 60 minutes plus a lap remaining.', '11 Minutes: Despite a bashed front nose, Verschoor is on the charge. The home hero is hunting down Drugovich for the lead of the race. ', 'David Beckman and Olli Caldwell are the biggest movers so far - the pair have made up 10 places each and sit in eighth and ninth as things stand.', '7 Minutes: Drugovich has pulled out a 1.3s gap, which has allowed Iwasa to close in to within DRS on second place Verschoor. ', "4 Minutes: Cordeel is putting Enzo Fittipaldi under pressure for fifth - but with only a handful of minutes left, there's not long to make a move stick.", '2 Minutes: Caldwell is putting on a strong defensive drive to hold Daruvala at bay for ninth. ', "1 Minute: Jüri Vips is right on Cordeel's rear wing, but the Hitech Grand Prix driver is also having to manage defending from Cordeel's Van Amersfoort Racing teammate Beckmann behind. 
", "0 Minutes: Drugovich crosses the line moments after the MP car crosses the line as the race heads into it's final two laps.", 'FINAL LAP!', 'FELIPE DRUGOVICH WINS THE ZANDVOORT FEATURE RACE!', "The crowd's are jubilant as Richard Verschoor takes second at his home race and Ayumu Iwasa rounds out the podium.", 'Hauger rounded off a positive weekend in fourth ahead of Fittipaldi and Cordeel, who secures his first points in F2.', 'Vips was seventh, as both VAR cars finish in the points with Beckman taking eighth ahead of Caldwell. Pourchaire fights back to finish inside the points in P10.', "You won't have long to wait as Formula 2 returns to Monza next weekend, September 9-11, for the penultimate round of the 2022 season. ", "You won't have long to wait as Formula 2 returns to Monza next weekend, September 9-11, for the penultimate round of the 2022 season. "]], 'data': ['2022-09-04T10:49:19.064', {'Series': 'F2', 'Session': 'Race', 'DataWithheld': 0}, {'11': {'Number': '1', 'position': {'Show': 1, 'Value': '1'}, 'status': {'Retired': 0, 'InPit': 0, 'PitOut': 0, 'Stopped': 0, 'ts': '04/09/2022 10:07:54.477'}, 'driver': {'RacingNumber': '11', 'FullName': 'Felipe DRUGOVICH', 'BroadcastName': 'F DRUGOVICH', 'TLA': 'DRU'}, 'gap': {'Value': ''}, 'interval': {'Value': ''}, 'laps': {'Value': '38'}, 'pits': {'Value': '2'}, 'sectors': [{'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '1', 'Value': '29.2'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '2', 'Value': '30.4'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '3', 'Value': '25.6'}], 'last': {'OverallFastest': 0, 'PersonalFastest': 0, 'Value': '1:25.265'}, 'best': {'Lap': '29', 'Value': '1:24.879'}}, '3': {'Number': '20', 'position': {'Show': 0, 'Value': '20'}, 'status': {'Retired': 0, 'InPit': 0, 'PitOut': 0, 'Stopped': 1, 'ts': '04/09/2022 10:22:40.370'}, 'driver': {'RacingNumber': '3', 'FullName': 'Jack DOOHAN', 'BroadcastName': 'J DOOHAN', 'TLA': 'DOO'}, 
'gap': {'Value': '.'}, 'interval': {'Value': '.'}, 'laps': {'Value': '20'}, 'pits': {'Value': '2'}, 'sectors': [{'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '1', 'Value': '40.4'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '2', 'Value': '39.9'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 1, 'Id': '3', 'Value': ''}], 'last': {'OverallFastest': 0, 'PersonalFastest': 0, 'Value': '1:58.929'}, 'best': {'Lap': '8', 'Value': '1:25.056'}}, '6': {'Number': '22', 'position': {'Show': 0, 'Value': '22'}, 'status': {'Retired': 0, 'InPit': 0, 'PitOut': 0, 'Stopped': 1, 'ts': '04/09/2022 09:25:00.414'}, 'driver': {'RacingNumber': '6', 'FullName': 'Logan SARGEANT', 'BroadcastName': 'L SARGEANT', 'TLA': 'SAR'}, 'gap': {'Value': ''}, 'interval': {'Value': ''}, 'laps': {'Value': ''}, 'pits': {'Value': ''}, 'sectors': [{'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '1', 'Value': ''}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 1, 'Id': '2', 'Value': ''}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '3', 'Value': ''}], 'last': {'OverallFastest': 0, 'PersonalFastest': 0, 'Value': ''}, 'best': {'Lap': '', 'Value': ''}}, '20': {'Number': '2', 'position': {'Show': 1, 'Value': '2'}, 'status': {'Retired': 0, 'InPit': 0, 'PitOut': 0, 'Stopped': 0, 'ts': '04/09/2022 10:09:23.127'}, 'driver': {'RacingNumber': '20', 'FullName': 'Richard VERSCHOOR', 'BroadcastName': 'R VERSCHOOR', 'TLA': 'VER'}, 'gap': {'Value': '+2.4'}, 'interval': {'Value': '+2.4'}, 'laps': {'Value': '38'}, 'pits': {'Value': '2'}, 'sectors': [{'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '1', 'Value': '29.1'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '2', 'Value': '30.8'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '3', 'Value': '25.6'}], 'last': {'OverallFastest': 0, 'PersonalFastest': 0, 'Value': '1:25.640'}, 'best': {'Lap': '30', 'Value': '1:25.064'}}, '17': 
{'Number': '3', 'position': {'Show': 1, 'Value': '3'}, 'status': {'Retired': 0, 'InPit': 0, 'PitOut': 0, 'Stopped': 0, 'ts': '04/09/2022 10:06:34.428'}, 'driver': {'RacingNumber': '17', 'FullName': 'Ayumu IWASA', 'BroadcastName': 'A IWASA', 'TLA': 'IWA'}, 'gap': {'Value': '+3.6'}, 'interval': {'Value': '+1.2'}, 'laps': {'Value': '38'}, 'pits': {'Value': '2'}, 'sectors': [{'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '1', 'Value': '29.2'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '2', 'Value': '31.0'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '3', 'Value': '25.7'}], 'last': {'OverallFastest': 0, 'PersonalFastest': 0, 'Value': '1:26.022'}, 'best': {'Lap': '34', 'Value': '1:24.995'}}, '5': {'Number': '13', 'position': {'Show': 1, 'Value': '13'}, 'status': {'Retired': 0, 'InPit': 0, 'PitOut': 0, 'Stopped': 0, 'ts': '04/09/2022 10:33:46.667'}, 'driver': {'RacingNumber': '5', 'FullName': 'Liam LAWSON', 'BroadcastName': 'L LAWSON', 'TLA': 'LAW'}, 'gap': {'Value': '+20.2'}, 'interval': {'Value': '+1.7'}, 'laps': {'Value': '38'}, 'pits': {'Value': '2'}, 'sectors': [{'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '1', 'Value': '28.7'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '2', 'Value': '30.3'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '3', 'Value': '25.6'}], 'last': {'OverallFastest': 0, 'PersonalFastest': 0, 'Value': '1:24.738'}, 'best': {'Lap': '30', 'Value': '1:23.865'}}, '1': {'Number': '4', 'position': {'Show': 1, 'Value': '4'}, 'status': {'Retired': 0, 'InPit': 0, 'PitOut': 0, 'Stopped': 0, 'ts': '04/09/2022 10:03:42.218'}, 'driver': {'RacingNumber': '1', 'FullName': 'Dennis HAUGER', 'BroadcastName': 'D HAUGER', 'TLA': 'HAU'}, 'gap': {'Value': '+6.9'}, 'interval': {'Value': '+3.3'}, 'laps': {'Value': '38'}, 'pits': {'Value': '2'}, 'sectors': [{'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '1', 'Value': '29.1'}, 
{'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '2', 'Value': '30.8'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '3', 'Value': '25.5'}], 'last': {'OverallFastest': 0, 'PersonalFastest': 0, 'Value': '1:25.647'}, 'best': {'Lap': '32', 'Value': '1:25.203'}}, '8': {'Number': '7', 'position': {'Show': 1, 'Value': '7'}, 'status': {'Retired': 0, 'InPit': 0, 'PitOut': 0, 'Stopped': 0, 'ts': '04/09/2022 10:05:17.324'}, 'driver': {'RacingNumber': '8', 'FullName': 'Juri VIPS', 'BroadcastName': 'J VIPS', 'TLA': 'VIP'}, 'gap': {'Value': '+12.4'}, 'interval': {'Value': '+1.5'}, 'laps': {'Value': '38'}, 'pits': {'Value': '2'}, 'sectors': [{'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '1', 'Value': '29.1'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '2', 'Value': '31.1'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '3', 'Value': '25.6'}], 'last': {'OverallFastest': 0, 'PersonalFastest': 0, 'Value': '1:25.909'}, 'best': {'Lap': '14', 'Value': '1:24.934'}}, '7': {'Number': '14', 'position': {'Show': 1, 'Value': '14'}, 'status': {'Retired': 0, 'InPit': 0, 'PitOut': 0, 'Stopped': 0, 'ts': '04/09/2022 10:30:58.538'}, 'driver': {'RacingNumber': '7', 'FullName': 'Marcus ARMSTRONG', 'BroadcastName': 'M ARMSTRONG', 'TLA': 'ARM'}, 'gap': {'Value': '+25.6'}, 'interval': {'Value': '+5.3'}, 'laps': {'Value': '38'}, 'pits': {'Value': '2'}, 'sectors': [{'OverallFastest': 0, 'PersonalFastest': 1, 'Stopped': 0, 'Id': '1', 'Value': '28.7'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '2', 'Value': '30.4'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '3', 'Value': '25.2'}], 'last': {'OverallFastest': 0, 'PersonalFastest': 0, 'Value': '1:24.474'}, 'best': {'Lap': '28', 'Value': '1:24.329'}}, '12': {'Number': '19', 'position': {'Show': 0, 'Value': '19'}, 'status': {'Retired': 1, 'InPit': 1, 'PitOut': 0, 'Stopped': 1, 'ts': '04/09/2022 10:23:55.369'}, 'driver': 
{'RacingNumber': '12', 'FullName': 'Clement NOVALAK', 'BroadcastName': 'C NOVALAK', 'TLA': 'NOV'}, 'gap': {'Value': '.'}, 'interval': {'Value': '+4.6'}, 'laps': {'Value': '21'}, 'pits': {'Value': '3'}, 'sectors': [{'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '1', 'Value': '39.6'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '2', 'Value': '39.8'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '3', 'Value': '54.4'}], 'last': {'OverallFastest': 0, 'PersonalFastest': 0, 'Value': '2:14.011'}, 'best': {'Lap': '12', 'Value': '1:25.358'}}, '9': {'Number': '16', 'position': {'Show': 1, 'Value': '16'}, 'status': {'Retired': 0, 'InPit': 0, 'PitOut': 0, 'Stopped': 0, 'ts': '04/09/2022 10:35:23.891'}, 'driver': {'RacingNumber': '9', 'FullName': 'Frederik VESTI', 'BroadcastName': 'F VESTI', 'TLA': 'VES'}, 'gap': {'Value': '+29.3'}, 'interval': {'Value': '+2.2'}, 'laps': {'Value': '38'}, 'pits': {'Value': '2'}, 'sectors': [{'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '1', 'Value': '28.8'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '2', 'Value': '30.1'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '3', 'Value': '25.3'}], 'last': {'OverallFastest': 0, 'PersonalFastest': 0, 'Value': '1:24.348'}, 'best': {'Lap': '31', 'Value': '1:23.078'}}, '25': {'Number': '6', 'position': {'Show': 1, 'Value': '6'}, 'status': {'Retired': 0, 'InPit': 0, 'PitOut': 0, 'Stopped': 0, 'ts': '04/09/2022 09:59:33.446'}, 'driver': {'RacingNumber': '25', 'FullName': 'Amaury CORDEEL', 'BroadcastName': 'A CORDEEL', 'TLA': 'COR'}, 'gap': {'Value': '+10.8'}, 'interval': {'Value': '+2.8'}, 'laps': {'Value': '38'}, 'pits': {'Value': '2'}, 'sectors': [{'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '1', 'Value': '28.9'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '2', 'Value': '31.0'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '3', 'Value': 
'25.5'}], 'last': {'OverallFastest': 0, 'PersonalFastest': 0, 'Value': '1:25.573'}, 'best': {'Lap': '10', 'Value': '1:25.469'}}, '22': {'Number': '5', 'position': {'Show': 1, 'Value': '5'}, 'status': {'Retired': 0, 'InPit': 0, 'PitOut': 0, 'Stopped': 0, 'ts': '04/09/2022 10:03:52.349'}, 'driver': {'RacingNumber': '22', 'FullName': 'Enzo FITTIPALDI', 'BroadcastName': 'E FITTIPALDI', 'TLA': 'FIT'}, 'gap': {'Value': '+7.9'}, 'interval': {'Value': '+1.0'}, 'laps': {'Value': '38'}, 'pits': {'Value': '2'}, 'sectors': [{'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '1', 'Value': '29.1'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '2', 'Value': '30.8'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '3', 'Value': '25.6'}], 'last': {'OverallFastest': 0, 'PersonalFastest': 0, 'Value': '1:25.641'}, 'best': {'Lap': '37', 'Value': '1:25.200'}}, '16': {'Number': '15', 'position': {'Show': 1, 'Value': '15'}, 'status': {'Retired': 0, 'InPit': 0, 'PitOut': 0, 'Stopped': 0, 'ts': '04/09/2022 10:31:01.734'}, 'driver': {'RacingNumber': '16', 'FullName': 'Roy NISSANY', 'BroadcastName': 'R NISSANY', 'TLA': 'NIS'}, 'gap': {'Value': '+27.1'}, 'interval': {'Value': '+1.5'}, 'laps': {'Value': '38'}, 'pits': {'Value': '2'}, 'sectors': [{'OverallFastest': 0, 'PersonalFastest': 1, 'Stopped': 0, 'Id': '1', 'Value': '28.6'}, {'OverallFastest': 0, 'PersonalFastest': 0, 'Stopped': 0, 'Id': '2', 'Value': '30.5'}, {'OverallFastest': 0, 'PersonalFastest': 1, 'Stopped': 0, 'Id': '3', 'Value': '25.1'}], 'last': {'OverallFastest': 0, 'PersonalFastest': 0, 'Value': '2:12.100'}, 'best': {'Lap': '12', 'Value': '1:26.891'}}}], 'racedetailsfeed': ['2022-09-04T09:15:06.375', {'Season': 'Formula 2 2022', 'Round': 'Round 12', 'Race': 'Zandvoort', 'Circuit': 'Zandvoort Circuit', 'Country': 'Netherlands', 'CountryCode': 'NL', 'Date': '2022-09-04', 'Session': 'Feature Race'}], 'sessionfeed': ['2022-09-04T10:51:15.553', {'Value': 'Finalised'}], 'statsfeed': 
['2022-09-04T10:48:55.474', {'1': {'Number': '1', 'driver': {'RacingNumber': '1'}, 'PersonalBestLapTime': {'Lap': '32', 'Position': '14', 'Value': '1:25.203'}, 'BestSectors': [{'Position': '13', 'Id': '1', 'Value': '28.9'}, {'Position': '11', 'Id': '2', 'Value': '30.3'}, {'Position': '4', 'Id': '3', 'Value': '25.2'}], 'BestSpeeds': [{'Position': '15', 'Id': 'FL', 'Value': '283'}, {'Position': '13', 'Id': 'I1', 'Value': '267'}, {'Position': '17', 'Id': 'I2', 'Value': '248'}, {'Position': '9', 'Id': 'ST', 'Value': '284'}]}, '8': {'Number': '8', 'driver': {'RacingNumber': '8'}, 'PersonalBestLapTime': {'Lap': '14', 'Position': '8', 'Value': '1:24.934'}, 'BestSectors': [{'Position': '9', 'Id': '1', 'Value': '28.9'}, {'Position': '10', 'Id': '2', 'Value': '30.3'}, {'Position': '12', 'Id': '3', 'Value': '25.4'}], 'BestSpeeds': [{'Position': '1', 'Id': 'FL', 'Value': '296'}, {'Position': '3', 'Id': 'I1', 'Value': '270'}, {'Position': '1', 'Id': 'I2', 'Value': '254'}, {'Position': '6', 'Id': 'ST', 'Value': '285'}]}, '9': {'Number': '9', 'driver': {'RacingNumber': '9'}, 'PersonalBestLapTime': {'Lap': '31', 'Position': '1', 'Value': '1:23.078'}, 'BestSectors': [{'Position': '2', 'Id': '1', 'Value': '28.6'}, {'Position': '1', 'Id': '2', 'Value': '29.3'}, {'Position': '1', 'Id': '3', 'Value': '25.0'}], 'BestSpeeds': [{'Position': '10', 'Id': 'FL', 'Value': '290'}, {'Position': '14', 'Id': 'I1', 'Value': '266'}, {'Position': '13', 'Id': 'I2', 'Value': '250'}, {'Position': '4', 'Id': 'ST', 'Value': '285'}]}, '10': {'Number': '10', 'driver': {'RacingNumber': '10'}, 'PersonalBestLapTime': {'Lap': '30', 'Position': '2', 'Value': '1:23.339'}, 'BestSectors': [{'Position': '1', 'Id': '1', 'Value': '28.5'}, {'Position': '2', 'Id': '2', 'Value': '29.5'}, {'Position': '2', 'Id': '3', 'Value': '25.0'}], 'BestSpeeds': [{'Position': '9', 'Id': 'FL', 'Value': '291'}, {'Position': '1', 'Id': 'I1', 'Value': '270'}, {'Position': '5', 'Id': 'I2', 'Value': '252'}, {'Position': '19', 'Id': 'ST', 
'Value': '276'}]}, '17': {'Number': '16', 'driver': {'RacingNumber': '17'}, 'PersonalBestLapTime': {'Lap': '34', 'Position': '9', 'Value': '1:24.995'}, 'BestSectors': [{'Position': '11', 'Id': '1', 'Value': '28.9'}, {'Position': '16', 'Id': '2', 'Value': '30.4'}, {'Position': '5', 'Id': '3', 'Value': '25.3'}], 'BestSpeeds': [{'Position': '3', 'Id': 'FL', 'Value': '293'}, {'Position': '4', 'Id': 'I1', 'Value': '269'}, {'Position': '11', 'Id': 'I2', 'Value': '251'}, {'Position': '21', 'Id': 'ST', 'Value': '267'}]}, '20': {'Number': '17', 'driver': {'RacingNumber': '20'}, 'PersonalBestLapTime': {'Lap': '30', 'Position': '12', 'Value': '1:25.064'}, 'BestSectors': [{'Position': '12', 'Id': '1', 'Value': '28.9'}, {'Position': '12', 'Id': '2', 'Value': '30.3'}, {'Position': '9', 'Id': '3', 'Value': '25.4'}], 'BestSpeeds': [{'Position': '12', 'Id': 'FL', 'Value': '289'}, {'Position': '7', 'Id': 'I1', 'Value': '268'}, {'Position': '8', 'Id': 'I2', 'Value': '251'}, {'Position': '17', 'Id': 'ST', 'Value': '277'}]}, '21': {'Number': '18', 'driver': {'RacingNumber': '21'}, 'PersonalBestLapTime': {'Lap': '11', 'Position': '19', 'Value': '1:25.571'}, 'BestSectors': [{'Position': '14', 'Id': '1', 'Value': '29.0'}, {'Position': '17', 'Id': '2', 'Value': '30.4'}, {'Position': '17', 'Id': '3', 'Value': '25.5'}], 'BestSpeeds': [{'Position': '7', 'Id': 'FL', 'Value': '291'}, {'Position': '9', 'Id': 'I1', 'Value': '268'}, {'Position': '7', 'Id': 'I2', 'Value': '252'}, {'Position': '8', 'Id': 'ST', 'Value': '284'}]}, '22': {'Number': '19', 'driver': {'RacingNumber': '22'}, 'PersonalBestLapTime': {'Lap': '37', 'Position': '13', 'Value': '1:25.200'}, 'BestSectors': [{'Position': '7', 'Id': '1', 'Value': '28.9'}, {'Position': '19', 'Id': '2', 'Value': '30.5'}, {'Position': '10', 'Id': '3', 'Value': '25.4'}], 'BestSpeeds': [{'Position': '4', 'Id': 'FL', 'Value': '293'}, {'Position': '2', 'Id': 'I1', 'Value': '270'}, {'Position': '3', 'Id': 'I2', 'Value': '253'}, {'Position': '2', 'Id': 'ST', 
'Value': '286'}]}, '23': {'Number': '20', 'driver': {'RacingNumber': '23'}, 'PersonalBestLapTime': {'Lap': '12', 'Position': '21', 'Value': '1:26.891'}, 'BestSectors': [{'Position': '21', 'Id': '1', 'Value': '29.7'}, {'Position': '21', 'Id': '2', 'Value': '31.0'}, {'Position': '21', 'Id': '3', 'Value': '25.9'}], 'BestSpeeds': [{'Position': '17', 'Id': 'FL', 'Value': '281'}, {'Position': '21', 'Id': 'I1', 'Value': '261'}, {'Position': '15', 'Id': 'I2', 'Value': '249'}, {'Position': '15', 'Id': 'ST', 'Value': '280'}]}, '24': {'Number': '21', 'driver': {'RacingNumber': '24'}, 'PersonalBestLapTime': {'Lap': '15', 'Position': '10', 'Value': '1:25.044'}, 'BestSectors': [{'Position': '8', 'Id': '1', 'Value': '28.9'}, {'Position': '8', 'Id': '2', 'Value': '30.1'}, {'Position': '18', 'Id': '3', 'Value': '25.6'}], 'BestSpeeds': [{'Position': '5', 'Id': 'FL', 'Value': '292'}, {'Position': '6', 'Id': 'I1', 'Value': '268'}, {'Position': '4', 'Id': 'I2', 'Value': '252'}, {'Position': '18', 'Id': 'ST', 'Value': '276'}]}, '25': {'Number': '22', 'driver': {'RacingNumber': '25'}, 'PersonalBestLapTime': {'Lap': '10', 'Position': '17', 'Value': '1:25.469'}, 'BestSectors': [{'Position': '6', 'Id': '1', 'Value': '28.8'}, {'Position': '18', 'Id': '2', 'Value': '30.5'}, {'Position': '16', 'Id': '3', 'Value': '25.5'}], 'BestSpeeds': [{'Position': '8', 'Id': 'FL', 'Value': '291'}, {'Position': '8', 'Id': 'I1', 'Value': '268'}, {'Position': '9', 'Id': 'I2', 'Value': '251'}, {'Position': '11', 'Id': 'ST', 'Value': '283'}]}}], 'timefeed': ['2022-09-04T10:56:15.007', False, '00:00:00'], 'trackfeed': ['2022-09-04T10:29:52.008', {'Value': '1', 'Message': 'AllClear'}], 'weatherfeed': ['2022-09-04T10:55:26.977', {'airtemp': '22.8', 'humidity': '69.0', 'pressure': '1016.5', 'rainfall': '0', 'tracktemp': '28.3', 'windspeed': '1.5', 'winddir': '251'}]}, 'I': '1'}
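To get from this payload to the live championship standings the question asks about, the per-car dictionary (json_obj['R']['data'][2]) can be combined with the points each driver had before the race. The following is only a sketch under stated assumptions: the 25-18-15-12-10-8-6-4-2-1 Feature Race scale is assumed, bonus points for pole and fastest lap are ignored, retired or stopped cars are simply given zero, and the pre-race totals are hypothetical placeholders, not real standings. The function name live_standings is also made up for this example.

```python
# Assumed Feature Race scale; bonus points (pole, fastest lap) are ignored.
FEATURE_RACE_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def live_standings(before, cars, points=FEATURE_RACE_POINTS):
    """Return (TLA, projected points) pairs, highest total first.

    before: dict mapping driver TLA -> points held before the race.
    cars:   per-car dict from the feed, i.e. json_obj['R']['data'][2].
    """
    totals = dict(before)
    for car in cars.values():
        tla = car['driver']['TLA']
        pos_text = car['position']['Value']
        if not pos_text.isdigit():
            continue  # no usable position yet
        pos = int(pos_text)
        # Simplification: a retired or stopped car scores nothing.
        out = car['status']['Retired'] or car['status']['Stopped']
        earned = points[pos - 1] if 1 <= pos <= len(points) and not out else 0
        totals[tla] = totals.get(tla, 0) + earned
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Three entries trimmed from the output above (only the fields used here).
cars = {
    '11': {'position': {'Value': '1'}, 'status': {'Retired': 0, 'Stopped': 0},
           'driver': {'TLA': 'DRU'}},
    '20': {'position': {'Value': '2'}, 'status': {'Retired': 0, 'Stopped': 0},
           'driver': {'TLA': 'VER'}},
    '3': {'position': {'Value': '20'}, 'status': {'Retired': 0, 'Stopped': 1},
          'driver': {'TLA': 'DOO'}},
}

# Hypothetical pre-race totals, for illustration only.
before = {'DRU': 100, 'VER': 50, 'DOO': 60}

standings = live_standings(before, cars)
print(standings)  # [('DRU', 125), ('VER', 68), ('DOO', 60)]
```

Calling live_standings from on_message each time a new snapshot arrives would refresh the projection during the race.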


How to get data between braces split into pandas dataframe columns [duplicate]

This question already has answers here:
Extract data from array - Python
(1 answer)
Split / Explode a column of dictionaries into separate columns with pandas
(13 answers)
I have this data, taken from a soccer stats site:
[{'id': '18209', 'isResult': True, 'side': 'h', 'h': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'}, 'a': {'id': '220', 'title': 'Brighton', 'short_title': 'BRI'}, 'goals': {'h': '1', 'a': '2'}, 'xG': {'h': '1.42103', 'a': '1.7289'}, 'datetime': '2022-08-07 13:00:00'}, {'id': '18218', 'isResult': True, 'side': 'a', 'h': {'id': '244', 'title': 'Brentford', 'short_title': 'BRE'}, 'a': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'}, 'goals': {'h': '4', 'a': '0'}, 'xG': {'h': '1.38785', 'a': '0.896038'}, 'datetime': '2022-08-13 16:30:00'}, {'id': '18231', 'isResult': True, 'side': 'h', 'h': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'}, 'a': {'id': '87', 'title': 'Liverpool', 'short_title': 'LIV'}, 'goals': {'h': '2', 'a': '1'}, 'xG': {'h': '2.01764', 'a': '1.52301'}, 'datetime': '2022-08-22 19:00:00'}, {'id': '18232', 'isResult': True, 'side': 'a', 'h': {'id': '74', 'title': 'Southampton', 'short_title': 'SOU'}, 'a': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'}, 'goals': {'h': '0', 'a': '1'}, 'xG': {'h': '1.35887', 'a': '1.34359'}, 'datetime': '2022-08-27 11:30:00'}]
If I put them into a dataframe and then into CSV, I obtain this:
id isResult ... xG datetime
0 18209 True ... {'h': '1.42103', 'a': '1.7289'} 2022-08-07 13:00:00
1 18218 True ... {'h': '1.38785', 'a': '0.896038'} 2022-08-13 16:30:00
2 18231 True ... {'h': '2.01764', 'a': '1.52301'} 2022-08-22 19:00:00
3 18232 True ... {'h': '1.35887', 'a': '1.34359'} 2022-08-27 11:30:00
The part in braces is not split. Is there a way to split this part into separate pandas dataframe columns as well?
This is the code:
import pandas as pd
ta = [{'id': '18209', 'isResult': True, 'side': 'h', 'h': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'}, 'a': {'id': '220', 'title': 'Brighton', 'short_title': 'BRI'}, 'goals': {'h': '1', 'a': '2'}, 'xG': {'h': '1.42103', 'a': '1.7289'}, 'datetime': '2022-08-07 13:00:00'}, {'id': '18218', 'isResult': True, 'side': 'a', 'h': {'id': '244', 'title': 'Brentford', 'short_title': 'BRE'}, 'a': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'}, 'goals': {'h': '4', 'a': '0'}, 'xG': {'h': '1.38785', 'a': '0.896038'}, 'datetime': '2022-08-13 16:30:00'}, {'id': '18231', 'isResult': True,'side': 'h', 'h': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'}, 'a': {'id': '87', 'title': 'Liverpool', 'short_title': 'LIV'}, 'goals': {'h': '2', 'a': '1'}, 'xG': {'h': '2.01764', 'a': '1.52301'}, 'datetime': '2022-08-22 19:00:00'}, {'id': '18232', 'isResult': True, 'side': 'a', 'h': {'id': '74', 'title': 'Southampton', 'short_title': 'SOU'}, 'a': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'}, 'goals': {'h': '0', 'a': '1'}, 'xG': {'h': '1.35887', 'a': '1.34359'}, 'datetime': '2022-08-27 11:30:00'}]
df = pd.DataFrame(ta)
df.to_csv("G:\\stat.csv", header=True)
print(df)
I checked the comments and the links given in the duplicate hints, expecting a simple pandas command that achieves the goal, but did not come across this simple solution, which gives the same result as the code I came up with. Thanks to Сергей Кох for providing this:
import pandas as pd
ta = [{'id': '18209', 'isResult': True, 'side': 'h', 'h': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'}, 'a': {'id': '220', 'title': 'Brighton', 'short_title': 'BRI'}, 'goals': {'h': '1', 'a': '2'}, 'xG': {'h': '1.42103', 'a': '1.7289'}, 'datetime': '2022-08-07 13:00:00'}, {'id': '18218', 'isResult': True, 'side': 'a', 'h': {'id': '244', 'title': 'Brentford', 'short_title': 'BRE'}, 'a': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'}, 'goals': {'h': '4', 'a': '0'}, 'xG': {'h': '1.38785', 'a': '0.896038'}, 'datetime': '2022-08-13 16:30:00'}, {'id': '18231', 'isResult': True, 'side': 'h', 'h': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'}, 'a': {'id': '87', 'title': 'Liverpool', 'short_title': 'LIV'}, 'goals': {'h': '2', 'a': '1'}, 'xG': {'h': '2.01764', 'a': '1.52301'}, 'datetime': '2022-08-22 19:00:00'}, {'id': '18232', 'isResult': True, 'side': 'a', 'h': {'id': '74', 'title': 'Southampton', 'short_title': 'SOU'}, 'a': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'}, 'goals': {'h': '0', 'a': '1'}, 'xG': {'h': '1.35887', 'a': '1.34359'}, 'datetime': '2022-08-27 11:30:00'}]
df = pd.json_normalize(ta)
df.to_csv("pandas_splitting.csv", header=True)
Anyway, the code below shows what the json_normalize() function actually does. The only difference from my code appears to be that it uses '.' rather than '_' when building the column names, and it will probably be much faster, especially on large datasets.
import pandas as pd
ta = [
{'id' : '18209',
'isResult': True,
'side' : 'h',
'h' : {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'},
'a' : {'id': '220', 'title': 'Brighton', 'short_title': 'BRI'},
'goals' : {'h': '1', 'a': '2'},
'xG': {'h': '1.42103', 'a': '1.7289'},
'datetime': '2022-08-07 13:00:00'
},
{'id' : '18218',
'isResult': True,
'side' : 'a',
'h': {'id': '244', 'title': 'Brentford', 'short_title': 'BRE'},
'a': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'},
'goals' : {'h': '4', 'a': '0'},
'xG': {'h': '1.38785', 'a': '0.896038'},
'datetime': '2022-08-13 16:30:00'
}, {'id': '18231', 'isResult': True,'side': 'h', 'h': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'}, 'a': {'id': '87', 'title': 'Liverpool', 'short_title': 'LIV'}, 'goals': {'h': '2', 'a': '1'}, 'xG': {'h': '2.01764', 'a': '1.52301'}, 'datetime': '2022-08-22 19:00:00'}, {'id': '18232', 'isResult': True, 'side': 'a', 'h': {'id': '74', 'title': 'Southampton', 'short_title': 'SOU'}, 'a': {'id': '89', 'title': 'Manchester United', 'short_title': 'MUN'}, 'goals': {'h': '0', 'a': '1'}, 'xG': {'h': '1.35887', 'a': '1.34359'}, 'datetime': '2022-08-27 11:30:00'}]
# Flatten each nested dict into prefixed top-level keys, then drop the original
for dct in ta:
    h = dct['h']
    a = dct['a']
    goals = dct['goals']
    xG = dct['xG']
    for key, value in h.items():
        dct['h_'+key] = value
    dct.pop('h')
    for key, value in a.items():
        dct['a_'+key] = value
    dct.pop('a')
    for key, value in goals.items():
        dct['goals_'+key] = value
    dct.pop('goals')
    for key, value in xG.items():
        dct['xG_'+key] = value
    dct.pop('xG')
df = pd.DataFrame(ta)
print(df)
df.to_csv("pandas_splitting.csv", header=True)
print_df = """
id isResult side datetime ... goals_h goals_a xG_h xG_a
0 18209 True h 2022-08-07 13:00:00 ... 1 2 1.42103 1.7289
1 18218 True a 2022-08-13 16:30:00 ... 4 0 1.38785 0.896038
2 18231 True h 2022-08-22 19:00:00 ... 2 1 2.01764 1.52301
3 18232 True a 2022-08-27 11:30:00 ... 0 1 1.35887 1.34359
"""
gives:
id isResult side datetime ... goals_h goals_a xG_h xG_a
0 18209 True h 2022-08-07 13:00:00 ... 1 2 1.42103 1.7289
1 18218 True a 2022-08-13 16:30:00 ... 4 0 1.38785 0.896038
2 18231 True h 2022-08-22 19:00:00 ... 2 1 2.01764 1.52301
3 18232 True a 2022-08-27 11:30:00 ... 0 1 1.35887 1.34359
with following CSV file content:
,id,isResult,side,datetime,h_id,h_title,h_short_title,a_id,a_title,a_short_title,goals_h,goals_a,xG_h,xG_a
0,18209,True,h,2022-08-07 13:00:00,89,Manchester United,MUN,220,Brighton,BRI,1,2,1.42103,1.7289
1,18218,True,a,2022-08-13 16:30:00,244,Brentford,BRE,89,Manchester United,MUN,4,0,1.38785,0.896038
2,18231,True,h,2022-08-22 19:00:00,89,Manchester United,MUN,87,Liverpool,LIV,2,1,2.01764,1.52301
3,18232,True,a,2022-08-27 11:30:00,74,Southampton,SOU,89,Manchester United,MUN,0,1,1.35887,1.34359
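If the underscore-separated column names are preferred, json_normalize() accepts a sep argument, so the manual loop should not be needed. A minimal sketch, using a shortened version of the records above (the variable names matches and flat are only for illustration):

```python
import pandas as pd

# Trimmed-down records with the same nested shape as the example data
matches = [
    {"id": "18209", "side": "h",
     "goals": {"h": "1", "a": "2"}, "xG": {"h": "1.42103", "a": "1.7289"}},
    {"id": "18218", "side": "a",
     "goals": {"h": "4", "a": "0"}, "xG": {"h": "1.38785", "a": "0.896038"}},
]

# sep="_" joins parent and child keys with an underscore instead of a dot
flat = pd.json_normalize(matches, sep="_")
print(flat.columns.tolist())
```

The nested keys come out as goals_h, goals_a, xG_h and xG_a, matching the names produced by the loop-based version.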
What you can do is load the records as JSON, then create a dictionary whose keys are the column names and whose values are lists of the column values.
The lists can be filled by iterating over the records you have.
Then call pd.DataFrame(data) on that dictionary.
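The steps above could look roughly like the following sketch. The JSON string raw and the chosen column names are assumptions made for the example; only two nested fields are flattened to keep it short.

```python
import json
import pandas as pd

# Hypothetical JSON input with the same shape as the match records
raw = (
    '[{"id": "18209", "side": "h", "goals": {"h": "1", "a": "2"}},'
    ' {"id": "18218", "side": "a", "goals": {"h": "4", "a": "0"}}]'
)
records = json.loads(raw)

# One list per column, filled while iterating over the records
columns = {"id": [], "side": [], "goals_h": [], "goals_a": []}
for rec in records:
    columns["id"].append(rec["id"])
    columns["side"].append(rec["side"])
    columns["goals_h"].append(rec["goals"]["h"])
    columns["goals_a"].append(rec["goals"]["a"])

df = pd.DataFrame(columns)
print(df)
```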

Split in different columns the content of a cell

I have a column with about 1,500 rows, and each row has this structure:
" [{'id': 4099, 'name': 'xxxxxxxx + 30 filter', 'product_id': 6546, 'variation_id': 3352, 'quantity': 1, 'tax_class': '', 'subtotal': '110.89', 'subtotal_tax': '0.00', 'total': '29.90', 'total_tax': '0.00', 'taxes': [], 'meta_data': [{'id': 39083, 'key': 'pa_size', 'value': 'l', 'display_key': 'Size', 'display_value': 'L'}, {'id': 39094, 'key': '_reduced_stock', 'value': '1', 'display_key': '_reduced_stock', 'display_value': '1'}], 'sku': 'FS00055.L', 'price': 29.9, 'parent_name': 'xxxxxxxx + 30 filter'}] "
I want to get to a result in which the keys of the inner dictionary become new columns and the values are copied underneath. For example:
id name ......
4099 xxxxxxxx + 30 filter ......
I've tried:
import ast
# Access only the first row and try to split it into columns
li_column = my_df.loc[0,'line_items']
li_column = ast.literal_eval(li_column)
That gave me a list with one item, which is a dictionary, and now I'm stuck.
I can't convert your string with ast, but you can use .apply() to parse all rows:
df['line_items'] = df['line_items'].apply(dirtyjson.loads)
Then you can take the dictionary out of each list:
df['line_items'] = df['line_items'].str[0]
After that you can use .apply() again, with pd.Series, to create a new DataFrame from the dictionaries:
new_df = df['line_items'].apply(pd.Series)
As I said, ast doesn't work for me with your strings, and the standard json module also fails to parse them, but the dirtyjson module converts them correctly.
import pandas as pd
df = pd.DataFrame({
'line_items': [
" [{'id': 4099, 'name': 'xxxxxxxx + 30 filter', 'product_id': 6546, 'variation_id': 3352, 'quantity': 1, 'tax_class': '', 'subtotal': '110.89', 'subtotal_tax': '0.00', 'total': '29.90', 'total_tax': '0.00', 'taxes': [], 'meta_data': [{'id': 39083, 'key': 'pa_size', 'value': 'l', 'display_key': 'Size', 'display_value': 'L'}, {'id': 39094, 'key': '_reduced_stock', 'value': '1', 'display_key': '_reduced_stock', 'display_value': '1'}], 'sku': 'FS00055.L', 'price': 29.9, 'parent_name': 'xxxxxxxx + 30 filter'}] ",
" [{'id': 4199, 'name': 'xxxxxxxx + 30 filter', 'product_id': 6546, 'variation_id': 3352, 'quantity': 1, 'tax_class': '', 'subtotal': '110.89', 'subtotal_tax': '0.00', 'total': '29.90', 'total_tax': '0.00', 'taxes': [], 'meta_data': [{'id': 39083, 'key': 'pa_size', 'value': 'l', 'display_key': 'Size', 'display_value': 'L'}, {'id': 39094, 'key': '_reduced_stock', 'value': '1', 'display_key': '_reduced_stock', 'display_value': '1'}], 'sku': 'FS00055.L', 'price': 29.9, 'parent_name': 'xxxxxxxx + 30 filter'}] ",
]
})
#import ast
#df['line_items'] = df['line_items'].apply(ast.literal_eval)
import dirtyjson
df['line_items'] = df['line_items'].apply(dirtyjson.loads)
df['line_items'] = df['line_items'].str[0]
new_df = df['line_items'].apply(pd.Series)
print(new_df)
Result:
id name ... price parent_name
0 4099 xxxxxxxx + 30 filter ... 29.9 xxxxxxxx + 30 filter
1 4199 xxxxxxxx + 30 filter ... 29.9 xxxxxxxx + 30 filter
[2 rows x 15 columns]
EDIT:
If you need to add it to the existing DataFrame:
df = df.join(new_df)
# remove old column
del df['line_items']
print(df)
EDIT:
If a list may contain more than one dictionary, you can use explode() instead of str[0]; it puts every dictionary in a separate row.
df = df.explode('line_items').reset_index(drop=True) # extract from list
import pandas as pd
df = pd.DataFrame({
#'A': ['123', '456', '789'],
'line_items': [
""" [{'id': 4099, 'name': 'xxxxxxxx + 30 filter', 'product_id': 6546, 'variation_id': 3352, 'quantity': 1, 'tax_class': '', 'subtotal': '110.89', 'subtotal_tax': '0.00', 'total': '29.90', 'total_tax': '0.00', 'taxes': [], 'meta_data': [{'id': 39083, 'key': 'pa_size', 'value': 'l', 'display_key': 'Size', 'display_value': 'L'}, {'id': 39094, 'key': '_reduced_stock', 'value': '1', 'display_key': '_reduced_stock', 'display_value': '1'}], 'sku': 'FS00055.L', 'price': 29.9, 'parent_name': 'xxxxxxxx + 30 filter'},
{'id': 5099, 'name': 'xxxxxxxx + 30 filter', 'product_id': 6546, 'variation_id': 3352, 'quantity': 1, 'tax_class': '', 'subtotal': '110.89', 'subtotal_tax': '0.00', 'total': '29.90', 'total_tax': '0.00', 'taxes': [], 'meta_data': [{'id': 39083, 'key': 'pa_size', 'value': 'l', 'display_key': 'Size', 'display_value': 'L'}, {'id': 39094, 'key': '_reduced_stock', 'value': '1', 'display_key': '_reduced_stock', 'display_value': '1'}], 'sku': 'FS00055.L', 'price': 29.9, 'parent_name': 'xxxxxxxx + 30 filter'}] """,
""" [{'id': 4100, 'name': 'xxxxxxxx + 30 filter', 'product_id': 6546, 'variation_id': 3352, 'quantity': 1, 'tax_class': '', 'subtotal': '110.89', 'subtotal_tax': '0.00', 'total': '29.90', 'total_tax': '0.00', 'taxes': [], 'meta_data': [{'id': 39083, 'key': 'pa_size', 'value': 'l', 'display_key': 'Size', 'display_value': 'L'}, {'id': 39094, 'key': '_reduced_stock', 'value': '1', 'display_key': '_reduced_stock', 'display_value': '1'}], 'sku': 'FS00055.L', 'price': 29.9, 'parent_name': 'xxxxxxxx + 30 filter'},
{'id': 5100, 'name': 'xxxxxxxx + 30 filter', 'product_id': 6546, 'variation_id': 3352, 'quantity': 1, 'tax_class': '', 'subtotal': '110.89', 'subtotal_tax': '0.00', 'total': '29.90', 'total_tax': '0.00', 'taxes': [], 'meta_data': [{'id': 39083, 'key': 'pa_size', 'value': 'l', 'display_key': 'Size', 'display_value': 'L'}, {'id': 39094, 'key': '_reduced_stock', 'value': '1', 'display_key': '_reduced_stock', 'display_value': '1'}], 'sku': 'FS00055.L', 'price': 29.9, 'parent_name': 'xxxxxxxx + 30 filter'}] """,
""" [{'id': 4101, 'name': 'xxxxxxxx + 30 filter', 'product_id': 6546, 'variation_id': 3352, 'quantity': 1, 'tax_class': '', 'subtotal': '110.89', 'subtotal_tax': '0.00', 'total': '29.90', 'total_tax': '0.00', 'taxes': [], 'meta_data': [{'id': 39083, 'key': 'pa_size', 'value': 'l', 'display_key': 'Size', 'display_value': 'L'}, {'id': 39094, 'key': '_reduced_stock', 'value': '1', 'display_key': '_reduced_stock', 'display_value': '1'}], 'sku': 'FS00055.L', 'price': 29.9, 'parent_name': 'xxxxxxxx + 30 filter'},
{'id': 5101, 'name': 'xxxxxxxx + 30 filter', 'product_id': 6546, 'variation_id': 3352, 'quantity': 1, 'tax_class': '', 'subtotal': '110.89', 'subtotal_tax': '0.00', 'total': '29.90', 'total_tax': '0.00', 'taxes': [], 'meta_data': [{'id': 39083, 'key': 'pa_size', 'value': 'l', 'display_key': 'Size', 'display_value': 'L'}, {'id': 39094, 'key': '_reduced_stock', 'value': '1', 'display_key': '_reduced_stock', 'display_value': '1'}], 'sku': 'FS00055.L', 'price': 29.9, 'parent_name': 'xxxxxxxx + 30 filter'}] """,
]
})
import dirtyjson
df['line_items'] = df['line_items'].apply(dirtyjson.loads)
#df['line_items'] = df['line_items'].str[0] # extract first item from list
df = df.explode('line_items').reset_index(drop=True) # extract all items from list
new_df = df['line_items'].apply(pd.Series)
print(new_df)
df = df.join(new_df)
del df['line_items']
print(df)
id name ... price parent_name
0 4099 xxxxxxxx + 30 filter ... 29.9 xxxxxxxx + 30 filter
1 5099 xxxxxxxx + 30 filter ... 29.9 xxxxxxxx + 30 filter
2 4100 xxxxxxxx + 30 filter ... 29.9 xxxxxxxx + 30 filter
3 5100 xxxxxxxx + 30 filter ... 29.9 xxxxxxxx + 30 filter
4 4101 xxxxxxxx + 30 filter ... 29.9 xxxxxxxx + 30 filter
5 5101 xxxxxxxx + 30 filter ... 29.9 xxxxxxxx + 30 filter
[6 rows x 15 columns]
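For cells that are valid Python literals, as the sample above appears to be, the standard library may be enough. One possible reason ast.literal_eval fails is the leading space in each cell, which some Python versions reject as unexpected indentation; that is an assumption, not something confirmed here. A minimal sketch that strips the whitespace first, with shortened sample rows made up for illustration:

```python
import ast
import pandas as pd

# Shortened stand-ins for the real cells, including the surrounding spaces
df = pd.DataFrame({
    "line_items": [
        " [{'id': 4099, 'name': 'a', 'meta_data': [{'key': 'pa_size', 'value': 'l'}]}] ",
        " [{'id': 4100, 'name': 'b', 'meta_data': []},"
        " {'id': 5100, 'name': 'c', 'meta_data': []}] ",
    ]
})

# Strip surrounding whitespace, then parse each cell as a Python literal
parsed = df["line_items"].str.strip().apply(ast.literal_eval)

# One dictionary per row, then one column per key
exploded = parsed.explode().reset_index(drop=True)
flat = pd.json_normalize(exploded.tolist())
print(flat)
```

If the real cells contain JSON-only tokens such as null or true, ast.literal_eval will not parse them and a tolerant parser like dirtyjson remains the safer choice.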

remove repeated values in dictionary

I want to remove repeated entries from a list of dictionaries after extracting the data I need, which is 'rate' and 'genre':
a=[{'movie': 'abc', 'rate': '9', 'origin': 'AU', 'genre': 'horror'},
{'movie': 'xyz', 'rate': '7', 'origin': 'NY', 'genre': 'romance'},
{'movie': 'jkl', 'rate': '9', 'origin': 'HK', 'genre': 'horror'},
{'movie': 'qwe', 'rate': '6', 'origin': 'HK', 'genre': 'comedy'},
{'movie': 'vbn', 'rate': '9', 'origin': 'BKK', 'genre': 'romance'}]
needed_data=[]
for test in a:
    x={}
    word=['rate','genre']
    for key,value in test.items():
        for words in word:
            if key == words:
                x[key] = value
    needed_data.append(x)
results = {}
filters=[]
for yy in needed_data:
    for key,value in yy.items():
        if value not in results.values():
            results[key] = value
    filters.append(results)
print(filters)
The output of the code above is
[{'rate': '9', 'genre': 'romance'},
{'rate': '9', 'genre': 'romance'},
{'rate': '9', 'genre': 'romance'},
{'rate': '9', 'genre': 'romance'},
{'rate': '9', 'genre': 'romance'}]
My desired output would be
[{'rate': '9', 'genre': 'horror'},
{'rate': '7', 'genre': 'romance'},
{'rate': '6', 'genre': 'comedy'},
{'rate': '9', 'genre': 'romance'}]
I would recommend using pandas for data processing:
import pandas as pd
df = pd.DataFrame(a)
df_dd= df[["genre", "rate"]].drop_duplicates()
new_a = df_dd.to_dict(orient="records")
print(new_a)
Output
[{'genre': 'horror', 'rate': '9.'},
{'genre': 'romance', 'rate': '7'},
{'genre': 'horror', 'rate': '9'},
{'genre': 'comedy', 'rate': '6'},
{'genre': 'romance', 'rate': '9'}]
Your data has both '9.' and '9' as strings. Do you want it that way? If they should count as the same rate, normalize the value when building the key:
z = {f"{float(x['rate']):.2f}-{x['genre']}": x for x in needed_data}
list(z.values())
Output
[{'rate': '9', 'genre': 'horror'},
{'rate': '7', 'genre': 'romance'},
{'rate': '6', 'genre': 'comedy'},
{'rate': '9', 'genre': 'romance'}]
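A plain-Python alternative is to remember which (rate, genre) pairs have already been seen and keep only the first occurrence of each. This is a sketch; the helper name unique_by is made up for the example, and it compares the strings exactly, so '9.' and '9' would be treated as different rates unless they are normalized first.

```python
def unique_by(records, keys):
    """Keep the first record for each distinct combination of the given keys."""
    seen = set()
    unique = []
    for rec in records:
        signature = tuple(rec[k] for k in keys)
        if signature not in seen:
            seen.add(signature)
            # Keep only the requested keys in the result
            unique.append({k: rec[k] for k in keys})
    return unique


a = [{'movie': 'abc', 'rate': '9', 'origin': 'AU', 'genre': 'horror'},
     {'movie': 'xyz', 'rate': '7', 'origin': 'NY', 'genre': 'romance'},
     {'movie': 'jkl', 'rate': '9', 'origin': 'HK', 'genre': 'horror'},
     {'movie': 'qwe', 'rate': '6', 'origin': 'HK', 'genre': 'comedy'},
     {'movie': 'vbn', 'rate': '9', 'origin': 'BKK', 'genre': 'romance'}]

result = unique_by(a, ['rate', 'genre'])
print(result)
```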
This is the easy way to do your task:
a=[{'movie': 'abc', 'rate': '9.', 'origin': 'AU', 'genre': 'horror'},
{'movie': 'xyz', 'rate': '7', 'origin': 'NY', 'genre': 'romance'},
{'movie': 'jkl', 'rate': '9', 'origin': 'HK', 'genre': 'horror'},
{'movie': 'qwe', 'rate': '6', 'origin': 'HK', 'genre': 'comedy'},
{'movie': 'vbn', 'rate': '9', 'origin': 'BKK', 'genre': 'romance'}]
c = []
for b in a:
    c.append({'rate':b['rate'],'genre':b['genre'] })
print(c)
So the Output will be:
[{'rate': '9.', 'genre': 'horror'}, {'rate': '7', 'genre': 'romance'}, {'rate': '9', 'genre': 'horror'}, {'rate': '6', 'genre': 'comedy'}, {'rate': '9', 'genre': 'romance'}]

Finding missing value in JSON using python

I want to separate the records in my dataset that are complete from those that are not.
To do that, I want to add a flag such as 'completed' to the JSON, as in the example output below.
This is the data that I have:
data=[{'id': 'abc001',
'demo':{'gender':'1',
'job':'6',
'area':'3',
'study':'3'},
'ex_data':{'fam':'small',
'scholar':'2'}},
{'id': 'abc002',
'demo':{'gender':'1',
'edu':'6',
'qual':'3',
'living':'3'},
'ex_data':{'fam':'',
'scholar':''}},
{'id': 'abc003',
'demo':{'gender':'1',
'edu':'6',
'area':'3',
'sal':'3'}
'ex_data':{'fam':'big',
'scholar':NaN}}]
How can I add the flag and also detect NaN and NULL in JSON?
Desired output:
Output=[{'id': 'abc001',
'completed':'yes',
'demo':{'gender':'1',
'job':'6',
'area':'3',
'study':'3'},
'ex_data':{'fam':'small',
'scholar':'2'}},
{'id': 'abc002',
'completed':'no',
'demo':{'gender':'1',
'edu':'6',
'qual':'3',
'living':'3'},
'ex_data':{'fam':'',
'scholar':''}},
{'id': 'abc003',
'completed':'no',
'demo':{'gender':'1',
'edu':'6',
'area':'3',
'sal':'3'}
'ex_data':{'fam':'big',
'scholar':NaN}}]
Something like this should work for you:
data = [
{
'id': 'abc001',
'demo': {
'gender': '1',
'job': '6',
'area': '3',
'study': '3'},
'ex_data': {'fam': 'small',
'scholar': '2'}
},
{
'id': 'abc002',
'demo': {
'gender': '1',
'edu': '6',
'qual': '3',
'living': '3'},
'ex_data': {'fam': '',
'scholar': ''}},
{
'id': 'abc003',
'demo': {
'gender': '1',
'edu': '6',
'area': '3',
'sal': '3'},
'ex_data': {'fam': 'big',
'scholar': None}
}
]
def browse_dict(dico):
    empty_values = 0
    for key in dico:
        if dico[key] is None or dico[key] == "":
            empty_values += 1
        if isinstance(dico[key], dict):
            for k in dico[key]:
                if dico[key][k] is None or dico[key][k] == "":
                    empty_values += 1
    if empty_values == 0:
        dico["completed"] = "yes"
    else:
        dico["completed"] = "no"
for d in data:
    browse_dict(d)
    print(d)
Output :
{'id': 'abc001', 'demo': {'gender': '1', 'job': '6', 'area': '3', 'study': '3'}, 'ex_data': {'fam': 'small', 'scholar': '2'}, 'completed': 'yes'}
{'id': 'abc002', 'demo': {'gender': '1', 'edu': '6', 'qual': '3', 'living': '3'}, 'ex_data': {'fam': '', 'scholar': ''}, 'completed': 'no'}
{'id': 'abc003', 'demo': {'gender': '1', 'edu': '6', 'area': '3', 'sal': '3'}, 'ex_data': {'fam': 'big', 'scholar': None}, 'completed': 'no'}
Note that I changed NaN to None, because what you are showing is most likely a Python dictionary, not a JSON file, since you are assigning it with data =.
A bare NaN is not valid in a Python dictionary literal; a missing value is normally written as None there.
If you have to convert your JSON to a dictionary, refer to the json module documentation.
Also, please check your dictionary syntax: several commas separating the entries are missing.
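If the data really does arrive as JSON text, Python's json module turns null into None and, by default, accepts the non-standard NaN token as float('nan'). Since NaN compares unequal to itself, it needs math.isnan rather than ==. A minimal sketch under that assumption; the helper names is_missing and flag_completed and the sample string are made up for illustration:

```python
import json
import math


def is_missing(value):
    """True for None, empty string, or a float NaN."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def flag_completed(record):
    """Add 'completed': 'yes'/'no' based on top-level and one-level-nested values."""
    missing = any(
        is_missing(v)
        for section in record.values()
        for v in (section.values() if isinstance(section, dict) else [section])
    )
    record["completed"] = "no" if missing else "yes"
    return record


# Hypothetical JSON text containing both NaN and null
raw = (
    '[{"id": "abc001", "ex_data": {"fam": "small", "scholar": "2"}},'
    ' {"id": "abc003", "ex_data": {"fam": "big", "scholar": NaN}},'
    ' {"id": "abc004", "ex_data": {"fam": null, "scholar": "1"}}]'
)
records = [flag_completed(r) for r in json.loads(raw)]
print([r["completed"] for r in records])
```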
You should try the following.
The input is
data = [{'demo': {'gender': '1', 'job': '6', 'study': '3', 'area': '3'}, 'id': 'abc001', 'ex_data': {'scholar': '2', 'fam': 'small'}}, {'demo': {'living': '3', 'gender': '1', 'qual': '3', 'edu': '6'}, 'id': 'abc002', 'ex_data': {'scholar': '', 'fam': ''}}, {'demo': {'gender': '1', 'area': '3', 'sal': '3', 'edu': '6'}, 'id': 'abc003', 'ex_data': {'scholar': None, 'fam': 'big'}}]
Also, a bare NaN is not defined in Python, so None is used instead.
for item in data:
    item["completed"] = 'yes'
    for key in item.keys():
        if isinstance(item[key],dict):
            for inner_key in item[key].keys():
                if (not item[key][inner_key]):
                    item["completed"] = "no"
                    break
        else:
            if (not item[key]):
                item["completed"] = "no"
                break
The Output will be
data = [{'demo': {'gender': '1', 'job': '6', 'study': '3', 'area': '3'}, 'completed': 'yes', 'id': 'abc001', 'ex_data': {'scholar': '2', 'fam': 'small'}}, {'demo': {'living': '3', 'edu': '6', 'qual': '3', 'gender': '1'}, 'completed': 'no', 'id': 'abc002', 'ex_data': {'scholar': '', 'fam': ''}}, {'demo': {'edu': '6', 'gender': '1', 'sal': '3', 'area': '3'}, 'completed': 'no', 'id': 'abc003', 'ex_data': {'scholar': None, 'fam': 'big'}}]

Convert list-of-dicts into tree

For two days I have been trying to convert a list of dicts into a tree.
list_of_dicts = [
    {'name': 'Category1', 'id': '7', 'parent_id': '7', 'level': '1'},
    {'name': 'Category3', 'id': '33', 'parent_id': '7', 'level': '2'},
    {'name': 'Category5', 'id': '334', 'parent_id': '33', 'level': '3'},
    {'name': 'Category10', 'id': '23', 'parent_id': '7', 'level': '2'},
    {'name': 'Category2', 'id': '8', 'parent_id': '8', 'level': '1'},
    {'name': 'Category6', 'id': '24', 'parent_id': '8', 'level': '2'},
]
For information: a top-level category (level 1) has its own id as its parent_id, while each child has the id of its parent as parent_id, plus its level.
As a first step, the list needs to be turned into something like a tree:
traversed_list = [
    {'name': 'Category1', 'id': '7', 'parent_id': '7', 'level': '1', 'children': [
        {'name': 'Category3', 'id': '33', 'parent_id': '7', 'level': '2', 'children': [
            {'name': 'Category5', 'id': '334', 'parent_id': '33', 'level': '3', 'children': []}]},
        {'name': 'Category10', 'id': '23', 'parent_id': '7', 'level': '2', 'children': []}]},
    {'name': 'Category2', 'id': '8', 'parent_id': '8', 'level': '1', 'children': [
        {'name': 'Category6', 'id': '24', 'parent_id': '8', 'level': '2', 'children': []}]},
]
The following code:
import copy
def treeify(lst):
    tree = [copy.deepcopy(cat) for cat in lst if cat['level'] == '1']
    for el in tree:
        el["children"] = []
    for i in xrange(len(lst)):
        for j in xrange(len(tree)):
            if lst[i]["parent_id"] == tree[j]["id"]:
                tree[j]["children"].append(copy.deepcopy(lst[i]))
    return tree
list_of_dicts = [
{'name':"Category1", 'id': '7', 'parent_id': '7', 'level': '1'},
{'name':"Category3", 'id': '33', 'parent_id': '7', 'level': '2'},
{'name':"Category5", 'id': '334', 'parent_id': '33', 'level': '3'},
{'name':"Category10", 'id': '23', 'parent_id': '7', 'level': '2'},
{'name':"Category2", 'id': '8', 'parent_id': '8', 'level': '1'},
{'name':"Category6", 'id': '24', 'parent_id': '8', 'level': '2'}
]
tree = treeify(list_of_dicts)
for d in tree:
    print d
prints
{'id': '7', 'parent_id': '7', 'children': [{'id': '7', 'parent_id': '7', 'name': 'Category1', 'level': '1'}, {'id': '33', 'parent_id': '7', 'name': 'Category3', 'level': '2'}, {'id': '23', 'parent_id': '7', 'name': 'Category10', 'level': '2'}], 'name': 'Category1', 'level': '1'}
{'id': '8', 'parent_id': '8', 'children': [{'id': '8', 'parent_id': '8', 'name': 'Category2', 'level': '1'}, {'id': '24', 'parent_id': '8', 'name': 'Category6', 'level': '2'}], 'name': 'Category2', 'level': '1'}
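As the printed result shows, the code above attaches only direct children of the top-level categories and also lists each top-level category as its own child. One way to get the fully nested structure is to index every node by id and attach each node to its parent in a single pass. The following is a Python 3 sketch, not the approach used above; the function name build_tree is made up, and it assumes every parent_id refers to an id present in the list.

```python
def build_tree(items):
    """Nest a flat list of category dicts under their parents."""
    # Shallow copy of every node, indexed by id, each with an empty children list
    nodes = {item["id"]: {**item, "children": []} for item in items}
    roots = []
    for node in nodes.values():
        parent_id = node["parent_id"]
        if parent_id == node["id"]:
            # Top-level categories reference themselves as their parent
            roots.append(node)
        else:
            # Raises KeyError if the parent is not in the list (assumed not to happen)
            nodes[parent_id]["children"].append(node)
    return roots


list_of_dicts = [
    {'name': "Category1", 'id': '7', 'parent_id': '7', 'level': '1'},
    {'name': "Category3", 'id': '33', 'parent_id': '7', 'level': '2'},
    {'name': "Category5", 'id': '334', 'parent_id': '33', 'level': '3'},
    {'name': "Category10", 'id': '23', 'parent_id': '7', 'level': '2'},
    {'name': "Category2", 'id': '8', 'parent_id': '8', 'level': '1'},
    {'name': "Category6", 'id': '24', 'parent_id': '8', 'level': '2'},
]

tree = build_tree(list_of_dicts)
for root in tree:
    print(root)
```

Because children are attached by reference, a node added to its parent early still receives its own children later in the loop, so the input order does not matter and any depth is handled.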
