I have the following pandas dataframe that was converted to string with to_string().
It was printed like this:
S T Q U X A D
02:36 06:00 06:00 06:00 06:30 09:46 07:56
02:37 06:10 06:15 06:15 06:40 09:48 08:00
12:00 11:00 12:00 12:00 07:43 12:00 18:03
13:15 13:00 13:15 13:15 07:50 13:15 18:08
14:00 14:00 14:00 14:00 14:00 19:00
15:15 15:00 14:15 15:15 15:15 19:05
16:15 16:00 15:15 16:15 16:15 20:15
17:15 17:00 17:15 17:15 17:15 20:17
18:15 21:22 21:19 19:55 18:15 20:18
19:15 21:24 21:21 19:58 19:15 20:19
The gaps are due to empty values in the dataframe. I would like to keep the column alignment, perhaps by replacing the empty values with tabs. I would also like to center-align the header line.
This wasn't printed in a terminal; it was sent to Telegram with a requests post call. I think, though, that it is just a print-formatting problem, independent of Telegram and the requests library.
The desired output would be like this:
S T Q U X A D
02:36 06:00 06:00 06:00 06:30 09:46 07:56
02:37 06:10 06:15 06:15 06:40 09:48 08:00
12:00 11:00 12:00 12:00 07:43 12:00 18:03
13:15 13:00 13:15 13:15 07:50 13:15 18:08
14:00 14:00 14:00 14:00 14:00 19:00
15:15 15:00 14:15 15:15 15:15 19:05
16:15 16:00 15:15 16:15 16:15 20:15
17:15 17:00 17:15 17:15 17:15 20:17
18:15 21:22 21:19 19:55 18:15 20:18
19:15 21:24 21:21 19:58 19:15 20:19
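You can use the DataFrame's style.set_properties to set some of these options, for example:
df.style.set_properties(**{'text-align': 'center'})
Note that Styler settings only affect HTML rendering (for example Styler.to_html()); they do not change the plain text returned by to_string().
Read more here:
https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.set_properties.html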
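If the target is plain text (as in a Telegram message) rather than HTML, a minimal sketch, assuming the empty cells are NaN, is to let to_string() fill them with a visible placeholder via na_rep and center the column labels via justify. The sample frame is made up for illustration. If the empty cells are empty strings instead of NaN, replace them first, e.g. with df.replace('', np.nan). The columns can still look misaligned if the receiving app shows the text in a proportional font, so it likely also needs to be sent as monospaced (pre-formatted) text.

```python
import numpy as np
import pandas as pd

# Made-up sample with a missing value, standing in for the real data
df = pd.DataFrame({
    "S": ["02:36", "14:00"],
    "X": ["06:30", np.nan],
    "D": ["07:56", "19:00"],
})

# na_rep puts a visible placeholder in empty cells so later columns don't shift;
# justify="center" centers the column labels over their columns
text = df.to_string(index=False, na_rep="-", justify="center")
print(text)
```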
I'm trying to extract a table from a webpage and have tried a number of alternatives, but the table always seems to remain empty.
Two of what I thought were the most promising attempts are below. Any means of extracting the data from the webpage would be helpful. I have also included a screenshot of the table I want to extract.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Chrome()
browser.set_window_size(1120, 550)
# URL of the page to scrape
url = 'https://www.flightradar24.com/data/aircraft/ja11jc'
browser.get(url)
element = WebDriverWait(browser, 3).until(
    EC.presence_of_element_located((By.ID, "tbl-datatable"))
)
data = element.get_attribute('tbl-datatable')
print(data)
browser.quit()
or alternatively,
# Import libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
# URL of the page to scrape
url = 'https://www.flightradar24.com/data/aircraft/ja11jc'
# Download the page
page = requests.get(url)
# Parse the HTML with the lxml parser
soup = BeautifulSoup(page.text, 'lxml')
soup
# Find the <table> element by its id
table1 = soup.find("table", id='tbl-datatable')
table1
# Collect every column title from the <th> tags
headers = []
for i in table1.find_all('th'):
    title = i.text
    headers.append(title)
# Create an empty dataframe with those columns
mydata = pd.DataFrame(columns = headers)
# Fill mydata row by row, skipping the header row
for j in table1.find_all('tr')[1:]:
    row_data = j.find_all('td')
    row = [i.text for i in row_data]
    length = len(mydata)
    mydata.loc[length] = row
As a best practice, your first shot at scraping table data should be pandas.read_html(): it works in most cases, needs adjustments in some, and fails only in specific ones.
The issue here is that requests needs a User-Agent header to avoid a 403 response, so we have to help pandas with that:
pd.read_html(
    requests.get('http://www.flightradar24.com/data/aircraft/ja11jc',
                 headers={'User-Agent': 'some user agent string'}).text
)[0]
Now the table can be scraped, but it has to be transformed a bit, because that is what the browser does while rendering: .dropna(axis=1) drops columns that contain NaN values, and [:-1] slices off the last row, which contains irrelevant information:
pd.read_html(
    requests.get('http://www.flightradar24.com/data/aircraft/ja11jc',
                 headers={'User-Agent': 'some user agent string'}).text
)[0].dropna(axis=1)[:-1]
You could also use Selenium, give it some time.sleep(3) while the browser renders the table in its final form, and then process driver.page_source, but in my opinion that is a bit too much in this case.
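One detail worth checking: dropna(axis=1) uses how='any' by default, so it removes every column containing even one NaN, and it runs before [:-1] removes the footer row. The sketch below uses a small made-up frame (not the real page) to show how that differs from how='all', which drops only columns that are entirely empty.

```python
import numpy as np
import pandas as pd

# Made-up frame mimicking a scraped table: a fully empty layout column,
# a column that is empty only in the trailing footer row, and that footer row
raw = pd.DataFrame({
    "icon": [np.nan, np.nan, np.nan],
    "DATE": ["10 Dec 2022", "10 Dec 2022", "footer text"],
    "FLIGHT": ["JL3798", "JL3843", np.nan],
})

# how='any' (the default): drop every column with at least one NaN, then drop the last row
any_dropped = raw.dropna(axis=1)[:-1]
# how='all': drop only columns in which every value is NaN, then drop the last row
all_dropped = raw.dropna(axis=1, how="all")[:-1]

print(list(any_dropped.columns))
print(list(all_dropped.columns))
```

Here FLIGHT disappears with the default because of the NaN in the footer row; slicing off the footer first, or using how='all', keeps it.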
Example
import pandas as pd
import requests
df = pd.read_html(
requests.get('http://www.flightradar24.com/data/aircraft/ja11jc',
headers={'User-Agent': 'some user agent string'}).text
)[0].dropna(axis=1)[:-1]
df.columns = ['DATE','FROM', 'TO', 'FLIGHT', 'FLIGHT TIME', 'STD', 'ATD', 'STA','STATUS']
df
Output
    DATE         FROM               TO                 FLIGHT  FLIGHT TIME  STD    ATD    STA    STATUS
0   10 Dec 2022  Tokunoshima (TKN)  Kagoshima (KOJ)    JL3798  —            10:00  —      11:10  Scheduled
1   10 Dec 2022  Amami (ASJ)        Tokunoshima (TKN)  JL3843  —            08:55  —      09:30  Scheduled
... ...          ...                ...                ...     ...          ...    ...    ...    ...
58  03 Dec 2022  Amami (ASJ)        Kagoshima (KOJ)    JL3724  0:56         01:45  02:02  02:50  Landed 02:58
59  03 Dec 2022  Kagoshima (KOJ)    Amami (ASJ)        JL3725  1:06         00:00  00:09  01:15  Landed 01:14
Try this
import requests
import pandas as pd
response = requests.get('https://www.flightradar24.com/data/aircraft/ja11jc', headers={'User-agent': 'Mozilla/5.0'})
df = pd.read_html(response.text)[0][:-1]
df = df.dropna(axis=1, how='all')
print(df.to_string(index=False))
OUTPUT:
JL3798 10 Dec 2022 - Scheduled STD 10:00 ATD — STA 11:10 FROM Tokunoshima (TKN) TO Kagoshima (KOJ) 10 Dec 2022 Tokunoshima (TKN) Kagoshima (KOJ) JL3798 — 10:00 — 11:10 Scheduled Play
JL3843 10 Dec 2022 - Scheduled STD 08:55 ATD — STA 09:30 FROM Amami (ASJ) TO Tokunoshima (TKN) 10 Dec 2022 Amami (ASJ) Tokunoshima (TKN) JL3843 — 08:55 — 09:30 Scheduled Play
JL3844 10 Dec 2022 - Scheduled STD 07:45 ATD — STA 08:15 FROM Tokunoshima (TKN) TO Amami (ASJ) 10 Dec 2022 Tokunoshima (TKN) Amami (ASJ) JL3844 — 07:45 — 08:15 Scheduled Play
JL3710 10 Dec 2022 - Scheduled STD 06:45 ATD — STA 07:15 FROM Okierabu (OKE) TO Tokunoshima (TKN) 10 Dec 2022 Okierabu (OKE) Tokunoshima (TKN) JL3710 — 06:45 — 07:15 Scheduled Play
JL3715 10 Dec 2022 - Estimated departure 05:25 STD 05:25 ATD — STA 06:15 FROM Okinawa (OKA) TO Okierabu (OKE) 10 Dec 2022 Okinawa (OKA) Okierabu (OKE) JL3715 — 05:25 — 06:15 Estimated departure 05:25 Play
JL3716 10 Dec 2022 - Scheduled STD 03:55 ATD — STA 04:45 FROM Okierabu (OKE) TO Okinawa (OKA) 10 Dec 2022 Okierabu (OKE) Okinawa (OKA) JL3716 — 03:55 — 04:45 Scheduled Play
JL3711 10 Dec 2022 - Scheduled STD 02:55 ATD — STA 03:25 FROM Tokunoshima (TKN) TO Okierabu (OKE) 10 Dec 2022 Tokunoshima (TKN) Okierabu (OKE) JL3711 — 02:55 — 03:25 Scheduled Play
JL3841 10 Dec 2022 - Scheduled STD 01:50 ATD — STA 02:25 FROM Amami (ASJ) TO Tokunoshima (TKN) 10 Dec 2022 Amami (ASJ) Tokunoshima (TKN) JL3841 — 01:50 — 02:25 Scheduled Play
JL3842 10 Dec 2022 - Scheduled STD 00:35 ATD — STA 01:05 FROM Tokunoshima (TKN) TO Amami (ASJ) 10 Dec 2022 Tokunoshima (TKN) Amami (ASJ) JL3842 — 00:35 — 01:05 Scheduled Play
JL3791 09 Dec 2022 - Estimated departure 22:45 STD 22:45 ATD — STA 00:05 FROM Kagoshima (KOJ) TO Tokunoshima (TKN) 09 Dec 2022 Kagoshima (KOJ) Tokunoshima (TKN) JL3791 — 22:45 — 00:05 Estimated departure 22:45 Play
JL3734 09 Dec 2022 0:57 Landed 09:38 STD 08:40 ATD 08:41 STA 09:45 FROM Amami (ASJ) TO Kagoshima (KOJ) 09 Dec 2022 Amami (ASJ) Kagoshima (KOJ) JL3734 0:57 08:40 08:41 09:45 Landed 09:38 KML CSV Play
JL3836 09 Dec 2022 0:15 Landed 07:52 STD 07:40 ATD 07:37 STA 08:00 FROM Kikai (KKX) TO Amami (ASJ) 09 Dec 2022 Kikai (KKX) Amami (ASJ) JL3836 0:15 07:40 07:37 08:00 Landed 07:52 KML CSV Play
JL3837 09 Dec 2022 0:11 Landed 07:00 STD 06:50 ATD 06:49 STA 07:10 FROM Amami (ASJ) TO Kikai (KKX) 09 Dec 2022 Amami (ASJ) Kikai (KKX) JL3837 0:11 06:50 06:49 07:10 Landed 07:00 KML CSV Play
JL3867 09 Dec 2022 0:46 Landed 05:45 STD 04:50 ATD 04:59 STA 05:50 FROM Okinawa (OKA) TO Amami (ASJ) 09 Dec 2022 Okinawa (OKA) Amami (ASJ) JL3867 0:46 04:50 04:59 05:50 Landed 05:45 KML CSV Play
JL3866 09 Dec 2022 0:30 Landed 04:15 STD 03:25 ATD 03:45 STA 04:10 FROM Yoronjima (RNJ) TO Okinawa (OKA) 09 Dec 2022 Yoronjima (RNJ) Okinawa (OKA) JL3866 0:30 03:25 03:45 04:10 Landed 04:15 KML CSV Play
JL3861 09 Dec 2022 0:35 Landed 02:59 STD 02:10 ATD 02:24 STA 02:55 FROM Amami (ASJ) TO Yoronjima (RNJ) 09 Dec 2022 Amami (ASJ) Yoronjima (RNJ) JL3861 0:35 02:10 02:24 02:55 Landed 02:59 KML CSV Play
JL3803 09 Dec 2022 - Unknown STD 01:12 ATD 01:12 STA 01:28 FROM Kikai (KKX) TO Amami (ASJ) 09 Dec 2022 Kikai (KKX) Amami (ASJ) JL3803 — 01:12 01:12 01:28 Unknown KML CSV Play
JL3830 09 Dec 2022 - Estimated departure 01:12 STD 01:10 ATD — STA 01:30 FROM Kikai (KKX) TO Amami (ASJ) 09 Dec 2022 Kikai (KKX) Amami (ASJ) JL3830 — 01:10 — 01:30 Estimated departure 01:12 Play
JL3831 09 Dec 2022 0:12 Landed 00:30 STD 00:20 ATD 00:18 STA 00:40 FROM Amami (ASJ) TO Kikai (KKX) 09 Dec 2022 Amami (ASJ) Kikai (KKX) JL3831 0:12 00:20 00:18 00:40 Landed 00:30 KML CSV Play
JL3721 08 Dec 2022 0:59 Landed 23:34 STD 22:25 ATD 22:35 STA 23:40 FROM Kagoshima (KOJ) TO Amami (ASJ) 08 Dec 2022 Kagoshima (KOJ) Amami (ASJ) JL3721 0:59 22:25 22:35 23:40 Landed 23:34 KML CSV Play
JL3734 08 Dec 2022 0:55 Landed 09:35 STD 08:40 ATD 08:41 STA 09:45 FROM Amami (ASJ) TO Kagoshima (KOJ) 08 Dec 2022 Amami (ASJ) Kagoshima (KOJ) JL3734 0:55 08:40 08:41 09:45 Landed 09:35 KML CSV Play
JL3836 08 Dec 2022 0:09 Landed 07:51 STD 07:40 ATD 07:42 STA 08:00 FROM Kikai (KKX) TO Amami (ASJ) 08 Dec 2022 Kikai (KKX) Amami (ASJ) JL3836 0:09 07:40 07:42 08:00 Landed 07:51 KML CSV Play
JL3837 08 Dec 2022 0:09 Landed 07:03 STD 06:50 ATD 06:54 STA 07:10 FROM Amami (ASJ) TO Kikai (KKX) 08 Dec 2022 Amami (ASJ) Kikai (KKX) JL3837 0:09 06:50 06:54 07:10 Landed 07:03 KML CSV Play
JL3867 08 Dec 2022 0:48 Landed 05:41 STD 04:50 ATD 04:53 STA 05:50 FROM Okinawa (OKA) TO Amami (ASJ) 08 Dec 2022 Okinawa (OKA) Amami (ASJ) JL3867 0:48 04:50 04:53 05:50 Landed 05:41 KML CSV Play
JL3866 08 Dec 2022 0:25 Landed 04:01 STD 03:25 ATD 03:36 STA 04:10 FROM Yoronjima (RNJ) TO Okinawa (OKA) 08 Dec 2022 Yoronjima (RNJ) Okinawa (OKA) JL3866 0:25 03:25 03:36 04:10 Landed 04:01 KML CSV Play
JL3861 08 Dec 2022 0:32 Landed 02:55 STD 02:10 ATD 02:24 STA 02:55 FROM Amami (ASJ) TO Yoronjima (RNJ) 08 Dec 2022 Amami (ASJ) Yoronjima (RNJ) JL3861 0:32 02:10 02:24 02:55 Landed 02:55 KML CSV Play
JL3830 08 Dec 2022 0:09 Landed 01:20 STD 01:10 ATD 01:11 STA 01:30 FROM Kikai (KKX) TO Amami (ASJ) 08 Dec 2022 Kikai (KKX) Amami (ASJ) JL3830 0:09 01:10 01:11 01:30 Landed 01:20 KML CSV Play
JL3831 08 Dec 2022 0:08 Landed 00:28 STD 00:20 ATD 00:20 STA 00:40 FROM Amami (ASJ) TO Kikai (KKX) 08 Dec 2022 Amami (ASJ) Kikai (KKX) JL3831 0:08 00:20 00:20 00:40 Landed 00:28 KML CSV Play
JL3721 07 Dec 2022 0:57 Landed 23:29 STD 22:25 ATD 22:31 STA 23:40 FROM Kagoshima (KOJ) TO Amami (ASJ) 07 Dec 2022 Kagoshima (KOJ) Amami (ASJ) JL3721 0:57 22:25 22:31 23:40 Landed 23:29 KML CSV Play
JL3772 07 Dec 2022 0:30 Landed 09:32 STD 09:00 ATD 09:02 STA 09:30 FROM Tanegashima (TNE) TO Kagoshima (KOJ) 07 Dec 2022 Tanegashima (TNE) Kagoshima (KOJ) JL3772 0:30 09:00 09:02 09:30 Landed 09:32 KML CSV Play
JL3777 07 Dec 2022 0:27 Landed 08:25 STD 07:50 ATD 07:58 STA 08:30 FROM Kagoshima (KOJ) TO Tanegashima (TNE) 07 Dec 2022 Kagoshima (KOJ) Tanegashima (TNE) JL3777 0:27 07:50 07:58 08:30 Landed 08:25 KML CSV Play
JL3784 07 Dec 2022 0:55 Landed 07:13 STD 06:05 ATD 06:18 STA 07:10 FROM Kikai (KKX) TO Kagoshima (KOJ) 07 Dec 2022 Kikai (KKX) Kagoshima (KOJ) JL3784 0:55 06:05 06:18 07:10 Landed 07:13 KML CSV Play
JL3785 07 Dec 2022 1:00 Landed 05:32 STD 04:25 ATD 04:31 STA 05:35 FROM Kagoshima (KOJ) TO Kikai (KKX) 07 Dec 2022 Kagoshima (KOJ) Kikai (KKX) JL3785 1:00 04:25 04:31 05:35 Landed 05:32 KML CSV Play
JL3762 07 Dec 2022 0:26 Landed 03:37 STD 03:10 ATD 03:10 STA 03:45 FROM Tanegashima (TNE) TO Kagoshima (KOJ) 07 Dec 2022 Tanegashima (TNE) Kagoshima (KOJ) JL3762 0:26 03:10 03:10 03:45 Landed 03:37 KML CSV Play
JL3763 07 Dec 2022 0:24 Landed 02:32 STD 02:00 ATD 02:08 STA 02:40 FROM Kagoshima (KOJ) TO Tanegashima (TNE) 07 Dec 2022 Kagoshima (KOJ) Tanegashima (TNE) JL3763 0:24 02:00 02:08 02:40 Landed 02:32 KML CSV Play
JL3780 07 Dec 2022 0:51 Landed 01:13 STD 00:15 ATD 00:22 STA 01:20 FROM Kikai (KKX) TO Kagoshima (KOJ) 07 Dec 2022 Kikai (KKX) Kagoshima (KOJ) JL3780 0:51 00:15 00:22 01:20 Landed 01:13 KML CSV Play
JL3783 06 Dec 2022 1:00 Landed 23:45 STD 22:35 ATD 22:45 STA 23:45 FROM Kagoshima (KOJ) TO Kikai (KKX) 06 Dec 2022 Kagoshima (KOJ) Kikai (KKX) JL3783 1:00 22:35 22:45 23:45 Landed 23:45 KML CSV Play
JL3772 06 Dec 2022 0:25 Landed 09:15 STD 09:00 ATD 08:50 STA 09:30 FROM Tanegashima (TNE) TO Kagoshima (KOJ) 06 Dec 2022 Tanegashima (TNE) Kagoshima (KOJ) JL3772 0:25 09:00 08:50 09:30 Landed 09:15 KML CSV Play
JL3777 06 Dec 2022 0:25 Landed 08:15 STD 07:50 ATD 07:50 STA 08:30 FROM Kagoshima (KOJ) TO Tanegashima (TNE) 06 Dec 2022 Kagoshima (KOJ) Tanegashima (TNE) JL3777 0:25 07:50 07:50 08:30 Landed 08:15 KML CSV Play
JL3784 06 Dec 2022 0:55 Landed 07:09 STD 06:05 ATD 06:14 STA 07:10 FROM Kikai (KKX) TO Kagoshima (KOJ) 06 Dec 2022 Kikai (KKX) Kagoshima (KOJ) JL3784 0:55 06:05 06:14 07:10 Landed 07:09 KML CSV Play
JL3785 06 Dec 2022 0:59 Landed 05:34 STD 04:25 ATD 04:35 STA 05:35 FROM Kagoshima (KOJ) TO Kikai (KKX) 06 Dec 2022 Kagoshima (KOJ) Kikai (KKX) JL3785 0:59 04:25 04:35 05:35 Landed 05:34 KML CSV Play
JL3762 06 Dec 2022 0:22 Landed 03:48 STD 03:10 ATD 03:26 STA 03:45 FROM Tanegashima (TNE) TO Kagoshima (KOJ) 06 Dec 2022 Tanegashima (TNE) Kagoshima (KOJ) JL3762 0:22 03:10 03:26 03:45 Landed 03:48 KML CSV Play
JL3763 06 Dec 2022 0:24 Landed 02:47 STD 02:00 ATD 02:23 STA 02:40 FROM Kagoshima (KOJ) TO Tanegashima (TNE) 06 Dec 2022 Kagoshima (KOJ) Tanegashima (TNE) JL3763 0:24 02:00 02:23 02:40 Landed 02:47 KML CSV Play
JL3780 06 Dec 2022 0:54 Landed 01:38 STD 00:15 ATD 00:44 STA 01:20 FROM Kikai (KKX) TO Kagoshima (KOJ) 06 Dec 2022 Kikai (KKX) Kagoshima (KOJ) JL3780 0:54 00:15 00:44 01:20 Landed 01:38 KML CSV Play
JL3783 05 Dec 2022 1:04 Landed 23:45 STD 22:35 ATD 22:41 STA 23:45 FROM Kagoshima (KOJ) TO Kikai (KKX) 05 Dec 2022 Kagoshima (KOJ) Kikai (KKX) JL3783 1:04 22:35 22:41 23:45 Landed 23:45 KML CSV Play
JL3686 05 Dec 2022 0:53 Landed 12:09 STD 11:10 ATD 11:16 STA 12:10 FROM Matsuyama (MYJ) TO Kagoshima (KOJ) 05 Dec 2022 Matsuyama (MYJ) Kagoshima (KOJ) JL3686 0:53 11:10 11:16 12:10 Landed 12:09 KML CSV Play
JL3687 05 Dec 2022 0:42 Landed 10:44 STD 09:45 ATD 10:02 STA 10:40 FROM Kagoshima (KOJ) TO Matsuyama (MYJ) 05 Dec 2022 Kagoshima (KOJ) Matsuyama (MYJ) JL3687 0:42 09:45 10:02 10:40 Landed 10:44 KML CSV Play
JL3808 05 Dec 2022 1:07 Landed 09:22 STD 07:50 ATD 08:15 STA 09:05 FROM Okierabu (OKE) TO Kagoshima (KOJ) 05 Dec 2022 Okierabu (OKE) Kagoshima (KOJ) JL3808 1:07 07:50 08:15 09:05 Landed 09:22 KML CSV Play
JL3809 05 Dec 2022 1:28 Landed 07:31 STD 05:55 ATD 06:03 STA 07:20 FROM Kagoshima (KOJ) TO Okierabu (OKE) 05 Dec 2022 Kagoshima (KOJ) Okierabu (OKE) JL3809 1:28 05:55 06:03 07:20 Landed 07:31 KML CSV Play
JL3785 05 Dec 2022 - Canceled STD 04:25 ATD — STA 05:35 FROM Kagoshima (KOJ) TO Kikai (KKX) 05 Dec 2022 Kagoshima (KOJ) Kikai (KKX) JL3785 — 04:25 — 05:35 Canceled Play
JL3762 05 Dec 2022 0:25 Landed 03:49 STD 03:10 ATD 03:24 STA 03:45 FROM Tanegashima (TNE) TO Kagoshima (KOJ) 05 Dec 2022 Tanegashima (TNE) Kagoshima (KOJ) JL3762 0:25 03:10 03:24 03:45 Landed 03:49 KML CSV Play
JL3763 05 Dec 2022 0:28 Landed 02:48 STD 02:00 ATD 02:20 STA 02:40 FROM Kagoshima (KOJ) TO Tanegashima (TNE) 05 Dec 2022 Kagoshima (KOJ) Tanegashima (TNE) JL3763 0:28 02:00 02:20 02:40 Landed 02:48 KML CSV Play
JL3780 05 Dec 2022 0:51 Landed 01:34 STD 00:15 ATD 00:42 STA 01:20 FROM Kikai (KKX) TO Kagoshima (KOJ) 05 Dec 2022 Kikai (KKX) Kagoshima (KOJ) JL3780 0:51 00:15 00:42 01:20 Landed 01:34 KML CSV Play
JL3783 04 Dec 2022 1:03 Landed 23:45 STD 22:35 ATD 22:42 STA 23:45 FROM Kagoshima (KOJ) TO Kikai (KKX) 04 Dec 2022 Kagoshima (KOJ) Kikai (KKX) JL3783 1:03 22:35 22:42 23:45 Landed 23:45 KML CSV Play
JL3464 04 Dec 2022 0:57 Landed 08:10 STD 07:10 ATD 07:13 STA 08:15 FROM Amami (ASJ) TO Kagoshima (KOJ) 04 Dec 2022 Amami (ASJ) Kagoshima (KOJ) JL3464 0:57 07:10 07:13 08:15 Landed 08:10 KML CSV Play
JL3465 04 Dec 2022 1:05 Landed 06:31 STD 05:25 ATD 05:26 STA 06:40 FROM Kagoshima (KOJ) TO Amami (ASJ) 04 Dec 2022 Kagoshima (KOJ) Amami (ASJ) JL3465 1:05 05:25 05:26 06:40 Landed 06:31 KML CSV Play
JL3724 04 Dec 2022 0:53 Landed 02:43 STD 01:45 ATD 01:50 STA 02:50 FROM Amami (ASJ) TO Kagoshima (KOJ) 04 Dec 2022 Amami (ASJ) Kagoshima (KOJ) JL3724 0:53 01:45 01:50 02:50 Landed 02:43 KML CSV Play
JL3725 04 Dec 2022 0:56 Landed 01:02 STD 00:00 ATD 00:05 STA 01:15 FROM Kagoshima (KOJ) TO Amami (ASJ) 04 Dec 2022 Kagoshima (KOJ) Amami (ASJ) JL3725 0:56 00:00 00:05 01:15 Landed 01:02 KML CSV Play
JL3724 03 Dec 2022 0:56 Landed 02:58 STD 01:45 ATD 02:02 STA 02:50 FROM Amami (ASJ) TO Kagoshima (KOJ) 03 Dec 2022 Amami (ASJ) Kagoshima (KOJ) JL3724 0:56 01:45 02:02 02:50 Landed 02:58 KML CSV Play
JL3725 03 Dec 2022 1:06 Landed 01:14 STD 00:00 ATD 00:09 STA 01:15 FROM Kagoshima (KOJ) TO Amami (ASJ) 03 Dec 2022 Kagoshima (KOJ) Amami (ASJ) JL3725 1:06 00:00 00:09 01:15 Landed 01:14 KML CSV Play
I have a dataframe with thousands of rows and a datetime column:
I would like to adjust the times, roughly 00 ± 15 -> 00 and 30 ± 15 -> 30.
More precisely, minutes 46 through 15 should change to 00 and minutes 16 through 45 should change to 30, taking care of the ±1 hour rollover (e.g. 00:59 -> 01:00).
datetime
2022/11/15 00:29
2022/11/15 00:29
2022/11/15 00:29
2022/11/15 00:59
2022/11/15 00:59
2022/11/15 00:59
2022/11/15 01:35
2022/11/15 01:35
2022/11/15 01:35
2022/11/15 02:01
2022/11/15 02:01
2022/11/15 02:01
2022/11/15 02:45
2022/11/15 02:45
2022/11/15 02:45
2022/11/15 02:48
2022/11/15 02:48
2022/11/15 02:48
After adjustment, it would become
datetime
2022/11/15 00:30
2022/11/15 00:30
2022/11/15 00:30
2022/11/15 01:00
2022/11/15 01:00
2022/11/15 01:00
2022/11/15 01:30
2022/11/15 01:30
2022/11/15 01:30
2022/11/15 02:00
2022/11/15 02:00
2022/11/15 02:00
2022/11/15 02:30
2022/11/15 02:30
2022/11/15 02:30
2022/11/15 03:00
2022/11/15 03:00
2022/11/15 03:00
Use Series.dt.ceil to 15 minutes and then Series.dt.floor to 30 minutes:
df['datetime'] = pd.to_datetime(df['datetime']).dt.ceil('15Min').dt.floor('30Min')
print (df)
datetime
0 2022-11-15 00:30:00
1 2022-11-15 00:30:00
2 2022-11-15 00:30:00
3 2022-11-15 01:00:00
4 2022-11-15 01:00:00
5 2022-11-15 01:00:00
6 2022-11-15 01:30:00
7 2022-11-15 01:30:00
8 2022-11-15 01:30:00
9 2022-11-15 02:00:00
10 2022-11-15 02:00:00
11 2022-11-15 02:00:00
12 2022-11-15 02:30:00
13 2022-11-15 02:30:00
14 2022-11-15 02:30:00
15 2022-11-15 03:00:00
16 2022-11-15 03:00:00
17 2022-11-15 03:00:00
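To check the boundaries of the rule, here is a small sketch on made-up timestamps at the edges of each range, including one that rolls over into the next day:

```python
import pandas as pd

# Made-up boundary cases for the rule: minutes 46..15 -> :00, 16..45 -> :30
s = pd.Series(pd.to_datetime([
    "2022/11/15 00:15", "2022/11/15 00:16", "2022/11/15 00:45",
    "2022/11/15 00:46", "2022/11/15 23:50",
], format="%Y/%m/%d %H:%M"))

# ceil to the next quarter hour, then floor to the half hour
adjusted = s.dt.ceil("15min").dt.floor("30min")
result = adjusted.dt.strftime("%Y/%m/%d %H:%M").tolist()
print(result)
```

The ceil step pushes 46..59 up to the next hour and 16..29 up to :30, and the floor step pulls :15 and :45 back down, which is why the two calls together reproduce the 46..15 / 16..45 split.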
So I created two different DataFrame tables and integrated them into a tkinter GUI.
The first table looks like this:
Entry  Start             Finish            Total Time (Hour)  Status       Reason for Stoppage
1      23.05.2020 07:30  23.05.2020 08:30  01:00              MANUFACTURE
2      23.05.2020 08:30  23.05.2020 12:00  03:30              MANUFACTURE
3      23.05.2020 12:00  23.05.2020 13:00  01:00              STOPPAGE     MALFUNCTION
4      23.05.2020 13:00  23.05.2020 13:45  00:45              MANUFACTURE
5      23.05.2020 13:45  23.05.2020 17:30  03:45              MANUFACTURE
And the second table looks like this:
Start  Finish  Reason for Stoppage
10:00  10:15   Coffee Break
12:00  12:30   Lunch Break
15:00  15:15   Coffee Break
The main task is to combine these tables into a new one, with the rows arranged by time. The program has to create the new rows itself so that every start and finish time appears in the table. But I just can't get this by concatenating or merging them.
The third table has to look like this:
Entry  Start             Finish            Total Time (Hour)  Status       Reason for Stoppage
1      23.05.2020 07:30  23.05.2020 08:30  01:00              MANUFACTURE
2      23.05.2020 08:30  23.05.2020 10:00  01:30              MANUFACTURE
3      23.05.2020 10:00  23.05.2020 10:15  00:15              STOPPAGE     Coffee Break
4      23.05.2020 10:15  23.05.2020 12:00  01:45              MANUFACTURE
5      23.05.2020 12:00  23.05.2020 12:30  00:30              STOPPAGE     Lunch Break
6      23.05.2020 12:30  23.05.2020 13:00  00:30              MANUFACTURE
7      23.05.2020 13:00  23.05.2020 13:45  00:45              STOPPAGE     MALFUNCTION
8      23.05.2020 13:45  23.05.2020 15:00  01:15              MANUFACTURE
9      23.05.2020 15:00  23.05.2020 15:15  00:15              STOPPAGE     Coffee Break
10     23.05.2020 15:15  23.05.2020 17:30  02:15              MANUFACTURE
I hope I explained the problem clearly. Thanks in advance.
from tkinter import *
import tkinter as tk
from tkinter import ttk
from pandastable import Table
import pandas as pd
import numpy as np
root = tk.Tk()
# window title, Turkish for "Work and Break Hours"
root.title("Çalışma Ve Mola Saatleri")
root.geometry("1800x1600")
work = {"Entry": ["1", "2", "3", "4", "5"],
        "Start": ["23.05.2020 07:30", "23.05.2020 08:30",
                  "23.05.2020 12:00", "23.05.2020 13:00", "23.05.2020 13:45"],
        "Finish": ["23.05.2020 08:30", "23.05.2020 12:00",
                   "23.05.2020 13:00", "23.05.2020 13:45", "23.05.2020 17:30"],
        "Total Time (Hour)": ["01:00", "03:30", "01:00", "00:45", "03:45"],
        "Status": ["MANUFACTURE", "MANUFACTURE", "STOPPAGE", "MANUFACTURE", "MANUFACTURE"],
        "Reason For Stoppage": [" ", " ", "MALFUNCTION", " ", " "]}
graph1=pd.DataFrame(work)
frame=tk.Frame(root)
frame.place(width=200)
frame.pack(anchor=W,padx=100,pady=50,ipadx=120,ipady=30)
pt=Table(frame,dataframe=graph1)
pt.show()
Break={"Start":["10:00","12:00","15:00"],
"Finish":["10:15","12:30","15:15"],
"Reason For Stoppage":["Coffee Break","Lunch Break","Coffee Break"]}
graph2=pd.DataFrame(Break)
frame2=tk.Frame(root)
frame2.place(width=100,height=50)
frame2.pack(anchor=NE,padx=150,ipadx=20,ipady=10)
pt2=Table(frame2,dataframe=graph2)
pt2.show()
graph3=pd.concat([graph1,graph2])
frame3=tk.Frame(root)
frame3.place()
frame3.pack(anchor=SW,padx=100,ipadx=120,ipady=500)
pt3=Table(frame3,dataframe=graph3)
pt3.show()
root.mainloop()
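pd.concat only stacks the two frames on top of each other; it does not split any work interval. A minimal sketch of the splitting step, assuming every break falls on the same day as the work rows and lies fully inside one work row, is to cut each work row at the break boundaries inside it and mark the pieces covered by a break as STOPPAGE with the break's reason. The shortened column name Reason is an assumption of this sketch. For 12:00 to 13:45 it keeps the first table's MALFUNCTION on 12:30 to 13:00, whereas the expected table above shows the malfunction at 13:00 to 13:45, so that part may need a different rule.

```python
import pandas as pd

FMT = "%d.%m.%Y %H:%M"

# Data copied from the question; "Reason" is a shortened column name for this sketch
work = pd.DataFrame({
    "Start": ["23.05.2020 07:30", "23.05.2020 08:30", "23.05.2020 12:00",
              "23.05.2020 13:00", "23.05.2020 13:45"],
    "Finish": ["23.05.2020 08:30", "23.05.2020 12:00", "23.05.2020 13:00",
               "23.05.2020 13:45", "23.05.2020 17:30"],
    "Status": ["MANUFACTURE", "MANUFACTURE", "STOPPAGE", "MANUFACTURE", "MANUFACTURE"],
    "Reason": ["", "", "MALFUNCTION", "", ""],
})
breaks = pd.DataFrame({
    "Start": ["10:00", "12:00", "15:00"],
    "Finish": ["10:15", "12:30", "15:15"],
    "Reason": ["Coffee Break", "Lunch Break", "Coffee Break"],
})

work["Start"] = pd.to_datetime(work["Start"], format=FMT)
work["Finish"] = pd.to_datetime(work["Finish"], format=FMT)
# Assumption: every break falls on the same day as the first work row
day = work["Start"].iloc[0].strftime("%d.%m.%Y")
breaks["Start"] = pd.to_datetime(day + " " + breaks["Start"], format=FMT)
breaks["Finish"] = pd.to_datetime(day + " " + breaks["Finish"], format=FMT)

rows = []
for w in work.itertuples(index=False):
    # Cut the work row at every break boundary that lies strictly inside it
    cuts = {w.Start, w.Finish}
    for b in breaks.itertuples(index=False):
        cuts.update(t for t in (b.Start, b.Finish) if w.Start < t < w.Finish)
    cuts = sorted(cuts)
    for start, finish in zip(cuts, cuts[1:]):
        # A piece fully covered by a break becomes a stoppage with the break's reason
        hit = breaks[(breaks["Start"] <= start) & (breaks["Finish"] >= finish)]
        if hit.empty:
            status, reason = w.Status, w.Reason
        else:
            status, reason = "STOPPAGE", hit["Reason"].iloc[0]
        minutes = int((finish - start).total_seconds() // 60)
        rows.append((start, finish, f"{minutes // 60:02d}:{minutes % 60:02d}", status, reason))

graph3 = pd.DataFrame(rows, columns=["Start", "Finish", "Total Time (Hour)",
                                     "Status", "Reason For Stoppage"])
graph3.insert(0, "Entry", list(range(1, len(graph3) + 1)))
graph3["Start"] = graph3["Start"].dt.strftime(FMT)
graph3["Finish"] = graph3["Finish"].dt.strftime(FMT)
print(graph3.to_string(index=False))
```

The resulting graph3 can then be shown the same way as the other tables, e.g. pt3 = Table(frame3, dataframe=graph3).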
I have a web page which I can access from my server. The contents of the web page are as below.
xys.server.com - /xys/reports/
[To Parent Directory]
3/4/2021 6:09 AM <dir> All_Master
3/4/2021 6:09 AM <dir> Hartland
3/4/2021 6:09 AM <dir> Hauppauge
3/4/2021 6:09 AM <dir> Hazelwood
2/15/2019 7:41 AM 58224 NetBackup Retention and Full Backup Occupancy.xlsx
1/1/2022 11:00 AM 23959 OpsCenter_All_Master_Server_Backup_Report_01_01_2022_10_00_45_259_AM_49.zip
2/1/2022 11:00 AM 18989 OpsCenter_All_Master_Server_Backup_Report_01_02_2022_10_00_04_813_AM_4.zip
3/1/2022 11:00 AM 18969 OpsCenter_All_Master_Server_Backup_Report_01_03_2022_10_00_24_664_AM_17.zip
4/1/2021 10:00 AM 21709 OpsCenter_All_Master_Server_Backup_Report_01_04_2021_10_00_02_266_AM_31.zip
5/1/2021 10:00 AM 27491 OpsCenter_All_Master_Server_Backup_Report_01_05_2021_10_00_27_655_AM_11.zip
6/1/2021 10:00 AM 21260 OpsCenter_All_Master_Server_Backup_Report_01_06_2021_10_00_54_053_AM_19.zip
7/1/2021 10:00 AM 19898 OpsCenter_All_Master_Server_Backup_Report_01_07_2021_10_00_12_544_AM_42.zip
8/1/2021 10:00 AM 22642 OpsCenter_All_Master_Server_Backup_Report_01_08_2021_10_00_28_384_AM_25.zip
9/1/2021 10:00 AM 19426 OpsCenter_All_Master_Server_Backup_Report_01_09_2021_10_00_43_851_AM_70.zip
10/1/2021 10:01 AM 19149 OpsCenter_All_Master_Server_Backup_Report_01_10_2021_10_01_00_422_AM_7.zip
11/1/2021 10:00 AM 19638 OpsCenter_All_Master_Server_Backup_Report_01_11_2021_10_00_15_326_AM_20.zip
12/1/2021 11:00 AM 19375 OpsCenter_All_Master_Server_Backup_Report_01_12_2021_10_00_29_943_AM_13.zip
1/2/2022 11:00 AM 22281 OpsCenter_All_Master_Server_Backup_Report_02_01_2022_10_00_45_803_AM_37.zip
2/2/2022 11:00 AM 19435 OpsCenter_All_Master_Server_Backup_Report_02_02_2022_10_00_05_577_AM_71.zip
3/2/2022 11:00 AM 19380 OpsCenter_All_Master_Server_Backup_Report_02_03_2022_10_00_24_973_AM_90.zip
4/2/2021 10:00 AM 21411 OpsCenter_All_Master_Server_Backup_Report_02_04_2021_10_00_03_069_AM_56.zip
Now I need to get the contents of this page in a structured format. I am using the requests module, but the data is highly unstructured and difficult to parse. The code is below:
import requests
req = requests.get(url)  # url points to the listing page above
print(req.content.decode('utf-8'))
The output looks like this:
<pre>[To Parent Directory]<br><br> 3/4/2021 6:09 AM <dir> All_Master<br> 3/4/2021 6:09 AM <dir> Hartland<br> 3/4/2021 6:09 AM <dir> Hauppauge<br> 3/4/2021 6:09 AM <dir> Hazelwood<br> 2/15/2019 7:41 AM 58224 NetBackup Retention and Full Backup Occupancy.xlsx<br> 1/1/2022 11:00 AM 23959 OpsCenter_All_Master_Server_Backup_Report_01_01_2022_10_00_45_259_AM_49.zip<br> 2/1/2022 11:00 AM 18989 OpsCenter_All_Master_Server_Backup_Report_01_02_2022_10_00_04_813_AM_4.zip<br> 3/1/2022 11:00 AM 18969 OpsCenter_All_Master_Server_Backup_Report_01_03_2022_10_00_24_664_AM_17.zip<br> 4/1/2021 10:00 AM 21709 OpsCenter_All_Master_Server_Backup_Report_01_04_2021_10_00_02_266_AM_31.zip<br> 5/1/2021 10:00 AM 27491 OpsCenter_All_Master_Server_Backup_Report_01_05_2021_10_00_27_655_AM_11.zip<br> 6/1/2021 10:00 AM 21260 OpsCenter_All_Master_Server_Backup_Report_01_06_2021_10_00_54_053_AM_19.zip<br> 7/1/2021 10:00 AM 19898 OpsCenter_All_Master_Server_Backup_Report_01_07_2021_10_00_12_544_AM_42.zip<br> 8/1/2021 10:00 AM 22642 OpsCenter_All_Master_Server_Backup_Report_01_08_2021_10_00_28_384_AM_25.zip<br> 9/1/2021 10:00 AM 19426 OpsCenter_All_Master_Server_Backup_Report_01_09_2021_10_00_43_851_AM_70.zip<br> 10/1/2021 10:01 AM 19149 OpsCenter_All_Master_Server_Backup_Report_01_10_2021_10_01_00_422_AM_7.zip<br> 11/1/2021 10:00 AM 19638 OpsCenter_All_Master_Server_Backup_Report_01_11_2021_10_00_15_326_AM_20.zip<br> 12/1/2021 11:00 AM 19375 OpsCenter_All_Master_Server_Backup_Report_01_12_2021_10_00_29_943_AM_13.zip<br> 1/2/2022 11:00 AM 22281 OpsCenter_All_Master_Server_Backup_Report_02_01_2022_10_00_45_803_AM_37.zip<br> 2/2/2022 11:00 AM 19435 OpsCenter_All_Master_Server_Backup_Report_02_02_2022_10_00_05_577_AM_71.zip<br> 3/2/2022 11:00 AM 19380 OpsCenter_All_Master_Server_Backup_Report_02_03_2022_10_00_24_973_AM_90.zip<br> 4/2/2021 10:00 AM 21411 OpsCenter_All_Master_Server_Backup_Report_02_04_2021_10_00_03_069_AM_56.zip<br> 5/2/2021 10:00 AM 24191 
OpsCenter_All_Master_Server_Backup_Report_02_05_2021_10_00_28_556_AM_14.zip<br> 6/2/2021 10:00 AM 21675 OpsCenter_All_Master_Server_Backup_Report_02_06_2021_10_00_54_962_AM_73.zip<br> 7/2/2021 10:00 AM 19954 OpsCenter_All_Master_Server_Backup_Report_02_07_2021_10_00_13_058_AM_31.zip<br> 8/2/2021 10:00 AM 21085 OpsCenter_All_Master_Server_Backup_Report_02_08_2021_10_00_28_778_AM_79.zip<br> 9/2/2021 10:00 AM 19691 OpsCenter_All_Master_Server_Backup_Report_02_09_2021_10_00_44_294_AM_5.zip<br> 10/2/2021 10:01 AM 23477 OpsCenter_All_Master_Server_Backup_Report_02_10_2021_10_01_00_793_AM_9.zip<br> 11/2/2021 10:00 AM 2
This is very unstructured.
Please suggest a way to turn this content into something structured that is easy to parse.
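Since the listing is an HTML <pre> block in which each entry ends with <br>, a minimal sketch is to split the body on <br> and match each piece with a regular expression. The sample body below is a shortened copy of the output above; the real page may also wrap names in <A HREF=...> links or escape <dir> as &lt;dir&gt;, which the pattern tries to tolerate but which has not been verified against this server. In practice you would pass req.text instead of the hard-coded string. Dates are assumed to be month/day/year, which appears to match the file names (e.g. 2/1/2022 for ..._01_02_2022_...).

```python
import html
import re

import pandas as pd

# Shortened copy of the response body from the question (stand-in for req.text)
body = (
    "<pre>[To Parent Directory]<br><br> 3/4/2021 6:09 AM <dir> All_Master<br>"
    " 3/4/2021 6:09 AM <dir> Hartland<br>"
    " 2/15/2019 7:41 AM 58224 NetBackup Retention and Full Backup Occupancy.xlsx<br>"
    " 1/1/2022 11:00 AM 23959 OpsCenter_All_Master_Server_Backup_Report_01_01_2022_10_00_45_259_AM_49.zip<br>"
)

# One listing entry: date, time, AM/PM, "<dir>" or a byte size, then the name
ENTRY = re.compile(
    r"^\s*(\d{1,2}/\d{1,2}/\d{4})\s+(\d{1,2}:\d{2})\s+([AP]M)\s+"
    r"(<dir>|&lt;dir&gt;|\d+)\s+(.+?)\s*$",
    re.IGNORECASE,
)

records = []
for line in re.split(r"<br\s*/?>", body, flags=re.IGNORECASE):
    m = ENTRY.match(line)
    if not m:
        continue  # skips "[To Parent Directory]" and empty pieces
    date, clock, ampm, size, name = m.groups()
    is_dir = not size.isdigit()
    records.append({
        "modified": f"{date} {clock} {ampm.upper()}",
        "is_dir": is_dir,
        "size": None if is_dir else int(size),
        # strip any tags around the name (e.g. <A HREF=...>) and decode entities
        "name": html.unescape(re.sub(r"<[^>]+>", "", name)).strip(),
    })

df = pd.DataFrame(records)
df["modified"] = pd.to_datetime(df["modified"], format="%m/%d/%Y %I:%M %p")
print(df)
```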
I have a dataframe with a timestamp column, and I am applying a lambda function to that column. When I do that, I get the following error:
row['date'] = pd.Timestamp(row['date']).apply(lambda t: t.replace(minute=15*(t.minute//15)).strftime('%H:%M'))
AttributeError: 'Timestamp' object has no attribute 'apply'
How can I do that in pandas?
Example:
input  output
05:06  05:00
05:20  05:15
09:18  09:15
10:03  10:00
It seems you need to_datetime to convert the column to datetimes instead of Timestamp, which converts only a scalar:
row['date'] = (pd.to_datetime(row['date'])
               .apply(lambda t: t.replace(minute=15*(t.minute//15)))
               .dt.strftime('%H:%M'))
EDIT:
print (df)
a b
0 05:06 05:00
1 05:20 05:15
2 09:18 09:15
3 10:03 10:00
df['date'] = (pd.to_datetime(df['a'])
              .apply(lambda t: t.replace(minute=15*(t.minute//15)))
              .dt.strftime('%H:%M'))
print (df)
a b date
0 05:06 05:00 05:00
1 05:20 05:15 05:15
2 09:18 09:15 09:15
3 10:03 10:00 10:00
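For the flooring case, pandas also has a vectorized method, Series.dt.floor, which truncates to the previous quarter hour like the replace(minute=15*(t.minute//15)) lambda, except that it also zeroes any seconds. A minimal sketch on the question's sample values:

```python
import pandas as pd

df = pd.DataFrame({"a": ["05:06", "05:20", "09:18", "10:03"]})

# Parse the "HH:MM" strings (the date part defaults to 1900-01-01),
# floor each value to the previous 15-minute mark, and format back to "HH:MM"
df["date"] = pd.to_datetime(df["a"], format="%H:%M").dt.floor("15min").dt.strftime("%H:%M")
print(df)
```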
Another solution, but with different output, because it rounds to the nearest 15 minutes instead of flooring:
df['date'] = pd.to_datetime(df['a']).dt.round('15min').dt.strftime('%H:%M')
To compare both outputs, you can use:
L = ['5:' + str(x).zfill(2) for x in range(60)]
df = pd.DataFrame({'a':L})
#print (df)
df['date1'] = pd.to_datetime(df['a']).dt.round('15min').dt.strftime('%H:%M')
df['date'] = (pd.to_datetime(df['a'])
              .apply(lambda t: t.replace(minute=15*(t.minute//15)))
              .dt.strftime('%H:%M'))
print (df)
a date1 date
0 5:00 05:00 05:00
1 5:01 05:00 05:00
2 5:02 05:00 05:00
3 5:03 05:00 05:00
4 5:04 05:00 05:00
5 5:05 05:00 05:00
6 5:06 05:00 05:00
7 5:07 05:00 05:00
8 5:08 05:15 05:00
9 5:09 05:15 05:00
10 5:10 05:15 05:00
11 5:11 05:15 05:00
12 5:12 05:15 05:00
13 5:13 05:15 05:00
14 5:14 05:15 05:00
15 5:15 05:15 05:15
16 5:16 05:15 05:15
17 5:17 05:15 05:15
18 5:18 05:15 05:15
19 5:19 05:15 05:15
20 5:20 05:15 05:15
21 5:21 05:15 05:15
22 5:22 05:15 05:15
23 5:23 05:30 05:15
24 5:24 05:30 05:15
25 5:25 05:30 05:15
26 5:26 05:30 05:15
27 5:27 05:30 05:15
28 5:28 05:30 05:15
29 5:29 05:30 05:15
30 5:30 05:30 05:30
31 5:31 05:30 05:30
32 5:32 05:30 05:30
33 5:33 05:30 05:30
34 5:34 05:30 05:30
35 5:35 05:30 05:30
36 5:36 05:30 05:30
37 5:37 05:30 05:30
38 5:38 05:45 05:30
39 5:39 05:45 05:30
40 5:40 05:45 05:30
41 5:41 05:45 05:30
42 5:42 05:45 05:30
43 5:43 05:45 05:30
44 5:44 05:45 05:30
45 5:45 05:45 05:45
46 5:46 05:45 05:45
47 5:47 05:45 05:45
48 5:48 05:45 05:45
49 5:49 05:45 05:45
50 5:50 05:45 05:45
51 5:51 05:45 05:45
52 5:52 05:45 05:45
53 5:53 06:00 05:45
54 5:54 06:00 05:45
55 5:55 06:00 05:45
56 5:56 06:00 05:45
57 5:57 06:00 05:45
58 5:58 06:00 05:45
59 5:59 06:00 05:45