I have a CSV file with about 500 links from Google Patents, and I iterate over them in Scrapy in order to download the CSV file from each link (there is a download link on each page). I have successfully implemented this, but what I want now is a way to discover, from the HTML markup, the name of each downloaded file so I can edit it with Python. An example link is this: https://patents.google.com/?q=O1C(%3dCCCC1C)C&oq=O1C(%3dCCCC1C)C. The name of the downloaded file is generated dynamically, so is there a way to find it out?
The name is just a timestamp: gp-search-20210816-142027.csv corresponds to 2021-08-16 14:20:27.
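Since the name follows that gp-search-YYYYMMDD-HHMMSS.csv pattern rather than anything exposed in the page markup, a minimal sketch for picking up each file right after it downloads (downloads_dir is a placeholder for wherever your browser or Scrapy saves files) is:
import glob
import os

def latest_gp_search_csv(downloads_dir):
    # return the most recently modified gp-search-*.csv, or None if there is none yet
    candidates = glob.glob(os.path.join(downloads_dir, "gp-search-*.csv"))
    return max(candidates, key=os.path.getmtime) if candidates else None

# hypothetical usage right after triggering a download:
# newest = latest_gp_search_csv(r"/path/to/downloads")
# df = pd.read_csv(newest, skiprows=1)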
As a demo of what you may want to do, and if I'm understanding the question correctly, you could follow the direction of the code below. Note: it is only a suggested idea, and it only scrapes the PDF links from the first page of results to illustrate the approach.
Code:
from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd
# next 12 lines have the job of getting the links of the PDF files from the URL below
# just the FIRST PAGE as a demo
url = "https://patents.google.com/?q=O1C(%3dCCCC1C)C&oq=O1C(%3dCCCC1C)C"
path = r'chromedriver'
options = webdriver.ChromeOptions()
options.add_argument("--start-maximized")
driver = webdriver.Chrome(path, options=options)
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
links = []
print('# this just gets the FIRST PAGE for a demo')
for link in soup.find_all('a', attrs={'class': 'pdfLink style-scope search-result-item'}):
    print(link['href'])
    links.append(link['href'])
# next 11 lines cover creating a dataframe from the downloaded CSV file on the Google search page
# and a 2nd frame for the links scraped. The two frames are eventually joined by a partial match
# of the 'result_link' from the first frame and a partial match of the filename of the pdf URL from the 2nd frame
pattern = r'/([A-Z]{2}\w{7})'
df = pd.read_csv('gp-search-20210816-190925.csv', skiprows=1)
df.columns = df.columns.str.replace(' ', '_')
df['partial_file_name'] = df['result_link'].str.extract(pattern)
df1 = pd.DataFrame(links, columns=['pdf_link'])
df1['partial_file_name'] = df1['pdf_link'].str.extract(pattern)
df = pd.concat([df, df1], axis=1)
df['filename'] = df['pdf_link'].str.extract(r'/([A-Z]{2}\w+)\.')
del df['partial_file_name']
print('\n\n', df.columns)
# 12 columns in total but for demo showing five
df[['filename', 'id', 'title', 'filing/creation_date', 'pdf_link']].head()
Outputs:
# this just gets the FIRST PAGE for a demo
https://patentimages.storage.googleapis.com/01/d1/77/6b0b7640eaccda/US7550931.pdf
https://patentimages.storage.googleapis.com/a2/32/15/69cf7713e8e2bf/JP2008525498A.pdf
https://patentimages.storage.googleapis.com/7e/6b/b7/001a8040e216ee/TWI686424B.pdf
https://patentimages.storage.googleapis.com/0f/14/fc/ecb56564f14f6b/WO2005009447A1.pdf
https://patentimages.storage.googleapis.com/95/fd/d5/ed4fe960bdec1c/KR20140096378A.pdf
https://patentimages.storage.googleapis.com/7e/29/01/231cc0813a0f6a/US5026677.pdf
https://patentimages.storage.googleapis.com/ff/f9/c9/7b775d6534d9cb/EP0628427A1.pdf
https://patentimages.storage.googleapis.com/bd/b3/ba/f38866e0b298e2/KR960004857B1.pdf
https://patentimages.storage.googleapis.com/79/e2/11/78aea87078687f/US5942486.pdf
https://patentimages.storage.googleapis.com/62/f5/da/f291e7552a45a6/US5142089.pdf
Index(['id', 'title', 'assignee', 'inventor/author', 'priority_date',
'filing/creation_date', 'publication_date', 'grant_date', 'result_link',
'representative_figure_link', 'pdf_link', 'filename'],
dtype='object')
filename id title filing/creation_date pdf_link
0 US7550931 US-7550931-B2 Controlled lighting methods and apparatus 2007-03-15 https://patentimages.stora.....ccda/US7550931.pdf
1 JP2008525498A JP-2008525498-A Enzyme modulators and therapy 2005-12-23 https://patentimages.stora....f/JP2008525498A.pdf
2 TWI686424B TW-I686424-B Polymer containing triazine ring and compositi... 2016-01-15 https://patentimages.storage.googleapis.com/7e...
3 WO2005009447A1 WO-2005009447-A1 Single dose fast dissolving azithromycin 2004-07-22 https://patentimages.storage.googleapis.com/0f...
4 KR20140096378A KR-20140096378-A Low chloride compositions of olefinically func... 2012-11-19 https://patentimages.storage.googleapis.com/95...
It shows a way you can get the filenames, links, and other fields aligned.
Related
I'm learning web scraping and found a fun challenge scraping a Javascript handlebars table from this page: Samsung Knox Devices
I eventually got the output I wanted, but I think it feels "hacky", so I'd appreciate any refinements to make it more elegant.
Desired output is a dataframe/csv table with columns = Device, Model_Nums, OS/Platform, Knox Version. Don't need anything else on the page, and I will split/expand and melt the Model Nums separately.
import pandas as pd
# Libraries for this task:
from bs4 import BeautifulSoup
from selenium import webdriver
# Because the target table is built using Javascript handlebars, we have to use Selenium and a webdriver
driver = webdriver.Edge("MY_PATH") # REPLACE WITH >YOUR< PATH!
# Point the driver at the target webpage:
driver.get('https://www.samsungknox.com/en/knox-platform/supported-devices')
# Get the page content
html = driver.page_source
# Typically I'd do something like: soup = BeautifulSoup(html, "lxml")
# Link below suggested the following, which works; I don't know if it matters
sp = BeautifulSoup(html, "html.parser")
# The 'table' here is really a bunch of nested divs
tables = sp.find_all("div", class_='table-row')
# https://www.angularfix.com/2021/09/how-to-extract-text-from-inside-div-tag.html
rows = []
for t in tables:
    row = t.text
    rows.append(row)
# These are the table-row div classes within each table-row from the output at the previous step that I want:
# div class="supported-devices pivot-fixed"
# div class="model"
# div class="operating system"
# div class="knox-version"
# Define div class names:
targets = ["supported-devices pivot-fixed", "model", "operating-system", "knox-version"]
# Create an empty list and loop through each target div class; append to list
data = []
for t in targets:
    hold = sp.find_all("div", class_=t)
    for h in hold:
        row = h.text
        data.append({'column': t, 'value': row})
df = pd.DataFrame(data)
# This feels like a hack, but I got stuck and it works, so \shrug/
# Create Series from filtered df based on 'column' value (corresponding to the the four "targets" above)
df_name = pd.Series(df['value'][df['column']=='supported-devices pivot-fixed']).reset_index(drop=True)
df_model = pd.Series(df['value'][df['column']=='model']).reset_index(drop=True)
df_os = pd.Series(df['value'][df['column']=='operating-system']).reset_index(drop=True)
df_knox = pd.Series(df['value'][df['column']=='knox-version']).reset_index(drop=True)
# Concatenate Series into df
df2 = pd.concat([df_name, df_model, df_os, df_knox], axis=1)
# Make the first row the column names:
new_header = df2.iloc[0] #grab the first row for the header
sam_knox_table = df2[1:] #take the data less the header row
sam_knox_table.columns = new_header #set the header row as the df header
# Bob's your uncle
sam_knox_table.to_csv('sam_knox.csv', index=False)
To scrape the text from the DEVICE and MODEL CODE columns, you need to create a list of the desired texts using a list comprehension, inducing WebDriverWait for visibility_of_all_elements_located(), and then write it into a DataFrame using pandas. You can use the following locator strategies:
Code Block:
driver.get("https://www.samsungknox.com/en/knox-platform/supported-devices")
devices = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.table-row:not(.table-header) > div.supported-devices")))]
models = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.table-row:not(.table-header) > div.model")))]
df = pd.DataFrame(data=list(zip(devices, models)), columns=['DEVICE', 'MODEL CODE'])
print(df)
driver.quit()
Console Output:
DEVICE MODEL CODE
0 Galaxy A42 5G SM-A426N, SM-A426U, SM-A4260, SM-A426B
1 Galaxy A52 SM-A525F, SM-A525M
2 Galaxy A52 5G SM-A5260
3 Galaxy A52 5G SM-A526U, SC-53B, SM-A526W, SM-A526B
4 Galaxy A52s 5G SM-A528B, SM-A528N
.. ... ...
371 Gear Sport SM-R600
372 Gear S3 Classic SM-R775V
373 Gear S3 Frontier SM-R765V
374 Gear S2 SM-R720, SM-R730A, SM-R730S, SM-R730V
375 Gear S2 Classic SM-R732, SM-R735, SM-R735A, SM-R735V, SM-R735S
[376 rows x 2 columns]
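If you also want the OS/Platform and Knox version columns from your desired output, the same pattern should extend (run before driver.quit()). This is only a sketch; it assumes the div classes operating-system and knox-version listed in your question still match the live page:
platforms = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.table-row:not(.table-header) > div.operating-system")))]
knox_versions = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.table-row:not(.table-header) > div.knox-version")))]
df = pd.DataFrame(data=list(zip(devices, models, platforms, knox_versions)), columns=['DEVICE', 'MODEL CODE', 'OS/PLATFORM', 'KNOX VERSION'])
print(df)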
I want to import data from the Tesla Quarterly Revenues table into a pandas DataFrame. I keep extracting the Annual Revenues table instead (the two tables sit side by side on the webpage). How do I need to modify my code to extract the Quarterly Revenues? Thanks in advance.
import requests
import pandas as pd
from bs4 import BeautifulSoup

html_data = requests.get('https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue').text
soup = BeautifulSoup(html_data, 'html5lib')
tesla_revenue = pd.DataFrame(columns=['Date', 'Revenue'])
for row in soup.find("tbody").find_all("tr"):
    col = row.find_all("td")
    if col != []:
        date = col[0].text
        revenue = col[1].text
        tesla_revenue = tesla_revenue.append({"Date": date, "Revenue": revenue}, ignore_index=True)
tesla_revenue
To stick with your method, you can use the following CSS selector. Note that I skip the first row, which contains the headers.
Py requests:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
r = requests.get('https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue')
soup = bs(r.content, 'html5lib')
tesla_revenue = pd.DataFrame(
    [{"Date": row.select_one('td:nth-of-type(1)').text,
      "Revenue": row.select_one('td:nth-of-type(2)').text}
     for row in soup.select('#style-1 div + div .historical_data_table tr')[1:]],
    columns=['Date', 'Revenue'])
print(tesla_revenue)
The selector #style-1 div + div .historical_data_table tr starts from a parent with id style-1, moves to a child div via the descendant combinator (a space), then to the adjacent div via the adjacent sibling combinator (+), then to the table element with class (.) historical_data_table, and finally, via the descendant combinator again, selects all the rows (tr) within it.
You can test the CSS in the browser's Elements tab: press F12, then Ctrl + F, enter the CSS #style-1 div + div .historical_data_table tr, and hit Enter. You can then cycle through the matches.
You can read about css selectors here: https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors
Py pandas read_html:
There is a lot of unnecessary work going on there though.
You could more easily just use pandas read_html and index in for the right table:
import pandas as pd
table = pd.read_html('https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue')[1]
print(table)
You can find the right index by searching in the elements tab, as described above, but using the css type selector table - you will see the second actual table element match is the one you want.
If you examine the page source (right click > View Page Source) you will find the table, meaning that the content is static and read_html can read it (unless request headers are required).
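If the site ever does require request headers (a User-Agent, for instance), a minimal sketch is to fetch the page with requests first and hand the HTML text to read_html; the header value below is just a common placeholder, and the table index is assumed to be the same as above:
import requests
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder; adjust if the site wants more
r = requests.get('https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue', headers=headers)
table = pd.read_html(r.text)[1]  # assumes the same table index as before
print(table)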
Scraping:
As per the helpful comment by @MichaelLongstreth:
Please read the https://www.macrotrends.net/robots.txt for whether scraping is permitted. If it is not permitted, look for a public API serving the same data or another website that does permit scraping.
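If you want to check that programmatically, a small sketch using the standard library's urllib.robotparser (the '*' user agent is just a generic placeholder):
from urllib.robotparser import RobotFileParser

rp = RobotFileParser('https://www.macrotrends.net/robots.txt')
rp.read()
# True means the generic user agent is allowed to fetch this URL
print(rp.can_fetch('*', 'https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue'))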
I am trying to use the United States Securities and Exchange (SEC) database, to look at company financial reports (known as 10k’s) to pull out a list of the executive committee members for each filing. I am currently using the most recent files for Microsoft (stock ticker: MSFT) and Walmart (stock ticker: WMT). I know I can look up this information elsewhere on finance websites but I am trying to make a flexible database for personal use. My issue:
The table index position is different in each report; on one company's report the table I want may be table 38 and on another it may be table 45, so a static index/position count will not work across multiple filings.
The specific attributes in each HTML table tag change, so I cannot search for a common attribute. In some cases I find common attributes and sometimes I do not.
I am starting to think I may not be able to automate this due to the lack of identifiers that are unique within each file and common across all files. I've banged my head against many Python web scraping tutorials and videos over the last few weeks. Any suggestions are appreciated; full automation would be ideal so I can loop through multiple filings, but partial help is welcome too, I'm here to learn. I might be bumping into something that is too diverse to automate.
Microsoft Link:
https://www.sec.gov/Archives/edgar/data/789019/000156459019027952/msft-10k_20190630.htm
Desired Table:
<table border="0" cellspacing="0" cellpadding="0" align="center" style="border-collapse:collapse; width:100%;">
Walmart Link:
https://www.sec.gov/Archives/edgar/data/104169/000010416919000016/wmtform10-kx1312019.htm
Desired Table:
<table cellpadding="0" cellspacing="0" style="font-family:Times New Roman;font-size:10pt;width:100%;border-collapse:collapse;text-align:left;">
Code to Count Number of Tables in Each Page:
from selenium import webdriver
from bs4 import BeautifulSoup
chrome_path = r"C:\webdrivers\chromedriver.exe"
browser = webdriver.Chrome(chrome_path)
#Microsoft
browser.get("https://www.sec.gov/Archives/edgar/data/789019/000156459019027952/msft-10k_20190630.htm")
msft = browser.page_source
page_msft = BeautifulSoup(msft, 'html.parser')
tables_msft = page_msft.find_all("table")
#Walmart
browser.get("https://www.sec.gov/Archives/edgar/data/104169/000010416919000016/wmtform10-kx1312019.htm")
wmt = browser.page_source
page_wmt = BeautifulSoup(wmt, 'html.parser')
tables_wmt = page_wmt.find_all("table")
print("MSFT Result Table Count: " + str(len(tables_msft)))
print("Walmart Result Table Count: " + str(len(tables_wmt)))
Results:
MSFT Result Table Count: 263
Walmart Result Table Count: 258
Process finished with exit code 0
Firstly, you don't need Selenium; the requests library will be faster and avoids the overhead. I was able to partially figure out a way to extract the required data, but since the number of columns differs, the Microsoft and Walmart tables cannot be combined into one.
The below code generates two required dataframe one for Microsoft and one for Walmart.
You will still need to manipulate the column names. The idea is to get the table that has a td with the value 'Age', since that is unique to the table we want. Let me know if you need any clarification:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
#Microsoft
page = requests.get("https://www.sec.gov/Archives/edgar/data/789019/000156459019027952/msft-10k_20190630.htm")
soup = BeautifulSoup(page.text, 'html')
resmsft = []
tables_msft = soup.find(text="Age").find_parent("table")
for row in tables_msft.find_all("tr")[1:]:
    # print([cell.get_text(strip=True) for cell in row.find_all("td")])
    if row:
        resmsft.append([cell.get_text(strip=True) for cell in row.find_all("td")])
non_empty = [sublist for sublist in resmsft if any(sublist)]
df_msft = pd.DataFrame.from_records(non_empty)
df_msft[df_msft==''] = np.nan
df_msft=df_msft.dropna(axis=1,how='all')
#Walmart
page = requests.get("https://www.sec.gov/Archives/edgar/data/104169/000010416919000016/wmtform10-kx1312019.htm")
soup = BeautifulSoup(page.text, 'html')
#page_wmt = BeautifulSoup(soup, 'html.parser')
tables_wmt = soup.find(text="Age").find_parent("table")
reswmt = []
for row in tables_wmt.find_all("tr")[1:]:
    # print([cell.get_text(strip=True) for cell in row.find_all("td")])
    if row:
        reswmt.append([cell.get_text(strip=True) for cell in row.find_all("td")])
non_empty_wmt = [sublist for sublist in reswmt if any(sublist)]
df_wmt = pd.DataFrame.from_records(non_empty_wmt)
df_wmt[df_wmt==''] = np.nan
df_wmt=df_wmt.dropna(axis=1,how='all')
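To finish the column-name manipulation mentioned above, here is a hedged sketch: it pulls the header text out of each table's first row (the row the loops above skip) and applies it only when the count matches the surviving columns. The helper name name_columns is just for illustration, and whether the counts line up depends on how many empty spacer cells the filing uses:
def name_columns(table, df):
    # take the non-empty header texts from the first <tr> of the scraped table
    headers = [cell.get_text(strip=True) for cell in table.find_all("tr")[0].find_all(["td", "th"])]
    headers = [h for h in headers if h]
    # only rename when the header count matches the columns left after dropna
    if len(headers) == df.shape[1]:
        df.columns = headers
    return df

df_msft = name_columns(tables_msft, df_msft)
df_wmt = name_columns(tables_wmt, df_wmt)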
I'm looking for help with two main things: (1) scraping a web page and (2) turning the scraped data into a pandas dataframe (mostly so I can output as .csv, but just creating a pandas df is enough for now). Here is what I have done so far for both:
(1) Scraping the web site:
I am trying to scrape this page: https://www.osha.gov/pls/imis/establishment.inspection_detail?id=1285328.015&id=1284178.015&id=1283809.015&id=1283549.015&id=1282631.015. My end goal is to create a dataframe that would ideally contain only the information I am looking for (i.e. I'd be able to select only the parts of the site that I am interested in for my df); it's OK if I have to pull in all the data for now.
As you can see from the URL as well as the ID hyperlinks underneath "Quick Link Reference" at the top of the page, there are five distinct records on this page. I would like each of these IDs/records to be treated as an individual row in my pandas df.
EDIT: Thanks to a helpful comment, I'm including an example of what I would ultimately want in the table below. The first row represents column headers/names and the second row represents the first inspection.
inspection_id open_date inspection_type close_conference close_case violations_serious_initial
1285328.015 12/28/2017 referral 12/28/2017 06/21/2018 2
Mostly relying on BeautifulSoup4, I've tried a few different options to get at the page elements I'm interested in:
# This is meant to give you the first instance of Case Status, which in the case of this page is "CLOSED".
case_status_template = html_soup.head.find('div', {"id": "maincontain"},
                                           class_="container").div.find('table', class_="table-bordered").find('strong').text
# I wasn't able to get the remaining Case Statuses with find_next_sibling or find_all, so I used a different method:
for table in html_soup.find_all('table', class_="table-bordered"):
    print(table.text)
# This gave me the output I needed (i.e. the Case Status for all five records on the page),
# but didn't give me the structure I wanted and didn't really allow me to connect to the other data on the page.
# I was also able to get to the same place with another page element, Inspection Details.
# This is the information reflected on the page after "Inspection: ", directly below Case Status.
insp_details_template = html_soup.head.find('div', {"id": "maincontain"},
                                            class_="container").div.find('table', class_="table-unbordered")
for div in html_soup.find_all('table', class_="table-unbordered"):
    print(div.text)
# Unfortunately, although I could get these two pieces of information to print,
# I realized I would have a hard time getting the rest of the information for each record.
# I also knew that it would be hard to connect/roll all of these up at the record level.
So, I tried a slightly different approach. By focusing instead on a version of that page with a single inspection record, I thought maybe I could just hack it by using this bit of code:
url = 'https://www.osha.gov/pls/imis/establishment.inspection_detail?id=1285328.015'
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')
first_table = html_soup.find('table', class_ = "table-borderedu")
first_table_rows = first_table.find_all('tr')
for tr in first_table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)
# Then, actually using pandas to get the data into a df and out as a .csv.
dfs_osha = pd.read_html('https://www.osha.gov/pls/imis/establishment.inspection_detail?id=1285328.015',header=1)
for df in dfs_osha:
    print(df)
path = r'~\foo'
dfs_osha = pd.read_html('https://www.osha.gov/pls/imis/establishment.inspection_detail?id=1285328.015',header=1)
for df[1,3] in dfs_osha:
    df.to_csv(os.path.join(path, r'osha_output_table1_012320.csv'))
# This worked better, but didn't actually give me all of the data on the page,
# and wouldn't be replicable for the other four inspection records I'm interested in.
So, finally, I found a pretty handy example here: https://levelup.gitconnected.com/quick-web-scraping-with-python-beautiful-soup-4dde18468f1f. I was trying to work through it, and had gotten as far as coming up with this code:
for elem in all_content_raw_lxml:
    wrappers = elem.find_all('div', class_ = "row-fluid")
    for x in wrappers:
        case_status = x.find('div', class_ = "text-center")
        print(case_status)
        insp_details = x.find('div', class_ = "table-responsive")
        for tr in insp_details:
            td = tr.find_all('td')
            td_row = [i.text for i in td]
            print(td_row)
        violation_items = insp_details.find_next_sibling('div', class_ = "table-responsive")
        for tr in violation_items:
            tr = tr.find_all('tr')
            tr_row = [i.text for i in tr]
            print(tr_row)
        print('---------------')
Unfortunately, I ran into too many bugs with this to be able to use it so I was forced to abandon the project until I got some further guidance. Hopefully the code I've shared so far at least shows the effort I've put in, even if it doesn't do much to get to the final output! Thanks.
For this type of page you don't really need beautifulsoup; pandas is enough.
url = 'your url above'
import pandas as pd
#use pandas to read the tables on the page; there are lots of them...
tables = pd.read_html(url)
#Select from this list of tables only those tables you need:
incident = [] #initialize a list of inspections
for i, table in enumerate(tables):  # we need to find the index position of this table in the list; more below
    if table.shape[1] == 5:  # all relevant tables have this shape
        case = []  # initialize a list of inspection items you are interested in
        case.append(table.iat[1,0])  # this is the location in the table of this particular item
        case.append(table.iat[1,2].split(' ')[2])  # the string in the cell needs to be cleaned up a bit...
        case.append(table.iat[9,1])
        case.append(table.iat[12,3])
        case.append(table.iat[13,3])
        case.append(tables[i+2].iat[0,1])  # this particular item is in a table 2 positions down from the current one; this is where the index position of the current table comes in handy
        incident.append(case)
columns = ["inspection_id", "open_date", "inspection_type", "close_conference", "close_case", "violations_serious_initial"]
df2 = pd.DataFrame(incident,columns=columns)
df2
Output (pardon the formatting):
inspection_id open_date inspection_type close_conference close_case violations_serious_initial
0 Nr: 1285328.015 12/28/2017 Referral 12/28/2017 06/21/2018 2
1 Nr: 1283809.015 12/18/2017 Complaint 12/18/2017 05/24/2018 5
2 Nr: 1284178.015 12/18/2017 Accident 05/17/2018 09/17/2018 1
3 Nr: 1283549.015 12/13/2017 Referral 12/13/2017 05/22/2018 3
4 Nr: 1282631.015 12/12/2017 Fat/Cat 12/12/2017 11/16/2018 1
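Since the end goal was a .csv, the result can be written straight out; the filename below is just a placeholder, and stripping the "Nr: " prefix is optional:
df2['inspection_id'] = df2['inspection_id'].str.replace('Nr: ', '', regex=False)  # optional clean-up
df2.to_csv('osha_output.csv', index=False)  # placeholder filename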
I am trying to scrape the table below using Python. I tried pulling HTML tags to find the element id_dt1_NGY00 and so on, but I cannot find it once the page is populated, so someone told me to use Selenium, and I did manage to scrape some data.
https://www.insidefutures.com/markets/data.php?page=quote&sym=ng&x=13&y=8
The numbers are updated every 10 minutes, so this website is dynamic. I used the code below, but it prints everything out in a linear format rather than as tabular rows and columns. Included below are two sections of sample output.
Contract
Last
Change
Open
High
Low
Volume
Prev. Stl.
Time
Links
May '21 (NGK21)
2.550s
+0.006
2.550
2.550
2.550
1
2.544
05/21/18
Q / C / O
Jun '21 (NGM21)
2.576s
+0.006
0.000
2.576
2.576
0
2.570
05/21/18
Q / C / O
Code below
import time
from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd
browser = webdriver.Chrome(executable_path=r"C:\Users\siddk\PycharmProjects\WebSraping\venv\selenium\webdriver\chromedriver.exe")
browser.get("https://www.insidefutures.com/markets/data.php?page=quote&sym=ng&x=14&y=16")
html = browser.page_source
soup = BeautifulSoup(html, 'html.parser')
th_tags = soup.find_all('tr')
for th in th_tags:
    print(th.get_text())
I want to extract this data into pandas and analyze averages etc. on a daily basis. Please help; I have exhausted myself trying to do this with multiple iterations of the code.
Try the script below to get the tabular data. The key is to find the right URL, which contains the same table but is not generated dynamically, so you can do your scraping without any browser simulator.
Give it a go:
from bs4 import BeautifulSoup
import requests
url = "https://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=ng&x=13&y=8&domain=if&display_ice=1&enabled_ice_exchanges=&tz=0&ed=0"
res = requests.get(url)
soup = BeautifulSoup(res.text,"lxml")
for tr in soup.find(class_="bcQuoteTable").find_all("tr"):
    data = [item.get_text(strip=True) for item in tr.find_all(["th","td"])]
    print(data)
Results look like:
['Contract', 'Last', 'Change', 'Open', 'High', 'Low', 'Volume', 'Prev. Stl.', 'Time', 'Links']
['Cash (NGY00)', '2.770s', '+0.010', '0.000', '2.770', '2.770', '0', '2.760', '05/21/18', 'Q/C/O']
["Jun \\'18 (NGM18)", '2.901', '-0.007', '2.902', '2.903', '2.899', '138', '2.908', '17:11', 'Q/C/O']
["Jul \\'18 (NGN18)", '2.927', '-0.009', '2.928', '2.930', '2.926', '91', '2.936', '17:11', 'Q/C/O']
["Aug \\'18 (NGQ18)", '2.944', '-0.008', '2.945', '2.947', '2.944', '42', '2.952', '17:10', 'Q/C/O']