Trying to extract data from multiple websites at once but can't - python

I have a dataset of 110,003 churches in the US. The dataset includes the name of the church together with other details. I need to populate a column with the date of opening. This requires Googling each church to find its date of opening and populating each row. I am posting an example dataset with 10,000 cases. Please help!
https://docs.google.com/spreadsheets/d/1B1db58lmP5nK1ZeJPEJTuYnPjGTxfu_74RG1IELLsP0/edit?usp=sharing
I have tried using import.io and Scrapy, but they don't seem to work for me. I would appreciate it if you could recommend a better tool.

Related

Scraping Contact Information from Several Websites with Python

I want to collect contact information from all county governments. I do not have a list of their websites. I want to do three things with Python: 1) create a list of county government websites, 2) extract names, email addresses, and phone numbers of government officials, and 3) convert URLs and all the contact information into an excel sheet or csv.
I am a beginner in Python, and any guidance would be greatly appreciated. Thanks!
For creating tables, you would use a package called pandas.
For extracting info from websites, a package called beautifulsoup4 is commonly used.
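As a rough sketch of the extraction step (2), here is a standard-library-only version using re to pull emails and phone numbers out of raw HTML. The markup and contact details below are invented for illustration; for real, messier pages, beautifulsoup4 is the safer parser.

```python
import re

# Raw HTML standing in for a fetched county-government page
# (download the real page with the requests library first).
html = """
<div class="official">
  Jane Doe &mdash; County Clerk<br>
  Email: jane.doe@example.gov<br>
  Phone: (555) 123-4567
</div>
"""

# Naive patterns for email addresses and US-style phone numbers
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", html)
phones = re.findall(r"\(\d{3}\)\s*\d{3}-\d{4}", html)

print(emails, phones)
```

Real pages will need more robust patterns, but this shows the overall shape of the step.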
For scraping a website, you should first define what kind of search you want to start with: do you want to search on Google or on a specific website? For either one you need the requests library to fetch a site (or to query Google, like typing in the search bar) and get HTML back. For parsing the data you get, you can choose BeautifulSoup. Both of them have good documentation, and you should read it; don't be discouraged, it's easy.
Because there are more than 3,000 counties in the US, you should manage your data; for managing data I recommend pandas.
Finally, after processing, you can convert the data to any type of file with DataFrame.to_excel, DataFrame.to_csv, and more.
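The final export step above might look like this; the record fields and values are assumptions for illustration only:

```python
import pandas as pd

# Collected contact info (illustrative rows; the field names are assumptions)
records = [
    {"county": "Example County", "url": "https://example.gov",
     "name": "Jane Doe", "email": "jane.doe@example.gov",
     "phone": "555-123-4567"},
]

# Build a table and write it out as CSV (or Excel with df.to_excel)
df = pd.DataFrame(records)
df.to_csv("contacts.csv", index=False)

print(df.shape)
```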

Regenerate table in googlesheets using python

I am trying to find/work out the code needed to get a table of sales data to regenerate at the beginning of each sales day using Google Sheets and Python 3. I've been combing the web and YouTube but have not had any luck yet. Any advice would be really appreciated :)
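One building block, sketched under the assumption that a script runs periodically: decide whether a new sales day has begun since the last run. The opening time is a made-up parameter, and the actual sheet update would go through a Google Sheets client such as gspread (which needs credentials and is not shown here).

```python
from datetime import datetime, time

SALES_DAY_START = time(9, 0)  # assumed opening time; adjust to your business

def new_sales_day_started(last_run: datetime, now: datetime) -> bool:
    """True if the sales-day boundary was crossed since the last run."""
    today_start = datetime.combine(now.date(), SALES_DAY_START)
    return last_run < today_start <= now

# Example: last run yesterday evening, checked this morning after opening
print(new_sales_day_started(datetime(2024, 1, 1, 18, 0),
                            datetime(2024, 1, 2, 9, 30)))
```

When this returns True, the script would clear and rebuild the sheet; otherwise it does nothing.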

Using Scrapy with a List of Keyword

I have a CSV file containing book names in a single column, 1,000 rows in total. I need to crawl the author and published year into the next columns. Can I do this with Scrapy? Is there any documentation you can share with me?
Thanks for now.
Book_Name;Author;Published_Date
don quijote;;
name of the rose;;
oliver twist;;
Edit: I tried to find the data on "https://isbndb.com". I wonder whether Scrapy is suitable for this job.
Scrapy is used for web scraping.
In your case you could simply use CSV files and plain Python.
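A minimal sketch of the CSV side, with a stub lookup standing in for the real Scrapy spider or isbndb.com query (the stub data and file names are invented for illustration):

```python
import csv

# Create a small input file matching the question's layout
# (in practice this is your existing 1,000-row CSV).
with open("books.csv", "w", encoding="utf-8") as f:
    f.write("Book_Name;Author;Published_Date\noliver twist;;\n")

def lookup_book(title):
    """Stub for the real lookup, e.g. a Scrapy spider or an isbndb.com query.
    Returns (author, published_year), or ('', '') when nothing is found."""
    fake_db = {"oliver twist": ("Charles Dickens", "1838")}
    return fake_db.get(title.lower(), ("", ""))

# Read the rows, fill in the empty columns, and write the result back out
with open("books.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f, delimiter=";"))

for row in rows:
    row["Author"], row["Published_Date"] = lookup_book(row["Book_Name"])

with open("books_filled.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f, fieldnames=["Book_Name", "Author", "Published_Date"], delimiter=";")
    writer.writeheader()
    writer.writerows(rows)

print(rows[0])
```

Scrapy would replace `lookup_book`; the CSV plumbing stays the same either way.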

How to set a date range for scraping google search using Python?

I would like to know if it is possible to scrape Google search results while specifying a date range. I read about googlesearch and I am trying to use its search function. However, it seems that something is not working.
Using 'cdr:1,cd_min:01/01/2020,cd_max:01/01/2020' to search all results about a query (for example, Kevin Spacey), it does not return the expected URLs. I guess something is not working in the function (as defined in the library). Has anyone ever tried to use it?
I am looking for results in Italian (only pages in Italian and with the google.it domain). Another way to scrape these results would also be welcome.
Many thanks
Maybe this information will help you:
Use an HTTP spy to get the details of the request. It's useful when Google changes its search format and the module has not yet updated its code.
Good luck!
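For building the query yourself, here is a sketch that assembles a Google search URL carrying the tbs=cdr date-range parameter from the question. The parameter format is observed from Google result URLs and can change without notice, so treat it as an assumption rather than a documented API:

```python
from urllib.parse import urlencode

def google_search_url(query, date_min, date_max,
                      domain="google.it", lang="it"):
    """Build a Google search URL restricted to a date range via the
    tbs=cdr parameter (undocumented; Google may change it at any time)."""
    params = {
        "q": query,
        "lr": f"lang_{lang}",  # restrict result language
        "tbs": f"cdr:1,cd_min:{date_min},cd_max:{date_max}",
    }
    return f"https://www.{domain}/search?" + urlencode(params)

print(google_search_url("Kevin Spacey", "01/01/2020", "01/01/2020"))
```

Fetching and parsing the resulting page is a separate problem (and subject to Google's terms of service); this only shows the URL construction.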

Extracting data from websites using Python

I'm pretty new to web development and I have an idea for something that I would like to explore and I'd like some advice on what tools I should use. I know python and have been learning django recently so I would ideally like to incorporate them.
What I want to do is related to some basic html parsing and use of regular expressions I think. Basically, I want to be able to aggregate certain bits of useful information from several websites into one site. Suppose, for example, there are a dozen high schools whose graduation dates, times, and locations I'm interested in knowing. How the information on each high school site is presented is roughly similar and so I want to extract the data for the word after "location" or "venue", "time", "date", etc and then have that automatically posted on my site and I would also like it updated if any of the info happens to change on any of the high school sites.
What would you use to accomplish this task? Also, if you know of any useful tutorials, resources, etc that you could point me to, that would be much appreciated!
For the extraction part I think your best bet would be Beautiful Soup, mostly because it's easy to use and will try to parse anything, even broken XML/HTML.
Check out BeautifulSoup
Update:
If you want to fill forms you can use mechanize
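A small sketch of the label-based extraction described in the question, using BeautifulSoup on an invented page (the labels and markup are assumptions about what the school sites might look like):

```python
import re
from bs4 import BeautifulSoup

# Stand-in for a downloaded high-school page
# (fetch the real page with the requests library in practice).
html = """
<html><body>
  <p>Location: Main Auditorium</p>
  <p>Date: June 5, 2024</p>
  <p>Time: 10:00 AM</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
info = {}
for label in ("Location", "Date", "Time"):
    # Find the text node starting with the label and keep what follows it
    match = soup.find(string=re.compile(rf"^{label}:"))
    if match:
        info[label.lower()] = match.split(":", 1)[1].strip()

print(info)
```

Each school's markup will differ a little, so the label list and patterns would need tuning per site.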
