Beginning programmer here. Just completed the CS61A introductory Python class at UC Berkeley, and I was thinking of trying to implement a little program:
Basically, I want to be able to enter a band name, have the program search www.setlist.fm, and return a bunch of setlists for recent concerts of that band. Sounds easy enough... I have a VERY basic idea of what to do with urllib and urlopen, but that's about it. Any pointers or guidance on how to get started?
Thanks!
Read about their API.
http://api.setlist.fm/docs/index.html
Read how to make HTTP GET requests using urllib2
http://www.voidspace.org.uk/python/articles/urllib2.shtml
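To make the two pieces concrete, here is a small sketch (Python 3, where urllib2 has become urllib.request). The endpoint path, the `x-api-key` header, and the response fields are assumptions based on the current setlist.fm REST docs, so double-check them against the API reference; the parsing step runs on a canned sample response rather than a live call:

```python
# Sketch: build a setlist.fm search request, then parse a response body.
# Endpoint, header name, and JSON field names are assumptions -- verify
# against the setlist.fm API docs before relying on them.
import json
import urllib.parse
import urllib.request

def build_search_request(artist_name, api_key):
    """Build a GET request searching setlists by artist name."""
    query = urllib.parse.urlencode({"artistName": artist_name})
    url = "https://api.setlist.fm/rest/1.0/search/setlists?" + query
    return urllib.request.Request(
        url,
        headers={"x-api-key": api_key, "Accept": "application/json"},
    )

def venues_from_response(body):
    """Pull venue names out of a search-response JSON body."""
    data = json.loads(body)
    return [s["venue"]["name"] for s in data.get("setlist", [])]

req = build_search_request("Radiohead", "YOUR-API-KEY")
print(req.full_url)

# A live call would be: body = urllib.request.urlopen(req).read()
sample = '{"setlist": [{"venue": {"name": "Roundhouse"}}]}'
print(venues_from_response(sample))
```

Once that works on the sample, swap the canned string for the real `urlopen` call.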
Related
I am very new to Python, but I have started an internship that requires me to do some Python work. I've been asked to create an application that retrieves data from a webpage (an IP address), compares those values to the correct values, and then prints out whether it has passed or not. Please check the diagram I have made so that you can understand it better.
So far I have only written some Python code to check if the website/IP address is up or not, but I have no idea how to go further. Could you please help me with the next steps, with some examples maybe?
Here is a picture of the website. The values circled in red need to be compared with the Auxiliary Values. I hope this picture helps.
However, I can use http://192.168.100.2/globals.xml on this page to compare the values. Any help is much appreciated.
import requests                   # not used yet, but handy for the next steps
import urllib.request
import eventlet
from selenium import webdriver    # not used yet
from bs4 import BeautifulSoup     # will be used for parsing later
import pandas as pd               # not used yet

eventlet.monkey_patch()           # calling this once is enough

# The body of a `with` block must be indented
with eventlet.Timeout(10):
    print(urllib.request.urlopen("http://192.168.100.5").getcode())
    print("Website is UP")

with eventlet.Timeout(10):
    print(urllib.request.urlopen("http://10.10.10.2").getcode())
    print("Website is UP")
You are off to a great start! Your next steps should be identifying unique traits about the elements that you want to scrape. Specifically, look for things like class or id names that are unique to only the data that you want to scrape.
You can also use tools like Selector Gadget (https://selectorgadget.com/) that can help automate the process. Unfortunately, since you are accessing local IP addresses, nobody here will be able to help you find these.
After you find the proper selectors, you can use BeautifulSoup to extract the data. I'd recommend looking at the find and find_all methods that BeautifulSoup has.
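Here is a rough sketch of what that comparison could look like. Since only you can see globals.xml on the device, the element names (`aux1`, `aux2`) and expected values below are invented placeholders; the fetch step is shown as a comment so the example runs on a canned sample:

```python
# Sketch: parse an XML document with BeautifulSoup and compare each
# value against an expected one. Tag names and values are made up --
# substitute whatever you see in your real globals.xml.
from bs4 import BeautifulSoup

expected = {"aux1": "24.0", "aux2": "3.3"}

# In the real script you would fetch the document first, e.g.:
#   xml = requests.get("http://192.168.100.2/globals.xml", timeout=10).text
xml = "<globals><aux1>24.0</aux1><aux2>3.1</aux2></globals>"

soup = BeautifulSoup(xml, "html.parser")
results = {}
for name, want in expected.items():
    tag = soup.find(name)            # find() returns the first matching tag
    got = tag.text if tag else None
    results[name] = "PASS" if got == want else "FAIL"

print(results)
```

The same loop works unchanged once you replace the sample string with the fetched document.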
I am new to Python, and I've been asked to grab dynamic data from www.skyscanner.net.
Can someone guide me on doing so?
import requests
import lxml.html as lh

url = 'http://www.skyscanner.net/transport/flights/sin/lhr/131231/140220/'
response = requests.get(url)   # this page is fetched with GET, not POST
tree = lh.document_fromstring(response.content)
print(tree)
All I did was find the pattern in the URL and attempt to grab the data from there. However, no data was successfully pulled. I've heard that Python is a good language for this kind of task, but the ecosystem seems huge and I do not know where to start.
My name is Piotr - I work for Skyscanner, in the Data Acquisition team, which I assume you are applying to join :-) As this is part of your task I wouldn't like to give you a straight answer; however, you might consider:
Understanding how our site works - how the requests are built and what data you can find in the HTTP response.
Using some libraries that will help you parse XML/JSON responses.
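As a generic illustration of the second point (every field name below is invented, not Skyscanner's real response format), parsing a JSON payload from an XHR response is just:

```python
# Sketch: parse a JSON payload like the ones you'd find in a site's XHR
# responses. The structure ("itineraries", "price", "carrier") is made
# up for illustration -- discover the real one in your browser's
# network tab.
import json

payload = '{"itineraries": [{"price": 420, "carrier": "XX"}, {"price": 380, "carrier": "YY"}]}'
data = json.loads(payload)

# Pick the cheapest itinerary from the decoded list
cheapest = min(data["itineraries"], key=lambda it: it["price"])
print(cheapest["carrier"], cheapest["price"])
```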
I think that's all I can say :-)
Cheers,
piotr
Let's dive into this, shall we?
Ok, I need to write a script (I don't care what language; I'd prefer something like Python or JavaScript, but whatever works, I will take the time to learn). The script will access multiple URLs, extract text from each site, and store it in a folder on my PC. (From there I will manipulate the data with Python, which I know how to do.)
EDIT:
Currently I am using Python's NLTK module. Here is a simple version of my code:
import nltk
from urllib.request import urlopen

url = "<URL HERE>"
html = urlopen(url).read()
raw = nltk.clean_html(html)   # note: clean_html was removed in newer NLTK versions
print(raw)
This code works fine for both http and https, but not for instances where authentication is required.
Is there a Python module which deals with secure authentication?
Thanks in advance for the help! And to the mods who might view this as a bad question, please just tell me how to make it better. I need ideas from people, not Google.
Mechanize is one option; another is just using urllib2.
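For the urllib2 route, here is a sketch of HTTP basic auth (shown with urllib.request, the Python 3 descendant of urllib2; the URL and credentials are placeholders, and the actual request is left commented out so nothing hits the network):

```python
# Sketch: build an opener that sends HTTP basic-auth credentials.
# URL, username, and password below are placeholders.
import urllib.request

def make_opener(url, user, password):
    """Return an opener that authenticates against `url`."""
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, url, user, password)   # None = any realm
    handler = urllib.request.HTTPBasicAuthHandler(mgr)
    return urllib.request.build_opener(handler)

opener = make_opener("https://example.com/protected", "alice", "secret")
# html = opener.open("https://example.com/protected").read()
```

If the site uses form-based login rather than basic auth, that's where Mechanize (or requests with a session) is the easier path.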
I am programming in Python.
I would like to extract real time data from a webpage without refreshing it:
http://www.fxstreet.com/rates-charts/currency-rates/
I think the real-time data on that page is loaded via AJAX, but I am not quite sure.
I thought about driving a web browser from the program, but I do not really know/like that approach... Is there another way to do it?
I would like to fill a dictionary in my program (or even a SQL database) with the latest numbers every second.
Please help me with this in Python, thanks!
To get the data, you'll need to look through the JavaScript and HTML source to find what URL the page is hitting to get the data it's displaying. Then you can call that URL with urllib or your favorite Python library and parse the response.
Also, it may be easier if you use a plugin like Firebug that lets you watch the AJAX requests.
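Once you've found that endpoint, the polling loop is simple. In this sketch the URL and the JSON shape are invented stand-ins for whatever you actually discover in the network tab; `poll()` is defined but not called, and the parsing step runs on a canned sample:

```python
# Sketch: poll a JSON rates endpoint once per second and keep the
# latest values in a dict. RATES_URL and the response shape are
# assumptions -- use the endpoint you find via Firebug/devtools.
import json
import time
import urllib.request

RATES_URL = "http://example.com/rates.json"   # placeholder endpoint

def parse_rates(body):
    """Turn a JSON body like {"EURUSD": 1.09, ...} into a float dict."""
    return {pair: float(v) for pair, v in json.loads(body).items()}

def poll(seconds=3):
    """Fetch the endpoint once per second, updating a running dict."""
    rates = {}
    for _ in range(seconds):
        body = urllib.request.urlopen(RATES_URL).read()
        rates.update(parse_rates(body))
        time.sleep(1)   # one update per second, as the question asks
    return rates

print(parse_rates('{"EURUSD": 1.0931, "GBPUSD": 1.2710}'))
```

Writing each update to SQL instead of a dict is just a matter of swapping the `rates.update(...)` line for an INSERT.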
I am a newbie trying to achieve this simple task using Scrapy, with no luck so far. I am asking your advice on how to do this with Scrapy or with any other tool (in Python). Thank you.
I want to
start from a page that lists bios of attorneys whose last name start with A: initial_url = www.example.com/Attorneys/List.aspx?LastName=A
From LastName=A to extract links to actual bios: /BioLinks/
visit each of the /BioLinks/ to extract the school info for each attorney.
I am able to extract the /BioLinks/ and the school information, but I am unable to get from the initial URL to the bio pages.
If you think this is the wrong way to go about this, then, how would you achieve this goal?
Many thanks.
Not sure I fully understand what you're asking, but maybe you need to get the absolute URL to each bio and retrieve the source code for that page:
import urllib2
bio_page = urllib2.urlopen(bio_url).read()
Then use regular expressions or another parsing method to get the attorney's law school.
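Here is a sketch of that two-hop idea with the standard library (Python 3): make the relative /BioLinks/ hrefs absolute with urljoin, then fetch each one. The HTML snippet and the regex are assumptions about what example.com's markup might look like, and the per-page fetch is left as a comment:

```python
# Sketch: turn relative /BioLinks/ hrefs from the listing page into
# absolute URLs to crawl. The sample HTML and regex are assumptions
# about the site's markup -- adjust them to the real pages.
import re
from urllib.parse import urljoin

initial_url = "http://www.example.com/Attorneys/List.aspx?LastName=A"

def bio_links(listing_html, base_url):
    """Extract relative /BioLinks/ hrefs and make them absolute."""
    rels = re.findall(r'href="(/BioLinks/[^"]+)"', listing_html)
    return [urljoin(base_url, rel) for rel in rels]

sample = '<a href="/BioLinks/Abbott.aspx">Abbott</a> <a href="/BioLinks/Ames.aspx">Ames</a>'
links = bio_links(sample, initial_url)
print(links)
# For each link you would then fetch and parse the bio page, e.g.:
#   bio_page = urllib.request.urlopen(link).read()
```

In Scrapy the same shape is a spider whose parse callback yields Requests for each extracted link, with the school extraction in the second callback.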