Web scraping with Python3, and string formatting ?

This script is meant to query Bloomberg's finance API to find the GBP value during the day. The script below does that, but what it returns looks like this:
{'dateTime': '2017-01-17T22:00:00Z', 'value': 1.6406}
I don't want the dateTime or value labels there, only the number. I don't know how to get rid of them, and when I try, I get errors such as "list index out of range".
Any answers will be greatly appreciated. Here is the script (in Python 3):
import urllib.request
import json
htmltext = urllib.request.urlopen('https://www.bloomberg.com/markets/api/bulk-time-series/price/GBPAUD%3ACUR?timeFrame=1_DAY').read().decode('utf8')
data = json.loads(htmltext)
datapoints = data[1]['price']
print(datapoints)

This should work for you.
print(data[0]['price'][-1]['value'])
EDIT: To get all the values:
for data_row in data[0]['price']:
    print(data_row['value'])
EXPLANATION: data[0] gets the first and only element of the list, which is a dict. Your data[1] raises "list index out of range" because the list has no second element. ['price'] gets the list stored under the price key. [-1] gets the last element of that list, which is presumably the value you are looking for, since it is the latest data point.
Finally, ['value'] gets the currency conversion value from the dict obtained in the previous step.
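To make the indexing chain concrete, here is a minimal sketch that runs against a hard-coded sample instead of the live endpoint. The sample structure is an assumption modelled on the single data point shown in the question; the first row is made up for illustration.

```python
# Hypothetical sample mirroring the shape described above: a list holding one
# dict whose 'price' key maps to a list of {'dateTime', 'value'} dicts.
data = [
    {
        "price": [
            {"dateTime": "2017-01-17T21:55:00Z", "value": 1.6401},  # made-up row
            {"dateTime": "2017-01-17T22:00:00Z", "value": 1.6406},  # row from the question
        ]
    }
]

prices = data[0]["price"]                       # list of data points
latest = prices[-1]["value"]                    # last element is the most recent point
all_values = [row["value"] for row in prices]   # every value, labels stripped

print(latest)
print(all_values)
```

The same three lines should apply to the parsed response from the real API, provided it keeps this shape.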

Related

Looping through json.loads(response.text) with Python

I'm learning and would appreciate any help with this code.
The issue is printing the values contained in one line of the JSON data using Python.
import json
import requests
data = json.loads(response.text)
print(len(data)) #showing correct value
# This is where I'm going wrong: the code below prints the first value, then the second, because they are indexed explicitly. Question: how do I print every value when the total number of items is unknown?
for item in data:
print(data[0]['full_name'])
print(data[1]['full_name'])
I tried it without the index value, which gave me the first value multiple times, depending on the length.
I expect to be able to access each indexed value from the JSON separately, even though they all share the same key name, "full_name" for example.

import json
import requests
data = json.loads(response.text)
print(len(data))  # showing correct value
for item in data:
    print(item['full_name'])
# The lines below only ever print the first two entries (Python indexes start at 0)
# and raise IndexError if data has fewer than two items.
print(data[0]['full_name'])
print(data[1]['full_name'])
Hope this helps.

Presuming data is a list of dictionaries, where each dictionary contains a full_name key:
for item in data:
    print(item['full_name'])
This code sample from your post makes no sense:
for item in data:
print(data[0]['full_name'])
print(data[1]['full_name'])
Firstly it's a syntax error because there is nothing indented underneath the loop.
Secondly it's a logic error, because the loop variable is item but then you never refer to that variable.
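As a small sketch of the difference between iterating and indexing, assuming data is a list of dicts that each carry a full_name key (the sample names below are invented):

```python
# Invented sample standing in for json.loads(response.text).
data = [
    {"full_name": "octocat/hello-world"},
    {"full_name": "octocat/spoon-knife"},
    {"full_name": "octocat/git-consortium"},
]

# The loop variable already holds each element, so no index is needed
# and the length of the list does not have to be known in advance.
names = [item["full_name"] for item in data]

# enumerate() supplies the position when it is wanted alongside the value.
for position, item in enumerate(data):
    print(position, item["full_name"])
```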

Scrape data from JSON

I am making a price statistics project with Python, and I have a problem with scraping data from an API. The API is https://www.rolimons.com/api/activity
I want to get the prices from the API, which are the last two values in each block.
For example, from [1588247532, 0, "1028606", 464, 465] I would need only 464 and 465. I also want to do this for every block.
How can I do that? Here is the code I have so far:
import requests
import json
r = requests.get('https://www.rolimons.com/api/activity')
content = json.loads(r.content.decode())
for key, value in content.items():
    print(key)

Give this a go:
for value in content['activities']:
    print(value[-2:])
It iterates through activities and prints the last two items of each value.
Or you can collect the prices in a separate list to use later on, like so:
prices = [value[-2:] for value in content['activities']]
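A short sketch of the slicing on offline data, assuming the response has the shape the question describes. Only the first row comes from the question; the second is invented, and the names first_price and second_price are placeholders because the question does not say what each price represents.

```python
# Sample in the shape described in the question.
content = {
    "activities": [
        [1588247532, 0, "1028606", 464, 465],    # row from the question
        [1588247540, 0, "1029025", 1200, 1150],  # made-up row
    ]
}

# row[-2:] slices the last two entries whatever the row length is.
price_pairs = [tuple(row[-2:]) for row in content["activities"]]

# Each pair can then be unpacked into two named values.
for first_price, second_price in price_pairs:
    print(first_price, second_price)
```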

I recommend using print statements whenever you are unsure how or why something works. See below; it might help give a visual of what is going on.
import requests
import json
r = requests.get('https://www.rolimons.com/api/activity')
content = json.loads(r.content.decode())
for key, value in content.items():
    print("Key: ", key)
    print("content[key]: ", content[key])
for array in content["activities"]:
    print("array: ", array)
    print("array[len(array)-1]:", array[len(array)-1])
    print("array[len(array)-2]:", array[len(array)-2])

Python URL Extractions from Chrome History

I have been using the following code, written in PyCharm, to try to extract URLs from a copy of my Chrome history:
import sqlite3
import os
PATH='C:\\Users\\%s\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\History - Copy' % os.environ.get('USERNAME')
HistCop = sqlite3.connect(PATH)
c = HistCop.cursor()
ccp = c.execute('SELECT url FROM urls ORDER BY "id" DESC LIMIT 5')
ccpp=ccp.fetchall()
print ccpp
My main goal is to open at least one of these URLs in a browser, but when I use the code:
import webbrowser
url = ccpp[4]
webbrowser.open(url)
I end up with an error. I think it does not work because the value looks like this:
(u'https://stackoverflow.com/search',)
with a "u" in front of it.
Please let me know why this happens, whether there is a way to get rid of it, or whether there is a better way to reach my goal.

It doesn't work because you're passing a tuple into a function that expects a string. cursor.fetchall() returns a list of tuples (since a row with n elements is represented as an n-tuple), so you just need to get the single element contained in the tuple:
rows = cursor.fetchall()
url = rows[4][0]

sqlite3's fetchall method returns a list containing one item per row in the result of the query. Each of these items is a tuple (similar to a list) that contains the field data for that row. So:
ccpp # this is a list
ccpp[4] # this is a tuple
You can tell it's a tuple because the output you printed shows that. If you want the data from the first column, the 'url' column, you need to index it (similar to how you would a list):
ccpp[4][0] # get the first column of the fifth row
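The list-of-tuples shape can be checked without touching the real History file. The sketch below uses an in-memory database; the two-column urls table and the example.com URLs are simplified assumptions, not Chrome's actual schema.

```python
import sqlite3

# In-memory database standing in for the copied History file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany(
    "INSERT INTO urls (url) VALUES (?)",
    [("https://example.com/a",), ("https://example.com/b",)],
)

rows = conn.execute("SELECT url FROM urls ORDER BY id DESC LIMIT 5").fetchall()
# rows is a list of 1-tuples, one tuple per result row.

urls = [row[0] for row in rows]  # unwrap each tuple into a plain string
conn.close()

print(rows)
print(urls)
```

Any element of urls is a plain string and can be passed to webbrowser.open(). The "u" in the question's output is only Python 2's marker for a unicode string; it is not part of the URL.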

Using an IF THEN loop with nested JSON files in Python

I am currently writing a program that uses the Companies House API to return a JSON file containing information about a certain company.
I am able to retrieve the data easily using the following commands:
r = requests.get('https://api.companieshouse.gov.uk/company/COMPANY-NO/filing-history', auth=('API-KEY', ''))
data = r.json()
With that information I can do an awful lot; however, I've run into a problem which I was hoping you guys could possibly help me with. What I aim to do is go through every nested entry in the JSON file and check whether the values of certain keys match certain criteria. If the values of two keys match, then other code is executed.
One of the keys is the date of an entry, and I would like to ignore results that are older than a certain date. I have attempted to do this with the following:
date_threshold = datetime.date.today() - datetime.timedelta(days=30)
for each in data["items"]:
    date = ['date']
    type = ['type']
    if date < date_threshold and type is "RM01":
        print("wwwwww")
In case it isn't clear, what I'm attempting to do (albeit very badly) is assign each of the entries to a variable, which then gets tested against certain criteria.
This doesn't work, though; Python raises a type mismatch error:
TypeError: unorderable types: list() < datetime.date()
This makes me think the date is being stored as a string, so I can't compare it to the datetime value set earlier, but when I check the API documentation (https://developer.companieshouse.gov.uk/api/docs/company/company_number/filing-history/filingHistoryItem-resource.html), it clearly says that the 'date' entry is returned as a date type.
What am I doing wrong? It's very clear that I'm extremely new to Python, given what I presume is the atrocity of my code, but in my head it seems to make at least a little sense. In case none of this is clear, I basically want to go through all the entries in the JSON file, and if the date and type match a certain description, then other code can be executed (in this case I have just used random text).
Any help is greatly appreciated! Let me know if you need anything cleared up.
:)
EDIT
After tweaking my code to the below:
for each in data["items"]:
    date = each['date']
    type = each['type']
    if date is '2016-09-15' and type is "RM01":
        print("wwwwww")
The code executes without any errors, but the words aren't printed, even though I know there is an entry in the JSON file with that exact date and that exact type. Any thoughts?
SOLUTION:
Thanks to everyone for helping me out. I had made a couple of very basic errors; the code that works as expected is below:
for each in data["items"]:
    date = each['date']
    typevariable = each['type']
    if date == '2016-09-15' and typevariable == "RM01":
        print("wwwwww")
This prints the word "wwwwww" 3 times, which is correct, seeing as there are 3 entries in the JSON that fulfil those criteria.

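You are comparing a list, date, with a datetime.date, date_threshold: the line date = ['date'] creates a one-element list instead of reading the entry's value.
Read the value with each['date'], then convert that string to a date with datetime.strptime() before comparing it with date_threshold.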
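A minimal sketch of that conversion, under stated assumptions: the sample items are invented, the dates are taken to be 'YYYY-MM-DD' strings (as in the '2016-09-15' value from the question's edit), and a fixed reference day replaces datetime.date.today() so the result is reproducible.

```python
import datetime

# Invented sample shaped like the "items" list in the question.
data = {
    "items": [
        {"date": "2016-09-15", "type": "RM01"},
        {"date": "2016-09-20", "type": "AA"},
        {"date": "2016-06-01", "type": "RM01"},
    ]
}

# Fixed reference day for reproducibility; the question uses date.today().
reference_day = datetime.date(2016, 9, 30)
date_threshold = reference_day - datetime.timedelta(days=30)

matches = []
for entry in data["items"]:
    # Parse the string into a date so both sides of the comparison are dates.
    entry_date = datetime.datetime.strptime(entry["date"], "%Y-%m-%d").date()
    # Keep entries on or after the threshold, i.e. ignore older ones.
    if entry_date >= date_threshold and entry["type"] == "RM01":
        matches.append(entry)

print(matches)
```

Note the use of == rather than is for the type check: is tests object identity, which is why the comparison in the question's edit never matched.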

Parsing multiple occurrences of an item into a dictionary

I am attempting to parse several separate image links from JSON data with Python, but I am having trouble drilling down to the right level, which I believe comes from having a list of strings.
For the majority of the items, I've had success with the example below, pulling back everything I need. Outside of this instance, everything is a 1:1 ratio of keys to values, but for this one there are multiple values associated with one key.
resultsdict['item_name'] = item['attribute_key']
I've been adding it all to a resultsdict = {}, but I am only able to get to the sample output below when I print.
INPUT:
for item in data['Item']:
    resultsdict['images'] = item['Variations']['Pictures']
OUTPUT (only relevant section):
'images': [{u'VariationSpecificPictureSet': [{u'PictureURL': [u'http//imagelink1'], u'VariationSpecificValue': u'color1'}, {u'PictureURL': [u'http//imagelink2'], u'VariationSpecificValue': u'color2'}, {u'PictureURL': [u'http//imagelink3'], u'VariationSpecificValue': u'color3'}, {u'PictureURL': [u'http//imagelink4'], u'VariationSpecificValue': u'color4'}]
I feel like I could add ['VariationSpecificPictureSet']['PictureURL'] at the end of my initial input, but that throws an error because the indices are strings rather than integers.
Ideally, I would like to see the output as a simple comma-separated list of just the URLs, as follows:
OUTPUT:
'images': http//imagelink1, http//imagelink2, http//imagelink3, http//imagelink4

This answers your comment, which needed a bit of code.
When using
for item in data['Item']:
    resultsdict['images'] = item['Variations']['Pictures']
you get a list with one element, so I recommend using this:
for item in data['Item']:
    resultsdict['images'] = item['Variations']['Pictures'][0]
Now you can use
for image in resultsdict['images']['VariationSpecificPictureSet']:
    print(image['PictureURL'])

Thanks for the help, @uzzee, it's appreciated. I kept tinkering with it and was able to pull a single flat list of all the image URLs with the following code.
resultsdict['images'] = sum([x['PictureURL'] for x in item['Variations']['Pictures'][0]['VariationSpecificPictureSet']], [])
Without the sum it looks like this and pulls in the whole list of lists:
resultsdict['images'] = [x['PictureURL'] for x in item['Variations']['Pictures'][0]['VariationSpecificPictureSet']]
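As an alternative sketch, itertools.chain.from_iterable does the same flattening as sum(..., []), and str.join then gives the comma-separated form the question asks for. The sample below is trimmed from the output shown in the question; the variable names are illustrative.

```python
from itertools import chain

# Sample trimmed from the question's output (two of the four variations).
pictures = [
    {
        "VariationSpecificPictureSet": [
            {"PictureURL": ["http//imagelink1"], "VariationSpecificValue": "color1"},
            {"PictureURL": ["http//imagelink2"], "VariationSpecificValue": "color2"},
        ]
    }
]

picture_sets = pictures[0]["VariationSpecificPictureSet"]

# Flatten the per-variation URL lists into a single list of strings.
urls = list(chain.from_iterable(entry["PictureURL"] for entry in picture_sets))

# Join into one comma-separated string.
joined = ", ".join(urls)
print(joined)
```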
