There is probably a term for what I'm attempting to do, but it escapes me. I'm using peewee to set some values in a class, and want to iterate through a list of keys and values to generate the command to store the values.
Not all 'collections' contain each of the values within the class, so I want to just include the ones that are contained within my data set. This is how far I've made it:
for value in result['response']['docs']:
    for keys in value:
        print keys, value[keys]  # keys are "identifier", "title", "language"
#for value in result['response']['docs']:
#    collection = Collection(
#        identifier = value['identifier'],
#        title = value['title'],
#        language = value['language'],
#        mediatype = value['mediatype'],
#        description = value['description'],
#        subject = value['subject'],
#        collection = value['collection'],
#        avg_rating = value['avg_rating'],
#        downloads = value['downloads'],
#        num_reviews = value['num_reviews'],
#        creator = value['creator'],
#        format = value['format'],
#        licenseurl = value['licenseurl'],
#        publisher = value['publisher'],
#        uploader = value['uploader'],
#        source = value['source'],
#        type = value['type'],
#        volume = value['volume']
#    )
#    collection.save()
for value in result['response']['docs']:
    Collection(**value).save()
See this question for an explanation on how **kwargs work.
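If you would rather not pass through keys that the model does not define, one option is to filter each document before unpacking it. This is only a minimal sketch: the `Collection` class below is a hypothetical plain-Python stand-in (not the real peewee model), and its `FIELDS` tuple and the sample documents are made up for illustration.

```python
class Collection(object):
    """Hypothetical stand-in for the peewee model; accepts only known fields."""
    FIELDS = ("identifier", "title", "language", "mediatype")

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(self.FIELDS)
        if unknown:
            raise TypeError("unexpected fields: %s" % sorted(unknown))
        for name in self.FIELDS:
            # Fields absent from the document default to None.
            setattr(self, name, kwargs.get(name))


def known_fields(doc, allowed=Collection.FIELDS):
    """Return only the key/value pairs of doc whose key is an allowed field."""
    return {key: doc[key] for key in doc if key in allowed}


# Made-up documents standing in for result['response']['docs'].
docs = [
    {"identifier": "a1", "title": "First", "language": "eng"},
    {"identifier": "b2", "title": "Second", "score": 3},  # extra key, no language
]
collections = [Collection(**known_fields(doc)) for doc in docs]
```

With a real model you would call `.save()` on each object instead of collecting them in a list.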
Are you talking about how to find out whether a key is in a dict or not?
>>> somedict = {'firstname': 'Samuel', 'lastname': 'Sample'}
>>> if somedict.get('firstname'):
...     print somedict['firstname']
...
Samuel
>>> print somedict.get('address', 'no address given')
no address given
If there is a different problem you'd like to solve, please clarify your question.
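One thing worth keeping in mind with the `.get()` check: it tests the value's truthiness, not whether the key exists. A small sketch of the difference, using a made-up dict in which `firstname` is present but empty:

```python
somedict = {'firstname': '', 'lastname': 'Sample'}

# Membership test: True because the key exists, even though its value is empty.
has_key = 'firstname' in somedict

# Truthiness test: False here because the stored value is an empty string.
has_truthy_value = bool(somedict.get('firstname'))

# Keep only the wanted keys that are actually present in the data.
wanted = ('firstname', 'lastname', 'address')
present = {key: somedict[key] for key in wanted if key in somedict}
```

If empty values should count as "not there", the `.get()` form is the one you want; otherwise prefer `in`.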
I am trying to list a bunch of azure containers that have a specific name type - they are all called cycling-asset-group-x where x is a number or a letter e.g. cycling-asset-group-a, cycling-asset-group-1, cycling-asset-group-b, cycling-asset-group-2.
I only want to print the containers with a number in the suffix i.e. cycling-asset-group-1, cycling-asset-group-2 etc
How can I do this? Here's where I am up to so far:
account_name = 'name'
account_key = 'key'
# connect to the storage account
blob_service = BaseBlobService(account_name = account_name, account_key = account_key)
prefix_input_container = 'cycling-asset-group-'
# get a list of the containers - I think it's something like this...?
cycling_containers = blob_service.list_containers("%s%d" % (prefix_input_container,...))
for c in cycling_containers:
    contname = c.name
    print(contname)
Just pass your prefix_input_container value to the prefix parameter of the list_containers method of BaseBlobService, as in the code below. See the API reference for BaseBlobService.list_containers:
list_containers(prefix=None, num_results=None, include_metadata=False, marker=None, timeout=None)
Parameters:
prefix (str) – Filters the results to return only containers whose names begin with the specified prefix.
prefix_input_container = 'cycling-asset-group-'
cycling_containers = blob_service.list_containers(prefix=prefix_input_container)
# Import regex module to filter the results
import re
re_expression = r"%s\d+$" % prefix_input_container
pattern = re.compile(re_expression)
# There are two ways (use one or the other).
# No.1 Create a generator from the generator of cycling_containers
filtered_cycling_container_names = (c.name for c in cycling_containers if pattern.match(c.name))
for contname in filtered_cycling_container_names:
    print(contname)
# No.2 Create a name list
contnames = [c.name for c in cycling_containers if pattern.match(c.name)]
print(contnames)
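To try the filtering step without a storage account, here is a rough sketch that runs the same kind of pattern over a plain list of names. The list is made up and stands in for `[c.name for c in cycling_containers]`; `re.escape` is an addition of mine so that any regex metacharacters in the prefix are treated literally.

```python
import re

prefix_input_container = 'cycling-asset-group-'

# Hypothetical container names standing in for the names returned by the service.
names = [
    'cycling-asset-group-a',
    'cycling-asset-group-1',
    'cycling-asset-group-b',
    'cycling-asset-group-2',
    'cycling-asset-group-12x',
]

# Prefix (escaped) followed by one or more digits and then the end of the name.
pattern = re.compile(re.escape(prefix_input_container) + r'\d+$')

# Build the list once so it can be reused afterwards.
numeric_names = [name for name in names if pattern.match(name)]
print(numeric_names)
```

Only the names whose suffix consists entirely of digits are kept, so `cycling-asset-group-12x` is dropped along with the lettered ones.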
I'm working on a small project that retrieves information about books from the Google Books API using Python 3. For this I make a call to the API, read out the variables and store them in a list. For a search like "linkedin" this works perfectly. However, when I enter "Google", it reads the second title from the JSON input. How can this happen?
Please find my code below (Google_Results is the class I use to initialize the variables):
import requests
def Book_Search(search_term):
    parms = {"q": search_term, "maxResults": 3}
    r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
    print(r.url)
    results = r.json()
    i = 0
    for result in results["items"]:
        try:
            isbn13 = str(result["volumeInfo"]["industryIdentifiers"][0]["identifier"])
            isbn10 = str(result["volumeInfo"]["industryIdentifiers"][1]["identifier"])
            title = str(result["volumeInfo"]["title"])
            author = str(result["volumeInfo"]["authors"])[2:-2]
            publisher = str(result["volumeInfo"]["publisher"])
            published_date = str(result["volumeInfo"]["publishedDate"])
            description = str(result["volumeInfo"]["description"])
            pages = str(result["volumeInfo"]["pageCount"])
            genre = str(result["volumeInfo"]["categories"])[2:-2]
            language = str(result["volumeInfo"]["language"])
            image_link = str(result["volumeInfo"]["imageLinks"]["thumbnail"])
            dict = Google_Results(isbn13, isbn10, title, author, publisher, published_date, description, pages, genre,
                                  language, image_link)
            gr.append(dict)
            print(gr[i].title)
            i += 1
        except:
            pass
    return
gr = []
Book_Search("Linkedin")
I am a beginner to Python, so any help would be appreciated!
It does so because there is no publisher entry in the volumeInfo of the first result, so a KeyError is raised and your except swallows it. If you're going to work with fuzzy data you have to account for the fact that it will not always have the expected structure. For simple cases you can rely on dict.get() and its default argument to return a 'valid' default value if an entry is missing.
Also, there are a few conceptual problems with your function: it relies on a global gr, which is bad design; it shadows the built-in dict type; and its bare except captures every exception, so even a SIGINT (Ctrl+C) raised inside the try block is swallowed. I'd suggest converting it to something a bit more sane:
import requests

def book_search(search_term, max_results=3):
    results = []  # a list to store the results
    parms = {"q": search_term, "maxResults": max_results}
    r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
    try:  # just in case the server doesn't return valid JSON
        for result in r.json().get("items", []):
            if "volumeInfo" not in result:  # invalid entry - missing volumeInfo
                continue
            result_dict = {}  # a dictionary to store our discovered fields
            result = result["volumeInfo"]  # all the data we're interested in is in volumeInfo
            isbns = result.get("industryIdentifiers", None)  # capture ISBNs
            if isinstance(isbns, list) and isbns:
                # assumes ISBN-10 comes first; the order is not guaranteed
                for i, t in enumerate(("isbn10", "isbn13")):
                    if len(isbns) > i and isinstance(isbns[i], dict):
                        result_dict[t] = isbns[i].get("identifier", None)
            result_dict["title"] = result.get("title", None)
            authors = result.get("authors", None)  # capture authors
            if isinstance(authors, list) and authors:
                # your [2:-2] only stripped the "['" and "']" of the list's repr
                result_dict["author"] = ", ".join(authors)
            result_dict["publisher"] = result.get("publisher", None)
            result_dict["published_date"] = result.get("publishedDate", None)
            result_dict["description"] = result.get("description", None)
            result_dict["pages"] = result.get("pageCount", None)
            genres = result.get("categories", None)  # capture genres
            if isinstance(genres, list) and genres:
                result_dict["genre"] = ", ".join(genres)
            result_dict["language"] = result.get("language", None)
            result_dict["image_link"] = result.get("imageLinks", {}).get("thumbnail", None)
            # make sure Google_Results accepts keyword arguments like title, author...
            # and make them optional as they might not be in the returned result
            gr = Google_Results(**result_dict)
            results.append(gr)  # add it to the results list
    except ValueError:
        return None  # invalid response returned, you may raise an error instead
    return results  # return the results
Then you can easily retrieve as much info as possible for a term:
gr = book_search("Google")
And it will be far more tolerant of data omissions, provided that your Google_Results type makes most of the entries optional.
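Since the question doesn't show Google_Results, here is a guess at what a version with optional entries could look like. Treat it as a sketch only: the field names simply mirror the keys used in the code above, and everything defaults to None.

```python
class Google_Results(object):
    """Hypothetical container; every field is optional and defaults to None."""

    def __init__(self, isbn13=None, isbn10=None, title=None, author=None,
                 publisher=None, published_date=None, description=None,
                 pages=None, genre=None, language=None, image_link=None):
        self.isbn13 = isbn13
        self.isbn10 = isbn10
        self.title = title
        self.author = author
        self.publisher = publisher
        self.published_date = published_date
        self.description = description
        self.pages = pages
        self.genre = genre
        self.language = language
        self.image_link = image_link


# A result that has no publisher can still be built from keyword arguments.
partial = Google_Results(**{"title": "Some title", "language": "en"})
```

Because every parameter has a default, a dictionary that lacks some keys can still be unpacked into the constructor.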
Following @Coldspeed's recommendation, it became clear that missing information in the JSON response raised an exception. Since I only had a "pass" statement in the except block, the entire result was skipped. I will therefore have to adapt the try/except statements so that errors are handled properly.
Thanks for the help guys!
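For anyone landing here later, a rough sketch of what "handled properly" might look like: catch only KeyError and record which key was missing instead of silently passing. The function name and the sample items are made up; real responses have many more fields.

```python
def titles_with_publisher(items):
    """Collect (title, publisher) pairs and note the missing key of skipped items."""
    results, skipped = [], []
    for item in items:
        try:
            info = item["volumeInfo"]
            results.append((info["title"], info["publisher"]))
        except KeyError as exc:
            # exc.args[0] is the key that was not found.
            skipped.append(exc.args[0])
    return results, skipped


# Made-up items shaped like entries of the "items" list in the response.
items = [
    {"volumeInfo": {"title": "First"}},  # no publisher
    {"volumeInfo": {"title": "Second", "publisher": "Acme"}},
]
results, skipped = titles_with_publisher(items)
```

Anything other than a missing key (including Ctrl+C) still propagates, which makes problems much easier to spot.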
So I'm trying to compare a dict that I have created to a dict response returned by a boto3 call.
The response is a representation of a JSON document and I want to check they are the same.
Boto3 always returns the strings as unicode. Here's the response:
{u'Version': u'2012-10-17', u'Statement': [{u'Action': u'sts:AssumeRole', u'Principal': {u'Service': u'ec2.amazonaws.com'}, u'Effect': u'Allow', u'Sid': u''}]}
I initially created my dict like this:
default_documment = {}
default_documment['Version'] = '2012-10-17'
default_documment['Statement'] = [{}]
default_documment['Statement'][0]['Sid'] = ''
default_documment['Statement'][0]['Effect'] = 'Allow'
default_documment['Statement'][0]['Principal'] = {}
default_documment['Statement'][0]['Principal']['Service'] = 'ec2.amazonaws.com'
default_documment['Statement'][0]['Action'] = 'sts:AssumeRole'
However, when I compare these two dicts with == they are not equal.
So then I tried adding u to all the strings when I create the dict:
# Default document for a new role
default_documment = {}
default_documment[u'Version'] = u'2012-10-17'
default_documment[u'Statement'] = [{}]
default_documment[u'Statement'][0][u'Sid'] = u''
default_documment[u'Statement'][0][u'Effect'] = u'Allow'
default_documment[u'Statement'][0][u'Principal'] = {}
default_documment[u'Statement'][0][u'Principal'][u'Service'] = u'ec2.amazonaws.com'
default_documment[u'Statement'][0][u'Action'] = u'sts:AssumeRole'
This doesn't work either. The dicts are not equal, and if I print my dict it doesn't show u'somestring', it just shows 'somestring'.
How can I compare my dict to what boto3 has returned?
Your second attempt works correctly in Python 2.7 and 3.3. Below is just a cut-and-paste of your Boto3 response and your code (with document spelling corrected :)
D = {u'Version': u'2012-10-17', u'Statement': [{u'Action': u'sts:AssumeRole', u'Principal': {u'Service': u'ec2.amazonaws.com'}, u'Effect': u'Allow', u'Sid': u''}]}
default_document = {}
default_document[u'Version'] = u'2012-10-17'
default_document[u'Statement'] = [{}]
default_document[u'Statement'][0][u'Sid'] = u''
default_document[u'Statement'][0][u'Effect'] = u'Allow'
default_document[u'Statement'][0][u'Principal'] = {}
default_document[u'Statement'][0][u'Principal'][u'Service'] = u'ec2.amazonaws.com'
default_document[u'Statement'][0][u'Action'] = u'sts:AssumeRole'
print(D == default_document)
Output:
True
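If the two dicts still compare unequal for you, it may help to remember that dict equality ignores key order but compares nested values exactly, so a single differing value deep inside is enough. A small sketch in Python 3, where the JSON string is just the same policy document re-typed with a different key order and `differing_keys` is a helper I made up:

```python
import json

# The same policy document as a JSON string, keys deliberately in another order.
raw = """{"Statement": [{"Sid": "", "Effect": "Allow",
                         "Principal": {"Service": "ec2.amazonaws.com"},
                         "Action": "sts:AssumeRole"}],
          "Version": "2012-10-17"}"""

expected = {
    "Version": "2012-10-17",
    "Statement": [{
        "Action": "sts:AssumeRole",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Effect": "Allow",
        "Sid": "",
    }],
}

parsed = json.loads(raw)
same = parsed == expected  # key order does not matter


def differing_keys(a, b):
    """Return the sorted top-level keys whose values differ between two dicts."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


# Change one nested value to see where the comparison starts to fail.
changed = json.loads(raw)
changed["Statement"][0]["Effect"] = "Deny"
diff = differing_keys(changed, expected)
```

Printing the differing keys is usually quicker than eyeballing two long reprs.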
I have some log files that look like many lines of the following:
<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>
<tickSize tickerId=0, field=3, size=25>
<tickSize tickerId=0, field=8, size=534349>
<tickPrice tickerId=0, field=2, price=201.82, canAutoExecute=1>
I need to define a class of type tickPrice or tickSize. I will need to decide which to use before doing the definition.
What would be the Pythonic way to grab these values? In other words, I need an effective way to reverse str() on a class.
The classes are already defined and just contain the presented variables, e.g., tickPrice.tickerId. I'm trying to find a way to extract these values from the text and set the instance attributes to match.
Edit: Answer
This is what I ended up doing-
with open(commandLineOptions.simulationFilename, "r") as simulationFileHandle:
    for simulationFileLine in simulationFileHandle:
        (date, time, msgString) = simulationFileLine.split("\t")
        if ("tickPrice" in msgString):
            msgStringCleaned = msgString.translate(None, ''.join("<>,"))
            msgList = msgStringCleaned.split(" ")
            msg = message.tickPrice()
            msg.tickerId = int(msgList[1][9:])
            msg.field = int(msgList[2][6:])
            msg.price = float(msgList[3][6:])
            msg.canAutoExecute = int(msgList[4][15:])
        elif ("tickSize" in msgString):
            msgStringCleaned = msgString.translate(None, ''.join("<>,"))
            msgList = msgStringCleaned.split(" ")
            msg = message.tickSize()
            msg.tickerId = int(msgList[1][9:])
            msg.field = int(msgList[2][6:])
            msg.size = int(msgList[3][5:])
        else:
            print "Unsupported tick message type"
I'm not sure how you want to dynamically create objects in your namespace, but the following will at least dynamically create objects based on your loglines:
Take your line:
line = '<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>'
Remove chars that aren't interesting to us, then split the line into a list:
line = line.translate(None, ''.join('<>,'))
line = line.split(' ')
Name the potential class attributes for convenience:
line_attrs = line[1:]
Then create your object (name, base tuple, dictionary of attrs):
tickPriceObject = type(line[0], (object,), { key:value for key,value in [at.split('=') for at in line_attrs]})()
Prove it works as we'd expect:
print(tickPriceObject.field)
# 2
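Two caveats with the steps above: `translate(None, ...)` is the Python 2 spelling, and the attribute values stay strings. Below is a sketch of the same idea for Python 3 that also converts the values to numbers; `parse_value` is a helper name I made up.

```python
def parse_value(text):
    """Convert a numeric string to int, falling back to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


line = '<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>'

# Python 3: the third argument of str.maketrans lists the characters to delete.
cleaned = line.translate(str.maketrans('', '', '<>,'))
name, *pairs = cleaned.split(' ')

# Build the attribute dict with converted values, then create and instantiate the class.
attrs = {key: parse_value(value) for key, value in (p.split('=') for p in pairs)}
tick = type(name, (object,), attrs)()
```

Here `tick.field` is the integer 2 rather than the string '2', and `tick.price` is a float.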
Approaching the problem with regex, but with the same result as tristan's excellent answer (and stealing his use of the type constructor that I will never be able to remember)
import re
class_instance_re = re.compile(r"""
    <(?P<classname>\w[a-zA-Z0-9]*)[ ]
    (?P<arguments>
        (?:\w[a-zA-Z0-9]*=[0-9.]+[, ]*)+
    )>""", re.X)
objects = []
for line in whatever_file:
    result = class_instance_re.match(line)
    if result is None:  # skip lines that don't look like <name key=value, ...>
        continue
    classname = result.group('classname')
    arguments = result.group('arguments')
    # create the class, then instantiate it
    new_obj = type(classname, (object,),
                   dict([s.split('=') for s in arguments.split(', ')]))()
    objects.append(new_obj)
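Since the question says the classes already exist, another option is to look the class up by name and set the attributes on a fresh instance rather than creating new types. A sketch under that assumption: the two empty classes below are stand-ins for the real ones, and `MESSAGE_TYPES`, `LINE_RE`, `PAIR_RE` and `parse_line` are names I made up.

```python
import re


class tickPrice(object):
    """Hypothetical stand-in for the existing tickPrice class."""


class tickSize(object):
    """Hypothetical stand-in for the existing tickSize class."""


# Map the name found in the log line to the class to instantiate.
MESSAGE_TYPES = {"tickPrice": tickPrice, "tickSize": tickSize}
LINE_RE = re.compile(r"<(\w+) (.*)>")
PAIR_RE = re.compile(r"(\w+)=([0-9.]+)")


def parse_line(line):
    """Return an instance of the matching class, or None for unsupported lines."""
    match = LINE_RE.search(line)
    if match is None or match.group(1) not in MESSAGE_TYPES:
        return None
    msg = MESSAGE_TYPES[match.group(1)]()
    for key, value in PAIR_RE.findall(match.group(2)):
        # Values containing a dot become floats, everything else ints.
        setattr(msg, key, float(value) if "." in value else int(value))
    return msg


size_msg = parse_line('<tickSize tickerId=0, field=8, size=534349>')
price_msg = parse_line('<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>')
```

Because `LINE_RE.search` scans the whole line, this should also cope with a date and time in front of the message.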
I'm writing a Python scraper for OpenData and I have one question: how do I check whether a value is missing on the site and, if it is, store null instead?
My scraper is here.
I'm currently working on optimizing it.
My variables now look like:
evcisloval = soup.find_all('td')[3].text.strip()
prinalezival = soup.find_all('td')[5].text.strip()
popisfaplnenia = soup.find_all('td')[7].text.replace('\"', '')
hodnotafaplnenia = soup.find_all('td')[9].text[:-1].replace(",", ".").replace(" ", "")
datumdfa = soup.find_all('td')[11].text
datumzfa = soup.find_all('td')[13].text
formazaplatenia = soup.find_all('td')[15].text
obchmenonazov = soup.find_all('td')[17].text
sidlofirmy = soup.find_all('td')[19].text
pravnaforma = soup.find_all('td')[21].text
sudregistracie = soup.find_all('td')[23].text
ico = soup.find_all('td')[25].text
dic = soup.find_all('td')[27].text
cislouctu = soup.find_all('td')[29].text
And Output :
scraperwiki.sqlite.save(unique_keys=["invoice_id"],
                        data={"invoice_id": number,
                              "invoice_price": hodnotafaplnenia,
                              "evidence_no": evcisloval,
                              "paired_with": prinalezival,
                              "invoice_desc": popisfaplnenia,
                              "date_received": datumdfa,
                              "date_payment": datumzfa,
                              "pay_form": formazaplatenia,
                              "trade_name": obchmenonazov,
                              "trade_form": pravnaforma,
                              "company_location": sidlofirmy,
                              "court": sudregistracie,
                              "ico": ico,
                              "dic": dic,
                              "accout_no": cislouctu,
                              "invoice_attachment": urlfa,
                              "invoice_url": url})
I googled it but without success.
First, write a configuration dict of your variables in the form:
conf = {'evidence_no': (3, str.strip),
        'trade_form': (21, None),
        ...}
That is, each key is the output key, and each value is a tuple of the index into soup.find_all('td') and an optional function to apply to the result (None if there is none). You don't need those Slavic variable names that may confuse other SO members.
Then iterate over conf and fill the data dict.
Also, run soup.find_all('td') before the loop.
tds = soup.find_all('td')
data = {}
for name, (num, func) in conf.items():  # items() works on both Python 2 and 3
    text = tds[num].text
    # replace text with None or "NULL" or whatever if needed
    ...
    if func is None:
        data[name] = text
    else:
        data[name] = func(text)
This will remove a lot of duplicated code. Easier to maintain.
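To make the idea concrete without the site, here is a small sketch in Python 3 that runs the same loop over a plain list of strings. The list is made up and stands in for the `.text` of each td; the indexes in `conf` are therefore not the real ones.

```python
# Output key -> (index into the cell list, optional clean-up function).
conf = {
    "evidence_no": (0, str.strip),
    "trade_form": (1, None),
    "ico": (2, str.strip),
}

# Hypothetical cell texts standing in for tds[num].text.
cells = ["  2014/001 ", "s.r.o.", "   "]

data = {}
for name, (num, func) in conf.items():
    text = cells[num]
    if func is not None:
        text = func(text)
    # An empty string after clean-up is stored as None.
    data[name] = text or None
```

Adding a field is then a one-line change to `conf` rather than a new variable plus a new dictionary entry.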
Also, I am not sure the strings "NULL" are the best way to write missing data. Doesn't sqlite support Python's real None objects?
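As a quick check of that, the standard library's sqlite3 module stores Python's None as a real SQL NULL. The sketch below uses sqlite3 directly with a made-up table; whether scraperwiki.sqlite.save treats None the same way is an assumption you would need to verify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_id TEXT PRIMARY KEY, ico TEXT)")

# None becomes a real NULL; the string "NULL" is just four characters of text.
conn.execute("INSERT INTO invoices VALUES (?, ?)", ("1", None))
conn.execute("INSERT INTO invoices VALUES (?, ?)", ("2", "NULL"))

null_rows = conn.execute(
    "SELECT invoice_id FROM invoices WHERE ico IS NULL").fetchall()
text_rows = conn.execute(
    "SELECT invoice_id FROM invoices WHERE ico = 'NULL'").fetchall()
conn.close()
```

Only the first row is found by `IS NULL`, which is usually what you want when querying for missing data later.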
Just read your attached link, and it seems what you want is
evcisloval = soup.find_all('td')[3].text.strip() or "NULL"
But be careful. You should only do this with strings. If the part before or is falsy (an empty string, False, None or 0), it will be replaced with "NULL".
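A short sketch of that pitfall, together with a more explicit alternative that only replaces None and blank strings. The helper name `text_or_null` is made up.

```python
def text_or_null(value, placeholder="NULL"):
    """Replace only None or blank strings; leave 0 and False untouched."""
    if value is None:
        return placeholder
    if isinstance(value, str) and not value.strip():
        return placeholder
    return value


# The short-circuit form replaces every falsy value, including a legitimate 0.
short_circuit = [value or "NULL" for value in ("", None, 0, "abc")]

# The explicit helper keeps the 0.
explicit = [text_or_null(value) for value in ("", None, 0, "abc")]
```

For values that are always strings, both forms behave the same, so the one-liner is fine there.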