I'm trying to add the JSON output below into a dictionary, to be saved into a SQL database.
{'Parkirisca': [
    {
        'ID_Parkirisca': 2,
        'zasedenost': {
            'Cas': '2016-10-08 13:17:00',
            'Cas_timestamp': 1475925420,
            'ID_ParkiriscaNC': 9,
            'P_kratkotrajniki': 350
        }
    }
]}
I am currently using the following code to add the value to a dictionary:
import scraperwiki
import json
import requests
import datetime
import time
from pprint import pprint

html = requests.get("http://opendata.si/promet/parkirisca/lpt/")
data = json.loads(html.text)

for carpark in data['Parkirisca']:
    zas = carpark['zasedenost']
    free_spaces = zas.get('P_kratkotrajniki')
    last_updated = zas.get('Cas_timestamp')
    parking_type = carpark.get('ID_Parkirisca')
    if parking_type == "Avtomatizirano":
        is_automatic = "Yes"
    else:
        is_automatic = "No"
    scraped = datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S')
    savetodb = {
        'scraped': scraped,
        'id': carpark.get("ID_Parkirisca"),
        'total_spaces': carpark.get("St_mest"),
        'free_spaces': free_spaces,
        'last_updated': last_updated,
        'is_automatic': is_automatic,
        'lon': carpark.get("KoordinataX_wgs"),
        'lat': carpark.get("KoordinataY_wgs")
    }
    unique_keys = ['id']
    pprint savetodb
However, when I run this, it gets stuck at for zas in carpark["zasedenost"] and outputs the following error:
Traceback (most recent call last):
File "./code/scraper", line 17, in <module>
for zas in carpark["zasedenost"]:
KeyError: 'zasedenost'
I've been led to believe that zas is in fact now a string rather than a dictionary, but I'm new to Python and JSON, so I don't know what to search for to get a solution. I've also searched here on Stack Overflow for "KeyError when key exists" questions, but they didn't help, and I believe that this might be due to the fact that it's a nested for loop.
Update: Now, when I swapped the double quotes for single quotes, I get the following error:
Traceback (most recent call last):
File "./code/scraper", line 17, in <module>
free_spaces = zas.get('P_kratkotrajniki')
AttributeError: 'unicode' object has no attribute 'get'
I fixed up your code:
Added required imports.
Fixed the pprint savetodb line which isn't valid Python.
Didn't try to iterate over carpark['zasedenost'].
I then added another pprint statement in the for loop to see what's in carpark when the KeyError occurs. From there, the error is clear. (Not all the elements in the array in your JSON contain the 'zasedenost' key.)
Here's the code I used:
import datetime
import json
from pprint import pprint
import time

import requests

html = requests.get("http://opendata.si/promet/parkirisca/lpt/")
data = json.loads(html.text)

for carpark in data['Parkirisca']:
    pprint(carpark)
    zas = carpark['zasedenost']
    free_spaces = zas.get('P_kratkotrajniki')
    last_updated = zas.get('Cas_timestamp')
    parking_type = carpark.get('ID_Parkirisca')
    if parking_type == "Avtomatizirano":
        is_automatic = "Yes"
    else:
        is_automatic = "No"
    scraped = datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S')
    savetodb = {
        'scraped': scraped,
        'id': carpark.get("ID_Parkirisca"),
        'total_spaces': carpark.get("St_mest"),
        'free_spaces': free_spaces,
        'last_updated': last_updated,
        'is_automatic': is_automatic,
        'lon': carpark.get("KoordinataX_wgs"),
        'lat': carpark.get("KoordinataY_wgs")
    }
    unique_keys = ['id']
    pprint(savetodb)
And here's the output on the iteration where the KeyError occurs:
{u'A_St_Mest': None,
u'Cena_dan_Eur': None,
u'Cena_mesecna_Eur': None,
u'Cena_splosno': None,
u'Cena_ura_Eur': None,
u'ID_Parkirisca': 7,
u'ID_ParkiriscaNC': 72,
u'Ime': u'P+R Studenec',
u'Invalidi_St_mest': 9,
u'KoordinataX': 466947,
u'KoordinataX_wgs': 14.567929171694901,
u'KoordinataY': 101247,
u'KoordinataY_wgs': 46.05457609543313,
u'Opis': u'2,40 \u20ac /dan',
u'St_mest': 187,
u'Tip_parkirisca': None,
u'U_delovnik': u'24 ur (ponedeljek - petek)',
u'U_sobota': None,
u'U_splosno': None,
u'Upravljalec': u'JP LPT d.o.o.'}
Traceback (most recent call last):
File "test.py", line 14, in <module>
zas = carpark['zasedenost']
KeyError: 'zasedenost'
As you can see, the error is quite accurate. There's no key 'zasedenost' in the dictionary. If you look through your JSON, you'll see that's true for a number of the elements in that array.
I'd suggest a fix, but I don't know what you want to do in the case where this dictionary key is absent. Perhaps you want something like this:
zas = carpark.get('zasedenost')
if zas is not None:
    free_spaces = zas.get('P_kratkotrajniki')
    last_updated = zas.get('Cas_timestamp')
else:
    free_spaces = None
    last_updated = None
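Alternatively, if car parks without occupancy data are of no use to you, here is a minimal sketch (same loop as above, not the only way to do it) that simply skips those entries:

for carpark in data['Parkirisca']:
    zas = carpark.get('zasedenost')
    if zas is None:
        continue  # this car park has no occupancy data, so skip it
    free_spaces = zas.get('P_kratkotrajniki')
    last_updated = zas.get('Cas_timestamp')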
Related
Can someone help me fix this? I don't know why I am getting this error.
I am trying to use a Python program someone made. I tried to mess around with it, but I could not figure out the issue.
Error:
PS D:\Python> python .\quizlet.py
Traceback (most recent call last):
File "D:\Python\quizlet.py", line 69, in <module>
q = QuizletParser(website)
File "D:\Python\quizlet.py", line 17, in QuizletParser
data = json.loads(BeautifulSoup(session.get(link).content, features="lxml").find_all('script')[-6].string[44:-152])
File "C:\Users\john\AppData\Local\Programs\Python\Python39\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\john\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 14 (char 13)
I am trying to use code I found here a while ago: https://github.com/daijro/python-quizlet
Source:
from requests_html import HTMLSession
from box import Box
import box
import json
from bs4 import BeautifulSoup
from difflib import SequenceMatcher

def FindFlashcard(flashcards: box.box_list.BoxList, match: str):
    similar = lambda a, b: SequenceMatcher(None, a, b).ratio()
    data = max(list(zip([similar(match, x.term) for x in flashcards], [x for x in range(len(flashcards))])))
    flashcard = flashcards[data[1]]
    flashcard.update({'similarity': data[0]})
    return flashcard

def QuizletParser(link: str):
    session = HTMLSession()
    data = json.loads(BeautifulSoup(session.get(link).content, features="lxml").find_all('script')[-6].string[44:-152])
    flashcards = []
    for i in list(data['termIdToTermsMap'].values()):
        i = {
            'index': i['rank'],
            'id': i['id'],
            'term': i['word'],
            'definition': i['definition'],
            'setId': i['setId'],
            'image': i['_imageUrl'],
            'termTts': 'https://quizlet.com'+i['_wordTtsUrl'],
            'termTtsSlow': 'https://quizlet.com'+i['_wordSlowTtsUrl'],
            'definitionTts': 'https://quizlet.com'+i['_definitionTtsUrl'],
            'definitionTtsSlow': 'https://quizlet.com'+i['_definitionSlowTtsUrl'],
            'lastModified': i['lastModified'],
        }
        flashcards.append(i)
    output = {
        'title': data['set']['title'],
        'flashcards': flashcards,
        'author': {
            'name': data['creator']['username'],
            'id': data['creator']['id'],
            'timestamp': data['creator']['timestamp'],
            'lastModified': data['creator']['lastModified'],
            'image': data['creator']['_imageUrl'],
            'timezone': data['creator']['timeZone'],
            'isAdmin': data['creator']['isAdmin'],
        },
        'id': data['set']['id'],
        'link': data['set']['_webUrl'],
        'thumbnail': data['set']['_thumbnailUrl'],
        'timestamp': data['set']['timestamp'],
        'lastModified': data['set']['lastModified'],
        'publishedTimestamp': data['set']['publishedTimestamp'],
        'authorsId': data['set']['creatorId'],
        'termLanguage': data['set']['wordLang'],
        'definitionLanguage': data['set']['defLang'],
        'description': data['set']['description'],
        'numTerms': data['set']['numTerms'],
        'hasImages': data['set']['hasImages'],
        'hasUploadedImage': data['hasUploadedImage'],
        'hasDiagrams': data['set']['hasDiagrams'],
        'hasImages': data['set']['hasImages'],
    }
    return Box(output)

website = 'https://quizlet.com/475389316/python-web-scraping-flash-cards/'
text = 'Two popular parsers'

q = QuizletParser(website)
flashcard = FindFlashcard(q.flashcards, match=text)  # finds the flashcard most similar to the input
print(flashcard.term + " " + flashcard.definition)  # calculates how similar the identified flashcard is to the input
Hard to give a solution without looking at the data.
A few tips for debugging JSON errors:
Check the input data to the JSONDecoder. You might be adding a comma to the last key-value pair of the input dictionary (which is very common).
Check the data type. If your input data came from an external source, check the data first.
I would suggest doing a print of this and pasting it here if possible.
input_data = BeautifulSoup(session.get(link).content, features="lxml").find_all('script')[-6].string[44:-152]
print(input_data)
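If printing the whole string is impractical, here is a minimal sketch (assuming the same input_data string as above, and Python 3's json module) for seeing roughly where the decoder stops:

import json

try:
    data = json.loads(input_data)
except json.JSONDecodeError as e:
    # e.pos is the character offset where decoding failed; show some context around it
    print(e)
    print(input_data[max(0, e.pos - 40):e.pos + 40])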
I'm getting the following error when executing the following script:
Error Type: <type 'exceptions.TypeError'>
Error Contents: 'NoneType' object is not iterable
Traceback (most recent call last):
File "addon.py", line 75, in <module>
plugin.run()
File "xbmcswift2/plugin.py", line 332, in run
items = self._dispatch(self.request.path)
File "/plugin.py", line 306, in _dispatch
listitems = view_func(**items)
File "/addon.py", line 42, in all_episodes
items = thisiscriminal.compile_playable_podcast(playable_podcast)
File "/lib/thisiscriminal.py", line 121, in compile_playable_podcast
for podcast in playable_podcast:
TypeError: 'NoneType' object is not iterable
The code in question is as follows; any advice would be greatly appreciated, as I have no idea what I'm doing wrong:
def get_playable_podcast(soup):
    """
    #param: parsed html page
    """
    r = urllib.urlopen('https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=10000&page=1')
    data = json.loads(r.read().decode('utf-8'))
    for post in data['posts']:
        print post['title']
        print post['episodeNumber']
        print post['audioSource']
        print post['image']['medium']
        subjects = []
        item = {
            'title': post['title'],
            'audioSource': post['audioSource'],
            'episodeNumber': post['episodeNumber'],
            'medium': post['image']['medium']
        }
        subjects.append(item)
        print subjects
def compile_playable_podcast(playable_podcast):
    """
    #para: list containing dict of key/values pairs for playable podcasts
    """
    items = []
    for podcast in playable_podcast:
        items.append({
            post['title']: podcast['title']['episodeNumber'],
            post['audioSource']: podcast['audioSource'],
            post['image']['medium']: podcast['medium'],
            'is_playable': True,})
    return items
I assume your script does something akin to the following:
podcast = get_playable_podcast(soup)
compiled = compile_playable_podcast(podcast)
The problem is that get_playable_podcast has no return statement. In such a case, Python defaults to returning None - which you then pass into compile_playable_podcast. Since None is not iterable, compile_playable_podcast rightfully raises a TypeError.
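A minimal illustration of the same behaviour, with a hypothetical function name:

def build_list():
    items = [1, 2, 3]    # a list is built here, but never returned

result = build_list()    # result is None
for item in result:      # TypeError: 'NoneType' object is not iterable
    print(item)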
Now, the solution is of course to return the podcast list you're building in get_playable_podcast, like so,
def get_playable_podcast(soup):
    """
    #param: parsed html page
    """
    r = urllib.urlopen('https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=10000&page=1')
    data = json.loads(r.read().decode('utf-8'))

    subjects = []
    for post in data['posts']:
        print post['title']
        print post['episodeNumber']
        print post['audioSource']
        print post['image']['medium']

        item = {
            'title': post['title'],
            'audioSource': post['audioSource'],
            'episodeNumber': post['episodeNumber'],
            'medium': post['image']['medium']
        }
        subjects.append(item)
        print subjects

    return subjects
Besides this, it may be worthwhile to carefully check your script for unused parameters and/or duplicate code (for example, the soup parameter of get_playable_podcast is never used, and compile_playable_podcast refers to post, which is not defined in that function).
I got an error: TypeError: 'int' object is not subscriptable.
I want to connect data from two Excel files to the User model.
So my ideal output is
1|1|Blear|40|false|l|America|A|1
2|5|Tom|23|true|o|UK|A|3
3|9|Rose|52|false|m
4|10|Karen||||Singapore|C|2
For example, the data for Rose (user_id=3) is not in the second Excel file; in that case it is fine for the rest of her row to be empty. I am thinking of putting the second Excel file into a dictionary and applying that to the User model.
I searched the error and thought the part for data in data_dict was wrong, so I changed it to for data in range(len(data_dict)), but the same error happens. I really cannot understand what is wrong. How should I fix this?
Now views.py is
#coding:utf-8
from django.shortcuts import render
import xlrd
from .models import User

book = xlrd.open_workbook('../data/excel1.xlsx')
sheet = book.sheet_by_index(1)

def build_employee(employee):
    if employee == 'leader':
        return 'l'
    if employee == 'manager':
        return 'm'
    if employee == 'others':
        return 'o'

for row_index in range(sheet.nrows):
    rows = sheet.row_values(row_index)
    is_man = rows[4] != ""
    emp = build_employee(rows[5])
    user = User(user_id=rows[1], name_id=rows[2], name=rows[3],
                age=rows[4], man=is_man, employee=emp)
    user.save()

book2 = xlrd.open_workbook('../data/excel2.xlsx')
sheet2 = book2.sheet_by_index(0)
headers = sheet2.row_values(0)

large_item = None
data_dict = {}
for row_index in range(sheet2.nrows):
    rows2 = sheet2.row_values(row_index)
    large_item = rows2[1] or large_item
    # Create dict with headers and row values
    row_data = {}
    for idx_col, value in enumerate(rows2):
        header_value = headers[idx_col]
        # Avoid to add empty column. A column in your example
        if header_value:
            row_data[headers[idx_col]] = value
    # Add row_data to your data_dict with
    data_dict[row_index] = row_data

for row_number, row_data in data_dict.items():
    user1 = User.objects.filter(user_id = data['user_id']).exists()
    if user1:
        user1.__dict__.update(**data_dict)
        user1.save()
Now Traceback is
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Users/XXX/testapp/app/views.py", line 123, in <module>
user1 = User.objects.filter(user_id = row_data['user_id']).exists()
KeyError: 'user_id'
data is an integer, so indexing data like a dict raises that exception.
>>> a=1
>>> a['a']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'int' object is not subscriptable
Why is it an int? Because you're iterating over the dictionary's keys:
>>> a={1: 'x', 2: 'c'}
>>> for i in a: print(i)
...
1
2
Try using items() as such:
>>> for key, value in a.items(): print(key, value)
...
1 x
2 c
Or, in your specific case:
for row_number, row_data in data_dict.items():
    print(row_number, row_data)
See the looping techniques section of the Python documentation for details.
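For the final loop in your code, a minimal sketch of what using the row dictionary could look like (this assumes the column headers in excel2.xlsx match the User model's field names, and that rows with data actually contain a 'user_id' column):

for row_number, row_data in data_dict.items():
    user_id = row_data.get('user_id')       # read from the row dict, not the loop key
    if user_id is None:
        continue                            # skip rows without a user_id column
    user = User.objects.filter(user_id=user_id).first()
    if user is not None:
        for field, value in row_data.items():
            setattr(user, field, value)     # copy the Excel columns onto the model instance
        user.save()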
Why does this work:
import pymongo
from selenium import webdriver
import smtplib
import sys
import json
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.properties
collection = db['capitalpacific']

fromDB = []
if collection.count() != 0:
    for post in collection.find():
        fromDB.append(post)

print(fromDB[0]['url'])
This correctly prints the url from just the first document of the collection (xxx.com), but I get a KeyError when I do this:
for i in range(0, 2):
    print(fromDB[i]['url'])
KeyError: 'url'
The documents stored in the DB look like so :
{'url':'xxx.com', 'location':'oregon'}
KeyError generally means the key doesn't exist in the dictionary collection.
For example :
>>> mydoc1=dict(url='xxx.com', location='oregon')
>>> mydoc2=dict(wrongkey='yyy.com', location='oregon')
>>> mylist=[]
>>> mylist.append(mydoc1)
>>> mylist.append(mydoc2)
>>> print mylist[0]['url']
xxx.com
>>> for i in range(0, 2):
... print(mylist[i]['url'])
...
xxx.com
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
KeyError: 'url'
>>>
Here, mydoc2 doesn't have a key called 'url', hence the "KeyError" is being raised for the second element in the list.
So, are you sure 'url' exists in the first two records? Can you print the contents of "fromDB" and make sure that the first two records have the 'url' key?
>>> print mylist
[{'url': 'xxx.com', 'location': 'oregon'}, {'wrongkey': 'yyy.com', 'location': 'oregon'}]
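If some documents legitimately lack the field, a minimal sketch (reusing the fromDB list from the question) that skips them with dict.get instead of raising KeyError:

for doc in fromDB:
    url = doc.get('url')    # None when the document has no 'url' field
    if url is not None:
        print(url)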
I'm using Python's colander library for validation. In my code there is a createdon field of colander.DateTime() type. When I provide it a value of datetime.datetime.now(), it fails with an exception saying that the createdon field has an invalid date. What may be the problem?
Here is the code of python module :
import colander
import htmllaundry
import pymongo
import random
import datetime
import hashlib

from pymongo import Connection

# Database Connection
HOST = "localhost"
PORT = 27017
DATABASE = "testdb"

conn = Connection(HOST, PORT)
db = conn[DATABASE]

# function to generate random string
def getRandomString(wordLen):
    word = ''
    for i in range(wordLen):
        word += random.choice('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789')
    return word

# Colander class for User object
class User(colander.MappingSchema):
    username = colander.SchemaNode(colander.String(), validator=colander.Length(3,100))
    email = colander.SchemaNode(colander.String(), validator=colander.All(colander.Email(), colander.Length(3,254)))
    password = colander.SchemaNode(colander.String(), validator=colander.Length(8,100))
    isactive = colander.SchemaNode(colander.Boolean())
    code = colander.SchemaNode(colander.String(), validator=colander.Length(64,64))
    name = colander.SchemaNode(colander.String(), validator=colander.Length(3,100))
    picture = colander.SchemaNode(colander.String())
    about = colander.SchemaNode(colander.String(), preparer=htmllaundry.sanitize, validator=colander.Length(0,1024))
    ipaddress = colander.SchemaNode(colander.String())
    createdon = colander.SchemaNode(colander.DateTime())
    updatedon = colander.SchemaNode(colander.DateTime())
    status = colander.SchemaNode(colander.Int(), validator=colander.Range(0,4))  # 0->active, 1->Deleted, 2->Suspended, 3->Deactivated

# getUser(username)
def getUser(username):
    user = db.users.find_one({"username": username})
    return user

# getUserByEmail(email)
def getUserByEmail(email):
    user = db.users.find_one({"email": email})
    return user

# createUser(userdata) # accepts a dictionary argument
def createUser(userdata):
    schema = User()  # generate schema object for User validation class
    try:
        # set current date/time in createdon/updatedon
        userdata['createdon'] = datetime.datetime.now()
        userdata['updatedon'] = userdata['createdon']
        # generate unique activation code, set isactive to False, and set status to 0
        randomCode = getRandomString(64)
        userdata['code'] = hashlib.sha256(randomCode).hexdigest()
        userdata['isactive'] = False
        userdata['status'] = 0
        # validate and deserialize userdata
        schema.deserialize(userdata)
        # sha256 the password
        userdata['password'] = hashlib.sha256(userdata['password']).hexdigest()
        # save the userdata object in mongodb database
        result = db.users.insert(userdata)
        # return the result of database operation and final userdata object
        return result, userdata
    except colander.Invalid, e:
        errors = e.asdict()
        return errors, userdata
Here is how I'm using it in test.py:
import usermodule

UserObject = {
    'username': 'anuj',
    'email': 'anuj.kumar#gmail.com',
    'password': 'testpassword',
    'name': 'Anuj Kumar',
    'picture': '/data/img/1.jpg',
    'about': 'Hacker & Designer, New Delhi',
    'ipaddress': '127.0.0.1'
}

result, data = usermodule.createUser(UserObject)
print result
print data
and I'm getting the following error:
anuj#anuj-Vostro-1450:~/Projects/test$ python test.py
{'createdon': u'Invalid date', 'updatedon': u'Invalid date'}
{'username': 'anuj', 'picture': '/data/img/1.jpg', 'about': 'Hacker & Designer, New Delhi', 'code': 'd6450b49e760f96256886cb24c2d54e8e8033293c479ef3976e6cbeabbd9d1f1', 'name': 'Anuj Kumar', 'updatedon': datetime.datetime(2012, 9, 14, 16, 16, 32, 311705), 'createdon': datetime.datetime(2012, 9, 14, 16, 16, 32, 311705), 'status': 0, 'password': 'testpassword', 'ipaddress': '127.0.0.1', 'email': 'anuj.kumar#gmail.com', 'isactive': False}
You are deserializing a cstruct, and it contains a datetime.datetime instance. Colander expects only simple types such as int, float, str, etc.
If you make your datetime value an ISO-formatted string, things work just fine:
>>> import datetime
>>> import colander
>>> colander.SchemaNode(colander.DateTime()).deserialize(datetime.datetime.now())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/private/tmp/colander/lib/python2.7/site-packages/colander/__init__.py", line 1598, in deserialize
appstruct = self.typ.deserialize(self, cstruct)
File "/private/tmp/colander/lib/python2.7/site-packages/colander/__init__.py", line 1265, in deserialize
mapping={'val':cstruct, 'err':e}))
colander.Invalid: {'': u'Invalid date'}
>>> colander.SchemaNode(colander.DateTime()).deserialize(datetime.datetime.now().isoformat())
datetime.datetime(2012, 9, 14, 13, 5, 37, 666630, tzinfo=<colander.iso8601.Utc object at 0x109732d50>)
Note that datetime.datetime.now() does not include timezone information, nor would .isoformat() preserve that. Colander, on the other hand, parses the ISO 8601 string as a timestamp in the UTC timezone, so you want to generate your timestamp in the same timezone by using the datetime.datetime.utcnow() class method instead:
>>> colander.SchemaNode(colander.DateTime()).deserialize(datetime.datetime.utcnow().isoformat())
datetime.datetime(2012, 9, 14, 11, 23, 25, 695256, tzinfo=<colander.iso8601.Utc object at 0x1005aaf10>)
So, replace
userdata['createdon'] = datetime.datetime.now()
with
userdata['createdon'] = datetime.datetime.utcnow().isoformat()
and colander will happily parse that for you, using the correct timezone.
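In the context of the createUser function above, that change amounts to a minimal sketch like this (same field names as in the question):

# store ISO 8601 strings so colander.DateTime() can parse them
now = datetime.datetime.utcnow().isoformat()
userdata['createdon'] = now
userdata['updatedon'] = now

As the interpreter session above shows, deserializing those strings yields timezone-aware datetime objects again, so you can use the return value of schema.deserialize(userdata) if you need real datetimes afterwards.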