I have a huge list like below, where I'm trying to access each value. However, I have a hard time retrieving \xe6ndret, since it has special characters. How can I access the column when there are unicode characters in the key?
vejstykker = [{u'navngivenvej_id': u'4fb0b0a2-8be7-4254-90d5-fc1af6eb111c', u'kode': u'0007', u'oprettet': u'2019-04-03T13:30:37.031', u'kommunekode': u'0101', u'navn': u'Dompapvej', u'adresseringsnavn': u'Dompapvej', u'\xe6ndret': None, u'id': u'097ae470-9532-4d67-8cb9-6420c601fc24'}]
I have tried doing something like below:
for vejstykke in vejstykker:
    created_at = datetime.datetime.strptime(vejstykke['oprettet'], '%Y-%m-%dT%H:%M:%S.%f')
    id = vejstykke['id']
    kommunekode = vejstykke['kommunekode']
    kode = vejstykke['kode']
    navn = vejstykke['navn']
    adresseringsnavn = vejstykke['adresseringsnavn']
    navngivenvej_id = vejstykke['navngivenvej_id']
    changed_at = vejstykke['ændret']
However, I get the error below:
Traceback (most recent call last):
  File "", line 28, in run
KeyError: '\xc3\xa6ndret'
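A minimal sketch of what is going on (assuming Python 2, which the u'' prefixes and the byte-string KeyError suggest): the key stored in the dictionary is the unicode string u'\xe6ndret', while a plain 'ændret' literal in a UTF-8 source file is the byte string '\xc3\xa6ndret', so the lookup fails unless a unicode literal is used.

# -*- coding: utf-8 -*-
# Sketch (Python 2): the dict key is the unicode string u'\xe6ndret'.
vejstykke = {u'\xe6ndret': None, u'id': u'097ae470-9532-4d67-8cb9-6420c601fc24'}

print(vejstykke[u'\xe6ndret'])   # works: escaped unicode literal
print(vejstykke[u'ændret'])      # also works, given the coding declaration above
# vejstykke['ændret']            # KeyError: '\xc3\xa6ndret' (byte-string key, Python 2)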
import boto3
client = boto3.client('secretsmanager')
response = client.get_secret_value(SecretId='xxxx')
print('entire response:', response)
print('SecretString:',response['SecretString'])
print('testvalue:', response['SecretString']["testkey"])
I am trying to implement AWS Secrets Manager and need to access the testvalue.
entire response:{---, u'SecretString': u'{"testkey":"testvalue","testkey2":"testvalue2"}', ----}
SecretString: {"testkey":"testvalue","testkey2":"testvalue2"}
Traceback (most recent call last):
File "secretmanagertest.py", line 7, in <module>
print('testvalue',response['SecretString']["testkey"])
TypeError: string indices must be integers
When I try an integer index instead, I only get a single character:
print(response['SecretString'][0])
{
print(response['SecretString'][1])
"
print(response['SecretString'][2])
t
etc.
The value of response['SecretString'] is a nested JSON document stored as a string, not a dictionary yet. Decode it first, with json.loads():
import json
secret = json.loads(response['SecretString'])
print(secret['testkey'])
Demo:
>>> import json
>>> response = {u'SecretString': u'{"testkey":"testvalue","testkey2":"testvalue2"}'}
>>> response['SecretString']
u'{"testkey":"testvalue","testkey2":"testvalue2"}'
>>> json.loads(response['SecretString'])
{u'testkey2': u'testvalue2', u'testkey': u'testvalue'}
>>> json.loads(response['SecretString'])['testkey']
u'testvalue'
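Putting the two pieces together, a minimal sketch using the same boto3 call as in the question (SecretId 'xxxx' is the placeholder from the question):

import json
import boto3

# Sketch: fetch the secret, then decode the JSON document stored in SecretString.
client = boto3.client('secretsmanager')
response = client.get_secret_value(SecretId='xxxx')  # placeholder from the question

secret = json.loads(response['SecretString'])
print('testvalue:', secret['testkey'])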
I have a nested dictionary in the form of:
self.emoji_per_word = {0: {'worte': 0, 'emojis': 0, '#': 0}}
Now I need to add more sub-dictionaries to this as my program runs. I do this:
worte = 0
emojis = 0
# some code that assigns values to the 2 variables and creates the time_stamp variable
if time_stamp in self.emoji_per_word:
    self.emoji_per_word[time_stamp]['worte'] = self.emoji_per_word[time_stamp]['worte'] + worte
    self.emoji_per_word[time_stamp]['emojis'] = self.emoji_per_word[time_stamp]['emojis'] + emojis
else:
    self.emoji_per_word[time_stamp]['worte'] = worte
    self.emoji_per_word[time_stamp]['emojis'] = emojis
As you can see, I try to test whether the key time_stamp already exists and, if so, update the values with the new data. If not, I want to create the key time_stamp and assign it an initial value. However, I'm getting a KeyError once the program goes past the initial value (see the dictionary at the top).
Exception in thread Video 1:
Traceback (most recent call last):
File "C:\Anaconda\lib\threading.py", line 916, in _bootstrap_inner
self.run()
File "C:\MA\Code\jsonparser_v2\jsonparser_v2.py", line 418, in run
self.process_json()
File "C:\MA\Code\jsonparser_v2\jsonparser_v2.py", line 201, in process_json
self.emoji_per_word[time_stamp]['worte'] = worte
KeyError: 1
What I want in the end is something like this:
self.emoji_per_word = {0: {'worte': 20, 'emojis': 5, '#':0.25}, 1: {'worte': 20, 'emojis': 5, '#':0.25}}
What am I doing wrong here?
You're getting the error because self.emoji_per_word[time_stamp] doesn't exist when time_stamp != 0, so you need to create the inner dictionary first before assigning values to it, like so:
else:
    self.emoji_per_word[time_stamp] = {}
    self.emoji_per_word[time_stamp]['worte'] = worte
    self.emoji_per_word[time_stamp]['emojis'] = emojis
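An alternative sketch (not from the original answer) that avoids the existence check entirely by using collections.defaultdict, assuming the counters should start at zero:

import collections

# Sketch: the defaultdict builds a fresh inner dict with zeroed counters the first
# time an unknown time_stamp is accessed, so one code path handles both cases.
emoji_per_word = collections.defaultdict(lambda: {'worte': 0, 'emojis': 0, '#': 0})

emoji_per_word[0]['worte'] += 20
emoji_per_word[0]['emojis'] += 5
emoji_per_word[1]['worte'] += 20   # key 1 did not exist yet; it is created automatically

print(dict(emoji_per_word))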
My script migrates data from MySQL to MongoDB. It runs perfectly well when no unicode columns are included, but it throws the error below when the OrgLanguages column is added.
mongoImp = dbo.insert_many(odbcArray)
File "/home/lrsa/.local/lib/python2.7/site-packages/pymongo/collection.py", line 711, in insert_many
blk.execute(self.write_concern.document)
File "/home/lrsa/.local/lib/python2.7/site-packages/pymongo/bulk.py", line 493, in execute
return self.execute_command(sock_info, generator, write_concern)
File "/home/lrsa/.local/lib/python2.7/site-packages/pymongo/bulk.py", line 319, in execute_command
run.ops, True, self.collection.codec_options, bwc)
bson.errors.InvalidStringData: strings in documents must be valid UTF-8: 'Portugu\xeas do Brasil, ?????, English, Deutsch, Espa\xf1ol latinoamericano, Polish'
My code:
import MySQLdb, MySQLdb.cursors, sys, pymongo, collections
odbcArray=[]
mongoConStr = '192.168.10.107:36006'
sqlConnect = MySQLdb.connect(host = "54.175.170.187", user = "testuser", passwd = "testuser", db = "testdb", cursorclass=MySQLdb.cursors.DictCursor)
mongoConnect = pymongo.MongoClient(mongoConStr)
sqlCur = sqlConnect.cursor()
sqlCur.execute("SELECT ID,OrgID,OrgLanguages,APILoginID,TransactionKey,SMTPSpeed,TimeZoneName,IsVideoWatched FROM organizations")
dbo = mongoConnect.eaedw.mysqlData
tuples = sqlCur.fetchall()
for tuple in tuples:
    odbcArray.append(collections.OrderedDict(tuple))
mongoImp = dbo.insert_many(odbcArray)
sqlCur.close()
mongoConnect.close()
sqlConnect.close()
sys.exit()
The above script migrates data perfectly when tried without the OrgLanguages column in the SELECT query.
To overcome this, I have tried to use OrderedDict() in another way, but it gives me a different type of error.
Changed Code:
for tuple in tuples:
    doc = collections.OrderedDict()
    doc['oid'] = tuple.OrgID
    doc['APILoginID'] = tuple.APILoginID
    doc['lang'] = unicode(tuple.OrgLanguages)
    odbcArray.append(doc)

mongoImp = dbo.insert_many(odbcArray)
Error Received:
Traceback (most recent call last):
File "pymsql.py", line 19, in <module>
doc['oid'] = tuple.OrgID
AttributeError: 'dict' object has no attribute 'OrgID'
Your MySQL connection is returning characters in an encoding other than UTF-8, which is the encoding all BSON strings must use. Try your original code, but pass charset='utf8' to MySQLdb.connect.
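A sketch of what that might look like, reusing the connection parameters from the question (use_unicode=True is an extra assumption; it makes MySQLdb hand back unicode objects rather than raw bytes):

import MySQLdb, MySQLdb.cursors

# Sketch: ask MySQLdb to use UTF-8 for the connection and to return unicode strings,
# so the documents passed to insert_many contain valid UTF-8 data for BSON.
sqlConnect = MySQLdb.connect(host="54.175.170.187", user="testuser", passwd="testuser",
                             db="testdb", charset='utf8', use_unicode=True,
                             cursorclass=MySQLdb.cursors.DictCursor)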
I'm trying to add the JSON output below into a dictionary, to be saved into a SQL database.
{'Parkirisca': [
{
'ID_Parkirisca': 2,
'zasedenost': {
'Cas': '2016-10-08 13:17:00',
'Cas_timestamp': 1475925420,
'ID_ParkiriscaNC': 9,
'P_kratkotrajniki': 350
}
}
]}
I am currently using the following code to add the value to a dictionary:
import scraperwiki
import json
import requests
import datetime
import time
from pprint import pprint

html = requests.get("http://opendata.si/promet/parkirisca/lpt/")
data = json.loads(html.text)

for carpark in data['Parkirisca']:
    zas = carpark['zasedenost']
    free_spaces = zas.get('P_kratkotrajniki')
    last_updated = zas.get('Cas_timestamp')
    parking_type = carpark.get('ID_Parkirisca')
    if parking_type == "Avtomatizirano":
        is_automatic = "Yes"
    else:
        is_automatic = "No"
    scraped = datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S')
    savetodb = {
        'scraped': scraped,
        'id': carpark.get("ID_Parkirisca"),
        'total_spaces': carpark.get("St_mest"),
        'free_spaces': free_spaces,
        'last_updated': last_updated,
        'is_automatic': is_automatic,
        'lon': carpark.get("KoordinataX_wgs"),
        'lat': carpark.get("KoordinataY_wgs")
    }
    unique_keys = ['id']
    pprint savetodb
However, when I run this, it gets stuck at for zas in carpark["zasedenost"] and outputs the following error:
Traceback (most recent call last):
File "./code/scraper", line 17, in <module>
for zas in carpark["zasedenost"]:
KeyError: 'zasedenost'
I've been led to believe that zas is in fact now a string rather than a dictionary, but I'm new to Python and JSON, so I don't know what to search for to find a solution. I've also searched Stack Overflow for "KeyError when key exists" questions, but they didn't help, and I believe this might be because it happens inside a nested for loop.
Update: Now, when I swapped the double quotes for single quotes, I get the following error:
Traceback (most recent call last):
File "./code/scraper", line 17, in <module>
free_spaces = zas.get('P_kratkotrajniki')
AttributeError: 'unicode' object has no attribute 'get'
I fixed up your code:
- Added the required imports.
- Fixed the pprint savetodb line, which isn't valid Python.
- Didn't try to iterate over carpark['zasedenost'].
I then added another pprint statement in the for loop to see what's in carpark when the KeyError occurs. From there, the error is clear. (Not all the elements in the array in your JSON contain the 'zasedenost' key.)
Here's the code I used:
import datetime
import json
from pprint import pprint
import time

import requests

html = requests.get("http://opendata.si/promet/parkirisca/lpt/")
data = json.loads(html.text)

for carpark in data['Parkirisca']:
    pprint(carpark)
    zas = carpark['zasedenost']
    free_spaces = zas.get('P_kratkotrajniki')
    last_updated = zas.get('Cas_timestamp')
    parking_type = carpark.get('ID_Parkirisca')
    if parking_type == "Avtomatizirano":
        is_automatic = "Yes"
    else:
        is_automatic = "No"
    scraped = datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S')
    savetodb = {
        'scraped': scraped,
        'id': carpark.get("ID_Parkirisca"),
        'total_spaces': carpark.get("St_mest"),
        'free_spaces': free_spaces,
        'last_updated': last_updated,
        'is_automatic': is_automatic,
        'lon': carpark.get("KoordinataX_wgs"),
        'lat': carpark.get("KoordinataY_wgs")
    }
    unique_keys = ['id']
    pprint(savetodb)
And here's the output on the iteration where the KeyError occurs:
{u'A_St_Mest': None,
u'Cena_dan_Eur': None,
u'Cena_mesecna_Eur': None,
u'Cena_splosno': None,
u'Cena_ura_Eur': None,
u'ID_Parkirisca': 7,
u'ID_ParkiriscaNC': 72,
u'Ime': u'P+R Studenec',
u'Invalidi_St_mest': 9,
u'KoordinataX': 466947,
u'KoordinataX_wgs': 14.567929171694901,
u'KoordinataY': 101247,
u'KoordinataY_wgs': 46.05457609543313,
u'Opis': u'2,40 \u20ac /dan',
u'St_mest': 187,
u'Tip_parkirisca': None,
u'U_delovnik': u'24 ur (ponedeljek - petek)',
u'U_sobota': None,
u'U_splosno': None,
u'Upravljalec': u'JP LPT d.o.o.'}
Traceback (most recent call last):
File "test.py", line 14, in <module>
zas = carpark['zasedenost']
KeyError: 'zasedenost'
As you can see, the error is quite accurate. There's no key 'zasedenost' in the dictionary. If you look through your JSON, you'll see that's true for a number of the elements in that array.
I'd suggest a fix, but I don't know what you want to do in the case where this dictionary key is absent. Perhaps you want something like this:
zas = carpark.get('zasedenost')
if zas is not None:
    free_spaces = zas.get('P_kratkotrajniki')
    last_updated = zas.get('Cas_timestamp')
else:
    free_spaces = None
    last_updated = None
I am building an algorithm for sentiment analysis that does segmentation on a .txt corpus, but there is a problem in the code that I don't know how to resolve.
import nltk

class Splitter(object):
    def _init_(self):
        self.nltk_splitter = nltk.data.load('tokenizers/punkt/english.pickle')
        self.nltk_tokenizer = nltk.tokenize.TreebankWordTokenizer()

    def split(self, text):
        """Input format: the contents of a .txt file as a string.
        Output format: a list of lists of words,
        e.g. [['this', 'is'], ['life', 'worth', 'living']]"""
        sentences = self.nltk_splitter.tokenize(text)
        tokenized_sentences = [self.nltk_tokenizer.tokenize(sent) for sent in sentences]
        return tokenized_sentences
and then I did the following:
>>> f = open('amazonshoes.txt')
>>> raw = f.read()
>>> text = nltk.Text(raw)
>>> splitter = Splitter()
>>> splitted_sentences = splitter.split(text)
and the error is
Traceback (most recent call last):
File "<pyshell#21>", line 1, in <module>
splitted_sentences = splitter.split(text)
File "<pyshell#14>", line 9, in split
sentences = self.nltk_splitter.tokenize(text)
AttributeError: 'Splitter' object has no attribute 'nltk_splitter'
The constructor of the class Splitter should be called __init__, with two leading and trailing underscores.
Currently the _init_ method (single underscores) is not executed, so the Splitter object you create (by calling Splitter()) never acquires the attribute/field nltk_splitter.
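For illustration, a minimal corrected sketch (assuming the standard punkt model path 'tokenizers/punkt/english.pickle' and plain-string input rather than an nltk.Text object):

import nltk

class Splitter(object):
    def __init__(self):  # double underscores, so Python runs this when Splitter() is called
        self.nltk_splitter = nltk.data.load('tokenizers/punkt/english.pickle')
        self.nltk_tokenizer = nltk.tokenize.TreebankWordTokenizer()

    def split(self, text):
        """Input: a string. Output: a list of lists of words,
        e.g. [['this', 'is'], ['life', 'worth', 'living']]"""
        sentences = self.nltk_splitter.tokenize(text)
        return [self.nltk_tokenizer.tokenize(sent) for sent in sentences]

raw = open('amazonshoes.txt').read()      # the corpus file from the question
splitter = Splitter()
splitted_sentences = splitter.split(raw)  # pass the raw string, not nltk.Text(raw)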