Can't parse a (probably) valid JSON object in Python

I am trying to parse a JSON object with the following code in Python 3.
import json
str = '{"created_at":"Sun Aug 30 13:59:15 +0000 2015","id":637987951842951168,"id_str":"637987951842951168","text":"The Truth About the Iran Vatican False Prophet Anglo-American Western Alliance for Antichrist Israel: Palestin... http:\/\/t.co\/G79X164K9g","source":"\u003ca href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"\u003etwitterfeed\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":311859117,"id_str":"311859117","name":"Miko Furura","screen_name":"MikoFurura","location":"","url":null,"description":null,"protected":false,"verified":false,"followers_count":10,"friends_count":3,"listed_count":2,"favourites_count":4,"statuses_count":1264,"created_at":"Mon Jun 06 05:32:44 +0000 2011","utc_offset":32400,"time_zone":"Osaka","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"EBEBEB","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme7\/bg.gif","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme7\/bg.gif","profile_background_tile":false,"profile_link_color":"990000","profile_sidebar_border_color":"DFDFDF","profile_sidebar_fill_color":"F3F3F3","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/abs.twimg.com\/sticky\/default_profile_images\/default_profile_3_normal.png","profile_image_url_https":"https:\/\/abs.twimg.com\/sticky\/default_profile_images\/default_profile_3_normal.png","default_profile":false,"default_profile_image":true,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[{"url":"http:\/\/t.co\/G79X164K9g","expanded_url":"http:\/\/bit.ly\/1KvlIEu","display_url":"bit.ly\/1KvlIEu","indices":[114
,136]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1440943155619"}'
c = json.loads(str)
print(c['id'])
When I execute the script, I get an error:
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 270 (char 269)
I have parsed many json objects with this code and can't understand what is wrong with it now, or what is wrong with this particular json object.
Regards.

The solution is to put r in front of your string:
str = r'{"created_at":"Sun Aug 30 13:59:15 ...}'
The r prefix makes the literal a raw string, so Python leaves the backslashes (for example \") exactly as written and json.loads receives the escapes it expects.
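To see why the prefix matters, here is a minimal sketch using a shortened, hypothetical stand-in for the tweet (the `normal` and `raw` names are only for illustration). In the normal literal Python itself consumes `\"` before the JSON parser ever sees it.

```python
import json

# Hypothetical, shortened stand-in for the tweet in the question.
normal = '{"source": "<a href=\"http://twitterfeed.com\">twitterfeed</a>"}'
raw = r'{"source": "<a href=\"http://twitterfeed.com\">twitterfeed</a>"}'

# In the normal literal, Python has already turned \" into a bare ",
# so the JSON string value ends right after href= and parsing fails.
try:
    json.loads(normal)
    normal_ok = True
except json.JSONDecodeError:
    normal_ok = False

# In the raw literal the backslashes survive and json.loads unescapes them.
parsed = json.loads(raw)
print(normal_ok)
print(parsed["source"])
```

This only affects JSON pasted into source code as a literal. Text read from a file or a network response is not processed for Python escapes, so it needs no prefix.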

In this part you could remove the double quotes (") from the HTML, changing
"source":"\u003ca href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"\u003etwitterfeed\u003c\/a\u003e"
to
"source":"\u003ca href=http:\/\/twitterfeed.com rel=nofollow\u003etwitterfeed\u003c\/a\u003e"
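In a normal (non-raw) Python string literal, \" collapses to a plain ", so those extra double quotes end the JSON string value early and the parser fails. HTML is fine without double quotes around attribute values that contain no spaces.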

Try putting r before the string in str. I just tried it and it worked for me. Check out Lexical Analysis for more info.
str = r'{"created_at":"Sun Aug 30 13:59:15 +0000 2015","id":637987951842951168,"id_str":"637987951842951168","text":"The Truth About the Iran Vatican False Prophet Anglo-American Western Alliance for Antichrist Israel: Palestin... http:\/\/t.co\/G79X164K9g","source":"\u003ca href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"\u003etwitterfeed\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":311859117,"id_str":"311859117","name":"Miko Furura","screen_name":"MikoFurura","location":"","url":null,"description":null,"protected":false,"verified":false,"followers_count":10,"friends_count":3,"listed_count":2,"favourites_count":4,"statuses_count":1264,"created_at":"Mon Jun 06 05:32:44 +0000 2011","utc_offset":32400,"time_zone":"Osaka","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"EBEBEB","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme7\/bg.gif","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme7\/bg.gif","profile_background_tile":false,"profile_link_color":"990000","profile_sidebar_border_color":"DFDFDF","profile_sidebar_fill_color":"F3F3F3","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/abs.twimg.com\/sticky\/default_profile_images\/default_profile_3_normal.png","profile_image_url_https":"https:\/\/abs.twimg.com\/sticky\/default_profile_images\/default_profile_3_normal.png","default_profile":false,"default_profile_image":true,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[{"url":"http:\/\/t.co\/G79X164K9g","expanded_url":"http:\/\/bit.ly\/1KvlIEu","display_url":"bit.ly\/1KvlIEu","indices":[11
4,136]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1440943155619"}'

Related

Unable to convert string to Json in Python because of unicode characters

I have a string in Python 3.5 from which I'd like to create a JSON object. But it turns out that the string contains things like this:
"saved_search_almost_max_people_i18n":"You are reaching your current limit of saved people searches. \\u003ca href=\\"/mnyfe/subscriptionv2?displayProducts=&family=general&trk=vsrp_ss_upsell\\"\\u003eLearn more >\\u003c/a\\u003e"
These Unicode escapes make the json.loads function fail; in fact, if I try to format the string as JSON in any online formatter, multiple errors show up.
As you can see, I'm a Python newbie, but I've looked through many sources and haven't found a solution.
By the way, the string comes from a Beautiful Soup operation:
soup = self.loadSoup(URL)
result = soup.find('code', id=TAG_TO_FIND)
rTxt=str(result)
j = json.loads(rTxt)
The first error I see (if I correct this one, there are many more coming):
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 858 (char 857)
Thanks everybody.
If I understand you correctly, you’re trying to parse an HTML document with Beautiful Soup and extract JSON text out of a particular code element in that document.
If so, the following line is wrong:
rTxt=str(result)
Calling str() on a Beautiful Soup Tag returns its HTML representation. Instead, you want the string attribute:
rTxt=result.string

How to remove 4 byte unicode symbols in Python 2?

I have a problem with a string that comes back different after I store it in a database.
I have the string "space 222 m²". If I write it to MySQL via the mysqldb module, I get "space 222 m²" in the table, which is OK. But when I read this value back from the table, after decoding I get something like "space 222 m\eb000\b1111", which is not "space 222 m²".
Before being added to the database, this string looks like "space 222 m\xcb" as unicode, but it prints correctly. The string read from the database is displayed with unicode codes and consequently causes an error.
MySQL charset - utf-8
Database collation - utf8_general_ci
Source string - utf-8
I also have problems combining a string that contains special characters with another string that has none:
# db is a MongoDB database
st=db.objects.find()[0]['value']
string=st.encode('utf-8') # may or may not contain m²; the encoding is the same
some_string=u"some"
x="%s %s"%(string,some_string)
If the string contains no special symbols, all is fine,
but if it does contain special symbols, I get a UnicodeDecodeError.
Python version:
Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)] on win32
A note on UTF-8: it is a single encoding of Unicode, not a collection of ISO character sets, so what matters is that the UI, the driver and the database all agree on it when data is sent to the DB. Have a look at localization and character encodings/sets; this will help you a lot in understanding Unicode vs. ASCII.
I don't know the exact mappings of your strings, but to answer your question, try get_string().encode('utf-8') to turn text into bytes
and get_string().decode('utf-8') to turn bytes back into text.
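One likely cause of the UnicodeDecodeError in the question: in Python 2, formatting a UTF-8 byte string together with a unicode object makes Python decode the bytes implicitly as ASCII, which fails on non-ASCII bytes. A minimal sketch of the usual workaround, assuming the value already arrives as unicode (`value` is a hypothetical stand-in for `db.objects.find()[0]['value']`):

```python
# -*- coding: utf-8 -*-

# Hypothetical stand-in for db.objects.find()[0]['value'].
value = u"space 222 m\u00b2"
some_string = u"some"

# Keep both operands as text while formatting, so no implicit
# ASCII decode of a byte string is ever attempted.
x = u"%s %s" % (value, some_string)

# Encode once, only where bytes are required
# (a file, a socket, a database driver).
encoded = x.encode("utf-8")
print(repr(encoded))
```

The same code runs unchanged on Python 3, where the `u` prefix is accepted but redundant.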

How do I convert this unicode string from a database to utf-8

I'm getting the following from iterating through the items in a database call to sqlite3
(u'9', u'HS 09 - Coffee, Tea, Mat\xe9 and Spices', u'Bangladesh', 2000, 6127)
I need to convert it to utf-8, specifically, the second field u'HS 09 - Coffee, Tea, Mat\xe9 and Spices' The resulting text should be :
'HS 09 - Coffee, Tea, Maté and Spices'
How can this be done?
Use .encode('utf-8'). For example:
>>> u'HS 09 - Coffee, Tea, Mat\xe9 and Spices'.encode('utf-8')
'HS 09 - Coffee, Tea, Mat\xc3\xa9 and Spices'
A note on terminology - the results of your database call are unicode. Your question text is correct that you want to convert (encode) the unicode object into utf-8, but your header was a bit off. I edited it to reflect this - a utf-8 encoded bytestring is not a Unicode string.
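To apply this to a whole result row rather than one field, a sketch along these lines may help. It encodes only the text fields and leaves the integers alone; `text_type` and `encoded_row` are names chosen here for illustration.

```python
# The row from the question, as returned by the sqlite3 cursor.
row = (u'9', u'HS 09 - Coffee, Tea, Mat\xe9 and Spices', u'Bangladesh', 2000, 6127)

# type(u'') is unicode on Python 2 and str on Python 3.
text_type = type(u'')

# Encode text fields to UTF-8 bytes; pass other values through unchanged.
encoded_row = tuple(
    field.encode('utf-8') if isinstance(field, text_type) else field
    for field in row
)
print(encoded_row)
```

When printed, the encoded field shows `\xc3\xa9` in its repr because é takes two bytes in UTF-8. Writing those bytes to a UTF-8 terminal or file displays "Maté" as expected.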

Typecasting does not work for 08 and 09

While typecasting in python I got an error.
int(01)
int(02)
int(03)
int(04)
int(05)
int(06)
int(07)
All of the above work fine.
But when I do the same for the following:
int(08)
and
int(09)
I get an error:
SyntaxError: invalid token
I know this typecasting is pointless, since it converts an int to an int.
But I just want to know: if it works for 01 to 07, why does it fail only for 08 and 09?
In Python 2, integer literals starting with 0 are treated as octal. Octal digits only go from 0 to 7, so 08 and 09 are not valid tokens.
To fix this, you can convert the data to string and pass the base explicitly like this
print int("09", 10)
Output
9
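The following sketch contrasts the bases involved; the variable names are only for illustration. Note that Python 3 rejects leading-zero decimal literals such as 01 altogether and requires the 0o prefix for octal, which Python 2.6+ also accepts.

```python
# Parsing strings: the base is explicit, so leading zeros are harmless.
as_decimal = int("09", 10)   # decimal nine
as_octal = int("11", 8)      # octal 11 is decimal nine

# "09" is not a valid octal number, so base 8 raises ValueError.
try:
    int("09", 8)
    octal_09_ok = True
except ValueError:
    octal_09_ok = False

# Unambiguous octal literal syntax.
octal_literal = 0o11

print(as_decimal, as_octal, octal_09_ok, octal_literal)
```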

(python) working with a hex literal trying to convert to string, or int, or anything

I'm working on an assignment in which we have to simulate a Kaminsky attack on a DNS server we create. I'm currently trying to generate the falsified DNS response packet payload. Using dnslib, I generate a packet and then pack() the result. This gives me a hex literal:
'\xcf\x90\x85\x80\x00\x01\x00\x01\x00\x00\x00\x00\x03abc\x03com\x00\x00\x01\x00\x01\xc0\x0c\x00\x01\x00\x01\x00\x00\x00\x00\x00\x04\x01\x02\x03\x04'
I don't believe this is the correct format for the payload data. Specifically, I think I need to ditch all the "\x"'s so my stream will be
cf 90 85 80 ...
Unfortunately I can't seem to do this. String manipulation tools don't seem to work on a literal and the usual literal-> string conversion (literal_eval) fails with an error:
TypeError: compile() expected string without null bytes
Other conversions I've tried (int(packet,0)) fail because part of the literal is text (leading to odd length).
There's probably a very simple solution, any help?
You seem to miss the meaning of '\x' in that context. '\x' is an escape sequence available in python string literals such that \xhh represents a character with the hex value of hh.
Try dumping that string to a file and output its contents as hex:
yoni#gaga:~ python -c "file('payload','wb').write('\xcf\x90\x85\x80\x00\x01')"
yoni#gaga:~ hexdump payload
0000000 cf 90 85 80 00 01
0000006
I believe that's exactly the behaviour you were looking for, just go ahead and use that string. If, however, for some reason you do want to convert it to a "human readable" hex string, you could use:
''.join(['%02x' % ord(x) for x in packet])  # %02x keeps the leading zero of bytes below 0x10
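For a version that does not depend on `ord()` over a Python 2 string, a sketch using the standard `binascii` module is shown here. It assumes only the first six bytes of the payload from the question; `hex_string` and `spaced` are illustrative names.

```python
import binascii

# First six bytes of the packed payload from the question.
packet = b'\xcf\x90\x85\x80\x00\x01'

# Compact hex string: two hex digits per byte.
hex_string = binascii.hexlify(packet).decode('ascii')

# Space-separated form, like the hexdump output.
# bytearray yields integers on both Python 2 and Python 3.
spaced = ' '.join('%02x' % b for b in bytearray(packet))

print(hex_string)
print(spaced)

# The conversion is reversible, which confirms no data was lost.
assert binascii.unhexlify(hex_string) == packet
```

The hex text is only a display form. The packed byte string itself is what goes on the wire.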
