I would like to scrape data from this plain text:
"data": [
{
"id": "10150635906994798_21377910",
"from": {
"id": "100001249878256",
"location" : "Stockholm"
"name": "Mouhamadoul Moussa"
},
"message": "#Yeaaaahh!!! \u2665",
},
{
"id": "10150635906994798_21392047",
"from": {
"id": "100000648164454",
"location" : "Malmo"
"name": "mallow ty"
},
"message": "droit au butttttttttttttttttt",
},
]
But I would like to retrieve only the second id of each entry (the one nested under "from"). This is the XPath I tried for selecting the id:
response.selector.xpath('//*[contains(text(), "id")]')
The output should be:
100000648164454
100001249878256
That's not plain text, it's JSON. However, you can store it as a dictionary and index into it:
>>> a = {'data': [{'from': {'id': '100001249878256',
... 'location': 'Stockholm',
... 'name': 'Mouhamadoul Moussa'},
... 'id': '10150635906994798_21377910',
... 'message': '#Yeaaaahh!!! \\u2665'},
... {'from': {'id': '100000648164454', 'location': 'Malmo', 'name': 'mallow ty'},
... 'id': '10150635906994798_21392047',
... 'message': 'droit au butttttttttttttttttt'}]}
>>> for data in a['data']:
...     print(data['from']['id'])
...
100001249878256
100000648164454
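If the text arrives as a raw string rather than a ready-made dictionary, the standard json module can do the conversion. The sketch below assumes the payload is valid JSON (the snippet in the question is missing commas after "location" and has trailing commas, so it is cleaned up here), and the variable names raw, payload and from_ids are made up for illustration. In a Scrapy callback the string would presumably come from response.text instead of a literal.

```python
import json

# Cleaned-up copy of the payload from the question: commas added after
# "location", trailing commas removed, wrapped in an outer object.
raw = '''
{"data": [
  {"id": "10150635906994798_21377910",
   "from": {"id": "100001249878256", "location": "Stockholm",
            "name": "Mouhamadoul Moussa"},
   "message": "#Yeaaaahh!!! \\u2665"},
  {"id": "10150635906994798_21392047",
   "from": {"id": "100000648164454", "location": "Malmo",
            "name": "mallow ty"},
   "message": "droit au butttttttttttttttttt"}
]}
'''

payload = json.loads(raw)

# Take only the id nested under "from", not the top-level id of each entry.
from_ids = [entry["from"]["id"] for entry in payload["data"]]

for from_id in from_ids:
    print(from_id)
```

XPath is meant for HTML/XML documents, so parsing the JSON and indexing by key is usually simpler than matching on text nodes.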
I'm trying to insert this object into a MongoDB collection. I've tried many approaches without success, and I was wondering if anyone here could help me. The main problem occurs when passing the array of items into the items key.
The JSON file is similar to this:
{
'code': 'iuhuilknlkn',
'description': 'nllksnd',
'currency': 'Mxn',
'items': [
{
'item': {
'_id': {
'$oid': '60065d253ef6d468ced3603f'
},
'code': '2',
'description': '22',
'currency': 'Mxn',
'MU': 'Hr',
'sellingPrice': 1,
'buyingPrice': 3,
'supplier': None,
'cid': '5fbd81b32b325e5ca15fe5c9',
'itemAmount': 1,
'itemProductPrice': 1,
'itemAmountPrice': 1
}
},
{
'item': {
'_id': {
'$oid': '6011c18883a280ae0e5b8185'
},
'code': 'prb-001',
'description': 'prueba 1 artículo',
'currency': 'Mxn',
'MU': 'Ser',
'sellingPrice': 100.59,
'buyingPrice': 12,
'supplier': None,
'cid': '5fbd81b32b325e5ca15fe5c9',
'itemAmount': 1,
'itemProductPrice': 100.59,
'itemAmountPrice': 100.59
}
}
],
'price': 101.59
}
and the code I'm using right now is the following:
productsM.insert({
    'code': newP['code'],
    'description': newP['description'],
    'currency': newP['currency'],
    'items': [([val for dic in newP['items'] for val in dic.values()])],
    'price': newP['price'],
    'cid': csHe})
The error I get is the following:
key '$oid' must not start with '$'
Your error is quite clear: the key $oid starts with a $, and the insert rejects it. Simply removing the $ from the $oid keys in your dictionary/JSON will resolve the issue. As far as I know, you can't have a leading $ in key names here because that prefix is reserved for operators such as $in or $regex.
I removed the $ from the $oid keys and it worked like a charm. All I did was the following:
data = {
"code": "iuhuilknlkn",
"description": "nllksnd",
"currency": "Mxn",
"items": [
{
"item": {
"_id": {
"oid": "60065d253ef6d468ced3603f"
},
"code": "2",
"description": "22",
"currency": "Mxn",
"MU": "Hr",
"sellingPrice": 1,
"buyingPrice": 3,
"supplier": None,
"cid": "5fbd81b32b325e5ca15fe5c9",
"itemAmount": 1,
"itemProductPrice": 1,
"itemAmountPrice": 1
}
},
{
"item": {
"_id": {
"oid": "6011c18883a280ae0e5b8185"
},
"code": "prb-001",
"description": "prueba 1 artículo",
"currency": "Mxn",
"MU": "Ser",
"sellingPrice": 100.59,
"buyingPrice": 12,
"supplier": None,
"cid": "5fbd81b32b325e5ca15fe5c9",
"itemAmount": 1,
"itemProductPrice": 100.59,
"itemAmountPrice": 100.59
}
}
],
"price": 101.59
}
db.insert(data)
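Rather than editing the keys by hand, the {'$oid': ...} wrappers could be unwrapped programmatically before the insert. The following is only a sketch under assumptions: unwrap_oid is a hypothetical helper (not part of PyMongo), new_p is a shortened stand-in for the newP object from the question, and the ids are kept as plain hex strings so that the example runs without a MongoDB driver installed.

```python
def unwrap_oid(value):
    """Recursively replace {'$oid': '<hex>'} wrappers with the bare hex string."""
    if isinstance(value, dict):
        # A dict whose only key is '$oid' is an Extended JSON ObjectId wrapper.
        if set(value) == {"$oid"}:
            return value["$oid"]
        return {key: unwrap_oid(item) for key, item in value.items()}
    if isinstance(value, list):
        return [unwrap_oid(item) for item in value]
    return value


# Shortened stand-in for the object in the question (most fields omitted).
new_p = {
    "code": "iuhuilknlkn",
    "items": [
        {"item": {"_id": {"$oid": "60065d253ef6d468ced3603f"}, "code": "2"}},
        {"item": {"_id": {"$oid": "6011c18883a280ae0e5b8185"}, "code": "prb-001"}},
    ],
    "price": 101.59,
}

clean = unwrap_oid(new_p)  # builds a new structure; new_p is left untouched
print(clean["items"][0]["item"]["_id"])
```

If real ObjectId values are wanted instead of strings, the unwrapped hex string could be passed to bson.ObjectId at the point where the wrapper is detected.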
I am trying to extract data from a JSON file, a snippet of which is below. I want to loop through it to get every categories > name value; for the venue shown here, the result would be "Convenience Store".
{
'meta': {
'code': 200,
'requestId': '5ea184baedbcad001b7a3f8c'
},
'response': {
'venues': [
{
'id': '4d03b2f6dc45a093b4b0e5c6',
'name': 'Ozbesa Market',
'location': {
'address': 'Acibadem basogretmen sokak',
'lat': 41.00622726261631,
'lng': 29.051791450375678,
'labeledLatLngs': [
{
'label': 'display',
'lat': 41.00622726261631,
'lng': 29.051791450375678
}
],
'distance': 92,
'cc': 'TR',
'country': 'Türkiye',
'formattedAddress': [
'Acibadem basogretmen sokak',
'Türkiye'
]
},
'categories': [
{
'id': '4d954b0ea243a5684a65b473',
'name': 'Convenience Store',
'pluralName': 'Convenience Stores',
'shortName': 'Convenience Store',
'icon': {
'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/conveniencestore_',
'suffix': '.png'
},
'primary': True
}
],
'referralId': 'v-1587643627',
'hasPerk': False
},
Here is my for loop; please help me fix it. It only returns "Convenience Store", but there are other categories as well, such as 'shopping mall', 'residential building', etc.
for ven in json_data:
    for cat in ven:
        print(json_data['response']['venues'][0]['categories'][0]['name'])
Thanks in advance!
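For the sake of example, I elided some of the per-venue data and added some categories. As I mentioned in the comment, the problem is that you're not using the values you loop over: the print call always indexes venues[0] and categories[0], so it prints the same name on every iteration.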
json_data = {
"meta": {"code": 200, "requestId": "5ea184baedbcad001b7a3f8c"},
"response": {
"venues": [
{
"name": "Ozbesa Market",
"categories": [
{"name": "Convenience Store", "primary": True},
{"name": "Imaginary Category", "primary": False},
],
},
{
"name": "Another Location",
"categories": [
{"name": "Bus Station", "primary": True},
{"name": "Fun Fair", "primary": False},
],
},
]
},
}
for venue in json_data["response"]["venues"]:
    print(venue["name"])
    for cat in venue["categories"]:
        print("..", cat["name"])
This will output, for example:
Ozbesa Market
.. Convenience Store
.. Imaginary Category
Another Location
.. Bus Station
.. Fun Fair
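If the goal is to collect the category names rather than print them, the same nested loop can fill a list. This is a sketch under assumptions: category_names is a made-up helper, the sample data is the trimmed structure from the answer, and .get() with defaults is used on the guess that some venues may lack a "categories" key.

```python
json_data = {
    "meta": {"code": 200, "requestId": "5ea184baedbcad001b7a3f8c"},
    "response": {
        "venues": [
            {
                "name": "Ozbesa Market",
                "categories": [
                    {"name": "Convenience Store", "primary": True},
                    {"name": "Imaginary Category", "primary": False},
                ],
            },
            {
                "name": "Another Location",
                "categories": [
                    {"name": "Bus Station", "primary": True},
                    {"name": "Fun Fair", "primary": False},
                ],
            },
        ]
    },
}


def category_names(data):
    """Return every category name across all venues, in document order."""
    names = []
    for venue in data.get("response", {}).get("venues", []):
        # A venue without a "categories" key simply contributes nothing.
        for cat in venue.get("categories", []):
            names.append(cat["name"])
    return names


all_names = category_names(json_data)
print(all_names)
```

Wrapping the result in set() would drop duplicates when several venues share a category.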
I created an API with Python using this code:
def findusers():
    x = mycollection_users.find()
    return x
@app.route('/api/users', methods=['GET'])
def users():
    db = findusers()
    js = jsonpickle.encode(db)
    return Response(response=js, status=200, mimetype="application/json")
app.run(host="0.0.0.0", port=4000)
The resulting JSON is:
{
"py/iterator": [
{
"City": "Us",
"Phone": "02",
"_id": 1,
"name": "Tom"
},
{
"City": "EN",
"Phone": "11",
"_id": 2,
"name": "Jack"
},
],
"py/object": "pymongo.cursor.Cursor"
}
How can I remove "py/iterator" and "py/object": "pymongo.cursor.Cursor"?
I want this format: [{}, {}]
How does this look? If you don't want those two keys in the dictionary, you can simply access the portion you do care about.
>>> a = {
... "py/iterator": [
... {
... "City": "Us",
... "Phone": "02",
... "_id": 1,
... "name": "Tom"
... },
... {
... "City": "EN",
... "Phone": "11",
... "_id": 2,
... "name": "Jack"
... },
... ],
... "py/object": "pymongo.cursor.Cursor"
... }
...
>>> b = a['py/iterator']
>>> print(b)
[{'City': 'Us', '_id': 1, 'name': 'Tom', 'Phone': '02'}, {'City': 'EN', '_id': 2, 'name': 'Jack', 'Phone': '11'}]
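Another option is to avoid producing those keys in the first place by turning the cursor into a plain list before encoding it. The sketch below assumes the documents contain only JSON-friendly values (as in the question, where _id is an integer); cursor_to_json and fake_find are hypothetical names, and fake_find is a generator standing in for mycollection_users.find() so that the example runs without a database.

```python
import json


def cursor_to_json(cursor):
    """Serialize any iterable of plain dicts (e.g. a query cursor) as a JSON array."""
    # list() consumes the iterator, so only the documents themselves are encoded.
    return json.dumps(list(cursor))


def fake_find():
    """Stand-in for mycollection_users.find(): yields documents lazily."""
    yield {"_id": 1, "name": "Tom", "City": "Us", "Phone": "02"}
    yield {"_id": 2, "name": "Jack", "City": "EN", "Phone": "11"}


body = cursor_to_json(fake_find())
print(body)
```

In the view, body would take the place of js. Documents whose _id is an ObjectId are not serializable by json.dumps as-is and would need extra handling.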
I'm rewriting a view based on what I know the final JSON output should look like, but it's returning each dictionary as a string.
new output
{
"results":
["
{
'plot': u'',
'runtime': u'N/A',
'description': u'x',
'videos': [
{
'id': 823,
'name': u'x',
'youtube_id': u'FtcubOnXgZk'
}
],
'country': u'India',
'writer': u'Neetu Varma, Ranjeev Verma',
'name': u'Chalk N Duster',
'id': 940,
'director': u'Jayant Gilatar',
'hot': True,
'content': u'x',
'actors': u'Shabana Azmi, Arya Babbar, Gavie Chahal, Juhi Chawla',
'year': 2015,
'images': [
{'small': '/media/cache/62/fd/62fd5158d281c042e3cf1f919183e94e.jpg', 'medium': '/media/cache/5e/32/5e32ebb1a4d25bba0d0c70b4b448e948.jpg'}],
'trailer_youtube_id': u'FtcubOnXgZk',
'type': 'movie',
'slug': u'chalk-n-duster',
'categories': [{'parent_id': 2, 'id': 226, 'name': u'Drama'}],
'shows': {
'starts': '2016-01-16',
'booking_url': u'',
'venue': {
'address': u'',
'id': 854,
'name': u'Nyali Cinemax',
'area': {
'id': 52,
'parent': {
'id': 48,
'name': u'Mombasa'
},
'name': u'Nyali'
}
},
'starts_time': '18:30:00'
}
}", "{'plot': u'' ....
old output
"results": [
{
"actors": "x",
"categories": [
{
"id": 299,
"name": "Biography",
"parent_id": 2
},
],
"content": "x",
"country": "x",
"description": "x",
"director": "x",
"hot": true,
"id": 912,
"images": [
{
"medium": "/media/cache/d2/b3/d2b3a7885e7c39bfc5c2b297b66619c5.jpg",
"small": "/media/cache/e2/d0/e2d01b2c7c77d3590536666de4a7fd7d.jpg"
}
],
"name": "Bridge of Spies",
"plot": "x",
"runtime": "141 min",
"shows": [
{
"booking_url": "",
"starts": "2015-11-27",
"starts_time": "16:30:00",
"venue": {
"address": "The Junction Shopping Mall",
"area": {
"id": 68,
"name": "Ngong Road",
"parent": {
"id": 2,
"name": "Nairobi"
}
},
"id": 1631,
"name": "Century Cinemax Junction"
}
},
],
"slug": "bridge-of-spies",
"trailer_youtube_id": "",
"type": "movie",
"videos": [
{
"id": "795",
"name": "Bridge of Spies",
"youtube_id": "2-2x3r1m2I4"
}
],
"writer": "Matt Charman, Ethan Coen, Joel Coen",
"year": 2015
}, ...
]
Here's the view. I know shows should also be a list, but in order to start testing I need the data to come back in the right format. If it involves too much rewriting, I'm okay with links and an explanation.
@memoize(timeout=60*60)
def movies_json():
    today = datetime.date.today()
    movies = Movie.objects.filter(shows__starts__gte=today)
    results = []
    number = len(movies)
    for movie in movies:
        print("Now Remaining: {0}".format(number))
        number -= 1
        medium = get_thumbnail(movie.picture(), '185x274', crop='center', quality=99).url
        small = get_thumbnail(movie.picture(), '50x74', crop='center', quality=99).url
        movie_details = {
            'director': movie.director,
            'plot': movie.plot,
            'actors': movie.actors,
            'content': movie.content,
            'country': movie.country,
            'description': movie.description,
            'hot': movie.hot,
            'id': movie.id,
            'images': [{'medium': medium, 'small': small}],
            'name': movie.name,
            'plot': movie.plot,
            'runtime': movie.runtime,
            'slug': movie.slug,
            'type': 'movie',
            'writer': movie.writer,
            'year': movie.year,
        }
        youtube_details = movie.videos.filter(youtube_id__isnull=False)[0]
        movie_details['trailer_youtube_id'] = youtube_details.youtube_id if youtube_details.youtube_id else ""
        movie_details['videos'] = [
            {
                'id': youtube_details.id,
                'name': movie.name,
                'youtube_id': youtube_details.youtube_id,
            }
        ]
        shows = []
        for show in movie.shows.all():
            show_details = {
                'booking_url': show.booking_url,
                'starts': show.starts.isoformat(),
                'starts_time': show.starts_time.isoformat(),
                'venue': {
                    'address': show.venue.address,
                    'area': {
                        'id': show.venue.area.id,
                        'name': show.venue.area.name,
                        'parent': {
                            'id': show.venue.area.parent.id,
                            'name': show.venue.area.parent.name,
                        }
                    },
                    'id': show.venue.id,
                    'name': show.venue.name,
                }
            }
            shows.append(show_details)
        movie_details['shows'] = show_details
        category_list = []
        for category in movie.categories.all():
            category_details = {
                'id': category.id,
                'name': category.name,
                'parent_id': category.parent.id,
            }
            category_list.append(category_details)
        movie_details['categories'] = category_list
        results.append(movie_details)
    return results
The data is returned by Django REST framework 0.4.0.
import json
json_obj = json.loads(json_string)  # json.loads parses a string; json.load expects a file object
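The "new output" in the question looks like Python repr text (single quotes, u'' prefixes, True) rather than JSON. One possible cause, offered as an assumption since the serializing code is not shown, is that each dictionary goes through str() or repr() somewhere before it is returned. The sketch below uses a made-up three-field movie_details to show the difference between the two forms.

```python
import json

# Tiny stand-in for one entry of the results list.
movie_details = {"name": "Chalk N Duster", "hot": True, "year": 2015}

as_repr = str(movie_details)         # Python repr: single quotes, True
as_json = json.dumps(movie_details)  # JSON text: double quotes, true

print(as_repr)
print(as_json)

# Only the JSON text round-trips through json.loads.
round_tripped = json.loads(as_json)

repr_is_valid_json = True
try:
    json.loads(as_repr)
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    repr_is_valid_json = False

print(repr_is_valid_json)
```

Returning the list of dictionaries itself and letting the framework (or a single json.dumps call) serialize it once should keep the entries as objects rather than strings.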
I have a Google Chrome Bookmark file, and it's in JSON format
{
"checksum": "b884cbfb1a6697fa9b9eea9cb2054183",
"roots": {
"bookmark_bar": {
"children": [ {
"date_added": "12989159740428363",
"id": "4",
"name": "test2",
"type": "url",
"url": "chrome://bookmarks/#1"
} ],
"date_added": "12989159700896551",
"date_modified": "12989159740428363",
"id": "1",
"name": "bookmark_bar",
"type": "folder"
},
"other": {
"children": [ {
"date_added": "12989159740428363",
"id": "4",
"name": "test",
"type": "url",
"url": "chrome://bookmarks/#1"
} ],
"date_added": "12989159700896557",
"date_modified": "0",
"id": "2",
"name": "aaa",
"type": "folder"
},
"synced": {
"children": [ ],
"date_added": "12989159700896558",
"date_modified": "0",
"id": "3",
"name": "bbb",
"type": "folder"
}
},
"version": 1
}
and in Python format:
{'checksum': 'b884cbfb1a6697fa9b9eea9cb2054183', 'version': 1, 'roots': {'synced': {'name': 'bbb', 'date_modified': '0', 'children': [], 'date_added': '12989159700896558', 'type': 'folder', 'id': '3'}, 'bookmark_bar': {'name': 'bookmark_bar', 'date_modified': '12989159740428363', 'children': [{'url': 'chrome://bookmarks/#1', 'date_added': '12989159740428363', 'type': 'url', 'id': '4', 'name': 'test2'}], 'date_added': '12989159700896551', 'type': 'folder', 'id': '1'}, 'other': {'name': 'aaa', 'date_modified': '0', 'children': [{'url': 'chrome://bookmarks/#1', 'date_added': '12989159740428363', 'type': 'url', 'id': '4', 'name': 'test'}], 'date_added': '12989159700896557', 'type': 'folder', 'id': '2'}}}
I'm writing a bookmark manager now.
I want to move web pages by name.
For example: mv /bookmark_bar/test2 /other/test2
But every web page is a dictionary, and they are stored in a list. So I must use an index to locate a web page; I can't locate it by name.
Any ideas?
Is this what you need? https://gist.github.com/3332055
An example of how to iterate over the structure; exactly what you do with it from there is up to you:
for root, val in bm['roots'].items():
    print(root, 'is named', val['name'])
    for child in val['children']:
        print('\t', child['name'], 'is at', child['url'])
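Building on that iteration, a move-by-name operation can be sketched as a linear search through the source folder's children list. The function name move_bookmark and its signature are hypothetical, the sample bm is a trimmed version of the file from the question, and the sketch assumes names are unique within a folder (only the first match is moved).

```python
def move_bookmark(bm, src_root, name, dst_root):
    """Move the first child called `name` from one root folder to another.

    Returns True if a bookmark was moved, False if no child had that name.
    """
    src_children = bm["roots"][src_root]["children"]
    for index, child in enumerate(src_children):
        if child["name"] == name:
            # pop() removes the entry from the source list and returns it;
            # returning right away avoids iterating over the modified list.
            bm["roots"][dst_root]["children"].append(src_children.pop(index))
            return True
    return False


# Trimmed version of the bookmark structure from the question.
bm = {
    "roots": {
        "bookmark_bar": {
            "name": "bookmark_bar",
            "children": [{"name": "test2", "url": "chrome://bookmarks/#1"}],
        },
        "other": {
            "name": "aaa",
            "children": [{"name": "test", "url": "chrome://bookmarks/#1"}],
        },
    }
}

# Equivalent of: mv /bookmark_bar/test2 /other/test2
moved = move_bookmark(bm, "bookmark_bar", "test2", "other")
print(moved, [child["name"] for child in bm["roots"]["other"]["children"]])
```

If lookups by name become frequent, building a {name: child} dictionary per folder would avoid the repeated scan.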
# -*- coding: utf-8 -*-
import json
def hook(pairs):
    # object_hook receives each decoded JSON object as a dict
    o = {}
    for k, v in pairs.items():
        o[str(k)] = v
    return o
jsonString = """{"a":"a","b":"b","c":{"c1":"c1","c2":"c2"}}"""
r = json.loads(jsonString, object_hook=hook)
assert r['c']['c1'] == "c1"
# Nested entries can be removed by key once the JSON is a dict
del r['c']['c1']
assert 'c1' not in r['c']