Creating a dictionary from planet data - Python

I have a bunch of details about the planets in the solar system, and I am supposed to make a Python dictionary out of them. For each planet, I have its radius, distance from the sun, number of moons, whether an atmosphere exists, the names of its moons, whether it's a gas planet, and so on.
For example, here is the data for Mercury:
Mercury
Radius - 2,439.7 km
Distance from the sun - 58 million km
Moons - none
Atmosphere? True
Gas planet? False
How would I use all this data to create a dictionary?
So far I have:
radius = {} #radius of planets
radius['Mercury'] = 2439.7
radius['Venus'] = 6051.8
radius['Earth'] = 6371.0
radius['Mars'] = 3396.2
radius['Jupiter'] = 69911
radius['Saturn'] = 60268
radius['Uranus'] = 25559
radius['Neptune'] = 24764
distance = {} # distance from sun
distance['Mercury'] = 58000000
distance['Venus'] = 108000000
I was planning on continuing this to create a dictionary for all the data that I have, so that I would have a different section for each different type of data.
However, I don't know if this is the right way to do it. Could somebody tell me if I am going in the right direction? If not, how would I fix it?

It's probably much easier to make a dictionary of planets, each member of which contains a dictionary of that planet's properties. You can also save yourself some effort and avoid repetition by using the dictionary literal syntax.
Doing it as suggested above looks something like this:
planets = {
    'Mercury': {
        'radius': 2439.7,
        'distance': 58000000,
        'moons': []
        # etc...
    },
    'Venus': {
        'radius': 6051.8,
        'distance': 108000000,
        'moons': []
        # etc...
    },
    'Earth': {
        'radius': 6371.0,
        'distance': 150000000,
        'moons': ['Luna']
        # etc...
    }
    # etc...
}

It would probably be better to structure your data so it looks like this:
planets = {
    "mercury": {
        "radius": 2439.7,
        "distance": 58000000,
        # etc
    },
    "venus": {
        "radius": 6051.8,
        "distance": 108000000,
        # etc
    },
    # etc
}
That way, we only need a single variable, and can automatically keep all the data related to a single planet in one place.
Then, if you want to obtain the distances of all planets (for example), you can construct another temporary dictionary with a dictionary comprehension:
distances = {planet: planets[planet]['distance'] for planet in planets}
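As a minimal, self-contained sketch (using made-up values for just two planets), the comprehension produces a flat mapping:

```python
# Minimal illustration of pulling one property out of the nested structure.
planets = {
    "mercury": {"radius": 2439.7, "distance": 58000000},
    "venus": {"radius": 6051.8, "distance": 108000000},
}

# One dict comprehension gives a flat {planet: distance} mapping.
distances = {planet: data["distance"] for planet, data in planets.items()}
print(distances)  # {'mercury': 58000000, 'venus': 108000000}
```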

I would design the dictionary so that the key is the planet name, and the values are another dictionary, the keys of which are the properties.
You would access it as follows:
mercury_radius = planets["mercury"]["radius"]
And declare it as follows:
planets = {
    "mercury": {
        "radius": 2439.7,
        "distance_sun": 58000000,
        "moons": 0,
        "atmosphere": True,
        "gas_planet": False
    }
}
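To show how this nesting pays off, here is a small runnable sketch (with only two planets filled in, using values from the question): the outer key selects the planet, the inner key selects the property, and iteration keeps each planet's data together.

```python
planets = {
    "mercury": {"radius": 2439.7, "distance_sun": 58000000,
                "moons": 0, "atmosphere": True, "gas_planet": False},
    "venus": {"radius": 6051.8, "distance_sun": 108000000,
              "moons": 0, "atmosphere": True, "gas_planet": False},
}

# Outer key selects the planet, inner key selects the property.
mercury_radius = planets["mercury"]["radius"]
print(mercury_radius)  # 2439.7

# All of one planet's data stays together, so iteration is simple.
for name, props in planets.items():
    print(name, props["distance_sun"])
```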

Related

Python Key Value Error (Json)

I am trying to grab this data and print it into a string of text, and I am having the worst issues getting this to work.
Here is the source I am working with, to give a better understanding; I am working on an environmental controller combined with my Sonoff switch:
https://github.com/FirstCypress/LiV/blob/master/software/liv/iotConnectors/sonoff/sonoff.py This code works for two pages once completed, so ignore the keys for temperature etc.
m = json.loads(content)
co2 = m["Value"]
I need the value of "Value" under "TaskValues"; it should be either a 1 or a 0 in almost any case. How would I pull that key out in the right form?
"Sensors":[
{
"TaskValues": [
{"ValueNumber":1,
"Name":"Switch",
"NrDecimals":0,
"Value":0
}],
"DataAcquisition": [
{"Controller":1,
"IDX":0,
"Enabled":"false"
},
{"Controller":2,
"IDX":0,
"Enabled":"false"
},
{"Controller":3,
"IDX":0,
"Enabled":"false"
}],
"TaskInterval":0,
"Type":"Switch input - Switch",
"TaskName":"relias",
"TaskEnabled":"true",
"TaskNumber":1
}
],
"TTL":60000
}
You can get it by
m['Sensors'][0]['TaskValues'][0]['Value']
"Value" is nested in your json, as you've mentioned. To get what you want, you'll need to traverse the parent data structures:
m = json.loads(content)
# This is a list
a = m.get('Sensors')
# This is a dictionary
sensor = a[0]
# This is a list
taskvalue = sensor.get('TaskValues')
# Your answer
value = taskvalue[0].get('Value')
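The same traversal as a self-contained sketch, with the JSON from the question trimmed to the relevant part:

```python
import json

# Trimmed version of the JSON in the question.
content = '''{
  "Sensors": [
    {
      "TaskValues": [
        {"ValueNumber": 1, "Name": "Switch", "NrDecimals": 0, "Value": 0}
      ],
      "TaskName": "relias"
    }
  ],
  "TTL": 60000
}'''

m = json.loads(content)
# "Sensors" is a list; take its first element, then the first
# "TaskValues" entry, then read "Value" from it.
value = m["Sensors"][0]["TaskValues"][0]["Value"]
print(value)  # 0
```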

How to find locations near a given location

I am trying to create a Bounding Box (or a circle) for the given latitude and longitude with some distance(or radius) using Python3.
I have gone through previous solutions to this problem, but I have some doubts about how they work. There are variables like halfsideinKm and some degree-to-radian and radian-to-degree conversions, and I am unable to understand what those conversions are for and how they work.
Given lat and long finding binding box
Geocoding calculate bounding box
I have a database collection Locations(in MongoDB) which holds the lat and long.
My requirement is: if I enter a lat and long, I want to get the list of places (from my MongoDB) which lie inside the bounding-box region (within a distance of, say, 20 km).
Can anyone provide me with a solution for this problem or some explanation on how those codes work?
Can this be achieved using geopy? (It says something about great-circle distance calculation.)
Database values
{
  "place_id": "151142295",
  "osm_type": "relation",
  "osm_id": "65606",
  "lat": "51.5073219",
  "lon": "-0.1276474",
  "display_name": "London, Greater London, England, United Kingdom",
  "class": "place",
  "type": "city",
  "importance": 0.9754895765402
},
{
  "place_id": "4566287",
  "osm_type": "node",
  "osm_id": "485248691",
  "lat": "42.988097",
  "lon": "-81.2460295",
  "display_name": "London, Ontario, Canada",
  "class": "place",
  "type": "city",
  "importance": 0.6515723047601
}
(just a sample of how data is stored in my db)
The very "first" thing you must do is change how you are storing your data if you intend to use geospatial queries with MongoDB. You have the option of legacy coordinate pairs or GeoJSON format, but your current storage, with "lat" and "lon" in separate fields and also as "strings", will not work.
Here is a schema fix for your collection, written for the mongo shell because this should be a one-off operation. I'm advising GeoJSON, as it is generally compatible with quite a few libraries, and distances are returned in meters rather than radians.
var bulk = db.collection.initializeUnorderedBulkOp(),
    count = 0;

db.collection.find().forEach(function(doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": {
            "location": {
                "type": "Point",
                "coordinates": [parseFloat(doc.lon), parseFloat(doc.lat)]
            }
        },
        "$unset": { "lat": "", "lon": "" }
    });
    count++;

    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeUnorderedBulkOp();
    }
});

if ( count % 1000 != 0 )
    bulk.execute();
Now that the data is fixed and compatible with an index, create the index. What makes sense here with GeoJSON data is a "2dsphere" index:
db.collection.createIndex({ "location": "2dsphere" })
Now you are ready to query. Sticking with the shell as the python syntax is identical and I don't know your library calls:
db.collection.find({
    "location": {
        "$nearSphere": {
            "$geometry": {
                "type": "Point",
                "coordinates": [lon, lat]
            },
            "$maxDistance": distance
        }
    }
})
This query uses $nearSphere, which calculates distance properly based on the curvature of the earth, ideal for real location data. The three variables there are the "longitude" and "latitude" (in that order) in the coordinates array, and the "distance" under $maxDistance (in meters when querying GeoJSON points) within which you want to find things.
This is a very simple query procedure once your data is suitable and the required geospatial index is in place.
No need for messy calculations in your client, as the server does all the work.
The links to the relevant documentation parts are all included for your reference. Read them and learn more about geospatial queries with MongoDB.
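For completeness, the same query document can be built and run from Python with pymongo; it is identical to the shell version. This is a sketch with sample values from the question's data; the connection and collection names below are placeholders, and with GeoJSON points $maxDistance is expressed in meters:

```python
# Build the $nearSphere query document; lon/lat/distance are sample values.
lon, lat = -0.1276474, 51.5073219   # London, from the sample data
max_meters = 20 * 1000              # 20 km expressed in meters

query = {
    "location": {
        "$nearSphere": {
            "$geometry": {"type": "Point", "coordinates": [lon, lat]},
            "$maxDistance": max_meters,
        }
    }
}

# With a live server this would run as (placeholder names):
# from pymongo import MongoClient
# coll = MongoClient()["mydb"]["places"]
# for doc in coll.find(query):
#     print(doc["display_name"])
print(query["location"]["$nearSphere"]["$maxDistance"])  # 20000
```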

Trying to parse JSON data with python

I am having no luck trying to parse this JSON data; I only care about a small amount of it.
json data
{
  "timestamp": 1397555135361,
  "sets": {
    "worldguard.markerset": {
      "areas": {
        "world_region_name": {
          "markup": false,
          "desc": "What I really want.",
          "weight": 3,
          "color": "#FF0000",
          "fillopacity": 0.35,
          "opacity": 0.8,
          "label": "Region_name",
          "ytop": 65.0,
          "fillcolor": "#FF0000",
          "z": [846.0, 847.0, 847.0, 846.0],
          "ybottom": 65.0,
          "x": [773.0, 773.0, 774.0, 774.0]
        }
      }
    }
  }
}
I hope I copied it correctly; it's a very large file, and I only care about the region info that it has.
There are other parts of this JSON file that I don't care about, so I haven't included them. But there are many items under 'areas' that I do care about; I just can't work out how to parse them all.
import json
from pprint import pprint

json_data = open('marker_world.json')
data = json.load(json_data)
for item in data["sets"]["worldguard.markerset"]["areas"]:
    print item
The items that I care about from each region are desc, label, z, and x.
It doesn't seem to print out everything under that region like I would expect; all I get is a screen of "u'w'".
I haven't even started to try to select only the bits out of each region that I care about. A push in the right direction would be great if you can work out what I am doing wrong.
Here's what you can start with.
Define a list of keys you need from an area, then iterate over areas, for each area get the values of the keys you've defined:
keys = ['desc', 'label', 'x', 'z']
for area_key, area_items in data["sets"]["worldguard.markerset"]["areas"].iteritems():
    print area_key
    for key in keys:
        print '%s: %s' % (key, area_items[key])
prints:
world_region_name
desc: What I really want.
label: Region_name
x: [773.0, 773.0, 774.0, 774.0]
z: [846.0, 847.0, 847.0, 846.0]
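The answer above is Python 2 (print statements, iteritems). On Python 3 the same idea might read like this, with the file input replaced by an inline dict so the sketch is self-contained:

```python
# Same structure as the question's JSON, trimmed to one region.
data = {
    "sets": {
        "worldguard.markerset": {
            "areas": {
                "world_region_name": {
                    "desc": "What I really want.",
                    "label": "Region_name",
                    "z": [846.0, 847.0, 847.0, 846.0],
                    "x": [773.0, 773.0, 774.0, 774.0],
                }
            }
        }
    }
}

keys = ["desc", "label", "x", "z"]
regions = {}
for area_key, area in data["sets"]["worldguard.markerset"]["areas"].items():
    # Keep only the keys we care about for each region.
    regions[area_key] = {key: area[key] for key in keys}
    print(area_key, regions[area_key]["label"])
```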

Optimize loops with big datasets Python

It's the first time I've gone this big with Python, so I need some help.
I have a MongoDB collection (or Python dict) with the following structure:
{
    "_id": { "$oid": "521b1fabc36b440cbe3a6009" },
    "country": "Brazil",
    "id": "96371952",
    "latitude": -23.815124482000001649,
    "longitude": -45.532670811999999216,
    "name": "coffee",
    "users": [
        {
            "id": 277659258,
            "photos": [
                {
                    "created_time": 1376857433,
                    "photo_id": "525440696606428630_277659258"
                },
                {
                    "created_time": 1377483144,
                    "photo_id": "530689541585769912_10733844"
                }
            ],
            "username": "foo"
        },
        {
            "id": 232745390,
            "photos": [
                {
                    "created_time": 1369422344,
                    "photo_id": "463070647967686017_232745390"
                }
            ],
            "username": "bar"
        }
    ]
}
Now I want to create two files: one with the summaries and the other with the weight of each connection. My loop, which works for small datasets, is the following:
#a is the dataset
data = db.collection.find()
a = [i for i in data]

#here go the connections between the locations
edges = csv.writer(open("edges.csv", "wb"))
#and here the location data
nodes = csv.writer(open("nodes.csv", "wb"))

for i in a:
    #find the users that match
    for q in a:
        if i['_id'] != q['_id'] and q.get('users'):
            weight = 0
            for user_i in i['users']:
                for user_q in q['users']:
                    if user_i['id'] == user_q['id']:
                        weight += 1
            if weight > 0:
                edges.writerow([i['id'], q['id'], weight])
    #find the number of photos
    photos_number = 0
    for p in i['users']:
        photos_number += len(p['photos'])
    nodes.writerow([i['id'],
                    i['name'],
                    i['latitude'],
                    i['longitude'],
                    len(i['users']),
                    photos_number
                    ])
The scaling problem: I have 20000 locations, each location might have up to 2000 users, and each user might have around 10 photos.
Is there any more efficient way to write the above loops? Maybe multithreading, a JIT, more indexes?
If I run the above in a single thread, it can produce up to 20000^2 * 2000 * 10 comparisons...
So how can I handle this problem more efficiently?
Thanks
@YuchenXie's and @PaulMcGuire's suggested micro-optimizations probably aren't your main problem, which is that you're looping over 20,000 x 20,000 = 400,000,000 pairs of entries, and then have an inner loop of 2,000 x 2,000 user pairs. That's going to be slow.
Luckily, the inner loop can be made much faster by pre-caching sets of the user ids in i['users'], and replacing your inner loop with a simple set intersection. That changes an O(num_users^2) operation that's happening in the Python interpreter to an O(num_users) operation happening in C, which should help. (I just timed it with lists of integers of size 2,000; on my computer, it went from 156ms the way you're doing it to 41µs this way, for a 4,000x speedup.)
You can also cut half the work out of the main loop over pairs of locations by noticing that the relationship is symmetric, so there's no point in doing both i = a[1], q = a[2] and i = a[2], q = a[1].
Taking these and @PaulMcGuire's suggestions into account, along with some other stylistic changes, your code becomes (caveat: untested code ahead):
from itertools import combinations, izip
import csv

data = db.collection.find()
a = list(data)

user_ids = [{user['id'] for user in i['users']} if 'users' in i else set()
            for i in a]

with open("edges.csv", "wb") as f:
    edges = csv.writer(f)
    for (i, i_ids), (q, q_ids) in combinations(izip(a, user_ids), 2):
        weight = len(i_ids & q_ids)
        if weight > 0:
            edges.writerow([i['id'], q['id'], weight])
            edges.writerow([q['id'], i['id'], weight])

with open("nodes.csv", "wb") as f:
    nodes = csv.writer(f)
    for i in a:
        nodes.writerow([
            i['id'],
            i['name'],
            i['latitude'],
            i['longitude'],
            len(i['users']),
            sum(len(p['photos']) for p in i['users']),  # total number of photos
        ])
Hopefully this should be enough of a speedup. If not, it's possible that @YuchenXie's suggestion will help, though I'm doubtful because the stdlib/OS is fairly good at buffering that kind of thing. (You might play with the buffering settings on the file objects.)
Otherwise, it may come down to trying to get the core loops out of Python (in Cython or handwritten C), or giving PyPy a shot. I'm doubtful that'll get you any huge speedups now, though.
You may also be able to push the hard weight calculations into Mongo, which might be smarter about that; I've never really used it so I don't know.
The bottleneck is disk I/O.
It should be much faster if you merge the results and use one or a few writerows calls instead of many writerow calls.
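A sketch of that batching idea (Python 3 file handling here): accumulate the rows in a list and flush them with a single writerows call instead of one writerow per row:

```python
import csv

# Accumulate all rows in memory first...
rows = [[i, i * i] for i in range(5)]

# ...then write them in one buffered call.
with open("edges.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```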
Does collapsing this loop:
photos_number = 0
for p in i['users']:
    photos_number += len(p['photos'])
down to:
photos_number = sum(len(p['photos']) for p in i['users'])
help at all?
Your weight computation:
weight = 0
for user_i in i['users']:
    for user_q in q['users']:
        if user_i['id'] == user_q['id']:
            weight += 1
should also be collapsible down to:
from itertools import product

weight = sum(user_i['id'] == user_q['id']
             for user_i, user_q in product(i['users'], q['users']))
Since True equates to 1, summing all the boolean conditions is the same as counting all the values that are True.
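A tiny demonstration of that counting trick, with hypothetical user lists:

```python
from itertools import product

# Hypothetical user records for two locations.
i_users = [{"id": 1}, {"id": 2}, {"id": 3}]
q_users = [{"id": 2}, {"id": 3}, {"id": 4}]

# True counts as 1 and False as 0, so summing the comparisons
# counts the matching id pairs: here (2, 2) and (3, 3).
weight = sum(u["id"] == v["id"] for u, v in product(i_users, q_users))
print(weight)  # 2
```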

Spatial index/query (finding k nearest points)

I have 10k+ points (latitude, longitude), and I'm building an app that shows you the k nearest points to a user's location.
I think this is a very common problem and I don't want to reinvent the wheel. I'm learning about Quadtrees. It seems to be a good approach to solve this spatial problem.
I'm using these tools:
Python 2.5
MySQL
MongoDb
Building the Quadtree is not that hard: http://donar.umiacs.umd.edu/quadtree/points/pointquad.html But once I've created the tree and saved it to a db (MySQL or MongoDB), how do I run the query?
I need to run queries like these:
Find all points within 10 km of the user's location.
Find the 6 (or at least 6) nearest points to the user's location.
What's the standard and common approach to do it?
EDIT 1:
I've loaded the 10k+ points into MongoDB (geospatial indexing) and it works fine at first glance. Anyway, I've found PostGIS:
PostGIS is an extension to the PostgreSQL object-relational database system which allows GIS (Geographic Information Systems) objects to be stored in the database.
So I think I'll give PostGIS a try.
I've also found SimpleGeo. You can store points/places in the cloud and then query them via an API: https://simplegeo.com/docs/tutorials/python#how-do-radial-nearby-query
MongoDB has support for spatial indexes built-in, so all you'd need to do is load your points using the correct format, create the spatial index, and then run your queries.
For a quick example, I loaded the center points for all 50 states in the mongo shell:
> db.places.ensureIndex({loc: "2d"})
> db.places.save({name: "AK", loc: {long: -152.2683, lat: 61.3850}})
> db.places.save({name: "AL", loc: {long: -86.8073, lat: 32.7990}})
> db.places.save({name: "AR", loc: {long: -92.3809, lat: 34.9513}})
> db.places.save({name: "AS", loc: {long: -170.7197, lat: 14.2417}})
> ...
Next, to query for the 6 nearest points to a given location:
> db.places.find({loc: { $near: {long: -90, lat: 50}}}).limit(6)
{"name" : "WI", "loc" : { "long" : -89.6385, "lat" : 44.2563 } }
{"name" : "MN", "loc" : { "long" : -93.9196, "lat" : 45.7326 } }
{"name" : "MI", "loc" : { "long" : -84.5603, "lat" : 43.3504 } }
{"name" : "IA", "loc" : { "long" : -93.214, "lat" : 42.0046 } }
{"name" : "IL", "loc" : { "long" : -89.0022, "lat" : 40.3363 } }
{"name" : "ND", "loc" : { "long" : -99.793, "lat" : 47.5362 } }
Next, to query for all points within 10km of a given location. Since I'm calculating the nearest states, I'll use 888km (which is approximately 8 degrees of latitude):
> db.places.find({loc: { $near: {long: -90, lat: 50}, $maxDistance: 8}})
{"name" : "WI", "loc" : { "long" : -89.6385, "lat" : 44.2563 } }
{"name" : "MN", "loc" : { "long" : -93.9196, "lat" : 45.7326 } }
Since one degree of latitude is approximately 111.12km, you'd use a $maxDistance: 0.08999 to represent 10km for your application.
Updated: By default MongoDB assumes an "idealized flat earth model", but this results in inaccuracies since lines of longitude converge at the poles. MongoDB versions 1.7+ support spherical distance calculations, which provide increased precision.
Here is an example of running the above query using spherical distance. The maxDistance is in radians, so we need to divide the distance in km by the earth's radius (~6378 km):
> db.runCommand({geoNear: "places", near: [-90, 50], spherical: true,
maxDistance: 800/6378});
(summarizing results as they're too verbose to include)
"MN" dis: 0.087..
"WI" dis: 0.100..
"ND" dis: 0.120..
You may want to look at kdtree entry in wikipedia. This would be useful when you have more than two dimensions too (unlike quadtrees). I suggest the kd-tree because the entry has python code for creating and querying the tree.
If you want to use MongoDB, then read their docs carefully. The default model is flat earth. It assumes that a degree of longitude has the same length as a degree of latitude.
I quote: """The current implementation assumes an idealized model of a flat earth, meaning that an arcdegree of latitude (y) and longitude (x) represent the same distance everywhere. This is only true at the equator where they are both about equal to 69 miles or 111km. However, at the 10gen offices at { x : -74 , y : 40.74 } one arcdegree of longitude is about 52 miles or 83 km (latitude is unchanged). This means that something 1 mile to the north would seem closer than something 1 mile to the east."""
You need their "new spherical model". Be warned: you need to use (longitude, latitude) in that order -- again, read their docs carefully.
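As a concrete starting point for the kd-tree route suggested above: before building any tree, a brute-force scan is often fast enough for ~10k points, and it makes a good correctness baseline for whatever index you add later. This sketch works in planar coordinates and the helper name is made up; real latitude/longitude needs a projection or a great-circle distance:

```python
import heapq

def k_nearest(points, query, k):
    """Brute-force k nearest neighbours by squared Euclidean distance."""
    qx, qy = query
    return heapq.nsmallest(
        k, points, key=lambda p: (p[0] - qx) ** 2 + (p[1] - qy) ** 2)

# Toy (lon, lat) points, roughly the state centers from the answer above.
points = [(-89.6385, 44.2563), (-93.9196, 45.7326), (-84.5603, 43.3504),
          (-99.7930, 47.5362), (-74.0060, 40.7128)]

print(k_nearest(points, (-90.0, 50.0), k=2))
# [(-89.6385, 44.2563), (-93.9196, 45.7326)]
```

Once the brute force gets too slow, the same query interface can be backed by a quadtree or kd-tree without changing callers.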
