Prevent adding name attribute when using pandas dataframe to_file() method

Prevent adding name attribute when using pandas dataframe to_file() method - python

I have a geopandas.GeoDataFrame. When I want to dump it into a file, it adds a "name" attribute that will be the name of the file I passed. How to prevent this?
df = geopandas.GeoDataFrame(data, geometry="geometry")
# some logic here
df.to_file(filename="a_random_name.geojson", driver="GeoJSON")
inside a_random_name.geosjon:
"type": "FeatureCollection",
"name": "a_random_name",
"features": [...]
to_file method adds the "name" attribute to my data frame and I want to prevent that.
Thanks!

you have not provided sample data, so have used naturalearth_lowres
as you can see no random name key has been added. Expected name keys are present to define the CRS
import geopandas
import json
from pathlib import Path
data = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))
df = geopandas.GeoDataFrame(data, geometry="geometry").sample(3)
df.to_file(filename="a_random_name.geojson", driver="GeoJSON")
with open(Path.cwd().joinpath("a_random_name.geojson")) as f:
geojson = json.load(f)
# exclude geometry for SO answer
{
k: v
if k != "features"
else [{kk: vv for kk, vv in f.items() if kk != "geometry"} for f in v]
for k, v in geojson.items()
}
output
{'type': 'FeatureCollection',
'crs': {'type': 'name',
'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}},
'features': [{'type': 'Feature',
'properties': {'pop_est': 4917000.0,
'continent': 'Oceania',
'name': 'New Zealand',
'iso_a3': 'NZL',
'gdp_md_est': 206928}},
{'type': 'Feature',
'properties': {'pop_est': 9746117.0,
'continent': 'North America',
'name': 'Honduras',
'iso_a3': 'HND',
'gdp_md_est': 25095}},
{'type': 'Feature',
'properties': {'pop_est': 1912789.0,
'continent': 'Europe',
'name': 'Latvia',
'iso_a3': 'LVA',
'gdp_md_est': 34102}}]}

Related

TimestampedGeoJson with MultiPolygon shows Time Not Available

I want to display a shape over the Canada map.
The idea is 2 shapes in different years.
But my slide at the end says:
"Time Not Available"
I tried to find here at the community, but I haven't found a problem like it.
Here you can find my file and here you can find my code:
import folium
from folium.plugins import TimestampedGeoJson
import json
import pandas as pd
with open('outputfile.json') as f:
poly = json.load(f)
features = [
{
'type': 'Feature',
'geometry': {
'type': 'MultiPolygon',
'coordinates': pol['coordinates'],
},
'properties': {
'ABBREVNAME': pol['ABBREVNAME'],
'time': pol['date'],
}
} for pol in poly
]
mapa = folium.Map(
location = [56.130,-106.35],
tiles='openstreetmap',
zoom_start = 3
)
TimestampedGeoJson({'type': 'FeatureCollection', 'features': features}).add_to(mapa)
mapa
Thanks!!!

I had the same problem of yours of time not available and solved by following the example on the documentation and this other post.
Basically, some key points from doc:
1- It's is not 'time' but "times" and it must be the same length of the list of coordinates
2- Lookout for time format it only takes ISO or ms epoch
enter image description here
Here is a code example of a store location code i was working, I didn't try Polygon yet but hope it helps u:
m = folium.Map([-23.579782, -46.687754], zoom_start=6, tiles="cartodbpositron")
TimestampedGeoJson({
'type': 'FeatureCollection',
'features': [
{
'type': 'Feature',
'geometry': {
'type': 'LineString',
'coordinates': [[-46.687754, -23.579782]],
},
'properties': {
'icon': 'marker',
'iconstyle': {
'iconSize': [20, 20],
'iconUrl':
'https://img.icons8.com/ios-filled/50/000000/online-store.png'
},
'id': 'house',
'popup': 1,
'times': [1633046400000.0]
}
}, {
'type': 'Feature',
'geometry': {
'type': 'LineString',
'coordinates': [[-46.887754, -23.579782]],
},
'properties': {
'icon': 'marker',
'iconstyle': {
'iconSize': [20, 20],
'iconUrl':
'https://img.icons8.com/ios-filled/50/000000/online-store.png'
},
'id': 'house',
'popup': 1,
'times': [1635046400000.0]
}
}
]
}).add_to(m)
folium_static(m)
m.save('test.html')

Folium GeoJson can't change Icon (marker) color and activate the popup

I want to have a bunch of points in the map, with a red Icon and with some text as a popup when you click on it. I need to use features.GeoJson, because I'll create also a Search on a specific layer, so I can't use features.Marker.
I checked this examples: https://nbviewer.jupyter.org/github/python-visualization/folium/tree/master/examples/ But They don't say what key of the properties dictionary of each point change this color. Regarding the popup, even though I add it as a child it doesn't work.
Here is what I get
Here the code:
import folium
from folium import features
m = folium.Map([0, 0], zoom_start=1)
points_ = {'type': 'FeatureCollection',
'features': [{'type': 'Feature',
'properties': {'Codice': 500732, 'Categoria': 'D1', 'Cluster': 3},
'geometry': {'type': 'Point', 'coordinates': [12.34117475, 45.75345246]},
'id': '0'},
{'type': 'Feature',
'properties': {'Codice': 500732, 'Categoria': 'A2', 'Cluster': 3},
'geometry': {'type': 'Point', 'coordinates': [12.34117475, 45.75345246]},
'id': '1'}]}
pp = folium.Popup("hello")
ic = features.Icon(color="red")
gj = folium.GeoJson(points)#, tooltip=tooltip)
gj.add_child(ic)
gj.add_child(pp)
m.add_child(gj)
m

The standard icon just loads from https://cdn.jsdelivr.net/npm/leaflet#1.6.0/dist/images/marker-icon.png
So either you can change its color by using something like this
or you can use a different built-in icon instead:
points = {'type': 'FeatureCollection',
'features': [{'type': 'Feature',
'properties': {'Codice': 500732, 'Categoria': 'D1', 'Cluster': 3},
'geometry': {'type': 'Point', 'coordinates': [12.34117475, 45.75345246]},
'id': '0'},
{'type': 'Feature',
'properties': {'Codice': 500732, 'Categoria': 'A2', 'Cluster': 3},
'geometry': {'type': 'Point', 'coordinates': [12.32117475, 45.72345246]},
'id': '1'}]}
gj = folium.GeoJson(points)
feature_group = folium.FeatureGroup('markers')
for feature in gj.data['features']:
if feature['geometry']['type'] == 'Point':
folium.Marker(location=list(reversed(feature['geometry']['coordinates'])),
icon=folium.Icon(color='red'),
popup='Hello',
Categoria=feature['properties']['Categoria']
).add_to(feature_group)
feature_group.add_to(m)
Search(
layer=feature_group,
search_label="Categoria",
).add_to(m)
m

Filter python dictionary with dictionary-comprehension

I have a dictionary that is really a geojson:
points = {
'crs': {'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}, 'type': 'name'},
'features': [
{'geometry': {
'coordinates':[[[-3.693162104185235, 40.40734504903418],
[-3.69320229317164, 40.40719570724241],
[-3.693227952841606, 40.40698546120488],
[-3.693677594635894, 40.40712700492216]]],
'type': 'Polygon'},
'properties': {
'name': 'place1',
'temp': 28},
'type': 'Feature'
},
{'geometry': {
'coordinates': [[[-3.703886381691941, 40.405197271972035],
[-3.702972834622821, 40.40506272989243],
[-3.702552994966045, 40.40506798079752],
[-3.700985024825222, 40.405500820623814]]],
'type': 'Polygon'},
'properties': {
'name': 'place2',
'temp': 27},
'type': 'Feature'
},
{'geometry': {
'coordinates': [[[-3.703886381691941, 40.405197271972035],
[-3.702972834622821, 40.40506272989243],
[-3.702552994966045, 40.40506798079752],
[-3.700985024825222, 40.405500820623814]]],
'type': 'Polygon'},
'properties': {
'name': 'place',
'temp': 25},
'type': 'Feature'
}
],
'type': u'FeatureCollection'
}
I would like to filter it to stay only with places that have a specific temperature, for example, more than 25 degrees Celsius.
I have managed to do it this way:
dict(crs = points["crs"],
features = [i for i in points["features"] if i["properties"]["temp"] > 25],
type = points["type"])
But I wondered if there was any way to do it more directly, with dictionary comprehension.
Thank you very much.

I'm very late. A dict compreheneison won't help you since you have only three keys. But if you meet the following conditions: 1. you don't need a copy of features (e.g. your dict is read only); 2. you don't need index access to features, you my use a generator comprehension instead of a list comprehension:
dict(crs = points["crs"],
features = (i for i in points["features"] if i["properties"]["temp"] > 25),
type = points["type"])
The generator is created in constant time, while the list comprehension is created in O(n). Furthermore, if you create a lot of those dicts, you have only one copy of the features in memory.

Python: retrieve arbitrary dictionary path and amend data?

Simple Python question, but I'm scratching my head over the answer!
I have an array of strings of arbitrary length called path, like this:
path = ['country', 'city', 'items']
I also have a dictionary, data, and a string, unwanted_property. I know that the dictionary is of arbitrary depth and is dictionaries all the way down, with the exception of the items property, which is always an array.
[CLARIFICATION: The point of this question is that I don't know what the contents of path will be. They could be anything. I also don't know what the dictionary will look like. I need to walk down the dictionary as far as the path indicates, and then delete the unwanted properties from there, without knowing in advance what the path looks like, or how long it will be.]
I want to retrieve the parts of the data object (if any) that matches the path, and then delete the unwanted_property from each.
So in the example above, I would like to retrieve:
data['country']['city']['items']
and then delete unwanted_property from each of the items in the array. I want to amend the original data, not a copy. (CLARIFICATION: By this I mean, I'd like to end up with the original dict, just minus the unwanted properties.)
How can I do this in code?
I've got this far:
path = ['country', 'city', 'items']
data = {
'country': {
'city': {
'items': [
{
'name': '114th Street',
'unwanted_property': 'foo',
},
{
'name': '8th Avenue',
'unwanted_property': 'foo',
},
]
}
}
}
for p in path:
if p == 'items':
data = [i for i in data[p]]
else:
data = data[p]
if isinstance(data, list):
for d in data:
del d['unwanted_property']
else:
del data['unwanted_property']
The problem is that this doesn't amend the original data. It also relies on items always being the last string in the path, which may not always be the case.
CLARIFICATION: I mean that I'd like to end up with:
{
'country': {
'city': {
'items': [
{
'name': '114th Street'
},
{
'name': '8th Avenue'
},
]
}
}
}
Whereas what I have available in data is only [{'name': '114th Street'}, {'name': '8th Avenue'}].
I feel like I need something like XPath for the dictionary.

The problem you are overwriting the original data reference. Change your processing code to
temp = data
for p in path:
temp = temp[p]
if isinstance(temp, list):
for d in temp:
del d['unwanted_property']
else:
del temp['unwanted_property']
In this version, you set temp to point to the same object that data was referring to. temp is not a copy, so any changes you make to it will be visible in the original object. Then you step temp along itself, while data remains a reference to the root dictionary. When you find the path you are looking for, any changes made via temp will be visible in data.
I also removed the line data = [i for i in data[p]]. It creates an unnecessary copy of the list that you never need, since you are not modifying the references stored in the list, just the contents of the references.
The fact that path is not pre-determined (besides the fact that items is going to be a list) means that you may end up getting a KeyError in the first loop if the path does not exist in your dictionary. You can handle that gracefully be doing something more like:
try:
temp = data
for p in path:
temp = temp[p]
except KeyError:
print('Path {} not in data'.format(path))
else:
if isinstance(temp, list):
for d in temp:
del d['unwanted_property']
else:
del temp['unwanted_property']

The problem you are facing is that you are re-assigning the data variable to an undesired value. In the body of your for loop you are setting data to the next level down on the tree, for instance given your example data will have the following values (in order), up to when it leaves the for loop:
data == {'country': {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}}}
data == {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}}
data == {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}
data == [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]
Then when you delete the items from your dictionaries at the end you are left with data being a list of those dictionaries as you have lost the higher parts of the structure. Thus if you make a backup reference for your data you can get the correct output, for example:
path = ['country', 'city', 'items']
data = {
'country': {
'city': {
'items': [
{
'name': '114th Street',
'unwanted_property': 'foo',
},
{
'name': '8th Avenue',
'unwanted_property': 'foo',
},
]
}
}
}
data_ref = data
for p in path:
if p == 'items':
data = [i for i in data[p]]
else:
data = data[p]
if isinstance(data, list):
for d in data:
del d['unwanted_property']
else:
del data['unwanted_property']
data = data_ref

def delKey(your_dict,path):
if len(path) == 1:
for item in your_dict:
del item[path[0]]
return
delKey( your_dict[path[0]],path[1:])
data
{'country': {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo'}, {'name': '8th Avenue', 'unwanted_property': 'foo'}]}}}
path
['country', 'city', 'items', 'unwanted_property']
delKey(data,path)
data
{'country': {'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}}

You need to remove the key unwanted_property.
names_list = []
def remove_key_from_items(data):
for d in data:
if d != 'items':
remove_key_from_items(data[d])
else:
for item in data[d]:
unwanted_prop = item.pop('unwanted_property', None)
names_list.append(item)
This will remove the key. The second parameter None is returned if the key unwanted_property does not exist.
EDIT:
You can use pop even without the second parameter. It will raise KeyError if the key does not exist.
EDIT 2: Updated to recursively go into depth of data dict until it finds the items key, where it pops the unwanted_property as desired and append into the names_list list to get the desired output.

Using operator.itemgetter you can compose a function to return the final key's value.
import operator, functools
def compose(*functions):
'''returns a callable composed of the functions
compose(f, g, h, k) -> f(g(h(k())))
'''
def compose2(f, g):
return lambda x: f(g(x))
return functools.reduce(compose2, functions, lambda x: x)
get_items = compose(*[operator.itemgetter(key) for key in path[::-1]])
Then use it like this:
path = ['country', 'city', 'items']
unwanted_property = 'unwanted_property'
for thing in get_items(data):
del thing[unwanted_property]
Of course if the path contains non-existent keys it will throw a KeyError - you probably should account for that:
path = ['country', 'foo', 'items']
get_items = compose(*[operator.itemgetter(key) for key in path[::-1]])
try:
for thing in get_items(data):
del thing[unwanted_property]
except KeyError as e:
print('missing key:', e)

You can try this:
path = ['country', 'city', 'items']
previous_data = data[path[0]]
previous_key = path[0]
for i in path:
previous_data = previous_data[i]
previous_key = i
if isinstance(previous_data, list):
for c, b in enumerate(previous_data):
if "unwanted_property" in b:
del previous_data[c]["unwanted_property"]
current_dict = {}
previous_data_dict = {}
for i, a in enumerate(path):
if i == 0:
current_dict[a] = data[a]
previous_data_dict = data[a]
else:
if a == previous_key:
current_dict[a] = previous_data
else:
current_dict[a] = previous_data_dict[a]
previous_data_dict = previous_data_dict[a]
data = current_dict
print(data)
Output:
{'country': {'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}, 'items': [{'name': '114th Street'}, {'name': '8th Avenue'}], 'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}

Merging similar dictionaries in a list together

New to python here. I've been pulling my hair for hours and still can't figure this out.
I have a list of dictionaries:
[ {'FX0XST001.MID5': '195', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'}
{'FX0XST001.MID13': '4929', 'Name': 'Firmicutes', 'Taxonomy ID': '1239','Type': 'phylum'},
{'FX0XST001.MID6': '826', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
.
.
.
.
{'FX0XST001.MID6': '125', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'}
{'FX0XST001.MID25': '70', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'}
{'FX0XST001.MID40': '40', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'} ]
I want to merge the dictionaries in the list based on their Type, Name, and Taxonomy ID
[ {'FX0XST001.MID5': '195', 'FX0XST001.MID13': '4929', 'FX0XST001.MID6': '826', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'}
.
.
.
.
{'FX0XST001.MID6': '125', 'FX0XST001.MID25': '70', 'FX0XST001.MID40': '40', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'}]
I have the data structure setup like this because I need to write the data to CSV using csv.DictWriter later.
Would anyone kindly point me to the right direction?

You can use the groupby function for this:
http://docs.python.org/library/itertools.html#itertools.groupby
from itertools import groupby
keyfunc = lambda row : (row['Type'], row['Taxonomy ID'], row['Name'])
result = []
data = sorted(data, key=keyfunc)
for k, g in groupby(data, keyfunc):
# you can either add the matching rows to the item so you end up with what you wanted
item = {}
for row in g:
item.update(row)
result.append(item)
# or you could just add the matched rows as subitems to a parent dictionary
# which might come in handy if you need to work with just the parts that are
# different
item = {'Type': k[0], 'Taxonomy ID' : k[1], 'Name' : k[2], 'matches': [])
for row in g:
del row['Type']
del row['Taxonomy ID']
del row['Name']
item['matches'].append(row)
result.append(item)

Make some test data:
list_of_dicts = [
{"Taxonomy ID":1, "Name":"Bob", "Type":"M", "hair":"brown", "eyes":"green"},
{"Taxonomy ID":1, "Name":"Bob", "Type":"M", "height":"6'2''", "weight":200},
{"Taxonomy ID":2, "Name":"Alice", "Type":"F", "hair":"black", "eyes":"hazel"},
{"Taxonomy ID":2, "Name":"Alice", "Type":"F", "height":"5'7''", "weight":145}
]
I think this (below) is a neat trick using reduce that improves upon the other groupby solution.
import itertools
def key_func(elem):
return (elem["Taxonomy ID"], elem["Name"], elem["Type"])
output_list_of_dicts = [reduce((lambda x,y: x.update(y) or x), list(val)) for key, val in itertools.groupby(list_of_dicts, key_func)]
Then print the output:
for elem in output_list_of_dicts:
print elem
This prints:
{'eyes': 'green', 'Name': 'Bob', 'weight': 200, 'Taxonomy ID': 1, 'hair': 'brown', 'height': "6'2''", 'Type': 'M'}
{'eyes': 'hazel', 'Name': 'Alice', 'weight': 145, 'Taxonomy ID': 2, 'hair': 'black', 'height': "5'7''", 'Type': 'F'}
FYI, Python Pandas is far better for this sort of aggregation, especially when dealing with file I/O to .csv or .h5 files, than the itertools stuff.

Perhaps the easiest thing to do would be to create a new dictionary, indexed by a (Type, Name, Taxonomy ID) tuple, and iterate over your dictionary, storing values by (Type, Name, Taxonomy ID). Use a default dict to make this easier. For example:
from collections import defaultdict
grouped = defaultdict(lambda : {})
# iterate over items and store:
for entry in list_of_dictionaries:
grouped[(entry["Type"], entry["Name"], entry["Taxonomy ID"])].update(entry)
# now you have everything stored the way you want in values, and you don't
# need the dict anymore
grouped_entries = grouped.values()
This is a bit hackish, especially because you end up overwriting "Type", "Name", and "Phylum" every time you use update, but since your dict keys are variable, that might be the best you can do. This will get you at least close to what you need.
Even better would be to do this on your initial import and skip intermediate steps (unless you actually need to transform the data beforehand). Plus, if you could get at the only varying field, you could change the update to just: grouped[(type, name, taxonomy_id)][key] = value where key and value are something like: 'FX0XST001.MID5', '195'

from itertools import groupby
data = [ {'FX0XST001.MID5': '195', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type':'phylum'},
{'FX0XST001.MID13': '4929', 'Name': 'Firmicutes', 'Taxonomy ID': '1239','Type': 'phylum'},
{'FX0XST001.MID6': '826', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
{'FX0XST001.MID6': '125', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
{'FX0XST001.MID25': '70', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
{'FX0XST001.MID40': '40', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'} ,]
kk = ('Name', 'Taxonomy ID', 'Type')
def key(item): return tuple(item[k] for k in kk)
result = []
data = sorted(data, key=key)
for k, g in groupby(data, key):
result.append(dict((i, j) for d in g for i,j in d.items()))
print result

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Prevent adding name attribute when using pandas dataframe to_file() method - python

Related

TimestampedGeoJson with MultiPolygon shows Time Not Available

Folium GeoJson can't change Icon (marker) color and activate the popup

Filter python dictionary with dictionary-comprehension

Python: retrieve arbitrary dictionary path and amend data?

Merging similar dictionaries in a list together

Categories

Resources