citybikes: JSON to Dataframe - python

I am using python-citybikes (https://pypi.org/project/python-citybikes/) to retrieve some data.
However, I can't figure out a way to export the data:
import citybikes
import pandas as pd
client = citybikes.Client()
GlasgowNextBike = citybikes.Network(client, uid='nextbike-glasgow')
list(GlasgowNextBike.stations)
Stations = list(GlasgowNextBike.stations)
pd.read_json(Stations)
I am getting
Traceback (most recent call last):
File "<ipython-input-15-5a1904def0e8>", line 1, in <module>
pd.read_json(Stations)
File "/Users/noor/opt/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py", line 214, in wrapper
return func(*args, **kwargs)
File "/Users/noor/opt/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py", line 585, in read_json
path_or_buf, encoding=encoding, compression=compression
File "/Users/noor/opt/anaconda3/lib/python3.7/site-packages/pandas/io/common.py", line 200, in get_filepath_or_buffer
raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'list'>
My question is: how can I export/save the results as a JSON or CSV file?

Try using the json module, like so:
import citybikes, json
client = citybikes.Client()
GlasgowNextBike = citybikes.Network(client, uid='nextbike-glasgow')
with open('GlasgowNextBike.json', 'w') as f:
    json.dump(GlasgowNextBike.data, f, indent=2)
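If you also want a CSV, a minimal sketch (assuming each Station object exposes a .data dict, just like the Network object used above; json_normalize needs pandas >= 1.0) is to normalise the station dicts into a DataFrame and write that out:
import citybikes
import pandas as pd

client = citybikes.Client()
GlasgowNextBike = citybikes.Network(client, uid='nextbike-glasgow')

# assumption: each station carries a .data dict like the network does
stations = [station.data for station in GlasgowNextBike.stations]
df = pd.json_normalize(stations)  # flattens nested keys into columns
df.to_csv('GlasgowNextBike.csv', index=False)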

Related

Geopandas not working after importing a file from a different directory

I am trying to make a map in python using shapefiles I have downloaded from bbike.org. Here is my code:
import geopandas as gpd
import os
import sys
import matplotlib.pyplot as plt
bos_files_list = ['buildings.shx', 'landuse.shx', 'natural.shx', 'places.shx', 'points.shx', 'railways.shx', 'roads.shx']
cur_path = os.path.dirname(__file__)
def maps_of_bos(files):
    for x in range(len(files)):
        os.chdir(f'location/of/file')
        f = open(f'{files[x]}', 'r')
        gpd.read_file(f)
z = maps_of_bos(bos_files_list)
z.plot()
plt.show()
However, my error output is as follows:
Traceback (most recent call last):
File "test.py", line 16, in <module>
z = maps_of_bos(bos_files_list)
File "test.py", line 13, in maps_of_bos
gpd.read_file(f)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/geopandas/io/f
ile.py", line 76, in read_file
with reader(path_or_bytes, **kwargs) as features:
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 113, in
__enter__
return next(self.gen)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/fiona/__init__
.py", line 206, in fp_reader
dataset = memfile.open()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/fiona/io.py",
line 63, in open
return Collection(vsi_path, 'w', crs=crs, driver=driver,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/fiona/collecti
on.py", line 126, in __init__
raise DriverError("no driver")
fiona.errors.DriverError: no driver
I am relatively new to Python and don't really understand my error. Can someone please help me?
According to the docs, read_file should take the path to the file, not a file object:
gpd.read_file(f'{files[x]}')
so you don't need
f = open(f'{files[x]}', 'r')
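Putting it together, a minimal sketch of the fixed function (keeping the question's placeholder directory, and adding the return statement that z.plot() needs; the .shp/.dbf companions must sit next to each .shx file):
import os
import geopandas as gpd
import matplotlib.pyplot as plt

bos_files_list = ['buildings.shx', 'landuse.shx', 'natural.shx', 'places.shx',
                  'points.shx', 'railways.shx', 'roads.shx']

def maps_of_bos(files, data_dir='location/of/file'):
    # read each shapefile into a GeoDataFrame and collect them all
    return [gpd.read_file(os.path.join(data_dir, name)) for name in files]

for gdf in maps_of_bos(bos_files_list):
    gdf.plot()
plt.show()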

how to explode dict (or list of dict) object in multiple column in dask.dataframe

When I convert some XML to a dataframe using xmltodict, a particular column ends up containing all the info I need as a dict or a list of dicts. I'm able to expand this column into multiple ones with pandas, but I'm not able to perform the same operation in dask.
It's not possible to use meta because I have no idea of all the possible fields that may appear in the XML, and dask is necessary because the real XML files are bigger than 1 GB each.
example.xml:
<?xml version="1.0" encoding="UTF-8"?>
<itemList>
  <eventItem uid="1">
    <timestamp>2019-07-04T09:57:35.044Z</timestamp>
    <eventType>generic</eventType>
    <details>
      <detail>
        <name>columnA</name>
        <value>AAA</value>
      </detail>
      <detail>
        <name>columnB</name>
        <value>BBB</value>
      </detail>
    </details>
  </eventItem>
  <eventItem uid="2">
    <timestamp>2019-07-04T09:57:52.188Z</timestamp>
    <eventType>generic</eventType>
    <details>
      <detail>
        <name>columnC</name>
        <value>CCC</value>
      </detail>
    </details>
  </eventItem>
</itemList>
Working pandas code:
import xmltodict
import collections
import pandas as pd
def pd_output_dict(details):
    detail = details.get("detail", [])
    ret_value = {}
    if type(detail) in (collections.OrderedDict, dict):
        ret_value[detail["name"]] = detail["value"]
    elif type(detail) == list:
        for i in detail:
            ret_value[i["name"]] = i["value"]
    return pd.Series(ret_value)

with open("example.xml", "r", encoding="utf8") as f:
    df_dict_list = xmltodict.parse(f.read()).get("itemList", {}).get("eventItem", [])

df = pd.DataFrame(df_dict_list)
df = pd.concat([df, df.apply(lambda row: pd_output_dict(row.details), axis=1, result_type="expand")], axis=1)
print(df.head())
Not working dask code:
import xmltodict
import collections
import dask
import dask.bag as db
import dask.dataframe as dd
def dd_output_dict(row):
    detail = row.get("details", {}).get("detail", [])
    ret_value = {}
    if type(detail) in (collections.OrderedDict, dict):
        row[detail["name"]] = detail["value"]
    elif type(detail) == list:
        for i in detail:
            row[i["name"]] = i["value"]
    return row

with open("example.xml", "r", encoding="utf8") as f:
    df_dict_list = xmltodict.parse(f.read()).get("itemList", {}).get("eventItem", [])

df_bag = db.from_sequence(df_dict_list)
df = df_bag.to_dataframe()
df = df.apply(lambda row: dd_output_dict(row), axis=1)
The idea is to get in dask the same result I have in pandas, but at the moment I'm receiving errors:
>>> df = df.apply(lambda row: output_dict(row), axis=1)
Traceback (most recent call last):
File "C:\Anaconda3\lib\site-packages\dask\dataframe\utils.py", line 169, in raise_on_meta_error
yield
File "C:\Anaconda3\lib\site-packages\dask\dataframe\core.py", line 4711, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "C:\Anaconda3\lib\site-packages\dask\utils.py", line 854, in __call__
return getattr(obj, self.method)(*args, **kwargs)
File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 6487, in apply
return op.get_result()
File "C:\Anaconda3\lib\site-packages\pandas\core\apply.py", line 151, in get_result
return self.apply_standard()
File "C:\Anaconda3\lib\site-packages\pandas\core\apply.py", line 257, in apply_standard
self.apply_series_generator()
File "C:\Anaconda3\lib\site-packages\pandas\core\apply.py", line 286, in apply_series_generator
results[i] = self.f(v)
File "<stdin>", line 1, in <lambda>
File "<stdin>", line 4, in output_dict
AttributeError: ("'str' object has no attribute 'get'", 'occurred at index 0')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Anaconda3\lib\site-packages\dask\dataframe\core.py", line 3964, in apply
M.apply, self._meta_nonempty, func, args=args, udf=True, **kwds
File "C:\Anaconda3\lib\site-packages\dask\dataframe\core.py", line 4711, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "C:\Anaconda3\lib\contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "C:\Anaconda3\lib\site-packages\dask\dataframe\utils.py", line 190, in raise_on_meta_error
raise ValueError(msg)
ValueError: Metadata inference failed in `apply`.
You have supplied a custom function and Dask is unable to
determine the type of output that that function returns.
To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.
Original error is below:
------------------------
AttributeError("'str' object has no attribute 'get'", 'occurred at index 0')
Traceback:
---------
File "C:\Anaconda3\lib\site-packages\dask\dataframe\utils.py", line 169, in raise_on_meta_error
yield
File "C:\Anaconda3\lib\site-packages\dask\dataframe\core.py", line 4711, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "C:\Anaconda3\lib\site-packages\dask\utils.py", line 854, in __call__
return getattr(obj, self.method)(*args, **kwargs)
File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 6487, in apply
return op.get_result()
File "C:\Anaconda3\lib\site-packages\pandas\core\apply.py", line 151, in get_result
return self.apply_standard()
File "C:\Anaconda3\lib\site-packages\pandas\core\apply.py", line 257, in apply_standard
self.apply_series_generator()
File "C:\Anaconda3\lib\site-packages\pandas\core\apply.py", line 286, in apply_series_generator
results[i] = self.f(v)
File "<stdin>", line 1, in <lambda>
File "<stdin>", line 4, in output_dict
Right, so operations like map_partitions will need to know the column names and data types. As you've mentioned, you can specify this with the meta= keyword.
Perhaps you can run through your data once to compute what these will be, and then construct a proper meta object, and pass that in? This is inefficient, and requires reading through all of your data, but I'm not sure that there is another way.
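A minimal sketch of that two-pass idea (with a hypothetical flatten helper mirroring pd_output_dict from the question; it assumes the explosion can be done at the bag level, before the dataframe is built, so the first pass only has to collect the column names):
import xmltodict
import dask.bag as db

with open("example.xml", "r", encoding="utf8") as f:
    df_dict_list = xmltodict.parse(f.read()).get("itemList", {}).get("eventItem", [])

def flatten(item):
    # hypothetical helper: lift each {"name": ..., "value": ...} detail
    # up to a top-level key, like pd_output_dict does in the pandas code
    detail = item.get("details", {}).get("detail", [])
    if isinstance(detail, dict):  # OrderedDict is a dict subclass
        detail = [detail]
    row = {k: v for k, v in item.items() if k != "details"}
    for d in detail:
        row[d["name"]] = d["value"]
    return row

bag = db.from_sequence(df_dict_list).map(flatten)
# first pass: union of all keys ever seen, i.e. the full column set
columns = sorted(set().union(*bag.map(lambda r: set(r)).compute()))
# second pass: build the dataframe with an explicit meta
df = bag.to_dataframe(meta={c: "object" for c in columns})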

JSON writing to a file error, Extra data: line 1 column 2347 (char 2346)

import json
import requests

response = requests.get('SOME_LINK_THAT_IVE_REMOVED')
try:
    response.raise_for_status()
except requests.exceptions.HTTPError:
    pass
#print (response.text)
with open('stats.json', 'w') as output:
    json.dump(response.json(), output, indent=4)
What it printed:
{"sessionid":"F9269D5D-2B5C-432B-B8BC-34F87F790619","game_clock_display":"02:12.84","game_clock":132.8432,"game_status":"playing","possession":[0,2],"teams":[{"players":[{"name":"Nooth","playerid":2,"position":[-8.6900005,-0.28800002,-8.6470003],"stats":{"possession_time":22.589041,"points":0,"saves":0,"goals":0,"stuns":3,"passes":0,"catches":0,"steals":0,"blocks":0,"interceptions":0,"assists":0,"shots_taken":0},"userid":658915067565875,"possession":false},{"name":"erikmelkumyan","playerid":3,"position":[-0.72400004,1.7060001,-28.595001],"stats":{"possession_time":9.5638027,"points":0,"saves":0,"goals":0,"stuns":3,"passes":0,"catches":0,"steals":0,"blocks":0,"interceptions":0,"assists":0,"shots_taken":1},"userid":2126518170756015,"possession":false},{"name":"Sandman187_","playerid":4,"position":[-2.3990002,2.3380001,-26.783001],"stats":{"possession_time":27.565685,"points":0,"saves":1,"goals":0,"stuns":4,"passes":0,"catches":0,"steals":0,"blocks":0,"interceptions":0,"assists":0,"shots_taken":1},"userid":1611289978936588,"possession":true}],"team":"BLUE
TEAM","possession":true,"stats":{"points":0,"possession_time":59.718529,"interceptions":0,"blocks":0,"steals":0,"catches":0,"passes":0,"saves":1,"goals":0,"stuns":10,"assists":0,"shots_taken":2}},{"players":[{"name":"MooneyWhy","playerid":0,"position":[-4.539,1.399,-13.481001],"stats":{"possession_time":14.364853,"points":2,"saves":1,"goals":0,"stuns":7,"passes":0,"catches":0,"steals":1,"blocks":0,"interceptions":0,"assists":0,"shots_taken":1},"userid":1265147863612788,"possession":false},{"name":"b-love","playerid":1,"position":[-11.484,2.072,0.70500004],"stats":{"possession_time":50.680099,"points":6,"saves":1,"goals":0,"stuns":1,"passes":0,"catches":0,"steals":0,"blocks":0,"interceptions":0,"assists":0,"shots_taken":1},"userid":1457786340976218,"possession":false},{"name":"onikaze","playerid":5,"position":[-7.6980004,1.268,-11.036],"stats":{"possession_time":17.629295,"points":0,"saves":0,"goals":0,"stuns":6,"passes":0,"catches":0,"steals":0,"blocks":0,"interceptions":0,"assists":0,"shots_taken":2},"userid":1636331273057819,"possession":false}],"team":"ORANGE
TEAM","possession":false,"stats":{"points":8,"possession_time":82.674248,"interceptions":0,"blocks":0,"steals":1,"catches":0,"passes":0,"saves":2,"goals":0,"stuns":14,"assists":0,"shots_taken":4}}]}
Traceback (most recent call last):
File "C:\Users\Kai\Desktop\Python\testv4.py", line 19, in <module>
json.dump(response.json(), output, indent = 4)
File "C:\Users\Kai\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\models.py",
line 897, in json
return complexjson.loads(self.text, **kwargs)
File "C:\Users\Kai\AppData\Local\Programs\Python\Python37-32\lib\json__init__.py",
line 348, in loads
return _default_decoder.decode(s)
File "C:\Users\Kai\AppData\Local\Programs\Python\Python37-32\lib\json\decoder.py",
line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 2347 (char 2346)
Any help? The response seems to produce an invalid JSON file: there is a NUL value at the end of the file when viewed in Notepad++.
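Given that trailing NUL observation, a minimal sketch of one workaround (assuming the extra data really is just trailing NUL bytes after an otherwise valid document) is to strip them before parsing:
import json
import requests

response = requests.get('SOME_LINK_THAT_IVE_REMOVED')
cleaned = response.text.rstrip('\x00')  # drop the trailing NUL byte(s)
data = json.loads(cleaned)
with open('stats.json', 'w') as output:
    json.dump(data, output, indent=4)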

ValueError: Extra data: Importing multiple JSON frame from JSON file in python

I am trying to import multiple JSON frames stored in JSON file to python.
My code is:
import json
import array

with open("J1.json") as J:
    j_Data = json.load(J)
print j_Data
Error :
Traceback (most recent call last):
File "/home/abhi/Desktop/CSS HTML/Python Mongo/JSONtoMongoDB.py", line 9, in <module>
j_Data = json.load(J)
File "/usr/lib/python2.7/json/__init__.py", line 290, in load
**kw)
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 1113 - line 1 column 2225 (char 1112 - 2224)
My JSON file data is as follows:
If you can make an array of your JSON frames in the JSON file:
[{"yphthd": "123.32"} , {"yphthd": "123.32"}, ... {"yphthd": "123.32"}]
and then load it:
with open("J1.json") as J:
j_Data = json.load(J)
print j_Data
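If editing the file isn't an option, a minimal sketch (assuming the frames are simply concatenated on one line, as the column offsets in the error suggest) is to decode them one at a time with json.JSONDecoder.raw_decode:
import json

def parse_frames(text):
    # repeatedly decode one JSON value, then continue from where it ended
    decoder = json.JSONDecoder()
    idx, frames = 0, []
    while idx < len(text):
        frame, end = decoder.raw_decode(text, idx)
        frames.append(frame)
        idx = end
        while idx < len(text) and text[idx] in ' \t\r\n':
            idx += 1  # skip whitespace between frames
    return frames

with open("J1.json") as J:
    frames = parse_frames(J.read())
print frames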

loading json file using python

I am trying to load json file availalble here : https://gist.githubusercontent.com/anonymous/e5ef9cb96acb98e1f813d5166d472c70/raw/eabf219c51ace122ad82b7037bbf93d347fb4a9b/data.json
with open('data.json') as data_file:
    data = json.load(data_file)
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/home/karimk/python/lib/python2.7/json/__init__.py", line 291, in load
**kw)
File "/home/karimk/python/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/home/karimk/python/lib/python2.7/json/decoder.py", line 367, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 3 column 1 - line 6 column 740 (char 2826 - 16384)
What am I doing wrong here?
What's wrong: the JSON is not valid.
Solution: update the data to be valid JSON, for example by removing the line breaks inside string items.
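A minimal sketch for pinpointing where the file stops being valid (the exception message carries the offset, as in the traceback above):
import json

with open('data.json') as data_file:
    text = data_file.read()
try:
    data = json.loads(text)
except ValueError as e:
    print e  # e.g. "Extra data: line 3 column 1 ..."; fix the text around that spot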
