Python Pandas ValueError: Expected object or value - python

I am trying to have Python read a JSON file and export it to a CSV. I am using Pandas for the conversion, but I am getting "ValueError: Expected object or value" when I run the code below.
import pandas as pd
df = pd.read_json('contacts.json')
I am using Visual Studio Code for testing the script. When I run the above code, I get the message below in the Terminal window.
PS C:\Users\TaRan\tableau-wdc-tutorial-part-1> & "C:/Program Files/Python38/python.exe" "c:/Users/TaRan/Dropbox/Team Operational Resources/G. BI Internal/Testing/Hubspot/conversion.py"
Traceback (most recent call last):
  File "c:/Users/TaRan/Dropbox/Team Operational Resources/G. BI Internal/Testing/Hubspot/conversion.py", line 3, in <module>
    df=pd.read_json('contacts.txt')
  File "C:\Program Files\Python38\lib\site-packages\pandas\util\_decorators.py", line 199, in wrapper
    return func(*args, **kwargs)
  File "C:\Program Files\Python38\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "C:\Program Files\Python38\lib\site-packages\pandas\io\json\_json.py", line 618, in read_json
    result = json_reader.read()
  File "C:\Program Files\Python38\lib\site-packages\pandas\io\json\_json.py", line 755, in read
    obj = self._get_object_parser(self.data)
  File "C:\Program Files\Python38\lib\site-packages\pandas\io\json\_json.py", line 777, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "C:\Program Files\Python38\lib\site-packages\pandas\io\json\_json.py", line 886, in parse
    self._parse_no_numpy()
  File "C:\Program Files\Python38\lib\site-packages\pandas\io\json\_json.py", line 1119, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None
ValueError: Expected object or value
I thought it might be a problem with the JSON file, so I wrote a different one, but I still received the error. To me, it looks like something might be wrong with the pandas package. I tried reinstalling it, but I still get the error.
EDIT
Here is a sample from the JSON file. I am displaying only one contact and changed the confidential information.
{"results":[{"id":"101","properties":{"createdate":"2020-06-05T15:18:37.746Z","email":"someone#aplace.com","firstname":"First","hs_object_id":"101","lastmodifieddate":"2020-08-12T15:17:35.104Z","lastname":"Last"},"createdAt":"2020-06-05T15:18:37.746Z","updatedAt":"2020-08-12T15:17:35.104Z","archived":false}],"paging":{"next":{"after":"452","link":"https://api.hubapi.com/sampleurl.com"},"prev":null}}
I am getting the JSON file from the Hubspot API. I am not doing any kind of formatting before pulling it into Python for the conversion (nor do I want to - I am trying to automate this entire process). Please note that my JSON is all on one line. I am not sure if this matters or not.

Are you sure your JSON file is correctly formatted? I wrote this JSON file and it works fine for me.
{
"Name": {
"0": "John",
"1": "Nick",
"2": "Ali",
"3": "Joseph"
},
"Gender": {
"0": "Male",
"1": "Male",
"2": "Female",
"3": "Male"
},
"Nationality": {
"0": "UK",
"1": "French",
"2": "USA",
"3": "Brazil"
},
"Age": {
"0": 10,
"1": 25,
"2": 35,
"3": 29
}
}
I used the same code you wrote, but added a print statement to check the output and I was able to print out the head of the dataframe.
% python test.py
Name Gender Nationality Age
0 John Male UK 10
1 Nick Male French 25
2 Ali Female USA 35
3 Joseph Male Brazil 29
EDIT: The JSON you provided looks malformed. It is missing a closing "]", and some brackets were missing around the second array item.
It should look something like this, depending on what you're trying to do.
{
"results": [
{
"id": "101",
"properties": {
"createdate": "2020-06-05T15:18:37.746Z",
"email": "someone#aplace.com",
"firstname": "First",
"hs_object_id": "101",
"lastmodifieddate": "2020-08-12T15:17:35.104Z",
"lastname": "Last"
},
"createdAt": "2020-06-05T15:18:37.746Z",
"updatedAt": "2020-08-12T15:17:35.104Z",
"archived": false
},
{
"paging": {
"next": {
"after": "452",
"link": "https://api.hubapi.com/sampleurl.com"
},
"prev": null
}
}
]
}
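If the goal is one CSV row per contact, a minimal sketch (assuming the HubSpot layout shown in the question) is to parse the file with json and flatten the nested results list with pd.json_normalize rather than feeding it to read_json directly:

```python
import json

import pandas as pd

# Sample payload shaped like the HubSpot response in the question;
# in practice you would read it with json.load(open("contacts.json")).
raw = ('{"results":[{"id":"101","properties":{"email":"someone@aplace.com",'
       '"firstname":"First","lastname":"Last"},"archived":false}]}')

data = json.loads(raw)

# json_normalize flattens the nested "properties" dict into columns such
# as "properties.email", giving a frame that can go straight to CSV.
df = pd.json_normalize(data["results"])
df.to_csv("contacts.csv", index=False)
```

Because this parses the file yourself first, it also sidesteps read_json's orientation guessing entirely.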

There are problems in your JSON format, e.g. in the posted part you have an opening '[' but no matching ']'.

Related

How do I do multiple JSON entries with Python?

I'm trying to pull some data from a flight simulation JSON table. It's updated every 15 seconds, and I've been trying to pull print(obj['pilots']['flight_plans']['cid']). However, I'm getting the error
Traceback (most recent call last):
File "main.py", line 18, in <module>
print(obj['pilots']['flight_plans']['cid'])
TypeError: list indices must be integers or slices, not str
My code is below
import json
from urllib.request import urlopen
import urllib
# initial setup
URL = "https://data.vatsim.net/v3/vatsim-data.json"
# json entries
response = urllib.request.urlopen(URL)
str_response = response.read().decode('utf-8')
obj = json.loads(str_response)
# result is connections
# print(obj["general"]["connected_clients"])
print(obj['pilots']['flight_plans']['cid'])
The print(obj["general"]["connected_clients"]) does work.
Investigate your obj with print(json.dumps(obj, indent=2)). You'll find that the pilots key is a list of dictionaries containing flight_plan (not plural) and cid keys. Here's the first few lines:
{
"general": {
"version": 3,
"reload": 1,
"update": "20220301062202",
"update_timestamp": "2022-03-01T06:22:02.245318Z",
"connected_clients": 292,
"unique_users": 282
},
"pilots": [
{
"cid": 1149936,
"name": "1149936",
"callsign": "URO504",
"server": "UK",
"pilot_rating": 0,
"latitude": -23.39706,
"longitude": -46.3709,
"altitude": 9061,
"groundspeed": 327,
"transponder": "0507",
"heading": 305,
"qnh_i_hg": 29.97,
"qnh_mb": 1015,
"flight_plan": {
"flight_rules": "I",
"aircraft": "A346",
...
For example, iterate over the list of pilots and print name/cid:
for pilot in obj['pilots']:
print(pilot['name'],pilot['cid'])
Output:
1149936 1149936
Nick Aydin OTHH 1534423
Oguz Aydin 1429318
Marvin Steglich LSZR 1482019
Daniel Krol EPKK 1279199
... etc ...
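Not every pilot has filed a plan, so flight_plan can be missing for some entries; a guarded lookup avoids a KeyError. A sketch, using a hand-built obj that mirrors the feed's shape rather than the live URL:

```python
# Minimal stand-in for the parsed feed: "pilots" is a list of dicts, and
# "flight_plan" may be absent for pilots who have not filed one.
obj = {
    "pilots": [
        {"cid": 1149936, "name": "1149936",
         "flight_plan": {"flight_rules": "I", "aircraft": "A346"}},
        {"cid": 1429318, "name": "Oguz Aydin"},  # no plan filed
    ]
}

rows = []
for pilot in obj["pilots"]:
    plan = pilot.get("flight_plan") or {}  # tolerate a missing key
    rows.append((pilot["cid"], plan.get("aircraft", "no plan filed")))

print(rows)
```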

Why doesn't Python3 find a json-file which is in the same directory when I use the json.load() method

I'm learning Python3 and I'm trying to create an object Agent (a custom object) by initiating the attributes of it from a JSON file.
The problem is that when I launch my Python file, it does not find the file, which is in the same directory. I checked the name and there is no typo. I don't understand where the problem really is.
Here is my folder structure:
project/
model.py
agents-100k.json
Here is my model.py file
import json
class Agent:
def __init__(self, **agent_attributes):
"""Constructor of Agent class"""
# Print each element of dict
print(agent_attributes.items())
# Get the name and the value of each entry in dict
for attr_name, attr_value in agent_attributes.items():
# setattr(instance, attribute_name, attribute_value)
setattr(self, attr_name, attr_value)
def say_hello(self, first_name):
"""Say hello to name given in argument"""
return "Hello " + first_name + "!"
def main():
for agent_attributes in json.load(open("agents-100k.json")):
agent = Agent(**agent_attributes)
print(agent.agreeableness)
main()
Here is a sample of the agents-100k.json file (there are a lot of entries, so I will just show two of them):
[
{
"age": 84,
"agreeableness": -0.8437190198916452,
"conscientiousness": 0.6271643010309115,
"country_name": "China",
"country_tld": "cn",
"date_of_birth": "1933-12-27",
"extraversion": 0.3229563709288293,
"id": 227417393,
"id_str": "bNn-9Gc",
"income": 9881,
"internet": false,
"language": "Standard Chinese or Mandarin",
"latitude": 33.15219798270325,
"longitude": 100.85840672174572,
"neuroticism": 0.15407262417068612,
"openness": 0.041970542572878806,
"religion": "unaffiliated",
"sex": "Male"
},
{
"age": 6,
"agreeableness": -0.40747441203817747,
"conscientiousness": 0.4352286422343134,
"country_name": "Haiti",
"country_tld": "ht",
"date_of_birth": "2011-12-21",
"extraversion": 1.4714618156987345,
"id": 6821129477,
"id_str": "bt3-xj9",
"income": 1386,
"internet": false,
"language": "Creole",
"latitude": 19.325567983697297,
"longitude": -72.43795260265814,
"neuroticism": -0.4503674752682471,
"openness": -0.879092424231703,
"religion": "Protestant",
"sex": "Female"
},
...
]
And finally, this is the error I get when I run python3 project/model.py:
Traceback (most recent call last):
File "project/model.py", line 50, in <module>
for agent_attributes in json.load(open("agents-100k.json")):
IOError: [Errno 2] No such file or directory: 'agents-100k.json'
Is there something I did wrong ?
Thanks for your help anyway.
Python resolves the file name relative to the current working directory (where you run the command), not relative to the script itself. So if you run the file with python3 project/model.py, the JSON is looked up outside of the project folder.
If the JSON is always in the same folder as your Python file, you can use the following code to open it:
import json
import os

path = os.path.dirname(os.path.abspath(__file__))

def main():
    for agent_attributes in json.load(open(os.path.join(path, "agents-100k.json"))):
        agent = Agent(**agent_attributes)
        print(agent.agreeableness)

main()
This question gives a more detailed explanation on how it works.
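On Python 3.4+ the same idea reads a little cleaner with pathlib; a sketch, assuming the folder layout from the question:

```python
from pathlib import Path

# Resolve the data file relative to this script's own location, so the
# lookup works no matter which directory you launch Python from.
here = Path(__file__).resolve().parent
data_file = here / "agents-100k.json"

# then, inside main():
#     agents = json.load(data_file.open())
```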

OverflowError: MongoDB can only handle up to 8-byte ints?

I have spent the last 12 hours scouring the web. I am completely lost, please help.
I am trying to pull data from an API endpoint and put it into MongoDB. The data looks like this:
{"_links": {
"self": {
"href": "https://us.api.battle.net/data/sc2/ladder/271302?namespace=prod"
}
},
"league": {
"league_key": {
"league_id": 5,
"season_id": 37,
"queue_id": 201,
"team_type": 0
},
"key": {
"href": "https://us.api.battle.net/data/sc2/league/37/201/0/5?namespace=prod"
}
},
"team": [
{
"id": 6956151645604413000,
"rating": 5321,
"wins": 131,
"losses": 64,
"ties": 0,
"points": 1601,
"longest_win_streak": 15,
"current_win_streak": 4,
"current_rank": 1,
"highest_rank": 10,
"previous_rank": 1,
"join_time_stamp": 1534903699,
"last_played_time_stamp": 1537822019,
"member": [
{
"legacy_link": {
"id": 9964871,
"realm": 1,
"name": "mTOR#378",
"path": "/profile/9964871/1/mTOR"
},
"played_race_count": [
{
"race": "Zerg",
"count": 195
}
],
"character_link": {
"id": 9964871,
"battle_tag": "Hellghost#11903",
"key": {
"href": "https://us.api.battle.net/data/sc2/character/Hellghost-11903/9964871?namespace=prod"
}
}
}
]
},
{
"id": 11611747760398664000, .....
....
Here's the code:
for ladder_number in ladder_array:
ladder_call_url = ladder_call+slash+str(ladder_number)+eng_locale+access_token
url = str(ladder_call_url)
response = requests.get(url)
print('trying ladder number '+str(ladder_number))
print('calling :'+url)
if response.status_code == 200:
print('status: '+str(response))
mmr_db.ladders.insert_one(response.json())
I get an error:
OverflowError: MongoDB can only handle up to 8-byte ints?
Is this because the data I am trying to load is too large? Are the "ID" integers too large?
Oh man, any help would be sincerely appreciated.
_______ EDIT ____________
Edited to include the Traceback:
Traceback (most recent call last):
File "C:\scripts\mmr_from_ladders.py", line 96, in <module>
mmr_db.ladders.insert_one(response.json(), bypass_document_validation=True)
File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\collection.py", line 693, in insert_one
session=session),
File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\collection.py", line 607, in _insert
bypass_doc_val, session)
File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\collection.py", line 595, in _insert_one
acknowledged, _insert_command, session)
File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\mongo_client.py", line 1243, in _retryable_write
return self._retry_with_session(retryable, func, s, None)
File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\mongo_client.py", line 1196, in _retry_with_session
return func(session, sock_info, retryable)
File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\collection.py", line 590, in _insert_command
retryable_write=retryable_write)
File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\pool.py", line 584, in command
self._raise_connection_failure(error)
File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\pool.py", line 745, in _raise_connection_failure
raise error
File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\pool.py", line 579, in command
unacknowledged=unacknowledged)
File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\network.py", line 114, in command
codec_options, ctx=compression_ctx)
File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\message.py", line 679, in _op_msg
flags, command, identifier, docs, check_keys, opts)
OverflowError: MongoDB can only handle up to 8-byte ints
The BSON spec (MongoDB's native binary serialization format) only supports 32-bit (signed) and 64-bit (signed) integers; 8 bytes is 64 bits.
The maximum integer value that can be stored in a 64 bit int is:
9,223,372,036,854,775,807
In your example you appear to have larger ids, for example:
11,611,747,760,398,664,000
I'm guessing that the app generating this data is using uint64 types (an unsigned 64-bit int can hold values up to 2^64 - 1, roughly twice the signed maximum).
I would start by looking at either of these potential solutions, if possible:
Changing the other side to use int64 (signed) types for the IDs.
Replacing the incoming IDs using ObjectId() as you then get a 12 byte ~ GUID for your unique IDs.
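If you can't change the producer, a third option is to walk the document before insert_one() and stringify anything outside the signed 64-bit range. A sketch (the trade-off is that stringified ids no longer sort or compare numerically):

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)

def sanitize(value):
    """Recursively convert ints BSON cannot store into strings."""
    if isinstance(value, dict):
        return {k: sanitize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    if isinstance(value, int) and not (INT64_MIN <= value <= INT64_MAX):
        return str(value)
    return value

# Sample shaped like the ladder payload in the question.
doc = sanitize({"team": [{"id": 11611747760398664000, "rating": 5321}]})
# then: mmr_db.ladders.insert_one(sanitize(response.json()))
```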

Read a JSON file using Python & remove certain elements if they exist and writeback to same/new file

I have a json file which contains a lot of metadata. This json file needs to be parsed using a Python script which will remove some sensitive information like SSH keys etc.
JSON file :
{"instanceid": "sfaf", "webapps": {"jolokia": {"Created-By": "Apache Maven", "Build-Jdk": "1.7.0_11", "Manifest-Version": "1.0", "Built-By": "roland", "Archiver-Version": "Plexus Archiver"}, "SC": {"Name": "cver/", "Manifest-Version": "1.0", "Ant-Version": "Apache Ant 1.8.3", "Specification-Vendor": "Sc.", "Implementation-Title": "dgdgd", "Implementation-Version": "3.4.85", "Sealed": "false", "Specification-Version": "1.1", "Specification-Title": "SaaS License Server", "Build-Date": "2013-12-19 09", "Implementation-Vendor": "Saet Inc.", "Created-By": "1.6.0_43-b01 (Sun Microsystems Inc.)"}}, "facter": {"kernelrelease": "3.2.0-65-virtual", "selinux": "false", "memorytotal": "1.65 GB", "swapfree": "896.00 MB", "ec2_block_device_mapping_swap": "sda3", "ec2_public_ipv4": "5.2.1.0", "ec2_placement_availability_zone": "us-west-1a", "operatingsystem": "Ubuntu", "lsbmajdistrelease": "12", "rubyversion": "1.8.7", "ipaddress_lo": "127.0.0.1", "facterversion": "1.6.5", "is_virtual": "false", "network_lo": "127.0.0.0", "ec2_block_device_mapping_root": "/dev/sda1", "memoryfree": "1.20 GB", "uptime_seconds": 12823192, "ec2_reservation_id": "r-515b720f", "ec2_local_ipv4": "10.188.22.97", "ec2_block_device_mapping_ami": "/dev/sda1", "memorysize": "1.65 GB", "swapsize": "896.00 MB", "ec2_public_keys_0_openssh_key": "ssh-rsa NFb1BSbJkNEHpW35/anJMqw/s6x+ykYELuOPk2JLt andy", "uniqueid": "bc0a6116", "processorcount": "1", "kernelmajversion": "3.2", "macaddress": "22:00:0a:bc:16:61", "ec2_hostname": "ip-.us-west-1.compute.internal", "network_eth0": "101", "uptime_hours": 3561, "ec2_security_groups": "na-prod-1w-secgroup", "rubysitedir": "/usr/local/lib/site_ruby/1.8", "sshecdsakey": "fsafsfsafsasa=", "architecture": "amd64", "netmask_eth0": "255.255.255.192", "arp": "fe:ff:ff:ff:ff:ff", "netmask_lo": "255.0.0.0", "domain": "us-west-1.compute.internal", "puppetversion": "2.7.11", "kernel": "Linux", "uptime_days": 148, "ec2_ami_launch_index": "0", "ec2_public_hostname": 
"ec2-50-18-7q-a.us-east-1.compute.amazonaws.com", "augeasversion": "0.10.0", "ec2_instance_type": "m1.small", "ec2_profile": "default-paravirtual", "timezone": "UTC", "hardwareisa": "x86_64", "id": "root", "ec2_ami_id": "ami-a5616be0", "ec2_local_hostname": "ip-10-138-22-27.us-west-1.compute.internal", "uptime": "148 days", "macaddress_eth0": "22:00:0aa:bc:a16:61", "hostname": "i", "lsbdistid": "Ubuntu", "virtual": "physical", "ec2_ami_manifest_path": "(unknown)", "ec2_instance_id": "i-3599826b", "sshdsakey": "WF8D0q7m/TpbMKoUAofpUYwmMmLKyC71yXhh3Q0ZCBT8AAACAbnaEqRnJD1YFrzOHs0H/pVoh/6mEXXKeoL9MYhZDGUNhNKhIKvIABCX9tK1jJEItZwnGxRvyEI=", "arp_eth0": "fe:ff:ff:ff:ff:ff", "hardwaremodel": "x86_64", "osfamily": "Debian", "sshrsakey": "AAAAB3NzaC1yc2EAAAADAsTafQ/QHvoFMycI0oo3qG5zCPI5c4NmlQDmTXP4xArXYy3oltG4mYly6MwqxELFcWKpaM1f0TCFFzCAVe8XUyyp7qt7qAZ4aJ+LbFd+TLJrE+H", "ps": "ps -ef", "interfaces": "eth0,lo", "physicalprocessorcount": 1, "ec2_kernel_id": "aki-880531cd", "path": "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "ipaddress": "10.188.22.97", "lsbdistdescription": "Ubuntu 12.04.4 LTS", "kernelversion": "3.2.0", "operatingsystemrelease": "12.04", "processor0": "Intel(R) Xeon(R) CPU E5-2651 v2 # 1.80GHz", "fqdn": "ip-10-188-22-97.us-west-1.compute.internal", "lsbdistcodename": "precise", "lsbdistrelease": "12.04", "ipaddress_eth0": "10.188.22.97", "netmask": "255.255.255.192"}, "os_packages": {"tomcat7": "7.0.26-1ubuntu1.2", "openjdk-6-jre-headless": "6b31-1.13.3-1ubuntu1~0.12.04.2", "linux-image-virtual": "3.2.0.65.77", "openssl": "1.0.1-4ubuntu5.16", "openssh-server": "1:5.9p1-5ubuntu1.4"}}
I am trying out a basic code just to remove the keys as follows:
import json
j = json.load(open("blob.json"))
obj = json.loads(j)
for element in obj:
del element['sshdsakey','sshrsakey']
json_data.write(obj)
But this throws a error:
Traceback (most recent call last):
File "popp.py", line 4, in <module>
obj = json.loads(j)
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer
What am I doing wrong, mates?
json.load(open("blob.json")) has already parsed the file into a Python object, and json.loads() expects a string, which is why you get the TypeError. Load the file in one step instead:
with open("blob.json") as f:
    obj = json.load(f)
Your next problem is del element['sshdsakey','sshrsakey']. Iterating over a dict yields its keys as strings, so you're effectively doing del 'foobar'.
You can do del obj[element]['sshrsakey'] instead, but it will raise a KeyError if there is no obj[element]['sshrsakey'].
I have a solution using regex, but I'm sure there's a more Pythonic way (this is probably the worst approach, but it may work):
import json
import re

with open("blob.json") as f:
    s = f.read()
s = re.sub(', "ssh.*key": ".+?"', '', s)
s = re.sub('{"ssh.*key": ".+?", ', '{', s)
s = re.sub(', "ssh.*key": ".+?"}', '}', s)
obj = json.loads(s)
with open("blob_clean.json", "w") as f:
    json.dump(obj, f)
Just do obj = json.load(open("blob.json"))
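A cleaner alternative to the regex is to parse the JSON and strip the keys recursively; a sketch, where the exact SENSITIVE set is an assumption you should adjust to your own data:

```python
import json

# Keys to scrub (the ssh* names appear in the blob from the question;
# extend this set for any other sensitive fields).
SENSITIVE = {"sshdsakey", "sshrsakey", "sshecdsakey",
             "ec2_public_keys_0_openssh_key"}

def strip_keys(node):
    """Recursively drop sensitive keys from nested dicts and lists."""
    if isinstance(node, dict):
        return {k: strip_keys(v) for k, v in node.items()
                if k not in SENSITIVE}
    if isinstance(node, list):
        return [strip_keys(v) for v in node]
    return node

# In practice: clean = strip_keys(json.load(open("blob.json")))
clean = strip_keys({"facter": {"sshdsakey": "WF8D...", "hostname": "i"}})
print(json.dumps(clean))
```

Unlike the regex version, this never raises a KeyError for absent keys and cannot corrupt the JSON structure.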

Decoding JSON file from Twitter in Python using simplejson

A small part of my JSON file looks like the following. It passed a JSON validator.
{
"next_page": "?page=2&max_id=210389654296469504&q=cocaine&geocode=40.665572%2C-73.923557%2C10mi&rpp=100",
"completed_in": 0.289,
"max_id_str": "210389654296469504",
"since_id_str": "0",
"refresh_url": "?since_id=210389654296469504&q=cocaine&geocode=40.665572%2C-73.923557%2C10mi",
"results": [
{
"iso_language_code": "en",
"to_user_id": 486935435,
"to_user_id_str": "486935435",
"profile_image_url_https": "https://si0.twimg.com/profile_images/1561856049/Zak_W_Photo_normal.jpg",
"from_user_id_str": "82389940",
"text": "#Bill__Murray cocaine > productivity! Last night I solved the euro crisis and designed a new cat. If I could only find that napkin.",
"from_user_name": "Zak Williams",
"in_reply_to_status_id_str": "210319741322133504",
"profile_image_url": "http://a0.twimg.com/profile_images/1561856049/Zak_W_Photo_normal.jpg",
"id": 210389654296469500,
"to_user": "Bill__Murray",
"source": "<a href="http://twitter.com/#!/download/iphone" rel="nofollow">Twitter for iPhone</a>",
"in_reply_to_status_id": 210319741322133500,
"to_user_name": "Bill Murray",
"location": "Brooklyn",
"from_user": "zakwilliams",
"from_user_id": 82389940,
"metadata": {
"result_type": "recent"
},
"geo": "null",
"created_at": "Wed, 06 Jun 2012 15:16:17 +0000",
"id_str": "210389654296469504"
}
]
}
When I try to load this in Python by typing the following code I get the following error.
Code
import simplejson as json
testname = 'test.txt'
record = json.loads(testname)
Error
raise JSONDecodeError("No JSON object could be decoded", s, idx)
simplejson.decoder.JSONDecodeError: No JSON object could be decoded: line 1 column 0 (char 0)
What am I doing wrong? In fact, I generated the file by using simplejson.dump
The json.loads() function loads JSON data from a string, and you are just giving it a file name. The string test.txt is not a valid JSON string. Try the following to load JSON data from a file:
with open(testname) as f:
record = json.load(f)
(If you're using an old version of Python that doesn't support the with statement (as possibly indicated by your use of the old simplejson), then you'll have to open and close the file yourself.)
