I am aware that there are several other posts on Stack Overflow regarding this same issue; however, not a single solution from those posts, or from any other post I've found online for that matter, has worked. I have followed numerous tutorials, videos, books, and Stack Overflow posts on pandas, and all of the suggested solutions have failed.
The frustrating thing is that all the solutions I have found are correct, or at least they should be; I am fairly new to pandas so my only conclusion is that I am probably doing something wrong.
Here is the pandas documentation that I started with: Pandas to_json Doc. I can't seem to get pandas to_json to convert a pandas DataFrame to a json object or json string.
Basically, I want to convert a csv string into a DataFrame, then convert that DataFrame into a json object or json string (I don't care which one). Then, once I have my json data structure, I'm going to bind it to a D3.js bar chart.
Here is an example of what I am trying to do:
import pandas
import StringIO  # Python 2; on Python 3 this would be io.StringIO

# Declare my csv string (Works):
csvStr = '"pid","dos","facility","a1c_val"\n"123456","2013-01-01 13:37:00","UOFU",5.4\n"65432","2014-01-01 14:32:00","UOFU",5.8\n"65432","2013-01-01 13:01:00","UOFU",6.4'
print (csvStr) # Just checking the variables contents
# Read csv and convert to DataFrame (Works):
csvDf = pandas.read_csv(StringIO.StringIO(csvStr))
print (csvDf) # Just checking the variables contents
# Convert DataFrame to json (Three of the ways I tried - None of them work):
myJSON = csvDf.to_json(path_or_buf = None, orient = 'record', date_format = 'epoch', double_precision = 10, force_ascii = True, date_unit = 'ms', default_handler = None) # Attempt 1
print (myJSON) # Just checking the variables contents
myJSON = csvDf.to_json() # Attempt 2
print (myJSON) # Just checking the variables contents
myJSON = pandas.io.json.to_json(csvDf)
print (myJSON) # Just checking the variables contents
The error that I am getting is:
argument 1 must be string or read-only character buffer, not DataFrame
Which is misleading because the documentation says "A Series or DataFrame can be converted to a valid JSON string."
Regardless, I tried giving it a string anyway, and it resulted in the exact same error.
I have tried creating a test scenario, following the exact steps from books and other tutorials and/or posts, and it just results in the same error. At this point, I need a simple solution ASAP. I am open to suggestions, but I must emphasize that I do not have time to waste on learning a completely new library.
For your first attempt, the correct string is 'records', not 'record'. This worked for me:
myJSON = csvDf.to_json(path_or_buf = None, orient = 'records', date_format = 'epoch', double_precision = 10, force_ascii = True, date_unit = 'ms', default_handler = None) # Attempt 1
Printing gives:
[{"pid":123456,"dos":"2013-01-01 13:37:00","facility":"UOFU","a1c_val":5.4},
{"pid":65432,"dos":"2014-01-01 14:32:00","facility":"UOFU","a1c_val":5.8},
{"pid":65432,"dos":"2013-01-01 13:01:00","facility":"UOFU","a1c_val":6.4}]
It turns out that the problem was because of my own stupid mistake. While testing my use of to_json, I copied and pasted an example into my code and went from there. Thinking I had commented out that code, I proceeded to try using to_json with my test data. It turns out the error I was receiving was being thrown from the example code that I had copied and pasted. Once I deleted everything and re-wrote it using my test data, it worked.
However, as user667648 (Bair) pointed out, there was another mistake in my code. The orient param was supposed to be orient = 'records' and NOT orient = 'record'.
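For reference, here is a minimal end-to-end sketch of the same pipeline on Python 3 (io.StringIO replaces the Python 2 StringIO module; the csv string is just the first row from the question):

import pandas as pd
from io import StringIO

# Same csv as above, shortened to one row for brevity
csvStr = '"pid","dos","facility","a1c_val"\n"123456","2013-01-01 13:37:00","UOFU",5.4'
csvDf = pd.read_csv(StringIO(csvStr))

# orient='records' gives one JSON object per row
myJSON = csvDf.to_json(orient='records')
print(myJSON)
# [{"pid":123456,"dos":"2013-01-01 13:37:00","facility":"UOFU","a1c_val":5.4}]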
Currently, I am working on a Boot Sequence in Python for a larger project. For this specific part of the sequence, I need to access a .JSON file (specs.json) and load it as a dictionary in the main program. I then need to take a value from the .JSON file and add 1 to it, using its key to find the value. Once that's done, I need to push the changes back to the .JSON file. Yet, every time I run the code below, I get the error:
bootNum = spcInfDat['boot_num']
KeyError: 'boot_num'
Here's the code I currently have:
(Note: I'm using the Python json library, and have imported dumps, dump, and load.)
# Opening of the JSON files
spcInf = open('mki/data/json/specs.json',) # .JSON file that contains the current system's specifications. Not quite needed, but it may make a nice reference?
spcInfDat = load(spcInf)
This code is later followed by this, where I attempt to assign the value to a variable by using its dictionary key (the for statement was a debug statement, so I could visibly see the key):
for i in spcInfDat['spec']:
    print(CBL + str(i) + CEN)

# Locating and increasing the value of bootNum.
bootNum = spcInfDat['boot_num']
print(str(bootNum))
bootNum = bootNum + 1
(Another Note: CBL and CEN are just variables I use to colour text I send to the terminal.)
This is the interior of specs.json:
{
    "spec": [
        {
            "os": "name",
            "os_type": "getwindowsversion",
            "lang": "en",
            "cpu_amt": "cpu_count",
            "storage_amt": "unk",
            "boot_num": 1
        }
    ]
}
I'm relatively new to .JSON files, as well as to the Python json library; I only have experience with them through some GeeksforGeeks tutorials I found. There is a rather good chance that I just don't know how .JSON files work in conjunction with the library, but I figure it would still be worth a shot to check here. The GeeksforGeeks tutorial had no documentation about this, and I know very little about how this works, so I'm lost. I've tried searching here, and have found nothing.
Issue Number 2
Now, the prior part works. But, when I attempt to run the code on the following lines:
# Changing the values of specDict.
print(CBL + "Changing values of specDict... 50%" + CEN)
specDict = {
    "os": name,
    "os_type": ost,
    "lang": "en",
    "cpu_amt": cr,
    "storage_amt": "unk",
    "boot_num": bootNum
}

# Writing the product of makeSpec to `specs.json`.
print(CBL + "Writing makeSpec() result to `specs.json`... 75%" + CEN)
jsonobj = dumps(specDict, indent = 4)
with open('mki/data/json/specs.json', "w") as outfile:
    dump(jsonobj, outfile)
I get the error:
TypeError: Object of type builtin_function_or_method is not JSON serializable.
Is there a chance that I set up my dictionary incorrectly, or am I using the dump function incorrectly?
You can show the data using:
print(spcInfDat)
This shows it to be a dictionary, whose single entry 'spec' has an array, whose zeroth element is a sub-dictionary, whose 'boot_num' entry is an integer.
{'spec': [{'os': 'name', 'os_type': 'getwindowsversion', 'lang': 'en', 'cpu_amt': 'cpu_count', 'storage_amt': 'unk', 'boot_num': 1}]}
So what you are looking for is
boot_num = spcInfDat['spec'][0]['boot_num']
and note that the value obtained this way is already an integer. str() is not necessary.
It's also good practice to guard against file format errors so the program handles them gracefully.
try:
    boot_num = spcInfDat['spec'][0]['boot_num']
except (KeyError, IndexError):
    print('Database is corrupt')
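Putting the pieces together, here is a short sketch of the full read / increment / write-back cycle the question describes, using the same file path and layout as the specs.json shown above:

import json

# Load the existing file into a dict
with open('mki/data/json/specs.json') as f:
    spcInfDat = json.load(f)

# 'boot_num' lives inside the first element of the 'spec' list and is already an int
spcInfDat['spec'][0]['boot_num'] += 1

# Write the modified dict straight back; json.dump takes the dict itself
with open('mki/data/json/specs.json', 'w') as f:
    json.dump(spcInfDat, f, indent=4)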
Issue Number 2
"Not serializable" means there is something somewhere in your data structure that is not an accepted type and can't be converted to a JSON string.
json.dump() only accepts certain types, such as strings, dictionaries, lists, and numbers, and that restriction applies to every object nested inside sub-dictionaries, sub-lists, etc. See the documentation for json.JSONEncoder for the complete list of allowable types.
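As a hedged illustration (the real sources of name, ost and cr are not shown in the question, so the values below are assumptions), the usual culprit is storing a builtin function object instead of the value it returns, and then dumping the dict twice:

import json
import os

boot_num = 1  # in the real code this would come from spcInfDat['spec'][0]['boot_num']

specDict = {
    "os": os.name,              # a string, serializable
    "lang": "en",
    "cpu_amt": os.cpu_count(),  # note the parentheses: os.cpu_count without them is a
                                # builtin_function_or_method and is NOT serializable
    "storage_amt": "unk",
    "boot_num": boot_num + 1
}

# Pass the dict itself to dump(); calling dumps() first and then dump() on the result
# writes a double-encoded JSON string to the file
with open('specs.json', 'w') as outfile:
    json.dump({"spec": [specDict]}, outfile, indent=4)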
I am trying to read a json which includes a number of tweets, but I get the following error.
OverflowError: int too large to convert
The script filters multiple json files to get specific tweets, and it crashes when it reaches one particular json file.
The line that creates the error is this one :
df_temp = pd.read_json(path_or_buf=json_path, lines=True)
Here is the error in the cmd
Just store the user id as a string, and treat it as one (this is actually what you should do when dealing with this kind of IDs). If you can't change the json input format, you can always treat the file as plain text before parsing it as a json object, and add quotes around the id value, for instance with a regex: Regex in python.
Depending on the library you are using to parse the json, implicit casting may also work: either read the field as a string instead of as an int, or force Python to treat the object like a string, with something like x = "" + json.getId().
Python is pretty loose on typing and may let you do it.
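If the file can't be edited by hand, a regex pre-pass along those lines might look like the sketch below. The "id" field name and the json_path value are assumptions based on the question, and dtype={'id': str} additionally asks pandas to keep the column as strings:

import re
import pandas as pd
from io import StringIO

json_path = 'tweets.json'  # placeholder for the path used in the question

with open(json_path, encoding='utf-8') as f:
    raw = f.read()

# Wrap the oversized integer in quotes so it is parsed as a string, not an int64
quoted = re.sub(r'"id":\s*(\d+)', r'"id": "\1"', raw)

df_temp = pd.read_json(StringIO(quoted), lines=True, dtype={'id': str})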
I have a json file for tweet data. The data that I want to look at is the text of the tweet. For some reason, some of the tweets are too long to put into the normal text part of the dictionary.
It seems like there is a dictionary within another dictionary and I can't figure out how to access it very well.
Basically, what I want in the end is one column of a data frame that will have all of the text from each individual tweet. Here is a link to a small sample of the data that contains a problem tweet.
Here is the code I have so far:
import json
import pandas as pd
tweets = []
#This reads the json file so that I can work with it. This part works correctly.
with open("filelocation.txt") as source:
    for line in source:
        if line.strip():
            tweets.append(json.loads(line))

print(len(tweets))

df = pd.DataFrame.from_dict(tweets)
df.info()
When looking at the info you can see that there will be a column called extended_tweet that only encompasses one of the two sample tweets. Within this column, there seems to be another dictionary with one of those keys being full_text.
I want to add another column to the dataframe that just has this information along with the normal text column when the full_text is null.
My first thought was to try and read that specific column of the dataframe as a dictionary again using:
d = pd.DataFrame.from_dict(tweets['extended_tweet']['full_text'])
But this doesn't work. I don't really understand why that doesn't work as that is how I read the data the first time.
My guess is that I can't look at the specific names because I am going back to the list and it would have to read all or none. The error it gives me says "KeyError: 'full_text' "
I also tried using the recommendation provided by this website. But this gave me a None value no matter what.
Thanks in advance!
I tried to do what @Dan D. suggested; however, this still gave me errors. But it gave me the idea to try this:
tweets[0]['extended_tweet']['full_text']
This works and gives me the value that I am looking for. But I need to run through the whole thing. So I tried this:
df['full'] = [tweets[i]['extended_tweet']['full_text'] for i in range(len(tweets))]
This gives me "Key Error: 'extended_tweet' "
Does it seem like I am on the right track?
I would suggest flattening out the dictionaries like this:
tweet = json.loads(line)
tweet['full_text'] = tweet['extended_tweet']['full_text']
tweets.append(tweet)
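A guarded variant of the same idea (my own assumption: only truncated tweets carry an 'extended_tweet' key, so fall back to the plain 'text' field when it is missing):

import json

tweets = []
with open("filelocation.txt") as source:
    for line in source:
        if line.strip():
            tweet = json.loads(line)
            # Use the extended text when present, otherwise the normal text field
            tweet['full_text'] = tweet.get('extended_tweet', {}).get('full_text', tweet.get('text'))
            tweets.append(tweet)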
I don't know if the answer suggested earlier works. I never got that successfully. But I did figure out something else that works well for me.
What I really needed was a way to display the full text of a tweet. I first loaded the tweets from the json with what I posted above. Then I noticed that in the data file, there is something called truncated. If this value is true, the tweet is cut short and the full tweet is placed within the
tweets[i]['extended_tweet']['full_text']
In order to access it, I used this:
tweet_list = []
for i in range(len(tweets)):
    if tweets[i]['truncated']:
        tweet_list.append(tweets[i]['extended_tweet']['full_text'])
    else:
        tweet_list.append(tweets[i]['text'])
Then I can work with the data using the whole text from each tweet.
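For completeness, attaching that list back onto the DataFrame built earlier gives the single text column the question asked for. This assumes the df and tweets variables from the snippets above, and the column name 'full' is my own choice:

# One column containing the full text of every tweet
df['full'] = tweet_list

# Or, equivalently, as a single list comprehension over the parsed tweets
df['full'] = [t['extended_tweet']['full_text'] if t.get('truncated') else t['text']
              for t in tweets]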
I am currently writing a program which uses the CompaniesHouse API to return a json file containing information about a certain company.
I am able to retrieve the data easily using the following commands:
r = requests.get('https://api.companieshouse.gov.uk/company/COMPANY-NO/filing-history', auth=('API-KEY', ''))
data = r.json()
With that information I can do an awful lot, however I've run into a problem which I was hoping you guys could possibly help me with. What I aim to do is go through every nested entry in the json file and check whether the values of certain keys match certain criteria; if the values of 2 keys match the criteria, then other code is executed.
One of the keys is the date of an entry, and I would like to ignore results that are older than a certain date, I have attempted to do this with the following:
date_threshold = datetime.date.today() - datetime.timedelta(days=30)
for each in data["items"]:
    date = ['date']
    type = ['type']
    if date < date_threshold and type is "RM01":
        print("wwwwww")
In case it isn't clear, what I'm attempting to do (albeit very badly) is assign each of the entries to a variable, which then gets tested against certain criteria.
This doesn't work, though; Python spits out a type mismatch error:
TypeError: unorderable types: list() < datetime.date()
Which makes me think the date is being stored as a string, and so I can't compare it to the datetime value set earlier, but when I check the API documentation (https://developer.companieshouse.gov.uk/api/docs/company/company_number/filing-history/filingHistoryItem-resource.html), it says clearly that the 'date' entry is returned as a date type.
What am I doing wrong? It's very clear that I'm extremely new to Python, given what I presume is the atrocity of my code, but in my head it seems to make at least a little sense. In case none of this is clear, I basically want to go through all the entries in the json file, and if the date and type match a certain description, then other code can be executed (in this case I have just used random text).
Any help is greatly appreciated! Let me know if you need anything cleared up.
:)
EDIT
After tweaking my code to the below:
for each in data["items"]:
    date = each['date']
    type = each['type']
    if date is '2016-09-15' and type is "RM01":
        print("wwwwww")
The code executes without any errors, but the words aren't printed, even though I know there is an entry in the json file with that exact date and that exact type. Any thoughts?
SOLUTION:
Thanks to everyone for helping me out. I had made a couple of very basic errors; the code that works as expected is below:
for each in data["items"]:
    date = each['date']
    typevariable = each['type']
    if date == '2016-09-15' and typevariable == "RM01":
        print("wwwwww")
This prints the word "wwwwww" 3 times, which is correct seeing as there are 3 entries in the JSON that fulfil those criteria.
You need to first convert your date variable to a datetime type using datetime.strptime().
In your original attempt you are comparing a list type variable date (date = ['date'] builds a literal one-element list rather than reading the key) with the datetime type variable date_threshold.
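A short sketch of what that looks like in context, assuming the API's dates are plain 'YYYY-MM-DD' strings (as in the '2016-09-15' example above) and reusing the data dict from the question:

import datetime

date_threshold = datetime.date.today() - datetime.timedelta(days=30)

for each in data["items"]:
    # Parse the date string into a real date object so it can be compared
    item_date = datetime.datetime.strptime(each['date'], "%Y-%m-%d").date()
    if item_date >= date_threshold and each['type'] == "RM01":
        print("wwwwww")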
I don't have much experience in Python and I've run into a problem converting SQL query data, which is technically a list containing a JSON string, into a Python dictionary. I'm querying an sqlite3 database like this:
def get_answer(self, id):
    self.__cur.execute('select answer from some_table where id= %s;' % id)
    res_data = self.__cur.fetchall()
    return res_data
The data is a single JSON-format element whose simplified version looks like this:
[
    {"ind": [{"v": 101}, {"v": 102}, {"v": 65}]},
    {"ind": [{"v": 33}, {"v": 102}, {"v": 65}]}
]
But when I try to convert the res_data to JSON, with code like this:
temp_json = simplejson.dumps(get_answer(trace_id))
it returns a string, and when I take len(temp_json) it gives the number of characters in res_data instead of the number of objects in res_data. However, if I use Visual Studio's JSON visualizer on what get_answer(trace_id) returns, it properly shows each of the objects in res_data.
I also tried to convert the res_data to a dictionary with code like this:
dict_data = ast.literal_eval(Json_data)
or
dict_data = ast.literal_eval(Json_data[0])
and in both cases it throws a "malformed string" exception. I tried to write it to a file and read it back as JSON, but that didn't work either.
Before doing that, I had copy-pasted res_data manually and used:
with open(file_name) as json_file:
    Json_data = simplejson.load(json_file)
and it worked like a charm. I've been experimenting with different approaches suggested on SO and elsewhere, but although the problem seems very straightforward, I still haven't found a solution, so your help is highly appreciated.
OK, I finally found the solution:
states = json.loads(temp_json[0][0])
One confusing issue was that states = json.loads(temp_json[0]) was throwing the "Expected string or buffer" exception, and temp_json was a list containing only one element, so I didn't think I would get anything from temp_json[0][0].
I hope it helps others too!
I think you are confusing the data formats. What you supposedly get back from your database wrapper seems to be a list of dictionaries (it is not SQL either - your question title is misleading). But I actually think that sqlite3 would give you a list of tuples.
JSON is a text format, or more precisely the serialization of an object to a string. That's why json.dumps (= dump to string) results in a string, and json.loads (= load string) results in a Python object (a dictionary or a list of dictionaries).
json.dumps does NOT mean "dump JSON to string". It is rather the .dumps method of the json module which takes a Python object (dict, list) and serializes it to JSON.
I will extend the answer if I understand what exactly you want to achieve, but to be clear: json.dumps() gives you JSON, which is a string format. In contrast, simplejson.load() gives you a Python list or dict.
Did you try json.loads(), just in case the return from your database is actually a string (which you could easily test)?
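Here is a small self-contained sketch, reusing the table and column names from the question, showing why the double index is needed: fetchall() returns a list of tuples, one per row, so the JSON text sits at [0][0]:

import json
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('create table some_table (id integer, answer text)')
cur.execute('insert into some_table values (1, ?)',
            ('[{"ind": [{"v": 101}, {"v": 102}]}, {"ind": [{"v": 33}]}]',))

cur.execute('select answer from some_table where id = ?', (1,))
res_data = cur.fetchall()            # [('[{"ind": ...}]',)] -> a list of 1-tuples
states = json.loads(res_data[0][0])  # first row, first column -> a Python list of dicts
print(len(states))                   # 2 objects, not the number of characters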