I'm learning and would appreciate any help with this code.
The issue is trying to print the values in the data that are contained in one line of the JSON using Python.
import json
import requests
data = json.loads(response.text)
print(len(data)) #showing correct value
#where I'm going wrong: below, obviously this will print the first value then the second, as it's indexed. Q: how do I print all values using separate print statements when the total number of items is unknown?
for item in data:
print(data[0]['full_name'])
print(data[1]['full_name'])
I tried without the index value, but this gave me the first value multiple times, depending on the length.
I expect to be able to access from the JSON file each indexed value separately even though they are named the same thing "full_name" for example.
import json
import requests
data = json.loads(response.text)
print(len(data)) #showing correct value
for item in data:
    print(item['full_name'])
#indexing by hand only works for positions that exist: Python indexes start at 0, so data[1] raises IndexError when data has fewer than two items
print(data[0]['full_name'])
print(data[1]['full_name'])
Hope this helps.
Presuming data is a list of dictionaries, where each dictionary contains a full_name key:
for item in data:
    print(item['full_name'])
This code sample from your post makes no sense:
for item in data:
print(data[0]['full_name'])
print(data[1]['full_name'])
Firstly it's a syntax error because there is nothing indented underneath the loop.
Secondly it's a logic error, because the loop variable is item but then you never refer to that variable.
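As a minimal sketch of the loop both answers describe: the snippet below parses a small made-up JSON string in place of response.text (the names and the third record are assumptions for illustration, not data from the question) and collects every full_name without knowing the length in advance.

```python
import json

# Hypothetical stand-in for response.text; the real API payload is not shown in the question.
response_text = '[{"full_name": "Ada Lovelace"}, {"full_name": "Alan Turing"}, {"id": 3}]'
data = json.loads(response_text)

names = []
for index, item in enumerate(data):
    # item is the current dictionary, so no manual data[0], data[1], ... is needed
    name = item.get('full_name')  # .get() returns None instead of raising KeyError
    if name is None:
        print(f"item {index} has no full_name")
        continue
    names.append(name)
    print(name)
```

Using .get() here is a design choice rather than a requirement: if every record is known to have the key, item['full_name'] is enough.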
Related
I'm trying to pull data from an API for a fantasy football project I'm working on. You can pull data on various players using the url: 'https://fantasy.premierleague.com/api/element-summary/i/' where i is a number that relates to a player and runs from 1 to 400 or so.
I wrote code that pulls this data for a specific player and stores it as a dataframe for future analysis using the code:
import pandas as pd
import json
import requests
from pandas import json_normalize  # public import location since pandas 1.0
r = requests.get('https://fantasy.premierleague.com/api/element-summary/1/')
r_json = r.json()
r_df = pd.DataFrame(r_json['history'])
r_df.head()
And this works great. The issue is it's only for 1 player and there are lots of them, so what I want is a DataFrame that contains this data for all of the players. I figured I could use a for loop for this but I can't get it to work. I'm trying the code:
import pandas as pd
import json
import requests
from pandas import json_normalize  # public import location since pandas 1.0
for i in range(5):
    r = requests.get('https://fantasy.premierleague.com/api/element-summary/{}/'.format(i))
    r_json = r.json()
    r_df = pd.DataFrame(r_json['history'])
    r_df.head()
Where I've put the logic in a for loop, but I get the error:
KeyError: 'history'
When I try to run this. Why does it not like the line r_df= pd.DataFrame(r_json['history']) when it's in a for loop when it's OK outside of one?
Any help appreciated!
Thanks!
This is because your loop is trying to query 'https://fantasy.premierleague.com/api/element-summary/0/' on the first iteration, which doesn't exist. (Or, rather, gives you the JSON {"detail": "Not found."}, which doesn't have a "history" key.)
The built-in range function generates integers starting from zero by default. Try changing the loop range to range(1, 5) instead. (Also note that the end of the range is exclusive, so this will give you integers from 1 to 4.)
A KeyError indicates that the dictionary does not have this specific key.
Your specific problem here is that the range() function is generating the following range: [0, 1, 2, 3, 4] because if you don't specify an initial value, it will take 0 as an initial value and stop at one short of the final value.
And so https://fantasy.premierleague.com/api/element-summary/0/ doesn't return any player data, which means the parsed JSON has no 'history' key.
To fix this, change range(5) to range(1, 6).
Note: the syntax of the range() function is: range(starting_point, ending_point, step)
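To go one step further towards the "all players in one table" goal, here is a rough sketch that gathers the history rows of several ids and skips ids that come back as "Not found." instead of crashing. It does not call the real API: fake_fetch, the ids it knows, and the field names element and total_points are all assumptions made up for illustration.

```python
def fake_fetch(player_id):
    """Hypothetical stand-in for requests.get(...).json(); only ids 1-3 exist here."""
    if player_id not in (1, 2, 3):
        return {"detail": "Not found."}
    # "element" and "total_points" are made-up field names for illustration
    return {"history": [{"element": player_id, "total_points": 2 * player_id}]}


def collect_history(player_ids, fetch=fake_fetch):
    """Gather the 'history' rows of every player id that actually has them."""
    rows = []
    for player_id in player_ids:
        payload = fetch(player_id)
        # .get() falls back to an empty list, so a "Not found." reply adds nothing
        rows.extend(payload.get("history", []))
    return rows


rows = collect_history(range(0, 5))  # ids 0 and 4 are skipped, not fatal
print(len(rows))
```

With the real API, fetch could presumably be a small function wrapping requests.get(...).json(), and pd.DataFrame(rows) would then build a single frame from the combined rows.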
I am making a price statistics project with Python, and I have a problem with scraping data from an API. The API is https://www.rolimons.com/api/activity
I want to get prices from the API, which are the last 2 values from one block.
For example, from [1588247532, 0, "1028606", 464, 465] I would need 464 and 465 only. Also I want to do this for all tables.
How can I do that? Here is the code I have so far:
import requests
import json
r = requests.get('https://www.rolimons.com/api/activity')
content = json.loads(r.content.decode())
for key, value in content.items():
    print(key)
Give this a go:
for value in content['activities']:
    print(value[-2:])
It iterates through activities and prints the last two items of each value.
Or you can collect the prices in a separate list to use later on like so:
prices = [value[-2:] for value in content['activities']]
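If the two prices are needed as separate values rather than as a two-item slice, unpacking is one option. This is only a sketch on sample rows: the first row is the one quoted in the question, the second is made up, and the names first_price and second_price are placeholders because the question doesn't say what the two numbers mean.

```python
# One row copied from the question plus one made-up row for illustration
activities = [
    [1588247532, 0, "1028606", 464, 465],
    [1588247540, 0, "1029025", 1200, 1150],  # hypothetical values
]

price_pairs = []
for row in activities:
    *_, first_price, second_price = row  # keep only the last two entries
    price_pairs.append((first_price, second_price))

print(price_pairs)
```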
I recommend adding print statements whenever you are not sure how the data is structured or why something happens. See below; it might help give a visual of what is going on.
import requests
import json
r = requests.get('https://www.rolimons.com/api/activity')
content = json.loads(r.content.decode())
for key, value in content.items():
    print("Key: ", key)
    print("content[key]: ", content[key])
for array in content["activities"]:
    print("array: ", array)
    print("array[-1]:", array[-1])
    print("array[-2]:", array[-2])
I have a json file for tweet data. The data that I want to look at is the text of the tweet. For some reason, some of the tweets are too long to put into the normal text part of the dictionary.
It seems like there is a dictionary within another dictionary and I can't figure out how to access it very well.
Basically, what I want in the end is one column of a data frame that will have all of the text from each individual tweet. Here is a link to a small sample of the data that contains a problem tweet.
Here is the code I have so far:
import json
import pandas as pd
tweets = []
#This reads the JSON file line by line so that I can work with it. This part works correctly.
with open("filelocation.txt") as source:
    for line in source:
        if line.strip():
            tweets.append(json.loads(line))
print(len(tweets))
df = pd.DataFrame.from_dict(tweets)
df.info()
When looking at the info you can see that there will be a column called extended_tweet that only encompasses one of the two sample tweets. Within this column, there seems to be another dictionary with one of those keys being full_text.
I want to add another column to the dataframe that just has this information along with the normal text column when the full_text is null.
My first thought was to try and read that specific column of the dataframe as a dictionary again using:
d = pd.DataFrame.from_dict(tweets['extended_tweet']['full_text'])
But this doesn't work. I don't really understand why that doesn't work as that is how I read the data the first time.
My guess is that I can't look at the specific names because I am going back to the list and it would have to read all or none. The error it gives me says "KeyError: 'full_text' "
I also tried using the recommendation provided by this website. But this gave me a None value no matter what.
Thanks in advance!
I tried to do what @Dan D. suggested; however, this still gave me errors. But it gave me the idea to try this:
tweets[0]['extended_tweet']['full_text']
This works and gives me the value that I am looking for. But I need to run through the whole thing. So I tried this:
df['full'] = [tweets[i]['extended_tweet']['full_text'] for i in range(len(tweets))]
This gives me "KeyError: 'extended_tweet'".
Does it seem like I am on the right track?
I would suggest flattening out the dictionaries like this:
tweet = json.loads(line)
tweet['full_text'] = tweet['extended_tweet']['full_text']
tweets.append(tweet)
I don't know if the answer suggested earlier works; I never got it to run successfully. But I did figure out something else that works well for me.
What I really needed was a way to display the full text of a tweet. I first loaded the tweets from the JSON file with the code I posted above. Then I noticed that in the data file, there is a field called truncated. If this value is true, the tweet is cut short and the full tweet is placed within
tweets[i]['extended_tweet']['full_text']
In order to access it, I used this:
tweet_list = []
for i in range(len(tweets)):
    # json.loads turns JSON true into the Python bool True, so test it directly
    if tweets[i]['truncated']:
        tweet_list.append(tweets[i]['extended_tweet']['full_text'])
    else:
        tweet_list.append(tweets[i]['text'])
Then I can work with the data using the whole text from each tweet.
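The same fallback can be sketched without checking truncated at all, by asking whether extended_tweet is present. The two sample lines below are invented for illustration (only the key names text, truncated, extended_tweet and full_text come from the posts above), so treat this as a sketch under those assumptions.

```python
import json

# Made-up sample lines in the one-JSON-object-per-line layout the question describes
lines = [
    '{"text": "short tweet", "truncated": false}',
    '{"text": "long tweet cut off...", "truncated": true, '
    '"extended_tweet": {"full_text": "long tweet, complete"}}',
    '',  # blank lines are skipped, as in the original loading code
]
tweets = [json.loads(line) for line in lines if line.strip()]


def full_text(tweet):
    """Return extended_tweet.full_text when it exists, else the plain text."""
    extended = tweet.get('extended_tweet') or {}
    return extended.get('full_text', tweet['text'])


texts = [full_text(tweet) for tweet in tweets]
print(texts)
```

Because texts has one entry per tweet, it could then be assigned as a new column on a DataFrame built from the same tweets list.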
This script is meant to parse Bloomberg finance data to find the GBP value during the day. The script does that; however, what comes back looks like this:
{'dateTime': '2017-01-17T22:00:00Z', 'value': 1.6406}
I don't want the dateTime or the value text there, only the number, and I don't know how to get rid of them. When I try, I get errors like this: list index out of range.
Any answers will be greatly appreciated. Here is the script (in Python 3):
import urllib.request
import json
htmltext = urllib.request.urlopen('https://www.bloomberg.com/markets/api/bulk-time-series/price/GBPAUD%3ACUR?timeFrame=1_DAY').read().decode('utf8')
data = json.loads(htmltext)
datapoints = data[1]['price']
print(datapoints)
This should work for you.
print(data[0]['price'][-1]['value'])
EDIT: To get all the values,
for data_row in data[0]['price']:
    print(data_row['value'])
EXPLANATION: data[0] gets the first and only element of the list, which is a dict. ['price'] gets the list corresponding to the price key. [-1] gets the last element of the list, which is presumably the data you'll be looking for as it's the latest data point.
Finally, ['value'] gets the value of the currency conversion from the dict we obtained earlier.
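The indexing chain explained above can be tried offline on a small sample. In this sketch the second data point is the one shown in the question; the first one, and the assumption that the response is a one-element list holding a price list, are filled in for illustration.

```python
import json

# Sample shaped like the response described above: a list with one dict
# whose 'price' key holds the data points. The first point is made up.
sample = (
    '[{"price": ['
    '{"dateTime": "2017-01-17T21:55:00Z", "value": 1.6401}, '
    '{"dateTime": "2017-01-17T22:00:00Z", "value": 1.6406}'
    ']}]'
)
data = json.loads(sample)

values = [point['value'] for point in data[0]['price']]  # every value, no labels
latest = data[0]['price'][-1]['value']                   # only the newest one

print(values)
print(latest)
```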
I have created the following Python code that reads a method from a webservice:
def GetWeatherParameters():
    """"""
    client = Client('www.address.asmx?wsdl')
    #WebServiceClient.GetWeatherParameters()
    return client.service.GetWeatherParameters()
It works fine and I get the data returned and can print it; however, the returned data contains multiple columns and this code just prints out everything at once.
Does anybody know how I can extract the returned data column by column?
It all depends on the returned data - a handy way to display it nicely is to use pprint:
from pprint import pprint
pprint(your_data)
That'll format it nicely so it's easier to see the structure. Then, if it's a list or similar, you can get the first row with your_data[0], or loop to print it row by row:
for row in your_data:
    print(row)
    print(row[0]) # could be the first column...
And go from there...
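If the rows do turn out to be plain sequences, one way to go from rows to columns is zip(*rows). This is only a sketch: what the web service really returns is not shown above, so the list of tuples and its date, temperature and humidity values are entirely made up.

```python
from pprint import pprint

# Hypothetical rows: (date, temperature, humidity). The real service may
# return a different kind of object that needs converting first.
your_data = [
    ("2017-01-17", 4.5, 81),
    ("2017-01-18", 6.0, 77),
]

pprint(your_data)  # inspect the structure first

columns = list(zip(*your_data))  # transpose rows into columns
dates, temperatures, humidity = columns

print(temperatures)
```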