Extracting multiple nested JSON keys at a time - python

How do I go about extracting more than one JSON key at a time, given this script? The script cycles through a list of message ids and extracts the JSON response; I only want to extract certain keys from the response.
import urllib3
import json
import csv
from progressbar import ProgressBar
import time
pbar = ProgressBar()
base_url = 'https://api.pipedrive.com/v1/mailbox/mailMessages/'
fields = {"include_body": "1", "api_token": "token"}
json_arr = []
http = urllib3.PoolManager()
with open('ten.csv', newline='') as csvfile:
    for x in pbar(csv.reader(csvfile, delimiter=' ', quotechar='|')):
        r = http.request('GET', base_url + "".join(x), fields=fields)
        mails = json.loads(r.data.decode('utf-8'))
        json_arr.append(mails['data']['from'][0]['id'])
print(json_arr)
This works as intended. But I want to do the following.
json_arr.append(mails(['data']['from'][0]['id'],['data']['to'][0]['id'])
Which results in TypeError: list indices must be integers or slices, not str

Did you mean:
json_arr.append(mails['data']['from'][0]['id'])
json_arr.append(mails['data']['to'][0]['id'])

The answer already posted looks good but I'll share the one-liner equivalent, using extend() instead of append():
json_arr.extend([mails['data']['from'][0]['id'], mails['data']['to'][0]['id']])
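If you end up pulling several fields per message, a small helper keeps the loop readable. A minimal sketch based on the response shape shown above (extract_ids is a made-up name):
def extract_ids(mails):
    # Sender id and first recipient id, following the key paths used above
    return [mails['data']['from'][0]['id'], mails['data']['to'][0]['id']]

json_arr.extend(extract_ids(mails))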

Related

How to fix: TypeError: normalize() argument 2 must be str, not list

I'm making an API call that pulls the desired endpoints from ...url/articles.json and transforms them into a CSV file. My problem here is that the ['labels_name'] endpoint is a string with multiple values (an article might have multiple labels).
How can I pull multiple values of a string without getting this error?
"File "articles_labels.py", line 40, in <module>
decode_3 = unicodedata.normalize('NFKD', article_label)
TypeError: normalize() argument 2 must be str, not list"
import requests
import csv
import unicodedata
import getpass
url = 'https://......./articles.json'
user = ' '
pwd = ' '
csvfile = 'articles_labels.csv'
output_1 = []
output_1.append("id")
output_2 = []
output_2.append("title")
output_3 = []
output_3.append("label_names")
output_4 = []
output_4.append("link")
while url:
    response = requests.get(url, auth=(user, pwd))
    data = response.json()
    for article in data['articles']:
        article_id = article['id']
        decode_1 = int(article_id)
        output_1.append(decode_1)
    for article in data['articles']:
        title = article['title']
        decode_2 = unicodedata.normalize('NFKD', title)
        output_2.append(decode_2)
    for article in data['articles']:
        article_label = article['label_names']
        decode_3 = unicodedata.normalize('NFKD', article_label)
        output_3.append(decode_3)
    for article in data['articles']:
        article_url = article['html_url']
        decode_4 = unicodedata.normalize('NFKD', article_url)
        output_4.append(decode_4)
    print(data['next_page'])
    url = data['next_page']
print("Number of articles:")
print(len(output_1))
with open(csvfile, 'w') as fp:
    writer = csv.writer(fp, dialect='excel')
    writer.writerows([output_1])
    writer.writerows([output_2])
    writer.writerows([output_3])
    writer.writerows([output_4])
My problem here is that the ['labels_name'] endpoint is a string with multiple values.(an article might have multiple labels) How can I pull multiple values of a string
It's a list, not a string: you don't have "a string with multiple values", you already have a list of multiple strings, as-is.
The question is what you want to do with them. CSV certainly isn't going to handle a list in a single cell, so you must decide on a way to serialise a list of strings to a single string, e.g. by joining them together (with some separator like a space or comma) or by just picking the first one (beware of handling the case where there is none). Either way, the issue is not really technical.
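For instance, a minimal sketch that joins the labels into a single CSV-friendly cell, reusing the names from the code above:
article_label = article['label_names']  # already a list of strings
# Join into one cell value; an empty list simply yields ''
output_3.append(', '.join(article_label))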
unicodedata.normalize takes a unicode string, not a list, as the error says. The correct way to use unicodedata.normalize is (example taken from "How does unicodedata.normalize(form, unistr) work?"):
from unicodedata import normalize
print(normalize('NFD', u'\u00C7'))
print(normalize('NFC', u'C\u0327'))
#Ç
#Ç
Hence you need to make sure that in unicodedata.normalize('NFKD', title), title is a unicode string.
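If you want to keep the NFKD normalisation from the question, apply it to each string in the list rather than to the list itself. A sketch reusing the question's variables:
article_label = article['label_names']  # a list of strings
decode_3 = [unicodedata.normalize('NFKD', label) for label in article_label]
output_3.append(', '.join(decode_3))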

Process special JSON with keys as numbers

I want to extract data from a file into a dictionary via json.loads. Example:
{725: 'pitcher, ewer',
726: "plane, carpenter's plane, woodworking plane"}
json.loads can't handle the keys as numbers.
Some values use double quotes ("") and others single quotes ('').
Any suggestions?
Code
import requests
import re

url = url
r = requests.get(url)
response = r.text.replace('\n', '')
response = re.sub(r':(\d+):*', r'"\1"', response)
The file you supplied seems to be a valid Python dict, so I suggest an alternative approach with literal_eval.
from ast import literal_eval
data = literal_eval(r.text)
print(data[726])
Output: plane, carpenter's plane, woodworking plane
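For reference, a self-contained sketch using the sample from the question; literal_eval accepts integer keys and both quote styles:
from ast import literal_eval

raw = """{725: 'pitcher, ewer',
726: "plane, carpenter's plane, woodworking plane"}"""

data = literal_eval(raw)
print(data[725])  # pitcher, ewer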
If you would still rather use json, you can try wrapping the keys in quotes with a regex (note that json also requires double-quoted strings, so any single-quoted values would still trip it up):
import json
import re

s = re.sub(r"(?m)^(\W*)(\d+)\b", r'\1"\2"', r.text)
data = json.loads(s)
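A quick check of what that substitution produces on one double-quoted line of the sample:
import re

raw = '725: "pitcher, ewer"'
print(re.sub(r"(?m)^(\W*)(\d+)\b", r'\1"\2"', raw))
# "725": "pitcher, ewer"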

Simple forvalues loop in python?

Is there a simple way in Python to loop over a simple list of numbers? I want to scrape some data from different URLs that differ only in 3 numbers. I'm quite new to Python and couldn't figure out an easy way to do it.
Thanks a lot!
Here's my code:
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.example.com/3322")
bsObj = BeautifulSoup(html)
table = bsObj.findAll("table",{"class":"MainContent"})[0]
rows=table.findAll("td")
csvFile = open("/Users/Max/Desktop/file1.csv", 'wt')
writer = csv.writer(csvFile)
try:
    for row in rows:
        csvRow = []
        for cell in row.findAll(['tr', 'td']):
            csvRow.append(cell.get_text())
        writer.writerow(csvRow)
finally:
    csvFile.close()
In Stata this would be like:
foreach i in 13 34 55 67 {
    html = urlopen("http://www.example.com/`i'")
    ....
}
Thanks a lot!
Max
I've broken your original code into functions simply to make clearer what I think is the answer to your question: use a simple loop, and .format() to construct urls and filenames.
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
def scrape_url(url):
    html = urlopen(url)
    bsObj = BeautifulSoup(html)
    table = bsObj.findAll("table", {"class": "MainContent"})[0]
    rows = table.findAll("td")
    return rows

def write_csv_data(path, rows):
    csvFile = open(path, 'wt')
    writer = csv.writer(csvFile)
    try:
        for row in rows:
            csvRow = []
            for cell in row.findAll(['tr', 'td']):
                csvRow.append(cell.get_text())
            writer.writerow(csvRow)
    finally:
        csvFile.close()

for i in (13, 34, 55, 67):
    url = "http://www.example.com/{}".format(i)
    csv_path = "/Users/MaximilianMandl/Desktop/file-{}.csv".format(i)
    rows = scrape_url(url)
    write_csv_data(csv_path, rows)
I would use set.intersection() for that:
mylist = [1, 16, 8, 32, 7, 5]
fieldmatch = [5, 7, 16]
intersection = list(set(mylist).intersection(fieldmatch))
I'm not familiar with Stata, but it looks like the Python equivalent might be simply:
import requests

for i in [13, 34, 55, 67]:
    response = requests.get("http://www.example.com/{}".format(i))
    ....
The simplest way to do this is to apply the filter inside the loop:
mylist = [1, 16, 8, 32, 7, 5]
for myitem in mylist:
    if myitem in (5, 7, 16):
        print(myitem)  # or: print myitem in Python 2
This may not, however, be the most elegant way to do it. If you wanted to store a new list of the matching results, you can use a list comprehension:
mylist = [1, 16, 8, 32, 7, 5]
fieldmatch = [5, 7, 16]
filteredlist = [x for x in mylist if x in fieldmatch]
You can then take filteredlist which contains only the items in mylist that match fieldmatch (in other words your original list filtered by your criteria) and iterate over it like any other list:
for myitem in filteredlist:
    # Perform whatever process you want on each item here
    do_something_with(myitem)
Hope this helps.

About file I/O in python

I want to read a txt file and store it as a list of strings. This is the way I came up with myself, and it looks really clumsy. Is there any better way to do this? Thanks.
import re
import urllib2
import numpy as np

url = 'http://quant-econ.net/_downloads/graph1.txt'
response = urllib2.urlopen(url)
txt = response.read()
f = open('graph1.txt', 'w')
f.write(txt)
f.close()
f = open('graph1.txt', 'r')
nodes = f.readlines()
I tried the solutions provided below, but they all actually return something different from my previous code.
This is the string produced by split():
'node0, node1 0.04, node8 11.11, node14 72.21'
This is what my code produces:
'node0, node1 0.04, node8 11.11, node14 72.21\n'
The problem is that without the '\n', when I try to process the string list, I run into an index error:
" row = index[0] IndexError: list index out of range "
for node in nodes:
    index = re.findall('(?<=node)\w+', node)
    index = map(int, index)
    row = index[0]
    del index[0]
According to the documentation, response is already a file-like object: you should be able to do response.readlines().
For those problems where you do need to create an intermediate file like this, though, you want to use io.StringIO.
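A minimal sketch of the in-memory approach (assuming txt has already been decoded to a text string):
import io

buf = io.StringIO(txt)   # behaves like an open text file
nodes = buf.readlines()  # keeps the trailing '\n' on each line, like file.readlines()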
Look at split. So:
nodes = response.read().split("\n")
EDIT: Alternatively if you want to avoid \r\n newlines, use splitlines.
nodes = response.read().splitlines()
Try:
url = 'http://quant-econ.net/_downloads/graph1.txt'
response = urllib2.urlopen(url)
txt = response.read()
with open('graph1.txt', 'w') as f:
    f.write(txt)
nodes = txt.split("\n")
If you don't want the file, this should work:
url = 'http://quant-econ.net/_downloads/graph1.txt'
response = urllib2.urlopen(url)
txt = response.read()
nodes = txt.split("\n")
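For what it's worth, a Python 3 sketch of the same idea (urllib2 became urllib.request, and read() returns bytes that need decoding; splitlines() also avoids the trailing empty entry that split("\n") leaves when the text ends in a newline):
from urllib.request import urlopen

url = 'http://quant-econ.net/_downloads/graph1.txt'
txt = urlopen(url).read().decode('utf-8')
nodes = txt.splitlines()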

Python - Data Splitting and Extraction

I am using the Twitch API and I am having issues understanding how to extract data from it.
I call the API and this is the sort of response I get:
"name":"user1", "game":"game1","name":"user2", "game":"game2"
I know I will need to use some .split()'s but I cannot work out how, as each time I try I get a blank output.
The data I need is the user1, game1, user2, game2.
This data is repeated several times, and I cannot find out how to extract it from the mass of other data given.
Any links or advice will be gratefully received; I cannot find any reference to large data extraction like this.
EDIT
After being advised it was JSON data, I edited the code to parse it appropriately. But I keep getting the error: AttributeError: 'unicode' object has no attribute 'get'
Here is the code:
import urllib2
import json

url = "https://api.twitch.tv/kraken/channels/'Mychannel'/follows/"

if __name__ == "__main__":
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    f = opener.open(req)
    json = json.load(f)
    for item in json:
        print item.get('name')
Any suggestions as to why this error is occurring?
The response is json data; use the json module to parse it.
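On the follow-up error: json.load returns a dict here, and iterating over a dict yields its keys (unicode strings), which have no .get method. Note also that json = json.load(f) shadows the json module. A sketch of the fix, where the 'follows' key and the nested 'user' object are assumptions about this endpoint's response shape:
data = json.load(f)                   # don't reuse the name 'json'
for item in data.get('follows', []):  # 'follows' is an assumed top-level key
    user = item.get('user', {})       # assumed per-follower object
    print user.get('name')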
Assuming you are receiving a string back, such that:
>>> a = '"name":"user1", "game":"game1","name":"user2", "game":"game2"'
>>> a
'"name":"user1", "game":"game1","name":"user2", "game":"game2"'
You can get your first split by splitting on ",":
>>> mlist = a.split(",")
>>> mlist
['"name":"user1"', ' "game":"game1"', '"name":"user2"', ' "game":"game2"']
Now you can access the data of each element by looping:
>>> for e in mlist:
...     print("Data:", e.split(":")[1])
('Data:', '"user1"')
('Data:', '"game1"')
('Data:', '"user2"')
('Data:', '"game2"')
