Get values from a dictionary in a JSON file - Python

I need to get all the bodyHtml and authorId values from the file that appears here: https://drive.google.com/file/d/10EGOAWsw3G5-ETUryYX7__JPOfNwUsL6/view?usp=sharing
I have tried several approaches, but I always get this error: TypeError: list indices must be integers, not str. This is my latest code:
# -*- coding: utf-8 -*-
import json
import requests
import datetime

data = json.loads(open('file.json').read())
coments = data['headDocument']['content']['id']
for comment in data['headDocument']['content']['content']['bodyHtml']:
    info = comment
    print(info)
and get this error:
Traceback (most recent call last):
  File "coments.py", line 16, in <module>
    for comment in data['headDocument']['content']['content']['bodyHtml']:
TypeError: list indices must be integers, not str
Can anyone help with this problem?

Your headDocument['content'] is a list, so you should loop through it. Like this:
for item in data['headDocument']['content']:
    print(item['content']['bodyHtml'])
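Since the question asks for both bodyHtml and authorId, here is a minimal sketch collecting the two fields together. The sample data below is hypothetical and only mirrors the nesting described above (a list under headDocument['content'], each item holding its own 'content' dict):

```python
# Hypothetical payload mirroring the structure from the question:
# headDocument['content'] is a LIST of items, so index it with a loop,
# not with a string key.
data = {
    "headDocument": {
        "content": [
            {"content": {"bodyHtml": "<p>first</p>", "authorId": "u1"}},
            {"content": {"bodyHtml": "<p>second</p>", "authorId": "u2"}},
        ]
    }
}

comments = []
for item in data["headDocument"]["content"]:
    inner = item["content"]
    comments.append((inner["authorId"], inner["bodyHtml"]))

print(comments)
# [('u1', '<p>first</p>'), ('u2', '<p>second</p>')]
```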

Related

Biopython SeqIO: AttributeError: 'str' object has no attribute 'id'

I am trying to filter out sequences using SeqIO but I am getting this error.
Traceback (most recent call last):
  File "paralog_warning_filter.py", line 61, in <module>
    ...
    SeqIO.write(desired_proteins, "filtered.fasta","fasta")
AttributeError: 'str' object has no attribute 'id'
I checked other similar questions but still couldn't understand what is wrong with my script.
Here is the relevant part of the script I am trying:
fh = open('lineageV_paralog_warning_genes.fasta')
for s_record in SeqIO.parse(fh, 'fasta'):
    name = s_record.id
    seq = s_record.seq
    for i in paralogs_in_all:
        if name.endswith(i):
            desired_proteins = seq
            output_file = SeqIO.write(desired_proteins, "filtered.fasta", "fasta")
            output_file
fh.close()
I have a separate paralogs_in_all list, and that is the ID source. When I print name, it returns proper string IDs in this format: >coronopifolia_tair_real-AT2G35040.1#10.
Can you help me understand my problem? Thanks in advance.
Try this and let us know (I can't test your code):
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
......
.......
desired_proteins = []
fh = open('lineageV_paralog_warning_genes.fasta')
for s_record in SeqIO.parse(fh, 'fasta'):
    name = s_record.id
    seq = s_record.seq
    for i in paralogs_in_all:
        if name.endswith(i):
            # desired_proteins = SeqRecord(Seq(seq), id=name)  # not needed: seq is already a Seq object, see below
            desired_proteins.append(SeqRecord(seq, id=name, description=""))  # description="" removes the <unknown description> that would otherwise be written
output_file = SeqIO.write(desired_proteins, "filtered.fasta", "fasta")  # writes all collected records at once; don't know how to have SeqIO.write append to a file instead of re-writing all of it
fh.close()
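The key change in the answer is the accumulate-then-write pattern: collect every match into a list, then write once after the loop. It can be sketched without Biopython so it runs standalone; the records and suffix list below are hypothetical (id, sequence) tuples standing in for SeqRecord objects:

```python
# Hypothetical stand-ins for parsed FASTA records and the suffix list.
records = [
    ("coronopifolia-AT2G35040.1#10", "ATGC"),
    ("other-AT1G00001.1#2", "GGCC"),
]
paralogs_in_all = ["AT2G35040.1#10"]

# Accumulate matches instead of overwriting a single variable.
desired = []
for name, seq in records:
    if any(name.endswith(suffix) for suffix in paralogs_in_all):
        desired.append((name, seq))

# Write once, after the loop, so earlier matches are not lost.
lines = [">%s\n%s" % (name, seq) for name, seq in desired]
print("\n".join(lines))
```

Writing inside the loop, as in the original script, re-creates the output file on every match, which is why only the last record would survive.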

Problem parsing JSON from the Supreme website in Python

First of all this is my code:
import urllib.request, json

with urllib.request.urlopen("https://www.supremenewyork.com/mobile_stock.json") as url:
    data = json.loads(url.read().decode())
print(type(data['Shoes']))
I'm trying to parse the JSON that I get from the Supreme website with Python, and to filter out the 'Shoes' array. Printing the whole data works, but when I try to access the 'Shoes' array I get this error:
Traceback (most recent call last):
  File "C:\Users\-\PycharmProjects\-\test.py", line 7, in <module>
    print(type(data['Shoes']))
KeyError: 'Shoes'
The JSON is too long to post here, but you can find it at:
https://www.supremenewyork.com/mobile_stock.json
If you look at the JSON, the shoes are under products_and_categories:
data["products_and_categories"]["Shoes"]
This will get you the data you need, for example:
print(data['products_and_categories']['Shoes'])
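When a KeyError like this appears, printing the top-level keys is a quick way to see where the data actually lives. A tiny hypothetical stand-in for the payload (the real file has many more keys and entries):

```python
import json

# Hypothetical miniature of mobile_stock.json, showing why data['Shoes']
# raises KeyError: 'Shoes' is nested one level down.
raw = '{"products_and_categories": {"Shoes": [{"name": "example"}]}}'
data = json.loads(raw)

print(list(data.keys()))  # inspect the top level first
print(data["products_and_categories"]["Shoes"])
```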

How to retrieve the last 5 characters from a dictionary value after converting to string

I'm writing a Python script that logs into a server and pulls router data through an API. The pulled data is then used to create a dictionary with router names as the key and a telnet URL as the value. Here is an example of the URL that's collected.
telnet://192.168.1.113:32769
The telnet port is the last 5 characters of the URL and I'm trying to pull only that information. I know with a string I can use (-5), but I'm getting the following error.
Traceback (most recent call last):
  File "C:\Users\b\Documents\Atom Test1 Project\test_wip.py", line 41, in <module>
    test_value2=test_value.split(-5)
TypeError: must be str or None, not int
[Finished in 1.812s]
I think this means I need to convert it into a string. I tried converting it and then retrieving the last 5 characters, but it's not working. Here is my code.
from __future__ import unicode_literals, print_function
import eve
import json
import time
from netmiko import ConnectHandler, redispatch
#from resteve import eve

address = '192.168.1.113'
instance = m11.Server(address)
instance.login('admin', 'password', '0')
users = instance.get_all_nodes()
payload = json.loads(users.content)
data = payload['data']
result = {}
for item in payload["data"].values():
    result[item["name"]] = item["url"]
    test_value = item['url']
    print(test_value)
    test_value.format(str)
    test_value2 = test_value.split(-5)
    print(test_value2)
I'm new at this and still putting it all together so any help is greatly appreciated. Thanks.
To get the last 5 characters, use slicing: test_value[-5:]. .split() fails because it expects a string as its first argument (the separator to split on), not an integer.
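A short sketch of both options, assuming Python 3: the slice from the answer, plus urlparse from the standard library, which extracts the port regardless of how many digits it has:

```python
from urllib.parse import urlparse

url = "telnet://192.168.1.113:32769"

# Simple slice: the last 5 characters of the string.
print(url[-5:])            # 32769

# More robust: parse the URL, so a 4-digit port would also work.
print(urlparse(url).port)  # 32769 (as an int)
```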

Python for items list, to individual string?

Just wondering if someone can be kind enough to tell me what I am doing wrong.
I'm working on a bit of code for use with my cherrypy project.
import glob
import os

def get_html(apath):
    print (apath)
    fhtml = open(apath,'r')
    data = fhtml.read()
    return data

article_path = os.getcwd() +'/articles/*.html'
myArticles = [glob.glob(article_path)]
print len(myArticles)
for items in myArticles:
    get_html(items)
Results in:
['/home/sd/Documents/projects/cherryweb/articles/Module 5-Exploitation Techniques And Exercise.pdf.html']
Traceback (most recent call last):
  File "fntest.py", line 22, in <module>
    get_html(items)
  File "fntest.py", line 10, in get_html
    fhtml = open(apath,'r')
TypeError: coercing to Unicode: need string or buffer, list found
I assume it's because the filename I'm passing has the [' and '] from the list around the string.
I could write a function to trim these parts off, but is that the only option, or am I doing something stupid?
Thanks
myArticles = [glob.glob(article_path)]
should be:
myArticles = glob.glob(article_path)
glob.glob returns a list, by adding [] around it you made it a list of list.
So, in this loop:
for items in myArticles:
    get_html(items)
you actually passed the whole list to get_html and open raised that error.
Demo:
>>> open([])
Traceback (most recent call last):
  File "<ipython-input-242-013dc85bc958>", line 1, in <module>
    open([])
TypeError: coercing to Unicode: need string or buffer, list found
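The list-of-list wrapping can be reproduced deterministically; the temp directory and file below are hypothetical, created only so the glob matches something:

```python
import glob
import os
import tempfile

# Create one throwaway .html file so the pattern matches (hypothetical setup).
tmp = tempfile.mkdtemp()
open(os.path.join(tmp, "a.html"), "w").close()
pattern = os.path.join(tmp, "*.html")

wrapped = [glob.glob(pattern)]  # the bug: a list containing one list
fixed = glob.glob(pattern)      # correct: a list of path strings

print(type(wrapped[0]).__name__)  # list -> this is what open() received
print(type(fixed[0]).__name__)    # str  -> this is what open() needs
```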

renderContents in beautifulsoup (python)

The code I'm trying to get working is:
h = str(heading)
# '<h1>Heading</h1>'
heading.renderContents()
I get this error:
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    print h.renderContents()
AttributeError: 'str' object has no attribute 'renderContents'
Any ideas?
I have a string with HTML tags and I need to clean it; if there is a different way of doing that, please suggest it.
Your error message and your code sample don't line up. You say you're calling:
heading.renderContents()
But your error message says you're calling:
print h.renderContents()
Which suggests that perhaps you have a bug in your code, trying to call renderContents() on a string object that doesn't define that method.
In any case, it would help if you checked what type of object heading is to make sure it's really a BeautifulSoup instance. This works for me with BeautifulSoup 3.2.0:
from BeautifulSoup import BeautifulSoup
heading = BeautifulSoup('<h1>heading</h1>')
repr(heading)
# '<h1>heading</h1>'
print heading.renderContents()
# <h1>heading</h1>
print str(heading)
# '<h1>heading</h1>'
h = str(heading)
print h
# <h1>heading</h1>
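If the actual goal is just to strip tags from a string, the standard library can do it without BeautifulSoup. A minimal sketch, assuming Python 3 (the answer above uses Python 2 and BeautifulSoup 3, so treat this as an alternative, not the same API):

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collects only the text nodes, discarding all tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(html):
    stripper = TagStripper()
    stripper.feed(html)
    return "".join(stripper.parts)

print(strip_tags("<h1>Heading</h1>"))  # Heading
```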
