My objective is to perform a search of Mastodon statuses and return the content (i.e. the text) of any status that matches. The docs suggest I can do this. Can I actually do this?
My Python code is:
import requests
url = 'https://<server>/api/v2/search'
auth = {'Authorization': 'Bearer <token>'}
params = {'q': '<keyword>', 'type':'statuses'}
response = requests.get(url, params=params, headers=auth)
First issue: I get the following response (no matter what keyword I choose and even when my keyword clearly appears in a recent status):
{'accounts': [], 'statuses': [], 'hashtags': []}
Second issue: If I don't restrict the search to statuses, I get results! But they are not what I expect. There's no content key :( While the example results do contain a content key, which was my goal.
{'accounts': [], 'statuses': [], 'hashtags': [{'name': 'hiring', 'url': 'https://data-folks.masto.host/tags/hiring', 'history': [{'day': '1675814400', 'accounts': '5', 'uses': '5'}, {'day': '1675728000', 'accounts': '10', 'uses': '13'}, {'day': '1675641600', 'accounts': '7', 'uses': '7'}, {'day': '1675555200', 'accounts': '6', 'uses': '6'}, {'day': '1675468800', 'accounts': '3', 'uses': '3'}, {'day': '1675382400', 'accounts': '5', 'uses': '6'}, {'day': '1675296000', 'accounts': '9', 'uses': '9'}], 'following': False}]}
Thanks for any help! I'm a beginner and truly appreciate it.
This question was answered for me by the user trwnh over on Mastodon's GitHub discussions, so I am copying the answer here:
"Full-text search across all statuses is not supported. If your server
has configured the optional Elasticsearch backend, then you can
perform limited full-text search against your own posts, favourites,
and bookmarks -- basically, only posts relevant to you. To obtain
content based on a keyword, you must use hashtags."
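Following the hashtag suggestion above, here is a minimal sketch of reading status content from a hashtag timeline. It assumes the server exposes the /api/v1/timelines/tag/:hashtag endpoint and that each returned status carries its text as HTML in the content key. The function names and the example.social host are made up for illustration, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request


def tag_timeline_url(server, hashtag, limit=20):
    """Build the hashtag timeline URL (hashtag is given without the '#')."""
    path = "/api/v1/timelines/tag/" + urllib.parse.quote(hashtag, safe="")
    query = urllib.parse.urlencode({"limit": limit})
    return f"https://{server}{path}?{query}"


def extract_contents(statuses):
    """Return the 'content' field (an HTML string) of every status."""
    return [status["content"] for status in statuses]


def fetch_tag_contents(server, hashtag, token=None):
    """Fetch one page of a hashtag timeline and return the status contents."""
    request = urllib.request.Request(tag_timeline_url(server, hashtag))
    if token:  # assumption: some servers require a token for this endpoint
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return extract_contents(json.load(response))


# Offline check with a hand-written status; no network access needed.
print(tag_timeline_url("example.social", "hiring"))
print(extract_contents([{"id": "1", "content": "<p>We are hiring</p>"}]))
```

Calling fetch_tag_contents("<server>", "hiring", token="<token>") would perform the actual request; the content strings are HTML, so they may need stripping before further processing.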
I'm using an API to get some responses from SurveyGizmo. It works, but it replaces the question text with [question(1)], [question(2)], ...
import surveygizmo as sg
client = sg.SurveyGizmo(
    api_version='v4',
    # example
    api_token="api_token",
    api_token_secret="api_token_secret."
)
survey_id = "survey_id"
responses = client.api.surveyresponse.list(survey_id)
pages = responses['total_pages']
data = []
responses
I got the following answer:
{'result_ok': True,
'total_count': 5,
'page': 1,
'total_pages': 1,
'results_per_page': 50,
'data': [{'id': '1',
'contact_id': '',
'status': 'Complete',
'is_test_data': '0',
'datesubmitted': '2020-01-22 16:07:30',
'SessionID': '1579727226_5e28b97a9ff992.53369554',
'Language': 'Portuguese (Brazil)',
'datestarted': '2020-01-22 16:07:30',
'iLinkID': '9342723',
'sResponseComment': '',
'responseID': '1',
'[question(2)]': 'Sim',
'[question(3)]': 'Assunto',
'[question(4)]': '8',
...
I need to show each question as it was originally written. How can I do that?
I found the answer: api_version='v4' has some limitations, and the question text is returned when I use api_version='v5'.
I'm very new to Python, so please treat me as a beginner. When I try to convert the XML content into a list of dictionaries I get output, but not the output I expect, and I have tried a lot of variations.
XML Content
<project>
<data>
<row>
<respondent>m0wxo5f6w42h3fot34m7s6xij</respondent>
<timestamp>10-06-16 11:30</timestamp>
<product>1</product>
<replica>1</replica>
<seqnr>1</seqnr>
<session>1</session>
<column>
<question>Q1</question>
<answer>a1</answer>
</column>
<column>
<question>Q2</question>
<answer>a2</answer>
</column>
</row>
<row>
<respondent>w42h3fot34m7s6x</respondent>
<timestamp>10-06-16 11:30</timestamp>
<product>1</product>
<replica>1</replica>
<seqnr>1</seqnr>
<session>1</session>
<column>
<question>Q3</question>
<answer>a3</answer>
</column>
<column>
<question>Q4</question>
<answer>a4</answer>
</column>
<column>
<question>Q5</question>
<answer>a5</answer>
</column>
</row>
</data>
</project>
Code I have used:
import csv
import xml.etree.ElementTree as ET
tree = ET.parse('xml_file.xml')  # parse the XML file
root = tree.getroot()
csv_file = 'csv_file.csv'  # output path (placeholder)
data_list = []
for item in root.find('./data'):  # iterate over the <row> nodes
    data = {}  # dictionary to store the content of each row
    for child in item:
        data[child.tag] = child.text  # add item to dictionary
        #-----------------for loop with subchild is not working as expected in my case
        for subchild in child:
            data[subchild.tag] = subchild.text
    data_list.append(data)
print(data_list)
headers = {k for d in data_list for k in d.keys()}  # headers for csv
with open(csv_file, 'w') as f:
    writer = csv.DictWriter(f, fieldnames=headers)  # creating a DictWriter object
    writer.writeheader()  # write headers to csv
    writer.writerows(data_list)
The output for data_list keeps only the last question of each row in the list of dictionaries.
I guess the issue is in the subchild for loop, but I don't understand how to append the dictionaries to the list.
[{
'respondent': 'anonymous_m0wxo5f6w42h3fot34m7s6xij',
'timestamp': '10-06-16 11:30',
'product': '1',
'replica': '1',
'seqnr': '1',
'session': '1',
'column': '\n ',
'question': 'Q2',
'answer': 'a2'
},
{
'respondent': 'w42h3fot34m7s6x',
'timestamp': '10-06-16 11:30',
'product': '1',
'replica': '1',
'seqnr': '1',
'session': '1',
'column': '\n ',
'question': 'Q2',
'answer': 'a2'
}.......
]
I expect the output below. I have tried a lot but have been unable to loop over the column tag.
[{
'respondent': 'anonymous_m0wxo5f6w42h3fot34m7s6xij',
'timestamp': '10-06-16 11:30',
'product': '1',
'replica': '1',
'seqnr': '1',
'session': '1',
'question': 'Q1',
'answer': 'a1'
},
{
'respondent': 'anonymous_m0wxo5f6w42h3fot34m7s6xij',
'timestamp': '10-06-16 11:30',
'product': '1',
'replica': '1',
'seqnr': '1',
'session': '1',
'question': 'Q2',
'answer': 'a2'
},
{
'respondent': 'w42h3fot34m7s6x',
'timestamp': '10-06-16 11:30',
'product': '1',
'replica': '1',
'seqnr': '1',
'session': '1',
'question': 'Q3',
'answer': 'a3'
},
{
'respondent': 'w42h3fot34m7s6x',
'timestamp': '10-06-16 11:30',
'product': '1',
'replica': '1',
'seqnr': '1',
'session': '1',
'question': 'Q4',
'answer': 'a4'
},
{
'respondent': 'w42h3fot34m7s6x',
'timestamp': '10-06-16 11:30',
'product': '1',
'replica': '1',
'seqnr': '1',
'session': '1',
'question': 'Q5',
'answer': 'a5'
}
]
I have referred to many Stack Overflow questions on XML trees, but they still didn't help me.
Any help or suggestion is appreciated.
I had a problem understanding what this code is supposed to do because it uses abstract variable names like item, child, subchild and this makes it hard to reason about the code. I'm not as clever as that, so I renamed the variables to row, tag, and column to make it easier for me to see what the code is doing. (In my book, even row and column are a bit abstract, but I suppose the opacity of the XML input is hardly your fault.)
You have 2 rows but you want 5 dictionaries, because you have 5 <column> tags and you want each <column>'s data in a separate dictionary. But you want the other tags in the <row> to be repeated along with each <column>'s data.
That means you need to build a dictionary for every <row>, then, for each <column>, add that column's data to the dictionary, then output it before going on to the next column.
This code makes the simplifying assumption that all of your <column>s have the same structure, with exactly one <question> and exactly one <answer> and nothing else. If this assumption does not hold then a <column> may get reported with stale data it inherited from the previous <column> in the same row. It will also produce no output at all for any <row> that does not have at least one <column>.
The code has to loop through the tags twice, once for the non-<column>s and once for the <column>s. Otherwise it can't be sure it has seen all the non-<column> tags before it starts outputting the <column>s.
There are other (no doubt more elegant) ways to do this, but I kept the code structure as close to your original as I could, other than making the variable names less opaque.
for row in root.find('./data'):  # iterate over the <row> nodes
    data = {}  # dictionary to store the content of each row
    for tag in row:
        if tag.tag != "column":
            data[tag.tag] = tag.text  # add row-level tag to dictionary
    # Now the dictionary data is built for the row level
    for tag in row:
        if tag.tag == "column":
            for column in tag:
                data[column.tag] = column.text
            # Now we have added the column level data for one column tag
            data_list.append(data.copy())
Output is as below. The key order of the dicts isn't preserved because I used pprint.pprint for convenience.
[{'answer': 'a1',
'product': '1',
'question': 'Q1',
'replica': '1',
'respondent': 'm0wxo5f6w42h3fot34m7s6xij',
'seqnr': '1',
'session': '1',
'timestamp': '10-06-16 11:30'},
{'answer': 'a2',
'product': '1',
'question': 'Q2',
'replica': '1',
'respondent': 'm0wxo5f6w42h3fot34m7s6xij',
'seqnr': '1',
'session': '1',
'timestamp': '10-06-16 11:30'},
{'answer': 'a3',
'product': '1',
'question': 'Q3',
'replica': '1',
'respondent': 'w42h3fot34m7s6x',
'seqnr': '1',
'session': '1',
'timestamp': '10-06-16 11:30'},
{'answer': 'a4',
'product': '1',
'question': 'Q4',
'replica': '1',
'respondent': 'w42h3fot34m7s6x',
'seqnr': '1',
'session': '1',
'timestamp': '10-06-16 11:30'},
{'answer': 'a5',
'product': '1',
'question': 'Q5',
'replica': '1',
'respondent': 'w42h3fot34m7s6x',
'seqnr': '1',
'session': '1',
'timestamp': '10-06-16 11:30'}]
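As one of the other ways mentioned above, here is a sketch of a variant that starts every <column> from a fresh copy of the row-level fields, so a <column> that lacks a <question> or <answer> cannot inherit stale values from the previous one. The sample XML is a shortened, made-up document in the same shape as the one in the question, and flatten_rows is a name chosen only for this illustration. A <row> without any <column> still produces no output, as in the code above.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample in the same shape as the XML in the question.
SAMPLE = """
<project><data>
  <row>
    <respondent>r1</respondent><session>1</session>
    <column><question>Q1</question><answer>a1</answer></column>
    <column><question>Q2</question></column>
  </row>
  <row>
    <respondent>r2</respondent><session>1</session>
  </row>
</data></project>
"""


def flatten_rows(root):
    """Return one dict per <column>, repeating the row-level tags in each."""
    records = []
    for row in root.find("./data"):
        # Row-level fields: every direct child that is not a <column>.
        base = {tag.tag: tag.text for tag in row if tag.tag != "column"}
        for column in row.findall("column"):
            record = dict(base)  # fresh copy, so nothing leaks between columns
            record.update({field.tag: field.text for field in column})
            records.append(record)
    return records


records = flatten_rows(ET.fromstring(SAMPLE))
for record in records:
    print(record)
```

The second record has no answer key at all, rather than the a1 left over from the first column.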
I'm trying to build a collection of specific key values from a list of dictionaries. I believe my code is not flattening the dictionaries: when I use chunkdata.extend(pythondict[0][1][2]), chunkdata ends up with the whole first, second and third dictionaries, whereas I want something like the "name" key-value pair from every dictionary returned in the response.
import json
import time
import requests
chunkdata = []
for chunk in chunklist:
    url3 = "some URL"
    headers = {'accept': 'application/json'}
    response = requests.request("GET", url3, headers=headers)
    time.sleep(5)
    print(response.text)
    pythondict = json.loads(response.text)
    print(pythondict)
    chunkdata.extend(pythondict['name']['age']['date'])
pythondict output
[{'data': {'name': 'jon', 'age': '30', 'date': '2020-01-05', 'time': '1', 'color': 'blue'}}, {'data': {'name': 'phil', 'age': '33', 'date': '2020-01-05', 'time': '1', 'color': 'blue'}}, {'data': {'name': 'ted', 'age': '25', 'date': '2020-01-05', 'time': '1', 'color': 'blue'}}]
Traceback (most recent call last):
File line 84, in <module>
chunkdata.extend(pythondict['name']['age']['date'])
TypeError: list indices must be integers or slices, not str
Use response.json() for parsing instead of json.loads(response.text); requests then handles decoding the body for you.
Note: response.json() does not check the Content-Type header. It raises an exception if the body is not valid JSON.
The JSON format you get back is awkward to work with here. I could not see why each element needs to be wrapped in a 'data' key.
It would be easier to handle in the following form:
python_dict=[{'name': 'jon', 'age': '30', 'date': '2020-01-05', 'time': '1', 'color': 'blue'}, {'name': 'phil', 'age': '33', 'date': '2020-01-05', 'time': '1', 'color': 'blue'}, {'name': 'ted', 'age': '25', 'date': '2020-01-05', 'time':'1', 'color': 'blue'}]
Modify the relevant part of the code as follows:
chunkdata = []
for item in python_dict:
    temp_list = [item['name'], item['age'], item['date'], item['time'], item['color']]
    chunkdata.append(temp_list)
print(chunkdata)
chunkdata will be a list of lists that you can keep appending into. The output for chunkdata is as follows:
[['jon', '30', '2020-01-05', '1', 'blue'], ['phil', '33',
'2020-01-05', '1', 'blue'], ['ted', '25', '2020-01-05', '1', 'blue']]
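If the response cannot be reshaped, the same values can be read through the 'data' wrapper directly. The sketch below assumes the response has exactly the shape shown in the question (a list whose elements each hold their fields under 'data'); the wanted tuple is an illustrative choice of keys, not something the API defines.

```python
# Same shape as the response in the question: each element wraps its
# fields in a 'data' key.
pythondict = [
    {'data': {'name': 'jon', 'age': '30', 'date': '2020-01-05', 'time': '1', 'color': 'blue'}},
    {'data': {'name': 'phil', 'age': '33', 'date': '2020-01-05', 'time': '1', 'color': 'blue'}},
    {'data': {'name': 'ted', 'age': '25', 'date': '2020-01-05', 'time': '1', 'color': 'blue'}},
]

wanted = ('name', 'age', 'date')  # keys to keep; adjust as needed

chunkdata = []
# Inside the request loop, each response would extend chunkdata with one
# small dict per element, indexing the list first and the keys second.
chunkdata.extend({key: entry['data'][key] for key in wanted} for entry in pythondict)

names = [row['name'] for row in chunkdata]
print(names)
print(chunkdata[0])
```

The TypeError in the question comes from indexing the list itself with a string; iterating over the list and indexing each element avoids it.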
As background, I am scraping a webpage in Python and using BeautifulSoup.
Some of the information that I need to access is in a little box about user profiles that pops up when the mouse hovers over the user's profile picture. The problem is that this information is not available in the HTML. Instead, I get the following:
""div class="username mo"
span class="expand_inline scrname mbrName_1586A02614A388AEE215B4A3139A2C18" onclick="ta.trackEventOnPage('Reviews', 'show_reviewer_info_window', 'user_name_name_click')">Sapphire-Ed
""
(I have deleted some of the >s so that the html will show up in the question, sorry!)
Can anyone tell me how to do this? Thank you for the help!!
Here is the webpage if that is helpful:
view-source:http://www.tripadvisor.com/Attraction_Review-g143010-d108269-Reviews-Cadillac_Mountain-Acadia_National_Park_Mount_Desert_Island_Maine.html
The information I am trying to access is the review distribution.
Below is the complete working code that outputs a dictionary where the keys are usernames and the values are review distributions. To understand how the code works, here are the key things to take into account:
- the information in the overlay that appears on mouseover is loaded dynamically with an HTTP GET request with a number of user-specific parameters; the most important are uid and src
- the uid and src values can be extracted with a regular expression from the id attribute of every user profile element
- the response to this GET request is HTML, which you also need to parse with BeautifulSoup
- you should maintain the web-scraping session with requests.Session
The code:
import re
from pprint import pprint
import requests
from bs4 import BeautifulSoup
data = {}
# this pattern helps us extract the uid and src needed to make a GET request
pattern = re.compile(r"UID_(\w+)-SRC_(\w+)")
# making a web-scraping session
with requests.Session() as session:
    # use the session (not requests.get) so cookies carry over to later requests
    response = session.get("http://www.tripadvisor.com/Attraction_Review-g143010-d108269-Reviews-Cadillac_Mountain-Acadia_National_Park_Mount_Desert_Island_Maine.html")
    soup = BeautifulSoup(response.content, "lxml")
    # iterating over usernames on the page
    for member in soup.select("div.member_info div.memberOverlayLink"):
        # extracting uid and src from the `id` attribute
        match = pattern.search(member['id'])
        if match:
            username = member.find("div", class_="username").text.strip()
            uid, src = match.groups()
            # making a GET request for the overlay information
            response = session.get("http://www.tripadvisor.com/MemberOverlay", params={
                "uid": uid,
                "src": src,
                "c": "",
                "fus": "false",
                "partner": "false",
                "LsoId": ""
            })
            # getting the grades dictionary
            soup_overlay = BeautifulSoup(response.content, "lxml")
            data[username] = {grade_type: soup_overlay.find("span", text=grade_type).find_next_sibling("span", class_="numbersText").text.strip(" ()")
                              for grade_type in ["Excellent", "Very good", "Average", "Poor", "Terrible"]}
pprint(data)
Prints:
{'Anna T': {'Average': '2',
'Excellent': '0',
'Poor': '0',
'Terrible': '0',
'Very good': '2'},
'Arlyss T': {'Average': '0',
'Excellent': '6',
'Poor': '0',
'Terrible': '0',
'Very good': '1'},
'Bf B': {'Average': '1',
'Excellent': '22',
'Poor': '0',
'Terrible': '0',
'Very good': '17'},
'Charmingnl': {'Average': '15',
'Excellent': '109',
'Poor': '4',
'Terrible': '4',
'Very good': '45'},
'Jackie M': {'Average': '2',
'Excellent': '10',
'Poor': '0',
'Terrible': '0',
'Very good': '4'},
'Jonathan K': {'Average': '69',
'Excellent': '90',
'Poor': '6',
'Terrible': '0',
'Very good': '154'},
'Sapphire-Ed': {'Average': '8',
'Excellent': '47',
'Poor': '2',
'Terrible': '0',
'Very good': '49'},
'TundraJayco': {'Average': '14',
'Excellent': '59',
'Poor': '0',
'Terrible': '1',
'Very good': '49'},
'Versrii': {'Average': '2',
'Excellent': '8',
'Poor': '0',
'Terrible': '0',
'Very good': '10'},
'tripavisor83': {'Average': '12',
'Excellent': '9',
'Poor': '1',
'Terrible': '0',
'Very good': '20'}}
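To see the regular-expression step in isolation, here is a small offline sketch. The id strings are hypothetical stand-ins for the id attributes found on the page, and overlay_params is a helper name introduced only for this example.

```python
import re

# Same pattern as in the code above.
pattern = re.compile(r"UID_(\w+)-SRC_(\w+)")

# Hypothetical id values; the real ones come from the page markup.
sample_ids = [
    "UID_1586A02614A388AEE215B4A3139A2C18-SRC_175475670",
    "some_other_element",
]


def overlay_params(element_id):
    """Return the uid/src pair for the overlay request, or None if absent."""
    match = pattern.search(element_id)
    if match is None:
        return None
    uid, src = match.groups()
    return {"uid": uid, "src": src}


for element_id in sample_ids:
    print(element_id, "->", overlay_params(element_id))
```

Elements whose id does not follow the UID_...-SRC_... form yield None and can be skipped, which mirrors the `if match:` check in the scraper.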
I'm using Python to fetch issues from Jira with XML-RPC. It works well, except that the 'Resolution' field (for example 'Fixed' or 'WontFix') is missing from the returned dictionary.
This is how I get the issue from Jira:
import xmlrpclib
s = xmlrpclib.ServerProxy('http://myjira.com/rpc/xmlrpc')
auth = s.jira1.login('user', 'pass')
issue = s.jira1.getIssue(auth, 'PROJ-28')
print issue.keys()
And this is the list of fields that I get back:
['status', 'project', 'attachmentNames', 'votes', 'updated',
'components', 'reporter', 'customFieldValues', 'created',
'fixVersions', 'summary', 'priority', 'assignee', 'key',
'affectsVersions', 'type', 'id', 'description']
The full content is:
{'affectsVersions': [{'archived': 'false',
'id': '11314',
'name': 'v3.09',
'released': 'false',
'sequence': '7'}],
'assignee': 'myuser',
'attachmentNames': '2011-08-17_attach.tar.gz',
'components': [],
'created': '2011-06-14 12:33:54.0',
'customFieldValues': [{'customfieldId': 'customfield_10040', 'values': ''},
{'customfieldId': 'customfield_10010',
'values': 'Normal'}],
'description': "Blah blah...\r\n",
'fixVersions': [],
'id': '28322',
'key': 'PROJ-28',
'priority': '3',
'project': 'PROJ',
'reporter': 'myuser',
'status': '1',
'summary': 'blah blah...',
'type': '1',
'updated': '2011-08-18 15:41:04.0',
'votes': '0'}
When I do:
import pprint
resolutions = s.jira1.getResolutions(auth)
pprint.pprint(resolutions)
I get:
[{'description': 'A fix for this issue is checked into the tree and tested.',
'id': '1',
'name': 'Fixed'},
{'description': 'The problem described is an issue which will never be fixed.',
'id': '2',
'name': "Won't Fix"},
{'description': 'The problem is a duplicate of an existing issue.',
'id': '3',
'name': 'Duplicate'},
{'description': 'The problem is not completely described.',
'id': '4',
'name': 'Incomplete'},
{'description': 'All attempts at reproducing this issue failed, or not enough information was available to reproduce the issue. Reading the code produces no clues as to why this behavior would occur. If more information appears later, please reopen the issue.',
'id': '5',
'name': 'Cannot Reproduce'},
{'description': 'Code is checked in, and is, er, ready for build.',
'id': '6',
'name': 'Ready For Build'},
{'description': 'Invalid bug', 'id': '7', 'name': 'Invalid'}]
The Jira version is v4.1.1#522 and I am using Python 2.7.
Any ideas why I don't get a field called 'resolution'?
Thanks!
The answer is that the getIssue method in JiraXmlRpcService.java calls makeIssueStruct with a RemoteIssue object. The RemoteIssue object contains the Resolution field, but makeIssueStruct copies only values that are set. So if Resolution is not set, it won't appear in the Hashtable there.
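Since the key is simply absent for unresolved issues, client code can treat a missing key as "no resolution". The sketch below is written for Python 3 and works on plain dictionaries shaped like the results in the question. It assumes that a resolved issue carries the resolution id under a 'resolution' key; the PROJ-29 issue and its values are invented for illustration.

```python
# Sample data shaped like the getResolutions result in the question.
resolutions = [
    {'id': '1', 'name': 'Fixed'},
    {'id': '2', 'name': "Won't Fix"},
    {'id': '3', 'name': 'Duplicate'},
]

# An unresolved issue: no 'resolution' key at all, as observed in the question.
unresolved_issue = {'key': 'PROJ-28', 'status': '1'}
# Assumption: a resolved issue holds the resolution id under 'resolution'.
resolved_issue = {'key': 'PROJ-29', 'status': '5', 'resolution': '1'}


def resolution_name(issue, resolutions):
    """Return the resolution name, or None if the issue has no resolution set."""
    resolution_id = issue.get('resolution')  # .get avoids a KeyError
    if resolution_id is None:
        return None
    names = {r['id']: r['name'] for r in resolutions}
    return names.get(resolution_id)


print(resolution_name(unresolved_issue, resolutions))
print(resolution_name(resolved_issue, resolutions))
```

With the live API, the two dictionaries would come from s.jira1.getIssue and s.jira1.getResolutions respectively.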