Python - Data Splitting and Extraction

I am using the Twitch API and I am having trouble understanding how to extract data from it.
When I call the API, this is the sort of response I get:
"name":"user1", "game":"game1","name":"user2", "game":"game2"
I know I will need to use some .split() calls, but I cannot work out how; each time I try, I get a blank output.
The data I need is user1, game1, user2, game2.
This data is repeated several times, and I cannot find out how to extract it from the mass of other data returned.
Any links or advice would be appreciated; I cannot find any reference to extracting data like this.
EDIT
After being advised that it was JSON data, I edited the code to parse it appropriately. But I keep getting the error: AttributeError: 'unicode' object has no attribute 'get'
Here is the code:
import urllib2
import json

url = "https://api.twitch.tv/kraken/channels/'Mychannel'/follows/"

if __name__ == "__main__":
    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    f = opener.open(req)
    json = json.load(f)
    for item in json:
        print item.get('name')
Any suggestions to why this error is occurring?

The response is JSON data; use the json module to parse it.
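A minimal sketch of why the AttributeError appears, using a made-up payload rather than a real Twitch response (the `follows` key and the flat `name`/`game` fields are assumptions; check the actual response shape). Iterating over a parsed JSON object (a dict) yields its keys, which are strings, and strings have no `.get()` method. Note also that `json = json.load(f)` in the question rebinds the module name, so the sketch uses a different variable.

```python
import json

# Hypothetical payload; the real Twitch response shape may differ.
raw = ('{"follows": [{"name": "user1", "game": "game1"}, '
       '{"name": "user2", "game": "game2"}], "_total": 2}')

payload = json.loads(raw)  # use a new name instead of rebinding `json`

# Looping over a dict yields its keys, which are plain strings.
keys = list(payload)
print(keys)  # ['follows', '_total']

# Loop over the list of objects stored under the (assumed) "follows" key instead.
pairs = [(item.get("name"), item.get("game")) for item in payload["follows"]]
print(pairs)  # [('user1', 'game1'), ('user2', 'game2')]
```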

Assuming you are receiving a string back, such that:
>>> a = '"name":"user1", "game":"game1","name":"user2", "game":"game2"'
>>> a
'"name":"user1", "game":"game1","name":"user2", "game":"game2"'
You can do a first split on ",":
>>> mlist = a.split(",")
>>> mlist
['"name":"user1"', ' "game":"game1"', '"name":"user2"', ' "game":"game2"']
Now you can access the data of each element by looping:
>>> for e in mlist:
...     print("Data:", e.split(":")[1])
...
('Data:', '"user1"')
('Data:', '"game1"')
('Data:', '"user2"')
('Data:', '"game2"')
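The split results above still carry the surrounding double quotes. If you really do have to work with the raw string rather than parsing JSON, a regular expression can pull out the name/game pairs directly. A sketch, assuming each `name` is immediately followed by its `game` as in the sample:

```python
import re

a = '"name":"user1", "game":"game1","name":"user2", "game":"game2"'

# Capture the quoted values of each consecutive "name"/"game" pair.
pairs = re.findall(r'"name":"([^"]*)",\s*"game":"([^"]*)"', a)
print(pairs)  # [('user1', 'game1'), ('user2', 'game2')]
```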

Related

How do I import an external text file from a website and use a list from inside it

I'm having trouble trying to make this work
import requests
import random
response = requests.get("https://cdn.discordapp.com/attachments/480168592164257792/557872162661335040/aaaaa.txt")
data = response.text
for line in data:
    print(line)
I am trying to pull a .txt file from the internet and be able to use the list inside the text file.
Right now it treats each character as a separate string.

response.text is a single string, so looping over it gives you one character at a time (read about how Python iterates over strings).
Python doesn't know what a "line" is here, so split the data on newlines and try again:
import requests
import random
response = requests.get("https://cdn.discordapp.com/attachments/480168592164257792/557872162661335040/aaaaa.txt")
data = response.text
for line in data.split("\n"):
    print(line)
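A quick illustration of the difference: iterating a string yields characters, while `split("\n")` and `str.splitlines()` yield lines. The two splitting methods differ on a trailing newline, which many text files have. The sample text here is made up for illustration.

```python
text = 'first line\nsecond line\n'

chars = list(text)[:5]        # iteration gives one character at a time
by_split = text.split('\n')   # keeps an empty string after the final newline
by_lines = text.splitlines()  # drops it

print(chars)     # ['f', 'i', 'r', 's', 't']
print(by_split)  # ['first line', 'second line', '']
print(by_lines)  # ['first line', 'second line']
```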

The attribute response.text is a string, so iterating over it gives you individual characters. You can split the string by spaces (or maybe by newlines) to get what you need (I also added a few print statements to show the steps):
import requests
response = requests.get(
    "https://cdn.discordapp.com/attachments/480168592164257792/557872162661335040/aaaaa.txt")
print('response.text type:', type(response.text))
print('response.text len:', len(response.text))
print(response.text)
print()
print('splitting by spaces:')
for i, s in enumerate(response.text.split()):
    print(i, s)
print()
print('splitting by newlines:')
for i, line in enumerate(response.text.split('\n')):
    print(i, line)
The code gives this output:
response.text type: <class 'str'>
response.text len: 21
a = ["please","work"]
splitting by spaces:
0 a
1 =
2 ["please","work"]
splitting by newlines:
0 a = ["please","work"]
Bruno suggested in a comment to use str.splitlines(); this also works if the response is bytes, since there is also a bytes.splitlines() method.
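To actually use the list, as the question asks, the line `a = ["please","work"]` can be parsed with `ast.literal_eval`, which evaluates Python literals without running arbitrary code. A minimal sketch, assuming every non-empty line has the form `name = <Python literal>`. It uses the text from the output above and, to illustrate the point about bytes, starts from bytes as `response.content` would give you (decoding as UTF-8 is an assumption).

```python
import ast

# Stand-in for response.content (bytes); response.text would already be a str.
raw = b'a = ["please","work"]'

variables = {}
for line in raw.splitlines():            # bytes.splitlines() works like str.splitlines()
    line = line.decode('utf-8').strip()  # assumes the file is UTF-8
    if '=' not in line:
        continue                         # skip blank or unrelated lines
    name, _, literal = line.partition('=')
    variables[name.strip()] = ast.literal_eval(literal.strip())

print(variables['a'])  # ['please', 'work']
```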

Process special JSON with keys as numbers

I want to extract data from file into a dictionary via json.loads. Example:
{725: 'pitcher, ewer',
726: "plane, carpenter's plane, woodworking plane"}
json.loads can't handle the numeric keys.
Some values are quoted with double quotes and others with single quotes.
Any suggestions?
Code
import re
import requests
url = url
r = requests.get(url)
response = r.text.replace('\n','')
response = re.sub(r':(\d+):*', r'"\1"', response)

The file you supplied looks like a valid Python dict literal, so I suggest an alternative approach using ast.literal_eval.
from ast import literal_eval
data = literal_eval(r.text)
print(data[726])
Output: plane, carpenter's plane, woodworking plane
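A self-contained version of the same idea, using the sample text from the question in place of `r.text` so it runs without a network request. Note that `literal_eval` keeps the keys as integers.

```python
from ast import literal_eval

# Sample text from the question; in practice this would be r.text.
text = """{725: 'pitcher, ewer',
726: "plane, carpenter's plane, woodworking plane"}"""

data = literal_eval(text)
print(data[726])     # plane, carpenter's plane, woodworking plane
print(sorted(data))  # [725, 726]
```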
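If you still prefer json, you can try quoting the numeric keys with a regex. Note that JSON also requires double-quoted strings, so this alone will still fail on single-quoted values such as 'pitcher, ewer'.
import json
import re
s = re.sub(r"(?m)^(\W*)(\d+)\b", r'\1"\2"', r.text)
data = json.loads(s)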
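If JSON is needed downstream, a sketch of another route: parse with `literal_eval` first, then serialize with `json.dumps`, which converts the integer keys to strings and writes every string with double quotes. The sample text is the one from the question.

```python
import json
from ast import literal_eval

text = """{725: 'pitcher, ewer',
726: "plane, carpenter's plane, woodworking plane"}"""

# json.loads(text) would fail: numeric keys and single-quoted strings are not valid JSON.
as_json = json.dumps(literal_eval(text))  # int keys become "725" and "726"
data = json.loads(as_json)
print(data["725"])  # pitcher, ewer
```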

Extracting multiple nested JSON keys at a time

How do I go about extracting more than one JSON key at a time given this script? The script cycles through a list of message IDs and extracts the JSON response. I only want to extract certain keys from the response.
import urllib3
import json
import csv
from progressbar import ProgressBar
import time
pbar = ProgressBar()
base_url = 'https://api.pipedrive.com/v1/mailbox/mailMessages/'
fields = {"include_body": "1", "api_token": "token"}
json_arr = []
http = urllib3.PoolManager()
with open('ten.csv', newline='') as csvfile:
    for x in pbar(csv.reader(csvfile, delimiter=' ', quotechar='|')):
        r = http.request('GET', base_url + "".join(x), fields=fields)
        mails = json.loads(r.data.decode('utf-8'))
        json_arr.append(mails['data']['from'][0]['id'])
print(json_arr)
This works as intended. But I want to do the following.
json_arr.append(mails(['data']['from'][0]['id'],['data']['to'][0]['id']))
Which results in TypeError: list indices must be integers or slices, not str

Did you mean:
json_arr.append(mails['data']['from'][0]['id'])
json_arr.append(mails['data']['to'][0]['id'])

The answer already posted looks good, but I'll share the one-liner equivalent, using extend() instead of append():
json_arr.extend([mails['data']['from'][0]['id'], mails['data']['to'][0]['id']])
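If you need to know which `from` id belongs with which `to` id, appending one tuple per message keeps each pair together instead of interleaving them in a flat list. A sketch using made-up payloads that mirror only the keys accessed in the question (`data`, `from`, `to`, `id`); the real Pipedrive response contains more fields.

```python
# Hypothetical message payloads shaped like the keys accessed in the question.
messages = [
    {"data": {"from": [{"id": 1}], "to": [{"id": 2}]}},
    {"data": {"from": [{"id": 3}], "to": [{"id": 4}]}},
]

json_arr = []
for mails in messages:
    d = mails["data"]
    # One (from_id, to_id) tuple per message.
    json_arr.append((d["from"][0]["id"], d["to"][0]["id"]))

print(json_arr)  # [(1, 2), (3, 4)]
```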

Parse a json file and add the strings to a URL

How do I parse a JSON output, get only the list under data, and then append each item to a URL such as google.com/Confidential (and likewise for the other strings in the list)?
I'll call my JSON output text:
text = '{"success":true,"code":200,"data":["Confidential","L1","Secret","Secret123","foobar","maret1","maret2","posted","rontest"],"errs":[],"debugs":[]}'
What I want is only the list under data; so far, my script gives me the entire JSON output.
json.loads(text)
print text
output = urllib.urlopen("http://google.com" % text)
print output.geturl()
print output.read()

jsonobj = json.loads(text)
print jsonobj['data']
Will print the list in the data section of your JSON.
If you want to open each as a link after google.com, you could try this:
def processlinks(text):
    # %s inserts each item after the base URL
    output = urllib.urlopen('http://google.com/%s' % text)
    print output.geturl()
    print output.read()
map(processlinks, jsonobj['data'])
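For reference, a Python 3 sketch of the same idea that only builds the URLs and sends no requests. `http://google.com/` is the placeholder base from the question; `urllib.parse.quote` escapes any characters that are not URL-safe. Each URL could then be passed to `urllib.request.urlopen` if you want to fetch it.

```python
import json
from urllib.parse import quote

text = ('{"success":true,"code":200,"data":["Confidential","L1","Secret",'
        '"Secret123","foobar","maret1","maret2","posted","rontest"],'
        '"errs":[],"debugs":[]}')

jsonobj = json.loads(text)
urls = ['http://google.com/' + quote(item) for item in jsonobj['data']]
for url in urls:
    print(url)  # first line: http://google.com/Confidential
```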

info = json.loads(text)
json_text = json.dumps(info["data"])
json.dumps converts the Python data structure returned by json.loads back into JSON text.
So you could then use json_text wherever you were using text before, and it would contain only the list under "data".
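A small sketch of what `json_text` looks like, using a shortened version of the question's JSON so the output stays readable:

```python
import json

# Shortened version of the question's JSON text.
text = '{"success":true,"code":200,"data":["Confidential","L1"],"errs":[],"debugs":[]}'

info = json.loads(text)
json_text = json.dumps(info["data"])
print(type(json_text).__name__, json_text)  # str ["Confidential", "L1"]
```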

Perhaps something like this where result is your JSON data:
from itertools import product
base_domains = ['http://www.google.com', 'http://www.example.com']
result = {"success":True,"code":200,"data":["Confidential","L1","Secret","Secret123","foobar","maret1","maret2","posted","rontest"],"errs":[],"debugs":[]}
for path in product(base_domains, result['data']):
    print '/'.join(path)  # do whatever
http://www.google.com/Confidential
http://www.google.com/L1
http://www.google.com/Secret
http://www.google.com/Secret123
http://www.google.com/foobar
http://www.google.com/maret1
http://www.google.com/maret2
http://www.google.com/posted
http://www.google.com/rontest
http://www.example.com/Confidential
http://www.example.com/L1
http://www.example.com/Secret
http://www.example.com/Secret123
http://www.example.com/foobar
http://www.example.com/maret1
http://www.example.com/maret2
http://www.example.com/posted
http://www.example.com/rontest

Manipulate string data

I'm new to python and trying to create a script to modify the output of a JS file to match what is required to send data to an API. The JS file is being read via urllib2.
def getPage():
    url = "http://url:port/min_day.js"
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    return response.read()
# JS Data
# m[mi++]="19.12.12 09:30:00|1964;2121;3440;293;60"
# m[mi++]="19.12.12 09:25:00|1911;2060;3277;293;59"
# Required format for API
# addbatchstatus.jsp?data=20121219,09:25,3277.0,1911,-1,-1,59.0,293.0;20121219,09:30,3440.0,1964,-1,-1,60.0,293.0
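As a breakdown, the values required from each line are the date (19.12.12), the time (09:30) and 1964, 3440, 293 and 60; the value 2121 is not needed:
m[mi++]="19.12.12 09:30:00|1964;2121;3440;293;60"
I also need to insert the values -1,-1 into the string.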
I've managed to get the date into the correct format and replace characters and line breaks so the output looks like this, but I have a feeling I'm heading down the wrong track, since I need to be able to reorder the string values. It also looks like the order is reversed with respect to time.
20121219,09:30:00,1964,2121,3440,293,60;20121219,09:25:00,1911,2060,3277,293,59
Any help would be greatly appreciated! I'm thinking along the lines of regex might be what I need.

Here's a Regex pattern to strip out the bits you don't want
m\[mi\+\+\]="(?P<day>\d{2})\.(?P<month>\d{2})\.(?P<year>\d{2}) (?P<time>[\d:]{8})\|(?P<v1>\d+);(?P<v2>\d+);(?P<v3>\d+);(?P<v4>\d+);(?P<v5>\d+).+
and replace with (using Python's \g<name> syntax for named groups):
20\g<year>\g<month>\g<day>,\g<time>,\g<v3>,\g<v1>,-1,-1,\g<v5>,\g<v4>
This pattern assumes that the characters before the date are constant. You can replace m\[mi\+\+\]=" with [^\d]+ if you want more general handling of that bit.
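In Python specifically, named groups are referenced in a `re.sub` replacement string as `\g<name>`, so no callback is strictly required. A minimal sketch applying the pattern above to one sample line from the question:

```python
import re

line = 'm[mi++]="19.12.12 09:30:00|1964;2121;3440;293;60"'

pattern = (r'm\[mi\+\+\]="(?P<day>\d{2})\.(?P<month>\d{2})\.(?P<year>\d{2}) '
           r'(?P<time>[\d:]{8})\|(?P<v1>\d+);(?P<v2>\d+);(?P<v3>\d+);(?P<v4>\d+);(?P<v5>\d+).+')
# \g<name> inserts the text captured by the named group.
template = r'20\g<year>\g<month>\g<day>,\g<time>,\g<v3>,\g<v1>,-1,-1,\g<v5>,\g<v4>'

result = re.sub(pattern, template, line)
print(result)  # 20121219,09:30:00,3440,1964,-1,-1,60,293
```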
So to put this in practice in python:
import re
import urllib2

def getPage():
    url = "http://url:port/min_day.js"
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    return response.read()

def repl(match):
    return '20%s%s%s,%s,%s,%s,-1,-1,%s,%s' % (match.group('year'),
                                              match.group('month'),
                                              match.group('day'),
                                              match.group('time'),
                                              match.group('v3'),
                                              match.group('v1'),
                                              match.group('v5'),
                                              match.group('v4'))

pattern = re.compile(r'm\[mi\+\+\]="(?P<day>\d{2})\.(?P<month>\d{2})\.(?P<year>\d{2}) (?P<time>[\d:]{8})\|(?P<v1>\d+);(?P<v2>\d+);(?P<v3>\d+);(?P<v4>\d+);(?P<v5>\d+).+')
# Only convert lines that match the pattern (skips blank or unrelated lines)
data = [pattern.sub(repl, line).split(',') for line in getPage().splitlines() if pattern.match(line)]
# If you want to sort your data: by date, then time, oldest first as in the required format
data = sorted(data, key=lambda x: (x[0], x[1]))
# If you want to write your data back to a formatted string
new_string = ';'.join(','.join(x) for x in data)
# If you want to write it back to file
with open('new/file.txt', 'w') as f:
    f.write(new_string)
Hope that helps!
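The output above keeps the seconds (09:30:00) and omits the .0 suffixes shown in the required format. A Python 3 sketch that reproduces the required string exactly for the two sample lines, assuming two-digit years fall in 2000-2099 and that the .0 values are simply the integers written as floats; the `js` string stands in for `getPage()`.

```python
import re

# Stand-in for getPage(): the two sample lines from the question.
js = '''m[mi++]="19.12.12 09:30:00|1964;2121;3440;293;60"
m[mi++]="19.12.12 09:25:00|1911;2060;3277;293;59"'''

PATTERN = re.compile(
    r'"(?P<day>\d{2})\.(?P<month>\d{2})\.(?P<year>\d{2}) '
    r'(?P<hm>\d{2}:\d{2}):\d{2}\|'  # keep HH:MM, drop the seconds
    r'(?P<v1>\d+);(?P<v2>\d+);(?P<v3>\d+);(?P<v4>\d+);(?P<v5>\d+)"'
)

records = []
for m in PATTERN.finditer(js):
    date = '20' + m['year'] + m['month'] + m['day']  # assumes years 2000-2099
    records.append([date, m['hm'],
                    str(float(m['v3'])), m['v1'], '-1', '-1',
                    str(float(m['v5'])), str(float(m['v4']))])  # v2 is not used

records.sort(key=lambda r: (r[0], r[1]))  # oldest first, as in the required format
data = ';'.join(','.join(r) for r in records)
print('addbatchstatus.jsp?data=' + data)
```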
