I am trying to do a reverse split of a URL generated from a text file and am getting the above error when printing the split value. I have tried making a string from the URL and splitting that, but this causes the GUI to freeze completely without even producing an error message. My code is here:
a = URLS.rsplit('=', 1)
The code I used when converting the URL to a string and then splitting that is here:
urlstr = str(URLS)
a = urlstr.rsplit('=', 1)
print(a)
Can anyone tell me why I can't split the URL using the split method (the URLS were defined in a dictionary) and/or why creating a string and then splitting that is not working?
Thanks
The error suggests that URLS is not a string, but rather a dict_values object. I think that's what you get when you call the values method of a dictionary (in Python 3). A values view is an iterable object, so you probably want to loop over it, with something like:
for url in URLS:
    a = url.rsplit("=", 1)
    # do stuff with a here
Or if you want a list of the various a values, you could use a list comprehension:
a_lst = [url.rsplit("=", 1) for url in URLS]
A dict_values object is an iterable view of the dictionary's values, not a string. It does not have an rsplit method, though the str objects it contains do.
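To make the difference concrete, here is a minimal sketch, assuming URLS came from calling .values() on a dictionary. The dictionary urls_by_name, its keys, and the example.com URLs are made up for illustration. It also suggests why str(URLS) does not help: the result is the repr of the whole view, not an individual URL.

```python
# Hypothetical dictionary standing in for the one built from the text file.
urls_by_name = {
    "first": "http://example.com/page?id=1",
    "second": "http://example.com/page?id=2",
}

URLS = urls_by_name.values()

# The view itself has no string methods such as rsplit.
has_rsplit = hasattr(URLS, "rsplit")
print(type(URLS).__name__, has_rsplit)

# str() on the view yields the repr of the whole view, so splitting it
# cuts through that repr instead of through one URL.
as_text = str(URLS)
print(as_text.startswith("dict_values("))

# Each element of the view is an ordinary str, so rsplit works per URL.
ids = [url.rsplit("=", 1)[-1] for url in URLS]
print(ids)
```

This does not explain the GUI freeze, which probably depends on code not shown in the question.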
Really though, instead of using rsplit, you probably should be using urllib.parse to extract information from your URLs.
For example,
>>> import urllib.parse as parse
>>> url = 'http://stackoverflow.com/questions?x=foo&y=bar'
>>> parse.urlsplit(url)
SplitResult(scheme='http', netloc='stackoverflow.com', path='/questions', query='x=foo&y=bar', fragment='')
>>> parse.urlsplit(url).query
'x=foo&y=bar'
>>> parse.parse_qs(parse.urlsplit(url).query)
{'x': ['foo'], 'y': ['bar']}
So, if URLS is a dict, then you can loop through the values and extract the parameter values using
>>> URLS = {'a': 'http://stackoverflow.com/questions?x=foo&y=bar'}
>>> for url in URLS.values():
...     print(parse.parse_qs(parse.urlsplit(url).query))
...
{'x': ['foo'], 'y': ['bar']}
Unlike rsplit, parse_qs will allow you to properly unquote percent-encoded query strings, and control the parsing of blank values.
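As a small illustration of that last point, the sketch below uses a made-up example.com URL whose query contains a percent-encoded ampersand and a parameter with no value. keep_blank_values is a real parse_qs parameter; the URL and variable names are assumptions for the example.

```python
from urllib.parse import parse_qs, urlsplit

# Made-up URL: %20 is a space, %26 is an encoded "&", and "page" is blank.
url = "http://example.com/search?q=rock%20%26%20roll&page="
query = urlsplit(url).query

# By default, percent-encoding is undone and blank values are dropped.
default_result = parse_qs(query)
print(default_result)

# keep_blank_values=True keeps parameters that have an empty value.
with_blanks = parse_qs(query, keep_blank_values=True)
print(with_blanks)
```

A plain rsplit("=", 1) on the same URL would return an empty string here, since the text after the last "=" is empty.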
I was using the standard split operation in Python to extract IDs from URLs. It works for
URLs of the form https://music.com/146, where I need to extract 146, but fails for URLs such as
https://music.com/144?i=150
where I need to extract the 150 after i=.
I use the standard
url.split("/")[-1]
Is there a better way to do it ?
Python provides a few tools to make this process easier.
As @Barmar mentioned, you can use urlsplit to split the URL, which gets you a named tuple:
>>> from urllib import parse as urlparse
>>> x = urlparse.urlsplit('https://music.com/144?i=150')
>>> x
SplitResult(scheme='https', netloc='music.com', path='/144', query='i=150', fragment='')
You can use the parse_qs function to convert the query string into a dictionary:
>>> urlparse.parse_qs(x.query)
{'i': ['150']}
Or in a single line:
>>> urlparse.parse_qs(urlparse.urlsplit('https://music.com/144?i=150').query)['i']
['150']
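Putting the two calls together, one possible helper that covers both URL shapes from the question might look like the sketch below. The function name extract_id and its param argument are hypothetical; the rule "prefer the query parameter, otherwise fall back to the last path segment" is an assumption about what the asker wants.

```python
from urllib.parse import parse_qs, urlsplit


def extract_id(url, param="i"):
    """Return the value of `param` if present, else the last path segment."""
    parts = urlsplit(url)
    values = parse_qs(parts.query).get(param)
    if values:
        # parse_qs returns a list per key; take the first occurrence.
        return values[0]
    # No such query parameter: fall back to the final path component.
    return parts.path.rstrip("/").rsplit("/", 1)[-1]


from_path = extract_id("https://music.com/146")
from_query = extract_id("https://music.com/144?i=150")
print(from_path, from_query)
```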
A particularly useful tool for manipulating URLs in Python is furl, which provides an interface mimicking the convenience of Python's standard pathlib module.
Accessing a parameter in the query string (the part after the ? of the URL) is as simple as indexing the URL's args attribute with the name of the parameter you want:
>>> from furl import furl
>>> url = furl('https://music.com/144?i=150')
>>> url.args['i']
'150'
In my opinion, this is a lot easier than using urllib.
As @Barmar mentioned, you can fix your code to:
url.split("/")[-1].split("?i=")[-1]
Basically, splitting https://music.com/144?i=150 on "/" and taking the last element gives 144?i=150. Splitting that on "?i=" gives 144 and 150, and the last element is 150. For https://music.com/146 the second split leaves 146 unchanged.
If you need it to be a number, you can use int(url.split("/")[-1].split("?i=")[-1])
You can use a regular expression:
import re
url = 'https://music.com/144?i=150'
match = re.search(r'(\d+)\?', url)
if match:
    value = match[1]  # 144
If you need the 150:
match = re.search(r'i=(\d+)', url)
if match:
    value = match[1]  # 150
When I am reading a cell with hyperlink from CSV file I am getting the following:
=HYPERLINK("http://google.com","google") #for example
Is there a way to extract only the "google" without the =hyperlink and the link?
As per @martineau's comment, you have two versions of HYPERLINK.
>>> s1 = '=HYPERLINK("http://google.com","google")'
Or
>>> s2 = '=HYPERLINK("http://google.com")'
You can split the string or use a regex, but these methods are tricky (what if there is a comma in the URL, or an escaped quote in the name?).
There is a module called ast that parses Python expressions. We can use it because Excel function call syntax is close to Python's. Here's a version that returns the friendly name if there is one, and the URL otherwise:
>>> import ast
>>> ast.parse(s1[1:]).body[0].value.args[-1].s
'google'
And:
>>> ast.parse(s2[1:]).body[0].value.args[-1].s
'http://google.com'
This is how it works: s1[1:] removes the = sign. Then we take the value of the expression:
>>> v = ast.parse(s1[1:]).body[0].value
>>> v
<_ast.Call object at ...>
It is easy to extract the function name:
>>> v.func.id
'HYPERLINK'
And the args:
>>> [arg.s for arg in v.args]
['http://google.com', 'google']
Just take the last arg (....args[-1].s) to get the friendly name if it exists, and the URL otherwise. You can also check len(args) to do one thing if there is one arg and something else if there are two.
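The same idea can be wrapped in a small function. The sketch below is one possible variant, not the answer's exact method: it uses ast.literal_eval on each argument node instead of the .s attribute, since .s is deprecated on recent Python versions. The name hyperlink_parts is hypothetical, and the approach still assumes the formula happens to be valid Python syntax (Excel-style doubled quotes inside a name, for instance, would not come out as Excel means them).

```python
import ast


def hyperlink_parts(cell):
    """Return the string arguments of a =HYPERLINK(...) cell as a list."""
    # Drop the leading "=" and parse the rest as a single expression.
    expr = ast.parse(cell.lstrip("="), mode="eval").body
    is_hyperlink = (
        isinstance(expr, ast.Call)
        and isinstance(expr.func, ast.Name)
        and expr.func.id == "HYPERLINK"
    )
    if not is_hyperlink:
        raise ValueError("not a HYPERLINK formula: %r" % cell)
    # literal_eval accepts an AST node and only evaluates literals.
    return [ast.literal_eval(arg) for arg in expr.args]


parts = hyperlink_parts('=HYPERLINK("http://google.com","google")')
label = parts[-1]
only_url = hyperlink_parts('=HYPERLINK("http://google.com")')[-1]
print(len(parts), label, only_url)
```

Checking len(parts) tells you whether a friendly name was supplied (two arguments) or only the URL (one argument).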
I have a data.json file, which looks like this:
["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
I am trying to get "Event" from this file using python and miserably failing at this.
with open('data.json', 'r') as json_file:
    data = json.load(json_file)
    print (data['Event'])
I get the following error:
TypeError: list indices must be integers or slices, not str
And even when I try
print (data[0]['Event'])
then I get this error:
TypeError: string indices must be integers
One more thing:
print(type(data))
gives me "list"
I have searched all over and have not found a solution to this. I would really appreciate your suggestions.
You could use the ast module for this:
import ast
mydata = ["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
data = ast.literal_eval(mydata[0])
data
{'Day': 'Today', 'Event': '1', 'Date': '2019-03-20'}
data['Event']
'1'
Edit
Your original code does load the data into a list, but that list contains only a single string entry, even though the file is valid JSON. ast, like json, will parse that string entry into a Python data structure, a dict.
As it stands, indexing the list is not the same as looking up a key in a dict, hence the error that list indices cannot be str:
alist = [{'a':1, 'b':2, 'c':3}]
alist['a']
TypeError
# need to grab the dict entry from the list
adict = alist[0]
adict['a']
1
You need to convert the elements in data to dicts using the json module.
Ex:
import json
with open(filename) as infile:
    data = json.load(infile)
for d in data:
    print(json.loads(d)['Event'])
Or:
data = list(map(json.loads, data))
print(data[0]["Event"])
Output:
1
Your problem is that you are parsing it as a list that consists of a single element that is a string.
["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
See how the entire content of the list is surrounded by " on either side and every other " is preceded by a \? The backslash escapes the character that follows it, so each of those quotes is treated as part of the string rather than as a delimiter.
If you have control over the file's contents, the easiest solution would be to adjust it. You will want it to be in a format like this:
[{"Day":"Today", "Event": "1", "Date": "2019-03-20"}]
Edit: As others have suggested, you can also parse it in its current state. Granted, cleaning the data is tedious, but oftentimes worth the effort, though this may not be one of those cases. I'm leaving this answer up anyway because it may help explain why the OP's initial attempt did not work, and why they received the error messages they got.
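One plausible way such a file comes about is that the record was serialized twice. This is an assumption, since the question does not show how data.json was written. The sketch below reproduces the shape with json.dumps, decodes it in two steps, and contrasts it with a single encoding; all variable names are made up for the example.

```python
import json

record = {"Day": "Today", "Event": "1", "Date": "2019-03-20"}

# Encoding the record to a string first, then encoding the list again,
# produces a list holding one escaped JSON string.
double_encoded = json.dumps([json.dumps(record)])
print(double_encoded)

# Reading it back therefore takes two decoding steps.
outer = json.loads(double_encoded)   # list containing one str
inner = json.loads(outer[0])         # the actual dict
event_from_double = inner["Event"]

# Encoding only once yields a list of objects that decodes in one step.
single_encoded = json.dumps([record])
event_from_single = json.loads(single_encoded)[0]["Event"]
print(event_from_double, event_from_single)
```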
website = 'http://www.python.org'
website[18:] = 'com'
The error says:
'str' object does not support item assignment.
Why is this code snippet not legal?
Because strings are immutable. Do it like this:
>>> website = 'http://www.python.org'
>>> website = website[:18] + 'com' # build a new string, reassign variable website
>>> website
'http://www.python.com'
If you prefer not to count the characters up to .org, you can slice relative to the end of the string instead:
website = website[:-4] + ".com"
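A slightly more defensive sketch of the same idea is below. It builds a new string only when the old suffix is actually present, and it derives the slice length from the suffix instead of hard-coding 4. The variable names old_suffix and new_suffix are made up for the example.

```python
website = "http://www.python.org"
old_suffix = ".org"
new_suffix = ".com"

# Strings are immutable, so this creates a new string and rebinds the name;
# the original string object is never modified.
if website.endswith(old_suffix):
    website = website[: -len(old_suffix)] + new_suffix

print(website)
```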
When I call the JSON file for Vincent van Gogh's List of Works wikipedia page, using this url,
it obviously returns a huge blob of text which I believe is some sort of dictionary of lists.
Now, someone has already shown me Python's import wikipedia feature, so skip that. How can I decode this JSON? I feel like I have tried everything in Python 3's library, and always get an error, like I get if I try this code for example:
data = urllib.request.urlopen(long_json_url)
stuff = json.load(data) #or json.loads(data)
print(stuff)
it returns
TypeError: the JSON object must be str, not 'bytes'
Or if I try this code:
data = urllib.request.urlopen(longurl)
json_string = data.read().decode('utf-8')
json_data = json.loads(json_string)
print(json_data)
It doesn't return an error, but just what looks like nothing
>>>
>>>
But if I highlight that empty space and paste it, it pastes the same blob of text.
{'warnings': {'main': {'*': "Unrecognized parameter: 'Page'"}}, 'query': {'normalized': [{'from': 'list of works by Vincent van Gogh',... etc
If I try a for loop:
for entry in json_data:
    print(entry)
It returns
>>>
query
warnings
>>>
And that's it. So it's not returning an error there, but not really much else, just two values? How would you make the JSON data into a workable Python dict or list? Or at the very least, into a more vertical format that I could actually read?
How would you make the JSON data into a workable Python dict or list?
You're already doing that with
json_data = json.loads(json_string)
This however:
for entry in json_data:
    print(entry)
will only print the keys of your dictionary. If you want to print the values, you need to use:
for entry in json_data:
    print(json_data[entry])
If you inspect the data, you'll see that there are two keys in the main dictionary, the ones you already got by iterating over the dict:
{u'query': {...}, u'warnings': {...}}
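For the "more vertical format" part of the question, one option is json.dumps with indent. The sketch below is self-contained and makes no network request: raw_bytes is a small made-up payload standing in for data.read(), not the real Wikipedia response.

```python
import json

# Stand-in for data.read(): a small, made-up bytes payload.
raw_bytes = (
    b'{"warnings": {"main": {"*": "Unrecognized parameter"}}, '
    b'"query": {"normalized": []}}'
)

# Decode the bytes to str first, then parse; the result is a plain dict.
json_data = json.loads(raw_bytes.decode("utf-8"))

# Summarize the top level: each key and the type of its value.
top_level = {key: type(value).__name__ for key, value in json_data.items()}
print(top_level)

# indent=2 prints one item per line, nested by level.
pretty = json.dumps(json_data, indent=2, sort_keys=True)
print(pretty)
```

From there, nested values can be reached by chaining keys, for example json_data["query"]["normalized"].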