I am trying to split strings at "(". This works fine if there is only one "(" character in the string. However, if there is more than one, it raises ValueError: too many values to unpack.
data = 'The National Bank (US) (Bank)'
I've tried the below code:
name, inst = data.split("(")
Desired output:
name = 'The National Bank (US)'
inst = '(Bank)'
Your split method is splitting the input on both ( characters, giving you the result:
["The National Bank ", "US) ", "Bank)"]
You are then attempting to unpack this list of three values into two variables, name and inst. This is what the error "Too many values to unpack" means.
You can restrict the number of splits to be made using the second parameter to split, but this will give you the wrong result as well.
You actually want to split from the right of the string, on the first space character. You can do that with rsplit:
data = 'The National Bank (US) (Bank)'
name, inst = data.rsplit(' ', 1)
name and inst will now be set as you expect.
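To make the difference concrete, here is a minimal sketch (using only the sample string from the question) that compares limiting split to one split with splitting once from the right. The variable name left_first is made up for illustration.

```python
data = 'The National Bank (US) (Bank)'

# Limiting split() to one split cuts at the FIRST "(", which is not what is wanted.
left_first = data.split("(", 1)
print(left_first)   # ['The National Bank ', 'US) (Bank)']

# rsplit() works from the right, so a single split isolates the last token.
name, inst = data.rsplit(" ", 1)
print(name)         # The National Bank (US)
print(inst)         # (Bank)
```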
This is the expected behavior of split(). When you split a string that contains n separators, you get n + 1 strings back.
e.g.
l = '1,2,3,4'.split(',')
print(l)
print(type(l), len(l))
You can use rsplit with the maxsplit parameter like this, although you then have to prepend the leading ( to your inst string:
>>> name, inst = data.rsplit("(", maxsplit=1)
>>> name
'The National Bank (US) '
>>> inst
'Bank)'
You may be able to get a little cleaner results by doing the same thing but passing a blank space as the delimiter:
>>> name, inst = data.rsplit(" ", maxsplit=1)
>>> name
'The National Bank (US)'
>>> inst
'(Bank)'
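If you do want to split on the "(" itself, one possible variation is str.rpartition, which also returns the separator so it can be put back. This is only a sketch on the sample string; head, sep and tail are illustrative names.

```python
data = 'The National Bank (US) (Bank)'

# rpartition() splits at the LAST "(" and returns the separator as well.
head, sep, tail = data.rpartition("(")

name = head.rstrip()   # drop the space that preceded the "("
inst = sep + tail      # put the "(" back in front

print(name)  # The National Bank (US)
print(inst)  # (Bank)
```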
Need to write a code for a school lab.
Input is First name Middle name Last Name
Output needs to be Last name, First initial. Middle Initial.
It must also work with just first and last name.
Examples:
Input: Jane Ann Doe
Output: Doe, J. A.
Input: Jane Doe
Output: Doe, J.
Code thus far is:
# 2.12 Lab, input First name Middle name last name
# result to print Last name, first initial. Middle initial period.
# result must account for user not having middle name
name = input()
tokens = name.split()
I do not understand how to write an if statement followed by print statement to get the desired output.
name = input("Enter name: ")
tokens = name.split()
if len(tokens) > 2:
    # First, middle and last name: print both initials.
    print(tokens[-1] + ",", tokens[0][0] + ".", tokens[1][0] + ".")
else:
    # First and last name only.
    print(tokens[-1] + ",", tokens[0][0] + ".")
With what you have so far, tokens will be a list of the words you entered, such as ['Jane', 'Ann', 'Doe'].
What you need to do is to print out the last of those items in full, followed by a comma. Then each of the other items in order but with just the first letter followed by a period.
You can get the last item of a list x with x[-1]. You can get each of the others with a loop like:
for item in x[:-1]:
    doSomethingWith(item)
And the first character of the string item can be extracted with item[0].
That should hopefully be enough to get you on your way.
If it's not enough, read on, though it would be far better for you if you tried to nut it out yourself first.
...
No? Okay then, here we go ...
The following code shows one way you can do this, with hopefully enough comments that you will understand:
import sys
# Get line and turn into list of words.
inputLine = input("Please enter your full name: ")
tokens = inputLine.split()
print(tokens)
# Pre-check to make sure at least two words were entered.
if len(tokens) < 2:
    print("ERROR: Need at least two tokens in the name.")
    sys.exit(1)  # non-zero exit status signals the error
# Print last word followed by comma, no newline (using f-strings).
print(f"{tokens[-1]},", end="")
# Process all but the last word.
for namePart in tokens[:-1]:
    # Print first character of word followed by period, no newline.
    print(f" {namePart[0]}.", end="")
# Make sure line is terminated by a newline character.
print()
You could no doubt make that more robust against weird edge cases like a first name of "." but it should be okay for an educational assignment.
But it handles even more complex names such as "River Rocket Blue Dallas Oliver" (yes, I'm serious, that's a real name).
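For comparison, the same idea can be sketched as a small function that builds the string instead of printing piece by piece. The name format_name is hypothetical, and the sketch assumes names are separated by whitespace.

```python
def format_name(full_name):
    """Return 'Last, F. M.' for a whitespace-separated name (hypothetical helper)."""
    tokens = full_name.split()
    if len(tokens) < 2:
        raise ValueError("need at least a first and a last name")
    # One initial, followed by a period, for every word except the last.
    initials = [f"{part[0]}." for part in tokens[:-1]]
    return f"{tokens[-1]}, {' '.join(initials)}"


print(format_name("Jane Ann Doe"))  # Doe, J. A.
print(format_name("Jane Doe"))      # Doe, J.
```

Returning the string rather than printing it makes the logic easier to check against the examples in the assignment.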
# 2.12 Lab, input First name Middle name last name
# result to print Last name, first initial. Middle initial period.
# result must account for user not having middle name
name = input()
tokens = name.split()
if len(tokens) == 2:  # only two names entered
    last_name = tokens[1]
    first_init = tokens[0][0]
    print(last_name, ', ', first_init, '.', sep='')
elif len(tokens) == 3:  # three names entered
    last_name = tokens[2]
    first_init = tokens[0][0]
    middle_init = tokens[1][0]
    print(last_name, ', ', first_init, '. ', middle_init, '.', sep='')
Try this code:
a = input()
name = a.split(" ")
index = len(name)
if index == 3:
    print(f"{name[-1]}, {name[-3][0]}. {name[-2][0]}.")
else:
    print(f"{name[-1]}, {name[-2][0]}.")
Here is the explanation of the code:
First, using input(), we get the name of the person.
Then we split the name using .split(), passing " " as the argument (written in the parentheses).
Next, we find the number of elements in the list (.split() returns a list) for the if statement.
Finally, we print the output in the branch chosen by the if statement, using indexing to extract the first letter of each name.
I'd like to use part of a string ('project') that is returned from an API. The string looks like this:
{'Project Title': 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'}
I'd like to store the 'LS003942_EP... ' part in a new variable called foldername. I thought a good way would be to use a regex to find the text after Title. Here's my code:
orders = api.get_all(view='Folder', fields='Project Title', maxRecords=1)
for new in orders:
    print ("Found 1 new project")
    print (new['fields'])
    project = (new['fields'])
    s = re.search('Title(.+?)', result)
    if s:
        foldername = s.group(1)
        print(foldername)
This gives me an error -
TypeError: expected string or bytes-like object.
I'm hoping for foldername = 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'
You can use ast.literal_eval to safely evaluate a string containing a Python literal:
import ast
s = "{'Project Title': 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'}"
print(ast.literal_eval(s)['Project Title'])
# LS003942_EP - 5 Random Road, Sunny Place, SA 5000
It seems (to me) that you have a dictionary and not string. Considering this case, you may try:
s = {'Project Title': 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'}
print(s['Project Title'])
If you have time, take a look at dictionaries.
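Applied to the loop in the question, that would look roughly like the sketch below. The record here is hand-written to mimic what the question prints for new['fields']; the real API response may be shaped differently.

```python
# Hypothetical record, shaped like the one printed in the question.
new = {'fields': {'Project Title': 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'}}

project = new['fields']

# .get() returns None instead of raising KeyError if the field is missing.
foldername = project.get('Project Title')
print(foldername)
```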
I don't think you need a regex here:
string = "{'Project Title': 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'}"
foldername = string[string.index(":") + 3 : -2]
Essentially, I'm finding the position of the first colon, then adding 3 to skip the colon, the space and the opening apostrophe, which gives the starting index of your foldername. The slice then runs up to, but not including, the last two characters (the closing apostrophe and the closing brace).
However, if your string is always going to be in the form of a valid Python dict, you could simply do foldername = list(eval(string).values())[0]. Here, I'm treating your string as a dict and getting the first value from it, which is your desired foldername. But, as @AKX notes in the comments, eval() isn't safe, as somebody could pass malicious code as a string. Unless you're sure that your input strings won't contain code (which is unlikely), it's best to use ast.literal_eval(), as it only evaluates literals.
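But, as @MaximilianPeters notes in the comments, your response looks like JSON, so if the API gives you the raw JSON text you could parse it with json.loads(). Note that the single-quoted form shown above is a Python dict repr rather than valid JSON, which requires double quotes.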
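A short sketch of how the two parsers behave on this data. The double-quoted json_text is an assumed example of what raw JSON from the API might look like; it does not appear in the question.

```python
import ast
import json

# The string exactly as shown in the question (Python-style single quotes).
python_repr = "{'Project Title': 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'}"
# Assumed JSON equivalent (double quotes), for comparison only.
json_text = '{"Project Title": "LS003942_EP - 5 Random Road, Sunny Place, SA 5000"}'

from_repr = ast.literal_eval(python_repr)['Project Title']
from_json = json.loads(json_text)['Project Title']

# The json module rejects the single-quoted form.
try:
    json.loads(python_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

print(from_repr == from_json, repr_is_json)  # True False
```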
You could try this pattern: (?<='Project Title': )[^}]+.
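Explanation: it uses a positive lookbehind to ensure that the match occurs right after 'Project Title': (including the trailing space). Then it matches everything until } is encountered: [^}]+. Note that the match still includes the surrounding quotes.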
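A minimal sketch of that pattern with Python's re module, on the sample string from the question. Stripping the quotes afterwards is an extra step added here for illustration.

```python
import re

s = "{'Project Title': 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'}"

# Lookbehind anchors the match right after the key; [^}]+ runs up to the brace.
match = re.search(r"(?<='Project Title': )[^}]+", s)
raw = match.group(0) if match else None

# The raw match keeps the quotes around the value, so strip them.
foldername = raw.strip("'") if raw else None
print(foldername)  # LS003942_EP - 5 Random Road, Sunny Place, SA 5000
```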
I am trying to split the line:
American plaice - 11,000 lbs # 35 cents or trade for SNE stocks
at the word or but I receive ValueError: not enough values to unpack (expected 2, got 1).
Which doesn't make sense, if I split the sentence at or then that will indeed leave 2 sides, not 1.
Here's my code:
if ('-' in line) and ('lbs' in line):
    fish, remainder = line.split('-')
    if 'trade' in remainder:
        weight, price = remainder.split('to ')
        weight, price = remainder.split('or')
The 'to' line is what I normally use, and it has worked fine, but this new line appeared without a 'to' but instead an 'or' so I tried writing one line that would tackle either condition but couldn't figure it out so I simply wrote a second and am now running into the error listed above.
Any help is appreciated, thanks.
The most straightforward way is probably to use a regular expression to do the split. Then you can split on either word, whichever appears. The ?: inside the parentheses makes the group non-capturing so that the matched word doesn't appear in the output.
import re
# ...
weight, price = re.split(" (?:or|to) ", remainder, maxsplit=1)
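Here is a self-contained sketch of that regular-expression split on two sample lines. The second line is a made-up "to" variant of the line from the question, included only to show that both wordings are handled.

```python
import re

lines = [
    "American plaice - 11,000 lbs # 35 cents or trade for SNE stocks",
    "American plaice - 11,000 lbs # 35 cents to trade for SNE stocks",  # hypothetical variant
]

results = []
for line in lines:
    fish, remainder = line.split('-', 1)
    # Split once on a standalone " or " / " to "; the "or" inside "for" has no space before it.
    weight, price = re.split(r" (?:or|to) ", remainder, maxsplit=1)
    results.append((fish.strip(), weight.strip(), price.strip()))

for row in results:
    print(row)
```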
You split on 'to ' before you attempt to split on 'or', and that first split is what throws the error. The return value of remainder.split('to ') is [' 11,000 lbs # 35 cents or trade for SNE stocks'], a one-element list, which cannot be unpacked into two separate values. You can fix this by testing which word you need to split on first.
if ('-' in line) and ('lbs' in line):
    fish, remainder = line.split('-')
    if 'trade' in remainder:
        if 'to ' in remainder:
            weight, price = remainder.split('to ')
        elif ' or ' in remainder:
            weight, price = remainder.split(' or ')  # add spaces so we don't match 'for'
This should solve your problem by checking if your separator is in the string first.
Also note that split(sep, 1) makes sure that your string will be split at most once (e.g. "hello all world".split(" ", 1) == ["hello", "all world"]).
if ('-' in line) and ('lbs' in line):
    fish, remainder = line.split('-')
    if 'trade' in remainder:
        weight, price = remainder.split(' to ', 1) if ' to ' in remainder else remainder.split(' or ', 1)
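A possible alternative to checking with in before splitting is str.partition, which always returns three parts and so never fails to unpack. The helper name split_weight_price is hypothetical, and this is only a sketch of the idea.

```python
def split_weight_price(remainder):
    """Hypothetical helper: split on ' to ' or ' or ', whichever is present."""
    for sep in (' to ', ' or '):
        weight, found, price = remainder.partition(sep)
        if found:  # partition() returns an empty separator when sep is absent
            return weight.strip(), price.strip()
    raise ValueError(f"no separator found in {remainder!r}")


print(split_weight_price(' 11,000 lbs # 35 cents or trade for SNE stocks'))
```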
The problem is that the word "for" also contains an "or" therefore you will end up with the following:
a = 'American plaice - 11,000 lbs # 35 cents or trade for SNE stocks'
a.split('or')
gives
['American plaice - 11,000 lbs # 35 cents ', ' trade f', ' SNE stocks']
Stephen Rauch's answer does fix the problem
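To see the difference side by side, this sketch contrasts the plain substring split with a regular expression that uses word boundaries (\b). The word-boundary pattern is a variation chosen here for illustration, not the exact pattern from the other answer.

```python
import re

a = 'American plaice - 11,000 lbs # 35 cents or trade for SNE stocks'

# Plain substring split: also cuts inside "for".
naive = a.split('or')

# \b only matches at word edges, so just the standalone word "or" is used.
whole_word = re.split(r'\s*\bor\b\s*', a)

print(len(naive))   # 3
print(whole_word)   # ['American plaice - 11,000 lbs # 35 cents', 'trade for SNE stocks']
```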
Once you have done the split(), you have a list, not a string, so you cannot call split() on the result again. And if you just copy the line, the second assignment will overwrite your earlier results. You can instead normalize the separator while it is still a string and then split once (the surrounding spaces keep the "or" inside "for" from being replaced):
weight, price = remainder.replace(' or ', ' to ').split(' to ', 1)
Here is an example of what data I will have:
472747372 42 Lawyer John Legend Bishop
I want to be able to take a string like that and using a function convert it into a tuple so that it will be split like so:
"472747372" "42" "Lawyer" "John Legend" "Bishop"
NI Number, Age, Job, surname and other names
What about:
>>> string = "472747372 42 Lawyer John Legend Bishop"
>>> string.split()[:3] + [' '.join(string.split()[3:5])] + [string.split()[-1]]
['472747372', '42', 'Lawyer', 'John Legend', 'Bishop']
Or:
>>> string.split(maxsplit=3)[:-1] + string.split(maxsplit=3)[-1].rsplit(maxsplit=1)
['472747372', '42', 'Lawyer', 'John Legend', 'Bishop']
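The second one-liner calls split twice on the same string; the same idea can be sketched as a function that returns a tuple, as the question asks. The name parse_record is hypothetical, and the sketch assumes the first three fields and the last field are single words, with everything in between belonging together.

```python
def parse_record(line):
    """Hypothetical helper: split a record into a 5-tuple of fields."""
    # Peel the first three single-word fields off the left...
    ni_number, age, job, rest = line.split(maxsplit=3)
    # ...then peel the final word off the right of what remains.
    middle, last = rest.rsplit(maxsplit=1)
    return ni_number, age, job, middle, last


print(parse_record("472747372 42 Lawyer John Legend Bishop"))
# ('472747372', '42', 'Lawyer', 'John Legend', 'Bishop')
```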
In Python, str has a built-in method called split, which splits the string into a list on whatever separator you pass it. Its default is to split on whitespace, so you can simply do:
my_string = '472747372 42 Lawyer Hermin Shoop Tator'
tuple(my_string.split())
EDIT: After OP changed the post.
Assuming there will always be an NI Number, Age, Job, and surname, you would have to do:
elems = my_string.split()
tuple(elems[:3] + [' '.join(elems[3:5])] + elems[5:])
This will allow you to support an arbitrary number of "other" names after the surname
The purpose of this code is to make a program that searches a person's name (on Wikipedia, specifically) and uses keywords to come up with reasons why that person is significant.
I'm having issues with this specific line, "if fact_amount < 5 and (terms in sentence.lower()):", because I get this error: "TypeError: coercing to Unicode: need string or buffer, list found".
If you could offer some guidance it would be greatly appreciated, thank you.
import requests
import nltk
import re
#You will need to install requests and nltk
terms = ['pronounced',
         'was a significant',
         'major/considerable influence',
         'one of the (X) most important',
         'major figure',
         'earliest',
         'known as',
         'father of',
         'best known for',
         'was a major']
names = ["Nelson Mandela","Bill Gates","Steve Jobs","Lebron James"]
#List of people that you need to get info from
for name in names:
    print name
    print '==============='
    #Goes to the wikipedia page of the person
    r = requests.get('http://en.wikipedia.org/wiki/%s' % (name))
    #Parses the raw html into text
    raw = nltk.clean_html(r.text)
    #Tries to split each sentence.
    #sort of buggy though
    #For example St. Mary will split after St.
    sentences = re.split('[?!.][\s]*',raw)
    fact_amount = 0
    for sentence in sentences:
        #I noticed that important things came after 'he was' and 'she was'
        #Seems to work for my sample list
        #Also there may be buggy sentences, so I return 5 instead of 3
        if fact_amount < 5 and (terms in sentence.lower()):
            #remove the reference notation that wikipedia has
            #ex [ 33 ]
            sentence = re.sub('[ [0-9]+ ]', '', sentence)
            #removes newlines
            sentence = re.sub('\n', '', sentence)
            #removes trailing and leading whitespace
            sentence = sentence.strip()
            fact_amount += 1
            #sentence is formatted. Print it out
            print sentence + '.'
    print
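The check is written the wrong way around. This version no longer raises the TypeError:
sentence.lower() in terms
terms is a list and sentence.lower() is a string. You can check whether a particular string is in a list, but you cannot check whether a list is in a string. Be aware, though, that this is only true when the whole lowercased sentence equals one of the terms.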
You might mean if any(t in sentence.lower() for t in terms), which checks whether any term from the terms list occurs in the sentence string.
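A small sketch of the any() check on its own, written for Python 3. The terms are a subset of the list in the question, and the sample sentences are made up for illustration.

```python
terms = ['was a significant', 'known as', 'father of', 'best known for']

# Made-up sample sentences, standing in for the text scraped from a page.
sentences = [
    "He is best known for his work on compilers",
    "The weather was mild that year",
]

# Keep a sentence if at least one term occurs somewhere inside it.
matches = [s for s in sentences if any(t in s.lower() for t in terms)]
print(matches)  # ['He is best known for his work on compilers']
```

Lowercasing the sentence once per check works here because every term in the list is already lowercase.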