Python - split() producing ValueError - python

I am trying to split the line:
American plaice - 11,000 lbs # 35 cents or trade for SNE stocks
at the word or, but I receive ValueError: not enough values to unpack (expected 2, got 1).
That doesn't make sense to me: if I split the sentence at or, that should leave 2 sides, not 1.
Here's my code:
if ('-' in line) and ('lbs' in line):
    fish, remainder = line.split('-')
    if 'trade' in remainder:
        weight, price = remainder.split('to ')
        weight, price = remainder.split('or')
The 'to' line is what I normally use, and it has worked fine. This new line has no 'to' but an 'or' instead, so I tried writing one line that would handle either case. I couldn't figure that out, so I simply added a second line and am now running into the error listed above.
Any help is appreciated, thanks.

The most straightforward way is probably to use a regular expression to do the split. Then you can split on either word, whichever appears. The ?: inside the parentheses makes the group non-capturing so that the matched word doesn't appear in the output.
import re
# ...
weight, price = re.split(" (?:or|to) ", remainder, maxsplit=1)
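As a quick check of the regular-expression approach, the sketch below runs the same split on both kinds of line. The helper name split_offer and the second sample string are assumptions made for illustration; they do not come from the question.

```python
import re


def split_offer(remainder):
    """Split on the first standalone ' or ' / ' to ' and return (weight, price)."""
    # The spaces around the group keep 'or' inside 'for' from matching.
    weight, price = re.split(r" (?:or|to) ", remainder, maxsplit=1)
    return weight.strip(), price.strip()


# Sample from the question (the text after the '-').
print(split_offer(" 11,000 lbs # 35 cents or trade for SNE stocks"))
# Hypothetical line that uses 'to' instead of 'or'.
print(split_offer(" 5,000 lbs # 20 cents to trade for GB cod"))
```

If neither word is present, the unpacking still raises ValueError, so a caller may want to catch that for lines in some other format.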

You split on 'to ' before you attempt to split on 'or', and that first split is what throws the error. The return value of remainder.split('to ') is [' 11,000 lbs # 35 cents or trade for SNE stocks'], a one-element list that cannot be unpacked into two separate values. You can fix this by first testing which word you need to split on.
if ('-' in line) and ('lbs' in line):
    fish, remainder = line.split('-')
    if 'trade' in remainder:
        if 'to ' in remainder:
            weight, price = remainder.split('to ')
        elif ' or ' in remainder:
            weight, price = remainder.split(' or ')  # add spaces so we don't match 'for'

This should solve your problem by checking if your separator is in the string first.
Also note that split(sep, 1) performs at most one split (e.g. "hello all world".split(" ", 1) == ["hello", "all world"]).
if ('-' in line) and ('lbs' in line):
    fish, remainder = line.split('-')
    if 'trade' in remainder:
        weight, price = remainder.split(' to ', 1) if ' to ' in remainder else remainder.split(' or ', 1)

The problem is that the word "for" also contains an "or", so you will end up with the following:
a = 'American plaice - 11,000 lbs # 35 cents or trade for SNE stocks'
a.split('or')
gives
['American plaice - 11,000 lbs # 35 cents ', ' trade f', ' SNE stocks']
Stephen Rauch's answer does fix the problem
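To see the difference concretely, here is a minimal sketch that compares the bare 'or' separator with a space-padded one on the same string. The variable names pieces_bare and pieces_padded are only illustrative.

```python
a = 'American plaice - 11,000 lbs # 35 cents or trade for SNE stocks'

# Bare 'or' also matches inside 'for', so three pieces come back.
pieces_bare = a.split('or')

# Padding with spaces and capping at one split yields exactly two pieces.
pieces_padded = a.split(' or ', 1)

print(len(pieces_bare), pieces_bare)
print(len(pieces_padded), pieces_padded)
```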

Once you have done the split(), you have a list, not a string, so you cannot call split() on it again. And if you just copy the line, the second assignment will overwrite your other results. You can instead normalize the separator while the value is still a string and then split once (the spaces keep the 'or' inside 'for' from being replaced):
weight, price = remainder.replace(' or ', ' to ').split(' to ', 1)

Related

new to python (as in 1 week in) and need help getting pointed in the right direction

Need to write a code for a school lab.
Input is First name Middle name Last Name
Output needs to be Last name, First initial. Middle Initial.
It must also work with just first and last name.
Examples:
Input: Jane Ann Doe
Output: Doe, J. A.
Input: Jane Doe
Output: Doe, J.
Code thus far is:
# 2.12 Lab, input First name Middle name last name
# result to print Last name, first initial. Middle initial period.
# result must account for user not having middle name
name = input()
tokens = name.split()
I do not understand how to write an if statement followed by print statement to get the desired output.
name = input("Enter name: ")
tokens = name.split()
if len(tokens) > 2:
    print(tokens[-1] + ",", tokens[0][0] + ".", tokens[1][0] + ".")
else:
    print(tokens[-1] + ",", tokens[0][0] + ".")
With what you have so far, tokens will be a list of the words you entered, such as ['Jane', 'Ann', 'Doe'].
What you need to do is to print out the last of those items in full, followed by a comma. Then each of the other items in order but with just the first letter followed by a period.
You can get the last item of a list x with x[-1]. You can get each of the others with a loop like:
for item in x[:-1]:
    doSomethingWith(item)
And the first character of the string item can be extracted with item[0].
That should hopefully be enough to get you on your way.
If it's not enough, read on, though it would be far better for you if you tried to nut it out yourself first.
...
No? Okay then, here we go ...
The following code shows one way you can do this, with hopefully enough comments that you will understand:
import sys

# Get line and turn into list of words.
inputLine = input("Please enter your full name: ")
tokens = inputLine.split()
print(tokens)

# Pre-check to make sure at least two words were entered.
if len(tokens) < 2:
    print("ERROR: Need at least two tokens in the name.")
    sys.exit(0)

# Print last word followed by comma, no newline (using f-strings).
print(f"{tokens[-1]},", end="")

# Process all but the last word.
for namePart in tokens[:-1]:
    # Print first character of word followed by period, no newline.
    print(f" {namePart[0]}.", end="")

# Make sure line is terminated by a newline character.
print()
You could no doubt make that more robust against weird edge cases like a first name of "." but it should be okay for an educational assignment.
But it handles even more complex names such as "River Rocket Blue Dallas Oliver" (yes, I'm serious, that's a real name).
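The same idea can also be wrapped in a function that returns the formatted string instead of printing piece by piece, which makes it easier to test. This is only a sketch; the name format_name is an assumption, not part of the lab.

```python
def format_name(full_name):
    """Return 'Last, F. M.' for a name of two or more words."""
    tokens = full_name.split()
    if len(tokens) < 2:
        raise ValueError("need at least a first and a last name")
    # One initial plus a period for every word except the last.
    initials = " ".join(f"{part[0]}." for part in tokens[:-1])
    return f"{tokens[-1]}, {initials}"


print(format_name("Jane Ann Doe"))  # Doe, J. A.
print(format_name("Jane Doe"))      # Doe, J.
```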
# 2.12 Lab, input First name Middle name last name
# result to print Last name, first initial. Middle initial period.
# result must account for user not having middle name
name = input()
tokens = name.split()
if len(tokens) == 2:  # to identify if only two names entered
    last_name = tokens[1]
    first_init = tokens[0][0]
    print(last_name, ', ', first_init, '.', sep='')
if len(tokens) == 3:  # to identify if three names entered
    last_name = tokens[2]
    first_init = tokens[0][0]
    middle_init = tokens[1][0]
    print(last_name, ', ', first_init, '. ', middle_init, '.', sep='')
Try this code:
a = input()
name = a.split(" ")
index = len(name)
if index == 3:
    print(f"{name[-1]}, {name[-3][0]}. {name[-2][0]}.")
else:
    print(f"{name[-1]}, {name[-2][0]}.")
Here is the explanation of the code:
First, using input(), we get the name of the person.
Then we split the name using .split() with " " as the argument (written in the parentheses).
Next we find the number of elements in the list (.split() returns a list) for the if statement.
Then we print the output in the matching branch of the if statement, using indexing to extract the first letter of each name.

How can I extract numbers based on context of the sentence in python?

I tried using regular expressions, but they don't take any context into account.
Examples:
"250 kg Oranges for Sale"
"I want to sell 100kg of Onions at 100 per kg"
You can do something like this.
First you split the text in words and then you try to convert each word to a number.
If the word can be converted to a number, it is a number and if you are sure that a quantity is always followed by the word "kg", once you find the number you can test if the next word is "kg".
Then, depending on the result, you add the value to the respective array.
In this particular case, you have to ensure the numbers are written on their own (e.g. "100 kg" and not "100kg"); otherwise they will not be converted.
string = "250 kg Oranges for Sale. I want to sell 100 kg of Onions at 100 per kg."
# Split the text
words_list = string.split(" ")
print(words_list)
# Find which words are numbers
quantity_array = []
price_array = []
for i in range(len(words_list)):
    try:
        number = int(words_list[i])
    except ValueError:
        print("'%s' is not a number" % words_list[i])
        continue
    # Is it a price or a quantity? (the bounds check guards a number at the very end)
    if i + 1 < len(words_list) and words_list[i + 1] == 'kg':
        quantity_array.append(number)
    else:
        price_array.append(number)
# Get the results
print(quantity_array)
print(price_array)
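If numbers may be glued to the unit (as in "100kg" from the question), a regular expression that includes the unit in the pattern is one possible alternative. The sketch below is built on assumptions: that a quantity is always a whole number followed by "kg", and that a price is phrased as a number followed by "per kg". Neither rule comes from the question beyond its two example sentences.

```python
import re

text = "250 kg Oranges for Sale. I want to sell 100kg of Onions at 100 per kg."

# A number followed by optional whitespace and 'kg' is treated as a quantity.
quantities = [int(n) for n in re.findall(r"(\d+)\s*kg\b", text)]

# A number followed by 'per kg' is treated as a price (assumed phrasing).
prices = [int(n) for n in re.findall(r"(\d+)\s+per\s+kg\b", text)]

print(quantities)
print(prices)
```

Here the surrounding words supply the context, so the pattern has to be extended for every new phrasing you expect to see.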

Too many values to unpack (expected 2) while splitting string

I am looking to split strings at "(". This works fine if there is only one "(" character in the string. However, if there is more than one such character, it raises ValueError: too many values to unpack.
data = 'The National Bank (US) (Bank)'
I've tried the below code:
name, inst = data.split("(")
Desired output:
name = 'The National Bank (US)'
inst = '(Bank)'
Your split method is splitting the input on both ( characters, giving you the result:
["The National Bank ", "US) ", "Bank)"]
You are then attempting to unpack this list of three values into two variables, name and inst. This is what the error "Too many values to unpack" means.
You can restrict the number of splits to be made using the second parameter to split, but this will give you the wrong result as well.
You actually want to split from the right of the string, on the first space character. You can do that with rsplit:
data = 'The National Bank (US) (Bank)'
name, inst = data.rsplit(' ', 1)
name and inst will now be set as you expect.
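A small side-by-side sketch shows why limiting split() from the left is not enough here while rsplit() works. The variable name left is only illustrative.

```python
data = 'The National Bank (US) (Bank)'

# Limiting split() to one cut from the left breaks at the first '(',
# which leaves '(US)' in the wrong half.
left = data.split('(', 1)

# rsplit() counts from the right, so one cut isolates the final group.
name, inst = data.rsplit(' ', 1)

print(left)
print(name, '|', inst)
```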
This is the expected behavior of split(). When you split a string that contains n separators, you get n+1 strings in return,
e.g.
l = '1,2,3,4'.split(',')
print(l)
print(type(l), len(l))
You can use the rsplit with the maxsplit parameter like this, although you have to append the leading ( to your inst string:
>>> name, inst = data.rsplit("(", maxsplit=1)
>>> name
'The National Bank (US) '
>>> inst
'Bank)'
You may be able to get a little cleaner results by doing the same thing but passing a blank space as the delimiter:
>>> name, inst = data.rsplit(" ", maxsplit=1)
>>> name
'The National Bank (US)'
>>> inst
'(Bank)'

Python - split() over many spaces

I followed this answer's (Python: Split by 1 or more occurrences of a delimiter) directions to a T and it keeps failing so I'm wondering if it's something simple I'm missing or if I need a new method to solve this.
I have the following .eml file:
My goal is to eventually parse out all the fish stocks and their corresponding weight amounts, but for a test I'm just using the following code:
with open(file_path) as f:
    for line in f:
        if ("Haddock" in line):
            #fish, remainder = re.split(" +", line)
            fish, remainder = line.split()
            print(line.lower().strip())
            print("fish:", fish)
            print("remainder:", remainder)
and it fails on the line fish, remainder = line.split() with the error
ValueError: too many values to unpack (expected 2)
which tells me that Python is failing because it is trying to split on too many spaces, right? Or am I misunderstanding this? I want to get two values back from this process: the name of the fish (a string containing all the text before the many spaces) and the quantity (integer from the right side of the input line).
Any help would be appreciated.
You may use the regular expression below for splitting (it needs import re):
fish, remainder = re.split(r'(?<=\w)\s+(?=\d)', line.strip())
It will split the line and give `['GB Haddock West', '22572']`
I would like the fish to be GB Haddock West and the remainder to be 22572
You could do something like this:
s = line.split()
fish, remainder = " ".join(s[:-1]), s[-1]
Instead of using split() you could utilize rindex() and find the last space and split between there.
at = line.rindex(" ")
fish, remainder = line[:at], line[at+1:]
Both will output:
print(fish) # GB Haddock West
print(remainder) # 22572
Yes ... you can split on multiple spaces. However, unless you can specify the number of spaces, you're going to get additional empty fields in the middle, just as you're getting now. For instance:
in_stuff = [
    "GB Haddock West          22572",
    "GB Cod West               7207",
    "GB Haddock East          3776"
]
for line in in_stuff:
    print(line.split("   "))
Output:
['GB Haddock West', '', '', ' 22572']
['GB Cod West', '', '', '', '', '7207']
['GB Haddock East', '', '', ' 3776']
However, a simple change will get what you want: pick off the first and last fields from this:
for line in in_stuff:
    fields = line.split("   ")
    print(fields[0], int(fields[-1]))
Output:
GB Haddock West 22572
GB Cod West 7207
GB Haddock East 3776
Will that solve your problem?
Building upon @Vallentin's answer, but using the extended unpacking features of Python 3:
In [8]: line = "GB Haddock West 22572"
In [9]: *fish, remainder = line.split()
In [10]: print(" ".join(fish))
GB Haddock West
In [11]: print(int(remainder))
22572
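Another way to get the two values without joining the pieces back together is to split once from the right. The sketch below assumes every line ends in a plain integer with no thousands separators; the function name parse_stock_line and the sample lines are illustrative only.

```python
def parse_stock_line(line):
    """Split 'name   quantity' into (name, quantity as int)."""
    # With no separator given, rsplit() treats any run of whitespace as one
    # delimiter; maxsplit=1 cuts only before the last field.
    name, quantity = line.strip().rsplit(None, 1)
    return name, int(quantity)


print(parse_stock_line("GB Haddock West       22572\n"))
print(parse_stock_line("GB Cod West   7207"))
```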

Python Error "TypeError: coercing to Unicode: need string or buffer, list found"

The purpose of this code is to make a program that searches for a person's name (on Wikipedia, specifically) and uses keywords to come up with reasons why that person is significant.
I'm having issues with this specific line "if fact_amount < 5 and (terms in sentence.lower()):" because I get this error ("TypeError: coercing to Unicode: need string or buffer, list found")
If you could offer some guidance it would be greatly appreciated, thank you.
import requests
import nltk
import re
#You will need to install requests and nltk
terms = ['pronounced',
         'was a significant',
         'major/considerable influence',
         'one of the (X) most important',
         'major figure',
         'earliest',
         'known as',
         'father of',
         'best known for',
         'was a major']
names = ["Nelson Mandela","Bill Gates","Steve Jobs","Lebron James"]
#List of people that you need to get info from
for name in names:
    print name
    print '==============='
    #Goes to the wikipedia page of the person
    r = requests.get('http://en.wikipedia.org/wiki/%s' % (name))
    #Parses the raw html into text
    raw = nltk.clean_html(r.text)
    #Tries to split each sentence.
    #sort of buggy though
    #For example St. Mary will split after St.
    sentences = re.split(r'[?!.][\s]*',raw)
    fact_amount = 0
    for sentence in sentences:
        #I noticed that important things came after 'he was' and 'she was'
        #Seems to work for my sample list
        #Also there may be buggy sentences, so I return 5 instead of 3
        if fact_amount < 5 and (terms in sentence.lower()):
            #remove the reference notation that wikipedia has
            #ex [ 33 ]
            sentence = re.sub(r'\[ [0-9]+ \]', '', sentence)
            #removes newlines
            sentence = re.sub('\n', '', sentence)
            #removes trailing and leading whitespace
            sentence = sentence.strip()
            fact_amount += 1
            #sentence is formatted. Print it out
            print sentence + '.'
    print
You should be checking it the other way
sentence.lower() in terms
terms is a list and sentence.lower() is a string. You can check whether a particular string is in a list, but you cannot check whether a list is in a string.
You might mean if any(t in sentence.lower() for t in terms), which checks whether any term from the terms list appears in the sentence string.
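The any() suggestion can be sketched on its own as follows. The helper name mentions_term, the shortened terms list, and the two sample sentences are assumptions for illustration; they are not taken from the program above. The sketch is written for Python 3.

```python
terms = ['was a significant', 'best known for', 'father of']


def mentions_term(sentence, terms):
    """Return True if any term occurs as a substring of the sentence."""
    lowered = sentence.lower()  # lowercase once instead of once per term
    return any(term in lowered for term in terms)


# Hypothetical sentences, only to exercise both outcomes.
print(mentions_term("He is best known for co-founding Microsoft", terms))
print(mentions_term("He was born in 1955", terms))
```

Each check here is string in string, which is the substring test the original condition was reaching for.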
