Function to convert a string to a tuple - python

Here is an example of what data I will have:
472747372 42 Lawyer John Legend Bishop
I want a function that takes a string like that and converts it into a tuple, split like so:
"472747372" "42" "Lawyer" "John Legend" "Bishop"
NI Number, Age, Job, surname and other names

What about:
>>> string = "472747372 42 Lawyer John Legend Bishop"
>>> string.split()[:3] + [' '.join(string.split()[3:5])] + [string.split()[-1]]
['472747372', '42', 'Lawyer', 'John Legend', 'Bishop']
Or:
>>> string.split(maxsplit=3)[:-1] + string.split(maxsplit=3)[-1].rsplit(maxsplit=1)
['472747372', '42', 'Lawyer', 'John Legend', 'Bishop']

In Python, str has a built-in method called split, which splits the string into a list on whatever separator you pass it. Its default is to split on whitespace, so you can simply do:
my_string = '472747372 42 Lawyer Hermin Shoop Tator'
tuple(my_string.split())
EDIT: After OP changed the post.
Assuming there will always be an NI Number, Age, Job, and surname, you would have to do:
elems = my_string.split()
tuple(elems[:3] + [' '.join(elems[3:5])] + elems[5:])
This supports an arbitrary number of "other" names after the surname; each one becomes its own element of the tuple. Note that it treats the surname as exactly two words.
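The slicing approach above can be wrapped in a function that returns a tuple, which is what the question asked for. This is only a sketch: the name parse_record is made up for illustration, and it assumes the surname is always exactly two words and that any remaining words should be kept together as a single "other names" field.

```python
def parse_record(record):
    """Split 'NI-number age job surname other-names' into a 5-tuple."""
    parts = record.split()
    if len(parts) < 5:
        raise ValueError("expected at least 5 whitespace-separated fields")
    ni_number, age, job = parts[:3]
    surname = " ".join(parts[3:5])     # assumption: the surname is two words
    other_names = " ".join(parts[5:])  # everything left over, as one field
    return ni_number, age, job, surname, other_names


print(parse_record("472747372 42 Lawyer John Legend Bishop"))
# ('472747372', '42', 'Lawyer', 'John Legend', 'Bishop')
```

If the surname can have a different number of words, whitespace alone cannot tell the fields apart, and the input would need a real delimiter.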

Related

Too many values to unpack (expected 2) while splitting string

I want to split strings at "(". This works fine when the string contains only one "(" character, but when it contains more than one, it raises ValueError: too many values to unpack.
data = 'The National Bank (US) (Bank)'
I've tried the below code:
name, inst = data.split("(")
Desired output:
name = 'The National Bank (US)'
inst = '(Bank)'
Your split method is splitting the input on both ( characters, giving you the result:
["The National Bank ", "US) ", "Bank)"]
You are then attempting to unpack this list of three values into two variables, name and inst. This is what the error "Too many values to unpack" means.
You can restrict the number of splits to be made using the second parameter to split, but this will give you the wrong result as well.
You actually want to split from the right of the string, on the first space character. You can do that with rsplit:
data = 'The National Bank (US) (Bank)'
name, inst = data.rsplit(' ', 1)
name and inst will now be set as you expect.
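To see why limiting the number of splits from the left does not help, it can be useful to compare split and rsplit on the same string. A small sketch using only the data string from the question (the variable left is just a name picked for the demonstration):

```python
data = 'The National Bank (US) (Bank)'

# Splitting from the left stops at the first "(", so "US)" lands in the
# second piece and the opening parenthesis is lost.
left = data.split("(", 1)
print(left)        # ['The National Bank ', 'US) (Bank)']

# Splitting once from the right, on the last space, gives the wanted pieces.
name, inst = data.rsplit(" ", 1)
print(name)        # The National Bank (US)
print(inst)        # (Bank)
```

This relies on there being a space before the final parenthesised group; without one, rsplit(" ", 1) would cut somewhere else.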
This is the expected behavior of split: when you split a string that contains n separators, you get n+1 strings back. For example:
l = '1,2,3,4'.split(',')
print(l)
print(type(l), len(l))
You can use rsplit with the maxsplit parameter like this, although you then have to prepend the leading ( to your inst string:
>>> name, inst = data.rsplit("(", maxsplit=1)
>>> name
'The National Bank (US) '
>>> inst
'Bank)'
You may be able to get slightly cleaner results by doing the same thing but passing a blank space as the delimiter:
>>> name, inst = data.rsplit(" ", maxsplit=1)
>>> name
'The National Bank (US)'
>>> inst
'(Bank)'
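If you would rather split on the parenthesis itself, one possible alternative to rsplit is str.rpartition, which also returns the separator, so putting the "(" back is straightforward. A sketch, assuming the last "(" always starts the inst part:

```python
data = 'The National Bank (US) (Bank)'

# rpartition splits on the LAST "(" and returns (before, separator, after).
head, sep, tail = data.rpartition("(")
name = head.rstrip()   # drop the space left in front of the parenthesis
inst = sep + tail      # put the "(" back in front

print(name)   # The National Bank (US)
print(inst)   # (Bank)
```

When the string contains no "(" at all, rpartition returns two empty strings followed by the whole input, so name would be empty and inst would hold everything; check for that case if it can occur.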

Printing a formatted list [duplicate]

name1=input("Give 1st name: ")
name2=input("Give 2nd name: ")
name3=input("Give 3rd name: ")
names = [name1, name2, name3]
names = list(set(names))
names.sort()
print("names in alphabetical order: {}, {} and {}".format(*names))
My code gives the error "tuple index out of range" when two of the names are the same.
When inputs are like Ava, Benjamin and Charlie, I want the output to be like:
names in alphabetical order: Ava, Benjamin and Charlie
but when inputs are like Ava, Benjamin, Ava, I want the output to be like:
names in alphabetical order: Ava and Benjamin
The number of arguments passed to .format() must be at least the number of {} placeholders in the string you are formatting.
You can deal with this before the .format():
name1=input("Give 1st name: ")
name2=input("Give 2nd name: ")
name3=input("Give 3rd name: ")
names = [name1, name2, name3]
names = list(set(names))
names.sort()
first_names = ', '.join(names[:-1])
print("names in alphabetical order: {} and {}".format(first_names, names[-1]))
You can use ', '.join to join the names together with commas in between. But you also want "and" between the last two names instead of a comma. One solution is to join the last two names with "and" first, and then join the rest with commas.
def join_names(names):
    names = sorted(set(names))
    if len(names) > 1:
        last, second_last = names.pop(), names.pop()
        names.append(second_last + ' and ' + last)
    return ', '.join(names)
Examples:
>>> join_names(['Alice', 'Bob', 'Clive'])
'Alice, Bob and Clive'
>>> join_names(['Clive', 'Bob', 'Clive'])
'Bob and Clive'
>>> join_names(['Alice', 'Alice', 'Alice'])
'Alice'
>>> join_names(['John', 'Paul', 'George', 'Ringo'])
'George, John, Paul and Ringo'
You have exactly three placeholders ({}) in your format string, so if names has fewer than three elements there are not enough values to fill them, which causes the error.
Instead of plugging in the whole list, try this:
print('names in alphabetical order: ' + ', '.join(names))
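The mismatch between placeholders and values can be reproduced in a few lines. This sketch hard-codes the inputs from the question instead of calling input(); the variables error and joined exist only for the demonstration.

```python
# Duplicates collapse in the set, leaving two names instead of three.
names = sorted({"Ava", "Benjamin", "Ava"})

try:
    # Three placeholders, but only two values are unpacked into format().
    print("{}, {} and {}".format(*names))
except IndexError as exc:
    error = exc
    print("IndexError:", exc)

# join() does not care how many names are left.
joined = ", ".join(names)
print("names in alphabetical order: " + joined)
```

The exact wording of the IndexError message varies between Python versions, but the exception type is the same.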
A set keeps only unique elements, so duplicates are removed. You get "tuple index out of range" because the print line expects three elements, but only two remain once the duplicate is gone. If you want to use a set and don't know how many elements you will end up with, the best bet is to use join.
[Edit with code snippet which would work]:
name1=input("Give 1st name: ")
name2=input("Give 2nd name: ")
name3=input("Give 3rd name: ")
names = [name1, name2, name3]
names = list(set(names))
names.sort()
print("names in alphabetical order are: " + ", ".join(names))

Python - Easiest way to get whole list object in a row

l = [['John'],['rides'],['bike'],['with'],['Mr.','Brown'],]
Assume I have a list like that. How can I print its items in a row without using a for statement and appending to a string?
My desired output:
print("my sentence:" + ? )
print("people:" + ? + ", " + ?)
my sentence: John rides bike with Mr. Brown
people: John, Mr. Brown
MAJOR EDIT:
The reason I am asking is that I analyze text-based documents and extract information with named entities. My problem arises when I group sequential proper nouns, such as "name + surname" or "Blah Blah Organization", in a list like this:
[[u'Milletvekili', u'Ay\u015fe'], [u'oraya'], [u'Merve', u'Han\u0131m'], [u'ile'], [u'gitti'], [u'.']]
I have an algorithm which compares the list items with my datasets and determines the entity type. After the if statement decides that [u'Milletvekili', u'Ay\u015fe'] is a person name:
if gettype(arr[i][0].split("'")[0]) == "Özel İsim":
    newarr.append("[Person: " + str(arr[i][0]) + "]")
I append it to my new list, which becomes my output, like [Person: Mr. Brown]. But because I must group capitalized words, my output always looks like this:
[Person: [u'Milletvekili', u'Ay\u015fe']]
Not quite as elegant as the "sum" approach, but alternatively you can use str.join() and map() with functools.partial():
>>> from functools import partial
>>> " ".join(map(partial(str.join, " "), l))
'John rides bike with Mr. Brown'
Borrowing from this answer:
" ".join(sum(l, []))
Out[130]: 'John rides bike with Mr. Brown'
Flatten the list, then join with ' '.
>>> ' '.join(s for sub in l for s in sub)
'John rides bike with Mr. Brown'
Or with itertools.chain:
>>> from itertools import chain
>>> ' '.join(chain(*l))
'John rides bike with Mr. Brown'
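The answers above cover the sentence; the bracketed label from the edit can be handled the same way, by joining the words of a group instead of calling str() on the list. A sketch using the list from the question; it assumes the last group is the one already identified as a person, and the names sentence, groups and label are picked for illustration.

```python
from itertools import chain

l = [['John'], ['rides'], ['bike'], ['with'], ['Mr.', 'Brown']]

# Flatten one level and join: chain.from_iterable avoids unpacking with *.
sentence = " ".join(chain.from_iterable(l))
print("my sentence: " + sentence)

# Join each group on its own, so a multi-word entity becomes one string.
groups = [" ".join(group) for group in l]

# str(['Mr.', 'Brown']) would give the list repr; joining gives plain text.
label = "[Person: " + groups[-1] + "]"
print(label)
```

Which groups count as people still has to come from the entity-type check in the question; this only shows how to format a group once it has been chosen.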

Splitting a string of names

I'm trying to write a program that will ask a user to input several names, separated by a semi-colon. The names would be entered as lastname,firstname. The program would then print each name in a firstname lastname format on separate lines. So far, my program is:
def main():
    names=input("Please enter your list of names: ")
    person=names.split(";")
    xname=person.split(",")
This is as far as I got, because there's an error when I try to split on the comma. What am I doing wrong? The output should look like this:
Please enter your list of names: Falcon, Claudio; Ford, Eric; Owen, Megan; Rogers, Josh; St. John, Katherine
You entered:
Claudio Falcon
Eric Ford
Megan Owen
Josh Rogers
Katherine St. John
.split is a string method that returns a list of strings. So it works fine on splitting the original string of names, but you can't call it on the resulting list (list doesn't have a .split method, and that really wouldn't make sense). So you need to call .split on each of the strings in the list. And to be neat, you should clean up any leading or trailing spaces on the names. Like this:
names = "Falcon, Claudio; Ford, Eric; Owen, Megan; Rogers, Josh; St. John, Katherine"
for name in names.split(';'):
    last, first = name.split(',')
    print(first.strip(), last.strip())
output
Claudio Falcon
Eric Ford
Megan Owen
Josh Rogers
Katherine St. John
.split returns a list, so you are attempting
["Falcon, Claudio", "Ford, Eric" ...].split(',')
Which obviously doesn't work, as split is a string method. Try this:
full_names = []
for name in names.split("; "):
    last, first = name.split(', ')
    full_names.append(first + " " + last)
To give you
['Claudio Falcon', 'Eric Ford', 'Megan Owen', 'Josh Rogers', 'Katherine St. John']
You are splitting the whole list instead of each string. Change it to this:
def main():
    names=input("Please enter your list of names: ")
    person=names.split("; ")
    xname=[x.split(", ") for x in person]
To print it out, do this:
print("\n".join([" ".join(x[::-1]) for x in xname]))
You can use the following code:
names = input("Please enter your list of names:")
data = names.split(";")
data is now a list, so process that list to get each first name and last name:
f_names = []
for i in data:
    l_name, f_name = i.split(",")
    # strip the spaces left around each part after splitting
    f_names.append(f_name.strip() + " " + l_name.strip())
print("You entered:\n" + "\n".join(f_names))
This way you can print the desired output.
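The answers above all unpack the result of split(","), which raises ValueError if an entry has no comma or more than one. A slightly more forgiving sketch uses str.partition instead; the function name reorder_names is hypothetical, and the choice to skip empty entries and pass comma-less entries through unchanged is an assumption about what the program should do.

```python
def reorder_names(raw):
    """Turn 'Last, First; Last, First' into a list of 'First Last' strings."""
    result = []
    for entry in raw.split(";"):
        if not entry.strip():
            continue  # skip empty pieces, e.g. from a trailing ";"
        # partition splits on the first comma only and never raises.
        last, _, first = entry.partition(",")
        result.append((first.strip() + " " + last.strip()).strip())
    return result


names = "Falcon, Claudio; Ford, Eric; Owen, Megan; Rogers, Josh; St. John, Katherine"
print("You entered:")
print("\n".join(reorder_names(names)))
```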

What is efficient way to match words in string?

Example:
names = ['James John', 'Robert David', 'Paul' ... the list has 5K items]
text1 = 'I saw James today'
text2 = 'I saw James John today'
text3 = 'I met Paul'
is_name_in_text(text1,names) # this returns False; 'James' is not in the list
is_name_in_text(text2,names) # this returns 'James John'
is_name_in_text(text3,names) # this returns 'Paul'
is_name_in_text() checks whether any name from the list appears in the text.
The easy way is to just check each name with the in operator, but the list has 5,000 items, so that is not efficient. I could split the text into words and check whether each word is in the list, but this is not going to work when a name consists of more than one word. Line number 7 will fail in this case.
Make names into a set and use the in-operator for fast O(1) lookup.
You can use a regex to parse out the possible names in a sentence:
>>> import re
>>> findnames = re.compile(r'([A-Z]\w*(?:\s[A-Z]\w*)?)')
>>> def is_name_in_text(text, names):
...     for possible_name in set(findnames.findall(text)):
...         if possible_name in names:
...             return possible_name
...     return False
>>> names = set(['James John', 'Robert David', 'Paul'])
>>> is_name_in_text('I saw James today', names)
False
>>> is_name_in_text('I saw James John today', names)
'James John'
>>> is_name_in_text('I met Paul', names)
'Paul'
Build a regular expression with all the alternatives. This way you don't have to worry about somehow pulling the names out of the phrases beforehand.
import re
names_re = re.compile(r'\b' +
                      r'\b|\b'.join(re.escape(name) for name in names) +
                      r'\b')
print(names_re.search('I saw James today'))
You may use Python's set in order to get good performance while using the in operator.
If you have a mechanism of pulling the names out of the phrases and don't need to worry about partial matches (the full name will always be in the string), you can use a set rather than a list.
Your code is exactly the same, with this addition at line 2:
names = set(names)
The in operation will now function much faster.
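One way to combine the set lookup with multi-word names, without a regex, is to test every run of adjacent words in the text against the set. This is a sketch rather than a tuned solution: make_name_finder is a hypothetical helper, words are split on whitespace only, and punctuation attached to a word (such as "Paul.") would prevent a match.

```python
def make_name_finder(names):
    """Build the set once and return a lookup function for texts."""
    name_set = set(names)
    # The longest name, counted in words, bounds how many adjacent words to try.
    max_words = max((len(name.split()) for name in name_set), default=0)

    def is_name_in_text(text):
        words = text.split()
        for size in range(max_words, 0, -1):        # try longer runs first
            for start in range(len(words) - size + 1):
                candidate = " ".join(words[start:start + size])
                if candidate in name_set:           # O(1) average set lookup
                    return candidate
        return False

    return is_name_in_text


find_name = make_name_finder(['James John', 'Robert David', 'Paul'])
print(find_name('I saw James today'))       # False
print(find_name('I saw James John today'))  # James John
print(find_name('I met Paul'))              # Paul
```

The work per text depends on the number of words in the text and the longest name, not on how many names are in the list, which is the point of switching from a list to a set.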
