Split an element at whitespace in python 2 - python

I have a text dump of letters and numbers, and I want to filter out only valid credit card numbers (for class, I swear). I used
for item in content:
    nums.append(re.sub('[^0-9]', ' ', item))
to take out everything that isn't a number, so I have a list of elements that are numbers with white space in the middle. If I don't turn the non-int characters into spaces, the numbers end up concatenated so the lengths are wrong. I want to split each element into a new element at the whitespace.
Here's a screenshot of part of the Sample output, since I can't copy it without python turning every group of multiple spaces into a single space: https://gyazo.com/4db8b8b78be428c6b9ad7e2c552454af
I want to make a new element every time there is one or more spaces. I tried:
for item in nums:
    for char in item:
        char.split()
and
for item in nums:
    item.split()
but that ended up not changing anything.

split() doesn't mutate the string; it returns a new list of strings. If you call it without storing the result, as in your example, the result is simply discarded. Store what split() returns in a new list:
>>> nums = ['1231 34 42 432', '12 345345 7686', '234234 45646 435']
>>> result = []
>>> for item in nums:
...     result.extend(item.split())
...
>>> result
['1231', '34', '42', '432', '12', '345345', '7686', '234234', '45646', '435']
Alternatively, you could use a list comprehension to do the same in one line:
>>> [x for item in nums for x in item.split()]
['1231', '34', '42', '432', '12', '345345', '7686', '234234', '45646', '435']
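If the goal is only to get each run of digits as its own element, the substitute-then-split step can probably be collapsed into one call. The following is a minimal sketch, not the asker's code: the `content` list is made-up sample data, and `re.findall` is swapped in for the `re.sub` plus `split()` combination.

```python
import re

# Hypothetical sample lines standing in for the text dump
content = ["abc1231xx34 42", "4111111111111111zz12"]

# re.findall(r"\d+", line) returns every run of consecutive digits as its
# own string, so no space-padded intermediate string is needed
groups = [run for line in content for run in re.findall(r"\d+", line)]

print(groups)
```

Each element of `groups` is still a string, so a length check such as `len(run) == 16` can be applied directly when filtering for candidate card numbers.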

Related

Why does adding an index to the end of str.split() method return the value of the index?

I am curious as to why adding [0] to the end of version.split('.') is able to return the value of the first element in the list?
My code:
version = '6.1.9'
version.split('.')[0]
Because version.split('.') returns the list ['6','1','9'] and '6' is the first element.
x.split(".") splits your string on the . character and returns a list of the pieces. In your case the list is ['6', '1', '9']. Lists are indexed from 0, so your_list[0] accesses the first element of the list, which is '6' in your case. Here is a small example that may make this clearer.
Code:
version = "6.1.9"
split_version = version.split(".")
print("List after split: {}".format(split_version))
print("Zero element: {}".format(split_version[0]))
Output:
$ python3 test.py
List after split: ['6', '1', '9']
Zero element: 6
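The same idea applies to any expression that evaluates to a list: the index is applied to whatever the call returns. As a small sketch (the variable names `parts`, `major`, `minor` and `patch` are only illustrative), the list can also be unpacked in one step when the number of pieces is known:

```python
version = "6.1.9"

parts = version.split(".")      # ['6', '1', '9']
first = parts[0]                # same value as version.split(".")[0]

# Unpacking works because split() returned exactly three items here;
# int() converts each piece from str to a number
major, minor, patch = (int(p) for p in parts)

print(first, major, minor, patch)
```

Unpacking raises a ValueError if the string does not contain exactly three dot-separated pieces, so indexing is the safer choice when the format is not guaranteed.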

How to understand the result of list comprehension of nested lists when the order is reversed?

I'm trying to extract numbers that are mixed in sentences. I am doing this by splitting the sentence into elements of a list, and then I will iterate through each character of each element to find the numbers. For example:
String = "is2 Thi1s T4est 3a"
LP = String.split()
result = []
for e in LP:
    for i in e:
        if i in ('123456789'):
            result += i
This gives me the result I want, which is ['2', '1', '4', '3']. Now I want to write this as a list comprehension. After reading the post "List comprehension on a nested list?", I understood that the right code should be:
[i for e in LP for i in e if i in ('123456789') ]
My original code for the list comprehension approach was wrong, but I'm trying to wrap my head around the result I get from it.
My original incorrect code, which reversed the order:
[i for i in e for e in LP if i in ('123456789') ]
The result I get from that is:
['3', '3', '3', '3']
Could anyone explain the process that leads to this result please?
Apply the process you found in the other post in reverse: write the for clauses out as nested loops, in the same left-to-right order:
for i in e:
    for e in LP:
        if i in ('123456789'):
            print(i)
The code requires both e and LP to be set beforehand, so the outcome you see depends entirely on other code run before your list comprehension.
If we presume that e was set to '3a' (the last element in LP, left over from your earlier code that ran the full loops), then for i in e will run twice, first with i set to '3'. We then get a nested loop, for e in LP, and given your output, LP is 4 elements long. So that iterates 4 times, and in each iteration i == '3', so the if test passes and '3' is added to the output. The next iteration of for i in e: sets i = 'a'; the inner loop runs 4 times again, but now the if test fails.
However, we can't know for certain, because we don't know what code was run last in your environment that set e and LP to begin with.
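The scenario described above can be reproduced directly. This is only a sketch under the stated assumption that `e` still held `'3a'` when the comprehension ran; the names `reversed_result`, `expanded` and `_word` are introduced here for illustration.

```python
LP = "is2 Thi1s T4est 3a".split()
e = "3a"  # assumption: left over from an earlier "for e in LP" loop

# The reversed comprehension: the first iterable (e) is evaluated once,
# up front, before the inner "for e in LP" starts rebinding the name
reversed_result = [i for i in e for e in LP if i in "123456789"]

# The same logic written as explicit loops
expanded = []
for i in "3a":              # the value e had when the comprehension started
    for _word in LP:        # runs len(LP) == 4 times for each character
        if i in "123456789":
            expanded.append(i)

print(reversed_result)
print(expanded)
```

Without the `e = "3a"` line, the comprehension would raise a NameError in a fresh session, which is consistent with the point that the result depends on earlier code.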
I'm not sure why your original code uses str.split(), then iterates over all the characters of each word. Whitespace would never pass your if filter anyway, so you could just loop directly over the full String value. The if test can be replaced with a str.isdigit() test:
digits = [char for char in String if char.isdigit()]
or even a regular expression:
digits = re.findall(r'\d', String)
and finally, if this is a reordering puzzle, you'd want to split out your strings into a number (for ordering) and the remainder (for joining); sort the words on the extracted number, and extract the remainder after sorting:
import re

# to sort on numbers, extract the digits and turn to an integer
sortkey = lambda w: int(re.search(r'\d+', w).group())
# 'is2' -> 2, 'Thi1s' -> 1, etc.
# sort the words by sort key
reordered = sorted(String.split(), key=sortkey)
# -> ['Thi1s', 'is2', '3a', 'T4est']
# replace digits in the words and join again
rejoined = ' '.join(re.sub(r'\d+', '', w) for w in reordered)
# -> 'This is a Test'
From the question you asked in a comment ("how would you proceed to reorder the words using the list that we got as index?"):
We can use custom sorting to accomplish this. (Note that regex is not required, but makes it slightly simpler. Use any method to extract the number out of the string.)
import re
test_string = 'is2 Thi1s T4est 3a'
words = test_string.split()
words.sort(key=lambda s: int(re.search(r'\d+', s).group()))
print(words) # ['Thi1s', 'is2', '3a', 'T4est']
To remove the numbers:
words = [re.sub(r'\d', '', w) for w in words]
Final output is:
['This', 'is', 'a', 'Test']

How to input negative numbers into a list

I need some help: whenever I input a negative number, the list treats the minus sign as a separate element, so once it gets to sorting it puts all the minus signs at the beginning. The end goal of the code is to sort 2 merged lists without using the default sort functions. Also, if there is a better way to get rid of spaces in a list I would appreciate it, since at the moment I have to convert the list to a string and replace/strip the extra elements that the spaces cause.
list1 = list(input())
list2 = list(input())
mergelist = list1 + list2
print(mergelist)
def bubble_sort(X):
    nums = list(X)
    for i in range(len(X)):
        for j in range(i+1, len(X)):
            if X[j] < X[i]:
                X[j], X[i] = X[i], X[j]
    return X
mergelist = bubble_sort(mergelist)
strmergelist = str(mergelist)
strmergelist = strmergelist.replace("'", '')
strmergelist = strmergelist.replace(",", '')
strmergelist = strmergelist.strip('[]')
strmergelist = strmergelist.strip()
print(strmergelist)
The output for lists with no negatives is:
1 2 3 4 4 5 5
However with negatives it becomes:
- - - - 1 2 3 3 4 4 5
and my first print function to just check the merging of the lists looks like this when I input any negatives (ignore the spaces since I attempt to remove them later):
['1', ' ', '-', '2', ' ', '3', '3', ' ', '-', '4', ' ', '-', '4', ' ', '-', '5']
list() doesn't parse a string to a list of integers, it turns an iterable of items into a list of items.
To read a list from the console, try something like:
def read_list():
    """
    read a list of integers from stdin
    """
    return list(map(int, input().split()))
list1 = read_list()
list2 = read_list()
input().split() reads one line of user input and splits it on whitespace, basically into words.
int() can convert a string to an integer.
map(int, ...) returns an iterable which applies int() to each "word" of the user input.
The final call to list() will turn the iterable into a list.
This should handle negative numbers as well.
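To make the difference concrete without typing at a prompt, here is a small sketch that uses a fixed string in place of `input()`. The names `line`, `as_chars` and `as_ints` are only illustrative.

```python
line = "1 -2 33 -4"   # stands in for what input() would return

# list() on a string yields one element per character,
# so the minus signs and spaces become elements of their own
as_chars = list(line)

# split() first cuts the line into whitespace-separated words,
# and int() understands a leading minus sign in each word
as_ints = list(map(int, line.split()))

print(as_chars)
print(as_ints)
```

Because `as_ints` holds real integers, comparisons in a hand-written sort compare numeric values rather than character codes.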
Additionally, I see that you want to print the resulting list without extra characters. Since the list now holds integers, convert them to strings before joining. I recommend this:
print(' '.join(map(str, mergelist)))

Python issue with list and join function

How do I append two-digit integers to a list using a for loop without splitting them into digits? For example, I give the computer 10,14,13,15 and I get something like 1,0,1,4,1,3,1,5. I tried to work around this, but I ended up with a new issue: TypeError: sequence item 0: expected string, int found
def GetNumbers(List):
    q=[]
    Numberlist = []
    for i in List:
        if i.isdigit():
            q.append(int(i))
        else:
            Numberlist.append(''.join(q[:]))
            del q[:]
    return Numberlist
The ideal way is to use str.split():
>>> my_num_string = "10,14,13,15"
>>> my_num_string.split(',')
['10', '14', '13', '15']
But since you mentioned you cannot use split(), you may use a regular expression to extract the numbers from the string:
>>> import re
>>> re.findall(r'\d+', my_num_string)
['10', '14', '13', '15']
Otherwise, if you do not want to use anything fancy, you can do it with a simple for loop:
num_str, num_list = '', []
#  ^ Needed for storing the state of the number while iterating over
#    the string character by character
for c in my_num_string:
    if c.isdigit():
        num_str += c
    else:
        num_list.append(num_str)
        num_str = ''
if num_str:
    # the last number has no trailing delimiter, so append it after the loop
    num_list.append(num_str)
The numbers in num_list will be in the form of str. In order to convert them to int, you may explicitly convert them as:
num_list = [int(i) for i in num_list] # OR, list(map(int, num_list))
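The TypeError in the question most likely comes from calling ''.join() on a list of ints, since join() accepts only strings. A possible rework of the asker's function is sketched below; `get_numbers` is a hypothetical name, and the approach (keep digit characters as strings, join them, then convert) is one option among several.

```python
def get_numbers(text):
    """Collect runs of digit characters from text and return them as ints."""
    digits = []    # characters of the number currently being read (kept as str)
    numbers = []
    for ch in text:
        if ch.isdigit():
            digits.append(ch)              # keep str here; ''.join() rejects ints
        elif digits:
            numbers.append(int("".join(digits)))
            digits = []
    if digits:                             # flush the final number after the loop
        numbers.append(int("".join(digits)))
    return numbers


print(get_numbers("10,14,13,15"))
```

The `elif digits` check skips empty runs, so consecutive delimiters such as "10,,14" do not produce empty entries.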

How to convert strings to ints in a nested list? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to convert strings into integers in python?
listy = [['1', '2', '3', '4', '5'], "abc"]
for item in listy[0]:
    int(item)
print(listy)
In a nested list, how can I change all those strings to ints? What's above gives me an output of:
[['1', '2', '3', '4', '5'], 'abc']
Why is that?
Thanks in advance!
You need to assign the converted items back to the sub-list (listy[0]):
listy[0][:] = [int(x) for x in listy[0]]
Explanation:
for item in listy[0]:
    int(item)
The above iterates over the items in the sub-list and converts them to integers, but it does not assign the result of the expression int(item) to anything. Therefore the result is lost.
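[int(x) for x in listy[0]] is a list comprehension (a kind of shorthand for your for loop) that iterates over the list, converts each item to an integer and returns a new list. The slice assignment listy[0][:] = ... then replaces the contents of the existing sub-list in place. The slice is optional: a plain listy[0] = ... would also work, by making the outer list refer to the new list instead.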
This is a very custom solution for your specific question. A more general solution involves recursion to get at the sub-lists, and some way of detecting the candidates for numeric conversion.
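One possible shape for such a general solution is sketched below. It is not from the original answer: `to_ints` is a hypothetical helper, and "candidate for numeric conversion" is assumed here to mean "a string that int() accepts".

```python
def to_ints(value):
    """Return a copy of value with every int-like string converted to int."""
    if isinstance(value, list):
        # recurse into sub-lists, however deeply they are nested
        return [to_ints(v) for v in value]
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return value        # not numeric, e.g. "abc": leave unchanged
    return value                # anything else is passed through untouched


listy = [['1', '2', '3', '4', '5'], "abc"]
converted = to_ints(listy)
print(converted)
```

This version builds a new nested list rather than modifying `listy` in place, which keeps the original data available for comparison.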
