Extract 8-digit numbers from a list of strings

I have a list of strings which may contain letters, symbols, digits, etc, as below:
list = ['\n', '', '0', '38059', '', '', '?_', '71229366', '', '1', '38059', '', '', '?_', '87640804', '', '2', '38059', '', '', '?_', '71758011', '', '', ':?', ';__', '71229366287640804271758011287169822']
How do I filter out all other strings, keeping only the numbers from 10000000 to 99999999 (that is, the 8-digit numbers)?
Expected Output:
list = ['71229366', '87640804', '71758011']

You can use a map and filter.
your_list = ['\n', '', '0', '38059', '', '', '?_', '71229366', '', '1', '38059',
'', '', '?_', '87640804', '', '2', '38059', '', '', '?_', '71758011',
'', '', ':?', ';__', '71229366287640804271758011287169822']
new_list = list(map(int, filter(lambda x: x.isdigit() and 10000000 <= int(x) <= 99999999, your_list)))
print(new_list)
The list() call is optional on Python 2, where map and filter already return lists.
Output:
[71229366, 87640804, 71758011]
If you don't want the conversion to integer, drop the map:
>>> list(filter(lambda x: x.isdigit() and 10000000 <= int(x) <= 99999999, your_list))
['71229366', '87640804', '71758011']
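One caveat with str.isdigit(): it also returns True for some characters, such as the superscript '²', that int() cannot parse, so the lambda above can raise ValueError on unusual input. A minimal sketch of a stricter check, assuming inclusive bounds for 8-digit numbers; the helper name is_eight_digit_number is made up for illustration:

```python
def is_eight_digit_number(s):
    # str.isdecimal() only accepts characters that int() can parse,
    # unlike str.isdigit(), which also accepts e.g. the superscript '²'.
    return s.isdecimal() and 10000000 <= int(s) <= 99999999

# Made-up sample input, including the two boundary values
sample = ['\n', '', '38059', '?_', '71229366', '²', '10000000', '99999999',
          '71229366287640804271758011287169822']
result = [s for s in sample if is_eight_digit_number(s)]
print(result)  # ['71229366', '10000000', '99999999']
```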

If you don't mind making a new list, you can use just a list comprehension (your_list is the input list; avoid naming it list, which shadows the built-in):
filtered_list = [i for i in your_list if i.isdigit() and 10000000 <= int(i) <= 99999999]

def valid(v):
    try:
        value = int(v)
    except (TypeError, ValueError):
        # not parseable as an integer
        return False
    return 10000000 <= value <= 99999999
output = [x for x in your_list if valid(x)]
Explanation:
Filter all values in the list using the valid function as your criterion.

data = ['\n', '', '0', '38059', '', '', '?_', '71229366', '', '1', '38059',
'', '', '?_', '87640804', '', '2', '38059', '', '', '?_', '71758011',
'', '', ':?', ';__', '71229366287640804271758011287169822']
res = []
for e in data:
    try:
        number = int(e)
    except ValueError:
        continue
    if 10000000 <= number <= 99999999:
        res.append(str(number))
print(res)
Output:
['71229366', '87640804', '71758011']

Let me provide a simple and efficient answer, using regular expressions. There's no need to map (duplicating the original list), or to convert everything to ints; you are basically asking how to keep all 8-digit integers in your list:
>>> import re
>>> list(filter(re.compile(r'^\d{8}$').match, data))
['71229366', '87640804', '71758011']
We compile a regular expression that matches exactly 8 digits, then filter the list by passing the compiled pattern's bound match method to the built-in filter function. On Python 3, filter returns an iterator, hence the list() call; on Python 2 it already returns a list.
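Two details are worth knowing about this pattern. The $ anchor also matches just before a trailing newline, so '12345678\n' would pass, and a purely textual match keeps strings with leading zeros such as '00012345', which a numeric range check would reject. A sketch that uses fullmatch to avoid the first issue; the names EIGHT_DIGITS and keep_eight_digit are hypothetical:

```python
import re

EIGHT_DIGITS = re.compile(r'\d{8}')

def keep_eight_digit(strings):
    # fullmatch requires the whole string to be exactly 8 digits,
    # so a trailing newline is not tolerated.
    return [s for s in strings if EIGHT_DIGITS.fullmatch(s)]

# Made-up sample input
sample = ['\n', '38059', '71229366', '87640804\n', '00012345', '712293662']
print(keep_eight_digit(sample))  # ['71229366', '00012345']
```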

Related

Get an element from a list, from groups of 3

Hello, I have a list in which the elements come in groups of 3, as given below:
labels = ['', '', '5000','', '2', '','', '', '1000','mm-dd-yy', '', '','', '', '15','dd/mm/yy', '', '', '', '3', '','', '', '200','', '2', '','mm-dd-yy', '', '','', '', '','', '', '']
In the above list the elements come in groups of 3, i.e. ('', '', '5000') is the first group, ('', '2', '') the second, ('mm-dd-yy', '', '') the fourth, and so on.
Now I want to check every group of 3 in the list and get the element which is not blank.
('', '', '5000') gives '5000'
('', '2', '') gives '2'
('mm-dd-yy', '', '') gives 'mm-dd-yy'
If all three are blank, it should return blank, i.e.
('', '', '') gives '', like the last 2 groups in the list.
So from the above list my output should be:
required_list = ['5000','2','1000','mm-dd-yy','15','dd/mm/yy','3','200','2','mm-dd-yy','','']

Since the group size is fixed at 3, you can use a for loop that steps through the indices with range(start, end, step) and joins the three elements of each group:
labels = ['', '', '5000','', '2', '','', '', '1000','mm-dd-yy', '', '','', '', '15','dd/mm/yy', '', '', '', '3', '','', '', '200','', '2', '','mm-dd-yy', '', '','', '', '','', '', '']
res1 = []
for i in range(0, len(labels), 3):
    res1.append(labels[i] + labels[i+1] + labels[i+2])
print(res1)
# The same thing as a list comprehension
res2 = [labels[i] + labels[i+1] + labels[i+2] for i in range(0, len(labels), 3)]
print(res2)
Output:
['5000', '2', '1000', 'mm-dd-yy', '15', 'dd/mm/yy', '3', '200', '2', 'mm-dd-yy', '', '']
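Note that concatenation only gives the intended result because each group holds at most one non-empty string, and labels[i+2] raises IndexError if the list length is not a multiple of 3. A small sketch that tolerates a short final group by joining slices instead, using a shortened, made-up input:

```python
labels = ['', '', '5000', '', '2', '', 'mm-dd-yy', '']  # last group has only 2 items

# Slicing never raises IndexError; a short final slice is simply joined as-is.
res = [''.join(labels[i:i+3]) for i in range(0, len(labels), 3)]
print(res)  # ['5000', '2', 'mm-dd-yy']
```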

I think this should give you the required result. It is not ideal for performance, but it gets the job done and should be pretty easy to follow.
labels = ['', '', '5000','', '2', '','', '', '1000','mm-dd-yy', '', '','', '', '15','dd/mm/yy', '', '', '', '3', '','', '', '200','', '2', '','mm-dd-yy', '', '','', '', '','', '', '']
def chunks(ls):
    # split ls into consecutive sublists of 3 elements
    result = []
    start = 0
    end = len(ls)
    step = 3
    for i in range(start, end, step):
        result.append(ls[i:i+step])
    return result
output = []
for chunk in chunks(labels):
    nonEmptyItems = [s for s in chunk if len(s) > 0]
    if len(nonEmptyItems) > 0:
        output.append(nonEmptyItems[0])
    else:
        output.append('')
print(output)

All the previous answers laboriously create a new list of triplets, then iterate on that list of triplets.
There is no need to create this intermediate list.
def gen_nonempty_in_triplets(labels):
    return [max(labels[i:i+3], key=len) for i in range(0, len(labels), 3)]
labels = ['', '', '5000','', '2', '','', '', '1000','mm-dd-yy', '', '','', '', '15','dd/mm/yy', '', '', '', '3', '','', '', '200','', '2', '','mm-dd-yy', '', '','', '', '','', '', '']
print(gen_nonempty_in_triplets(labels))
# ['5000', '2', '1000', 'mm-dd-yy', '15', 'dd/mm/yy', '3', '200', '2', 'mm-dd-yy', '', '']
Interestingly, there are many different ways to implement "get the element which is not blank".
I chose to use max(..., key=len) to select the longest string.
Almost every answer you received uses a different method!
Here are a few different methods that were suggested. They are equivalent when at most one element of the triplet is nonempty, but they behave differently if the triplet contains two or more nonempty elements.
# selects the longest string
max(labels[i:i+3], key=len)
# selects the first nonempty string
next((s for s in labels[i:i+3] if s), '')
# concatenates all three strings
labels[i]+labels[i+1]+labels[i+2]
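To make the difference concrete, here is a quick sketch applying the three methods to a made-up triplet that contains two non-empty strings (a case that does not occur in the question's data):

```python
triplet = ['ab', '', 'xyz']  # two non-empty strings, unlike the question's data

longest = max(triplet, key=len)              # selects the longest string
first = next((s for s in triplet if s), '')  # selects the first non-empty string
joined = ''.join(triplet)                    # concatenates all three strings

print(longest)  # xyz
print(first)    # ab
print(joined)   # abxyz
```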

Slice the list into groups of 3, then get the first non-empty element of each group with next.
labels = ['', '', '5000','', '2', '','', '', '1000','mm-dd-yy', '', '','', '', '15','dd/mm/yy', '', '', '', '3', '','', '', '200','', '2', '','mm-dd-yy', '', '','', '', '','', '', '']
length = len(labels)
list_by_3 = [labels[i:i+3] for i in range(0, length, 3)]
required_list = []
for triplet in list_by_3:
    required_list.append(
        # the generator expression needs its own parentheses when next() gets a default
        next((i for i in triplet if i), "")
    )
>>> required_list
['5000', '2', '1000', 'mm-dd-yy', '15', 'dd/mm/yy', '3', '200', '2', 'mm-dd-yy', '', '']

Removing Empty elements in a list and converting the remaining into floats

So, basically I have this list I got from the split() method:
mylist=['', '3', '', '', '7.00', '', '', '', '21.00']
and I want to remove the '' elements from my list and convert the remaining strings into floats. Note that the position and the number of '' elements may vary from string to string that I'm reading.

Use this:
mylist=['', '3', '', '', '7.00', '', '', '', '21.00']
clean_list = [float(i) for i in mylist if i !='']
print(clean_list)
[3.0, 7.0, 21.0]

result = [float(x) for x in mylist if x]
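Both comprehensions assume every non-empty element is a valid number. If the split can also leave whitespace-only strings behind (an assumption, since the question only shows ''), a slightly more defensive sketch strips each token first; the function name to_floats is hypothetical:

```python
def to_floats(tokens):
    result = []
    for token in tokens:
        token = token.strip()
        if not token:
            # skip empty and whitespace-only tokens
            continue
        result.append(float(token))
    return result

# Made-up input that mixes empty and whitespace-only tokens
mylist = ['', '3', ' ', '', '7.00', '', '21.00']
print(to_floats(mylist))  # [3.0, 7.0, 21.0]
```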

Updating a variable to the value of a string only returns the first character of the string

I'm trying to hard-code the major ticks for a plot by creating an array which I will then attach to the x-axis of the graph. However, I can't get the array to come out correctly. I created an array of empty strings, xticks, in which I want to set every 5th element to the corresponding value from major_ticks, but the updated values are only the first characters of the values in major_ticks.
import numpy as np
length_x = 21
xticks = np.full(length_x, '', dtype=str)
# print(xticks) returns ['' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '']
major_ticks = np.linspace(-10, 10, 5, dtype=int)
# print(major_ticks) returns [-10 -5 0 5 10]
i = 0
for j in range(len(xticks)):
    if j % 5 == 0:
        xticks[j] = str(major_ticks[i])
        i += 1
print(xticks)  # returns ['-' '' '' '' '' '-' '' '' '' '' '0' '' '' '' '' '5' '' '' '' '' '1']
please help me understand why this is happening, I've been banging my head against the wall for 3 hours now.

This happens because np.full with dtype=str and an empty fill value doesn't create an array of arbitrary-length strings, but an array of one-character strings:
np.full(length_x,'',dtype=str).dtype
dtype('<U1')
Typically I wouldn't recommend using numpy for string operations. Replacing xticks=np.full(length_x,'',dtype=str) with xticks = [''] * length_x will give you what you want.

I think there's something funky going on with your np.full declaration. Switching to using python lists will make it easier:
import numpy as np
length_x = 21
major_ticks = np.linspace(-10, 10, 5, dtype=int)
xticks = []
i = 0
for j in range(length_x):
    if j % 5 == 0:
        # every 5th position gets the next major tick label
        tick = str(major_ticks[i])
        i += 1
    else:
        tick = ''
    xticks.append(tick)
print(xticks)
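With a plain list, the loop can also be replaced by extended slice assignment. A short sketch, assuming the major ticks are written out as a literal list rather than computed with np.linspace:

```python
length_x = 21
# Assumed literal values, the same numbers np.linspace(-10, 10, 5, dtype=int) produces
major_ticks = [-10, -5, 0, 5, 10]

xticks = [''] * length_x
# Every 5th position receives one label; both sides must have the same length (5 here).
xticks[::5] = [str(t) for t in major_ticks]
print(xticks)
```

Unlike the fixed-width numpy array, list elements are ordinary Python strings, so '-10' is stored in full.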

In [129]: major_ticks=np.linspace(-10,10,5,dtype=int)
In [130]: major_ticks.shape
Out[130]: (5,)
In [133]: major_ticks
Out[133]: array([-10, -5, 0, 5, 10])
In [134]: major_ticks.astype(str)
Out[134]: array(['-10', '-5', '0', '5', '10'], dtype='<U21')
Making strings from major_ticks. 21 is bigger than needed, but who's counting?
In [135]: xticks=np.full(21,'',dtype='U21')
In [136]: xticks
Out[136]:
array(['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
'', '', '', ''], dtype='<U21')
In [138]: i=0
     ...: for j in range(len(xticks)):
     ...:     if j%5==0:
     ...:         xticks[j] = str(major_ticks[i])
     ...:         i+=1
     ...:
In [139]: xticks
Out[139]:
array(['-10', '', '', '', '', '-5', '', '', '', '', '0', '', '', '', '',
'5', '', '', '', '', '10'], dtype='<U21')
But we can fill the string array directly:
In [140]: xticks=np.full(21,'',dtype='U21')
In [141]: xticks[0::5] = major_ticks
In [142]: xticks
Out[142]:
array(['-10', '', '', '', '', '-5', '', '', '', '', '0', '', '', '', '',
'5', '', '', '', '', '10'], dtype='<U21')
The integers are converted to the string dtype as they are added to xticks.

It seems that dtype=str here results in a length limit of 1 character. Set an explicit width instead, e.g. dtype='U10' (or 'S10' for byte strings).

How to remove values from a list of list in Python

I have a list as
[['Name', 'Place', 'Batch'], ['11', '', 'BBI', '', '2015'], ['12', '', 'CCU', '', '', '', '', '2016'], ['13', '', '', 'BOM', '', '', '', '', '2017']]
I want to remove all the '' from the list.
The code I tried is :
>>> for line in list:
...     if line == '':
...         list.remove(line)
...
>>> print(list)
The output that it shows is:
[['Name', 'Place', 'Batch'], ['11', '', 'BBI', '', '2015'], ['12', '', 'CCU', '', '', '', '', '2016'], ['13', '', '', 'BOM', '', '', '', '', '2017']]
Can someone suggest what's wrong with this?

This is what you want:
a = [['Name', 'Place', 'Batch'], ['11', '', 'BBI', '', '2015'], ['12', '', 'CCU', '', '', '', '', '2016'], ['13', '', '', 'BOM', '', '', '', '', '2017']]
b = [[x for x in y if x != ''] for y in a] # kudos #Moses
print(b) # prints: [['Name', 'Place', 'Batch'], ['11', 'BBI', '2015'], ['12', 'CCU', '2016'], ['13', 'BOM', '2017']]
Your solution does not work because line is an entire sublist at every step, not an element of a sublist, so the comparison line == '' is never true.
It also seems that you are trying to modify the original list in place (without creating any additional variables). This is a noble cause, but removing items from a list while you are looping over it is not advisable: the loop can silently skip elements or otherwise behave in surprising ways.
Lastly, do not use the name list for a variable, since it shadows Python's built-in list type.
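If the goal really is to update the existing lists rather than build new ones, slice assignment replaces each sublist's contents while keeping the same list objects. A minimal sketch with shortened data; the variable names rows and second_row are assumptions:

```python
rows = [['Name', 'Place', 'Batch'],
        ['11', '', 'BBI', '', '2015'],
        ['12', '', 'CCU', '', '', '2016']]
second_row = rows[1]  # keep a reference to check identity afterwards

for row in rows:
    # row[:] = ... mutates the existing sublist instead of rebinding the name
    row[:] = [x for x in row if x != '']

print(rows)
print(second_row is rows[1])  # True: still the same list object
```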

You're running your test on the sublists not on the items they contain. And you'll need a nested for to do what you want. However removing items from a list with list.remove while iterating usually leads to unpredictable results.
You can however, use a list comprehension to filter out the empty strings:
r = [[i for i in x if i != ''] for x in lst]
#                       ^^ filter condition
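To see why removing while iterating is risky, here is a small demonstration on a flat list (the sample data is made up). The loop skips elements because the list shrinks underneath the iterator, so some empty strings survive:

```python
items = ['', '', 'a', '', '']

for item in items:
    if item == '':
        items.remove(item)  # shifts later elements left, so the iterator skips one

print(items)  # ['a', '', '']
```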

Filter function doesn't work in python 2.7

For some reason I can't get the filter function to work.
I'm trying to remove empty strings from a list. After reading Remove empty strings from a list of strings, I'm trying to utilize the filter function.
import csv
import itertools
importfile = raw_input("Enter Filename(without extension): ")
importfile += '.csv'
test=[]
#imports plant names, effector names, plant numbers and leaf numbers from csv file
with open(importfile) as csvfile:
    lijst = csv.reader(csvfile, delimiter=';', quotechar='|')
    for row in itertools.islice(lijst, 0, 4):
        test.append([row])
test1 = list(filter(None, test[3]))
print test1
This however returns:
[['leafs', '3', '', '', '', '', '', '', '', '', '', '']]
What am I doing wrong?

You have a list in a list, so filter(None, ...) is applied to the outer list, whose only item is a non-empty (truthy) list; the empty strings inside it are not affected. You can use, say, a nested list comprehension to reach into the inner list and filter out falsy objects:
lst = [['leafs', '3', '', '', '', '', '', '', '', '', '', '']]
test1 = [[x for x in i if x] for i in lst]
# [['leafs', '3']]

You filter a list of lists, where the inner item is a non-empty list.
>>> print filter(None, [['leafs', '3', '', '', '', '', '', '', '', '', '', '']])
[['leafs', '3', '', '', '', '', '', '', '', '', '', '']]
If you filter the inner list, the one that contains strings, everything works as expected:
>>> print filter(None, ['leafs', '3', '', '', '', '', '', '', '', '', '', ''])
['leafs', '3']
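On Python 3 the same idea applies, with the difference that filter returns a lazy iterator instead of a list, so it needs to be wrapped in list(). A sketch that cleans every row, using a made-up second row:

```python
rows = [['leafs', '3', '', '', ''], ['plant', '', '7', '']]

# filter(None, row) drops falsy items (here: empty strings) from each inner list
cleaned = [list(filter(None, row)) for row in rows]
print(cleaned)  # [['leafs', '3'], ['plant', '7']]
```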

I was indeed filtering a list of lists. The problem in my code was:
for row in itertools.islice(lijst, 0, 4):
    test.append([row])
This should be:
for row in itertools.islice(lijst, 0, 4):
    test.append(row)
