String transformation into clean list of int - python

Ideally I want to turn 100020630 into [100,020,630],
but so far I can only turn "100.020.630" into ["100","020","630"]:
def fulltotriple(x):
    X = x.split(".")
    return X
print(fulltotriple("192.123.010"))
For some additional context: my goal is to turn IP addresses into binary addresses, using this as a first step =)
Edit: I have not found any way on Stack Overflow of getting the list WITHOUT the quotes (" ") around each element.

Here's one approach using a list comprehension:
s = '100020630'
[s[i:i + 3] for i in range(0, len(s), 3)]
# ['100', '020', '630']
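A minimal sketch that applies the same slicing idea but converts each chunk to int, and also handles the dotted form from the question by splitting on "." instead. The helper name to_int_chunks is illustrative only. Note that int() drops leading zeros, so "020" becomes 20.

```python
def to_int_chunks(s, size=3):
    """Split s into fixed-size chunks and convert each one to int (hypothetical helper)."""
    return [int(s[i:i + size]) for i in range(0, len(s), size)]

print(to_int_chunks("100020630"))  # [100, 20, 630]

# For a dotted string, splitting on "." avoids any fixed-width assumption.
dotted = [int(part) for part in "192.123.010".split(".")]
print(dotted)  # [192, 123, 10]
```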

If you want to handle IP addresses, you are going about it the wrong way.
An IPv4 address is a 32-bit binary number, not a 9-digit decimal one. It is split into 4 sub-blocks (octets), like 192.168.0.1. BUT in decimal notation each block can have 1, 2, or 3 digits, so you cannot split the string at fixed positions. I recommend using the standard ipaddress module:
import ipaddress
a = '192.168.0.1'
ip = ipaddress.ip_address(a)
ip.packed
returns the packed binary format:
b'\xc0\xa8\x00\x01'
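If you want to convert your IPv4 address to a binary string, format each byte as 8 binary digits (bin() would drop the leading zeros of each octet, giving only 18 digits here):
''.join(format(i, '08b') for i in ip.packed)
It returns this 32-character string:
'11000000101010000000000000000001'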
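The same result can be obtained from the address as a whole integer. A minimal sketch, assuming IPv4 input and that "binary address" means either one 32-character string or a dotted form with 8 bits per octet:

```python
import ipaddress

ip = ipaddress.ip_address("192.168.0.1")

# int() gives the address as a single 32-bit integer; '032b' zero-pads it to 32 bits.
as_bits = format(int(ip), "032b")
print(as_bits)  # 11000000101010000000000000000001

# Per-octet version with dots, keeping each octet exactly 8 bits wide.
dotted_bits = ".".join(format(octet, "08b") for octet in ip.packed)
print(dotted_bits)  # 11000000.10101000.00000000.00000001
```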

You could use the wrap function from the standard textwrap module:
In [3]: s = "100020630"
In [4]: import textwrap
In [6]: textwrap.wrap(s, 3)
Out[6]: ['100', '020', '630']
Wraps the single paragraph in text (a string) so every line is at most width characters long. Returns a list of output lines, without final newlines.
If you want a list of ints:
[int(num) for num in textwrap.wrap(s, 3)]
Outputs (int() drops the leading zero, so 020 becomes 20):
[100, 20, 630]

You could use wrap, which is part of Python's standard textwrap module:
from textwrap import wrap
def fulltotriple(x):
    x = wrap(x, 3)
    return x
print(fulltotriple("100020630"))
Outputs:
['100', '020', '630']

You can use Python's standard library for this:
text = '100020630'
# using wrap
from textwrap import wrap
wrap(text, 3)
# ['100', '020', '630']
# using map/zip (list() is needed in Python 3, where map returns a lazy iterator)
list(map(''.join, zip(*[iter(text)]*3)))
# ['100', '020', '630']
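One caveat with the zip idiom: zip() stops at the shortest input, so a trailing partial chunk is silently dropped when the length is not a multiple of 3. A small sketch using itertools.zip_longest to keep it (the 7-character input is just an example):

```python
from itertools import zip_longest

text = "1000206"  # length is not a multiple of 3

# zip() stops as soon as the shared iterator runs out, so the trailing "6" is lost.
dropped = list(map("".join, zip(*[iter(text)] * 3)))
print(dropped)  # ['100', '020']

# zip_longest() pads the last group instead; fillvalue="" keeps the final chunk short.
padded = list(map("".join, zip_longest(*[iter(text)] * 3, fillvalue="")))
print(padded)  # ['100', '020', '6']
```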

Use a regex to find all non-overlapping matches of three digits, \d{3}:
import re
s = "100020630"
def fulltotriple(x):
    pattern = re.compile(r"\d{3}")
    return [int(found_match) for found_match in pattern.findall(x)]
print(fulltotriple(s))
Outputting:
[100, 20, 630]

def fulltotriple(data):
    result = []
    for i in range(0, len(data), 3):
        result.append(int(data[i:i + 3]))
    return result
print(fulltotriple("192123010"))
output:
[192, 123, 10]

Related

How can I check a string for two letters or more?

I am pulling data from a table that changes often using Python, and the method I am using is not ideal. What I would like to have is a method to pull all strings that contain at most one letter and leave out anything with two or more.
An example of data I might get:
115
19A6
HYS8
568
In this example, I would like to pull 115, 19A6, and 568.
Currently I am using the isdigit() method to check whether a string consists only of digits. This filters out every string containing a letter, including those with just one, which works for some purposes but is less than ideal.
Try this:
string_list = ["115", "19A6", "HYS8", "568"]
output_list = []
for item in string_list:  # go through the list of strings
    letter_counter = 0
    for letter in item:  # go through the characters of one string
        if not letter.isdigit():  # count every character that is not a digit
            letter_counter += 1
    if letter_counter < 2:  # strings with 2 or more non-digits are left out
        output_list.append(item)
print(output_list)
Output:
['115', '19A6', '568']
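A more compact variant of the same loop, using sum() over a generator expression. This sketch counts alphabetic characters with str.isalpha() rather than "anything that is not a digit", so punctuation such as "-" would not count as a letter; whether that is what you want depends on your data.

```python
data = ["115", "19A6", "HYS8", "568"]

# Keep strings with at most one alphabetic character.
result = [s for s in data if sum(ch.isalpha() for ch in s) <= 1]
print(result)  # ['115', '19A6', '568']
```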
Here is a one-liner with a regular expression:
import re
data = ["115", "19A6", "HYS8", "568"]
out = [string for string in data if len(re.sub(r"\d", "", string)) < 2]
print(out)
Output:
['115', '19A6', '568']
This is an excellent case for regular expressions (regex), which is available as the built-in re library.
The code below follows the logic:
Define the dataset. Two examples have been added to show that a string containing two alpha-characters is rejected.
Compile a character pattern to be matched. In this case: zero or more digits, followed by zero or one uppercase letter, ending with zero or more digits.
Use the filter function to detect matches in the data list and output as a list.
For example:
import re
data = ['115', '19A6', 'HYS8', '568', 'H', 'HI']
rexp = re.compile(r'^\d*[A-Z]{0,1}\d*$')
result = list(filter(rexp.match, data))
print(result)
Output:
['115', '19A6', '568', 'H']
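A variation on the regex above, sketched with re.fullmatch (Python 3.4+), which anchors at both ends so ^ and $ are not needed. The character class is widened to [A-Za-z] on the assumption that lowercase letters should count too; the extra '1a2' item is only there to show that.

```python
import re

data = ['115', '19A6', 'HYS8', '568', 'H', 'HI', '1a2']

# fullmatch requires the whole string to match;
# [A-Za-z]? accepts at most one letter of either case.
pattern = re.compile(r'\d*[A-Za-z]?\d*')
result = [s for s in data if pattern.fullmatch(s)]
print(result)  # ['115', '19A6', '568', 'H', '1a2']
```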
Another solution, without re using str.maketrans/str.translate:
lst = ["115", "19A6", "HYS8", "568"]
d = str.maketrans(dict.fromkeys(map(str, range(10)), ""))
out = [i for i in lst if len(i.translate(d)) < 2]
print(out)
Prints:
['115', '19A6', '568']
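This checks whether a string contains at least one digit (a is an example input):
a = "HYS8"
z = False
a = str(a)
for I in range(len(a)):
    if a[I].isdigit():
        z = True
        break
else:
    # runs only if the loop finished without break, i.e. no digit was found
    z = "no digit"
print(z)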
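The same digit check can be written more compactly with any() and a generator expression, which also stops at the first digit it finds. has_digit is an illustrative name, not from the original answer.

```python
def has_digit(a):
    """Return True if the string form of a contains at least one digit."""
    return any(ch.isdigit() for ch in str(a))

print(has_digit("HYS8"))  # True
print(has_digit("abc"))   # False
```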

isolate data from long string [duplicate]

Suppose I had a string
string1 = "498results should get"
Now I need to get only the integer value from the string, like 498. I don't want to use slicing here, because the number of digits may grow, as in these examples:
string2 = "49867results should get"
string3 = "497543results should get"
So I want to get only the integer values out of each string, exactly in the same order, i.e. 498, 49867, and 497543 from string1, string2, and string3 respectively.
Can anyone tell me how to do this in one or two lines?
>>> import re
>>> string1 = "498results should get"
>>> int(re.search(r'\d+', string1).group())
498
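One caveat: re.search() returns None when nothing matches, so calling .group() on it raises AttributeError. A minimal sketch of a hypothetical helper that returns None instead, using re.match(), which only looks at the start of the string:

```python
import re

def leading_int(s):
    """Return the integer at the start of s, or None if s does not start with digits."""
    m = re.match(r'\d+', s)  # match() only tries the beginning of the string
    return int(m.group()) if m else None

for s in ["498results should get", "49867results should get",
          "497543results should get", "results only"]:
    print(leading_int(s))
```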
If there are multiple integers in the string (in Python 3, map() is lazy, so wrap it in list()):
>>> list(map(int, re.findall(r'\d+', string1)))
[498]
An answer taken from ChristopheD here: https://stackoverflow.com/a/2500023/1225603
r = "456results string789"
s = ''.join(x for x in r if x.isdigit())
print(int(s))
456789
Here's your one-liner, without using any regular expressions, which can get expensive at times:
>>> ''.join(filter(str.isdigit, "1234GAgade5312djdl0"))
returns:
'123453120'
If you have multiple sets of numbers, this is another option:
>>> import re
>>> print(re.findall(r'\d+', 'xyz123abc456def789'))
['123', '456', '789']
It's no good for floating-point number strings, though.
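For floating-point strings, the pattern can allow an optional fractional part. This sketch only covers plain forms like 1.5 or 20 (no signs, exponents, or forms like .5); the sample text is made up for illustration.

```python
import re

text = "x1.5y20z3.25"

# \d+ followed by an optional ".digits" part; (?:...) is non-capturing,
# so findall() still returns whole matches rather than group contents.
numbers = [float(n) for n in re.findall(r'\d+(?:\.\d+)?', text)]
print(numbers)  # [1.5, 20.0, 3.25]
```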
Iterator version
>>> import re
>>> string1 = "498results should get"
>>> [int(x.group()) for x in re.finditer(r'\d+', string1)]
[498]
>>> import itertools
>>> int(''.join(itertools.takewhile(lambda s: s.isdigit(), string1)))
498
With Python 3.6, these two lines return a list (possibly empty):
[int(x) for x in re.findall(r'\d+', your_string)]
Similar to:
list(map(int, re.findall(r'\d+', your_string)))
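This approach uses a list comprehension: pass the string as an argument to the function and it returns a list of the integers in that string. It only finds numbers separated by whitespace, so it would not extract 498 from "498results".
def getIntegers(string):
    # isdecimal() accepts only characters that int() can parse
    numbers = [int(x) for x in string.split() if x.isdecimal()]
    return numbers
For example: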
print(getIntegers('this text contains some numbers like 3 5 and 7'))
Output
[3, 5, 7]
def function(string):
    final = ''
    for i in string:
        try:
            final += str(int(i))
        except ValueError:
            # stop at the first character that is not a digit
            break
    return int(final)
print(function("4983results should get"))
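Another option is to remove the spaces and then strip the trailing letters using rstrip with string.ascii_lowercase (the string of all lowercase letters):
import string
strings = ["498results should get", "49867results should get", "497543results should get"]
out = [int(s.replace(' ', '').rstrip(string.ascii_lowercase)) for s in strings]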
Output:
[498, 49867, 497543]
integerstring = ""
string1 = "498results should get"
for i in string1:
    if i.isdigit():
        integerstring = integerstring + i
print(integerstring)

Python split string every n char

I want to split a string every n characters and print it like this:
MISSISSIPPI => MI*SS*IS*SI*PP*I
I've written a program, but I don't know how to replace the commas with *. Here is the code:
n = int(input('chunk size'))
s = input('Add word')
r = [s[i:i+n] for i in range(0, len(s), n)]
print(r)
This is the output:
['MI', 'SS', 'IS', 'SI', 'PP', 'I']
but I want it to be like this:
MI*SS*IS*SI*PP*I
You could use str.join() for this:
>>> '*'.join(r)
'MI*SS*IS*SI*PP*I'
This iterates over the strings in r and joins them, inserting '*' between consecutive elements.
You could also use the re module (this pattern assumes chunks of 2 characters):
import re
r = '*'.join(re.findall('..|.$', s))
Output:
'MI*SS*IS*SI*PP*I'
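The '..|.$' pattern is tied to a chunk size of 2. A sketch generalizing it to any n with the quantifier .{1,n}, which greedily takes up to n characters so only the last chunk can be shorter. chunk_join_re is an illustrative name; note that '.' does not match newlines unless re.DOTALL is passed.

```python
import re

def chunk_join_re(s, n, sep="*"):
    """Split s into chunks of up to n characters and join them with sep."""
    return sep.join(re.findall('.{1,%d}' % n, s))

print(chunk_join_re("MISSISSIPPI", 2))  # MI*SS*IS*SI*PP*I
print(chunk_join_re("MISSISSIPPI", 3))  # MIS*SIS*SIP*PI
```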
From where you are, you only need to add one more line:
r = '*'.join(r)
so your program becomes:
n = int(input('chunk size'))
s = input('Add word')
r = [s[i:i+n] for i in range(0, len(s), n)]
r = '*'.join(r)
print(r)
Unpack it and then use a custom separator:
>>> print(*r, sep='*')
MI*SS*IS*SI*PP*I
If you want the brackets in the output, use string formatting instead.
>>> print('[{}]'.format('*'.join(r)))
[MI*SS*IS*SI*PP*I]
We can use the split and join methods of str:
x = 'MI*SS*IS*SI*PP*I'
xlist = x.split('*')
'*'.join(xlist)

In Python, how should one extract the second-last directory name in a path?

I have a string like the following:
/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore
How should I extract the "2.0.24" from this string? I'm not sure how to split the string using the slashes (in order to extract the second last element of the resultant list) and I'm not sure if this would be a good approach. What I have right now is the following:
"/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore".split("/RootCore")[0].split("AnalysisTop/")[1]
You can also do:
import os
x = "/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore"
os.path.split(os.path.split(x)[0])[1]
results in
'2.0.24'
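A pathlib-based alternative (Python 3.4+). This sketch uses PurePosixPath on the assumption that the path is always POSIX-style, so it behaves the same on any OS; parent.name is the name of the directory containing the last component.

```python
from pathlib import PurePosixPath

x = "/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore"
p = PurePosixPath(x)

print(p.parent.name)  # 2.0.24
print(p.parts[-2])    # 2.0.24
```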
'/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore'.split('/')[-2]
Cross-platform solution, splitting on the operating system's path separator:
import os
'your/path'.split(os.path.sep)[-2]
Just split on the / character, then take the second element from the end:
>>> x = "/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore"
>>> y = x.split('/')
>>> y[-2]
'2.0.24'
>>> path = "/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore"
>>> path_dirs = path.split("/")
>>> path_dirs
['', 'cvmfs', 'atlas.cern.ch', 'repo', 'sw', 'ASG', 'AnalysisTop', '2.0.24', 'RootCore']
>>> path_dirs[-2]
'2.0.24'
import re
str1 = "/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore"
t = re.findall("[0-9][.]*",str1)
print ("".join(t))
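This uses re.findall, which returns a list, so join() glues the pieces back together. Note that it only works because 2.0.24 is the only part of this path that contains digits.
Output:
2.0.24
The intermediate list t (from print(t)) is:
['2.', '0.', '2', '4']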
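A somewhat more targeted regex, sketched under the assumption that the version is a whole path component made of dot-separated digit groups (like 2.0.24), so digits elsewhere in the path will not be picked up:

```python
import re

path = "/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore"

# A component between slashes consisting of digits with at least one ".digits" part.
m = re.search(r'/(\d+(?:\.\d+)+)/', path)
version = m.group(1) if m else None
print(version)  # 2.0.24
```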

Python idiom: List comprehension with limit of items

I'm basically trying to do this (pseudo code, not valid python):
limit = 10
results = [xml_to_dict(artist) for artist in xml.findall('artist') while limit--]
So how could I code this in a concise and efficient way?
The XML file can contain anything between 0 and 50 artists, and I can't control how many to get at a time, and AFAIK, there's no XPATH expression to say something like "get me up to 10 nodes".
Thanks!
Are you using lxml? You could use XPath to limit the items in the query level, e.g.
>>> from lxml import etree
>>> from io import StringIO
>>> xml = etree.parse(StringIO('<foo><bar>1</bar><bar>2</bar><bar>4</bar><bar>8</bar></foo>'))
>>> [bar.text for bar in xml.xpath('bar[position()<=3]')]
['1', '2', '4']
You could also use itertools.islice to limit any iterable, e.g.
>>> from itertools import islice
>>> [bar.text for bar in islice(xml.iterfind('bar'), 3)]
['1', '2', '4']
>>> [bar.text for bar in islice(xml.iterfind('bar'), 5)]
['1', '2', '4', '8']
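The islice approach also works with the standard library's xml.etree.ElementTree, without lxml. A minimal sketch: the XML is generated inline for illustration, and xml_to_dict is a hypothetical stand-in for the question's function.

```python
import xml.etree.ElementTree as ET
from itertools import islice

# Build a sample document with 50 artists.
xml_text = "<artists>" + "".join(f'<artist name="a{i}"/>' for i in range(50)) + "</artists>"
root = ET.fromstring(xml_text)

def xml_to_dict(el):
    # Hypothetical stand-in for the question's xml_to_dict().
    return dict(el.attrib)

limit = 10
# islice stops after `limit` items, or earlier if there are fewer artists.
results = [xml_to_dict(artist) for artist in islice(root.iterfind("artist"), limit)]
print(len(results))  # 10
```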
Assuming that xml is an ElementTree object, the findall() method returns a list, so just slice that list:
limit = 10
limited_artists = xml.findall('artist')[:limit]
results = [xml_to_dict(artist) for artist in limited_artists]
For everyone else who found this question because they were trying to limit items returned from an infinite generator:
from itertools import takewhile
ltd = takewhile(lambda x: x[0] < MY_LIMIT, enumerate( MY_INFINITE_GENERATOR ))
# ^ This is still an iterator.
# If you want to materialize the items, e.g. in a list, do:
ltd_m = list( ltd )
# If you don't want the enumeration indices, you can strip them as usual:
ltd_no_enum = [ v for i,v in ltd_m ]
EDIT: Actually, islice is a much better option.
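For the infinite-generator case, islice replaces the takewhile/enumerate combination directly. A short sketch with an illustrative generator:

```python
from itertools import count, islice

def squares():
    # An infinite generator (illustrative).
    for n in count():
        yield n * n

first_five = list(islice(squares(), 5))
print(first_five)  # [0, 1, 4, 9, 16]
```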
limit = 10
limited_artists = [artist for artist in xml.findall('artist')][:limit]
results = [xml_to_dict(artist) for artist in limited_artists]
This avoids the issues of slicing: it doesn't change the order of operations, and doesn't construct a new list, which can matter for large lists if you're filtering the list comprehension.
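def first(it, count):
    it = iter(it)
    for _ in range(count):
        try:
            yield next(it)
        except StopIteration:
            # Fewer than count items: just stop. Raising StopIteration inside
            # a generator is a RuntimeError since Python 3.7 (PEP 479).
            return
print([i for i in first(range(1000), 5)])
It also works properly with generator expressions, where slicing will fall over: a generator cannot be sliced, and converting it to a list first costs memory:
exp = (i for i in first(range(1000000000), 10000000))
for i in exp:
    print(i)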
