I want to split a string every n characters and print it like this:
MISSISSIPPI => MI*SS*IS*SI*PP*I
I've written a program, but I don't know how to replace the commas with *. Here is the code:
n=int(input('chunk size'))  # input() returns a string on Python 3, so convert it
s=input('Add word')
import re
r=[s[i:i+n] for i in range(0, len(s), n)]
print (r)
This is the output:
['MI', 'SS', 'IS', 'SI', 'PP', 'I']
but I want it to be like this:
MI*SS*IS*SI*PP*I
You could use str.join() for this:
>>> '*'.join(r)
'MI*SS*IS*SI*PP*I'
This iterates over the strings in r and joins them into one string, inserting '*' between each pair.
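Putting the slicing and the join together, a minimal sketch might look like the following. The function name chunk_join and its sep parameter are made up for illustration; they are not part of the original question.

```python
def chunk_join(text, size, sep='*'):
    """Split text into pieces of `size` characters and join them with `sep`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # Slice the string at every multiple of `size`; the last piece may be shorter.
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    return sep.join(chunks)


print(chunk_join('MISSISSIPPI', 2))  # MI*SS*IS*SI*PP*I
print(chunk_join('MISSISSIPPI', 3))  # MIS*SIS*SIP*PI
```

Because join only places the separator between items, there is no trailing '*' to strip, even when the last chunk is shorter than the others.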
You could also use the re module (note that this pattern is hard-coded for chunks of two characters):
import re
r = '*'.join(re.findall('..|.$', s))
Output:
'MI*SS*IS*SI*PP*I'
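If the chunk size is not fixed at two, the same regex idea can probably be generalized by building the pattern from n. This is only a sketch; chunk_with_regex is a hypothetical helper name, and re.DOTALL is an added assumption so that newlines are treated like any other character.

```python
import re


def chunk_with_regex(text, size, sep='*'):
    """Join greedy regex matches of up to `size` characters with `sep`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # '.{1,n}' greedily matches n characters, or fewer at the end of the string.
    pattern = '.{1,%d}' % size
    return sep.join(re.findall(pattern, text, flags=re.DOTALL))


print(chunk_with_regex('MISSISSIPPI', 2))  # MI*SS*IS*SI*PP*I
print(chunk_with_regex('MISSISSIPPI', 3))  # MIS*SIS*SIP*PI
```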
At the point you've reached, you just have one more line to add:
r = '*'.join(r)
So then your program becomes:
n=int(input('chunk size'))  # input() returns a string on Python 3, so convert it
s=input('Add word')
import re
r=[s[i:i+n] for i in range(0,len(s),n)]
r = '*'.join(r)
print (r)
Unpack it and then use a custom separator:
>>> print(*r, sep='*')
MI*SS*IS*SI*PP*I
If you want the brackets in the output, use string formatting instead.
>>> print('[{}]'.format('*'.join(r)))
[MI*SS*IS*SI*PP*I]
We can use the split and join methods of str:
x = 'MI*SS*IS*SI*PP*I'
xlist = x.split('*')
'*'.join(xlist)
I am pulling data from a table that changes often using Python, and the method I am using is not ideal. What I would like is a way to pull all strings that contain at most one letter and leave out anything that has two or more.
An example of data I might get:
115
19A6
HYS8
568
In this example, I would like to pull 115, 19A6, and 568.
Currently I am using the isdigit() method to determine if it is a digit and this filters out all numbers with one letter, which works for some purposes, but is less than ideal.
Try this:
string_list = ["115", "19A6", "HYS8", "568"]
output_list = []
for item in string_list:  # goes through the string list
    letter_counter = 0
    for letter in item:  # goes through the characters of one string
        if not letter.isdigit():  # counts every character that is not a digit
            letter_counter += 1
    if letter_counter < 2:  # strings with more than one letter are left out
        output_list.append(item)
print(output_list)
Output:
['115', '19A6', '568']
Here is a one-liner with a regular expression:
import re
data = ["115", "19A6", "HYS8", "568"]
out = [string for string in data if len(re.sub(r"\d", "", string)) < 2]
print(out)
Output:
['115', '19A6', '568']
This is an excellent case for regular expressions (regex), which are available through the standard-library re module.
The code below follows the logic:
Define the dataset. Two examples have been added to show that a string containing two alpha-characters is rejected.
Compile a character pattern to be matched. In this case, zero or more digits, followed by zero or one upper-case letter, ending with zero or more digits.
Use the filter function to detect matches in the data list and output as a list.
For example:
import re
data = ['115', '19A6', 'HYS8', '568', 'H', 'HI']
rexp = re.compile(r'^\d*[A-Z]?\d*$')
result = list(filter(rexp.match, data))
print(result)
Output:
['115', '19A6', '568', 'H']
Another solution, without re using str.maketrans/str.translate:
lst = ["115", "19A6", "HYS8", "568"]
d = str.maketrans(dict.fromkeys(map(str, range(10)), ""))
out = [i for i in lst if len(i.translate(d)) < 2]
print(out)
Prints:
['115', '19A6', '568']
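The answers above count non-digit characters. If the intent is literally "at most one letter", counting alphabetic characters directly may be closer. The sketch below assumes that punctuation and other symbols should not count as letters, which the question does not state; has_at_most_one_letter is a hypothetical helper name.

```python
def has_at_most_one_letter(text):
    """Return True if `text` contains zero or one alphabetic characters."""
    # str.isalpha() is True for letters only, so digits and symbols are ignored.
    letters = sum(1 for ch in text if ch.isalpha())
    return letters <= 1


data = ["115", "19A6", "HYS8", "568"]
kept = [item for item in data if has_at_most_one_letter(item)]
print(kept)  # ['115', '19A6', '568']
```

For the sample data this gives the same result as the other answers; the two approaches only differ on strings that contain characters that are neither digits nor letters.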
z=False
a = str(a)
for I in range(len(a)):
    if a[I].isdigit():
        z = True
        break
    else:
        z="no digit"
print(z)
Suppose I had a string
string1 = "498results should get"
Now I need to get only the integer value from the string, 498 in this case. I don't want to use slicing, because the number of digits may grow, as in these examples:
string2 = "49867results should get"
string3 = "497543results should get"
So I want to extract only the integer values from the strings, with the digits in the same order: 498, 49867, and 497543 from string1, string2, and string3 respectively.
Can anyone let me know how to do this in one or two lines?
>>> import re
>>> string1 = "498results should get"
>>> int(re.search(r'\d+', string1).group())
498
If there are multiple integers in the string (on Python 3, map returns an iterator, so wrap it in list):
>>> list(map(int, re.findall(r'\d+', string1)))
[498]
An answer taken from ChristopheD here: https://stackoverflow.com/a/2500023/1225603
r = "456results string789"
s = ''.join(x for x in r if x.isdigit())
print(int(s))
456789
Here's your one-liner, without using any regular expressions, which can get expensive at times:
>>> ''.join(filter(str.isdigit, "1234GAgade5312djdl0"))
returns:
'123453120'
If you have multiple sets of numbers, then this is another option:
>>> import re
>>> print(re.findall(r'\d+', 'xyz123abc456def789'))
['123', '456', '789']
It's no good for floating-point number strings, though.
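For strings that do contain decimal numbers, one possible extension is a pattern that tries a decimal form first and falls back to a plain integer. This is only a sketch: the pattern below is an assumption about what should count as a number (optional sign, optional fractional part, no exponents or thousands separators), and find_numbers is a hypothetical helper name.

```python
import re

# Try "digits.digits" (or ".digits") first, then fall back to plain integers.
NUMBER_PATTERN = re.compile(r'[-+]?\d*\.\d+|[-+]?\d+')


def find_numbers(text):
    """Return every number found in `text` as a float, in order of appearance."""
    return [float(match) for match in NUMBER_PATTERN.findall(text)]


print(NUMBER_PATTERN.findall('xyz1.5abc-2def3'))  # ['1.5', '-2', '3']
print(find_numbers('xyz1.5abc-2def3'))            # [1.5, -2.0, 3.0]
```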
Iterator version
>>> import re
>>> string1 = "498results should get"
>>> [int(x.group()) for x in re.finditer(r'\d+', string1)]
[498]
>>> import itertools
>>> int(''.join(itertools.takewhile(lambda s: s.isdigit(), string1)))
With Python 3.6, either of these lines returns a list (which may be empty):
>>> [int(x) for x in re.findall(r'\d+', your_string)]
Similar to
>>> list(map(int, re.findall(r'\d+', your_string)))
This approach uses a list comprehension. Just pass the string as an argument to the function and it will return a list of the integers in that string. Note that it only finds numbers that are separated from the surrounding text by whitespace.
def getIntegers(string):
    numbers = [int(x) for x in string.split() if x.isnumeric()]
    return numbers
Like this
print(getIntegers('this text contains some numbers like 3 5 and 7'))
Output
[3, 5, 7]
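def function(string):
    final = ''
    for i in string:
        try:
            final += str(int(i))  # keep collecting leading digits
        except ValueError:
            return int(final)  # stop at the first non-digit character
    return int(final)  # the whole string consisted of digits
print(function("4983results should get"))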
Another option is to remove the trailing letters using rstrip and string.ascii_lowercase (to get the letters):
import string
strings = ["498results should get", "49867results should get", "497543results should get"]
out = [int(s.replace(' ','').rstrip(string.ascii_lowercase)) for s in strings]
Output:
[498, 49867, 497543]
integerstring=""
string1 = "498results should get"
for i in string1:
    if i.isdigit():
        integerstring=integerstring+i
print(integerstring)
Ideally, I want to turn 100020630 into [100,020,630],
but so far I can only turn "100.020.630" into ["100","020","630"]:
def fulltotriple(x):
    X=x.split(".")
    return X
print(fulltotriple("192.123.010"))
For some additional info: my goal is to turn IP addresses into binary addresses, using this as a first step =)
Edit: I have not found any way on Stack Overflow of getting the list WITHOUT the quotes around the items.
Here's one approach using a list comprehension:
s = '100020630'
[s[i:i + 3] for i in range(0, len(s), 3)]
# ['100', '020', '630']
If you want to handle IP addresses, you are doing it totally wrong.
An IPv4 address is a 32-bit binary number, not a 9-digit decimal one. It is written as four blocks (octets), like 192.168.0.1, but in decimal notation each block can have one, two, or three digits, in any combination. I recommend using the standard ipaddress module:
import ipaddress
a = '192.168.0.1'
ip = ipaddress.ip_address(a)
ip.packed
This returns the packed binary format:
b'\xc0\xa8\x00\x01'
If you want to convert your IPv4 address to a binary string, pad each byte to 8 bits (bin(i)[2:] would drop the leading zeros of each byte):
''.join(format(i, '08b') for i in ip.packed)
It returns this string:
'11000000101010000000000000000001'
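A minimal sketch of a fixed-width conversion, assuming IPv4 input only. Each octet is padded to 8 bits so the result is always 32 characters long. The helper names ipv4_to_bits and ipv4_to_dotted_bits are hypothetical; the ipaddress calls themselves are standard library.

```python
import ipaddress


def ipv4_to_bits(address):
    """Return the 32-character binary string for a dotted IPv4 address."""
    ip = ipaddress.IPv4Address(address)  # raises AddressValueError on bad input
    return format(int(ip), '032b')


def ipv4_to_dotted_bits(address):
    """Return the address as four 8-bit groups separated by dots."""
    packed = ipaddress.IPv4Address(address).packed  # 4 bytes, one per octet
    return '.'.join(format(octet, '08b') for octet in packed)


print(ipv4_to_bits('192.168.0.1'))         # 11000000101010000000000000000001
print(ipv4_to_dotted_bits('192.168.0.1'))  # 11000000.10101000.00000000.00000001
```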
You could use the wrap function from the standard textwrap module:
In [3]: s = "100020630"
In [4]: import textwrap
In [6]: textwrap.wrap(s, 3)
Out[6]: ['100', '020', '630']
Wraps the single paragraph in text (a string) so every line is at most width characters long. Returns a list of output lines, without final newlines.
If you want a list of ints:
[int(num) for num in textwrap.wrap(s, 3)]
Outputs:
[100, 20, 630]
You could use wrap, which is a function in Python's standard textwrap module:
from textwrap import wrap
def fulltotriple(x):
    x = wrap(x, 3)
    return x
print(fulltotriple("100020630"))
Outputs:
['100', '020', '630']
You can use Python's built-ins and standard library for this:
text = '100020630'
# using wrap
from textwrap import wrap
wrap(text, 3)
# ['100', '020', '630']
# using map/zip (list() is needed on Python 3, where map returns an iterator)
list(map(''.join, zip(*[iter(text)]*3)))
# ['100', '020', '630']
Use a regex to find all triplets of digits, \d{3}:
import re
s = "100020630"  # renamed from str to avoid shadowing the built-in
def fulltotriple(x):
    pattern = re.compile(r"\d{3}")
    return [int(found_match) for found_match in pattern.findall(x)]
print(fulltotriple(s))
Outputting:
[100, 20, 630]
def fulltotriple(data):
    result = []
    for i in range(0, len(data), 3):
        result.append(int(data[i:i + 3]))
    return result
print(fulltotriple("192123010"))
output:
[192, 123, 10]
I have a string like the following:
/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore
How should I extract the "2.0.24" from this string? I'm not sure how to split the string using the slashes (in order to extract the second last element of the resultant list) and I'm not sure if this would be a good approach. What I have right now is the following:
"/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore".split("/RootCore")[0].split("AnalysisTop/")[1]
You can also do:
import os
x = "/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore"
os.path.split(os.path.split(x)[0])[1]
results in
'2.0.24'
'/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore'.split('/')[-2]
Cross-platform solution (splits on the separator of the operating system the code runs on):
import os
'your/path'.split(os.path.sep)[-2]
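Another way to avoid hard-coding the separator is pathlib. The sketch below assumes the string is always a POSIX-style path, as in the question, so it uses PurePosixPath, which behaves the same on every operating system; parent_dir_name is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def parent_dir_name(path):
    """Return the name of the directory that contains the last path component."""
    return PurePosixPath(path).parent.name


x = "/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore"
print(parent_dir_name(x))          # 2.0.24
print(PurePosixPath(x).parts[-2])  # 2.0.24
```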
Just split on the / character, then take the second-to-last element.
>>> x = "/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore"
>>> y = x.split('/')
>>> y[-2]
'2.0.24'
>>> path = "/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore"
>>> path_dirs = path.split("/")
>>> path_dirs
['', 'cvmfs', 'atlas.cern.ch', 'repo', 'sw', 'ASG', 'AnalysisTop', '2.0.24', 'RootCore']
>>> print(path_dirs[-2])
2.0.24
import re
str1 = "/cvmfs/atlas.cern.ch/repo/sw/ASG/AnalysisTop/2.0.24/RootCore"
t = re.findall("[0-9][.]*",str1)
print ("".join(t))
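You can use re.findall. It returns a list, so join() is used to combine the pieces. Note that this only works because the version number is the only part of this path that contains digits.
Output:
2.0.24
For reference, print(t) shows:
['2.', '0.', '2', '4']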
Is there a way to split a string in Python using multiple delimiters instead of one? split seems to take in only one parameter as delimiter.
Also, I cannot import the re module. (This is the main stumbling block really.)
Any suggestions on how I should do it?
Thanks!
In order to split on multiple sequences you could simply replace all of the sequences you need to split on with just one sequence and then split on that one sequence.
For example:
s = s.replace("z", "s")
s.split("s")
will split on both s and z.
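The replace-then-split idea can be wrapped in a small function that accepts any number of delimiters and needs no re import. This is a sketch only; split_multi is a hypothetical name, and it assumes that no delimiter is a substring of another one.

```python
def split_multi(text, delimiters):
    """Split `text` on every string in `delimiters` without using re."""
    if not delimiters:
        return [text]
    first, *rest = delimiters
    # Rewrite every other delimiter into the first one, then split once.
    for delimiter in rest:
        text = text.replace(delimiter, first)
    return text.split(first)


print(split_multi('hola, que: tal. be', ['.', '-', ':', ',']))
# ['hola', ' que', ' tal', ' be']
```

The pieces keep their leading spaces; adding ' ' to the delimiter list would remove them but would also produce empty strings wherever two delimiters are adjacent.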
Here is a generic approach for a list of splitters. Can someone write this with less code?
Initializing variables:
>>> splits = ['.', '-', ':', ',']
>>> s='hola, que: tal. be'
Splitting (on Python 3, reduce must be imported from functools):
>>> from functools import reduce
>>> r = [ s ]
>>> for p in splits:
...     r = reduce(lambda x,y: x+y, map(lambda z: z.split(p), r ))
Results:
>>> r
['hola', ' que', ' tal', ' be']