I have made a string without spaces: instead of spaces, I used 0000000. There will be no letters of the alphabet in it, only numbers. For example, 000000020000000050000000190000000200000000 should equal "test". Sorry, I am very new to Python and not very good at it, so if someone can help me out, that would be awesome.
You should be able to achieve the desired effect using regular expressions and re.sub().
If you want to extract the literal word "test" from that string, as mentioned in the comments, you'll need to account for the fact that regex matching runs left to right: given a run of eight 0's, it matches the first seven. A number ending in 0, such as 20, followed by the seven-0 separator would therefore lose its trailing 0 to the separator. We can get around this by matching against the reversed string (right to left) and then reversing the result to undo the initial reversal.
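To see why the direction matters, here is a small sketch comparing a plain left-to-right substitution with the reversed one on the string from the question. The names naive and reversed_pass are only for illustration.

```python
import re

my_string = '000000020000000050000000190000000200000000'

# Left to right: the separator swallows the trailing 0 of each 20,
# and the leftover 0 ends up in front of the next number
naive = re.sub('0{7}', ' ', my_string).split()

# Right to left: reverse, substitute, then reverse back
reversed_pass = re.sub('0{7}', ' ', my_string[::-1])[::-1].split()

print(naive)          # ['2', '05', '19', '2', '0']
print(reversed_pass)  # ['20', '5', '19', '20']
```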
Here's the solution I came up with as my revised answer:
import re
my_string = '000000020000000050000000190000000200000000'
# Substitute a space in place of 7 0's
# Reverse the string in the input, and then reverse the output
new_string = re.sub('0{7}', ' ', my_string[::-1])[::-1]
# >>> new_string
# ' 20 5 19 20 '
Then we can strip the leading and trailing whitespace from this result and split it into a list:
my_array = new_string.strip().split()
# >>> my_array
# ['20', '5', '19', '20']
After that, you can process the array in whatever way you see fit to get the word "test" out of it.
My solution to that would probably be the following:
import string
word = ''.join([string.ascii_lowercase[int(x) - 1] for x in my_array])
# >>> word
# 'test'
Related
I have a line that includes some values separated by underscores, like this:
1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0
I need code to check the field (DCAAFFC): if its last 4 characters are not (0000), the code should replace the last 4 characters (AFFC) with (0000), like this: (DCA0000)
So the line should become:
1_0_1_A2C_1A_2BE_DCA0000_0_0_0
I need code that works on both Python 2 and 3, please!
P.S. The value of the field (DCAAFFC) is not fixed; it is always changing.
code=1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0
I will assume that the format is strictly like this. Then you can get DCAAFFC with code.split('_')[-4]. Finally, you can replace its last 4 characters with 0000.
Here is the full code
>>> code="1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0"
>>> frag=code.split("_")
>>> frag
['1', '0', '1', 'A2C', '1A', '2BE', 'DCAAFFC', '0', '0', '0']
>>> # Slice rather than str.replace, which would replace every occurrence of the last 4 characters
>>> frag[-4]=frag[-4][:-4]+"0000"
>>> final_code="_".join(frag)
>>> final_code
'1_0_1_A2C_1A_2BE_DCA0000_0_0_0'
Try regular expressions, e.g.:
import re
old_string = '1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0'
# Digits are allowed too, since the field value changes from line to line
match = re.search(r'_([0-9a-zA-Z]{7})_', old_string)
span = match.span()
new_string = old_string[:span[0]+4] + '0000_' + old_string[span[1]:]
print(new_string)
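A variation on the same idea, as a minimal sketch: re.sub can do the search and the replacement in one step. The function name zero_last_four is hypothetical, and the pattern assumes the field is exactly 7 letters or digits and has an underscore on both sides. It uses nothing specific to either Python version, so it should run on Python 2 and 3 alike.

```python
import re

def zero_last_four(line):
    # Keep the first 3 characters of a 7-character field and
    # replace its last 4 characters with 0000
    return re.sub(r'(?<=_)([0-9A-Za-z]{3})[0-9A-Za-z]{4}(?=_)', r'\g<1>0000', line)

print(zero_last_four('1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0'))
# 1_0_1_A2C_1A_2BE_DCA0000_0_0_0
```

A field whose last 4 characters are already 0000 is rewritten to the same value, so no separate check is needed.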
Is this a general string or just some hexadecimal representation of a number? In Python 3.6 and later, underscores in numeric literals are only there for readability and do not affect the value in any way.
Say you have a general string like the one you've given, and would like to replace the last 4 characters of every underscore-delimited group longer than 4 characters with '0000'. A simple one-liner would be:
hexadecimal_string = "1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0"
hexadecimal_string = "_".join([ substring if len(substring)<=4 else substring[:-4]+'0'*4 for substring in hexadecimal_string.split('_')])
Here,
hexadecimal_string.split('_') splits the string into groups, using '_' as the separator;
substring if len(substring)<=4 else substring[:-4]+'0'*4 leaves groups of 4 characters or fewer unchanged and replaces the last 4 characters of longer groups with '0'*4, i.e. '0000';
the for ... in construct inside the square brackets is a list comprehension;
'_'.join() joins the groups back into one string, using '_' as the separator.
The other answers posted here work well for the specific string in the question; I'm sharing this one to meet your one-liner requirement in Python 3.
If the length of the string is always the same, and the position of the part that needs to be replaced with zeros is always the same, you can just do this:
txt = '1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0'
new = txt[0:20]+'0000'+txt[-6:]
print(new)
The output will be
1_0_1_A2C_1A_2BE_DCA0000_0_0_0
It would help if you gave us some other examples of the strings.
I started studying Python yesterday and I wanted to study a little about the string split method.
I wasn't looking for anything specific, I was just trying to learn it. I saw that it's possible to split a string on multiple characters, but what if I want to use the maxsplit parameter for only one of those characters?
I searched a little about it and found nothing, so I'm here to ask how. Here's an example:
Let's suppose I have this string:
normal_string = "1d30 drake dreke"
I want this to be a list like this:
['1', '30', 'drake', 'dreke']
Now let's suppose I use a method to split on multiple characters, so I split on the character 'd' and the character ' '.
The thing is:
I don't want to remove the "d" from "drake" and "dreke", only the one in "1d30", but I still want to split on all of the space characters.
I need to apply a maxsplit parameter ONLY to the character "d". How can I do it?
Do the following:
normal_string = "1d30 drake dreke"
# first split by d
start, end = normal_string.split("d", maxsplit=1)
# then split each part on whitespace and concatenate the results
res = start.split() + end.split()
print(res)
Output
['1', '30', 'drake', 'dreke']
A more general approach, albeit more advanced, which also works when the string contains no "d" at all, is:
res = [w for s in normal_string.split("d", maxsplit=1) for w in s.split()]
print(res)
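If the real rule is "split on a d only when it sits between two digits", a regular expression is another option. This is a different technique from maxsplit, sketched here under that assumption: it splits on every digit-d-digit occurrence, not just the first one.

```python
import re

normal_string = "1d30 drake dreke"

# Split on a "d" that has a digit on both sides, or on any run of whitespace
res = re.split(r'(?<=\d)d(?=\d)|\s+', normal_string)
print(res)  # ['1', '30', 'drake', 'dreke']
```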
I have been working on a program which will take a hex file, and if the file name starts with "CID", then it should remove the first 104 characters, and after that point there are a few words. I also want to remove everything after the words, but the problem is that the part I want to isolate varies in length.
My code is currently like this:
y = 0
import os
files = os.listdir(".")
filenames = []
for names in files:
    if names.endswith(".uexp"):
        filenames.append(names)
        y +=1
print(y)
print(filenames)
# range(y) rather than range(1,y), so the first file is not skipped
for x in range(y):
    filenamestart = (filenames[x][0:3])
    print(filenamestart)
    if filenamestart == "CID":
        openFile = open(filenames[x],'r')
        fileContents = (openFile.read())
        ItemName = (fileContents[104:])
        print(ItemName)
Input Example file (pulled from HxD):
.........................ýÿÿÿ................E.................!...1AC9816A4D34966936605BB7EFBC0841.....Sun Tan Specialist.................9.................!...9658361F4EFF6B98FF153898E58C9D52.....Outfit.................D.................!...F37BE72345271144C16FECAFE6A46F2A.....Don't get burned............................................................................................................................Áƒ*ž
I have got it working to remove the first 104 characters, but I would also like to remove the characters after 'Sun Tan Specialist', which will differ in length, so I am left with only that part.
I appreciate any help that anyone can give me.
One way to remove non-alphabetic characters in a string is to use regular expressions [1].
>>> import re
>>> re.sub(r'[^a-z]', '', "lol123\t")
'lol'
EDIT
The first argument r'[^a-z]' is the pattern that captures what will be removed (here, by replacing it with an empty string ''). The square brackets denote a character class (the pattern will match any character in this class), the ^ is a "not" operator, and a-z denotes all the lowercase alphabetic characters. More information here:
https://docs.python.org/3/library/re.html#regular-expression-syntax
So for instance, to keep also capital letters and spaces it would be:
>>> re.sub(r'[^a-zA-Z ]', '', 'Lol !this *is* a3 -test\t12378')
'Lol this is a test'
However from the data you give in your question the exact process you need seems to be a bit more complicated than just "getting rid of non-alphabetical characters".
You can use filter:
import string
print(''.join(filter(lambda character: character in string.ascii_letters + string.digits, '(ABC), DEF!'))) # => ABCDEF
You mentioned in a comment that you got the string down to Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned
Essentially your goal at this point is to remove any uppercase letter that isn't immediately followed by a lowercase letter because Upper Lower indicates the start of a phrase. You can use a for loop to do this.
h = "Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned"
output = ""
for i in range(len(h)):
    # Keep spaces
    if h[i] == " ":
        output += h[i]
    # Start of a phrase found (uppercase followed by lowercase)
    elif h[i].isupper() and i + 1 < len(h) and h[i+1].islower():
        # Separate it from the previous phrase unless a space is already there
        if output and not output.endswith(" "):
            output += " "
        output += h[i]
    # We want all lowercase characters
    elif h[i].islower():
        output += h[i]
print(output)
# If you don't care about Outfit since it is always there, remove it
print(output.replace("Outfit ", ""))
Output:
Sun Tan Specialist Outfit Dont get burned
Sun Tan Specialist Dont get burned
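The same rule (a phrase starts at an uppercase letter that is immediately followed by a lowercase one) can also be written as a regular expression. This is only a sketch of an alternative to the loop above, assuming every phrase starts with a capitalized word and its words are separated by single spaces:

```python
import re

h = "Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned"

# A capitalized word, optionally followed by more space-separated words
phrases = re.findall(r'[A-Z][a-z]+(?: [A-Za-z][a-z]+)*', h)
print(phrases)  # ['Sun Tan Specialist', 'Outfit', 'Dont get burned']
```

Getting a list back makes it easy to pick the first phrase with phrases[0] or to drop 'Outfit' by value.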
I am parsing some data where the standard format is something like 10 pizzas. Sometimes, data is input incorrectly and we might end up with 5pizzas instead of 5 pizzas. In this scenario, I want to parse out the number of pizzas.
The naïve way of doing this would be to check character by character, building up a string until we reach a non-digit and then casting that string as an integer.
num_pizzas = ""
for character in data_input:
    if character.isdigit():
        num_pizzas += character
    else:
        break
num_pizzas = int(num_pizzas)
This is pretty clunky, though. Is there an easier way to split a string where it switches from numeric digits to alphabetic characters?
You ask for a way to split a string on digits, but then in your example, what you actually want is just the first number. This is done easily with itertools.takewhile():
>>> import itertools
>>> int("".join(itertools.takewhile(str.isdigit, "10pizzas")))
10
This makes a lot of sense: what we are doing is taking characters from the string while they are digits. This has the advantage of stopping processing as soon as we get to the first non-digit character.
If you need the later data too, then what you are looking for is itertools.groupby() mixed in with a simple list comprehension:
>>> ["".join(x) for _, x in itertools.groupby("dfsd98sd8f68as7df56", key=str.isdigit)]
['dfsd', '98', 'sd', '8', 'f', '68', 'as', '7', 'df', '56']
If you then want to make one giant number:
>>> int("".join("".join(x) for is_number, x in itertools.groupby("dfsd98sd8f68as7df56", key=str.isdigit) if is_number))
98868756
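To split the string at digits you can use re.split with the regular expression (\d+); the capturing group keeps the digits in the result:
>>> import re
>>> def my_split(s):
...     # Build a list, dropping the empty strings re.split yields at the edges
...     return [part for part in re.split(r'(\d+)', s) if part]
...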
>>> my_split('5pizzas')
['5', 'pizzas']
>>> my_split('foo123bar')
['foo', '123', 'bar']
To find the first number use re.search:
>>> re.search(r'\d+', '5pizzas').group()
'5'
>>> re.search(r'\d+', 'foo123bar').group()
'123'
If you know the number must be at the start of the string then you can use re.match instead of re.search. If you want to find all the numbers and discard the rest you can use re.findall.
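A short sketch of those two variants, using the sample strings that already appear in this thread. The variable names are only illustrative.

```python
import re

# re.match only succeeds if the digits are at the very start of the string
m = re.match(r'\d+', '5pizzas')
count = int(m.group()) if m else None
print(count)  # 5

# No digits at the start, so re.match returns None
no_match = re.match(r'\d+', 'foo123bar')
print(no_match)  # None

# re.findall returns every run of digits and discards the rest
all_numbers = re.findall(r'\d+', 'dfsd98sd8f68as7df56')
print(all_numbers)  # ['98', '8', '68', '7', '56']
```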
How about a regex?
import re
text = '5pizzas'  # example input
reg = re.compile(r'(?P<numbers>\d*)(?P<rest>.*)')
result = reg.search(text)
if result:
    numbers = result.group('numbers')  # '5'
    rest = result.group('rest')  # 'pizzas'
Answer added as possible way to solve How to split a string into a list by digits? which was dupe-linked to this question.
You can do the splitting yourself:
use a temporary list to accumulate characters that are not digits
if you find a digit, add the contents of the temporary list (''.join()-ed) to the result list (only if it is not empty) and do not forget to clear the temporary list
repeat until all characters are processed, and if the temporary list still has content, add it
text = "Ka12Tu12La"
splitted = [] # our result
tmp = [] # our temporary character collector
for c in text:
    if not c.isdigit():
        tmp.append(c) # not a digit, add it
    elif tmp: # c is a digit, if tmp filled, add it
        splitted.append(''.join(tmp))
        tmp = []
if tmp:
    splitted.append(''.join(tmp))
print(splitted)
Output:
['Ka', 'Tu', 'La']
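For comparison, the same result can be had with re.split. This is a sketch of a different technique from the manual loop, not a replacement for understanding it:

```python
import re

text = "Ka12Tu12La"

# Split on runs of digits and drop any empty strings left at the edges
splitted = [part for part in re.split(r'\d+', text) if part]
print(splitted)  # ['Ka', 'Tu', 'La']
```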
I would like to do something like:
temp=a.split()
#do some stuff with this new list
b=" ".join(temp)
where a is the original string, and b is after it has been modified. The problem is that when performing such methods, the newlines are removed from the new string. So how can I do this without removing newlines?
I assume in your third line you mean join(temp), not join(a).
To split and yet keep the exact "splitters", you need the re.split function (or split method of RE objects) with a capturing group:
>>> import re
>>> f='tanto va\nla gatta al lardo'
>>> re.split(r'(\s+)', f)
['tanto', ' ', 'va', '\n', 'la', ' ', 'gatta', ' ', 'al', ' ', 'lardo']
The pieces you'd get from a plain split (without the capturing group) are at the even indices 0, 2, 4, ... while the odd indices have the "separators" -- the exact sequences of whitespace that you'll use to re-join the list at the end (with ''.join) to get the same whitespace the original string had.
You can either work directly on the even-indexed items, or you can first extract them:
>>> x = re.split(r'(\s+)', f)
>>> y = x[::2]
>>> y
['tanto', 'va', 'la', 'gatta', 'al', 'lardo']
then alter y as you will, e.g.:
>>> y[:] = [z+z for z in y]
>>> y
['tantotanto', 'vava', 'lala', 'gattagatta', 'alal', 'lardolardo']
then reinsert and join up:
>>> x[::2] = y
>>> ''.join(x)
'tantotanto vava\nlala gattagatta alal lardolardo'
Note that the \n is exactly in the position equivalent to where it was in the original, as desired.
You need to use regular expressions to rip your string apart. The resulting match object can give you the character ranges of the parts that match various sub-expressions.
Since you might have an arbitrarily large number of sections separated by whitespace, you're going to have to match the string multiple times at different starting points within the string.
If this answer is confusing to you, I can look up the appropriate references and put in some sample code. I don't really have all the libraries memorized, just what they do. :-)
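A minimal sketch of what that could look like, assuming the "stuff" you do to each word is a hypothetical function called modify. re.finditer does the repeated matching, and each match object's start() and end() give the character range of a word, so the whitespace between words can be copied over untouched.

```python
import re

def modify(word):
    # Hypothetical placeholder for whatever you do to each word
    return word + word

text = 'tanto va\nla gatta al lardo'

pieces = []
last = 0
for m in re.finditer(r'\S+', text):
    pieces.append(text[last:m.start()])  # whitespace before this word, unchanged
    pieces.append(modify(m.group()))     # the modified word
    last = m.end()
pieces.append(text[last:])               # any trailing whitespace

result = ''.join(pieces)
print(repr(result))  # 'tantotanto vava\nlala gattagatta alal lardolardo'
```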
It depends on what you want to split on.
By default, split() treats any whitespace (including '\n' and ' ') as the delimiter. You can use
a.split(" ")
if you want only spaces as delimiters.
http://docs.python.org/library/stdtypes.html#str.split
I don't really understand your question. Can you give an example of what you want to do?
Anyway, maybe this can help:
b = '\n'.join(a)
First of all, I assume that when you say
b = " ".join(a)
You actually mean
b = " ".join(temp)
When you call split() without specifying a separator, the function treats any run of whitespace as a separator. Whitespace includes newlines, so those disappear when you split the string. Try explicitly passing a separator (such as a single " " space character) to split(). If you have multiple spaces in a row, splitting this way will still remove them all, but it will put a "" empty string in the returned list between each pair of adjacent spaces.
To restore the original spacing, just make sure that you call join() on the same string which you used as your separator in split(), and that you don't remove any elements (including the empty strings) from your intermediary list of strings.
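Here is a small sketch of that round trip. The sample string and the upper() step are made up for illustration.

```python
a = "one  two\nthree four"

# Only single spaces act as separators, so the newline stays inside a piece
parts = a.split(" ")
print(parts)  # ['one', '', 'two\nthree', 'four']

# Change the non-empty pieces, leaving the "" placeholders alone
parts = [p.upper() if p else p for p in parts]

b = " ".join(parts)
print(repr(b))  # 'ONE  TWO\nTHREE FOUR'
```

Note that 'two\nthree' comes back as a single element, so this approach only suits changes that do not need the words on either side of a newline to be separate items.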