I have a list with spaces within the string. How can I remove these spaces?
['ENTRY', ' 102725023 CDS T01001']
I would like to have the final list as:
['ENTRY', '102725023 CDS T01001']
I tried the strip() function, but it does not work on a list. Any help is highly appreciated.
Suppose this is your string:
string = "A b c "
and you want it in this form:
A b c
What you can do is:
string2 = " ".join(string.split())
print(string2)
The easiest way is to build a new list of the values with the spaces removed. For this, you can use list comprehensions and the idiom proposed by #CodeWithYash
old_list = ['ENTRY', ' 102725023 CDS T01001']
new_list = [" ".join(s.split()) for s in old_list]
Note that this works because the default behavior of split is to split on any run of whitespace and discard empty strings from the result.
If you wanted to remove anything other than whitespace, you would have to implement your own function, perhaps using a regular expression.
Note also that Python strings are immutable, so you cannot edit a string in place. If you do not want to create a new list (for example, because a reference to the list is kept elsewhere in the program), you can instead assign a new string to every position of the existing list:
l = ['ENTRY', ' 102725023 CDS T01001']
for i, s in enumerate(l):
    l[i] = " ".join(s.split())  # assign back into the same list
print(l)
Output:
['ENTRY', '102725023 CDS T01001']
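Slice assignment is another way to keep the same list object, sketched here on the assumption that replacing all items in one step is acceptable. The name `shared` is hypothetical and only stands in for a reference held elsewhere in the program.

```python
# A minimal sketch: normalize whitespace in every item while keeping
# the same list object, so other references see the change.
items = ['ENTRY', ' 102725023 CDS T01001']
shared = items  # hypothetical second reference to the same list

# Slice assignment replaces the contents of the existing list in place.
items[:] = [" ".join(s.split()) for s in items]

print(shared)
```

Because `items[:] = ...` writes into the existing list rather than rebinding the name, `shared` still refers to the updated list.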
I wrote this function:
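s = " abc def xyz "
def proc(s):
    # Keep replacing every two spaces with one until the length stops changing
    l = len(s)
    s = s.replace('  ', ' ')
    while len(s) != l:
        l = len(s)
        s = s.replace('  ', ' ')
    # Drop a single remaining space at either end (safe for empty strings)
    if s.startswith(' '):
        s = s[1:]
    if s.endswith(' '):
        s = s[:-1]
    return s
print(proc(s))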
The idea is to keep replacing every two spaces with one space, then check whether the first and last characters are also spaces.
I don't know if there exists an easier way with regular expressions or something else.
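A regular expression can do this in one step. The sketch below uses a hypothetical helper name, `collapse_spaces`. Note that `\s` also matches tabs and newlines, so it is slightly broader than a spaces-only approach.

```python
import re

def collapse_spaces(text):
    """Collapse each run of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()

print(collapse_spaces("  abc   def  xyz  "))
```

If only literal spaces should be collapsed, the pattern `r" +"` could be used instead of `r"\s+"`.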
Related
I am trying to use list comprehension to cycle through the letters in a word and get new combinations after removing one letter at a time.
E.g. say the input string is a word: 'bathe'
I would like to get the output in a list (preferably) with the following
[athe, bthe, bahe, bate]
i.e., making just one pass from left to right.
This is the literal version, but I need to accomplish this with a list comprehension:
word = "bathe"
newlist1 = [word[1::], (word[1:2] + word[-3:]), (word[:2] + word[-2:]), word[:3] + word[-1:] ]
print('sample 1', newlist1)
newlist2 = [(word[1:2] + word[-3:]), (word[1:2] + word[-3:]), (word[:2] + word[-2:]), word[:3] + word[-1:] ]
print('sample 2', newlist2)
I got through the first pass with this code, but am stuck now
x = [(word[:i] + word[-j:]) for i in range(1,4) for j in range(4,1, -1)]
The output I get is obviously not right, but (hopefully) is directionally there (when it comes to using list comprehensions)
['bathe', 'bthe', 'bhe', 'baathe', 'bathe', 'bahe', 'batathe', 'batthe', 'bathe']
You can do it like this:
First, you need some way to remove a certain element from a list:
def without(lst: list, items: list) -> list:
    """
    Returns a list without anything in the items list
    """
    new_lst = lst.copy()
    for e in lst:
        if e in items:
            items.remove(e)
            new_lst.remove(e)
    return new_lst
Then, using that function you can create your new word list.
new_word_list = ["".join(without(list(word), list(letter))) for letter in word]
As your desired output shows, you don't want the last result, so you can just add [:-1] to it.
new_word_list = ["".join(without(list(word), list(letter))) for letter in word][:-1]
Another way you could do it (without the without function):
new_word_list = [word[:index - 1] + word[index:] for index, _ in enumerate(word)][1:]
The [1:] at the end is needed because the first element is a spurious string: for index 0, word[:index - 1] is word[:-1], so the result is bathbathe (when word is bathe).
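A more direct slicing variant is sketched below. It assumes, as in the desired output, that the variant with the last letter removed should be left out, which is why the range stops one short of the word length.

```python
word = "bathe"

# Drop the character at position i; stop before the last index so that
# the variant with the final letter removed ("bath") is not produced.
new_word_list = [word[:i] + word[i + 1:] for i in range(len(word) - 1)]

print(new_word_list)
```

Using `word[i + 1:]` instead of `word[index:]` avoids the negative index at position 0, so no trailing `[1:]` is needed.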
I have a list which looks like this:
['G1X0.000Y3.000', 'G2X2.000Y3.000I1.000J2.291', 'G1X2.000Y-0.000', 'G2X0.000Y0.000I-1.000J-2.291']
The formatting is such that if the numeric content after X,Y,I or J are positive there is no + sign but if they are negative then the - sign is included. I am trying to loop through this list and to basically add the + sign if there is no - sign at the start of the numeric content. The result should look like this:
['G1X+0.000Y+3.000', 'G2X+2.000Y+3.000I+1.000J+2.291', 'G1X+2.000Y-0.000', 'G2X+0.000Y+0.000I-1.000J-2.291']
I'm trying to use a list comprehension to do so as follows:
#Make sure that X, Y, I and J start with + or -
for count,i in enumerate(fileContents):
    if 'G' in i:
        indexOfI = i.index("X")
        if(i[indexOfI+1]!="-"):
            print(i[:indexOfI+1] + "+" + i[indexOfI+1:])
            fileContents[count] = i[:indexOfI+1] + "+" + i[indexOfI+1:]
        indexOfY = i.index("Y")
        if(i[indexOfY+1]!="-"):
            fileContents[count] = i[:indexOfY+1] + "+" + i[indexOfY+1:]
    if "G2" in i:
        indexOfI = i.index("I")
        if(i[indexOfI+1]!="-"):
            fileContents[count] = i[:indexOfI+1] + "+" + i[indexOfI+1:]
        indexOfJ = i.index("J")
        if(i[indexOfJ+1]!="-"):
            fileContents[count] = i[:indexOfJ+1] + "+" + i[indexOfJ+1:]
the statement print(i[:indexOfI+1] + "+" + i[indexOfI+1:]) gives an output in the console of:
G1X+0.000Y3.000
G2X+2.000Y3.000I1.000J2.291
G1X+2.000Y-0.000
G2X+0.000Y0.000I-1.000J-2.291
Which shows me that this performs what I want it to, however if I print fileContents after this function there are no changes to the list. In other words the following line of code does not replace the list item in each position as I expect it to:
fileContents[count] = i[:indexOfI+1] + "+" + i[indexOfI+1:]
Why does this not work when I can do the following and it does update the list correctly?
#Format each command to a 32 byte string
for i, s in enumerate(fileContents):
    fileContents[i] = s.ljust(32,'#')
edit: I originally titled the post "Why doesn't using a list comprehension this way replace each item in the list?". Users have kindly pointed out this has nothing to do with a list comprehension. I apologise, I thought this format x in list was a list comprehension.
"If I print fileContents after this function there are no changes to the list."
Actually, there are changes, but at most one + is added (the last one).
This is because you don't apply the same change to i, which means that the next if blocks will copy a part from i back to fileContents[count] that didn't have the previous change.
The quick fix is to make sure you apply the change also to i. Something like:
fileContents[count] = i = i[:indexOfI+1] + "+" + i[indexOfI+1:]
#                     ^^^^
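The same idea, carrying each change forward before the next check, can be sketched as a small helper. The names `add_signs` and `file_contents` are hypothetical. The sketch only handles the first occurrence of each letter, which matches the sample data.

```python
def add_signs(line, letters="XYIJ"):
    """Insert '+' after each letter whose number has no explicit sign."""
    for letter in letters:
        pos = line.find(letter)
        # Skip letters that are absent, last in the line, or already signed.
        if pos == -1 or pos + 1 >= len(line) or line[pos + 1] in "+-":
            continue
        # Rebind `line` so the next letter is checked against the updated string.
        line = line[:pos + 1] + "+" + line[pos + 1:]
    return line

file_contents = ['G1X0.000Y3.000', 'G2X2.000Y3.000I1.000J2.291',
                 'G1X2.000Y-0.000', 'G2X0.000Y0.000I-1.000J-2.291']

for count, line in enumerate(file_contents):
    file_contents[count] = add_signs(line)

print(file_contents)
```

Because every insertion is applied to the same `line` variable, later letters no longer overwrite earlier changes.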
You can perform this task with list comprehension using re.sub:
import re
fileContents = [re.sub(r"([XYIJ])(?=\d)", r"\1+", line) for line in fileContents]
This matches any X, Y, I or J that is followed by a digit and inserts a plus between the two. If you need stricter matching rules (for example, that the line must start with "G"), the regular expression will become more complex.
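One possible way to tighten the rules without a much longer pattern is to check the line prefix in plain Python first. This is only a sketch: the helper name `add_plus_signs`, the extra `'M30'` line, and the choice to also accept a leading decimal point are assumptions, not part of the question.

```python
import re

# X, Y, I or J followed by a digit or a decimal point (lookahead only,
# so the number itself is not consumed).
SIGNLESS = re.compile(r"([XYIJ])(?=[\d.])")

def add_plus_signs(line):
    # Assumption: only lines that start with "G" should be rewritten.
    if not line.startswith("G"):
        return line
    return SIGNLESS.sub(r"\1+", line)

file_contents = ['G1X0.000Y3.000', 'G2X2.000Y3.000I1.000J2.291',
                 'G1X2.000Y-0.000', 'G2X0.000Y0.000I-1.000J-2.291',
                 'M30']  # hypothetical non-G line, left untouched

result = [add_plus_signs(line) for line in file_contents]
print(result)
```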
In the loop
for i, s in enumerate(fileContents):
you iterate over the fileContents list while changing it in the same loop. That can be dangerous, particularly if elements are added or removed.
To be safe, iterate over a copy of the list, which you can create simply by appending [:] to it:
for i, s in enumerate(fileContents[:]):
You can just add + after each of these characters, then replace any resulting +- with -:
def my_replace(item):
    for char in 'XYIJ':
        item = item.replace(char, f'{char}+')
    return item.replace('+-', '-')
spam = ['G1X0.000Y3.000', 'G2X2.000Y3.000I1.000J2.291',
'G1X2.000Y-0.000', 'G2X0.000Y0.000I-1.000J-2.291']
eggs = [my_replace(item) for item in spam] # now, this is list comprehension
print(eggs)
output
['G1X+0.000Y+3.000', 'G2X+2.000Y+3.000I+1.000J+2.291', 'G1X+2.000Y-0.000', 'G2X+0.000Y+0.000I-1.000J-2.291']
Given a string such as "abcdefg" of length 7, the output should be 7 lines like:
abcdefg
bcdefga
cdefgab
defgabc
efgabcd
fgabcde
gabcdef
but I seem to be missing the boat after many many hours of various loops and print statements
s = str("abcdefg")
print(s)
print()
for i in range(len(s)):
    new_s = s[:i+1] + s[-i:] + s[-i]
    print(new_s)
I get this:
abcdefg
aabcdefga
abgg
abcfgf
abcdefge
abcdedefgd
abcdefcdefgc
abcdefgbcdefgb
You're overcomplicating this. The proper expression is just
new_s = s[i:] + s[:i]
Slicing is inclusive of the start index and exclusive of the end index, so s[i:] and s[:i] split the string at position i without overlap. The expression above therefore keeps the length of the result the same as the input string and only swaps the two parts.
Note that the first new_s value is the original string itself. No need to print it at the start of the program.
The result is:
abcdefg
bcdefga
cdefgab
defgabc
efgabcd
fgabcde
gabcdef
slicing in detail: Understanding slice notation
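The loop could look like this:
s_len = len(s)
for i in range(s_len):
    print(s)  # print first, so the original string is the first line
    s = s[1:] + s[0]  # then move the first character to the end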
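If the rotations are needed as data rather than only printed, they can be collected with a list comprehension. This is a small sketch, and the name `rotations` is just illustrative.

```python
s = "abcdefg"

# Rotation i moves the first i characters to the end of the string.
rotations = [s[i:] + s[:i] for i in range(len(s))]

print("\n".join(rotations))
```

This version leaves `s` unchanged, which may be preferable when the original string is still needed afterwards.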
I have a question about creating an r*c matrix with a given number of rows and columns.
I wrote this function, which takes r rows and c columns, but the problem is the output formatting: I require an exact output format and can't seem to get it even after trying for a long time. I would appreciate any help.
def matprint(r, c):
    max = r*c
    l=[]
    for i in range(1,max+1):
        l.append(i)
    subList = [l[n:n+c] for n in range(0, len(l),c)]
    for q in subList:
        list1 = q
        print( ( '{} ' * len(list1) ).format( *list1 ) )
The difference is that mine prints a space before each "\n", and also prints a newline after the last line. It is not a logic problem; I just need help with the formatting.
Thank You
You should use str.join to join a list of strings.
This code produces a string of the items from list1, separated by ' ', but also adds a space at the end:
print( ( '{} ' * len(list1) ).format( *list1 ) )
Instead of that, do this:
list_of_strings = [str(x) for x in list1]
print(' '.join(list_of_strings))
Or, more compact:
print(' '.join(str(x) for x in list1))
You have the same problem with the newlines. print adds one after each line. You don't want one after the last line, so you should join the lines as well and then write them without a trailing newline:
import sys
lines = [' '.join(str(x) for x in list1) for list1 in subList]
sys.stdout.write('\n'.join(lines))
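Putting the two joins together, a complete version might look like the sketch below. Splitting the work into a hypothetical `format_matrix` helper that returns the text is my own choice, made so that the result is easy to check.

```python
import sys

def format_matrix(r, c):
    """Return the numbers 1..r*c as r lines of c space-separated values."""
    rows = [range(i * c + 1, (i + 1) * c + 1) for i in range(r)]
    lines = [" ".join(str(x) for x in row) for row in rows]
    # Joining with "\n" puts newlines only between lines, not after the last.
    return "\n".join(lines)

def matprint(r, c):
    # write() does not append a newline, unlike print().
    sys.stdout.write(format_matrix(r, c))

matprint(3, 5)
```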
You can invert the problem:
Print a sublist without a newline after it if it is the first one.
If it is not the first one, print a newline followed by the next sublist.
That way the last line does not have a \n at its end:
def matprint(r, c):
    data = list(range(1,r*c+1))
    l= [data[i*c:i*c+c] for i in range(r)]
    formatter = ('{} ' * c).strip() # create the format string once - strip spaces at end
    for i,sublist in enumerate(l):
        if i: # 0 is False, all others are True
            print("")
        print( formatter.format( *sublist ), end="" ) # do not print \n at end
matprint(3, 5)
I optimized the code a bit as well. You should not use names like max, min, list, dict, ... as variable names, because they hide the built-in functions of the same name.
Your list construction can be streamlined by a list comprehension that chunks your numbers list - see How do you split a list into evenly sized chunks? .
You do not need to recompute the length of your sublist - it is c long.
You need the index from enumerate() to decide if the list is "first" - and you need the end="" option of print to avoid autoprinting newlines.
A simpler version without enumerate could be done using list slicing:
def matprint(r, c):
    data = list(range(1,r*c+1))
    l= [data[i*c:i*c+c] for i in range(r)]
    formatter = ('{} ' * c).strip() # create the format string once - strip spaces at end
    print(formatter.format(*l[0]), end="") # print the 1st element w/o newline
    for sublist in l[1:]:
        # print all others including a \n in front
        print( "\n"+formatter.format( *sublist ), end="" ) # do not print \n at end
So I need the output of my program to look like:
ababa
ab ba
xxxxxxxxxxxxxxxxxxx
that is it followed by a lot of spaces .
no dot at the end
The largest run of consecutive whitespace characters was 47.
But what I am getting is:
ababa
ab ba
xxxxxxxxxxxxxxxxxxx
that is it followed by a lot of spaces .
no dot at the end
The longest run of consecutive whitespace characters was 47.
When looking further into the code I wrote, I found with the print(c) statement that this happens:
['ababa', '', 'ab ba ', '', ' xxxxxxxxxxxxxxxxxxx', 'that is it followed by a lot of spaces .', ' no dot at the end']
Between some of the lines there is a '', which is probably why my print statement won't work.
How would I remove them? I've tried using different list functions but I keep getting syntax errors.
This is the code I made:
a = '''ababa
ab ba
xxxxxxxxxxxxxxxxxxx
that is it followed by a lot of spaces .
no dot at the end'''
c = a.splitlines()
print(c)
#d = c.remove(" ") #this part doesnt work
#print(d)
for row in c:
    print(' '.join(row.split()))
last_char = ""
current_seq_len = 0
max_seq_len = 0
for d in a:
    if d == last_char:
        current_seq_len += 1
        if current_seq_len > max_seq_len:
            max_seq_len = current_seq_len
    else:
        current_seq_len = 1
        last_char = d
#this part just needs to count the whitespace
print("The longest run of consecutive whitespace characters was",str(max_seq_len)+".")
Regex time:
import re
print(re.sub(r"([\n ])\1*", r"\1", a))
#>>> ababa
#>>> ab ba
#>>> xxxxxxxxxxxxxxxxxxx
#>>> that is it followed by a lot of spaces .
#>>> no dot at the end
re.sub(matcher, replacement, target_string)
Matcher is r"([\n ])\1*", which means:
([\n ]) → match either "\n" or " " and put it in a group (#1)
\1* → match whatever group #1 matched, 0 or more times
And the replacement is just
\1 → group #1
You can get the longest whitespace sequence with
max(len(match.group()) for match in re.finditer(r"([\n ])\1*", a))
This uses the same matcher but takes the length of each match and then the maximum of those lengths.
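A regex-free alternative can be sketched with itertools.groupby. Unlike the matcher above, which only counts repeats of the same character, this version treats any mix of whitespace characters (spaces, newlines, tabs) as one run. The function name `longest_whitespace_run` is hypothetical.

```python
from itertools import groupby

def longest_whitespace_run(text):
    """Length of the longest run of consecutive whitespace characters."""
    runs = (
        sum(1 for _ in group)  # length of the current run
        for is_space, group in groupby(text, key=str.isspace)
        if is_space            # keep only the whitespace runs
    )
    # default=0 covers text that contains no whitespace at all.
    return max(runs, default=0)

print(longest_whitespace_run("a  b     c"))
```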
From what I can tell, your easiest solution would be using list comprehension:
c= [item for item in a.splitlines() if item != '']
If you wish to make it slightly more robust by also removing strings that only contain whitespace such as ' ', then you can alter it as follows:
c= [item for item in a.splitlines() if item.strip() != '']
You can then also join the list back together as follows:
output = '\n'.join(c)
This can be easily solved with the built-in filter function:
c = list(filter(None, a.splitlines()))
# or, more explicit
c = list(filter(lambda x: x != "", a.splitlines()))
The first variant creates a list of all elements returned by a.splitlines() that do not evaluate to False; the empty string evaluates to False, so it is dropped. In Python 3, filter returns an iterator, hence the list() call.
The second variant creates a small anonymous function (using lambda) that checks whether a given element is the empty string and returns False if that is the case. This is more explicit than the first variant.
Another option would be to use a list comprehension that achieves the same thing:
c = [string for string in a.splitlines() if string]
# or, more explicit
c = [string for string in a.splitlines() if string != ""]