I've been reading from a file and I'm having a hard time getting rid of "\t".
I've tried using i.strip().split("\t")[1] and appending the result to a list, but that isn't very useful when there are several tabs in a row.
For example, if I do what I described, I get:
z=['\t\t\t\twoman-in-lingerie', 'newspaper-photo', 'reference-to-marie-antoinette', '\tempty-grave', '\t\t\tbased-on-play', '\t\t\tcanadian-humor', '\t\t\tsitcom', 'hypocrisy', 'stripper']
I don't know how to remove those tabs. I've been trying to go through the list and change each element on its own, but it was unsuccessful.
If you're just trying to remove tabs you can use this list comprehension:
l2 = [item.strip('\t') for item in l1]
That'll get rid of any leading or trailing tabs on each element.
If you don't want any tabs at all, you can filter them out of each string after reading everything:
my_list = [''.join(filter(lambda ch: ch != '\t', item)) for item in my_list]
The simplest option is the replace method, replacing each tab ('\t') with an empty string (''):
>>> z = ['\t\t\t\twoman-in-lingerie', '\t\t\tsitcom']
>>> list(map(lambda x: x.replace('\t',''), z))
['woman-in-lingerie', 'sitcom']
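Putting the suggestions above together with the file-reading step from the question, a minimal sketch might look like this. It assumes one keyword per line and uses io.StringIO as a stand-in for the real file, whose name is not given; clean_keywords is a hypothetical helper name and the sample data is invented for illustration.

```python
import io

def clean_keywords(lines):
    """Return each non-empty line with all tabs and the newline removed."""
    cleaned = []
    for line in lines:
        text = line.replace('\t', '').strip()
        if text:  # skip lines that held only whitespace
            cleaned.append(text)
    return cleaned

# Stand-in for open(...): any iterable of lines works here.
fake_file = io.StringIO("\t\t\t\twoman-in-lingerie\nnewspaper-photo\n\tempty-grave\n\t\t\n")
keywords = clean_keywords(fake_file)
print(keywords)
```

Iterating over the file object line by line avoids building a list that still contains the tabs and then cleaning it in a second pass.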
This might give you an idea:
>>> import re
>>> re.sub('\t+','\t', 'hello\t\t\t')
'hello\t'
>>>
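Building on the same idea, re.split can split a line on runs of tabs directly, so several tabs in a row count as a single separator. This is only a sketch that assumes the fields are separated by tabs and nothing else; split_fields is a hypothetical helper name.

```python
import re

def split_fields(line):
    """Split a line on runs of one or more tabs."""
    stripped = line.strip()  # drop leading/trailing tabs and the newline
    return re.split(r'\t+', stripped) if stripped else []

fields = split_fields("\t\tfirst\t\t\tsecond\tthird\n")
print(fields)
```

Stripping first matters: without it, a line that starts with tabs would produce an empty string as its first field.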
z = '''\t\t\t\twoman-in-lingerie
newspaper-photo\t\t\t\t reference-to-marie-antoinette
\tempty-grave
\t\t\tbased-on-play
\t\t\tcanadian-humor\t\t\t
\t\t\tsitcom
hypocrisy\t\t\t\t\tstripper'''
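import re
def displ(x):
    # repr() each line so that escaped characters such as \t stay visible
    return '\n'.join(map(repr, x.splitlines(True)))
print(displ(z))
print('-------------------------------')
# 1) collapse every run of tabs into a single space
zt = re.sub('\t+', ' ', z)
print(displ(zt))
print('-------------------------------')
# 2) delete tabs at the start of a line, collapse the other runs into a space
zt = re.sub('(^\t+)|(\t+)',
            lambda mat: '' if mat.group(1) else ' ',
            z,
            flags=re.MULTILINE)
print(displ(zt))
print('-------------------------------')
# 3) same as 2), but treat spaces and tabs alike
zt = re.sub('(^[ \t]+)|([ \t]+)',
            lambda mat: '' if mat.group(1) else ' ',
            z,
            flags=re.MULTILINE)
print(displ(zt))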
result
'\t\t\t\twoman-in-lingerie\n'
'newspaper-photo\t\t\t\t reference-to-marie-antoinette\n'
'\tempty-grave\n'
'\t\t\tbased-on-play\n'
'\t\t\tcanadian-humor\t\t\t\n'
'\t\t\tsitcom\n'
'hypocrisy\t\t\t\t\tstripper'
-------------------------------
' woman-in-lingerie\n'
'newspaper-photo  reference-to-marie-antoinette\n'
' empty-grave\n'
' based-on-play\n'
' canadian-humor \n'
' sitcom\n'
'hypocrisy stripper'
-------------------------------
'woman-in-lingerie\n'
'newspaper-photo  reference-to-marie-antoinette\n'
'empty-grave\n'
'based-on-play\n'
'canadian-humor \n'
'sitcom\n'
'hypocrisy stripper'
-------------------------------
'woman-in-lingerie\n'
'newspaper-photo reference-to-marie-antoinette\n'
'empty-grave\n'
'based-on-play\n'
'canadian-humor \n'
'sitcom\n'
'hypocrisy stripper'
I use the function displ() to display the strings in a way that shows the escaped characters.
I am using the Python module markovify. I want to make new words instead of new sentences.
How can I make a function return output like this?
spacer('Hello, world!') # Should return 'H e l l o , w o r l d !'
I tried the following:
def spacer(text):
    for i in text:
        text = text.replace(i, i + ' ')
    return text
but it returned 'H e l l o , w o r l d ! ' when I gave it 'Hello, world!'.
You can use this one:
def spacer(string):
    return ' '.join(string)
print(spacer('Hello,World'))
Or you can write it as an explicit loop:
def spacer(text):
    out = ''
    for i in text:
        out += i + ' '
    return out[:-1]  # drop the trailing space
print(spacer("Hello, World"))
If you want, you can turn the same function into a custom spacer that also takes the number of spaces to put between characters (default 1):
def spacer(string, space=1):
    return (space * ' ').join(string)
print(spacer('Hello,World', space=1))
Or, for custom spacing with a loop:
def spacer(text, space=1):
    out = ''
    for i in text:
        out += i + ' ' * space
    return out[:len(out) - space]  # drop the whole trailing run of spaces
print(spacer("Hello, World", space=1))
Output:
H e l l o , W o r l d
The simplest method is probably
' '.join(string)
Since replace works on every instance of a character, you can process each distinct character once:
s = set(string)
if ' ' in s:
    string = string.replace(' ', '  ')
    s.remove(' ')
for c in s:
    string = string.replace(c, c + ' ')
if string:
    string = string[:-1]
The issue with your original attempt is that 'o' appears twice and 'l' three times in your string. The loop visits 'l' three times, and each visit replaces every 'l' with 'l ', so each 'l' ends up followed by three spaces. The same happens with 'o', which ends up followed by two.
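The effect described above is easy to reproduce on a shorter word. In this sketch, buggy_spacer is a hypothetical name for the loop from the question, and repr() is used so the extra spaces are visible:

```python
def buggy_spacer(text):
    # Same logic as the attempt in the question.
    for ch in text:                        # iterates over the ORIGINAL string
        text = text.replace(ch, ch + ' ')  # pads every occurrence on each visit
    return text

broken = buggy_spacer('Hello')
fixed = ' '.join('Hello')
print(repr(broken))  # each 'l' is padded twice, plus a trailing space
print(repr(fixed))
```

The for loop keeps iterating over the string object it started with, while the name text is rebound to a longer string on every pass. That is why repeated characters are padded once per occurrence.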
The simplest answer to this question would be:
"Hello world".replace("", " ")[1:-1]
This code reads as follows:
Replace every empty substring with a space, and then slice off the leading and trailing space.
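To see why this works: the empty string matches at every position in a string, including before the first character and after the last. A small sketch, using '-' instead of a space so the inserted characters are visible (the spacer wrapper is only for illustration):

```python
padded = "abc".replace("", "-")
print(padded)        # a separator appears at both ends as well
print(padded[1:-1])  # slicing removes the two outer ones

def spacer(text):
    # Also safe for the empty string: "".replace("", " ") is " ",
    # and " "[1:-1] is "".
    return text.replace("", " ")[1:-1]

print(spacer("Hello"))
```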
print(" ".join('Hello, world!'))
Output
H e l l o , w o r l d !
Say I have an incoming string that varies a little:
" 1 |r|=1.2e10 |v|=2.4e10"
" 12 |r|=-2.3e10 |v|=3.5e-04"
"134 |r|= 3.2e10 |v|=4.3e05"
I need to extract the numbers (i.e. 1.2e10, 3.5e-04, etc.), so I would like to start at the end of '|r|' and grab all characters up to the ' ' (space) after it. Same for '|v|'.
I've been looking for something that would:
Extract a substring from a string, starting at an index and ending on a specific character...
But have not found anything remotely close.
Ideas?
NOTE: Added new scenario, which is the one that is causing lots of head-scratching...
To keep it elegant and generic, let's use split:
First, we split the string on ' |' to get tokens.
Then, for each token that contains an equals sign, we parse out the key and value.
sabich = "134 |r| = 3.2e10 |v|=4.3e05"
parts = sabich.split(' |')
values = {}
for p in parts:
    if '=' in p:
        k, v = p.split('=')
        values[k.replace('|', '').strip()] = v.strip(' ')
# {'r': '3.2e10', 'v': '4.3e05'}
print(values)
This can be condensed into a one-liner:
sabich = "134 |r| = 3.2e10 |v|=4.3e05"
values = {t[0].replace('|', '').strip() : t[1].strip(' ') for t in [tuple(p.split('=')) for p in sabich.split(' |') if '=' in p]}
# {'r': '3.2e10', 'v': '4.3e05'}
print(values)
You can solve it with a regular expression.
import re
strings = [
" 1 |r|=1.2e10 |v|=2.4e10",
" 12 |r|=-2.3e10 |v|=3.5e-04"
]
out = []
pattern = r'(?P<name>\|[\w]+\|)=(?P<value>-?\d+(?:\.\d*)(?:e-?\d*)?)'
for s in strings:
    out.append(dict(re.findall(pattern, s)))
print(out)
Output
[{'|r|': '1.2e10', '|v|': '2.4e10'}, {'|r|': '-2.3e10', '|v|': '3.5e-04'}]
And if you want to convert the strings to numbers:
out = []
pattern = r'(?P<name>\|[\w]+\|)=(?P<value>-?\d+(?:\.\d*)(?:e-?\d*)?)'
for s in strings:
    # out.append(dict(re.findall(pattern, s)))
    out.append({
        name: float(value)
        for name, value in re.findall(pattern, s)
    })
print(out)
Output
[{'|r|': 12000000000.0, '|v|': 24000000000.0}, {'|r|': -23000000000.0, '|v|': 0.00035}]
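The pattern above expects the number to follow the '=' immediately, so it would miss the third sample from the question, where a space follows the '='. A more permissive variant could look like the sketch below. The pattern and the extract_values helper are assumptions made for illustration, not part of the answer above; note that this version also drops the bars from the keys.

```python
import re

# Hypothetical, more permissive pattern: optional whitespace around '=',
# optional sign, optional fraction, optional exponent.
PATTERN = re.compile(r'\|(\w+)\|\s*=\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)')

def extract_values(line):
    """Return {name: float} for every |name|=number pair found in line."""
    return {name: float(value) for name, value in PATTERN.findall(line)}

print(extract_values("134 |r|= 3.2e10 |v|=4.3e05"))
```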
I have a string and a list:
src = 'ways to learn are read and execute.'
temp = ['ways to','are','and']
What I want is to split the string on all the values in the list temp and produce:
['learn','read','execute']
in one go.
I tried a for loop:
for x in temp:
    src.split(x)
This is what it produced:
['','to learn are read and execute.']
['ways to learn','read and execute.']
['ways to learn are read','execute.']
What I want is to collect all the values in the list first and then use them to split the string.
Does anyone have a solution?
re.split is the conventional solution for splitting on multiple separators:
import re
src = 'ways to learn are read and execute.'
temp = ['ways to','are','and']
pattern = "|".join(re.escape(item) for item in temp)
result = re.split(pattern, src)
print(result)
Result:
['', ' learn ', ' read ', ' execute.']
You can also filter out blank items and strip the spaces+punctuation with a simple list comprehension:
result = [item.strip(" .") for item in result if item]
print(result)
Result:
['learn', 'read', 'execute']
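The two steps above can be wrapped into one reusable function. This is a sketch; the name split_on_phrases and its strip_chars parameter are hypothetical. It also sorts the separators longest-first, since regex alternation takes the first alternative that matches and a short separator could otherwise shadow a longer one that starts with it.

```python
import re

def split_on_phrases(text, separators, strip_chars=" ."):
    """Split text on any of the separator phrases and drop empty pieces."""
    ordered = sorted(separators, key=len, reverse=True)
    pattern = "|".join(re.escape(sep) for sep in ordered)
    pieces = (piece.strip(strip_chars) for piece in re.split(pattern, text))
    return [piece for piece in pieces if piece]

words = split_on_phrases('ways to learn are read and execute.',
                         ['ways to', 'are', 'and'])
print(words)
```

Note that the separators match anywhere, even inside longer words (for example 'and' inside 'sandwich'). Wrapping each separator in \b word boundaries would be one way to tighten that.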
This is a method which is pure Python and does not rely on regular expressions. It's more verbose and more complex:
result = []
current = 0
for part in temp:
    too_long_result = src.split(part)[1]
    if current + 1 < len(temp):
        result.append(too_long_result.split(temp[current+1])[0].strip())
    else:
        result.append(too_long_result.strip())
    current += 1
print(result)
You can remove the .strip() calls if you don't want to remove the leading and trailing whitespace in the list entries.
Loop solution. You can add steps such as strip() if you need them.
src = 'ways to learn are read and execute.'
temp = ['ways to','are','and']
copy_src = src
result = []
for x in temp:
    # maxsplit=1 so the unpacking still works if x occurs more than once
    left, right = copy_src.split(x, 1)
    if left:
        result.append(left) #or left.strip()
    copy_src = right
result.append(copy_src) #or copy_src.strip()
Just keep it simple:
src = 'ways to learn are read and execute.'
temp = ['ways','to','are','and']
res=''
for w1 in src.split():
    if w1 not in temp:
        if w1 not in res.split():
            res=res+w1+" "
print(res)
Revenue = [400000000,10000000,10000000000,10000000]
s1 = []
for x in Revenue:
    message = (','.join(['{:,.0f}'.format(x)]).split())
    s1.append(message)
print(s1)
The output I am getting is [['400,000,000'], ['10,000,000'], ['10,000,000,000'], ['10,000,000']], and I want it to look like this: [400,000,000, 10,000,000, 10,000,000,000, 10,000,000]
Can someone please help me with this? I am new to Python.
If your goal is just to add the commas, you will be stuck with the quotes, because each formatted value is a str. You can eliminate the nesting, though, by using a simpler list comprehension:
Revenue = [400000000,10000000,10000000000,10000000]
l = ['{:,}'.format(i) for i in Revenue]
# ['400,000,000', '10,000,000', '10,000,000,000', '10,000,000']
You could also unpack the list into variables and then print each variable without quotes
v, w, x, y = l
print(v)
# 400,000,000
You can also print the unpacked list, but that only produces output; the list itself still holds strings:
print(*l)
# 400,000,000 10,000,000 10,000,000,000 10,000,000
Expanded loop:
l = []
for i in Revenue:
    l.append('{:,}'.format(i))
I'm not sure why you want the output you've shown, because it is hard to read, but here is how to make it:
>>> Revenue = [400000000,10000000,10000000000,10000000]
>>> def revenue_formatted(rev):
...     return "[" + ", ".join("{:,d}".format(n) for n in rev) + "]"
...
>>> print(revenue_formatted(Revenue))
[400,000,000, 10,000,000, 10,000,000,000, 10,000,000]
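On Python 3.6 or later, the same formatting can be written with f-strings. A minimal sketch that keeps the two results apart, a list of strings and a single display string (the variable names are only illustrative):

```python
revenue = [400000000, 10000000, 10000000000, 10000000]

formatted = [f"{n:,}" for n in revenue]     # list of strings, quotes when printed
as_text = "[" + ", ".join(formatted) + "]"  # one string that looks like the target

print(formatted)
print(as_text)
```

A list cannot hold 400,000,000 as a number with commas in it, so the comma-separated form only exists as text.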
I need a way to copy all of the positions of the spaces of one string to another string that has no spaces.
For example:
string1 = "This is a piece of text"
string2 = "ESTDTDLATPNPZQEPIE"
output = "ESTD TD L ATPNP ZQ EPIE"
Insert characters as appropriate into a placeholder list and concatenate it afterwards using str.join.
it = iter(string2)
output = ''.join(
    [next(it) if not c.isspace() else ' ' for c in string1]
)
print(output)
ESTD TD L ATPNP ZQ EPIE
This is efficient as it avoids repeated string concatenation.
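The same approach can be packaged as a function. In this sketch, copy_spaces is a hypothetical name, and next(it, '') is used so that a string2 that is too short yields empty strings instead of raising StopIteration:

```python
def copy_spaces(template, letters):
    """Place the characters of letters into the non-space slots of template."""
    it = iter(letters)
    # next(it, '') falls back to '' once letters is exhausted.
    return ''.join(' ' if c.isspace() else next(it, '') for c in template)

output = copy_spaces("This is a piece of text", "ESTDTDLATPNPZQEPIE")
print(output)
```

The shared iterator is what keeps the two strings in step: it only advances when a non-space position in the template consumes a character.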
You need to iterate over the indexes and characters in string1 using enumerate().
On each iteration, if the character is a space, add a space to the output string (note that this is inefficient as you are creating a new object as strings are immutable), otherwise add the character in string2 at that index to the output string.
So that code would look like:
output = ''
si = 0
for i, c in enumerate(string1):
    if c == ' ':
        si += 1
        output += ' '
    else:
        output += string2[i - si]
However, it would be more efficient to use a very similar method, but with a generator and then str.join. This removes the slow concatenations to the output string:
def chars(s1, s2):
    si = 0  # number of spaces seen so far
    for i, c in enumerate(s1):
        if c == ' ':
            si += 1
            yield ' '
        else:
            yield s2[i - si]
output = ''.join(chars(string1, string2))
You can try the list insert method:
string1 = "This is a piece of text"
string2 = "ESTDTDLATPNPZQEPIE"
string3=list(string2)
for j,i in enumerate(string1):
    if i==' ':
        string3.insert(j,' ')
print("".join(string3))
Output:
ESTD TD L ATPNP ZQ EPIE