This question already has answers here:
Is there a simple way to remove multiple spaces in a string?
(27 answers)
Closed 5 years ago.
How would I do this in Python 3?
The   quick brown    fox jumps over the     lazy dog.
to...
The quick brown fox jumps over the lazy dog.
Where the above quote is a string.
You don't need a regex here. You can get what you want like this:
a = "The   quick brown    fox jumps over the     lazy dog."
final = " ".join(a.split())
print(final)
Output:
The quick brown fox jumps over the lazy dog.
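One thing worth knowing about the split/join approach: split() with no arguments splits on any run of whitespace, so tabs and newlines get collapsed too. The sketch below compares it with a regex that only touches the space character. The sample string is made up for illustration.

```python
import re

# Hypothetical sample with repeated spaces, a tab, and a newline.
text = "The   quick brown    fox\tjumps over\nthe lazy dog."

# split() with no arguments splits on any whitespace run (spaces, tabs, newlines).
collapsed_all = " ".join(text.split())

# A regex limited to the space character leaves tabs and newlines in place.
collapsed_spaces = re.sub(r" {2,}", " ", text)

print(repr(collapsed_all))
print(repr(collapsed_spaces))
```

If the input only ever contains plain spaces, both give the same result and the split/join version is probably the simpler choice.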
I'm trying to compare the output of a speech-to-text API with a ground truth transcription. What I'd like to do is capitalize the words in the ground truth which the speech-to-text API either missed or misinterpreted.
For Example:
Truth:
The quick brown fox jumps over the lazy dog.
Speech-to-text Output:
the quick brown box jumps over the dog
Desired Result:
The quick brown FOX jumps over the LAZY dog.
My initial instinct was to remove the capitalization and punctuation from the ground truth and use difflib. This gets me an accurate diff, but I'm having trouble mapping the output back to positions in the original text. I would like to keep the ground truth capitalization and punctuation to display the results, even if I'm only interested in word errors.
Is there any way to express difflib output as word-level changes on an original text?
I would also suggest a solution using difflib, but I'd use a regex for word detection, since it is more precise and more tolerant of stray punctuation and other odd characters.
I've added some weird text to your original strings to show what I mean:
import re
import difflib
truth = 'The quick! brown - fox jumps, over the lazy dog.'
speech = 'the quick... brown box jumps. over the dog'
truth = re.findall(r"[\w']+", truth.lower())
speech = re.findall(r"[\w']+", speech.lower())
for d in difflib.ndiff(truth, speech):
    print(d)
Output
  the
  quick
  brown
- fox
+ box
  jumps
  over
  the
- lazy
  dog
Another possible output:
diff = difflib.unified_diff(truth, speech)
print(''.join(diff))
Output
---
+++
@@ -1,9 +1,8 @@
 the quick brown-fox+box jumps over the-lazy dog
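The body of that unified diff runs together because the word lists carry no trailing newlines while the header lines do. A minimal sketch of one way to get one entry per line, assuming the same two word lists: pass lineterm='' so the headers have no newline either, then join everything with '\n'.

```python
import difflib

truth = "the quick brown fox jumps over the lazy dog".split()
speech = "the quick brown box jumps over the dog".split()

# lineterm='' keeps the header lines free of trailing newlines,
# so every entry can be joined uniformly with '\n'.
diff_lines = list(difflib.unified_diff(truth, speech, lineterm=""))
print("\n".join(diff_lines))
```

Each word then appears on its own line, prefixed with a space, '-' or '+'.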
Why not just split the sentence into words then use difflib on those?
import difflib
truth = 'The quick brown fox jumps over the lazy dog.'.lower().strip('.').split()
speech = 'the quick brown box jumps over the dog'.lower().strip('.').split()
for d in difflib.ndiff(truth, speech):
    print(d)
So I think I've solved the problem. I realised that difflib's context_diff provides the indices of the lines that have changes in them. To get the indices for the "ground truth" text, I remove the capitalization and punctuation, split the text into individual words, and then do the following:
altered_word_indices = []
diff = difflib.context_diff(transformed_ground_truth, transformed_hypothesis, n=0)
for line in diff:
    if line.startswith('*** ') and line.endswith(' ****\n'):
        line = line.replace(' ', '').replace('\n', '').replace('*', '')
        if ',' in line:
            split_line = line.split(',')
            for i in range(0, (int(split_line[1]) - int(split_line[0])) + 1):
                altered_word_indices.append((int(split_line[0]) + i) - 1)
        else:
            altered_word_indices.append(int(line) - 1)
Following this, I print it out with the changed words capitalized:
split_ground_truth = ground_truth.split(' ')
for i in range(0, len(split_ground_truth)):
    if i in altered_word_indices:
        print(split_ground_truth[i].upper(), end=' ')
    else:
        print(split_ground_truth[i], end=' ')
This allows me to print out "The quick brown FOX jumps over the LAZY dog." (capitalization / punctuation included) instead of "the quick brown FOX jumps over the LAZY dog".
This is...not a super elegant solution, and it's subject to testing, cleanup, error handling, etc. But it seems like a decent start and is potentially useful for someone else running into the same problem. I'll leave this question open for a few days in case someone comes up with a less gross way of getting the same result.
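For what it's worth, a possible alternative to parsing context_diff headers is difflib.SequenceMatcher.get_opcodes(), which reports index ranges directly. The sketch below is only one way to do it, not the approach used above: the helper names normalize and highlight_errors are made up for illustration, and it assumes the ground truth can be tokenized with a plain split() so that each normalized word lines up with one original token.

```python
import difflib
import re


def normalize(word):
    # Lowercase and drop everything except word characters and apostrophes.
    return re.sub(r"[^\w']", "", word.lower())


def highlight_errors(ground_truth, hypothesis):
    # Hypothetical helper: uppercase ground-truth words that were
    # replaced or dropped in the hypothesis.
    truth_words = ground_truth.split()
    truth_norm = [normalize(w) for w in truth_words]
    hyp_norm = [normalize(w) for w in hypothesis.split()]

    matcher = difflib.SequenceMatcher(None, truth_norm, hyp_norm, autojunk=False)
    result = list(truth_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # i1:i2 indexes the ground truth, so it maps straight back
        # to the original (capitalized, punctuated) tokens.
        if tag in ("replace", "delete"):
            for i in range(i1, i2):
                result[i] = truth_words[i].upper()
    return " ".join(result)


print(highlight_errors("The quick brown fox jumps over the lazy dog.",
                       "the quick brown box jumps over the dog"))
```

Words that only exist in the hypothesis ("insert" opcodes) are ignored here, since there is no ground-truth word to capitalize for them.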
This question already has answers here:
How do I put a variable’s value inside a string (interpolate it into the string)?
(9 answers)
How to use variables in SQL statement in Python?
(5 answers)
Closed 2 years ago.
Hi, I want to use the outputFile variable in place of the hard-coded filename in the Python code below.
Can anyone help me out?
outputFile = 'test.csv'
connection.execute("TRUNCATE travel_staging.upsell_test;"
                   "COPY travel_staging.upsell_test FROM 's3://folder/filename' WITH CREDENTIALS "
                   "'aws_access_key_id=xxx;aws_secret_access_key=xxxx'"
                   " FORMAT csv DELIMITER ',' IGNOREHEADER 1 DATEFORMAT 'auto' NULL AS 'null' MAXERROR 500 acceptinvchars;")
Expected output
TRUNCATE travel_staging.upsell_test;
COPY travel_staging.upsell_test FROM 's3://folder/test.csv' WITH CREDENTIALS
'aws_access_key_id=xxx;aws_secret_access_key=xxxx'
FORMAT csv DELIMITER ',' IGNOREHEADER 1 DATEFORMAT 'auto' NULL AS 'null' MAXERROR 500
To replace a variable in a string
Use f-strings.
>>> fox = 'quick brown'
>>> dog = 'lazy'
>>> f'the {fox} fox jumps over the {dog} dog'
'the quick brown fox jumps over the lazy dog'
If you don't need the newlines
Use a backslash (\) at the end of a line to continue the string on the next line.
>>> 'the quick brown fox jumps \
... over the lazy dog'
'the quick brown fox jumps over the lazy dog'
If you do need the newlines
Just put each line in a separate string.
>>> print('\n'.join([
... 'the quick brown fox jumps',
... 'over the lazy dog',
... ]))
the quick brown fox jumps
over the lazy dog
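Applied to the statement from the question, this might look like the sketch below. It assumes the value comes from trusted code rather than user input, since interpolating text into SQL is open to injection. The name output_file stands in for outputFile, and the execute call is left commented out because connection is not defined here.

```python
output_file = 'test.csv'  # plays the role of outputFile in the question

# Adjacent string literals are concatenated; only the piece that needs
# the variable is written as an f-string.
query = (
    "TRUNCATE travel_staging.upsell_test;"
    f"COPY travel_staging.upsell_test FROM 's3://folder/{output_file}' WITH CREDENTIALS "
    "'aws_access_key_id=xxx;aws_secret_access_key=xxxx'"
    " FORMAT csv DELIMITER ',' IGNOREHEADER 1 DATEFORMAT 'auto' NULL AS 'null' MAXERROR 500 acceptinvchars;"
)
print(query)
# connection.execute(query)  # `connection` is the object from the question
```

Building the string first and printing it makes it easy to check the result against the expected output before running it.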
I have a list of words in a txt file, each on its own line with its definition next to it. However, the definition sometimes includes an example sentence that uses the word. I want to replace the word, where it is repeated in the example, with the symbol ~. How could I do this with Python?
Ok, here is my example of replacing every instance of a word in a sentence with another character...
>>> my_string = "the quick brown fox jumped over the lazy dog"
>>> search_word = "the"
>>> replacement_symbol = "~"
>>> my_string.replace(search_word, replacement_symbol)
'~ quick brown fox jumped over ~ lazy dog'
Obviously this doesn't cover loading the file, reading it line by line and skipping the first instance of the word... Let's extend it a little.
words.txt
fox the quick brown fox jumped over the lazy dog
the the quick brown fox jumped over the lazy dog
jumped the quick brown fox jumped over the lazy dog
And to read this, strip the first word and then replace that word in the rest of the line...
with open('words.txt') as f:
    for line in f.readlines():
        line = line.strip()
        search_term = line.split(' ')[0]
        sentence = ' '.join(line.split(' ')[1:])
        sentence = sentence.replace(search_term, '~')
        line = '%s %s' % (search_term, sentence)
        print(line)
and the output...
fox the quick brown ~ jumped over the lazy dog
the ~ quick brown fox jumped over ~ lazy dog
jumped the quick brown fox ~ over the lazy dog
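One caveat: str.replace also matches inside longer words, so "the" would be replaced within "other". If that matters for your word list, a regex with word boundaries is one way around it. This is a sketch, and mask_word is just an illustrative name.

```python
import re


def mask_word(word, sentence, symbol='~'):
    # \b anchors the match at word boundaries; re.escape guards against
    # regex metacharacters in the word itself.
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.sub(pattern, symbol, sentence)


sentence = 'the other fox bathed in the river'
print(mask_word('the', sentence))    # whole words only
print(sentence.replace('the', '~'))  # also alters "other" and "bathed"
```

The function could replace the sentence.replace(...) call in the loop above without other changes.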
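Assuming the word and the definition are separated by #:
with open('file.txt', 'r') as f:
    for line in f:
        # split only on the first '#', in case the definition contains one
        myword, mydefinition = line.rstrip('\n').split('#', 1)
        if myword in mydefinition:
            # str.replace returns a new string, so keep the result
            mydefinition = mydefinition.replace(myword, '~')
        print(myword + '#' + mydefinition)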
I want Python to remove only some punctuation from a string. Let's say I want to remove all the punctuation except '#'.
import string
remove = dict.fromkeys(map(ord, '\n ' + string.punctuation))
sample = 'The quick brown fox, like, totally jumped, #man!'
sample.translate(remove)
Here the output is
The quick brown fox like totally jumped man
But what I want is something like this
The quick brown fox like totally jumped #man
Is there a way to selectively remove punctuation from a text leaving out the punctuation that we want in the text intact?
string.punctuation contains all the punctuation characters. Remove # from it, then use the rest as a character class and replace every match with ''. Escaping the characters with re.escape keeps ones like ], ^ and \ from being read as regex syntax.
>>> import re
>>> import string
>>> a = string.punctuation.replace('#','')
>>> re.sub(r'[{}]'.format(re.escape(a)),'','The quick brown fox, like, totally jumped, #man!')
'The quick brown fox like totally jumped #man'
Just remove the character you don't want to touch from the replacement string:
import string
remove = dict.fromkeys(map(ord, '\n' + string.punctuation.replace('#','')))
sample = 'The quick brown fox, like, totally jumped, #man!'
sample.translate(remove)
Also note that I changed '\n ' to '\n', as the former will remove spaces from your string.
Result:
The quick brown fox like totally jumped #man
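If you need this in more than one place, the same idea can be wrapped in a small function. This is a sketch: strip_punctuation and its keep parameter are names chosen for illustration, and it uses str.maketrans with a third argument, which lists the characters to delete.

```python
import string


def strip_punctuation(text, keep='#'):
    # Build the set of punctuation to delete, minus the characters to keep.
    to_remove = ''.join(ch for ch in string.punctuation if ch not in keep)
    # The third argument of str.maketrans lists characters mapped to None.
    return text.translate(str.maketrans('', '', to_remove))


print(strip_punctuation('The quick brown fox, like, totally jumped, #man!'))
print(strip_punctuation('mail me @home, #now!', keep='#@'))
```

Passing several characters in keep preserves all of them.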
I'm trying to replace spaces with hyphens in each possible position in Python. For example, "the man said hi" should produce a list of all the possible hyphen positions, including multiple hyphens:
the-man said hi
the man-said hi
the man said-hi
the-man said-hi
the-man-said hi
the man-said-hi
the-man-said-hi
The length of the strings varies in number of spaces, so it can't be a fix for just 3 spaces. I've been experimenting with re.search and re.sub in a while loop, but haven't found a nice way yet.
Use itertools.product() to produce all space-and-dash combinations, then recombine your string with those:
from itertools import product
def dashed_combos(inputstring):
    words = inputstring.split()
    for combo in product(' -', repeat=len(words) - 1):
        yield ''.join(w for pair in zip(words, combo + ('',)) for w in pair)
The last line zips the words together with the dashes and spaces (adding in an empty string at the end to make up the pairs), then flattens that and joins them into a single string.
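To make that last line easier to follow, here is the same step unrolled for a single combination. The three-word list and the chosen separators are made up for the example.

```python
words = ['the', 'man', 'said']
combo = ('-', ' ')  # one separator per gap between words

# Pad the separators with '' so every word has a partner.
pairs = list(zip(words, combo + ('',)))
# pairs == [('the', '-'), ('man', ' '), ('said', '')]

# Flatten the pairs into one sequence of strings, then join.
flattened = [w for pair in pairs for w in pair]
result = ''.join(flattened)
print(result)
```

product(' -', repeat=len(words) - 1) simply generates every such combo tuple in turn.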
Demo:
>>> for combo in dashed_combos('the man said hi'):
...     print(combo)
...
the man said hi
the man said-hi
the man-said hi
the man-said-hi
the-man said hi
the-man said-hi
the-man-said hi
the-man-said-hi
You can always skip the first iteration of that loop (with only spaces) with itertools.islice():
from itertools import product, islice
def dashed_combos(inputstring):
    words = inputstring.split()
    for combo in islice(product(' -', repeat=len(words) - 1), 1, None):
        yield ''.join(w for pair in zip(words, combo + ('',)) for w in pair)
All this is extremely memory efficient; you can easily handle inputs with hundreds of words, provided you don't try and store all possible combinations in memory at once.
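As a rough illustration of that point, the sketch below repeats the first version of the generator (so the block runs on its own) and pulls only three results from a 200-word input. The input string is synthetic. An input of n words has 2 ** (n - 1) combinations, so materializing them all is only practical for short inputs.

```python
from itertools import islice, product


def dashed_combos(inputstring):
    # Same generator as the first version above (all-spaces combo included).
    words = inputstring.split()
    for combo in product(' -', repeat=len(words) - 1):
        yield ''.join(w for pair in zip(words, combo + ('',)) for w in pair)


# 200 made-up words: 2 ** 199 combinations exist, but only three are built.
long_input = ' '.join('w{}'.format(i) for i in range(200))
first_three = list(islice(dashed_combos(long_input), 3))
print(first_three[1][-20:])

# For a short input, the full list is small enough to count.
print(len(list(dashed_combos('the man said hi'))))
```

Because the generator yields one string at a time, memory use stays proportional to the length of a single result.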
Slightly longer demo:
>>> for combo in islice(dashed_combos('the quick brown fox jumped over the lazy dog'), 10):
...     print(combo)
...
the quick brown fox jumped over the lazy-dog
the quick brown fox jumped over the-lazy dog
the quick brown fox jumped over the-lazy-dog
the quick brown fox jumped over-the lazy dog
the quick brown fox jumped over-the lazy-dog
the quick brown fox jumped over-the-lazy dog
the quick brown fox jumped over-the-lazy-dog
the quick brown fox jumped-over the lazy dog
the quick brown fox jumped-over the lazy-dog
the quick brown fox jumped-over the-lazy dog
>>> for combo in islice(dashed_combos('the quick brown fox jumped over the lazy dog'), 200, 210):
...     print(combo)
...
the-quick-brown fox jumped-over the lazy-dog
the-quick-brown fox jumped-over the-lazy dog
the-quick-brown fox jumped-over the-lazy-dog
the-quick-brown fox jumped-over-the lazy dog
the-quick-brown fox jumped-over-the lazy-dog
the-quick-brown fox jumped-over-the-lazy dog
the-quick-brown fox jumped-over-the-lazy-dog
the-quick-brown fox-jumped over the lazy dog
the-quick-brown fox-jumped over the lazy-dog
the-quick-brown fox-jumped over the-lazy dog