Hi, I want to use the outputFile variable in place of the hard-coded filename in the Python code below.
Can anyone help me out?
outputFile= 'test.csv'
connection.execute("TRUNCATE travel_staging.upsell_test;"
"COPY travel_staging.upsell_test FROM 's3://folder/filename' WITH CREDENTIALS "
"'aws_access_key_id=xxx;aws_secret_access_key=xxxx'"
" FORMAT csv DELIMITER ',' IGNOREHEADER 1 DATEFORMAT 'auto' NULL AS 'null' MAXERROR 500 acceptinvchars;")
Expected output
TRUNCATE travel_staging.upsell_test;
COPY travel_staging.upsell_test FROM 's3://folder/test.csv' WITH CREDENTIALS
'aws_access_key_id=xxx;aws_secret_access_key=xxxx'
FORMAT csv DELIMITER ',' IGNOREHEADER 1 DATEFORMAT 'auto' NULL AS 'null' MAXERROR 500
To replace a variable in a string
Use f-strings.
>>> fox = 'quick brown'
>>> dog = 'lazy'
>>> f'the {fox} fox jumps over the {dog} dog'
'the quick brown fox jumps over the lazy dog'
If you don't need the newlines
Use the \ to continue the statement on a new line.
>>> 'the quick brown fox jumps \
... over the lazy dog'
'the quick brown fox jumps over the lazy dog'
If you do need the newlines
Put each line in a separate string and join them with '\n'.
>>> print('\n'.join([
... 'the quick brown fox jumps',
... 'over the lazy dog',
... ]))
the quick brown fox jumps
over the lazy dog
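Applied to the statement in the question, the two ideas combine roughly as follows. This is a minimal sketch: it only builds and prints the SQL text, the credentials are the placeholders from the question, and adjacent string literals inside parentheses are used in place of backslash continuation. Interpolating a value into SQL like this is only reasonable when the value is trusted.

```python
outputFile = 'test.csv'  # same variable as in the question

# Adjacent string literals inside parentheses are joined into one string.
# Only the line that needs the variable has to be an f-string.
sql = (
    "TRUNCATE travel_staging.upsell_test;\n"
    f"COPY travel_staging.upsell_test FROM 's3://folder/{outputFile}' WITH CREDENTIALS\n"
    "'aws_access_key_id=xxx;aws_secret_access_key=xxxx'\n"
    "FORMAT csv DELIMITER ',' IGNOREHEADER 1 DATEFORMAT 'auto' NULL AS 'null' MAXERROR 500;"
)

print(sql)
```

The resulting string can then be passed to connection.execute(sql) as in the original code.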
We want to split a multi-line string, for example:
|---------------------------------------------Title1(a)---------------------------------------------
Content goes here, the quick brown fox jumps over the lazy dog
|---------------------------------------------Title1(b)----------------------------------------------
Content goes here, the quick brown fox jumps over the lazy dog
Here is our Python code that splits using a regex:
import re
str1 = "|---------------------------------------------Title1(a)---------------------------------------------" \
"" \
"Content goes here, the quick brown fox jumps over the lazy dog" \
"" \
"|---------------------------------------------Title1(b)----------------------------------------------" \
"" \
"Content goes here, the quick brown fox jumps over the lazy dog" \
"|"
print(str1)
str2 = re.split(r"\|---------------------------------------------", str1)
print(str2)
We want the output to include only
str2[0]:
Content goes here, the quick brown fox jumps over the lazy dog
str2[1]:
Content goes here, the quick brown fox jumps over the lazy dog
What is the proper regex to use, or is there another way to split a string in the format above?
Instead of using split, you can match the lines and capture the part that you want in a group.
\|-{2,}[^-]+-{2,}([^-].*?)(?=\|)
Explanation
\| Match |
-{2,} Match 2 or more -
[^-]+ Match 1+ times any char except -
-{2,} Match 2 or more -
( Capture group 1
[^-].*? Match any char except -, then any char, as few as possible
) Close group 1
(?=\|) Positive lookahead, assert a | to the right
Example
import re
regex = r"\|-{2,}[^-]+-{2,}([^-].*?)(?=\|)"
str1 = "|---------------------------------------------Title1(a)---------------------------------------------" \
"" \
"Content goes here, the quick brown fox jumps over the lazy dog" \
"" \
"|---------------------------------------------Title1(b)----------------------------------------------" \
"" \
"Content goes here, the quick brown fox jumps over the lazy dog" \
"|"
str2 = re.findall(regex, str1)
print(str2[0])
print(str2[1])
Output
Content goes here, the quick brown fox jumps over the lazy dog
Content goes here, the quick brown fox jumps over the lazy dog
If Title should be part of the line, another option is to make the match a bit more precise.
\|-+Title\d+\([a-z]\)-+(.+?)(?=\||$)
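A short sketch of how the stricter pattern behaves on the same kind of input. The shortened dash runs and the variable names are illustrative assumptions; the pattern itself is the one given above.

```python
import re

# Stricter variant: the header must contain "Title<digits>(<letter>)".
regex = r"\|-+Title\d+\([a-z]\)-+(.+?)(?=\||$)"

# Same layout as the question's string, with shorter dash runs for readability.
str1 = (
    "|-----Title1(a)-----"
    "Content goes here, the quick brown fox jumps over the lazy dog"
    "|-----Title1(b)------"
    "Content goes here, the quick brown fox jumps over the lazy dog"
    "|"
)

# findall returns the text captured by group 1 for every header it matches.
contents = re.findall(regex, str1)
for item in contents:
    print(item)
```

Because the lookahead also accepts the end of the string, the last section is captured even when no trailing | is present.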
I'm trying to compare the output of a speech-to-text API with a ground truth transcription. What I'd like to do is capitalize the words in the ground truth which the speech-to-text API either missed or misinterpreted.
For Example:
Truth:
The quick brown fox jumps over the lazy dog.
Speech-to-text Output:
the quick brown box jumps over the dog
Desired Result:
The quick brown FOX jumps over the LAZY dog.
My initial instinct was to remove the capitalization and punctuation from the ground truth and use difflib. This gets me an accurate diff, but I'm having trouble mapping the output back to positions in the original text. I would like to keep the ground truth capitalization and punctuation to display the results, even if I'm only interested in word errors.
Is there any way to express difflib output as word-level changes on an original text?
I would also suggest a solution using difflib, but I'd use a regex for word detection, since it is more precise and more tolerant of stray punctuation and other oddities.
I've added some weird text to your original strings to show what I mean:
import re
import difflib
truth = 'The quick! brown - fox jumps, over the lazy dog.'
speech = 'the quick... brown box jumps. over the dog'
truth = re.findall(r"[\w']+", truth.lower())
speech = re.findall(r"[\w']+", speech.lower())
for d in difflib.ndiff(truth, speech):
    print(d)
Output
the
quick
brown
- fox
+ box
jumps
over
the
- lazy
dog
Another possible output:
diff = difflib.unified_diff(truth, speech)
print(''.join(diff))
Output
---
+++
@@ -1,9 +1,8 @@
the quick brown-fox+box jumps over the-lazy dog
Why not just split the sentence into words then use difflib on those?
import difflib
truth = 'The quick brown fox jumps over the lazy dog.'.lower().strip('.').split()
speech = 'the quick brown box jumps over the dog'.lower().strip('.').split()
for d in difflib.ndiff(truth, speech):
    print(d)
So I think I've solved the problem. I realised that difflib's context_diff provides the indices of lines that have changes in them. To get the indices for the "ground truth" text, I remove the capitalization and punctuation, split the text into individual words, and then do the following:
altered_word_indices = []
diff = difflib.context_diff(transformed_ground_truth, transformed_hypothesis, n=0)
for line in diff:
    if line.startswith('*** ') and line.endswith(' ****\n'):
        line = line.replace(' ', '').replace('\n', '').replace('*', '')
        if ',' in line:
            split_line = line.split(',')
            for i in range(0, (int(split_line[1]) - int(split_line[0])) + 1):
                altered_word_indices.append((int(split_line[0]) + i) - 1)
        else:
            altered_word_indices.append(int(line) - 1)
Following this, I print it out with the changed words capitalized:
split_ground_truth = ground_truth.split(' ')
for i in range(0, len(split_ground_truth)):
    if i in altered_word_indices:
        print(split_ground_truth[i].upper(), end=' ')
    else:
        print(split_ground_truth[i], end=' ')
This allows me to print out "The quick brown FOX jumps over the LAZY dog." (capitalization / punctuation included) instead of "the quick brown FOX jumps over the LAZY dog".
This is...not a super elegant solution, and it's subject to testing, cleanup, error handling, etc. But it seems like a decent start and is potentially useful for someone else running into the same problem. I'll leave this question open for a few days in case someone comes up with a less gross way of getting the same result.
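One possibly tidier route, offered as a sketch and not as the approach used above: difflib.SequenceMatcher.get_opcodes() reports changed ranges as index pairs directly, so the diff text does not have to be parsed. The helper names normalize and highlight_errors are hypothetical, and the sketch assumes the ground truth is split on whitespace so that each normalized word keeps the same index as its original token.

```python
import difflib
import re


def normalize(word):
    # Lowercase and drop everything except word characters and apostrophes.
    return re.sub(r"[^\w']", "", word.lower())


def highlight_errors(truth, hypothesis):
    truth_words = truth.split()
    truth_norm = [normalize(w) for w in truth_words]
    hyp_norm = [normalize(w) for w in hypothesis.split()]

    matcher = difflib.SequenceMatcher(None, truth_norm, hyp_norm, autojunk=False)
    result = list(truth_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' and 'delete' cover ground-truth words that were
        # misheard or missed; i1:i2 indexes the ground-truth side.
        if tag in ("replace", "delete"):
            for i in range(i1, i2):
                result[i] = truth_words[i].upper()
    return " ".join(result)


print(highlight_errors(
    "The quick brown fox jumps over the lazy dog.",
    "the quick brown box jumps over the dog",
))
```

Since the comparison runs on normalized copies while the output is built from the original tokens, capitalization and punctuation of the ground truth are preserved.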
I have a list of words in a txt file, each on its own line with its definition next to it. However, the definition sometimes includes an example sentence that uses the word. I want to replace the word repeated in that example with the symbol ~. How could I do this with Python?
Ok, here is my example of replacing every instance of a word in a sentence with another character...
>>> my_string = "the quick brown fox jumped over the lazy dog"
>>> search_word = "the"
>>> replacement_symbol = "~"
>>> my_string.replace(search_word, replacement_symbol)
'~ quick brown fox jumped over ~ lazy dog'
Obviously this doesn't cover loading the file, reading it line by line and skipping the first instance of the word... Let's extend it a little.
words.txt
fox the quick brown fox jumped over the lazy dog
the the quick brown fox jumped over the lazy dog
jumped the quick brown fox jumped over the lazy dog
And to read this, strip the first word and then replace that word in the rest of the line...
with open('words.txt') as f:
    for line in f.readlines():
        line = line.strip()
        search_term = line.split(' ')[0]
        sentence = ' '.join(line.split(' ')[1:])
        sentence = sentence.replace(search_term, '~')
        line = '%s %s' % (search_term, sentence)
        print(line)
and the output...
fox the quick brown ~ jumped over the lazy dog
the ~ quick brown fox jumped over ~ lazy dog
jumped the quick brown fox ~ over the lazy dog
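One caveat with str.replace is that it also replaces the search term inside longer words ("the" inside "other", for example). A possible variant, sketched under the assumption that only whole words should be masked, uses re.sub with word boundaries. The function name mask_word is hypothetical.

```python
import re


def mask_word(line, symbol="~"):
    # The first token is the headword, the rest is the definition/example.
    word, _, sentence = line.strip().partition(" ")
    # \b restricts the match to whole words; re.escape guards against
    # headwords that contain regex metacharacters.
    pattern = r"\b" + re.escape(word) + r"\b"
    return f"{word} {re.sub(pattern, symbol, sentence)}"


print(mask_word("the the other fox bathes near the lazy dog"))
```

With plain str.replace the same line would come out as "the ~ o~r fox ba~s near ~ lazy dog".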
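Assuming the word and definition are separated by #:
with open('file.txt', 'r') as f:
    for line in f:
        # Split on the first '#' only, so the definition may itself contain '#'
        myword, mydefinition = line.rstrip('\n').split('#', 1)
        # str.replace returns a new string, so the result must be assigned
        mydefinition = mydefinition.replace(myword, '~')
        print(myword + '#' + mydefinition)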
How would I collapse runs of multiple spaces into a single space in python3? I want to go from
The quick brown fox jumps over the lazy dog.
to...
The quick brown fox jumps over the lazy dog.
Where the above quote is a string.
You don't need a regex here. You can achieve what you want like this:
a = "The quick brown fox jumps over the lazy dog."
final = " ".join(a.split())
print(final)
Output:
The quick brown fox jumps over the lazy dog.
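A small sketch that makes the behaviour visible. The sample string is an assumption, since the original spacing is not recoverable here. It also contrasts the split/join approach with a regex that touches only runs of spaces, because the two differ on tabs and on leading or trailing whitespace.

```python
import re

# Hypothetical input with repeated spaces, a tab, and padding at both ends.
text = "  The   quick brown    fox\tjumps over  the lazy dog.  "

# split() with no arguments splits on any whitespace run and drops
# leading/trailing whitespace, so joining with one space normalizes everything.
collapsed = " ".join(text.split())

# The regex variant only shrinks runs of two or more spaces; tabs and
# the (now single) leading/trailing spaces are left in place.
spaces_only = re.sub(r" {2,}", " ", text)

print(repr(collapsed))
print(repr(spaces_only))
```

Which one is appropriate depends on whether tabs, newlines and surrounding whitespace should be preserved.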
I want Python to remove only some punctuation from a string, let's say I want to remove all the punctuation except '#'
import string
remove = dict.fromkeys(map(ord, '\n ' + string.punctuation))
sample = 'The quick brown fox, like, totally jumped, #man!'
sample.translate(remove)
Here the output is
The quick brown fox like totally jumped man
But what I want is something like this
The quick brown fox like totally jumped #man
Is there a way to selectively remove punctuation from a text leaving out the punctuation that we want in the text intact?
string.punctuation contains all the punctuation characters. Remove # from it, then replace every remaining punctuation character with ''. re.escape keeps characters such as ], ^ and \ from being misread inside the character class.
>>> import re
>>> import string
>>> a = string.punctuation.replace('#','')
>>> re.sub(r'[{}]'.format(re.escape(a)),'','The quick brown fox, like, totally jumped, #man!')
'The quick brown fox like totally jumped #man'
Just remove the character you don't want to touch from the replacement string:
import string
remove = dict.fromkeys(map(ord, '\n' + string.punctuation.replace('#','')))
sample = 'The quick brown fox, like, totally jumped, #man!'
sample.translate(remove)
Also note that I changed '\n ' to '\n', as the former will remove spaces from your string.
Result:
The quick brown fox like totally jumped #man
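The same idea can be wrapped in a small helper that takes the characters to keep as a parameter. This is a sketch only: strip_punctuation and its keep argument are hypothetical names, and str.maketrans with three arguments is used to build the deletion table.

```python
import string


def strip_punctuation(text, keep=""):
    # Every ASCII punctuation character except the ones listed in keep.
    to_remove = "".join(ch for ch in string.punctuation if ch not in keep)
    # The third argument of str.maketrans lists characters to delete.
    return text.translate(str.maketrans("", "", to_remove))


sample = "The quick brown fox, like, totally jumped, #man!"
print(strip_punctuation(sample, keep="#"))
```

Passing keep="#!" would preserve the exclamation mark as well.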