I'm working on code that takes a user-supplied business name and prints out the reviews for it. When I run my final loop, I tell Python to right justify the reviews by four spaces, yet nothing happens. I've tried multiple solutions and am honestly at a loss.
(Problem area is the very last line)
import json
import textwrap
import sys
f = open('reviews.json')
f1= open('businesses.json')
line1= f1.readline()
business= json.loads(line1)
line = f.readline()
review = json.loads(line)
idlist=[]
reviewlist=[]
bizname= raw_input('Enter a business name => ')
print bizname
for line in f1:
    business= json.loads(line)
    if bizname == business['name']:
        idlist.append(business['business_id'])
if len(idlist)==0:
    print 'This business is not found'
    sys.exit()
for line in f:
    review = json.loads(line)
    for item in idlist:
        if item == review['business_id']:
            reviewlist.append(review['text'])
if len(reviewlist)==0:
    print 'No reviews for this business are found'
    sys.exit()
for i in range(len(reviewlist)):
    w = textwrap.TextWrapper(replace_whitespace=False)
    print 'Review',str(i+1)+':'
    print w.fill(reviewlist[i] , ).rjust(4,' ')
I suggest you check the output of print w.fill(reviewlist[i] , ).
Its length is probably greater than 4, in which case rjust(4, ' ') returns the string unchanged and looks like it is not working, e.g. 'abcdef'.rjust(4, ' ')
>>> 'abcdef'.rjust(4, ' ')
'abcdef'
>>> 'abcdef'.rjust(20, ' ')
'              abcdef'
https://docs.python.org/2/library/string.html#string.rjust
"Right justify by 4 spaces" doesn't make sense, so it's unclear what you really want. The first argument to .rjust() is the total width of the field, and if the string is already at least that long nothing at all is done. Some examples:
>>> "abcde".rjust(4, " ") # nothing done: 5 > 4
'abcde'
>>> "abcd".rjust(4, " ") # nothing done: 4 == 4
'abcd'
>>> "abc".rjust(4, " ") # extends to 4 with 1 blank on left
' abc'
>>> "ab".rjust(4, " ") # extends to 4 with 2 blanks on left
'  ab'
>>> "a".rjust(4, " ") # extends to 4 with 3 blanks on left
'   a'
>>> "".rjust(4, " ") # extends to 4 with 4 blanks
'    '
Assuming that you actually want to indent the text, you can do it with the TextWrapper object:
indent = ' ' * 4
w = textwrap.TextWrapper(replace_whitespace=False, initial_indent=indent, subsequent_indent=indent)
Demo
>>> indent = ' ' * 4
>>> w = textwrap.TextWrapper(width=20, replace_whitespace=False, initial_indent=indent, subsequent_indent=indent)
>>> print(w.fill('A longish paragraph to demonstrate indentation with TextWrapper objects.'))
    A longish
    paragraph to
    demonstrate
    indentation with
    TextWrapper
    objects.
Note that the indent is included in the line width, so you might want to adjust the width accordingly:
>>> w = textwrap.TextWrapper(width=20+len(indent), replace_whitespace=False, initial_indent=indent, subsequent_indent=indent)
>>> print(w.fill('A longish paragraph to demonstrate indentation with TextWrapper objects.'))
    A longish paragraph
    to demonstrate
    indentation with
    TextWrapper objects.
Most likely it doesn't work because fill() returns a single string that is much longer than 4 characters. Example:
'hello'.rjust(3, '*')
output:
'hello'
While, if you do:
'hello'.rjust(10, '*')
Output:
'*****hello'
So, if I understand what you are trying to do, you may need to split the wrapped string and then apply the right justification to each string in the list, while you print it:
wrapped = w.fill(reviewlist[i], )
for line in wrapped.split('\n'):
    print line.rjust(4, ' ')
Although I am not sure that justifying on a width of only four characters is really what you need.
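If right justification really is the goal, the field width has to be wider than the wrapped lines. A minimal sketch of that idea, written with Python 3 syntax; the function name right_justify and its two width parameters are made up for illustration and are not part of the original code:

```python
import textwrap

def right_justify(text, wrap_width=20, field_width=24):
    """Wrap text to wrap_width, then right-justify every line to field_width."""
    # textwrap.wrap returns a list of lines without trailing newlines
    lines = textwrap.wrap(text, width=wrap_width)
    # rjust only pads lines that are shorter than field_width
    return '\n'.join(line.rjust(field_width) for line in lines)

print(right_justify('A longish paragraph to demonstrate right justification.'))
```

Because field_width is larger than wrap_width, every line gets at least some padding on the left, and shorter lines get more.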
There are a couple of problems you're facing here:
.rjust(4,' ') says you want the result to be 4 characters wide, not that you want to indent the line by 4 spaces.
.rjust() just looks at the length of the string, and after you've run it through textwrap it has a bunch of newlines in it that make the length of the string different than the width it prints out to.
You don't want to right justify, really, you want to indent.
The solution given above about indents is correct.
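Putting the indent approach into the shape of the review loop might look roughly like this. It is only a sketch in Python 3 syntax: format_reviews and its parameters are hypothetical names, and the reviews are passed in as a plain list instead of being read from the JSON files:

```python
import textwrap

def format_reviews(reviews, indent_width=4, width=70):
    """Return numbered reviews with every wrapped line indented by indent_width spaces."""
    indent = ' ' * indent_width
    wrapper = textwrap.TextWrapper(
        width=width + indent_width,      # the indent counts towards the line width
        replace_whitespace=False,
        initial_indent=indent,
        subsequent_indent=indent,
    )
    blocks = []
    for number, text in enumerate(reviews, start=1):
        blocks.append('Review %d:' % number)
        blocks.append(wrapper.fill(text))
    return '\n'.join(blocks)

print(format_reviews(['Great food and friendly staff.', 'Slow service.']))
```

Creating the TextWrapper once, outside the loop, is enough because its settings do not change between reviews.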
Formatting text through fixed space fonts is very old school, but also very fragile. Perhaps you could think about generating HTML output in a subsequent revision of your application. HTML tables work well for this and are appropriate for tabular data. Alternatively, consider doing a CSV file, and then you can import the result into Excel.
import textwrap
import re
# A sample input string.
inputStr = 'This is a long string which I want to right justify by 70 chars with four spaces on left'
# Wrap by 70 (default) chars. This would result in a multi-line string
w = textwrap.fill(inputStr, )
# Using RegEx read the lines and right justify for 75 chars.
m = re.sub(r"^(\w+.*)$", lambda g : g.group(0).rjust(75), w, flags = re.MULTILINE)
# Print the result
print(m)
Related
I'm very new to Python and am working on some code to manipulate equations. So far, my code asks the user to input an equation. I want my code to be able to take something like "X + 1 = 2" and turn it into "x+1=2" so that the rest of my code will work, no matter the format of the entered equation.
To convert to lower case and strip out any spaces use lower and replace.
the_input = 'X + 1 = 2'
the_output = the_input.replace(' ', '').lower()
# x+1=2
A simple string .replace(old, new) followed by a .lower() will be sufficient.
"X + 1 = 2".replace(" ", "").lower() # 'x+1=2'
For a more thorough replacement of all whitespace characters, not just spaces, use Python's re module:
import re
re.sub(r'\s+', '', "X + 1 = 2").lower() # 'x+1=2'
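The same normalisation can also be done without re, since str.split() with no argument splits on any run of whitespace. A small sketch; the helper name normalize_equation is invented here for illustration:

```python
def normalize_equation(text):
    """Drop all whitespace (spaces, tabs, newlines) and lower-case the result."""
    # split() with no argument splits on any whitespace and discards empty pieces
    return ''.join(text.split()).lower()

print(normalize_equation('X + 1 = 2'))
```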
I am writing a code that needs to get four individual values, and one of the values has the newline character in addition to an extra apostrophe and bracket like so: 11\n']. I only need the 11 and have been able to strip the '], but I am unable to remove the newline character.
I have tried various different set ups of strip and replace, and both strip and replace are not removing the part.
with open('gil200110raw.txt', 'r') as qcfile:
    txt = qcfile.readlines()
line1 = txt[1:2]
line2 = txt[2:3]
line1 = str(line1)
line2 = str(line2)
sptline1 = line1.split(' ')
sptline2 = line2.split(' ')
totalobs = sptline1[39]
qccalc1 = sptline2[2]
qccalc2 = sptline2[9]
qccalc3 = sptline2[16]
qccalc4 = sptline2[22]
qccalc4 = qccalc4.strip("\n']")
qccalc4 = qccalc4.replace("\n", "")
I did not get an error, but the output of print(qccalc4) is 11\n. I expect the output to be 11.
Use rstrip instead!
>>> 'test string\n'.rstrip()
'test string'
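One likely reason that strip and replace appear to do nothing in the question's code: calling str() on a list produces the list's printable representation, in which a newline shows up as the two characters backslash and n, not as a real newline. The sketch below reproduces that under the assumption that the data looks like the made-up sample list, and shows that indexing the line itself avoids the issue:

```python
# hypothetical stand-in for the result of qcfile.readlines()
lines = ['header\n', 'a b 11\n']

# str() of a list slice gives its repr: the newline becomes backslash + 'n'
as_text = str(lines[1:2])
last_field = as_text.split(' ')[-1]
stripped = last_field.strip("\n']")      # the quote and bracket go, backslash + 'n' stay
print(stripped)

# indexing the line directly keeps a real newline, which split() discards
fields = lines[1].split()
print(fields[-1])
```

With the string taken directly from the list, split() with no argument already drops the trailing newline, so no extra strip is needed.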
You can use regex to match the outputs you're looking for.
From your description, I assume it is all integers, consider the following snippet
import re
p = re.compile('[0-9]+')
sample = '11\n\'] dwqed 12 444'
results = p.findall(sample)
results would now contain the list ['11', '12', '444'].
re is Python's regex package and p is the pattern we want to find in the text; the pattern [0-9]+ simply means "match one or more characters from 0 to 9".
You can find the documentation here
I'm working on an exercism.io exercise in Python where one of the tests requires that I convert an SGF value with escape characters into one without. I don't know why they leave newline characters intact, however.
input_val = "\\]b\nc\nd\t\te \n\\]"
output_val = "]b\nc\nd  e \n]"
I tried some codecs and ast functions to no avail. Any suggestions? Thanks in advance.
The purpose of your exercise is unclear, but the solution is trivial:
input_val.replace("\\", "").replace("\t", " ")
You can use this code:
def no_escapes(text): # get text argument
    # split on backslashes and join the pieces back together without them
    text = ''.join(text.split('\\'))
    # split on tabs and join the pieces with a space in place of each tab
    return ' '.join(text.split('\t'))
It will first turn "\\]b\nc\nd\t\te \n\\]" into ['', ']b\nc\nd\t\te \n', ']'] and join that into "]b\nc\nd\t\te \n]". It'll then split on tabs to get [']b\nc\nd', '', 'e \n]'] and join those pieces with a space between them, so you'll end up with "]b\nc\nd  e \n]"
Example:
>>> print(no_escapes('\\yeet\nlol\\'))
yeet
lol
And if you want it raw:
>>> string = no_escapes('\\yeet\nlol\\')
>>> print(f'{string!r}')
'yeet\nlol'
After looking at the SGF text value rules here, which say 'all whitespaces except line breaks become spaces,' I came up with this solution. Oddly, they don't say that '\\' characters should be erased. Not sure if there's a cleaner way to do this?
import re
s = '\\]b\nc\nd\t\te \n\\]'
r = re.sub(r'[^\S\n]', ' ', s).replace('\\', '')
print(repr(r))
# ']b\nc\nd  e \n]'
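A character-by-character version makes the escaping explicit. This is only a sketch under my reading of the SGF text rules, which I am assuming to be: a backslash makes the next character literal, a backslash followed by a newline removes both, and any whitespace other than a newline becomes a space. The function name unescape_sgf_text is made up:

```python
def unescape_sgf_text(value):
    """Resolve backslash escapes and normalise whitespace (assumed SGF text rules)."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == '\\':
            # take the escaped character; '' if the backslash is the last character
            ch = next(chars, '')
            if ch == '\n':
                continue          # assumed rule: backslash + newline is dropped
        if ch.isspace() and ch != '\n':
            ch = ' '              # tabs and other whitespace become a space
        out.append(ch)
    return ''.join(out)

print(repr(unescape_sgf_text('\\]b\nc\nd\t\te \n\\]')))
```

Unlike a blanket replace of every backslash, this keeps a backslash that is itself escaped (two backslashes in a row yield one).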
I’m writing a program that has to replace the string “+” by “!”, and strings “*+” by “!!” in a particular text. As an example, I need to go from:
some_text = 'here is +some*+ text and also +some more*+ text here'
to
some_text_new = 'here is !some!! text and also !some more!! text here'
You'll notice that "+" and "*+" enclose particular words in my text. After I run the program, those words need to be enclosed between "!" and "!!" instead.
I wrote the following code but it iterates several times before giving the right output. How can I avoid that iteration?
def many_cues(value):
    if has_cue_marks(value) is True:
        add_text(value)
        #print value
def has_cue_marks(value):
    return '+' in value and '+*' in value
def add_text(value):
    n = '+'
    m = "+*"
    text0 = value
    for n in text0:
        text1 = text0.replace(n, '!', 3)
        print text1
    for m in text0:
        text2 = text0.replace(m, '!!', 3)
        print text2
>>> x = 'here is +some*+ text and also +some more*+ text here'
>>> x = x.replace('*+','!!')
>>> x
'here is +some!! text and also +some more!! text here'
>>> x = x.replace('+','!')
>>> x
'here is !some!! text and also !some more!! text here'
The final argument to replace is optional - if you leave it out, it will replace all instances of the word. So, just use replace on the larger substring first so you don't accidentally take out some of the smaller, then use replace on the smaller, and you should be all set.
It can be done using regex groups
import re
def replacer(matchObj):
    if matchObj.group(1) == '*+':
        return '!!'
    elif matchObj.group(2) == '+':
        return '!'
text = 'here is +some*+ text and also +some more*+ text here'
replaced = re.sub(r'(\*\+)|(\+)', replacer, text)
Notice that the order of the groups is important, since the two patterns you want to replace share a character
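The ordering concern can be handled mechanically by sorting the markers longest first before building the pattern. A sketch of that idea in Python 3; replace_markers and the mapping argument are illustrative names, not something from the original code:

```python
import re

def replace_markers(text, mapping):
    """Replace every key of mapping found in text with its value in a single pass."""
    # longest keys first, so '*+' is tried before the shorter '+'
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: mapping[match.group(0)], text)

text = 'here is +some*+ text and also +some more*+ text here'
print(replace_markers(text, {'+': '!', '*+': '!!'}))
```

re.escape is needed here because both + and * have a special meaning in regular expressions.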
Basically, I'm asking the user to input a string of text into the console, but the string is very long and includes many line breaks. How would I take the user's string and delete all line breaks to make it a single line of text. My method for acquiring the string is very simple.
string = raw_input("Please enter string: ")
Is there a different way I should be grabbing the string from the user? I'm running Python 2.7.4 on a Mac.
P.S. Clearly I'm a noob, so even if a solution isn't the most efficient, the one that uses the most simple syntax would be appreciated.
How do you enter line breaks with raw_input? But, once you have a string with some characters in it you want to get rid of, just replace them.
>>> mystr = raw_input('please enter string: ')
please enter string: hello world, how do i enter line breaks?
>>> # pressing enter didn't work...
...
>>> mystr
'hello world, how do i enter line breaks?'
>>> mystr.replace(' ', '')
'helloworld,howdoienterlinebreaks?'
>>>
In the example above, I replaced all spaces. The string '\n' represents newlines. And '\r' represents carriage returns (if you're on Windows, you might be getting these, and a second replace will handle them for you!).
Basically:
# you probably want to use a space ' ' to replace `\n`
mystring = mystring.replace('\n', ' ').replace('\r', '')
Note also, that it is a bad idea to call your variable string, as this shadows the module string. Another name I'd avoid but would love to use sometimes: file. For the same reason.
You can try using string replace:
string = string.replace('\r', '').replace('\n', '')
You can split the string with no separator arg, which will treat consecutive whitespace as a single separator (including newlines and tabs). Then join using a space:
In : " ".join("\n\nsome text \r\n with multiple whitespace".split())
Out: 'some text with multiple whitespace'
https://docs.python.org/2/library/stdtypes.html#str.split
The canonical answer, in Python, would be:
s = ''.join(s.splitlines())
It splits the string into lines (letting Python do it according to its own rules for line boundaries), then joins them back together. Two possibilities here:
replace each newline with a space (' '.join())
or with nothing (''.join())
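A short illustration of the two join variants, using a made-up sample string that mixes \n and \r\n line endings:

```python
s = 'first line\nsecond line\r\nthird line'

# splitlines() recognises \n, \r\n and \r and drops them from the pieces
joined_with_space = ' '.join(s.splitlines())
joined_without_space = ''.join(s.splitlines())

print(joined_with_space)
print(joined_without_space)
```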
updated based on Xbello comment:
string = my_string.rstrip('\r\n')
read more here
Another option is regex:
>>> import re
>>> re.sub("\n|\r", "", "Foo\n\rbar\n\rbaz\n\r")
'Foobarbaz'
If your string contains the literal two-character sequence \n (a backslash followed by n) rather than an actual newline character, replace needs r'\n' instead of '\n':
mystring = mystring.replace(r'\n', ' ').replace(r'\r', '')
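Whether '\n' or r'\n' is the right argument depends on what the string actually contains. The sketch below uses two made-up strings to show the difference:

```python
real_newline = 'line one\nline two'     # contains one newline character
literal_text = r'line one\nline two'    # contains a backslash followed by the letter n

print(real_newline.replace('\n', ' '))      # the newline is replaced
print(literal_text.replace(r'\n', ' '))     # the backslash + n pair is replaced
print(literal_text.replace('\n', ' '))      # nothing to replace, string unchanged
```

If replace('\n', ' ') seems to do nothing, printing repr() of the string is a quick way to check which of the two cases applies.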
A method that takes into account:
additional whitespace characters at the beginning/end of the string
additional whitespace characters at the beginning/end of every line
various end-of-line characters
It takes a multi-line string which may be messy, e.g.
test_str = '\nhej ho \n aaa\r\n a\n '
and produces a clean one-line string
>>> ' '.join([line.strip() for line in test_str.strip().splitlines()])
'hej ho aaa a'
UPDATE:
To keep consecutive newline characters from producing redundant spaces:
' '.join([line.strip() for line in test_str.strip().splitlines() if line.strip()])
This works for the following too
test_str = '\nhej ho \n aaa\r\n\n\n\n\n a\n '
Regular expressions are another way to do this:
import re
s='''some kind of
string with a bunch\r of
extra spaces in it'''
re.sub(r'\s(?=\s)','',re.sub(r'\s',' ',s))
result:
'some kind of string with a bunch of extra spaces in it'
The problem with rstrip() is that it only removes characters from the end of the string, so newlines in the middle are left in place. Instead you can use
text = text.replace("\n"," ")
This replaces every newline '\n' with a space.
You really don't need to remove ALL the line-break signs: LF, CR, CRLF.
# Pythonic:
r'\n', r'\r', r'\r\n'
Some texts must keep their breaks, but you probably need to join broken lines to keep individual sentences together.
It is therefore natural for a line break to occur after a period, semicolon, or colon, but not after a comma.
My code takes the above conditions into account. It works well with text copied from PDFs.
Enjoy!:
import re

def unbreak_pdf_text(raw_text):
    """ the newline careful sign removal tool
    Args:
        raw_text (str): string containing unwanted newline signs: \\n or \\r or \\r\\n
            e.g. imported from OCR or copied from a pdf document.
    Returns:
        str: the text with line breaks after a comma, space or word character replaced by a space
    """
    # try \r\n first so a Windows line break is matched as a whole
    pat = re.compile(r"[, \w](?:\r\n|\r|\n)")
    breaks = re.finditer(pat, raw_text)
    processed_text = raw_text
    raw_text = None
    for i in breaks:
        processed_text = processed_text.replace(i.group(), i.group()[0]+" ")
    return processed_text
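The same rule (join a line break that follows a comma, a space or a word character, keep it after sentence-ending punctuation) can also be written as a single re.sub call. This is a variation on the idea above, not the author's code; join_broken_lines is a hypothetical name:

```python
import re

def join_broken_lines(raw_text):
    """Replace a line break with a space when it follows a comma, space or word character."""
    # group 1 keeps the character before the break; \r\n is tried before \r and \n
    return re.sub(r'([, \w])(?:\r\n|\r|\n)', r'\1 ', raw_text)

sample = 'This is a\nbroken sentence.\nNew one,\r\nstill going'
print(join_broken_lines(sample))
```

The break after "sentence." is kept because a period is not in the character class, so paragraph structure survives while wrapped lines are merged.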