Remove return key (\n) from read file - python

I have a jumble game that imports a random word from a text file. However, I think it is importing the return key. Once the word is jumbled I print it to screen and the word is split on two different lines.
How can I ignore the return key? If there is not a simple way to do this please let me know, because I will just settle for a tuple until I further my knowledge.
Thanks in advance.

When you read a line from your file, you almost certainly get something like:
myword\n
... or \r on old Mac systems, or \r\n on Windows. That sequence marks a line break, and you can easily remove it with a built-in Python string method.
To avoid it, call .strip() on the string; this removes the \n along with any other leading or trailing whitespace:
>>> 'myword\n'.strip()
'myword'
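For the jumble game itself, a minimal sketch could look like the following. It assumes the file holds one word per line; the io.StringIO object is an in-memory stand-in for the real file, and pick_word is a hypothetical helper name, not something from the question.

```python
import io
import random

# Stand-in for the word file: one word per line, with mixed line endings.
word_file = io.StringIO("python\njumble\r\nanswer\n")

def pick_word(lines):
    """Return a random non-empty word with its line ending stripped."""
    words = [line.strip() for line in lines if line.strip()]
    return random.choice(words)

word = pick_word(word_file)
# Shuffle the letters; the result can no longer contain a stray newline.
jumbled = "".join(random.sample(word, len(word)))
print(jumbled)
```

If the words could legitimately start or end with spaces, line.rstrip('\r\n') would remove only the line ending.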

Related

How to automatically change a particular word while writing to a file in python?

Say a method returns a long list of lines that I am writing to a file. Is there any way to change the word "Bread" to "Breakfast" on the fly, assuming "Bread" appears in several places in the file being generated?
Thanks.
I have assigned sys.stdout to a file object, so all my console output goes to the file. An on-the-fly hack would be great.
You could use regular expressions.
import re
word = 'Bread'
rword = 'Breakfast'
line = 'This is a piece of Bread'
line = re.sub(r'\b{0}\b'.format(re.escape(word)), rword, line)
# 'This is a piece of Breakfast'
The advantage of using regular expressions is that they can detect word boundaries (i.e. the \b). This prevents the substitution from touching longer words that merely contain your word (e.g. Breadth).
You could do this line by line, or replace the word in the whole document at once.
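As a rough illustration of both options, the sketch below applies the same compiled pattern to a whole document and to its individual lines. The sample text is made up for the example.

```python
import re

pattern = re.compile(r'\bBread\b')
document = "Bread and butter\nBreadth of Bread\n"

# Option 1: replace across the whole document in one call.
whole = pattern.sub('Breakfast', document)

# Option 2: replace line by line, keeping each line's ending.
by_line = [pattern.sub('Breakfast', line)
           for line in document.splitlines(keepends=True)]

print(whole)
```

Both approaches give the same text here, and "Breadth" is left alone because of the word boundaries.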
Assuming that it is the list that you want to be changed rather than the file, and assuming that the list is called lines:
lines = [line.replace("Bread", "Breakfast") for line in lines]
You can use the replace string method, like:
text.replace('Bread', 'Breakfast')
Note that this doesn't check if it is a 'word', so it would also change 'Bready' to 'Breakfasty'.
Use str.replace('bread', 'breakfast') on your string, where 'bread' is replaced by 'breakfast'.
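Since the question mentions that sys.stdout has been reassigned to a file object, one possible on-the-fly approach is to wrap that file object so every write is rewritten first. This is only a sketch: ReplacingWriter is a hypothetical class name, and the io.StringIO target stands in for the real output file.

```python
import io
import sys

class ReplacingWriter:
    """Hypothetical wrapper that rewrites text before passing it on."""

    def __init__(self, stream, old, new):
        self.stream = stream
        self.old = old
        self.new = new

    def write(self, text):
        # Replace within this chunk only, then forward to the real stream.
        return self.stream.write(text.replace(self.old, self.new))

    def flush(self):
        self.stream.flush()

target = io.StringIO()  # stand-in for the output file
original_stdout = sys.stdout
sys.stdout = ReplacingWriter(target, "Bread", "Breakfast")
try:
    print("Bread is served")
finally:
    sys.stdout = original_stdout  # always restore the real stdout

result = target.getvalue()
```

One limitation of this design: a word that happens to be split across two separate write calls would not be replaced.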

Trying to replace \t with \s with regex in Python, but as a result "Unhashable type:list" error

I am new to programming and have already checked other people's questions to make sure that I am using a good method to replace tabs with spaces, know my regex is correct, and also understand what exactly my error is ("Unhashable type 'list'). But even still, I'm at a loss of what to do. Any help would be great!
I have a large file that I have broken up into lines. Ultimately I will need to access the first 3 elements of each line. Currently when I print a line, without the additional re.sub line of code, I get something like this: ['blah\tblah\tblah'], when I want ['blah blah blah'].
My code to do this is
f = open('text.txt')
raw = f.read()
raw = raw.lower()
lines = raw.splitlines()
lines = re.sub(r'\t', lines, '\s')
print lines[0:2] #just to see the first few examples
f.close()
When I print the first few lines without the regex sub bit, it works fine. And then when I add that line in attempt to change the lines, I get the error. I understand that lists are changeable and thus can't be a hashed... but I'm not trying to work with a hash. I'm just trying to replace \t with \s in a large text file to make the program easier to work with. I don't think there is a problem with how I am changing \t's to \s's, because according to this error, any way I change it will break my code. What do I do?! Any help is super appreciated. :')
You need to change the order of the arguments passed to re.sub. Also note that you can't use \s as the replacement argument: it only has meaning inside a pattern, so use a literal space instead. The call must have the form re.sub(pattern, replacement, string).
lines = raw.splitlines()
lines = [re.sub(r'\t', ' ', line) for line in lines]
raw.splitlines() returns a list, which is then assigned to the variable lines. re.sub works on a single string, not on a list, so you need to apply it to each item in the list.
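A self-contained sketch of the fix, using made-up sample text in place of the file. It also shows that splitting on the tab directly gives the first three elements of each line, which the question says is the eventual goal.

```python
import re

# Made-up sample standing in for the file contents.
raw = "Alpha\tBeta\tGamma\tDelta\nOne\tTwo\tThree\tFour\n".lower()
lines = raw.splitlines()

# Apply re.sub to each line; a literal space is the replacement.
spaced = [re.sub(r'\t', ' ', line) for line in lines]

# For a literal tab, plain str.replace gives the same result without regex.
spaced_plain = [line.replace('\t', ' ') for line in lines]

# Splitting on the tab yields the first three fields of each line.
first_three = [line.split('\t')[:3] for line in lines]

print(spaced[0:2])
```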

Python: Regex a dictionary using user input wildcards

I would like to be able to search a dictionary in Python using user input wildcards.
I have found this:
import fnmatch
lst = ['this','is','just','a','test', 'thing']
filtered = fnmatch.filter(lst, 'th*')
This matches this and thing. Now if I try to input a whole file and search through
with open('testfilefolder/wssnt10.txt') as f:
    file_contents = f.read().lower()
filtered = fnmatch.filter(file_contents, 'th*')
this doesn't match anything. The difference is that the file I am reading from is a text file (a Shakespeare play), so I have spaces and it is not a list. I can match things such as a single letter, so if I just have 't' then I get a bunch of t's. This tells me that I am matching single letters. However, I want to match whole words and, beyond that, preserve the wildcard structure.
Since what I would like to happen is that a user enters in text (including what will be a wildcard) that I can substitute it in to the place that 'th*' is. The wild card would do what it should still. That leads to the question, can I just stick in a variable holding the search text in for 'th*'? After some investigation I am wondering if I am somehow supposed to translate the 'th*' for example and have found something such as:
regex = fnmatch.translate('th*')
print(regex)
which outputs th.*\Z(?ms)
Is this the right way to go about doing this? I don't know if it is needed.
What would be the best way in going about "passing in regex formulas" as well as perhaps an idea of what I have wrong in the code as it is not operating on the string of incoming text in the second set of code as it does (correctly) in the first.
If the problem is just that you "have spaces and it is not a list," why not make it into a list?
with open('testfilefolder/wssnt10.txt') as f:
    file_contents = f.read().lower().split()  # split on any whitespace to make a list of words
filtered = fnmatch.filter(file_contents, 'th*')
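On the question of substituting a variable for 'th*': fnmatch.filter accepts any string as the pattern, so a variable works directly. The sketch below uses a short made-up text instead of the play, and user_pattern is a hypothetical name for what would come from input(). It also shows fnmatch.translate, which is only needed if you want a compiled regular expression.

```python
import fnmatch
import re

text = "This is just a test. The thing that thou seest."
user_pattern = "th*"  # would come from input() in a real program

# Break the text into lowercase words, dropping punctuation.
words = re.findall(r"[a-z']+", text.lower())

# The pattern can be any string variable.
filtered = fnmatch.filter(words, user_pattern)

# Equivalent route: translate the wildcard to a regex and match each word.
regex = re.compile(fnmatch.translate(user_pattern))
via_regex = [w for w in words if regex.match(w)]

print(filtered)
```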

python print string on multiple lines

I have a function that can only accept strings. (it creates the image with the string, but the string has little formatting and no word wrapping, so a long string will just bleed right through the edge of the image and keep going into the abyss, when in reality I would have liked it to create a paragraph, instead of a one line infinity).
I need it to print with line breaks. Currently the file is being read in using
inputFiles.readlines()
so that this reads the entire file. Storing the result of inputFiles.readlines() creates a list, so it cannot be passed to my function, which expects a string.
I used
inputFileContent = ' \n'.join(inputFiles.readlines())
in an attempt to force hard line breaks into the string between each list item. This does not work: the inputFileContent string does not have line breaks even though I put '\n' between the list elements. From my understanding, the readlines() function puts the individual lines into individual elements of a list.
Any suggestions? Thank you.
Use inputFiles.read() which creates a string. Does that help?
The 'join' should have worked. Your problem may be that the writing of the string ignores newline characters. You could maybe try '\r\n'.join(...)
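A small check of what each call returns may help here. The sketch uses io.StringIO as a stand-in for the real file; note that readlines() keeps the trailing '\n' on each element, so joining with an extra '\n' doubles the line breaks rather than adding missing ones.

```python
import io

# Stand-in for the input file.
input_file = io.StringIO("first line\nsecond line\nthird line\n")

lines = input_file.readlines()  # each element still ends with '\n'

input_file.seek(0)
content = input_file.read()     # one string, line breaks included

joined = "".join(lines)         # same as content
doubled = "\n".join(lines)      # extra blank line between the lines

print(repr(content))
```

If the string already contains '\n' at this point, the line breaks are being lost later, in whatever renders the string.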

Suppress linebreak on file.write

When writing to a text file, some of the file.write instances are followed by a linebreak in the output file and others aren't. I don't want linebreaks except where I tell them to occur. Code:
for doc,wc in wordcounts.items():
    out.write(doc) #this works fine, no linebreak
    for word in wordlist:
        if word in wc: out.write("\t%d" % wc[word]) #linebreaks appear
        else: out.write("\t0") #after each of these
    out.write("\n") #this line had mixed spaces/tabs
What am I missing?
Update
I should have taken a clue from how the code pasted into SO. For some reason there was a mixture of spaces and tabs in the final line, such that in TextMate it visually appeared outside the "for word..." loop—but the interpreter was treating it as part of that loop. Converting spaces to tabs solved the problem.
Thanks for your input.
file.write() does not add any newlines if the string you write does not contain any \ns.
But you force a newline for each word in your word list using out.write("\n"), is that what you want?
for doc,wc in wordcounts.items():
    out.write(doc) #this works fine, no linebreak
    for word in wordlist:
        if word in wc: out.write("\t%d" % wc[word]) #linebreaks appear
        else: out.write("\t0") #after each of these
        out.write("\n") #<--- NEWLINE ON EACH ITERATION!
Perhaps you indented out.write("\n") too far???
You write a line break after every word:
for word in wordlist:
    ...
    out.write("\n")
Are these the line breaks you are seeing, or are there more additional ones?
You might need to perform a strip() on each wc[word]. Printing a single item from wc would probably be enough to determine whether those items already contain line breaks that are causing this behavior.
Either that or the indentation on your final out.write("\n") is not doing what you intended it to do.
I think your indentation is wrong.
(Also, I took the liberty of making your if clause redundant and the code more readable :)
for doc,wc in wordcounts.items():
    out.write(doc)
    for word in wordlist:
        out.write("\t%d" % wc.get(word,0))
    out.write("\n")
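To see the effect of the indentation, here is a runnable sketch of the same loop. The sample wordcounts and wordlist are made up, and io.StringIO stands in for the output file. With out.write("\n") at the level of the outer loop body, each document gets exactly one line.

```python
import io

# Made-up sample data for illustration.
wordcounts = {"doc1": {"apple": 2, "pear": 1}, "doc2": {"pear": 4}}
wordlist = ["apple", "pear", "plum"]

out = io.StringIO()  # stand-in for the output file
for doc, wc in wordcounts.items():
    out.write(doc)
    for word in wordlist:
        # dict.get supplies 0 when the word is missing from this document.
        out.write("\t%d" % wc.get(word, 0))
    out.write("\n")  # one newline per document, outside the inner loop

table = out.getvalue()
print(table)
```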
