When writing to a text file, some of the file.write instances are followed by a linebreak in the output file and others aren't. I don't want linebreaks except where I tell them to occur. Code:
for doc,wc in wordcounts.items():
    out.write(doc) #this works fine, no linebreak
    for word in wordlist:
        if word in wc: out.write("\t%d" % wc[word]) #linebreaks appear
        else: out.write("\t0") #after each of these
    out.write("\n") #this line had mixed spaces/tabs
What am I missing?
Update
I should have taken a clue from how the code pasted into SO. For some reason there was a mixture of spaces and tabs in the final line, such that in TextMate it visually appeared outside the "for word..." loop—but the interpreter was treating it as part of that loop. Converting spaces to tabs solved the problem.
Thanks for your input.
file.write() does not add any newlines if the string you write does not contain any \ns.
But you force a newline for each word in your word list using out.write("\n"), is that what you want?
for doc,wc in wordcounts.items():
    out.write(doc) #this works fine, no linebreak
    for word in wordlist:
        if word in wc: out.write("\t%d" % wc[word]) #linebreaks appear
        else: out.write("\t0") #after each of these
        out.write("\n") #<--- NEWLINE ON EACH ITERATION!
Perhaps you indented out.write("\n") too far???
You write a line break after every word:
for word in wordlist:
    ...
    out.write("\n")
Are these the line breaks you are seeing, or are there additional ones?
You might need to perform a strip() on each wc[word]. Printing a single item from wc would probably be enough to determine whether there are already line breaks on those items that are causing this behavior.
Either that or the indentation on your final out.write("\n") is not doing what you intended it to do.
I think your indentation is wrong.
(Also, I took the liberty of making your if clause redundant and the code more readable :)
for doc,wc in wordcounts.items():
    out.write(doc)
    for word in wordlist:
        out.write("\t%d" % wc.get(word,0))
    out.write("\n")
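To see the effect of the indentation in isolation, here is a small self-contained sketch (Python 3 assumed). The sample wordcounts and wordlist values are made up, and io.StringIO stands in for the real output file, so nothing is written to disk.

```python
import io

# Hypothetical sample data, standing in for the real wordcounts/wordlist.
wordcounts = {"doc1": {"apple": 2, "pear": 1}, "doc2": {"pear": 4}}
wordlist = ["apple", "pear", "plum"]

out = io.StringIO()  # in-memory stand-in for the output file
for doc, wc in wordcounts.items():
    out.write(doc)
    for word in wordlist:
        out.write("\t%d" % wc.get(word, 0))
    out.write("\n")  # one newline per document, outside the inner loop

result = out.getvalue()
print(result)
```

Moving the final out.write("\n") one level deeper would put a newline after every count instead of after every document.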
Related
I started playing with Python and programming in general about 3 weeks ago, so be gentle ;)
What I am trying to do is convert text files the way I want them to be. The text files have the same pattern, but the words I want to replace are unknown. So the program must first find them, set a pattern, and then replace them with the words I want.
For example:
xxxxx
xxxxx
Line3 - word - xxxx xxxx
xxxxx xxxx
word
word
xxxx word
Legend:
xxxxx = template words, present in every file
word = random word, our target
I am able to locate the first appearance of the word because it always appears in the same place in the file; from then on it appears randomly.
My code:
f1 = open('test.txt', 'r')
f2 = open('file2.txt', 'w')
pattern = ''
for line in f1.readlines():
    if line.startswith('Seat 1'):
        line = line.split(' ', 3)
        pattern = line[2]
        line = ' '.join(line)
        f2.write(line)
    elif pattern in line.strip():
        f2.write(line.replace(pattern, 'NewWord'))
    else:
        f2.write(line)
f1.close()
f2.close()
This code doesn't work. What's wrong?
Welcome to the world of Python!
I believe you are on the right track and are very close to the correct solution, however I see a couple of potential issues which may cause your program to not run as expected.
1. If you are trying to see if a string equals another, I would use == instead of is (see this answer for more info).
2. When reading a file, lines end with \n, which means your variable line might never match your word. To fix this you could use strip, which removes leading and trailing whitespace characters (like a space or a newline character):
   elif line.strip() == pattern:
3. This is not really a problem but a recommendation, since you are just starting out: when dealing with files, it is highly recommended to use the with statement that Python provides (see question and/or tutorial).
Update:
I saw that the word might be only part of the line, so instead of using == as recommended in point 1, you could use in, but you need to reverse the order, i.e.
elif pattern in line:
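Putting these suggestions together, one possible sketch (Python 3 assumed) looks like this. The helper name replace_target and the sample lines are hypothetical; the 'Seat 1' marker and the 'NewWord' replacement come from the question. The pattern starts as None instead of '' because an empty string is contained in every line, so '' in line is always true and line.replace('', 'NewWord') would insert the replacement between every character.

```python
def replace_target(lines, marker="Seat 1", new_word="NewWord"):
    """Yield lines, replacing the target word once it has been found."""
    pattern = None  # unknown until the marker line is seen
    for line in lines:
        if line.startswith(marker):
            # As in the question, the target is the third space-separated field.
            pattern = line.split(" ", 3)[2]
            yield line  # the marker line itself is passed through unchanged
        elif pattern is not None and pattern in line:
            yield line.replace(pattern, new_word)
        else:
            yield line

# Hypothetical sample input; a real script would iterate over an open file:
#     with open("test.txt") as src, open("file2.txt", "w") as dst:
#         dst.writelines(replace_target(src))
sample = ["header\n", "Seat 1: bob (100)\n", "bob folds\n", "alice calls\n"]
converted = list(replace_target(sample))
print("".join(converted), end="")
```

The commented-out with block shows how the same generator could be wired to files, which are then closed automatically.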
I am new to programming and have already checked other people's questions to make sure that I am using a good method to replace tabs with spaces, know my regex is correct, and also understand what exactly my error is ("unhashable type: 'list'"). But even so, I'm at a loss as to what to do. Any help would be great!
I have a large file that I have broken up into lines. Ultimately I will need to access the first 3 elements of each line. Currently when I print a line, without the additional re.sub line of code, I get something like this: ['blah\tblah\tblah'], when I want ['blah blah blah'].
My code to do this is
f = open(text.txt)
raw = f.read()
raw = raw.lower()
lines = raw.splitlines()
lines = re.sub(r'\t', lines, '\s')
print lines[0:2] #just to see the first few examples
f.close()
When I print the first few lines without the regex sub bit, it works fine. And then when I add that line in an attempt to change the lines, I get the error. I understand that lists are changeable and thus can't be hashed... but I'm not trying to work with a hash. I'm just trying to replace \t with \s in a large text file to make the program easier to work with. I don't think there is a problem with how I am changing \t's to \s's, because according to this error, any way I change it will break my code. What do I do?! Any help is super appreciated. :')
You need to change the order of the parameters passed to re.sub. The signature is re.sub(pattern, replacement, string); in your call, the list lines was passed as the replacement and '\s' as the string to search. Note also that you can't use the regex class \s as the replacement; use a literal space ' ' instead.
lines = raw.splitlines()
lines = [re.sub(r'\t', ' ', line) for line in lines]
raw.splitlines() returns a list which was then assigned to a variable called lines. So you need to apply the re.sub function to each item present in the list, since re.sub won't directly be applied on a list.
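As a quick check of the corrected call, here is a minimal sketch (Python 3 assumed) that runs on a made-up string instead of the real file. The sample text is hypothetical; the calls are the standard re.sub and str.replace.

```python
import re

# Hypothetical sample text standing in for the contents of the file.
raw = "Blah\tblah\tBLAH\nfoo\tbar\tbaz\n"

lines = raw.lower().splitlines()
# re.sub(pattern, replacement, string), applied to one string at a time
lines = [re.sub(r"\t", " ", line) for line in lines]
# For a fixed character like a tab, str.replace does the same without a regex
same = [line.replace("\t", " ") for line in raw.lower().splitlines()]
print(lines[0:2])
```

Since the pattern is a single fixed character, the str.replace variant is probably the simpler choice here.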
I have a large textfile on my computer (location: /home/Seth/documents/bruteforce/passwords.txt) and I'm trying to find a specific string in the file. The list has one word per line and 215,000 lines/words. Does anyone know of simple Python script I can use to find a specific string?
Here's the code I have so far,
f = open("home/seth/documents/bruteforce/passwords.txt", "r")
for line in f.readlines():
    line = str(line.lower())
    print str(line)
    if str(line) == "abe":
        print "success!"
    else:
        print str(line)
I keep running the script, but it never finds the word in the file (and I know for sure the word is in the file).
Is there something wrong with my code? Is there a simpler method than the one I'm trying to use?
Your help is greatly appreciated.
Ps: I'm using Python 2.7 on a Debian Linux laptop.
I'd rather use the in keyword to look for a string in a line. Here I'm looking for the keyword 'KHANNA' in a CSV file; the code prints True if any line contains it.
In [121]: with open('data.csv') as f:
   .....:     print any('KHANNA' in line for line in f)
   .....:
True
It's just because you forgot to strip the new line char at the end of each line.
line = line.strip().lower()
would help.
Usually, when you read lines out of a file, they have a newline character at the end. Thus, they will technically not be equal to the same string without the newline character. You can get rid of this character by adding the line line=line.strip() before the test for equality to your target string. By default, the strip() method removes all white space (such as newlines) from the string it is called on.
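The stripping step can be sketched as follows (Python 3 assumed). The helper name contains_word and the temporary word list are hypothetical, used only so the example runs on its own.

```python
import os
import tempfile

def contains_word(path, target):
    """Return True if any line equals target, ignoring case and surrounding whitespace."""
    with open(path) as f:
        return any(line.strip().lower() == target for line in f)

# Hypothetical word list written to a temporary file for demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Alpha\nAbe\nzulu\n")

found = contains_word(tmp.name, "abe")
missing = contains_word(tmp.name, "abel")
os.remove(tmp.name)
print(found, missing)
```

Iterating over the file object reads one line at a time, so the whole list never has to be held in memory.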
What do you want to do? Just test whether the word is in the file? Here:
print 'abe' in open("passwords.txt").read().split()
Or:
print 'abe' in map(str.strip, open("passwords.txt"))
Or if it doesn't have to be Python:
egrep '^abe$' passwords.txt
EDIT: Oh, I forgot the lower. Probably because passwords are usually case sensitive. But if it really does make sense in your case:
print 'abe' in open("passwords.txt").read().lower().split()
or
print 'abe' in (line.strip().lower() for line in open("passwords.txt"))
or
print 'abe' in map(str.lower, map(str.strip, open("passwords.txt")))
Your script doesn't find the line because you didn't check for the newline characters:
Your file is made of many "lines". Each "line" ends with a character that you didn't account for: the newline character ('\n') [1]. This is the character that creates a new line; it is what gets written to the file when you hit enter. This is how the next line is created.
So, when you read the lines out of your file, the string contained in each line actually ends with a newline character. This is why your equality test fails. You should instead test equality against the line after it has been stripped of this newline character:
with open("home/seth/documents/bruteforce/passwords.txt") as infile:
    for line in infile:
        line = line.rstrip('\n')
        if line == "abe":
            print 'success!'
[1] Note that on some machines, the newline is in fact two characters: the carriage return (CR) and the line feed (LF). This terminology comes from back in the day when typewriters had to advance the paper by one line, and the carriage that held the paper had to be returned to its starting position. When seen in a line in the file, this appears as '\r\n'.
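To make the hidden characters visible, a small sketch (Python 3 assumed, with made-up sample lines) can print the repr() of a line and strip both kinds of line ending:

```python
# Hypothetical lines as they might come out of files with different endings.
unix_line = "abe\n"
windows_line = "abe\r\n"

print(repr(unix_line))     # repr() makes the trailing newline visible
print(unix_line == "abe")  # the raw line is not equal to the bare word

# rstrip("\r\n") strips any trailing run of CR and LF characters.
cleaned = [line.rstrip("\r\n") for line in (unix_line, windows_line)]
print(cleaned)
```

Passing "\r\n" to rstrip treats the argument as a set of characters, so it handles a lone '\n' as well as '\r\n'.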
I have a jumble game that imports a random word from a text file. However, I think it is importing the return key. Once the word is jumbled I print it to screen and the word is split on two different lines.
How can I ignore the return key? If there is not a simple way to do this please let me know, because I will just settle for a tuple until I further my knowledge.
Thanks in advance.
When you're selecting a line from your file, you certainly get something like:
myword\n
... or \r on old Mac systems, or even \r\n on Windows. The sequence represents a line break, and you can easily remove it with a Python built-in method.
Indeed, to avoid that, you can apply the .strip() function on the string to remove the \n and any undesired spaces:
>>> 'myword\n'.strip()
'myword'
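Applied to the jumble game, a minimal sketch (Python 3 assumed) might look like this. The word list is made up and stands in for the lines read from the text file.

```python
import random

# Hypothetical lines as read from the word file, trailing newlines included.
raw_lines = ["python\n", "jumble\n", "answer\n"]

word = random.choice(raw_lines).strip()  # drop the newline before jumbling
letters = list(word)
random.shuffle(letters)  # shuffles the list of letters in place
jumbled = "".join(letters)
print(jumbled)
```

Because the newline is removed before shuffling, it can no longer end up in the middle of the jumbled word and split it across two lines.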
I want to convert a Python multiline string to a single line. If I open the string in Vim, I can see ^M at the start of each line. How do I process the string to put it all on a single line, with tab separation between each line? For example, in Vim it looks like:
Serialnumber
^MName Rick
^MAddress 902, A.street, Elsewhere
I would like it to be something like:
Serialnumber \t Name \t Rick \t Address \t 902, A.street,......
where each string is in one line. I tried
somestring.replace(r'\r','\t')
But it doesn't work. Also, once the string is in a single line if I wanted a newline(UNIX newline?) at the end of the string how would I do that?
Deleted my previous answer because I realized it was wrong and I needed to test this solution.
Assuming that you are reading this from the file, you can do the following:
f = open('test.txt', 'r')
lines = f.readlines()
mystr = '\t'.join([line.strip() for line in lines])
As ep0 said, the ^M represents '\r', which is the carriage return character used in Windows line endings. It is surprising that you would have ^M at the beginning of each line, since the Windows newline sequence is \r\n. Having ^M at the beginning of the line indicates that your file contains \n\r instead.
Regardless, the code above makes use of a list comprehension to loop over each of the lines read from test.txt. For each line in lines, we call str.strip() to remove any whitespace and non-printing characters from the ENDS of each line. Finally, we call '\t'.join() on the resulting list to insert tabs.
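The same steps can be tried without a file. This sketch (Python 3 assumed) uses a made-up list of lines with Windows-style endings in place of f.readlines(), and also appends a Unix newline at the end, as asked in the question.

```python
# Hypothetical file contents with Windows-style line endings.
lines = ["Serialnumber\r\n", "Name Rick\r\n", "Address 902, A.street, Elsewhere\r\n"]

# strip() removes whitespace, including \r and \n, from both ends of each line.
mystr = "\t".join([line.strip() for line in lines])

# Append one Unix newline if the single line should end with one.
mystr_nl = mystr + "\n"
print(repr(mystr_nl))
```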
You can replace "\r" characters with "\t". Strings are immutable, so assign the result:
my_string = my_string.replace("\r", "\t")
I use splitlines() to detect all types of lines, and then join everything together. This way you don't have to guess to replace \r or \n etc.
"".join(somestring.splitlines())
This is hard-coded, but it works.
poem='''
If I can stop one heart from breaking,
I shall not live in vain;
If I can ease one life the aching,
Or cool one pain,
Or help one fainting robin
Unto his nest again,
I shall not live in vain.
'''
lst=list(poem)
text=''   # renamed from str to avoid shadowing the built-in
for i in lst:
    text+=i
print(text)
lst1=text.split("\n")
str1=""
for i in lst1:
    str1+=i+" "
str2=str1[:-2]
print(str2)
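This occurs because of how Vim interprets CR (carriage return), used by Windows to delimit new lines. You should use just one editor (I personally prefer Vim). Read this: VIM ^M
This trick can also be useful when the string contains the literal two-character text \n (a backslash followed by n) rather than a real newline: write "\n" as a raw string, like this: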
my_string = my_string.replace(r"\n", "\t")
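The difference between the raw and the ordinary string matters here, and it is also why the r'\r' attempt in the question had no effect. A short sketch (Python 3 assumed, sample text made up):

```python
text = "Name Rick\nAddress 902"  # contains one real newline character

print(len("\n"), len(r"\n"))  # 1 2: the raw string is a backslash plus "n"

unchanged = text.replace(r"\n", "\t")  # no literal backslash-n present, so nothing changes
replaced = text.replace("\n", "\t")    # the real newline becomes a tab
print(repr(unchanged))
print(repr(replaced))
```

The raw-string form only helps when the data really contains a backslash followed by a letter, for example text that was escaped before it was saved.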
this should do the work:
def flatten(multiline):
    lst = multiline.split('\n')
    flat = ''
    for line in lst:
        flat += line.replace(' ', '')+' '
    return flat
This should do the job:
string = """Name Rick
Address 902, A.street, Elsewhere"""
single_line = string.replace("\n", "\t")