I am very, very new to Python and need some basic help. My goal is to find certain words in a text file that looks like this:
party A %apple 1
Party B %bat 2
Party C c 3
I need to find all the words that start with %. My code is:
searchfile = open("text.txt", "r")
for line in searchfile:
    for char in line:
        if "%" in char:
            print char
searchfile.close()
but the output is only the % character. I need the output to be %apple and %bat.
Any help?
You are not splitting the lines into words; you are iterating over single characters. Try this instead:
searchfile = open("text.txt", "r")
lines = [line.strip() for line in searchfile.readlines()]
searchfile.close()

for line in lines:
    for word in line.split(" "):
        if word.startswith("%"):
            print(word)
You could also explore regex to solve this.
For the sake of exemplification, I'm following up on Bipul Jain's recommendation by showing how this can be done with regex:
import re

with open('text.txt', 'r') as f:
    file = f.read()

print(re.findall(r'%\w+', file))
results:
['%apple', '%bat']
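If the file is large, you could also apply the same pattern line by line instead of reading the whole file at once. Here is a minimal sketch combining the two answers above; it assumes only the file name and pattern already shown in this thread:
import re

# Compile the pattern once: '%' followed by one or more word characters.
pattern = re.compile(r'%\w+')

with open('text.txt', 'r') as f:
    for line in f:
        # findall returns every match in the current line (possibly none)
        for match in pattern.findall(line):
            print(match)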
I have a text file with the following content:
1 0.374023 0.854818 0.138672 0.230469
0 0.939941 0.597005 0.118164 0.782552
1 0.826118 0.582643 0.347764 0.803151
1 0.503418 0.822266 0.100586 0.240885
I want to replace the "1" at the beginning of a line with "80", like the following:
80 0.374023 0.854818 0.138672 0.230469
0 0.939941 0.597005 0.118164 0.782552
80 0.826118 0.582643 0.347764 0.803151
80 0.503418 0.822266 0.100586 0.240885
keeping the rest of the content the same.
Try doing it with two separate file openings, one to read and one to write:
with open("a_file.txt","r") as f:
lines = a.readlines()
lines = ["80"+line[1:] if line[0:2]=="1 " else line for line in lines]
#OR
lines =["80"+line[1:] if line.split(maxsplit=1)[0] == "1" else line for line in lines]
with open("a_file.txt","w") as f:
for line in lines:
f.write(line)
If you do not need to use Python, then this can be done easily with sed:
sed -i 's/^1/80/g' input_file.txt
The s/<regex>/<replacement>/g means replace all occurrences of <regex> with <replacement>. The ^1 is a regular expression that means "match any '1' at the beginning of the line".
Alternatively, the following Python code will do the same thing:
file = open("input_file.txt")
lines = file.readlines()
file.close()
outFile = open("input_file.txt", "w")
for line in lines:
    split = line.strip().split(" ")
    split[0] = "80" if (split[0] == "1") else split[0]
    print(*split, file=outFile)
outFile.close()
Where you just loop through each line, and replace "1" at the beginning of the line with 80.
A short version using regex and list comprehension:
import re

with open('source.txt', 'r') as infile:
    reformatted = [re.sub(r'^1', '80', line) for line in infile.readlines()]

with open('source.txt', 'w') as outfile:
    outfile.writelines(reformatted)
The input file contains lines made up of [white]space delimited tokens. If the first token equals '1' change it to '80'.
This can be achieved as follows:
with open('foo.txt', 'r+') as foo:
    lines = foo.readlines()
    foo.seek(0)
    for line in lines:
        # ':=' (the walrus operator) requires Python 3.8+
        if (tokens := line.split())[0] == '1':
            tokens[0] = '80'
        print(*tokens, file=foo)
    foo.truncate()
Note: the call to truncate() isn't strictly necessary in this case, because the file size will not increase, but it is a failsafe pattern for this kind of read/rewrite process.
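To see why the failsafe matters, here is a minimal sketch of the reverse replacement (shrinking "80" back to "1"). This scenario is made up purely for illustration and is not part of the question:
# Hypothetical reverse rewrite: the new content is shorter than the old.
with open('foo.txt', 'r+') as foo:
    lines = foo.readlines()
    foo.seek(0)
    for line in lines:
        tokens = line.split()
        if tokens and tokens[0] == '80':
            tokens[0] = '1'
        print(*tokens, file=foo)
    # Without truncate(), leftover bytes from the longer original
    # would remain at the end of the file.
    foo.truncate()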
Apologies for total noob question. I'm trying to learn Python. I've searched this site and have not found a solution for me. Please let me know if one has already been explained.
I'm trying to make a search program for a list of movie titles in a .txt file, and I want it to print every line that contains the words inputted by the user.
For example, if two of the lines in the text file are
22. It Happened One Night (1934)
23. Wonder Woman (2017)
and the user inputs "on" I would like both of these (and any others) to appear, since both contain the "on" at some point.
I have tried using
with open("movies.txt", "r") as f:
searchlines = f.readlines()
for i, line in enumerate(searchlines):
if searchphrase in line:
for l in searchlines[i:i+3]: print(l),
print
but this did not work for me.
Something like this maybe:
with open("movies.txt", "r") as file:
lines = file.read().splitlines()
keyword = "on"
filtered_lines = filter(lambda line: keyword.casefold() in line.casefold(), lines)
for line in filtered_lines:
print(line)
You can refer to the code snippet below:
def search(text_search):
    with open("movies.txt", "r") as f:
        searchlines = f.readlines()
        for i, line in enumerate(searchlines):
            # if line contains text_search we will print it
            if text_search in line:
                print(line)

if __name__ == "__main__":
    search("One")
def search(text_search):
    with open("movies.txt", "r") as f:
        searchlines = f.readlines()
        for i, line in enumerate(searchlines):
            if text_search.casefold() in line.casefold():
                print(line)
Example: call your function with search("ON").
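Since the question says the search phrase comes from the user, either of the functions above could be driven by input(). A minimal sketch, assuming the same movies.txt file; the prompt text is made up:
def search(text_search):
    with open("movies.txt", "r") as f:
        for line in f:
            if text_search.casefold() in line.casefold():
                print(line, end="")  # lines already carry their own newline

if __name__ == "__main__":
    # e.g. entering "on" prints both sample lines from the question
    search(input("Enter a search phrase: "))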
I have a function that loops through a file that looks like this:
"#" XDI/1.0 XDAC/1.4 Athena/0.9.25
"#" Column.4: pre_edge
Content
That is to say, after the "#" there is a comment. My function aims to read each line, and if it starts with a specific word, select what comes after the ":".
For example, given these two lines, I would like to read through them, and if a line starts with "#" and contains the word "Column.4", the word "pre_edge" should be stored.
An example of my current approach follows:
with open(file, "r") as f:
    for line in f:
        if line.startswith('#'):
            word = line.split(" Column.4:")[1]
        else:
            print("n")
I think my trouble is specifically this: after finding a line that starts with "#", how can I parse/search through it and save its content if it contains the desired word?
In case the # comment contains the string Column.4: as stated above, you could parse it this way:
with open(filepath) as f:
    for line in f:
        if line.startswith('#'):
            # Here you process comment lines
            if 'Column.4' in line:
                first, remainder = line.split('Column.4: ')
                # remainder contains everything after '# Column.4: '
                # so if you want to get the first word ->
                word = remainder.split()[0]
        else:
            # Here you can process lines that are not comments
            pass
Note: it is also good practice to use a for line in f: loop instead of f.readlines() (as used in other answers), because this way you don't load all lines into memory but process them one by one.
You should start by reading the file into a list and then work through that instead:
file = 'test.txt'  # <- call the file whatever you want
with open(file, "r") as f:
    txt = f.readlines()

for line in txt:
    if line.startswith('"#"'):
        word = line.split(" Column.4: ")
        try:
            print(word[1])
        except IndexError:
            print(word)
    else:
        print("n")
Output:
>>> ['"#" XDI/1.0 XDAC/1.4 Athena/0.9.25\n']
>>> pre_edge
A try/except is used because the first line also starts with "#" and we can't split it with your current logic.
Also, as a side note, in the question the file has lines starting with "#" including the quotation marks, so the startswith() call was adjusted accordingly.
with open('stuff.txt', 'r+') as f:
    data = f.readlines()
    for line in data:
        words = line.split()
        if words and ('#' in words[0]) and ("Column.4:" in words):
            print(words[-1])
            # pre_edge
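If you later need other headers besides Column.4, the same idea generalizes: you could collect every "# Name: value" pair into a dict with a regex. This is only a sketch built around the sample lines shown in the question; the function name and pattern are my own, and the pattern assumes a header name contains no spaces or colons:
import re

# Matches comment lines such as:  "#" Column.4: pre_edge
# Group 1 is the header name, group 2 is the value after the colon.
HEADER_RE = re.compile(r'^"?#"?\s*(\S+):\s*(.*)$')

def read_headers(path):
    headers = {}
    with open(path) as f:
        for line in f:
            match = HEADER_RE.match(line)
            if match:
                headers[match.group(1)] = match.group(2).strip()
    return headers

# For the sample file, read_headers(path).get('Column.4') would give 'pre_edge'.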
I am trying to write a program where I count the most frequently used words from one file but those words should not be available in another file. So basically I am reading data from test.txt file and counting the most frequently used word from that file, but that word should not be found in test2.txt file.
Below are sample data files, test.txt and test2.txt
test.txt:
The Project is for testing. doing some testing to find what's going on. the the the.
test2.txt:
a
about
above
across
after
afterwards
again
against
the
Below is my script, which parses files test.txt and test2.txt. It finds the most frequently used words from test.txt, excluding words found in test2.txt.
I thought I was doing everything right, but when I execute my script, it gives "the" as the most frequent word. But actually, the result should be "testing", as "the" is found in test2.txt but "testing" is not found in test2.txt.
from collections import Counter
import re

dgWords = re.findall(r'\w+', open('test.txt').read().lower())
f = open('test2.txt', 'rb')
sWords = [line.strip() for line in f]

print(len(dgWords))
for sWord in sWords:
    print(sWord)
    print(dgWords)
    while sWord in dgWords:
        dgWords.remove(sWord)

print(len(dgWords))
mostFrequentWord = Counter(dgWords).most_common(1)
print(mostFrequentWord)
Here's how I'd go about it - using sets:
import re
from collections import Counter

all_words = re.findall(r'\w+', open('test.txt').read().lower())
with open('test2.txt') as f:
    stop_words = [line.strip() for line in f]

set_all = set(all_words)
set_stop = set(stop_words)
all_only = set_all - set_stop

print(Counter(filter(lambda w: w in all_only, all_words)).most_common(1))
This should also be slightly faster, since the Counter only counts words that appear in all_only.
I simply changed the following line of your original code
f = open('test2.txt', 'rb')
to
f = open('test2.txt', 'r')
and it worked. Simply read your text as strings instead of bytes; otherwise the stripped lines won't match the words produced by the regex. Tested on Python 3.4, Eclipse PyDev, Win7 x64.
OFFTOPIC:
It's more Pythonic to open files using with statements. In this case, write
with open('test2.txt', 'r') as f:
and indent the file-processing statements accordingly. That keeps you from forgetting to close the file stream.
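Applied to your script, that suggestion would look roughly like this (only the stopword-loading part is shown):
# Same stopword loading as in the question, but with a context manager,
# so the file is closed automatically even if an exception occurs.
with open('test2.txt', 'r') as f:
    sWords = [line.strip() for line in f]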
import re
from collections import Counter
with open('test.txt') as testfile, open('test2.txt') as stopfile:
    stopwords = set(line.strip() for line in stopfile)
    words = Counter(re.findall(r'\w+', testfile.read().lower()))

for word in stopwords:
    if word in words:
        words.pop(word)

print("the most frequent word is", words.most_common(1))
So the text file I have is formatted something like this:
a

b

c
I know how to use strip() and rstrip(), but I want to get rid of the empty lines.
I want to make it shorter like this:
a
b
c
You could remove all blank lines (lines that contain only whitespace) from stdin and/or files given at the command line using the fileinput module:
#!/usr/bin/env python
import sys
import fileinput

for line in fileinput.input(inplace=True):
    if line.strip():  # preserve non-blank lines
        sys.stdout.write(line)
You can use regular expressions:
import re

txt = """a

b

c"""
print(re.sub(r'\n+', '\n', txt))  # replace one or more consecutive \n with a single one
However, lines that contain only spaces won't be removed. A better solution is:
re.sub(r'(\n[ \t]*)+', '\n', txt)
This way, you will also remove leading spaces.
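A quick worked example may make the difference between the two patterns clearer; the input string here is made up, with one line containing only spaces and one truly empty line:
import re

txt = "a\n   \nb\n\nc"          # the second line holds only spaces

print(repr(re.sub(r'\n+', '\n', txt)))
# 'a\n   \nb\nc'   <- the spaces-only line survives

print(repr(re.sub(r'(\n[ \t]*)+', '\n', txt)))
# 'a\nb\nc'        <- whitespace-only lines are removed too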
Simply remove any line that only equals "\n":
in_filename = 'in_example.txt'
out_filename = 'out_example.txt'

with open(in_filename) as infile, open(out_filename, "w") as outfile:
    for line in infile.readlines():
        if line != "\n":
            outfile.write(line)
If you want to simply update the same file, close and reopen it to overwrite it with the new data:
filename = 'in_example.txt'
filedata = ""

with open(filename, "r") as infile:
    for line in infile.readlines():
        if line != "\n":
            filedata += line

with open(filename, "w") as outfile:
    outfile.write(filedata)
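For completeness, here is a shorter sketch of the same in-place rewrite using a list comprehension and writelines(). Note that line.strip() also drops lines that contain only spaces or tabs, which is slightly broader than comparing against "\n":
filename = 'in_example.txt'

# Keep only lines that contain something other than whitespace.
with open(filename) as infile:
    kept = [line for line in infile if line.strip()]

with open(filename, "w") as outfile:
    outfile.writelines(kept)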