Find case-sensitive words in a text file using Python

I have a text file and need to search it to find out whether letters in the file are uppercase or lowercase, using Python.

Maybe you should spend more time writing the question, if you expect us to invest time in answering it. Nevertheless, from what I understand you are looking for something like this:
import sys

# the file to scan is given on the command line
# (note: sys.argv[0] is the script itself; the first argument is sys.argv[1])
with open(sys.argv[1], "r") as f:
    for row in f:
        for ch in row:  # don't name this "chr"; that shadows the built-in
            if ch.isupper():
                print(ch, "uppercase")
            elif ch.islower():
                print(ch, "lowercase")
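If the goal is to classify whole words rather than individual characters, a sketch along these lines might be closer to what the question asks (the helper name is my own invention):

```python
def classify_words(text):
    """Split text into words containing at least one uppercase
    letter and words that are entirely lowercase."""
    upper_words, lower_words = [], []
    for word in text.split():
        if any(c.isupper() for c in word):
            upper_words.append(word)
        elif word.islower():
            lower_words.append(word)
    return upper_words, lower_words
```

You would call it on the file contents, e.g. classify_words(open(sys.argv[1]).read()). Words with no cased characters at all (such as numbers) end up in neither list.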

Related

Modifying letters in a file

I'm new to programming, so I'm pretty lost. I'm currently learning Python and I need to open a text file and change every letter to the next one in the alphabet (e.g. a -> b, b -> c, etc.). How would I go about writing code for this?
This sounds like a neat problem to work on for a beginner.
Things you may want to look at:
The open() function, which allows you to open files and read from or write to them:
https://docs.python.org/3/library/functions.html#open
with open('test.out', 'r+') as fi:
    all_lines = fi.readlines()  # read all lines from the file
    fi.write('this string will be written to the file')
# The file is closed at this point; `with` is a context manager, look that up
The os.replace() function, which lets you overwrite one file with another. You might try reading the input file, writing the transformed text to a new output file, then overwriting the input file with it.
https://docs.python.org/3/library/os.html#os.replace
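A minimal sketch of that read-transform-replace pattern, using placeholder filenames and an uppercasing transform as a stand-in for the real one:

```python
import os

# create a small sample input just for this demonstration
with open('input.txt', 'w') as f:
    f.write('abc')

# read the original, write the transformed text to a temporary file
with open('input.txt', 'r') as src:
    text = src.read()
with open('input.txt.tmp', 'w') as dst:
    dst.write(text.upper())  # stand-in for the real transformation

# replace the original with the new file in one step
os.replace('input.txt.tmp', 'input.txt')
```

The advantage of this pattern is that the original file is never left half-written: it is either untouched or fully replaced.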
Replacing a character with the next increment of a character is an interesting twist, as it's not something that a lot of Python programmers have to deal with. Here's one way to increment a character:
x = 'c'
print(chr(ord(x) + 1)) # will print 'd'
Without just giving away the answer, this should give you the pieces that you need to get started, feel free to ask more questions.
I think this will work. The code could probably be shortened, but I'm not sure how; I'm not an expert with `with open` statements.
with open("(your text file path)", "r") as f:
    data = f.read()

new_data = ""
for ch in data:
    # note: this shifts every character, including spaces and punctuation
    new_data += chr(ord(ch) + 1)
print(new_data)

with open("(your text file path)", "w") as f:
    f.write(new_data)
You must change the letters to numbers so that you can increment them by one, and then change them back to letters. This should work.
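One detail neither snippet handles is the end of the alphabet: 'z' becomes '{' with a plain ord/chr increment. A sketch that wraps 'z' back to 'a' and leaves non-letters alone (whether the exercise wants wrap-around is an assumption on my part):

```python
def shift_letter(c):
    """Shift a letter to the next one in the alphabet, wrapping
    'z' -> 'a' and 'Z' -> 'A'; leave other characters unchanged."""
    if 'a' <= c <= 'y' or 'A' <= c <= 'Y':
        return chr(ord(c) + 1)
    if c == 'z':
        return 'a'
    if c == 'Z':
        return 'A'
    return c

def shift_text(text):
    """Apply shift_letter to every character in a string."""
    return ''.join(shift_letter(c) for c in text)
```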

Splitting text at a determined character using Python

I'm trying to write a program that takes a .txt file with messy text, reads it, and every time it comes across a full stop (.) creates a new line, essentially breaking every paragraph into several. However, I'm struggling to find something that will actually look for the specified character within the text.
I was thinking about having the program read the text character by character, writing them to a different file and adding a "\n" whenever it ran across a ".", but I'm having trouble implementing it; my attempt is along the lines of:
with open("test.txt", "r+") as f:
    while True:
        char = f.read(1)
        if not char:
            break
        else:
            if char == (".") :
                f.write(char + "\n")
            else:
                f.write(char)
                break
I'm guessing this particular piece of code is a bloody mess, but I've been struggling with this problem for some time and at this point I'm trying pretty much anything I can think of.
Please try the below:
with open("test.txt", "r+") as f:
    data = f.read().replace('.', '.\n')
    f.seek(0)      # go back to the start of the file
    f.write(data)  # write the modified text back
    f.truncate()   # drop any leftover bytes from the old contents
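The character-by-character idea from the question also works if it reads from one file and writes to a second one, instead of mixing reads and writes on the same handle (the filenames and sample text here are placeholders):

```python
# create a small sample input just for this demonstration
with open("test.txt", "w") as f:
    f.write("First sentence. Second sentence. Third.")

# copy test.txt to test_out.txt, adding a newline after each full stop
with open("test.txt", "r") as src, open("test_out.txt", "w") as dst:
    while True:
        char = src.read(1)
        if not char:  # empty string means end of file
            break
        dst.write(char)
        if char == ".":
            dst.write("\n")
```

Like the replace() one-liner, this leaves the space that followed each full stop at the start of the next line; stripping it is a separate step.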

Python Unicode issues with .txt file

To make a long story short, I am writing a Python script that asks the user to drop in a .docx file, which is converted to .txt. Python looks for keywords within the .txt file and displays them in the shell. I was running into UnicodeDecodeError (charmap codec, etc.). I got past that by adding word.decode("charmap") inside my for loop. Now Python is not displaying the keywords it does find in the shell. Any advice on how to overcome this? Maybe have Python skip the characters it cannot decode and continue reading the rest? Here is my code:
import sys
import os
import codecs

filename = input("Drag and drop resume here: ")
keywords = ['NGA', 'DoD', 'Running', 'Programing', 'Enterprise', 'impossible', 'meets']
file_words = []

with open(filename, "rb") as file:
    for line in file:
        for word in line.split():
            word.decode("charmap")
            file_words.append(word)

comparison = []
for words in file_words:
    if words in keywords:
        comparison.append(words)

def remove_duplicates(comparison):
    output = []
    seen = set()
    for words in comparison:
        if words not in seen:
            output.append(words)
            seen.add(words)
    return output

comparison = remove_duplicates(comparison)
print("Keywords found:", comparison)

key_count = 0
word_count = 0
for element in comparison:
    word_count += 1
for element in keywords:
    key_count += 1

Threshold = word_count / key_count
if Threshold <= 0.7:
    print("The candidate is not qualified for")
else:
    print("The candidate is qualified for")

file.close()
And the output:
Drag and drop resume here: C:\Users\User\Desktop\Resume_Newton Love_151111.txt
Keywords found: []
The candidate is not qualified for
In Python 3, don't open text files in binary mode. By default the file will be decoded to Unicode using locale.getpreferredencoding(False) (cp1252 on US Windows):
with open(filename) as file:
    for line in file:
        for word in line.split():
            file_words.append(word)
or specify an encoding:
with open(filename, encoding='utf8') as file:
    for line in file:
        for word in line.split():
            file_words.append(word)
You do need to know the encoding of your file. There are other options to open as well, including errors='ignore' or errors='replace' but you shouldn't get errors if you know the correct encoding.
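For instance, errors='replace' substitutes U+FFFD (the replacement character) for bytes that cannot be decoded, instead of raising, so the rest of the file can still be read. A small self-contained demonstration (the filename is a placeholder):

```python
# write a file containing a byte that is not valid UTF-8
with open('sample.txt', 'wb') as f:
    f.write(b'caf\xff')

# the bad byte decodes to the replacement character instead of raising
with open('sample.txt', encoding='utf8', errors='replace') as f:
    text = f.read()
```

With errors='ignore' the bad byte would simply be dropped, giving 'caf'.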
As others have said, posting a sample of your text file that reproduces the error and the error traceback would help diagnose your specific issue.
In case anyone cares. It's been a long time, but wanted to clear up that I didn't even know the difference between binary and txt files back in these days. I eventually found a doc/docx module for python that made things easier. Sorry for the headache!
Posting the code that produces the traceback would make this easier to fix.
I'm not sure this is the only problem, maybe this would work better:
with open(filename, "rb") as file:
    for line in file:
        for word in line.split():
            file_words.append(word.decode("charmap"))
Alright, I figured it out; here is my code. But then I tried a .docx file that seemed more complex, and when converted to .txt the entire file consisted of special characters. So now I am thinking I should switch to the python-docx module, since it deals with XML-based files like Word documents. I added encoding='charmap':
with open(filename, encoding='charmap') as file:
    for line in file:
        for word in line.split():
            file_words.append(word)

Python - Counting Words In A Text File

I'm new to Python and am working on a program that will count the instances of words in a simple text file. The program and the text file are given on the command line, so I have included syntax for checking the command-line arguments. The code is below:
import sys

count = {}
with open(sys.argv[1], 'r') as f:
    for line in f:
        for word in line.split():
            if word not in count:
                count[word] = 1
            else:
                count[word] += 1

print(word, count[word])
file.close()
count is a dictionary to store the words and the number of times they occur. I want to be able to print out each word and the number of times it occurs, starting from most occurrences to least occurrences.
I'd like to know if I'm on the right track, and if I'm using sys properly. Thank you!!
What you did looks fine to me. One could also use collections.Counter (available since Python 2.7) to get frequency ordering for free. My solution would look like this; probably some improvement is still possible.
import sys
from collections import Counter

c = Counter()
with open(sys.argv[1], 'r') as f:
    for line in f:
        c.update(line.strip().split())  # count whole words, not characters

for word, count in c.most_common():  # most frequent first
    print(word, count)
Your final print doesn't have a loop, so it will just print the count for the last word you read, which still remains as the value of word.
Also, with a with context manager, you don't need to close() the file handle.
Finally, as pointed out in a comment, you'll want to remove the final newline from each line before you split.
For a simple program like this, it's probably not worth the trouble, but you might want to look at defaultdict from Collections to avoid the special case for initializing a new key in the dictionary.
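A sketch of that defaultdict variant, so the special case for new keys disappears:

```python
from collections import defaultdict

def count_words(lines):
    """Count word occurrences; missing keys start at 0
    automatically, so no membership test is needed."""
    count = defaultdict(int)
    for line in lines:
        for word in line.split():
            count[word] += 1
    return count
```

Passing an open file object as lines works too, since iterating over a file yields its lines.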
I just noticed a typo: you open the file as f but you close it as file. As tripleee said, you shouldn't close files that you open in a with statement. Also, it's bad practice to use the names of builtin functions, like file or list, for your own identifiers. Sometimes it works, but sometimes it causes nasty bugs. And it's confusing for people who read your code; a syntax highlighting editor can help avoid this little problem.
To print the data in your count dict in descending order of count you can do something like this:
items = sorted(count.items(), key=lambda kv: kv[1], reverse=True)
print('\n'.join('%s: %d' % (k, v) for k, v in items))
See the Python Library Reference for more details on the sorted() built-in and other handy dict methods.
I just did this using the re library. This computes the average number of words per line in a text file.
import re

# this program gets the average number of words per line
def main():
    try:
        # get the name of the file
        filename = input('Enter a filename: ')
        # open the file
        infile = open(filename, 'r')
        # read the file contents
        contents = infile.read()
        lines = len(re.findall(r'\n', contents))
        count = len(re.findall(r'\w+', contents))
        average = count // lines
        # display the file contents
        print(contents)
        print('there is an average of', average, 'words per line')
        # close the file
        infile.close()
    except IOError:
        print('An error occurred when trying to read')
        print('the file', filename)

# call main
main()
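The same average can be computed without re, using plain string methods; this variant also guards against an empty file, which would otherwise make the division fail:

```python
def average_words_per_line(text):
    """Average number of words per non-empty line, using string
    methods instead of regular expressions."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0
    return sum(len(ln.split()) for ln in lines) / len(lines)
```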

Neat way to check first line of a file before reading CSV

I'm searching for a clever way to check the first line of a file before reading it as a CSV file. I want to check whether there's a # coding: xxx line so that I can decode the data accordingly. But if there is no such line, the first line might already contain a dataset. Seeking back seems brutal to me; I was hoping for a neater way to do it.
import re
import csv
fl = open(filename)
line = fl.readline()
coding = re.match(r'^#\s*coding\s*(:|=|:=)\s*([\w\d\-_]+)\s*$', line)
fl.seek(0)
reader = csv.reader(fl)
# ...
I don't see anything wrong with your current approach, but here is an alternative that you may find preferable:
import re
import csv
import itertools
line = next(fl)
coding = re.match(r'^#\s*coding\s*(:|=|:=)\s*([\w\d\-_]+)\s*$', line)
reader = csv.reader(itertools.chain([line], fl))
It isn't clear from your question or the code you posted, but if you do not want to include the first line when your regex matches, you could do the following:
reader = csv.reader(fl if coding else itertools.chain([line], fl))
Would the first line ever look like this?
# coding: xxx, some other "field", and maybe another field
If not, can you just read the first line and look for a comma: if no comma is found, try to interpret it as a coding line, else pass it (and every other line) to csv.reader()?
