How to know a position (.txt) - python

I wonder how to find the position of a word inside a .txt file as I read it.
This is my txt:
cat dog monkey bird
This is the output I want:
Word: cat Position: line 1 , word 1 (1,1)
Any idea?

foo.txt:
asd
asd
asd
ad
I put returns between .......
asd
sad
asd
code:
>>> def position(file,word):
...     for i,line in enumerate(file): #for every line; i=linenumber and line=text
...         s=line.find(word) #find word
...         if s!=-1: #if word found
...             return i,s # return line number and position on line
...
>>> position(open("foo.txt"),"put")
(4, 2) # (line,position), both counted from 0

This would work for this given file:
blah bloo cake
donky cat sparrow
nago cheese
The code:
word = "sparrow"
lcount = 1
with open("file", "r") as f:
    for line in f:
        testline = line.split()
        if word in testline:  # match whole words only
            ind = testline.index(word)
            print("Word %s found at line %d, word %d" % (word, lcount, ind+1))
            break
        else:
            lcount += 1
Would print:
Word sparrow found at line 2, word 3
You should be able to modify this quite easily to make a function or different output I hope.
Although I'm still really not sure if this is what you're after...
Minor edit:
As a function:
def findword(objf, word):
    lcount = 1
    found = False
    with open(objf, "r") as f:
        for line in f:
            testline = line.split()
            if word in testline:  # If word is one of the words on this line
                ind = testline.index(word)  # This is the index, starting from 0
                found = True
                break
            else:
                lcount += 1
    if found:
        print("Word %s found at line %d, word %d" % (word, lcount, ind+1))
    else:
        print("Not found")
Use:
>>> findword('file', "sparrow")
Word sparrow found at line 2, word 3
>>> findword('file', "donkey")
Not found
>>>
Shrug. Not the best method, I'll give it that, but then again it works.

Basic idea
Open the file
Iterate over the lines
For every line read, increment a counter, e.g. line_no += 1
Split the line by whitespace (you will get a list)
Check whether the list contains the word (use in); if it does, use list.index(word) to get the index and store it in a variable, e.g. word_no = list.index(word)
Print line_no and word_no if the word was found
There are a lot of better solutions out there (and more Pythonic ones), but this gives you the idea.
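The steps above can be sketched as a small function. This is only an illustration: the name find_word and the sample list are made up here, and the function takes any iterable of lines (an open file works too) so it can be tried without a file on disk.

```python
def find_word(lines, word):
    """Return (line_no, word_no), both counted from 1, or None if word is absent."""
    for line_no, line in enumerate(lines, start=1):
        words = line.split()              # split the line on whitespace
        if word in words:                 # whole-word match only
            return line_no, words.index(word) + 1
    return None

# Same content as the example file used in the answer above.
sample = ["blah bloo cake\n", "donky cat sparrow\n", "nago cheese\n"]
print(find_word(sample, "sparrow"))   # (2, 3)
print(find_word(sample, "donkey"))    # None
```

With a real file, something like find_word(open("file"), "sparrow") should behave the same way, since iterating over a file yields its lines.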

Related

Find line number of replaced key value

How do I get the line number of each replaced key value?
Currently I have two separate functions for this. How do I combine them so that I get the line number at the time the string is replaced?
filedata is the path of the file in which I need to replace strings.
old_new_dict = {'hi':'bye','old':'new'}
def replace_oc(file):
    lines = file.readlines()
    line_number = None
    for i, line in enumerate(lines):
        line_number = i + 1
        break
    return line_number
def replacee(path, pattern):
    for key, value in old_new_dict.items():
        if key in filedata:
            print("there")
            filedata = filedata.replace(key, value)
        else:
            print("not there")
You could break down the filedata into lines to check for the words to replace before doing the actual replacements. For example:
filedata = """The quick brown fox
jumped over
the lazy dogs
and the cow ran away
from the fox"""
old_new_dict = {"dogs":"cats", "crazy":"sleeping","fox":"cow"}
for key,value in old_new_dict.items():
    lines = [i for i,line in enumerate(filedata.split("\n"),1) if key in line]
    if lines:
        filedata = filedata.replace(key,value)
        print(key,"found at lines",*lines)
    else:
        print(key,"is not there")
output:
# dogs found at lines 3
# crazy is not there
# fox found at lines 1 5
print(filedata)
The quick brown cow
jumped over
the lazy cats
and the cow ran away
from the cow
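The same idea can be wrapped in a function that returns the line numbers instead of printing them. This is a rough sketch, not the answerer's code: the name replace_and_report is hypothetical, and like the loop above it applies the replacements one key after another, so a value inserted by one key could be matched by a later key.

```python
def replace_and_report(text, old_new):
    """Replace each key of old_new in text.

    Returns (new_text, hits), where hits maps every key to the list of
    1-based line numbers on which it was found before being replaced.
    """
    hits = {}
    for old, new in old_new.items():
        found = [n for n, line in enumerate(text.split("\n"), 1) if old in line]
        hits[old] = found
        if found:
            text = text.replace(old, new)
    return text, hits

sample = "The quick brown fox\njumped over\nthe lazy dogs\nand the cow ran away\nfrom the fox"
new_text, hits = replace_and_report(sample, {"dogs": "cats", "crazy": "sleeping", "fox": "cow"})
print(hits)   # {'dogs': [3], 'crazy': [], 'fox': [1, 5]}
```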

Counting Paragraph and Most Frequent Words in Python Text File

I am trying to count the number of paragraphs and the most frequent words in a text file (any text file for that matter) but seem to have zero output when I run my code, no errors either. Any tips on where I'm going wrong?
filename = input("enter file name: ")
inf = open(filename, 'r')
#frequent words
wordcount={}
for word in inf.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
for key in wordcount.keys():
    print ("%s %s " %(key , wordcount[key]))
#Count Paragraph(s)
linecount = 0
for i in inf:
    paragraphcount = 0
    if '\n' in i:
        linecount += 1
    if len(i) < 2: paragraphcount *= 0
    elif len(i) > 2: paragraphcount = paragraphcount + 1
    print('%-4d %4d %s' % (paragraphcount, linecount, i))
inf.close()
filename = input("enter file name: ")
wordcount={}
paragraphcount = 0
linecount = 0
with open(filename, 'r') as ftext:
    for line in ftext.readlines():
        if line in ('\n', '\r\n'):
            # blank line: count a paragraph only on the first blank line of a run
            if linecount == 0:
                paragraphcount = paragraphcount + 1
            linecount = linecount + 1
        else:
            linecount = 0
            #frequent words
            for word in line.split():
                wordcount[word] = wordcount.get(word,0) + 1
print(wordcount)
print(paragraphcount)
When you read a file, there is a cursor that indicates which byte you are currently at. Your code tries to read the file twice and runs into strange behavior, which should have been a hint that something is wrong.
What is the correct way?
Read the file once, store every line, then compute the word count and the paragraph count from that same stored data, rather than trying to read the file twice.
What is happening in the current code?
After the first read, the byte cursor sits at the end of the file. When you then try to read the lines, you get nothing back, because reading starts from the end of the file. You can correct this by resetting the file pointer (the cursor):
call inf.seek(0) just before you try to read the lines. But rather than doing this, you should focus on implementing the approach described in the first section.
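Both points can be tried in a few lines. The following is a minimal sketch under some assumptions: io.StringIO stands in for the opened file, the function name analyze is made up, and a paragraph is taken to mean a run of non-blank lines, which may differ from what the asker intended.

```python
import io
from collections import Counter

def analyze(stream):
    """Read the stream exactly once; return (word counts, paragraph count)."""
    lines = stream.read().splitlines()
    wordcount = Counter(word for line in lines for word in line.split())
    paragraphs = 0
    in_paragraph = False
    for line in lines:
        if line.strip():
            if not in_paragraph:   # first non-blank line after a gap
                paragraphs += 1
            in_paragraph = True
        else:
            in_paragraph = False
    return wordcount, paragraphs

# io.StringIO stands in for an opened text file here.
inf = io.StringIO("one two two\nthree\n\n\nfour two\n")
wordcount, paragraphs = analyze(inf)
leftover = inf.read()        # the cursor is at the end, so this is ""
inf.seek(0)                  # rewind the cursor to the start
first_line = inf.readline()  # reading works again after the rewind
print(wordcount.most_common(1), paragraphs, repr(leftover), repr(first_line))
```

The second read returns an empty string until seek(0) is called, which is the behavior described above.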

MapReduce to count the frequency of the number consonants in words from a text file

I need a bit of help with Python code to count the frequency of consonants in a word. Consider the following sample input:
"There is no new thing under the sun."
Then the required output would be:
1 : 2
2 : 3
3 : 2
4 : 1
as there are 2 words with 1 consonant, 3 words with 2 consonants, 2 words with 3 consonants and 1 word with 4 consonants.
The following code does a similar job, but instead of consonants it counts the frequency of word lengths in the text files. I think only a small change is needed, one that loops deeper into each word.
import re
def freqCounter(file1, file2):
    freq_dict = {}
    dict_static = {2:0, 3:0, 5:0}
    # get rid of punctuation
    punctuation = re.compile(r'[.?!,"\':;]') # use re.compile() function to convert string into a RegexObject.
    try:
        with open(file1, "r") as infile, open(file2, "r") as infile2: # open two files at once
            text1 = infile.read() # read the file
            text2 = infile2.read()
            joined = " ".join((text1, text2))
            for word in joined.lower().split():
                #remove punctuation mark
                word = punctuation.sub("", word)
                #print word
                l = len(word) # assign l to be the word's length
                # if corresponding word's length not found in dict
                if l not in freq_dict:
                    freq_dict[l] = 0 # assign the dict key (the length of word) to value = 0
                freq_dict[l] += 1 # increase the count for this length by 1
    except IOError as e: # exception catch for error while reading the file
        print('Operation failed: %s' % e.strerror)
    return freq_dict # return the dictionary
Any help will be much appreciated!
I would try a simpler approach:
import re
from collections import Counter
words = 'There is no new thing under the sun.'
# Lowercase, then drop vowels and anything that is not a letter or whitespace
words = re.sub(r'[aeiou]|[^a-z\s]', '', words.lower())
# Now words have no more vowels or punctuation, i.e. only consonants
word_lengths = map(len, words.split(' '))
freq_dict = dict(Counter(word_lengths))
A simple solution
def freqCounter(_str):
    _txt=_str.split()
    freq_dict={}
    for word in _txt:
        c=0
        for letter in word.lower(): # lowercase so that capital vowels are skipped too
            if letter not in "aeiou.,:;!?[]\"`()'":
                c+=1
        freq_dict[c]=freq_dict.get(c,0)+ 1
    return freq_dict
txt = "There is no new thing under the sun."
table=freqCounter(txt)
for k in table:
    print( k, ":", table[k])
How about this?
with open('conts.txt', 'w') as fh:
    fh.write('oh my god becky look at her butt it is soooo big')
consonants = "bcdfghjklmnpqrstvwxyz"
def count_cons(_file):
    results = {}
    with open(_file, 'r') as fh:
        for line in fh:
            for word in line.split(' '):
                # number of letters in this word that are consonants
                conts = sum([1 if letter in consonants else 0 for letter in word])
                if conts in results:
                    results[conts] += 1
                else:
                    results[conts] = 1
    return results
print(count_cons('conts.txt'))
I forgot to include the results:
{1: 5, 2: 5, 3: 1, 4: 1}
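The question title mentions MapReduce, although none of the answers is written that way. As a loose, single-process illustration of the map and reduce steps (not a distributed job), the count could be phrased as follows. All names are made up for this sketch, and y is treated as a consonant, as in the answer above.

```python
import re
from collections import Counter
from functools import reduce

VOWELS = set("aeiou")

def consonant_count(word):
    """Map step: number of letters in word that are not vowels."""
    return sum(1 for ch in word.lower() if ch.isalpha() and ch not in VOWELS)

def add_count(acc, n):
    """Reduce step: fold one consonant count into the running frequency table."""
    acc[n] += 1
    return acc

def consonant_frequencies(text):
    words = re.findall(r"[A-Za-z]+", text)   # keeps letters only, drops punctuation
    return dict(reduce(add_count, map(consonant_count, words), Counter()))

result = consonant_frequencies("There is no new thing under the sun.")
print(sorted(result.items()))   # [(1, 2), (2, 3), (3, 2), (4, 1)]
```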

I have a txt file. How can I take dictionary key values and print the line of text they appear in?

I have a txt file. I have written code that finds the unique words and the number of times each word appears in that file. I now need to figure out how to print the lines that those words appear in as well. How can I go about doing this?
Here is a sample output:
Analyze what file: itsy_bitsy_spider.txt
Concordance for file itsy_bitsy_spider.txt
itsy : Total Count: 2
Line:1: The ITSY Bitsy spider crawled up the water spout
Line:4: and the ITSY Bitsy spider went up the spout again
#this function will get just the unique words without the stop words.
def openFiles(openFile):
    for i in openFile:
        i = i.strip()
        linelist.append(i)
        b = i.lower()
        thislist = b.split()
        for a in thislist:
            if a in stopwords:
                continue
            else:
                wordlist.append(a)
    #print wordlist
#this dictionary is used to count the number of times each stop
countdict = {}
def countWords(this_list):
    for word in this_list:
        depunct = word.strip(punctuation)
        if depunct in countdict:
            countdict[depunct] += 1
        else:
            countdict[depunct] = 1
from collections import defaultdict
target = 'itsy'
word_summary = defaultdict(list)
with open('itsy.txt', 'r') as f:
    lines = f.readlines()
for idx, line in enumerate(lines):
    words = [w.strip().lower() for w in line.split()]
    for word in words:
        word_summary[word].append(idx)  # idx is the 0-based line index
unique_words = len(word_summary.keys())
target_occurence = len(word_summary[target])
line_nums = sorted(set(word_summary[target]))  # sorted so the lines print in order
print("There are %s unique words." % unique_words)
print("There are %s occurrences of '%s'" % (target_occurence, target))
print("'%s' is found on lines %s" % (target, ', '.join([str(i+1) for i in line_nums])))
If you parse the input text file line by line, you can maintain another dictionary that maps each word to the list of lines it appears in, i.e. for each word in a line, you add an entry. It might look something like the following, bearing in mind that I'm not very familiar with Python, so there may be syntactic shortcuts I've missed.
For example:
from string import punctuation
countdict = {}
linedict = {}
for line in text_file:  # text_file is an already opened file object
    for word in line.split():  # split into words; iterating over line directly would yield characters
        depunct = word.strip(punctuation)
        if depunct in countdict:
            countdict[depunct] += 1
        else:
            countdict[depunct] = 1
        # add entry for word in the line dict if not there already
        if depunct not in linedict:
            linedict[depunct] = []
        # now add the word -> line entry
        linedict[depunct].append(line)
One modification you will probably need to make is to prevent duplicates being added to the linedict if a word appears twice in the line.
The above code assumes that you only want to read the text file once.
openFile = open("test.txt", "r")
words = {}
for line in openFile.readlines():
    for word in line.strip().lower().split():
        wordDict = words.setdefault(word, { 'count': 0, 'line': set() })
        wordDict['count'] += 1
        wordDict['line'].add(line)
openFile.close()
print(words)
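Putting the pieces from these answers together, a concordance in the format of the sample output could be sketched roughly as below. The function name build_concordance is hypothetical. Lines 1 and 4 of the sample are taken from the question's output, while lines 2 and 3 are placeholders invented here, because the question does not show them.

```python
from string import punctuation

def build_concordance(lines, stopwords=()):
    """Map each word to {'count': occurrences, 'lines': {line number: line text}}."""
    concordance = {}
    for line_no, line in enumerate(lines, start=1):
        text = line.strip()
        for raw in text.lower().split():
            word = raw.strip(punctuation)
            if not word or word in stopwords:
                continue
            entry = concordance.setdefault(word, {"count": 0, "lines": {}})
            entry["count"] += 1
            entry["lines"][line_no] = text   # keyed by line number, so no duplicates
    return concordance

sample = [
    "The ITSY Bitsy spider crawled up the water spout\n",
    "placeholder text for line two\n",      # hypothetical filler line
    "placeholder text for line three\n",    # hypothetical filler line
    "and the ITSY Bitsy spider went up the spout again\n",
]
concordance = build_concordance(sample, stopwords={"the", "and"})
entry = concordance["itsy"]
print("itsy : Total Count: %d" % entry["count"])
for line_no, text in sorted(entry["lines"].items()):
    print("Line:%d: %s" % (line_no, text))
```

Keying the stored lines by line number is one way to handle the duplicate problem mentioned above, where a word appears twice on the same line.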

python file reading

I have file /tmp/gs.pid with content
client01: 25778
I would like to retrieve the second word from it,
i.e. 25778.
I have tried the code below, but it didn't work.
>>> f=open ("/tmp/gs.pid","r")
>>> for line in f:
...     word=line.strip().lower()
...     print("\n -->", word)
Try this:
>>> f = open("/tmp/gs.pid", "r")
>>> for line in f:
...     word = line.strip().split()[1].lower()
...     print(" -->", word)
...
>>> f.close()
It will print the second word of every line in lowercase. split() will take your line and split it on any whitespace and return a list, then indexing with [1] will take the second element of the list and lower() will convert the result to lowercase. Note that it would make sense to check whether there are at least 2 words on the line, for example:
>>> f = open("/tmp/gs.pid", "r")
>>> for line in f:
...     words = line.strip().split()
...     if len(words) >= 2:
...         print(" -->", words[1].lower())
...     else:
...         print('Line contains fewer than 2 words.')
...
>>> f.close()
word="client01: 25778"
pid=word.split(": ")[1] # or word.split()[1] to split on whitespace instead of on ": "
If all lines are of the form abc: def, you can extract the 2nd part with
second_part = line[line.find(": ")+2:]
If not, you need to verify that line.find(": ") really returns a nonnegative number first.
with open("/tmp/gs.pid") as f:
    for line in f:
        p = line.find(": ")
        if p != -1:
            second_part = line[p+2:].strip().lower()  # strip() drops the trailing newline
            print("\n -->", second_part)
>>> open("/tmp/gs.pid").read().split()[1]
'25778'
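As one more variant, str.partition splits on the first separator and never raises, which makes the check for malformed lines short. This is a sketch only: parse_pid_line is a made-up name, and it assumes each line has the shape name: number, as in the example file.

```python
def parse_pid_line(line):
    """Return (name, pid) for a line shaped like 'client01: 25778', else None."""
    name, sep, value = line.strip().partition(":")
    value = value.strip()
    if not sep or not value.isdigit():   # no colon, or the second part is not a number
        return None
    return name.strip(), int(value)

print(parse_pid_line("client01: 25778\n"))   # ('client01', 25778)
print(parse_pid_line("no separator here"))   # None
```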
