python file reading - python

I have a file /tmp/gs.pid with the content
client01: 25778
I would like to retrieve the second word from it,
i.e. 25778.
I have tried the code below, but it didn't work.
>>> f = open("/tmp/gs.pid", "r")
>>> for line in f:
...     word = line.strip().lower()
...     print "\n -->", word

Try this:
>>> f = open("/tmp/gs.pid", "r")
>>> for line in f:
...     word = line.strip().split()[1].lower()
...     print " -->", word
>>> f.close()
This prints the second word of every line in lowercase. split() splits the line on any whitespace and returns a list, [1] takes the second element of that list, and lower() converts the result to lowercase. It makes sense to check first that the line has at least two words, for example:
>>> f = open("/tmp/gs.pid", "r")
>>> for line in f:
...     words = line.strip().split()
...     if len(words) >= 2:
...         print " -->", words[1].lower()
...     else:
...         print 'Line contains fewer than 2 words.'
>>> f.close()

word = "client01: 25778"
pid = word.split(": ")[1]  # or word.split()[1] to split on whitespace instead

If all lines are of the form abc: def, you can extract the 2nd part with
second_part = line[line.find(": ")+2:]
If not, you need to verify that line.find(": ") really returns a nonnegative number first:
with open("/tmp/gs.pid") as f:
    for line in f:
        p = line.find(": ")
        if p != -1:
            # strip() drops the trailing newline
            second_part = line[p+2:].strip().lower()
            print "\n -->", second_part

>>> open("/tmp/gs.pid").read().split()[1]
'25778'
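If each line is known to hold a single `name: pid` entry, the approaches above can be folded into one small helper. This is only a sketch, not something taken from the answers: the names `parse_pid` and `read_pid` are hypothetical, and it assumes the value after the colon should be an integer.

```python
def parse_pid(line):
    """Return the integer after the first ':' in a 'name: pid' line, or None."""
    # partition() never raises: if ':' is missing, sep is the empty string.
    name, sep, value = line.partition(":")
    if not sep:
        return None
    try:
        # int() tolerates surrounding whitespace, but strip() makes that explicit.
        return int(value.strip())
    except ValueError:
        return None


def read_pid(path):
    """Return the first PID found in the file at `path`, or None (hypothetical helper)."""
    with open(path) as f:
        for line in f:
            pid = parse_pid(line)
            if pid is not None:
                return pid
    return None
```

Returning None instead of raising keeps the caller's check simple when the file is empty or malformed.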

Related

Reading whitespaces inside of a list of strings

I'm having a problem trying to count whitespaces in a list in python.
Here's my code
Data = ''
index = 0
num_words = 0
# Open a file for reading.
infile = open('article.txt', 'r')
# Read the contents of the file into a list.
data = infile.readlines()
# Strip the \n from each element.
while index < len(data):
    data[index] = data[index].rstrip('\n')
    index += 1
for ch in data:
    if ch.isspace():
        num_words += 1
# Close the file.
infile.close()
# Print the contents of the list.
print(num_words)
The contents of the article.txt is just a list of sentences so the list is just a list of strings such as:
data = ['this is sentence one.', 'this is sentence two.' , 'this is sentence three.' , 'this is sentence four.' , 'this is sentence five.' , 'this is sentence six.' ]
I think I know what the problem is because I did:
print(ch.isspace())
which results in 'False' getting printed 6 times. I think this is because the for loop checks whether the whole string is whitespace rather than checking for spaces inside the string.
I know I could just do:
data = infile.read()
But I need each line in a list. Is there anything I can change so the for loop searches for spaces in each string in the list or am I out of luck?
Python has a handy method for that on strings, called str.split. When passed no arguments, it will split on whitespace. If you count the items in the resulting list, you will have the number of words.
Handles multiple spaces:
>>> line = "this  is   some string."
>>> len(line.split())
4
Handles empty lines:
>>> line = "   "
>>> len(line.split())
0
Handles extra space before and after:
>>> line = "   space before and after.   "
>>> len(line.split())
4
Here is some sample code:
lines = 0
words = 0
with open('yourfile', 'rt') as yourfile:
    for line in yourfile:
        lines += 1
        words += len(line.split())
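Since the question asks to keep each line in a list, the same idea can be applied to the list directly. The sketch below is one way to do that; `count_words` and `count_spaces` are hypothetical names, and the sample data is shortened from the question.

```python
def count_words(lines):
    """Return (per_line_counts, total) for an iterable of strings."""
    # split() with no argument collapses runs of whitespace and ignores
    # leading/trailing whitespace, so empty lines count as 0 words.
    per_line = [len(line.split()) for line in lines]
    return per_line, sum(per_line)


def count_spaces(lines):
    """Return the total number of whitespace characters across all strings."""
    # isspace() is applied per character here, not to the whole string.
    return sum(1 for line in lines for ch in line if ch.isspace())


data = ['this is sentence one.', 'this is sentence two.']
per_line, total = count_words(data)
print(per_line, total)
print(count_spaces(data))
```

The question's loop called isspace() on each whole sentence; iterating over the characters of each string is what makes the whitespace check meaningful.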

Reading a text file and counting how many times a word is repeated using .split(), ignoring case

I am getting the desired output so far.
The program prompts the user to search for a word.
The user enters it, and the program reads the file and gives the output:
'ashwin: 2'
Now I want it to ignore case. For example, "Ashwin" and "ashwin" should both return 2, as the text file contains two occurrences of ashwin.
def word_count():
    file = "test.txt"
    word = input("Enter word to be searched:")
    k = 0
    with open(file, 'r') as f:
        for line in f:
            words = line.split()
            for i in words:
                if i == word:
                    k = k + 1
    print(word + ": " + str(k))
word_count()
You could use lower() on both strings in the comparison, i.e. if i.lower() == word.lower():
For example:
def word_count():
    file = "test.txt"
    word = input("Enter word to be searched:")
    k = 0
    with open(file, 'r') as f:
        for line in f:
            words = line.split()
            for i in words:
                if i.lower() == word.lower():
                    k = k + 1
    print(word + ": " + str(k))
word_count()
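You can either call .lower() on both the line and the word to eliminate case, or use the built-in re module:
len(re.findall(word, text, flags=re.IGNORECASE))
Note that this treats word as a regular expression and also counts matches inside longer words.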
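If the regular-expression route is taken, the search word usually needs escaping and a whole-word constraint. The following is a sketch under those assumptions; `count_word` is a hypothetical name, and the lookarounds are one possible way to express "not part of a longer word".

```python
import re


def count_word(text, word):
    """Count case-insensitive whole-word occurrences of `word` in `text`."""
    # re.escape() keeps characters such as '.' or '+' in `word` literal.
    # The lookarounds reject matches that are glued to another word
    # character, so "ashwin" does not match inside "Ashwini".
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


print(count_word("Ashwin met ashwin. Ashwini left.", "ashwin"))
```

Unlike the split()-based loop, this also counts a word that is followed by punctuation, such as "ashwin." at the end of a sentence.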
Use the Counter class from collections. It returns a dictionary-like object of word counts, so each lookup takes O(1) time on average.
from collections import Counter
def word_count():
    file = "test.txt"
    with open(file, 'r') as f:
        # split() already treats newlines as whitespace
        words = f.read().lower().split()
    count = Counter(words)
    word = input("Enter word to be searched:")
    # Indexing a Counter returns 0 for a missing word
    print(word, ":", count[word.lower()])
word_count()

Special characters don't display correctly when splitting

When I'm reading a line in a text file, like this one below :
présenté alloué ééé ààà tué
And try to print it in the terminal, it displays correctly. But when I apply a split with a space as separator, it displays this :
['pr\xc3\xa9sent\xc3\xa9', 'allou\xc3\xa9', '\xc3\xa9\xc3\xa9\xc3\xa9', '\xc3\xa0\xc3\xa0\xc3\xa0', 'tu\xc3\xa9\n']
I just use this to read the text file :
f = open("test.txt")
l = f.readline()
f.close()
print l.split(" ")
Can someone help me ?
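Printing a list is not the same as printing its elements. In Python 2, print on a list shows the repr() of each string, which escapes non-ASCII bytes, whereas printing a string directly writes its bytes to the terminal: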
s = "présenté alloué ééé ààà tué"
print s.split(" ")
for x in s.split(" "):
    print x
Output:
['pr\xc3\xa9sent\xc3\xa9', 'allou\xc3\xa9', '\xc3\xa9\xc3\xa9\xc3\xa9', '\xc3\xa0\xc3\xa0\xc3\xa0', 'tu\xc3\xa9']
présenté
alloué
ééé
ààà
tué
Python 3.* solution:
All you have to do is specify the encoding you wish to use:
f = open("test.txt", encoding='utf-8')
l = f.readline()
f.close()
print(l.split())  # split() with no argument also drops the trailing newline
And you'll get
['présenté', 'alloué', 'ééé', 'ààà', 'tué']
Python 2.* solution:
import codecs
f = codecs.open("test.txt", mode='r', encoding='utf-8')
l = f.read()
f.close()
for word in l.split():
    print(word)
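A variant that behaves the same on Python 2.6+ and Python 3 is io.open, which accepts an encoding argument on both. This is a minimal sketch rather than part of the original answers; the helper name `read_words` is hypothetical.

```python
import io


def read_words(path, encoding="utf-8"):
    """Return the whitespace-separated words on the first line of `path`."""
    # io.open decodes bytes to text while reading, so each word is a
    # text string rather than a sequence of raw UTF-8 bytes.
    with io.open(path, encoding=encoding) as f:
        return f.readline().split()
```

Calling `read_words("test.txt")` and printing the words one by one should then display the accented characters, provided the terminal itself can render them.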

How to use string methods on text files?

I have to write a program where I need to find
the number of uppercase letters
the number of lowercase letters
the number of digits
the number of whitespace characters
in a text file and my current code is
def lowercase(line_list):
    print("Lower case Letters: ", sum(1 for x in line_list if x.islower))
def uppercase(line_list):
    print("Upper case Letters: ", sum(1 for c in line_list if c.isupper())
def numbers(line_list):
    print("Numbers: ", sum(1 for b in line_list if b.isdigit())
def whitespace(line_list):
    print("Spaces: ", sum(1 for y in line_list if y.isspace())
def main():
    in_file = open("text.txt", "r")
    line = in_file.readline()
    line_list = line.split()
    lowercase(line_list)
    uppercase(line_list)
    numbers(line_list)
    whitespace(line_list)
    in_file.close()
main()
However whenever I try to run the script it gives me a syntax error. Is there something I am doing wrong?
Right now, the syntax errors come from the missing closing parentheses on the print calls in uppercase, numbers, and whitespace. Your lowercase function has a separate bug: islower is missing the parentheses for the method call, so the condition tests the method object itself, which is always true. Your main function also has some problems. You are only reading one line of the file, and you are splitting that line (split splits on whitespace by default, so you lose the spaces you are trying to count). If you want to read the whole file, not just one line, try this:
def main():
    lower_case = 0
    upper_case = 0
    numbers = 0
    whitespace = 0
    with open("text.txt", "r") as in_file:
        # Iterate over the characters of every line, without splitting
        for line in in_file:
            lower_case += sum(1 for x in line if x.islower())
            upper_case += sum(1 for x in line if x.isupper())
            numbers += sum(1 for x in line if x.isdigit())
            whitespace += sum(1 for x in line if x.isspace())
    print('Lower case Letters: %s' % lower_case)
    print('Upper case Letters: %s' % upper_case)
    print('Numbers: %s' % numbers)
    print('Spaces: %s' % whitespace)
main()
Here is the code with the syntax errors resolved.
You were missing closing parentheses in several places.
def lowercase(line_list):
    # islower must be called, not just referenced
    print("Lower case Letters: ", sum(1 for x in line_list if x.islower()))
def uppercase(line_list):
    print("Upper case Letters: ", sum(1 for c in line_list if c.isupper()))
def numbers(line_list):
    print("Numbers: ", sum(1 for b in line_list if b.isdigit()))
def whitespace(line_list):
    print("Spaces: ", sum(1 for y in line_list if y.isspace()))
def main():
    in_file = open("text.txt", "r")
    line = in_file.readline()
    line_list = line.split()
    lowercase(line_list)
    uppercase(line_list)
    numbers(line_list)
    whitespace(line_list)
    in_file.close()
main()
Note: this only fixes the syntax errors you ran into. Logic errors may remain, so check for those as well.
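For the task as stated in the question (counts over the whole file), one option is to classify every character in a single pass over the text. The sketch below assumes that approach; the names `count_character_classes` and `count_file` are hypothetical.

```python
def count_character_classes(text):
    """Return counts of lowercase, uppercase, digit and whitespace characters."""
    counts = {"lower": 0, "upper": 0, "digits": 0, "whitespace": 0}
    # Iterate over characters, not words, so whitespace is not lost.
    for ch in text:
        if ch.islower():
            counts["lower"] += 1
        elif ch.isupper():
            counts["upper"] += 1
        elif ch.isdigit():
            counts["digits"] += 1
        elif ch.isspace():
            counts["whitespace"] += 1
    return counts


def count_file(path):
    """Apply count_character_classes to the full contents of a file."""
    with open(path) as f:
        return count_character_classes(f.read())
```

Keeping the counting separate from the file handling makes the logic easy to check on a short string before pointing it at text.txt. Newlines are included in the whitespace count.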

How to know a position (.txt)

I wonder how to find a word's position inside a .txt file while reading it.
This is my txt:
cat dog monkey bird
This is the output I want:
Word: cat Position: line 1 , word 1 (1,1)
Any ideas?
foo.txt:
asd
asd
asd
ad
I put returns between .......
asd
sad
asd
code:
>>> def position(file, word):
...     for i, line in enumerate(file):  # i = zero-based line number, line = text
...         s = line.find(word)          # character offset of word, or -1
...         if s != -1:                  # word found
...             return i, s              # line number and offset within the line
...
>>> position(open("foo.txt"),"put")
(4, 2) # (line,position)
This would work for this given file:
blah bloo cake
donky cat sparrow
nago cheese
The code:
word = "sparrow"
lcount = 1
with open("file", "r") as f:
    for line in f:
        testline = line.split()
        if word in testline:  # whole-word match, so index() cannot fail
            ind = testline.index(word)
            print("Word %s found at line %d, word %d" % (word, lcount, ind+1))
            break
        else:
            lcount += 1
Would print:
Word sparrow found at line 2, word 3
You should be able to modify this quite easily to make a function or different output I hope.
Although I'm still really not sure if this is what you're after...
Minor edit:
As a function:
def findword(objf, word):
    lcount = 1
    found = False
    with open(objf, "r") as f:
        for line in f:
            testline = line.split()
            if word in testline:  # whole-word match on this line
                ind = testline.index(word)  # zero-based index of the word
                found = True
                break
            else:
                lcount += 1
    if found:
        print("Word %s found at line %d, word %d" % (word, lcount, ind+1))
    else:
        print("Not found")
Use:
>>> findword('file', "sparrow")
Word sparrow found at line 2, word 3
>>> findword('file', "donkey")
Not found
>>>
Not the best method, I'll give it that, but then again it works.
Basic idea
Open the file
Iterate over the lines
For every line read, increment some counter, e.g. line_no += 1;
Split the line by whitespace (you will get a list)
Check if the list contains the word (use in), then use list.index(word) to get the index, store that index in some variable word_no = list.index(word)
print line_no and word_no if the word was found
There are a lot better solutions out there (and more pythonic ones) but this gives you an idea.
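The steps above can be sketched as a short function. This is one possible reading of them, not the answerer's own code: `find_word` is a hypothetical name, and it takes any iterable of lines (an open file works) so it can be tried without a file on disk.

```python
def find_word(lines, word):
    """Return (line_no, word_no), both 1-based, for the first match, or None."""
    # enumerate(..., start=1) plays the role of the line counter.
    for line_no, line in enumerate(lines, start=1):
        words = line.split()  # split the line on whitespace
        if word in words:     # whole-word membership test
            return line_no, words.index(word) + 1
    return None


sample = ["blah bloo cake\n", "donky cat sparrow\n", "nago cheese\n"]
print(find_word(sample, "sparrow"))
```

With a real file, the call would look like `with open("file") as f: find_word(f, "sparrow")`.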
