How to find a string in a text file, Python

How do I get the number of times a certain two-character string (e.g. 'hi') appears in a text file?
And how do I print the total as an int?
I tried doing this:
for line in open('test.txt'):
    ly = line.split()
    for i in ly:
        a = i.count('ly')
print(sum(a))
But it failed. Thanks in advance!

Your program fails because your variable a is an integer, and sum() expects an iterable (such as a list), so sum(a) raises TypeError: 'int' object is not iterable.
Several examples have already been presented. Here is mine:
with open("test.txt") as fp:
    a = fp.read().count('ly')
print(a)

You can simply count 'ly' on each line and add up the results:
print(sum(line.count('ly') for line in open('test.txt')))
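A minimal sketch of the per-line approach as a reusable function, assuming the input is any iterable of lines (an open file or, for testing, io.StringIO). The names count_substring and count_overlapping are hypothetical. Note that str.count counts non-overlapping occurrences, so "aaa".count("aa") is 1; if overlapping matches matter for your two characters, a zero-width regex lookahead counts every starting position instead.

```python
import io
import re
from typing import Iterable


def count_substring(lines: Iterable[str], sub: str) -> int:
    """Total non-overlapping occurrences of sub across all lines."""
    return sum(line.count(sub) for line in lines)


def count_overlapping(lines: Iterable[str], sub: str) -> int:
    """Total occurrences of sub, counting overlaps (e.g. 'aa' twice in 'aaa')."""
    # (?=...) matches at every position where sub starts, without consuming text
    pattern = re.compile("(?=" + re.escape(sub) + ")")
    return sum(len(pattern.findall(line)) for line in lines)


if __name__ == "__main__":
    sample = io.StringIO("hi there\nthis hi\n")  # stands in for open('test.txt')
    print(count_substring(sample, "hi"))  # 3
    print(count_overlapping(["aaa"], "aa"))  # 2
```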

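Different approach (note that this counts whole words equal to 'hi', not occurrences of 'hi' inside other words):
from collections import Counter
with open('test.txt') as f:
    text = f.read()
word_count = Counter(text.split())
print(word_count['hi'])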
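To make the difference between whole-word and substring counting concrete, here is a small comparison on an in-memory string (the sample text is made up for illustration). Counter(text.split()) only counts tokens that are exactly 'hi', while text.count('hi') also counts 'hi' inside words such as 'this' or 'shiny'.

```python
import string
from collections import Counter

text = "hi this is a hi, shiny day"  # hypothetical sample text

whole_words = Counter(text.split())
print(whole_words['hi'])  # 1: 'hi,' is a different token

# Stripping surrounding punctuation first makes 'hi,' count as 'hi'
cleaned = Counter(w.strip(string.punctuation) for w in text.split())
print(cleaned['hi'])  # 2

print(text.count('hi'))  # 4: also matches inside 'this' and 'shiny'
```

Which one is right depends on whether "the number of times 'hi' is used" means the word or the character pair.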

total = 0
for line in open('test.txt'):
    ly = line.split()
    alist = [i.count('hi') for i in ly]
    total += sum(alist)
print(total)

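You can try something like this (it counts the words that contain the string, so a word containing it twice is counted only once):
a = 0
for line in open('test.txt'):
    ly = line.split()
    for i in ly:
        if 'word' in i:
            a = a + 1
print(a)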


How to put a group of integers in a row in a text file into a list?

I have a text file composed mostly of numbers something like this:
3 011236547892X
9 02321489764 Q
4 031246547873B
I would like to extract characters 5 to 14 (counting from zero) of each line into a list:
1236547892
321489764
1246547873
(Please note: each number is 10 characters long; the second one ends with a space.)
I then want to perform analysis on the contents of each list.
I have tried umpteen versions, but I think I am closest with:
with open('k_d_m.txt') as f:
    for line in f:
        range = line.split()
        num_lst = [x for x in range(3,10)]
        print(num_lst)
However, I get: TypeError: 'list' object is not callable
What is the best way forward?
What I want to do with num_lst is, amongst other things, as follows:
num_lst = list(map(int, str(num)))
print(num_lst)
nth = 2
odd_total = sum(num_lst[0::nth])
even_total = sum(num_lst[1::nth])
print(odd_total)
print(even_total)
if odd_total - even_total == 0 or odd_total - even_total == 11:
    print("The number is ok")
else:
    print("The number is not ok")
Use a simple slice:
with open('k_d_m.txt') as f:
    num_lst = [x[5:15] for x in f]
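The same slice can be wrapped in a small function with the column bounds as parameters (5 and 15 by default, matching "characters 5 to 14"). This is a sketch with the hypothetical name extract_field; it removes the trailing newline first, which matters for any line shorter than 15 characters, where the slice would otherwise pick up '\n'. The padding space in the second row is kept, since the question treats it as part of the 10-character field.

```python
import io
from typing import Iterable, List


def extract_field(lines: Iterable[str], start: int = 5, stop: int = 15) -> List[str]:
    """Return line[start:stop] for every line, ignoring the newline."""
    return [line.rstrip("\n")[start:stop] for line in lines]


# io.StringIO stands in for open('k_d_m.txt'); the sample lines are made up
sample = io.StringIO("abcde0123456789xyz\nvwxyz987654321 \n")
print(extract_field(sample))  # ['0123456789', '987654321 ']
```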
In response to a comment: to turn each line's number into a list of its individual characters, use:
with open('k_d_m.txt') as f:
    for line in f:
        num_lst = list(line[5:15])
        print(num_lst)
First of all, you shouldn't name your variable range, because that shadows the built-in range() function; that is why range(3,10) raises TypeError: 'list' object is not callable. You can easily get characters 5 to 14 of a string using string[5:15]. Try this:
num_lst = []
with open('k_d_m.txt') as f:
    for line in f:
        num_lst.append(line[5:15])
print(num_lst)
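If the odd/even digit sums in the question are meant as the divisibility-by-11 rule (an assumption; the question doesn't say so), note that the rule accepts any difference that is a multiple of 11 (22, -11 and so on), not only 0 and 11. A minimal sketch with the hypothetical helper name passes_check, which also strips the trailing space from a field such as '321489764 ' before converting its characters to digits:

```python
def passes_check(field: str) -> bool:
    """Return True if the alternating digit sums differ by a multiple of 11.

    Hypothetical helper: assumes field holds only digits plus optional
    surrounding whitespace (e.g. the trailing space in '321489764 ').
    """
    digits = [int(ch) for ch in field.strip()]
    odd_total = sum(digits[0::2])   # 1st, 3rd, 5th ... digits
    even_total = sum(digits[1::2])  # 2nd, 4th, 6th ... digits
    # Python's % returns a non-negative result for a positive modulus,
    # so negative differences such as -11 are handled too
    return (odd_total - even_total) % 11 == 0


for field in ["1236547892", "321489764 ", "918082"]:
    status = "ok" if passes_check(field) else "not ok"
    print(f"{field.strip()}: the number is {status}")
```

For 918082 the difference is 22 (25 - 3), so the original "== 0 or == 11" test would reject a number that is in fact 11 * 83462.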

Python: How to split a dictionary value into a float number

My text file looks like so: comp_339,9.93/10
I want to extract just the rating as a float value. I couldn't solve this with an 'if' statement.
Is there any simple solution for this?
d2 = {}
with open('ra.txt', 'r') as r:
    for line in r:
        s = line.strip().split(',')
        d2[s[0]] = "".join(s[1:])
print(d2)
You can do it like this:
with open('ra.txt', 'r') as r:
    for line in r:
        s = line.strip().split(',')
        rating, _ = s[1].split("/", 1)
        print(float(rating))
First you split the line string into "comp_339" and "9.93/10", then you keep the latter, split it again on "/", keep the first part and convert it to float.
line = "comp_339,9.93/10"
s = float(line.split(',')[1].split('/')[0])
print(s)  # 9.93
You may use a simple regular expression:
import re
line = "comp_339,9.93/10"
before, rating, highscore = re.split(r'[/,]+', line)
print(float(rating))
Which yields
9.93
You could also try this code, which builds a dictionary mapping each name to its rating:
ratings = {}
with open('ra.txt', 'r') as f:
    for l in f:
        if l.strip():  # skip blank lines
            c, r = l.split(',')
            ratings[c.strip()] = float(r.strip().split('/')[0].strip())
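A compact variant of the same idea as a function, sketched with the hypothetical name parse_ratings and tested on an in-memory file. It uses str.partition, which splits at the first separator and never raises when the separator is missing; a malformed line without "," would leave an empty string, and float('') then raises ValueError, which is a reasonable signal for bad input.

```python
import io
from typing import Dict, Iterable


def parse_ratings(lines: Iterable[str]) -> Dict[str, float]:
    """Map each name to the numerator of its 'x/y' rating, as a float."""
    ratings = {}
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        name, _, score = line.partition(",")    # 'comp_339', '9.93/10'
        numerator, _, _ = score.partition("/")  # '9.93'
        ratings[name.strip()] = float(numerator)  # float() ignores surrounding spaces
    return ratings


# io.StringIO stands in for open('ra.txt'); the second entry is made up
sample = io.StringIO("comp_339,9.93/10\n\ncomp_12, 7.5 / 10\n")
print(parse_ratings(sample))  # {'comp_339': 9.93, 'comp_12': 7.5}
```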

Count frequency of words in a file using python

I have a file containing a paragraph, and I just want to count the frequency of each word. I have tried it in the following way, but I am not getting any output. Can anyone please help me?
dic = {}
with open("C:\\Users\\vWX442280\\Desktop\\f1.txt", 'r') as f:
    for line in f:
        l1 = line.split(" ")
        for w in l1:
            dic[w] = dic.get(w, 0) + 1
print('\n'.join(['%s,%s' % (k, v) for k, v in dic.items()]))
I am getting output like this.
Python,2
is,3
good,1
helps,1
in,2
machine,2
learning,1
learning,1
goos,1
python,1
famous,1
kill,1
the,1
machine,1
it,1
a,1
good,1
day,1
A pure Python way without importing any libraries. More code, but I wanted to write some bad code today (:
file = open('path/to/file.txt', 'r')
content = ' '.join(line for line in file.read().splitlines())
content = content.split(' ')
freqs = {}
for word in content:
    if word not in freqs:
        freqs[word] = 1
    else:
        freqs[word] += 1
file.close()
This uses a Python dictionary to store the words and the number of times they appear.
I know it's better to use with open(blah) as b: but this is just to get the idea across. ¯\_(ツ)_/¯
From your code, I spotted the following issues:
for s in l: l is a line of text, so this loop iterates over each character, not each word.
The f.split('\n') expression raises an error because f is a file object, and it does not have a .split() method; strings do.
With that in mind, here is a rewrite of your code that works:
dic = {}
with open("f1.txt", 'r') as f:
    for l in f:
        for w in l.split():
            dic[w] = dic.get(w, 0) + 1
print('\n'.join(['%s,%s' % (k, v) for k, v in dic.items()]))
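If the goal is counts that ignore case and surrounding punctuation (an assumption, prompted by 'Python' and 'python' appearing as separate entries in the output shown in the question), collections.Counter can do the tallying. A sketch with the hypothetical name word_frequencies, tested on an in-memory file:

```python
import io
import string
from collections import Counter
from typing import Iterable


def word_frequencies(lines: Iterable[str]) -> Counter:
    """Count words case-insensitively, ignoring surrounding punctuation."""
    counts = Counter()
    for line in lines:
        for word in line.split():  # split() also drops the trailing '\n'
            word = word.strip(string.punctuation).lower()
            if word:  # skip tokens that were pure punctuation
                counts[word] += 1
    return counts


# io.StringIO stands in for open("f1.txt"); the text is made up
sample = io.StringIO("Python is good.\npython is famous\n")
for word, count in word_frequencies(sample).most_common():
    print(f"{word},{count}")
```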
You can use the str.count method to count a single word:
mystring = "hello hello hello"
mystring.count("hello") # 3
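Keep in mind that str.count matches substrings, not whole words, so it can overcount; counting in the list returned by split() only matches exact tokens (the sample string is made up):

```python
mystring = "hello othello"
print(mystring.count("hello"))          # 2: also matches inside 'othello'
print(mystring.split().count("hello"))  # 1: exact whole-word matches only
```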

How to extract numbers from a text file and multiply them together?

I have a text file which contains 800 words, each with a number next to it. (Each word and its number are on their own line, so the file has 800 lines.) I have to find the numbers and then multiply them together. Because multiplying many small floats underflows to zero, I have to use logarithms to prevent the underflow, but I don't know how.
This is the formula:
$$c_{NB} = \arg\max_{c} \left[ \log P(c) + \log P(x \mid c) \right]$$
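Since log(a * b) = log(a) + log(b), the product can be replaced by a sum of logarithms, which stays well inside the floating-point range. A short demonstration with made-up probabilities (800 values of 0.001, roughly the magnitude of the numbers in the question's file):

```python
import math

probs = [0.001] * 800  # hypothetical: 800 small probabilities

product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0: the true value, about 1e-2400, is below the smallest float

log_sum = sum(math.log(p) for p in probs)
print(log_sum)  # about -5526.2, easily representable
```

Because log is monotonic, comparing these log sums between classes picks the same argmax as comparing the (unrepresentable) products.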
This code doesn't print anything:
output = []
with open('c:/python34/probEjtema.txt', encoding="utf-8") as f:
    w, h = map(int, f.readline().split())
    tmp = []
    for i, line in enumerate(f):
        if i == h:
            break
        tmp.append(map(int, line.split()[:w]))
    output.append(tmp)
print(output)
The file is in Persian.
A snippet of the file:
فعالان 0.0019398642095053346
محترم 0.03200775945683802
اعتباري 0.002909796314258002
مجموع 0.0038797284190106693
حل 0.016488845780795344
مشابه 0.004849660523763337
مشاوران 0.027158098933074686
مواد 0.005819592628516004
معادل 0.002909796314258002
ولي 0.005819592628516004
ميزان 0.026188166828322017
دبير 0.0019398642095053346
دعوت 0.007759456838021339
اميد 0.002909796314258002
You can use regular expressions to find the first number in each line, e.g.
import re
output = []
with open('c:/python34/probEjtema.txt', encoding="utf-8") as f:
    for line in f:
        match = re.search(r'\d+\.?\d*', line)
        if match:
            output.append(float(match.group()))
print(output)
re.search(r'\d+\.?\d*', line) looks for the first number (an integer, or a float with a decimal point) in each line.
Here is a nice online regex tester for debugging and testing: https://regex101.com/
Edit: changed the regex to \d+\.?\d* to catch both integers and floats.
If I understood you correctly, you could do something along the lines of:
result = 1
with open('c:/python34/probEjtema.txt', encoding="utf-8") as f:
    for line in f:
        word, number = line.split()  # line.split("\t") if the numbers are separated by tabs
        result = result * float(number)  # note: a long product of small numbers can underflow to 0.0
print(result)
This will create an output list with all the numbers; result will hold the final product, and eres the sum of the base-10 logarithms:
import math
output = []
result = 1
eres = 0
with open('c:/python34/probEjtema.txt', encoding="utf-8") as f:
    for line in f:
        number = line.split()[1]
        output.append(number)
        result *= float(number)
        eres += math.log10(float(number))  # result in log base 10
print(output)
print(result)
print(eres)
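Putting the pieces together, a minimal sketch that reads the "word number" lines and returns the sum of the natural logarithms, i.e. log P(x | c) if (as the formula suggests) each number is a per-word probability for one class. The name log_score and the optional prior argument (for the log P(c) term) are hypothetical, and the sketch assumes the probability is the last field on each line.

```python
import io
import math
from typing import Iterable, Optional


def log_score(lines: Iterable[str], prior: Optional[float] = None) -> float:
    """Sum of log(probability) over 'word number' lines, plus log(prior) if given.

    Hypothetical helper: assumes the probability is the last field on each line.
    """
    total = math.log(prior) if prior is not None else 0.0
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        total += math.log(float(fields[-1]))
    return total


# io.StringIO stands in for the real file; both lines are copied from the snippet
sample = io.StringIO("محترم 0.03200775945683802\nحل 0.016488845780795344\n")
print(log_score(sample, prior=0.5))
```

With one such score per class in a dict, the argmax in the formula is max(scores, key=scores.get).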

Why do two similar pieces of code give different results, and what are other approaches to this task?

The task is:
def sum_numbers_in_file(filename):
    """
    Return the sum of the numbers in the given file (which only contains
    integers separated by whitespace).
    >>> sum_numbers_in_file("numbers.txt")
    19138
    """
This is my first version:
def sum_numbers_in_file(filename):
    rtotal = 0
    myfile = open(filename, "r")
    num = myfile.readline()
    num_list = []
    while num:
        number_line = ""
        number_line += (num[:-1])
        num_list.append(number_line.split(" "))
        num = myfile.readline()
    for item in num_list:
        for item2 in item:
            if item2 != '':
                rtotal += int(item2)
    return rtotal
This is my second version:
def sum_numbers_in_file(filename):
    f = open(filename)
    m = f.readline()
    n = sum([sum([int(x) for x in line.split()]) for line in f])
    f.close()
    return n
However, the first one returns 19138 and the second one returns 18138.
numbers.txt contains the following:
1000
15000
2000
1138
Because m = f.readline() already reads one line from f, the sum is computed over the remaining lines only. If you delete that statement, the two outputs will be the same. (I think :))
I'd say that m = f.readline() in the second snippet skips the first line (which contains 1000), that's why you get a wrong result.
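The effect is easy to reproduce with an in-memory file: after readline() the file iterator resumes at the second line, so the generator only sees 15000, 2000 and 1138.

```python
import io

f = io.StringIO("1000\n15000\n2000\n1138\n")  # same contents as numbers.txt
m = f.readline()  # consumes '1000\n'
n = sum(sum(int(x) for x in line.split()) for line in f)
print(repr(m), n)  # '1000\n' 18138
```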
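As requested, another approach to the question:
import re
def sum_numbers_in_file(filename):
    # Don't name this function sum: it would shadow the built-in sum() used below
    with open(filename) as f:
        return sum(int(x.group()) for x in re.finditer(r'\d+', f.read()))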
As the other answers say, you are skipping the first line because of f.readline(). But a shorter approach (assuming one number per line) would be:
n = sum(int(line.strip()) for line in open("numbers.txt") if line.strip().isnumeric())
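For comparison, a sketch that follows the docstring literally (any number of whitespace-separated integers per line, no header line to skip) and also accepts negative numbers, which the \d+ regex approach would not. Splitting the whole file with split() means blank lines and trailing whitespace need no special handling. The helper name sum_numbers is hypothetical; it takes an open file so it can be tested with io.StringIO.

```python
import io
from typing import TextIO


def sum_numbers(f: TextIO) -> int:
    """Sum every whitespace-separated integer in an open text file."""
    # split() with no argument splits on any run of whitespace, newlines included
    return sum(int(token) for token in f.read().split())


def sum_numbers_in_file(filename: str) -> int:
    """Return the sum of the numbers in the given file."""
    with open(filename) as f:
        return sum_numbers(f)


print(sum_numbers(io.StringIO("1000\n15000\n2000\n1138\n")))  # 19138
```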
