Counting a string within a file in python


I have 10 files, each containing 100 random numbers, named randomnumbers(1-10).py. I want to write a program that says "congratulations" when the string 123 is found and also counts the number of times 123 shows up. I have the "congratulations" part working, and I have written code for the counting part, but I always get zero as a result. What's wrong?
for j in range(0,10):
    n = './randomnumbers' + str(j) + '.py'
    s='congradulations'
    z='123'
    def replacemachine(n, z, s):
        file = open(n, 'r')
        text=file.read()
        file.close()
        file = open(n, 'w')
        file.write(text.replace(z, s))
        file.close()
        print "complete"
    replacemachine(n, z, s)
    count = 0
    if 'z' in n:
        count = count + 1
    else:
        pass
    print count

if 'z' in n tests whether the literal string 'z' is in the filename n, not whether 123 is in the file. Since you only open the file within replacemachine, you can't access the file contents from outside it.
The best solution would be to count the occurrences from within replacemachine:
def replacemachine(n, z, s):
    file = open(n, 'r')
    text=file.read()
    file.close()
    if '123' in text:
        print 'number of 123:', text.count('123')
    file = open(n, 'w')
    file.write(text.replace(z, s))
    file.close()
    print "complete"
Then you don't need the code that comes after replacemachine(n, z, s).
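For readers on Python 3, the same idea can be sketched as below. The function name count_and_replace, its return value and the throwaway demo file are assumptions made for illustration, not part of the original answer; the point is only that the count is taken from the file contents before the file is overwritten.

```python
import os
import tempfile


def count_and_replace(path, needle, replacement):
    """Replace every `needle` in the file at `path`; return how many were found."""
    with open(path, "r") as f:
        text = f.read()
    found = text.count(needle)  # non-overlapping occurrences
    if found:
        with open(path, "w") as f:
            f.write(text.replace(needle, replacement))
    return found


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write("9912300123\n4561237\n")
demo_path = tmp.name

demo_count = count_and_replace(demo_path, "123", "congratulations")
with open(demo_path) as f:
    demo_text = f.read()
os.remove(demo_path)
print("number of 123:", demo_count)
```

Returning the count lets the calling loop add up a total across all ten files instead of printing inside the function.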

Consider:
some_file_as_string = """\
184312345294839485949182
57485348595848512493958123
5948395849258574827384123
8594857241239584958312"""
num_found = some_file_as_string.count('123')
if num_found > 0:
    print('num found: {}'.format(num_found))
else:
    print('no matches found')
Testing '123' in some_file_as_string first is a little wasteful: when a match exists, the string gets searched a second time to count it. You might as well count straight away and act when the count is greater than 0.
You also have this:
if 'z' in n:
    count = count + 1
else:
    pass
print count
This asks whether the literal string 'z' is present. You should be checking for the variable z instead (without the quotes), and against the file's contents rather than the filename n.

Related

My code doesn't work for numbers with 2 or more digits and negative numbers

Basically, this is my task. Extract numbers from a text file and then calculate the sum of them.
My code runs, but it doesn't work with numbers of 2 or more digits or with negative numbers. What should I do?
f = open('file6.txt', 'r')
suma = 0
file = f.readlines()
for line in file:
    for i in line:
        if i.isdigit() == True:
            suma += int(i)
print("The sum is ", suma)
file6.txt:
1
10
Output:
The sum is 2

In your code, the outer loop goes line by line and the inner loop looks at every character of the line, so each digit is added on its own: 10 contributes 1 + 0.
Checking the whole line with .isdigit() instead fails unless you strip the \n at the end of each line first.
So your updated code should look like this:
f = open('file6.txt', 'r')
suma = 0
file = f.readlines()
for line in file:
    if line.strip().isdigit():
        suma += int(line)
print("The sum is ", suma)
Note that str.isdigit() returns False for a string with a leading minus sign, so this version still skips negative numbers.
Hope it helps!
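To cover negative numbers as well, one option is to let int() decide what counts as a number. This is a minimal sketch, not the answerer's code; the function name sum_integer_lines is made up for illustration, and it assumes one whole number per line.

```python
def sum_integer_lines(lines):
    """Sum the lines that parse as whole numbers, skipping everything else."""
    total = 0
    for line in lines:
        try:
            # int() ignores surrounding whitespace and accepts a leading sign
            total += int(line)
        except ValueError:
            pass  # blank line or non-numeric text
    return total


print("The sum is", sum_integer_lines(["1\n", "10\n", "-4\n", "abc\n", "\n"]))
```

Because the function only iterates over its argument, an open file object can be passed in place of the list.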

Use re.split to split the input into words on anything that is not part of a number. Try to convert each word into a number and silently skip it if this fails.
import re
sum_nums_in_file = 0
with open('file6.txt') as f:
    for line in f:
        for word in re.split(r'[^-+\dEe.]+', line):
            try:
                num = float(word)
                sum_nums_in_file += num
            except ValueError:
                # not a number (for example an empty string), skip it
                pass
print(f"The sum is {sum_nums_in_file}")
This works for example on files such as this:
-1 2.0e0
+3.0
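To see what the split actually produces, here is a small sketch that runs the same pattern on in-memory strings instead of file6.txt. The names NON_NUMBER and sum_numbers are assumptions for illustration only.

```python
import re

# Same pattern as above: one or more characters that cannot be part of a number.
NON_NUMBER = re.compile(r'[^-+\dEe.]+')


def sum_numbers(text):
    """Sum every token in `text` that float() accepts."""
    total = 0.0
    for word in NON_NUMBER.split(text):
        try:
            total += float(word)
        except ValueError:
            pass  # '' or fragments such as 'ee' are not numbers
    return total


print(NON_NUMBER.split("-1 2.0e0\n"))     # ['-1', '2.0e0', '']
print(NON_NUMBER.split("see 3 apples"))   # ['', 'ee', '3', 'e', '']
print(sum_numbers("-1 2.0e0\n+3.0\n"))    # 4.0
```

The second line shows why the try/except is needed: the letters e and E survive the split because they may belong to an exponent, and float() then rejects the leftover fragments.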

Python 3.0 - How do I output which character is counted the most?

So I was able to create a program that counts the number of vowels (specifically e, i and o) in a text file I have on my computer. However, I can't figure out for the life of me how to show which one occurs the most. I assumed I would say something like
for ch in 'i':
    return numvowel?
I'm just not too sure what the next step is.
I basically want it to output at the end "The letter, i, occurred the most in the text file"
def vowelCounter():
    inFile = open('file.txt', 'r')
    contents = inFile.read()
    # variable to store total number of vowels
    numVowel = 0
    # This counts the total number of occurrences of vowels o e i.
    for ch in contents:
        if ch in 'i':
            numVowel = numVowel + 1
        if ch in 'e':
            numVowel = numVowel + 1
        if ch in 'o':
            numVowel = numVowel + 1
    print('file.txt has', numVowel, 'vowel occurrences total')
    inFile.close()
vowelCounter()

If you want to show which one occurs the most, you have to keep counts of each individual vowel instead of just 1 total count like what you have done.
Keep 3 separate counters (one for each of the 3 vowels you care about) and then you can get the total by summing them up OR if you want to find out which vowel occurs the most you can simply compare the 3 counters to find out.
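The separate-counters idea can be sketched as follows. The function name most_common_vowel and the sample sentence are made up for illustration; the counters live in a dictionary rather than three named variables, which is one of several reasonable ways to do it.

```python
def most_common_vowel(contents, vowels="eio"):
    """Return (vowel, count) for the vowel that occurs most often in `contents`."""
    counts = {v: 0 for v in vowels}  # one counter per vowel
    for ch in contents:
        if ch in counts:
            counts[ch] += 1
    winner = max(counts, key=counts.get)  # ties go to the first vowel listed
    return winner, counts[winner]


letter, n = most_common_vowel("this is a line of text; it is in a file")
print("The letter, {}, occurred the most in the text file ({} times)".format(letter, n))
```

The total is still available as sum(counts.values()) if the original output line is wanted too.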

Try using regular expressions:
https://docs.python.org/3.5/library/re.html#regular-expression-objects
import re
def vowelCounter():
    with open('file.txt', 'r') as inFile:
        content = inFile.read()
        o_count = len(re.findall('o',content))
        e_count = len(re.findall('e',content))
        i_count = len(re.findall('i',content))
        # Note, if you want this to be case-insensitive,
        # then add the additional argument re.I to each findall call
        print("O's: {0}, E's:{1}, I's:{2}".format(o_count,e_count,i_count))
vowelCounter()

You can do this:
vowels = {} # dictionary of counters, indexed by vowels
for ch in contents:
    if ch in ['i', 'e', 'o']:
        # If 'ch' is a new vowel, create a new mapping for it with the value 1
        # otherwise increment its counter by 1
        vowels[ch] = vowels.get(ch, 0) + 1
print("'{}' occurred the most."
      .format(*[k for k, v in vowels.items() if v == max(vowels.values())]))

Python claims to have "batteries included", and this is a classic case. The class collections.Counter does pretty much this.
from collections import Counter
with open('file.txt') as file_:
    counter = Counter(file_.read())
print('Count of e: %s' % counter['e'])
print('Count of i: %s' % counter['i'])
print('Count of o: %s' % counter['o'])
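Counter can also answer the "which one occurs the most" part directly through its most_common() method. The sketch below is an illustration under the assumption that only e, i and o matter; the function name rank_vowels and the sample text are hypothetical.

```python
from collections import Counter


def rank_vowels(contents, vowels="eio"):
    """Return [(vowel, count), ...] sorted from most to least frequent."""
    counter = Counter(ch for ch in contents if ch in vowels)
    return counter.most_common()


ranking = rank_vowels("one little line, one more line")
print(ranking)
if ranking:
    print("The letter, {}, occurred the most in the text file".format(ranking[0][0]))
```

If the text contains none of the vowels, most_common() returns an empty list, hence the check before indexing.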

Let vowels = 'eio', then
{ i: contents.count(i) for i in vowels }
For each item in vowels count the number of occurrences in contents and add it as part of the resulting dictionary (note the wrapping curly brackets over the comprehension).

Why is this not correct? (codeeval challenge) PYTHON

This is what I have to do https://www.codeeval.com/open_challenges/140/
I've been on this challenge for three days, please help. It is 85-90 partially solved, but not 100% solved... why?
This is my code:
import sys
test_cases = open(sys.argv[1], 'r')
for test in test_cases:
    saver=[]
    text=""
    textList=[]
    positionList=[]
    num=0
    exists=int()
    counter=0
    for l in test.strip().split(";"):
        saver.append(l)
    for i in saver[0].split(" "):
        textList.append(i)
    for j in saver[1].split(" "):
        positionList.append(j)
    for i in range(0,len(positionList)):
        positionList[i]=int(positionList[i])
    accomodator=[None]*len(textList)
    for n in range(1,len(textList)):
        if n not in positionList:
            accomodator[n]=textList[len(textList)-1]
            exists=n
    for item in positionList:
        accomodator[item-1]=textList[counter]
        counter+=1
        if counter>item:
            accomodator[exists-1]=textList[counter]
    for word in accomodator:
        text+=str(word) + " "
    print text
test_cases.close()

This code works for me:
import sys
def main(name_file):
    _file = open(name_file, 'r')
    text = ""
    while True:
        try:
            line = _file.next()
            disordered_line, numbers_string = line.split(';')
            numbers_list = map(int, numbers_string.strip().split(' '))
            missing_number = sum(xrange(sorted(numbers_list)[0],sorted(numbers_list)[-1]+1)) - sum(numbers_list)
            if missing_number == 0:
                missing_number = len(disordered_line)
            numbers_list.append(missing_number)
            disordered_list = disordered_line.split(' ')
            string_position = zip(disordered_list, numbers_list)
            ordered = sorted(string_position, key = lambda x: x[1])
            text += " ".join([x[0] for x in ordered])
            text += "\n"
        except StopIteration:
            break
    _file.close()
    print text.strip()
if __name__ == '__main__':
    main(sys.argv[1])
I'll try to explain my code step by step so you can see the difference between your code and mine:
while True
A loop that breaks when there are no more lines.
try:
I put the code inside a try and catch the StopIteration exception, because it is raised when there are no more items in an iterator.
line = _file.next()
Read the file through its iterator, so that you do not load all the lines into memory at once.
disordered_line, numbers_string = line.split(';')
Get the unordered phrase and the numbers giving each string's position.
numbers_list = map(int, numbers_string.strip().split(' '))
Convert every number from string to int.
missing_number = sum(xrange(sorted(numbers_list)[0],sorted(numbers_list)[-1]+1)) - sum(numbers_list)
Get the missing number from the series of numbers; that missing number is the position of the last string in the phrase.
if missing_number == 0:
    missing_number = len(disordered_line)
Check whether the missing number is 0; if so, the missing position is the last one. The code uses the length of the phrase for it, which is at least the number of strings, so that string still sorts last.
numbers_list.append(missing_number)
Append the missing number to the list of numbers.
disordered_list = disordered_line.split(' ')
Convert the disordered phrase into a list.
string_position = zip(disordered_list, numbers_list)
Combine every string with its respective position.
ordered = sorted(string_position, key = lambda x: x[1])
Order the combined list by the position of the string.
text += " ".join([x[0] for x in ordered])
Concatenate the ordered phrase. The remaining code is easy to understand.
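The steps above can be sketched in Python 3 as follows. This is an illustration, not the answerer's code: the function name recover_line and the sample inputs are made up, and it assumes each line has the form "shuffled words;positions" with 1-based positions and exactly one position left out. It finds the missing position with a set difference instead of a sum.

```python
def recover_line(line):
    """Rebuild a sentence from 'shuffled words;positions' with one position missing."""
    words_part, positions_part = line.strip().split(";")
    words = words_part.split()
    positions = [int(p) for p in positions_part.split()]
    # The one position not listed belongs to the last word in the shuffled list.
    missing = set(range(1, len(words) + 1)) - set(positions)
    positions.extend(missing)
    ordered = sorted(zip(positions, words))  # sort the words by their position
    return " ".join(word for _, word in ordered)


print(recover_line("c a b;3 1"))   # a b c
print(recover_line("b c a;2 3"))   # a b c
```

The second call covers the case where position 1 is the one left out, which is a case where a sum-based difference comes out as 0.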
UPDATE
Looking at your code, here is my opinion on what might solve your problem.
split already returns a list, so you do not have to loop over the split content to add it to another list.
So these six lines:
for l in test.strip().split(";"):
    saver.append(l)
for i in saver[0].split(" "):
    textList.append(i)
for j in saver[1].split(" "):
    positionList.append(j)
can be converted into three:
splitted_test = test.strip().split(';')
textList = splitted_test[0].split(" ")
positionList = map(int, splitted_test[1].split(" "))
In the line positionList = map(int, splitted_test[1].split(" ")) you already convert the numbers to int, so you save these two lines:
for i in range(0,len(positionList)):
    positionList[i]=int(positionList[i])
The next lines:
accomodator=[None]*len(textList)
for n in range(1,len(textList)):
    if n not in positionList:
        accomodator[n]=textList[len(textList)-1]
        exists=n
can be converted into the next four:
missing_number = sum(xrange(sorted(positionList)[0],sorted(positionList)[-1]+1)) - sum(positionList)
if missing_number == 0:
    missing_number = len(textList)
positionList.append(missing_number)
Basically, these lines calculate the missing number in the series of numbers, so that the series has the same length as textList.
The next lines:
for item in positionList:
    accomodator[item-1]=textList[counter]
    counter+=1
    if counter>item:
        accomodator[exists-1]=textList[counter]
for word in accomodator:
    text+=str(word) + " "
can be replaced by these:
string_position = zip(textList, positionList)
ordered = sorted(string_position, key = lambda x: x[1])
text += " ".join([x[0] for x in ordered])
text += "\n"
This way you save lines and memory. Also use xrange instead of range.
The factors that make your code pass only partially could be:
Number of lines in the script.
Amount of time your script takes.
Amount of memory your script uses.
What you could do is:
Use generators. This saves memory.
Reduce the number of for loops; this saves lines of code and time.
If you think something could be made easier, do it.
Do not reinvent the wheel; if something has already been made, use it.

how to keep the variable value temporary in memory and compare ... in python

Folks, I'm positive that I broke the logic by wrong indentation but now
I can't fix it.
Could you please help me?
#
# analyzeNano.py - analyze XYZ file for 'sanity'
#
#
import csv
import sys
import os
import getopt
def main():
    '''
    analyzeNano.py -d input-directory
    analyzeNano.py analyzes a list of XYZ files inside input-directory. It counts the number of consecutive DNA samples with identical ID and if it is between 96 and 110 it treats it as 'good', otherwise 'bad'.
    input-directory    an input directory where XYZ files are located
    -d                 flag for input-directory
    At the end it creates 2 files: goodNano.csv and badNano.csv
    Note: files that are not in goodNano.csv and badNano.csv have no DNA ID and are therefore not listed
    '''
    try:
        opts, args = getopt.getopt(sys.argv[1:],'d:')
    except getopt.GetoptError, err:
        print str(err)
        help(main)
        sys.exit(2)
    if len(opts) != 1:
        help(main)
        sys.exit(2)
    if not os.path.isdir( sys.argv[2] ):
        print "Error, ", sys.argv[2], " is not a valid directory"
        help(main)
        sys.exit(2)
    prefix = 'dna'
    goodFiles = []
    badFiles = []
    fileList = os.listdir(sys.argv[2])
    for f in fileList:
        absFile = os.path.join(os.path.abspath(sys.argv[2]), f )
        with open(absFile, 'rb') as csvfile:
            # use csv to separate the fields, making it easier to deal with the
            # first value without hard-coding its size
            reader = csv.reader(csvfile, delimiter='\t')
            match = None
            count = 0
            for row in reader:
                # matching rows
                if row[0].lower().startswith(prefix):
                    if match is None:
                        # first line with prefix..
                        match = row[0]
                    if row[0] == match:
                        # found a match, so increment
                        count += 1
                    if row[0] != match:
                        # row prefix has changed
                        if 96 <= count < 110:
                            # counted enough, so start counting the next
                            match = row[0] # match on this now
                            count = 0 # reset the count
                            goodFiles.append(csvfile.name)
                        else:
                            # didn't count enough, so stop working through this file
                            badFiles.append(csvfile.name)
                            break
                # non-matching rows
                else:
                    if match is None:
                        # ignore preceding lines in file
                        continue
                    else:
                        # found non-matching line when expecting a match
                        break
            else:
                if not 96 <= count < 110:
                    #there was at least successful run of lines
                    goodFiles.remove(csvfile.name)
    # Create output files
    createFile(goodFiles, 'goodNano')
    createFile(badFiles, 'badNano')
def createFile(files, fName):
    fileName = open( fName + ".csv", "w" )
    for f in files:
        fileName.write( os.path.basename(f) )
        fileName.write("\n")
if __name__ == '__main__':
    main()
Could someone just browse and point me where I broke it?

Here's how I would rework your style:
with open("z:/file.txt", "rU") as file:  # U flag means Universal Newline Mode,
                                         # if error, try switching back to b
    print(file.name)
    counter = 0
    for line in file:  # iterate over a file object itself line by line
        if line.lower().startswith('dna'):  # look for your desired condition
            # process the data
            counter += 1

All variables are held in memory. You want to hold onto the most recent match and compare it, counting while it matches:
import csv
prefix = 'DNA'
with open('file.txt','rb') as csvfile:
    # use csv to separate the fields, making it easier to deal with the
    # first value without hard-coding its size
    reader = csv.reader(csvfile, delimiter='\t')
    match = None
    count = 0
    is_good = False
    for row in reader:
        # matching rows
        if row[0].startswith(prefix):
            if match is None:
                # first line with prefix..
                match = row[0]
            if row[0] == match:
                # found a match, so increment
                count += 1
            if row[0] != match:
                # row prefix has changed
                if 96 <= count < 100:
                    # counted enough, so start counting the next
                    match = row[0] # match on this now
                    count = 0 # reset the count
                else:
                    # didn't count enough, so stop working through this file
                    break
        # non-matching rows
        else:
            if match is None:
                # ignore preceding lines in file
                continue
            else:
                # found non-matching line when expecting a match
                break
    else:
        if 96 <= count < 100:
            # there was at least one successful run of lines
            is_good = True
if is_good:
    print 'File was good'
else:
    print 'File was bad'
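An alternative way to express "count while the ID stays the same" in Python 3 is itertools.groupby, which groups consecutive equal items. The sketch below is not the answer's method and simplifies it: it ignores non-matching rows rather than stopping at them. The names run_lengths and is_good are hypothetical, and the 96 and 110 bounds are taken from the docstring in the question.

```python
from itertools import groupby


def run_lengths(ids, prefix="dna"):
    """Return [(id, run_length), ...] for consecutive identical IDs with the prefix."""
    runs = []
    for key, group in groupby(ids):
        if key.lower().startswith(prefix):
            runs.append((key, sum(1 for _ in group)))  # first row of a run is counted too
    return runs


def is_good(ids, low=96, high=110):
    """True when there is at least one run and every run has low <= length < high."""
    runs = run_lengths(ids)
    return bool(runs) and all(low <= n < high for _, n in runs)


sample = ["header"] + ["DNA0001"] * 96 + ["DNA0002"] * 100
print(run_lengths(sample))   # [('DNA0001', 96), ('DNA0002', 100)]
print(is_good(sample))       # True
```

With a real file, ids would be the first column of each non-empty row, for example (row[0] for row in csv.reader(f, delimiter='\t') if row).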

From your description, the lines you're interested in match the regular expression:
^DNA[0-9]{10}
That is, I assume that your xyz is actually ten digits.
The strategy here is to match the 13-character string. If there's no match, and we haven't previously matched, we keep going without further ado. Once we match, we save the string and increment a counter. As long as we keep matching the regex and the saved string, we keep incrementing. Once we hit a different regex match, or no match at all, the sequence of identical matches is over. If it's valid, we reset the count and the last match and carry on. If it's invalid, we exit.
I hasten to add that the following is untested.
import re
import sys
# Input file with DNA lines to match:
infile = "z:/file.txt"
# This is the regex for the lines of interest:
regex = re.compile('^DNA[0-9]{10}')
# This will keep count of the number of matches in sequence:
n_seq = 0
# This is the previous match (if any):
lastmatch = ''
# Subroutine to check given sequence count and bail if bad:
def bail_on_bad_sequence(count, match):
    if 96 <= count < 100:
        return
    sys.stderr.write("Bad count (%d) for '%s'\n" % (count,match))
    sys.exit(1)
with open(infile) as file:
    for line in file:
        # Try to match the line to the regex:
        match = regex.match(line)
        if match:
            if match.group(0) == lastmatch:
                n_seq += 1
            else:
                # A different ID starts a new run; check the previous run, if any:
                if n_seq != 0:
                    bail_on_bad_sequence(n_seq, lastmatch)
                n_seq = 1
                lastmatch = match.group(0)
        else:
            if n_seq != 0:
                bail_on_bad_sequence(n_seq, lastmatch)
                n_seq = 0
                lastmatch = ''
# Check the final run when the file ends in the middle of one:
if n_seq != 0:
    bail_on_bad_sequence(n_seq, lastmatch)

Please ignore my last request to review the code. I reviewed it myself and realized that the problem was with formatting.
It looks like it now works as expected and analyzes all files in the directory. Thanks again to Metthew. That help was tremendous. I still have some concerns about the accuracy of the calculation, because in a few cases it failed when it should not have... but I'll investigate it.
Overall... thanks a lot to everyone for the tremendous help.

python program to add all values from each line

I'm writing a Python function to add up the numbers from each line so I can then find the average. This is what my file looks like:
-2.7858521
-2.8549764
-2.8881847
2.897689
1.6789098
-0.07865
1.23589
2.532461
0.067825
-3.0373958
Basically I've written a program that loops over each line, incrementing the line counter and converting each line to a float value.
counterTot = 0
with open('predictions2.txt', 'r') as infile:
    for line in infile:
        counterTot += 1
        i = float(line.strip())
Now here's the part where I get a little stuck:
totalSum =
mean = totalSum / counterTot
print(mean)
As you can tell I'm new to Python, but I find it very handy for text analysis work, so I'm getting into it.
Extra function
I was also looking into an extra feature, but it should be a separate function from the one above.
counterTot = 0
with open('predictions2.txt', 'r') as infile:
    for line in infile:
        counterTot += 1
        i = float(line.strip())
        if i > 3:
            i = 3
        elif i < -3:
            i = -3
As you can see from the code, the function checks whether a number is bigger than 3 and, if so, makes it 3. If the number is smaller than -3, it makes it -3. But I'm trying to output this to a new file so that it keeps its structure intact. For both situations I would like to keep the decimal places. I can always round the output numbers myself, I just need the numbers intact.

You can do this without loading the elements into a list by cheekily using fileinput and retrieving the line count from that:
import fileinput
fin = fileinput.input('your_file')
total = sum(float(line) for line in fin)
print(total / fin.lineno())

You can use enumerate here:
with open('predictions2.txt') as f:
    tot_sum = 0
    for i,x in enumerate(f, 1):
        val = float(x)
        #do something with val
        tot_sum += val #add val to tot_sum
    print(tot_sum/i) #print average
#prints -0.32322842

I think you want something like this:
with open('numbers.txt') as f:
    numbers = f.readlines()
average = sum([float(n) for n in numbers]) / len(numbers)
print(average)
Output:
-0.32322842
It reads the lines from numbers.txt into a list, casts each one to a float, adds them all up and then divides the total by the length of the list.

Do you mean you need to change 5.1234 to 3.1234 and -8.5432 to -3.5432?
line = " -5.123456 "
i = float(line.strip())
if i > 3:
    n = int(i)
    i = i - (n - 3)
elif i < -3:
    n = int(i)
    i = i - (n + 3)
print(i)
It gives you
-3.123456
Edit:
Shorter version:
line = " -5.123456 "
i = float(line.strip())
if i >= 4:
    i -= int(i) - 3
elif i <= -4:
    i -= int(i) + 3
print(i)
Edit 2:
If you need to change 5.1234 to 3.0000 ("3" and 4x "0") and -8.7654321 to -3.0000000 ("-3" and 7x "0"):
line = " -5.123456 "
line = line.strip()
i = float(line)
if i > 3:
    length = len(line.split(".")[1])
    i = "3.%s" % ("0" * length) # now it will be string again
elif i < -3:
    length = len(line.split(".")[1])
    i = "-3.%s" % ("0" * length) # now it will be string again
print(i)

Here is a more verbose version. You could decide to replace invalid lines (if any) with a neutral value instead of ignoring them.
numbers = []
with open('myFile.txt', 'r') as myFile:
    for line in myFile:
        try:
            value = float(line)
        except ValueError:
            print(line, "is not a valid float") # or numbers.append(defaultValue) if an exception occurred
        else:
            numbers.append(value)
print(sum(numbers) / len(numbers))
For your second request, here is the most straightforward solution:
def clamp(value, lowBound, highBound):
    return max(min(highBound, value), lowBound)
Applying it to our list:
clampedValues = [clamp(x, -3.0, 3.0) for x in numbers]
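Putting both requests together, a sketch along these lines computes the mean and produces one output line per input line. The function name clamp_lines and the sample values are assumptions for illustration; the original text of a line is reused whenever the value is already in range, so its decimal places come through untouched.

```python
def clamp(value, low, high):
    """Limit `value` to the closed interval [low, high]."""
    return max(low, min(high, value))


def clamp_lines(lines, low=-3.0, high=3.0):
    """Return (mean of the original values, list of output lines)."""
    total = 0.0
    out = []
    for line in lines:
        text = line.strip()
        value = float(text)
        total += value
        # Keep the original text when the value is already in range,
        # so its decimal places are preserved exactly.
        out.append(text if low <= value <= high else str(clamp(value, low, high)))
    return total / len(out), out


mean, clamped = clamp_lines(["-2.7858521\n", "2.897689\n", "-3.0373958\n", "3.5\n"])
print(mean)
print("\n".join(clamped))
```

Passing an open file as lines and writing "\n".join(clamped) to a second file keeps the one-number-per-line structure. An empty input would raise ZeroDivisionError, so guard for that if the file may be empty.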
