Python: How to increment the count when a variable repeats

I have a txt file which has following entries:
Rx = 34 // Counter gets incremented = 1, since the Rx was found for the first time
Rx = 2
Rx = 10
Tx = 2
Tx = 1
Rx = 3 // Counter gets incremented = 2, since the Rx was found for the first time after Tx
Rx = 41
Rx = 3
Rx = 19
I want to increment the count only when 'Rx' appears for the first time in a run of repeats, not for every Rx in the text file. My code is as follows:
import re
f = open("test.txt","r")
count = 0
for lines in f:
    m = re.search("Rx = \d{1,2}", lines)
    if m:
        count +=1
print count
But this is giving me the count of all the Rx's in the txt file. I want the output as 2 and not 7.
Please help me out !

import re
f = open("test.txt","r")
count = 0
for lines in f:
    m = re.search(r"Rx = \d{1,2}", lines)
    if m:
        count +=1
    if count >=2:
        break
print(m.group(0))

Break out of the loop, since you only need to find the repeats.
import re
f = open("test.txt","r")
count = 0
for lines in f:
    m = re.search(r"Rx = \d{1,2}", lines)
    if m:
        count +=1
    if count >=2:
        break
print count

With "if m:", count keeps incrementing for every line where re.search() returns a match (that is, whenever m is not None). If you'd like to stop after the first 2, you need to introduce some additional logic.

If you want to find the count for the Rx lines that are repeated (that is, that appear more than once):
import re
rx_count = {}
with open("test.txt","r") as f:
    count = 0
    for lines in f:
        # strip the newline so identical entries share one key
        key = lines.strip()
        if key.startswith('Rx'):
            rx_count[key] = rx_count.get(key, 0) + 1
Now you have a counter dictionary in rx_count. We keep only the entries whose value is greater than 1, then sum those values together and print out the count:
rx_count = {k: v for k, v in rx_count.items() if v > 1}
count = sum(rx_count.values())
print(count)

To do exactly what you want, you're going to need to keep track of which strings you've already seen.
You can do this by using a set to keep track of which ones you have seen until there is a duplicate, and then only counting occurrences of that string.
This example would do that:
import re
count = 0
first = None
matches = set()
with open("test.txt", "r") as f:
    for line in f:
        m = re.search(r"Rx = \d{1,2}", line)
        if not m:
            # Skip the rest if no match
            continue
        if m.group(0) not in matches:
            matches.add(m.group(0))
        else:
            # First string that we see a second time
            first = m.group(0)
            count = 2
            break
    # Continue reading the same file from where the first loop stopped
    for line in f:
        m = re.search(r"Rx = \d{1,2}", line)
        ## This or whatever check you want to do
        # (m is None on lines without a match, so test it first)
        if m and m.group(0) == first:
            count += 1
print(count)
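
For what it's worth, the sample in the question seems to ask for the number of runs of consecutive Rx lines (2 for the file shown). Here is a minimal sketch under that assumption; the helper name count_rx_runs and the in-memory sample list are only for illustration and are not from the original posts.

```python
import re

RX_LINE = re.compile(r"Rx = \d{1,2}")

def count_rx_runs(lines):
    """Count groups of consecutive lines that match 'Rx = <number>'."""
    count = 0
    in_run = False  # True while the previous line was an Rx line
    for line in lines:
        is_rx = RX_LINE.search(line) is not None
        if is_rx and not in_run:
            count += 1  # first Rx after a non-Rx line (or at the start)
        in_run = is_rx
    return count

# Same entries as the question's text file (assumed content, comments removed)
sample = ["Rx = 34", "Rx = 2", "Rx = 10", "Tx = 2", "Tx = 1",
          "Rx = 3", "Rx = 41", "Rx = 3", "Rx = 19"]
print(count_rx_runs(sample))  # prints 2
```

To run it on the real file, you could pass the open file object instead of the list, since the function only iterates over lines.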

Related

Python: get the most common words longer than 3 characters

Hi, I'm quite new to Python.
I'm trying to figure out how to get the most common words listed in the clean.txt file, but only words with length > 3.
import re
from collections import Counter
words = re.findall(r'\w+', open('clean.txt', 'r', encoding='utf-8').read().lower())
count = Counter(words).most_common(100)
# define a sort key
def sort_key(count):
    return count[1]
def read_data():
    f = open('clean.txt', 'r', encoding='utf-8')
    s = f.read()
    x = s.split()
    for i in x:
        if len(i) > 5:
            print(i)
count.sort(key=sort_key, reverse=True)
print (count)
I tried printing read_data(), but it listed all the words without showing the number of times each one is mentioned.
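
One possible approach is to filter the words by length before handing them to Counter, so the length check and the counting happen in one pass. This is only a sketch: the function name most_common_long_words, its parameters, and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def most_common_long_words(text, min_length=4, top=100):
    """Return (word, count) pairs for words of at least min_length characters."""
    words = re.findall(r"\w+", text.lower())
    # Keep only the long words, then let Counter do the tallying
    long_words = (w for w in words if len(w) >= min_length)
    return Counter(long_words).most_common(top)

sample = "The quick brown fox and the quick grey cat saw another quick fox"
print(most_common_long_words(sample, top=1))  # prints [('quick', 3)]
```

For the file in the question, the text argument could be open('clean.txt', 'r', encoding='utf-8').read().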

Counting number of occurrence of a string in a text file

I have a text file containing:
Rabbit:Grass
Eagle:Rabbit
Grasshopper:Grass
Rabbit:Grasshopper
Snake:Rabbit
Eagle:Snake
I want to count the number of occurrences of each string, that is, the number of times each animal occurs in the text file, and print the counts. Here's my code:
fileName = input("Enter the name of file:")
foodChain = open(fileName)
table = []
for line in foodChain:
    contents = line.strip().split(':')
    table.append(contents)
def countOccurence(l):
    count = 0
    for i in l:
        #I'm stuck here#
        count +=1
    return count
I'm unsure how Python would count the occurrences in a text file. The output I want is:
Rabbit: 4
Eagle: 2
Grasshopper: 2
Snake: 2
Grass: 2
I just need some help on the counting part and I will be able to manage the rest of it. Regards.
What you need is a dictionary.
dictionary = {}
for line in table:
    for animal in line:
        if animal in dictionary:
            dictionary[animal] += 1
        else:
            dictionary[animal] = 1
for animal, occurences in dictionary.items():
    print(animal, ':', occurences)
The solution using str.split(), re.sub() functions and collections.Counter subclass:
import re, collections
with open(filename, 'r') as fh:
    # setting space as a common delimiter
    contents = re.sub(r':|\n', ' ', fh.read()).split()
    counts = collections.Counter(contents)
# iterating through `animal` counts
for a in counts:
    print(a, ':', counts[a])
The output:
Snake : 2
Rabbit : 4
Grass : 2
Eagle : 2
Grasshopper : 2
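Use the in operator to test whether a name is an element of a list (in Python, a string can be treated as a sequence in the same way). Here l is the animal name and each row of table is one split line:
def countOccurence(l):
    count = 0
    for row in table:
        # count every line of the file that mentions the animal
        if l in row:
            count +=1
    return count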
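Another option is collections.defaultdict:
from collections import defaultdict
dd = defaultdict(int)
with open(fpath) as f:
    for line in f:
        # strip the trailing newline so 'Grass\n' and 'Grass' are the same key
        words = line.strip().split(':')
        for word in words:
            dd[word] += 1
for k,v in dd.items():
    print(k+': '+str(v))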

Python: Using readline() in "for line in file:" Loop

Let's say I have a text file that looks like:
a
b
start_flag
c
d
e
end_flag
f
g
I wish to iterate over this data line by line, but when I encounter a 'start_flag', I want to iterate until I reach an 'end_flag' and count the number of lines in between:
newline = ''
for line in f:
    count = 0
    if 'start_flag' in line:
        while 'end_flag' not in newline:
            count += 1
            newline = f.readline()
        print(str(count))
What is the expected behavior of this code? Will it iterate like:
a
b
start_flag
c
d
e
end_flag
f
g
Or:
a
b
start_flag
c
d
e
end_flag
c
d
e
end_flag
f
g
There shouldn't be any need to use readline(). Try it like this:
with open(path, 'r') as f:
    count = 0
    counting = False
    for line in f:
        if 'start_flag' in line:
            counting = True
        elif 'end_flag' in line:
            counting = False
            #do something with your count result
            count = 0 #reset it for the next start_flag
        elif counting:
            # only lines strictly between the flags are counted
            count += 1
This handles it all with the if statements in the correct order, allowing you to just run sequentially through the file in one go. You could obviously add more operations into this, and do things with the results, for example appending them to a list if you expect to run into multiple start and end flags.
Use this:
enter = False
count = 0
for line in f:
    if 'start_flag' in line:
        enter = True
    elif 'end_flag' in line:
        print(count)
        count = 0
        enter = False
    elif enter:
        # the flag lines themselves are not counted
        count += 1
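
If the file can contain several start/end pairs, the same flag-based loop can collect one count per block. The sketch below assumes that; count_between_flags is a hypothetical helper name, and io.StringIO stands in for the real file so the example is self-contained.

```python
import io

def count_between_flags(lines, start="start_flag", end="end_flag"):
    """Return a list with the number of lines found inside each flagged block."""
    counts = []
    counting = False
    count = 0
    for line in lines:
        if start in line:
            counting = True
            count = 0
        elif end in line:
            if counting:
                counts.append(count)
            counting = False
        elif counting:
            count += 1  # a line strictly between the two flags
    return counts

# Same content as the sample file in the question
sample = io.StringIO("a\nb\nstart_flag\nc\nd\ne\nend_flag\nf\ng\n")
print(count_between_flags(sample))  # prints [3]
```

Because the loop never calls readline(), the file is read exactly once, line by line.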

Python File IO - building dictionary and finding max value

Problem is to return the name of the event that has the highest number of participants in this text file:
#Beyond the Imposter Syndrome
32 students
4 faculty
10 industries
#Diversifying Computing Panel
15 students
20 faculty
#Movie Night
52 students
So I figured I had to split it into a dictionary with the keys as the event names and the values as the sum of the integers at the beginning of the other lines. I'm having a lot of trouble and I think I'm making it more complicated than it is.
This is what I have so far:
def most_attended(fname):
    '''(str: filename, )'''
    d = {}
    f = open(fname)
    lines = f.read().split(' \n')
    print lines
    indexes = []
    count = 0
    for i in range(len(lines)):
        if lines[i].startswith('#'):
            event = lines[i].strip('#').strip()
            if event not in d:
                d[event] = []
            print d
            indexes.append(i)
            print indexes
        if not lines[i].startswith('#') and indexes !=0:
            num = lines[i].strip().split()[0]
            print num
            if num not in d[len(d)-1]:
                d[len(d)-1] += [num]
            print d
    f.close()
import sys
from collections import defaultdict
from operator import itemgetter
def load_data(file_name):
    events = defaultdict(int)
    current_event = None
    for line in open(file_name):
        if line.startswith('#'):
            current_event = line[1:].strip()
        else:
            participants_count = int(line.split()[0])
            events[current_event] += participants_count
    return events
if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Usage:\n\t{} <file>\n'.format(sys.argv[0]))
    else:
        events = load_data(sys.argv[1])
        print('{}: {}'.format(*max(events.items(), key=itemgetter(1))))
Here's how I would do it.
with open("test.txt", "r") as f:
    docText = f.read()
eventsList = []
#start at one because we don't want what's before the first #
for item in docText.split("#")[1:]:
    individualLines = item.split("\n")
    #get the sum by finding everything after the name, name is the first line here
    sumPeople = 0
    #we don't want the title
    for line in individualLines[1:]:
        if not line == "":
            sumPeople += int(line.split(" ")[0]) #add everything before the first space to the sum
    #add to the list a tuple with (eventname, numpeopleatevent)
    eventsList.append((individualLines[0], sumPeople))
#get the item in the list with the max number of people
print(max(eventsList, key=lambda x: x[1]))
Essentially you first want to split up the document by #, ignoring the first item because that's always going to be empty. Now you have a list of events. For each event, you go through every additional line in that event (all except the first, which is the name) and add that line's value to the sum. Then you create a list of tuples like (eventname, numPeopleAtEvent). Finally you use max() to get the item with the maximum number of people.
For the sample file shown in the question, this code prints ('Movie Night', 52); obviously you can format it however you like.
Similar answers to the ones above.
result = {} # store the results
current_key = None # placeholder to hold the current_key
for line in lines:
    # find what event we are currently stripping data for
    # if this line doesn't start with '#', we can assume that it's going to be info for the last seen event
    if line.startswith("#"):
        current_key = line[1:].strip()
        result[current_key] = 0
    elif current_key:
        # pull the number out of the string
        number = [int(s) for s in line.split() if s.isdigit()]
        # make sure we actually got a number in the line
        if len(number) > 0:
            result[current_key] = result[current_key] + number[0]
# compare the events by their totals, not by the characters of their names
print(max(result, key=result.get))
This will print "Movie Night".
Your problem description says that you want to find the event with the highest number of participants. I tried a solution which does not use a list or dictionary.
PS: I am new to Python.
import re
# lines is expected to hold the whole file content as one string
bigEventName = ""
participants = 0
curEventName = ""
curEventParticipants = 0
# Use RegEx to split the file by lines
itr = re.finditer(r"^([#\w+].*)$", lines, flags = re.MULTILINE)
for m in itr:
    if m.group(1).startswith("#"):
        # Whenever a new group is encountered, check if the previous sum of
        # participants is more than the recent event. If so, save the results.
        if curEventParticipants > participants:
            participants = curEventParticipants
            bigEventName = curEventName
        # Reset the current event name and sum as 0
        curEventName = m.group(1)[1:]
        curEventParticipants = 0
    elif re.match(r"(\d+) .*", m.group(1)):
        # If it is line which starts with number, extract the number and sum it
        curEventParticipants += int(re.search(r"(\d+) .*", m.group(1)).group(1))
# This nasty code is needed to take care of the last event
bigEventName = curEventName if curEventParticipants > participants else bigEventName
# Here is the answer
print("Event: ", bigEventName)
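You can do it without a dictionary and maybe make it a little simpler by just using lists:
with open('myfile.txt', 'r') as f:
    lines = [l.strip() for l in f if l.strip()] # remove blank lines and '\n'
names = []  # event names, in file order
totals = [] # participant total for the event at the same index
for l in lines:
    if l.startswith('#'):
        # a '#' line starts a new event
        names.append(l[1:].strip())
        totals.append(0)
    elif names:
        # add the leading number to the most recent event
        totals[-1] += int(l.split()[0])
print (names[totals.index(max(totals))])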

Delete and save duplicate in another file

In test.txt:
1 a
2 b
3 c
4 a
5 d
6 c
I want to remove every line whose letter is duplicated and save the rest in test2.txt:
2 b
5 d
I tried to start with the code below.
file1 = open('../test.txt').read().split('\n')
#file2 = open('../test2.txt', "w")
word = set()
for line in file1:
    if line:
        sline = line.split('\t')
        if sline[1] not in word:
            print sline[0], sline[1]
            word.add(sline[1])
#file2.close()
The code printed:
1 a
2 b
3 c
5 d
Any suggestion?
You can use collections.OrderedDict here:
from collections import OrderedDict
with open('abc') as f:
    dic = OrderedDict()
    for line in f:
        v,k = line.split()
        dic.setdefault(k,[]).append(v)
Now dic looks like:
OrderedDict([('a', ['1', '4']), ('b', ['2']), ('c', ['3', '6']), ('d', ['5'])])
Now we only need those keys which contain only 1 item in the list.
for k,v in dic.iteritems():
    if len(v) == 1:
        print v[0],k
This prints:
2 b
5 d
What your code does is make sure that every letter (the second item) gets printed only once, which is not what you say you want.
You must split your code into two halves: one part that reads the file and gathers the letter counts, and one part that prints only the lines whose letter has count == 1.
Converting your original code (I just made it a little simpler):
file1 = open('../test.txt')
words = {}
for line in file1:
    # drop the trailing newline so it does not end up in the letter
    line = line.strip()
    if line:
        line_num, letter = line.split('\t')
        if letter not in words:
            words[letter] = [1, line_num]
        else:
            words[letter][0] += 1
for letter, (count, line_num) in words.iteritems():
    if count == 1:
        print line_num, letter
I tried to keep it as similar to your style as possible:
file1 = open('../test.txt').read().split('\n')
word = set()
test = []
duplicate = []
sin_duple = []
num_lines = 0;
num_duplicates = 0;
for line in file1:
    if line:
        sline = line.split(' ')
        test.append(" ".join([sline[0], sline[1]]))
        if (sline[1] not in word):
            word.add(sline[1])
            num_lines = num_lines + 1;
        else:
            sin_duple.append(sline[1])
            duplicate.append(" ".join([sline[0], sline[1]]))
            num_lines = num_lines + 1;
            num_duplicates = num_duplicates + 1;
for i in range (0,num_lines+1):
    for item in test:
        for j in range(0, num_duplicates):
            #print((str(i) + " " + str(sin_duple[j])))
            if item == (str(i) + " " + str(sin_duple[j])):
                test.remove(item)
file2 = open("../test2.txt", 'w')
for item in test:
    file2.write("%s\n" % item)
file2.close()
How about some pandas:
import pandas as pd
a = pd.read_csv("test_remove_dupl.txt",sep=",")
# keep=False drops every row whose value in column "a" is duplicated
b = a.drop_duplicates(subset="a", keep=False)
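
A plain-Python alternative is a two-pass approach: count each letter first, then keep only the lines whose letter was seen exactly once. This is a rough sketch; unique_key_lines is a made-up helper name, and it assumes each line has at least two whitespace-separated fields as in the sample.

```python
from collections import Counter

def unique_key_lines(lines):
    """Keep only the lines whose second field occurs exactly once."""
    rows = [line.split() for line in lines if line.strip()]
    # First pass: count how often each letter (second field) appears
    counts = Counter(row[1] for row in rows)
    # Second pass: keep the rows whose letter is not duplicated
    return [" ".join(row) for row in rows if counts[row[1]] == 1]

# Same content as test.txt in the question
sample = ["1 a", "2 b", "3 c", "4 a", "5 d", "6 c"]
print(unique_key_lines(sample))  # prints ['2 b', '5 d']
```

To produce test2.txt, the returned lines could be written out with a loop over file2.write(line + "\n").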
