Count and store in a file - python

I want to count the occurrences of a given letter in the first 200 characters of the file characters.txt.
The result should be stored in a txt file inside a new folder.
Example:
characters.txt: abcdefghijklmnopqerstuvwxzy
so there is 1 occurrence of g
then "1" should be stored in folder/file.txt
file = open(filename, "r")
text = file.read()
count = 0
for char in text:
    if char == letter:
        count += 1
os.mkdir("g")
f = open("res.txt", mode = "w")
f.write(count)
f.close

Your code works, but in the sample you provided you never call it.
I made a local version without your file code.
def letterFrequency(letter):
    count = 0
    for char in 'abcdefghijklmnopqerstuvwxzy':
        if char == letter:
            count += 1
    return count
print(letterFrequency('g'))
If you only want to search the first 200 characters of a file, you should use a while loop. You will also need to account for rows with fewer than 200 characters.

I modified your given example and added some improvements.
The code below is a minimal working example:
import os
file = open("./Desktop/text.txt", "r")
text = file.read()
file.close()
count = 0
letter = "g"
# Keep only the first 200 characters; slicing is safe for shorter files too
text = text[:200]
for char in text:
    if char == letter:
        count += 1
try:
    os.mkdir("./Desktop/DIR")
except FileExistsError:
    print("Dir already exists")
f = open("./Desktop/DIR/res.txt", "w")
f.write(str(count))
f.close()
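As a rough sketch of the same idea wrapped in a function: the name count_letter_to_file and its parameters are hypothetical, not part of the original question. It assumes that read(limit) in text mode returns at most limit characters, and it uses os.makedirs with exist_ok=True so no try/except is needed for an existing folder.

```python
import os

def count_letter_to_file(src_path, letter, out_dir, out_name="res.txt", limit=200):
    """Count `letter` in the first `limit` characters of src_path and store the result.

    Hypothetical helper: all names and defaults here are illustrative.
    """
    with open(src_path, "r") as src:
        text = src.read(limit)  # reads at most `limit` characters
    count = text.count(letter)
    os.makedirs(out_dir, exist_ok=True)  # no error if the folder already exists
    with open(os.path.join(out_dir, out_name), "w") as out:
        out.write(str(count))  # write() needs a string, not an int
    return count
```

Calling count_letter_to_file("characters.txt", "g", "folder") would then write the count to folder/res.txt.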

Related

Is there a way to output a link to a file with Python?

I have some code to sort a text and output info on it.
Here is how it works: you copy a text, paste it into a text (.txt) file, and save that file in the same folder as the Python file. Then you open the command prompt and type python3 the_name_of_the_python_file.py the_name_of_the_text_file.txt. When you run it, it outputs "All counted!". After that there is a new .txt file next to the Python file that tells you the number of words and unique words in the text file you passed in. The new file also lists the words from most used to least used.
Is there a way to get my code to output "All counted!" followed by some kind of link that I can click to open the new file?
Here is my code:
import sys
text_file = open(sys.argv[1], "r")
# Strip commas and periods, then split on spaces
word_list = text_file.read().split(",")
word_list = "".join(word_list)
word_list = word_list.split(".")
word_list = "".join(word_list)
word_list = word_list.split(" ")
file_name = sys.argv[1].split(".")
text_file.close()
NumWords = 0
NumUniqueWords = 0
Words = {}
for i in word_list:
    word = i.lower()  # normalize case before the lookup so "The" and "the" share one entry
    if word not in Words:
        NumWords += 1
        NumUniqueWords += 1
        Words[word] = 1
    else:
        NumWords += 1
        Words[word] += 1
def get_key(val):
    for key, value in Words.items():
        if value == val:
            return key
newfile = open(file_name[0] + "-count.txt", "w")
newfile.write("Total Words - {}\nUnique Words - {}\n\n".format(NumWords, NumUniqueWords))
# Write the words from most used to least used, removing each one once written
for i in range(len(Words)):
    newfile.write("{} - {}\n".format(get_key(max(Words.values())), max(Words.values())))
    del(Words[get_key(max(Words.values()))])
newfile.close()
print("All counted!")
My code already removes "," and "." and treats the capitalized and lowercase forms of a word as the same word.
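One possible approach, shown here only as a sketch: print a file:// URI for the new file. Many terminals turn such URIs into clickable links, but whether that happens depends on the terminal, so this is an assumption rather than a guarantee. The helper name file_link and the file name example-count.txt are hypothetical.

```python
from pathlib import Path

def file_link(path):
    """Return a file:// URI for `path` (hypothetical helper)."""
    # resolve() makes the path absolute, which as_uri() requires
    return Path(path).resolve().as_uri()

print("All counted!")
print(file_link("example-count.txt"))
```

In the script above, the argument would be file_name[0] + "-count.txt" instead of the placeholder name.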

How to find a phrase in a large text file in Python?

I am trying to write an algorithm to find a phrase with words on different lines in a big text file using Python.
The file contents are as follows
fkerghiohgeoihhgergerig ooetbjoptj
enbotobjeob hi how
are you lerjgoegjepogjejgpgrg]
ekrngeigoieghetghehtigehtgiethg
ieogetigheihietipgietigeitgegitie
.......
The algorithm should search for the phrase "hi how are you" and return True in this case.
Since the file can be huge, its contents cannot all be read at once.
You can read the file one character at a time and change line feeds to spaces. Then it's just a question of running down the list of wanted characters.
def find_words(text, fileobj):
    i = 0
    while True:
        c = fileobj.read(1)
        if not c:
            break
        if c == "\n": # python combines \r\n
            c = " "
        if c != text[i]:
            i = 0
        if c == text[i]:
            i += 1
            if i == len(text):
                return True
    return False
If you want to be a little more liberal about whitespace and case sensitivity, you could remove all whitespace and lower case everything before the compare.
import re
import itertools
from string import whitespace
def find_words(text, fileobj):
    chars = list(itertools.chain.from_iterable(re.split(r"\s+", text.lower())))
    i = 0
    while True:
        c = fileobj.read(1)
        if not c:
            break
        c = c.lower()
        if c in whitespace:
            continue
        if c != chars[i]:
            i = 0
        if c == chars[i]:
            i += 1
            if i == len(chars):
                return True
    return False
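One caveat with resetting the index to zero on a mismatch: a phrase whose beginning repeats in the file (for example "hi hi how are you") can be skipped, because the characters already consumed are not re-examined. A sketch of an alternative, assuming it is acceptable to read fixed-size chunks and carry over a short tail between them; the function name contains_phrase and the chunk_size parameter are hypothetical.

```python
import io

def contains_phrase(fileobj, phrase, chunk_size=4096):
    """Return True if `phrase` occurs in fileobj, treating newlines as spaces.

    Only one chunk plus len(phrase) - 1 carried-over characters is held in memory.
    """
    if not phrase:
        return True
    tail = ""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            return False
        window = tail + chunk.replace("\n", " ")
        if phrase in window:
            return True
        # Keep the last len(phrase) - 1 characters so a match that is
        # split across two chunks is still found on the next pass.
        tail = window[-(len(phrase) - 1):] if len(phrase) > 1 else ""

sample = io.StringIO("enbotobjeob hi hi how\nare you lerjgoegj\n")
print(contains_phrase(sample, "hi how are you"))
```

The substring test on each window is left to Python's in operator, so overlapping or repeated prefixes need no special handling.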
Here is one way to solve the problem:
import re
def find_phrase():
    phrase = "hi how are you"
    words = dict(zip(phrase.split(), [False]*len(phrase.split())))
    with open("data.txt", "r") as f:
        for line in f:
            for word in words:
                if re.search( r"\b" + word + r"\b", line):
                    words[word] = True
    if all(words.values()):
        return True
    return False
EDIT:
def find_phrase():
    phrase = "hi how are you"
    with open("data.txt", "r") as f:
        for line in f:
            if phrase in line:
                return True
    return False
If it is a "pretty large" file, then access the lines sequentially and don't read the whole file into memory:
with open('largeFile', 'r') as inF:
    for line in inF:
        if 'myString' in line:
            # do_something
            break
Edit:
Since the words of the string can be on consecutive lines, you would want to use a counter to keep track of how many words have been matched so far. For example,
counter = 0
words_list = ["hi","hello","how"]
with open('largeFile', 'r') as inF:
    for line in inF:
        # print( words_list[counter] ,line)
        if words_list[counter] in line and len(line.split()) == 1 :
            counter +=1
        else:
            counter = 0
        if counter == len(words_list):
            print ("here")
            break
Text File
fkerghiohgeoihhgergerig ooetbjoptj enbotobjeob
hi
hello
how
goegjepogjejgpgrg] ekrngeigoieghetghehtigehtgiethg ieoge
It prints "here" because the words are found on consecutive lines.

How do I count the number of lines that are full-line comments in python?

I'm trying to create a function that accepts a file as input and prints the number of lines that are full-line comments (i.e. the line begins with # followed by some comment text).
For example, a file that contains the following lines should print the result 2:
abc
#some random comment
cde
fgh
#another random comment
So far I have tried something along these lines, but it is just not picking up the hash symbol:
infile = open("code.py", "r")
line = infile.readline()
def countHashedLines(filename) :
    while line != "" :
        hashes = '#'
        value = line
        print(value) #here you will get all
        #if(value == hashes): tried this but just wasn't working
        #    print("hi")
        for line in value:
            line = line.split('#', 1)[1]
            line = line.rstrip()
            print(value)
        line = infile.readline()
    return()
Thanks in advance,
Jemma
I re-worded a few statements for ease of use (subjective) but this will give you the desired output.
def countHashedLines(lines):
    tally = 0
    for line in lines:
        if line.startswith('#'): tally += 1
    return tally
infile = open('code.py', 'r')
all_lines = infile.readlines()
num_hash_nums = countHashedLines(all_lines) # <- 2
infile.close()
...or if you want a compact and clean version of the function...
def countHashedLines(lines):
    return len([line for line in lines if line.startswith('#')])
I would pass the file through standard input:
import sys
count = 0
for line in sys.stdin:  # Note: you could also open the file and iterate through it
    if line[0] == '#':  # Every time a line begins with #
        count += 1  # Increment
print(count)
Here is another solution that uses regular expressions and will detect comments that have white space in front.
import re
def countFullLineComments(infile) :
    count = 0
    p = re.compile(r"^\s*#.*$")
    for line in infile.readlines():
        m = p.match(line)
        if m:
            count += 1
            print(m.group(0))
    return count
infile = open("code.py", "r")
print(countFullLineComments(infile))
infile.close()
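The same check can also be written without a regular expression. The following is a minimal sketch, assuming that a line counts as a full-line comment when its first non-whitespace character is #; the function name count_full_line_comments is hypothetical, and io.StringIO stands in for an open file.

```python
import io

def count_full_line_comments(lines):
    """Count lines whose first non-whitespace character is '#'."""
    return sum(1 for line in lines if line.lstrip().startswith("#"))

sample = io.StringIO(
    "abc\n"
    "#some random comment\n"
    "cde\n"
    "fgh\n"
    "#another random comment\n"
)
print(count_full_line_comments(sample))  # prints 2
```

Because the function only iterates over its argument, it accepts an open file, a list of strings, or sys.stdin.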

How to search for string within another string?

I am trying to create a simple word search program.
I have successfully opened an external file that contains the grid of the word search. I also have successfully opened a file that contains the words that are to be searched for. I have stored every line of the grid in a list and every word from the file in a list called words[].
I am attempting to search for the words in each line of the grid. My code currently does not search for the word in each line of the grid.
gridlines_horizontal = []
gridlines_vertical = []
words = []
not_found = []
found_words = {}
def puzzle(fname) :
    print ""
    for line in f :
        gridlines_horizontal.append(line)
    for line in gridlines_horizontal :
        print line,
    for item in zip(*(gridlines_horizontal[::-1])):
        gridlines_vertical.append(item)
Here I am trying to get each word in words[] one at a time and see if the word is in any of the lines of the word search grid. If the word is present in any of the lines I am then trying to print the word. The code currently does not do this.
def horizontal_search(word,gridlines_horizontal) :
    x = 0
    for line in gridlines_horizontal :
        if words[0] in line or words[0] in line[::-1]:
            found_words.update({words[0]:" "})
            print words[0]
        else :
            not_found.append(words)
        x = x + 1
def vertical_search(word,gridlines_vertical):
    x = 0
    for line in gridlines_vertical:
        if words[x] in line or words[x] in line[::-1]:
            print words[0]
            found_words.update({words[x]:" "})
        else:
            not_found.append(words[x])
        x = x + 1
while True:
    try:
        fname = input("Enter a filename between double quotation marks: ")
        with open(fname) as f:
            puzzle(fname)
        break
    except IOError as e :
        print""
        print("Problem opening file...")
        print ""
while True:
    try:
        fname2 = input("Enter a filename for your words between double quotation marks: ")
        with open(fname2) as f:
            for line in f:
                words.append(line)
                """ line in words:
                line = lin """
        break
    except IOError as e :
        print("")
        print("Problem opening file...")
There are a couple of mistakes in your code:
- You aren't consistent in using words[x]; in your code you would want to replace every words[0] with words[x], BUT
- this isn't necessary because you can use nested 'for' loops.
So for horizontal search:
def horizontal_search(words,gridlines_horizontal):
    for word in words:
        for line in gridlines_horizontal:
            if word in line or word in line[::-1]:
                found_words.update({word : " "})
                print(word)
                break
        else:
            not_found.append(word)
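Note that lines read from a file keep their trailing newline, so a word stored as "cat\n" will not be found inside a grid line by the in operator unless it is stripped first. A self-contained sketch of the nested-loop idea with that stripping added; the helper names search_lines and columns are hypothetical, and the vertical search is assumed to work by joining each grid column into a string.

```python
def search_lines(words, lines):
    """Split `words` into (found, not_found) by looking in each line, forwards and backwards."""
    rows = [line.strip() for line in lines]  # drop trailing newlines
    found, not_found = [], []
    for raw in words:
        word = raw.strip()
        if not word:
            continue  # skip blank entries
        if any(word in row or word in row[::-1] for row in rows):
            found.append(word)
        else:
            not_found.append(word)
    return found, not_found

def columns(lines):
    """Return the grid columns as strings, top to bottom."""
    rows = [line.strip() for line in lines]
    return ["".join(col) for col in zip(*rows)]

grid = ["cat\n", "oxx\n", "wxx\n"]
print(search_lines(["cat\n", "cow\n", "dog\n"], grid))
print(search_lines(["cow\n"], columns(grid)))
```

Returning the two lists instead of updating globals keeps the horizontal and vertical passes independent of each other.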
Did you look at find?
a = 'this is a string'
b = 'string'
if a.find(b) > -1:
    print('found substring in string')
else:
    print('substring not found in string')
EDIT:
I am not sure if it's a typo, but you are passing word as the parameter while the body uses words:
def horizontal_search(word,gridlines_horizontal) :  # the parameter is named word
    x = 0
    for line in gridlines_horizontal :
        if words[0] in line or words[0] in line[::-1]:  # the body reads words, which does not match the parameter
There is a similar issue with def vertical_search(word,gridlines_vertical) :

Cutting character values according to value from file

This is what I am doing:
import csv
output = open('output.txt' , 'wb')
# this function returns the min for num.txt
def get_min(num):
    return int(open('%s.txt' % num, 'r+').readlines()[0])
# temporary variables
last_line = ''
input_list = []
# iterate over input.txt and store the input in a list of tuples
for i, line in enumerate(open('input.txt', 'r+').readlines()):
    if i%2 == 0:
        last_line = line
    else:
        input_list.append((last_line, line))
filtered = [(header, data[:get_min(header[-2])] + '\n' ) for (header, data) in input_list]
[output.write(''.join(data)) for data in filtered]
output.close()
In this code input.txt is something like this
>012|013|0|3|M
AFDSFASDFASDFA
>005|5|67|0|6
ACCTCTGACC
>029|032|4|5|S
GGCAGGGAGCAGGCCTGTA
and num.txt is something like this
M 4
P 10
For each record in input.txt, I want to look at the last column of its header, find the matching entry in num.txt, and cut the record's characters according to that value.
I think the error in my code is that it only accepts a text file containing an integer, whereas it should also accept a file that contains letters.
The totally revised version, after a long chat with the OP:
import os
import re
# Fetch all hashes and counts
file_c = open('num.txt')
file_c = file_c.read()
lines = re.findall(r'\w+\.txt \d+', file_c)
numbers = {}
for line in lines:
    line_split = line.split('.txt ')
    hash_name = line_split[0]
    count = line_split[1]
    numbers[hash_name] = count
#print(numbers)
# The input file
file_i = open('input.txt')
file_i = file_i.read()
for hash_name, count in numbers.items():
    regex = '(' + hash_name.strip() + ')'
    result = re.findall(r'>.*\|(' + regex + ')(.*?)>', file_i, re.S)
    if len(result) > 0:
        data_original = result[0][2]
        stripped_data = result[0][2][int(count):]
        file_i = file_i.replace(data_original, '\n' + stripped_data)
        #print(data_original)
        #print(stripped_data)
#print(file_i)
# Write the input file to new input_new.txt
f = open('input_new.txt', 'wt')
f.write(file_i)
f.close()
You can do it like so:
import re
min_count = 4 # this variable will contain that count integer from where to start removing
str_to_match = 'EOG6CC67M' # this variable will contain the filename you read
input = '' # The file input (input.txt) will go in here
counter = 0
def callback_f(e):
    global min_count
    global counter
    counter += 1
    # Check your input
    print(str(counter) + ' >>> ' + e.group())
    # Only replace the value with nothing (remove it) after a certain count
    if counter > min_count:
        return '' # replace with nothing
    return e.group() # otherwise keep the match unchanged
result = re.sub(r''+str_to_match, callback_f, input)
With this tactic you can keep count with a global counter and there's no need to do hard line-loops with complex structures.
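The counter does not have to be a global. As a sketch of the same callback tactic using a closure, where the function name remove_after and its parameters are hypothetical:

```python
import re

def remove_after(text, pattern, keep):
    """Keep the first `keep` matches of `pattern` in `text` and delete the rest."""
    seen = 0

    def repl(match):
        nonlocal seen
        seen += 1
        # Returning the matched text leaves it in place; returning "" removes it
        return match.group() if seen <= keep else ""

    return re.sub(pattern, repl, text)

print(remove_after("a1a2a3a4", "a", 2))  # prints a1a234
```

Each call starts its own count, so the function can be reused for several inputs without resetting anything by hand.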
Update
More detailed version with file access:
import os
import re
def callback_f(e):
    global counter
    counter += 1
    # Check your input
    print(str(counter) + ' >>> ' + e.group())
    # Return the replacement text; e.group() keeps the match, '' would remove it
    return e.group()
# Fetch all hash-file names and their content (count)
num_files = os.listdir('./num_files')
numbers = {}
for file in num_files:
    if file[0] != '.':
        file_c = open('./num_files/' + file)
        file_c = file_c.read()
        numbers[file.split('.')[0]] = file_c
# Now the CSV files
csv_files = os.listdir('./csv_files')
for file in csv_files:
    if file[0] != '.':
        for hash_name, min_count in numbers.items():
            file_c = open('./csv_files/' + file)
            file_c = file_c.read()
            counter = 0
            result = re.sub(r''+hash_name, callback_f, file_c)
            # Write the replaced content back to the file here
Assumed directory/file structure:
+ Projects
    + Project_folder
        + csv_files
            - input1.csv
            - input2.csv
            ~ etc.
        + num_files
            - EOG6CC67M.txt
            - EOG62JQZP.txt
            ~ etc.
        - python_file.py
The CSV files contain the big chunks of text you state in your original question.
The Num files are the hash files, each containing an integer.
What happens in this script:
- Collect all hash files and their inner count numbers in a dictionary
- Loop through all CSV files
- Sub-loop through the collected numbers for each CSV file
- Replace/remove (based on what you do in callback_f()) hashes after a certain count
- Write the output back (it's the last comment in the script, which would contain the file.write() functionality)
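For the sample data shown in the question (headers such as >012|013|0|3|M and a num.txt with lines such as "M 4"), a simpler line-based sketch is possible. It assumes that the key is the last |-separated field of the header, that each sequence sits on the line after its header, and that "cutting" means keeping the first N characters, as the question's own data[:...] slice does. The helper names load_limits and truncate_records are hypothetical.

```python
def load_limits(lines):
    """Map each key (e.g. "M") to its integer limit, from lines like "M 4"."""
    limits = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            limits[parts[0]] = int(parts[1])
    return limits

def truncate_records(lines, limits):
    """Return the lines with each sequence cut to the limit for its header's last field."""
    out = []
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            header = line
            out.append(line)
        elif header is not None:
            key = header.rsplit("|", 1)[-1]  # last column of the header
            limit = limits.get(key)
            # Records whose key is not listed are left unchanged (an assumption)
            out.append(line if limit is None else line[:limit])
    return out

limits = load_limits(["M 4\n", "P 10\n"])
records = [">012|013|0|3|M\n", "AFDSFASDFASDFA\n", ">005|5|67|0|6\n", "ACCTCTGACC\n"]
print("\n".join(truncate_records(records, limits)))
```

To remove the first N characters instead, as the revised answer above does, the slice line[:limit] would become line[limit:].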
