Using the flashText library to count occurrences of characters in a book

I am using a Windows machine running Python 3.7.4.
I am trying to use the flashText library to process a .txt file and count the occurrences of characters I selected, but I am running into errors while processing the file.
My code is as follows:
from flashtext import KeywordProcessor
#making a dictionary of major characters
#a few major players for now
keyword_processor = KeywordProcessor(case_sensitive=False)
keyword_dict = {
"Eddard" : ["ned", "eddard"],
"Daenerys" : ["dany", "khaleesi"],
"john" : ["john snow", "bastard"],
"Tyrion" : ['imp', 'halfman' , 'tyrion Lannister' ],
"bran" : ['bran stark']
}
keyword_processor.add_keywords_from_dict(keyword_dict)
text_file = open("gameofthrones.txt", "r" , encoding="utf8")
keywords_found = keyword_processor.extract_keywords(text_file)
print(keywords_found)
text_file.close()
I am getting an error that I don't quite understand:
C:\Users\MLMir\Desktop\python>stackoverflow.py
Traceback (most recent call last):
File "C:\Users\MLMir\Desktop\python\stackoverflow.py", line 24, in <module>
keywords_found = keyword_processor.extract_keywords(text_file)
File "C:\Users\MLMir\Anaconda3\lib\site-packages\flashtext\keyword.py", line 475, in extract_keywords
sentence = sentence.lower()
AttributeError: '_io.TextIOWrapper' object has no attribute 'lower'
I've tried converting the file to a list, but that just threw a different AttributeError.

You need to read the contents of the text file into a string after opening it. Try this:
from flashtext import KeywordProcessor
#making a dictionary of major characters
#a few major players for now
keyword_processor = KeywordProcessor(case_sensitive=False)
keyword_dict = {
"Eddard" : ["ned", "eddard"],
"Daenerys" : ["dany", "khaleesi"],
"john" : ["john snow", "bastard"],
"Tyrion" : ['imp', 'halfman' , 'tyrion Lannister' ],
"bran" : ['bran stark']
}
keyword_processor.add_keywords_from_dict(keyword_dict)
# read the whole file into one string; extract_keywords expects a str, not a file object
with open("gameofthrones.txt", "r", encoding="utf8") as text_file:
    raw = text_file.read()
keywords_found = keyword_processor.extract_keywords(raw)
print(keywords_found)
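Since the goal is to count occurrences rather than just list the matches, one option is to pass the list returned by extract_keywords to collections.Counter. This is a sketch, assuming (as in the answer above) that extract_keywords returns the clean name, i.e. the dictionary key, once per match; the helper name count_characters is hypothetical. Two things worth checking in the dictionary itself: add_keywords_from_dict registers only the listed aliases, so a bare "Tyrion" or "bran" is matched only if it appears in the list, and the books spell the name "Jon Snow", not "John Snow".

```python
from collections import Counter


def count_characters(keyword_processor, path, encoding="utf8"):
    """Count mentions per character in a text file (hypothetical helper)."""
    # read the whole book into one string; extract_keywords needs a str, not a file object
    with open(path, "r", encoding=encoding) as text_file:
        text = text_file.read()
    # extract_keywords yields one clean name per match, so Counter tallies them
    return Counter(keyword_processor.extract_keywords(text))


# Example usage (requires flashtext and the book file):
# from flashtext import KeywordProcessor
# kp = KeywordProcessor(case_sensitive=False)
# kp.add_keywords_from_dict({"Jon": ["jon snow", "jon"], "Eddard": ["ned", "eddard"]})
# print(count_characters(kp, "gameofthrones.txt").most_common())
```

The function only relies on the processor having an extract_keywords method, so it works with the KeywordProcessor built in the question.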

First, it makes no sense to search an empty file for keywords.
Second, the method extract_keywords expects a string, not a file object; lower is a string method, not a method of file objects.

Related

How to delete everything inside an object in a json file but keep the object?

I want to delete everything in the object "names" in the example JSON file below but keep the object itself; in other words, I want to clear the object.
{
"names": [
{
"player": "Player_Name",
"TB:": "12389",
"BW:": "596",
"SW:": "28",
"CQ:": "20"
}
]
}
I tried this code:
with open('players.json', 'w') as w:
    with open('players.json', 'r') as r:
        for line in r:
            element = json.loads(line.strip())
            if 'names' in element:
                del element['names']
            w.write(json.dumps(element))
but it just clears the whole JSON file.
Sorry for my bad English.

The problem is that you open the same file twice, for reading and for writing at the same time. Opening a file with mode 'w' truncates it immediately, so by the time you read it, it is already empty. Also, a JSON document like this cannot be parsed line by line, only as a whole.
import json
# 1. read the whole document
with open('players.json', 'r') as r:
    data = json.load(r)
# 2. modify
# (you might want to check that data is a dict)
data['names'] = []
# 3. write; json.dump returns None, so don't assign its result
with open('players.json', 'w') as w:
    json.dump(data, w)
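To avoid the truncation problem entirely, a common pattern is to write the modified data to a temporary file in the same directory and then swap it in with os.replace(), so the original file is never left half-written. A minimal sketch, assuming the top level of the file is an object; the function name clear_key is hypothetical, and it resets the value to an empty container of the same type.

```python
import json
import os
import tempfile


def clear_key(path, key):
    """Empty the value stored under `key` in a JSON file while keeping the key."""
    # 1. read the whole document before opening anything for writing
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict) or key not in data:
        return data  # nothing to clear
    # keep the container type: [] for lists, {} for dicts, None otherwise
    value = data[key]
    if isinstance(value, list):
        data[key] = []
    elif isinstance(value, dict):
        data[key] = {}
    else:
        data[key] = None
    # 2. write to a temporary file in the same directory, then swap it in
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # don't leave the temporary file behind
        raise
    return data
```

Calling clear_key('players.json', 'names') would leave {"names": []} in the file.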

How can I parse dictionary-like output in Python?

I have this code.
import json
with open('data5.json', 'r') as myfile:
    data = myfile.read()
data_2 = data.replace('[', ' ')
data_3 = data_2.replace(']', ' ')
print(data_3)
My variable data_3 looks like this:
{"Manufacturer": "VMware, Inc.", "Model": "VMware7,1", "Name": "DC01"} , {"Index": "1", "IPAddress": "192.168.1.240,fe80::350e:d28d:14a5:5cbb" }
I want to parse this to get the value of Manufacturer, Model, or Name. After parsing, I will save them into the database. How can I do it?
import json
with open('data5.json', 'r') as myfile:
    data = json.load(myfile)
for i in data:
    print(data['Name'])
I tried this code to get Name, but it gives me this error:
Traceback (most recent call last):
File "C:\Users\DELL\AppData\Roaming\JetBrains\PyCharmCE2021.3\scratches\scratch_38.py", line 5, in <module>
print(data['Name'])
TypeError: list indices must be integers or slices, not str

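Your data is JSON, so use the json module from the standard library instead of replacing brackets by hand:
import json
with open('data5.json', 'r') as myfile:
    data = json.load(myfile)
No need to do any manual parsing. The TypeError happens because the top level of your file is a list, not a dictionary: inside your loop, look up i['Name'] (assuming each item is a dictionary) instead of data['Name'].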
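Because the top level is a list, the dictionaries have to be picked out of it before looking up keys. The sketch below walks any nesting of lists (the bracket-stripping in the question hints there may be more than one level, but that is an assumption) and collects the requested fields with .get(), so a missing key becomes None. The names iter_dicts and extract_fields are hypothetical.

```python
import json


def iter_dicts(node):
    """Yield every dict found in a (possibly nested) list structure."""
    if isinstance(node, dict):
        yield node
    elif isinstance(node, list):
        for item in node:
            yield from iter_dicts(item)


def extract_fields(data, fields=("Manufacturer", "Model", "Name")):
    """Return one tuple of field values per dict that has at least one of the fields."""
    rows = []
    for record in iter_dicts(data):
        if any(field in record for field in fields):
            rows.append(tuple(record.get(field) for field in fields))
    return rows


# sample shaped like the data shown in the question
sample = ('[{"Manufacturer": "VMware, Inc.", "Model": "VMware7,1", "Name": "DC01"}, '
          '{"Index": "1", "IPAddress": "192.168.1.240,fe80::350e:d28d:14a5:5cbb"}]')
print(extract_fields(json.loads(sample)))
# prints [('VMware, Inc.', 'VMware7,1', 'DC01')]
```

With the real file you would pass json.load(myfile) instead of the sample, and each tuple can then go into a parameterized database insert.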

How to get the value of the word in the file and store it in another file using python

The file has the following line:
{"skipfilesyscheck" : 1, "component" : "Content Store", "script" : "tests/functional/cmeta_cache/test_cmetacache_ingest_kill_ddfs.py", "testname" : "CMetaCache_Ingest_kill_ddfs", "params" : " --ddrs=$DDRS --clients=$LOAD_CLIENT --log_level=DEBUG --config_file=/auto/tools/qa/shared/qa-branch/hashlist/cfg/juno/juno_cmeta_config.yaml -s", "numddr" : 1, "timeout" : 7200}
I want only the value of component (i.e., Content Store) and testname (i.e., CMetaCache_Ingest_kill_ddfs), stored in another file as below:
CMetaCache_Ingest_kill_ddfs Content Store
If the component or testname key is not found in the file, "NONE" should be saved in its place.

import json
s = '''{"skipfilesyscheck" : 1, "component" : "Content Store", "script" : "tests/functional/cmeta_cache/test_cmetacache_ingest_kill_ddfs.py", "testname" : "CMetaCache_Ingest_kill_ddfs", "params" : " --ddrs=$DDRS --clients=$LOAD_CLIENT --log_level=DEBUG --config_file=/auto/tools/qa/shared/qa-branch/hashlist/cfg/juno/juno_cmeta_config.yaml -s", "numddr" : 1, "timeout" : 7200}
'''
s = json.loads(s)
print(s.get("component", "NONE"))
print(s.get("testname", "NONE"))
The line is JSON, so use the json module to load it and then access it as a dictionary. The second argument to .get() is the default value, returned when the key is missing.

I'm assuming the data is stored in a JSON file? In any case, this code achieves what you're trying to do:
import json
# loads the file into a dictionary
with open('file.json', 'r') as f:
    file = json.load(f)
# tries to grab the values corresponding to the keys
# if not found then returns 'NONE'
output1 = file.get('testname', 'NONE')
output2 = file.get('component', 'NONE')
# writes to a new file, joined by spaces
with open('out.txt', 'w') as f:
    f.write(' '.join([output1, output2]))
output:
CMetaCache_Ingest_kill_ddfs Content Store
You could use this to loop through files, grab the two (or more) parts you want, and write them to a new file. You could also wrap it in a function if you like.
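Following that suggestion, here is a sketch that assumes the input file holds one JSON object per line (like the line in the question) and writes one "testname component" line per input line. The function name extract_tests and the file names are hypothetical; blank lines are skipped.

```python
import json


def extract_tests(in_path, out_path, keys=("testname", "component")):
    """Write the requested values, space-separated, one output line per JSON line."""
    count = 0
    with open(in_path, "r", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # .get() falls back to "NONE" when a key is missing
            values = [str(record.get(key, "NONE")) for key in keys]
            dst.write(" ".join(values) + "\n")
            count += 1
    return count  # number of records written
```

For example, extract_tests('tests.txt', 'out.txt') would produce lines such as "CMetaCache_Ingest_kill_ddfs Content Store".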

Python get variables from JSON: Unexpected errors

I'm trying to create a script that will read a JSON file and use its values to select particular folders and files and save them somewhere else.
My JSON is as follows:
{
"source_type": "folder",
"tar_type": "gzip",
"tar_max_age": "10",
"source_include": {"/opt/myapp/config", "/opt/myapp/db, /opt/myapp/randomdata"}
"target_type": "tar.gzip",
"target_path": "/home/user/targetA"
}
So far, I have this Python Code:
import time
import os
import tarfile
import json
source_config = '/opt/myapp/config.JSON'
target_dir = '/home/user/targetA'
def main():
    with open('source_config', "r").decode('utf-8') as f:
        data = json.loads('source_config')
    for f in data["source_include", str]:
        full_dir = os.path.join(source, f)
        tar = tarfile.open(os.path.join(backup_dir, f+ '.tar.gzip'), 'w:gz')
        tar.add(full_dir)
        tar.close()
    for oldfile in os.listdir(backup_dir):
        if str(oldfile.time) < 20:
            print("int(oldfile.time)")
My traceback is:
Traceback (most recent call last):
File "/Users/user/Documents/myapp/test/point/test/Test2.py", line 16, in <module>
with open('/opt/myapp/config.json', "r").decode('utf-8') as f:
AttributeError: 'file' object has no attribute 'decode'
How do I fix this?

You are trying to call .decode() directly on the file object. You would normally decode the data you read from the file instead. For JSON, however, you don't need to do this: the json library handles it for you.
Use json.load() (no s) to load directly from the file object:
with open(source_config) as f:
    data = json.load(f)
Next, loop over the source_include key directly and add each entry to its own archive:
for entry in data["source_include"]:
    base_filename = os.path.basename(entry)
    tar = tarfile.open(os.path.join(backup_dir, base_filename + '.tar.gzip'), 'w:gz')
    tar.add(entry)
    tar.close()
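Your JSON also needs to be fixed so that source_include is an array rather than an object; the misplaced quotes around "/opt/myapp/db, /opt/myapp/randomdata" and the missing comma after that line also make it invalid: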
{
"source_type": "folder",
"tar_type": "gzip",
"tar_max_age": "10",
"source_include": ["/opt/myapp/config", "/opt/myapp/db", "/opt/myapp/randomdata"],
"target_type": "tar.gzip",
"target_path": "/home/user/targetA"
}
Next, you loop over filenames with os.listdir(), which returns strings (relative filenames with no path). Strings do not have a .time attribute; if you want to read file timestamps, use os.stat() instead:
for filename in os.listdir(backup_dir):
    path = os.path.join(backup_dir, filename)
    stats = os.stat(path)
    if stats.st_mtime < time.time() - 20:
        # file was last modified more than 20 seconds ago
        print(path)
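Putting the pieces together, here is a minimal Python 3 sketch that reads the corrected config, archives each source_include entry into target_path, and lists files older than a given age. It assumes the corrected JSON above; the function names are hypothetical, it uses the conventional .tar.gz extension instead of .tar.gzip, and because the unit of tar_max_age is not stated, the age threshold is taken in seconds.

```python
import json
import os
import tarfile
import time


def backup_from_config(config_path):
    """Create one gzip-compressed tarball per source_include entry; return their paths."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    target_dir = config["target_path"]
    os.makedirs(target_dir, exist_ok=True)
    created = []
    for entry in config["source_include"]:
        # normpath strips a trailing slash so basename is never empty
        name = os.path.basename(os.path.normpath(entry))
        archive_path = os.path.join(target_dir, name + ".tar.gz")
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.add(entry, arcname=name)  # store only the folder name, not the full path
        created.append(archive_path)
    return created


def files_older_than(directory, max_age_seconds):
    """Return regular files in `directory` last modified more than max_age_seconds ago."""
    cutoff = time.time() - max_age_seconds
    old = []
    for filename in os.listdir(directory):
        path = os.path.join(directory, filename)
        if os.path.isfile(path) and os.stat(path).st_mtime < cutoff:
            old.append(path)
    return old
```

Entries that share a base name (for example two folders both called config) would overwrite each other's archive, so you may want to add a prefix if that can happen.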

Finding the "a" string in file within a block for one time using python

I have a file that contains 3 blocks:
Block1:
a
Block2:
a
Block3:
a
I want to find the "a" string under "Block1:" only, using Python.
I have written code that searches for "Block1:" and the string, but it returns every "a" in the file:
file = open( "c:\Textfile.txt", "r" ).readlines()
var=raw_input("enter the value")
var1="// Block1:"
for line in file:
    if re.search(var1,line,re.IGNORECASE):
        print re.search(var,line,re.IGNORECASE)
        print "found",line
for line in file:
    if re.search(var,line,re.IGNORECASE):
        print "value=",line

I am assuming your Textfile.txt looks like this:
Block1: test is t
Block2: test is u
Block3: test is V
You can do it like this:
import re
file = open("Textfile.txt", "r").readlines()
var = raw_input("enter the value")
for line in file:
    if line.find('Block1:') != -1:
        if re.search(var, line, re.IGNORECASE):
            print "value=", line
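If the file really has the layout shown in the question, with each "BlockN:" label on its own line followed by its content, one option is to remember which block you are currently in and only search lines inside Block1. A Python 3 sketch, assuming a label line is any line whose stripped text is "Block<number>:", optionally preceded by "//" (the question's var1 includes it); the function name find_in_block is hypothetical. Take the first element of the result if you only need one match.

```python
import re

# a label line such as "Block1:" or "// Block2:" (the leading // is optional)
LABEL = re.compile(r"^\s*(?://\s*)?(block\d+):\s*$", re.IGNORECASE)


def find_in_block(lines, block, pattern):
    """Return the lines inside `block` that match `pattern` (case-insensitive)."""
    current = None
    matches = []
    for line in lines:
        label = LABEL.match(line)
        if label:
            current = label.group(1).lower()  # entering a new block
            continue
        if current == block.lower() and re.search(pattern, line, re.IGNORECASE):
            matches.append(line.rstrip("\n"))
    return matches


# Example usage (Python 3, so input() instead of raw_input()):
# with open("Textfile.txt") as f:
#     print(find_in_block(f, "Block1", input("enter the value")))
```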
