MRJob open JSON file - Python

I am trying to load a JSON file as part of the mapper function, but it returns "No such file or directory" although the file exists.
I am already opening one file and parsing through its lines, but I want to compare some of its values to a second JSON file.
from mrjob.job import MRJob
import json
import nltk
import re

WORD_RE = re.compile(r"\b[\w']+\b")
sentimentfile = open('sentiment_word_list_stemmed.json')

def mapper(self, _, line):
    stemmer = nltk.PorterStemmer()
    stems = json.loads(sentimentfile)
    line = line.strip()
    # each line is a json line
    data = json.loads(line)
    form = data.get('type', None)
    if form == 'review':
        bs_id = data.get('business_id', None)
        text = data['text']
        stars = data['stars']
        words = WORD_RE.findall(text)
        for word in words:
            w = stemmer.stem(word)
            senti = stems.get[w]
            if senti:
                yield (bs_id, (senti, 1))

You should not be opening a file in the mapper function at all. You only need to pass the file in as STDIN or as the first argument, and the mapper will pick it up. Do it like this:
python mrjob_program.py sentiment_word_list_stemmed.json > output
OR
python mrjob_program.py < sentiment_word_list_stemmed.json > output
Either one will work. It says that there is no such file or directory because the mappers are not able to see the file you are specifying: mappers are designed to run on remote machines. Even if you wanted to read from a file in the mapper, you would need to copy the file to every machine in the cluster, which doesn't really make sense for this example. You can also specify a DEFAULT_INPUT_PROTOCOL so that the mapper knows which type of input you are using.
Here is a talk on the subject that will help:
http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2011-mrjob-distributed-computing-for-everyone-4898987/

You are using the json.loads() function while passing in an open file object. Use json.load() instead (note: no s).
stems = json.load(sentimentfile)
You do need to re-open the file every time your mapper() function is called, because json.load() reads the whole file and leaves the pointer at the end, so a second call on the same handle would fail. Better to just store the filename globally:
sentimentfile = 'sentiment_word_list_stemmed.json'
def mapper(self, _, line):
    stemmer = nltk.PorterStemmer()
    stems = json.load(open(sentimentfile))
Last but not least, you should use an absolute path to the filename, and not rely on the current working directory being correct.
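Putting both answers together, here is a minimal sketch of the job (the class name ReviewSentiment is made up for illustration, and the sentiment file is assumed to sit next to the script; on a real cluster you would still need to ship that file to the task machines, e.g. with mrjob's --file upload option):
import json
import os
import re

import nltk
from mrjob.job import MRJob

WORD_RE = re.compile(r"\b[\w']+\b")
# absolute path derived from the script location, not the working directory
SENTIMENT_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                              'sentiment_word_list_stemmed.json')

class ReviewSentiment(MRJob):
    def mapper_init(self):
        # runs once per task: build the stemmer and load the lookup table here,
        # instead of once per input line
        self.stemmer = nltk.PorterStemmer()
        with open(SENTIMENT_PATH) as f:
            self.stems = json.load(f)   # load(), not loads(): we have a file object

    def mapper(self, _, line):
        data = json.loads(line.strip())   # each input line is a JSON document
        if data.get('type') == 'review':
            bs_id = data.get('business_id')
            for word in WORD_RE.findall(data['text']):
                senti = self.stems.get(self.stemmer.stem(word))  # dict lookup: .get(), not .get[]
                if senti:
                    yield bs_id, (senti, 1)

if __name__ == '__main__':
    ReviewSentiment.run()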


random/empty characters while re-editing a json file

I apologize for the vague definition of my problem in the title, but I really can't figure out what sort of problem I'm dealing with. So, here it goes.
I have python file:
edit-json.py
import os, json

def add_rooms(data):
    if(not os.path.exists('rooms.json')):
        with open('rooms.json', 'w'): pass
    with open('rooms.json', 'r+') as f:
        d = f.read() # take existing data from file
        f.truncate(0) # empty the json file
        if(d == ''): rooms = [] # check if data is empty i.e. the file was just created
        else: rooms = json.loads(d)['rooms']
        rooms.append({'name': data['roomname'], 'active': 1})
        f.write(json.dumps({"rooms": rooms})) # write new data (rooms list) to the json file

add_rooms({'roomname': 'friends'})
This Python script basically creates a file rooms.json (if it doesn't exist), grabs the data (an array) from the JSON file, empties the file, then finally writes the new data into it. All this is done in the function add_rooms(), which is then called at the end of the script. Pretty simple stuff.
So, here's the problem: I run the script once and nothing weird happens, i.e. the file is created and the data inside it is:
{"rooms": [{"name": "friends"}]}
But the weird stuff happens when I run the script again.
What I should see:
{"rooms": [{"name": "friends"}, {"name": "friends"}]}
What I see instead: the expected JSON preceded by a run of strange characters.
I apologize, I had to post an image because for some reason I couldn't copy the text I got, and I obviously can't run the script again (for the third time) because the JSON parser gives an error due to those characters.
I obtained this result in an online compiler. On my local Windows system, I get extra whitespace instead of those extra symbols.
I can't figure out what causes it. Maybe I'm doing the file handling incorrectly? Or is it due to the json module? Or am I the only one getting this result?
When you truncate the file, the file pointer is still at the end of the old data; the next write then happens at that offset, and the gap before it is filled with null bytes, which show up as those strange symbols (or as whitespace on Windows). Use f.seek(0) to move back to the start of the file before writing:
import os, json

def add_rooms(data):
    if(not os.path.exists('rooms.json')):
        with open('rooms.json', 'w'): pass
    with open('rooms.json', 'r+') as f:
        d = f.read() # take existing data from file
        f.truncate(0) # empty the json file
        f.seek(0) # <<<<<<<<< add this line
        if(d == ''): rooms = [] # check if data is empty i.e. the file was just created
        else: rooms = json.loads(d)['rooms']
        rooms.append({'name': data['roomname'], 'active': 1})
        f.write(json.dumps({"rooms": rooms})) # write new data (rooms list) to the json file

add_rooms({'roomname': 'friends'})
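An alternative sketch that sidesteps the truncate/seek bookkeeping entirely: read first, then reopen the file in 'w' mode, which truncates and positions the pointer at the start for you (same rooms.json layout assumed):
import json, os

def add_rooms(data, path='rooms.json'):
    rooms = []
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()
        if content:
            rooms = json.loads(content)['rooms']
    rooms.append({'name': data['roomname'], 'active': 1})
    with open(path, 'w') as f:  # 'w' truncates and starts writing at offset 0
        json.dump({'rooms': rooms}, f)

add_rooms({'roomname': 'friends'})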

Filter out Bad Words From a Text File In Python

I want to read the contents of a text file and filter out the bad words from it.
{UPDATED} So I updated my Python file after what you told me, but now I am getting a "no module" error. When I run pip3 install profanity_filter it gives me a big error.
I researched and found profanity_filter, but it is not working:
from profanity_filter import ProfanityFilter
pf = ProfanityFilter()
with open(input("Enter the name Of Your File"), "r") as myFile:
    j = myFile.read()
filtered = pf.censor(j)
print(filtered)
Here is the error I get when I try to run my file
You need to create an instance of the class and then call its censor() method.
And the argument has to be a string, not a list of strings: call read() instead of readlines() to get the entire file as a single string.
from profanity_filter import ProfanityFilter
pf = ProfanityFilter()
with open(input("Enter the name Of Your File"), "r") as myFile:
    j = myFile.read()
filtered = pf.censor(j)
print(filtered)
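If profanity_filter simply won't install, a minimal sketch of the same idea with a hand-rolled word list (BAD_WORDS is a placeholder; fill it with the words you actually want masked):
import re

BAD_WORDS = {'darn', 'heck'}  # placeholder entries for illustration

def censor(text):
    # replace each listed word with asterisks of the same length, case-insensitively
    def mask(match):
        word = match.group(0)
        return '*' * len(word) if word.lower() in BAD_WORDS else word
    return re.sub(r"\b[\w']+\b", mask, text)

with open(input("Enter the name Of Your File"), "r") as myFile:
    print(censor(myFile.read()))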

Using ConfigParser to read non-standard config files

I have a config file of the form
# foo.conf
[section1]
foo=bar
buzz=123
[section2]
line1
line2
line3
that I want to parse using the Python ConfigParser library. Note that section2 does not contain key/value pairs but some raw text instead. I would like to be able to read all the (raw) content of section2 into a variable.
Does ConfigParser allow me to read this file or can one of its classes be subclassed in an easy manner to do so?
Using the standard
import ConfigParser
config = ConfigParser.ConfigParser()
config.read('foo.conf')
yields ConfigParser.ParsingError: File contains parsing errors: foo.conf
You could try to use an io adapter to transform the input file into a format suitable for ConfigParser. One way to do that is to transform every plain line that is neither an empty line, nor a comment line, nor a section line, nor a key=value line into line{i}=original_line, where i is incremented at each such line and restarts at 1 in each section.
A possible implementation:
import io
import re

class ConfParsAdapter(io.RawIOBase):
    @staticmethod
    def _confParsAdapter(fd):
        num = 1
        rxsec = re.compile(r'\[.*\]( *#.*)?$')
        rxkv = re.compile(r'.+?=.*')
        rxvoid = re.compile(r'(#.*)?$')
        for line in fd:
            if rxsec.match(line.strip()):
                num = 1
            elif rxkv.match(line) or rxvoid.match(line.strip()):
                pass
            else:
                line = 'line{}={}'.format(num, line)
                num += 1
            yield line

    def __init__(self, fd):
        self.fd = self._confParsAdapter(fd)

    def readline(self, hint=-1):
        try:
            return next(self.fd)
        except StopIteration:
            return ""
That way, you can use it with your current file without changing anything in it:
>>> parser = ConfigParser.RawConfigParser()
>>> parser.readfp(ConfParsAdapter(open('foo.conf')))
>>> parser.sections()
['section1', 'section2']
>>> parser.items('section2')
[('line1', 'line1'), ('line2', 'line2'), ('line3', 'line3')]
As far as I know, ConfigParser cannot do this:
The ConfigParser class implements a basic configuration file parser
language which provides a structure similar to what you would find on
Microsoft Windows INI files.
It seems that your conf file is not a basic configuration file, so there are two ways you could deal with it:
Read the conf file and rewrite it on the fly before parsing (see the sketch below).
Generate a valid configuration file.
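A minimal sketch of that pre-processing idea, assuming Python 3's configparser (read_string has been available since 3.2); it rewrites the raw lines as numbered keys and hands the result to a stock parser:
import configparser
import re

def load_with_raw_sections(path):
    fixed, num = [], 1
    with open(path) as f:
        for line in f:
            stripped = line.strip()
            if re.match(r'\[.*\]$', stripped):
                num = 1  # new section: restart the numbering
                fixed.append(line)
            elif stripped == '' or stripped.startswith('#') or '=' in line:
                fixed.append(line)  # already parseable as-is
            else:
                fixed.append('line{}={}'.format(num, line))
                num += 1
    parser = configparser.ConfigParser()
    parser.read_string(''.join(fixed))
    return parser

parser = load_with_raw_sections('foo.conf')
print(parser.items('section2'))  # [('line1', 'line1'), ('line2', 'line2'), ('line3', 'line3')]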

How to copy one file based on a variable

I am building a Python script to edit lots of files by searching and replacing words in them.
There is an original file named C:\python 3.5/remedy line 1.ahk.
There is a file containing the words I want to replace (search words) in the original document, and a text file that has the list of the new words that I would like placed into the final document.
The script runs and works perfectly. The final document is then created and named based on a line in the final text file (code begins on line 72), so I can tell what the final product is by looking at it. This file is originally created as output = open("C:\python 3.5\output.ahk", 'w') and later in the script it is renamed based on line 37 of the text file. All that works fine.
So the seemingly simple part left that I can't figure out is how to take this one file and move it to the directory where it belongs. That directory is created from the same line the file gets its name from (code starts on line 82). How do I move my file into a directory that has been created by the script, i.e. based on a variable (code starts on line 84), when the name of the file is also based on a variable?
import shutil
# below is where your modified file sits, before we move it into its own
# directory named dst, based on a variable mainnewdir
srcdir = r'C:\python 3.5/'+(justfilename)
dst = (mainnewdir)+(justfilename)
shutil.copyfile(src, dst)
Why does it format it with extra \ in the code?
Why does it seem not to give me an error if I use a / vs. a \ slash?
Here is the entire code; like I said, only the last part, moving the file, does not work:
import os
import linecache
import sys
import string
import re

## information/replacingvalues.txt this is the text of the values you want in your final document
#information = open("C:\python 3.5\replacingvalues.txt", 'r')
information = open("C:\python 3.5/replacingvalues.txt", 'r')
# information = open("C:\Program Files (x86)\Python35-32\Scripts\Text_Find_and_Replace\information/replacingvalues.txt",
# Text_Find_and_Replace\Result/output.txt This is the dir and the sum or final document
# output = open("C:\python 3.5\output.ahk", 'w')
createblank = open("C:\python 3.5/output.ahk", 'w')
createblank.close()
output = open("C:\python 3.5\output.ahk", 'w')
# field = open("C:\Program Files (x86)\Python35-32\Scripts\Text_Find_and_Replace\Field/values.txt"
# Field is the file or words you will be replacing
field = open("C:\python 3.5/values.txt", 'r')
# modified code for autohot key
# Text_Find_and_Replace\Test/remedy line 1.ahk is the original doc you want modified
with open("C:\python 3.5/remedy line 1.ahk", 'r') as myfile:
    inline = myfile.read()
## remedy line 1.ahk
informations = []
fields = []
dictionary = {}
i = 0
for line in information:
    informations.append(line.splitlines())
for lines in field:
    fields.append(lines.split())
    i = i + 1
if (len(fields) != len(informations)):
    print("replacing values and values have different numbers")
    exit()
else:
    for i in range(0, i):
        rightvalue = str(informations[i])
        rightvalue = rightvalue.strip('[]')
        rightvalue = rightvalue[1:-1]
        leftvalue = str(fields[i])
        leftvalue = leftvalue.strip('[]')
        leftvalue = leftvalue.strip("'")
        dictionary[leftvalue] = rightvalue
robj = re.compile('|'.join(dictionary.keys()))
result = robj.sub(lambda m: dictionary[m.group(0)], inline)
output.write(result)
information.close()
output.close()
field.close()

import os
import linecache
linecache.clearcache()
newfilename = linecache.getline("C:\python 3.5/remedy line 1.txt", 37)
filename = ("C:\python 3.5/output.ahk")
os.rename(filename, newfilename.strip())
#os.rename(filename, newfilename.strip()+".ahk")
linecache.clearcache()
############## below will create a new directory based on the word or words in line 37 of the txt file.
newdirname = linecache.getline("C:\python 3.5/remedy line 1.txt", 37)
#newpath = r'C:\pythontest\automadedir'
# below removes the \n, i.e. the trailing newline
justfilename = (newdirname).strip()
# below removes the .txt from the rest of the justfilename
autocreateddir = (justfilename).strip(".txt")
# below is an example of combining a string and a variable
# below makes up the variable that will be the name of the new directory, based on reading line 37 of the text file above
mainnewdir = r'C:\pythontest\automadedir/'+(autocreateddir)
if not os.path.exists(mainnewdir):
    os.makedirs(mainnewdir)
linecache.clearcache()
# ####################################################
# below is where your modified file sits, before we move it into its own directory named dst, based on a variable mainnewdir
srcdir = r'C:\python 3.5/'+(justfilename)
dst = (mainnewdir)+(justfilename)
shutil.copyfile(src, dst)
Backslashes do not have a mind of their own.
When you paste Windows paths as-is and they contain \n, \r, \b, \x, \v, \U (Python 3) and so on (see the escape-sequence table in the Python reference for the full list), you're using escape sequences without noticing it.
When the escape sequence doesn't exist (e.g. \p), Python leaves the backslash alone and the path works. But when it is a known escape, the resulting filename is invalid, which explains the apparent randomness of the issue.
To be able to safely paste Windows paths without changing/escaping them, just use the raw prefix:
my_file = r"C:\temp\foo.txt"
so the backslashes won't be interpreted. One exception though: if the string ends with a backslash, you still have to double it.
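To see what the escapes do, a short sketch (the paths are made up for illustration):
import os

bad = "C:\temp\new_file.txt"       # \t and \n are real escape sequences: a tab and a newline sneak in
print(repr(bad))                    # shows the mangled string rather than the path you typed

good = r"C:\temp\new_file.txt"      # raw string: backslashes are kept literally
best = os.path.join("C:\\", "temp", "new_file.txt")
print(good == best)                 # True when run on Windows

Forward slashes work because the Windows file APIs accept them too, which is why mixing / and \ often still opens the file. As for the final copy step in the script above: building src and dst with os.path.join, passing the variable that was actually defined (srcdir, not the undefined src), and adding import shutil to the main script should get the move working.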

Maya Python - Embed zip file into maya file?

This was a suggestion from another Stack Overflow thread that I'm finally getting back to. It was part of a discussion about how to embed tools into a Maya file.
You can write the whole thing as a python package, zip it, then stuff the binary contents of the zip file into a fileInfo. When you need the code, look for it in the user's $MAYA_APP_DIR; if there's no zip, write the contents of the fileInfo to disk as a zip and then insert the zip into sys.path
Source discussions were:
python copy scripts into script
and Maya Python Create and Use Zipped Package?
So far the programming is going okay, but I think I hit a snag. When I attempt this:
with open("..directory/myZip.zip","rb") as file:
cmds.fileInfo("myZip", file.read())
..and then I..
print cmds.fileInfo("myZip",q=1)
I get
[u'PK\003\004\024']
which is a bad translation of the first line of gibberish when reading the zip file as a text document.
How can I embed my zip file into my maya file as binary?
====================
Update:
Maya doesn't like having the raw bytes of the zip file written straight into fileInfo. I found various methods of converting them into an acceptable string that could be written, but decoding back to a file didn't appear to work. I see now that Theodox's suggestion was to convert the contents to binary and put that in the fileInfo node.
How can one encode, store, and then decode to write to file later?
If I were to convert to binary using, for instance:
' '.join(format(ord(x), 'b') for x in line)
what code would I need to turn that back into the original zip data?
you can find related code here:
http://tech-artists.org/forum/showthread.php?4161-Maya-API-Singleton-Nodes&highlight=mayapersist
the relevant bit is
import base64
encoded = base64.b64encode(value)
decoded = base64.b64decode(encoded)
basically it's the same idea, except using the base64 module instead of binascii. Any method of converting an arbitrary character stream to an ascii-safe representation will work fine, as long as you use a reversible method: the potential problem you need to watch out for is a character in the data block that looks to Maya like a close quote - a stray quote in the fileInfo will be messy in an MA file.
This example uses YAML to do arbitrary key-value pairs, but that part is irrelevant to storing the binary stuff. I've used this technique for fairly large data (up to 640k if I recall) but I don't know if it has an upper limit in terms of what you can stash in Maya.
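For reference, a minimal sketch of that round trip, assuming a Python 2 era Maya (where str is bytes) and an arbitrary fileInfo key name:
import base64
import maya.cmds as cmds

# store: read the zip as raw bytes and keep an ascii-safe copy in the scene file
with open("myZip.zip", "rb") as f:
    cmds.fileInfo("myZip", base64.b64encode(f.read()))

# restore: fileInfo queries return a list, so take the first element and decode it
stored = cmds.fileInfo("myZip", q=1)[0]
with open("restored.zip", "wb") as f:
    f.write(base64.b64decode(stored))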
Found the answer, thanks to a great script on Stack Overflow. I had to encode to 'string_escape', something I found while trying to figure out the whole character situation. Anyway: you open the zip, encode its contents to 'string_escape', write that to fileInfo, and then, before writing it back out to a zip, you decode it again.
Convert binary to ASCII and vice versa
import maya.cmds as cmds
import binascii

def text_to_bits(text, encoding='utf-8', errors='surrogatepass'):
    bits = bin(int(binascii.hexlify(text.encode(encoding, errors)), 16))[2:]
    return bits.zfill(8 * ((len(bits) + 7) // 8))

def text_from_bits(bits, encoding='utf-8', errors='surrogatepass'):
    n = int(bits, 2)
    return int2bytes(n).decode(encoding, errors)

def int2bytes(i):
    hex_string = '%x' % i
    n = len(hex_string)
    return binascii.unhexlify(hex_string.zfill(n + (n & 1)))
And then you can:
with open("..\maya\scripts/test.zip","rb") as thing:
    texty = text_to_bits(thing.read().encode('string_escape'))
cmds.fileInfo("binaryZip",texty)
...later
with open("..\maya\scripts/test_2.zip","wb") as thing:
    texty = cmds.fileInfo("binaryZip",q=1)
    thing.write( text_from_bits( texty ).decode('string_escape') )
and this appears to work.. so far..
Figured I'd post the final product for anyone wanting to take this approach. I tried to do some corruption checking, so that a bad zip doesn't get passed between machines; that's what all the hash checking is for.
def writeTimeFull(tl):
    import TimeFull
    #reload(TimeFull)
    with open(TimeFull.__file__.replace(".pyc",".py"),"r") as file:
        cmds.scriptNode( tl.scriptConnection[1][0], e=1, bs=file.read() )
    cmds.expression("spark_timeliner_activator",
                    e=1,s='if (Spark_Timeliner.ShowTimeliner == 1)\n'
                          '{\n'
                          '\tsetAttr Spark_Timeliner.ShowTimeliner 0;\n'
                          '\tpython \"Timeliner.InitTimeliner()\";\n'
                          '}',
                    o="Spark_Timeliner",ae=1,uc=all)

def checkHash(zipPath,hash1,hash2,hash3):
    check = False
    hashes = [hash1,hash2,hash3]
    for ii, hash in enumerate(hashes):
        if hash == hashes[(ii+1)%3]:
            hashes[(ii+2)%3] = hashes[ii]
            check = True
    if check:
        if md5(zipPath) == hashes[0]:
            return [zipPath,hashes[0],hashes[1],hashes[2]]
    else:
        cmds.warning("Hash checks and/or zip are corrupted. Attain toolbox_fix.zip, put it in scripts folder and restart.")
    return []

#this writes the zip file to the local users space
def saveOutZip(filename):
    if os.path.isfile(filename):
        if not os.path.isfile(filename.replace('_pkg','_'+__version__)):
            os.rename(filename,filename.replace('_pkg','_'+__version__))
    with open(filename,"w") as zipFile:
        zipInfo = cmds.fileInfo("zipInfo1",q=1)[0]
        zipHash_1 = cmds.fileInfo("zipHash1",q=1)[0]
        zipHash_2 = cmds.fileInfo("zipHash2",q=1)[0]
        zipHash_3 = cmds.fileInfo("zipHash3",q=1)[0]
        zipFile.write( base64.b64decode(zipInfo) )
    if checkHash(filename,zipHash_1,zipHash_2,zipHash_3):
        cmds.fileInfo("zipInfo2",zipInfo)
        return filename
    with open(filename,"w") as zipFile:
        zipInfo = cmds.fileInfo("zipInfo2",q=1)[0]
        zipHash_1 = cmds.fileInfo("zipHash1",q=1)[0]
        zipHash_2 = cmds.fileInfo("zipHash2",q=1)[0]
        zipHash_3 = cmds.fileInfo("zipHash3",q=1)[0]
        zipFile.write( base64.b64decode(zipInfo) )
    if checkHash(filename,zipHash_1,zipHash_2,zipHash_3):
        cmds.fileInfo("zipInfo1",zipInfo)
        return filename
    return False

#this writes the local zip to this file
def loadInZip(filename):
    zipResults = []
    for ii in range(0,10):
        with open(filename,"r") as theRead:
            zipResults.append([base64.b64encode(theRead.read())]+checkHash(filename,md5(filename),md5(filename),md5(filename)))
        if ii>0 and zipResults[ii]==zipResults[ii-1]:
            cmds.fileInfo("zipInfo1",zipResults[ii][0])
            cmds.fileInfo("zipInfo2",zipResults[ii-1][0])
            cmds.fileInfo("zipHash1",zipResults[ii][2])
            cmds.fileInfo("zipHash2",zipResults[ii][3])
            cmds.fileInfo("zipHash3",zipResults[ii][4])
            return True

#file check
#http://stackoverflow.com/questions/3431825/generating-a-md5-checksum-of-a-file
def md5(fname):
    import hashlib
    hash = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash.update(chunk)
    return hash.hexdigest()

filename = path+'/toolbox_pkg.zip'
zipPaths = [path+'/toolbox_update.zip',
            path+'/toolbox_fix.zip',
            path+'/toolbox_'+__version__+'.zip',
            filename]
zipPaths_exist = [os.path.isfile(zipPath) for zipPath in zipPaths]

if any(zipPaths_exist[:2]):
    if zipPaths_exist[0]:
        cmds.warning('Timeliner update present. Forcing file to update version')
        if zipPaths_exist[2]:
            os.remove(zipPaths[3])
        elif os.path.isfile(zipPaths[3]):
            os.rename(zipPaths[3], zipPaths[2])
        os.rename(zipPaths[0],zipPaths[3])
        if zipPaths_exist[1]:
            os.remove(zipPaths[1])
    else:
        cmds.warning('Timeliner fix present. Replacing file to the fix version.')
        if os.path.isfile(zipPaths[3]):
            os.remove(zipPaths[3])
        os.rename(zipPaths[1],zipPaths[3])
    loadInZip(filename)

if not cmds.fileInfo("zipInfo1",q=1) and not cmds.fileInfo("zipInfo2",q=1):
    loadInZip(filename)
if not os.path.isfile(filename):
    saveOutZip(filename)

sys.path.append(filename)
import Timeliner
Timeliner.InitTimeliner(theVers=__version__)

if not any(zipPaths[:2]):
    if __version__ > Timeliner.__version__:
        cmds.warning('Saving out newer version of timeliner to local machine. Restart Maya to access latest version.')
        saveOutZip(filename)
    elif __version__ < Timeliner.__version__:
        cmds.warning('Timeliner on machine is newer than file version. Saving machine version over timeliner in file.')
        loadInZip(filename)
        __version__ = Timeliner.__version__

if __name__ != "__main__":
    tl = getTimeliner()
    writeTimeFull(tl)
