python module re throwing odd AttributeError - python

I've always used the re module to do things such as re.match and re.sub, the basic stuff, and it's always worked fine for me.
All of a sudden, I'm getting an AttributeError when trying to use basic methods such as match and sub.
Here is some example code I made:
import re
regex = '^[a-z]{3}'
r = re.match(regex, 'asd')
print r
Here's the stacktrace:
Traceback (most recent call last):
File "te.py", line 4, in <module>
r = re.match(regex, 'asd')
AttributeError: module 're' has no attribute 'match'
I've never had problems with the module. I tried in both python 2.x and 3, same error. I'm not very knowledgeable about how imports work, so this is likely a simple mistake by me.
Thanks

Delete the re.py file in the same directory as te.py. You most likely made a typo while naming a test file. Your traceback shows that your current file is named te.py, and since t is next to r on the keyboard, this might explain everything.
Just to satisfy my curiosity, I created an empty re.py file in the same directory as te.py, which holds your code, and I got the same error as you did.

My guess is that you're getting something you don't expect for the re module that you're importing.
Maybe try this:
import re
print re.__file__
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.pyc
And see if the result you get is sensible.
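One way to make that check less dependent on eyeballing the path is to compare the module's file against the standard-library directory. This is a minimal sketch in Python 3 syntax; the helper name lives_under_stdlib_dir is made up for illustration, and it assumes a regular CPython install where the standard library is stored on disk:

```python
import os
import re
import sysconfig


def lives_under_stdlib_dir(module):
    """Return True if the module's file is inside the stdlib directory."""
    path = getattr(module, "__file__", None)
    if path is None:
        # Built-in and frozen modules have no file on disk to compare.
        return True
    stdlib_dir = os.path.normcase(os.path.realpath(sysconfig.get_paths()["stdlib"]))
    module_file = os.path.normcase(os.path.realpath(path))
    return module_file.startswith(stdlib_dir + os.sep)


print(re.__file__)                 # where the imported module really came from
print(lives_under_stdlib_dir(re))  # False would point at a stray re.py
```

On many installs site-packages sits inside the same directory, so this only distinguishes "somewhere in the Python installation" from "a file next to my script", which is the distinction that matters here.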

Related

Using Python, NLTK, to analyse German text

I am a beginner in Python and am currently trying to use NLTK to analyze German text (extracting the German nouns and their frequencies) by following this tutorial: https://datascience.blog.wzb.eu/2016/07/13/accurate-part-of-speech-tagging-of-german-texts-with-nltk/
There are several issues that I faced during the process and I am not able to solve them.
When I follow the website to execute the code below:
import random
tagged_sents = list(corp.tagged_sents())
random.shuffle(tagged_sents)
split_perc = 0.1
split_size = int(len(tagged_sents) * split_perc)
train_sents, test_sents = tagged_sents[split_size:], tagged_sents[:split_size]
and it comes out with this
Traceback (most recent call last):
File "test2.py", line 7, in <module>
tagged_sents = list(corp.tagged_sents())
File "C:\Users\User\anaconda3\lib\site-packages\nltk\corpus\reader\conll.py", line 130, in tagged_sents
return LazyMap(get_tagged_words, self._grids(fileids))
File "C:\Users\User\anaconda3\lib\site-packages\nltk\corpus\reader\conll.py", line 215, in _grids
return concat(
File "C:\Users\User\anaconda3\lib\site-packages\nltk\corpus\reader\util.py", line 433, in concat
raise ValueError("concat() expects at least one object!")
ValueError: concat() expects at least one object!
Then I try to fix by following this solution https://teamtreehouse.com/community/randomshuffle-crashes-when-passed-a-range-somenums-randomshufflerange5250
and alter the
tagged_sents = list(corp.tagged_sents())
to
tagged_sents = list(range(5,250))
The ValueError no longer appeared, but I don't know what (5,250) means, even though I have read the explanation.
Then I continued with the following step:
from ClassifierBasedGermanTagger.ClassifierBasedGermanTagger import ClassifierBasedGermanTagger
tagger = ClassifierBasedGermanTagger(train=train_sents)
And it shows
Traceback (most recent call last):
File "test1.py", line 90, in <module>
from ClassifierBasedGermanTagger.ClassifierBasedGermanTagger import ClassifierBasedGermanTagger
ModuleNotFoundError: No module named 'ClassifierBasedGermanTagger'
I have already downloaded ClassifierBasedGermanTagger.py and init.py and put them in the folder that is open in VS Code. I don't know whether that is correct, as the tutorial says:
'Using his Python class ClassifierBasedGermanTagger (which you can download from the github page) we can create a tagger and train it with the data from the TIGER corpus:'
Please help me to fix these problems, thanks!
First of all, welcome to StackOverflow! Before posting a question, please make sure that you have done your own research; most of the time that solves the problem.
Secondly, range(start, end) is a basic Python function that produces the integers from start up to, but not including, end. Using it the way you did does not solve the problem: it only replaces your tagged sentences with a list of numbers. I would suggest using print to see what data corp actually contains and debugging from there. Maybe corp is simply empty, and that is why you don't get any tagged_sents.
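To make the point about range concrete: range(5, 250) yields the integers 5 through 249, so substituting it for corp.tagged_sents() lets the split run, but on numbers rather than tagged sentences. A small sketch using only the standard library (the variable names are chosen for illustration):

```python
import random

numbers = list(range(5, 250))
print(numbers[:3], numbers[-1], len(numbers))  # [5, 6, 7] 249 245

# The same shuffle-and-split as in the tutorial, applied to plain integers.
random.shuffle(numbers)
split_size = int(len(numbers) * 0.1)
train, test = numbers[split_size:], numbers[:split_size]
print(len(train), len(test))  # 221 24
```

A tagger trained on such a list has nothing meaningful to learn from, which is why the empty corpus has to be fixed instead.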
For the import part, it is not clear to me where you put ClassifierBasedGermanTagger.py, but wherever it is, it is not visible to your code. You can try putting your code (test2.py) and ClassifierBasedGermanTagger.py in the same directory. Read the link below for more details on how imports work in Python.
https://docs.python.org/3/reference/import.html
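Note that the import statement in the question, from ClassifierBasedGermanTagger.ClassifierBasedGermanTagger import ..., only resolves if there is a directory named ClassifierBasedGermanTagger containing ClassifierBasedGermanTagger.py, and if the parent of that directory is on sys.path. The sketch below rebuilds that layout in a temporary directory to show it working; the empty class written into the file is only a stand-in for the real tagger, and the temporary location is an assumption made so that the example runs on its own:

```python
import importlib
import os
import sys
import tempfile

# Build <project>/ClassifierBasedGermanTagger/{__init__.py, ClassifierBasedGermanTagger.py}
project = tempfile.mkdtemp()
package_dir = os.path.join(project, "ClassifierBasedGermanTagger")
os.mkdir(package_dir)
open(os.path.join(package_dir, "__init__.py"), "w").close()
with open(os.path.join(package_dir, "ClassifierBasedGermanTagger.py"), "w") as handle:
    handle.write("class ClassifierBasedGermanTagger:\n    pass\n")  # stand-in class

# The directory that CONTAINS the package must be on sys.path.
sys.path.insert(0, project)
importlib.invalidate_caches()

from ClassifierBasedGermanTagger.ClassifierBasedGermanTagger import ClassifierBasedGermanTagger

print(ClassifierBasedGermanTagger.__module__)
```

If the .py file sits directly next to the script instead, with no package directory around it, the matching import would be from ClassifierBasedGermanTagger import ClassifierBasedGermanTagger.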

python console to call function

I know this is an easy fix, but could someone tell me how to import, from the Python console, a Python file whose name contains this symbol: -
Here is my mistake:
>>> import main #no error here
>>> import a1-devoir1
File "<input>", line 1
import a1-devoir1
SyntaxError: invalid syntax
You must name your files so that they contain only letters, digits, and underscores, and do not start with a digit. Every module you want to load with an import statement must follow this rule.
So rename your file to a1_devoir1.py and then try import a1_devoir1
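If renaming the file is not an option, the standard library's importlib.import_module takes the module name as a string, so the hyphen never has to pass through the parser of the import statement. A minimal sketch; the temporary directory and the greet function written into the file are made up so that the example runs on its own:

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding a file named like the one in the question.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "a1-devoir1.py"), "w") as handle:
    handle.write("def greet():\n    return 'hello from a1-devoir1'\n")

sys.path.insert(0, folder)     # make the directory importable
importlib.invalidate_caches()  # the file was created after interpreter startup

devoir = importlib.import_module("a1-devoir1")
print(devoir.greet())
```

Renaming remains the cleaner fix, since a plain import statement then works everywhere.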

Value Error: Substring not found from the script running path

I have the following code which is throwing an error: ValueError: Substring Not Found.
import os, sys
myCwd = os.path.abspath(__file__)
svtestcases = os.path.normpath('Tests/SVTestCases')
tcPath = myCwd[:myCwd.index(svtestcases) + len(svtestcases)]
sys.path.insert(0, tcPath)
The error is raised on the fourth line, by the myCwd.index(svtestcases) call.
The path of the Python script is: "C:\Netra_Step_2015\Tests\SVTestcases\TC-Regression"
What might be the issue? Also, why is there a ':' before myCwd.index? Can anyone explain, please?
It looks like myCwd does not contain svtestcases, so when you ask for the index of the substring svtestcases, nothing in myCwd matches it.
For example:
>>> a = '/Test/test1/test2/test3'
>>> a.index('/Test')
0
>>> a.index('test2')
12
>>>
>>> a.index('abc')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: substring not found
>>>
The comment already mentions this; I just added a little code to make it easier to understand. As for the colon: it is slice syntax, so myCwd[:n] returns the first n characters of myCwd.
Read more about slicing for the details.
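One detail worth checking in the question itself: the quoted path spells the folder SVTestcases, while the code searches for SVTestCases, and str.index is case-sensitive. The sketch below illustrates both that and the slice. It uses ntpath (the Windows flavour of os.path) so that it behaves the same on any platform; the file name script.py is made up, and the case-insensitive search is one possible workaround, not necessarily what the original project needs:

```python
import ntpath  # Windows path rules, importable on every platform

# Path from the question plus a hypothetical script name.
script_path = r"C:\Netra_Step_2015\Tests\SVTestcases\TC-Regression\script.py"
wanted = ntpath.normpath("Tests/SVTestCases")  # becomes 'Tests\SVTestCases'

# str.find returns -1 where str.index would raise ValueError.
print(script_path.find(wanted))  # -1, because 'Cases' != 'cases'

# Compare lower-cased copies to ignore case, then slice the original string.
pos = script_path.lower().find(wanted.lower())
tc_path = script_path[:pos + len(wanted)]  # everything up to the end of the match
print(tc_path)  # C:\Netra_Step_2015\Tests\SVTestcases
```

The [:n] slice keeps the characters from the start of the string up to, but not including, position n, which is why the original code adds len(svtestcases) to the index of the match.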

Python module returning errors in bash but not from IDLE

I'm a newbie programmer posting here for the first time. Any suggestions or advice would be greatly appreciated! I am working on a project that compares the contents of, say, test.csv to ref.csv (both single columns containing strings of 3-4 words) and assigns a score to each string from test.csv based on its similarity to the most similar string in ref.csv. I am using the fuzzywuzzy string matching module to assign the similarity score.
The following code snippet takes the two input files, converts them into arrays, and prints out the arrays:
import csv
# Load text matching module
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
# Import reference and test lists and assign to variables
ref_doc = csv.reader(open('ref.csv', 'rb'), delimiter=",", quotechar='|')
test_doc = csv.reader(open('test.csv', 'rb'), delimiter=",", quotechar='|')
# Define the arrays into which to load these lists
ref = []
test = []
# Assign the reference and test docs to arrays
for row in ref_doc:
    ref.append(row)
for row in test_doc:
    test.append(row)
# Print the arrays to make sure this all worked properly
# before we proceed to run matching operations on the arrays
print ref, "\n\n\n", test
The problem is that this script works as expected when I run it in IDLE, but returns the following error when I call it from bash:
['one', 'two']
Traceback (most recent call last):
File "csvimport_orig.py", line 4, in <module>
from fuzzywuzzy import fuzz
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/fuzzywuzzy/fuzz.py", line 32, in <module>
import utils
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/fuzzywuzzy/utils.py", line 6, in <module>
table_from=string.punctuation+string.ascii_uppercase
AttributeError: 'module' object has no attribute 'punctuation'
Is there something I need to configure in bash for this to work properly? Or is there something fundamentally wrong that IDLE is not catching? For simplicity's sake, I don't call the fuzzywuzzy module in this snippet, but it works as expected in IDLE.
Eventually, I'd like to use pylevenshtein but am trying to see if my use for this script has value before I put the extra time in making that work.
Thanks in advance.
Almost certainly you have a module called string.py which is being loaded instead of the stdlib's string.py module.
You can confirm this by adding the lines
import string
print string
to your csvimport_orig.py program. This should show something like
<module 'string' from '/usr/lib/python2.7/string.pyc'>
[I'm on linux at the moment, so the location is different and you should see the usual /Library/Frameworks/etc. equivalent.] What it will probably show instead is
<module 'string' from 'string.py'>
or, instead of string.py, wherever your conflicting library is. Rename your string.py module.

Creating a Python function that opens a textfile, reads it, tokenizes it, and finally runs from the command line or as a module

I have been trying to learn Python for a while now. By chance, I happened across chapter 6 of the official tutorial through a Google search link pointing here.
When I learned, from that page, that functions were the heart of modules, and that modules could be called from the command line, I was all ears. Here's my first attempt at doing both, openbook.py
import nltk, re, pprint
from __future__ import division
def openbook(book):
    file = open(book)
    raw = file.read()
    tokens = nltk.wordpunct_tokenize(raw)
    text = nltk.Text(tokens)
    words = [w.lower() for w in text]
    vocab = sorted(set(words))
    return vocab
if __name__ == "__main__":
    import sys
    openbook(file(sys.argv[1]))
What I want is for this function to be importable as the module openbook, as well as for openbook.py to take a file from the command line and do all of those things to it.
When I run openbook.py from the command line, this happens:
gemeni@a:~/Projects-FinnegansWake$ python openbook.py vicocyclometer
Traceback (most recent call last):
File "openbook.py", line 23, in <module>
openbook(file(sys.argv[1]))
File "openbook.py", line 5, in openbook
file = open(book)
When I try using it as a module, this happens:
>>> import openbook
>>> openbook('vicocyclometer')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'module' object is not callable
So, what can I do to fix this, and hopefully continue down the long winding path to enlightenment?
Error executing openbook.py
For the first error, you are opening the file twice:
openbook(file(sys.argv[1]))
file = open(book)
Calling both file() and open() is redundant. They both do the same thing. Pick one or the other: preferably open().
open(...)
open(name[, mode[, buffering]]) → file object
Open a file using the file() type, returns a file object. This is the
preferred way to open a file.
Error importing openbook module
For the second error, you need to add the module name:
>>> import openbook
>>> openbook.openbook('vicocyclometer')
Or import the openbook() function into the global namespace:
>>> from openbook import openbook
>>> openbook('vicocyclometer')
Here are some things you need to fix:
nltk.word_tokenize will fail every time:
The function takes sentences as arguments. Make sure that you use nltk.sent_tokenize on the whole text first, so that things work correctly.
Files not being dealt with:
Only open the file once.
You're not closing the file once you're done with it. I recommend using Python's with statement to read the text, as it closes the file automatically: with open(book) as f: raw = f.read()
Import the openbook function from the module, not just the module: from openbook import openbook.
Lastly, you could consider:
Adding things to the set with a generator expression, which will probably reduce the memory load: set(w.lower() for w in text)
Using nltk.FreqDist to generate a vocab & frequency distribution for you.
Try
from openbook import *
instead of
import openbook
OR:
import openbook
and then call it with
openbook.openbook("vicocyclometer")
In your interactive session, you're getting that error because you need to from openbook import openbook. I can't tell what happened with the command line because the line with the error got snipped. It's probably that you tried to open a file object. Try just passing the string into the openbook function directly.
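Putting the suggestions from these answers together, a corrected openbook.py could look roughly like the sketch below, written for Python 3. The path is passed as a string and opened exactly once inside a with block. The regular expression is a stand-in for nltk.wordpunct_tokenize so that the example runs without NLTK installed, and main is a made-up helper name:

```python
import re
import sys


def openbook(path):
    """Return the sorted, lower-cased vocabulary of the text file at path."""
    with open(path, encoding="utf-8") as handle:  # closed automatically
        raw = handle.read()
    # Stand-in tokenizer: runs of word characters, or runs of punctuation.
    tokens = re.findall(r"\w+|[^\w\s]+", raw)
    return sorted({token.lower() for token in tokens})


def main(argv):
    if len(argv) != 2:
        print("usage: python openbook.py FILE")
        return 1
    print(openbook(argv[1]))  # pass the file name, not an open file object
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

From the interpreter, the function is then reached with from openbook import openbook followed by openbook('vicocyclometer'), or with import openbook followed by openbook.openbook('vicocyclometer').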
