This is my items.py:
import scrapy
class FreelanceItem(scrapy.Item):
    url = scrapy.Field()
    url = scrapy.Field()
When I started another Python session and imported the package:
import scrapy
from scrapy.item import Item, Field
from freelance.items import FreelanceItem
I get this:
ModuleNotFoundError: No module named 'freelance'
What should I do? Thanks.
You're accessing it the wrong way. Let's say you are in a directory called PythonTest, where you also have your main.py file.
Steps:
Create a folder named "freelance" in this PythonTest directory.
Add an empty file named "__init__.py" to this directory (the freelance dir); this tells Python it is a package.
Add your items.py file to this directory as well.
Now go to your 'main.py' and add the line:
from freelance.items import FreeLanceItem
Also make sure the indentation in your code is correct (see below):
import scrapy
class FreeLanceItem(scrapy.Item):
    url = scrapy.Field()
    url = scrapy.Field()
Running the code should no longer produce an error.
Let me know if this helped!
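The steps above can be sketched end to end with a throwaway package built in a temp directory (stdlib only; the freelance/FreeLanceItem names come from this question, and a plain class stands in for the scrapy.Item subclass):

```python
# Emulate the fix: build the package layout on disk, then import from it.
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())              # stands in for PythonTest/
pkg = root / "freelance"
pkg.mkdir()
(pkg / "__init__.py").write_text("")         # marks freelance as a package
(pkg / "items.py").write_text(
    "class FreeLanceItem:\n"                 # plain class standing in for scrapy.Item
    "    pass\n"
)

# main.py sits in PythonTest/, so PythonTest/ is on sys.path when it runs;
# inserting the temp root here simulates that.
sys.path.insert(0, str(root))
from freelance.items import FreeLanceItem

print(FreeLanceItem.__name__)                # FreeLanceItem
```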
[I'm working on OSX]
I'll post the relevant portions of my program down below:
Spider:
# -*- coding: utf-8 -*-
import scrapy
import pandas as pd
from ..items import Homedepotv2Item
from scrapy.http import Request
class HomedepotspiderSpider(scrapy.Spider):
    name = 'homeDepotSpider'
    allowed_domains = ['homedepot.com']

    pathName = '/Users/user/Desktop/homeDepotv2Helpers/homeDepotInfo.csv'
    #pathName: the path to the file
    #skiprows: the first row is occupied by the title, we don't need that
    export = pd.read_csv(pathName, skiprows=[0], header=None)
    omsList = export.values.T[1].tolist() #transpose the matrix + get the second column
    start_urls = ['https://www.homedepot.com/p/{omsID}'.format(omsID=omsID)
                  for omsID in omsList]

    def parse(self, response):
        #call the Home Depot parsing function
        for item in self.parseHomeDepot(response):
            yield item
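For reference, the start_urls construction works when checked on its own outside the spider (the OMS IDs below are made up):

```python
# Build one product URL per OMS ID with a list comprehension.
omsList = ["202520873", "100678156"]         # made-up OMS IDs
start_urls = ['https://www.homedepot.com/p/{omsID}'.format(omsID=omsID)
              for omsID in omsList]
print(start_urls[0])
```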
Settings:
BOT_NAME = 'homeDepotv2'
SPIDER_MODULES = ['homeDepotv2.spiders']
NEWSPIDER_MODULE = 'homeDepotv2.spiders'
When I try running my spider with the command scrapy crawl homeDepotSpider,
I get this error: ModuleNotFoundError: No module named 'homeDepotv2'
Initially I thought I was having a directory error, so instead of using cd to find my directory I pasted in the path to the spider's directory, which was
/Users/userName/homeDepotv2_Spider/build/lib/homeDepotv2
However, that still returned the same error.
Not too sure what's wrong here, so any help would be appreciated!
And here is the file hierarchy:
Check this video:
Path append | how to fix "Module not found" with Scrapy items.py
I had the same problem; the solution is to use:
from sys import path
path.append('/Users/userName/homeDepotv2_Spider')
You may need to check/modify the path, as Scrapy creates two directories with the same name.
I have the following folder structure.
check_site
- test_site
-- views.py
- app2
- app3
- modules
-- url.py
-- usability.py
The module url.py contains one class, URL:
class URL:
...
The module usability.py contains one class that inherits from the URL class:
from url import URL
class Usability(URL):
...
And then I have views.py, where I need to import the class Usability:
from modules.url import URL
from modules.usability import Usability
And here is the problem. It gives me an error:
from url import URL
ModuleNotFoundError: No module named 'url'
I've tried to change the import in usability.py to
from modules.url import URL, but in this case it gives an error in usability.py:
Unable to import modules.url
I've also tried
from .url import URL and from check_site.modules.url import URL, but these don't work either.
If someone knows how to fix it, please help
Well, the problem lies here: by default Python searches for modules in the directories on sys.path (starting with the script's own directory), but the file you want to import is not in the same directory as your program.
You should try sys.path:
# some_file.py
import sys
# insert at 1, 0 is the script path (or '' in REPL)
sys.path.insert(1, '/path/to/application/app/folder')
import file
This should work in most cases.
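A self-contained sketch of the same trick, using a throwaway url.py written to a temp directory (stdlib only; the url/URL names come from the question):

```python
import sys
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp())            # stands in for check_site/modules
(folder / "url.py").write_text(
    "class URL:\n"
    "    pass\n"
)

sys.path.insert(1, str(folder))              # 1, so the script's own dir stays first
from url import URL                          # now resolves via the inserted path

print(URL.__name__)                          # URL
```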
Note: I have already checked questions with the same error as mine, but mine is different. I want to ask whether clean_string, clean_number, clean_text, clean_float, clean_int, used like this:
agency_id = scrapy.Field(serializer=clean_string)
are built-in functions in Python, or whether I have to import something to make them work.
I am new to Python, just doing some programming stuff.
Below is my code snippet:
import scrapy
from six.moves.urllib.parse import urljoin
from .utils import clean_string, clean_number, clean_text, clean_float, clean_int

class MyItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    agency_id = scrapy.Field(serializer=clean_string)
When I run the above code it gives me this error:
**ImportError: No module named utils**
Can you help me with it? Do I have to install clean_string or something?
As per our discussion, install pyes with python -m pip install pyes, then do it as below:
from pyes import utils
# use it like below
class MyItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    agency_id = scrapy.Field(serializer=utils.clean_string)
(or)
from pyes.utils import clean_string
# use it like below
class MyItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    agency_id = scrapy.Field(serializer=clean_string)
You cannot use from .utils import clean_string here, because the relative import looks for utils inside the current package and there is no utils module in it. Instead either use from django import utils, or use from pyes.utils import clean_string.
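Alternatively, if the original project shipped its own utils.py next to items.py, the relative import would resolve on its own. A minimal sketch of such a helper module (the names come from the question, but the behavior is assumed, not the real project's code):

```python
# utils.py -- hypothetical package-local helpers; placed beside items.py,
# `from .utils import clean_string, clean_int` would resolve inside the package.
def clean_string(value):
    # Assumed behavior: collapse runs of whitespace and strip the ends.
    return " ".join(str(value).split())

def clean_int(value):
    # Assumed behavior: best-effort integer parse.
    return int(str(value).strip())

print(clean_string("  Agency   42  "))       # Agency 42
print(clean_int(" 7 "))                      # 7
```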
I'm trying to make a script that runs many spiders, but I'm getting ImportError: No module named project_name.settings
my script looks like this:
import os
os.system("scrapy crawl spider1")
os.system("scrapy crawl spider2")
....
os.system("scrapy crawl spiderN")
My settings.py
# -*- coding: utf-8 -*-
# Scrapy settings for project_name
#
# For simplicity, this file contains only the most important settings by
# default. All the other settings are documented here:
#
# http://doc.scrapy.org/en/latest/topics/settings.html
#
BOT_NAME = 'project_name'
ITEM_PIPELINES = {
    'project_name.pipelines.project_namePipelineToJSON': 300,
    'project_name.pipelines.project_namePipelineToDB': 800
}
SPIDER_MODULES = ['project_name.spiders']
NEWSPIDER_MODULE = 'project_name.spiders'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'project_name (+http://www.yourdomain.com)'
And my spiders look like any normal spider, quite simple ones actually...
import scrapy
from scrapy.crawler import CrawlerProcess
from Projectname.items import ProjectnameItem
class ProjectnameSpiderClass(scrapy.Spider):
    name = "Projectname"
    allowed_domains = ["Projectname.com"]
    start_urls = ["...urls..."]

    def parse(self, response):
        item = ProjectnameItem()
I gave them generic names, but you get the idea. Is there a way to solve this error?
Edit 2018:
You need to run the spider from the project folder, meaning that os.system("scrapy crawl spider1") has to be executed from the folder that contains spider1 (the Scrapy project folder).
Or you can do as I did in the past: put all the code in a single file (old answer, not recommended by me anymore, but still a useful and decent solution).
Well, in case someone comes across this question: I finally used a heavily modified version of this https://gist.github.com/alecxe/fc1527d6d9492b59c610 provided by alecxe in another question. Hope this helps.
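The working-directory point can be demonstrated without Scrapy: with subprocess, each command can run with cwd= pinned to the project folder, which is what scrapy crawl needs in order to import project_name.settings (the temp directory below just stands in for the project root):

```python
import os
import subprocess
import sys
import tempfile

project_dir = tempfile.mkdtemp()             # stands in for the Scrapy project root
# The real calls would look like:
#   subprocess.run(["scrapy", "crawl", "spider1"], cwd=project_dir)
out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getcwd())"],
    cwd=project_dir, capture_output=True, text=True,
)
# The child process ran inside project_dir, regardless of where we started it from.
print(os.path.realpath(out.stdout.strip()) == os.path.realpath(project_dir))
```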
My app name is search_keywords. On creating this app, I will have one file called models.py,
in which I have written this piece of code:
from django.db import models
class Keywords(models.Model):
    file_name = models.CharField(primary_key=True, max_length=100)
    frequency_count = models.IntegerField()
Then I add this app to INSTALLED_APPS and run python manage.py syncdb. On running this command, the table is automatically created by Django. Then I run python manage.py sql search_keywords, and it shows the table as desired.
The next step would be to run python manage.py shell. Instead of doing that, I want to insert the values into the table from the Python code that I have written. The code is:
#!/usr/bin/python
#here skey.py is another file created by me and has the imported functions
#in this code
from skey import find_root_tags, count, sorting_list
str1 = raw_input("enter the word to be searched\n")
list = []
fo = open("xml.txt","r")
for i in range(count.__len__()):
    file = fo.readline()
    file = file.rstrip('\n')
    find_root_tags(file,str1,i)
    list.append((file,count[i]))
sorting_list(list)
fo.close()
I want to insert these list elements into the table created by Django, and only after the function sorting_list has been called. The list contains the file names and their counts, e.g. list = [('books.xml','3'),('news.xml','2')].
How can I do that?
Please help.
//////////////////////////////////////////////////////////////////////////////
Hey, I have written the code:
#!/usr/bin/python
#to tell django which settings module to use
import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'
from search.models import Keywords
from skey import find_root_tags, count, sorting_list
str1 = raw_input("enter the word to be searched\n")
list = []
fo = open("xml.txt","r")
for i in range(count.__len__()):
    file = fo.readline()
    file = file.rstrip('\n')
    find_root_tags(file,str1,i)
    list.append((file,count[i]))
sorting_list(list)
for name, count in list:
    s = Keywords(file_name=name,frequency_count=count)
    s.save()
fo.close()
Here django_project = mysite (my project's name) and app = search (my app's name).
On running this code it gives me the error:
Traceback (most recent call last):
  File "call.py", line 7, in <module>
    from search.models import Keywords
ImportError: No module named search.models
And on including
import sys
sys.path.insert(0, path_to_django_project)
in the above code, it gives this error:
Traceback (most recent call last):
  File "call.py", line 4, in <module>
    sys.path.insert(0,path_to_mysite)
NameError: name 'path_to_mysite' is not defined
Why? I have my project on the desktop, and the above Python code file as well.
Please help!!
//////////////////////////////////////////
Now it's giving me this error, please help. See it at:
error in accessing table created in django in the python code
It shouldn't be a problem: just import your models and create object instances to persist to the database:
# setup sys.path first if needed
import sys
sys.path.insert(0, path_to_django_project)
# tell django which settings module to use
import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'django_project.settings'
# import application models
import test_app.models as m
# create and save db entries...
o = m.TestObject()
o.var = 'spam'
o.par = 1
o.save()
For each element in your list, create an instance of Keywords. After that, call .save() on each of them. For example:
for name, count in mylist:
    s = Keywords(file_name=name, frequency_count=count)
    s.save()
First, there are a few problems with your code (for example, you are using built-in names like list and file as variables, not to mention count.__len__() instead of len(count)). Instead of trying to figure out what you are doing, here are the two recommended ways to do this in Django:
Django provides functionality to load initial data for models from SQL, XML, YAML or JSON. This allows automatic data insertion every time you run syncdb, so you don't have to run a separate command again.
Create a custom management command and run it with manage.py.
I'm assuming you have a file of keywords, and you want to count how many times the entered keyword appears in the file.
Here is how you would create a custom management command that does this:
from collections import Counter
from django.core.management.base import BaseCommand, CommandError
from search.models import Keywords
class Command(BaseCommand):
    args = 'search_string'
    help = 'Enter the search string'

    def handle(self, *args, **options):
        with open('keyword_master.txt') as f:
            file_contents = [l.rstrip('\n') for l in f if l.strip()]
        if not file_contents:
            raise CommandError("Cannot read keyword_master.txt")
        c = Counter(file_contents)
        for search_string in args:
            freq = c[search_string]
            Keywords.objects.create(file_name=search_string, frequency_count=freq)
            self.stdout.write('Added %s freq for "%s"' % (freq, search_string))
Create a folder called management inside your app, and inside it a folder called commands. Put a blank file called __init__.py in both folders, and save this file as "keyword_importer.py" inside the commands folder. So now you have:
/search
..__init__.py
../management
....__init__.py
..../commands
......__init__.py
......keyword_importer.py
......keyword_master.txt
Now run it with python manage.py keyword_importer mykeyword
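The counting logic inside handle() can be exercised on its own (the sample lines below are made up):

```python
from collections import Counter

# Stand-in for the lines of keyword_master.txt: one keyword per line, blanks skipped.
lines = ["books\n", "news\n", "books\n", "\n", "books\n"]
file_contents = [l.rstrip('\n') for l in lines if l.strip()]

c = Counter(file_contents)
print(c["books"])                            # 3
print(c["news"])                             # 1
```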