Is there a way to restrict certain words from appearing in a title with the Google Books API?
For example, I want to receive data about fantasy books, but I keep getting books such as "Guide to Literary Agents 2017" in my search results. I was wondering if I could exclude certain words such as "Literary" from my search (or whether there is a better way to resolve this problem).
Also, this is my API link:
https://www.googleapis.com/books/v1/volumes?q=subject:fantasy+young%20adult&printType=books&langRestrict=en&maxResults=40&key=APIKey
Yes. In the Books API Developers Guide, I found this:
To exclude entries that match a given term, use the form q=-term.
So, in your example, you could try something like:
https://www.googleapis.com/books/v1/volumes?q=-Literary+subject:fantasy+young%20adult&printType=books&langRestrict=en&maxResults=40
I didn't see the title Guide to Literary Agents 2017 in the results, so I tried excluding a few other keywords, and the query does seem to exclude those titles.
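For completeness, the same query can be issued from Python with the requests library. This is a minimal sketch; the excluded term is just the example from this question, and the API key is a placeholder:

    import requests

    # Exclude "Literary" and restrict to the fantasy subject; swap in
    # whatever terms you need to filter out.
    params = {
        "q": '-Literary subject:fantasy "young adult"',
        "printType": "books",
        "langRestrict": "en",
        "maxResults": 40,
        # "key": "YOUR_API_KEY",  # optional for low-volume requests
    }
    resp = requests.get("https://www.googleapis.com/books/v1/volumes", params=params)
    resp.raise_for_status()

    for item in resp.json().get("items", []):
        print(item["volumeInfo"].get("title"))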
I am trying to build a corpus of documents related to earthquakes. I want to download all news articles related to that event. My problem is that using Google Search (stackoverflow.com/questions/…) introduces a bias toward what is relevant now. Instead, I want all articles irrespective of time or relevance.
The problem is that Google tries to guess the most relevant search results for a user entering your query, whereas you are interested in all of them.
You would be better served by a newspaper article database than by Google in this case. If you are currently enrolled at a university, ask your library about this kind of resource. If you have access to such a database, you will be able to search for every article containing a given keyword, and some search forms will even let you filter by publisher, date, geographical location, etc.
Eureka.cc is an example of such a database.
Some newspapers' websites will give you access to their article archives. The New York Times is one of them.
Here is a result from searching their article database for "earthquake".
More info about newspaper article databases
There are a lot of classified ads that appear in non-HTML formats (paper, text, handwritten, etc.) selling houses, automobiles, rentals, leases, flats, and so on. A classified ad, say a flat-rental ad for example, includes features such as SIZE, AREA, LOCALITY, PRICE, CONTACT INFO, etc.
My question is: how do I extract the street address (the address mentioned in the article, i.e. the LOCALITY) that the ad gives?
Is there any solution to this problem using NLTK and Python?
Imagine that the source of the article is a normal text file (.txt).
If the source is in .txt format, regular expressions are probably the best solution.
I don't think it's easy (or even possible) to write a regex for all arbitrary kinds of ads, but the more examples you have, the better your search will work.
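To make that concrete, here is a minimal sketch of the regex approach, assuming the ads use labelled fields like the ones listed in the question (the sample ad and field labels below are made up):

    import re

    # A toy ad in the labelled format the question describes; real ads
    # will vary, so this pattern is only a starting point.
    ad = """FLAT FOR RENT
    SIZE: 2 BHK, 950 sqft
    LOCALITY: 14 Baker Street, Marylebone
    PRICE: 1200/month
    CONTACT INFO: 555-0123"""

    # Capture whatever follows a "LOCALITY:" or "ADDRESS:" label,
    # up to the end of that line.
    match = re.search(r"(?:LOCALITY|ADDRESS)\s*:\s*(.+)", ad, re.IGNORECASE)
    if match:
        print(match.group(1))  # -> 14 Baker Street, Marylebone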
We have scanned thousands of old documents and entered key data into a database. One of the fields is author name.
We need to search for documents by a given author, but the exact name might have been entered incorrectly, as on many documents the data is handwritten.
I thought of searching on only the first few letters of the surname and then presenting a list for the user to select from. I don't know at this stage how many distinct authors there are; I suspect it will be in the hundreds rather than hundreds of thousands. There will be hundreds of thousands of documents.
Is there a better way? Would an SQL database handle it better?
The software is Python, and there will be a list of documents, each with an author.
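For what it's worth, the prefix-search-then-pick-from-a-list idea described above is easy to prototype in plain SQL; here is a minimal sqlite3 sketch (the table and column names are made up). It also shows the limitation: "Smyth" is missed by the prefix "Smi".

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, author TEXT)")
    conn.executemany(
        "INSERT INTO documents (author) VALUES (?)",
        [("Smith",), ("Smyth",), ("Smithe",), ("Jones",)],
    )

    # Match on the first few letters of the surname, then present the
    # distinct candidates for the user to choose from.
    prefix = "Smi"
    rows = conn.execute(
        "SELECT DISTINCT author FROM documents WHERE author LIKE ? ORDER BY author",
        (prefix + "%",),
    )
    for (author,) in rows:
        print(author)  # -> Smith, Smithe (but not Smyth)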
I think you can use MongoDB, where you can store a list field containing all possible spellings of an author's name. For example, if a handwritten name reads "black" but you can't tell whether a letter is a "c" or an "e", you can store the original name "black" and add "blaek" to the list of possible names.
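A minimal pymongo sketch of that idea (the database, collection, and field names are hypothetical, and it assumes MongoDB is running locally):

    from pymongo import MongoClient

    client = MongoClient()            # assumes MongoDB on localhost:27017
    docs = client.archive.documents   # hypothetical database/collection

    # Store the name as transcribed, plus every plausible misreading.
    docs.insert_one({
        "title": "Scanned letter, 1923",
        "author": "black",
        "author_variants": ["black", "blaek"],
    })

    # A query on the list field matches any of the stored variants.
    for doc in docs.find({"author_variants": "blaek"}):
        print(doc["title"], "-", doc["author"])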
You could use Sunburnt, a Python library for accessing Solr, which in turn is built on top of Lucene.
Here is an excerpt describing what Solr is:
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
It will give you everything you need for searching documents, including partial hits and potential matches on whatever your search criteria are.
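Basic usage looks roughly like this (a sketch based on Sunburnt's documented interface; it assumes a running Solr instance whose schema defines an author field):

    import sunburnt

    # Connect to a running Solr instance (the URL is an assumption).
    si = sunburnt.SolrInterface("http://localhost:8983/solr/")

    # Index a few documents; field names must match the Solr schema.
    si.add({"id": "1", "author": "Smith"})
    si.add({"id": "2", "author": "Smythe"})
    si.commit()

    # Query by field; how loosely this matches depends on how the
    # field is analysed in the Solr schema.
    for result in si.query(author="smith").execute():
        print(result)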
We are developing an e-commerce portal that enables users to list their items (name, description, tags) on the site.
However, we realized that users do not understand item tags very well: some write arbitrary words, while others leave the field blank. So we decided to deal with it. I thought about using an entity extractor to generate tags. First, I tried passing this listing to Calais:
I'm a Filipino Male looking for Office Assistant job,with knowledge in MS Word,Excel,Power Point & Internet Browsing,i'm a quick learner with clear & polite communicative skills,immense flexibility in terms of work assignments and work hours,and performing my duties with full dedication,integrity and honesty.
and I got these tags: Religion Belief, Positive psychology, Integrity, Evaluation, Behavior, Psychology, Skill.
Then I tried Stanford NER and got: Excel, Power, Point, &, Internet, Browsing.
After that, I stopped trying these solutions, as I thought they would not fit, and started thinking about an e-commerce-related thesaurus that would contain product/brand names and trade-related terms, so that I could use it to filter user-generated posts and find the proper tags. But I couldn't find one.
So, first question: did I miss something?
Second question: is there a better scenario for this (i.e., generating the tags)?
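As a sketch of the thesaurus idea above: once you have (or build) a vocabulary of trade terms, tag extraction can be as simple as matching those terms against the listing text. The mini-vocabulary below is made up purely for illustration:

    # A toy vocabulary; in practice this would be a large, curated list
    # of product names, brand names, and trade terms.
    THESAURUS = {"office assistant", "ms word", "excel",
                 "power point", "internet browsing"}

    def extract_tags(listing, vocabulary=THESAURUS):
        """Return the vocabulary terms that literally appear in the listing."""
        text = listing.lower()
        return sorted(term for term in vocabulary if term in text)

    listing = ("I'm a Filipino Male looking for Office Assistant job, "
               "with knowledge in MS Word, Excel, Power Point & Internet Browsing")
    print(extract_tags(listing))
    # -> ['excel', 'internet browsing', 'ms word', 'office assistant', 'power point']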
I have little working knowledge of Python. I know that there is something called the Twitter Search API, but I'm not really sure what I'm doing. I know what I need to do:
I need point data for a class. I thought I would just pull up a map of the world in a GIS application, select cities that have a population of x or larger, then export those selections to a new table. That table would have a key and a city name.
Next, I randomly select 100 of those cities. Then I perform a search for a certain term (in this case, Gaddafi) for each of those 100 cities. All I need to know is how many posts there were on a certain day (or over a few days, depending on the number of tweets).
I just have a feeling there is something that already exists that does this, and I'm having a hard time finding it. I've downloaded and installed python-twitter but have no idea how to get this search done. Does anyone know where I can find this tool, or how I can build it? Any suggestions would really help. Thanks!
A tweet can come with a geotag, but that is a new feature and the majority of tweets do not have one. So it is not possible to search for all tweets containing "Gaddafi" from a given city by the city's name.
What you could do is the reverse: search for "Gaddafi" first (regardless of geolocation) using the Search API. Then, for each tweet, find the location of the poster (either through the RESTful API or by some sort of web scraping).
So basically you can classify the collected tweets according to the location of the poster.
I think only tweepy has access to both the Twitter Search API and the RESTful API.
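A rough tweepy sketch of that reverse approach (assuming a tweepy version in which api.search returns Status objects carrying a .user; the credentials are placeholders):

    import tweepy

    # Placeholder credentials -- substitute your own.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth)

    # Search for the term first, regardless of location, then bucket
    # the results by the poster's self-reported profile location.
    counts = {}
    for tweet in api.search(q="Gaddafi", count=100):
        location = tweet.user.location or "(no location)"
        counts[location] = counts.get(location, 0) + 1

    for location, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(n, location)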