I am trying to save information about video feeds from the YouTube Data API to a file (e.g. the video title for each entry in a feed).
import gdata.youtube.service

def SaveFeed(feed, filename):
    # Write a running counter and the title of every entry in the feed.
    with open(filename, "w") as f:
        counter = 0
        for e in feed.entry:
            counter += 1
            f.write("Counter: " + str(counter) + '\n')
            f.write('Video title: %s\n' % e.media.title.text)

yt_service = gdata.youtube.service.YouTubeService()
yt_service.ssl = True
feed = yt_service.GetMostRecentVideoFeed()
feed2 = yt_service.GetMostViewedVideoFeed()
feed3 = yt_service.GetMostRespondedVideoFeed()
feed4 = yt_service.GetMostDiscussedVideoFeed()
However, what I get is the same list of videos in each feed (most recent, most viewed, most responded), always starting with "Video title: Charlie bit my finger - again !".
The problem was not solved, but it was eventually explained.
The methods I used have been deprecated.
The documentation at [https://developers.google.com/youtube/2.0/developers_guide_protocol_video_feeds#Standard_feeds] explains why the output is always the same:
All standard video feeds except the most_popular feed have been deprecated. In response to requests for other feeds, the API will return the most_popular feed with a default time parameter value of today.
I'm trying to download the complete title/abstract data from PMC/PubMed. This is an age-old question, but none of the answers on Stack Overflow seem to answer it.
A general approach is to use Biopython's Entrez package, but then you need to specify search terms. There is also a limit on how many requests you can send over time.
from Bio import Entrez, Medline

Entrez.email = "A.N.Other@example.com"  # contact address sent along with each request
handle = Entrez.esearch(db="pubmed", term="orchid", retmax=463)
record = Entrez.read(handle)
idlist = record["IdList"]

# Fetch the matching records in MEDLINE text format and parse them.
handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")
records = Medline.parse(handle)
for record in records:
    print("title:", record.get("TI", "?"))
    print("authors:", record.get("AU", "?"))
    print("source:", record.get("SO", "?"))
Is there any way I can download the entire article and abstract data from PMC, using Python or directly from any other source?
One way to attack this problem is to call esearch with a term that matches articles from the beginning of PubMed onwards (a publication date range, for example) and then retrieve the results iteratively by advancing the retstart parameter.
from Bio import Entrez

batch_size = 20
start = 0
while start < 1000:
    # Each call returns up to batch_size IDs, beginning at offset retstart.
    handle = Entrez.esearch(db="pubmed", term="2015/3/1:2022/4/30[Publication Date]", retmode="xml", retstart=start, retmax=batch_size)
    summaries = Entrez.read(handle)
    start = start + batch_size
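Note that the loop above overwrites `summaries` on every pass, so nothing is kept. A minimal sketch of the same retstart/retmax paging that also collects the IDs could look like this. The names `iter_pages` and `fake_fetch_page` are hypothetical, and the fake fetcher only stands in for a real `Entrez.esearch` call so the example runs without network access.

```python
def iter_pages(fetch_page, batch_size, limit):
    """Yield IDs page by page until `limit` IDs were requested or a page is empty."""
    start = 0
    while start < limit:
        # Ask for at most batch_size IDs, never going past the overall limit.
        ids = fetch_page(start, min(batch_size, limit - start))
        if not ids:
            break  # no more results on the server side
        for pmid in ids:
            yield pmid
        start += batch_size


def fake_fetch_page(retstart, retmax):
    # Hypothetical stand-in for:
    #   Entrez.read(Entrez.esearch(db="pubmed", term=..., retstart=retstart, retmax=retmax))["IdList"]
    all_ids = [str(n) for n in range(45)]
    return all_ids[retstart:retstart + retmax]


collected = list(iter_pages(fake_fetch_page, batch_size=20, limit=1000))
limited = list(iter_pages(fake_fetch_page, batch_size=20, limit=30))
```

With the real service you would pass a function that wraps `Entrez.esearch` instead of `fake_fetch_page`, and you would probably want a pause between calls to stay within the request limits mentioned in the question.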
I'm fairly new to AWS and have spent the past week following all the helpful documentation on the site.
I am currently stuck on being unable to pull the External Image Id from a Rekognition collection after a 'search faces by image' call. I just need to put that value into a variable or print it. Does anybody know how I could do that?
Basically, this is my code:
import boto3

if __name__ == "__main__":
    bucket = 'bucketname'
    collectionId = 'collectionname'
    fileName = 'test.jpg'
    threshold = 90
    maxFaces = 2
    admin = 'test'

    targetFile = "%sTarget.jpg" % admin
    imageTarget = open(targetFile, 'rb')

    client = boto3.client('rekognition')
    # Search the collection for faces that match the largest face in the image.
    response = client.search_faces_by_image(CollectionId=collectionId,
                                            Image={'Bytes': imageTarget.read()},
                                            FaceMatchThreshold=threshold,
                                            MaxFaces=maxFaces)

    faceMatches = response['FaceMatches']
    print ('Matching faces')
    for match in faceMatches:
        print ('FaceId:' + match['Face']['FaceId'])
        print ('Similarity: ' + "{:.2f}".format(match['Similarity']) + "%")
At the end of it, I receive:
Matching faces
Similarity: 96.12%
Process finished with exit code 0
What I need is the External Image Id instead of the FaceId.
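For what it's worth, each entry in `FaceMatches` carries a `Face` dictionary, and `ExternalImageId` sits next to `FaceId` in it when an external ID was supplied at indexing time. A minimal sketch of pulling it into a variable follows. The helper name `external_image_ids` and the sample response are made up for illustration, so the snippet runs without AWS credentials.

```python
def external_image_ids(response):
    """Return (ExternalImageId, Similarity) pairs from a search_faces_by_image response."""
    pairs = []
    for match in response.get('FaceMatches', []):
        face = match['Face']
        # ExternalImageId is only present if one was given when the face was indexed,
        # so use .get() and accept None for faces without it.
        pairs.append((face.get('ExternalImageId'), match['Similarity']))
    return pairs


# Hand-written sample shaped like a real response (values are invented).
sample_response = {
    'FaceMatches': [
        {'Similarity': 96.12,
         'Face': {'FaceId': 'face-1', 'ExternalImageId': 'test.jpg'}},
        {'Similarity': 91.50,
         'Face': {'FaceId': 'face-2'}},
    ]
}

matches = external_image_ids(sample_response)
for external_id, similarity in matches:
    print('ExternalImageId: {}  Similarity: {:.2f}%'.format(external_id, similarity))
```

In the original script you would pass the `response` returned by `client.search_faces_by_image(...)` instead of `sample_response`.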
I am trying to do reverse geocoding and extract pincodes for lat-long pairs. The .csv file has around 1 million records.
Below is my problem:
1. The Google API fails to return addresses for a large number of records and takes a huge amount of time. I will move it to a batch process later, though.
2. I split the file into chunks (1000 records in each file) and ran a few files manually, one by one. Surprisingly, I then got a 100% result.
3. Later, I ran the files one by one in a loop and, again, the Google API failed to return results.
Note: Right now we are looking for free APIs only.
**Below is my code**
import pandas as pd
import requests

# api_key, colnames and files are defined elsewhere in the script.

def reverse_geocode(latlng):
    # Return the first geocoding result for a "lat,lng" string, or {} if there is none.
    result = {}
    url = 'https://maps.googleapis.com/maps/api/geocode/json?latlng={}'
    request = url.format(latlng)
    key = '&key=' + api_key
    request = request + key
    data = requests.get(request).json()
    if len(data['results']) > 0:
        result = data['results'][0]
    return result

def parse_postal_code(geocode_data):
    # Pull the postal code out of a single geocoding result, if it has one.
    if (geocode_data is not None) and ('formatted_address' in geocode_data):
        for component in geocode_data['address_components']:
            if 'postal_code' in component['types']:
                return component['short_name']
    return None

dfinal = pd.DataFrame(columns=colnames)
dmiss = pd.DataFrame(columns=colnames)
for fl in files:
    df = pd.read_csv(fl)
    print ('Processing file : ' + fl[36:])
    df['geocode_data'] = ''
    df['Pincode'] = ''
    df['Pincode'] = df['geocode_data'].map(parse_postal_code)
    if (len(df[df['Pincode'].isnull()]) > 0):
        print("Missing Pincodes : " + str(len(df[df['Pincode'].isnull()])) + " / " + str(len(df)))
Can anybody help me figure out what the problem in my code is? If any additional info is required, please let me know.
You've run into Google API usage limits.
The following code annotates a video with labels (or "tags") that are selected based on the image content.
import argparse

from google.cloud import videointelligence

def analyze_labels(path):
    """ Detects labels given a GCS path. """
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [videointelligence.enums.Feature.LABEL_DETECTION]
    operation = video_client.annotate_video(path, features=features)
    print('\nProcessing video for label annotations:')

    # Block until the long-running annotation job finishes (or times out).
    result = operation.result(timeout=90)
    print('\nFinished processing.')

    segment_labels = result.annotation_results[0].segment_label_annotations
    for i, segment_label in enumerate(segment_labels):
        print('Video label description: {}'.format(
            segment_label.entity.description))
        for category_entity in segment_label.category_entities:
            print('\tLabel category description: {}'.format(
                category_entity.description))

        for i, segment in enumerate(segment_label.segments):
            # Offsets come as seconds plus nanoseconds; convert to float seconds.
            start_time = (segment.segment.start_time_offset.seconds +
                          segment.segment.start_time_offset.nanos / 1e9)
            end_time = (segment.segment.end_time_offset.seconds +
                        segment.segment.end_time_offset.nanos / 1e9)
            positions = '{}s to {}s'.format(start_time, end_time)
            confidence = segment.confidence
            print('\tSegment {}: {}'.format(i, positions))
            print('\tConfidence: {}'.format(confidence))

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('path', help='GCS file path for label detection.')
    args = parser.parse_args()

    analyze_labels(args.path)
The code prints the result to the terminal.
I want to store the video label description, label category description and confidence values in a database (maybe MongoDB) and later retrieve them in a Node.js application.
How can I do that, or is there a better way?
There's nothing stopping you from doing that. As long as your Python program and your Node.js program agree on how the data is stored, you should have no trouble storing and retrieving simple strings and numbers. MongoDB is a document store, so each label can be saved as one document and looked up by key or queried by field; something like SQLite might suit you better if you prefer to retrieve data with SQL queries over a fixed set of columns. Both Node.js and Python have MongoDB and SQLite clients that work perfectly well.
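As a rough illustration of the SQLite option, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `labels`, its columns and the helper functions are assumptions made up for this example, and the sample rows are invented rather than real API output. An in-memory database is used so the snippet runs as-is; a file path would be needed for the Node.js side to read the same data.

```python
import sqlite3


def save_labels(conn, rows):
    """Store (description, category, confidence) tuples in a 'labels' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS labels "
        "(description TEXT, category TEXT, confidence REAL)")
    conn.executemany("INSERT INTO labels VALUES (?, ?, ?)", rows)
    conn.commit()


def labels_above(conn, min_confidence):
    """Return all stored labels with at least the given confidence, best first."""
    cur = conn.execute(
        "SELECT description, category, confidence FROM labels "
        "WHERE confidence >= ? ORDER BY confidence DESC", (min_confidence,))
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # use a file path to share with another program
save_labels(conn, [("cat", "animal", 0.97), ("sofa", "furniture", 0.42)])
confident = labels_above(conn, 0.9)
```

Inside `analyze_labels` you would build the rows from `segment_label.entity.description`, `category_entity.description` and `segment.confidence` instead of printing them.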
Apologies if this isn't totally clear - I'm a Python copy-the-code-and-try-to-make-it-work developer.
I'm using the Google NLP API in Python 2.7.
When I use analyze_entities(), I can get and print the name, entity type and salience.
Mentions is supposed to contain the noun type: PROPER or COMMON, per this page:
I can't get mention type from the returned dictionary.
Here's my hideous code:
import os

from google.cloud import language

def entities_text(text, client):
    """Detects entities in the text."""
    language_client = client
    # Instantiates a plain text document.
    document = language_client.document_from_text(text)
    # Detects entities in the document. You can also analyze HTML with:
    # document.doc_type == language.Document.HTML
    entities = document.analyze_entities()
    return entities

articles = os.listdir('articles')
for f in articles:
    language_client = language.Client()
    fname = "articles/" + f
    thisfile = open(fname,'r')
    content = thisfile.read()
    entities = entities_text(content, language_client)
    for e in entities:
        name = e.name.strip()
        type = e.entity_type.strip()
        # Only print entities that start with a capital and are longer than 2 characters.
        if e.name.strip()[0].isupper() and len(e.name.strip()) > 2:
            print name, type, e.salience, e.mentions
That returns this:
RELATED OTHER 0.0019081507 [u'RELATED']
Zoe 3 PERSON 0.0016676666 [u'Zoe 3']
Where the value in [] is the mentions.
If I try to get mentions.type, I get an attribute not found error.
I'd appreciate any input.
1) Do not call "AnalyzeEntities"; call "AnnotateText" instead.
2) Check the "proper" field in each token's part of speech. Its value should be "PROPER", not "PROPER_UNKNOWN" or "NOT_PROPER".
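To make the second step concrete, here is a small sketch that filters the tokens of an AnnotateText result. It assumes the response has been obtained as a plain dictionary in the shape of the REST JSON (a `tokens` list whose items have `text.content` and `partOfSpeech.tag` / `partOfSpeech.proper`). The helper name `proper_nouns` and the sample data are made up for illustration, and the older Python client used in the question may expose these fields differently.

```python
def proper_nouns(annotate_text_response):
    """Return the text of every token tagged as a proper noun."""
    names = []
    for token in annotate_text_response.get("tokens", []):
        pos = token.get("partOfSpeech", {})
        # Keep nouns whose 'proper' value is exactly PROPER
        # (not PROPER_UNKNOWN or NOT_PROPER).
        if pos.get("tag") == "NOUN" and pos.get("proper") == "PROPER":
            names.append(token["text"]["content"])
    return names


# Hand-written sample in the shape of an annotateText response (values invented).
sample = {
    "tokens": [
        {"text": {"content": "Zoe"},
         "partOfSpeech": {"tag": "NOUN", "proper": "PROPER"}},
        {"text": {"content": "car"},
         "partOfSpeech": {"tag": "NOUN", "proper": "NOT_PROPER"}},
        {"text": {"content": "drives"},
         "partOfSpeech": {"tag": "VERB", "proper": "PROPER_UNKNOWN"}},
    ]
}

found = proper_nouns(sample)
```

Here `found` contains only the tokens that pass both checks, which you can then match against the entity names you already print.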