Does anyone have a recommendation for how I can perform a string search in GitHub code, identify the files matching the string, and then identify the owner of each matching file?
This function allows me to search code, but I can't figure out how to identify the author/owner.
def search_github(keyword):
    # rate_limit = g.get_rate_limit()
    # rate = rate_limit.search
    # if rate.remaining == 0:
    #     print(f'You have 0/{rate.limit} API calls remaining. Reset time: {rate.reset}')
    #     return
    # else:
    #     print(f'You have {rate.remaining}/{rate.limit} API calls remaining')
    query = f'"{keyword}"'
    result = g.search_code(query, order='desc')
    max_size = 100
    print(f'Found {result.totalCount} file(s)')
    if result.totalCount > max_size:
        result = result[:max_size]
    for file in result:
        print(f'{file.download_url}')
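For the author/owner part, one option (a hedged sketch using PyGithub attributes, not a verified drop-in) is to use the repository attached to each search result: file.repository.owner.login gives the repository owner, and repo.get_commits(path=...) lets you look up who last touched the file:

from github import Github

g = Github('your-access-token')  # hypothetical token placeholder

def search_github_with_owners(keyword, max_size=100):
    query = f'"{keyword}"'
    result = g.search_code(query, order='desc')
    print(f'Found {result.totalCount} file(s)')
    for file in result[:max_size]:
        repo = file.repository                    # repository containing the match
        owner = repo.owner.login                  # repository owner (user or org)
        # most recent commit touching this path; commit.author may be None
        last = repo.get_commits(path=file.path)[0]
        author = last.author.login if last.author else last.commit.author.name
        print(f'{file.download_url} | owner: {owner} | last author: {author}')

Note that the extra get_commits call per result eats into the API rate limit quickly, so the commented-out rate-limit check above is worth re-enabling.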
I continue to get the error "'<=' not supported between instances of 'NoneType' and 'int'", but none of my fellow students are receiving this error. This code was provided to us, so I know I didn't do something wrong. My only guess is that it needs something different when it is run on a Mac vs. Windows?
%%time
from pattern.web import Twitter, hashtags

engine = Twitter(language="en")
indexid = set()
tweets = []
prev = None
for i in range(5):
    print(i)
    for tweet in engine.search("stock market bull or bear", start=prev, count=20, cached=False):
        print(f" hashtag = {hashtags(tweet.text)} text = {tweet.text}, author = {tweet.author}, date = {tweet.date} ")
        if len(tweet.text) > 0 and tweet.id not in indexid:
            tweets.append(tweet.text)
            indexid.add(tweet.id)
            prev = tweet.id
print(f"Found {len(tweets)} tweets!")
print("")
The start argument to engine.search must be an integer or a result id. The method is defined as follows:
def search(self, query, type=SEARCH, start=1, count=10, sort=RELEVANCY, size=None, cached=False, **kwargs):
    """ Returns a list of results from Twitter for the given query.
        - type : SEARCH,
        - start: Result.id or int,
        - count: maximum 100.
        There is a limit of 150+ queries per 15 minutes.
    """
So in this case, all you need to do is initialize prev to start at 1, instead of None (which is what is causing your error).
prev = 1
for i in range(5):
    ...
Which resulted in:
Found 25 tweets!
The idea of the code is to add unwatched episodes to an existing playlist in index order (ep 1 of Show X, ep 1 of Show Z, and so on), regardless of air date:
from plexapi.server import PlexServer

baseurl = 'http://0.0.0.0:0000/'
token = '0000000000000'
plex = PlexServer(baseurl, token)

episode = 0
first_ep_name = []
for x in plex.library.section('Anime').search(unwatched=True):
    try:
        for y in plex.library.section('Anime').get(x.title).episodes()[episode]:
            if plex.library.section('Anime').get(x.title).episodes()[episode].isWatched:
                episode += 1
                first_ep_name.append(y)
            else:
                episode = 0
                first_ep_name.append(y)
    except:
        continue

plex.playlist('Anime Playlist').addItems(first_ep_name)
But when I run it, it always adds watched episodes. If I debug the code in the Thonny IDE it seems to be doing its job, so I am not sure what is wrong with the code.
Any ideas?
I'm thinking that the error might be here:
plex.playlist('Anime Playlist').addItems(first_ep_name)
but according to the documentation addItems should take a list, and my list "first_ep_name" is already collecting unwatched episodes in the correct order. In theory addItems should recognize the specific episode and not only the series name, but I am not sure anymore.
In case somebody out there is having the same issue with plexapi, I was able to find a way to get this project working properly:
from plexapi.server import PlexServer

baseurl = 'insert plex url here'
token = 'plex token here'
plex = PlexServer(baseurl, token)

anime_plex = []
scrapped_playlist = []
for x in plex.library.section('Anime').search(unwatched=True):
    anime_plex.append(x)

while len(anime_plex) > 0:
    episode_list = []
    for y in plex.library.section('Anime').get(anime_plex[0].title).episodes():
        episode_list.append(y)
    ep_checker = True
    while ep_checker:
        if episode_list[0].isWatched:
            episode_list.pop(0)
        else:
            scrapped_playlist.append(episode_list[0])
            episode_list.clear()
            ep_checker = False
    anime_plex.pop(0)

# plex.playlist('Anime Playlist').addItems(scrapped_playlist)
plex.playlist('Anime Playlist').delete()
plex.createPlaylist('Anime Playlist', section='Anime', items=scrapped_playlist)
Basically, what the code does is loop through each anime series I have; if episode X is watched it is popped from the list, until it finds one whose isWatched is False, and that episode is appended to a list that I later use for creating the playlist or adding to it.
The last lines of the code can be commented in or out depending on the purpose: creating the anime playlist or adding items to it.
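A more compact sketch of the same idea (assuming the same 'Anime' section name and the plexapi Show.episodes() / Episode.isWatched attributes used above; untested against a real server):

from plexapi.server import PlexServer

plex = PlexServer('insert plex url here', 'plex token here')

first_unwatched = []
for show in plex.library.section('Anime').search(unwatched=True):
    for ep in show.episodes():
        if not ep.isWatched:          # keep the first unwatched episode of this show
            first_unwatched.append(ep)
            break

plex.playlist('Anime Playlist').delete()
plex.createPlaylist('Anime Playlist', section='Anime', items=first_unwatched)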
I am trying to open several URLs (because they contain data I want to append to a list). I have logic saying "if amount in icl_dollar_amount_l" then run the rest of the code. However, I want the script to run the rest of the code only on the specific amount in the variable "amount".
Example:
Selenium opens X number of links and sees ['144,827.95', '5,199,024.87', '130,710.67'] in icl_dollar_amount_l, but I want it to skip '144,827.95' and '5,199,024.87' and only get the information for '130,710.67', which is already in the 'amount' variable.
Actual results:
It is getting web-scraping information for the amount '144,827.95' only, and not even going on to '5,199,024.87' or '130,710.67'. I only want it to get web-scraping information for '130,710.67', because my amount variable has this as the only amount.
print(icl_dollar_amount_l)
['144,827.95', '5,199,024.87', '130,710.67']
print(amount)
'130,710.67'
file2.py

def scrapeBOAWebsite(url, fcg_subject_l, gp_subject_l):
    from ICL_Awk_Checker import rps_amount_l2
    icl_dollar_amount_l = []
    amount_ack_missing_l = []
    file_total_l = []
    body_l = []
    for link in url:
        print(link)
        browser = webdriver.Chrome(options=options,
                                   executable_path=r'\\TEST\user$\TEST\Documents\driver\chromedriver.exe')
        # if 'P2 Cust ID 908554 File' in fcg_subject:
        browser.get(link)
        username = browser.find_element_by_name("dialog:username").get_attribute('value')
        submit = browser.find_element_by_xpath("//*[@id='dialog:continueButton']").click()
        body = browser.find_element_by_xpath("//*[contains(text(), 'Total:')]").text
        body_l.append(body)
        icl_dollar_amount = re.findall('(?:[\£\$\€]{1}[,\d]+.?\d*)', body)[0].split('$', 1)[1]
        icl_dollar_amount_l.append(icl_dollar_amount)

    if not missing_amount:
        logging.info("List is empty")
        print("List is empty")

    count = 0
    for amount in missing_amount:
        if amount in icl_dollar_amount_l:
            body = body_l[count]
            get_file_total = re.findall('(?:[\£\$\€]{1}[,\d]+.?\d*)', body)[0].split('$', 1)[1]
            file_total_l.append(get_file_total)

    return icl_dollar_amount_l, file_date_l, company_id_l, client_id_l, customer_name_l, file_name_l, file_total_l, \
        item_count_l, file_status_l, amount_ack_missing_l
I don't know if I understand the problem, but this
if amount in icl_dollar_amount_l:
doesn't tell you at which position '130,710.67' is in icl_dollar_amount_l, so you also need
count = icl_dollar_amount_l.index(amount)
for amount in missing_amount:
    if amount in icl_dollar_amount_l:
        count = icl_dollar_amount_l.index(amount)
        body = body_l[count]
But this will work only if you expect a single matching amount in the list icl_dollar_amount_l. For more elements you would rather have to use a for-loop and check every element separately:
for amount in missing_amount:
    for count, item in enumerate(icl_dollar_amount_l):
        if amount == item:
            body = body_l[count]
But frankly, I don't know why you don't check it in the first loop (for link in url:), where you have direct access to icl_dollar_amount and body.
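A hedged sketch of that idea, assuming the target amount is passed into scrapeBOAWebsite (the original snippet does not show where it comes from):

for link in url:
    # ... same Selenium code as above, down to:
    body = browser.find_element_by_xpath("//*[contains(text(), 'Total:')]").text
    icl_dollar_amount = re.findall('(?:[\£\$\€]{1}[,\d]+.?\d*)', body)[0].split('$', 1)[1]
    icl_dollar_amount_l.append(icl_dollar_amount)
    # check the amount right here, while body is still in scope
    if icl_dollar_amount == amount:   # 'amount' assumed to be an argument of the function
        get_file_total = re.findall('(?:[\£\$\€]{1}[,\d]+.?\d*)', body)[0].split('$', 1)[1]
        file_total_l.append(get_file_total)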
I am getting a different result when I use Bio Entrez to search. For example, when I search in the browser using the query "covid side effect" I get 344 results, whereas I get only 92 when I use Bio Entrez. This is the code I was using.
from Bio import Entrez
Entrez.email = "Your.Name.Here@example.org"
handle = Entrez.esearch(db="pubmed", retmax=40, term="covid side effect", idtype="acc")
record = Entrez.read(handle)
handle.close()
print(record['Count'])
I was hoping someone could help me with this discrepancy.
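One thing worth comparing (a hedged diagnostic, not a confirmed cause) is how E-utilities translated the free-text term versus what the PubMed website shows under "Search details"; ESearch returns this in the QueryTranslation field:

from Bio import Entrez

Entrez.email = "Your.Name.Here@example.org"
handle = Entrez.esearch(db="pubmed", retmax=40, term="covid side effect", idtype="acc")
record = Entrez.read(handle)
handle.close()
# A different term expansion than the website's would explain a different count.
print(record['Count'])
print(record['QueryTranslation'])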
For some reason everyone seemed to have the same issue, whether with the R API or the Python API. I have found a workaround to get the same result. It is slow, but it gets the job done. If your result is less than 10k you could probably use Selenium to get the PubMed IDs. Otherwise, we can scrape the data using the code below. I hope this will help someone in the future.
import requests

# # Custom Date Range
# req = requests.get("https://pubmed.ncbi.nlm.nih.gov/?term=covid&filter=dates.2009/01/01-2020/03/01&format=pmid&sort=pubdate&size=200&page={}".format(i))
# # Custom Year Range
# req = requests.get("https://pubmed.ncbi.nlm.nih.gov/?term=covid&filter=years.2010-2019&format=pmid&sort=pubdate&size=200&page={}".format(i))
# # Relative Date
# req = requests.get("https://pubmed.ncbi.nlm.nih.gov/?term=covid&filter=datesearch.y_1&format=pmid&sort=pubdate&size=200&page={}".format(i))
# # filter language
# # &filter=lang.english
# # filter human
# # &filter=hum_ani.humans
# # Systematic Review
# # &filter=pubt.systematicreview
# # Case Reports
# # &filter=pubt.casereports
# # Age
# # &filter=age.newborn

search = "covid lungs"
# search_list = "+".join(search.split(' '))

def id_retriever(search_string):
    string = "+".join(search_string.split(' '))
    result = []
    old_result = len(result)
    for page in range(1, 10000000):
        req = requests.get("https://pubmed.ncbi.nlm.nih.gov/?term={string}&format=pmid&sort=pubdate&size=200&page={page}".format(page=page, string=string))
        for j in req.iter_lines():
            decoded = j.decode("utf-8").strip(" ")
            length = len(decoded)
            if "log_displayeduids" in decoded and length > 46:
                data = (str(j).split('"')[-2].split(","))
                result = result + data
                data = []
        new_result = len(result)
        if new_result != old_result:
            old_result = new_result
        else:
            break
    return result

ids = id_retriever(search)
len(ids)
I am trying to do reverse geocoding and extract pincodes for lat-long pairs. The .csv file has around 1 million records.
Below is my problem:
1. The Google API fails to give addresses for large numbers of records and takes a huge amount of time. I will later move this to a batch process, though.
2. I tried to split the file into chunks and ran a few files manually one by one (1,000 records in each file after splitting), and then I surprisingly got a 100% result (a sketch of this splitting step is shown below).
3. Later, I ran them in a loop one by one, and again the Google API fails to give the result.
Note: Right now we are looking for free APIs only.
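For the splitting step in point 2, a minimal sketch of how a large CSV can be split into 1,000-record chunks with pandas (the file names here are hypothetical):

import pandas as pd

# Read the ~1M-record file in 1,000-row chunks and write each chunk to its own CSV.
reader = pd.read_csv('all_records.csv', chunksize=1000)
for i, chunk in enumerate(reader):
    chunk.to_csv('chunk_{:04d}.csv'.format(i), index=False)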
Below is my code:
def reverse_geocode(latlng):
    result = {}
    url = 'https://maps.googleapis.com/maps/api/geocode/json?latlng={}'
    request = url.format(latlng)
    key = '&key=' + api_key
    request = request + key
    data = requests.get(request).json()
    if len(data['results']) > 0:
        result = data['results'][0]
    return result

def parse_postal_code(geocode_data):
    if (not geocode_data is None) and ('formatted_address' in geocode_data):
        for component in geocode_data['address_components']:
            if 'postal_code' in component['types']:
                return component['short_name']
    return None

dfinal = pd.DataFrame(columns=colnames)
dmiss = pd.DataFrame(columns=colnames)
for fl in files:
    df = pd.read_csv(fl)
    print('Processing file : ' + fl[36:])
    df['geocode_data'] = ''
    df['Pincode'] = ''
    df['geocode_data'] = df['latlng'].map(reverse_geocode)
    df['Pincode'] = df['geocode_data'].map(parse_postal_code)
    if (len(df[df['Pincode'].isnull()]) > 0):
        d0 = df[df['Pincode'].isnull()]
        print("Missing Pincodes : " + str(len(df[df['Pincode'].isnull()])) + " / " + str(len(df)))
        dmiss.append(d0)
        d0 = df[~df['Pincode'].isnull()]
        dfinal.append(d0)
    else:
        dfinal.append(df)
Can anybody help me figure out what the problem in my code is? If any additional info is required, please let me know.
You've run into Google API usage limits.
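One way to confirm that and to cope with it (a hedged sketch, not a drop-in replacement for the function above) is to check the status field the Geocoding API returns and back off before retrying when the quota is hit:

import time
import requests

def reverse_geocode_with_retry(latlng, api_key, max_retries=3):
    # Same endpoint as the original reverse_geocode, plus basic quota handling.
    url = 'https://maps.googleapis.com/maps/api/geocode/json?latlng={}&key={}'.format(latlng, api_key)
    for attempt in range(max_retries):
        data = requests.get(url).json()
        status = data.get('status')
        if status == 'OK':
            return data['results'][0]
        if status == 'OVER_QUERY_LIMIT':
            time.sleep(2 ** attempt)   # back off, then retry
            continue
        break                          # ZERO_RESULTS, REQUEST_DENIED, etc.
    return {}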