Django urls.py and optional parameters in a view function - Python

I have this in urls.py. I'm using the paginator. If I request some UsuarioName, it responds with the UsuarioName profile and page = 1, but I'd like to be able to see the following pages too. I've followed the Django paginator docs.
url(r'^(?P<UsuarioName>.+)/$', "actividades.views.ShowUserPage"),
url(r'^(?P<UsuarioName>.+)/?(?P<page>.+)/?$', "actividades.views.ShowUserPage"),
In views.py, if I write def ...(..., UsuarioName, page), it fails, because the first URL entry doesn't pass a page parameter.
from django.core.paginator import Paginator, PageNotAnInteger, EmptyPage
from django.http import HttpResponse

def ShowUserPage(request, UsuarioName, page):
    UsuarioModel = UserProfile.objects.get(user__username=UsuarioName)
    UserPage = '<div class="userpage">'
    UserPage += '<strong>' + UsuarioModel.titulo + '</strong><br>'
    UserPage += UsuarioModel.user.get_username() + "<br>"
    UserPage += UsuarioModel.descripcion + "<br>"
    UserPage += '</div>'
    UserPage += '<strong>Actividades de usuario</strong>'
    UserActList = UserActivities.objects.filter(user=UsuarioModel).values('actividad', 'fecha_alta')
    paginator = Paginator(UserActList, 2)
    page = 1
    try:
        ActPage = paginator.page(page)
    except PageNotAnInteger:
        ActPage = paginator.page(1)
    except EmptyPage:
        ActPage = paginator.page(1)
    #print ActPage.object_list[0]['actividad']
    for ActividadActual in ActPage:
        UserAct = Actividad.objects.get(id_evento=ActividadActual['actividad'])
        UserPage += '<div class="activity">'
        UserPage += '<strong>Actividad: ' + UserAct.titulo + '</strong><br>'
        UserPage += 'Fecha actividad: ' + UserAct.fecha.strftime("%d-%m-%y") + '<br>'
        UserPage += 'Fecha alta: ' + ActividadActual['fecha_alta'].strftime("%d-%m-%y") + '<br>'
        UserPage += '</div>'
    return HttpResponse(UserPage)
How can I solve it? I know I could split the URL path, but I don't like that approach much.

Maybe you should do this :
url(r'^(?P<UsuarioName>.+)/(?P<page>\d+)/?$', "actividades.views.ShowUserPage"),
url(r'^(?P<UsuarioName>.+)/$', "actividades.views.ShowUserPage"),
I changed the order of the urls so that /me/3 wouldn't be matched first by (?P<UsuarioName>.+). I've also restricted the page argument to digits, because it will always be a number, so there is no reason to match anything else than that.
And then, you should try :
def ShowUserPage(request, UsuarioName, page=1):
It should allow the query without a page to call this function without a page argument (defaulting it to 1).
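Putting the two suggestions together, a minimal sketch (untested, and note that the hardcoded page = 1 inside the view would also have to go, or the captured value is ignored):

# urls.py - the digit-only pattern first, so /me/3 is matched before the catch-all
url(r'^(?P<UsuarioName>.+)/(?P<page>\d+)/?$', "actividades.views.ShowUserPage"),
url(r'^(?P<UsuarioName>.+)/$', "actividades.views.ShowUserPage"),

# views.py
def ShowUserPage(request, UsuarioName, page=1):
    ...
    paginator = Paginator(UserActList, 2)
    try:
        ActPage = paginator.page(page)   # use the captured page, not a hardcoded 1
    except (PageNotAnInteger, EmptyPage):
        ActPage = paginator.page(1)
    ...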
I didn't try any of this, so if it doesn't help, I'll take a deeper look into it.

I think I got it.
Default page = 1, always; later there will be a link for getting the following pages. Can I get the page as a POST argument?

Related

Scraping error: IndexError: list index out of range while writing to CSV in Python

How do I fix this error:
Traceback (most recent call last):
  File "scrap.py", line 37, in <module>
    code()
  File "scrap.py", line 34, in code
    s.write(str(g_name[i].text) + ',' + str(phone[i].text) + ',' + str(website[i].text) + ',' + str(reviews[i].text) + '\n')
IndexError: list index out of range
I've tried to fix it again and again, but I can't.
What does this error mean, and why am I getting it?
Here is my code:
from selenium import webdriver

driver = webdriver.Chrome()
for url in urls:
    if str(url) == '0':
        driver.get('https://www.google.com/search?tbm=lcl&ei=kALeXauoIMWasAfc27TAAQ&q=software+house+in+johar+town+lahore&oq=software+house+in+johar+town+lahore&gs_l=psy-ab.3...0.0.0.96329.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.tvP3qqno_1Q')
    else:
        driver.get('https://www.google.com/search?tbm=lcl&sxsrf=ACYBGNTndl0R6IJRm1LcZ_bQJ14a-C3ocQ%3A1574830560313&ei=4AHeXc7kErH5sAfYr4PQCg&q=software+house+in+johar+town+lahore&oq=software+house+in+johar+town+lahore&gs_l=psy-ab.3...0.0.0.4519.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.S1G_WpFjvhI#rlfi=hd:;si:;mv:[[31.475505499999997,74.30897639999999],[31.4553548,74.2472458]];start:' + str(url))
    if (driver.find_elements_by_css_selector('.dbg0pd div')):
        g_name = driver.find_elements_by_css_selector('.dbg0pd div')
    else:
        g_name = 'NONE'
    if (driver.find_elements_by_css_selector('.lqhpac div:nth-child(3) span')):
        phone = driver.find_elements_by_css_selector('.lqhpac div:nth-child(3) span')
    else:
        phone = 'NONE'
    if (driver.find_elements_by_css_selector('.L48Cpd .wLAgVc')):
        website = driver.find_elements_by_css_selector('.L48Cpd .wLAgVc')
    else:
        website = 'NONE'
    if (driver.find_elements_by_css_selector('.BTtC6e')):
        reviews = driver.find_elements_by_css_selector('.BTtC6e')
    else:
        reviews = 'NONE'
    items = len(g_name)
    with open('johartown.csv', 'a', encoding="utf-8") as s:
        for i in range(items):
            s.write(str(g_name[i].text) + ',' + str(phone[i].text) + ',' + str(website[i].get_attribute('href')) + ',' + str(reviews[i].text) + '\n')
You define the range via items = len(g_name), i.e. by the length of g_name. The length of g_name is greater than the length of phone, website, or reviews, and thus you get the error.
You must make sure that one of the following holds:
- the length of all of these lists is the same,
- you add additional checks to only access an object if the required index is available, or
- you define items by the length of the shortest of your data lists (see the sketch below).
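For example, a minimal sketch of the third option using zip(), which stops at the shortest list (it assumes all four variables really are element lists rather than the 'NONE' string):

with open('johartown.csv', 'a', encoding='utf-8') as s:
    # zip() truncates to the shortest of the four lists, so no index can run out of range
    for name, ph, site, rev in zip(g_name, phone, website, reviews):
        s.write(name.text + ',' + ph.text + ',' + site.get_attribute('href') + ',' + rev.text + '\n')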
On the other hand, the actual problem you are facing here is that the selectors you are using are unable to deal with missing elements on the website.
I would suggest you rewrite your logic so that you would be parsing content holders (elements that contain all of your necessary fields) rather than the fields themselves and then define additional rules within that logic to handle the missing CSS selectors.
In layman's terms: do not look for names, phones, websites, and reviews, but instead look for "users", and then define a parser that goes through all of the "users" and extracts the data that you need.
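A rough sketch of that idea; the .section-result container selector is an assumption (inspect the page for the real one), the field selectors are the ones from the question, and for brevity every field is read as text here:

from selenium.common.exceptions import NoSuchElementException

def text_or_none(card, selector):
    # return the field's text, or 'NONE' if this card is missing the field
    try:
        return card.find_element_by_css_selector(selector).text
    except NoSuchElementException:
        return 'NONE'

with open('johartown.csv', 'a', encoding='utf-8') as s:
    for card in driver.find_elements_by_css_selector('.section-result'):  # hypothetical container
        name = text_or_none(card, '.dbg0pd div')
        phone = text_or_none(card, '.lqhpac div:nth-child(3) span')
        website = text_or_none(card, '.L48Cpd .wLAgVc')
        reviews = text_or_none(card, '.BTtC6e')
        s.write(name + ',' + phone + ',' + website + ',' + reviews + '\n')

This way each row is built from a single card, so the fields can never get misaligned across lists.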

How do I correctly write a program which extracts all links from a web page?

This is part of the Udacity course "Web Search Engine". The goal of this quiz is to write a program which extracts all links from a web page. The output must contain only links, but in my case the program returns all the links and "None" twice. I know the error is in the second part of the program, after the while and after the else, but I don't know what I must write there.
def get_next_target(page):
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    else:
        start_quote = page.find('"', start_link)
        endquo = page.find('"', start_quote + 1)
        url = page[(start_quote + 1):endquo]
        return url, endquo

page = 'i know what you doing summer <a href="Udasity".i know what you doing summer <a href="Georgia" i know what you doing summer '

def ALLlink(page):
    url = 1
    while url != None:
        url, endquo = get_next_target(page)
        if url:
            print url
            page = page[endquo:]
        else:
            pass  # this is the part I don't know what to write

print ALLlink(page)
First, you can remove your else statement in your ALLlink() function since it's not doing anything.
Also, when comparing to None, you should use is not instead of !=:
while url != None:      # bad
while url is not None:  # good
That said, I think your error is in your last line:
print ALLlink(page)
You basically have two print statements. The first is inside your function and the second is on the last line of your script. Really, you don't need the last print statement there because you're already printing in your ALLlink() function. So if you change the line to just ALLlink(page), I think it'll work.
If you do want to print there, you could modify your function to store the URLs in an array, and then print that array. Something like this:
def ALLlink(page):
    urls = []
    url = 1
    while url is not None:
        url, endquo = get_next_target(page)
        if url:
            urls.append(url)
            page = page[endquo:]
    return urls

print ALLlink(page)
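With the sample page defined above, this version should print ['Udasity', 'Georgia'] (the code is Python 2, hence the print statements).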

Paginate the CSV (Python)

How can I paginate through the CSV version of an API call using Python?
I understand the metadata in the JSON call includes the total number of records, but without similar info in the CSV call I won't know where to stop my loop if I try to increment the page parameter.
Below is my code:
import requests
import pandas as pd

url = 'https://api.data.gov/ed/collegescorecard/v1/schools.csv'
payload = {
    'api_key': '4KC***UNKk',
    'fields': 'school.name,2012.repayment.2_yr_default_rate',
    '_page': '0'
}
r = requests.get(url, params=payload)
df = pd.read_csv(r.url)
This loads a dataframe with the first 20 results, but I'd like to load a dataframe with all the results.
Use the _per_page parameter to change the number of records per call; setting _per_page=200 still returns a CSV with 100 lines, so let's assume 100 is the maximum.
Now that we know the maximum per call, and we have the total number of records, it's possible to run a for loop to get what we need, like so:
url = 'https://api.data.gov/ed/collegescorecard/v1/schools.csv'
apikey = '&api_key=xxx'
fields = '&_fields=school.name,2012.repayment.2_yr_default_rate'
pageA = '?_page='  # the first query parameter needs '?', not '&'
pageTotal = '&_per_page='
pageNumbersMaximum = 10
rowSum = 200
for page in range(pageNumbersMaximum):
    fullURL = url + pageA + str(page) + pageTotal + str(rowSum) + fields + apikey
    print(fullURL)
    print("Page Number: " + str(page) + ", Total Rows: " + str(rowSum))
    rowSum += 200
That will loop through the results until it gets to 7000 total.
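If you would rather stop automatically instead of hardcoding the page count, here's a sketch of one possible stop condition (untested; it assumes the last page simply comes back with fewer than _per_page rows):

import requests
import pandas as pd

url = 'https://api.data.gov/ed/collegescorecard/v1/schools.csv'
frames = []
page = 0
while True:
    payload = {
        'api_key': '4KC***UNKk',  # key elided as in the question
        'fields': 'school.name,2012.repayment.2_yr_default_rate',
        '_per_page': '100',
        '_page': str(page),
    }
    r = requests.get(url, params=payload)
    chunk = pd.read_csv(r.url)
    frames.append(chunk)
    if len(chunk) < 100:  # a short page means we just read the last one
        break
    page += 1
df = pd.concat(frames, ignore_index=True)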

else if condition with 3 checks

I am getting 10 results from a Google search.
My scenario is:
- if any result (link) out of the 10 belongs to Wikipedia, consider that result;
- else, consider the Google instant result (the result which appears on top, before the links), if it exists;
- else, consider the descriptions of all 10 links.
Here is my code:
for contentIndex in self.search_response['links']:
    domain = self.search_response['links'][contentIndex]['domain']
    if "wikipedia.org" in domain:
        google_query = ''
        google_query = self.search_response['links'][contentIndex]['content']
        print "wiki link"
        break
    elif google_instant:
        google_query = ''
        google_query = google_instant
        print "\n \n Instant result : " + google_instant
        break
    else:
        google_query += self.search_response['links'][contentIndex]['content']
But this logic is broken: if the first link is not a wiki link and an instant result is present, it considers the instant result instead of a wiki link that appears later.
You're breaking out of the loop on the google_instant condition. If this condition is met before you find a wikipedia link, then it will always use the google_instant link. What you actually need to do here is keep iterating through the results, then at the end check if there is a wikipedia or google instant link.
search_results = ''
wikipedia_result = None
google_instant_result = None
for contentIndex in self.search_response['links']:
    domain = self.search_response['links'][contentIndex]['domain']
    if "wikipedia.org" in domain:
        wikipedia_result = self.search_response['links'][contentIndex]['content']
        print "wiki link"
    elif google_instant:
        google_instant_result = google_instant
        print "\n \n Instant result : " + google_instant
    else:
        search_results += self.search_response['links'][contentIndex]['content']
google_query = wikipedia_result or google_instant_result or search_results
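Note that or returns the first truthy operand, so the final line prefers the Wikipedia link, then the instant result, then the concatenated descriptions, which is exactly the priority described in the question.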

Creating new table while iterating through a queryset in django

This is a newbie question, but despite reading https://docs.djangoproject.com/en/dev/ref/models/instances/#saving-objects, I'm not quite sure how to do this. I have an existing table, and I would like to iterate through all its records and save certain info to a second table. I have the following model:
from django.db import models

class myEmails(models.Model):
    text = models.CharField(max_length=1200)
In my view I have:
def getMyMessages(request):
    from django_mailbox.models import Message
    from get_new_emails.models import myEmails
    import re
    qs = Message.objects.all()
    count = 0
    output = ""
    for i in qs:
        count += 1
        output = output + str(count) + " TEXT: " + i.text + '<br>' + '<br>'
    return HttpResponse(output)
How can I modify my view to save i.text to the text field of the myEmails table?
You can create new objects and save them to the database afterwards using save():
for i in qs:
    obj = myEmails(text=i.text)
    obj.save()
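As a side note, create() builds and saves in one step, and bulk_create() inserts many rows in a single query:

for i in qs:
    myEmails.objects.create(text=i.text)

# or, one INSERT for the whole queryset:
myEmails.objects.bulk_create([myEmails(text=i.text) for i in qs])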
