I made a scraper for Yellow Pages in Python. There is a table with the working hours of the businesses listed. I scrape that into a list and save it to a CSV using Scrapy. By default the items are separated by a comma, like
Mon,Closed,Tue - Fri ,9 00 am - 6:00 pm,Sat ,9 00 am - 1:00 pm,Sun,Closed
I want to use a pipe (|) instead of commas, so the final list would look like this: Mon,Closed| Tue - Fri ,9 00 am - 6:00 pm|Sat ,9 00 am - 1:00 pm|Sun,Closed
Any help on how I should implement this would be appreciated. The following is my parse method:
def parse_item(self, response):
item = YellowItem()
item['keyword'] = category_i
item['title'] = response.xpath('//h1/text()').extract_first()
item['phone'] = response.xpath('//p[@class="phone"]/text()').extract_first()
addr = response.xpath('//h2[@class="address"]/text()').extract_first()
item['street_address'] = addr
email = response.xpath('//a[@class="email-business"]/@href').extract_first()
try:
item['email'] = email.replace("mailto:", '')
except AttributeError:
pass
item['website'] = response.xpath('//a[@class="primary-btn website-link"]/@href').extract_first()
item['Description'] = response.xpath('//dd[@class="general-info"]/text()').extract_first()
hours = response.xpath(
'//div[@class="open-details"]/descendant-or-self::*/text()[not(ancestor::*['
'@class="hour-category"])]').extract()
t_f_h = []
for hour in hours:
data = re.findall(r'(\d{1,2}:\d{2})\s(AM|PM|am|pm)', hour)
if data:
time = data[0][0] + " " + data[0][1]
time_t = data[1][0] + " " + data[1][1]
d = time
t = pd.to_datetime(d).strftime('%H:%M')
start = t
d_t = time_t
time_d = pd.to_datetime(d_t).strftime('%H:%M')
end = time_d
fin_t = hour.replace(time, start)
m_f_t = fin_t.replace(time_t, end)
t_f_h.append(m_f_t)
if not data:
t_f_h.append(hour)
item['t_hour_format'] = t_f_h
try:
clean_l = []
for hour in hours:
clean_st = hour.replace(":", " ", 1)
clean_l.append(clean_st)
item['Hours'] = clean_l
except AttributeError:
pass
item['Other_info'] = response.xpath(
'//dd[@class="other-information"]/descendant-or-self::*/text()').extract()
category_ha = response.xpath('//dd[@class="categories"]/descendant-or-self::*/text()').extract()
item['Categories'] = " ".join(category_ha)
item['Years_in_business'] = response.xpath('//div[@class="number"]/text()').extract_first()
year = item['Years_in_business']
if year:
opened = 2020 - int(year) # change the year here
item['year_opened'] = 'Year Opened: ' + str(opened)
neighborhood = response.xpath('//dd[@class="neighborhoods"]/descendant-or-self::*/text()').extract()
item['neighborhoods'] = ' '.join(neighborhood)
item['other_links'] = response.xpath('//dd[@class="weblinks"]/descendant-or-self::*/text()').extract()
item['BBB_Grade'] = response.xpath('//span[@class="bbb-no-link"]/text()').extract_first()
item['link_to_the_listing'] = response.url
adress = str(addr)
data = usaddress.tag(adress)
if "PlaceName" in data[0].keys():
item["City"] = data[0]["PlaceName"]
if "StateName" in data[0].keys():
item["State"] = data[0]["StateName"]
if "ZipCode" in data[0].keys():
item["Zip"] = data[0]["ZipCode"]
return item
You could write your own exporter based on CsvItemExporter.
from scrapy.exporters import CsvItemExporter
class MyExporter(CsvItemExporter):
def __init__(self, *args, **kwargs):
kwargs['delimiter'] = '|'
super(MyExporter, self).__init__(*args, **kwargs)
You can then set your new ItemExporter in your project's settings.py.
FEED_EXPORTERS = {
'csv': 'my_project.file_containing_exporter.MyExporter'
}
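If, instead of changing the CSV delimiter, you only want the hours list joined with a pipe inside a single cell, a minimal sketch (reusing the clean_l list built in the parse method above) would be:
item['Hours'] = '|'.join(clean_l)  # e.g. "Mon,Closed|Tue - Fri ,9 00 am - 6:00 pm|..."
Scrapy would then export Hours as one pipe-separated string rather than serializing the list with commas.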
class CovidRecord:
def __init__(self, pruid, prname, prnameFR, date, update, numcomf, numprob, numdeaths, numtotal):
self.pruid = pruid
self.prname = prname
self.prnameFR = prnameFR
self.date = date
self.update = update
self.numconf = numcomf
self.numprob = numprob
self.numdeaths = numdeaths
self.numtotal = numtotal
def __str__(self):
return self.pruid + self.prname + self.prnameFR + self.date + self.update + self.numconf + self.numprob + self.numdeaths + self.numtotal
def write_to_dataframe(self):
dataframe = pd.read_csv(r'C:\Users\jakev\Downloads\covid19-download.csv')
dataframetop = dataframe.head(6)
print('Please insert information to add')
pruid = input('Province ID: ')
prname = input('Province name: ')
prnameFR = input('Province name in French: ')
date = input('Date: ')
update = input('When was it updated: ')
numcomf = input('Confirmed cases: ')
numprob = input('Probable cases: ')
numdeaths = input('Number of deaths ')
numtotal = input('Total number')
newrecord = CovidRecord(pruid, prname, prnameFR, date, update, numcomf, numprob, numdeaths, numtotal)
dfobj = pd.DataFrame(newrecord, columns = ['prvid', 'prname', 'prnameFR', 'date', 'update', 'numcomf', 'numprob'
, 'numdeaths', 'numtotal', 'newrecord'], index=['1'])
newdataframe = dataframetop.append(dfobj, ignore_index = True)
print(newdataframe)
CovidRecord is a record object I've created to load additional records into a dataframe. When I type my responses to the inputs in the terminal, it prints out one giant string containing all of my responses under every single column, instead of separating them into their individual columns. How can I make it so that when I enter the province name it goes only under prname, and so on? Below is an example of the output I get.
pruid prname prnameFR date
0 35.0 Ontario Ontario 2020-01-31
1 59.0 British Columbia Colombie-Britannique 2020-01-31
6 NaN 23OntarioOntario2021-02-182021-02-182334278 23OntarioOntario2021-02-182021-02-182334278 23OntarioOntario2021-02-182021-02-182334278
The bottom record is the one I attempted to insert. It is taking all my answers to each input, joining them into one string, and then putting that one giant string into every column.
What am I missing?
Thanks.
__str__ returning one giant string
You are putting them together yourself with the following line:
return self.pruid + self.prname + self.prnameFR + self.date + self.update + self.numconf + self.numprob + self.numdeaths + self.numtotal
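The other half of the problem is pd.DataFrame(newrecord, ...), which cannot unpack a plain object into columns. A minimal sketch, assuming the attribute and variable names above, is to hand pandas a dict so each value lands in its own column:
row = {'pruid': newrecord.pruid, 'prname': newrecord.prname,
       'prnameFR': newrecord.prnameFR, 'date': newrecord.date,
       'update': newrecord.update, 'numconf': newrecord.numconf,
       'numprob': newrecord.numprob, 'numdeaths': newrecord.numdeaths,
       'numtotal': newrecord.numtotal}
dfobj = pd.DataFrame([row])  # one-row frame, one column per key
newdataframe = pd.concat([dataframetop, dfobj], ignore_index=True)  # DataFrame.append is deprecated in pandas 2.x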
I have the following view function used to scrape data:
def results(request):
if request.method == 'POST':
form = RoomForm(request.POST)
if form.is_valid():
form_city = form.cleaned_data['city'].title()
form_country = form.cleaned_data['country'].title()
form_arrival_date = form.cleaned_data['arrival_date']
form_departure_date = form.cleaned_data['departure_date']
form_pages_to_scrape = form.cleaned_data['pages_to_scrape']
#launch scraper
scraper = AIRBNB_scraper(city=form_city, country=form_country, arrival_date=str(form_arrival_date), departure_date=str(form_departure_date))
scraped_dataframe = scraper.scrape_multiple_pages(last_page_selector_number=form_pages_to_scrape)
scraped_dataframe_sorted = scraped_dataframe.sort_values('prices')
print(scraped_dataframe_sorted)
#convert scraped dataframe into lists
prices = scraped_dataframe_sorted['prices'].tolist()
listings_links = scraped_dataframe_sorted['listings_links'].tolist()
listings_names = scraped_dataframe_sorted['listings_names'].tolist()
photo_links = scraped_dataframe_sorted['photo_links'].tolist()
dictionary = zip(prices, listings_links, listings_names, photo_links)
context = {'dictionary': dictionary}
return render(request, 'javascript/results.html', context)
On form submit, a post request is sent to this function using AJAX:
var frm = $('#login-form');
frm.submit(function () {
$.ajax({
type: "POST",
url: "/results",
data: frm.serialize(),
success: function (data) {
$("#table").html(data);
$('#go_back').remove();
},
error: function(data) {
$("#table").html("Something went wrong!");
}
});
return false;
});
After that the scraped data is displayed as HTML table on the same page the form is on.
The problem is that the number of scraped items doubles every time the form is submitted. So, for example, if the number of scraped items on the first button click is sixteen, the output will be 16, but on the second run it will be 32, then 64, and so on.
It is like the app remembers previous form submits, but I don't see any reason why. At the end of this function I tried clearing the pandas dataframe used to store the scraped data, and also the dictionary passed as context, but to no avail.
The form is:
class RoomForm(forms.Form):
city = forms.CharField(max_length=100)
country = forms.CharField(max_length=100)
arrival_date = forms.DateField(widget=forms.DateInput(attrs=
{
'class':'datepicker'
}), required=False)
departure_date = forms.DateField(widget=forms.DateInput(attrs=
{
'class':'datepicker'
}), required=False)
pages_to_scrape = forms.IntegerField(label='Pages to scrape (max. 17)', min_value=0, max_value=17, widget=forms.NumberInput(attrs={'style':'width: 188px'}))
AIRBNB_scraper is:
import requests, bs4
import re
import pandas as pd
price_pattern = re.compile(r'\d*\s*?,?\s*?\d*\szł')
photo_link_pattern = re.compile(r'https.*\)')
prices = []
listings_links = []
photo_links = []
listings_names = []
class AIRBNB_scraper():
def __init__(self, city, country, accomodation_type='homes', arrival_date='2018-03-25', departure_date='2018-04-10'):
self.city = city
self.country = country
self.arrival_date = arrival_date
self.departure_date = departure_date
self.accomodation_type = accomodation_type
def make_soup(self, page_number):
url = 'https://www.airbnb.pl/s/'+ self.city +'--'+ self.country +'/'+ self.accomodation_type +'?query='+ self.city +'%2C%20'+ self.country +'&refinement_paths%5B%5D=%2F'+ self.accomodation_type +'&checkin=' + self.arrival_date + '&checkout=' + self.departure_date + '&section_offset=' + str(page_number)
response = requests.get(url)
soup = bs4.BeautifulSoup(response.text, "html.parser")
return soup
def get_listings(self, page_number):
soup = self.make_soup(page_number)
listings = soup.select('._f21qs6')
number_of_listings = len(listings)
print('\n' + "Number of listings found: " + str(number_of_listings))
while number_of_listings != 18:
print('\n' + str(number_of_listings) + ' is not correct number of listings, it should be 18. Trying again now.')
soup = self.make_soup(page_number)
listings = soup.find_all('div', class_='_f21qs6')
number_of_listings = len(listings)
print('\n' + "All fine! The number of listings is: " + str(number_of_listings) + '. Starting scraping now')
return listings
def scrape_listings_per_page(self, page_number):
listings_to_scrape = self.get_listings(page_number)
for listing in listings_to_scrape:
#get price
price_container = listing.find_all('span', class_='_hylizj6')
price_search = re.search(price_pattern, str(price_container))
price = price_search.group()
#get listing_link
listing_link = 'https://www.airbnb.pl' + listing.find('a', class_='_15ns6vh')['href']
#get photo_link
photo_link_node = listing.find('div', class_="_1df8dftk")['style']
photo_link_search = re.search(photo_link_pattern, str(photo_link_node))
#~ if photo_link_search:
#~ print('Is regex match')
#~ else:
#~ print('No regex match')
photo_link_before_strip = photo_link_search.group()
photo_link = photo_link_before_strip[:-1] #remove ") at the end of link
#get listing_name
listing_name = listing.find('div', class_='_1rths372').text
#append lists
prices.append(price)
listings_links.append(listing_link)
photo_links.append(photo_link)
listings_names.append(listing_name)
def scrape_multiple_pages(self, last_page_selector_number):
last_page_selector_number += 1
for x in range(0, last_page_selector_number):#18
self.scrape_listings_per_page(x)
print('\n' + "INDEX OF PAGE BEING SCRAPED: " + str(x))
scraped_data = pd.DataFrame({'prices': prices,
'listings_links': listings_links,
'photo_links': photo_links,
'listings_names': listings_names})
return scraped_data
You have module-level variables: prices, listings_links, etc. You append to these inside your AIRBNB_scraper instance, but they are not part of that instance and will persist between calls. You should make them instance attributes: define them as self.prices etc. in the __init__ method.
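A minimal sketch of that change (only __init__ shown; the append calls would then become self.prices.append(...) and so on, and scrape_multiple_pages would build the DataFrame from the instance attributes):
class AIRBNB_scraper():
    def __init__(self, city, country, accomodation_type='homes',
                 arrival_date='2018-03-25', departure_date='2018-04-10'):
        self.city = city
        self.country = country
        self.arrival_date = arrival_date
        self.departure_date = departure_date
        self.accomodation_type = accomodation_type
        # result lists live on the instance, so each new scraper starts empty
        self.prices = []
        self.listings_links = []
        self.photo_links = []
        self.listings_names = []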
I have a Django application where I am trying to make a call to FedEx's API to send out a shipping label for people wanting to send in a product for cash. When I try to make the call, though, it says there is a data validation issue with the Expiration field in the XML I am filling out. I swear this worked in the past with the date formatted as "YYYY-MM-DD", but now it does not. I read that with FedEx you need to format the date as ISO, but that is not passing the data validation either. I am using a Python package created to help with tapping FedEx's API.
Django view function for sending API Call
def Fedex(request, quote):
label_link = ''
expiration_date = datetime.datetime.now() + datetime.timedelta(days=10)
# formatted_date = "%s-%s-%s" % (expiration_date.year, expiration_date.month, expiration_date.day)
formatted_date = expiration_date.replace(microsecond=0).isoformat()
if quote.device_type != 'laptop':
box_length = 9
box_width = 12
box_height = 3
else:
box_length = 12
box_width = 14
box_height = 3
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
## Page 411 of FedEx Dev Guide - 20.14 Email Labels
CONFIG_OBJ = FedexConfig(key=settings.FEDEX_KEY, password=settings.FEDEX_PASSWORD, account_number=settings.FEDEX_ACCOUNT,
meter_number=settings.FEDEX_METER, use_test_server=settings.USE_FEDEX_TEST)
fxreq = FedexCreatePendingShipRequestEmail(CONFIG_OBJ, customer_transaction_id='xxxxxx id:01')
fxreq.RequestedShipment.ServiceType = 'FEDEX_GROUND'
fxreq.RequestedShipment.PackagingType = 'YOUR_PACKAGING'
fxreq.RequestedShipment.DropoffType = 'REGULAR_PICKUP'
fxreq.RequestedShipment.ShipTimestamp = datetime.datetime.now()
# Special fields for the email label
fxreq.RequestedShipment.SpecialServicesRequested.SpecialServiceTypes = ('RETURN_SHIPMENT', 'PENDING_SHIPMENT')
fxreq.RequestedShipment.SpecialServicesRequested.PendingShipmentDetail.Type = 'EMAIL'
fxreq.RequestedShipment.SpecialServicesRequested.PendingShipmentDetail.ExpirationDate = formatted_date
email_address = fxreq.create_wsdl_object_of_type('EMailRecipient')
email_address.EmailAddress = quote.email
email_address.Role = 'SHIPMENT_COMPLETOR'
# RETURN SHIPMENT DETAIL
fxreq.RequestedShipment.SpecialServicesRequested.ReturnShipmentDetail.ReturnType = ('PENDING')
fxreq.RequestedShipment.SpecialServicesRequested.ReturnShipmentDetail.ReturnEMailDetail = fxreq.create_wsdl_object_of_type(
'ReturnEMailDetail')
fxreq.RequestedShipment.SpecialServicesRequested.ReturnShipmentDetail.ReturnEMailDetail.MerchantPhoneNumber = 'x-xxx-xxx-xxxx'
fxreq.RequestedShipment.SpecialServicesRequested.PendingShipmentDetail.EmailLabelDetail.Recipients = [email_address]
fxreq.RequestedShipment.SpecialServicesRequested.PendingShipmentDetail.EmailLabelDetail.Message = "Xxxxxx Xxxxxx"
fxreq.RequestedShipment.LabelSpecification = {'LabelFormatType': 'COMMON2D', 'ImageType': 'PDF'}
fxreq.RequestedShipment.Shipper.Contact.PersonName = quote.first_name + ' ' + quote.last_name
fxreq.RequestedShipment.Shipper.Contact.CompanyName = ""
fxreq.RequestedShipment.Shipper.Contact.PhoneNumber = quote.phone
fxreq.RequestedShipment.Shipper.Address.StreetLines.append(quote.address)
fxreq.RequestedShipment.Shipper.Address.City = quote.city
fxreq.RequestedShipment.Shipper.Address.StateOrProvinceCode = quote.state
fxreq.RequestedShipment.Shipper.Address.PostalCode = quote.zip
fxreq.RequestedShipment.Shipper.Address.CountryCode = settings.FEDEX_COUNTRY_CODE
fxreq.RequestedShipment.Recipient.Contact.PhoneNumber = settings.FEDEX_PHONE_NUMBER
fxreq.RequestedShipment.Recipient.Address.StreetLines = settings.FEDEX_STREET_LINES
fxreq.RequestedShipment.Recipient.Address.City = settings.FEDEX_CITY
fxreq.RequestedShipment.Recipient.Address.StateOrProvinceCode = settings.FEDEX_STATE_OR_PROVINCE_CODE
fxreq.RequestedShipment.Recipient.Address.PostalCode = settings.FEDEX_POSTAL_CODE
fxreq.RequestedShipment.Recipient.Address.CountryCode = settings.FEDEX_COUNTRY_CODE
fxreq.RequestedShipment.Recipient.AccountNumber = settings.FEDEX_ACCOUNT
fxreq.RequestedShipment.Recipient.Contact.PersonName = ''
fxreq.RequestedShipment.Recipient.Contact.CompanyName = 'Xxxxxx Xxxxxx'
fxreq.RequestedShipment.Recipient.Contact.EMailAddress = 'xxxxxx@xxxxxxxxx'
# Details of Person Who is Paying for the Shipping
fxreq.RequestedShipment.ShippingChargesPayment.PaymentType = 'SENDER'
fxreq.RequestedShipment.ShippingChargesPayment.Payor.ResponsibleParty.AccountNumber = settings.FEDEX_ACCOUNT
fxreq.RequestedShipment.ShippingChargesPayment.Payor.ResponsibleParty.Contact.PersonName = 'Xxxxx Xxxxx'
fxreq.RequestedShipment.ShippingChargesPayment.Payor.ResponsibleParty.Contact.CompanyName = 'Xxxxx Xxxxxx'
fxreq.RequestedShipment.ShippingChargesPayment.Payor.ResponsibleParty.Contact.PhoneNumber = 'x-xxx-xxx-xxxx'
fxreq.RequestedShipment.ShippingChargesPayment.Payor.ResponsibleParty.Contact.EMailAddress = 'xxxxxxx@xxxxxxxxx'
fxreq.RequestedShipment.ShippingChargesPayment.Payor.ResponsibleParty.Address.StreetLines = 'Xxxxx N. xXxxxxx'
fxreq.RequestedShipment.ShippingChargesPayment.Payor.ResponsibleParty.Address.City = 'Xxxxxxx'
fxreq.RequestedShipment.ShippingChargesPayment.Payor.ResponsibleParty.Address.StateOrProvinceCode = 'XX'
fxreq.RequestedShipment.ShippingChargesPayment.Payor.ResponsibleParty.Address.PostalCode = 'xxxxx'
fxreq.RequestedShipment.ShippingChargesPayment.Payor.ResponsibleParty.Address.CountryCode = 'US'
# Package Info
package1 = fxreq.create_wsdl_object_of_type('RequestedPackageLineItem')
package1.SequenceNumber = '1'
package1.Weight.Value = 1
package1.Weight.Units = "LB"
package1.Dimensions.Length = box_length
package1.Dimensions.Width = box_width
package1.Dimensions.Height = box_height
package1.Dimensions.Units = "IN"
package1.ItemDescription = 'Phone'
fxreq.RequestedShipment.RequestedPackageLineItems.append(package1)
fxreq.RequestedShipment.PackageCount = '1'
try:
fxreq.send_request()
label_link = str(fxreq.response.CompletedShipmentDetail.AccessDetail.AccessorDetails[0].EmailLabelUrl)
except Exception as exc:
print('Fedex Error')
print('===========')
print(exc)
print('==========')
return label_link
Error Log
Error:cvc-datatype-valid.1.2.1: \\'2017-11-3\\' is not a valid value for \\'date\\'.\\ncvc-type.3.1.3: The value \\'2017-11-3\\' of element \\'ns0:ExpirationDate\\' is not valid."\\n }\\n }' (Error code: -1)
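For reference, the rejected value '2017-11-3' appears to match the commented-out '%s-%s-%s' formatting, which does not zero-pad the month and day, while the XML date type requires a zero-padded YYYY-MM-DD. A minimal sketch using strftime, which always pads:
import datetime

expiration_date = datetime.datetime.now() + datetime.timedelta(days=10)
formatted_date = expiration_date.strftime('%Y-%m-%d')  # e.g. '2017-11-03', not '2017-11-3'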
I'm trying to use the Eventful API to get information about music events (concerts) only, between two dates. For example, I want to get the information below about each concert from 20171012 to 20171013:
- city
- performer
- country
- latitude
- longitude
- genre
- title
- image
- StartTime
I'm using a Python example available online, changed to get the data above, but for now it's not working; I'm only able to get this information:
{'latitude': '40.4',
'longitude': '-3.68333',
'start_time': '2017-10-12 20:00:00',
'city_name': 'Madrid', 'title': 'Kim Waters & Maysa Smooth en Hot Jazz Festival'}
But the performer, genre, country and image URL are not working. Do you know how to get that information? When I change the Python example below to fetch those fields, it always returns an empty array.
The Python example works as-is (but without getting the performer, genre, country and image URL; if I add these elements to event_features I get an empty array):
import requests
import datetime
def get_event(user_key, event_location , start_date, end_date, event_features, fname):
data_lst = [] # output
start_year = int(start_date[0:4])
start_month = int(start_date[4:6])
start_day = int(start_date[6:])
end_year = int(end_date[0:4])
end_month = int(end_date[4:6])
end_day = int(end_date[6:])
start_date = datetime.date(start_year, start_month, start_day)
end_date = datetime.date(end_year, end_month, end_day)
step = datetime.timedelta(days=1)
while start_date <= end_date:
date = str(start_date.year)
if start_date.month < 10:
date += '0' + str(start_date.month)
else:
date += str(start_date.month)
if start_date.day < 10:
date += '0' + str(start_date.day)
else:
date += str(start_date.day)
date += "00"
date += "-" + date
url = "http://api.eventful.com/json/events/search?"
url += "&app_key=" + user_key
url += "&location=" + event_location
url += "&date=" + date
url += "&page_size=250"
url += "&sort_order=popularity"
url += "&sort_direction=descending"
url += "&q=music"
url+= "&c=music"
data = requests.get(url).json()
try:
for i in range(len(data["events"]["event"])):
data_dict = {}
for feature in event_features:
data_dict[feature] = data["events"]["event"][i][feature]
data_lst.append(data_dict)
except:
pass
print(data_lst)
start_date += step
def main():
user_key = ""
event_location = "Madrid"
start_date = "20171012"
end_date = "20171013"
event_location = event_location.replace("-", " ")
start_date = start_date
end_date = end_date
event_features = ["latitude", "longitude", "start_time"]
event_features += ["city_name", "title"]
event_fname = "events.csv"
get_event(user_key, event_location, start_date, end_date, event_features, event_fname)
if __name__ == '__main__':
main()
You should debug your problem and not ignore all exceptions.
Replace the lines try: ... except: pass with:
data = requests.get(url).json()
if "event" in data.get("event", {}):
for row in data["events"]["event"]:
# print(row) # you can look here what are the available data, while debugging
data_dict = {feature: row[feature] for feature in event_features}
data_lst.append(data_dict)
else:
pass # a problem - you can do something here
You will see a KeyError with the name of the missing feature that is not present in "row". You should fix the missing features and read the documentation for that service's API. The country feature is probably "country_name", similar to "city_name". Maybe you should also set the "include" parameter to request more sections of detail in the search than the defaults.
A universal try: ... except: pass should never be used, because "Errors should never pass silently." (The Zen of Python)
Read Handling Exceptions:
... The last except clause may omit the exception name(s), to serve as a wildcard. Use this with extreme caution, since it is easy to mask a real programming error in this way! ...
A more important call where unexpected exceptions are possible is requests.get(url).json(), e.g. a timeout. In any case, you should not continue the while loop if there is a problem.
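A minimal sketch of guarding that call inside the while loop (url is the search URL built above; the 10-second timeout is an assumption):
import requests

try:
    data = requests.get(url, timeout=10).json()
except requests.exceptions.RequestException as exc:
    print('request failed:', exc)
    data = None  # then break out of the while loop instead of carrying on
requests.exceptions.RequestException covers timeouts, connection errors and invalid URLs; .json() can additionally raise ValueError if the body is not valid JSON.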
If you look at the data returned by eventful.com, a few things are clear:
For country, the field to be used is country_name. This was missing from your "event_features" list
There can be multiple performers for each event. To get all the performers, you need to add "performers" to your "event_features" list
There is no field named genre, hence genre cannot be retrieved.
The "image" field is always None. This means there is no image available.
Here is the modified code. Hopefully it works much better and will help you move forward.
import datetime
import requests
data_lst = [] # output
event_features = ["latitude", "longitude", "start_time", "city_name",
"country_name", "title", "image", "performers"]
def get_event(user_key, event_location, start_date, end_date):
start_year = int(start_date[0:4])
start_month = int(start_date[4:6])
start_day = int(start_date[6:])
end_year = int(end_date[0:4])
end_month = int(end_date[4:6])
end_day = int(end_date[6:])
start_date = datetime.date(start_year, start_month, start_day)
end_date = datetime.date(end_year, end_month, end_day)
step = datetime.timedelta(days=1)
while start_date <= end_date:
date = str(start_date.year)
if start_date.month < 10:
date += '0' + str(start_date.month)
else:
date += str(start_date.month)
if start_date.day < 10:
date += '0' + str(start_date.day)
else:
date += str(start_date.day)
date += "00"
date += "-" + date
url = "http://api.eventful.com/json/events/search?"
url += "&app_key=" + user_key
url += "&location=" + event_location
url += "&date=" + date
url += "&page_size=250"
url += "&sort_order=popularity"
url += "&sort_direction=descending"
url += "&q=music"
url += "&c=music"
data = requests.get(url).json()
print "==== Data Returned by eventful.com ====\n", data
try:
for i in range(len(data["events"]["event"])):
data_dict = {}
for feature in event_features:
data_dict[feature] = data["events"]["event"][i][feature]
data_lst.append(data_dict)
except IndexError:
pass
print "===================================="
print data_lst
start_date += step
def main():
user_key = "Enter Your Key Here"
event_location = "Madrid"
start_date = "20171012"
end_date = "20171013"
event_location = event_location.replace("-", " ")
start_date = start_date
end_date = end_date
#event_fname = "events.csv"
get_event(user_key, event_location, start_date, end_date)
if __name__ == '__main__':
main()
I was able to successfully pull data from the Eventful API for the performer, image, and country fields. However, I don't think the Eventful Search API supports genre - I don't see it in their documentation.
To get country, I added "country_name", "country_abbr" to your event_features array. That adds these values to the resulting JSON:
'country_abbr': u'ESP',
'country_name': u'Spain'
Performer also can be retrieved by adding "performers" to event_features. That will add this to the JSON output:
'performers': {
u'performer': {
u'name': u'Kim Waters',
u'creator': u'evdb',
u'url': u'http://concerts.eventful.com/Kim-Waters?utm_source=apis&utm_medium=apim&utm_campaign=apic',
u'linker': u'evdb',
u'short_bio': u'Easy Listening / Electronic / Jazz', u'id': u'P0-001-000333271-4'
}
}
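Because "performers" can hold either a single performer dict (as above) or a list when an event has several, it helps to normalize it before use; a minimal sketch with a hypothetical helper:
def performer_names(event):
    # 'performers' may be None, or hold a single dict, or a list of dicts
    performers = (event.get('performers') or {}).get('performer', [])
    if isinstance(performers, dict):
        performers = [performers]
    return [p['name'] for p in performers]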
To retrieve images, add image to the event_features array. Note that not all events have images, however. You will either see 'image': None or
'image': {
u'medium': {
u'url': u'http://d1marr3m5x4iac.cloudfront.net/store/skin/no_image/categories/128x128/other.jpg',
u'width': u'128',
u'height': u'128'
},
u'thumb': {
u'url': u'http://d1marr3m5x4iac.cloudfront.net/store/skin/no_image/categories/48x48/other.jpg',
u'width': u'48',
u'height': u'48'
}
}
Good luck! :)
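One last note: the original main() builds event_fname = "events.csv" but never uses it. If the goal is a CSV, a minimal sketch with csv.DictWriter (hypothetical function; nested fields such as performers and image are written as their string representation):
import csv

def write_events_csv(data_lst, fname, fieldnames):
    # one row per event dict; fieldnames should match event_features
    with open(fname, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(data_lst)

write_events_csv(data_lst, 'events.csv', event_features)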
After solving a naive-datetime problem, I am facing a new problem in a view that generates graphs. Now I get "mktime argument out of range".
I have no idea how to solve it. I didn't write the code; I am using it from a colleague of mine, and I can't seem to understand why it fails. I think it has to do with a loop that runs past the intended range, at which point the error pops up.
@login_required(login_url='/accounts/login/')
def loggedin(request):
data = []
data2 = []
data3 = []
dicdata2 = {}
dicdata3 = {}
datainterior = []
today = timezone.localtime(timezone.now()+timedelta(hours=1)).date()
tomorrow = today + timedelta(1)
semana= today - timedelta(7)
today = today - timedelta(1)
semana_start = datetime.combine(today, time())
semana_start = timezone.make_aware(semana_start, timezone.utc)
today_start = datetime.combine(today, time())
today_start = timezone.make_aware(today_start, timezone.utc)
today_end = datetime.combine(tomorrow, time())
today_end = timezone.make_aware(today_end, timezone.utc)
for modulo in Repository.objects.values("des_especialidade").distinct():
dic = {}
mod = str(modulo['des_especialidade'])
dic["label"] = str(mod)
dic["value"] = Repository.objects.filter(des_especialidade__iexact=mod).count()
data.append(dic)
for modulo in Repository.objects.values("modulo").distinct():
dic = {}
mod = str(modulo['modulo'])
dic["label"] = str(mod)
dic["value"] = Repository.objects.filter(modulo__iexact=mod, dt_diag__gte=semana_start).count()
datainterior.append(dic)
# print mod, Repository.objects.filter(modulo__iexact=mod).count()
# data[mod] = Repository.objects.filter(modulo__iexact=mod).count()
dicdata2['values'] = datainterior
dicdata2['key'] = "Cumulative Return"
dicdata3['values'] = data
dicdata3['color'] = "#d67777"
dicdata3['key'] = "Diagnosticos Identificados"
data3.append(dicdata3)
data2.append(dicdata2)
#-------sunburst
databurst = []
dictburst = {}
dictburst['name'] = "CHP"
childrenmodulo = []
for modulo in Repository.objects.values("modulo").distinct():
childrenmodulodic = {}
mod = str(modulo['modulo'])
childrenmodulodic['name'] = mod
childrenesp = []
for especialidade in Repository.objects.filter(modulo__iexact=mod).values("des_especialidade").distinct():
childrenespdic = {}
esp = str(especialidade['des_especialidade'])
childrenespdic['name'] = esp
childrencode = []
for code in Repository.objects.filter(modulo__iexact=mod,des_especialidade__iexact=esp).values("cod_diagnosis").distinct():
childrencodedic = {}
codee= str(code['cod_diagnosis'])
childrencodedic['name'] = 'ICD9 - '+codee
childrencodedic['size'] = Repository.objects.filter(modulo__iexact=mod,des_especialidade__iexact=esp,cod_diagnosis__iexact=codee).count()
childrencode.append(childrencodedic)
childrenespdic['children'] = childrencode
#childrenespdic['size'] = Repository.objects.filter(des_especialidade__iexact=esp).count()
childrenesp.append(childrenespdic)
childrenmodulodic['children'] = childrenesp
childrenmodulo.append(childrenmodulodic)
dictburst['children'] = childrenmodulo
databurst.append(dictburst)
# print databurst
# --------stacked area chart
datastack = []
for modulo in Repository.objects.values("modulo").distinct():
datastackdic = {}
mod = str(modulo['modulo'])
datastackdic['key'] = mod
monthsarray = []
year = timezone.localtime(timezone.now()+timedelta(hours=1)).year
month = timezone.localtime(timezone.now()+timedelta(hours=1)).month
last = timezone.localtime(timezone.now()+timedelta(hours=1)) - relativedelta(years=1)
lastyear = int(last.year)
lastmonth = int(last.month)
#i = 1
while lastmonth <= int(month) or lastyear<int(year):
date = str(lastmonth) + '/' + str(lastyear)
if (lastmonth < 12):
datef = str(lastmonth + 1) + '/' + str(lastyear)
else:
lastmonth = 01
lastyear = int(lastyear)+1
datef = str(lastmonth)+'/'+ str(lastyear)
lastmonth = 0
datainicial = datetime.strptime(date, '%m/%Y')
datainicial = timezone.make_aware(datainicial, timezone.utc)
datafinal = datetime.strptime(datef, '%m/%Y')
datafinal = timezone.make_aware(datafinal, timezone.utc)
#print "lastmonth",lastmonth,"lastyear", lastyear
#print "datainicial:",datainicial,"datafinal: ",datafinal
filtro = Repository.objects.filter(modulo__iexact=mod)
count = filtro.filter(dt_diag__gte=datainicial, dt_diag__lt=datafinal).count()
conv = datetime.strptime(date, '%m/%Y')
ms = datetime_to_ms_str(conv)
monthsarray.append([ms, count])
#i += 1
lastmonth += 1
datastackdic['values'] = monthsarray
datastack.append(datastackdic)
#print datastack
if request.user.last_login is not None:
#print(request.user.last_login)
contador_novas = Repository.objects.filter(dt_diag__lte=today_end, dt_diag__gte=today_start).count()
return render_to_response('loggedin.html',
{'user': request.user.username, 'contador': contador_novas, 'data': data, 'data2': data2,
'data3': data3,
'databurst': databurst, 'datastack':datastack})
def datetime_to_ms_str(dt):
return str(1000 * mktime(dt.timetuple()))
I think the problem is with this condition.
while lastmonth <= int(month) or lastyear<int(year):
During December, month=12, so lastmonth <= int(month) will always be True. So the loop will keep running, even once lastyear is greater than the current year.
You want to loop if the loop is in the previous year, or if the loop is in the current year and the month is not in the future. Therefore, I think you want to change it to the following:
while lastyear < year or (lastyear == year and lastmonth <= month):
To be sure that the code is working, and to understand it, add plenty of print statements to the loops, see how lastmonth and lastyear change, and check that the loop exits when you expect it to. You also need to test it for other values of year and month so that it doesn't break next month. Ideally you want to extract this bit of the code into a separate function. The loop would be easier to understand if it only returned a list of (month, year) integers, instead of doing lots of date formatting at the same time. Then it would also be easier to add unit tests.
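A minimal sketch of that extraction (hypothetical function name), which makes the termination condition trivial to unit-test on its own:
def month_range(start_year, start_month, end_year, end_month):
    # yield (year, month) pairs from start up to and including end
    year, month = start_year, start_month
    while year < end_year or (year == end_year and month <= end_month):
        yield year, month
        month += 1
        if month > 12:
            month = 1
            year += 1
The view loop would then iterate over month_range(...) and do its date formatting per pair, and a test can simply assert that list(month_range(2016, 11, 2017, 2)) == [(2016, 11), (2016, 12), (2017, 1), (2017, 2)].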