I am constructing a Twitter database and would like to record some demographic data based on profile pictures. I decided to do this using Microsoft's Face API, but I am new to Python and coding in general. Is there a way to:
Extract and save the age and gender data into the same or a separate data frame.
Construct a list of only unique Twitter handles, based on my data, so that I don't waste my daily call limit on Azure.
Append the data so that I have the age and gender of each person stored next to their unique Twitter handle.
The twitter data is stored in the format:
'timestamp', 'id', 'text', 'user', 'replies', 'retweets', 'likes'
I am also open to other suggestions.
P.S.
Is the Python demo for Face broken? It won't run with the default code.
Your question is broad; you should keep your questions focused on a particular problem. Let me answer part one:
You should use the SDK:
https://pypi.org/project/azure-cognitiveservices-vision-face/
And you have some samples here:
https://github.com/Azure-Samples/cognitive-services-python-sdk-samples/blob/master/samples/vision/face_samples.py
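As a very rough sketch of what a call through that SDK could look like for age and gender (the key, endpoint and image URL below are placeholders, not values from your setup):

# A minimal sketch, assuming you already have a Face resource in Azure;
# FACE_KEY, FACE_ENDPOINT and the image URL are placeholders.
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials

FACE_KEY = "your-face-api-key"                                        # placeholder
FACE_ENDPOINT = "https://your-resource.cognitiveservices.azure.com"   # placeholder

face_client = FaceClient(FACE_ENDPOINT, CognitiveServicesCredentials(FACE_KEY))

profile_image_url = "https://example.com/profile.jpg"  # a Twitter profile picture URL

# Ask the service to return only the attributes you care about.
faces = face_client.face.detect_with_url(
    url=profile_image_url,
    return_face_attributes=["age", "gender"],
)

for face in faces:
    print(face.face_attributes.age, face.face_attributes.gender)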
After that, storing the results and building a DB is a different question, which I humbly suggest you ask separately once you have the data and specific questions about it.
I am making a chatbot in RASA which helps high school graduates find a university according to their desired location. I have all my data stored in a CSV file. Is there any way to extract specific data from that CSV?
Example: if a user asks to show universities available in a certain location, how do I extract the relevant data from the CSV, i.e. the names of the universities in the location given by the user?
It seems like you will need to train the model with a location entity. Create a story that links an intent carrying the location entity to a custom action.
A sample story could look something like this:
## story 1
* ask_university{"location": "New York"}
  - action_get_universities
In the custom action action_get_universities, you will then need to handle the CSV query based on the location entity that the model detected. Pandas should work just fine.
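A minimal sketch of such a custom action, assuming a universities.csv with "name" and "location" columns (both the file name and the column names are assumptions; depending on your Rasa version the import may be rasa_core_sdk instead of rasa_sdk):

# Sketch of a custom action that filters a CSV by the detected location entity.
import pandas as pd
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher


class ActionGetUniversities(Action):
    def name(self):
        return "action_get_universities"

    def run(self, dispatcher: CollectingDispatcher, tracker: Tracker, domain):
        location = tracker.get_slot("location")
        df = pd.read_csv("universities.csv")  # assumed columns: name, location
        matches = df[df["location"].str.lower() == str(location).lower()]
        if matches.empty:
            dispatcher.utter_message("Sorry, I found no universities in {}.".format(location))
        else:
            dispatcher.utter_message("Universities in {}: {}".format(
                location, ", ".join(matches["name"].tolist())))
        return []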
Have fun exploring!
I'm trying to analyze (for business intelligence purposes) some Google Analytics data in Python.
All I get after many tutorials is "aggregated" data, like the number of views in a day. What I need instead is something capable of tracking the behavior of a single user: which pages of the website he visited, his bounce rate, whether he used the e-commerce, and so on.
I have seen many CSVs already prepared for this kind of analysis, but I'm starting from scratch with my own website.
You can use the User-ID feature: when you send Analytics an ID and related data from multiple sessions, your reports tell a more unified, holistic story about a user's relationship with your business:
https://support.google.com/analytics/answer/3123662?hl=en
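If your backend is in Python, a very rough sketch of sending a hit tagged with your own User ID through the Measurement Protocol could look like this (the tracking ID, client ID, user ID and page path are all placeholders):

# Sketch using the Universal Analytics Measurement Protocol; all values are examples.
import requests

payload = {
    "v": "1",                # protocol version
    "tid": "UA-XXXXXXX-Y",   # your tracking ID (placeholder)
    "cid": "555",            # anonymous client ID
    "uid": "user_12345",     # your own signed-in user ID
    "t": "pageview",         # hit type
    "dp": "/some-page",      # page path
}
requests.post("https://www.google-analytics.com/collect", data=payload)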
Otherwise, you can examine individual-user behavior at the session level in the User Explorer report. The User Explorer report lets you isolate and examine individual rather than aggregate user behavior. Individual user behavior is associated with either the Client ID or the User ID.
https://support.google.com/analytics/answer/6339208?hl=en
I have recently started developing an application to analyse my all-time exercises in the Polar platform.
I'm using their Accesslink API to get new sessions and I have exported my old sessions through another service they offer.
The exported sessions come with fully detailed information (instant GPS location, speed, heart rate), but the JSON data provided by the API is just a summary. I am looking for a way to get the initial position (GPS location) of each session so that I can later find the city name from another source. I think the only way to do this is by getting the GPS info of my sessions.
Although the sessions have a has-route field, I cannot find in their documentation a way to request this route. They have provided a working example, but it does not provide a way to get this data.
Does anyone know if this is possible and, if so, could you please give me some directions?
Thanks in advance.
It turns out that the GPS information is provided through GPX files, which are available from the API mentioned in the question. There is a method implemented on their GitHub (link also in the question) which already performs this task. I have added a call to this method and saved its output in my project.
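For reference, once a GPX file has been downloaded, a small sketch like the following (using the third-party gpxpy package, not part of the Accesslink API; the file name is a placeholder) can read the first recorded position:

# Read the first trackpoint of an exported session with gpxpy (pip install gpxpy).
import gpxpy

with open("session.gpx") as f:
    gpx = gpxpy.parse(f)

first_point = gpx.tracks[0].segments[0].points[0]
print(first_point.latitude, first_point.longitude)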
I am facing a couple of issues in figuring out what is what; in spite of the humongous documentation, I am unable to figure out the following:
1. Which report type should be used to get the campaign-level totals? I am trying to get the data with the headers:
campaign_id | campaign_name | Clicks | Impressions | Cost | Conversions
2. I have tried to use "CAMPAIGN_PERFORMANCE_REPORT", but I get information broken up at the keyword level, whereas I am trying to pull the data at the campaign level.
3. I also need to push the data to a database. In the API documentation, I only find samples which either print the results on my screen or create a file on my machine. Is there a way I can get the data in JSON so I can push it to the database?
4. I have 7 accounts under my MCC account as of now, and the number will increase in the coming days. I don't want to manually hard-code the client customer IDs into my code, as new accounts will keep being created. Is there a way I can get the list of client customer IDs which are under my MCC account?
I am trying to get this data using Python as my code base and AdWords API v201710.
To retrieve campaign performance data you need to run a CAMPAIGN_PERFORMANCE_REPORT. Follow this link to view all available columns for the Campaign Performance report.
The campaign performance report does not include stats aggregated at a keyword level. Are you using AWQL to pull your report?
Can you paste your code here? I find it odd that you are getting keyword-level data.
Run this Python example code to get campaign data (you should definitely not be getting keyword-level data with that example code).
Firstly, the Google AdWords API only returns report data in the following file formats: CSVFOREXCEL, CSV, TSV, XML, GZIPPED_CSV, GZIPPED_XML. Unfortunately, JSON is not supported for your use case. I would recommend GZIPPED_CSV and setting the following properties to true:
skipReportHeader
skipColumnHeader
skipReportSummary
This will simply skip all headers, report titles & totals from the report, making it very simple to upsert the data into a table.
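As a rough sketch with the googleads Python client library (LoadFromStorage assumes a googleads.yaml with your credentials, and the date range and output file name are just examples):

# Download a campaign-level report via AWQL with all headers/totals skipped.
from googleads import adwords

client = adwords.AdWordsClient.LoadFromStorage()
report_downloader = client.GetReportDownloader(version='v201710')

report_query = ('SELECT CampaignId, CampaignName, Clicks, Impressions, Cost, Conversions '
                'FROM CAMPAIGN_PERFORMANCE_REPORT '
                'DURING LAST_30_DAYS')

with open('campaign_performance.csv', 'w') as output_file:
    report_downloader.DownloadReportWithAwql(
        report_query, 'CSV', output_file,
        skip_report_header=True,
        skip_column_header=True,
        skip_report_summary=True)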
It is not possible to enter an MCC ID and expect the API to fetch a report for all client accounts. Each API report request contains the client ID, so you are required to create an array of all client IDs and then iterate through each ID. If you are using the client library (recommended) then you can simply set the client customer ID within the session, i.e. session.setClientCustomerId("xxx");
To automate this, use the ManagedCustomerService to retrieve all client IDs and then iterate through them, so that you do not need to hard-code each client ID. Google has created a handy Python file which returns the account hierarchy including the child account IDs (click here).
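A minimal sketch of that idea with the googleads library (paging and error handling omitted):

# Pull the child account IDs under the MCC, then switch accounts one by one.
from googleads import adwords

client = adwords.AdWordsClient.LoadFromStorage()
managed_customer_service = client.GetService('ManagedCustomerService', version='v201710')

selector = {'fields': ['CustomerId', 'Name']}
page = managed_customer_service.get(selector)

client_customer_ids = [entry.customerId for entry in page.entries]

for customer_id in client_customer_ids:
    client.SetClientCustomerId(customer_id)
    # ...download the campaign performance report for this account here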
Lastly, based on your question I assume you are attempting to run an ETL process. Google has an open-source AdWords extractor which I highly recommend.
I'm trying to store user data for a website I'm making in Python. Which is more efficient:
-Storing all the user data in one huge table
-Storing all the user data in several tables, one per user, in one database.
-Storing each user's data in an XML or JSON file, one file per user. Each file has a unique name based on the user ID.
Also, which is safer? I'm biased towards storing user data in JSON files because that is something I already know how to do.
Any advice? I'd post some code I already have, but this is more theoretical than code-based.
I don't think efficiency should be part of your calculus.
I don't like either of your proposed designs.
One table? That's not normalized. I don't know what data you're talking about, but you should know about normalization.
Multiple copies? That's not scalable. Every time you add a user you add a table? Sounds like the perfect way to ensure that your user population will be small.
Is all the data JSON? Document based? Maybe you should consider a NoSQL document based solution like MongoDB.
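If you go that route, a tiny illustration with pymongo (database, collection and field names are made up for the example):

# One collection holds all users; each user is a JSON-like document.
from pymongo import MongoClient

client = MongoClient()          # connects to localhost:27017 by default
users = client.mysite.users    # assumed database "mysite", collection "users"

users.insert_one({"user_id": 42, "name": "Alice", "prefs": {"theme": "dark"}})
print(users.find_one({"user_id": 42}))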