How to extract tweets posted only from local people?

I am doing a sentiment analysis project about local people's attitudes toward the transportation service in Hong Kong. I used the Twitter API to collect the tweets. However, since my research target is local people in Hong Kong, tweets posted by, for instance, travelers should be removed. Could anyone give me some hints about how to extract tweets posted by local people from a large volume of Twitter data? My current idea is to construct a dictionary of travel-related words and use these words to filter the tweets, but I doubt that will work well.
Any hints and insights are welcome! Thank you!

There are three main ways you can do this.
Language. If the user is tweeting in Cantonese, or another local language, they are less likely to be a traveller than someone tweeting in, say, Russian.
User location. If a user has a location present in their profile, you can see if it is within Hong Kong.
User timezone. If the user's timezone is the same as HK's timezone, they may be a local.
All of these signals are very fuzzy; none of them proves that a user actually lives in Hong Kong.
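As a rough illustration, the three signals can be combined into a simple score. This is only a sketch: it assumes each tweet is a dict shaped like the classic Twitter REST JSON (`lang`, `user.location`, `user.time_zone`), and the language codes, place hints and time zone names below are guesses that you would need to check against your own data.

```python
# Heuristic sketch: one point per signal that suggests a Hong Kong local.
# All constants are assumptions, not values taken from the Twitter docs.
LOCAL_LANGS = {"zh", "zh-tw", "zh-hk"}
LOCAL_PLACE_HINTS = ("hong kong", "hongkong", "香港")
LOCAL_TIMEZONES = {"Hong Kong", "Asia/Hong_Kong"}


def local_score(tweet):
    """Return 0-3 depending on how many signals point to a local author."""
    user = tweet.get("user") or {}
    score = 0
    # Signal 1: language of the tweet.
    if (tweet.get("lang") or "").lower() in LOCAL_LANGS:
        score += 1
    # Signal 2: free-text location in the user's profile.
    location = (user.get("location") or "").lower()
    if any(hint in location for hint in LOCAL_PLACE_HINTS):
        score += 1
    # Signal 3: time zone set in the user's profile (may be missing).
    if user.get("time_zone") in LOCAL_TIMEZONES:
        score += 1
    return score


def likely_local(tweets, min_score=2):
    """Keep only tweets whose author matches at least min_score signals."""
    return [t for t in tweets if local_score(t) >= min_score]
```

Requiring two of the three signals is an arbitrary threshold; lowering `min_score` keeps more tweets at the cost of letting more travellers through.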

Related

Getting all articles related to a query without a bias

I am trying to build a corpus of documents related to earthquakes. I want to download all news articles related to that event. My problem is that using Google search (stackoverflow.com/questions/…) gives results biased toward what is relevant now. Instead I want all articles, irrespective of time or relevance.

The problem is that Google is trying to guess what is the most relevant search result for a user entering your query, and you are interested in all of them.
You would be better served by a newspaper article database than by Google in this case. If you are currently enrolled in a university, ask your library for this kind of resource. If you have access to such a database, you will be able to search for every article containing a given keyword, and some search forms will even let you filter by publisher, by date, by geographical location, etc...
Eureka.cc is an example of such a database.
Some newspapers' websites will give you access to their article archive. New York Times is one of those.
Here is a result searching in their article database for "earthquake".
More info about newspaper article databases

An Email Report Application for Outlook

The sales folk in my start-up send and receive a bunch of mail on a daily basis from vendors, dealers and customers.
But they often lose track of these mails, in particular whether they have responded or followed up. And they waste a lot of time figuring this out.
Expecting them to use a mail tool like MailChimp is even more painful, and a ticketing tool is not a good fit for the job.
Hence, I am trying to build an app that can create a report of the total Email IDs interacted with in a particular date range. The only goal is to create a report, in a csv file or to dump the data into Google spreadsheets.
The report for the period entered by the user would look as below:
Email ID - All emails lying in "sent items" AND "inbox" for particular date range
Name - If present
Status
The "Status" would be:
Received not responded by sales person
Sent but not responded by recipient
Responded by Sales Person
...and so on
I am wary of running the script directly on the mail server and am not sure if Outlook Exchange would allow something like this.
I would prefer if it could be an application that runs directly on the sales person's machine.
A few use Macs and the others Windows. I would be focussing on the Macs first.
The mail tool used is Outlook for Mac 2011 and the machines run either Lion or Snow Leopard.
Mail is on Outlook Exchange.
I must confess to not being much of a coder, but I blunder/Google my way through it.
I had some time on my hand with the holiday season coming up, hence thought of taking this project up.
I am moderately comfortable with Python.
But from what I have read, this project appears to be a job for AppleScript.
Before starting my blunderings, I wanted to seek the advice of the SO community:
Is AppleScript the best bet here? If yes, could you share the best resource to read up on it? I have a copy of "AppleScript The Comprehensive Guide to Scripting and Automation on Mac OS X", but it is almost 6 years old.
Could it be done somehow just using Python? - I wanted to dump the respective reports onto Google Spreadsheets, hence would be easier to get Python involved here.
Are there any similar applications that are already out there?
Or am i completely off track
Sorry for the ramble. But really Looking forward to some assistance on this

I liked:
AppleScript 1-2-3
and
AppleScript: The Definitive Guide
There are also some good tutorials here: MacScripter
That being said, you should consider the cost/benefit of learning AppleScript to accomplish one task at your company. You may be better off simply hiring someone to write the script for you and focus on growing your business instead.
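On the Python side of the question, the reporting step is independent of how the mail gets out of Outlook. A minimal sketch, assuming the messages have already been exported to a list of dicts with hypothetical keys `address`, `name`, `direction` ("sent" or "received") and `date`; none of this talks to Outlook or Exchange, and the status rules are one possible reading of the list in the question:

```python
import csv
import io


def build_report(messages, start, end):
    """Return (address, name, status) rows for messages dated in [start, end]."""
    latest = {}       # address -> most recent message in the range
    received = set()  # addresses that sent us at least one mail in the range
    names = {}
    for msg in sorted(messages, key=lambda m: m["date"]):
        if not (start <= msg["date"] <= end):
            continue
        addr = msg["address"].lower()
        latest[addr] = msg
        if msg.get("name"):
            names[addr] = msg["name"]
        if msg["direction"] == "received":
            received.add(addr)

    rows = []
    for addr, msg in sorted(latest.items()):
        if msg["direction"] == "received":
            status = "Received, not responded by sales person"
        elif addr in received:
            status = "Responded by sales person"
        else:
            status = "Sent but not responded by recipient"
        rows.append((addr, names.get(addr, ""), status))
    return rows


def to_csv(rows):
    """Render the report rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Email ID", "Name", "Status"])
    writer.writerows(rows)
    return buffer.getvalue()
```

The CSV text can be written to a file or imported into a Google spreadsheet by hand; automating that upload is a separate problem.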

Is there any API for Oneworld Alliance?

Does anyone know if there is an API for Oneworld Alliance ? The idea is to program in Python a Traveling Salesman, who visits all airports in the system based on actual available flight connections.

I don't think Oneworld Alliance themselves, or other airlines or alliances, have their own APIs. Not sure whether asking this is on-topic for SO.
Try the search engines and booking sites: Travelocity, Expedia, Hotwire, cheaptickets...
For example, a Google search turns up the Expedia Affiliate Network.
Kayak apparently used to have a beta API but it was pulled due to misuse.
Not sure how easy it is to scrape Oneworld's site or timetables, I wouldn't start there.
Remember the airlines have a negative incentive to allow their data to be scraped, whereas the search engines have a positive incentive (within reasonable limits). So start with the latter.
When you say "based on actual available flight connections", I presume you just check whether airline X has a route connecting city A to city B, rather than looking at actual seat inventory on specific dates and times, which seems needless. Do you need durations and frequencies?
Btw, there are 900 hits on SO on "Traveling Salesman", you might be able to reuse someone else's data.
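If all you need is "does a route exist between A and B", the data reduces to a graph of airports. A minimal sketch under that assumption, using a nearest-neighbour heuristic (not an optimal Traveling Salesman solution) that always flies to the closest unvisited airport by number of hops. The airport names and routes are placeholders, not real Oneworld data:

```python
from collections import deque

# Placeholder route map: airport -> airports reachable by a direct flight.
ROUTES = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A", "D"],
    "D": ["C"],
}


def nearest_unvisited(routes, start, visited):
    """Breadth-first search for the closest airport not yet visited.

    Returns the path from start to that airport, or None if every
    reachable airport has been visited already.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for nxt in routes.get(path[-1], ()):
            if nxt in seen:
                continue
            if nxt not in visited:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


def greedy_tour(routes, start):
    """Visit every reachable airport, revisiting hubs where necessary."""
    tour = [start]
    visited = {start}
    while True:
        path = nearest_unvisited(routes, tour[-1], visited)
        if path is None:
            return tour
        tour.extend(path[1:])
        visited.update(path[1:])
```

With durations or frequencies you would swap the hop count for a weighted shortest-path search, but the overall structure stays the same.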

Performing multiple searches of a term in Twitter

I have little working knowledge of python. I know that there is something called a Twitter search API, but I'm not really sure what I'm doing. I know what I need to do:
I need point data for a class. I thought I would just pull up a map of the world in a GIS application, select cities that have x population or larger, then export those selections to a new table. That table would have a key and city name.
Next I randomly select 100 of those cities. Then I perform a search of a certain term (in this case, Gaddafi) for each of those 100 cities. All I need to know is how many posts there were on a certain day (or over a few days, depending on how many tweets there were).
I just have a feeling there is something that already exists that does this, and I'm having a hard time finding it. I've downloaded and installed python-twitter but have no idea how to get this search done. Anyone know where I can find or how I can make this tool? Any suggestions would really help. Thanks!

A tweet itself can carry a geo tag, but it is a new feature and the majority of tweets do not have it. So it is not possible to search for all tweets containing "Gaddafi" from a city given only the city name.
What you could do is the reverse: search for "Gaddafi" first (regardless of geo location) using the search API. Then, for each tweet, find the location of the poster (either through the RESTful API or with some sort of web scraping).
So basically you can classify the collected tweets according to the location of the poster.
I think tweepy is the only library that has access to both the Twitter search API and the RESTful API.
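The classification step could look roughly like this. It is a sketch that assumes the search results are already in memory as dicts with a `user.location` free-text field (the shape of the classic Twitter JSON); how you fetch them is left out, and the substring matching on city names is deliberately crude:

```python
from collections import Counter


def count_by_city(tweets, cities):
    """Count tweets whose author's profile location mentions a city."""
    counts = Counter({city: 0 for city in cities})
    for tweet in tweets:
        # Profile location is free text and may be missing or None.
        location = ((tweet.get("user") or {}).get("location") or "").lower()
        for city in cities:
            if city.lower() in location:
                counts[city] += 1
                break  # count each tweet for at most one city
    return counts
```

Tweets whose authors leave the location blank, or write something that matches no city, are simply not counted.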

list comprehension multiplying itself, and it isn't checking according to conditionals

I am trying to fix my condition. If any forbidden keyword is found in string or string_2, the string should be skipped. If no forbidden keyword is found but a word from skills is found, it should be saved. However, the else part prints the result 10 times.
string = "opportunity: this opportunity would suit a budding hacker who is seeking a first step into a commercial role or a tester with 1-3 years of experience. this is a great opportunity to utilise your experience in penetration testing, vulnerability assessments and delivering outcomes while also expanding your knowledge and skillset. benefits: perform red team engagements excellent training & development budget attendance at local and international conferences responsibilities include: working with a diverse range of customers identify and solve security problems perform penetration testing and vulnerability assessments maintain and improve penetration testing and methodologies delivery of technical reports and documentation ideally you will have: ideally current security clearance or minimum australian citizenship certifications such as oscp, sans, crest highly regarded fluent with linux command line and windows powershell experience performing assessments on client networks ability to clearly communicate vulnerability details and risks for a confidential discussion about this opportunity or to discuss other opportunities within it security & risk please contact specialist infosec recruiter john smith on 0123 456 789 or email johnsmith#example.com. australian citizens only – ideally already with a security clearance. want to know more about me? connect with me on linkedin"
string_2 = "your new company this melbourne based consultancy boasts a unique depth and breadth of capabilities across cyber security, application security, data & analytics, cloud and digital transformations. they continue to deliver rich insight, innovative strategies and solutions that help their clients reach their potential. about the opportunity this is an outstanding opportunity to utilise your experience in penetration testing and vulnerability assessments. you will use your skills to prepare high quality reports detailing security issues, making recommendations and identifying solutions. the types of testing can include vulnerability assessment, penetration testing and application security assessment. what you’ll need to succeed passion, drive and enthusiasm! demonstrated experience performing internal and external penetration testing, web application penetration testing and mobile application penetration testing industry certifications such as sans, oscp, crest crt/cct or osce strong knowledge of common vulnerabilities such as owasp top 10 and sans top 25 scripting experience - javascript, objective c and python a very strong technical background and a passion for security the ability to think outside the box what you'll get in return our client is looking for an individual that is seeking longevity in their next role and in return offers the chance to join an equal opportunity employer that is passionate about diversity. also on offer is ongoing personal and professional development, providing you with the right tools and support to thrive. what you need to do now if you’re interested in this role, click ‘apply now’ or for more information and a confidential discussion on this role or any others within it security contact john smith at johnsmith#example.com"
forbidden = ['clearance','TS/SCI','4+ years','5+ years','6+ years','7+ years','8+ years','9+ years','10+ years','11+ years','12+ years']
skills = ['owasp']
for s_prefix in forbidden:
    if s_prefix in string:
        print(s_prefix)
    else:
        print("save it")
skill_match = [s_prefix for s_prefix in forbidden if s_prefix in string]
print(skill_match)
if len(skill_match) > 0 :
    pass
I get "save it" printed multiple times. Once clearance is found, the string should be marked as flagged; if no red-flagged keyword is found but a keyword from skills is, it should be saved:
clearance
save it
save it
save it
save it
save it
save it
save it
save it
save it
save it
['clearance']
[Finished in 0.0s]
sample:
string = "snip active cleared snip..." # skip or remove because contains cleared
string2 = "snip owasp..... php , devops" # save it because contains owasp

If you only want one line to be printed in your for loop, you probably need to change its logic, since currently it always prints something on every iteration.
One approach might be to print and break out of the loop if you spot one of the strings you're searching for, and to attach the else clause to the for loop instead of to the if. An else on a loop gets run only if the loop ended normally, not if it was escaped early by a break:
for s_prefix in forbidden:
    if s_prefix in string:
        print(s_prefix)
        break
else:
    print("save it")
If you don't need to print the matching prefix string, you could also play around with any or all.
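A sketch of the `any` variant, which also covers the skills check from the question. The function name `classify` and the three return labels are made up for illustration, and lowercasing both sides is an added assumption so that entries like 'TS/SCI' can match lowercase text:

```python
def classify(text, forbidden, skills):
    """Return "skip", "save" or "ignore" for one job description."""
    text = text.lower()
    # A single forbidden keyword is enough to reject the text.
    if any(word.lower() in text for word in forbidden):
        return "skip"
    # Otherwise keep it only if it mentions at least one wanted skill.
    if any(word.lower() in text for word in skills):
        return "save"
    return "ignore"


forbidden = ['clearance', 'TS/SCI', '5+ years']
skills = ['owasp']

print(classify("ideally current security clearance, owasp", forbidden, skills))
print(classify("strong knowledge of owasp top 10, php, devops", forbidden, skills))
```

Because `any` stops at the first match, each text produces exactly one result instead of one line per forbidden keyword.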
