I am pulling information from a web site (in this case ip/location etc) using python 3
import urllib.request
data = urllib.request.urlopen('http://www.maxmind.com/app/locate_my_ip')
for search in data:
    if b'align="center">' in search:
        print(next(data).decode().rstrip())
data.close()
How can I remove blank lines / put information into tuples / save as variables etc. I want to be able to start using the data gathered.
If you're doing HTML scraping/parsing, use a library like BeautifulSoup.
It sure beats manually handling scraping.
As mentioned by @jordanm, the best option is to use the GeoIP Python API for this.
But to answer your question - your code should probably look more like this:
import urllib.request, pprint
data = urllib.request.urlopen('http://www.maxmind.com/app/locate_my_ip')
fields = []
for line in data:
    if b'class=output' in line:
        fields.append(next(data).decode('iso-8859-1').strip())
data.close()
Note that I have changed the test string, and blank lines have been included. This is to ensure that the fields can be easily identified by index.
To access the field values, you can do:
address = fields[0]
isp = fields[8]
domain = fields[-1]
If you want to remove specific fields:
del fields[3], fields[4], fields[6]
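To cover the other parts of the question (dropping blank lines and packing the values into a tuple), here is a minimal sketch. The sample list is made-up placeholder data standing in for whatever the loop above collects, and clean_fields is a hypothetical helper name.

```python
# Hypothetical sample standing in for the scraped `fields` list.
fields = ["203.0.113.7", "", "US", "", "Example ISP", "example.net"]


def clean_fields(values):
    """Return a tuple of the non-blank values, in their original order."""
    return tuple(value for value in values if value.strip())


record = clean_fields(fields)
address, country, isp, domain = record  # unpack into named variables
print(record)
```

Dropping the blank entries shifts the positions, so indexes such as fields[8] only apply to the unfiltered list. Likewise, del fields[3], fields[4], fields[6] deletes from left to right, so each later index refers to the already shortened list.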
As the title mentions, my issue is that I don't understand quite how to extract the data I need for my table (The columns for the table I need are Date, Time, Courtroom, File Number, Defendant Name, Attorney, Bond, Charge, etc.)
I think regex is what I need but my class did not go over this, so I am confused on how to parse in order to extract and output the correct data into an organized table...
I am supposed to turn my text file from this
https://pastebin.com/ZM8EPu0p
and export it into a more readable format like this- example output is below
Here is what I have so far.
def readFile(court):
    csv_rows = []
    # read and split txt file into pages & chunks of data by paragraph
    with open(court, "r") as file:
        data_chunks = file.read().split("\n\n")
        for chunk in data_chunks:
            chunk = chunk.strip  # .strip removes useless spaces
            if str(data_chunks[:4]).isnumeric():  # if first 4 characters are digits
                entry = None  # initialize an empty dictionary
            elif (
                str(data_chunks).isspace() and entry
            ):  # if we're on an empty line and the entry dict is not empty
                csv_rows.DictWriter(dialect="excel")  # turn csv_rows into needed output
                entry = {}
            else:
                # parse here?
                print(data_chunks)
    return csv_rows


readFile("/Users/mia/Desktop/School/programming/court.txt")
That is quite a lot of work, but it is manageable if you split it into a few sub-tasks.
First, your input looks like a text file, so you could parse it line by line using readlines(): https://www.w3schools.com/python/ref_file_readlines.asp
Then, I noticed that your data can be split into pages. You will need several regular expressions, but you can start with one that identifies where each page starts. You may want to read this first, as your expressions might get quite complicated: https://www.w3schools.com/python/python_regex.asp
The goal of this step is to collect all lines from a page in some container (a list, a dict, whatever you find suitable).
Afterwards, write some code that parses the information page by page. For simplicity, I suggest starting with something easy, like the columns for "no", "file number" and "defendant".
Once you are getting data reliably, you can address the export part using pandas: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
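The page-collecting step could be sketched roughly as follows. This is only an illustration under stated assumptions: the "PAGE n" header pattern and the sample text are made up, since the real file may mark pages differently, and the standard csv module stands in for the pandas export to keep the example self-contained.

```python
import csv
import io
import re

# Assumption: each page starts with a header line such as "PAGE 1".
# The real file may mark pages differently; adjust the pattern to match.
PAGE_START = re.compile(r"^\s*PAGE\s+\d+", re.IGNORECASE)


def split_pages(lines):
    """Group lines into pages; a new page begins at every header line."""
    pages = []
    for line in lines:
        if PAGE_START.match(line) or not pages:
            pages.append([])
        pages[-1].append(line.rstrip("\n"))
    return pages


# Made-up sample text, not taken from the real court file.
sample = io.StringIO("PAGE 1\nfirst entry\nsecond entry\nPAGE 2\nthird entry\n")
pages = split_pages(sample)

# Write one CSV row per page as a stand-in for the real export step.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["page", "line_count"])
for number, page in enumerate(pages, start=1):
    writer.writerow([number, len(page)])
print(out.getvalue())
```

With the pages collected this way, the per-column regular expressions can then be developed and tested against one page at a time.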
I am trying to create an Instagram bot using python.
So my problem is that I have created a text file that will contain all the usernames of the people my bot follows and the text appears as follows.
These are the lines of code that I have used to append the file.
followers_list contains the list of all the users.
with open("file.txt", 'a') as file:
file.write(str(followers_list))
This is how the usernames are entered into the file.
["user1"]["user2"]["user3"]
Now I want to make a function that unfollows all the users present in the list. So I am going to need the username from these lists and I have been trying to find information on how to do that but I have not found anything useful. So I need suggestions on how to do that.
First of all, I would suggest changing file.write(str(followers_list)) to file.write(",".join(followers_list)), so that the file holds the usernames themselves separated by commas rather than the string form of the list. Once that is done, you can read the file with with open("file.txt", 'r') as f: lines = f.read() and then write the loop you need: for username in lines.split(",").
This is quick, untested code and may need some debugging. If you can edit the question and add some examples, we will be able to help you more. An example of the followers_list variable should be enough; feel free to use fake data.
Note: Also instead of commas, using a json format would be nice too.
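A minimal sketch of the JSON variant, assuming followers_list is a plain list of strings. The usernames are placeholders, and a temporary path is used only so the example does not touch a real file.

```python
import json
import os
import tempfile

followers_list = ["user1", "user2", "user3"]  # placeholder usernames

# A temporary path is used here so the sketch does not touch a real file.
path = os.path.join(tempfile.mkdtemp(), "followers.json")

# Write the whole list as a single JSON array (overwrite, not append).
with open(path, "w") as f:
    json.dump(followers_list, f)

# Read it back as a normal Python list.
with open(path) as f:
    usernames = json.load(f)

for username in usernames:
    print(username)
```

A JSON array cannot be extended by appending to the file, so to add users you would load the list, extend it, and write the whole list back.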
It is not entirely clear what the problem is here: whether it is your write function or your read function that you are trying to fix. Assuming the problem is the write function, something like this should give you the results you want.
with open("file.txt", 'a') as file:
    for follower in followers_list:
        # assuming follower is a string, so it doesn't need to be converted;
        # the newline keeps each username on its own line
        file.write(follower + "\n")
Otherwise, if you need to pick the username out of each list, just use indexing when you are reading your follower_lists,
e.g.
for follower_list in follower_lists:
    follower = follower_list[0]
Right now you don't have a completely valid data structure in python, so modules like json and ast are going to be tricky. If you are regex-inclined, you could try the following:
import re
userstr = '["user1"]["user2"]["user3"]'
# capture everything except " in the group
re.findall(r'\["([^"]+)"\]', userstr)
['user1', 'user2', 'user3']
This also works if there is a newline between user entries:
userstr = '''["user1"]["user2"]
["user3"]
'''
re.findall(r'\["([^"]+)"\]', userstr)
['user1', 'user2', 'user3']
Otherwise, I'd agree with @MarkMeyer and try to get these users into a JSON file or some other format that is more compatible with built-in Python data structures. One suggestion to make life easy would be to format users.txt like so:
user1
user2
user3
...
Then you can just do:
with open('users.txt') as fh:
    # this will create a list of users, and strip()
    # removes leading/trailing whitespace
    users = [user.strip() for user in fh]
And adding users is as simple as
with open('users.txt', 'a') as fh:
    # the trailing newline keeps one user per line
    fh.write('userN\n')
I'm new to Python and having some trouble with an API scraping I'm attempting. What I want to do is pull a list of book titles using this code:
r = requests.get('https://api.dp.la/v2/items?q=magic+AND+wizard&api_key=09a0efa145eaa3c80f6acf7c3b14b588')
data = json.loads(r.text)
for doc in data["docs"]:
    for title in doc["sourceResource"]["title"]:
        print (title)
Which works to pull the titles, but most (not all) titles are outputting as one character per line. I've tried adding .splitlines() but this doesn't fix the problem. Any advice would be appreciated!
The problem is that you have two types of title in the response, some are plain strings "Germain the wizard" and some others are arrays of string ['Joe Strong, the boy wizard : or, The mysteries of magic exposed /']. It seems like in this particular case, all lists have length one, but I guess that will not always be the case. To illustrate what you might need to do I added a join here instead of just taking title[0].
import requests
import json
r = requests.get('https://api.dp.la/v2/items?q=magic+AND+wizard&api_key=09a0efa145eaa3c80f6acf7c3b14b588')
data = json.loads(r.text)
for doc in data["docs"]:
    title = doc["sourceResource"]["title"]
    if isinstance(title, list):
        print(" ".join(title))
    else:
        print(title)
In my opinion that should never happen, an API should return predictable types, otherwise it looks messy on the users' side.
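If you need the same handling in several places, the type check can be wrapped in a small helper. This is a sketch only: title_to_text is a hypothetical name, and the documents below are made-up data that mimic the two shapes described above so the example runs without calling the API.

```python
def title_to_text(title):
    """Return a title as one string whether it arrives as a str or a list of str."""
    if isinstance(title, list):
        return " ".join(title)
    return title


# Made-up documents that mimic the two shapes described above.
docs = [
    {"sourceResource": {"title": "Germain the wizard"}},
    {"sourceResource": {"title": ["Joe Strong, the boy wizard", "a second part"]}},
]
titles = [title_to_text(doc["sourceResource"]["title"]) for doc in docs]
print(titles)
```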
I'm trying to get the URL from the response data, but something isn't right. The program stops on line 4. The code is below.
My code:
import requests
gifs = str(requests.get("https://api.giphy.com/v1/gifs/random?api_key=APIKEY"))
dump = json.dumps(gifs)
json.loads(dump['data']['url'])
Your description is not clear enough. Do you want to read a JSON response and select a field from it?
I recommend checking the JSON response section of the requests quickstart guide. I suspect you want to parse the response as JSON and extract some fields from it.
Something like this might help:
r = requests.get('http://whatever.com')
url = r.json()['url']
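For the nested case in the question, the idea is to parse the body first and index the resulting dict afterwards. A minimal sketch, assuming the response has a top-level "data" object with a "url" key, as the question's indexing suggests; the body below is a made-up stand-in so the example runs without an API key.

```python
import json

# Hypothetical response body; the real payload has more fields.
body = '{"data": {"url": "https://example.com/gifs/abc123"}}'

payload = json.loads(body)        # with requests, r.json() does this step
gif_url = payload["data"]["url"]  # index the parsed dict, not the raw text
print(gif_url)
```

Wrapping the response in str() or passing it through json.dumps() first gives you a string, and a string cannot be indexed with ['data'].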
I'm trying to get IP location and other stuff from ipinfodb.com, but I'm stuck.
I want to split all of the values into new strings that I can format how I want later. What I wrote so far is:
resp = urllib2.urlopen('http://api.ipinfodb.com/v3/ip-city/?key=mykey&ip=someip').read()
out = resp.replace(";", " ")
print out
Before I replaced the string into new one the output was:
OK;;someip;somecountry;somecountrycode;somecity;somecity;-;42.1975;23.3342;+05:00
So I made it show only
OK someip somecountry somecountrycode somecity somecity - 42.1975;23.3342 +05:00
But this isn't very useful, because I want the values in separate strings rather than in one. Right now I print out and get the whole line; I want to be able to do print country, print city and so on, and get the country, the city, etc. I checked their site and there is a class for this, but it's for a different API version (v2, mine is v3), so I can't use it. Does anyone have an idea how to do this?
PS. Sorry if the answer is obvious or I'm mistaken, I'm new with Python :s
You need to split the resp text by ;:
out = resp.split(';')
Now out is a list of values; use indexes to access the individual items:
print 'Country: {}'.format(out[3])
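If you would rather use names than positions, the split values can be paired with a list of field names. This is a sketch in Python 3 syntax; the field names are assumptions inferred from the sample output in the question, so check them against the API documentation before relying on them.

```python
# Sample response copied from the question.
resp = "OK;;someip;somecountry;somecountrycode;somecity;somecity;-;42.1975;23.3342;+05:00"

# Assumed field names, in order; check them against the API documentation.
FIELD_NAMES = [
    "status", "message", "ip", "country", "country_code", "region",
    "city", "zip_code", "latitude", "longitude", "time_zone",
]

# Pair each name with the value at the same position.
info = dict(zip(FIELD_NAMES, resp.split(";")))
print("Country: {}".format(info["country"]))
print("City: {}".format(info["city"]))
```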
Alternatively, add format=json to your query string and receive a JSON response from that API:
import json
resp = urllib2.urlopen('http://api.ipinfodb.com/v3/ip-city/?format=json&key=mykey&ip=someip')
data = json.load(resp)
print data['countryName']