I am trying to split a mix of str and unicode text in Python. The split has to be made on the ResultSet object retrieved from a website. Using the code below, I am able to get the details, which are user details:
from bs4 import BeautifulSoup
import urllib2
import re
url = "http://www.mouthshut.com/vinay_beriwal"
profile_user = urllib2.urlopen(url)
profile_soup = BeautifulSoup(profile_user.read())
usr_dtls = profile_soup.find("div",id=re.compile("_divAboutMe")).find_all('p')
for dt in usr_dtls:
    usr_dtls = " ".join(dt.text.split())
    print(usr_dtls)
The output is as below:
i love yellow..
Name: Vinay Beriwal
Age: 39 years
Hometown: New Delhi, India
Country: India
Member since: Feb 11, 2016
What I need is to create five distinct variables (Name, Age, Hometown, Country, Member since) and store in each one the corresponding value that follows the ':'.
Thanks
You can use a dictionary to store name-value pairs. For example:
my_dict = {"Name":"Vinay","Age":21}
In my_dict, Name and Age are the keys of the dictionary; you can access the values like this:
print (my_dict["Name"]) #This will print Vinay
Also, it's better to use complete words for variable names.
results = profile_soup.find("div",id=re.compile("_divAboutMe")).find_all('p')
user_data = {}  # dictionary initialization
for result in results:
    result = " ".join(result.text.split())
    try:
        # split on the first ':' only, so values that contain ':' stay intact
        var, value = result.split(':', 1)
        user_data[var.strip()] = value.strip()
    except ValueError:
        # lines without a ':' (such as "i love yellow..") are skipped
        pass
# If you print the user_data now
print (user_data)
'''
This is what it'll print
{'Age': '39 years', 'Country': 'India', 'Hometown': 'New Delhi, India', 'Name': 'Vinay Beriwal', 'Member since': 'Feb 11, 2016'}
'''
You can use a dictionary to store your data:
my_dict = {}
for dt in usr_dtls:
    item = " ".join(dt.text.split())
    if ':' in item:
        # split on the first ':' only, so values that contain ':' stay intact
        k, v = item.split(':', 1)
        my_dict[k.strip()] = v.strip()
Note: You should not reassign usr_dtls inside your for loop, because that overwrites your original usr_dtls.
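The approach in the answers above can be tried without fetching the page. Below is a minimal sketch that runs on the text lines shown in the question's output; the helper name parse_key_value_lines and the sample_lines list are assumptions made for illustration, not part of the original code. It uses str.partition, which splits on the first ':' only.

```python
def parse_key_value_lines(lines):
    """Build a dict from 'Key: value' strings, skipping lines without a colon."""
    parsed = {}
    for line in lines:
        # Collapse runs of whitespace, as the answers above do.
        text = " ".join(line.split())
        # partition splits on the first ':' only; sep is '' when no ':' is found.
        key, sep, value = text.partition(":")
        if sep:
            parsed[key.strip()] = value.strip()
    return parsed


# Sample lines copied from the output shown in the question.
sample_lines = [
    "i love yellow..",
    "Name:   Vinay Beriwal",
    "Age: 39 years",
    "Hometown: New Delhi, India",
    "Country: India",
    "Member since: Feb 11, 2016",
]
user_data = parse_key_value_lines(sample_lines)
print(user_data["Name"])          # Vinay Beriwal
print(user_data["Member since"])  # Feb 11, 2016
```

With the scraped page, the same helper could be fed the text of each p element instead of sample_lines.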
I've extracted the data from an API response and created a function that builds a dictionary:
def data_from_api(a):
    dictionary = dict(
        data = a['number'],
        created_by = a['opened_by'],
        assigned_to = a['assigned'],
        closed_by = a['closed'],
    )
    return dictionary
and then collected the records to load into a DataFrame (around 1k records):
raw_data = []
for k in data['resultsData']:
    records = data_from_api(k)
    raw_data.append(records)
I would like to create a function that extracts the nested display_value field from those columns of the DataFrame. I need only the names, like John Snow.
How can I write a function that extracts the display values for those fields? I've tried something like:
df = pd.DataFrame.from_records(raw_data)
def get_nested_fields(nested):
    if isinstance(nested, dict):
        return nested['display_value']
    else:
        return ''
df['created_by'] = df['opened_by'].apply(get_nested_fields)
df['assigned_to'] = df['assigned'].apply(get_nested_fields)
df['closed_by'] = df['closed'].apply(get_nested_fields)
but I'm getting an error:
KeyError: 'created_by'
Could you please help me?
You can use the .str accessor with get(), as below. If the key isn't there, it writes None.
df = pd.DataFrame({'data':[1234, 5678, 5656], 'created_by':[{'display_value':'John Snow', 'link':'a.com'}, {'display_value':'John Dow'}, {'my_value':'Jane Doe'}]})
df['author'] = df['created_by'].str.get('display_value')
Output:
   data                                       created_by     author
0  1234  {'display_value': 'John Snow', 'link': 'a.com'}  John Snow
1  5678                    {'display_value': 'John Dow'}   John Dow
2  5656                         {'my_value': 'Jane Doe'}       None
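In the question's code, the DataFrame built from data_from_api has columns named created_by, assigned_to and closed_by, so a lookup such as df['opened_by'] cannot succeed. One alternative, sketched below in plain Python, is to pull out display_value while each record is built, before the DataFrame is created. The display_value helper and the sample payload are assumptions made for illustration; the real API response may be shaped differently.

```python
def display_value(field):
    """Return field['display_value'] if field is a dict that has it, else ''."""
    if isinstance(field, dict):
        return field.get("display_value", "")
    return ""


def data_from_api(record):
    # Same keys as the question's function, but already flattened to names.
    return {
        "data": record["number"],
        "created_by": display_value(record.get("opened_by")),
        "assigned_to": display_value(record.get("assigned")),
        "closed_by": display_value(record.get("closed")),
    }


# Hypothetical API payload, made up for illustration.
sample = {
    "resultsData": [
        {
            "number": 1234,
            "opened_by": {"display_value": "John Snow", "link": "a.com"},
            "assigned": {"display_value": "John Dow"},
            "closed": "",
        },
    ]
}
raw_data = [data_from_api(k) for k in sample["resultsData"]]
print(raw_data[0]["created_by"])  # John Snow
```

The resulting raw_data list can then be passed to pd.DataFrame.from_records as in the question, with no further post-processing of the columns.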
I have an input file I am using for a python script.
Example of a file is here:
Name: Joe
Surname: Doe
Country: DE
Gender:
Can anybody suggest how to parse the file and make sure that all the required info is supplied?
I am trying to avoid if/else statements and implement this in a more efficient way.
Here is what I do, but I am sure there is a better way.
for line in file_content:
    if re.match(r'Name\d+:\s+(\w+)', line, re.IGNORECASE):
        file_validation['name'] = True
    elif re.match(r'Surname:\s+(\w+)', line, re.IGNORECASE):
        file_validation['surname'] = True
    ...
Any suggestions?
Something like this:
>>> re.match(r'^(.+)\s*:\s*(.*)$', 'Surname: Doe').groups()
('Surname', 'Doe')
First, parse the file with a regex and construct a dict from it. The regex we'll be using is:
^(\w+):\s+(\w+)$
This only matches lines that have both a key and a value, so it will not match Gender: since its value is empty.
Now we just have to construct the corresponding dictionary:
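import re
# File contents
content = '''Name: Joe
Surname: Doe
Country: DE
Gender:
'''
# re.M makes ^ and $ match at every line, so each "key: value" line is matched on its own
data = {k:v for k, v in re.findall(r'^(\w+):\s+(\w+)$', content, re.M)}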
Now if you look at data, it should look like-
>>> data
{'Name': 'Joe', 'Surname': 'Doe', 'Country': 'DE'}
Now all you have to do is verify that all the required fields exist in data.keys().
Initialize the required fields:
required_fields = {'Name', 'Surname', 'Country', 'Gender'}
Check whether required_fields is a subset of data.keys() if you want to allow extra keys in the input, or use == if only the required keys may appear in data.keys().
>>> set.issubset(required_fields, set(data.keys()))
False
>>> data.keys() == required_fields
False
Let's try the same thing with valid data-
# File contents
content = '''Name: Joe
Surname: Doe
Country: DE
Gender: Male'''
required_fields = {'Name', 'Surname', 'Country', 'Gender'}
data = {k:v for k, v in re.findall(r'^(\w+):\s+(\w+)$', content, re.M)}
print(data.keys() == required_fields) # True
print(set.issubset(required_fields, set(data.keys()))) # True
Output-
True
True
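The subset check above answers yes or no. If it is also useful to know which fields are missing, a set difference gives that directly. The sketch below is one possible variant, not the answer's exact code: the function name missing_fields is an assumption, and the pattern uses [ \t]* instead of \s+ so that a match can never run past the end of a line.

```python
import re

REQUIRED_FIELDS = {"Name", "Surname", "Country", "Gender"}


def missing_fields(content, required=REQUIRED_FIELDS):
    """Return the required keys that are absent or have an empty value."""
    # One (key, value) pair per line; a line with nothing after ':' does not match.
    pairs = re.findall(r"^(\w+):[ \t]*(\S.*?)[ \t]*$", content, re.M)
    data = dict(pairs)
    return required - set(data)


incomplete = "Name: Joe\nSurname: Doe\nCountry: DE\nGender:\n"
complete = "Name: Joe\nSurname: Doe\nCountry: DE\nGender: Male"

print(missing_fields(incomplete))  # {'Gender'}
print(missing_fields(complete))    # set()
```

An empty result means every required field was supplied with a value.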
I would like to suggest using csv.reader because of its simplicity:
import csv
fields_to_validate = ["name", "surname", "country", "gender"]
with open('data.csv') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=':')
    for row in csv_reader:
        if len(row) < 2:
            continue  # skip blank lines and lines without a ':'
        field_key = row[0].lower()
        field_value = row[1].strip()
        print("\n{} {}".format(field_key, field_value))
        if field_key in fields_to_validate and field_value:
            print("{} validated correctly!".format(field_key))
        else:
            print("{} NOT validated correctly.".format(field_key))
Output
name Joe
name validated correctly!
surname Doe
surname validated correctly!
country DE
country validated correctly!
gender
gender NOT validated correctly.
I have an input file that I am trying to build a database from.
Each line looks like this:
Amy Shchumer, Trainwreck, I Feel Pretty, Snatched, Inside Amy Shchumer
Bill Hader,Inside Out, Trainwreck, Tropic Thunder
And so on.
The first string is an actor/actress, followed by the movies they played in.
The data isn't sorted and there is some trailing whitespace.
I would like to create a dictionary that would look like this:
{'Trainwreck': {'Amy Shchumer', 'Bill Hader'}}
The key would be the movie, the values should be the actors in it, unified in a set data type.
def create_db():
    my_dict = {}
    raw_data = open('database.txt','r+')
    for line in raw_data:
        lst1 = line.split(",")  # to split by the commas
        len_row = len(lst1)
        lst2 = list(lst1)
        for j in range(1,len_row):
            my_dict[lst2[j]] = set([lst2[0]])
    print(my_dict)
It doesn't work: when a key already exists, the actor should be added to the set that holds the previous actor, but my code overwrites the set instead.
Instead I end up with:
'Trainwreck': {'Amy Shchumer'}, 'Inside Out': {'Bill Hader'}
def create_db():
    db = {}
    with open("database.txt") as data:
        for line in data.readlines():
            person, *movies = line.split(",")
            for m in movies:
                m = m.strip()
                db[m] = db.get(m, []) + [person]
    return db
Output:
{'Trainwreck': ['Amy Shchumer', 'Bill Hader'],
'I Feel Pretty': ['Amy Shchumer'],
'Snatched': ['Amy Shchumer'],
'Inside Amy Shchumer': ['Amy Shchumer'],
'Inside Out': ['Bill Hader'],
'Tropic Thunder': ['Bill Hader']}
This loops through the data and assigns the first value of each line to person and the rest to movies (the * in the assignment collects the remaining items into a list). Then, for each movie, it uses .get to check whether the movie is in the database yet, returning the existing list if it is and an empty list if it isn't, and adds the new actor to that list.
Another way to do this would be to use a defaultdict:
from collections import defaultdict
def create_db():
    db = defaultdict(list)
    with open("database.txt") as data:
        for line in data.readlines():
            person, *movies = line.split(",")
            for m in movies:
                db[m.strip()].append(person)
    return db
which automatically assigns [] if the key does not exist.
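The question asked for the actors of each movie to be collected in a set, while the two versions above collect them in lists. A minimal sketch of the same idea with defaultdict(set) follows. The function name build_movie_db is an assumption, and it takes an iterable of lines rather than opening database.txt so that it can be tried without the file.

```python
from collections import defaultdict


def build_movie_db(lines):
    """Map each movie title to the set of actors that appear in it."""
    db = defaultdict(set)
    for line in lines:
        # Strip every field, since the data has stray whitespace around commas.
        person, *movies = (part.strip() for part in line.split(","))
        for movie in movies:
            if movie:  # ignore empty fields, e.g. from a trailing comma
                db[movie].add(person)
    return dict(db)


# The two sample lines from the question.
sample_lines = [
    "Amy Shchumer, Trainwreck, I Feel Pretty, Snatched, Inside Amy Shchumer\n",
    "Bill Hader,Inside Out, Trainwreck, Tropic Thunder\n",
]
db = build_movie_db(sample_lines)
print(sorted(db["Trainwreck"]))  # ['Amy Shchumer', 'Bill Hader']
```

To read from the file instead, an open file object can be passed directly, for example build_movie_db(open("database.txt")) inside a with block.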
I was trying to grab a list of prices. So far my code for such a thing is:
def steamlibrarypull(steamID, key):
    #Pulls out a CSV of Steam appids.
    steaminfo = {
        'key': key,
        'steamid': steamID,
        'format':'JSON',
        'include_appinfo':'1'
    }
    r = requests.get('http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/', params=steaminfo)
    d = json.loads(r.content)
    I = d['response']['games']
    B = {}
    for games in I:
        B[games['name'].encode('utf8')] = games['appid']
    with open('games.csv', 'w') as f:
        for key, value in B.items():
            f.write("%s,%s\r\n" % (key, value))
    return B
But I'd like to be able to do a requests.get that takes this dictionary and outputs a list of prices. https://wiki.teamfortress.com/wiki/User:RJackson/StorefrontAPI seems to require a CSV list, but is that really necessary?
This is an unofficial Steam API, meaning Steam modifies it as they see fit. Currently it does not support multiple appids in one request.
To use it to get the price of a game, you would request:
http://store.steampowered.com/api/appdetails/?appids=237110&cc=us&filters=price_overview
Working from the code you have above, you need to iterate through the games and look up the store price for each one.
def steamlibrarypull(steamID, key):
    # Pulls the owned games and looks up the store price of each one.
    steaminfo = {
        'key': key,
        'steamid': steamID,
        'format':'JSON',
        'include_appinfo':'1'
    }
    r = requests.get('http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/', params=steaminfo)
    d = json.loads(r.content)
    response = d['response']['games']
    games = {}
    for game in response:
        getprice = requests.get('http://store.steampowered.com/api/appdetails/?appids=%d&filters=price_overview&cc=us' % game['appid'])
        price = 0  # default when the request fails or no price is listed
        if getprice.status_code == 200:
            rjson = json.loads(getprice.text)
            # use the appid to fetch the value and convert to decimal
            # appid is numeric, cast to string to lookup the price
            try:
                price = rjson[str(game['appid'])]['data']['price_overview']['initial'] * .01
            except (KeyError, TypeError):
                price = 0
        games[game['name']] = {'price': price, 'appid': game['appid']}
    return games
This will return a dictionary like the following:
{u'Half-Life 2: Episode Two': {'price': 7.99, 'appid': 420}}
It would be easier to navigate by appid instead of name, but this follows your request and original structure. It gives you the name, appid and price, which you can work with further or write to a file.
Note that this does not include a sleep timer. If your list of games is long, you should sleep for about 2 seconds between API calls; otherwise the API will block you and return no data, which causes an error in Python when you parse the price.
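The pause between calls could be sketched as follows. This is only an illustration under stated assumptions: fetch is a hypothetical callable that returns the decoded JSON for one appid (in real use it would wrap the requests.get call from the answer), the payload shape is the one the answer's code indexes into, and the 2-second default is the figure suggested above. The demo uses a fake fetch and a no-op sleep so it runs without network access.

```python
import time


def extract_price(payload, appid):
    """Return the price in currency units, or None if it is not present."""
    try:
        cents = payload[str(appid)]["data"]["price_overview"]["initial"]
    except (KeyError, TypeError):
        return None
    return cents / 100


def fetch_prices(appids, fetch, delay=2.0, sleep=time.sleep):
    """Call fetch(appid) for each appid, pausing `delay` seconds between calls."""
    prices = {}
    for i, appid in enumerate(appids):
        if i:  # no pause before the first request
            sleep(delay)
        prices[appid] = extract_price(fetch(appid), appid)
    return prices


# Fake responses, made up for illustration; no request is sent.
FAKE_RESPONSES = {
    420: {"420": {"success": True, "data": {"price_overview": {"initial": 799}}}},
    570: {"570": {"success": True, "data": []}},  # no price listed
}

prices = fetch_prices([420, 570], FAKE_RESPONSES.get, sleep=lambda seconds: None)
print(prices)  # {420: 7.99, 570: None}
```

Returning None instead of 0 for a missing price is a design choice here, so that a game without price data can be told apart from one that actually costs nothing.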
So what I'm trying to do is make a dictionary of people and their information but I want to use their names as the main key and have each part of their information to also have a key. I have not been able to figure out how to go about changing the values of their individual information.
I'm not even sure if I'm going about this the right way. Here is the code:
name = raw_input("name")
age = raw_input("age")
address = raw_input("address")
ramrod = {}
ramrod[name] = {'age': age}, {'address' : address}
print ramrod
#prints out something like this: {'Todd': ({'age': '67'}, {'address': '55555 FooBar rd'})}
What you are looking for is a simple nested dictionary:
>>> data = {"Bob": {"Age": 20, "Hobby": "Surfing"}}
>>> data["Bob"]["Age"]
20
A dictionary is not a pair - you can store more than one item in a dictionary. So you want one dictionary containing a mapping from name to information, where information is a dictionary containing mappings from the name of the information you want to store about the person to that information.
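Since the question also asks how to change individual pieces of information, here is a short sketch of building and then updating such a nested dictionary. The names people and add_person, and the replacement values, are assumptions made for illustration.

```python
people = {}


def add_person(name, age, address):
    # One inner dictionary per person, keyed by the name of each piece of information.
    people[name] = {"age": age, "address": address}


add_person("Todd", "67", "55555 FooBar rd")

# Change one value: index by the person's name first, then by the field.
people["Todd"]["age"] = "68"

# Add a new piece of information to the same person.
people["Todd"]["hobby"] = "Surfing"

print(people["Todd"]["age"])    # 68
print(sorted(people["Todd"]))   # ['address', 'age', 'hobby']
```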
Note that if you have behaviour associated with the data, or you end up with a lot of large dictionaries, a class might be more suitable:
class Person:
    def __init__(self, name, age, hobby):
        self.name = name
        self.age = age
        self.hobby = hobby
>>> data = {"Bob": Person("Bob", 20, "Surfing")}
>>> data["Bob"].age
20
You were close. Put both keys in a single dictionary instead of a tuple of two dictionaries:
ramrod[name] = {'age': age, 'address' : address}