Removing unwanted characters/words - python

I'm struggling to remove some characters from the extracted data. I've managed to remove the '£' from the price and that's it.
Outcome: (image: what I am getting)
Tried:
data = json.loads(r.text)
products = data['upcoming']
product_list = []
for product in products:
    price = product['price']
    date = product['launchDate']
    productsforsale = {
        'Retail_price': price,
        'Launch_date': date,
    }
    product_list.append(productsforsale)
df = pd.DataFrame(product_list).replace('£',"")
df.to_csv('PATH.csv')
print('saved to file')
Expected outcome:
110.00 2023-01-15 08:00

You can get the amount from the price dictionary by price['amount']. The time can be converted to your desired timeformat with the datetime module:
from datetime import datetime
datetime_date = datetime.strptime(date, "%Y-%m-%dT%H:%M:%S.%fZ")
new_date = datetime_date.strftime("%Y-%m-%d %H:%M")
I can't test it against your original JSON snippet, though.
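As a minimal sketch of the two steps above: the record below is hypothetical (the real JSON was not posted), and it assumes that price is a dictionary with an amount key and that launchDate is an ISO-style string with milliseconds and a trailing Z.

```python
from datetime import datetime

# Hypothetical record; the field values are made up for illustration.
product = {
    "price": {"amount": "110.00", "currency": "GBP"},
    "launchDate": "2023-01-15T08:00:00.000Z",
}

def clean(product):
    """Return (amount, 'YYYY-MM-DD HH:MM') for one product record."""
    amount = product["price"]["amount"]
    parsed = datetime.strptime(product["launchDate"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return amount, parsed.strftime("%Y-%m-%d %H:%M")

print(clean(product))
```

If the real launchDate has no fractional seconds, the ".%f" part of the format string would need to be dropped.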

You can format the time like so, parsing the string with strptime before calling strftime (a plain string has no strftime method):
date = datetime.strptime(product['launchDate'], "%Y-%m-%dT%H:%M:%S.%fZ").strftime("%Y-%m-%d %H:%M")
You're currently not getting the price correctly: you are extracting the whole price element, but you only want the amount within it.
You can extract the price like so:
price = product['price']['amount']
The full code
import json
from datetime import datetime
import pandas as pd
data = json.loads(r.text)  # r is the requests response from the question
products = data['upcoming']
rows = []
for product in products:
    price = product['price']['amount']
    date = datetime.strptime(product['launchDate'], "%Y-%m-%dT%H:%M:%S.%fZ")
    date = date.strftime("%Y-%m-%d %H:%M")
    # DataFrame.append was removed in pandas 2.0, so collect the rows in a list
    rows.append({"Price": price, "Date": date})
df = pd.DataFrame(rows)
df.to_csv('PATH.csv')
print('saved to file')
This should save a CSV with two columns, Price and Date (plus the index column, unless you pass index=False to to_csv), with all the unnecessary info removed.

Related

yfinance: print only the price of a stock

I want to print only the price for a stock from yfinance. This is what I get/have now:
ticker = "aapl"
start = datetime.now().strftime('%Y-%m-%d')
end = datetime.now().strftime('%Y-%m-%d')
data = pdr.get_data_yahoo(ticker, start, end)
data['EMA10'] = data['Close'].ewm(span=10, adjust=False).mean()
print(data['EMA10'])
and this is the response :
Date
2022-03-04 163.169998
Name: EMA10, dtype: float64
I only want to print 163....

You obtain a pd.Series. To select the first value in that series, use data['EMA10'].iloc[0]. Plain data['EMA10'][0] relies on treating the integer as a position, which is deprecated in recent pandas for a date-indexed Series.
The entire code is given below:
from datetime import datetime
import pandas_datareader as pdr
ticker = "AAPL"
start = datetime.now().strftime('%Y-%m-%d')
end = datetime.now().strftime('%Y-%m-%d')
data = pdr.get_data_yahoo(ticker, start, end)
data['EMA10'] = data['Close'].ewm(span=10, adjust=False).mean()
print(data['EMA10'].iloc[0])
Output:
163.1699981689453
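To see the positional selection without downloading anything, here is a small sketch. The series is a made-up one-row stand-in that mirrors the output shown in the question; it is not real market data fetched from Yahoo.

```python
import pandas as pd

# Made-up one-row series mirroring the output shown in the question.
ema = pd.Series(
    [163.169998],
    index=pd.DatetimeIndex(["2022-03-04"], name="Date"),
    name="EMA10",
)

first = ema.iloc[0]    # select by position, independent of the index labels
latest = ema.iloc[-1]  # last row; same value here because there is only one row

print(first)            # a bare number, without the Date/Name/dtype lines
print(round(first, 2))  # rounded for display
```

With more than one row, .iloc[-1] is probably the one you want for "the current price", since the rows are ordered by date.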

Pandas how to search one df for a certain date and return that data

I have two data frames and I am trying to search each row by date in the user.csv file and find the corresponding date in the Raven.csv file and then return the Price from the df1 and the date and amount from df2.
This is working but my Price is returning a value like this [[0.11465]], is there a way to remove these brackets or a better way to do this?
import pandas as pd
df1 = pd.read_csv('Raven.csv',)
df2 = pd.read_csv('User.csv')
df1 = df1.reset_index(drop=False)
df1.columns = ['index', 'Date', 'Price']
df2['Timestamp'] = pd.to_datetime(df2['Timestamp'], format="%Y-%m-%d %H:%M:%S").dt.date
df1['Date'] = pd.to_datetime(df1['Date'], format="%Y-%m-%d").dt.date
Looper = 0
Date = []
Price = []
amount = []
total_value = []
for x in df2['Timestamp']:
    search = df2['Timestamp'].values[Looper]
    Date.append(search)
    price =(df1.loc[df1['Date'] == search,['index']] )
    value = df1['Price'].values[price]
    Price.append(value)
    payout = df2['Amount'].values[Looper]
    amount.append(payout)
    payout_value = value * payout
    total_value.append(payout_value)
    Looper = Looper + 1
dict = {'Date': Date, 'Price': Price, 'Payout': amount, "Total Value": total_value}
df = pd.DataFrame(dict)
df.to_csv('out.csv')

You can do indexing to get the value:
value = [[0.11465]][0][0]
print(value)
You get:
0.11465
I hope this is what you need.
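As for "a better way": one option, which is a different technique from the indexing shown above, is to let pandas match the dates with a merge instead of looping. The sketch below uses small made-up frames in place of Raven.csv and User.csv, and assumes the column names Date, Price, Timestamp and Amount from the question.

```python
import pandas as pd

# Made-up stand-ins for Raven.csv (Date, Price) and User.csv (Timestamp, Amount).
df1 = pd.DataFrame({"Date": ["2021-03-01", "2021-03-02"], "Price": [0.11465, 0.12]})
df2 = pd.DataFrame({
    "Timestamp": ["2021-03-02 09:30:00", "2021-03-01 17:05:00"],
    "Amount": [10.0, 4.0],
})

# Reduce both sides to midnight timestamps so that the days can be matched.
df1["Date"] = pd.to_datetime(df1["Date"], format="%Y-%m-%d")
df2["Date"] = pd.to_datetime(df2["Timestamp"], format="%Y-%m-%d %H:%M:%S").dt.normalize()

# A left merge keeps every user row and pulls in the matching scalar price.
out = df2.merge(df1, on="Date", how="left")
out["Total Value"] = out["Price"] * out["Amount"]
out = out.rename(columns={"Amount": "Payout"})[["Date", "Price", "Payout", "Total Value"]]

print(out)
```

Because the price arrives as an ordinary column, each cell holds a plain float rather than a nested array, so there are no brackets to strip.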

How to store the result of a function in a DataFrame with the related column

I return data from a function as a dictionary and want to store it in a DataFrame.
When I run it in a for loop, I get an error.
import pyowm
from pyowm.utils import config
from pyowm.utils import timestamps
owm = pyowm.OWM(" your free api key from OpenWeatherMap")
mgr = owm.weather_manager()
data =[]
# Create function to get weather details
def get_weather(city):
    observation = mgr.weather_at_place(city)
    l = observation.weather
    Wind_Speed = l.wind()['speed']
    Temp = l.temperature('celsius')['temp']
    Max_temp = l.temperature('celsius')['temp_max']
    Min_temp = l.temperature('celsius')['temp_min']
    #Heat_index = l.heat_index
    Humidity = l.humidity
    Pressure = l.pressure['press']
    weather = {"City": city, "Wind_Speed" : Wind_Speed, "Temp":
               Temp,"Max_temp":Max_temp, "Min_temp":Min_temp,
               "Humidity":Humidity, "Pressure":Pressure}
    return weather
for city in df_location['City']:
    get_weather(city)
    df = df.append(data, True)
I want to store those weather details in the same df, next to the corresponding city.
df_location currently has a City column.
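One possible way to do this, sketched under assumptions: get_weather_stub below is a hypothetical stand-in for get_weather so the example runs without an API key, and the df_location contents are made up (only the City column is known from the question). The idea is to keep the dictionaries the function returns, build one DataFrame from them, and merge it back on City.

```python
import pandas as pd

def get_weather_stub(city):
    """Hypothetical stand-in for get_weather(); returns fixed, made-up values."""
    return {"City": city, "Temp": 20.0, "Humidity": 50}

# Hypothetical df_location; only the City column is known from the question.
df_location = pd.DataFrame({"City": ["London", "Paris"], "Country": ["GB", "FR"]})

# Collect one dictionary per city, then build the DataFrame once.
records = [get_weather_stub(city) for city in df_location["City"]]
weather_df = pd.DataFrame(records)

# Attach the weather columns to the matching city rows.
result = df_location.merge(weather_df, on="City", how="left")
print(result)
```

Building the frame once from a list of dictionaries also avoids DataFrame.append, which was removed in pandas 2.0.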

Filling missing dates in python beautiful soup and pandas

I scraped data from this website into a CSV file. I was able to scrape the date and the price. However, the date is in week format, and I need to convert it into date format, like daily prices for 5 working days (mon-sat). I used Python, pandas and Beautiful Soup for this. (image: what I get and what I want from this site)
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup
from pandas import DataFrame
import csv
import pandas as pd
try:
    html = urlopen("https://www.eia.gov/dnav/ng/hist/rngwhhdD.htm")
except HTTPError as e:
    print(e)
except URLError:
    print("Server down or incorrect domain")
else:
    res = BeautifulSoup(html.read(),"html5lib")
price = res.findAll(class_=["tbody", "td", "B3"])
price_list = []
for tag in price:
    price_tag=tag.getText()
    price_list.append(price_tag)
    print(price_tag)
date = res.findAll(class_=["tbody", "td", "B6"])
date_list = []
for tag in date:
    date_tag=tag.getText()
    date_list.append(date_tag)
    print(date_tag)
d1 = pd.DataFrame({'Date': date_list})
d2 = pd.DataFrame({'Price': price_list})
df = pd.concat([d1,d2], axis=1)
print(df)
df.to_csv("Gas Price.csv", index=False, header=True)

Your current code creates one list with an entry per row and another with an entry per cell, so the two don't fit together.
The following script finds the table (it is the only one that has the summary attribute) and loops over each row (tr). Then it takes the first part of the Week column (td class B6), before the " to ", and converts it to a datetime.
For each cell (td class B3) it gets the price (or an empty string), records the current date, and increments the date by one day.
from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup
from pandas import DataFrame
import csv
import pandas as pd
from urllib.request import urlopen
import datetime
try:
    html = urlopen("https://www.eia.gov/dnav/ng/hist/rngwhhdD.htm")
except HTTPError as e:
    print(e)
except URLError:
    print("Server down or incorrect domain")
else:
    res = BeautifulSoup(html.read(),"html5lib")
table = None
for t in res.findAll("table"):
    table = t if "summary" in t.attrs else table
if table == None: exit()
# stop_date = datetime.datetime(year = 2018, month = 7, day = 12)
# today = datetime.datetime.now()
# abort = False
price_list = []
date_list = []
rows = table.findAll("tr")[1:]
for row in rows:
    date = None
    cells = row.findAll("td")
    if cells[0].get("class") == None: continue # placeholder..
    if "B6" in cells[0].get("class"):
        d = cells[0].getText().split(" to ")[0].strip().replace(" ", "")
        date = datetime.datetime.strptime(d,"%Y%b-%d")
    for cell in cells:
        if "B3" in cell.get("class"): # and abort == False:
            price = cell.getText().strip()
            if price == "" or price == "NA": price = ""
            else: price = float(price)
            price_list.append(price)
            date_list.append(date)
            date = date + datetime.timedelta(days=1)
            #if date > today: abort = True
    #if abort == True: break
d1 = pd.DataFrame({'Date': date_list})
d2 = pd.DataFrame({'Price': price_list})
df = pd.concat([d1,d2], axis=1)
print(df)
df.to_csv(r"Gas Price.csv", index=False, header=True)
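The date handling in that script can be tried without fetching the page. The sketch below isolates just that step: it takes a week label in the site's format and a list of daily prices, and pairs each price with a consecutive date. The helper name expand_week and the prices are made up for illustration.

```python
import datetime

def expand_week(week_label, prices):
    """Pair each daily price with its date, starting from the week's first day."""
    # "1997 Jan- 6 to Jan-10" -> "1997Jan-6"
    start_text = week_label.split(" to ")[0].strip().replace(" ", "")
    day = datetime.datetime.strptime(start_text, "%Y%b-%d")
    rows = []
    for price in prices:
        rows.append((day.strftime("%Y-%m-%d"), price))
        day += datetime.timedelta(days=1)
    return rows

# Made-up prices; the week label format is the one used on the site.
for date, price in expand_week("1997 Jan- 6 to Jan-10", [3.82, 3.80, 3.61, 3.92, 4.00]):
    print(date, price)
```

Note that %b matches month abbreviations in the current locale, so this assumes an English locale.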

I wasn't entirely clear what you wanted for Date but I extracted both and called them Start and End Date.
In:
df = pd.DataFrame({'Date': ['1997 Jan- 6 to Jan-10', '1997 Jan-13 to Jan-17'], 'Price': [3.80, 5.00] })
df['Temp_Year'] = df.Date.str.extract(r'((?:19|20)\d\d)')
df['Temp_Date'] = df.Date.str.replace(r'((?:19|20)\d\d)', '', regex=True)
df[['Start Date', 'End Date']] = df.Temp_Date.str.split('to', expand=True)
df['Start Date'] = pd.to_datetime(df['Temp_Year'] + ' ' + df['Start Date'].str.replace(" ",""))
df['End Date'] = pd.to_datetime(df['Temp_Year'] + ' ' + df['End Date'].str.replace(" ",""))
df.drop(['Temp_Year', 'Temp_Date'], axis=1)
Out:
| | Date | Price | Start Date | End Date |
|---|-----------------------|-------|------------|------------|
| 0 | 1997 Jan- 6 to Jan-10 | 3.8 | 1997-01-06 | 1997-01-10 |
| 1 | 1997 Jan-13 to Jan-17 | 5.0 | 1997-01-13 | 1997-01-17 |

Using Python to edit the timestamps in a list? Convert POSIX to readable format using a function

SECOND EDIT:
Finished snippet for adjusting timezones and converting format. See correct answer below for details leading to this solution.
tzvar = int(input("Enter the number of hours you'd like to add to the timestamp:"))
tzvarsecs = (tzvar*3600)
print (tzvarsecs)
def timestamp_to_str(timestamp):
    return datetime.fromtimestamp(timestamp).strftime('%H:%M:%S %m/%d/%Y')
timestamps = soup('span', {'class': '_timestamp js-short-timestamp '})
dtinfo = [timestamp["data-time"] for timestamp in timestamps]
times = map(int, dtinfo)
adjtimes = [x+tzvarsecs for x in times]
adjtimesfloat = [float(i) for i in adjtimes]
dtinfofloat = [float(i) for i in dtinfo]
finishedtimes = [x for x in map(timestamp_to_str, adjtimesfloat)]
originaltimes = [x for x in map(timestamp_to_str, dtinfofloat)]
END SECOND EDIT
EDIT:
This code allows me to scrape the POSIX time from the HTML file and then add a number of hours entered by the user to the original value. Negative numbers will also work to subtract hours. The user will be working in whole hours as the changes are specifically to adjust for timezones.
tzvar = int(input("Enter the number of hours you'd like to add to the timestamp:"))
tzvarsecs = (tzvar*3600)
print (tzvarsecs)
timestamps = soup('span', {'class': '_timestamp js-short-timestamp '})
dtinfo = [timestamp["data-time"] for timestamp in timestamps]
times = map(int, dtinfo)
adjtimes = [x+tzvarsecs for x in times]
All that is left is a reverse of a function like the one suggested below. How do I convert each POSIX time in the list to a readable format using a function?
END EDIT
The code below creates a csv file containing data scraped from a saved Twitter HTML file.
Twitter converts all the timestamps to the user's local time in the browser. I would like to have an input option for the user to adjust the timestamps by a certain number of hours so that the data for the tweet reflects the tweeter's local time.
I'm currently scraping an element called 'title' that is a part of each permalink. I could just as easily scrape the POSIX time from each tweet instead.
title="2:29 PM - 28 Sep 2015"
vs
data-time="1443475777" data-time-ms="1443475777000"
How would I edit the following piece so it added a variable entered by the user to each timestamp? I don't need help with requesting input, I just need to know how to apply it to the list of timestamps after the input is passed to python.
timestamps = soup('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'})
datetime = [timestamp["title"] for timestamp in timestamps]
Other questions related to this code/project.
Fix encoding error with loop in BeautifulSoup4?
Focusing in on specific results while scraping Twitter with Python and Beautiful Soup 4?
Using Python to Scrape Nested Divs and Spans in Twitter?
Full code.
from bs4 import BeautifulSoup
import requests
import sys
import csv
import re
from datetime import datetime
from pytz import timezone
url = input("Enter the name of the file to be scraped:")
with open(url, encoding="utf-8") as infile:
    soup = BeautifulSoup(infile, "html.parser")
#url = 'https://twitter.com/search?q=%23bangkokbombing%20since%3A2015-08-10%20until%3A2015-09-30&src=typd&lang=en'
#headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
#r = requests.get(url, headers=headers)
#data = r.text.encode('utf-8')
#soup = BeautifulSoup(data, "html.parser")
names = soup('strong', {'class': 'fullname js-action-profile-name show-popup-with-id'})
usernames = [name.contents for name in names]
handles = soup('span', {'class': 'username js-action-profile-name'})
userhandles = [handle.contents[1].contents[0] for handle in handles]
athandles = [('#')+abhandle for abhandle in userhandles]
links = soup('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'})
urls = [link["href"] for link in links]
fullurls = [permalink for permalink in urls]
timestamps = soup('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'})
datetime = [timestamp["title"] for timestamp in timestamps]
messagetexts = soup('p', {'class': 'TweetTextSize js-tweet-text tweet-text'})
messages = [messagetext for messagetext in messagetexts]
retweets = soup('button', {'class': 'ProfileTweet-actionButtonUndo js-actionButton js-actionRetweet'})
retweetcounts = [retweet.contents[3].contents[1].contents[1].string for retweet in retweets]
favorites = soup('button', {'class': 'ProfileTweet-actionButtonUndo u-linkClean js-actionButton js-actionFavorite'})
favcounts = [favorite.contents[3].contents[1].contents[1].string for favorite in favorites]
images = soup('div', {'class': 'content'})
imagelinks = [src.contents[5].img if len(src.contents) > 5 else "No image" for src in images]
#print (usernames, "\n", "\n", athandles, "\n", "\n", fullurls, "\n", "\n", datetime, "\n", "\n",retweetcounts, "\n", "\n", favcounts, "\n", "\n", messages, "\n", "\n", imagelinks)
rows = zip(usernames,athandles,fullurls,datetime,retweetcounts,favcounts,messages,imagelinks)
rownew = list(rows)
#print (rownew)
newfile = input("Enter a filename for the table:") + ".csv"
with open(newfile, 'w', encoding='utf-8') as f:
    writer = csv.writer(f, delimiter=",")
    writer.writerow(['Usernames', 'Handles', 'Urls', 'Timestamp', 'Retweets', 'Favorites', 'Message', 'Image Link'])
    for row in rownew:
        writer.writerow(row)

Using your code as an example, the variable datetime stores a list of date strings. So let's break the process into 3 steps, just for comprehension.
Example
>>> datetime = [timestamp["title"] for timestamp in timestamps]
>>> print(datetime)
['2:13 AM - 29 Sep 2015', '2:29 PM - 28 Sep 2015', '8:04 AM - 28 Sep 2015']
First step: convert it to a Python datetime object. Here datetime must be the class imported with from datetime import datetime, not the list above. Use %I (12-hour clock) rather than %H, because %p only affects the parsed hour when %I is used.
>>> datetime_obj = datetime.strptime('2:13 AM - 29 Sep 2015', '%I:%M %p - %d %b %Y')
>>> print(repr(datetime_obj))
datetime.datetime(2015, 9, 29, 2, 13)
Second step: convert the datetime object to a Python structured time object.
>>> to_time = datetime_obj.timetuple()
>>> print(to_time)
time.struct_time(tm_year=2015, tm_mon=9, tm_mday=29, tm_hour=2, tm_min=13, tm_sec=0, tm_wday=1, tm_yday=272, tm_isdst=-1)
Third step: convert the structured time object to a timestamp using time.mktime (the result depends on the local time zone).
>>> timestamp = time.mktime(to_time)
>>> print(timestamp)
1443503580.0
All together now.
import time
from datetime import datetime
...
def str_to_ts(str_date):
    return time.mktime(datetime.strptime(str_date, '%I:%M %p - %d %b %Y').timetuple())
datetimes = [timestamp["title"] for timestamp in timestamps]
times = [i for i in map(str_to_ts, datetimes)]
PS: datetime is a bad choice for a variable name, especially in this context. :-)
Update
To apply a function to each value of a list:
def add_time(timestamp, hours=0, minutes=0, seconds=0):
    return timestamp + seconds + (minutes * 60) + (hours * 60 * 60)
datetimes = [timestamp["title"] for timestamp in timestamps]
times = [add_time(str_to_ts(i), 5, 0, 0) for i in datetimes]
Update 2
To convert a timestamp to string formatted date:
def timestamp_to_str(timestamp):
    return datetime.fromtimestamp(timestamp).strftime('%H:%M:%S %m/%d/%Y')
Example:
>>> from time import time
>>> from datetime import datetime
>>> timestamp_to_str(time())
'17:01:47 08/29/2016'
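If the goal is only to shift the scraped title strings by a number of hours, the round trip through a POSIX timestamp can be skipped. The following is a sketch of that alternative, not the method above: shift_title is a hypothetical helper, and the sample titles are the ones quoted in this thread.

```python
from datetime import datetime, timedelta

TITLE_FORMAT = "%I:%M %p - %d %b %Y"  # format of the tweet "title" attribute

def shift_title(title, hours):
    """Parse a title string, move it by a whole number of hours, and reformat it."""
    parsed = datetime.strptime(title, TITLE_FORMAT)
    return (parsed + timedelta(hours=hours)).strftime("%H:%M:%S %m/%d/%Y")

titles = ["2:13 AM - 29 Sep 2015", "2:29 PM - 28 Sep 2015"]
print([shift_title(t, 5) for t in titles])
```

Because everything stays in naive datetime arithmetic, the result does not depend on the time zone of the machine running the script.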

This is what I was thinking but not sure if this is what you're after:
>>> timestamps = ["1:00 PM - 28 Sep 2015", "2:00 PM - 28 Sep 2016", "3:00 PM - 29 Sep 2015"]
>>> datetime = dict(enumerate(timestamps))
>>> datetime
{0: '1:00 PM - 28 Sep 2015',
1: '2:00 PM - 28 Sep 2016',
2: '3:00 PM - 29 Sep 2015'}

It seems you are looking for datetime.timedelta. You can convert your inputs into datetime.datetime objects in various ways, for example,
timestamp = datetime.datetime.fromtimestamp(1443475777)
Then you can perform arithmetic on them with timedelta objects. A timedelta just represents a change in time. You can construct one with an hours argument like so:
delta = datetime.timedelta(hours=1)
And then timestamp + delta will give you another datetime one hour in the future. Subtraction will work as well, as will other arbitrary time intervals.
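Put together for the scraped data-time values, that could look like the sketch below. shift_posix is a hypothetical helper, the timestamp is the one quoted in the question, and the -7 hour offset is only an example. It converts to UTC explicitly so that the output does not depend on the local time zone of the machine.

```python
from datetime import datetime, timedelta, timezone

def shift_posix(posix_seconds, hours):
    """Convert a POSIX timestamp to UTC, shift it by whole hours, and format it."""
    moment = datetime.fromtimestamp(posix_seconds, tz=timezone.utc)
    return (moment + timedelta(hours=hours)).strftime("%H:%M:%S %m/%d/%Y")

data_times = ["1443475777"]  # data-time value quoted in the question
print([shift_posix(int(value), -7) for value in data_times])
```

Negative hour values subtract time, so the same function covers time zones on either side of UTC.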
