I am very new to coding; I just started this summer. I am trying to scrape App Store reviews for 9 live-shopping apps: https://www.apple.com/us/search/live-shopping?src=globalnav
I created an xlsx file with information about the apps and exported it as a CSV for the code, hoping that the App Store scraper would identify the apps by their IDs, but it does not seem to work. Here is the code, originally taken from https://python.plainenglish.io/scraping-app-store-reviews-with-python-90e4117ccdfb:
import pandas as pd
# for scraping app info from App Store
from itunes_app_scraper.scraper import AppStoreScraper
# for scraping app reviews from App Store
from app_store_scraper import AppStore
# for pretty printing data structures
from pprint import pprint
# for keeping track of timing
import datetime as dt
from tzlocal import get_localzone
# for building in wait times
import random
import time
## Read in file containing app names and IDs
app_df = pd.read_csv('Data/app_.....ids.csv')
app_df.head()
app_name iOS_app_name iOS_app_id url
4 Flip - Beauty and Shopping flip-beauty-shopping 1470077137 https://apps.apple.com/us/app/flip-beauty-shop...
7 Spin Live spin-live 1519146498 https://apps.apple.com/us/app/spin-live/id1519...
1 Popshop - Live Shopping popshop-live-shopping 1009480270 https://apps.apple.com/us/app/popshop-live-sho...
5 Lalabox - Live Stream Shopping lalabox-live-stream-shopping 1496718575 https://apps.apple.com/us/app/lalabox-live-str...
6 Supergreat Beauty supergreat-beauty 1360338670 https://apps.apple.com/us/app/supergreat-beaut...
8 HERO® hero-live-shopping 1178589357 https://apps.apple.com/us/app/hero-live-shoppi...
2 Whatnot: Buy, Sell, Go Live whatnot-buy-sell-go-live 1488269261 https://apps.apple.com/us/app/whatnot-buy-sell...
3 NTWRK - Live Video Shopping ntwrk-live-video-shopping 1425910407 https://apps.apple.com/us/app/ntwrk-live-video...
0 LIT Live - Live Shopping lit-live-live-shopping 1507315272 https://apps.apple.com/us/app/lit-live-live-sh...
## Get list of app names and app IDs
app_names = list(app_df['iOS_app_name'])
app_ids = list(app_df['iOS_app_id'])
## Set up App Store Scraper
scraper = AppStoreScraper()
app_store_list = list(scraper.get_multiple_app_details(app_ids))
## Pretty print the data for the first app
pprint(app_store_list[0])
https://itunes.apple.com/lookup?id=1507315272&country=nl&entity=software
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/opt/anaconda3/lib/python3.8/site-packages/itunes_app_scraper/scraper.py in get_app_details(self, app_id, country, lang, flatten)
179 result = json.loads(result)
--> 180 except json.JSONDecodeError:
181 raise AppStoreException("Could not parse app store response")
IndexError: list index out of range
During handling of the above exception, another exception occurred:
AppStoreException Traceback (most recent call last)
<ipython-input-73-624146f96e92> in <module>
1 ## Set up App Store Scraper
2 scraper = AppStoreScraper()
----> 3 app_store_list = list(scraper.get_multiple_app_details(app_ids))
4
5 app = result["results"][0]
/opt/anaconda3/lib/python3.8/site-packages/itunes_app_scraper/scraper.py in get_multiple_app_details(self, app_ids, country, lang)
205 :param str lang: Dummy argument for compatibility. Unused.
206
--> 207 :return generator: A list (via a generator) of app details
208 """
209 for app_id in app_ids:
/opt/anaconda3/lib/python3.8/site-packages/itunes_app_scraper/scraper.py in get_app_details(self, app_id, country, lang, flatten)
180 except json.JSONDecodeError:
181 raise AppStoreException("Could not parse app store response")
--> 182
183 try:
184 app = result["results"][0]
AppStoreException: No app found with ID 1507315272
This is where I am stuck. It seems like a simple problem, but my experience is very limited. The URL that the App Store scraper uses is not the same one I used to retrieve the app IDs. Could this be the problem? Please help me solve it. Thank you in advance!
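Update: the lookup URL in the traceback shows country=nl, while my app IDs came from the US store, so maybe the storefront needs to be pinned. A minimal sketch of what I think that would look like, assuming get_multiple_app_details accepts the country argument that appears in the traceback's function signature:
scraper = AppStoreScraper()
# query the US storefront explicitly instead of the library's default
app_store_list = list(scraper.get_multiple_app_details(app_ids, country='us'))
pprint(app_store_list[0])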
I am brand new to coding bots, and to coding in general. I copied a simple bot tutorial for beginners. The following part is for getting historical data for cryptocurrencies:
def gethourlydata(symbol):
frame = pd.DataFrame(Client.get_historical_klines(symbol,
'1hr',
'now UTC',
'25 hours ago UTC'))
frame = frame.iloc[:,:5]
frame.columns = ['Time','Open','High','Low','Close']
frame[['Open','High','Low','Close']] = frame[['Open','High','Low','Close']].astype(float)
frame.Time = pd.to_datetime(frame.Time, unit='ms')
return frame
First I had to put in a start_str because it was supposedly missing. I did so, executed the function for 'BTCUSDT', and got this:
AttributeError Traceback (most recent call last)
/tmp/ipykernel_1473/2916929938.py in <module>
----> 1 df = gethourlydata('BTCUSDT')
/tmp/ipykernel_1473/2893431243.py in gethourlydata(symbol)
3 '1hr',
4 'now UTC',
----> 5 '25 hours ago UTC'))
6 frame = frame.iloc[:,:5]
7 frame.columns = ['Time','Open','High','Low','Close']
~/.local/lib/python3.7/site-packages/binance/client.py in get_historical_klines(self, symbol, interval, start_str, end_str, limit, klines_type)
930
931 """
--> 932 return self._historical_klines(symbol, interval, start_str, end_str=end_str, limit=limit, klines_type=klines_type)
933
934 def _historical_klines(self, symbol, interval, start_str, end_str=None, limit=500,
AttributeError: 'str' object has no attribute '_historical_klines'
I have tried many different methods, e.g. defining 'self', 'klines_type', etc. in detail, and still some error appears. All I'm trying to do is prove to myself that I can at least run a bot in my Jupyter notebook.
Could someone please help, or at least give some tips?
Thank you!
You first have to initialize the client.
Try this:
from binance.client import Client
my_client = Client("", "")  # for this operation you don't need API keys
# pass the arguments separately (not as one tuple); start_str comes before end_str,
# and Binance's hourly interval string is '1h'
my_client.get_historical_klines(symbol, '1h', '25 hours ago UTC', 'now UTC')
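Putting it together with the function from the question, a minimal sketch (assuming the python-binance client, where the hourly interval string is '1h' and start_str comes before end_str):
import pandas as pd
from binance.client import Client

client = Client("", "")  # public kline data needs no API keys

def gethourlydata(symbol):
    # interval '1h'; start_str ('25 hours ago UTC') before end_str ('now UTC')
    frame = pd.DataFrame(client.get_historical_klines(
        symbol, '1h', '25 hours ago UTC', 'now UTC'))
    frame = frame.iloc[:, :5]  # keep open time and OHLC columns
    frame.columns = ['Time', 'Open', 'High', 'Low', 'Close']
    frame[['Open', 'High', 'Low', 'Close']] = frame[['Open', 'High', 'Low', 'Close']].astype(float)
    frame.Time = pd.to_datetime(frame.Time, unit='ms')
    return frame

df = gethourlydata('BTCUSDT')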
I am also facing the same error. Can anyone please help?
I changed the directory to the path where the file is located and was able to read the data. But when I use the read_sql_query call in pandas, I get an error. Below is the ipynb code and the error I am getting.
Code:
import warnings
warnings.filterwarnings("ignore")
import sqlite3
import pandas as pd
import os
#cwd= os.getcwd()
%cd /content/drive/My Drive/Colab Notebooks/
!pwd
Second cell:
from google.colab import drive
drive.mount('/content/drive')
# After executing the cell above, Drive
# files will be present in "/content/drive/My Drive".
!ls "/content/drive/My Drive/Colab Notebooks/Reviews.csv"
Third cell:
con = sqlite3.connect('/content/drive/My Drive/Colab Notebooks/Reviews.csv')
# filtering only positive and negative reviews i.e. not taking into consideration those reviews with Score=3
# SELECT * FROM Reviews WHERE Score != 3 LIMIT 500000, will give top 500000 data points
# you can change the number to any other number based on your computing power
# filtered_data = pd.read_sql_query(""" SELECT * FROM Reviews WHERE Score != 3 LIMIT 500000""", con)
cur = con.cursor()
df_bonus = pd.read_csv('/content/drive/My Drive/Colab Notebooks/Reviews.csv')
print(df_bonus)
filtered_data = pd.read_sql_query(""" SELECT * FROM Reviews WHERE Score != 3 LIMIT 3000""", con)
Output and error message:
Id ... Text
0 1 ... I have bought several of the Vitality canned d...
1 2 ... Product arrived labeled as Jumbo Salted Peanut...
2 3 ... This is a confection that has been around a fe...
3 4 ... If you are looking for the secret ingredient i...
4 5 ... Great taffy at a great price. There was a wid...
... ... ... ...
568449 568450 ... Great for sesame chicken..this is a good if no...
568450 568451 ... I'm disappointed with the flavor. The chocolat...
568451 568452 ... These stars are small, so you can give 10-15 o...
568452 568453 ... These are the BEST treats for training and rew...
568453 568454 ... I am very satisfied ,product is as advertised,...
[568454 rows x 10 columns]
---------------------------------------------------------------------------
DatabaseError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/io/sql.py in execute(self, *args, **kwargs)
1585 try:
-> 1586 cur.execute(*args, **kwargs)
1587 return cur
DatabaseError: file is not a database
The above exception was the direct cause of the following exception:
DatabaseError Traceback (most recent call last)
3 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/sql.py in execute(self, *args, **kwargs)
1596
1597 ex = DatabaseError(f"Execution failed on sql '{args[0]}': {exc}")
-> 1598 raise ex from exc
1599
1600 #staticmethod
DatabaseError: Execution failed on sql ' SELECT * FROM Reviews WHERE Score != 3 LIMIT 3000': file is not a database
Your code treats Reviews.csv as both a SQLite database (con = sqlite3.connect('/content/drive/My Drive/Colab Notebooks/Reviews.csv')) and a CSV file (df_bonus = pd.read_csv('/content/drive/My Drive/Colab Notebooks/Reviews.csv')). It can't be both. Conventional wisdom says the file is a CSV; therefore it cannot be opened with sqlite3, or you will get DatabaseError: file is not a database (or similar). The CSV data would need to be loaded into a database before any queries can examine it, as sketched below.
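A minimal sketch of that loading step, assuming the CSV's columns include Score (an in-memory database here; a path to a .db file also works):
import sqlite3
import pandas as pd

con = sqlite3.connect(':memory:')  # or a path to a .db file
df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/Reviews.csv')
df.to_sql('Reviews', con, index=False, if_exists='replace')  # load CSV into a table

filtered_data = pd.read_sql_query(
    """SELECT * FROM Reviews WHERE Score != 3 LIMIT 3000""", con)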
Why do I get an AttributeError when I run this code in Jupyter? I am trying to figure out how to use NeuroKit.
I've tried looking through the modules one by one, but I can't seem to find the error.
import neurokit as nk
import pandas as pd
import numpy as np
import sklearn
df = pd.read_csv("https://raw.githubusercontent.com/neuropsychology/NeuroKit.py/master/examples/Bio/bio_100Hz.csv")
# Process the signals
bio = nk.bio_process(ecg=df["ECG"], rsp=df["RSP"], eda=df["EDA"], add=df["Photosensor"], sampling_rate=1000 )
Output Message:
AttributeError Traceback (most recent call last)
<ipython-input-2-ad0abf8de45e> in <module>
11
12 # Process the signals
---> 13 bio = nk.bio_process(ecg=df["ECG"], rsp=df["RSP"], eda=df["EDA"], add=df["Photosensor"], sampling_rate=1000 )
14 # Plot the processed dataframe, normalizing all variables for viewing purpose
15 nk.z_score(bio["df"]).plot()
~\Anaconda3\lib\site-packages\neurokit\bio\bio_meta.py in bio_process(ecg, rsp, eda, emg, add, sampling_rate, age, sex, position, ecg_filter_type, ecg_filter_band, ecg_filter_frequency, ecg_segmenter, ecg_quality_model, ecg_hrv_features, eda_alpha, eda_gamma, scr_method, scr_treshold, emg_names, emg_envelope_freqs, emg_envelope_lfreq, emg_activation_treshold, emg_activation_n_above, emg_activation_n_below)
123 # ECG & RSP
124 if ecg is not None:
--> 125 ecg = ecg_process(ecg=ecg, rsp=rsp, sampling_rate=sampling_rate, filter_type=ecg_filter_type, filter_band=ecg_filter_band, filter_frequency=ecg_filter_frequency, segmenter=ecg_segmenter, quality_model=ecg_quality_model, hrv_features=ecg_hrv_features, age=age, sex=sex, position=position)
126 processed_bio["ECG"] = ecg["ECG"]
127 if rsp is not None:
~\Anaconda3\lib\site-packages\neurokit\bio\bio_ecg.py in ecg_process(ecg, rsp, sampling_rate, filter_type, filter_band, filter_frequency, segmenter, quality_model, hrv_features, age, sex, position)
117 # ===============
118 if quality_model is not None:
--> 119 quality = ecg_signal_quality(cardiac_cycles=processed_ecg["ECG"]["Cardiac_Cycles"], sampling_rate=sampling_rate, rpeaks=processed_ecg["ECG"]["R_Peaks"], quality_model=quality_model)
120 processed_ecg["ECG"].update(quality)
121 processed_ecg["df"] = pd.concat([processed_ecg["df"], quality["ECG_Signal_Quality"]], axis=1)
~\Anaconda3\lib\site-packages\neurokit\bio\bio_ecg.py in ecg_signal_quality(cardiac_cycles, sampling_rate, rpeaks, quality_model)
355
356 if quality_model == "default":
--> 357 model = sklearn.externals.joblib.load(Path.materials() + 'heartbeat_classification.model')
358 else:
359 model = sklearn.externals.joblib.load(quality_model)
AttributeError: module 'sklearn' has no attribute 'externals'
You could downgrade your scikit-learn version if you don't need the most recent fixes, using
pip install scikit-learn==0.20.1
There is an open issue about fixing this problem in a future version:
https://github.com/neuropsychology/NeuroKit.py/issues/101
I was executing the exact same code as you and ran into the same problem.
I followed the link indicated by Louis MAYAUD, and there they suggest just adding
from sklearn.externals import joblib
That solves everything, and you don't need to downgrade your scikit-learn version.
Happy coding! :)
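To spell that out, a minimal sketch, assuming a scikit-learn version (roughly 0.21-0.22) where sklearn.externals.joblib still exists; the explicit import binds the submodule so NeuroKit's sklearn.externals.joblib.load call can resolve:
from sklearn.externals import joblib  # registers sklearn.externals.joblib
import neurokit as nk
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/neuropsychology/NeuroKit.py/master/examples/Bio/bio_100Hz.csv")
bio = nk.bio_process(ecg=df["ECG"], rsp=df["RSP"], eda=df["EDA"],
                     add=df["Photosensor"], sampling_rate=1000)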
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
import pandas_datareader.data as web
style.use('ggplot')
start = dt.datetime(2000,1,1)
end = dt.datetime(2016,12,31)
df = web.DataReader('INPX', 'yahoo', start, end)
ImmediateDeprecationError Traceback (most recent call last)
<ipython-input-11-d0b9e16fb581> in <module>()
----> 1 df = web.DataReader('INPX', 'yahoo', start, end)
/anaconda3/lib/python3.6/site-packages/pandas_datareader/data.py in DataReader(name, data_source, start, end, retry_count, pause, session, access_key)
289 """
290 if data_source == "yahoo":
--> 291 raise ImmediateDeprecationError(DEP_ERROR_MSG.format('Yahoo Daily'))
292 return YahooDailyReader(symbols=name, start=start, end=end,
293 adjust_price=False, chunksize=25,
ImmediateDeprecationError:
Yahoo Daily has been immediately deprecated due to large breaks in the API without the
introduction of a stable replacement. Pull Requests to re-enable these data
connectors are welcome.
See https://github.com/pydata/pandas-datareader/issues
I tried the link, but I couldn't find the reason for the immediate deprecation error. I also tried changing 'yahoo' to 'google', i.e. df = web.DataReader('INPX', 'google', start, end), but there is still an error:
/anaconda3/lib/python3.6/site-packages/pandas_datareader/google/daily.py:40: UnstableAPIWarning:
The Google Finance API has not been stable since late 2017. Requests seem
to fail at random. Failure is especially common when bulk downloading.
warnings.warn(UNSTABLE_WARNING, UnstableAPIWarning)
RemoteDataError Traceback (most recent call last)
<ipython-input-12-5d16a3e9b68a> in <module>()
----> 1 df = web.DataReader('INPX', 'google', start, end)
/anaconda3/lib/python3.6/site-packages/pandas_datareader/data.py in DataReader(name, data_source, start, end, retry_count, pause, session, access_key)
313 chunksize=25,
314 retry_count=retry_count, pause=pause,
--> 315 session=session).read()
316
317 elif data_source == "iex":
/anaconda3/lib/python3.6/site-packages/pandas_datareader/base.py in read(self)
204 if isinstance(self.symbols, (compat.string_types, int)):
205 df = self._read_one_data(self.url,
--> 206 params=self._get_params(self.symbols))
207 # Or multiple symbols, (e.g., ['GOOG', 'AAPL', 'MSFT'])
208 elif isinstance(self.symbols, DataFrame):
/anaconda3/lib/python3.6/site-packages/pandas_datareader/base.py in _read_one_data(self, url, params)
82 """ read one data from specified URL """
83 if self._format == 'string':
---> 84 out = self._read_url_as_StringIO(url, params=params)
85 elif self._format == 'json':
86 out = self._get_response(url, params=params).json()
/anaconda3/lib/python3.6/site-packages/pandas_datareader/base.py in _read_url_as_StringIO(self, url, params)
93 Open url (and retry)
94 """
---> 95 response = self._get_response(url, params=params)
96 text = self._sanitize_response(response)
97 out = StringIO()
/anaconda3/lib/python3.6/site-packages/pandas_datareader/base.py in _get_response(self, url, params, headers)
153 msg += '\nResponse Text:\n{0}'.format(last_response_text)
154
--> 155 raise RemoteDataError(msg)
156
157 def _get_crumb(self, *args):
RemoteDataError: Unable to read URL: https://finance.google.com/finance/historical?q=INPX&startdate=Jan+01%2C+2000&enddate=Dec+31%2C+2016&output=csv
Response Text:
b'Sorry... body { font-family: verdana, arial, sans-serif; background-color: #fff; color: #000; }GoogleSorry...We\'re sorry...... but your computer or network may be sending automated queries. To protect our users, we can\'t process your request right now.See Google Help for more information.Google Home'.
Thank you so much for helping!
A small change as discussed here worked for me. Just use
import pandas_datareader.data as web
sp500 = web.get_data_yahoo('SPY', start=start, end=end)
The error is self-explanatory; the Yahoo API has changed, so the old Pandas code to read from Yahoo's API no longer works. Have you read this discussion about the API change and its impact on Pandas? Essentially, Pandas can't read the new Yahoo API, and it will take a long time to write new code, so the temporary solution is to raise an ImmediateDeprecationError every time someone tries to use Pandas for the Yahoo API.
It is obvious that the get_data_yahoo API no longer works.
Here is my solution:
First, install fix_yahoo_finance:
pip install fix_yahoo_finance --upgrade --no-cache-dir
Next, before you use the API, insert this code:
import fix_yahoo_finance as yf
yf.pdr_override()
Best wishes!
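For reference, a minimal sketch of that override wired up end to end (ticker and dates taken from the question; assuming fix_yahoo_finance is installed):
import datetime as dt
import fix_yahoo_finance as yf
from pandas_datareader import data as pdr

yf.pdr_override()  # patch pandas_datareader's Yahoo reader

start = dt.datetime(2000, 1, 1)
end = dt.datetime(2016, 12, 31)
df = pdr.get_data_yahoo('INPX', start=start, end=end)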
Last night I typed up the following:
from pandas.io.data import Options
import csv
symList = []
optData = {}
with open('C:/optionstrade/symbols.txt') as symfile:
symreader = csv.reader(symfile, delimiter=',')
for row in symreader:
symList = row
for symbol in symList:
temp = Options(symbol,'yahoo')
try:
optData[symbol] = temp.get_all_data()
except:
pass
It worked alright. I only got data from 200 something of the 400 something symbols I have in the file, but it pulled the options data for those 200 something just fine.
This morning, I go to run the code again (markets have been open for nearly an hour) and I get nothing:
In [6]: len(optData)
Out[6]: 0
So I run a bit of a test:
test = Options('AIG','yahoo')
spam = test.get_all_data()
import pickle
with open('C:/optionstrade/test.txt','w') as testfile:
pickle.dump(test,testfile)
I get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-902aa7c31f7e> in <module>()
1 test = Options('AIG','yahoo')
----> 2 spam = test.get_all_data()
C:\Anaconda\lib\site-packages\pandas\io\data.pyc in get_all_data(self, call, put)
1109
1110 for month in months:
-> 1111 m2 = month.month
1112 y2 = month.year
1113
AttributeError: 'str' object has no attribute 'month'
And this is the content of the pickled file:
ccopy_reg
_reconstructor
p0
(cpandas.io.data
Options
p1
c__builtin__
object
p2
Ntp3
Rp4
(dp5
S'symbol'
p6
S'AIG'
p7
sb.
Nothing changed overnight on my end... the last thing I did was save and shut down. The first thing I did after waking up was run it again.