I have a script that uses shareplum to get items from a very large and growing SharePoint (SP) list. Because of the size, I encountered the dreaded 5000 item limit set in SP. To get around that, I tried to page the data based on the 'ID' with a Where clause on the query.
# This is wrapped in a while loop; idx is updated to the latest max ID
# whenever the results aren't empty.
df = pd.DataFrame(columns=cols)
idx = 0
query = {'Where': [('Gt', 'ID', str(idx))], 'OrderBy': ['ID']}
data = sp_list.GetListItems(view, query=query, row_limit=4750)
df = df.append(pd.DataFrame(data))
That seemed to work, but after I added the Where clause it started returning rows that aren't visible on the SP web list. For example, the minimum ID on the web is, say, 500, while shareplum returns rows starting at ID 1. It also pulls in rows that the web view filters out; for example, it includes column values that never appear on the web. If the Where clause is removed, it returns exactly the list shown on the web.
What is it that I'm getting wrong here? I'm brand new to shareplum; I looked at the docs but they don't go into much detail and all the examples are rather trivial.
Why does a Where clause cause more data to be returned?
After further investigation, it seems shareplum ignores any filters baked into the view whenever a query is passed to GetListItems. This is easily verified by removing the query param.
As a workaround, I'm now paging 'All Items' with a row_limit and a query, as below. This at least lets me pull all the data and do any further filtering/grouping in Python.
df = pd.DataFrame(columns=cols)
idx = 0
more = True
while more:
    # Page 'All Items' based on 'ID' > idx
    query = {'Where': [('Gt', 'ID', str(idx))]}
    data = sp_list.GetListItems('All Items', query=query, row_limit=4500)
    data_df = pd.DataFrame(data)
    if not data_df.empty:
        df = df.append(data_df)
        ids = pd.to_numeric(data_df['ID'])
        idx = ids.max()
    else:
        more = False
Why shareplum behaves this way is still an open question.
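If the view's filter is known, one possible workaround is to re-apply it inside the query itself, since shareplum's query syntax supports combining conditions with 'And'. A minimal sketch, assuming a hypothetical 'Status' column standing in for whatever the web view actually filters on:

# Sketch: re-apply the view's filter in the query itself, since the view's
# own filter appears to be ignored when a query is supplied.
# 'Status' and 'Active' are hypothetical; substitute the view's real filter.
query = {
    'Where': [
        'And',
        ('Gt', 'ID', str(idx)),      # paging condition
        ('Eq', 'Status', 'Active'),  # the filter the web view applies
    ],
    'OrderBy': ['ID'],
}
data = sp_list.GetListItems('All Items', query=query, row_limit=4500)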
I'm trying to loop through a list of IDs and submit each dropdown option whose value equals the ID.
After submitting, I grab the text I need from the last row of a table.
The basic functionality works; however, when I add more than one ID to the list, it only returns the result for the last item in the list.
Here is my code:
# Go to email logs
driver.get("https://website.com/manager/email_logs.php")

# Variables
SaleIds = ['47832', '47842', '49859', '50898']
dropdown = Select(driver.find_element_by_id('emailspecialid'))
options = dropdown.options
for option in options:
    value = option.get_attribute('value')
    for id in SaleIds:
        if id == value:
            option.click()
            driver.find_element_by_tag_name('input').submit()
            result = driver.find_element_by_xpath(
                '/html/body/table[1]/tbody/tr[last()]/td[4]').text
            driver.implicitly_wait(100)
            print(result)
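If the form submission reloads the page, the option elements collected before the loop go stale after the first click, which would explain getting only one result. A sketch that re-finds the dropdown on each iteration and selects by value directly; the URL and locators are copied from the question, and the assumption that the page reloads after each submit is mine:

from selenium import webdriver
from selenium.webdriver.support.ui import Select

driver = webdriver.Chrome()
driver.get("https://website.com/manager/email_logs.php")
SaleIds = ['47832', '47842', '49859', '50898']

for sale_id in SaleIds:
    # Re-locate the dropdown every time: the previous page's elements
    # become stale once the form submits and the page reloads.
    dropdown = Select(driver.find_element_by_id('emailspecialid'))
    dropdown.select_by_value(sale_id)
    driver.find_element_by_tag_name('input').submit()
    result = driver.find_element_by_xpath(
        '/html/body/table[1]/tbody/tr[last()]/td[4]').text
    print(sale_id, result)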
I have a query that reaches into a MySQL database and grabs the rows whose column "cab" matches a variable passed in from a previous HTML page. That variable is cabwrite.
The SQL side is working just fine: it queries and returns all the rows whose 'cab' column matches.
Once that happens, I remove the fields I don't need (the line identifier and cab).
The output from that is result_set.
However, when I print the data to verify it's what I expect, I'm met with the same data for every row I have.
Example data:
The query finds 4 matching rows.
This is currently what I'm getting:
> data =
> ["(g11,none,tech11)","(g2,none,tech13)","(g3,none,tech15)","(g4,none,tech31)"]
> ["(g11,none,tech11)","(g2,none,tech13)","(g3,none,tech15)","(g4,none,tech31)"]
> ["(g11,none,tech11)","(g2,none,tech13)","(g3,none,tech15)","(g4,none,tech31)"]
> ["(g11,none,tech11)","(g2,none,tech13)","(g3,none,tech15)","(g4,none,tech31)"]
Code:
cursor = connection1.cursor(MySQLdb.cursors.DictCursor)
cursor.execute("SELECT * FROM devices WHERE cab=%s", [cabwrite])
result_set = cursor.fetchall()

data = []
for row in result_set:
    localint = "('%s','%s','%s')" % (row["localint"], row["devicename"], row["hostname"])
    l = str(localint)
    data.append(l)
print(data)
This is what I want it to look like:
data = [(g11,none,tech11),(g2,none,tech13),(g3,none,tech15),(g4,none,tech31)]
["('Gi3/0/13','None','TECH2_HELP')", "('Gi3/0/7','None','TECH2_1507')", "('Gi1/0/11','None','TECH2_1189')", "('Gi3/0/35','None','TECH2_4081')", "('Gi3/0/41','None','TECH2_5625')", "('Gi3/0/25','None','TECH2_4598')", "('Gi3/0/43','None','TECH2_1966')", "('Gi3/0/23','None','TECH2_2573')", "('Gi3/0/19','None','TECH2_1800')", "('Gi3/0/39','None','TECH2_1529')"]
Thanks Tripleee, I did what you recommended and found my issue: a legacy FOR clause upstream in my code was causing the duplication.
The homework is a Python notebook project in Watson. The homework provides the code below for the function get_basketball_stats(link="..."). However, it returns an erroneous result: the dictionary's keys and values are mismatched, i.e. the key "PPG" is given "GP"'s values.
I tried the same code in Google Colab and the result is correct. Google Colab's Python version is 3.6.7. I suspect that the outdated Python version in Watson (3.5.5) causes the erroneous dictionary, hence my question: how do I upgrade Watson's Python version?
import requests
import bs4

def get_basketball_stats(link='https://en.wikipedia.org/wiki/Michael_Jordan'):
    # read the webpage
    response = requests.get(link)
    # create a BeautifulSoup object to parse the HTML
    soup = bs4.BeautifulSoup(response.text, 'html.parser')
    # the player stats are in the table whose CSS class is 'wikitable sortable';
    # we create a tag object "table" for it
    table = soup.find(class_='wikitable sortable')
    # the table headers are in the first table row (tr)
    headers = table.tr
    # the column names are displayed as abbreviations; find_all returns all the abbr tags
    titles = headers.find_all("abbr")
    # create a dictionary with the table headers as the keys
    data = {title['title']: [] for title in titles}
    # each column is stored as a list in the dictionary, keyed by its header;
    # iterate over each table row (tr) after the header row
    for row in table.find_all('tr')[1:]:
        # iterate over each cell in the row, pairing it with its column's key
        for key, a in zip(data.keys(), row.find_all("td")[2:]):
            # append each element, stripping any extra HTML content
            data[key].append(''.join(c for c in a.text if (c.isdigit() or c == ".")))
    # remove extra rows by truncating each column to the smallest list
    Min = min([len(x) for x in data.values()])
    # convert the elements of each column to floats
    for key in data.keys():
        data[key] = list(map(lambda x: float(x), data[key][:Min]))
    return data
I expect the keys to match their corresponding values in Watson like they do in Google Colab.
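For what it's worth, on Python 3.5 a plain dict does not preserve insertion order, so zip(data.keys(), ...) is not guaranteed to pair each column with the header it was built from; CPython 3.6+ happens to keep insertion order, which is likely why Colab works. A sketch that should make the pairing deterministic on either version, swapping collections.OrderedDict into the function body above (table and titles are as defined in the question's code):

from collections import OrderedDict

# Build the dict with a type that preserves insertion order on Python 3.5,
# so data.keys() lines up with the column order of the table headers.
data = OrderedDict((title['title'], []) for title in titles)

for row in table.find_all('tr')[1:]:
    for key, a in zip(data.keys(), row.find_all("td")[2:]):
        data[key].append(''.join(c for c in a.text if (c.isdigit() or c == ".")))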
I am trying to either merge or concatenate tables that I am generating through a loop in Python.
Here's what I have:
for i in some_list:
    # replace with the ith term to request that particular value
    url = "https://some_url/%s" % str(i)
    # access the table corresponding to my request
    request = pd.read_html(url)[0]
    # request1 is a table with the same columns as request
    request1 = request1.merge(request, how='outer')
request1
Essentially I want to add on to my original request1 table, which has the same columns as the request table. However, I am getting an error:
" You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat"
You may want to use pd.concat:
dflist = []
for i in some_list:
    # replace with the ith term to request that particular value
    url = "https://some_url/%s" % str(i)
    # access the table corresponding to my request
    request = pd.read_html(url)[0]
    dflist.append(pd.DataFrame(request))  # adding the DataFrame constructor here
request1 = pd.concat(dflist)
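As a side note, the original error suggests the shared columns have different dtypes across pages (object in one table, float64 in another), which can happen because pd.read_html infers types per page. If a merge is genuinely needed, one approach is to cast the join columns to a common dtype first. A sketch with hypothetical frames and a hypothetical join column 'key' standing in for the scraped tables:

import pandas as pd

# Hypothetical frames standing in for two scraped pages where pd.read_html
# inferred different dtypes for the same column.
request1 = pd.DataFrame({'key': ['1', '2'], 'val_a': [10, 20]})  # 'key' is object
request = pd.DataFrame({'key': [2.0, 3.0], 'val_b': [30, 40]})   # 'key' is float64

# Cast both join columns to a common dtype before merging.
request['key'] = request['key'].astype(int).astype(str)
request1['key'] = request1['key'].astype(str)
print(request1.merge(request, how='outer', on='key'))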