How can I split DataFrame (pandas) on pages with django paginator?

This is easy for a Series: I just pass it to the paginator. But when I pass a DataFrame, it raises "The truth value of a Series is ambiguous". Maybe the problem is with the count method, but I don't know how to change it. In my project the DataFrame must be split into pages by rows.
def listing(request):
    contact_list = pd.DataFrame(np.arange(12).reshape(4,3))
    paginator = Paginator(contact_list, 1) # Show 1 row per page
    page = request.GET.get('page')
    try:
        contacts = paginator.page(page)
    except PageNotAnInteger:
        # If page is not an integer, deliver first page.
        contacts = paginator.page(1)
    except EmptyPage:
        # If page is out of range (e.g. 9999), deliver last page of results.
        contacts = paginator.page(paginator.num_pages)
    return render(request, 'list.html', {'contacts': contacts})

You have to separate your columns, paginate each column on its own, and then put the pages back together, since a DataFrame iterates over columns. Separating each column gives the iterator the chance to go through the rows:
contact_list = pd.DataFrame(np.arange(12).reshape(4,3))
# A DataFrame built from a bare array has integer column labels 0, 1, 2
paginator1 = Paginator(contact_list[0], 1)
paginator2 = Paginator(contact_list[1], 1)

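The problem may be caused by the fact that DataFrame.__iter__ iterates by column rather than by row. You could use df.iterrows() or df.values if you want the rows of your dataframe instead. Note that df.iterrows() returns a generator, which has no length, so wrap it in list(...) before passing it to the paginator.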

I have tested the following code with my own dataframes and it works. Just convert your dataframe to a list of dicts and the paginator should work fine.
Sorry for the late response, I only just came across this question.
records = df.to_dict(orient='records')
paginator = Paginator(records, num_of_items)
page = request.GET.get('page')
records = paginator.get_page(page)
return render(request, 'list.html', {'contacts': records})
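To see why a list of row dicts paginates cleanly, it may help to note that a paginator mostly needs two things from its input: a length and slicing. Here is a minimal plain-Python sketch of that idea. It is not Django's Paginator, and page_rows is a hypothetical helper name; the rows are hand-written in the shape that df.to_dict(orient='records') would produce for the 4x3 example, assuming columns named a, b and c.

```python
import math


def page_rows(records, page, per_page):
    """Return the rows for a 1-based page number, clamped to the valid range."""
    num_pages = max(1, math.ceil(len(records) / per_page))
    page = min(max(page, 1), num_pages)
    start = (page - 1) * per_page
    # A list supports len() and slicing, which is all this needs.
    return records[start:start + per_page]


# Rows shaped like the output of df.to_dict(orient='records') (assumed column names).
records = [
    {"a": 0, "b": 1, "c": 2},
    {"a": 3, "b": 4, "c": 5},
    {"a": 6, "b": 7, "c": 8},
    {"a": 9, "b": 10, "c": 11},
]

first_page = page_rows(records, 1, 1)   # one row per page, first page
last_page = page_rows(records, 99, 1)   # out-of-range page falls back to the last page
```

Each page is then a short list of dicts, so a template can loop over the rows and read the values by column name.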

Related

Why does a Where clause cause more data to be returned?

I have a script that uses shareplum to get items from a very large and growing SharePoint (SP) list. Because of the size, I encountered the dreaded 5000 item limit set in SP. To get around that, I tried to page the data based on the 'ID' with a Where clause on the query.
# this is wrapped in a while.
# the idx is updated to the latest max if the results aren't empty.
df = pd.DataFrame(columns=cols)
idx = 0
query = {'Where': [('Gt', 'ID', str(idx))], 'OrderBy': ['ID']}
data = sp_list.GetListItems(view, query=query, row_limit=4750)
df = df.append(pd.DataFrame(data[0:]))
That seemed to work but, after I added the Where, it started returning rows not visible on the SP web list. For example, the minimum ID on the web is, say, 500 while shareplum returns rows starting at 1. It also seems to be pulling in rows that are filtered out on the web. For example, it includes column values not included on the web. If the Where is removed, it brings back the exact list viewed on the web.
What is it that I'm getting wrong here? I'm brand new to shareplum; I looked at the docs but they don't go into much detail and all the examples are rather trivial.

After further investigation, it seems shareplum will ignore any filters applied to the list to create the view when a query is provided to GetListItems. This is easily verified by removing the query param.
As a workaround, I'm now paging 'All Items' with a row_limit and query as below. This at least lets me get all the data and do any further filtering/grouping in python.
df = pd.DataFrame(columns=cols)
idx = 0
more = True
while more:
    query = {'Where': [('Gt', 'ID', str(idx))]}
    # Page 'All Items' based on 'ID' > idx
    data = sp_list.GetListItems('All Items', query=query, row_limit=4500)
    data_df = pd.DataFrame(data[0:])
    if not data_df.empty:
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        df = pd.concat([df, data_df])
        ids = pd.to_numeric(data_df['ID'])
        idx = ids.max()
    else:
        more = False
Why shareplum behaves this way is still an open question.
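The paging pattern used here (ask for rows with an ID greater than the last one seen, stop on an empty batch) does not depend on shareplum. Below is a minimal sketch of it in plain Python. The names fetch_all, fake_fetch_page and FAKE_LIST are hypothetical, and the in-memory list only stands in for the SharePoint list; it assumes each batch comes back as dicts with a string 'ID', as in the code above.

```python
def fetch_all(fetch_page, page_size):
    """Collect every row by repeatedly asking for rows with ID > last seen ID."""
    rows = []
    last_id = 0
    while True:
        batch = fetch_page(last_id, page_size)
        if not batch:
            break  # an empty batch means there is nothing left to page through
        rows.extend(batch)
        # IDs arrive as strings, so convert before taking the maximum.
        last_id = max(int(row["ID"]) for row in batch)
    return rows


# Hypothetical in-memory stand-in for the remote list (IDs 1..11 as strings).
FAKE_LIST = [{"ID": str(i)} for i in range(1, 12)]


def fake_fetch_page(after_id, limit):
    """Return at most `limit` rows whose ID is greater than `after_id`."""
    matching = [row for row in FAKE_LIST if int(row["ID"]) > after_id]
    return matching[:limit]


all_rows = fetch_all(fake_fetch_page, page_size=4)
```

Because each request is keyed on the last ID rather than on an offset, the loop keeps working while the list grows, as long as every batch comes back in ascending ID order.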

Loop in web Table

I am new to Python and I am trying to get some values from a table on a web page. I need to get the values highlighted in yellow on the page:
I have this code, which gets all the values in the "Instruments" column, but I don't know how to get the specific values:
body = soup.find_all("tr")
for Rows in body:
    RowValue = Rows.find_all('th')
    if len(RowValue) > 0:
        CellValue = RowValue[0]
        ThisWeekValues.append(CellValue.text)
Any suggestions?

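# Collect the id attribute of every element that has one
ids = [el.get_attribute('id') for el in driver.find_elements_by_xpath('//*[@id]')]
if 'Your element id' in ids:
    pass  # Do something
This could be one way to do it, since only the id is different.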

Looping through list and comparing to options in dropdown

I'm trying to loop through a list of IDs and submit each option with a value = id.
After submitting, I grab the resulting text I need from the last row of a table.
The basic functionality works. However, when I add more than one ID to the list, it only returns the result for the last item in the list.
Here is my code:
# Go To Email Logs
driver.get("https://website.com/manager/email_logs.php")
# variables
SaleIds = ['47832', '47842', '49859', '50898']
dropdown = Select(driver.find_element_by_id('emailspecialid'))
options = dropdown.options
for option in options:
    value = option.get_attribute('value')
    for id in SaleIds:
        if id == value:
            option.click()
            driver.find_element_by_tag_name('input').submit()
            result = driver.find_element_by_xpath('/html/body/table[1]/tbody/tr[last()]/td[4]').text
            driver.implicitly_wait(100)
            print(result)

Django: queryset "losing" a value

I have 2 models, one with a list of clients and the other with a list of sales.
My intention is to add sales rank value to the clients queryset.
all_clients = Contactos.objects.values("id", "Vendedor", "codigo", 'Nombre', "NombrePcia", "Localidad", "FechaUltVenta")
sales = Ventas.objects.all()
Once loaded I aggregate all the sales per client summing the subtotal values of their sales and then order the result by their total sales.
sales_client = sales.values('cliente').annotate(
    fact_total=Sum('subtotal'))
client_rank = sales_client.order_by('-fact_total')
Then I set the rank of those clients and store it under the "Rank" key in the same client_rank queryset.
a = 0
for rank in client_rank:
    a = a + 1
    rank['Rank'] = a
Everything fine up to now. When I print the results in the template I get the expected values in the "client_rank" queryset: "client name" + "total sales per client" + "Rank".
{'cliente': '684 DROGUERIA SUR', 'fact_total': Decimal('846470'), 'Rank': 1}
{'cliente': '699 KINE ESTETIC', 'fact_total': Decimal('418160'), 'Rank': 2}
etc....
The problem starts here
First, note that not all the clients in the "all_clients" queryset have actual sales in the "sales" queryset. So I must find which ones do have sales, assign them the "Rank" value, and assign a standard value to the ones that don't.
for subject in all_clients:
    subject_code = str(subject["codigo"])
    try:
        selected_subject = ranking_clientes.get(cliente__icontains=subject_code)
        subject['rank'] = selected_subject['Rank']
    except:
        subject['rank'] = "Some value"
The try always fails because "selected_subject" doesn't seem to have the "Rank" value. If I print "selected_subject" I get the following:
{'cliente': '904 BAHIA BLANCA BASKET', 'fact_total': Decimal('33890')}
Any clues on why I'm losing the "Rank" value? The original "client_rank" queryset still has that value included.
Thanks!

I presume that ranking_clientes is the same as client_rank.
The problem is that .get will always do a new query against the database. This means that any modifications you made to the dictionaries returned in the original query will not have been applied to the result of the get call.
You would need to iterate through your query to find the one you need:
selected_subject = next(client for client in ranking_clientes if subject_code in client['cliente'])
Note, this is pretty inefficient if you have a lot of clients. I would rethink your model structure. Alternatively, you could look into using a database function to return the rank directly as part of the original query.
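If scanning the ranked rows once per client is too slow, one option is to build a lookup dict once and reuse it. This is only a sketch with plain dicts standing in for the rows that .values() returns; in the real view you would materialize the queryset once, e.g. with list(client_rank). It assumes, based on the sample output above, that each 'cliente' string starts with the client code followed by a space, which is stricter than the icontains match in the question. rank_by_code is a hypothetical name.

```python
from decimal import Decimal

# Stand-in for list(client_rank): rows already ordered by total sales, descending.
ranked = [
    {"cliente": "684 DROGUERIA SUR", "fact_total": Decimal("846470")},
    {"cliente": "699 KINE ESTETIC", "fact_total": Decimal("418160")},
]
for position, row in enumerate(ranked, start=1):
    row["Rank"] = position

# Assumption: the client code is the first whitespace-separated token of 'cliente'.
rank_by_code = {row["cliente"].split()[0]: row["Rank"] for row in ranked}

# Stand-in for all_clients; 123 has no sales and gets the fallback value.
all_clients = [{"codigo": 699}, {"codigo": 684}, {"codigo": 123}]
for subject in all_clients:
    subject["rank"] = rank_by_code.get(str(subject["codigo"]), "Some value")
```

Each lookup is then a single dict access, and no extra database query is issued per client.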

Django+python: Use Order by to Django RawQuery objects?

I have almost 100 million product names in the DB. I display 100 products in the UI at a time and, after scrolling, show the next 100, and so on. For this I have used Django raw queries, as my database (MySQL) doesn't support the distinct functionality.
Here 'fetch' is a callback function defined in another file:
def fetch(query_string, *query_args):
    conn = connections['databaseName']
    with conn.cursor() as cursor:
        cursor.execute(query_string, query_args)
        record = dictfetchall(cursor)
    return record
Here is the main call in views.py.
A sample raw query snippet:
record = fetch("select productname from abc")
If I want to apply a sorting criterion to the records, I run:
record = fetch("select productname from abc order by name ASC")
I do the same for descending order. As a result, it takes a long time to display the sorted products.
What I want is to query once, store the result in a Python object, and then apply ascending or descending sorting to that object.
That way the first load will take some time, but applying a sorting criterion afterwards won't hit the database every time a sort is requested.
In short, I want to improve performance when sorting the records.

I think what you are looking for is pagination. This is an essential technique when you want to display data in batches (pages).
from django.core.paginator import Paginator, EmptyPage, PageNotAnInteger
from django.db import connections

def listing(request):
    query_string = 'your query'
    query_args = []
    conn = connections['databaseName']
    with conn.cursor() as cursor:
        # Pass the parameters as one sequence rather than unpacking them
        cursor.execute(query_string, query_args)
        all_records = dictfetchall(cursor)
    paginator = Paginator(all_records, 100) # Show 100 records per page
    page = request.GET.get('page')
    try:
        records = paginator.page(page)
    except PageNotAnInteger:
        # If page is not an integer, deliver first page.
        records = paginator.page(1)
    except EmptyPage:
        # If page is out of range (e.g. 9999), deliver last page of results.
        records = paginator.page(paginator.num_pages)
    return records
Whenever you make a request, you should include the page you want to display (e.g. 1, 2, 3...) in the URL parameters.
Example: GET http://localhost/products/?page=1
In terms of logic, your JavaScript should display the first page and keep a counter that holds the next page to request. After the user scrolls, make an AJAX request to get the second page, increase the page counter, and so on.
EDIT: As for the sorting, you can also use JavaScript to sort your data.

Below is what I tried, and it gave me the desired result.
I fetch the data from the database and store it as a list of dictionaries.
Assuming the data is stored in a list like this:
l = [{'productName': 'soap', 'id': 1}, {'productName': 'Laptop', 'id': 2}]
The snippet below sorts the list by a given key:
from operator import itemgetter
# Ascending
res = sorted(l, key=itemgetter('productName'))
# Descending
res = sorted(l, key=itemgetter('productName'), reverse=True)
