Display a huge file on GUI [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am very new to the programming world. I have started learning Python. I am trying to develop a program that takes user input, runs some commands in the background, and then displays the output on a web interface. The real program is a little more complex, but for simplicity you can think of it as:
user enters a filename
python code runs "cat filename"
output is displayed on the screen.
I have a Python + Django + Apache setup and I am using HTML to display the output. It works fine as long as the output being returned is reasonably small. But sometimes I have huge output (~700 MB), and then I obviously have difficulty displaying it on the web interface.
Code Snippet:
op = subprocess.check_output(cmd, universal_newlines=True)
return render(request, 'display.html', {'op': op})
What is the best approach to display such large content on web? Some points I wanted to put on the table:
Store the output in a file rather than a string and read the file in chunks. In this case, can my code write to the file and read it for display at the same time? I don't think it is a good idea, but wanted to check.
Are there any python packages intended to resolve such problem? Any django feature?
When I have huge output, it takes a while for the program to get the output and then another while to display it on the screen.

You can't display 700 MB of alphanumeric data at once on any regular screen, and even if you could, a user would drown in it. So you should display portions and enable scrolling or selection.

As you indicate, loading all this data costs quite some time. But since you don't view it in one piece, you also don't have to load it in one piece. Rather than refreshing your whole page, use Ajax (as implemented in e.g. jQuery) to have the browser load your data into an appropriate HTML/DOM element in portions, as indicated by the user who e.g. moves a scrollbar. Given that 700 MB is too much even in that case, a better way would be to select data for display using some type of hierarchical (tree) control or another non-sequential selection mechanism.

The only task of Django here, apart from serving the page itself, is serving the requested data portion, e.g. as JSON data. The page holding both the displayed data portion and the selection controls is served only once.
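The server side of that "serve portions" idea can be sketched in a few lines. This is a minimal illustration, not the asker's code: it assumes the command output has already been written to a file, and all names and parameters here are my own.

```python
# A helper an Ajax endpoint could call: return one slice of a large
# output file, plus the bookkeeping the browser needs for the next request.
def read_chunk(path, offset, size=64 * 1024):
    with open(path, "rb") as f:
        f.seek(offset)       # jump straight to the requested slice
        data = f.read(size)  # read at most `size` bytes
    return {
        "offset": offset,
        "next": offset + len(data),  # where the next request should start
        "eof": len(data) < size,     # a short read means end of file
        "text": data.decode("utf-8", errors="replace"),
    }

# In a Django view this would be roughly:
#   return JsonResponse(read_chunk(OUTPUT_PATH, int(request.GET.get("offset", 0))))
```

The browser then asks for `offset=0`, renders the text, and requests `next` when the user scrolls, so no request ever moves more than 64 KB.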

Take a look at some of the jQuery Infinite Scrolling options here.

Related

Merging excel edits from several users into a master excel workbook [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 1 year ago.
I have some Excel files with several workbooks in each. All of the columns are defined, and I now need several users to go in and add data. Each user makes edits and sends back the document with their info added. Then someone copies and pastes all the changes into the master workbooks. This is a very time-consuming process, and I was wondering if I can make it quicker with Python. My idea is this: load the master and the edited spreadsheet into a dataframe, loop through each cell in both, and do the following.
If a cell in the master is blank but has data in the edited file, then copy the info in that cell to the master.
If the cell in the master has data but it is not the same as the edited file then overwrite the cell in the master with that info.
If the cell in the master has the exact same info as the edited file then do nothing to the cell.
If the cell in the master and the edited file are both blank do nothing to the cell.
Is this a possible solution, or is my logic way off and there is an easier way to do this? These Excel docs are huge, so the copy-paste method takes a very long time, and I would like to speed up this process for my team. So basically I want to add each user's edits into the master without overwriting any other user's edits in the process. Thanks in advance for your help with this.
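For what it's worth, the four rules above collapse to "a non-blank cell in the edited copy wins", which pandas can express in a single call. This is a hedged sketch with made-up column names, and it assumes both sheets line up row-for-row and column-for-column:

```python
import pandas as pd

def merge_edits(master: pd.DataFrame, edited: pd.DataFrame) -> pd.DataFrame:
    """Apply the four rules: any non-blank cell in the edited copy wins;
    blank cells in the edited copy leave the master untouched."""
    return edited.combine_first(master)

# Tiny in-memory demo; in practice you would load the workbooks with
# pd.read_excel(...) and save the result with .to_excel(...).
master = pd.DataFrame({"serial": [1, 2, 3], "owner": ["alice", None, "carol"]})
edited = pd.DataFrame({"serial": [1, 2, 3], "owner": [None, "bob", "dave"]})
merged = merge_edits(master, edited)
# merged["owner"]: alice (kept), bob (filled in), dave (overwritten)
```

`combine_first` takes each cell from `edited` unless it is blank, falling back to `master`, which covers all four cases at once.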
This is certainly possible in Python, but if you are looking for continued ease of use for your team, building it as a macro in VBA may be better. A user could go into the submitted document, click a button on the ribbon, and it would work through the process with a relatively nice UI. While Python may be nominally faster, you could be asking your team to interact with a terminal interface (using py2exe or PyInstaller), or you may need to install Python on their workstations (to use xlwings, for example).
I'm not sure of the exact layout of your data, but here's an example of how the VBA process could work:
User opens edited file and clicks macro button on ribbon
Macro asks for user to select Master Document (assuming Master Document is not always the same)
Based on a Sequence/ID column (usually column A), insert a helper column to the right of the data that validates whether the entered data is the same as or different from the Master Document (you could use CONCAT() to create one cell of data for comparison if there are multiple columns they're editing)
For items that are different (would include both where they're blank or changed), copy those respective ranges over
There are many ways to break down the logic into executable code, so that is just an example of one possibility. Python is a powerful language but for automating Excel data entry or administrative tasks like this request, VBA is often the easiest way forward.

How to loop over hundreds of images in Qualtrics - help needed to implement code

I would like to use Qualtrics to get ratings for over 700 images (i.e., participants will have to indicate how negative or positive they find them). The question will be exactly the same for each image, yet there seems to be no straightforward way to just create the question once and then loop over all the images I want participants to rate. Obviously, I don't really want to write the same question 700 times.
I found a relevant answer here on Stack Overflow that seems to suggest a good solution: Randomization in Qualtrics using Photos or Graphics and Loop and Merge. My question, however, is not a mere duplicate of this, as I have trouble running the web scraping code at the bottom (I am a very inexperienced coder with limited Python knowledge) and thus have some follow-up questions.
I tried running the above mentioned code a number of different ways (I have BeautifulSoup and Selenium):
1) create a .py file (e.g., getURL.py) with all the code suggested, and then just run it from the PowerShell (the only way I know how to run python code) with
python getURL.py
This opens up a Chrome browser (data:,), but the file it creates in the end is empty. I'm guessing by the time I navigate to the library the code has already run and reached the end of the for loop.
2) I tried running it line by line in the Python interpreter in PowerShell, like this: I would go through the first block, which brings up the browser, then I'd navigate to the photo library within the browser (am I supposed to do anything other than just bringing up the site in the browser?). Once that's done, I'd move on to the next block of code with the for loop - I'd paste it into the interpreter, either as a block or line by line - but Python just doesn't seem to execute it.
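Regarding attempt 1: the script needs to wait until the browser is actually on the library page before it scrapes, and a common trick is to block on input() and only then parse driver.page_source. A hedged sketch of the scraping half (the Selenium lines are shown as comments, and the sample HTML is made up):

```python
from bs4 import BeautifulSoup

def extract_image_urls(page_html):
    """Collect the src of every <img> tag in a page's HTML."""
    soup = BeautifulSoup(page_html, "html.parser")
    return [img["src"] for img in soup.find_all("img") if img.get("src")]

# With Selenium, pause so you can log in and navigate first:
#   driver = webdriver.Chrome()
#   driver.get("https://www.qualtrics.com")
#   input("Navigate to your photo library, then press Enter...")
#   urls = extract_image_urls(driver.page_source)

sample = '<div><img src="a.png"><img alt="no src"><img src="b.png"></div>'
urls = extract_image_urls(sample)
```

The input() call is what keeps the browser open long enough, which is exactly what attempt 1 was missing.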
Could anyone tell me how to run that code?
ALTERNATIVELY: does anyone know an easier way a) to get Qualtrics to loop over images, or b) to get a list of 700 URLs quickly?
Thank you very much for the help.
You didn't say where the images are stored. If they are stored in Qualtrics this is fairly easy:
Get a list of photo urls or image ids (If photos are stored in your Qualtrics library, go Account Setting/Qualtrics IDs. Then click on the library where the photos are stored. Copy the image ids and paste them in a spreadsheet.)
Edit your spreadsheet as needed.
Copy and paste your urls or image ids from the spreadsheet into the loop & merge setup. This can be done all at one time.
Create your question in the loop & merge block. Include an html <img> tag in the appropriate place. You'll pipe your url or image id into the appropriate place in the src attribute. For example, if you are using image ids with a name in field 1 and the image id in field 2, the html might look like:
<img src="https://survey.qualtrics.com/ControlPanel/Graphic.php?IM=${lm://Field/2}" alt="${lm://Field/1}" border="0">

Python or Powershell - Import Folder Name & Text file Content in Folder into Excel [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have been looking at some Python modules and PowerShell capabilities to try to import some data that a database recently kicked out in the form of folders and text files.
File Structure:
Top Level Folder > FOLDER (Device hostname) > Text File (also contains the hostname of device) (with data I need in a single cell in Excel)
The end result I am trying to accomplish is have the first cell be the FOLDER (device name) and the second column contain the text of the text file within that folder.
I found some Python modules, but they all focus on pulling directly from a text doc. I want the script or PowerShell function to iterate through each folder and pull out both the folder name and the text.
This is definitely doable in PowerShell. If I understand your question correctly, you're going to want to use Get-ChildItem and Get-Content, with -Recurse if necessary. As for export, you're going to want Out-File, which can be a hassle when exporting directly to xlsx. If you had some code to work with I could help better, but until then this should get you started in the right direction. I would read up on the Get- commands, because PowerShell is very simple to write but powerful.
Well, since you're asking in a general sense: you can do this project simply in either scripting language. If it were me, and I had to do this job once, I'd probably just bang out a PoSH script to do it. If the script had to be run repeatedly, and potentially grow more complex functions, I'd probably switch to Python (but that's just my personal preference, as PoSH is pretty powerful in a Windows environment). Really, this seems like a 15-line recursive function in either language.
You can look up "recursive function in powershell" and the same for Python and get plenty of code examples, as file tree walks (FTW :) ) are one of the most solved problems on sites like this. You'd just replace whatever the other person is doing in their walk with a read of the file and a write. You probably also want to output in CSV format, because it's easier and imports into Excel just fine.
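Since the question allows Python too, the whole walk-and-write could look roughly like this. It's a sketch: the folder layout follows the question's description, but the file and column names are my own assumptions.

```python
import csv
from pathlib import Path

def folders_to_csv(top_level, out_csv):
    """Walk top_level/<device>/<*.txt> and write one CSV row per text
    file: folder (device) name in column 1, file contents in column 2."""
    with open(out_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["device", "contents"])
        for folder in sorted(p for p in Path(top_level).iterdir() if p.is_dir()):
            for txt in sorted(folder.glob("*.txt")):
                writer.writerow([folder.name, txt.read_text()])

# Usage (hypothetical paths):
#   folders_to_csv("C:/exports/TopLevelFolder", "devices.csv")
```

The resulting CSV opens directly in Excel, with the whole text file landing in a single cell as the question asks.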

How to cache data to be used in multiple ways at a single URL

Let's say I have a page I'd like to render which will present some (expensive to compute) data in a few ways. For example, I want to hit my database and get some large-size pile of data. Then I want to group that data and otherwise manipulate it in Python (for example, using Pandas). Say the result of this manipulation is some Pandas DataFrame that I'll call prepped_data. And say everything up to this point takes 3 seconds. (I.e. it takes a while...)
Then I want to summarize that data at a single URL (/summary): I'd like to show a bar graph, a pie chart and also an HTML table. Each of these elements depends on a subset of prepped_data.
One way I could handle this is to make 3 separate views hooked up to 3 separate URLs. I could make pie_chart_view, which would make a dynamically generated pie chart available at /piechart.svg. I could make bar_graph_view, which would make a dynamically generated bar graph available at /bargraph.svg. And I could make summary_view, which would finish by rendering a template. That template would use context variables generated by summary_view itself to build my HTML table, and it would include the graphs by linking to their URLs from within the template. In this structure, all 3 view functions would need to independently calculate prepped_data. That seems less than ideal.
As an alternative, I could turn on some kind of caching. Maybe I could make a view called raw_data_view which would make the data itself available at /raw_data.json. I could set it to cache itself (using whatever Django caching backend) for a short time (30 seconds?). Then each of the other views could hit this URL to get its data, and that way I could avoid doing the expensive calculation 3 times. This seems a bit dicey as well, though, because there's some real judgement involved in setting the cache time.
One other route could involve creating both graphs within summary_view and embedding the graphics directly within the rendered HTML (which is possible with .svg). But I'm not a huge fan of that since you wind up with bulky HTML files and graphics that are hard for users to take with them. More generally, I don't want to commit to doing all my graphics in that format.
Is there a generally accepted architecture for handling this sort of thing?
Edit: How I'm making the graphs:
One comment asked how I'm making the graphs. Broadly speaking, I'm doing it in matplotlib. So once I have a Figure I like generated by the code, I can save it to an svg easily.
I think your idea to store the prepped data in a file is a good one. I might name the file something like this:
/tmp/prepped-data-{{session_id}}.json
You could then just have a function in each view called get_prepped_data(session_id) that either computes it or reads it from the file. You could also delete old files when that function is called.
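A hedged sketch of what that helper might look like, using JSON as the serialization and a 30-second age limit as an arbitrary choice (the names here are mine, not an established API):

```python
import json, os, tempfile, time

CACHE_DIR = tempfile.gettempdir()  # e.g. /tmp
MAX_AGE = 30  # seconds -- an arbitrary choice, tune to taste

def get_prepped_data(session_id, compute):
    """Return cached prepped data for this session, recomputing only
    when the cache file is missing or older than MAX_AGE."""
    path = os.path.join(CACHE_DIR, "prepped-data-%s.json" % session_id)
    try:
        if time.time() - os.path.getmtime(path) < MAX_AGE:
            with open(path) as f:
                return json.load(f)
    except OSError:
        pass  # no cache file yet
    data = compute()  # the expensive multi-second step
    with open(path, "w") as f:
        json.dump(data, f)
    return data
```

Each of the three views then calls get_prepped_data(session_id, expensive_query) and only the first one within the window pays the 3 seconds.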
Another option would be to store the data directly in the user's session so it is cleaned up when their session goes away. The feasibility of this approach depends a bit on how much data needs to be stored.

Takes an "eternity" to run my Python script [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
Closed 9 years ago.
I have a Python script which loads binary data from any targeted file and stores it inside itself, in a list. The problem is that the bigger the stored file is, the longer the script takes to open the next time. Let's say I want to load a 700 MB movie and store it in my script file. Then imagine I open the script the next day with the 700 MB of data stored in it. It takes an eternity to open!
Here is a simplified layout of how the script file looks.
Line 1: "The 700 MB movie is stored here inside a list."
Everything below : "All functions that the end-user uses."
Before the interpreter reaches the functions the user is waiting to call, it first has to parse the 700 MB of data on line 1! This is a problem, because who wants to wait an hour just to open a script?
So, would it help if I changed the layout of the file like this?
First lines: "All functions that the end-user uses." Below : "The 700
MB movie is stored here inside a list."
Would that help? Or would the interpreter have to plow through all 700 MB before the functions were called anyway?
The Python compiler works in a way that makes what you are trying to do very, very hard, to say the least.
First, every time you change the script (by adding the file, for example), it triggers a new cycle of compilation before execution (turning the .py file into a .pyc one).
Second, every time you import the module, that large block of data is loaded into memory (whether on import or when you first access the data).
This is not just slow, it's also unsafe and error-prone.
I'm guessing that what you intend to do is distribute one single file with the data in it.
You might be able to do that using this little trick: making an executable Python package (basically a zip file with a __main__.py entry point). Building the zip file is very easy using the zipfile module.
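As an illustration of that trick (file names and the 1 KB stand-in payload are made up): the archive gets a __main__.py entry point, and the big data lives in the zip as a plain member that is read at runtime instead of being parsed as source.

```python
import zipfile

# The entry point reads the payload out of the archive it lives in;
# when run as "python app.pyz", sys.argv[0] is the path to the .pyz.
entry = (
    "import sys, zipfile\n"
    "data = zipfile.ZipFile(sys.argv[0]).read('payload.bin')\n"
    "print(len(data))\n"
)

with zipfile.ZipFile("app.pyz", "w") as zf:
    zf.writestr("__main__.py", entry)
    zf.writestr("payload.bin", b"\x00" * 1024)  # stand-in for the movie

# Run with: python app.pyz  -- the interpreter never parses payload.bin
```

Because the data is a zip member rather than a Python literal, startup time no longer scales with the size of the stored file.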
