Is it okay to use Python pandas to manipulate tabular data within a Flask/Django web application?
My web app receives blocks of data, which we visualise in chart form on the web. We then provide the user with some data manipulation operations, such as sorting the data or deleting a given column. We have our own custom code to perform these operations, but it would be much easier to do with pandas. I'm not sure whether that is a good idea or not.
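For reference, the operations we need would look something like this in pandas (the column names are just examples):

    import pandas as pd

    # Build a DataFrame from an incoming block of data
    df = pd.DataFrame({"name": ["b", "a", "c"], "value": [2, 1, 3]})

    # Sort the data by a column
    df = df.sort_values("value")

    # Delete a given column
    df = df.drop(columns=["name"])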
It's a good question. Pandas can be used if the dataset isn't too big. If the dataset is really big, I think you could use Spark DataFrames or RDDs, and if the data grows over time you can think about streaming data with PySpark.
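For example, a rough PySpark sketch of the same kind of operations (the file path and column names are placeholders):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tabular-ops").getOrCreate()

    # Spark reads and processes the data in a distributed way, so it
    # scales to datasets that don't fit in one machine's memory
    df = spark.read.csv("data.csv", header=True, inferSchema=True)
    df = df.orderBy("value").drop("name")
    df.show()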
Actually yes, but don't forget to move your computation into a separate process if it takes too long.
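A minimal sketch of that idea with a process pool (the endpoint and function names are just placeholders):

    from concurrent.futures import ProcessPoolExecutor

    import pandas as pd
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    # Pool of worker processes so heavy pandas work doesn't block the web workers
    executor = ProcessPoolExecutor(max_workers=2)

    def heavy_transform(records):
        # Runs in a separate process; arguments must be picklable
        df = pd.DataFrame(records)
        return df.sort_values(df.columns[0]).to_dict(orient="records")

    @app.route("/transform", methods=["POST"])
    def transform():
        future = executor.submit(heavy_transform, request.get_json())
        return jsonify(future.result())

Note that future.result() still makes the request wait for the answer; for really long jobs you would typically return a job id and poll for it, or reach for a task queue like Celery.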
I just need some advice about which database I should use and how I should store my data.
Namely, I need to store a big chunk of data per user. I was thinking about storing everything as JSON data, but I thought I would ask here first.
I am using Django, and for now MySQL. I need to store around 1000-2000 table rows per user, with columns like First Name, Last Name, and Contact Info, and I also need to relate each row to the user who created that list. I also need to be able to query this data from the database efficiently.
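What I have in mind is roughly this (the field names are simplified):

    from django.conf import settings
    from django.db import models

    class Contact(models.Model):
        # Each row belongs to the user who created the list
        owner = models.ForeignKey(
            settings.AUTH_USER_MODEL,
            on_delete=models.CASCADE,
            related_name="contacts",
        )
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)
        contact_info = models.CharField(max_length=255)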
Is there a good way of storing this much data per user?
Thank you!
I know pandas is a library that works very well for storing data, so maybe look into that and see which file formats are well documented with it.
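For instance, a minimal sketch of writing per-user data to a file with pandas (the path and columns are just examples):

    import pandas as pd

    df = pd.DataFrame(
        {"first_name": ["Ada"], "last_name": ["Lovelace"], "contact_info": ["ada@example.com"]}
    )

    # CSV needs no extra dependencies; Parquet (via pyarrow) or HDF5 are
    # better-compressed alternatives that pandas also documents well
    df.to_csv("user_42_contacts.csv", index=False)
    df = pd.read_csv("user_42_contacts.csv")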
Usually, for me, the workflow is loading data from the SQL database on a server first, then manipulating it later with pandas on my computer.
However, many others preprocess some of the data in SQL first (with CASE expressions and the like) and then do the rest with pandas.
So I wonder which is better, and why? Thanks!
This question is quite general. For a more specific answer, we would need to know more about your setup.
I'll make some assumptions to answer your question: I assume your database is running on a server and your Python code is executed on your local machine.
In this case, you have to consider at least two things:
- the data transmitted over the network
- the data processing itself
If you make a general SQL request, large amounts of data are transmitted over the network. Next, your machine has to process the data. Your local machine might be less powerful than the server.
On the other hand, if you submit a specific SQL request, the powerful server can process the data and only return the data you are actually interested in.
SQL queries can get long and hard to understand, since you have to pass them as one statement. In Python, you have the possibility to process the data over multiple lines of code.
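To make the trade-off concrete, here is a sketch of both approaches with pandas (the connection string, table, and column names are placeholders):

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://user:password@dbhost/mydb")

    # Option 1: general request -- the whole table travels over the
    # network, and your local machine does the aggregation
    df = pd.read_sql("SELECT * FROM orders", engine)
    totals = df.groupby("customer_id")["amount"].sum()

    # Option 2: specific request -- the server aggregates and only the
    # result rows are transmitted
    totals = pd.read_sql(
        "SELECT customer_id, SUM(amount) AS total "
        "FROM orders GROUP BY customer_id",
        engine,
    )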
I am working on a project that involves handling a large amount of data. Essentially, there is a large repository of Excel files on a website that can be downloaded. The site has several different lists of filters, and I have several different parameters I am filtering on before collecting the data. Overall, this process requires me to download upwards of 1,000 Excel files and paste them together.
Does Python have the functionality to automate this process? Essentially, what I am doing is setting Filter 1 = A, Filter 2 = B, Filter 3 = C, downloading the file, and then repeating with different parameters and pasting the files together. If Python is suitable for this, can anyone point me in the direction of a good tutorial or starting point? If not, what language would be more suitable for someone with little programming background?
Thanks!
Personally, I would prefer to use Python for this. I would look in particular at the pandas library, a powerful data analysis library whose DataFrame object can be used like a headless spreadsheet. I use it for a small number of spreadsheets and it's been very quick. Perhaps take a look at this person's website for more guidance: https://pythonprogramming.net/data-analysis-python-pandas-tutorial-introduction/
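As a rough sketch of the combining step (the folder path is a placeholder, and reading .xlsx files needs the openpyxl package installed):

    from pathlib import Path

    import pandas as pd

    # Read every downloaded Excel file and stack them into one DataFrame
    frames = [pd.read_excel(path) for path in Path("downloads").glob("*.xlsx")]
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv("combined.csv", index=False)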
I'm not 100% sure if your question was only about spreadsheets (my first paragraph was really about working on the files once you have downloaded them), but if you're interested in actually fetching the files, or 'scraping' the data, you can look at the Requests library for the HTTP side of things. This might be what you could use if there is a RESTful way of doing things. Or look at Scrapy (https://scrapy.org) for web scraping.
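A minimal sketch of the download loop with Requests (the URL and filter parameter names are invented for illustration):

    from pathlib import Path

    import requests

    Path("downloads").mkdir(exist_ok=True)

    # Each combination of filter values corresponds to one downloadable file
    filter_sets = [
        {"filter1": "A", "filter2": "B", "filter3": "C"},
        {"filter1": "A", "filter2": "B", "filter3": "D"},
    ]

    for i, params in enumerate(filter_sets):
        response = requests.get("https://example.com/export", params=params)
        response.raise_for_status()
        with open(f"downloads/file_{i}.xlsx", "wb") as f:
            f.write(response.content)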
Sorry if I misunderstood in parts.
I'm diving into Django to create a webapp.
The thing is, I'm not sure if my app is too simple for what Django offers.
My app will download the latest CPI figures and convert the user's (monetary) dataset into inflation-adjusted figures, going back decades. The user pastes their data in via a textbox. It certainly won't need SQL.
I may want to expand the project with more features in future.
Is it advisable to go with a more lightweight framework for something as simple as I've described?
Every framework has its pros and cons, and there are many different frameworks. Personally I prefer Flask, but it is all personal preference. Here are some articles that help describe the differences:
https://www.airpair.com/python/posts/django-flask-pyramid
https://www.reddit.com/r/Python/comments/1yr8v5/django_vs_flask/
https://www.hakkalabs.co/articles/django-and-flask
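To give a feel for how lightweight Flask is, a minimal version of an app like yours could start out this small (the route and the adjustment logic are just placeholders):

    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/adjust", methods=["POST"])
    def adjust():
        # Placeholder: parse the pasted figures and apply the CPI adjustment here
        figures = [float(x) for x in request.form["data"].split()]
        adjusted = [x * 1.02 for x in figures]  # dummy factor for illustration
        return {"adjusted": adjusted}

    if __name__ == "__main__":
        app.run()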
A web app like the one you describe sounds like most of the work can happen on the client side, without sending the data back to the server. From what it sounds like, you simply need to make a few calculations and present the data in a new way.
For this I don't recommend Django, which is ideal for serving pages and managing relational DB content, but not really useful for client-side work.
I'd recommend AngularJS.
I recently started teaching myself data analysis and machine learning, and quickly ran into my first issue.
I have data from a REST API stored as JSON. My dataset is a folder with nearly 350,000 text files containing the JSON returned by the Riot API match endpoint (I store League of Legends games), adding up to 11 GB of uncompressed text files. The file names are the IDs of the matches.
Obviously I can't load all that data into memory (8 GB) to analyse it or handle it with scikit-learn. And even if I could, parsing is extremely slow (getting the number of solo queue games, the average win ratio of champions, and so on). I've been told to store that data in a SQLite database, but I'm not really decided on what to do. SQLite should be OK, as future analysis will not need all the features, so I could do SELECTs easily.
What is the best approach, or what should I know beforehand? Is there any essential data analysis knowledge that I'm missing?
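The SQLite route I'm considering would look roughly like this (the extracted fields are just examples, since the real match JSON has many more):

    import json
    import sqlite3
    from pathlib import Path

    conn = sqlite3.connect("matches.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS matches "
        "(match_id TEXT PRIMARY KEY, queue_type TEXT, duration INTEGER)"
    )

    # Stream the files one at a time so memory use stays flat
    for path in Path("matches").glob("*.json"):
        with open(path) as f:
            match = json.load(f)
        conn.execute(
            "INSERT OR IGNORE INTO matches VALUES (?, ?, ?)",
            (path.stem, match.get("queueType"), match.get("gameDuration")),
        )

    conn.commit()
    conn.close()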