Closed 7 years ago.
Is there any "test data" generation framework out there, specifically for Python?
To make it clear: instead of writing scripts from scratch that fill my database with random users and other entities, I want to know whether there are any tools or frameworks out there that make this easier.
To make it even clearer, I am not looking for test frameworks; I want to generate test data to put some load on my application.
http://code.google.com/p/fake-data-generator/
Looks like what you want. I tend to just use ranges with appropriate upper and lower limits and liberal use of lists.
If you need your test data to match the distribution of your population then you'll need to do more work though.
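For example, a minimal sketch of that ranges-and-lists approach for generating random users (the field names and value lists are made up for illustration):
```python
import random
import string

# illustrative value lists; swap in whatever your schema needs
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave", "Eve"]
DOMAINS = ["example.com", "example.org", "test.net"]

def random_user():
    """Build one fake user row from ranges and list choices."""
    first = random.choice(FIRST_NAMES)
    last = "".join(random.choices(string.ascii_lowercase, k=random.randint(4, 10))).title()
    return {
        "name": f"{first} {last}",
        "email": f"{first.lower()}.{last.lower()}@{random.choice(DOMAINS)}",
        "age": random.randint(18, 90),          # numeric range with sensible limits
        "signup_year": random.randint(2005, 2015),
    }

# generate a batch to bulk-insert with whatever DB driver you already use
users = [random_user() for _ in range(10_000)]
```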
If you don't want to write any code, try Mockaroo. It's a free web app that lets you generate random test data tables in lots of different formats, such as XML, JSON, Excel, and CSV. You can generate up to 1,000 rows for free.
Closed 2 months ago.
I work in insurance as an actuary, and one thing we deal with is models. They usually involve a bunch of input tables, then a calculation or projection process that produces certain outputs of interest.
This is usually more complex than the usual Excel reports you see in the business world.
Now, given the calculation speed and efficiency of other programming platforms (C#, Python, C++, Julia, etc.), I really want to use another platform to either:
1. replicate the computationally intensive process, which usually takes 2-3 hours as it goes back and forth between a bunch of Excel sheets, iterating until it finds a solution, or
2. call or control the Excel process.
I understand Python can handle part 2 using openpyxl, but for part 1, which platform is the easiest to replicate this in? By easy I mean easy to convert.
Many thanks!
I am playing around with Python but did not expect to be able to replicate the complex iteration process easily.
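To illustrate what I mean by part 2, a minimal openpyxl sketch (the workbook, sheet, and cell names are placeholders, not my real model):
```python
from openpyxl import load_workbook

# placeholder workbook/sheet names, not my real model
wb = load_workbook("projection_model.xlsx", data_only=True)  # data_only=True reads cached values, not formulas
inputs = wb["Assumptions"]

# read an input table into plain Python structures
mortality = {
    row[0].value: row[1].value
    for row in inputs.iter_rows(min_row=2, max_col=2)  # skip the header row
    if row[0].value is not None
}

# write a result back and save a copy
# note: openpyxl does not evaluate Excel formulas, so for part 1 the
# iterative solve itself would have to be re-implemented in Python
out = wb["Output"]
out["B2"] = sum(v for v in mortality.values() if v is not None)
wb.save("projection_model_out.xlsx")
```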
This question will almost certainly get closed, as Stack Overflow does not allow asking for software recommendations.
There's an active actuary group in Julia, their website is here:
https://juliaactuary.org/
Closed last year.
I'm trying to convert the entire contents of a PDF into a CSV or an XLSX file with Python, and I've hit a wall.
I know there is an API called PDFTables that works perfectly, but the number of documents I would like to convert (over 400) and the fact that using it involves an economic investment I can't afford make it unfeasible. There is another library I've tried, tabula, but as far as I know it only works with the tables in the PDF.
With this problem in mind, are there any other options available?
Thank you in advance.
If you don't need it to be programmatic, have you seen https://www.adobe.com/la/acrobat/online/pdf-to-excel.html?
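If it does need to be programmatic and the PDFs are mostly tabular, a rough sketch of batch conversion with tabula-py (the tabula wrapper you mentioned) could look like this; the folder paths are placeholders:
```python
import pathlib
import tabula  # tabula-py, a wrapper around tabula-java (requires a Java runtime)

src = pathlib.Path("pdfs")      # placeholder: folder holding the ~400 PDFs
dst = pathlib.Path("csv_out")   # placeholder: output folder
dst.mkdir(exist_ok=True)

for pdf in src.glob("*.pdf"):
    # convert_into extracts the tables it finds and writes one CSV per PDF
    tabula.convert_into(str(pdf), str(dst / (pdf.stem + ".csv")),
                        output_format="csv", pages="all")
```
As you note, this only captures content laid out as tables; anything else would still need a separate text extractor.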
Closed 3 years ago.
I have a few terabytes of data that I want to store and be able to look up at high speed. Of course I cannot simply use a Python dictionary, since its size is limited by the amount of RAM I have.
I tried Python's dbm module (https://docs.python.org/3/library/dbm.html), but it's too slow for my application.
Take a look at the work done by Neueda at https://github.com/blu-corner/heliumdb
There are a few important features that this interface provides:
1. It uses the native dictionary interface, so you don't need to learn a new API and your code doesn't have to change.
2. It is fast. Much faster than using a database underneath.
3. You can keep your dictionary on a separate server and have multiple Python programs share the same dictionary. You still need to take care of concurrency control at the application level, but the dictionary is always consistent across all the programs. For string types the performance comes close to a native in-memory dictionary.
Hope that helps. If you want, I can send you some sample code.
You can use a disk-based key-value store. Something like LevelDB, through a Python wrapper library, might give you the performance you need.
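As a minimal sketch of that idea, using the plyvel LevelDB binding (one of several wrappers; treat the package choice and the path as assumptions to verify against your workload):
```python
import plyvel  # pip install plyvel; one of several Python LevelDB bindings

# LevelDB keeps its data on disk, so the store can be far larger than RAM
db = plyvel.DB("/data/kvstore", create_if_missing=True)  # placeholder path

# keys and values are bytes
db.put(b"user:42", b'{"name": "Alice"}')
print(db.get(b"user:42"))          # b'{"name": "Alice"}'

# bulk-load with a write batch for better throughput
with db.write_batch() as batch:
    for i in range(100_000):
        batch.put(f"item:{i}".encode(), b"payload")

db.close()
```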
Closed 5 years ago.
I have an API based on Thrift's TCompactProtocol.
Is there a quick way to convert it into TBinaryProtocolTransport?
Is there a tool for conversion?
FYI: my API is based on the LINE API and written in Python.
No tool is needed. Since you did not elaborate much on your actual use case, I can only give a generic answer.
Case 1: you control both the RPC server and client, and we are NOT talking about stored data.
In that case you only need to swap the protocol on both ends and you're pretty much done.
Case 2: all other cases.
You will need two pieces:
1. a piece of code that deserializes the old data stored with "compact"
2. a piece of code that re-serializes those data using "binary"
Neither case is really hard to implement technically.
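For the second case, a minimal re-encoding sketch in Python (the struct name MyRecord and the generated module path are placeholders for whatever your IDL defines):
```python
from thrift.TSerialization import serialize, deserialize
from thrift.protocol import TBinaryProtocol, TCompactProtocol

# placeholder: a struct from your thrift-generated code
from my_gen.ttypes import MyRecord

compact_factory = TCompactProtocol.TCompactProtocolFactory()
binary_factory = TBinaryProtocol.TBinaryProtocolFactory()

def compact_to_binary(raw_bytes):
    """Read bytes that were written with the compact protocol and
    write them back out with the binary protocol."""
    obj = deserialize(MyRecord(), raw_bytes, protocol_factory=compact_factory)
    return serialize(obj, protocol_factory=binary_factory)
```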
Closed 7 years ago.
I'm wondering if some of the automatic filtering and subsetting tools that are built into Microsoft Excel have equivalents in any package in Python or R.
For instance, let's say I want to build a tool that filters job candidates by various characteristics (for a non-technical user of the tool). In Excel, I can hit the filter button and then start subsetting a spreadsheet using multiple-choice lists, numeric ranges, free text search, and so on.
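For concreteness, those are the operations I would write in pandas roughly as follows (the candidates table is hypothetical):
```python
import pandas as pd

# hypothetical candidate data
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cara", "Dev"],
    "degree": ["BSc", "MSc", "PhD", "BSc"],
    "years_experience": [2, 5, 9, 1],
    "notes": ["python, sql", "actuarial exams", "ml research", "excel macros"],
})

subset = df[
    df["degree"].isin(["MSc", "PhD"])            # multiple-choice list filter
    & df["years_experience"].between(3, 10)      # numeric range filter
    & df["notes"].str.contains("research")       # free text search
]
print(subset)
```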
I know Shiny apps in R allow you to build interactive dashboards, but (a) they don't automatically discern the type of every column of your data set, and (b) I've found the Shiny reactive triggers to be a little glitchy when repeatedly subsetting the same data.
Again, this tool is intended to be used by a non-technical user, who is trying to narrow a large data set down to a set of matching entries using filtering.
EDIT: I've just been told about the DataTables library, specifically DT, an R interface to the DataTables library, which can also be used from Python. I'm still curious whether there are better packages out there, but this one seems like the most likely candidate.
RStudio has this ability. If you view a data set (click it in the Environment window), there's an option to "Filter".