How to store large datasets in a Python dictionary? [closed]

I have a few terabytes of data that I want to store and be able to look up quickly. Obviously I cannot simply use a Python dictionary, since its size is limited by the size of my RAM.
I tried Python's dbm module (https://docs.python.org/3/library/dbm.html), but it's too slow for my application.
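For reference, the dbm approach looks roughly like this (a minimal sketch; keys and values end up stored as bytes):

```python
# Minimal sketch of the stdlib dbm approach mentioned above: it persists
# key/value pairs to disk, but every lookup goes through the dbm backend,
# which proved too slow here.
import dbm

with dbm.open("cache", "c") as db:   # "c": open for read/write, create if missing
    db["user:1001"] = "alice@example.com"
    print(db["user:1001"])           # values come back as bytes
```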

Take a look at the work done by Neueda at https://github.com/blu-corner/heliumdb
There are a few important features this interface provides:
1. It uses the native dictionary interface, so you don't need to learn a new API and your code doesn't have to change.
2. It is fast; much faster than using a database underneath. For string types the performance comes close to a native in-memory dictionary.
3. You can keep your dictionary on a separate server and have multiple Python programs share it. You still need to handle concurrency control at the application level, but the dictionary is always consistent across all the programs.
Hope that helps. If you want, I can send you some sample code.

You can use a disk-based key-value store. Something like LevelDB, accessed through a Python wrapper library, might work while maintaining performance.
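For illustration, a minimal sketch using plyvel, one of the Python wrappers for LevelDB (the path and keys here are made up):

```python
# Minimal sketch: LevelDB as a disk-backed key-value store via plyvel
# (pip install plyvel). Keys and values are bytes; the path is illustrative.
import plyvel

db = plyvel.DB("/tmp/bigdata", create_if_missing=True)
db.put(b"user:1001", b"alice@example.com")   # write a key/value pair to disk
print(db.get(b"user:1001"))                  # fast point lookup
db.close()
```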

Related

Convert Excel model (input-calc-output) to another platform for gd [closed]

I work in insurance as an actuary, and one thing we deal with is models. They usually involve a bunch of input tables, then a calculation or projection process that produces certain outputs of interest.
These models are usually more complex than the usual Excel reports you see in the business world.
Now, given the fast calculation speed and efficiency of other programming platforms (C#, Python, C++, Julia, etc.), I really want to use another platform to either:
1. replicate certain computationally intensive processes, which usually take 2-3 hours as they go back and forth between a bunch of Excel sheets, iterating to find a solution, or
2. call or control the Excel process.
I understand Python can handle part 2 using openpyxl, but for part 1, which platforms are the easiest to replicate the model in? (By "easiest" I mean easiest to convert to.)
Thanks!
I am playing around with Python but did not expect to be able to replicate the complex iteration process easily.
This question will almost certainly get closed, as Stack Overflow does not allow asking for software recommendations.
There's an active actuary group in Julia; their website is here:
https://juliaactuary.org/
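As an aside on the openpyxl route mentioned in the question, a minimal sketch (the workbook, sheet, and cell names are hypothetical; note that openpyxl edits the .xlsx file directly rather than driving a live Excel session):

```python
# Minimal sketch: read inputs from and write outputs to an Excel model with
# openpyxl (pip install openpyxl). File/sheet/cell names are hypothetical.
# openpyxl does not launch Excel or recalculate formulas; controlling a
# running Excel instance needs something like xlwings instead.
from openpyxl import load_workbook

wb = load_workbook("model.xlsx", data_only=True)  # data_only: last cached values
inputs = wb["Inputs"]
outputs = wb["Outputs"]

premium = inputs["B2"].value       # read an input assumption
outputs["C5"] = premium * 1.05     # write a derived value back
wb.save("model_updated.xlsx")
```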

A cryptographic tool for specific fields in data [closed]

I was thinking of building a piece of software that would be able to encrypt specific fields in a data file. So I started to consider writing some code in Python using cryptographic libraries. However, I wonder: is it really safe? Or should I rather use existing cryptographic tools?
If so, do you know a good cryptographic tool I could rely on? The only tools I can find encrypt entire files or disks. Thank you!
This greatly depends (obviously) on how you write it.
There are libraries like cryptography which already provide this, though, and are considered very safe.
https://github.com/fugue/credstash, for instance, which is widely used, uses cryptography; https://github.com/lyft/confidant uses it as well.
I implemented a locally usable secret store using cryptography (which you could use to encrypt any type of data): https://github.com/nir0s/ghost. You could use it as a reference implementation, or simply use it directly (hope I'm not breaking any rules here).
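For reference, field-level encryption with the cryptography library's high-level Fernet recipe looks roughly like this (the record and field names are made up):

```python
# Minimal sketch: encrypt one field of a record with cryptography's Fernet
# recipe (pip install cryptography). Record and field names are illustrative;
# in practice the key must be generated once and stored securely.
from cryptography.fernet import Fernet

key = Fernet.generate_key()
f = Fernet(key)

record = {"name": "Jane Doe", "ssn": "123-45-6789", "city": "Lyon"}
record["ssn"] = f.encrypt(record["ssn"].encode())   # encrypt just this field

print(record)                                # the ssn field is now a token
print(f.decrypt(record["ssn"]).decode())     # decrypts back to the original
```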

Excel-like table filtering interface in R or Python? [closed]

I'm wondering if some of the automatic filtering and subsetting tools that are built into Microsoft Excel have equivalents in any package in Python or R.
For instance, let's say I want to build a tool to filter job candidates by various characteristics (for a non-technical user of the tool). In Excel, I can hit the filtering button and then automatically start subsetting a spreadsheet using multiple-choice lists, numeric ranges, free text search, etc.
I know Shiny apps in R allow you to build interactive dashboards, but (a) they don't automatically discern the type of every column of your data set, and (b) I've found the Shiny reactive triggers to be a little glitchy when repeatedly subsetting the same data.
Again, this tool is intended to be used by a non-technical user, who is trying to narrow a large data set down to a set of matching entries using filtering.
EDIT: I've just been told about the DataTables library, specifically DT, an R interface to the DataTables library, though it can also be used in Python. I'm still curious whether there are better packages out there, but this one seems like the most likely candidate.
RStudio has this ability. If you view a data set (click it in the Environment window), there's a "Filter" option.
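If you want to prototype something similar in Python, a minimal Jupyter sketch with ipywidgets and pandas might look like this (the data frame and column names are made up):

```python
# Minimal sketch: Excel-style filtering in a Jupyter notebook with ipywidgets
# and pandas (pip install ipywidgets pandas). Column names are illustrative.
import pandas as pd
import ipywidgets as widgets
from IPython.display import display

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dan"],
    "years_experience": [3, 7, 5, 1],
    "role": ["analyst", "engineer", "analyst", "engineer"],
})

def filter_table(role, min_years):
    # Subset the frame the way Excel's filter dropdowns would.
    display(df[(df["role"] == role) & (df["years_experience"] >= min_years)])

widgets.interact(
    filter_table,
    role=sorted(df["role"].unique()),   # rendered as a multiple-choice dropdown
    min_years=(0, 10, 1),               # rendered as a numeric range slider
)
```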

Audio Manipulation [closed]

I'm looking for a Python library that will allow me to record, manipulate, and merge audio files. Most of the ones I've seen don't support Windows and/or are outdated. Does anyone have suggestions for libraries, or for how these functions could be implemented with the standard Python library?
Recording and Manipulating are generally different problems. For both of these, I stick with .wav file formats since (at least in their simpler forms) they are basically just the raw data with a minimal header and are easy to work with.
Recording: I use pyaudio, which provides bindings to the portaudio library.
Manipulation: For simple things I use audioop, which is included in the base Python installation, and for more complex things I go straight to scipy (which can read many .wav files with scipy.io.wavfile.read) and then manipulate the data like any other time-series data. scipy is powerful and fast, but it doesn't offer many audio-specific tools, nor does it present things in audio-specific terminology.
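For example, a minimal scipy sketch (assuming a PCM file named input.wav exists) that reads a .wav, halves the volume, reverses it, and writes the results back:

```python
# Minimal sketch: treat a .wav file as a NumPy array with scipy.
# Assumes a PCM file named "input.wav" exists in the working directory.
from scipy.io import wavfile

rate, data = wavfile.read("input.wav")       # data: NumPy array of samples
quieter = (data * 0.5).astype(data.dtype)    # halve the amplitude
backwards = data[::-1]                       # reverse along the time axis

wavfile.write("quieter.wav", rate, quieter)
wavfile.write("backwards.wav", rate, backwards)
```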
There are other things out there, though less well established, such as Snack, Audiere, and AudioLazy. These are tools I've heard of but never used, and I don't know which are still available, their level of development, etc.

Test data generation framework in python? [closed]

Is there any "test data" generation framework out there, especially for Python?
To make it clear: instead of writing scripts from scratch that fill my database with random users and other entities, I want to know whether there are any tools/frameworks out there that make this easier.
To make it even clearer: I am not looking for test frameworks; I want to generate test data to "put some load" on my application.
http://code.google.com/p/fake-data-generator/
Looks like what you want. I tend to just use ranges with appropriate upper and lower limits and liberal use of lists.
If you need your test data to match the distribution of your population, though, you'll need to do more work.
If you don't want to write any code, try Mockaroo. It's a free web app that lets you generate random test-data tables in lots of different formats, such as XML, JSON, Excel, and CSV. You can generate up to 1,000 rows for free.
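If you do want to generate the data in code, here is a minimal sketch with the Faker library (a different tool from those mentioned above, shown purely as an illustration; the field names are made up):

```python
# Minimal sketch: generate random user records with the Faker library
# (pip install Faker). Field names are illustrative, not from any schema.
from faker import Faker

fake = Faker()
users = [
    {
        "name": fake.name(),
        "email": fake.email(),
        "address": fake.address(),
        "signed_up": fake.date_this_decade().isoformat(),
    }
    for _ in range(1000)
]
print(users[0])   # e.g. one random user ready to load into the database
```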
