Write table cell in real time with Python

I would like to loop through a database, find the appropriate values, and insert them into the appropriate cell in a separate file. It may be a CSV or any other human-readable format.
In pseudo-code:
for item in huge_db:
    for obj in list_of_objects_to_match:
        if itemmatch(item, obj):
            if there_arent_three_matches_yet_in_list(obj):
                matches += 1
                result = performoperationonitem(item)
                write_in_file(result, row=obj.id, col=matches)
            if matches == 3:
                remove_this_object_from_object_to_match_list(obj)
Can you think of any way to do this other than going through the whole output file line by line every time?
I don't even know what to search for...
Even better: is there a better way to find three matching objects in a DB and get the results in real time? (The operation will take a while, but I'd like to see the results popping out as they are computed.)

Assuming itemmatch() is a reasonably simple function, this will do what I think you want better than your pseudocode:
for match_obj in list_of_objects_to_match:
    db_objects = query_db_for_matches(match_obj)
    if len(db_objects) >= 3:
        result = performoperationonitem(db_objects)
        write_in_file(result, row=match_obj.id, col=len(db_objects))
    else:
        write_blank_line(row=match_obj.id)  # if you want
Then the trick becomes writing the query_db_for_matches() function. Without more detail, I'll assume you're looking for objects that match in one particular field; call it type. In pymongo such a query would look like:
def query_db_for_matches(match_obj):
    return pymongo_collection.find({"type": match_obj.type})
To get this to run efficiently, make sure your database has an index on the field(s) you're querying on, by first calling:
pymongo_collection.ensure_index([("type", 1)])
(In newer PyMongo versions this method is called create_index.)
The first time you call ensure_index it could take a long time for a huge collection. But each time after that it will be fast -- fast enough that you could even put it into query_db_for_matches before your find and it would be fine.
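As for seeing results "pop out" in real time without rewriting the output file: since each object's row is complete as soon as its three matches are found, you can append one CSV row per object and flush as you go. Here is a minimal sketch (assuming the pymongo setup above; performoperationonitem and the .id field are placeholders carried over from the pseudocode):
import csv

with open("results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for match_obj in list_of_objects_to_match:
        # take at most three matching documents
        db_objects = list(pymongo_collection.find({"type": match_obj.type}).limit(3))
        if len(db_objects) >= 3:
            row = [match_obj.id] + [performoperationonitem(obj) for obj in db_objects]
            writer.writerow(row)
            f.flush()  # make the row visible immediately, e.g. to tail -f
Because rows are appended in order, nothing ever needs to seek back into the file; anything watching it sees each result the moment it is produced.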

Related

Merge Two TinyDB Databases

In Python, I'm trying to merge multiple JSON files obtained from TinyDB.
I was not able to find a way to directly merge two TinyDB JSON files so that the auto-generated keys continue in sequence instead of restarting with each new file.
In code terms, I want to merge a large amount of data like this:
hello1 = {"1": "bye", "2": "good", ..., "20000": "goodbye"}
hello2 = {"1": "dog", "2": "cat", ..., "15000": "monkey"}
as:
hello3 = {"1": "bye", "2": "good", ..., "20000": "goodbye", "20001": "dog", "20002": "cat", ..., "35000": "monkey"}
Because of the difficulty of finding the correct way to do this with TinyDB, I opened the files and treated them simply as classic JSON files, loading each one and then doing:
Data = Data['_default']
The problem I have is that the code works at the moment, but it has serious memory problems. After a few seconds, the merged DB contains about 28 MB of data, but then (probably) the cache saturates, and it starts adding the remaining data really slowly.
So, I need to empty the cache after a certain amount of data, or I probably need to change my approach altogether!
This is the code that I use:
Try1.purge()
Try1 = TinyDB('FullDB.json')

with open('FirstDataBase.json') as Part1:
    Datapart1 = json.load(Part1)['_default']
    for dets in range(1, len(Datapart1) + 1):  # keys run from "1" to str(len(...))
        Try1.insert(Datapart1[str(dets)])

with open('SecondDatabase.json') as Part2:
    Datapart2 = json.load(Part2)['_default']
    for dets in range(1, len(Datapart2) + 1):
        Try1.insert(Datapart2[str(dets)])
Question: Merge Two TinyDB Databases ... probably I need to change the way to do this!
From TinyDB Documentation
Why Not Use TinyDB?
...
You are really concerned about performance and need a high speed database.
Single-row inserts into a DB are always slow; try db.insert_multiple(... instead.
The second form, with a generator, gives you the option to hold down the memory footprint.
# From a list
Try1.insert_multiple([{"1": "bye", "2": "good", ..., "20000": "goodbye"}])
or
# From generator function
Try1.insert_multiple(generator())
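Putting that together, here is a minimal sketch of the merge using a generator (assuming both source files are standard TinyDB files with the default table; note json.load still reads each source file fully into memory -- the generator only avoids building a second in-memory list of documents):
import json
from tinydb import TinyDB

def documents(path):
    # yield documents one at a time, in key order, so insert_multiple
    # can consume them without a second in-memory list
    with open(path) as f:
        table = json.load(f)['_default']
    for key in sorted(table, key=int):
        yield table[key]

merged = TinyDB('FullDB.json')
for path in ('FirstDataBase.json', 'SecondDatabase.json'):
    merged.insert_multiple(documents(path))  # doc IDs keep counting up across files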

How to do math operations on imported CSV data?

I have read in a CSV file ('Air.csv') and performed some operations to get rid of the header (not important). Then I used dB_a.append(row[1]) to put that column of the CSV data into an array which I could later plot.
This data is dB data, and I want to convert it to power using the simple equation P = 10^(dB/10) for every value. I am new to Python, so I don't quite understand how operations within arrays, lists, etc. work. I think I need to iterate over the full data set, which was my attempt with the for loop below, but I am still receiving errors. Any suggestions?
Thank you!
import csv
import itertools

frequency_a = []
dB_a = []

a = csv.reader(open('Air.csv'))
for row in itertools.islice(a, 18, 219):
    frequency_a.append(row[0])
    dB_a.append(row[1])

#print(frequency_a)
print(dB_a)

for item in dB_a:
    power_a = 10**(dB_a/10)
print(power_a)
In your for loop, item is the loop variable, so that's what you need to use. So instead of:
power_a = 10**(dB_a/10)
use:
power_a = 10**(item/10)
A nicer way to create a new list with that data could be:
power_a = [10**(db/10) for db in dB_a]
EDIT: The other issue, as pointed out in the comments, is that the values are strings. The .csv file is essentially a text file, so you get a collection of strings rather than numbers. What you can do is convert them to numeric values using int(db) or float(db), depending on whether you have whole or floating-point numbers.
EDIT2: As pointed out by @J. Meijers, I was using multiplication instead of exponentiation - this has been fixed in the answer.
To build on the answer @Ed Jaras posted:
power_a = [10*(db/10) for db in dB_a]
is not correct, since this divides by 10, and then multiplies by the same.
It should be:
power_a = [10**(db/10) for db in dB_a]
Credits still go to @Ed Jaras though.
Note:
If you're wondering what this [something for something in a list] is, it is a list comprehension. They are amazingly elegant constructs that Python allows.
What it basically means is [..add this element to the result.. for ..my element.. in ..a list..].
You can even add conditionals to them if you want.
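For example, a conditional version of the comprehension above (a hypothetical filter, just to illustrate the syntax):
power_a = [10**(float(db)/10) for db in dB_a if db.strip()]  # skip blank cells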
If you want to read more about them, I suggest checking out:
http://www.secnetix.de/olli/Python/list_comprehensions.hawk
Addition:
@k-schneider: You are probably doing numerical operations (dividing, power, etc.) on a string; this is because when importing a CSV, fields can be imported as strings.
To make sure that you are working with numbers, you can cast db to a numeric type by doing:
float(db)
(or int(db) if your values are whole numbers).
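Putting both fixes together, a sketch assuming the same 'Air.csv' layout and row range as in the question:
import csv
import itertools

frequency_a = []
power_a = []

with open('Air.csv') as f:
    reader = csv.reader(f)
    for row in itertools.islice(reader, 18, 219):
        frequency_a.append(float(row[0]))
        # convert the dB string to a number before doing math on it
        power_a.append(10 ** (float(row[1]) / 10))

print(power_a)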

any() function with csv file in python not behaving how I'm expecting it to

(I'm using python 2.7 for now)
So maybe I'm not understanding how this line of code works, because in one part of my program it seems to work fine, while in another part it doesn't.
elif not any(user in line for line in data):
Basically, I have a csv file that I'm reading from and storing in the variable "data" like this:
f = open("scores.csv")
data = csv.reader(f)
the variable "user" is a string from an Entry box in Tkinter,
and the variable "line" is an arbitrary name for the for loop, just like in a piece of code that says "for i in range(69):"
So what my brain thinks this line should do is: if it fails to find a match for user in any of the lines of the csv file, run the code under that statement. But it doesn't seem to do that!
However, later on in my code I try something similar:
elif any(user in line for line in data):
and this seems to work without any problems!!
I have no idea why, and I could not find anyone else on the internet trying to do this lol.
I'm trying to make a login form as a beginner project, as I somewhat know python, so I wanted to see what I can do, but I seem to be stuck here.
I have uploaded my code to github for anyone to review:
https://github.com/Arunscape/login-form/blob/master/login.py
oh and don't worry about the "passwords" in the csv file, they're of course fake!
Any help is appreciated. Thanks!!!
The problem you have is that data is an iterator, not a sequence you can iterate on multiple times. After you call any with a generator expression iterating over data, some or all of the items will have been consumed. Later calls will only see what is left over (which may be nothing if the first iteration had to check all the data).
You can reproduce the issue with a much simpler bit of code:
iterator = iter(range(10)) # an iterator over the numbers 0 through 9
first_result = any(x == 3 for x in iterator) # this will be True
second_result = any(x == 3 for x in iterator) # the same expression will be False this time!
The first any call consumes (via the generator expression) the numbers 0 through 3 from the iterator. Then it stops and any returns True (stopping early in this way is known as "short circuiting").
The second any call only gets to consume the remaining items, it can't see the ones that were already yielded to the first any call. Since the iterator will only yield one 3, the second any call will return False after consuming the rest of the numbers.
For your code to work correctly with data being an iterator, you can only iterate over it once.
If there are not too many values in your csv file, you might be better off reading all the rows into a list, which you can iterate over as many times as you want. Try:
data = list(csv.reader(f))
It might make sense to parse the data into a more meaningful data structure though, rather than a list (e.g. a dictionary mapping usernames to passwords, which you could query in O(1) time rather than O(N) time).
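For instance, a minimal sketch of the dictionary approach (assuming each row of scores.csv is username,password -- the column layout is a guess -- and that user and password hold the strings from your Entry boxes):
import csv

with open("scores.csv") as f:
    credentials = {row[0]: row[1] for row in csv.reader(f) if row}

# membership and password checks are now O(1), and the parsed data
# can be queried as many times as you like
if user not in credentials:
    print("no such user")
elif credentials[user] == password:
    print("login ok")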

Python - How can I change part of a row in a CSV file?

I have a CSV file with some words in it, each followed by a number, and I need a way to update the number: either adding 1 to it, or setting it back to 1.
Say for instance I have these words:
variant,1
sixty,2
game,3
library,1
If the user inputs the word sixty, how could I use that to add one to the number, and how would I reset it back to 1?
I've been all over Google and Stack Overflow trying to find an answer, but I expect my failure to find one was due more to my inexperience than anything.
Thanks.
This is a quick throw-together using fileinput. Since I am unaware of the conditions under which you would decrease or reset your value, I added them as keyword args you can pass at will, such as:
updateFileName(filename, "sixty", reset=True)
updateFileName(filename, "sixty", decrease=True)
updateFileName(filename, "sixty")
The results of each should be self-explanatory. Good luck! I wrapped it in a try/finally as I had no clue what your file's structure was; if the format differs it will fail either way. If you have spaces around the commas you will need to .strip() the key and value.
import fileinput
import sys

def updateFileName(filename, input_value, decrease=False, reset=False):
    try:
        for line in fileinput.input(filename, inplace=True):
            key, value = line.split(",")
            if key == input_value:
                if decrease:
                    sys.stdout.write("%s,%s\n" % (key, int(value) - 1))
                elif reset:
                    sys.stdout.write("%s,%s\n" % (key, 1))
                else:
                    sys.stdout.write("%s,%s\n" % (key, int(value) + 1))
                continue
            sys.stdout.write(line)
    finally:
        fileinput.close()
Without knowing when you want to switch a number to 1 and when you want to add 1, I can't give a full answer, but I can set you on the right track.
First you want to import the csv file, so you can change it around.
The csv module is very helpful in this. Read about it here: http://docs.python.org/2/library/csv.html
You will likely want to read it into a dictionary structure, because this will link each word to its corresponding number. Something like this: make dictionary from csv file columns.
Then you'll want to use raw_input (or input if you are using Python 3) to get the word you are looking for, and use that as the key to find and change the number you want to change. http://anh.cs.luc.edu/python/hands-on/handsonHtml/handson.html#x1-490001.12 or http://www.sthurlow.com/python/lesson06/ will show you how to get one part of a dictionary by giving it the other part, and how to save info back into a dictionary.
Sorry it's not a direct answer. Maybe someone else will write one up, but this should get you started.
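A minimal sketch of that dictionary approach (assuming the two-column word,number layout from the question; the file name and prompt are made up):
import csv

with open('words.csv') as f:
    counts = {word: int(number) for word, number in csv.reader(f)}

word = raw_input('Word: ')  # input('Word: ') on Python 3

counts[word] = counts.get(word, 0) + 1  # add one to it...
# counts[word] = 1                      # ...or set it back to 1

with open('words.csv', 'wb') as f:  # open('words.csv', 'w', newline='') on Python 3
    csv.writer(f).writerows(sorted(counts.items()))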
This is very probably not the best way to do it, but you can load your CSV file into an array object using numpy's loadtxt().
The following code will give you a 2-dimensional array with names in the first column and your numbers in the second (dtype=str is needed because the first column contains words, not numbers):
import numpy as np
a = np.loadtxt('YourFile', delimiter=',', dtype=str)
Perform your changes on the numbers the way you want (converting them from strings first) and use numpy's savetxt() to save your file.
If your file is very large, this solution is going to be a pain, as loading a huge array takes a lot of memory, so consider it just a workaround. The dictionary solution is actually better (I think).
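For completeness, a sketch of the full round trip under that caveat (the 'sixty' update comes from the question; the string dtype handling is the assumption described above):
import numpy as np

a = np.loadtxt('YourFile', delimiter=',', dtype=str)
mask = a[:, 0] == 'sixty'                              # rows whose word matches
a[mask, 1] = (a[mask, 1].astype(int) + 1).astype(str)  # add one (assign '1' to reset)
np.savetxt('YourFile', a, delimiter=',', fmt='%s')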

Efficient and clean way of growing a tokenizer function in Python

I have a library that does some "translation" and uses the awesome tokenize.generate_tokens() function to do so.
It is pretty fast and I have things working correctly. But as I translate more, I've found that the function keeps growing with new tokens I want to translate, and the if and elif conditions start to pop up all over. I also keep a few variables outside the generator that keep track of the "last keyword seen" and similar state.
A good example of this is the actual Python documentation one, seen here (at the bottom): http://docs.python.org/library/tokenize.html#tokenize.untokenize
Every time I add a new thing to translate, this function grows by a couple of conditionals. I don't think a function with so many conditionals is the way to go, or the proper way to pave the ground for growth.
Furthermore, I feel that the tokenizer consumes a lot of irrelevant lines that do not contain any of the keywords I am translating.
So 2 questions:
How can I avoid adding more and more conditional statements, so that this translation function stays easy/clean as it keeps growing (without a performance hit)?
How can I make it efficient despite all the irrelevant lines I am not interested in?
You could use a dict dispatcher. For example, the code you linked to might look like this:
from tokenize import NAME, NUMBER, OP, STRING

def process_number(result, toknum, tokval):
    if '.' in tokval:
        result.extend([
            (NAME, 'Decimal'),
            (OP, '('),
            (STRING, repr(tokval)),
            (OP, ')'),
        ])
    else:
        result.append((toknum, tokval))  # pass integer literals through unchanged

def process_default(result, toknum, tokval):
    result.append((toknum, tokval))

dispatcher = {NUMBER: process_number}

for toknum, tokval, _, _, _ in g:
    dispatcher.get(toknum, process_default)(result, toknum, tokval)
Instead of adding more if-blocks, you add key-value pairs to dispatcher.
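For instance, translating a new token type then only touches the dictionary. A hypothetical extra handler, just to illustrate:
def process_string(result, toknum, tokval):
    # hypothetical translation: upper-case every string literal
    result.append((STRING, tokval.upper()))

dispatcher[STRING] = process_string  # no new if/elif branch needed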
This may be more efficient than evaluating a long list of if-else conditionals, since dict lookup is O(1), but it does require a function call. You'll have to benchmark to see how this compares to many if-else blocks.
I think its main advantage is that it keeps code organized in small(er), comprehensible units.
