Python equivalent of ignoreboth:erasedups

I'm running IPython (Jupyter) through Anaconda on macOS Sierra, in iTerm, with $SHELL=bash. If I've missed any helpful setup details, just let me know.
I love the $HISTCONTROL aspect of bash, mentioned here. To sum that answer up: when traversing history (aka hitting the up arrow), it's helpful to remove duplicate entries so you don't scroll past the same command multiple times, and this is accomplished with $HISTCONTROL=ignoreboth:erasedups.
Is there any equivalent for this inside the Python interpreter (or iPython, specifically)? I have readline installed and feel like that's a good place to start, but nothing jumped out as obviously solving the problem, and I would've thought this was built in somewhere.

After some deep-diving into IPython and sifting through poorly explained and/or deprecated documentation, I've pieced together a solution that seems to work fine, though I'm sure it's not optimal for a number of reasons, namely:
- it runs a GROUP BY query on the history database every time I run a line in IPython
- it doesn't take care to clean up/coordinate the database tables: I only modify history, and ignore the output_history and sessions tables
I put the following in a file (I named it dedupe_history.py, but the name is irrelevant) inside $HOME/.ipython/profile_default/startup:
import IPython
import IPython.core.history as H
## spews a UserWarning about locate_profile() ... seems safe to ignore
HISTORY = H.HistoryAccessor()
def dedupe_history():
    query = ("DELETE FROM history WHERE rowid NOT IN "
             "(SELECT MAX(rowid) FROM history GROUP BY source)")
    db = HISTORY.db
    db.execute(query)
    db.commit()
def set_pre_run_cell_event():
    IPython.get_ipython().events.register("pre_run_cell", dedupe_history)
## dedupe history at start of new session - maybe that's sufficient, YMMV
dedupe_history()
## run dedupe history every time you run a command
set_pre_run_cell_event()
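To see what the DELETE statement does without touching a real history file, here is a minimal sketch against an in-memory SQLite database. The table is a simplified stand-in made up for illustration (IPython's real history table has additional columns), and the sample commands are invented.

```python
import sqlite3

# Simplified stand-in for IPython's history table (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (session INTEGER, line INTEGER, source TEXT)")

# Invented sample commands; two of them are repeated.
commands = ["import os", "x = 1", "import os", "print(x)", "x = 1"]
conn.executemany(
    "INSERT INTO history (session, line, source) VALUES (1, ?, ?)",
    list(enumerate(commands, start=1)),
)

# Same statement as in the startup script: keep only the newest row per command.
conn.execute(
    "DELETE FROM history WHERE rowid NOT IN "
    "(SELECT MAX(rowid) FROM history GROUP BY source)"
)
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT source FROM history ORDER BY rowid")]
print(remaining)  # ['import os', 'print(x)', 'x = 1']
```

Each command survives once, at the position of its most recent use, which is roughly the effect erasedups has in bash.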

Related

Intel Vtune cannot find python source file

This is an old problem, as demonstrated in https://community.intel.com/t5/Analyzers/Unable-to-view-source-code-when-analyzing-results/td-p/1153210. I have tried all the methods listed there, none of them works, and I cannot find any other solutions on the internet. Basically, VTune cannot find the custom Python source file no matter what I try. I am using the most recent version at the time of writing. Please let me know whether there is a solution.
For example, suppose you run the following program:
def myfunc(*args):
    # Do a lot of things.
    pass
if __name__ == '__main__':
    # Do something and call myfunc
    myfunc()
Call this script main.py. Now use the newest VTune version (I am using Ubuntu 18.04), start vtune-gui, and run a basic hotspots analysis. You will not find any information on this file. However, a huge pile of information on Python itself and on other code from your Python environment is reported. In theory, you should be able to find the source of main.py as well as the cost of each line in that script. However, that is simply not happening.
Desired behavior: I would really like to find the source file and function in the top-down view (or any view, really). Any advice is welcome.

VTune offers full support for profiling Python code, and the tool should be able to display the source code in your Python file as you expect. Could you please check whether the function you are expecting to see in the VTune results ran long enough?
Just to confirm that everything is working fine, I wrote a matrix multiplication code as shown below (don't worry about the accuracy of the code itself):
def matrix_mul(X, Y):
    result_matrix = [ [ 1 for i in range(len(X)) ] for j in range(len(Y[0])) ]
    # iterate through rows of X
    for i in range(len(X)):
        # iterate through columns of Y
        for j in range(len(Y[0])):
            # iterate through rows of Y
            for k in range(len(Y)):
                result_matrix[i][j] += X[i][k] * Y[k][j]
    return result_matrix
Then I called this function (matrix_mul) on my Ubuntu machine with large enough matrices so that the overall execution time was on the order of a few seconds.
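For reference, a driver along these lines could be used to get a run of that length. This is only a sketch: the random_matrix and run helpers and the default size n=150 are assumptions for illustration, and this variant of matrix_mul starts the result at zero (unlike the quick version above), so it returns the actual product.

```python
import random
import time


def matrix_mul(X, Y):
    """Naive triple-loop product; deliberately slow so the profiler has work to sample."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                result[i][j] += X[i][k] * Y[k][j]
    return result


def random_matrix(n):
    """Hypothetical helper: an n x n matrix of random floats."""
    return [[random.random() for _ in range(n)] for _ in range(n)]


def run(n=150):
    """Multiply two n x n matrices and return the elapsed wall-clock seconds."""
    X, Y = random_matrix(n), random_matrix(n)
    start = time.perf_counter()
    matrix_mul(X, Y)
    return time.perf_counter() - start


if __name__ == "__main__":
    # n=150 is an assumed size; raise it until the run lasts a few seconds.
    print("elapsed: %.2f s" % run())
```

If the reported time is well under a second, the function may simply not collect enough samples to show up in the results.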
I used the command below to start profiling (you can also see the VTune version I used):
/opt/intel/oneapi/vtune/2021.1.1/bin64/vtune -collect hotspots -knob enable-stack-collection=true -data-limit=500 -ring-buffer=10 -app-working-dir /usr/bin -- python3 /home/johnypau/MyIntel/temp/Python_matrix_mul/mat_mul_method.py
Now open the VTune results in the GUI and, under the bottom-up tab, group by "Module / Function / Call-stack" (or whatever grouping you prefer).
You should be able to see the module (mat_mul_method.py in my case) and the function "matrix_mul". If you double-click it, VTune should be able to load the sources too.

How to execute python function as whole in VSCode (it splits and sends just the first line to an interpreter)

I'm getting used to VSCode in my daily remote data science workflow because of the LiveShare feature.
However, when I execute a function, it just executes the first line of code; if I select the whole region it does work, but that's a cumbersome way of dealing with the issue.
I tried a number of extensions, but none of them seems to solve the problem.
def gini_normalized(test, pred):
    """Simple normalized Gini based on Scikit-Learn's roc_auc_score"""
    gini = lambda a, p: 2 * roc_auc_score(a, p) - 1
    return gini(test, pred)
Executing the beginning of the function results in an error:
def gini_normalized(test, pred):...
  File "", line 1
    def gini_normalized(test, pred):
                                    ^
SyntaxError: unexpected EOF while parsing
There's a solution for PyCharm: Python Smart Execute - https://plugins.jetbrains.com/plugin/11945-python-smart-execute. Atom's Hydrogen doesn't have this issue either.
Any ideas regarding VSCode?
Thanks!

I'm a developer on the VSCode DataScience features. Just to make sure I'm understanding correctly: you would like the shift-enter command to send the entire function to the Interactive Window if you run it on the definition of the function?
If so, then yes, we don't currently support that. Shift-enter can run line by line or run a section of code that you manually highlight. If you want, you can use #%% lines in your code to put functions into code cells. Then, when you are in a cell, shift-enter will run that entire cell; that might be the best current approach for you.
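As a rough sketch of the #%% approach, the function from the question could sit in its own cell like this. The auc_score helper is a hypothetical pure-Python stand-in for scikit-learn's roc_auc_score so that the example runs without extra packages, and the sample labels and scores are made up.

```python
#%% Cell 1: definitions (shift-enter anywhere in this cell runs all of it)
def auc_score(actual, predicted):
    """Stand-in AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for a, p in zip(actual, predicted) if a == 1]
    neg = [p for a, p in zip(actual, predicted) if a == 0]
    # Count a tie between a positive and a negative score as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini_normalized(test, pred):
    """Simple normalized Gini based on an AUC score."""
    return 2 * auc_score(test, pred) - 1


#%% Cell 2: try the function on made-up data
score = gini_normalized([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(score)
```

Outside VSCode the #%% lines are ordinary comments, so the file still runs as a normal script.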
That smart execute does look interesting. If you would like to file it as a suggestion, you can use our GitHub here to get it on our backlog to look at.
https://github.com/Microsoft/vscode-python

Hi, you could click the symbol in front of the function's first line so that it turns into > (the indented body of the function is now hidden). Then, if you select that whole line and the next line, shift+enter will run them together.

How can I change baselines code output/replay (PPO) on github?

I am trying to run my own version of the baselines reinforcement learning source code on GitHub (https://github.com/openai/baselines/tree/master/baselines/ppo2).
Whatever I do, I keep getting the same display, which looks like this:
Where can I edit it? I know I should edit the "learn" method, but I don't know how.

Those prints are the result of the following block of code, which can be found at this link (for the latest revision at the time of writing this at least):
if update % log_interval == 0 or update == 1:
    ev = explained_variance(values, returns)
    logger.logkv("serial_timesteps", update*nsteps)
    logger.logkv("nupdates", update)
    logger.logkv("total_timesteps", update*nbatch)
    logger.logkv("fps", fps)
    logger.logkv("explained_variance", float(ev))
    logger.logkv('eprewmean', safemean([epinfo['r'] for epinfo in epinfobuf]))
    logger.logkv('eplenmean', safemean([epinfo['l'] for epinfo in epinfobuf]))
    logger.logkv('time_elapsed', tnow - tfirststart)
    for (lossval, lossname) in zip(lossvals, model.loss_names):
        logger.logkv(lossname, lossval)
    logger.dumpkvs()
If your goal is to still print some things here, but different things (or the same things in a different format) your only option really is to modify this source file (or copy the code you need into a new file and apply your changes there, if allowed by the code's license).
If your goal is just to suppress these messages, the easiest way to do so would probably be by running the following code before running this learn() function:
from baselines import logger
logger.set_level(logger.DISABLED)
That's using this function to disable the baselines logger. It might also disable other baselines-related output though.
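To make the split between "which values are logged" and "how they are printed" a bit more concrete, here is a toy key-value logger. It is a hypothetical stand-in written for illustration, not the baselines implementation; the class name KVLogger and its enabled flag are assumptions.

```python
class KVLogger:
    """Hypothetical stand-in for a key-value logger; not the baselines code."""

    def __init__(self, enabled=True):
        self.enabled = enabled
        self._pending = {}

    def logkv(self, key, value):
        # Values are only collected here; nothing is printed yet.
        self._pending[key] = value

    def dumpkvs(self):
        # Format and emit everything collected since the last dump.
        lines = ["%s: %s" % (key, value) for key, value in self._pending.items()]
        self._pending.clear()
        if self.enabled:
            print("\n".join(lines))
        return lines


logger = KVLogger()
logger.logkv("nupdates", 1)
logger.logkv("fps", 1200)
printed = logger.dumpkvs()
```

In this picture, changing what is reported means changing the logkv calls in the block quoted above, while changing the layout of the report means changing whatever plays the role of dumpkvs.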

How do I perform a "yum update" using the Yumbase Python module?

Edit: So apparently my install wasn't working. This pointed me to a mailing list here, where I figured out which commands I was missing. The answer for the update is below. Now that I think about it, it does make sense. I just wish they'd put this somewhere simple on the dev pages.
import yum
yb = yum.YumBase()
yb.conf.assumeyes = True
yb.update(name='aws-cli')
yb.buildTransaction()
yb.processTransaction()
I'm trying to perform an update using yumbase when a server first boots with my kickstart script. At the moment I have a rather crude python subprocess to do "yum update" and would like to make this better.
I'm trying to hook into Yumbase, but the documentation is quite scarce. I have had a look at both the source code and documentation on this page: http://yum.baseurl.org/wiki/5MinuteExamples
I've figured out how to list all packages but not the ones that need updating using an SO answer from 2008: Given an rpm package name, query the yum database for updates
I've also figured out it's a very simple 3-line process to install a new package:
yb = yum.YumBase()
yb.conf.assumeyes = True
yb.install(name='aws-cli')
However the following doesn't work to "update" the package:
yb = yum.YumBase()
yb.conf.assumeyes = True
yb.update(name='aws-cli')
So what I need is:
1: A way to list the packages that need updating, much like "yum check-update"
2: Install the packages above using "yum update"

From what I can see in the yum code, it doesn't seem to be written to be used as a library. The code you gave is not the right way to do it; there's much more happening behind the scenes.
Basically, as of yum-3.4.3, the process looks like this:
->yummain.__main__
    <trap KeyboardInterrupt>
    ->yummain.user_main(sys.argv[1:], exit_code=True)
        <check YUM_PROF,YUM_PDB envvars, wrap the following into debugger/profiler if set>
        ->yummain.main(args)
            <set up locale, set up logging>
            -><create a YumBaseCli (child of YumBase & YumOutput)>
                <incl. fill a list field with YumCommand instances of known commands>
            ->cli.YumBaseCli.getOptionsConfig()
                <parse args into the YumBaseCli instance, includes initializing plugins>
            <obtain global yum lock>
            <check write permissions for current dir>
            ->cli.YumBaseCli.doCommands()
                <select a YumCommand from the list>
                ->YumCommand.needTs/needTsRemove if needed
                ->YumCommand.doCommand(self, self.basecmd, self.extcmds)
            <handle errors & set error code if any>
            'Resolving Dependencies'
            ->cli.YumBaseCli.buildTransaction()
                <check for an unfinished transaction>
                <resolve deps using the info written by the YumCommand into the object>
                <honor clean_requirements_on_remove, protected_packages,
                 protected_multilib, perform some checks>
            <handle errors & set error code if any>
            'Dependencies Resolved'
            ->cli.YumBaseCli.doTransaction()
                <download, transaction check, transaction test, transaction
                 using the info in the object>
            <handle errors & set error code if any>
            'Complete!'
            <release global yum lock>
        sys.exit(error_code)
As you can see, the main working sequence is embedded directly into main, so you can only replicate this logic in-process by calling it directly:
yummain.main(<sequence of cmdline arguments>)
This is just the same as running a separate process, minus the process isolation.
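If you stay with a separate process, one way to cover the "check-update" half is to rely on the exit code of `yum check-update`, which is 100 when updates are available, 0 when there are none, and 1 on error. The following is a minimal sketch under that assumption; the function name updates_available is made up, and the command is a parameter only so that the function can be tried on a machine without yum.

```python
import subprocess

# Exit codes of `yum check-update`.
UPDATES_AVAILABLE = 100
NO_UPDATES = 0


def updates_available(command=("yum", "check-update", "-q")):
    """Return True if the command reports pending updates, False if none.

    Raises RuntimeError for any other exit code (for example 1 on error).
    """
    returncode = subprocess.call(list(command))
    if returncode == UPDATES_AVAILABLE:
        return True
    if returncode == NO_UPDATES:
        return False
    raise RuntimeError("check-update failed with exit code %d" % returncode)
```

A kickstart script could then call updates_available() and, only if it returns True, run `yum -y update` through subprocess.check_call.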

Why does windows give an sqlite3.OperationalError and linux does not?

The problem
I've got a program that uses Storm 0.14, and it gives me this error on Windows:
sqlite3.OperationalError: database table is locked
The thing is, under Linux it works correctly.
I've got the impression that it happens only after a certain number of changes have been made, as it happens in some code that copies a lot of objects.
Turning on the debug mode gives me this on windows:
83 EXECUTE: 'UPDATE regularorder_product SET discount=? WHERE regularorder_product.order_id = ? AND regularorder_product.product_id = ?', (Decimal("25.00"), 788, 274)
84 DONE
85 EXECUTE: 'UPDATE repeated_orders SET nextDate=? WHERE repeated_orders.id = ?', (datetime.date(2009, 3, 31), 189)
86 ERROR: database table is locked
On linux:
83 EXECUTE: 'UPDATE regularorder_product SET discount=? WHERE regularorder_product.order_id = ? AND regularorder_product.product_id = ?', (Decimal("25.00"), 789, 274)
84 DONE
85 EXECUTE: 'UPDATE repeated_orders SET nextDate=? WHERE repeated_orders.id = ?', (datetime.date(2009, 3, 31), 189)
86 DONE
System info
Windows
    Windows XP SP 3
    Python 2.5.4
    NTFS partition
Linux
    Ubuntu 8.10
    Python 2.5.2
    ext3 partition
Some code
def createRegularOrderCopy(self):
    newOrder = RegularOrder()
    newOrder.date = self.nextDate
    # the exception is thrown on the next line,
    # while calling self.products.__iter__
    # this happens when this function is invoked the second time
    for product in self.products:
        newOrder.customer = self.customer
        newOrder.products.add(product)
    return newOrder
orders = getRepeatedOrders(date)
week = timedelta(days=7)
for order in orders:
    newOrder = order.createRegularOrderCopy()
    store.add(newOrder)
    order.nextDate = date + week
The question
Is there anything about sqlite3/python that differs between windows and linux? What could be the reason for this bug and how can I fix it?
Another observation
When adding a COMMIT at the place where the error happens, this error is thrown instead: sqlite3.OperationalError: cannot commit transaction - SQL statements in progress
Answers to answers
I'm not using multiple threads / processes, therefore concurrency shouldn't be a problem and also I've got only one Store object.

The "database table is locked" error is often a generic/default error in SQLite, so narrowing down your problem is not obvious.
Are you able to execute any SQL queries? I would start there, and get some basic SELECT statements working. It could just be a permissions issue.

Hard to say without a little more info on the structure of your database access (which is a little obscured by using Storm).
I'd start by reading these documents; they contain very relevant information:
https://storm.canonical.com/Manual#SQLite%20and%20threads
http://sqlite.org/lockingv3.html

Are you running any sort of anti-virus scanners? Anti-virus scanners will frequently lock a file after it has been updated, so that they can inspect it without it being changed. This may explain why you get this error after a lot of changes have been made; the anti-virus scanner has more new data to scan.
If you are running an anti-virus scanner, try turning it off and see if you can reproduce this problem.

It looks to me like storm is broken, though my first guess was virus scanner as Brian suggested.
Have you tried using sqlite3_busy_timeout() to set the timeout very high? This might cause SQLite3 to wait long enough for the lock holder, whoever that is, to release the lock.
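From Python's sqlite3 module, the busy timeout is set through the timeout argument of connect() rather than by calling the C function directly. Below is a minimal sketch with a plain sqlite3 connection; the in-memory path, the 60-second value, and the simplified table are assumptions for illustration, and how to pass the setting through Storm is not shown. Whether a longer timeout helps with this particular "table is locked" error is not certain.

```python
import sqlite3

# Hypothetical path; ":memory:" keeps the sketch self-contained.
DB_PATH = ":memory:"

# timeout is in seconds: sqlite3 waits this long for a lock to clear
# before raising OperationalError (the default is 5.0).
conn = sqlite3.connect(DB_PATH, timeout=60.0)

# Simplified stand-in for the table from the question.
conn.execute("CREATE TABLE repeated_orders (id INTEGER PRIMARY KEY, nextDate TEXT)")
conn.execute("INSERT INTO repeated_orders (id, nextDate) VALUES (?, ?)", (189, "2009-03-24"))
conn.execute("UPDATE repeated_orders SET nextDate=? WHERE id=?", ("2009-03-31", 189))
conn.commit()

row = conn.execute("SELECT nextDate FROM repeated_orders WHERE id=?", (189,)).fetchone()
print(row[0])
conn.close()
```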

I've solved the problem for the moment by replacing the sqlite3 DLL with the newest version. I'm still not sure whether this was a bug in the Windows code of SQLite or whether Python installed an older version on Windows than on Linux.
Thanks for your help.
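To check whether the two machines really ship different SQLite libraries, the version that Python is linked against can be printed on each of them. This is a small sketch using attributes of the standard sqlite3 module.

```python
import sqlite3

# Version of the SQLite library that this Python is using at runtime,
# as a string and as a tuple of integers.
library_version = sqlite3.sqlite_version
library_version_info = sqlite3.sqlite_version_info

print("SQLite library version:", library_version)
```

Running this on both Windows and Linux shows directly whether the DLL on Windows is older than the library on Linux.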
