Using black with git clean filter - python

I'm trying to set up black to run on any files checked into git.
I've set it up as follows:
git config filter.black.clean 'black -'
echo '*.py filter=black' >> .git/info/attributes
As far as I understand, this should work: black with - as the source path reads from STDIN and writes to STDOUT, which is what I think a git clean filter needs it to do.
However this does not work. When I add an un-black file with git add I see the following output:
reformatted -
All done! ✨ 🍰 ✨
1 file reformatted.
And the file is not changed on disk. What am I doing wrong?

The Black documentation recommends using a pre-commit hook, rather than smudge and clean filters. Note that filter.black.clean defines a clean filter and you have not set up any smudge filters.
The reason you see no change to the work-tree version of the file is that a clean filter is used when turning the work-tree version of a file into the index (to-be-committed) version of the file. This has no effect on the work-tree version of the file!
A smudge filter is used in the opposite direction: Git has a file that is in the index—for whatever reason, such as because it was just copied into the index as part of the git checkout operation to switch to a specific commit—and desires to convert that in-the-index, compressed, Git-ized file to one that you can actually see and edit in your editor, or run with python. Git will, at this time, run the (de-compressed) file contents through your smudge filter.
Note that if you convert some file contents, pre-compression, in a clean filter, and then later extract that file from the repository, into the index, and on into your work-tree, you will at that time be able to see what happened in the clean filter (assuming you do not have a countervailing smudge filter that undoes the effect).
In a code-reformatting world, one could conceivably use a clean filter to turn all source files into some sort of canonical (perhaps four-space-indentation) form, and a smudge filter to turn all source files into one's preferred format (two-space or eight-space indentation). If these transformations are all fully reversible, what you would see, in your work-tree, would be in your preferred format, and what others would see in their work-trees would be their preferred format; but what the version control system itself would see would be the canonical, standardized format.
That's not how Black is actually intended to be used, though it probably can be used that way.
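For illustration only, a roughly reversible pair can be built from the standard expand/unexpand utilities (a sketch: the "indent" filter name is made up, the transformations are only approximately inverse, and Black itself provides no smudge counterpart):
git config filter.indent.clean 'expand --tabs=4'
git config filter.indent.smudge 'unexpand --tabs=4 --first-only'
echo '*.py filter=indent' >> .git/info/attributes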

It's not obvious how to set this up manually, but using the pre-commit framework seems to work well.
This approach does two things:
checks prior to committing that the files pass the black test
runs black to fix the files if they don't
So if files don't pass, the commit fails, and black fixes the files, which you then need to git add before trying to commit again.
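For reference, a typical .pre-commit-config.yaml for this looks like the following (the rev shown is just an example; pin it to whatever Black release you actually use):
repos:
  - repo: https://github.com/psf/black
    rev: 24.3.0
    hooks:
      - id: black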
In a separate test, I managed to set up the pre-commit hook manually using black . --check in .git/hooks/pre-commit (this just does the check - doesn't fix anything if it fails), but never figured out how to configure black to work with the clean and smudge filters.

Related

Get method names and lines from git commit - python

I am wondering if there is a way to get the list of method names that are modified, along with the modified lines and file paths, from a git commit, or if there is any better solution for Python files.
I have been going through a few answers on Stack Overflow where the suggestion was to edit the gitconfig file and .gitattributes, but I couldn't get the desired output.
Git has no notion of the semantics of the files it tracks. It can just give you the lines added or removed from a file or the bytes changed in binary files.
How these map to your methods is beyond Git's understanding. However, there might be tools and/or scripts that will try to heuristically determine which methods are affected by a (range of) commits.
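As a rough heuristic sketch (not a Git feature), you can parse the hunk headers that git show emits; if you set *.py diff=python in .gitattributes, Git's built-in Python diff driver puts the enclosing def/class in each hunk header:
import re
import subprocess

def changed_regions(commit="HEAD"):
    # Diff of the given commit: zero context lines, no commit-message header
    out = subprocess.run(
        ["git", "show", "--unified=0", "--format=", commit],
        capture_output=True, text=True, check=True,
    ).stdout
    path = None
    for line in out.splitlines():
        if line.startswith("+++ b/"):
            path = line[len("+++ b/"):]
        elif line.startswith("+++ "):
            path = None  # deleted file (+++ /dev/null)
        elif path and line.startswith("@@"):
            # hunk header, e.g.: @@ -10,2 +12,3 @@ def frobnicate(self):
            m = re.match(r"@@ -\S+ \+(\d+)(?:,(\d+))? @@ ?(.*)", line)
            if m:
                start = int(m.group(1))
                count = int(m.group(2) or "1")
                print(path, f"lines {start}-{start + count - 1}", m.group(3))

changed_regions()
It is only as good as Git's function-detection heuristics; for exact method names you would need to parse the files themselves (e.g. with Python's ast module).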

How to make Snakemake ignore an updated file when determining which input files have changed? [duplicate]

Even if the output files of a Snakemake build already exist, Snakemake wants to rerun my entire pipeline only because I have modified one of the first input or intermediary output files.
I figured this out by doing a Snakemake dry run with -n, which gave the following report for an updated input file:
Reason: Updated input files: input-data.csv
and this message for updated intermediary files:
reason: Input files updated by another job: intermediary-output.csv
How can I force Snakemake to ignore the file update?
You can use the option --touch to mark them up to date:
--touch, -t
Touch output files (mark them up to date without
really changing them) instead of running their
commands. This is used to pretend that the rules were
executed, in order to fool future invocations of
snakemake. Fails if a file does not yet exist.
Beware that this will touch all your files and thus modify the timestamps to put them back in order.
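For example:
snakemake --touch
snakemake -n    # the dry run should now report nothing to be done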
In addition to Eric's answer, see also the ancient marker, which makes Snakemake ignore the timestamps of the input files it wraps.
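A sketch of its use, with the file names from the question (the rule name and shell command are hypothetical):
rule process:
    input:
        ancient("input-data.csv")    # timestamp of this input is ignored
    output:
        "intermediary-output.csv"
    shell:
        "my_processing_script {input} > {output}"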
Also note that the Unix command touch can be used to modify the timestamp of an existing file and make it appear older than it actually is:
touch --date='2004-12-31 12:00:00' foo.txt
ls -l foo.txt
-rw-rw-r-- 1 db291g db291g 0 Dec 31 2004 foo.txt
In case --touch didn't work out as expected (the official documentation says it may need to be combined with --force, --forceall or --forcerun to force the "touch" if it doesn't work by itself), ancient is not an option or would require modifying too much of the workflow file, or you faced https://github.com/snakemake/snakemake/issues/823 (that's what happened to me when I tried --force and --force*), here is what I did to solve the problem:
I noticed that there were jobs that shouldn't be running since I put files in the expected paths.
I identified the input and output files of the rules that I didn't want to run.
Following the order in which the rules were being executed (and that I didn't want to run), I executed touch on each rule's input files and then on its output files (taking into account the order of the rules!); see the sketch below.
That's it. Since the timestamps now follow the rule order and the input/output relationships, snakemake will not detect any "updated" files.
This is the manual method, and I think of it as the last resort when the methods mentioned by the other answers don't work or are not an option somehow.
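In shell terms, with the file names from the question (a rule's inputs first, then its outputs, rule by rule down the workflow):
touch input-data.csv
touch intermediary-output.csv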

git reset --hard HEAD vs git checkout <file>

I have a file foo.py. I have made some changes to the working directory, but not staged or committed any changes yet. I know I can use git checkout foo.py to get rid of these changes. I also read about using git reset --hard HEAD, which essentially resets your working directory, staging area and commit history to match the latest commit.
Is there any reason to prefer using one over the other in my case, where my changes are still in working directory?
Is there any reason to prefer using one over the other in my case, where my changes are still in working directory?
No, since they will accomplish the same thing.
Is there any reason to prefer using [git checkout -- path/to/file] over [git reset --hard] in [general but not in my specific case]?
Yes: this will affect only the one file. If your habit is to use git reset --hard to undo changes to one file while you also have work-tree and/or staged changes to other files, you may be out of luck getting those other changes back without reconstructing them by hand.
Note that there is a third construct: git checkout HEAD path/to/file. The difference between this and the one without HEAD (with -- instead¹) is that the one with HEAD means copy the version of the file that is in a permanent, unchangeable commit into the index / staging-area first, and then into the work-tree. The one with -- means copy the version of the file that is in the index / staging-area into the work-tree.
¹The reason to use -- is to make sure Git never confuses a file name with anything else, like a branch name. For instance, suppose you name a file master, just to be obstinate. What, then, does git checkout master mean? Is it supposed to check out branch master, or extract file master? The -- in git checkout -- master makes it clear to Git—and to humans—that this means "extract file master".
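For instance, with a file literally named master:
git checkout master      # checks out the branch named master
git checkout -- master   # restores the file named master from the index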
Summary, or, things to keep in mind
There are, at all times, three active copies of each file:
one in HEAD;
one in the index / staging-area;
one in the work-tree.
The git status command looks at all three, comparing HEAD-vs-index first—this gives Git the list of "changes to be committed"—and then index-vs-work-tree second. The second one gives Git the list of "changes not staged for commit".
The git add command copies from the work-tree, into the index.
The git checkout command copies either from HEAD to the index and then the work-tree, or just from the index to the work-tree. So it's a bit complicated, with multiple modes of operation:
git checkout -- path/to/file: copies from index to work-tree. What's in HEAD does not matter here. The -- is usually optional, unless the file name looks like a branch name (e.g., a file named master) or an option (e.g., a file named -f).
git checkout HEAD -- path/to/file: copies from HEAD commit, to index, then to work-tree. What's in HEAD overwrites what's in the index, which then overwrites what's in the work-tree. The -- is usually optional, unless the file name looks like an option (e.g., -f).
It's wise to use the -- always, just as a good habit.
The git reset command is complicated (it has many modes of operation).
(This is not to say that git checkout is simple: it, too, has many modes of operation, probably too many. But I think git reset is at least a little bit worse.)
Maybe this diagram can help (image not reproduced here).
Source: https://www.patrickzahnd.ch
Under your constraints, there's no difference. If there ARE staged changes, there is, though:
reset reverts staged changes.
checkout does not.
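A quick illustration with foo.py from the question (assuming a staged edit plus a newer, unstaged one):
git add foo.py               # stage the current edit
echo '# more' >> foo.py      # make a further, unstaged edit
git checkout -- foo.py       # work-tree reverts to the staged version
git reset --hard             # work-tree and index both revert to HEAD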

Python watchdog event not returning entire src_path

I'm using python watchdog to keep track of what files have been changed locally. Because I'm not keeping track of an entire directory but specific files, I'm using watchdog's event.src_path to check if the changed file is the one I'm looking for.
I'm using the FileSystemEventHandler and on_modified, printing the src_path. However, when I edit a file that should have the path /home/user/project/test in gedit, I get two paths, one that looks like /home/user/project/.goutputstream-XXXXXX and one that looks something like this: home/user/project/. I never get the path I'm expecting. I thought there may have been something wrong with watchdog or my own code, but I tested the exact same process in vi, nano, my IDE (PyCharm), Sublime Text, Atom...and they all gave me the src_path I'm expecting.
I'm wondering if there is a workaround for gedit, since gedit is the default text editor for many Linux distributions...Thanks in advance.
From the Watchdog GitHub readme:
Vim does not modify files unless directed to do so. It creates backup files
and then swaps them in to replace the files you are editing on the
disk. This means that if you use Vim to edit your files, the
on-modified events for those files will not be triggered by watchdog.
You may need to configure Vim appropriately to disable this
feature.
As the quote says, your issue is due to how these text editors modify files. Rather than directly modifying the file, they create "buffer" files that store the edited data. In your case this file is probably .goutputstream-XXXXXX. When you hit save, your original file is deleted and the buffer file is renamed into its place. So your second path is probably the result of the original file being deleted. Sometimes these files serve as backups instead, but they still cause similar issues.
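One code-level workaround is to also listen for moved events, since the rename arrives as on_moved with dest_path set to the real file. A minimal sketch, using the hypothetical paths from the question:
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

WATCHED = "/home/user/project/test"

class Handler(FileSystemEventHandler):
    def on_modified(self, event):
        if event.src_path == WATCHED:
            print("modified in place:", event.src_path)

    def on_moved(self, event):
        # gedit-style save: .goutputstream-XXXXXX renamed onto the target
        if event.dest_path == WATCHED:
            print("replaced via rename:", event.dest_path)

observer = Observer()
observer.schedule(Handler(), "/home/user/project", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()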
By far the easiest method to solve this issue is to disable the weird way of saving in your chosen text editor. In gedit this is done by unchecking the "Create a backup copy of file before saving" option within preferences. This will stop those backup files from being created and simplify life for watchdog.
Image and preference info shamelessly stolen from this AskUbuntu question
For more information (and specific information for solving vim/vi) see this issue on the watchdog GitHub.
Basically for Vim you need to run these commands to disable the backup/swapping in feature:
:set nobackup
:set nowritebackup
You can add them to your .vimrc to automate the task.
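For example (drop the leading colons in the file):
" in ~/.vimrc: keep Vim from saving via backup-and-rename
set nobackup
set nowritebackup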

Vim plugins don't always load?

I was trying to install autoclose.vim to Vim. I noticed I didn't have a ~/.vim/plugin folder, so I accidentally made a ~/.vim/plugins folder (notice the extra 's' in plugins). I then added au FileType python set rtp += ~/.vim/plugins to my .vimrc, because from what I've read, that will allow me to automatically source the scripts in that folder.
The plugin didn't load for me until I realized my mistake and took out the extra 's' from 'plugins'. I'm confused because this new path isn't even defined in my runtime path. I'm basically wondering why the plugin loaded when I had it in ~/.vim/plugin but not in ~/.vim/plugins?
:help load-plugins outlines how plugins are loaded.
Adding a folder to your rtp alone does not suffice; it must have a plugin subdirectory. For example, given :set rtp+=/tmp/foo, a file /tmp/foo/plugin/bar.vim would be detected and loaded, but neither /tmp/foo/plugins/bar.vim nor /tmp/foo/bar.vim would be.
You are on the right track with set rtp+=..., but there's a bit more to it (rtp is non-recursive; help indexing; many corner cases) than meets the eye, so it is not a very good idea to do it by yourself unless you are ready for a months-long drop in productivity.
If you want to store all your plugins in a special directory you should use a proper runtimepath/plugin-management solution. I suggest Pathogen (rtp-manager) or Vundle (plugin-manager) but there are many others.
In addition to Nikita Kouevda's answer: modifying rtp on the FileType event may be too late for Vim to load any plugins from the modified runtimepath. If this event fires after vimrc was sourced, plugins from the new addition are not guaranteed to be loaded; if it fires after the VimEnter event, plugins from the new addition are guaranteed not to be sourced automatically.
If you want to source autoclose only when you edit python files, you should use :au FileType python :source ~/.vim/macros/autoclose.vim (note: macros here may be any subdirectory except plugin and the directories found in $VIMRUNTIME, or even a directory not found in runtimepath at all).
If you want to use autoclose only when you edit python files you should check out plugin source and documentation, there must be support on the plugin side for it to work.
Or, if autoclose does not support this, use the :au FileType command from the above paragraph, but prepend the :source with something that records Vim's state (commands, mappings and autocommands), append the same after it, find the differences in state, and then on each :au BufEnter delete the differences if the filetype is not python and restore them otherwise: hacky, and it may introduce strange bugs. An example of state-recording and diff-determining code may be found here.
All folders in the rtp (runtimepath) option need to have the same folder structure as your $VIMRUNTIME ($VIMRUNTIME is usually /usr/share/vim/vim{version}). So it should have the same subdirectory names e.g. autoload, doc, plugin (whichever you need, but having the same names is key). The plugins should be in their corresponding subdirectory.
Let's say /path/to/dir (in your case it's ~/.vim) is in your rtp; vim will
look for global plugins in /path/to/dir/plugin
look for file-type plugins in /path/to/dir/ftplugin
look for syntax files in /path/to/dir/syntax
look for help files in /path/to/dir/doc
and so on...
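So a working layout for the question's case would look something like this (with autoclose.vim standing in for the plugin file):
~/.vim/
├── autoload/
├── doc/
├── ftplugin/
└── plugin/
    └── autoclose.vim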
vim only looks for a couple of recognized subdirectories† in /path/to/dir. If you have some unrecognized subdirectory name in there (like /path/to/dir/plugins), vim won't see it.
† "recognized" here means that a subdirectory of the same name can be found in /usr/share/vim/vim{version} or wherever you have vim installed.
