How to download a single file from a git repository using python

I want to download a single file from my git repository using Python.
Currently I am using the GitPython library. Cloning works fine with the code below, but I don't want to download the entire repository.
import os
from git import Repo

git_url = 'stack@127.0.1.7:/home2/git/stack.git'
repo_dir = '/root/gitrepo/'

if __name__ == "__main__":
    Repo.clone_from(git_url, repo_dir, branch='master', bare=True)
    print("OK")

Don't think of a Git repo as a collection of files, but as a collection of snapshots. Git doesn't let you select which files you download, but it does let you select how many snapshots you download:
git clone stack@127.0.1.7:/home2/git/stack.git
will download all snapshots of all files, while
git clone --depth 1 stack@127.0.1.7:/home2/git/stack.git
will download only the latest snapshot of all files. You still download every file, but you leave out all of their history.
From these files you can simply pick the one you want and delete the rest:
import os
import git
import shutil
import tempfile
# Create temporary dir
t = tempfile.mkdtemp()
# Clone into temporary dir
git.Repo.clone_from('stack@127.0.1.7:/home2/git/stack.git', t, branch='master', depth=1)
# Copy desired file from temporary dir
shutil.move(os.path.join(t, 'setup.py'), '.')
# Remove temporary dir
shutil.rmtree(t)

You can also use subprocess in Python:
import subprocess
args = ['git', 'clone', '--depth=1', 'stack@127.0.1.7:/home2/git/stack.git']
# Capture stderr too; without stderr=PIPE, _error would always be None
res = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, _error = res.communicate()
# git writes progress to stderr even on success, so check the exit code
if res.returncode == 0:
    print(output)
else:
    print(_error)
However, your main problem remains.
Git does not support downloading parts of the repository. You have to download all of it. But you should be able to do this with GitHub. Reference

You need to request the raw version of the file! You can get it from raw.githubusercontent.com (formerly raw.github.com).

I don't want to flag this as a direct duplicate, since it does not fully reflect the scope of this question, but part of what Lucifer said in his answer seems to be the way to go, according to this SO post. In short, git does not allow a partial download, but certain providers (like GitHub) do, via raw content.
That being said, Python provides quite a number of libraries for downloading files, the best known being urllib.request.
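To make the raw-content route concrete, here is a minimal sketch using only urllib.request. It assumes a public GitHub repository; the helper names github_raw_url and download_file are made up for illustration and are not part of any library.

```python
import shutil
import urllib.request


def github_raw_url(user, repo, branch, path):
    """Build the raw-content URL for one file in a public GitHub repo."""
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{path}"


def download_file(url, dest):
    """Stream the response body at `url` into the local file `dest`."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

With placeholder names, a call would look like download_file(github_raw_url("some-user", "some-repo", "master", "setup.py"), "setup.py"). This only works for hosts that expose raw files over HTTP; it does not help with a plain SSH remote like the one in the question.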

Related

How to install utils.lib in anaconda to run a jupyter notebook for python?

In a jupyter notebook running on anaconda there is a line "import utils.lib as lib". When I run it, I get the error message "ModuleNotFoundError: No module named 'utils.lib'".
I searched the Internet for utils.lib so that I could install it, but I could not find it. Please let me know how to install it. Thank you -- Manoranjan Dash

You don't provide the code context for the import line. You don't always have to provide the complete context; however, you didn't provide anything, which is why the Bot tried to encourage you to improve things. Additional context, such as the source of the notebook or any related blog post, would also be important to include in your post.
The only other place I see that import come up on the internet is here. The code there shows how to get it and install it, under 'Downloading the utils and installing':
# Assumes DIR was already defined earlier in the notebook
LIB_DIRECTORY_PATH = DIR+'/utils'
# Check if utils directory already exist, otherwise download, and install
import os
import shutil
if not os.path.isdir(LIB_DIRECTORY_PATH):
    if not os.path.isdir(DIR+'/utils'):
        os.mkdir(DIR+'/utils')
    print('Downloading utils')
    user = "ruslanmv"
    repo = "Speech-Recognition-with-RNN-Neural-Networks"
    src_dir = "utils"
    pyfile = "lib.py"
    url = f"https://raw.githubusercontent.com/{user}/{repo}/master/{src_dir}/{pyfile}"
    !wget --no-cache --backups=1 {url}
    print("Installing library...")
    shutil.move(DIR+'/lib.py', DIR +'/utils/lib.py')
    print("Done.")
Source of the above code: https://ruslanmv.com/blog/Speech-Recognition-with-RNN-Neural-Networks
The code that this block retrieves and places in the correct location is found at this repo. It isn't something you install; you need to place it alongside the notebook. (Ideally, it looks to me, it is set up so that you just download or clone the repository.) The utils directory and its content are what you need to get, or make and copy, and place alongside your notebook.
Direct link to raw code it gets:
https://raw.githubusercontent.com/ruslanmv/Speech-Recognition-with-RNN-Neural-Networks/master/utils/lib.py
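If you would rather not rely on the !wget notebook magic, the same "download only if missing" step can be sketched in plain Python. The helper name ensure_local_module is hypothetical; the sketch assumes it runs from the folder that holds your notebook.

```python
import os
import urllib.request


def ensure_local_module(url, package_dir, filename):
    """Download `url` to package_dir/filename unless that file already exists."""
    os.makedirs(package_dir, exist_ok=True)
    target = os.path.join(package_dir, filename)
    if not os.path.isfile(target):
        urllib.request.urlretrieve(url, target)
    return target
```

Calling ensure_local_module with the raw URL above, "utils" and "lib.py" should leave a utils/lib.py next to the notebook, after which import utils.lib as lib has something to find.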

Create/Update .json file to GitHub Private Repository without making local working directory or cloning repo using Python

I have a private GitHub repository with three .json files in its root directory.
Suppose the three json files are:
1.json
2.json
3.json
I am trying to write a Python function to which I can pass any one of the .json files together with its contents, and which then makes a commit and pushes the changes.
I tried the solution from this question, but it seems outdated or unsupported: Python update files on Github remote repo without local working directory
The function should look like this:
def update_file_to_repo(file_name, file_content):
    # Do the push..
    ...
file_name holds 1.json or any other file name as a string, and file_content holds the contents as a string, produced with json.dumps() in my main function.

I didn't find any way to do it without cloning the repo to a local directory.
However, here is how to do it using a local directory:
Function for cloning the private repo to a local directory. You will also need to create a Personal Access Token (in your GitHub account settings) and make sure to give it repo permissions for cloning and making changes to the private repo.
import os

def initialize_repo():
    os.system("git config --global user.name \"your.username\"")
    os.system("git config --global user.email \"your_github_email_here\"")
    os.system(r"git clone https://username:token@github.com/username/repo.git folder-name")
Function for pulling the repo:
def pull_repo():
    os.system(r"cd /app/folder-name/ && git pull origin master")
    return
Function for pushing the repo:
from git import Repo  # GitPython

def push(pull="no"):
    PATH_OF_GIT_REPO = r'/app/folder-name/.git' # make sure .git folder is properly configured
    COMMIT_MESSAGE = 'commit done by python script'
    # try:
    repo = Repo(PATH_OF_GIT_REPO)
    if pull=="yes":
        pull_repo()
    repo.git.add(update=True)
    repo.index.commit(COMMIT_MESSAGE)
    origin = repo.remote(name='origin')
    origin.push()
    # except:
    #     print('Some error occurred while pushing the code')
    return
Now using these functions, just make a pull, then make the changes to the json or any other file you want and then do the push :)
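Putting the pull, edit and push steps together, the function from the question could be sketched roughly as below. This still needs the local clone, and it assumes the clone lives at /app/folder-name as in the functions above. write_repo_file is a hypothetical helper, and GitPython is imported lazily so the file-writing part can be tried on its own.

```python
import os


def write_repo_file(repo_dir, file_name, file_content):
    """Overwrite repo_dir/file_name in the local clone with file_content."""
    path = os.path.join(repo_dir, file_name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(file_content)
    return path


def update_file_to_repo(file_name, file_content, repo_dir="/app/folder-name"):
    """Pull, rewrite one file, then commit and push it with GitPython."""
    from git import Repo  # imported lazily; requires the GitPython package

    repo = Repo(repo_dir)
    origin = repo.remote(name="origin")
    origin.pull()
    write_repo_file(repo_dir, file_name, file_content)
    repo.index.add([file_name])  # path relative to the working tree
    repo.index.commit(f"Update {file_name}")
    origin.push()
```

A call would then look like update_file_to_repo("1.json", json.dumps(data)), with data being whatever you want stored in the file.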

Downloading all jupyter notebooks from Coursera [tar size exceeding 100MB]

As mentioned in the Coursera help articles, in order to download the notebooks from a class we need to zip all the content of the root folder into a single file and download the final workspace.tar.gz. But this does not work for all courses.
Does anyone know the proper way to do this?

Open the home folder of your Coursera Jupyter notebook:
You can do this by opening any of the course notebooks and then selecting File > Open, or by clicking the Jupyter icon at the top left corner of the notebook.
Open a terminal inside the notebook:
On the home page of your notebooks, at the top left corner, select New > Terminal
Check which directory you are in:
This is important, as different courses keep their materials in different directories!
Some courses have a directory named jovyan, and inside it you generally have two folders: work and work-ro.
work holds your actual content, which you can see on your notebook home page.
work-ro holds only a read_only folder. You have this same folder in your work directory, but you can't open its content after downloading! (I don't know why I can't open it.)
It turns out that this folder contains the images used in your notebooks. That is why you have to zip both of these folders.
Not every course has a folder named work!
In some courses the materials are directly inside the root directory. In such cases you can find the directory with your material by looking for a folder whose name ends with -ro.
Ex: in one of my courses I located a folder named TF-ro, and there was another folder named TF containing all the course material. Following the pattern above, TF-ro contained the read_only folder.
Just in case you are wondering how to navigate inside the terminal, use these commands:
ls: list everything inside the folder
cd: change the folder you are currently in
Ex: cd .. #go to the parent folder
Ex: cd <dirname> #go to the specified folder
Compress both folders using tar:
Navigate to the folder that contains both of these folders, i.e. work and work-ro, or TF and TF-ro if you are in my second case, or the equivalent folders in your case.
Use this to make the tar file:
Use this when your folder contains only the two directories that you want:
tar -czvf <choose a name>.tar.gz <address of dir to compress>
Ex: tar -czvf data.tar.gz ./
Use this when you are in the root folder and there are other directories alongside the folders you want:
tar -czvf <choose a name>.tar.gz <dir1 address> <dir2 address>
Ex: tar -czvf data.tar.gz ./work ./work-ro
Just in case you are wondering:
./ means the current folder.
Check the size of your tar file:
This is also important!
If making the tar file takes too long, or your terminal appears to be frozen, then there are some big files in your home folder.
You can check the size of your tar file using: ls -lh data.tar.gz
Normally the size should not be more than 10 - 15 MB.
If the size is in GBs, then you are mostly archiving large datasets and csv files!
You cannot download big files like this!
[Workarounds for this problem are mentioned below]
Run this command: du
This lists all the directories in the current folder together with their sizes.
Figure out which folder is the largest.
Note: the sizes shown by this command are in blocks, where 1 block = 1024 bytes.
Exclude these folders while making the tar.
To remove the previous tar file, run rm data.tar.gz
Make the tar like this:
tar -czvf <yourName>.tar.gz --exclude=<address to exclude> <dir/dirs to zip>
Ex: tar -czvf data.tar.gz --exclude=./work/data --exclude=./work/- ./work ./work-ro
Move the file:
On your class's notebook home page you can only see the content of the work folder (or whichever folder your content is in).
This is why we move the tar file to that folder.
Move it using this command: mv <file name> <location>
Ex: mv data.tar.gz ./work
Download your file:
Now you can see your file in your home folder in your browser. Simply select the file and you will see a download option at the top!
Sometimes you don't see the download button at the top. In such cases:
right click your file > Save Link As > then save it with the .tar.gz extension
Just to confirm, compare the size of the file you have downloaded with the one in your classroom!
Workaround for downloading big data sets:
Your course generally does not use all the csv files or data sets stored in the data folder. When you do the assignments, see which files or data sets are used and download only those manually, i.e. open each file in your classroom and download it using File > Download.
If you still want the entire thing, then make a separate tar file of that folder only. Then split the tar file (you will easily find how online) and download the parts as I described earlier.
After the download it is necessary to concatenate the parts:
cat allfiles.tar.gz.part.* > allfiles.tar.gz
I would suggest not wasting time on this! Just download what is required and that's it!
I hope this was helpful, because I spent 5 hours figuring out how to do it! ENJOY!
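If you prefer to build the archive from a notebook cell instead of the terminal, the tar-with-exclusions step above can be approximated with Python's standard tarfile module. This is only a sketch: make_archive and its parameters are names chosen here, and it matches excluded paths exactly rather than by pattern as tar --exclude does.

```python
import os
import tarfile


def _norm(path):
    """Normalise a path the way tarfile stores member names (no leading ./ or /)."""
    return os.path.normpath(path).lstrip(os.sep)


def make_archive(archive_path, sources, excluded=()):
    """Write a gzip-compressed tar of `sources`, skipping `excluded` paths.

    Returns the size of the resulting archive in bytes.
    """
    skip = {_norm(p) for p in excluded}

    def keep(member):
        # Returning None drops the member; a dropped directory is not descended into.
        return None if _norm(member.name) in skip else member

    with tarfile.open(archive_path, "w:gz") as archive:
        for source in sources:
            archive.add(source, filter=keep)
    return os.path.getsize(archive_path)
```

For the layout described above, make_archive("data.tar.gz", ["./work", "./work-ro"], excluded=["./work/data"]) would correspond to the tar example with a single --exclude, and the returned byte count lets you check the size before downloading.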

Alternatively, you could initialize a git repo and push it to your GitHub account.
Open a terminal (Jupyter home > New > Terminal)
Run the following commands (I'm assuming you've already created a GitHub repo; if not, create one first. You'll need the link to your repo):
git init
git config --global user.name "test"
git config --global user.email "test"
git add -A; git commit -m "commit"
git remote add origin <_your-github-repo-url_>
git push origin master -u --verbose

You can compress the whole programming exercise (notebook + data) by placing these commands at the beginning of your notebook:
import os
!tar chvfz notebook.tar.gz *
print("File size: " + str(os.path.getsize("notebook.tar.gz")/1e6) + " MB")
if os.path.getsize("notebook.tar.gz")/1e6 > 100:
    print("Splitting file")
    !split -b 100M notebook.tar.gz "notebook.tar.gz."
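The split and the later re-join can also be done without shell commands. The sketch below is a plain-Python stand-in for split -b and cat; the .part.NNN suffix and the function names are assumptions made for this example, and each chunk is read into memory in one go.

```python
def split_file(path, chunk_size):
    """Split `path` into numbered parts of at most `chunk_size` bytes each."""
    parts = []
    with open(path, "rb") as source:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            part_path = f"{path}.part.{len(parts):03d}"
            with open(part_path, "wb") as part:
                part.write(chunk)
            parts.append(part_path)
    return parts


def join_files(parts, dest):
    """Concatenate `parts`, in the given order, into `dest`."""
    with open(dest, "wb") as out:
        for part_path in parts:
            with open(part_path, "rb") as part:
                out.write(part.read())
    return dest
```

After downloading the parts, join_files(sorted(parts), "notebook.tar.gz") rebuilds the archive; the zero-padded numbering keeps the sorted order correct.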

How to execute a git command in a specified path?

I want to execute a git command from a Python program.
I have tried
os.system("git-command")
As we know, a git command only runs correctly inside a directory that contains a repository. I printed the current path and it is not the one I expected; it does not contain a repository.
So my question is: how do I execute a git command in a specified path?

Use the subprocess module; pick one of the functions that suits your needs (based on what output you need). The functions all take a cwd argument that lets you specify the directory to operate in:
import subprocess
output = subprocess.check_output(['git', 'status'], cwd='/path/to/git/workingdir')
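As a slightly fuller sketch of the same idea, subprocess.run accepts the same cwd argument and can also capture text output and raise on failure. The wrapper names run_in_dir and git_status are made up for this example.

```python
import subprocess


def run_in_dir(command, directory):
    """Run `command` with `directory` as its working directory; return stdout."""
    result = subprocess.run(
        command,
        cwd=directory,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


def git_status(repo_dir):
    """Return the output of `git status --short` for the repo at `repo_dir`."""
    return run_in_dir(["git", "status", "--short"], repo_dir)
```

Because the directory is passed per call, the Python process itself never has to change its own working directory.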

Using GitPython:
from git import Repo
repo = Repo("/path/to/repo")
git = repo.git
print(git.status())

git folder download utility with python?

Hello, is there a good utility or package that handles git folder download?
Example:
vendors = {
    'htmlpurifier' : 'http://repo.or.cz/w/htmlpurifier.git'
}
for key in vendors:
    # someutility.get('http://repo.or.cz/w/htmlpurifier.git', 'htmlpurifier')
    someutility.get(vendors[key], key)
    # get http://repo.or.cz/w/htmlpurifier folder to /htmlpurifier on local storage?
Is there anything similar?

I prefer to use git commands directly and wrap them using the subprocess module.
However, if you are looking for modules to interact with Git, I can think of
dulwich : http://www.samba.org/~jelmer/dulwich/docs/index.html
git-python: http://gitorious.org/projects/git-python/
For git-python in particular, please look at the class Repo. It has a function:
fork_bare(path, **kwargs)
Fork a bare git repository from this repo
path is the full path of the new repo (traditionally ends with name.git)
options is any additional options to the git clone command
Returns git.Repo (the newly forked repo)
Also you can check out: http://packages.python.org/GitPython/0.3.2/tutorial.html#using-git-directly
git = repo.git
git.checkout('HEAD', b="my_new_branch")

GitPython is a python library used to interact with git repositories
-- GitPython docs
If by "git folder download" you mean clone the Git repository this should do it:
from git import Repo
repo_url = "http://repo.or.cz/w/htmlpurifier.git"
local_dir = "/Users/user1/gitprojects/"
Repo.clone_from(repo_url, local_dir)
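To get closer to the someutility.get loop in the question, the clone call can be wrapped in a small helper that walks a name-to-URL dictionary. This is a sketch only: fetch_vendors and clone_with_gitpython are hypothetical names, and the clone function is passed in as a parameter so you can swap in a subprocess-based one.

```python
import os


def clone_with_gitpython(url, destination):
    """Clone `url` into `destination` using GitPython (imported lazily)."""
    from git import Repo  # requires the GitPython package

    Repo.clone_from(url, destination)


def fetch_vendors(vendors, base_dir, clone=clone_with_gitpython):
    """Clone every repository in `vendors` into base_dir/<name>.

    Folders that already exist are left untouched.
    """
    fetched = {}
    for name, url in vendors.items():
        destination = os.path.join(base_dir, name)
        if not os.path.isdir(destination):
            clone(url, destination)
        fetched[name] = destination
    return fetched
```

With the dictionary from the question, fetch_vendors(vendors, ".") would clone htmlpurifier into ./htmlpurifier.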
