Downloading all Jupyter notebooks from Coursera [tar size exceeding 100MB] - python

As mentioned in the Coursera help articles, to download the notebooks from a class we need to compress the entire contents of the root folder into a single file and download the resulting workspace.tar.gz. However, those steps do not work for all courses.
Does anyone know the proper way to do this?

Open the home folder of your Coursera Jupyter notebook:
You can do this by opening any of the course notebooks and then selecting File > Open, or by clicking the Jupyter icon at the top left corner of the notebook.
Open a terminal inside the notebook:
On the home page of your notebooks, at the top left corner select New > Terminal.
Check which directory you are in:
This is important, as different courses keep their materials in different directories!
Some courses have a directory named jovyan, and inside it you generally have two folders, work and work-ro.
work holds your actual content, which you can see on your notebook home page.
work-ro holds only a read_only folder. The same folder also appears in your work directory, but you can't open its contents after downloading! (I don't know why I can't open it.)
It turns out that this folder contains the images used in your notebooks. That is the reason you have to archive both of these folders.
Not every course has a folder named work!
In some courses the materials are directly inside the root directory. In such cases you can find the directory with your material by looking for a folder whose name ends with -ro.
Ex: in one of my courses I located a folder named TF-ro, and there was another folder named TF containing all the course material! Following the pattern above, TF-ro contained the read_only folder.
In case you are wondering how to navigate inside the terminal, use these commands:
ls: list everything inside the current folder
cd: change the folder you are currently in
Ex:
cd ..         # go to the parent folder
cd <dirname>  # go to the specified folder
Compress both folders using tar:
Navigate to the folder that contains both of these folders, i.e. work and work-ro, or TF and TF-ro in my second case, or the equivalent folders in your case.
Use this to make the tar file:
Use this when your folder contains only the two directories that you want:
tar -czvf <choose a name>.tar.gz <address of dir to compress>
Ex: tar -czvf data.tar.gz ./
Use this when you are in the root folder and there are other directories alongside the ones you want:
tar -czvf <choose a name>.tar.gz <dir1 address> <dir2 address>
Ex: tar -czvf data.tar.gz ./work ./work-ro
Just in case you are wondering, ./ means the current folder.
Check the size of your tar file:
This is also important!!
If making the tar file is taking too long, or your terminal appears to be frozen, then there are some big files in your home folder.
You can check the size of your tar file using: ls -lh data.tar.gz
Normally the size should not be more than 10-15 MB.
If the size is in GBs, then you are most likely including a large amount of datasets and CSV files!
You cannot download files that big this way!
[Workarounds for this problem are mentioned below]
Run this command: du
This lists every directory under the current folder along with its size.
Figure out which folders are the largest.
Note: by default the sizes shown by this command are in blocks, where 1 block = 1024 bytes.
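If you would rather do this check from a notebook cell than from the terminal, here is a rough Python sketch of the same idea. The function names dir_size_bytes and largest_subdirs are my own, and unlike du it adds up file sizes in bytes rather than disk blocks, so the numbers will differ slightly from what du prints.

```python
import os
from pathlib import Path


def dir_size_bytes(directory):
    """Sum the sizes of all regular files under `directory` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = Path(root) / name
            if not path.is_symlink():
                total += path.stat().st_size
    return total


def largest_subdirs(parent):
    """Return (name, size in bytes) for each immediate subdirectory, largest first."""
    sizes = [(p.name, dir_size_bytes(p)) for p in Path(parent).iterdir() if p.is_dir()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

Calling largest_subdirs(".") from the folder that holds work and work-ro should put the data-heavy folders at the top of the list.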
Exclude these folders while making the tar file.
To remove the previous tar file, run: rm data.tar.gz
Make the tar file like this:
tar -czvf <yourName>.tar.gz --exclude=<address to exclude> <dir/dirs to zip>
Ex: tar -czvf data.tar.gz --exclude=./work/data --exclude=./work/- ./work ./work-ro
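The same exclude step can be sketched in Python with the standard tarfile module, in case you prefer to build the archive from a notebook cell. This is only an illustration: make_archive is a name I made up, and the paths are assumed to be relative to the current folder, just like in the tar command above.

```python
import tarfile
from pathlib import Path


def make_archive(archive_name, sources, excluded=()):
    """Create a gzip-compressed tar of `sources`, skipping any path under `excluded`."""
    excluded_paths = [Path(e) for e in excluded]

    def skip_excluded(member):
        member_path = Path(member.name)
        for ex in excluded_paths:
            if member_path == ex or ex in member_path.parents:
                return None  # returning None drops the member and everything below it
        return member

    with tarfile.open(archive_name, "w:gz") as tar:
        for source in sources:
            tar.add(source, filter=skip_excluded)
```

For example, make_archive("data.tar.gz", ["./work", "./work-ro"], excluded=["./work/data"]) mirrors the command above without the second --exclude.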
Move the file:
On your class's notebook home page you can only see the content of the work folder (or whichever folder your content is in).
This is why we move the tar file into that folder.
Move it using this command: mv <file name> <location>
Ex: mv data.tar.gz ./work
Download your file:
Now you can see your file in your home folder in your browser. Simply select the file and you will see a Download option at the top!
Sometimes you don't see the Download button at the top. In such cases...
Right-click your file > Save Link As > then save it with the .tar.gz extension.
Just to confirm, compare the size of the file you downloaded with the one in your classroom!
Workaround for downloading big data sets:
Your course generally does not use all the CSVs or data sets stored in the data folder. When you do the assignments, note which files or data sets are used and download only those manually, i.e. open each file in your classroom and download it using File > Download.
If you still want the entire thing, then make a separate tar file of that folder only. Then split the tar file (you will find how to do it online easily) and download the parts as I have described earlier!
After the download you need to concatenate the parts:
cat allfiles.tar.gz.part.* > allfiles.tar.gz
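If you don't have cat on the machine you download to (e.g. on Windows), the split and the re-join can be sketched in Python as well. The names split_file and join_files and the .part.NNN suffix are my own choices, picked to match the allfiles.tar.gz.part.* pattern above; each part is read fully into memory, so keep the chunk size reasonable.

```python
from pathlib import Path


def split_file(path, chunk_bytes):
    """Write `path` out as numbered parts (<name>.part.000, ...) and return the part paths."""
    source = Path(path)
    parts = []
    with source.open("rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break
            part = source.with_name(f"{source.name}.part.{index:03d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_files(parts, destination):
    """Concatenate `parts` in sorted name order into `destination`."""
    with open(destination, "wb") as out:
        for part in sorted(parts):
            out.write(Path(part).read_bytes())
```

The zero-padded part numbers keep the parts in the right order when they are sorted by name, which is what the cat command relies on too.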
I would suggest not wasting time on this!! Just download what is required and that's it!!
I hope this was helpful!! Because I spent 5 hours figuring out how to do it!! ENJOY!!

Alternatively, you could initialize a git repo and push it to your GitHub account.
Open a terminal (Jupyter home > New > Terminal)
Run the following commands: (I'm assuming you've already created a GitHub repo; if not, create one first. You'll need the link to your repo.)
git init
git config --global user.name "test"
git config --global user.email "test"
git add -A; git commit -m "commit"
git remote add origin <_your-github-repo-url_>
git push origin master -u --verbose

You can compress the whole programming exercise (notebooks + data) by placing these commands at the beginning of your notebook:
import os
# c = create, h = follow symlinks, v = verbose, f = archive file name, z = gzip
!tar chvfz notebook.tar.gz *
print("File size: " + str(os.path.getsize("notebook.tar.gz")/1e6) + " MB")
# Split into 100 MB pieces only when the archive is larger than 100 MB.
if os.path.getsize("notebook.tar.gz")/1e6 > 100:
    print("Splitting file")
    !split -b 100M notebook.tar.gz "notebook.tar.gz."

Related

Get all files out of a lot of folders?

I want to get all files out of a set of folders.
I restored my hard disk and got 1000 folders, each with 500 files. I want to move them out of the folders into a single folder so I can run Python to sort the files.
Sadly I didn't find anything that works, so I hope someone can help me.
So far I have tried one thing:
Running this code in the Windows console:
pushd C:\Users\KroherL\Downloads
for /r %%a in (*.?*) do (
MOVE "%%a" "C:\Users\KroherL\Music\new%%~nxa"
)
popd
Thank you all in advance.
This works on Windows:
Make a text file containing the following single line:
for /r %%i in (*.*) do xcopy /Y "%%i" c:\cumulFolder
(where c:\cumulFolder is the destination folder where you'll find all your files together)
Save the text file with a .bat extension (e.g. RecurseCopy.bat).
Copy this file into the top-level folder whose subfolders you want to search.
Double-click your .bat file.
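Since you plan to sort the files with Python anyway, the gathering step could also be done there. Below is a rough sketch using only the standard library; the function name flatten and the _1, _2 renaming scheme for clashing file names are my own assumptions. Note that it moves the files (as in the question) instead of copying them like the xcopy line does.

```python
import shutil
from pathlib import Path


def flatten(source_root, destination):
    """Move every file under `source_root` into `destination`, renaming on name clashes."""
    source_root = Path(source_root).resolve()
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    destination = destination.resolve()
    moved = 0
    # Build the full listing first so that moving files does not disturb the walk.
    for path in sorted(p for p in source_root.rglob("*") if p.is_file()):
        if destination in path.parents:
            continue  # the file is already in the destination folder
        target = destination / path.name
        counter = 1
        while target.exists():
            # e.g. song.mp3 -> song_1.mp3, song_2.mp3, ...
            target = destination / f"{path.stem}_{counter}{path.suffix}"
            counter += 1
        shutil.move(str(path), str(target))
        moved += 1
    return moved
```

With 1000 folders of 500 files, duplicate names are quite likely, which is why the sketch renames instead of overwriting (xcopy /Y overwrites silently).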

How to unzip a file in a specific folder in colaboratory environment after download it?

I've been looking for a way to work around the slow loading of an image dataset on Google Colab when I read it through a Google Drive connection, using the following code:
from google.colab import drive
drive.mount('/content/gdrive')
With this procedure I can load the images and create labels using my own load_dataset function:
train_path = 'content/gdrive/MyDrive/Capstone/Enviroment/cell_images/train'
train_files, train_targets = load_dataset(train_path)
But, as I said, it's very slow, especially because my full dataset consists of 27560 images.
To solve my problem, I've tried to use this solution.
But now, in order to keep using my function, after downloading the .tar file I want to extract it into a specific folder in the Colab environment. I found this answer, but it does not solve my problem.
Example:
This is the environment with the test.tar already downloaded.
But I want to extract the files in the tar file, whose structure is train/Uninfected ; train/Parasitized, to get this:
content
    cell_images
        test
            Parasitized
            Uninfected
        train
            Parasitized
            Uninfected
        valid
            Parasitized
            Uninfected
To use the paths with my function:
train_path = 'content/cell_images/train/'
train_files, train_targets = load_dataset(train_path)
test_path = 'content/cell_images/test/'
test_files, test_targets = load_dataset(test_path)
valid_path = 'content/cell_images/valid/'
valid_files, valid_targets = load_dataset(valid_path)
I tried to use:
! mkdir -p content/cell_images
and
!tar -xvf 'test.tar' content/cell_images
But it doesn't work.
Does anyone know how to proceed?
Thanks!
To extract the files from the tar archive into the folder content/cell_images, use the command-line option -C:
!tar -xvf 'test.tar' -C 'content/cell_images'
Hope this helps!
Although this is a late answer, it might help others:
shutil.unpack_archive works with almost all archive formats (e.g., “zip”, “tar”, “gztar”, “bztar”, “xztar”) and it's simple:
import shutil
shutil.unpack_archive("filename", "path_to_extract")
Connect to Drive:
from google.colab import drive
drive.mount('/content/drive')
Check the directory:
!ls and !pwd
Unzip:
!unzip drive/"My Drive"/images.zip -d destination
!tar -xvf "cord-19_2021-12-20.tar.gz"
as also shown here:
https://colab.research.google.com/github/sudo-ken/compress-decompress-in-Google-Drive/blob/master/Unrar_Unzip_Rar_Zip_in_GDrive.ipynb
If your current directory is the default directory, /content, you can unzip your folder project like this:
%%bash
mkdir foldername
tar -xvf '/content/foldername.tar' -C '/content/'
%%bash lets you script without using ! at the beginning of each line.

Python package, "Updating the INI File"

I am working with a Python package that I installed called bacpypes, for communicating with building automation equipment. Right at the very beginning, after going through the pip install and git clone of the repository, the Read the Docs documentation says:
Updating the INI File
Now that you know what these values are going to be, you can configure the BACnet portion of your workstation. Change into the samples directory that you checked out earlier, make a copy of the sample configuration file, and edit it for your site:
$ cd bacpypes/samples
$ cp BACpypes~.ini BACpypes.ini
The problem I have (from not having enough knowledge) is that I can't see a sample configuration file in the bacpypes/samples directory. There are only .py files, nothing with an .ini extension or named BACpypes.ini.
If I open the samples directory in a terminal and run cp BACpypes~.ini BACpypes.ini, I get the error cp: cannot stat 'BACpypes~.ini': No such file or directory
Any tips help, thank you...
There's a sample .ini in the documentation, a couple of paragraphs after the commands you copied. It looks like this
[BACpypes]
objectName: Betelgeuse
address: 192.168.1.2/24
objectIdentifier: 599
maxApduLengthAccepted: 1024
segmentationSupported: segmentedBoth
maxSegmentsAccepted: 1024
vendorIdentifier: 15
foreignPort: 0
foreignBBMD: 128.253.109.254
foreignTTL: 30
I'm not sure why you couldn't copy BACpypes~.ini. I know a tilde can be expanded by your shell, so you could try escaping it with
cp BACpypes\~.ini BACpypes.ini
Though I assume it isn't needed now that you have a default configuration file.
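If you end up writing BACpypes.ini by hand from the sample above, one quick way to check that it is at least well-formed is to parse it with Python's standard configparser. This is only a syntax check under my own assumptions (the helper name load_bacpypes_settings is made up, and the sample text is a shortened copy of the one above); I'm not claiming this is how bacpypes itself reads the file.

```python
import configparser

# Shortened copy of the sample configuration shown above.
SAMPLE_INI = """\
[BACpypes]
objectName: Betelgeuse
address: 192.168.1.2/24
objectIdentifier: 599
maxApduLengthAccepted: 1024
segmentationSupported: segmentedBoth
vendorIdentifier: 15
"""


def load_bacpypes_settings(text):
    """Parse INI text and return the [BACpypes] section as a plain dict of strings."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case, e.g. objectName instead of objectname
    parser.read_string(text)
    return dict(parser["BACpypes"])
```

To check your real file, pass its contents in, e.g. load_bacpypes_settings(open("BACpypes.ini").read()); a missing section header or a malformed line will raise an error.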

How to achieve the tar file with a specific directory structure inside it using Python

I have a directory structure with many log files.
Example:
root/feature1/pre/a.log
root/feature1/pre/b.log
root/feature1/post/c.log
root/feature1/post/d.log
root/feature2/pre/e.log
root/feature2/pre/f.log
root/feature2/post/g.log
root/feature2/post/h.log
I want to archive some of the log files, on the condition that they are older than 2 months. I could archive the log files matching that condition, but I couldn't preserve the directory structure inside the tar file.
I need:
root/archive.tar.gz
where archive.tar.gz contains
feature1/pre/a.log
feature1/post/d.log
feature2/pre/e.log
feature2/post/g.log
feature2/post/h.log
Here, a.log, d.log, e.log, g.log, h.log files are the ones older than 2 months.
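This is more of a Linux problem than a Python one. I suggest using the find command with its file name and file time options, and passing those names into the tar command you're using, something like
cd root && tar -czf archive.tar.gz $(find feature* -name '*.log' -mtime +60)
(-name matches only the file's base name, hence the quoted '*.log' pattern; -mtime +60 selects files last modified more than 60 days ago; running from inside root keeps the stored paths relative, e.g. feature1/pre/a.log.)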
Test this on your command line; when it works as you want, then use Python's os package to execute it from your Python script.
Does that get you moving?
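If you would rather stay entirely in Python, the standard tarfile module can do the same thing; the arcname argument of TarFile.add controls the path that is stored inside the archive. A rough sketch, assuming that "older than 2 months" means a modification time more than 60 days ago (archive_old_logs is a name I made up):

```python
import tarfile
import time
from pathlib import Path


def archive_old_logs(root, archive_name="archive.tar.gz", max_age_days=60):
    """Add *.log files under `root` older than `max_age_days` to root/<archive_name>."""
    root = Path(root)
    cutoff = time.time() - max_age_days * 24 * 60 * 60
    archived = []
    with tarfile.open(str(root / archive_name), "w:gz") as tar:
        for log_file in sorted(root.rglob("*.log")):
            if log_file.is_file() and log_file.stat().st_mtime < cutoff:
                relative = log_file.relative_to(root).as_posix()
                # Store the path relative to root, e.g. feature1/pre/a.log.
                tar.add(str(log_file), arcname=relative)
                archived.append(relative)
    return archived
```

The function returns the list of paths it stored, so you can check it against the expected feature1/pre/a.log style entries before deleting anything.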

Bash can't find right directory in script

I'm making a compress script for my text editor, and it's all working up to the part where it needs to make the file Run. Inside of Run is just this code: python ./App.pyc. When I run the program by double-clicking on it in Finder, it says that it can't open file './App.pyc' [Errno 2] No such file or directory within Terminal.
And if I run it through Terminal after I've cd'd to the directory Run and App.pyc are in, it works. I'm assuming this is because we aren't in the right directory.
My question is, how can I make sure Run is being run in the right directory? If I put cd in it, it'll work, but then if somebody moves the folder elsewhere it won't work anymore.
#!/usr/bin/python
### Compresser script.
# Compress files.
import App
import Colors
# Import modules
import os
# Clear the folder to put the compressed
# files in (if it exists).
try:
    os.system('rm -rf BasicEdit\ Compressed')
except:
    pass
# Remake the folder to put compressed files in.
os.system('mkdir BasicEdit\ Compressed')
# Move the compiled files into the BasicEdit
# Compressed folder.
os.system('mv App.pyc BasicEdit\ Compressed/')
os.system('mv Colors.pyc BasicEdit\ Compressed/')
# Create contents of run file.
run_file_contents = "python ./App.pyc\n"
# Write run file.
run_file = open("./BasicEdit Compressed/Run", 'w')
run_file.write(run_file_contents)
# Give permissions of run file to anybody.
os.system('chmod a+x ./BasicEdit\ Compressed/Run')
# Finally compress BasicEdit, and remove the old
# folder for BasicEdit Compressed.
os.system('zip -9r BasicEdit.zip BasicEdit\ Compressed')
os.system('rm -rf BasicEdit\ Compressed')
(PS, what's [Errno 1]? I've never seen it before.)
The Python script's current working directory can be modified with the os.chdir() call, after which references to . will be correct.
If you want to find the location of the source file currently being run rather than hardcoding a directory, you can use:
os.chdir(os.path.dirname(os.path.abspath(__file__)))
The bash equivalent to this logic is:
cd "${BASH_SOURCE%/*}" || {
    echo "Unable to change directory to ${BASH_SOURCE%/*}" >&2
    exit 1
}
See BashFAQ #28 for more details and caveats.
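Applied to the compress script from the question, this could mean writing the cd line into the generated Run file. A minimal sketch, assuming Run is executed by bash and is launched with a path that contains a slash (as it is when double-clicked); write_run_file is a hypothetical helper, not part of the original script:

```python
import stat
from pathlib import Path

# Contents of the generated Run file: change to the script's own folder first,
# then start the app with a relative path, as in the original Run file.
RUN_SCRIPT = '''#!/bin/bash
cd "${BASH_SOURCE%/*}" || exit 1
python ./App.pyc
'''


def write_run_file(folder):
    """Write an executable Run script into `folder` and return its path."""
    run_path = Path(folder) / "Run"
    run_path.write_text(RUN_SCRIPT)
    # Same effect as chmod a+x: add execute permission for user, group and others.
    mode = run_path.stat().st_mode
    run_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return run_path
```

Because the folder is looked up when Run starts, moving "BasicEdit Compressed" somewhere else should not break it.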
As worked out above together with @William Purcell, you have to retrieve the absolute path with os.getcwd() and then use the absolute path for the python call.
I withdraw my proposal and go with @Charles Duffy's answer. However, I won't delete my attempt, as the comments seem to be useful to others!
