Fail to load a subpart of "open-images-v6" with Fiftyone

Fail to load a subpart of "open-images-v6" with Fiftyone - python

Context
I'm trying to retrieve a large amount of data to train a CNN.
More specifically, I'm looking for pictures of Swimming pools.
I have found a lot of them in the open-images-v6 database made by Google.
So now, I just want to download these particular images (I don't want 9 Millions images to end up in my download folder).
Problem
In order to do this, I followed carefully the instructions given on the Download page (see : https://storage.googleapis.com/openimages/web/download.html).
So, I installed "fiftyone", tried out the "testing" procedure (which would be loading the "quickstart" dataset and navigating through the data) and have not encountered any issues so far.
But when I tried to retrieve the Swimming pool images with the following code, I went through a lot of issues :
import fiftyone as fo
import fiftyone.zoo as foz
dataset = foz.load_zoo_dataset(
"open-images-v6",
split="validation",
label_types="detections",
classes="Swimming pool"
)
session = fo.launch_app(dataset)
I will skip right to the problem I couldn't figure out :
when I run the code, it properly downloads a bunch of .csv files, but when it tries to download the data (the images) it shows a pretty bad looking error :
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
State of the art
After hours of searching the origin of the error, I eventually discovered that it was somehow linked with AWS, but I have absolutely no clue what I can do on this field.
I saw a random tutorial on internet that recommended to install "awscli" via PIP but nothing changed.
I tried to import other datasets with the same procedure (i.e foz.load_zoo_dataset("coco-2017")) and it seemed to work (at least the download started but I stopped it early).
Thank you for your time.

Thank you for the aws hint, that finally got me on the right trail.
Fiftyone uses the python os.path.join() functionality, which will create windows style paths when running windows. The s3 blob storage can't use those windows paths, therefore raising the 404 error.
Since this is a bug in fiftyone itself (I will create a pr to get that bug fixed), you will need to modify fiftyone yourself.
Go to your python site-packages dir, then open fityone/utils/openimages.py
In this file, add the following code to the import statements:
import re
Then search for the _download_images_if_necessary method and replace this line:
fp_download = os.path.join(split, image_id + ".jpg")
with this one:
fp_download = re.sub(r"\\", "/", os.path.join(split, image_id + ".jpg"))
This did fix the problem for me.

Related

Export a geoTIFF file from jupyter notebook to google drive but nothing appears in my google drive

After having searched for a long time for a solution to my problem without succeeding, I'd like to ask you my question here.
I have a python code that creates a geoTIFF file from google earth engine data. I'm running it on jupyter notebook and I want to export the geoTIFF to my google drive.
The code works without error and a shapefile (shp) is embedded as input.
The problem is that nothing appears on my drive, the folder "GEE" that it creates is well created, but it is empty.
Here is the export part of the code:
task = ee.batch.Export.image.toDrive(image=bare1.clip(aoi),
scale=10,
region=aoi.getInfo()['coordinates'],
fileFormat='GeoTIFF',
description='Active',
folder='GEE',
maxPixels=1e9)
task.start()
You should also know that I am a beginner in python :)
Do you have an idea for a solution? Do not hesitate to ask me for more details.
Thanks :)

First: Have you checked the code editor (https://code.earthengine.google.com/) to see if there has been an error message that accounts for the lack of export, or if the file is actually being deposited in a different place? One note about the 'folder' parameter, is that (in my understanding) it doesn't create a folder necessily, but instead is just telling GEE to deposit your image in the most recently created folder of the same name, which could be anywhere in your Drive.
Next, have you definitely mounted your Google Drive? I assume so, if the GEE folder is working, but just to be sure you can always run:
from google.colab import drive
drive.mount('/content/drive')
Next, I have found that when I am exporting an image, I need to convert it to double for correct export. So in your case, this would be changing the first line to (adding the .toDouble())
task = ee.batch.Export.image.toDrive(image=bare1.clip(aoi).toDouble())
If that doesn't work: Have you tried exporting other images with this same code? (Ie replacing bare1 with another image that you know works, like ee.Image(1) which makes a blank image where every pixel is the value 1?
Happy to take another look if none of this helps!

The task console in the GEE code editor should give a description of the export error. Exports are finicky with a number of causes for error. A good first place to check is that you didn't exceed the maximum pixels. You can deal with max pixel errors by reducing the number of bands in your image to only include those that you need, or increasing the maxpixel parameter in your export task. Sometimes the following dictionary style formatting works for me although it's not clear why:
task = ee.batch.Export.image.toDrive(**{
'image':bare1.clip(aoi),
'scale':10,
'region': aoi.getInfo()['coordinates'],
'fileFormat':'GeoTIFF',
'description':'Active',
'folder':'GEE',
'maxPixels':1e9
})
task.start()

Couldn't open file yolov3_custom_last.weights when trying to run darknet detection

I've been trying to use YOLO (v3) to implement and train an object detection of Tank with OpenImage dataset.
I have tried to get help from this tutorial and my code pretty much looks like it.
Also I'm using Google Colab and Google Drive services.
everything is going fine through my program. But I hit an error at the final step when I'm running the darknet to train detection.
!./darknet detector train "data/obj.data" cfg/yolov3_custom.cfg "darknet53.conv.74" -dont_show
after 100 iterations, when it's trying to save the progress in the backup folder I've addressed in obj.data file, I get the following error:
Saving weights to /content/drive/My\Drive/YOLOv3/backup/yolov3_custom_last.weights
Couldn't open file: /content/drive/My\Drive/YOLOv3/backup/yolov3_custom_last.weights
at first, I thought I made a mistake in using the address; So, I tried checking the address by using ls command
!ls /content/drive/My\Drive/YOLOv3/backup/
and the result was an empty folder (However, not an error meaning I've written the address correctly and that it is accessable in my google drive).
Here are contents of data.object file:
classes = 1
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = /content/drive/My\ Drive/YOLOv3/backup
also I've made required changes in config file so I don't think that the problem is about that. But just to make sure here are the changes I've made in my yolov3.cfg file:
Fist of all we will comment lines 3 and 4 (batch, subdivisions) to unset testing mode
We will uncomment lines 6 and 7(batch, subdivisions) to set to training mode
We change our max_batches value to 2000 * number_of_classes (if there's one class like our case, set to 4000)
We change our step tuple-like values to 80%, 90% value of our max_baches value. In this case it will be 3200, 3600.
For all YOLO layers and convoloutional layers before them, changed the classes value to number of classes, In this case 1, and change the value of filters according to following formula(In this case, 18)
Formula for conv layers filters value: (number_of_classes + 5)*3
I searched the error and found this issue on Github.
However, I tried the following methods recommended there and the problem is still the same:
Removing and recreating the backup folder
Tried to adding the line backup = backup in my yolo.data file in folder .cfg but there was no such file in cfg folder.
Creating an empty yolov3_custom_last.weights in backup folder
The other solutions mentioned in this issue was about when you are running YOLO on your PC and not google Colab.
Also, here my tree structure of the folder YOLOv3 which is stored in my Google Drive My Drive(main folder).
YOLOv3
darknet53.conv.74
obj.data
obj.names
Tank.zip
yolov3.weights
yolov3_custom.cfg
yolov3_custom1.cfg.txt
So, I'm kinda stuck and I have no idea what could fix this. I'd appreciate some help.

I have solved the problem with changing the local drive address with ln command.
The problem wasn't from my code rather from the way way yolov3 developers were handling space in directory addresses! Apparently as much as I figured in their docs, they are not handling space in directory quite well.
So I created a virtual address which does not have space like My Drive has.
P.S: As you know My Drive folder is already there in your google drive so you can't actually rename it.
Here is the code you can use to achieve this:
!ln -s /content/drive/My\ Drive/ /mydrive

I was getting the same error on my Windows 10, and I think I managed to fix it. All I had to do was move my "weights" folder (with my yolov3.weights inside), closer to the .exe file, which was in the "x64" folder, in my case. After that, the error stopped appearing and the app was able to predict the test image normally.

"Desired structure doesn't exist" for PDB retrieve_pdb_file method

Trying to download some protein data from PDB using Biopython's Bio.PDB.PDBList
Here is a min. reproducible example:
from Bio.PDB import PDBList
pdbl=PDBList()
pdbl.retrieve_pdb_file('1GAV', file_format="pdb")
This returns:
Downloading PDB structure '1GAV'...
Desired structure doesn't exists
Desired behavior is download of the PDB file to the working directory.
Possibly useful info:
Using python 3
Do not want to download whole PDB, just pick and choose files
Using a proxy, but I don't think that's the problem because Biopython uses urllib to make requests and I tried using urllib with my proxy settings and it worked fine.
I've tried for a few different PDB code/IDs and for other file types ("mmCif", "bundle") and it returns the same thing
No error is being hit, it just can't find the file in PDB apparently?
The folder where the file should appear does get made in the working directory, but the folder is empty

We think the problem has to do with our corporate VPN because it works when the VPN is off (although proxy still on).
So as sammam said, no problems in the code.
Don't know the specifics of why this occurs with our VPN, will update if I find out.

I'm getting the same error message, which I looked at the source code and found out that the "Desired structure doesn't exists" message outputs whenever an IOError is hit.

Is the Paragram_300_SL999 Word Embeddings file corrupt?

I need to use the Paragram_SL999_300 embeddings for my project that uses the open source code from a published article (https://github.com/cecilialeiqi/adversarial_text). When I try to run Step 4 (generate adversarial examples) from https://github.com/cecilialeiqi/adversarial_text, I get a ValueError saying int() expected but got ','. I know from the readme.txt for Paragram-SL999 300 that is supposed to be one token per line followed by its embeddings. Upon trying to open the Paragram_SL999_300.txt file to see if it matches this criteria, it loads about half way and then closes the TextEditor, without letting me edit it. Furthermore, it crashes LibreOffice if I try and open it in there. This was in an Ubuntu 18.04 Virtual Machine. However, I wasn't sure if this was because the author's code is wrong (in discrete_attack.py at https://github.com/cecilialeiqi/adversarial_text/blob/master/src/discrete_attack.py) or because the file is corrupt so I tried downloading and extracting the Paragram-SL999 300 archive from Wieting's website (http://www.cs.cmu.edu/~jwieting/) on my Windows computer, I get a message saying that the archive is corrupted, which prevents me from extracting the Paragram_SL999_300.txt file and also using it. On another Windows computer, I get the Error Code 0x80004005: Unspecified error when trying to extract the archive.
Is there any way to get around this issue or anyone who can provide insight on it? Would it be recommended instead to produce the embeddings from Wieting's GitHub (https://github.com/jwieting/paragram-word)? I would very much appreciate any input as these embeddings are paramount to my project.

I managed to download it from the Google drive link at https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdrive.google.com%2Ffile%2Fd%2F0B9w48e1rj-MOck1fRGxaZW1LU2M%2Fview%3Fusp%3Dsharing&data=02%7C01%7C%7C36fd021bae0343bbe54408d7bdd28c81%7C1faf88fea9984c5b93c9210a11d9a5c2%7C0%7C0%7C637186584305548961&sdata=PouX2kyBlnQHpzAaDKjqe7gFC3ctti6tjBcGWt8pg1s%3D&reserved=0. In the end it worked but I'm not sure why the other times I was unable to get it to work. Also, I didn't realise that for the code I had I also I needed to add the vocabulary size and the embedding size at the first line of the file (1703756 300).

Extremely new user to Python. "No module named request" error while trying code to detect image subdomains in a website to extract them to a folder

I may sound rather uninformed writing this, and unfortunately, my current issue may require a very articulate answer to fix. Therefore, I will try to be specific as possible as to ensure that my problem can be concisely understood.
My apologizes for that- as this Python code was merely obtained from a friend of mine who wrote it for me in order to complete a certain task. I myself had had extremely minimal programming knowledge.
Essentially, I am running Python 3.6 on a Mac. I am trying to work out a code that allows Python to scan through a bulk of a particular website's potentially existent subdomains in order to find possibly-existent JPG images files contained within said subdomains, and download any and all of the resulting found files to a distinct folder on my Desktop.
The Setup-
The code itself, named "download.py" on my computer, is written as follows:
import urllib.request
start = int(input("Start range:100000"))
stop = int(input("End range:199999"))
for i in range(start, stop + 1):
filename = str(i).rjust(6, '0') + ".jpg"
url = "http://website.com/Image_" + filename
urllib.request.urlretrieve(url, filename)
print(url)
(Note that the words "website" and "Image" have been substituted for the actual text included in my code).
Before I proceed, perhaps some explanation would be necessary.
Basically, the website in question contains several subdomains that include .JPG images, however, the majority of the exact URLs that allow the user to access these sub-domains are unknown and are a hidden component of the internal website itself. The format is "website.com/Image_xxxxxx.jpg", wherein x indicates a particular digit, and there are 6 total numerical digits by which only when combined to make a valid code pertain to each of the existent images on the site.
So as you can see, I have calibrated the code so that Python will initially search through number values in the aforementioned URL format from 100000 to 199999, and upon discovering any .JPG images attributed to any of the thousands of link combinations, will directly download all existent uncovered images to a specific folder that resides within my Desktop. The aim would be to start from that specific portion of number values, and upon running the code and fetching any images (or not), continually renumbering the code to work my way through all of the possible 6-digit combos until the operation is ultimately a success.
(Possible Side-Issue- Although I am fairly confident that my friend's code is written in a manner so that Python will only download .JPG files to my computer from images that actually do exist on that particular URL, rather than swarming my folder with blank/bare files from every single one of URL attempts regardless of whether that URL happens to be successful or not, I am admittedly not completely certain. If the latter is the case, informing me of a more suitable edit to my code would be tremendously appreciated.)
The Execution-
Right off the bat, the code experienced a large error. I'll list through the series of steps that led to the creation of said error.
#1- Of course, I first copy-pasted the code into a text document, and saved it as "download.py". I saved it inside of a folder named "Images" where I sought the images to be directly downloaded to. I used BBEdit.
#2- I proceeded, in Terminal, to input the commands "cd Desktop/Images" (to account for the file being held within the "Images" folder on my Desktop), followed by the command "Python download.py" (to actually run the code).
As you can see, the error which I obtained following my attempt to run the code was the ImportError: No module named request. Despite me guessing that the answer to solving this is simple, I can legitimately say I have got such minimal knowledge regarding Python that I've absolutely no idea how to solve this.
Hint: Prior to making the download.py file, the folder, and typing the Terminal code the only interactions I made with Python were downloading the program (3.6) and placing it in my toolbar. I'm not even quite sure if I am required to create any additional scripts/text files, or make any additional downloads before a script like this would work and successfully download the resulting images into my "Images" folder as is my desired goal. If I sincerely missed something integral at any point during this long read, hopefully, someone in here can provide a thoroughly detailed explanation as to how to solve my issue.
Finishing statements for those who've managed to stick along this far:
Thank you. I know this is one hell of a read, and I'm getting more tired as I go along. What I hope to get out of this question is
1.) Obviously, what would constitute a direct solution to the "No module named request" Input Error in Terminal. In other words, what I did wrong there or am missing.
2.) Any other helpful information that you know would assist this code, for example, if there is any integral step or condition I've missed or failed to meet that would ultimately cause the entirety of my code to cease to work. If you do see a fault in this, I only ask of you to be specific, as I've not got much experience in the programming world. After all, I know there is a lot of developers out here that are far more informed and experienced than am I. Thanks.

urllib.request is in Python 3 only. When running 'python' on a Mac, you're running Python 2 by default. Try running executing with python3.
python --version
might need to
brew install python3

urllib.request is a Python 3 construct. Most systems run Python 2 as default and this is what you get when you run simply python.
To install Python 3, go to https://brew.sh/ and follow the instructions to install the Hombrew package manager. Then run
brew install python3
python3 download.py

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.