I'm beginning to play with GeoPySpark and am implementing an example notebook.
I successfully retrieved the images:
!curl -o /tmp/B01.jp2 http://sentinel-s2-l1c.s3.amazonaws.com/tiles/32/T/NM/2017/1/4/0/B01.jp2
!curl -o /tmp/B09.jp2 http://sentinel-s2-l1c.s3.amazonaws.com/tiles/32/T/NM/2017/1/4/0/B09.jp2
!curl -o /tmp/B10.jp2 http://sentinel-s2-l1c.s3.amazonaws.com/tiles/32/T/NM/2017/1/4/0/B10.jp2
Here is the script:
import rasterio
import geopyspark as gps
import numpy as np
from pyspark import SparkContext
conf = gps.geopyspark_conf(master="local[*]", appName="sentinel-ingest-example")
pysc = SparkContext(conf=conf)
jp2s = ["/tmp/B01.jp2", "/tmp/B09.jp2", "/tmp/B10.jp2"]
arrs = []
for jp2 in jp2s:
    with rasterio.open(jp2) as f:  # CRASHES HERE
        arrs.append(f.read(1))
data = np.array(arrs, dtype=arrs[0].dtype)
data
The script crashes where I placed the marker here, with the following error:
RasterioIOError: '/tmp/B01.jp2' not recognized as a supported file format.
I copy-pasted the example code exactly, and the Rasterio docs even use .jp2 files in their examples.
I'm using the following version of Rasterio, installed with pip3. I do not have Anaconda installed (it messes up my Python environments) and do not have GDAL installed (it refuses to install; that would be the topic of another question if it turns out to be my only solution).
Name: rasterio
Version: 1.1.0
Summary: Fast and direct raster I/O for use with Numpy and SciPy
Home-page: https://github.com/mapbox/rasterio
Author: Sean Gillies
Author-email: sean@mapbox.com
License: BSD
Location: /usr/local/lib/python3.6/dist-packages
Requires: click-plugins, snuggs, numpy, click, attrs, cligj, affine
Required-by:
Why does it refuse to read .jp2 files? Is there maybe a way to convert them to something usable? Or do you know of any example files similar to these ones in an acceptable format?
I was stuck in the same situation.
I used the pyvips package and it resolved the issue.
import pyvips
image = pyvips.Image.new_from_file("000240.jp2")
image.write_to_file("000240.jpg")
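Applied to the files from the question, a minimal sketch along the same lines (assuming pyvips can decode these Sentinel-2 JPEG 2000 tiles on your system; note that a plain image conversion like this may not carry over the geospatial metadata that rasterio/GeoPySpark would normally read):
import pyvips

# Convert each downloaded band from JPEG 2000 to plain TIFF
for band in ["B01", "B09", "B10"]:
    img = pyvips.Image.new_from_file("/tmp/{}.jp2".format(band))
    img.write_to_file("/tmp/{}.tif".format(band))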
I am writing Python code for merging PowerPoint presentations. It takes the locations of two presentations, merges them, and puts the resulting merged presentation in the folder given by the user. The code used is:
import sys
from pptx import Presentation
#import Aspose.Words.License
#import aspose.slides as a_slides
#import os
#import win32com.client
def merge_powerpoint_ppts(pres_loc1, pres_loc2, output_loc):
    p1 = open(pres_loc1, 'rb')
    pres1 = Presentation(p1)
    p2 = open(pres_loc2, 'rb')
    pres2 = Presentation(p2)
    for slide in pres2.slides:
        for existing_slide in pres1.slides:
            if slide.shapes.title.text == existing_slide.shapes.title.text:
                # NOTE: python-pptx has no built-in slide-cloning API; add_Clone is not a python-pptx method
                pres1.slides.add_Clone(slide)
    pres1.save(output_loc)
    p1.close()
    p2.close()
When I try to debug the code, I get the following error (screenshot): https://i.stack.imgur.com/yID2l.png
I have already installed the module pptx on my system and it is updated, but I am still getting this error.
First, if you have a folder named Presentation or pptx, rename it; this error can happen because of naming collisions between your own files or folders and Python modules.
Second, make sure you are using the correct Python interpreter or environment, i.e. the one in which you installed pptx.
As a last option, uninstall pptx and run the following command:
conda install -c conda-forge python-pptx
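A quick way to check the first two points at once is a small diagnostic sketch (nothing here is specific to your project; it just reports which interpreter is running and where the pptx package is imported from):
import sys
print(sys.executable)    # the interpreter actually running this script

import pptx
print(pptx.__file__)     # where the pptx package was loaded from (should not be a local file or folder)
print(pptx.__version__)  # confirms python-pptx is importable and which version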
I have installed the eccodes library using Conda, but when I try to import it in Python I get "Cannot find the ecCodes library".
Why do I get this error and how can I resolve it? I think that Python does not know where to find the library.
I used the commands found here. That is,
conda install -c conda-forge eccodes
pip3 install --upgrade eccodes
I am using a Windows machine.
After asking a colleague, we found a solution: running the line
import ecmwflibs
Now eccodes is recognised
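Concretely, the workaround is just an import-order change; a minimal sketch (assuming the ecmwflibs package is installed, e.g. with pip install ecmwflibs):
import ecmwflibs   # bundles the ecCodes shared library and lets the bindings find it
import eccodes

print(eccodes.codes_get_api_version())  # should print the ecCodes version instead of raising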
He found this because the error message was raised by the script in
~/Anaconda3/Lib/site-packages/gribapi/bindings.py
#
# (C) Copyright 2017- ECMWF.
#
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
#
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation nor
# does it submit to any jurisdiction.
#
# Authors:
# Alessandro Amici - B-Open - https://bopen.eu
# Shahram Najm - ECMWF - https://www.ecmwf.int
#
from __future__ import absolute_import, division, print_function, unicode_literals
import logging
import pkgutil
import cffi
__version__ = "1.4.2"
LOG = logging.getLogger(__name__)
try:
    import ecmwflibs as findlibs
except ImportError:
    import findlibs

library_path = findlibs.find("eccodes")
if library_path is None:
    raise RuntimeError("Cannot find the ecCodes library")
# default encoding for ecCodes strings
ENC = "ascii"
ffi = cffi.FFI()
CDEF = pkgutil.get_data(__name__, "grib_api.h")
CDEF += pkgutil.get_data(__name__, "eccodes.h")
ffi.cdef(CDEF.decode("utf-8").replace("\r", "\n"))
lib = ffi.dlopen(library_path)
I wouldn't mess with Pip here. Conda Forge provides both the compiled library (eccodes) and the Python bindings (python-eccodes). The latter lists the former as a dependency, so it should be sufficient to use:
conda install -c conda-forge python-eccodes
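To confirm that the bindings can actually locate the conda-installed library, a quick check that mirrors what gribapi/bindings.py does above (a sketch; findlibs is the small helper package the bindings fall back to, normally installed alongside them):
import findlibs

# Should print the path to the ecCodes shared library; None means it still cannot be found
print(findlibs.find("eccodes"))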
I am using Python 3.6 interpreter in my PyCharm venv, and trying to convert a CSV to Parquet.
import pandas as pd
df = pd.read_csv('/parquet/drivers.csv')
df.to_parquet('output.parquet')
Error-1
ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
pyarrow or fastparquet is required for parquet support
Solution-1
Installed fastparquet 0.2.1
Error-2
File "/Users/python parquet/venv/lib/python3.6/site-packages/fastparquet/compression.py", line 131, in compress_data
(algorithm, sorted(compressions)))
RuntimeError: Compression 'snappy' not available. Options: ['GZIP', 'UNCOMPRESSED']
I installed python-snappy 0.5.3 but am still getting the same error. Do I need to install any other library?
If I use the PyArrow 0.12.0 engine, I don't experience the issue.
In fastparquet, snappy compression is an optional feature.
To quickly check a conversion from CSV to Parquet, you can execute the following script (it only requires pandas and fastparquet):
import pandas as pd
from fastparquet import ParquetFile
df = pd.DataFrame({"col1": [1,2,3,4], "col2": ["a","b","c","d"]})
# df.head() # Test your initial value
df.to_csv("/tmp/test_csv", index=False)
df_csv = pd.read_csv("/tmp/test_csv")
df_csv.head() # Test your intermediate value
df_csv.to_parquet("/tmp/test_parquet", compression="GZIP")
df_parquet = ParquetFile("/tmp/test_parquet").to_pandas()
df_parquet.head() # Test your final value
However, if you need to write or read using snappy compression, you might follow this answer about installing the snappy library on Ubuntu.
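Once the snappy C library and the python-snappy bindings are installed, the same conversion should accept snappy as well; a small sketch under that assumption (paths are just examples):
import pandas as pd

df = pd.read_csv("/tmp/test_csv")
# pandas defaults to snappy for to_parquet, but spelling it out makes the extra dependency explicit
df.to_parquet("/tmp/test_snappy.parquet", engine="fastparquet", compression="snappy")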
I've used the following versions:
Python 3.10.9, fastparquet==2022.12.0, pandas==1.5.2
This code works seamlessly for me:
import pandas as pd
df = pd.read_csv('/parquet/drivers.csv')
df.to_parquet('output.parquet', engine="fastparquet")
I'd recommend you move away from Python 3.6, as it has reached end of life and is no longer supported.
I am trying to knit an Rmd file using reticulate and Python inside of a virtualenv.
The following is my R set up chunk:
```{r r-setup}
library(reticulate)
venv_path <- "/path/to/venv/"
use_virtualenv(venv_path, required = TRUE)
```
This works as expected. However, the next step breaks when I try to import geopandas:
```{python}
import geopandas as gpd
```
The traceback is as follows:
Error in py_module_import... OSError: Could not find lib c or load any variants...
The traceback points to the shapely package, specifically the line from shapely.geometry import shape, Point. Other Python libraries load with no issue within the chunk, e.g. import os.
From these messages, I'm guessing that it is not loading the OGR/GDAL bindings. However, I'm not sure how to solve this.
import geopandas runs without error when I run the chunk interactively in the notebook (i.e. not knitting). It also works within the repl_python() shell of my project. So the issue seems to lie principally with knitr and knitting.
My RStudio version is: 1.1.456.
The output of sessionInfo() is:
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin17.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reticulate_1.10 stringr_1.3.1 dplyr_0.7.6 ggplot2_3.0.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.18 pillar_1.3.0 compiler_3.5.1 plyr_1.8.4
[5] bindr_0.1.1 tools_3.5.1 digest_0.6.17 packrat_0.4.9-3
[9] jsonlite_1.5 evaluate_0.11 tibble_1.4.2 gtable_0.2.0
[13] lattice_0.20-35 pkgconfig_2.0.2 rlang_0.2.2 Matrix_1.2-14
[17] yaml_2.2.0 bindrcpp_0.2.2 withr_2.1.2 knitr_1.20
[21] rprojroot_1.3-2 grid_3.5.1 tidyselect_0.2.4 glue_1.3.0
[25] R6_2.2.2 rmarkdown_1.10 purrr_0.2.5 magrittr_1.5
[29] scales_1.0.0 backports_1.1.2 htmltools_0.3.6 assertthat_0.2.0
[33] colorspace_1.3-2 stringi_1.2.4 lazyeval_0.2.1 munsell_0.5.0
[37] crayon_1.3.4
I managed to solve this by removing the DYLD_FALLBACK_LIBRARY_PATH environment variable, which points to my brew-installed R libraries.
The solution was applied within a python chunk, as follows:
```{python}
import os
FALLBACK_PATH = {"DYLD_FALLBACK_LIBRARY_PATH" : "/usr/local/Cellar/r/3.5.1/lib/R/lib"}
del os.environ["DYLD_FALLBACK_LIBRARY_PATH"]
import geopandas
# Reset the environmental variable.
os.environ.update(FALLBACK_PATH)
```
I'm not sure if this is the cleanest solution, but it works. I'm also not sure whether this is a macOS-only problem.
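A slightly more defensive variant of the same chunk, in case the variable isn't set in every session (just a sketch of the same workaround, not a different fix):
```{python}
import os

# Temporarily drop the variable (if present), import geopandas, then restore it.
saved = os.environ.pop("DYLD_FALLBACK_LIBRARY_PATH", None)
import geopandas
if saved is not None:
    os.environ["DYLD_FALLBACK_LIBRARY_PATH"] = saved
```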
I'm experimenting with the lzma module in Python 2.7.6 to see if I could create compressed files using the XZ format for a future project that will make use of it. My code used during the experiment was:
import lzma as xz
in_file = open('/home/ki2ne/Desktop/song.wav', 'rb')
input_data = in_file.read()
compressed_data = xz.compress(input_data)
out_file = open('/home/ki2ne/Desktop/song.wav.xz', 'wb')
out_file.write(compressed_data)  # write the compressed data out
in_file.close()
out_file.close()
and I noticed there were two different checksums (MD5 and SHA256) from the resulting file compared to when I used the plain xz (although I could decompress fine with either method - the checksums of the decompressed versions of both files were the same). Would this be a problem?
UPDATE: I found a fix for it by installing the backport (from Python 3.3) via peterjc's Git repository (link here), and now it shows identical checksums. Not sure if it helps, but I made sure the LZMA Python module from my repository wasn't installed, to avoid possible name conflicts.
Here's my test code to confirm this:
# I have created two identical text files with some random phrases
from subprocess import call
from hashlib import sha256
from backports import lzma as xz
f2 = open("test2.txt", 'rb')
f2_buf = buffer(f2.read())
call(["xz", "test1.txt"])
f2_xzbuf = buffer(xz.compress(f2_buf))
f1 = open("test1.txt.xz", 'rb')
f1_xzbuf = buffer(f1.read())
f1.close(); f2.close()
f1sum = sha256(); f2sum = sha256()
f1sum.update(f1_xzbuf); f2sum.update(f2_xzbuf)
if f1sum.hexdigest() == f2sum.hexdigest():
    print "Checksums OK"
else:
    print "Checksum Error"
I've also verified it using the regular sha256sum as well (when I wrote the data to file).
I would not be concerned about the differences in the compressed files - depending on the container format and the checksum type used in the .xz file, the compressed data could vary without affecting the contents.
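To illustrate that point, here is a small sketch using the Python 3 lzma module (or the backports.lzma module mentioned above): two .xz streams built from the same input with different integrity-check settings contain different bytes, yet decompress to identical content:
import lzma

data = b"some repeated payload " * 1000

# Same input, different integrity checks -> different compressed bytes
xz_crc32 = lzma.compress(data, check=lzma.CHECK_CRC32)
xz_crc64 = lzma.compress(data, check=lzma.CHECK_CRC64)
assert xz_crc32 != xz_crc64

# ...but both are valid .xz streams with the same decompressed content
assert lzma.decompress(xz_crc32) == lzma.decompress(xz_crc64) == data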
EDIT: I've been looking into this further and wrote this script to test the PyLZMA Python 2.x module and the lzma Python 3.x built-in module.
from __future__ import print_function
try:
    import lzma as xz
except ImportError:
    import pylzma as xz
import os
# compress with xz command line util
os.system('xz -zkf test.txt')
# now compress with lib
with open('test.txt', 'rb') as f, open('test.txt.xzpy', 'wb') as out:
    out.write(xz.compress(bytes(f.read())))
# compare the two files
from hashlib import md5
with open('test.txt.xz', 'rb') as f1, open('test.txt.xzpy', 'rb') as f2:
    hash1 = md5(f1.read()).hexdigest()
    hash2 = md5(f2.read()).hexdigest()
print(hash1, hash2)
assert hash1 == hash2
This compresses a file test.txt with the xz command line utility and with the Python module, then compares the results. Under Python 3, lzma produces the same result as xz; however, under Python 2, PyLZMA produces a different result that cannot be extracted using the xz command line utility.
What module are you using that is called "lzma" in Python2 and what command did you use to compress the data?
EDIT 2: Okay, I found the pyliblzma module for Python 2. However, it seems to use CRC32 as the default checksum algorithm (the others use CRC64), and there is a bug that prevents changing the checksum algorithm: https://bugs.launchpad.net/pyliblzma/+bug/1243344
You could possibly try compressing with xz -C crc32 to compare the results, but I'm still not having any success making a valid compressed file using the Python 2 libraries.
In my case (Ubuntu/Mint), in order to use the lzma module with Python 2.7, I installed backports.lzma directly with pip (I did not use GitHub), as sudo or the root user:
pip2 install backports.lzma
FYI pip2 has the --user option that doesn't require superuser permissions and installs the module for the local user only, but I have not tested this.
Before performing the pip installation, you also have to install, with your package manager, one mandatory dependency: the liblzma library.
In my case the package names were liblzma5 and liblzma-dev, but package names may differ between Linux distros/releases.
P.S.: I also repeated the same operation successfully with conda on a different Linux environment (an unknown cluster distro):
conda install backports
conda install backports.lzma --name pyEnvName
Hope this is useful.