Import dataset from a ROOT file with cut using zfit - python

I am trying to perform a fit to a tree, but I need to apply cuts on branches that are not the observables of the fit.
The page https://zfit.readthedocs.io/en/latest/getting_started/intro/data.html tells me that I can include cuts in the dataset by specifying root_dir_options, but I don't know how to use it.
For example, I want to open a ROOT file "test.root" with tree "ntuple". The observable of the fit is x.
I can write
data = zfit.Data.from_root("test.root", "ntuple", "x")
If I need to cut on two other branches of the tree, y>1 and z>1, how can I write the code?

There are actually two ways as of today:
Using pandas
The most general way is to load the data into a pandas DataFrame first (using uproot) and then load it into zfit with from_pandas, where you can give an obs. You will first need to create a space with obs = zfit.Space('obsname', (lower, upper)); then you can use that in zfit.Data.from_pandas(...).
Loading with uproot can look like this (as an example):
import uproot

branches = ["pt1", "pt2"]
with uproot.open(path_root) as f:
    tree = f["events"]
    true_data = tree.arrays(branches, library="pd")
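Putting the pieces together for the question above (cuts y > 1 and z > 1, observable x), here is a minimal sketch; the limits of the space are placeholders:
import uproot
import zfit

# Load the observable and the cut branches into a pandas DataFrame
with uproot.open("test.root") as f:
    df = f["ntuple"].arrays(["x", "y", "z"], library="pd")

# Apply the cuts on the branches that are not observables of the fit
df = df.query("y > 1 and z > 1")

# Define the observable space (the limits here are placeholders)
obs = zfit.Space("x", (-5, 5))

# Build the zfit dataset from the cut DataFrame
data = zfit.Data.from_pandas(df[["x"]], obs=obs)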
Cutting edge
The cutting edge way is to give the limits directly in from_root; this is cutting edge development and will be available soon: https://github.com/zfit/zfit/pull/396

Related

How to extract a profile of values from a raster along a given line?

How to extract a profile of values from a raster along a given shapefile line in Python?
I am struggling to find a method to extract a profile of values (e.g. a topographic profile) from a raster (GeoTIFF). The library rasterio has a method to clip/extract values from a raster based on a polygon, but I cannot find an equivalent method for a line shapefile.
There is a basic method with scipy, but it does not inherently preserve geographic information the way a method based on a higher-level toolbox like rasterio could.
In other words, I am looking for a Python equivalent of what the Terrain Profile tool in QGIS offers.
Thanks
This is a bit different from extracting for a polygon, as you want to sample every pixel touched by the line, in the order they are touched (the polygon approaches don't care about pixel order).
It looks like it would be possible to adapt this approach to use rasterio instead. Given a line read from a shapefile using geopandas or fiona as a shapely object, you use the endpoints to derive a new equidistant projection, which you use as the dst_crs of a WarpedVRT, and read pixel values from that. You would need to express the length of your line as the number of pixels you want sampled; that is the width parameter of the WarpedVRT.
This approach may need to be adapted further if your line is not approximately straight between the endpoints.
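A rough sketch of that idea, under several assumptions: the raster and line are in lon/lat, pyproj is available, and raster_path and the line endpoints are hypothetical. A two-point equidistant projection maps the segment between the endpoints onto the x-axis, so a one-row WarpedVRT read follows the line:
import rasterio
from rasterio.vrt import WarpedVRT
from rasterio.transform import from_origin
from pyproj import CRS, Transformer
from shapely.geometry import LineString

raster_path = "dem.tif"                             # hypothetical raster
line = LineString([(138.2, 35.1), (138.9, 35.8)])   # hypothetical lon/lat line
(x0, y0), (x1, y1) = line.coords[0], line.coords[-1]

# Two-point equidistant projection: distances along the segment are true
tpeqd = CRS.from_proj4(
    f"+proj=tpeqd +lat_1={y0} +lon_1={x0} +lat_2={y1} +lon_2={x1}")

# Project the endpoints to get the profile extent in metres
fwd = Transformer.from_crs("EPSG:4326", tpeqd, always_xy=True)
(px0, _), (px1, _) = fwd.transform(x0, y0), fwd.transform(x1, y1)

num = 512                    # number of pixels sampled along the line
pixel = (px1 - px0) / num    # sample spacing in metres

with rasterio.open(raster_path) as src:
    with WarpedVRT(src, crs=tpeqd, width=num, height=1,
                   transform=from_origin(px0, pixel / 2, pixel, pixel)) as vrt:
        profile = vrt.read(1)[0]  # one row of values, in order along the line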
If you just want the raw pixel values under the line, you should be able to use a mask in rasterio or rasterize directly, for each line. You may want to use all_touched=True in the case of lines.
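As a minimal sketch of that simpler variant (raster_path and the line coordinates are hypothetical; note the values come back in array order, not in order along the line):
import rasterio
import rasterio.mask
from shapely.geometry import LineString

raster_path = "dem.tif"                             # hypothetical raster
line = LineString([(138.2, 35.1), (138.9, 35.8)])   # hypothetical line

with rasterio.open(raster_path) as src:
    # all_touched=True includes every pixel the line crosses
    out_image, out_transform = rasterio.mask.mask(
        src, [line], all_touched=True, crop=True, filled=False)
    values = out_image.compressed()  # raw pixel values under the line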
I had a similar problem and found a solution that works for me. The solution uses shapely to sample points on a line (or lines) and then reads the corresponding values from the GeoTIFF, so the extracted profile follows the direction of the line. Here is the method that I ended up with:
def extract_along_line(xarr, line, n_samples=256):
    profile = []
    for i in range(n_samples):
        # get the next point on the line, from start to end inclusive
        point = line.interpolate(i / (n_samples - 1), normalized=True)
        # access the nearest pixel in the xarray
        value = xarr.sel(x=point.x, y=point.y, method="nearest").data
        profile.append(value)
    return profile
Here is a working example with data from the copernicus-dem database; the line is the diagonal of the received tile:
import rioxarray
import shapely.geometry
import matplotlib.pyplot as plt

sample_tif = ('https://elevationeuwest.blob.core.windows.net/copernicus-dem/'
              'COP30_hh/Copernicus_DSM_COG_10_N35_00_E138_00_DEM.tif')

# Load the raster into an xarray
tile = rioxarray.open_rasterio(sample_tif).squeeze()

# Create a line (here it's the diagonal of the tile)
line = shapely.geometry.MultiLineString([[
    [tile.x[-1], tile.y[-1]],
    [tile.x[0], tile.y[0]]]])

# Use the method from above to extract the profile
profile = extract_along_line(tile, line)

plt.plot(profile)
plt.show()

How to find objects in floor plan image in Tkinter python through svg file?

I have a vectorized floor plan image. I want to identify the objects in the image through the vector data in the SVG file of that image. The SVG code does not have any close-path commands (z) in between, so I cannot tell when the path moves on to the next object. Can somebody help me, please?
I have very little knowledge of SVG files and of using them in Tkinter, so please suggest what I can do.
This is the vector data of the image.
Use in conjunction with the SO floorplan question.
Jump to z_final_floorplan.svg for final file.
A
Create 4 files:
w_original_floorplan.svg
x_rough_static_floorplan.svg
y_rough_live_floorplan.svg
z_final_floorplan.svg
w_original_floorplan.svg and x_rough_static_floorplan.svg are identical apart from filename.
y_rough_live_floorplan.svg and z_final_floorplan.svg are empty; to be populated.
Copy x_rough_static_floorplan.svg to y_rough_live_floorplan.svg.
Open y_rough_live_floorplan.svg in a browser using a server.
In x_rough_static_floorplan.svg, find every M and replace it with two newlines followed by /M (case sensitive): shift + enter, shift + enter, /M.
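If you would rather script the splitting than do it by hand in an editor, here is a rough sketch in Python (assuming a single path element whose d attribute holds all the drawing data, as in this file):
import re

with open("x_rough_static_floorplan.svg") as f:
    svg = f.read()

# Pull the path data out of the first d="..." attribute (assumed layout)
d = re.search(r'\bd="([^"]*)"', svg).group(1)

# Each M (absolute moveto, case sensitive) starts a new subpath,
# i.e. a candidate object
subpaths = ['M' + s for s in d.split('M') if s.strip()]
for n, sub in enumerate(subpaths):
    print(n, sub[:60])  # preview each subpath for labelling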
B
[this is the section that takes time]
Take away the 1st '/' in the path in y_rough_live_floorplan.svg [shows blackout_floorplan].
Label the corresponding code section in x_rough_static_floorplan.svg blackout_floorplan.
(this file is used as rough work, so being valid XML/SVG is irrelevant)
In y_rough_live_floorplan.svg find the next '/' and delete it [shows floorplan_top_left_whiteout].
Label the corresponding code section in x_rough_static_floorplan.svg floorplan_top_left_whiteout.
Have x_rough_static_floorplan.svg and y_rough_live_floorplan.svg open in two windows; you will be going back and forth between them. Keep repeating until you reach the end.
(hint: the find tool seems to persist when switching between files in VS Code, so you can use find / and jump to the next match with cmd + g easily.) It may be handy to have a paper printout of the original SVG as a reference and to label the names of the objects you create, e.g. bath, sink, table, as you go along (don't be fooled by this: one table is 'table', but is the 2nd chair chair2, chair_2, chair_two, etc.?).
C
Reorder the labels and their corresponding code in the path of x_rough_static_floorplan.svg so that the labelled sections sit next to each other, in the order they are found in the path:
e.g.
…
floorplan
bath
sink
table_chairs
sofa
…
Use the 'find' tool here. This process will itself require a temp file to copy and paste into, rather than reordering within the file you are working on; the temp file is then written back to the working file. It might be a good idea to create a checklist of objects and cross them off as they are done.
E.g. floorplan, bath, table_chairs, sink…
D
Create path elements from your grouped objects, giving each an id such as id=“floorplan_main”, id=“bath”, id=“sink”, etc.
Bear in mind, the data describing how this is drawn is really, really bad. Rectangles should really be drawn with rect elements when possible, and a lot of the path data is quite unnecessary, but that is evidently how the application generates the SVG.

Selecting multiple files for input and getting respective output

So I have this bit of code, which clips a shapefile of a tree out of a lidar point cloud. When doing this for a single shapefile it works well.
What I want to do: I have 180 individual tree shapefiles and want to clip every file out of the same point cloud and save each as an individual .las file.
So in the end I should have 180 .las files, e.g. input_shp: Tree11.shp -> output_las: Tree11.las.
I am sure that there is a way to do all of this at once; I just don't know how to select all the shapefiles and save the output to 180 individual .las files.
I'm really new to Python and any help would be appreciated.
I already tried to get there with placeholders (.format()) but couldn't really get anywhere.
from WBT.whitebox_tools import WhiteboxTools
wbt = WhiteboxTools()
wbt.work_dir = "/home/david/Documents/Masterarbeit/Pycrown/Individual Trees/"
wbt.clip_lidar_to_polygon(i="Pointcloud_to_clip.las", polygons="tree_11.shp", output="Tree11.las")
I don't have the plugin you are using, but you may be looking for this code snippet:
import os
from WBT.whitebox_tools import WhiteboxTools

wbt = WhiteboxTools()
workDir = "/home/david/Documents/Masterarbeit/Pycrown/Individual Trees/"
wbt.work_dir = workDir

# If you want to select all the files in your work dir you can use the
# following, though you may need to make the path absolute, depending on
# where you run this:
filesInFolder = os.listdir(workDir)
numberOfShapeFiles = len([_ for _ in filesInFolder if _.endswith('.shp')])

# assume shape files start at 0 and end at n-1;
# loop over all your shape files
for fileNumber in range(numberOfShapeFiles):
    wbt.clip_lidar_to_polygon(
        i="Pointcloud_to_clip.las",
        polygons=f"tree_{fileNumber}.shp",
        output=f"Tree{fileNumber}.las"
    )
This makes use of Python format strings (f-strings), along with the os.listdir function.
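If the shapefiles are not actually numbered consecutively from 0, a variant of the same idea (a sketch, deriving each output name from the input filename, e.g. Tree11.shp -> Tree11.las) could iterate over the files that really exist:
import glob
import os
from WBT.whitebox_tools import WhiteboxTools

wbt = WhiteboxTools()
workDir = "/home/david/Documents/Masterarbeit/Pycrown/Individual Trees/"
wbt.work_dir = workDir

# Clip the point cloud once per shapefile found in the work dir
for shp in sorted(glob.glob(os.path.join(workDir, "*.shp"))):
    name = os.path.splitext(os.path.basename(shp))[0]  # e.g. "Tree11"
    wbt.clip_lidar_to_polygon(
        i="Pointcloud_to_clip.las",
        polygons=os.path.basename(shp),
        output=f"{name}.las"
    )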

How to efficiently randomly select a subset of data from an h5py dataset

I have a very big dataset in h5py, and this leads to memory problems when it is loaded in full for subsequent processing. I need to randomly select a subset and work with it. This is for doing "boosting" in a machine learning context.
dataset = h5py.File(h5_file, 'r')
train_set_x_all = dataset['train_set_x'][:]
train_set_y_all = dataset['train_set_y'][:]
dataset.close()
p = np.random.permutation(len(train_set_x_all))[:2000] # rand select 2000
train_set_x = train_set_x_all[p]
train_set_y = train_set_y_all[p]
I still somehow need to read the full set and slice it with the index array p. This works for me, as subsequent training only uses the smaller set, but I wonder whether there is a better way that avoids keeping the full dataset in memory at all.
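For what it's worth, here is a minimal sketch of one such way: h5py supports fancy indexing directly on a dataset as long as the indices are in increasing order, so you can sort the random selection and read only those rows from disk:
import numpy as np
import h5py

with h5py.File(h5_file, 'r') as dataset:
    n = len(dataset['train_set_x'])
    # h5py fancy indexing requires indices in increasing order
    p = np.sort(np.random.choice(n, size=2000, replace=False))
    train_set_x = dataset['train_set_x'][p]  # reads only the 2000 rows
    train_set_y = dataset['train_set_y'][p]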

cross section plot using python Iris module

I want to plot a cross-section along longitude using the Python Iris module, which was developed for oceanography and meteorology. I am using their example:
http://scitools.org.uk/iris/docs/v1.4/examples/graphics/cross_section.html
I tried to adapt their code to my case, but the output of my code is empty.
data: http://data.nodc.noaa.gov/thredds/fileServer/woa/WOA09/NetCDFdata/temperature_annual_1deg.nc
import iris
import iris.plot as iplt
import iris.quickplot as qplt
# Enable a future option, to ensure that the netcdf load works the same way
# as in future Iris versions.
iris.FUTURE.netcdf_promote = True
# Load some test data.
fname = 'temperature_annual_1deg.nc'
theta = iris.load_cube(fname, 'sea_water_temperature')
# Extract a single depth vs longitude cross-section. N.B. This could
# easily be changed to extract a specific slice, or even to loop over *all*
# cross section slices.
cross_section = next(theta.slices(['longitude', 'depth']))
qplt.contourf(cross_section, coords=['longitude', 'depth'],
              cmap='RdBu_r')
iplt.show()
What you need to understand here is that your cross_section is defined as the first member of the theta.slices iterator, meaning that it starts from one end of the coordinates (where the data are empty in the current case). So you need to iterate to the next members of the iterator until you get some data. If you add these lines to the code, it may help you see what is going on:
import numpy as np

cs = theta.slices(['longitude', 'depth'])
for i in cs:
    print(np.nanmax(i))
Which should print something like:
--
--
--
-0.8788
-0.9052
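Building on that, here is a small sketch (continuing the script above, and assuming the cube's data is a masked array, as it is for this file) that skips fully masked slices and plots the first one that contains data:
import numpy as np

# Skip slices whose data is entirely masked (nothing to plot)
for cross_section in theta.slices(['longitude', 'depth']):
    if not np.ma.getmaskarray(cross_section.data).all():
        break  # first slice that actually contains data

qplt.contourf(cross_section, coords=['longitude', 'depth'],
              cmap='RdBu_r')
iplt.show()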
