Survival Analysis for MIP Solver Running Time

Inspired by the recent activity around open-source MIP solvers (HiGHS is becoming stable after its 1.0 release, with many users submitting bug reports, and SCIP has become open-source software), I wanted to revisit some folklore about running times.

When solving difficult problems, in particular when running computational experiments, we have to choose a time limit that tells the solver when to give up on a problem (that is, finding a solution and proving its optimality). If the time limit is chosen too small, many instances will not be solved within it, and we cannot compare running times between different solvers or settings. But if we set the time limit too long, we might waste computational resources waiting for a result that never comes.

I remember the rule of thumb:

If the problem is not solved within an hour, it will also not be solved after four hours (or later).

Let's see if we can use some data and survival analysis to confirm this rule.

In [1]:
from lifelines import KaplanMeierFitter
import matplotlib.pyplot as plt
import pandas as pd
In [2]:
def make_ax():
    fig = plt.figure(figsize=(12, 9))
    ax = fig.subplots()
    return ax

Data Prep

We use running time data from the Mittelmann benchmarks on the MIPLIB set of benchmark instances, a set of representative difficult MIP instances.

From the solver comparison, we only look at the open-source solvers still in active development: HiGHS and SCIP. The time limit was set to 2 hours (7200 seconds), which we use as our window of observation. The tables contain either the time to optimality (solved instance) or some exceptional status (either timeout or an error). We interpret all exceptions as right-censored data.

In [3]:
# Read in MIPLIB results (subset of solvers, converted to CSV)
times = pd.read_csv("../files/miplib_20221113_8threads.csv")
times.head()
Out[3]:
   Instance                  HiGHS     SCIP
0  30n20b8                     537       69
1  50v-10                     5508  timeout
2  academictimetablesmall  timeout  timeout
3  air05                        22       24
4  app1-1                       23        3
In [4]:
# Bring the table into long form, instead of separate columns per solver
times = pd.melt(times, id_vars="Instance", var_name="Solver", value_name="Runtime")
times.set_index(["Solver", "Instance"], inplace=True)
In [5]:
# Replace exceptional status with time limit
for status in ["abort", "mismatch", "timeout"]:
    times.replace(status, 7200, inplace=True)

# Convert all values to numbers
times["Runtime"] = times["Runtime"].astype(float)
In [6]:
# Identify the instances that have been solved properly within the time limit.
# The others are right-censored, that is, we don't know how long it would have taken to solve them,
# including cases with an error status.
times["Finished"] = 1 * (times["Runtime"] < 7200)
times.head()
Out[6]:
                               Runtime  Finished
Solver Instance
HiGHS  30n20b8                   537.0         1
       50v-10                   5508.0         1
       academictimetablesmall   7200.0         0
       air05                      22.0         1
       app1-1                     23.0         1

Kaplan-Meier Estimate

We apply the Kaplan-Meier estimate to draw survival curves for each solver.

These show the probability that an instance has survived for some time (i.e., has not yet been solved to optimality), together with a 95% confidence interval.

Note that the curves (one for each solver) start at 100% at time 0 and decrease monotonically. The curves stop at 7200s (the time limit) without reaching 0%, because some instances were not solved at all.

We can see a steep decline in the first few minutes, after which the curves flatten into a long tail, as predicted by our rule of thumb.
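
For reference (a standard textbook formula, not specific to lifelines): if $d_i$ instances are solved at time $t_i$ and $n_i$ instances are still at risk (unsolved and not yet censored) just before $t_i$, the Kaplan-Meier estimate of the survival function is

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right).$$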

In [7]:
def kaplan_meier(solver, ax=None):
    kmf = KaplanMeierFitter()
    kmf.fit(
        times.loc[solver]["Runtime"],
        event_observed=times.loc[solver]["Finished"],
        label=solver,
    )
    return kmf.plot_survival_function(ax=ax)
In [8]:
ax = make_ax()
for s in ["HiGHS", "SCIP"]:
    kaplan_meier(s, ax)

Conditional Survival

We can also look at conditional survival, considering only the part of the curves beyond the first hour. That is, we reframe our question from

What is the probability that an instance has survived (not been solved) after some time?

to

What is the probability that an instance is solved after some additional waiting time, knowing that it wasn't solved within the first hour?

To that end, we merge the data from both solvers and make some adjustments to the resulting survival curve.
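
In terms of the survival function $S$, with $s = 3600$ seconds, the curve computed below is

$$P(T \le s + t \mid T > s) = 1 - \frac{S(s + t)}{S(s)},$$

the probability of solving an instance within $t$ additional seconds, given that it survived the first hour.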

In [9]:
kmf = KaplanMeierFitter()
kmf.fit(
    times["Runtime"],
    event_observed=times["Finished"],
    label="Open Source Solvers"
)
ax = make_ax()
kmf.plot_survival_function(ax=ax);
In [10]:
# Only keep right tail of curve, going beyond 1h
survival = kmf.survival_function_["Open Source Solvers"]
tail = survival[3600:]
tail.index = tail.index - 3600.0
In [11]:
# Rescale the probability to start at 100% (conditional on surviving 1h),
# and take the complement (probability of solving within the additional time).
cond_tail = 1.0 - tail / tail.iloc[0]
In [12]:
# Visualize probability of solving it after waiting for additional time, after 1h.
ax = make_ax()
cond_tail.plot(ax=ax, drawstyle="steps-pre");

The above chart shows the probability that an instance is solved after some time (beyond 1h), given that it wasn't solved already. It tells us that an additional hour of computation would only solve about 10% of the remaining instances.

The Food Triangle Quantified

In a recent cookbook, the Healthspan Solution, the authors propose a new way to categorize food: the food triangle.

First of all, only whole-food ingredients are considered. Then, rather than grouping foods by their dominant macronutrient (fat, protein, or carbohydrates), they are sorted by energy density on the one hand (low at the top, high at the bottom), and by source on the other (animal on the left, plants on the right):

Food Triangle

The recommendation is to eat on the "right side", in particular near the top, in order to get sufficient micronutrients while keeping energy intake in check.

On the other hand, many traditional recipes choose foods from both sides near the bottom, yielding the highest energy density while promoting fat storage due to oxidative priority.

I like this scheme, but I wondered whether most foods can really be assigned to one of these sides so easily. So this post is about visualizing foods by their energy density and relative source of energy (fat vs. carbohydrates).

In [1]:
# Activate custom environment and load packages
using Pkg
Pkg.activate("../../environments/food-triangle/");

using CSV
using DataFrames
using VegaLite
 Activating environment at `~/src/rschwarz.github.io/environments/food-triangle/Project.toml`
In [2]:
# Load nutrition data
foods = DataFrame(CSV.File("../files/foods.csv"))
@show size(foods)
first(foods, 3)
size(foods) = (142, 7)
Out[2]:

3 rows × 7 columns

   Source  Category  Food       Energy   Protein  Fat      Carbohydrates
   String  String    String     Float64  Float64  Float64  Float64
1  Plant   Grain     Oats       272.0    2.4      1.6      10.1
2  Plant   Grain     Barley     388.0    2.9      0.9      18.0
3  Plant   Grain     Buckwheat  304.0    2.5      0.6      14.1

Energy

Our nutrition data lists 142 different foods from several categories, with energy content (in kJ per 100 g) as well as the weight of each macronutrient (in g per 100 g).

Since fat has a much higher energy density per gram, we will replace the weight values by the energy contribution of each macronutrient.

In [3]:
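# Energy factors in kJ per gram: fat ≈ 37, carbohydrates ≈ 16, protein ≈ 17
# (roughly the Atwater factors of 9/4/4 kcal per gram, expressed in kJ)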
foods.Fat = 37.0 .* foods.Fat
foods.Carbohydrates = 16.0 .* foods.Carbohydrates
foods.Protein = 17.0 .* foods.Protein
first(foods, 3)
Out[3]:

3 rows × 7 columns

   Source  Category  Food       Energy   Protein  Fat      Carbohydrates
   String  String    String     Float64  Float64  Float64  Float64
1  Plant   Grain     Oats       272.0    40.8     59.2     161.6
2  Plant   Grain     Barley     388.0    49.3     33.3     288.0
3  Plant   Grain     Buckwheat  304.0    42.5     22.2     225.6

Food Triangle

Our quantified food triangle will be implemented as a scatter plot. For the vertical axis, we use the total energy density (low energy at the top). For the horizontal axis, we take the difference between the energy contributions of carbohydrates and fat. So foods rich in fat tend to be near the left, while foods rich in carbohydrates are on the right. Foods in the middle could have both fat and carbohydrates, or neither, being rich in protein instead.
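
In symbols (just restating the construction above), each food is plotted at

$$(x, y) = \bigl(E_{\text{cho}} - E_{\text{fat}},\; E_{\text{total}}\bigr),$$

with the vertical axis reversed so that low energy density appears at the top.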

In [4]:
foods.fat_cho = foods.Carbohydrates - foods.Fat
first(foods, 3)
Out[4]:

3 rows × 8 columns

   Source  Category  Food       Energy   Protein  Fat      Carbohydrates  fat_cho
   String  String    String     Float64  Float64  Float64  Float64        Float64
1  Plant   Grain     Oats       272.0    40.8     59.2     161.6          102.4
2  Plant   Grain     Barley     388.0    49.3     33.3     288.0          254.7
3  Plant   Grain     Buckwheat  304.0    42.5     22.2     225.6          203.4
In [5]:
foods |> @vlplot(height=400, width=600, :point,
                 x=:fat_cho, y={:Energy, sort="descending"}, color=:Source)
Out[5]:
[Scatter plot: fat_cho (x) vs. Energy (y, low at top), colored by Source: Animal, Plant]

To my surprise, most foods do actually line up on either side of the triangle and few are found in the center.

We can also see that all animal foods are on the left, while most plant foods are on the right.

Finally, it's quite obvious that the left side goes a lot "deeper" than the right. Foods rich in fat have a much higher potential for high energy density!

Plant-based Foods

Let's go into more detail, looking only at plant foods now, split into more categories.

In [6]:
filter(row -> row.Source == "Plant", foods) |>
    @vlplot(height=400, width=600, :point,
            x=:fat_cho, y={:Energy, sort="descending"}, color=:Category)
Out[6]:
[Scatter plot of plant foods: fat_cho vs. Energy, colored by Category: Fruit, Grain, Legume, Nuts_Seeds, Vegetable]

We can see that the different categories form nice clusters in this view.

Most items on the left side seem to be either nuts or seeds. Let's zoom in on these.

In [7]:
filter(row -> row.Category == "Nuts_Seeds", foods) |>
    @vlplot(height=400, width=600, :text, text=:Food,
            x=:fat_cho, y={:Energy, sort="descending", scale={zero=false}}, color=:Category)
Out[7]:
[Labeled plot of nuts and seeds: fat_cho vs. Energy, with food names as markers]

When we remove nuts and seeds, we can get a better picture of the foods near the top of the triangle.

In [8]:
filter(row -> row.Source == "Plant" && row.Category != "Nuts_Seeds", foods) |>
 @vlplot(height=400, width=600, :point,
         x=:fat_cho, y={:Energy, sort="descending"}, color=:Category)
Out[8]:
[Scatter plot of plant foods without nuts and seeds: fat_cho vs. Energy, colored by Category]

There are still some outliers of higher energy density:

In [9]:
filter(row -> row.Source == "Plant" && row.Category != "Nuts_Seeds"
              && row.Energy > 400.0, foods) |>
 @vlplot(height=400, width=600, :text, text=:Food,
         x=:fat_cho, y={:Energy, sort="descending",scale={zero=false}}, color=:Category)
Out[9]:
[Labeled plot of higher-energy plant foods: fat_cho vs. Energy, with food names as markers]

OK, so there are two vegetables high in fat (avocado and olive) and also two types of dried fruits, which might not be considered proper whole foods, I guess.

The other items are either grains or legumes, but not fruits or vegetables.

In [10]:
filter(row -> row.Source == "Plant" && row.Category != "Nuts_Seeds"
              && row.Energy < 120.0, foods) |>
 @vlplot(height=400, width=600, :text, text=:Food,
         x=:fat_cho, y={:Energy, sort="descending",scale={zero=false}}, color=:Category)
Out[10]:
[Labeled plot of low-energy plant foods: fat_cho vs. Energy, with food names as markers]

Near the top we have mostly vegetables and some fruits. The foods lowest in energy density are either cabbages, leafy greens or watery vegetables.

In [11]:
filter(row -> row.Category == "Vegetable" &&  row.Fat < 200, foods) |>
    @vlplot(height=400, width=600, :text, text=:Food,
            x=:fat_cho, y={:Energy, sort="descending", scale={zero=false}}, color=:Category)
Out[11]:
[Labeled plot of vegetables: fat_cho vs. Energy, with food names as markers]

Looking at all vegetables now, we can see that the starchy vegetables are near the bottom. Should peas actually be with the legumes?

In [12]:
filter(row -> row.Category in ("Grain", "Legume"), foods) |>
    @vlplot(height=400, width=600, :text, text=:Food,
            x=:fat_cho, y={:Energy, sort="descending", scale={zero=false}}, color=:Category)
Out[12]:
[Labeled plot of grains and legumes: fat_cho vs. Energy, with food names as markers]

All grains and legumes are higher in energy density than the vegetables. It should be noted here that the numbers apply to cooked, not fresh or dried, foods. So maybe oats only end up at the top here because they are typically cooked with more water than other grains?

Soy beans are quite the outlier here, storing most of their energy as fat.

Animal-based Foods

Let's now turn to animal foods, all located on the left side.

In [13]:
filter(row -> row.Source == "Animal", foods) |>
 @vlplot(height=400, width=600, :point,
         x=:fat_cho, y={:Energy, sort="descending",scale={zero=false}}, color=:Category)
Out[13]:
[Scatter plot of animal foods: fat_cho vs. Energy, colored by Category: Dairy, Egg, Fish, Meat, Organ, Shellfish]

There is one outlier with very high energy content:

In [14]:
first(sort(foods, :Energy, rev=true), 3)
Out[14]:

3 rows × 8 columns

   Source  Category    Food        Energy   Protein  Fat      Carbohydrates  fat_cho
   String  String      String      Float64  Float64  Float64  Float64        Float64
1  Animal  Meat        Pork_Belly  3073.0   229.5    2841.6   0.0            -2841.6
2  Plant   Nuts_Seeds  Macadamia   2966.0   156.4    2738.0   72.0           -2666.0
3  Plant   Nuts_Seeds  Pecan       2906.0   166.6    2660.3   78.4           -2581.9

It's pork belly, with lots of fat. But actually not much more than some nuts.

In [15]:
filter(row -> row.Source == "Animal" && row.Energy < 450.0, foods) |>
 @vlplot(height=400, width=600, :text, text=:Food,
         x=:fat_cho, y={:Energy, sort="descending",scale={zero=false}}, color=:Category)
Out[15]:
[Labeled plot of low-energy animal foods: fat_cho vs. Energy, with food names as markers]

Near the top, with lower energy density (and high water content), we can find dairy products and shellfish.

Dairy also has significant carbohydrates, but the scallops and mussels seem to store almost all of their energy as protein.

In [16]:
filter(row -> row.Source == "Animal" && 700.0 < row.Energy < 2000.0, foods) |>
 @vlplot(height=400, width=600, :text, text=:Food,
         x=:fat_cho, y={:Energy, sort="descending",scale={zero=false}}, color=:Category)
Out[16]:
[Labeled plot of mid-energy animal foods: fat_cho vs. Energy, with food names as markers]

Leaving out the pork belly, the highest-energy foods are cheeses. The fish, meats, and organs are rich in fat as well as protein.

Still, I would have expected liver to be farther down. Also, I'm surprised by the wide spread of fat content among different types of fish:

In [17]:
filter(row -> row.Category == "Fish", foods) |>
 @vlplot(height=400, width=600, :text, text=:Food,
         x=:fat_cho, y={:Energy, sort="descending",scale={zero=false}}, color=:Category)
Out[17]:
[Labeled plot of fish: fat_cho vs. Energy, with food names as markers]

Overall, I was positively surprised by how accurate the food triangle scheme turned out to be when looking at the actual numbers.

Energy (Kernel) Density

As a final analysis, let's compare the energy density of plants and animals (by count) in a simpler, one-dimensional chart:
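
As a reminder, the density transform computes a kernel density estimate: with kernel $K$ and bandwidth $h = 200$ (in units of kJ), the smoothed density of the energy values $x_1, \dots, x_n$ is

$$\hat{f}(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right).$$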

In [18]:
foods |> @vlplot(height=400, width=600, :line,
    transform=[{density="Energy", bandwidth=200, groupby=["Source"]}],
    x={"value:q", title="Energy"}, y="density:q", color=:Source)
Out[18]:
[Line plot: kernel density estimate of Energy, grouped by Source: Animal, Plant]

Refined Delaunay Triangulations for Flow Problems

I am trying to solve a network design problem for gas pipeline networks (see PipeLayout). In the original problem statement, only the coordinates of the source and sink nodes are given. Pipes can be connected in any way, also using Steiner nodes at any location.

But I also consider a discretized variation of the problem, where the pipes are selected from a given ground structure. Initially, I thought of using a regular (square) grid as a ground structure, but this brings problems related to solution symmetry (among other things).

So, instead I will now try to find a (not so regular) triangulation of the area that starts with the given source & sink nodes, but may add additional points for refinement.

Mesh generation with triangles is a well-studied problem, with mature implementations (see Triangle), but it is usually done in order to solve PDE systems or similar. So these tools define a good-quality triangulation in terms of triangles with large angles or small areas.

For our application, the triangle cells themselves are not relevant, but rather the edges around them. These are used to connect the nodes, so they should provide good candidates for shortest paths and spanning trees. Therefore, we will evaluate several triangulations based on the minimum-length tree that spans all terminal nodes.

Implementation Notes

As usual, we will experiment with Julia code. The triangulation code is part of the PipeLayout package and based on TriangleMesh. Visualizations are made with Makie and its Cairo backend. Optimization models are formulated with JuMP and solved with SCIP.

In [1]:
# Activate custom environment
using Pkg
Pkg.activate("../../environments/refined-delaunay-for-flow-problems/");

# Julia standard library
using LinearAlgebra
using Random
using SparseArrays
using Statistics

# Third-party packages
using AbstractPlotting
using CairoMakie
using JuMP
using LightGraphs
using SCIP

# My own code (not registered)
using PipeLayout
using PipeLayout: delaunay_triangulation, make_pslg, TriangleSwitches, triangulate
using PipeLayout: antiparallel_digraph, euclidean_steiner_tree

Random Instance Data

We generate a single set of 7 nodes that is used throughout the experiment.

In [2]:
Random.seed!(0);
const N = 7
const WIDTH = 500
const HEIGHT = 300

x = round.(0.95 * WIDTH * rand(N))
y = round.(0.95 * HEIGHT * rand(N))
points = [x y];

Drawing with Makie

We define some drawing utilities to be used below.

In [3]:
function make_scene(width=WIDTH, height=HEIGHT)
    return Scene(resolution=(width, height), show_axis=false, scale_plot=false)
end

function draw_points!(points; markersize=4, color=:black)
    scatter!(points[:, 1], points[:, 2], markersize=markersize, color=color)
end

function draw_edges!(points, edges; color=:gray)
    linesegments!(points[edges'[:], :], color=color)
end

function draw_triangulation(points, edges)
    make_scene()
    draw_edges!(points, edges)
    draw_points!(points)   
end

draw_triangulation(trimesh) = draw_triangulation(trimesh.point', trimesh.edge');

Triangulation with TriangleMesh

TriangleMesh wraps the Triangle C library for Julia. It provides, in particular, an algorithm for Constrained Delaunay Triangulation (CDT). This takes as input a planar straight-line graph and outputs a triangulation of the nodes, while ensuring that the existing edges are part of the triangulation (though they may be subdivided).
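
As an aside (based on Triangle's own documentation, not on this wrapper): the options used below mirror Triangle's command-line switches, such as -q for a minimum-angle quality constraint, -a for a maximum triangle area, and -D for a conforming Delaunay triangulation.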

In [4]:
# Create Delaunay Triangulation from points
deltri = delaunay_triangulation(points)

# Create Polygon from triangulation (as input for further refinements)
delpoly = make_pslg(deltri)

draw_triangulation(deltri)
Out[4]:
In [5]:
# Create triangulation of the raw points, specifying a minimum angle constraint
draw_triangulation(triangulate(points,
        TriangleSwitches(minimum_angle=20, conforming_delaunay=true)))
Out[5]:
In [6]:
# Create a CDT, keeping the edges of the original triangulation 
draw_triangulation(triangulate(delpoly,
        TriangleSwitches(minimum_angle=20, conforming_delaunay=true)))
Out[6]:

The simple Delaunay Triangulation makes sense visually and even has nice theoretical properties: shortest paths along its edges approximate the Euclidean distances!

The second triangle mesh introduces a few additional nodes in order to get more regular triangles. But in turn, it also loses some of the "direct" connections between the terminal nodes.

Finally, the CDT keeps the direct edges of the Delaunay Triangulation, but refines the triangles, using many more additional nodes and edges.

Steiner Tree Model

We will now define an optimization model to determine the shortest subgraph that spans the given terminal nodes (Steiner tree in graphs). The model uses a multi-commodity flow formulation.

With the help of this model, we can see which edges of the triangulation are actually useful for our application, as well as quantitatively compare the triangulations.
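
For reference, the model in compact notation (a sketch matching the code below): with binary arc selection $x_a$, per-sink flow variables $f_a^s \ge 0$, arc lengths $\ell_a$, root terminal $r$, and node demand $d_v^s = \mathbb{1}[v = s] - \mathbb{1}[v = r]$,

$$\min \sum_{a \in A} \ell_a\, x_a \quad \text{s.t.} \quad \sum_{(u,v) \in A} f_{(u,v)}^s - \sum_{(v,u) \in A} f_{(v,u)}^s = d_v^s \;\; \forall v, s, \qquad f_a^s \le x_a \;\; \forall a, s.$$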

In [7]:
function edge_lengths(points, edges)
    diff = points[edges[:, 1], :] - points[edges[:, 2], :]
    return dropdims(mapslices(norm, diff, dims=2), dims=2)
end

edge_lengths(trimesh) = edge_lengths(trimesh.point', trimesh.edge')

function edge_length_map(trimesh)
    edges = trimesh.edge'
    lengths = edge_lengths(trimesh)

    length_map = Dict{Tuple{Int64, Int64}, Float64}()
    for e = 1:size(edges, 1)
        s, t = edges[e, :]
        length_map[s, t] = lengths[e]
        length_map[t, s] = lengths[e]
    end
    return length_map
end

function steiner_tree(trimesh, terminals)
    # Compute length by Euclidean distance of nodes.
    lengths = edge_length_map(trimesh)
    
    # Build digraph with all antiparallel arcs, for nonnegative flow.
    graph = antiparallel_digraph(trimesh)
    nodes = collect(1:nv(graph))
    arcs = collect(keys(lengths))
    
    length(terminals) >= 2 || error("Need at least 2 terminals.")
    all(terminals .<= nv(graph)) || error("Terminals out of range.")
    root = terminals[1]
    sinks = terminals[2:end]
   
    demand(v, s) = 1.0*(v == s) - 1.0*(v == root)
    
    # Using arc length for fixed capacity cost and multi-commodity flow.
    model = JuMP.direct_model(SCIP.Optimizer(display_verblevel=0))
    @variable(model, select[a in arcs], Bin, container=SparseAxisArray)
@variable(model, flow[a in arcs, s in sinks] >= 0, container=SparseAxisArray)
    @constraint(model, balance[v in nodes, s in sinks],
        sum(flow[(n, v), s] - flow[(v, n), s] for n in neighbors(graph, v))
        == demand(v, s))
    @constraint(model, capacity[a in arcs, s in sinks], flow[a, s] <= select[a])
    @objective(model, Min, sum(lengths[a] * select[a] for a in arcs))
    
    optimize!(model)
    
    # Return objective, as well as selected edges.
    return objective_value(model), value.(select)
end
Out[7]:
steiner_tree (generic function with 1 method)

Another utility will take a triangulation as input, compute the optimal subtree and visualize both.

In [8]:
function draw_tree(trimesh, terminals=1:N)
    points, edges = trimesh.point', trimesh.edge'
    obj, select = steiner_tree(trimesh, terminals)
    @show obj
    selected = [select[(s,t)] + select[(t,s)] for (s,t) in eachrow(edges)]
    active_edges = edges[selected .> 0.0, :]
    
    draw_triangulation(trimesh)
    linesegments!(points[active_edges'[:], :], color=:plum, linewidth=3)
    draw_points!(points[terminals, :], markersize=5, color=:teal)
end
Out[8]:
draw_tree (generic function with 2 methods)
In [9]:
draw_tree(deltri)
obj = 693.4867635250124
Out[9]:
In [10]:
tri = triangulate(points, TriangleSwitches(minimum_angle=20, conforming_delaunay=true))
draw_tree(tri)
obj = 700.5151721904267
Out[10]:
In [11]:
tri = triangulate(points, TriangleSwitches(maximum_area=sqrt(WIDTH*HEIGHT), conforming_delaunay=true))
draw_tree(tri)
obj = 697.572096498523
Out[11]:
In [12]:
tri = triangulate(delpoly, TriangleSwitches(maximum_area=sqrt(WIDTH*HEIGHT), conforming_delaunay=true))
draw_tree(tri)
obj = 693.4867635250124
Out[12]:

The Delaunay Triangulation allows for a solution of length 693.5. The other point-cloud triangulations based on minimum-angle or maximum-area constraints yield poorer solutions, of length 700.5 and 697.6, while using larger graphs.

The CDT recovers the solution of the Delaunay Triangulation, but does not use any of the additional nodes or edges introduced in the subdivision.

Euclidean Steiner Tree

We can "cheat" and solve the EST problem directly (based on GeoSteiner). This will gives a lower bound on the length (what we can hope to achieve), and maybe a polygon to be triangulated.

In [13]:
est = euclidean_steiner_tree(points)

make_scene()
draw_edges!(est...)
draw_points!(est[1][1:N, :])
Out[13]:

We can see that 3 additional Steiner nodes are inserted.

Let's now build a triangulation of this tree and compute the length.

Then, we show a point cloud triangulation, based on the terminals and the Steiner nodes of the EST.

In [14]:
estpoly = make_pslg(est...)
draw_tree(triangulate(estpoly, TriangleSwitches(
            minimum_angle=20, maximum_area=sqrt(WIDTH*HEIGHT), convex_hull=true)))
obj = 664.018124937279
Out[14]:
In [15]:
draw_tree(triangulate(est[1], TriangleSwitches(minimum_angle=20, maximum_area=sqrt(WIDTH*HEIGHT))))
obj = 696.9337210443265
Out[15]: