Well, for linear storage devices, fragmentation makes you seek over unrelated data, and that hurts performance. So a function to minimize could be the sum of non-d_i bytes in the span of data item d_i, over all data items on the device, since that is the total "fragmentation" you would have to seek over when reading every file from, e.g., a disc. My problem is defining the fragmentation function. I've considered using the entropy of the free and used regions of the tape, I've considered a measure involving the minimum and maximum free-region sizes on the tape, etc.
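To make the "sum of non-d_i bytes in the span of d_i" idea concrete, here's a minimal sketch. The `layout` representation (item name mapped to the block addresses it occupies) and the function name are my own assumptions for illustration:

```python
# Hypothetical sketch of the span-fragmentation cost described above.
# layout maps each data item to the block addresses it occupies; the cost
# of item d_i is the number of blocks inside its span [min, max] that do
# NOT belong to d_i, i.e. the unrelated data you seek over when reading it.

def fragmentation_cost(layout: dict[str, list[int]]) -> int:
    total = 0
    for item, blocks in layout.items():
        if not blocks:
            continue
        span = max(blocks) - min(blocks) + 1
        total += span - len(blocks)  # non-d_i blocks inside d_i's span
    return total

# A contiguous layout costs nothing:
print(fragmentation_cost({"a": [0, 1, 2], "b": [3, 4]}))  # 0
# Interleaving "a" and "b" forces seeks over each other's blocks:
print(fragmentation_cost({"a": [0, 2, 4], "b": [1, 3]}))  # 3
```

A fully compacted tape drives this to zero, which matches the intuition that reading all files then involves no wasted seeks.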
jemalloc does essentially what you've outlined, though it allocates bins of fixed power-of-two sizes in advance (and expands each such region dynamically).
The transformation you mentioned, if left arbitrary, would just be whatever function makes φ(x) = 0. But since the idea started by modeling a physical device, there should probably be a cost attached to each transformation, mimicking real disc copies, etc. For example, you could count changed bytes or minimal block moves.
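The changed-bytes cost is easy to sketch: charge a transformation τ by how many bytes of the device image it actually touches. The function name and the flat byte-image representation here are assumptions of mine, not anything from the thread:

```python
# Hypothetical move-cost sketch: the price of a defragmentation step is the
# number of bytes that differ between the device image before and after it,
# mimicking the cost of real disc copies.

def move_cost(before: bytes, after: bytes) -> int:
    assert len(before) == len(after), "transformations preserve device size"
    return sum(1 for a, b in zip(before, after) if a != b)

print(move_cost(b"ab..cd", b"abcd.."))  # 4 bytes moved
```

Counting whole moved blocks instead of bytes would be the "minimal block sizes" variant; same shape, just compare block-sized chunks.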
The mathematical way to solve this would probably be to define a functional like F(τ) = Σ |d(t) − (τ(d))(t)| + φ(τ(d)) and find where its Gateaux/Fréchet derivative is zero (or differential; the two are mostly equivalent here). This looks like a discrete (second-order) Euler-Lagrange problem with no derivatives. Maybe the bit-difference part could be written as Bolza-type boundary terms.
The CS way to solve it is probably to run something like A* with a heuristic favoring denser packing, or any other optimization metaheuristic (GA, PSO, ant colony).
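As a stand-in for the search idea, here's a tiny greedy local search over a symbolic tape: try random block swaps and keep only those that lower the combined objective (fragmentation plus a per-move penalty). This is not A*, just a minimal sketch of the search shape; the tape representation, penalty weight, and function names are all assumptions:

```python
import random

def objective(tape, move_penalty, moves):
    # tape is a list of item labels; None marks a free block.
    positions = {}
    for i, item in enumerate(tape):
        if item is not None:
            positions.setdefault(item, []).append(i)
    frag = sum(max(p) - min(p) + 1 - len(p) for p in positions.values())
    return frag + move_penalty * moves

def greedy_defrag(tape, steps=2000, move_penalty=0.1, seed=0):
    rng = random.Random(seed)
    tape, moves = list(tape), 0
    best = objective(tape, move_penalty, moves)
    for _ in range(steps):
        i, j = rng.randrange(len(tape)), rng.randrange(len(tape))
        tape[i], tape[j] = tape[j], tape[i]       # propose a block swap
        cand = objective(tape, move_penalty, moves + 1)
        if cand < best:
            best, moves = cand, moves + 1         # accept improvement
        else:
            tape[i], tape[j] = tape[j], tape[i]   # revert
    return tape, best

tape = ["a", None, "b", "a", None, "b", "a"]
packed, cost = greedy_defrag(tape)
print(packed, cost)
```

Swapping the acceptance rule for simulated annealing, or for an A* expansion over move sequences, slots into the same objective; the interesting part is exactly the trade-off between φ (fragmentation) and the move cost.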
Hope it helps stir some thoughts