I've given the problem some more thought (enough thought to lose edit permission on my previous post), and I've managed to reformulate it much more nicely.
Firstly, the defragmentation transform τ really only shuffles around the data blocks (otherwise, we'd end up losing existing data). If we write the data on the disc/tape as a function:
d: [1, n]∩ℕ -> B
Then τ is a bijection (permutation) on Dom(d) which means it's an element of the group S_n.
This lets us write the cost of the defragmentation more simply as either the total distance we move the blocks (e.g. Σ_{k=1}^{n} |τ(k) - k|) or the number of blocks that change place (e.g. |{k : 1 ≤ k ≤ n, τ(k) ≠ k}|, the non-fixed points of τ).
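As a minimal Python sketch of those two cost measures (the function names are my own, and I'm using 0-indexed positions for convenience, with τ stored as a list where tau[k] is where block k ends up):

```python
def total_displacement(tau):
    """Sum of |tau(k) - k| over all blocks: total distance blocks move."""
    return sum(abs(t - k) for k, t in enumerate(tau))

def blocks_moved(tau):
    """Number of non-fixed points of tau: how many blocks change place."""
    return sum(1 for k, t in enumerate(tau) if t != k)

# Example: a cyclic shift of 4 blocks, tau = (0 3 2 1) in cycle notation.
tau = [3, 0, 1, 2]
print(total_displacement(tau))  # |3-0| + |0-1| + |1-2| + |2-3| = 6
print(blocks_moved(tau))        # 4
```

The identity permutation scores 0 on both measures, as you'd want for an already-defragmented disc.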
Lastly, if we have a set of data items written to disc/tape D = { D_i }, with each data item being a family of indices giving the locations of its data blocks in storage (i.e. D_i = { i_1, i_2, ..., i_m } ⊆ Dom(d), the D_i pairwise disjoint), then we can write the fragmentation of a single data item D_i as:
For the i_k in D_i listed in ascending order, k = 1, 2, ..., m:
Σ_{k=1}^{m-1} ( i_{k+1} - i_k - 1 )
This counts the total number of gap slots between consecutive blocks of D_i; it's zero exactly when the blocks of D_i are stored contiguously.
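That fragmentation sum can be sketched in a few lines of Python (again, the function name is mine):

```python
def fragmentation(indices):
    """Total gap between consecutive blocks of one data item.

    `indices` is the set of storage positions of the item's blocks;
    we sort them and sum (i_{k+1} - i_k - 1) over consecutive pairs.
    """
    s = sorted(indices)
    return sum(s[k + 1] - s[k] - 1 for k in range(len(s) - 1))

# A contiguous item has fragmentation 0 ...
print(fragmentation({4, 5, 6}))  # 0
# ... while scattered blocks accumulate the sizes of the gaps between them.
print(fragmentation({1, 4, 9}))  # (4-1-1) + (9-4-1) = 6
```

One nice consequence: the total fragmentation of the disc is just the sum of this over all data items, so minimising it over permutations τ is a well-defined objective.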
There are a couple of cool things about this approach. For one, S_n is well-researched, so there's probably some combinatorial/algebraic method I'm unaware of. Second, we've completely avoided evaluating d(k), replacing it with just the size of the disc/tape. And lastly, if we make our defrag cost look at data item indices, we only have to work with one type of object (which may or may not be easier).
This sounds like combinatorial optimisation, something like minimum linear arrangement, but I'm no expert in optimisation, much less so in discrete optimisation.