If you read through the 1.0.0.355 release notes, you will find an odd item that says "Switched copying speed estimator to a linear regression algorithm". Here is what this is about.
An estimate of the completion time (ETA) is a nice thing to know, especially with very long backups. Computing the ETA, however, is far less trivial than it may seem at first.
Naive approach
The naive approach is to periodically check how much time has elapsed since the start of the backup and how much data was copied. Divide the amount of data copied by the elapsed time and that will yield a backup speed in bytes per second. Divide the number of bytes left to be copied by that speed and that will be a rough estimate of the remaining time. Simple. And wildly inaccurate.
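A minimal sketch of the naive estimate, for illustration only. The function name and the sample numbers are made up here and are not taken from Bvckup. The remaining time is the bytes left divided by the average speed so far:

```python
def naive_eta(bytes_copied, elapsed_sec, bytes_left):
    """Remaining seconds, based on the average speed since the backup started."""
    if bytes_copied <= 0 or elapsed_sec <= 0:
        return None  # nothing measured yet, so no estimate is possible
    speed = bytes_copied / elapsed_sec   # bytes per second
    return bytes_left / speed            # seconds


# Hypothetical numbers: 50 MB copied in 40 seconds, 200 MB still to go.
print(naive_eta(50 * 2**20, 40.0, 200 * 2**20))   # 160.0
```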
For example, take my local setup. I am backing up from a Windows laptop over WiFi to a fairly old and stressed NAS device. Copying a bunch of smallish (10 KB) files yields an average throughput of 79 KBps. 100 KB ones go at about 541 KBps, and 1 MB files at 1305 KBps:
File size    Throughput
10 KB        79 KBps
100 KB       541 KBps
1 MB         1305 KBps
Clearly the ETA cannot be computed from a single throughput estimate. The throughput just depends too much on the file size.
Better model
The reason, obviously enough, is that some time is spent on actually creating the file, copying its attributes and doing everything else that is needed to replicate the file in addition to copying the file data. In other words, the time spent copying a file is the sum of a fixed overhead and whatever is needed to push the data from source to destination:
T = FileSetupTime + BytesCopied / MediaSpeed
If we can estimate the MediaSpeed and FileSetupTime, then we can more accurately guess how much longer we are going to be processing the remaining X bytes of data in N files.
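Summing the per-file formula over N remaining files that hold X bytes in total gives N * FileSetupTime + X / MediaSpeed. A small sketch of that calculation follows. The function name and the example values are hypothetical, chosen only to show the arithmetic:

```python
def model_eta(files_left, bytes_left, file_setup_time, media_speed):
    """Remaining seconds under T = FileSetupTime + BytesCopied / MediaSpeed,
    summed over all remaining files.

    file_setup_time is in seconds per file, media_speed in bytes per second.
    """
    return files_left * file_setup_time + bytes_left / media_speed


# Hypothetical values: 1000 files, 50,000,000 bytes, 0.1 s of setup per file,
# 1,000,000 bytes per second. About 100 s of overhead plus 50 s of copying.
eta = model_eta(1000, 50_000_000, 0.1, 1_000_000)
print(f"{eta:.0f} seconds")
```

Note how the per-file overhead dominates here, which is exactly what the naive estimate misses.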
For every processed file we know BytesCopied and we know T, which is the total time spent processing the file. For the above data T is 127 ms, 185 ms and 785 ms respectively (the file size divided by the throughput), and T plotted against BytesCopied looks like this:
[Plot: T against BytesCopied for the three file sizes above]
Now squint your eyes, tilt your head to the left a bit. Isn't that a beautifully straight line going through these three points? :)
T depends linearly on BytesCopied, so the model appears to be correct. All we have to do now is solve it.
Linear regression
Long story short: we solve the whole thing using simple linear regression. Applying it to the above data yields a FileSetupTime of 120 ms and a MediaSpeed of 1541 KBps. With these two numbers we can now accurately estimate how long it is going to take to copy any given file.
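Here is a minimal sketch of such a fit using ordinary least squares in plain Python. It is not Bvckup's actual code. The function name is made up, and 1 MB is taken to be 1024 KB, which is what the 785 ms figure above implies. The slope of the fitted line is the time per byte (1 / MediaSpeed) and the intercept is FileSetupTime:

```python
def fit_copy_model(samples):
    """Least-squares fit of T = FileSetupTime + BytesCopied / MediaSpeed.

    samples: list of (bytes_copied, seconds) pairs, one per processed file.
    Returns (file_setup_time_sec, media_speed_bytes_per_sec).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    if sxx == 0 or sxy <= 0:
        # All files are the same size, or the noise swamps the trend.
        raise ValueError("samples do not determine a positive speed")
    slope = sxy / sxx                      # seconds per byte
    intercept = mean_y - slope * mean_x    # seconds per file
    return intercept, 1.0 / slope


KB = 1024
# The three measurements from above: (file size in bytes, T in seconds).
samples = [(10 * KB, 0.127), (100 * KB, 0.185), (1024 * KB, 0.785)]
setup, speed = fit_copy_model(samples)
print(f"FileSetupTime = {setup * 1000:.0f} ms, MediaSpeed = {speed / KB:.0f} KBps")
# FileSetupTime = 120 ms, MediaSpeed = 1541 KBps
```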
Caveats:
1. For this to work we need to process two files or more.
The good news is that if there is just one file, we will need an ETA only if the file is big. This in turn means that FileSetupTime can be ignored as most of the time will be spent on copying, so the naive estimation will work just fine.
2. Files should be of a noticeably different size. Otherwise any measurement noise in T values is going to affect the quality of the solution. A single stutter anywhere along the copying pipeline and we have a T that sits above all other measurements on the above plot and erroneously tilts the solution line in its own direction.
If files are of roughly the same size, then abnormally high and low samples can be discarded with a median filter.
3. MediaSpeed may change with time. For example, if the backup is done over WiFi, then the backup speed is going to fluctuate depending on the available WiFi bandwidth. Other sources of speed fluctuation include various caches, the hard drive or the CPU being stressed by another application, and the presence of network link compression.
The way to address this is to use a sliding time window and fit only recent data samples, e.g. only those from the last 5 to 10 seconds.
4. Other operations such as creating a directory or deleting a file may also take a non-trivial amount of time, and these need to be accounted for when calculating the ETA.
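Caveats 2 and 3 could be handled along the following lines. This is a rough sketch rather than what Bvckup does. The class and function names, the 10-second window and the 50% tolerance are all assumptions picked for illustration. The outlier step is a simple median-based rejection, one possible reading of the median filter mentioned above. Timestamps are passed in explicitly to keep the example deterministic; a real program would take them from a monotonic clock.

```python
import statistics
from collections import deque


class RecentSamples:
    """Keeps only the per-file samples from the last window_sec seconds."""

    def __init__(self, window_sec=10.0):
        self.window_sec = window_sec
        self._samples = deque()   # (timestamp, bytes_copied, seconds)

    def add(self, now, bytes_copied, seconds):
        self._samples.append((now, bytes_copied, seconds))
        # Drop everything that has fallen out of the time window.
        while self._samples and now - self._samples[0][0] > self.window_sec:
            self._samples.popleft()

    def samples(self):
        return [(b, t) for _, b, t in self._samples]


def drop_outliers(samples, tolerance=0.5):
    """For same-sized files: keep samples whose T is close to the median T."""
    median_t = statistics.median([t for _, t in samples])
    return [(b, t) for b, t in samples
            if abs(t - median_t) <= tolerance * median_t]


window = RecentSamples(window_sec=10.0)
for now, size, secs in [(0.0, 10240, 0.127), (4.0, 10240, 0.125),
                        (8.0, 10240, 0.900), (12.0, 10240, 0.130)]:
    window.add(now, size, secs)

recent = window.samples()      # the sample taken at t=0 has expired
print(drop_outliers(recent))   # the 0.900 s stutter is discarded
```

The surviving samples can then be fed to the regression, or averaged directly when all files are the same size and the regression has nothing to work with.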
What's done, what's not done
As of 1.0.0.355 the linear regression estimator is a part of Bvckup. However, all it does at the moment is estimate the copying speed. It needs to be further adapted to work around the caveats and to perform the actual ETA calculation. There is also some UI work as the ETA needs to be displayed somewhere. Stay tuned. Probably right after the Volume Shadow Copying support :)
