This is a large update. Three big changes:
- Substantial speed improvements – should improve both absolute speed and scaling
- Added generics for plot, summary, print.
- Switched from the package Rcpp to cpp11 for the backend. This removes a runtime dependency on Rcpp, but adds one on C++11 and adds a compile time dependency on cpp11.
Together, these led to many changes under the hood. As a consequence, permutation_test_builder is substantially different (and no longer exported), and order_stl no longer exists.
Each run of a *_test function now only sorts the data one time. Denoting the joint sample size N and the number of bootstraps K, this update moves the running time from O(KN log(N)) to O(KN) + O(N log(N)).
- Particularly for large samples or large numbers of bootstraps, this means a substantial improvement in speed.
- This required reworking the underlying C++ *_stat functions as well as the permutation_test_builder function.
- Instead of breaking code in unpredictable ways, this function is no longer exported. If you used it, archived copies can be found on GitHub (particularly under the example R code versions), or by emailing me.
- *_stat functions that are syntactically identical to the old ones still exist, but are no longer what is used by the *_test functions.
- These changes likely reduced memory requirements for most users, though this is offset by the new default of storing bootstrap outputs.
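The sort-once idea can be sketched in standalone C++. This is a minimal illustration under stated assumptions, not the package's backend code: `ecdf_gap`, `PermResult`, and `sort_once_permutation_test` are hypothetical names, the statistic is a simple largest-ECDF-gap stand-in rather than one of the package's *_stat functions, and the p-value convention is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Illustrative statistic (hypothetical, not one of the package's *_stat
// functions): the largest gap between the two ECDFs, found in one O(N) pass
// over values that are already sorted. labels[i] == 1 marks sample A.
double ecdf_gap(const std::vector<double>& sorted,
                const std::vector<int>& labels,
                std::size_t n_a) {
    const std::size_t n = sorted.size();
    assert(labels.size() == n && n_a > 0 && n_a < n);
    const double size_a = static_cast<double>(n_a);
    const double size_b = static_cast<double>(n - n_a);
    double seen_a = 0.0, seen_b = 0.0, best = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (labels[i] == 1) seen_a += 1.0; else seen_b += 1.0;
        // Compare the ECDFs only once a run of tied values is complete.
        if (i + 1 == n || sorted[i + 1] != sorted[i]) {
            best = std::max(best, std::fabs(seen_a / size_a - seen_b / size_b));
        }
    }
    return best;
}

struct PermResult {
    double stat;
    double p_value;
};

PermResult sort_once_permutation_test(const std::vector<double>& a,
                                      const std::vector<double>& b,
                                      std::size_t n_perms,
                                      unsigned seed) {
    const std::size_t n_a = a.size();
    std::vector<double> joint(a);
    joint.insert(joint.end(), b.begin(), b.end());
    const std::size_t n = joint.size();

    // Sort the joint sample exactly once: O(N log N).
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::sort(idx.begin(), idx.end(),
              [&joint](std::size_t i, std::size_t j) { return joint[i] < joint[j]; });

    std::vector<double> sorted(n);
    std::vector<int> labels(n);
    for (std::size_t k = 0; k < n; ++k) {
        sorted[k] = joint[idx[k]];
        labels[k] = idx[k] < n_a ? 1 : 0;
    }
    const double observed = ecdf_gap(sorted, labels, n_a);

    // Each resample only permutes the labels, so it costs O(N): O(KN) total.
    std::mt19937 rng(seed);
    std::size_t at_least = 0;
    for (std::size_t r = 0; r < n_perms; ++r) {
        std::shuffle(labels.begin(), labels.end(), rng);
        if (ecdf_gap(sorted, labels, n_a) >= observed) ++at_least;
    }
    // Assumed p-value convention: count the observed labelling as one draw.
    const double p = static_cast<double>(at_least + 1) /
                     static_cast<double>(n_perms + 1);
    return {observed, p};
}
```

Because the sorted values never move, only the label vector is reshuffled per resample. Re-sorting inside the loop would instead cost O(N log(N)) for each of the K resamples.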
There is now a ‘twosamples’ class, and generics for print, summary, and plot, as well as a function for combining outputs correctly. This should make the printed output much better, and makes it easy to see a fair bit of information using summary.
- Plotting currently shows a histogram of the bootstrap values, with a red line at the test statistic.
- This required making the *_test functions export the bootstrap values. If you have memory intensive applications, this can be turned off with the keep.boots toggle, at the cost of no longer being able to use the plotting.
- In the future I may add the ability to plot the ECDFs and the test stat images. This is the main reason for the keep.samples toggle, which is turned off by default.
- In order to only sort once, this is now a proper permutation test again. This should also resolve some classes of potential validity issues. Proofs in the associated paper are (at the moment) not relevant to this for the same reason.
- order_stl no longer exists. I do not believe anybody used this function outside its internal package use.
- permutation_test_builder is no longer exported. I am not aware of anyone using this function outside its internal package use. A similar function is still available, but will require changing the syntax of functions for its inputs.
- Dependency switch from Rcpp to cpp11 and an additional system requirement of C++11.
CRAN release: 2022-06-06
This version is primarily bug fixes and documentation updates. These bug fixes may affect outputs users see.
I expect this update to be purely cosmetic for the vast majority of users.
- For a few users of cvm_test it is possible that re-running code will make significant differences to conclusions.
- For the rare users of dts_test (two_sample) who relied on the scale of the test stat (rather than merely the p-value), this update will change outputs substantially. In principle this change is merely re-scaling everything by (np)/(2p/2).
- Fixed a major bug in how cvm_stat treated duplicates. This bug led to excessive power in some situations. On re-running code, p-values and test stats may change.
- Fixed a minor bug in how dts_stat calculated standard deviations. Re-running code will change the scale/location of the test stat, but should not affect p-values.
- Some minor performance improvements, e.g. eliminated some unnecessary comparisons if (sd > 0).
- Renamed a function's internal variables to prevent an unlikely namespace conflict.
- Website using pkgdown now exists at https://twosampletest.com
- link to website in description
- Fixed an error in the documentation describing dts_stat, in which a square root term was dropped
- updated discussion of order_stl
- added some notes about ability to use factors (ordered or not)
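To illustrate why duplicate handling matters for an ECDF-based statistic such as the one behind the cvm_stat fix noted above, here is a small C++ sketch. It is not the package's code and does not claim to reproduce the actual bug: `cvm_like` is a hypothetical Cramér-von Mises-style sum of squared ECDF gaps, written to show one way that evaluating the gap in the middle of a run of tied values can inflate a statistic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CvM-style statistic: sum over joint observations of the
// squared gap between the two ECDFs. Not the package's cvm_stat.
double cvm_like(const std::vector<double>& a,
                const std::vector<double>& b,
                bool tie_aware) {
    assert(!a.empty() && !b.empty());
    // Tag each value with its sample (1 = a, 0 = b) and sort by value.
    std::vector<std::pair<double, int>> joint;
    joint.reserve(a.size() + b.size());
    for (double v : a) joint.emplace_back(v, 1);
    for (double v : b) joint.emplace_back(v, 0);
    std::sort(joint.begin(), joint.end());

    const double size_a = static_cast<double>(a.size());
    const double size_b = static_cast<double>(b.size());
    double seen_a = 0.0, seen_b = 0.0, total = 0.0, run = 0.0;
    for (std::size_t i = 0; i < joint.size(); ++i) {
        if (joint[i].second == 1) seen_a += 1.0; else seen_b += 1.0;
        run += 1.0;
        const bool run_ends =
            i + 1 == joint.size() || joint[i + 1].first != joint[i].first;
        const double gap = seen_a / size_a - seen_b / size_b;
        if (!tie_aware) {
            // Naive: evaluates mid-run, when only part of a tie is counted.
            total += gap * gap;
        } else if (run_ends) {
            // Tie-aware: one evaluation per distinct value, weighted by the
            // number of observations sharing it.
            total += run * gap * gap;
        }
        if (run_ends) run = 0.0;
    }
    return total;
}
```

For two identical samples {1, 2} and {1, 2}, the tie-aware version returns 0 while the naive version returns 0.5, even though the ECDFs coincide everywhere. When the joint sample has no ties, the two versions agree.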
CRAN release: 2020-07-19
This update only fixes documentation. It fixes a bug that led to poor formatting, improves formatting of equations, adds graphs for test statistics, and adds links between help pages. See v1.1.0 for recent improvements to the codebase.
CRAN release: 2020-07-14
This update is primarily fixing a bug which meant that the test stat sorting routine was O(N^2), not O(Nlog(N)).
- order_cpp was using an O(N^2) sort routine that was supposed to be ditched before package release. It is now deprecated.
- order_stl replaces order_cpp, using the STL sort function to run the required sorting routine.
- All test stat calculations were using 3 more length N vectors than necessary. This has been fixed.
- A paper demonstrating package components was posted to arXiv, and linked to throughout the documentation.
- The folder R/Extras was updated to use the code for the simulations in the arXiv paper.
- permutation_test_builder is now sampling with replacement.
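An ordering routine built on the STL sort, as described above, might look roughly like the following. This is a sketch under assumptions rather than the source of order_stl: `order_indices` is a hypothetical name, indices are 0-based (unlike R's order), and inputs are assumed to contain no NaN values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for an order routine: returns the 0-based indices
// that would arrange `x` in ascending order. std::sort runs in O(N log N)
// comparisons; the relative order of tied values is unspecified.
std::vector<std::size_t> order_indices(const std::vector<double>& x) {
    std::vector<std::size_t> idx(x.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    auto by_value = [&x](std::size_t i, std::size_t j) { return x[i] < x[j]; };
    std::sort(idx.begin(), idx.end(), by_value);
    // Debug-only postcondition: the indices now visit x in ascending order.
    assert(std::is_sorted(idx.begin(), idx.end(), by_value));
    return idx;
}
```

For example, `order_indices({3.0, 1.0, 2.0})` yields the indices 1, 2, 0. Sorting indices rather than values keeps the original vector untouched, which is convenient when the same ordering has to be applied to a second vector such as sample labels.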
CRAN release: 2018-12-03
The package has been released. The package includes test statistic functions (written in C++) for the following two-sample distance measures:
- Cramer-Von Mises
- Wasserstein Metric
- An updated Wasserstein – referred to as DTS
Each test statistic also has a corresponding permutation test function.
In addition there are two functions:
- permutation_test_builder
- order_cpp

These are primarily intended for internal use, but there was no reason not to export them for others' use.