I actually think the key part of the test is whether bad contributions are encouraged. More total edits with only a "small" increase in vandalism/bad edits would show success without requiring careful measurement.
Yes, I agree. A test sounds like a good idea. Want to bring it up in the BP?
That's what I had in mind.