Detecting the Copy/Paste Antipattern from within SBT
I'm in the middle of porting the build process of a few (Java) projects from an old and complicated Ant setup to SBT. Along the way, I'm trying to fill a few gaps in SBT tooling, especially the ones that affect those of us sentenced to Java.
I've already done a FindBugs integration, which you can find here. FindBugs works on bytecode, so theoretically it can work with Scala code, too. However, I've heard that it reports a lot of false positives there, as in practice it's really tailored to Java.
Then there's the very useful Copy/Paste Detector (CPD) from the PMD code analysis suite. I wrote an SBT plug-in for CPD yesterday, too.
Now, if you ask me, Java is basically Copy/Paste Paradise: Without higher-order functions or closures, there's always the temptation to just copy all the boilerplate again and again. I had a Java code review project recently where about 40% of the whole codebase was generated by copy/paste!
But of course, this is not Java-specific. You can copy/paste Scala code, too, although in Scala, you should really never have to...
CPD can help you avoid this pitfall, especially when coupled with the nice DRY plug-in for the Hudson Continuous Integration Server, which gives you a pretty good overview of CPD's results. That's particularly handy in brownfield projects.
And now, thanks to me (yay for me!), you can use CPD from within SBT, too.
There's one drawback, though: CPD doesn't really support Scala as a language, either. I think it may work sufficiently well using the "Any" language tokenizer they have. But if you want good results, there's some work for you to do. Looking at the existing tokenizers for the other languages CPD supports, it shouldn't be too much work.
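To give a rough idea of what a tokenizer contributes to copy/paste detection, here's a minimal sketch in Java. It is not CPD's code and doesn't use the PMD API. The class and method names (`DuplicateSketch`, `tokenize`, `longestSharedRun`) are made up for illustration, and the matching is a plain longest-common-run comparison that I'm assuming is good enough to show the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: not CPD's implementation or API.
public class DuplicateSketch {

    // A token is either a "word" (identifier, keyword, number)
    // or any single non-whitespace character (operators, brackets, ...).
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\S");

    // Language-agnostic tokenizer: whitespace and line breaks are dropped.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Length of the longest run of consecutive tokens that occurs in both lists
    // (classic longest-common-substring dynamic programming, on tokens).
    static int longestSharedRun(List<String> a, List<String> b) {
        int best = 0;
        int[] prev = new int[b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            int[] cur = new int[b.size() + 1];
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    cur[j] = prev[j - 1] + 1;
                    best = Math.max(best, cur[j]);
                }
            }
            prev = cur;
        }
        return best;
    }

    public static void main(String[] args) {
        // Same loop, different formatting: the token streams are identical.
        String first = "for (int i = 0; i < n; i++) { sum += a[i]; }";
        String second = "for (int i = 0;  i < n;  i++) {\n  sum += a[i];\n}";
        int run = longestSharedRun(tokenize(first), tokenize(second));
        System.out.println("longest shared token run: " + run);
    }
}
```

Because the comparison happens on tokens rather than on raw text, reformatted copies still match. A language-aware tokenizer would go further than this sketch, for example by recognizing comments and string literals as units, which is where a dedicated Scala tokenizer would presumably beat a generic one.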
So, is anyone up for adding Scala support to CPD?