Connectionists: ReForeSt, an Apache Spark Library (Random Forests, Random Rotation Ensembles, and Model Selection)

Alessandro Lulli alessandro.lulli at dibris.unige.it
Tue Sep 11 04:03:58 EDT 2018


Apologies for the cross-postings.

We are pleased to announce the first stable version of our ReForeSt library,
which is available on GitHub under the Apache License 2.0:
https://github.com/alessandrolulli/reforest

Key features
- Implemented in Scala and fully distributed on Apache Spark
- Implements Random Forests [1]
- Implements Random Rotation Ensembles [2]
- Implements an efficient Model Selection strategy [3]
- API similar to the MLlib Random Forest, but up to 6x faster and with up to 10x lower memory requirements [3]
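
Random Rotation Ensembles [2] train each tree of the ensemble on the data after multiplying the feature vectors by an independently drawn random rotation matrix. The following stdlib-only Python sketch shows one way such a matrix could be drawn and applied. It assumes purely numeric features, it is not ReForeSt's Scala code, and the names `random_rotation` and `rotate` are hypothetical.

```python
import math
import random

# Hypothetical helpers for illustration only; not part of ReForeSt's API.


def random_rotation(d, rng):
    """Draw a d x d random orthogonal matrix, returned as a list of rows.

    Gram-Schmidt orthonormalization of i.i.d. standard Gaussian vectors
    yields a uniformly (Haar) distributed orthogonal matrix.
    """
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Subtract the projections onto the rows accepted so far.
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:  # redraw in the (probability-zero) degenerate case
            rows.append([a / norm for a in v])
    return rows


def rotate(Q, x):
    """Return the matrix-vector product Q x for a feature vector x."""
    return [sum(q * xi for q, xi in zip(row, x)) for row in Q]
```

The returned matrix is orthogonal but may be a reflection (determinant -1); if a proper rotation is required, negating any single row flips the sign of the determinant. In an ensemble, tree t would draw its own matrix Q_t, be trained on `rotate(Q_t, x)` for every training row x, and apply the same Q_t to inputs at prediction time.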

ReForeSt is a distributed, scalable implementation of the Random Forest
learning algorithm, written in Scala on Apache Spark and targeting fast,
memory-efficient processing. Its distinguishing features are support for
arbitrarily large datasets (from millions of samples to millions of
features), categorical features and missing values, different data
distribution models, Random Rotations, and automatic hyperparameter
selection.
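
The core of the Random Forest algorithm [1] can be summarized in three steps: grow each tree on a bootstrap sample of the training set, consider only a random subset of the features at every split, and combine the trees by majority vote. A minimal single-machine sketch of those steps in stdlib-only Python follows. It does not reproduce ReForeSt's distributed, memory-efficient implementation described in [3], and all function names and defaults (`train_forest`, `n_trees`, `max_depth`, `n_feats`) are hypothetical.

```python
import random
from collections import Counter

# Illustration only: hypothetical names, not ReForeSt's API.
# Leaves are class labels (assumed not to be tuples); inner nodes are tuples.


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, max_depth, n_feats, rng):
    if max_depth == 0 or len(set(y)) == 1:
        return majority(y)
    best = None  # (weighted impurity, feature index, threshold)
    # Only a random subset of n_feats features is considered at this split.
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no feature in the subset can split this node
        return majority(y)
    _, f, t = best
    go_left = [row[f] <= t for row in X]
    Xl = [row for row, g in zip(X, go_left) if g]
    yl = [yi for yi, g in zip(y, go_left) if g]
    Xr = [row for row, g in zip(X, go_left) if not g]
    yr = [yi for yi, g in zip(y, go_left) if not g]
    return (f, t,
            build_tree(Xl, yl, max_depth - 1, n_feats, rng),
            build_tree(Xr, yr, max_depth - 1, n_feats, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def train_forest(X, y, n_trees=25, max_depth=8, n_feats=None, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    # Common default for classification: about sqrt(d) features per split.
    n_feats = n_feats or max(1, int(d ** 0.5))
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: n rows drawn with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, n_feats, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over the trees."""
    return majority([predict_tree(tree, x) for tree in forest])
```

For example, with rows `X` given as lists of floats and labels `y`, `predict_forest(train_forest(X, y), x)` returns the majority-vote label for a new row `x`. Scanning every distinct value as a threshold costs time quadratic in the node size and only suits toy data; Spark MLlib, for instance, discretizes continuous features into at most `maxBins` bins instead.
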
ReForeSt is a simple alternative to MLlib, since it shares a very similar
API, and it fills a gap in MLlib, which does not provide results for
datasets with millions of features. In the comparisons reported in [3],
ReForeSt is always faster than MLlib and requires less memory. Model
Selection (MS) is a useful tool for retrieving the best-performing
hyperparameters: it may help users who have little prior knowledge of the
problem, or who want to test multiple hyperparameter settings in less time.
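
The announcement does not describe how the Model Selection strategy of [3] works, so it is not reproduced here. For orientation only, the following Python sketch shows the naive kind of search that MS is meant to make cheaper: exhaustive grid search, which evaluates one model per combination of hyperparameter values. The `score` callable and the hyperparameter names are hypothetical.

```python
import itertools


def grid_search(score, grid):
    """Evaluate every combination in `grid` and return the best one.

    score: hypothetical user-supplied callable that trains a model with the
           given hyperparameters (a dict) and returns a validation score,
           where higher is better.
    grid:  dict mapping each hyperparameter name to its candidate values.
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score
```

For example, `grid_search(score, {"num_trees": [10, 100], "max_depth": [5, 10]})` calls `score` four times. The number of evaluations is the product of the candidate-list lengths, so each added hyperparameter multiplies the total training cost.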

[1] Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32.
[2] Blaser, R., & Fryzlewicz, P. (2016). Random rotation ensembles. The
Journal of Machine Learning Research, 17(1), 126-151.
[3] Lulli, A., Oneto, L., & Anguita, D. (2017, December). Crack random
forest for arbitrary large datasets. In Big Data (Big Data), 2017 IEEE
International Conference on (pp. 706-715). IEEE.