From schneide at cs.cmu.edu Fri Jan 28 10:35:19 2011
From: schneide at cs.cmu.edu (Jeff Schneider)
Date: Fri, 28 Jan 2011 10:35:19 -0500
Subject: [Research] Fwd: Reminder - Thesis Proposal TODAY- Yi Zhang
Message-ID: <4D42E237.8020408@cs.cmu.edu>

Come to NSH 1507 at noon to see the next Auton Lab thesis proposal!

Jeff.

-------- Original Message --------
Subject: Reminder - Thesis Proposal TODAY- Yi Zhang
Date: Fri, 28 Jan 2011 09:48:14 -0500
From: Diane Stidle
To: ml-seminar at cs.cmu.edu, Jerry Zhu

Thesis Proposal

Date: 1/28/11
Time: 12:00pm (Pizza, while it lasts)
Place: 1507 NSH
PhD Candidate: Yi Zhang

Title: Supervision Reduction by Encoding Extra Information about Models, Features and Labels

Abstract:
Learning with limited supervision presents a major challenge to machine learning systems in practice. Fortunately, various types of extra information exist in real-world problems, characterizing the properties of the model space, the feature space and the label space, respectively. With the goal of supervision reduction, this thesis studies the representation, discovery and incorporation of extra information in learning.

Extra information about the model space can be encoded as compression operations and used to regularize models in terms of compressibility. This leads to learning compressible models. Examples of model compressibility include local smoothness, compacted energy in frequency domains, and parameter correlation. When multiple related tasks are learned together, such a compact representation can be automatically inferred as a matrix-variate normal distribution with sparse inverse covariances on the parameter matrix, which simultaneously captures both task relations and feature structures.

Extra information about the feature space can usually be conveyed by certain feature reduction.
We propose the projection penalty to encode any feature reduction without the risk of discarding useful information: a reduction of the feature space can be viewed as a restriction of the model search to a certain model subspace, and instead of directly imposing such a restriction, we can search in the full model space but penalize the projection distance to the model subspace. In multi-view learning, the projection penalty framework provides an opportunity to simultaneously address both overfitting and underfitting.

Extra information about the label space can be extracted and exploited to improve multi-label predictions. To achieve this goal, we present error-correcting output codes (ECOCs) for multi-label classification: label dependency is represented by the most predictable directions in the label space and extracted by canonical correlation analysis (CCA) and its variants; the output code is designed to include these most predictable directions in the label space to correct prediction errors. Decoding of such codes can be efficiently performed by mean-field approximation and significantly improves the accuracy of multi-label predictions.

Effective collection of supervision signals is an indispensable part of supervision reduction. We consider active learning for multiple prediction tasks when their outputs are coupled by constraints. A cross-task value of information criterion is designed, which encodes output constraints to measure not only the uncertainty of the prediction for each task but also the inconsistency of predictions across tasks. A specific example of this criterion leads to the cross entropy between the predictive distributions of coupled tasks, which generalizes the notion of entropy used in single-task uncertainty sampling.
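
As a rough illustration of the last point in the abstract (this is not code from the proposal), the sketch below computes the cross entropy between two predictive distributions and shows that it reduces to the ordinary entropy when the two distributions coincide, which is the single-task case. The function names, the example probabilities, and the assumption that both tasks predict over the same label set are all made up for illustration.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q, eps=1e-12):
    """Cross entropy H(p, q) = -sum_i p_i * log(q_i), in nats.

    q_i is clipped at eps so that a zero in q does not raise an error.
    """
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical predictive distributions of two coupled tasks
# (assumed here to range over the same three labels).
p_task_a = [0.7, 0.2, 0.1]
p_task_b = [0.1, 0.2, 0.7]

# With identical distributions the cross entropy equals the entropy,
# i.e. the quantity used in single-task uncertainty sampling.
h_self = cross_entropy(p_task_a, p_task_a)

# When the two tasks disagree, the cross entropy is larger.
h_cross = cross_entropy(p_task_a, p_task_b)

print(f"H(a, a) = {h_self:.4f}, H(a, b) = {h_cross:.4f}")
```

Under these assumptions, an example on which the two tasks disagree scores higher than one on which they agree, so a cross-entropy score would rank it as a more informative query.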

Thesis Committee:
Jeff Schneider, Chair
Geoff Gordon
Tom Mitchell
Xiaojin Zhu (University of Wisconsin-Madison)

Link to the draft:
http://www.cs.cmu.edu/~yizhang1/docs/Proposal_V2.pdf

--
*******************************************************************
Diane Stidle
Business & Graduate Programs Manager
Machine Learning Department
School of Computer Science
8203 Gates Hillman Complex
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213-3891
Phone: 412-268-1299
Fax: 412-268-3431
Email: diane at cs.cmu.edu
URL: http://www.ml.cmu.edu

From schneide at cs.cmu.edu Fri Jan 28 10:25:54 2011
From: schneide at cs.cmu.edu (Jeff Schneider)
Date: Fri, 28 Jan 2011 15:25:54 -0000
Subject: [Research] Fwd: Reminder - Thesis Proposal TODAY- Yi Zhang
Message-ID: <4D42DFFF.1070803@cs.cmu.edu>

Come by NSH1507 at noon and see the next Auton Lab thesis proposal!

Jeff.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Thesis Proposal Poster-zhang.pdf
Type: application/acrobat
Size: 1019821 bytes
Desc: not available
URL: