From sabhnani at cs.cmu.edu Wed May 13 06:53:29 2009
From: sabhnani at cs.cmu.edu (Robin Sabhnani)
Date: Wed, 13 May 2009 06:53:29 -0400 (EDT)
Subject: [Research] subset selection problem
Message-ID: <63826.67.165.53.93.1242212009.squirrel@webmail.cs.cmu.edu>

Hi,

Here is the question:

Imagine there are M pairs of numbers (a_1, b_1), (a_2, b_2), ..., (a_M, b_M).
For each subset m_1 of these M pairs we define a 2x2 table as follows:

    \sum a_i    \sum b_i    (i in m_1)
    \sum a_j    \sum b_j    (j in M - m_1)

Is there an algorithm, linear in M, that finds the subset m_1 whose 2x2
table above yields the lowest p-value under Fisher's exact test?

Any pointers/papers/ideas/thoughts will be helpful.

Thanks in advance.
--
Robin Sabhnani
Machine Learning PhD student

From schneide at cs.cmu.edu Wed May 13 10:17:59 2009
From: schneide at cs.cmu.edu (Jeff Schneider)
Date: Wed, 13 May 2009 10:17:59 -0400
Subject: [Research] subset selection problem
In-Reply-To: <63826.67.165.53.93.1242212009.squirrel@webmail.cs.cmu.edu>
References: <63826.67.165.53.93.1242212009.squirrel@webmail.cs.cmu.edu>
Message-ID: <4A0AD697.5070309@cs.cmu.edu>

I'll offer a coffee/pepsi to anyone who solves this puzzle :-)

I feel like such an algorithm must be possible, but I haven't been able to
come up with one yet.

Jeff.

From neill at cs.cmu.edu Wed May 13 10:19:26 2009
From: neill at cs.cmu.edu (Daniel B. Neill)
Date: Wed, 13 May 2009 10:19:26 -0400 (EDT)
Subject: [Research] subset selection problem
In-Reply-To: <63826.67.165.53.93.1242212009.squirrel@webmail.cs.cmu.edu>
References: <63826.67.165.53.93.1242212009.squirrel@webmail.cs.cmu.edu>

Hi Robin,

This is very similar to my current work on linear-time subset scanning
(LTSS), so we should chat soon (I'll be back in town on Monday). There are
a couple of fairly easy ways of showing that a function has the LTSS
property, so we could give these a try.

Best,
Daniel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Syndromic Surveillance 2008- LTSS, revised.pdf
Type: application/pdf
Size: 67930 bytes
Desc: ltss.pdf

From psarkar at cs.cmu.edu Sat May 30 18:56:43 2009
From: psarkar at cs.cmu.edu (Purnamrita Sarkar)
Date: Sat, 30 May 2009 18:56:43 -0400
Subject: [Research] Using the .fds file format
Message-ID: <4A21B9AB.6010705@cs.cmu.edu>

Hi all,

I am having problems saving a dataset in fds (fast dataset) format and
loading it again. Right now the fds file (attached) contains some junk.
Could this be because I am running my code on a 64-bit machine?

Here is the snippet of code where I save it:

-----------------
ds = ds_load_assuming_attr_types_and_number(filename, "x1/s x2/s x3/r x4/s");
ds_save(filename_fds, ds);
-----------------

I would really appreciate any ideas/help :)

thanks!
Purna
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.fds
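Purna's question can't be settled from the thread, since it doesn't show how `ds_save` lays out its fields. One general way binary files break between 32-bit and 64-bit builds is writing native types such as `long` straight to disk: `long` is 4 bytes on typical 32-bit Linux and 8 bytes on 64-bit Linux, so a reader built the other way misparses everything after the first such field. Whether the fds code actually does this is an open assumption. As a minimal sketch of the usual fix, the hypothetical helpers `put_u32_le` and `get_u32_le` (not part of the fds code) serialize a fixed-width integer byte by byte in an explicit order.

```c
#include <stdint.h>

/* Hypothetical helpers (not from the fds library): encode/decode a 32-bit
 * unsigned integer as exactly four little-endian bytes, independent of
 * sizeof(long) and of the host's byte order. */
static void put_u32_le(unsigned char *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)(v >> (8 * i));   /* least significant byte first */
}

static uint32_t get_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

A writer would fill a buffer with these and pass it to `fwrite`. By contrast, `fwrite(&n, sizeof n, 1, f)` on a `long n` writes 4 or 8 bytes depending on how the program was compiled.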
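For experimenting with Robin's subset selection question, here is a minimal C sketch under stated assumptions. "p-value" is taken to be the two-sided Fisher p-value (the total probability of tables no more likely than the observed one, with a small relative tolerance for ties), with both column totals \sum a and \sum b held fixed. Every a_i and b_i is assumed to be a non-negative integer with a_i + b_i > 0. `best_subset_brute_force` checks all 2^M subsets and serves as a reference answer. `best_prefix_scan` is a hypothetical LTSS-style scan in the spirit of Daniel's suggestion: it sorts the pairs by a_i / (a_i + b_i) and scores only the M + 1 prefixes of that order. Complements are covered because swapping the two rows leaves the two-sided p-value unchanged. That reduction is exact only if this score really has the LTSS property, which the thread does not establish, so treat it as a heuristic to check against the brute-force answer. All function names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Two-sided Fisher exact p-value for the 2x2 table
 *      x        y
 *    A - x    B - y
 * with column totals A and B fixed. Given the first-row total r = x + y,
 * tables are weighted by C(A, k) * C(B, r - k). */
static double fisher_two_sided(long x, long y, long A, long B)
{
    long r = x + y, N = A + B;
    long lo = r - B > 0 ? r - B : 0, hi = r < A ? r : A;
    long n = hi - lo + 1;
    double *w = malloc((size_t)n * sizeof *w);
    assert(w != NULL);

    /* Start at the hypergeometric mode with weight 1 and walk outward; each
     * step multiplies by a ratio <= 1, so weights cannot overflow. */
    long mode = (r + 1) * (A + 1) / (N + 2);
    if (mode < lo) mode = lo;
    if (mode > hi) mode = hi;
    w[mode - lo] = 1.0;
    for (long k = mode; k < hi; k++)   /* w(k+1) from w(k) */
        w[k + 1 - lo] = w[k - lo] * ((double)(A - k) / (k + 1))
                                  * ((double)(r - k) / (B - r + k + 1));
    for (long k = mode; k > lo; k--)   /* w(k-1) from w(k) */
        w[k - 1 - lo] = w[k - lo] * ((double)k / (A - k + 1))
                                  * ((double)(B - r + k) / (r - k + 1));

    /* Sum the weights of all tables no more likely than the observed one. */
    double obs = w[x - lo], total = 0.0, tail = 0.0;
    for (long i = 0; i < n; i++) {
        total += w[i];
        if (w[i] <= obs * (1.0 + 1e-7))
            tail += w[i];
    }
    free(w);
    return tail / total;
}

/* Reference answer: score all 2^m subsets (bit i of mask selects pair i). */
static double best_subset_brute_force(const long *a, const long *b, int m,
                                      unsigned long *best_mask)
{
    assert(m >= 0 && m < 31);
    long A = 0, B = 0;
    for (int i = 0; i < m; i++) { A += a[i]; B += b[i]; }

    double best = 2.0;                 /* above any possible p-value */
    for (unsigned long mask = 0; mask < (1UL << m); mask++) {
        long x = 0, y = 0;
        for (int i = 0; i < m; i++)
            if (mask & (1UL << i)) { x += a[i]; y += b[i]; }
        double p = fisher_two_sided(x, y, A, B);
        if (p < best) { best = p; *best_mask = mask; }
    }
    return best;
}

static const long *g_a, *g_b;          /* comparator context (not thread-safe) */

/* Order pair indices by a_i / (a_i + b_i), compared without division. */
static int by_ratio(const void *p, const void *q)
{
    int i = *(const int *)p, j = *(const int *)q;
    long lhs = g_a[i] * (g_a[j] + g_b[j]), rhs = g_a[j] * (g_a[i] + g_b[i]);
    return (lhs > rhs) - (lhs < rhs);
}

/* Hypothetical LTSS-style heuristic: sort once, then score the m + 1
 * prefixes of the sorted order. Exact only if LTSS holds for this score. */
static double best_prefix_scan(const long *a, const long *b, int m,
                               int *best_len, int *order)
{
    long A = 0, B = 0;
    for (int i = 0; i < m; i++) { A += a[i]; B += b[i]; order[i] = i; }
    g_a = a; g_b = b;
    qsort(order, (size_t)m, sizeof *order, by_ratio);

    long x = 0, y = 0;
    double best = fisher_two_sided(0, 0, A, B);   /* empty prefix */
    *best_len = 0;
    for (int k = 1; k <= m; k++) {
        x += a[order[k - 1]]; y += b[order[k - 1]];
        double p = fisher_two_sided(x, y, A, B);
        if (p < best) { best = p; *best_len = k; }
    }
    return best;
}
```

The scan needs one O(M log M) sort plus M + 1 p-value evaluations. Each evaluation here costs time proportional to the range of the table's first cell, so only the number of subsets scored is linear. Comparing the two functions on many small random inputs is a cheap way to hunt for counterexamples before attempting a proof of the LTSS property.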