All times are UTC - 8 hours [ DST ]




 Post subject: Code for selecting Random entries in a FileMan File.
PostPosted: Mon Mar 14, 2011 5:45 pm 
Joined: Wed Nov 17, 2010 4:02 pm
Posts: 71
Location: Houston TX
Real Name: David Whitten
Began Programming in MUMPS: 06 Jan 1982
This is code that I haven't documented yet and am still trying to make sure works.

As a learning exercise, I've decided to post it. Feedback is welcome...

KBADSELR ;DJW; Select a Random Sample of the Patient Population
 ;
 K RAND S RAND("S")="I 1",U="^"
 K DIR
 S DIR("A")="Include Vets Only ?",DIR("B")="YES"
 S DIR(0)="Y" D ^DIR G QX:$D(DIRUT) S VETSONLY=Y
 ; a DIR(0)="Y" read returns Y=1 for yes and Y=0 for no
 I VETSONLY S RAND("S")=RAND("S")_",$P($G(^DPT(Y,""VET"")),U)=""Y"""
 K DIR
 S DIR("A")="Include Deceased Persons ?",DIR("B")="YES"
 S DIR(0)="Y" D ^DIR G QX:$D(DIRUT) S DEAD=Y
 ; when the deceased are excluded, keep only entries with no date of death
 I 'DEAD S RAND("S")=RAND("S")_",$P($G(^DPT(Y,.35)),U)="""""
 S RAND="^DPT",RAND("OUT")="^UTILITY($J,$T(+0))"
 D RAND(.RAND)
 ; do something with results stored in @RAND("OUT")@(foo)
QX ;
 Q
 ;
 ; Rand() is called with a variable named after itself, following the design
 ; of classic FileMan and Kernel calls.
RAND(RAND) ;
 N ROOT S ROOT="^DPT" S:$D(RAND)#2 ROOT=RAND
 N IENLIST S IENLIST="IENLIST" S:$D(RAND("OUT"))#2 IENLIST=RAND("OUT")
 N CHECK S CHECK="I 1" S:$D(RAND("S"))#2 CHECK=RAND("S")
 N LIM S LIM=1 S:$D(RAND("LIM"))#2 LIM=RAND("LIM")
 N CURIEN,INDX,RANDOFFS,RINDX,RLIST,OFS
 ;if you are excluding some entries with a RAND("S") check
 ; you should assume you don't know how many entries you have,
 ; so you have to count them.
 ;otherwise, pass in how many entries known
 N TOTAL
 I $D(RAND("TOTAL"))[0 D
 . S (CURIEN,TOTAL)=0
 . F  S CURIEN=$O(@ROOT@(CURIEN)) Q:CURIEN'=+CURIEN  D
 .. N Y S Y=CURIEN X:$D(CHECK)#2 CHECK E  Q
 .. S TOTAL=TOTAL+1
 E  S TOTAL=RAND("TOTAL")
 ; asking for more unique entries than exist would loop forever below
 I LIM>TOTAL S LIM=TOTAL
 ; get LIM unique random numbers
 S INDX=1 F  Q:LIM+1=INDX  D
 . S RANDOFFS=1+$RANDOM(TOTAL) ; shift random number to range 1..TOTAL
 . Q:$D(RLIST(RANDOFFS))'=0  ; ignore if we already chose this one
 . S RINDX(INDX)=RANDOFFS ; offset of the I-th random entry is RINDX()
 . S RLIST(RANDOFFS)=INDX ; record the offset so a repeat draw is rejected
 . S INDX=INDX+1 ; up the current count until LIM offsets are chosen
 ;loop thru the given global to find the identity
 ; of each of the random entries
 ;small optimization to not examine offsets after the maximum we chose
 ;B "S+" B
 S CURIEN=0,OFS=0,INDX=1
 F  S CURIEN=$ORDER(@ROOT@(CURIEN)) Q:CURIEN'=+CURIEN  D  Q:OFS>TOTAL  Q:$O(RLIST(OFS))=""
 . N Y S Y=CURIEN X:$D(CHECK)#2 CHECK E  Q
 . S OFS=OFS+1
 . Q:$DATA(RLIST(OFS))=0  ;this offset is not one of the chosen ones.
 . S @IENLIST@(INDX)=CURIEN ; the I-th random entry is at IENLIST(INDX)
 . S INDX=INDX+1
 ; the BREAK argument below is implementation-specific
 I 1+LIM'=INDX WRITE !,"Bug in Code ("_(1+LIM)_"'="_INDX_")",! B "S+" B
 ;I $D(RAND("OUT"))#2 K @RAND("OUT") M @RAND("OUT")=IENLIST
 I $D(RAND("RLIST"))#2 K @RAND("RLIST") M @RAND("RLIST")=RLIST
 I $D(RAND("RINDX"))#2 K @RAND("RINDX") M @RAND("RINDX")=RINDX
 Q
TEST ;
 K RAND
 S RAND("LIM")=3,ROOT="^UTILITY($J,""SAMPLE"")"
 K @ROOT
 F I=1:1:10 S @ROOT@(1+$R(25000),0)=I
 S J=0
 F I=0:1 S J=$O(@ROOT@(J)) Q:J=""  W !,$J(J,5)_",0) ="_$G(@ROOT@(J,0))
 W !," Total Entries in test case: ",I
 W !," Selecting number of entries:",RAND("LIM")
 S RAND=ROOT W !,"For Global Root: ",ROOT
 S RAND("S")="I Y#2=1" W !,"Allowing only Odd IENs"
 W !
 K ROOT,PATLST,RANDLST,RINDXLST
 S RAND("OUT")="PATLST",RAND("RLIST")="RANDLST",RAND("RINDX")="RINDXLST"
 D RAND(.RAND)
 ZWRITE PATLST,RANDLST,RINDXLST
 Q


 Post subject: Re: Code for selecting Random entries in a FileMan File.
PostPosted: Tue Mar 15, 2011 5:00 pm 
I'm replying to my own post because I didn't go into much detail last time. The documentation still needs to be written, and the code needs enhancements so that it does useful things and integrates well with VistA-based systems.

In general design, the subroutine RAND has one argument, which is a local array named RAND, passed by reference.
Given certain parameters, including how many entries to randomly select in the File, the subroutine looks through the FileMan File and returns the entries chosen.

The variable RAND is a closed global root of the FileMan File you wish to use as input.
The variable RAND("OUT") is the closed global root of the location to store the entries found.
The variable RAND("LIM") is the number of random entries that should be found.
The variable RAND(
The variable RAND("S") is screening code. Given an entry number in the local variable Y,
it tests whether that entry belongs in the group of entries that can be randomly chosen.
The result of the screen is returned in $TEST, with $T=0 meaning don't include the entry, and $T=1 meaning do include this entry.
The variable RAND("TOTAL") is the number of entries in the File. If this is not supplied, the code calculates it.
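
To make the RAND("S") convention concrete, here is a minimal sketch that runs a screen against a single entry number and hands back the resulting $TEST. The labels PASSES and DEMO are hypothetical helpers for illustration only and are not part of the posted routine; the odd-IEN screen is the one used in the TEST tag.

```mumps
PASSES(SCREEN,IEN) ; hypothetical helper: returns 1 if IEN passes SCREEN, else 0
 N Y S Y=IEN ; the screen expects the entry number in Y
 X SCREEN ; the screen is an IF command, so executing it sets $TEST
 Q $T
 ;
DEMO ; hypothetical demo using the odd-IEN screen from the TEST tag
 N SCR S SCR="I Y#2=1"
 W !,"IEN 7 passes: ",$$PASSES(SCR,7)
 W !,"IEN 8 passes: ",$$PASSES(SCR,8)
 Q
```

Because the screen is plain M code, any condition that can be written as an IF command on Y can be used, and several conditions can be joined with commas as the main routine does.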

The procedure followed basically involves:

Either the total number of entries in RAND("TOTAL") is given, or each entry in the file is counted.
If RAND("S") is provided, only entries where $TEST is TRUE are counted.

Then as many random numbers as specified in RAND("LIM") are chosen, with the requirement that
there are no duplicate random numbers allowed.
There are two array variables named RINDX and RLIST.
The offset chosen by the first random draw is in RINDX(1), the offset chosen by the second
draw is in RINDX(2), etc. RLIST is the inverse: it is subscripted by offset and stores which draw chose that offset.
If the number 7 were chosen as the fifteenth random number, then RINDX(15)=7 and RLIST(7)=15.
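
The drawing step can be sketched on its own, following the assignments in the posted RAND code (RINDX(INDX)=RANDOFFS and RLIST(RANDOFFS)=INDX). The label DRAW and its parameter list are hypothetical; the clamp on LIM is an added assumption so that asking for more distinct offsets than exist cannot loop forever.

```mumps
DRAW(LIM,TOTAL,RINDX,RLIST) ; hypothetical: pick LIM distinct offsets in 1..TOTAL
 ; RINDX and RLIST are passed by reference and filled the way RAND fills them
 N INDX,OFFS
 K RINDX,RLIST
 I LIM>TOTAL S LIM=TOTAL ; assumption: cap the request at what is available
 S INDX=1
 F  Q:INDX>LIM  D
 . S OFFS=1+$R(TOTAL) ; $RANDOM(n) yields 0..n-1, so shift to 1..TOTAL
 . Q:$D(RLIST(OFFS))  ; already drawn, so try again
 . S RINDX(INDX)=OFFS ; draw number -> offset
 . S RLIST(OFFS)=INDX ; offset -> draw number
 . S INDX=INDX+1
 Q
```

A call such as D DRAW(3,10,.RI,.RL) should leave three nodes in each array, and for every draw number I the two arrays agree: RL(RI(I))=I.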

After the arrays specifying the various random offsets are chosen, then
the FileMan File is processed again, to determine which entries in the File
correspond to the random offsets that were chosen. This requires an entire
pass through the File, except for offsets that are larger than the highest offset
randomly chosen.

If someone can think of a need for the RLIST and RINDX arrays, they can be returned from
the program by naming locations to save them in RAND("RINDX") and RAND("RLIST").
The code uses these names with indirection, so they should be closed roots, as with RAND("OUT").

As written now, this code goes through all the entries in the FileMan file once if the total is not known,
and then goes through the entries again, up to the highest random offset that was chosen.

This could be sped up by creating a cross reference with as many entries as there are in the
file which pass the screening logic. It would link the offset into the file for each entry with the entry number.
Then it would be rather quick to select which ones were chosen. This is a traditional time-space tradeoff
which I have not coded in this program. Does anyone think a variant that actually does this would be useful?
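
For discussion, here is a rough sketch of what that variant might look like. It is not part of the posted routine: the labels BLDIDX and PICK, their parameters, and the storage layout are all hypothetical. BLDIDX makes the one full pass and records offset -> entry number; PICK then chooses entries by direct lookup with no second pass.

```mumps
BLDIDX(ROOT,CHECK,IDX) ; hypothetical: build an offset->IEN index, return the count
 ; ROOT  = closed root of the file, e.g. "^DPT"
 ; CHECK = screen code that sets $TEST, given the IEN in Y
 ; IDX   = closed root of the place to store the index
 N IEN,OFS S (IEN,OFS)=0
 K @IDX
 F  S IEN=$O(@ROOT@(IEN)) Q:IEN'=+IEN  D
 . N Y S Y=IEN X CHECK E  Q
 . S OFS=OFS+1,@IDX@(OFS)=IEN
 Q OFS
 ;
PICK(IDX,TOTAL,LIM,OUT) ; hypothetical: choose LIM distinct entries via the index
 ; OUT = closed root of the place to store the chosen entry numbers
 N INDX,OFFS,SEEN
 I LIM>TOTAL S LIM=TOTAL ; cannot choose more distinct entries than exist
 S INDX=1
 F  Q:INDX>LIM  D
 . S OFFS=1+$R(TOTAL)
 . Q:$D(SEEN(OFFS))  ; already chosen, so draw again
 . S SEEN(OFFS)="",@OUT@(INDX)=@IDX@(OFFS),INDX=INDX+1
 Q
```

Usage would be along the lines of S N=$$BLDIDX("^DPT","I 1",$NA(^TMP($J,"IDX"))) followed by D PICK($NA(^TMP($J,"IDX")),N,3,"PATLST"). The cost is the space for the index, and the index goes stale whenever entries are added or deleted or the outcome of the screen changes, so it would have to be rebuilt or maintained.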

