Count the number of records to determine Sample Size

8 posts / 0 new
Last post
E.Heizyk
Offline
Joined: 11/29/2018 - 16:53
Count the number of records to determine Sample Size

Hi,
I was wondering if there is a way to gather the number of records within a database, then evaluate that number to determine the number of records to randomly extract for sampling.
For example:
If there are: <250 records in the database, the number of items sampled is 10%
250-2,500: 25 items need to be sampled
2,501 - 10,000: 45 items
10,001 + : 60 items
I cant figure out how to use the db.Count function and assign it to a varriable in order to evaluate it in the If function.

Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

Good day,

Here is some sample code that will calculate your sample size.  It first gets the number of items in the database, I used the Sample-Bank Transactions but you can change that for your file.  I then have a series of if / elseif statements to calculate the sample size (I could have also done this with a select case) and then at the end I have a message box that displays the sample size.  Hope this makes sense.


Sub Main
	Dim db As database
	Dim count As Long
	Dim sampleSize As Long
	
	Set db = client.OpenDatabase("Sample-Bank Transactions.IMD")
		count = db.count
		If count < 250 Then
			sampleSize = .10 * count
		ElseIf count <= 2500 Then
			sampleSize = 25
		ElseIf count <= 10000 Then
			sampleSize = 45
		Else
			sampleSize = 60
		End If
		MsgBox "This is your sample size: " & sampleSize
	Set db = Nothing
End Sub
E.Heizyk
Offline
Joined: 11/29/2018 - 16:53

Thanks that works great! 
Once I assign sampleSize, I am using it within the RandomSample task, is there a way to generate a random number to use for the 'random number seed'
 
I currently have:
Set task = db.RandomSample
task.IncludeAllFields
dbName = "NonParticipants Random Sample.IMD"
task.CreateVirtualDatabase = False
task.PerformTask dbName, "", sampleSize, 1, db.Count, 19531, False
Set task = Nothing
Set db = Nothing
the 19531 is the field I am trying to make a random amount. 
 
 
Thanks!
 

Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

No problem, you can use this to create your random number, there is actually a function called rnd in visual basic that will give you a random number between 0 and 1 so I have multiplied by 1000000 to get a whole number.

randomStart = 1000000 * Rnd

E.Heizyk
Offline
Joined: 11/29/2018 - 16:53

Thanks Brian, this worked perfectly for the first set of selections I needed to make, however I then have a set of selections where I am getting an error of 'Invaild number of records', and I set it up using the same logic
 
This is the one that works:
Function NPRandomSample1
Set db = Client.OpenDatabase("CY non participants.IMD")
'Assign count and caculate the number of selections to be made
nonParCount = db.Count
If nonParCount < 250 Then
nonParSel = .10 * nonParCount
ElseIf nonParCount <= 2500 Then
nonParSel = 25
ElseIf nonParCount <= 10000 Then
nonParSel = 45
Else
nonParSel = 60
End If
'Run the random selection
randomStart = 10000000 * Rnd()
Set task = db.RandomSample
task.IncludeAllFields
dbName = "NonParticipants Random Sample.IMD"
task.CreateVirtualDatabase = False
task.PerformTask dbName, "", nonParSel, 1, db.Count, randomStart, False
Set task = Nothing
Set db = Nothing
End Function
 But this is the one that is getting an error:
Function PartRandomSample1
Set db = Client.OpenDatabase("2018 Participants.IMD")
'Assign count and caculate the number of selections to be made
ParCount = db.Count
If ParCount < 250 Then
ParSelTot = .10 * ParCount
ElseIf ParCount <= 2500 Then
ParSelTot = 25
ElseIf ParCount <= 10000 Then
ParSelTot = 45
Else
ParSelTot = 60
End If
Set db = Nothing
'Run the random selection of max out participants
Set db = Client.OpenDatabase("18.5k.IMD")
ParCountMax = db.Count
ParSelMax = Round((.10 * ParCountMax) ,0,0)
randomStart = 10000000 * Rnd()
Set task = db.RandomSample
task.IncludeAllFields
dbName = "Max Out Participants Random Sample.IMD"
task.CreateVirtualDatabase = False
task.PerformTask dbName, "",ParSelMax, 1, db.Count, randomStart, False <-Line with the error showing up
Set task = Nothing
Set db = Nothing
'Run the random selection of catch up particpants
Set db = Client.OpenDatabase("Catch up.IMD")
ParCountCU = db.Count
ParSelCU = .10 * ParCountCU
randomStart = 10000000 * Rnd()
Set task = db.RandomSample
task.IncludeAllFields
dbName = "Catch Up Participants Random Sample.IMD"
task.CreateVirtualDatabase = False
task.PerformTask dbName, "",ParSelCU, 1, db.Count, randomStart, False
Set task = Nothing
Set db = Nothing
'Run the random selection of remaining participants
Set db = Client.OpenDatabase("Less than 18.5k.IMD")
ParSel = ParSelTot - (ParSelCU + ParSelMax)
randomStart = 100000 * Rnd()
Set task = db.RandomSample
task.IncludeAllFields
dbName = "Less than 18.5k Participants Random Sample.IMD"
task.CreateVirtualDatabase = False
task.PerformTask dbName, "",ParSel, 1, db.Count, randomStart, False
Set task = Nothing
Set db = Nothing
End Function
 
 
 
 
 

Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

I tried running the lines that you indicated had a problem and I was able to run them.  What I would do next is use msgbox to see what is in the variables on the PerformTask line and see if they make sense.  There could be a problem that the database is empty or doesn't have enough records.

E.Heizyk
Offline
Joined: 11/29/2018 - 16:53

Thanks for taking a look, I tried this and it appeas that the in the data set I am using the line:
ParSelMax = Round((.10 * ParCountMax) ,0,0) 
is returning 0 because ParCountMax is 4 (therefore resulting in .4, when I need it to round up to 1)
I added the following after the ParCountMax = db.Count in order to combat this issue but it seems like a messy way to deal with the problem.
If ParCountMax = 0 Then
Set task = Nothing
Set db = Nothing
ElseIf ParCountMax *.10 < 1 Then
ParSelMax = 1 + (.10 * ParCountMax)
Else
ParSelMax = .10 * ParCountMax
End If

Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

Yes, that would cause the problem.  Also that is a good way to get around this by using an if statement, I have done that one different scripts in the past where you don't know if you might get any results.