Extract Records Matching Criteria in another File

A common problem that presents itself when doing analysis is the need to extract detail records that match a list of values (criteria) in another file. This can range from simple to more complicated, primarily depending on the nature of the match. The following describes three approaches, based on the objective of the analysis.

Simple question:

A simple question of this type is where we have a file with detailed transactions that we want to compare to a file that has a list of values (e.g. Payroll). If we have a file of all pay transactions (details) and a list of employees (Employee file) who should be paid (criteria) and we want to know if every pay transaction has a matching employee record, then you can do a JOIN or RELATE with the detailed payroll transactions as your Primary (or Parent) file. ACL allows you to do a many-to-one JOIN or RELATE. This will give you all of the detailed payroll transactions where there is a matching employee (Matched Primary); or you can get all of the payroll transactions where there is no matching employee (Unmatched Primary). Finally, you can also obtain both matched and unmatched (All Primary) – which is what the RELATE will give you as well.

If your results contains both match and unmatched pay record, to determine whether you have a matched record or an unmatched record in your results, simply apply a filter.

First to identify the matched records: this is the case where the employee number from payroll details (EmpNo) has a match with an employee number (EmpNum) from the employee file.

In JOIN it would be the EmpNo is the same as the EmpNum – filter: Empno = EmpNum
In a RELATE it would be similar, but since the Parent and Child files are physically separate, we need to include the table name in the filter for child fields – filter: EmpNo = Employee.EmpNum

Next to identify the unmatched records: this is the case where the employee number from payroll details (EmpNo) does not have a match with an employee number (EmpNum) from the employee file.

In JOIN it would be the EmpNo does not have a match with an Empnum – filter: Empno <> EmpNum
In a RELATE it would be similar, but again we need to include the table name in the filter for child fields – filter: EmpNo <> Employee.EmpNum

More Complicated Question:

The previous example is a many-to-one match. What if the Secondary or Child file has duplicates records (records with the same key value). For example, the Employee file had not only the employee number of everyone who should be paid, but also the rates of pay for different periods of time (From-To dates). So an employee could have multiple records if their rate of pay changed.

The employee file might look like this:
Empnum        Pay rate           From Date          To Date
12345             1,200.00              Jan 1, 2016           Jul 31, 2016
12345             1,225.00              Aug 1, 2016          Dec 31, 2016
12345             1,250.00            Jan 1, 2017           Dec 31, 2017

The pay file might look like this:
   EmpNo                 Pay Amount               Pay Date
12345                $1,200.00                      Jul 15, 2016
12345                $1,225.00                      Aug 15, 2016
12345                $1,225.00                      Sep 15, 2016
12345                  $1,250.00                      Oct 15, 2016
12345                    $1,250.00                      Nov 15, 2016
12345                $1,250.00                      Dec 15, 2016

The objective might be to find not only people who were paid and should not have been (as in the previous example), but also people who were not paid the correct amount. This means we have to match the pay details not only with the correct employee record (EmpNo = EmpNum) but also the pay period with the correct From-To dates. That means that were have a many-to-many match. We cannot use RELATE since it can only perform a many-to-one match. JOIN, however, does have a many-to-many option.

So we would perform a JOIN where the pay transaction detail file is our Primary file and the Employee file (with the pay periods and rates) is our SECONDARY file. A possible issue with a many-to-many JOIN is that the resulting file can be quite large. For example, If I had 10,000 employees each with 26 pay records (260,000 detailed records); and an employee file where each employee had at least 10 sets of pay rates (10,000 * 10 = 100,000 employee records) and performed a JOIN many-to-many, the resulting file would have 2,600,000 records (260,000 pay records * 10 pay rate records per employee).

You have two options: perform the JOIN and then filter the results to find pay records where the Pay Date falls within the From-To Dates; or apply the filter to the JOIN command. If the resulting file is not going to be overly large, I suggest the first option: perform the JOIN to a temporary file; apply the filter; and EXTRACT to your final results file. However, you can also apply the filter directly to the JOIN command. In either case, the filter will isolate the detailed pay transactions for each employee that has a matching record in the Employee file AND the Pay Date falls within the From-To Dates.

Temporary file: Filter IF BETWEEN(Pay_Date, From_Date, To_Date); and then EXTRACT to the final results.

Filter applied to JOIN command: JOIN MANY PKey EmpNo SKey EmpNum IF BETWEEN(Pay_Date, Employee.From_Date, Employee.To_Date) – note: you must include the table name for the secondary fields.

With either approach, the results will contain pay records where the matching pay rate is included. You can now filter Pay_Amount <> Pay_Rate to identify people who were paid an incorrect amount based on the pay day and pay rate for that period.

Even More Complicated Question:

In essence, the “more complicated question” was still fairly simple because we were performing the JOIN on field that equalled the value in the Secondary file (i.e. where the key fields were equal: EmpNo = EmpNum). It becomes “even more complicated” when the two files do not have key fields that match. For example, I want to identify all detailed transactions where the description field contains one or more of list of keywords in my exceptions file (e.g. all P-Card transactions where the Merchant name contains a keyword such a bar, hotel, spa, casino, etc.). I can’t perform a JOIN where Merchant Name equals the keyword value because they are not identical (“Dave’s Bar and Funspot” won’t match with the keyword “Bar”).

Instead I need to EXTRACT the records where the merchant name contains a keyword. To do this manually, I would:

Open the Keyword file
Find the first keyword value (“Bar”)
Open the P-Card transaction file
Apply the filter FIND(“Bar”, Merchant_Name)
EXTRACT all the records that meet the filter to a Results file
Open the keyword file
Get the next keyword value (“Hotel”)
Open the P-Card transaction file
Apply the filter FIND(“Hotel”, Merchant_Name)
EXTRACT all the records that meet the filter to a Results file
Continue until all keyword values have been selected and the filter and extract performed

Simple if you have a short list of keywords, but what if you have 1,000 keywords? Why not have ACL do the keyword value selection, apply the filter and perform the extract – over and over again until all keywords have been applied?

The most common approach is to use a main script that calls a subscript which applies a filter and performs an extract of the records that meet the filter. So if I have four keywords (bar, hotel, spa, and casino), the main script will call the subscript four times. The first time the subscript will apply the filter FIND(“Bar”, Merchant_Name) and EXTRACT all the records that meet the filter to a Results file. It then gets the next keyword value (“Hotel”) and calls the subscript again, applies filter FIND(“Hotel”, Merchant_Name) and EXTRACTs all the records that meet the filter to a Results file. It continues to get keyword values until all the keyword values have been selected and the EXTRACTs performed.

Question #1: How many times does it call the subscript? Answer: as many times as there are keyword values – which can be determined by performing a COUNT on the keyword file.

Question #2: How does it get the 2^nd, 3^rd, and 4^th keyword? Answer: by using a counter (v_Rec_ctr) and locating the record that equals the counter (LOCATE RECORD v_Rec_ctr). The Counter is incremented each time the subscript is run (v_Rec_ctr = v_Rec_ctr + 1). Assuming the P-Card file is called PCard_Trans and the Keyword file is called Key_Words which contains a field called Keyword_value (i.e. has “Bar”, Hotel”, etc.), the script looks like this:

***** Main script Extract_PCard Recs1 *****

COMMENT **** close all tables and delete previous results *************
CLOSE PRIMARY SECONDARY
DELETE PCard_Results.FIL OK
DELETE FORMAT PCard_Results OK

OPEN Key_Words

COUNT – counts the number of records in the Key_Words file (i.e. number of keyword values)
v_Num_Keywords = COUNT1 – the number of times to run the subscript
v_Rec_ctr = 1 – this tells me which record in the Key_words file to read
v_keyword = BLANKS(30) – initializes variable to store the keyword value from the Key_words file – make sure you have enough blanks to hold the longest keyword value

DO Extract_PCard_Recs2 WHILE v_Rec_ctr <= v_Num_Keywords – executes the subscript until all keywords have been read.

OPEN PCard_Results – open final results file after subscript has finished

****** Script Extract_PCard Recs2 ********

OPEN Key_Words

LOCATE RECORD v_Rec_Ctr – positions ACL at the next Key_Words record (rec 1 the first time it is run)

v_Keyword = Keyword_Value – sets v_keyword to the value in Keyword_value (e.g. v_keyword = “Bar”)

OPEN Pcard_Trans

SET FILTER FIND(ALLTRIM(“%v_keyword%”), Merchant_name) – (e.g. FIND(“Bar”, Merchant_name)

EXTRACT FIELDS ALL v_keyword as “Keyword” to PCard_Results APPEND – extract records meeting filter, plus a filed which contain the keyword value that caused the record to be extracted

V_Rec_ctr = v_Rec_Ctr + 1 – increments the record counter to get the next record in the Key_Words file.

Another option would be to combine your Keyword value values into a string (field) and use this field in your transaction file. However, it depends on the length of each Keyword value – will all the keyword values fit into 10,000 characters when you include a “;” separator?

SET SAFETY OFF
OPEN Key_words
COUNT
V_Max_Ctr = COUNT1
v_Keyword_list = Blanks(10000)
v_found = F
v_ctr = 1

GROUP
v_keyword_list = ALLTRIM(v_keyword_list) + ALLTRIM(Keyword_Value) + “;”
END

OPEN PCard_Trans

GROUP
LOOP WHILE v_ctr <= v_Max_Ctr AND v_found = F
v_found = T IF FIND(SPLIT(v_Keyword_list, “;”, v_ctr), Merchant)
EXTRACT FIELDS ALL SPLIT(v_Keyword_list, “;”, v_ctr) AS “Match_Code” IF v_Found = T to PCard_RESULTS
v_ctr = v_ctr + 1
END
v_ctr = 1
v_found = F
END

OPEN PCard_Results

SET SAFETY ON

**************

Other common examples of this type of analysis are:

Amount charged compared to master file with products and different prices at different times
Amount paid compared to Contract file – with products and different prices

Basically anything where you want to compare detailed transactions with a historical record of that includes changes (e.g. different prices for specific time periods).

There are others ways to accomplish the analyses discussed above, but in every case it must start with a good understanding of what you want to accomplish. Such as:

Do I want to see all the records or just those that don’t match?
Do I have duplicates in the Secondary file or not?

I hope this has helped you to understand three ways to extract records meeting criteria that are contained in a second file.

Extract Records Matching Criteria in another File

This article has 1 Comment

Leave a Reply Cancel reply

Share this:

This article has 1 Comment

Leave a Reply Cancel reply