Even though I am retired, I am still getting to do interesting analysis – perhaps even more so because I get to pick and choose what I want to do. In this case, I was asked to assist with an audit being performed by a municipal audit function. A large city had an internal audit group that was interested in auditing revenue – primarily from property taxes. One of the key objectives was to determine the completeness of the property tax data (i.e. were all properties included in the property tax database?).
I suggested examining other external data sources: water meter, building permit, and hydro data. The city was not responsible for issuing and collecting hydro bills, so only the first two sources were available – but even that was a good start.
Using the water meter data, we were able to identify all addresses with water meters and calculate the water usage. We had to be careful to use the correct field, because the data also included a billing address, which could differ from the address where the meter was located. The data also had the meter install date and the first and last dates the meter was read.
The property tax data also had a billing address and the physical – building location – address. But before we could compare the property tax data with the water meter data, we had to normalize the address fields. Even the address entries within each system were inconsistent, with variations such as: “St.” or “St” or “Street”; and “Unit 12 123 Main St” or “Unit #12 123 Main St” or “12 – 123 Main St” or “123 Main St Unit 12” or “123 Main St Unit #12”.
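To give a flavour of the normalization involved, here is a minimal Python sketch. The abbreviation map, patterns, and canonical form are illustrative assumptions – the actual work used an ACL address-cleaning script, and real address data would need a much larger set of rules:

```python
import re

# Hypothetical abbreviation map - a real one would cover many more street types.
STREET_ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}

def normalize_address(raw: str) -> str:
    """Reduce address variants to a canonical 'unit <n> <number> <street>' form."""
    s = raw.lower().strip()
    s = re.sub(r"[.#]", "", s)                          # drop periods and '#' signs
    s = re.sub(r"^(\d+)\s*-\s*(\d+)", r"unit \1 \2", s)  # "12 - 123 main st" form
    m = re.match(r"^(.*?)\s+unit\s+(\d+)$", s)           # "123 main st unit 12" form
    if m:
        s = f"unit {m.group(2)} {m.group(1)}"
    # expand street-type abbreviations word by word
    return " ".join(STREET_ABBREVIATIONS.get(w, w) for w in s.split())
```

With rules like these, all of the unit variants above collapse to the same string, so an exact-match comparison between the two systems becomes possible.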
In the end we were able to match the water meter data with the property tax data to identify properties with buildings that were not paying property tax. While there were only a couple of hundred properties identified this still represented a total of over $1M in missing revenue.
The next source of data to test the completeness of the property tax information was the building permits. Building permits were issued for new buildings and for major renovations on existing buildings. The data included the address, the permit issue date, the inspection date, and the status (e.g. completed). Again, the only common field was the address, so we had to clean the data before doing the comparison.
Using the permit and property tax data we were able to identify new and existing buildings that were not paying property tax.
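Conceptually, both completeness tests were the same one-way match: addresses present in an external source but absent from the tax roll. In ACL this was done with INDEX and RELATE; a minimal Python sketch of the idea (names are hypothetical):

```python
def find_untaxed(external_addresses, tax_roll_addresses):
    """Return addresses from an external source (water meter or building
    permit data) with no match in the property tax data - these are the
    completeness exceptions to follow up on."""
    taxed = set(tax_roll_addresses)
    return sorted(a for a in set(external_addresses) if a not in taxed)
```

This assumes both lists have already been normalized; without that step, trivial formatting differences would flood the output with false exceptions.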
The next risk that we decided to address, after looking at the data, was the timeliness of the property tax data. We had noticed significant delays in the time between the issuance of the permits and the inspection – sometimes several years. There were also long periods between the water meter install and the date of the first reading. We also noticed inspection and water meter reading dates that were years into the future.
The auditors determined that some of the dates were important and had a bearing on the amount of property tax paid in the year. For example, if a building was not completed until July 1st, then the property tax was only 50% of the yearly amount. The start date for the tax calculation was the date of the final inspection. Delays in the inspection meant less property tax, and dates into the future meant no property tax was collected at all.
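The proration is simple arithmetic. A sketch assuming month-based proration, which is consistent with the July 1st / 50% example above (the city's exact formula may differ):

```python
def prorated_tax(annual_tax, completion_month):
    """Tax owed in the completion year: months remaining in the year,
    including the completion month, as a fraction of twelve."""
    months_remaining = 12 - completion_month + 1
    return annual_tax * months_remaining / 12
```

So a building whose final inspection lands in July pays six twelfths of the annual amount, and every month an inspection slips costs the city one twelfth of that property's annual tax.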
Again, there were only a few hundred issues with dates, but the results included one case of fraud and hundreds of thousands of dollars in lost revenue.
In the end, the audit identified issues around the completeness of the property tax data, resulting in lost revenue, and made recommendations that would integrate the separate data sources. We also identified data entry control weaknesses with respect to key dates and made recommendations that included range checks and aging reports.
ACL Commands: FILTER, EXPRESSIONS, DUPLICATES, AGE, STATISTICS, INDEX and RELATE
Lessons-Learned: even though the audit had a very specific objective – assess the controls ensuring the completeness of the property tax data – there was a multi-phased approach. During the initial planning phase, the idea of testing completeness by comparing property tax data to other data sources led to a data analysis approach that had not initially been considered. Despite challenges finding fields to use for the Relate command, perseverance and use of scripts from ACL ScriptHub such as “Address Clean” made the task easier.
Another important lesson was the value of examining the water meter and building permit data prior to performing the Relate in ACL. It is important to assess the integrity of the data, and to understand it, before performing the Relate. Had we simply performed a Relate using the address field without examining the data first, we would not have identified the issues with the date fields.
When using any source data, even if only to test the completeness of another system, you should analyze that source data. High-level summaries such as Classify, and simple Aging and Statistics on date fields, can be very informative and may identify additional risks to be addressed during the audit. Not only did we identify control weaknesses around key date fields – and a fraud involving deliberate future-dating of inspection dates in exchange for kickbacks – but the high-level summaries identified duplicate records which had to be addressed before we could perform a proper Relate.
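The duplicate check itself is trivial once the addresses are normalized – ACL's DUPLICATES command does this directly; in Python terms, it is just a frequency count:

```python
from collections import Counter

def find_duplicates(addresses):
    """Return normalized addresses that appear more than once, with their
    counts. These must be investigated and resolved before a one-to-one
    Relate/join between systems is meaningful."""
    return {a: n for a, n in Counter(addresses).items() if n > 1}
```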
Simply having access to data provides an opportunity to assess risks and controls in the system. My consulting contract was over, so I am not sure what happened after I left, but I imagine that the auditors will be examining the building permit and water meter processes in future audits. Based on our limited review of the data, there are control weaknesses and fraud risks in these business processes that should be examined.
Note: In thinking back on the analysis performed, I wish I had performed an additional analysis examining unit and apartment numbers for addresses with multiple units. Units or apartments that are not in the property tax data (e.g. Apt 101, 102, 103, 104, 105, 201, 202, 203, 205 – possibly missing Apt 204; or Unit 10, 12, 18 – possibly missing 14 and 16) could be instances of incomplete data. I do realize that this would likely involve more coding and manual follow-up to eliminate the false positives, but it could be worthwhile.
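A rough sketch of what that unit-gap test might look like. The step inference is a heuristic of my own devising, unit numbers would first need to be grouped by address (and, for apartments, by floor), and as noted the output is only a list of candidates requiring manual follow-up:

```python
def missing_units(units):
    """Given the unit numbers on the tax roll for one address (or one floor),
    infer the numbering step and return numbers absent from the sequence."""
    units = sorted(set(units))
    if len(units) < 3:
        return []  # too few data points to infer a numbering pattern
    # smallest gap between neighbours: 1 for apartments, often 2 for
    # even-only or odd-only unit ranges
    step = min(b - a for a, b in zip(units, units[1:]))
    present = set(units)
    return [u for u in range(units[0], units[-1] + 1, step) if u not in present]
```

Run per floor, the first example above (Apt 201, 202, 203, 205) flags Apt 204, and the second (Unit 10, 12, 18) flags Units 14 and 16.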