Improving tax collection with big data analytics

Tax evasion is one of the major obstacles to increasing the competitiveness of an economy. It directly harms market conditions for companies that legally declare and pay taxes: their production costs, and consequently the prices of their products and services, are higher than those of competitors who do not pay taxes and contributions. Moreover, by lowering budget revenues, tax evasion directly reduces the room for improving the level and quality of public goods and services and citizens’ satisfaction. For individuals working in the informal sector, it creates insecurity and denies them fundamental rights that characterize a modern society (health, retirement, disability insurance, etc.).

Tax authorities around the world are working to strengthen tax collection and minimize tax evasion. Traditionally they rely on tax control, which is a limited and costly instrument. Big data analytics offers an immense opportunity to improve tax collection by improving risk management and thus focusing control efforts on the riskiest cases.

The Department of Mathematics and Computer Science at the Faculty of Sciences of the University of Novi Sad has carried out a research project in big data analytics with the Tax Administration of the Republic of Serbia.

This complex task required a multiskilled data analytics team of researchers (database management, statistics, economics and business analytics, numerical optimization, computation on big data) working in close cooperation with the Tax Administration staff in order to benefit from their insights and experience from field control.

Tax evasion in Serbia is relatively high. The dominant pattern of tax evasion by firms is to declare only part of employees' salaries or to not declare labour income at all. In the first case, the common practice is to declare and pay tax on an amount around the legally prescribed minimum wage and to pay the rest of the agreed salary in cash. In the second case, the tax is not paid at all.

In the research we used depersonalized individual data (excluding personal data of income recipients and income payers) based on the tax returns from the unified tax collection database, covering the period from April 2014 to May 2018. The database was encrypted, cleaned and prepared beforehand in accordance with the regulations and principles on personal data protection (in line with GDPR).

In the entire database used for the research, there were 6,141,812 income recipients, 234,310 income payers (unique tax identification numbers) and 201,635,126 different combinations of income recipients, income payers, tax returns and types of income.

The main approach was to develop a set of risk indicators, which can be used by the tax authority to focus and prioritize the efforts of supervision and control.

Risk indicators are algorithms that are based on detecting a significant deviation of the “behavior” of a taxpayer from the expected/average behavior based on theoretical and/or empirically determined patterns.
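As a generic illustration of this idea (not the indicators developed in the project), a deviation can be expressed as the distance of one taxpayer's value from the peer-group average, measured in standard deviations. The function name and the sample values below are hypothetical.

```python
import statistics


def deviation_score(value, peer_values):
    """Distance of `value` from the peer average, in peer standard deviations.

    A hypothetical, simplified indicator: large absolute scores mean the
    taxpayer behaves unlike its peers on this one measure.
    """
    mean = statistics.fmean(peer_values)
    spread = statistics.pstdev(peer_values)  # population standard deviation
    if spread == 0:
        # All peers are identical, so no scale exists to measure deviation.
        return 0.0
    return (value - mean) / spread


# Illustrative peer values only (not real tax data).
peers = [2, 4, 4, 4, 5, 5, 7, 9]
score = deviation_score(9, peers)
```

A real indicator would also need a threshold above which a score counts as significant; that choice is not specified here.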

Accordingly, we expected that the distribution of salaries within one firm would correspond to the aggregated distribution at the industry level. This industry-level income distribution is expected to reflect industry-specific requirements in terms of the structure of labor, as well as the income distribution pattern of the entire population. The income distribution of the population is usually described as a log-normal distribution, as supported by the existing theoretical literature on income distribution and in line with some previous empirical findings in countries with lower tax evasion.
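As a minimal sketch of what the log-normal assumption means in practice (not the project's actual code), the two parameters of a log-normal distribution can be estimated from the mean and standard deviation of the logarithms of the salaries. The function names and the sample salaries are illustrative assumptions.

```python
import math
import statistics


def fit_lognormal(salaries):
    """Estimate (mu, sigma) of a log-normal distribution by maximum likelihood.

    mu and sigma are the mean and population standard deviation of the
    log-salaries; non-positive values are skipped because log is undefined.
    """
    logs = [math.log(s) for s in salaries if s > 0]
    if not logs:
        raise ValueError("need at least one positive salary")
    return statistics.fmean(logs), statistics.pstdev(logs)


def lognormal_cdf(x, mu, sigma):
    """Share of salaries expected at or below x under the fitted model."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Illustrative salaries only (not real tax data).
sample = [math.exp(10), math.exp(11), math.exp(12)]
mu, sigma = fit_lognormal(sample)
median_share = lognormal_cdf(math.exp(mu), mu, sigma)  # about 0.5 at the median
```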

Relying on this assumption, we constructed a risk indicator that measures the deviation of an individual firm's salary distribution from the salary distribution of the industry in which the firm does business.
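The text does not state which deviation measure was used. One common choice for comparing two empirical distributions is the two-sample Kolmogorov-Smirnov statistic, sketched below as an assumption rather than as the project's method. The function name and the sample salaries are hypothetical.

```python
from bisect import bisect_right


def ks_distance(firm_salaries, industry_salaries):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the two empirical cumulative
    distribution functions: 0.0 for identical samples, 1.0 for samples
    that do not overlap at all.
    """
    firm = sorted(firm_salaries)
    industry = sorted(industry_salaries)
    if not firm or not industry:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in set(firm) | set(industry):
        firm_cdf = bisect_right(firm, x) / len(firm)
        industry_cdf = bisect_right(industry, x) / len(industry)
        max_gap = max(max_gap, abs(firm_cdf - industry_cdf))
    return max_gap


# Illustrative salaries only (not real tax data).
firm = [100, 100, 100, 400]
industry = [100, 300, 400, 600]
distance = ks_distance(firm, industry)
```

Firms could then be ranked by this distance within their industry, with the largest values reviewed first. Small firms yield noisy estimates, so a minimum number of employees per firm would be a sensible precondition.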

This indicator separates firms with a more “regular” salary distribution, qualified as less risky, from those with a rather unusual one. The unusual distribution typically has a peak around the minimum wage and a “missing bite” at about twice the average salary. Firms in the latter segment should be the targets of tax control.
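The two shape features described above can be illustrated with a simple band count: the share of a firm's salaries close to the minimum wage and the share close to twice the average salary, each compared with the industry. This is a hedged sketch, not the project's indicator; the function names, the 10% band width and the sample figures are all assumptions.

```python
def share_in_band(salaries, low, high):
    """Fraction of salaries that fall inside [low, high]."""
    if not salaries:
        raise ValueError("salaries must be non-empty")
    return sum(low <= s <= high for s in salaries) / len(salaries)


def shape_features(firm, industry, minimum_wage, average_salary, tolerance=0.10):
    """Compare a firm with its industry on the two shape features.

    min_wage_excess: how much more of the firm's payroll sits near the
                     minimum wage than in the industry (positive = peak).
    upper_deficit:   how much less of the firm's payroll sits near twice the
                     average salary than in the industry (positive = gap).
    """
    def band(center):
        return center * (1 - tolerance), center * (1 + tolerance)

    min_low, min_high = band(minimum_wage)
    up_low, up_high = band(2 * average_salary)
    return {
        "min_wage_excess": share_in_band(firm, min_low, min_high)
        - share_in_band(industry, min_low, min_high),
        "upper_deficit": share_in_band(industry, up_low, up_high)
        - share_in_band(firm, up_low, up_high),
    }


# Illustrative figures only (not real tax data).
features = shape_features(
    firm=[100, 100, 100, 400],
    industry=[100, 300, 600, 400],
    minimum_wage=100,
    average_salary=300,
)
```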

This indicator has been given to the tax administration for testing and is currently being used for planning and directing on-site inspection visits. Another possible way to use it is to communicate the new methodology to taxpayers. Such announcements by the tax authority could prompt taxpayers to correct their own behavior.

A few other similar indicators are being explored and developed. Some of them will use machine learning methods to predict the expected level of personal salary and firm-level tax obligation.