Applied Modeling Techniques and Data Analysis 2

Author: Various authors
Publisher: John Wiley & Sons Limited
Genre: Economics
ISBN: 9781119821625
all of them have received a tax notice that in some way corrected the tax return they had filed. Thus, the predictive analysis tool we develop is designed to find patterns in the data that may help tax offices single out the profiles of only the riskiest taxpayers.

      The evidence from the data at hand shows that our first model, described in detail later, is able to distinguish the taxpayers who deserve closer investigation from those who do not.2

      However, by defining the class value as a function of the additional due taxes, we meet the need to focus on the taxpayers who are more likely to be “significant” tax evaders, but we do not ensure that their tax debt is collected efficiently. Indeed, the data show that as the tax bill increases, the number of coercive collection procedures put in place also increases. Unfortunately, these procedures are highly inefficient: they collect only about 5% of the overall credits claimed against the audited taxpayers (Italian Court of Auditors 2016). As a result, the tax authorities’ ability to collect the taxes due may be jeopardized.

      To tackle our second problem, i.e. to guarantee a certain degree of collection of the taxes due, we start from a simple observation: a taxpayer with no property will not be willing to pay his dues, whereas if he has something to lose (a home or a car that could be seized), then, provided the IRA’s claim is right, he is more likely to reach an agreement with the tax authorities.

      We therefore build a second model, focused only on a few features indicating whether or not the taxpayer owns some kind of asset, in order to predict the final status of each tax notice (in this case, we only distinguish between statuses ending in an enforced recovery proceeding and statuses in which no such proceeding takes place). Once both models are available, the taxpayer selection process is run so that a business is audited only if both models judge it worthy.
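The selection rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Taxpayer` record, its field names and the `select_for_audit` helper are hypothetical, and each model's output is reduced to a single boolean flag rather than whatever score or class the actual models produce.

```python
from dataclasses import dataclass


@dataclass
class Taxpayer:
    taxpayer_id: str
    # Hypothetical output of the first (evasion-risk) model.
    flagged_interesting: bool
    # Hypothetical output of the second (asset-based) model:
    # True if the notice is predicted NOT to end in enforced recovery.
    flagged_collectable: bool


def select_for_audit(taxpayers: list[Taxpayer]) -> list[str]:
    """Return the IDs of taxpayers judged worthy by BOTH models (logical AND)."""
    return [
        t.taxpayer_id
        for t in taxpayers
        if t.flagged_interesting and t.flagged_collectable
    ]


if __name__ == "__main__":
    sample = [
        Taxpayer("A", True, True),
        Taxpayer("B", True, False),   # risky, but enforced recovery predicted
        Taxpayer("C", False, True),   # owns assets, but low evasion risk
    ]
    print(select_for_audit(sample))  # → ['A']
```

The AND combination is the design choice that matters here: a taxpayer flagged as a likely evader but predicted to end in enforced recovery (B) is dropped, as is one with assets but a low evasion risk (C).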

      The key feature of our procedure is its twofold selection target, which is needed to maximize the effectiveness of the IRA’s audit processes. The methodology we suggest will soon be validated in real cases, i.e. a sample of taxpayers will be selected according to the classification criteria developed in this chapter and will subsequently undergo audit processes.

      1.2.1. Data

      The data at hand refer to a sample of 8,028 self-employed individuals audited for fiscal year 2012, each described by a set of features concerning, among other things, their tax returns, their properties and their tax notice.3

      Most of our dataset is made up of small taxpayers: almost 50% report annual revenues below €75,000 and only 4% above €500,000, with a sample average of €146,348.

      For each taxpayer in the dataset, both his tax notice status and the additional due taxes (i.e. the additional requested tax amount) are known.

      This raises the first problem to be tackled: the additional due tax is a numeric attribute that measures the seriousness of the taxpayer’s tax evasion, whereas our algorithms, as we will show later, need categorical values in order to make predictions. Thus, we cannot use the additional due taxes directly; instead, we need to define a class variable and decide both which values it will take and how to map each numeric value of the additional due taxes onto those categorical values.
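As an illustration of such a mapping, the sketch below bins a numeric amount into ordered categories. The cut points and labels are hypothetical placeholders, since the text has not yet fixed them, and `to_class` is an illustrative name, not part of the chapter's tooling.

```python
from bisect import bisect_right

# Hypothetical cut points (euros) and labels: the chapter has not fixed these here.
CUT_POINTS = [10_000, 50_000]
LABELS = ["low", "medium", "high"]  # one more label than cut points


def to_class(additional_due_tax: float) -> str:
    """Map a numeric additional due tax to a categorical class label."""
    # bisect_right counts the cut points <= the amount, which is the bin index.
    return LABELS[bisect_right(CUT_POINTS, additional_due_tax)]


if __name__ == "__main__":
    print([to_class(x) for x in (5_000, 10_000, 75_000)])  # → ['low', 'medium', 'high']
```

An amount exactly equal to a cut point falls into the upper bin; using `bisect_left` instead would put it in the lower one.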

      1.2.2. Interesting taxpayers

      We must define a function f(x) that associates with each element x in the dataset a categorical value expressing its degree of fraud risk; this value is the class our first model will try to predict. Of course, a function that labels all the taxpayers in the dataset as tax evaders would be useless, so a distinction needs to be drawn between serious tax evasion cases and less relevant ones. To this end, we loosely follow (Basta et al. 2009) and divide the taxpayers into two groups, the interesting ones and the not interesting ones, from the tax administration’s point of view (to a certain extent, interesting stands for “it might be interesting for the tax administration to go and check what’s going on ...”). The division is based on two criteria: profitability (i.e. the ability to identify the most serious cases of tax evasion, independently of all other factors) and fairness (i.e. the ability to identify the most serious cases of tax evasion relative to the taxpayer’s turnover).
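A hedged sketch of how f(x) might combine the two criteria follows. Several assumptions are made that the text does not state: the thresholds are invented placeholders, fairness is approximated by the simple ratio of additional due tax to turnover (not the variable Z the chapter goes on to define), and a taxpayer is treated as interesting if it satisfies either criterion.

```python
# Hypothetical placeholder thresholds: the chapter's actual cut-offs are not given here.
PROFITABILITY_THRESHOLD = 50_000.0  # euros of additional due tax
FAIRNESS_THRESHOLD = 0.10           # additional due tax as a fraction of turnover


def f(additional_due_tax: float, turnover: float) -> str:
    """Label a taxpayer 'interesting' or 'not interesting'.

    Assumption: a taxpayer is interesting if it meets EITHER criterion.
    """
    # Profitability: the absolute amount, regardless of any other factor.
    profitable = additional_due_tax >= PROFITABILITY_THRESHOLD
    # Fairness: the amount relative to turnover (a stand-in ratio, NOT the chapter's Z).
    fair = turnover > 0 and additional_due_tax / turnover >= FAIRNESS_THRESHOLD
    return "interesting" if profitable or fair else "not interesting"


if __name__ == "__main__":
    print(f(60_000, 2_000_000))  # → interesting (via profitability)
    print(f(8_000, 40_000))      # → interesting (via fairness, ratio 0.2)
    print(f(8_000, 400_000))     # → not interesting (ratio 0.02)
```

With these placeholder values, a small firm owing €8,000 on a €40,000 turnover is labelled interesting through the fairness criterion, while the same €8,000 on a €400,000 turnover is not.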

      The left vertical axis refers to the distribution of tax claims, grouped into the classes shown on the horizontal axis; the right vertical axis, by contrast, gives the total monetary amount of the tax claims arising from each group (in thousands of euros). As can easily be seen, the 331 most profitable tax notices (12% of the total) account for almost half of the tax revenue arising from our dataset.
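One simple way to quantify the kind of concentration described above is the share of the total claimed amount that comes from the top fraction of tax notices. The sketch below computes it; the claim values are toy numbers, not the chapter's data, and `top_share` is a hypothetical helper rather than the procedure behind the chapter's figure.

```python
def top_share(claims: list[float], fraction: float) -> float:
    """Share of the total claimed amount that comes from the top `fraction` of notices."""
    if not claims or not 0 < fraction <= 1:
        raise ValueError("need a non-empty list and 0 < fraction <= 1")
    total = sum(claims)
    if total == 0:
        return 0.0
    ranked = sorted(claims, reverse=True)       # largest claims first
    k = max(1, round(fraction * len(ranked)))   # size of the top group (at least one notice)
    return sum(ranked[:k]) / total


if __name__ == "__main__":
    # Toy claims in thousands of euros (hypothetical, for illustration only).
    toy_claims = [100, 50, 20, 10, 10, 5, 3, 1, 1, 0]
    print(top_share(toy_claims, 0.1))  # → 0.5
    print(top_share(toy_claims, 0.2))  # → 0.75
```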

      The fairness criterion is then introduced to extend the audit process to smaller firms as well (which are usually charged smaller amounts of income tax due). It is useful because it allows the tax authorities to avoid discriminating against taxpayers on the basis of their turnover, and it introduces a deterrent effect that improves overall tax compliance.

      Therefore, we define another variable, called Z, which takes into account, for each taxpayer, both his turnover and revenues, and compares