Data collection
The social isolation index data used in this study were provided by In Loco for the period from February 26 to May 19, 2020. The index is defined as the daily percentage of mobile devices that remained at home. The company works with aggregated data, collected with user consent, and does not collect personally identifiable information [19]. The authors obtained permission to use In Loco's social isolation database through a data transfer cooperation.
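As a minimal illustration of the definition above, the sketch computes a daily isolation index as the percentage of observed devices flagged as having stayed at home. The record layout `(day, device_id, stayed_home)` and the function name `isolation_index` are hypothetical; In Loco's actual pipeline and its rule for deciding that a device "stayed at home" are not described in the source.

```python
from collections import defaultdict
from datetime import date


def isolation_index(records):
    """Daily share (%) of observed devices that stayed at home.

    `records` is an iterable of (day, device_id, stayed_home) tuples, a
    hypothetical layout assuming one record per device per day.
    """
    seen = defaultdict(set)      # day -> devices observed that day
    at_home = defaultdict(set)   # day -> devices flagged as staying home
    for day, device_id, stayed_home in records:
        seen[day].add(device_id)
        if stayed_home:
            at_home[day].add(device_id)
    return {day: 100.0 * len(at_home[day]) / len(seen[day]) for day in sorted(seen)}


records = [
    (date(2020, 3, 22), "a", True),
    (date(2020, 3, 22), "b", True),
    (date(2020, 3, 22), "c", False),
    (date(2020, 3, 22), "d", True),
    (date(2020, 3, 23), "a", True),
    (date(2020, 3, 23), "b", False),
]
result = isolation_index(records)
print(result)  # {datetime.date(2020, 3, 22): 75.0, datetime.date(2020, 3, 23): 50.0}
```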
Data on the number of COVID-19 cases and deaths were extracted from the Coronavirus Dashboard of the Brazilian Ministry of Health [9].
Both geolocation and epidemiological data were collected for the state of São Paulo. This state was chosen because it is the most populous in the country, with nearly 46 million inhabitants [18], and because the first registered case of COVID-19 in Brazil, as well as the first community transmission, was recorded there.
The isolation data were aggregated without disaggregation by gender; therefore, no gender-specific analysis was performed in this study.
Model
This study aimed to analyze the impact of variation in the social isolation index on the number of cases and deaths due to COVID-19. Data were modeled using the vector autoregression (VAR) model, as described in Eq. 1,
$$ y_t' A_0 = \sum_{l=1}^{p} y_{t-l}' A_l + \varepsilon_t', \quad 1 \le t \le T $$
(1)
where \( y_t \) is an \( n \times 1 \) vector of endogenous variables; \( A_0 \) is an \( n \times n \) matrix of contemporaneous parameters; \( A_l \) is an \( n \times n \) matrix of parameters on the variables lagged \( l \) periods, for \( 1 \le l \le p \); \( \varepsilon_t \) is an \( n \times 1 \) vector of structural shocks; \( p \) is the lag order; and \( T \) is the sample size. The structural model in Eq. (1) is not identified. Thus, to estimate the VAR, it was necessary to use its reduced form, obtained by post-multiplying Eq. (1) by \( A_0^{-1} \), which gives Eq. 2,
$$ y_t' = x_t' B + u_t' $$
(2)
where \( x_t' = \left(y_{t-1}', \ldots, y_{t-p}'\right) \) stacks the lagged variables; \( F = \left(A_1', \ldots, A_p'\right)' \) stacks the lag matrices, so that Eq. (1) can be written as \( y_t' A_0 = x_t' F + \varepsilon_t' \); \( B = F A_0^{-1} \); \( u_t' = \varepsilon_t' A_0^{-1} \); and \( E\left[u_t u_t'\right] = \Omega = \left(A_0 A_0'\right)^{-1} \) is the variance-covariance matrix of the residuals, with the structural shocks normalized to unit variance. According to Sims [16], recovering Eq. (1) from the estimates of Eq. (2) requires identifying restrictions on the matrix of contemporaneous effects \( A_0 \), which can be imposed through the Cholesky decomposition. Hence, the structural parameters of Eq. (1) could be recovered after estimating Eq. (2).
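The mapping from the structural form, Eq. (1), to the reduced form, Eq. (2), can be checked numerically. The sketch below assumes a hypothetical two-variable structural VAR with one lag (so the stacked lag matrix \( F \) is just \( A_1 \)) and unit-variance structural shocks, in the row-vector convention of Eq. (1). It verifies that one step of the structural model equals one step of the reduced form with \( B = F A_0^{-1} \), and that the covariance of \( u_t' = \varepsilon_t' A_0^{-1} \) matches \( (A_0 A_0')^{-1} \). All parameter values are illustrative, not estimates from this study.

```python
import numpy as np

# Hypothetical structural VAR(1), row-vector convention of Eq. (1):
#   y_t' A0 = y_{t-1}' F + eps_t',  with Var(eps_t) = I  (p = 1, so F = A_1)
A0 = np.array([[1.0, -0.3],
               [0.0, 2.0]])
F = np.array([[0.5, 0.1],
              [0.2, 0.4]])

A0_inv = np.linalg.inv(A0)
B = F @ A0_inv                    # reduced-form coefficients, B = F A0^{-1}
Omega = np.linalg.inv(A0 @ A0.T)  # implied residual covariance, (A0 A0')^{-1}

# One step of the structural model equals one step of the reduced form.
rng = np.random.default_rng(0)
y_prev = rng.standard_normal(2)
eps = rng.standard_normal(2)
y_structural = (y_prev @ F + eps) @ A0_inv  # solve y_t' A0 = y_{t-1}' F + eps_t'
y_reduced = y_prev @ B + eps @ A0_inv       # y_t' = y_{t-1}' B + u_t'
print(np.allclose(y_structural, y_reduced))  # True

# Monte Carlo check of Omega: each row of u is u_t' = eps_t' A0^{-1}
eps_draws = rng.standard_normal((200_000, 2))
u = eps_draws @ A0_inv
Omega_hat = u.T @ u / len(u)
print(np.round(Omega, 4))
print(np.round(Omega_hat, 4))
```

The check also shows why restrictions are needed: \( \Omega \) has only \( n(n+1)/2 \) distinct elements, while \( A_0 \) has \( n^2 \), so \( n(n-1)/2 \) restrictions (one, for two variables) are required to recover \( A_0 \) from \( \Omega \).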
To restrict the contemporaneous effects, we assumed that \( A_0 \) is lower triangular: the number of cases (or deaths) due to COVID-19 may affect the isolation index contemporaneously, whereas the isolation index affects the number of cases (or deaths) only with a lag. The vector of endogenous variables of the empirical model is defined in Eq. 3,
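The recursive assumption can be sketched with a Cholesky factorization. Assuming a hypothetical reduced-form residual covariance with the ordering (cases, isolation index), the lower-triangular factor \( P \) (with \( \Omega = P P' \)) gives the same-day impact of each orthogonal shock. Its upper-right entry is zero, meaning an isolation shock has no same-day effect on cases, and its inverse has the lower-triangular pattern of the contemporaneous matrix in the column form used in Eq. 4. The covariance values are illustrative only.

```python
import numpy as np

# Hypothetical reduced-form residual covariance, ordering (cases, isolation index)
Omega = np.array([[4.0, -0.6],
                  [-0.6, 1.0]])

P = np.linalg.cholesky(Omega)  # lower triangular, Omega = P @ P.T
# u_t = P e_t with Var(e_t) = I: column j of P is the same-day impact of shock j.
print(np.round(P, 4))
print(P[0, 1])                 # 0.0: isolation shock has no same-day effect on cases

# Contemporaneous matrix in the column form A0 y_t = ... + e_t
A0 = np.linalg.inv(P)          # inverse of a lower-triangular matrix is lower triangular
print(np.round(A0, 4))

# Reversing the ordering changes the identifying assumption (and the factor).
perm = [1, 0]
P_rev = np.linalg.cholesky(Omega[np.ix_(perm, perm)])
print(np.round(P_rev, 4))      # now a cases shock has no same-day effect on isolation
```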
$$ {y}_t={\left({Cases}_t, Isolation\ {Index}_t\right)}^{\prime } $$
(3)
where \( {Cases}_t \) is the daily number of new cases in the state of São Paulo and \( Isolation\ {Index}_t \) is the daily isolation index for the same region. The model was estimated using Eq. 4.
$$ \left[\begin{array}{cc}1& 0\\ {}{a}_{12}& {a}_{22}\end{array}\right]\left[\begin{array}{c}{Cases}_t\\ {}\ Isolation\ {Index}_t\ \end{array}\right]=\left[F\right]\left[\begin{array}{c}{Cases}_{t-1}\\ {}\ Isolation\ {Index}_{t-1}\ \end{array}\right]+{C}_{\xi } $$
(4)
Separately, another model was estimated in which the number of cases was replaced by the daily number of deaths due to COVID-19 as the first endogenous variable, generating the following equations,
$$ {y}_t={\left({Deaths}_t, Isolation\ {Index}_t\right)}^{\prime } $$
(5)
$$ \left[\begin{array}{cc}1& 0\\ {}{a}_{12}& {a}_{22}\end{array}\right]\left[\begin{array}{c}{Deaths}_t\\ {}\ Isolation\ {Index}_t\ \end{array}\right]=\left[F\right]\left[\begin{array}{c}{Deaths}_{t-1}\\ {}\ Isolation\ {Index}_{t-1}\ \end{array}\right]+{C}_{\xi } $$
(6)
where \( {Deaths}_t \) is the daily number of deaths in the state of São Paulo.
Three exogenous variables were included in Eqs. 4 and 6. The first is the constant. The second is a dummy variable for Sundays, the day on which the isolation index has typically been highest. The third, a time dummy, marks the quarantine policy in the state of São Paulo.
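The estimation step can be sketched as equation-by-equation ordinary least squares, the standard estimator for an unrestricted reduced-form VAR, with the constant, a Sunday dummy, and a quarantine dummy as exogenous regressors. This is a numpy sketch, not the authors' software (which the text does not name). The quarantine dummy coding (1 from a start date onward), the helper names `build_exog` and `fit_var_ols`, and the simulated data are all assumptions used only to show that the estimator recovers known coefficients.

```python
import numpy as np
from datetime import date, timedelta


def build_exog(start, n_days, quarantine_start):
    """Constant, Sunday dummy and quarantine dummy for n_days consecutive days.

    Assumed coding: quarantine dummy equals 1 from quarantine_start onward.
    """
    days = [start + timedelta(days=i) for i in range(n_days)]
    const = np.ones(n_days)
    sunday = np.array([1.0 if d.weekday() == 6 else 0.0 for d in days])  # Monday=0, Sunday=6
    quarantine = np.array([1.0 if d >= quarantine_start else 0.0 for d in days])
    return np.column_stack([const, sunday, quarantine])


def fit_var_ols(Y, X_exog, p):
    """OLS of y_t on (y_{t-1}, ..., y_{t-p}, x_t), one equation per column of Y."""
    T, n = Y.shape
    Z = np.array([np.concatenate([Y[t - l] for l in range(1, p + 1)] + [X_exog[t]])
                  for t in range(p, T)])               # (T - p) x (n*p + k)
    Yt = Y[p:]                                         # (T - p) x n
    coef, *_ = np.linalg.lstsq(Z, Yt, rcond=None)      # column m: equation for variable m
    resid = Yt - Z @ coef
    Sigma = resid.T @ resid / (len(Yt) - Z.shape[1])   # degrees-of-freedom adjusted
    return coef, resid, Sigma


# Simulated check on hypothetical data (not the study's data)
rng = np.random.default_rng(1)
T = 2000
X = build_exog(date(2020, 2, 26), T, date(2020, 3, 24))  # illustrative dates
A1 = np.array([[0.5, 0.1],
               [0.2, 0.4]])      # true lags: y_t = A1 y_{t-1} + G x_t + u_t
G = np.array([[1.0, 0.0, -0.5],
              [2.0, 3.0, 4.0]])  # columns: constant, Sunday, quarantine
Y = np.zeros((T, 2))
for t in range(1, T):
    Y[t] = A1 @ Y[t - 1] + G @ X[t] + rng.standard_normal(2)

coef, resid, Sigma = fit_var_ols(Y, X, p=1)
print(np.round(coef[:2].T, 2))   # estimated A1, close to the true values
```

Because the regressors are the same in every equation, equation-by-equation OLS coincides with system (SUR) estimation here; the residual covariance `Sigma` is the input to the Cholesky identification described earlier.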
After the VAR was estimated, the reduced form in Eq. 2 was expressed as a function of current and past residuals \( u_t \), that is, in its moving-average form. The estimated parameters were then used to identify how the variables responded to shocks in \( u_t \). The results of this procedure are the variance decomposition and the impulse response function (IRF).
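The moving-average step and the orthogonalized IRF can be sketched as follows. The code uses the column convention \( y_t = \sum_l A_l y_{t-l} + u_t \) (the transpose of the row form in Eq. 2), computes the moving-average coefficients \( \Phi_h \) recursively, and multiplies them by the Cholesky factor \( P \) of the residual covariance. The VAR(1) coefficients and covariance are hypothetical, and the function names are illustrative.

```python
import numpy as np


def ma_coefficients(lag_mats, horizon):
    """Phi_0..Phi_horizon for y_t = sum_l A_l y_{t-l} + u_t, so y_t = sum_h Phi_h u_{t-h}."""
    n = lag_mats[0].shape[0]
    phis = [np.eye(n)]
    for h in range(1, horizon + 1):
        phi_h = np.zeros((n, n))
        for l, A_l in enumerate(lag_mats, start=1):
            if l <= h:
                phi_h += A_l @ phis[h - l]
        phis.append(phi_h)
    return np.array(phis)  # shape (horizon + 1, n, n)


def orthogonalized_irf(lag_mats, Sigma, horizon):
    """Theta_h = Phi_h P, where Sigma = P P' is the Cholesky factorization."""
    P = np.linalg.cholesky(Sigma)
    return ma_coefficients(lag_mats, horizon) @ P  # matmul broadcasts over h


# Hypothetical VAR(1) estimates, ordering (cases, isolation index)
A1 = np.array([[0.5, 0.1],
               [0.2, 0.4]])
Sigma = np.array([[4.0, -0.6],
                  [-0.6, 1.0]])
irf = orthogonalized_irf([A1], Sigma, horizon=10)
# irf[h, i, j]: response of variable i, h days after a one-s.d. shock to variable j
print(np.round(irf[:, 0, 1], 4))  # response of cases to an isolation-index shock
```

For a VAR(1), \( \Phi_h \) reduces to \( A_1^h \); the recursion handles any lag order.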
According to Enders [12], the variance decomposition indicates the proportion of the forecast error variance of each endogenous variable that is attributable to shocks in each variable of the model. The IRF traces how an endogenous variable responds over time to a shock in another variable.
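A forecast error variance decomposition can be sketched from the same ingredients: the \( h \)-step forecast error variance of variable \( i \) is \( \sum_{s=0}^{h-1} \sum_k \Theta_s[i,k]^2 \), and the share due to shock \( j \) is the part of that sum coming from \( k = j \). The sketch assumes Cholesky (recursive) identification and uses hypothetical VAR(1) estimates in the column convention \( y_t = A_1 y_{t-1} + u_t \).

```python
import numpy as np


def fevd(lag_mats, Sigma, horizon):
    """shares[h-1, i, j]: share of the h-step forecast-error variance of variable i
    due to orthogonalized (Cholesky) shock j, for h = 1..horizon."""
    n = Sigma.shape[0]
    P = np.linalg.cholesky(Sigma)
    phis = [np.eye(n)]  # MA coefficients Phi_0..Phi_{horizon-1}
    for h in range(1, horizon):
        phis.append(sum(A_l @ phis[h - l]
                        for l, A_l in enumerate(lag_mats, start=1) if l <= h))
    theta_sq = np.array([phi @ P for phi in phis]) ** 2  # squared orthogonalized IRFs
    contrib = theta_sq.cumsum(axis=0)                     # sum over s = 0..h-1
    return contrib / contrib.sum(axis=2, keepdims=True)   # each row sums to 1


# Hypothetical VAR(1) estimates, ordering (cases, isolation index)
A1 = np.array([[0.5, 0.1],
               [0.2, 0.4]])
Sigma = np.array([[4.0, -0.6],
                  [-0.6, 1.0]])
shares = fevd([A1], Sigma, horizon=10)
print(np.round(shares[-1, 0], 3))  # 10-step variance of cases: [own shock, isolation shock]
```

Under the recursive ordering, the one-step-ahead variance of cases is entirely due to its own shock; the isolation shock can only gain weight at longer horizons through the lag dynamics.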
In the models estimated here, the variance decomposition shows how much of the variation in the number of COVID-19 cases and deaths is attributable to shocks to the isolation index. The IRF shows how the number of cases and deaths responds to a positive shock to the isolation index.
To check for stationarity, a requirement of the VAR, three unit root tests were performed: the augmented Dickey–Fuller (ADF) test, the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test, and the Phillips–Perron (PP) test.
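As an illustration of the first of these tests, the sketch computes the ADF statistic by hand: the \( t \)-statistic on \( \gamma \) in \( \Delta y_t = \alpha + \gamma y_{t-1} + \sum_{i=1}^{k} \beta_i \Delta y_{t-i} + e_t \). Under the unit root null it follows the Dickey–Fuller distribution rather than Student's \( t \); with a constant and no trend, the asymptotic 5% critical value is about \( -2.86 \), and the null is rejected for more negative values. The lag length and the simulated series are assumptions for illustration. Note that KPSS reverses the hypotheses (its null is stationarity), and PP corrects the Dickey–Fuller statistic nonparametrically instead of adding lagged differences.

```python
import numpy as np


def adf_tstat(y, lags=1):
    """t-statistic on gamma in dy_t = alpha + gamma*y_{t-1} + sum_i beta_i*dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)  # dy[t] = y[t+1] - y[t]
    rows, lhs = [], []
    for t in range(lags, len(dy)):
        # regress dy[t] on a constant, the lagged level y[t] and lagged differences
        rows.append([1.0, y[t]] + [dy[t - i] for i in range(1, lags + 1)])
        lhs.append(dy[t])
    X = np.array(rows)
    z = np.array(lhs)
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    s2 = resid @ resid / (len(z) - X.shape[1])  # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)           # OLS coefficient covariance
    return coef[1] / np.sqrt(cov[1, 1])


CRIT_5PCT = -2.86  # asymptotic 5% Dickey-Fuller critical value, constant only

rng = np.random.default_rng(2)
white = rng.standard_normal(500)             # stationary series
walk = np.cumsum(rng.standard_normal(500))   # random walk (unit root)
t_white, t_walk = adf_tstat(white), adf_tstat(walk)
print(t_white, t_white < CRIT_5PCT)
print(t_walk, t_walk < CRIT_5PCT)
```

In practice, `adfuller` and `kpss` in `statsmodels.tsa.stattools` and `PhillipsPerron` in `arch.unitroot` provide these tests with finite-sample critical values and p-values.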