## based on full data
event.data1 <- reg.data1 %>%
  mutate(event_time = case_when(
    !is.na(expand_year) ~ year - expand_year,
    is.na(expand_year) ~ -1
  ))

event.reg1 <- feols(uncomp_care ~ i(as.factor(event_time), expand_ever, ref = -1) | year + provider_number,
                    cluster = ~provider_number, data = event.data1)

## based on 2014 treatment group only
event.reg2 <- feols(uncomp_care ~ i(as.factor(year), expand_ever, ref = 2013) | year + provider_number,
                    cluster = ~provider_number, data = reg.data2)
Exercise 4: Demand Estimation and Market Share Construction
Question
What is the role of market definition on market shares and demand estimates?
Data
Hospital Cost Report Information System
Hospital Service Area Files
Data management
Biggest issue is creating markets using the community detection algorithm
I did that for you with the ‘hospital_markets’ data due to time, but if you want to do it yourself, please take a look at my ongoing Hospital Choice Project
Market Definition
Every analysis of competition requires some definition of the market. This is complicated in healthcare for several reasons:
Hospital markets more local than insurance markets
Hospitals are multi-product firms
Geographic market may differ by procedure
Insurance networks limit choice within a geographic market
Hospital Service Areas (HSAs)
Begin with towns or cities with a hospital (possibly more than one)
Assign zip codes to that town/hospital(s) if the plurality of people in that zip code receive care from that town/hospital(s)
Define the HSA as all contiguous zip codes from step 2
Around 3,400 HSAs total
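The plurality rule in the second step can be sketched in a few lines of base R. The `flows` data frame, its column names, and the patient counts are made up for illustration. The contiguity requirement in the final step needs geographic boundary data and is not shown here.

```r
# Hypothetical patient flows: one row per zip code / hospital town pair.
flows <- data.frame(
  zip      = c("z1", "z1", "z2", "z2", "z3"),
  town     = c("A",  "B",  "A",  "B",  "B"),
  patients = c(60,   40,   20,   80,   50),
  stringsAsFactors = FALSE
)

# Sort so that, within each zip code, the town with the most patients comes first.
flows <- flows[order(flows$zip, -flows$patients), ]

# Keep the first (plurality) town for each zip code.
hsa_assign <- flows[!duplicated(flows$zip), c("zip", "town")]
hsa_assign
```

Ties are broken by whichever row happens to come first, so a real application would want an explicit tie-breaking rule.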
Hospital Referral Regions (HRRs)
Contiguous HSAs
Population of at least 120,000
Account for at least 65% of residents’ health care services (cardiovascular and neurosurgery)
306 HRRs total
Community Detection
Goal: Identify connected nodes (some geographic region like zip code or county) where residents tend to receive health care services
Form data on geographic units, providers, and patient counts (bipartite). This is a matrix with geographic units as rows, providers as columns, and patient counts as cells
Convert to a matrix of counts of connections (common hospitals) between geographic areas (unipartite)
Employ “cluster walktrap” algorithm to identify clusters of geographic units
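The three steps above can be sketched with a toy example. This assumes the `igraph` package and takes “cluster walktrap” to mean `igraph::cluster_walktrap()`. The zip codes, hospitals, and patient counts are invented, and the actual ‘hospital_markets’ construction may differ in its details.

```r
library(igraph)

# Step 1: hypothetical bipartite matrix (zip codes in rows, hospitals in columns,
# patient counts in cells).
counts <- matrix(
  c(10, 5, 8, 0, 0,  0,
     7, 9, 4, 0, 0,  0,
     6, 3, 5, 2, 0,  0,
     0, 0, 0, 9, 6,  4,
     0, 0, 0, 5, 8,  7,
     0, 0, 0, 3, 4, 10),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("z", 1:6), LETTERS[1:6])
)

# Step 2: unipartite matrix counting the hospitals each pair of zip codes shares.
incidence <- (counts > 0) * 1
common <- incidence %*% t(incidence)
diag(common) <- 0

# Step 3: walktrap community detection on the weighted zip-to-zip graph.
g <- graph_from_adjacency_matrix(common, mode = "undirected", weighted = TRUE)
wc <- cluster_walktrap(g)
markets <- membership(wc)
markets
```

Each value in `markets` is a community label for the corresponding zip code, and zip codes sharing a label form one market.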
Berry (1994) shows that a logit discrete choice model can be estimated with continuous market share data as follows.
Basic Setup
Indirect utility of person \(i\), \[u_{ij} = x_{ij}\beta + \epsilon_{ij},\] where \(x_{ij}\) denotes person (and perhaps product) characteristics and \(\epsilon_{ij}\) denotes an error term.
Standard logit: one choice, \(j=0,1\)
Multinomial logit: many possible choices, \(j=0,1,...,J\)
Logit terminology
A few different terms for very similar models:
Multinomial Logit: Individual covariates only, alternative-specific coefficients. \[u_{ij}=x_{i}\beta_{j} + \epsilon_{ij},\] such that \[p_{ij} = \frac{e^{x_{i}\beta_{j}}}{\sum_{k} e^{x_{i}\beta_{k}}}\]
Conditional Logit: Allow for alternative-specific regressors, such that \[u_{ij}=x_{ij}\beta + \epsilon_{ij}\]
“Mixed” Logit: Allow for individual and alternative-specific regressors, such that \[u_{ij}=x_{ij}\beta + w_{i} \gamma_{j} + \epsilon_{ij}\]
but people sometimes use “mixed” to refer to random-coefficients logit
Does it matter?
These are really all the same and it’s just a matter of specification (e.g., interact individual covariates with product characteristics or with product dummies). I’ll refer to them as “multinomial” logit.
The Independence of Irrelevant Alternatives
Fundamental issue with logit models: the ratio of choice probabilities for \(j\) and \(k\) does not depend on any other alternatives, \[\frac{p_{ij}}{p_{ik}} = \frac{e^{V_{ij}}}{e^{V_{ik}}},\] where \(V_{ij}\) denotes the deterministic part of utility (e.g., \(x_{ij}\beta\)).
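A quick numerical check makes the property concrete. The alternative names and deterministic utilities below are arbitrary illustrative values, and `logit_probs` is a helper defined here rather than a library function.

```r
# Logit choice probabilities from a vector of deterministic utilities.
logit_probs <- function(v) {
  ev <- exp(v - max(v))  # subtract the max for numerical stability
  ev / sum(ev)
}

# Hypothetical utilities for a two-alternative choice set.
v2 <- c(car = 1.0, bus = 0.5)

# Add a third alternative; utilities of the first two are unchanged.
v3 <- c(v2, train = 0.8)

p2 <- logit_probs(v2)
p3 <- logit_probs(v3)

# The car/bus probability ratio is the same with or without the train.
ratio2 <- unname(p2["car"] / p2["bus"])
ratio3 <- unname(p3["car"] / p3["bus"])
c(ratio2 = ratio2, ratio3 = ratio3)
```

Both ratios equal \(e^{1.0-0.5}\), so the new alternative draws from the existing ones in proportion to their original probabilities.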
Relaxing IIA
This is really an omitted variables problem…with enough interactions, we can allow for a sufficiently rich substitution pattern
Alternatively, relax assumptions on the error term with nested logit or random-coefficient logit (or multinomial probit)
Discrete choice with market level data
Utility of individual \(i\) from selecting product \(j\) is \[U_{ij}=\delta_{j}+\epsilon_{ij},\] where \(\delta_{j}=x_{j}\beta + \xi_{j}\), and \(\xi_{j}\) represents the mean level of utility derived from unobserved characteristics.
Goal is to find \(\hat{\delta}\) to satisfy the moment condition, \[\frac{1}{J}\sum_{j} (\hat{\delta}_{j}-x_{j}\beta)z_{j}=0,\] where \(z_{j}\) denotes a vector of instruments.
In standard logit, \(s_{j}=e^{\delta_{j}}/\sum_{k} e^{\delta_{k}}\), and \(\delta_{j}\) then follows directly from taking logs and subtracting the log of the outside share (with the normalization \(\delta_{0}=0\)), which yields the estimating equation \[\ln(s_{j}) - \ln(s_{0}) = x_{j}\beta + \xi_{j}\]
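The inversion can be checked on simulated data. Everything below is made up for illustration: the number of products, the true \(\beta\), and a single exogenous characteristic `x`. The sketch uses plain OLS, which is appropriate only if `x` is uncorrelated with \(\xi_{j}\). With an endogenous regressor such as price, the same left-hand side would be used with instruments.

```r
set.seed(1)

# Simulated market: J inside products plus an outside good with delta_0 = 0.
J     <- 200
beta  <- -1
x     <- rnorm(J)             # observed characteristic (assumed exogenous)
xi    <- rnorm(J, sd = 0.1)   # unobserved mean utility
delta <- x * beta + xi

# Logit market shares implied by delta.
denom <- 1 + sum(exp(delta))  # the 1 is exp(delta_0) for the outside good
s     <- exp(delta) / denom
s0    <- 1 / denom

# Invert shares to mean utilities: ln(s_j) - ln(s_0) = delta_j.
delta_hat <- log(s) - log(s0)

# Estimate beta by regressing the inverted shares on x.
fit <- lm(delta_hat ~ x)
coef(fit)["x"]
```

The inversion recovers `delta` exactly (up to floating-point error), so all estimation error comes from the regression step.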
Standard logit imposes cross-price elasticities that are proportional to market shares (limited substitution patterns)
Relax with nested logit or random-coefficients logit
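To see where the proportionality comes from, suppose (as an assumption, since price is not written out separately above) that mean utility includes a price term, \(\delta_{j} = x_{j}\beta - \alpha p_{j} + \xi_{j}\) with \(\alpha > 0\). Differentiating the logit share \(s_{j}\) gives \[\frac{\partial s_{j}}{\partial p_{j}}\frac{p_{j}}{s_{j}} = -\alpha p_{j}(1-s_{j}), \qquad \frac{\partial s_{j}}{\partial p_{k}}\frac{p_{k}}{s_{j}} = \alpha p_{k} s_{k} \quad (k \neq j).\]
The cross-price elasticity with respect to \(p_{k}\) is the same for every product \(j \neq k\) and depends only on the price and share of product \(k\), not on how similar \(j\) and \(k\) are in their characteristics.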
References
Berry, Steven T. 1994. “Estimating Discrete-Choice Models of Product Differentiation.” The RAND Journal of Economics, 242–62.