|
Does anyone know of studies that have looked at the impact of licensing conditions on the use of natural language data sets? It seems pretty clear that data available without a license will get more casual looks than data that requires a license but no fee, and much more than data that requires a fee. But what I'm wondering is whether there have been more quantitative measures of the degree of use by researchers and industry. For researchers, looking at the time course of the number of published papers using the data sets, with some attempt to control for the nature and quality of the data, seems possible. For studying industrial use, a survey is probably the only way to get at this. This is obviously an area where people (including myself) have strong intuitions, and strong feelings, but I'd like to know what objective information is out there. |