Having access to the New York Times through ProQuest, I thought I’d see what NYT search results for “Baseball” and “Orchestra” looked like (you can view the results as a Google visualization with mouseovers for the individual datapoints here). I’m not going to give as much descriptive analysis here as I did with the TIME Magazine search results (I have to go run some cello sectionals in a bit), but it’s interesting to note that there is a similar divergence of results (this one in the early 80s rather than the late 70s, as in the TIME search results). Also notice that rather than an upward trend into the 2000s we have a downward trend.
The NYT database is only searchable up to 2009, and I started the results on January 1, 1923 (TIME Magazine’s first issue was on March 3, 1923). Obviously the NYT is a daily newspaper while TIME Magazine is a weekly news magazine, but I think it might be interesting to see what patterns emerge given more data from several print entities.
Something else I’m wondering about: given that the peaks and dips of the lines (if not the magnitudes and ranges) seem to match for Baseball and Orchestra in the respective print entities, how much of that is a reflection of editorial direction (which was one of the reasons I did the TIME search results in the first place) and how much is simply an artifact of the search engines/algorithms used to derive the results? Since I don’t know [the methodology of the search engines] I can’t really say one way or the other, but this is one of the minimal things we should be asking when collecting data. As my old Statistics textbook says:
Often, one must use a frequency distribution constructed by others, and the nature of the raw data may not be clearly indicated. The producer of a frequency distribution should always indicate the nature of the underlying data. (Hamburg 1979, pg. 40)
Notice also that I have given a link to the individual datapoints in the form of the Google visualizations (obviously anyone can access TIME Magazine’s search to replicate the data, while ProQuest requires subscription access). This is pretty much standard practice, and it allows the research community to check and verify the work as well as to analyze the data in their own way.
The descriptive analysis I did in the TIME Magazine results post is really the bare minimum that should be done; for example, I didn’t do any formal variance, regression, or correlation analysis of the data. We are generally terrible when it comes to big numbers, but we have developed a number of techniques and tools to help us understand what those numbers mean and correct for errors.
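For anyone who wants to go a step past eyeballing the lines, here is a minimal sketch of what a correlation check between the two search terms might look like. The yearly counts below are made-up placeholders, not the actual NYT or TIME results, and the function name `pearson_r` is just my own label for the standard Pearson correlation coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical yearly hit counts (placeholders, NOT the real search results).
baseball = [120, 135, 150, 140, 110]
orchestra = [40, 46, 52, 47, 35]

r = pearson_r(baseball, orchestra)
print(f"r = {r:.3f}")
```

A value near 1 would say the two lines rise and fall together, but it would not tell us whether editorial direction or the search engine is responsible for that.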
It’s not a perfect science, but as the late Richard J. Herrnstein said, “It is easy to lie with statistics, but it’s a lot easier to lie without them.”
Hamburg, Morris (1979). Basic Statistics: A Modern Approach. New York: Harcourt Brace Jovanovich, Inc.