# MacKay Data Analysis

## Small Multiples

This section presents small multiples for the graphic that MacKay used in his 1901 report on phenology, featuring the summarized variables that he chose:

- Mayflower,
- Strawberry,
- Apple,
- Lilac, and
- Blackberry.

"Small multiples" is an idea due to Edward Tufte, who is famous for his principles on how to present information graphically. I found this article entitled "Tufte in R", which might be interesting to investigate. Ironically, its section on "Small Multiples" is "in preparation". Maybe we could do this in R. In fact, I do have an example of small multiples in R, which I've made available on mathstat (/var/www/html/mackay/R/SmallMultiples/generate_graphs.R). It also illustrates how to read an Excel spreadsheet in R, and includes some MySQL commands for selecting data. It's all a little overwhelming perhaps! But it's worth digging into....

Here is the Mathematica file that produces the following plots [This needs to be updated, Madison; also, the file for year 07 is called 1917m....]:

- I added "padding" for missing years, by repeating the preceding year.
- To create the animation I used the ImageMagick command `convert`:
  `convert -delay 200 -loop 0 *19*png animate.gif`
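The padding step can be sketched in Python as follows. This is only an illustration: the actual padding was done in the Mathematica file, and the function name `pad_missing_years`, the year-to-file mapping, and the file names below are all made up for the example.

```python
def pad_missing_years(frames, first_year, last_year):
    """Return one frame per year, repeating the preceding year's frame
    wherever a year is missing.

    `frames` maps year -> image file name (a hypothetical structure).
    """
    padded = {}
    previous = None
    for year in range(first_year, last_year + 1):
        if year in frames:
            previous = frames[year]
        elif previous is None:
            # Nothing earlier to repeat, so the gap cannot be padded.
            raise ValueError(f"no earlier year available to pad {year}")
        padded[year] = previous
    return padded


# Made-up example: 1903 is missing, so it reuses the 1902 frame.
frames = {1901: "1901.png", 1902: "1902.png", 1904: "1904.png"}
padded = pad_missing_years(frames, 1901, 1904)
print(padded[1903])  # 1902.png
```

Repeating a frame keeps every year on screen for the same length of time in the animation, at the cost of showing stale data for the padded years.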

Now to the important question: **What do we learn?**

## Centroids

- To get a (very) crude estimate of the centroid of each region, I reproduced each region in cardboard and balanced it on a pen to find the balance point. I then used a computer program to find a coordinate for each centroid to compare with the point I had found. Eight of the regions coincide with county borders, so for those I could look up coordinates for each county and enter them into a program that returns a geographical midpoint. When I compared the results with what I had done with the cardboard, they matched up surprisingly well. The estimates for regions six and seven are even cruder, because those regions split up a county; finding their coordinates involved a lot of trial and error.

- Here are the coordinates I ended up with (latitude, longitude, in degrees):
- Region 1: 43.9, -65.8
- Region 2: 44.2, -65
- Region 3: 44.85, -64.9
- Region 4: 45.25, -63.6
- Region 5: 45.1, -62.45
- Region 6: 45.5, -63.97
- Region 7: 45.7, -62.75
- Region 8: 45.85, -60.45
- Region 9: 46.4, -60.6
- Region 10: 46.2, -61.1

- Here is the link to the website I used to find midpoints
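For reference, here is a minimal sketch of one common way to compute a geographical midpoint: convert each point to a unit vector on the sphere, average the vectors, and convert the average back to latitude and longitude. I don't know that the website uses exactly this method, so treat it as an assumption; the function name `geographic_midpoint` and the equal weighting of the points are also my own choices.

```python
import math


def geographic_midpoint(points):
    """Midpoint of (latitude, longitude) pairs given in degrees.

    Each point is treated as an equally weighted unit vector on a sphere;
    the vectors are averaged and the result is converted back to degrees.
    """
    x = y = z = 0.0
    for lat_deg, lon_deg in points:
        lat = math.radians(lat_deg)
        lon = math.radians(lon_deg)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    mid_lon = math.atan2(y, x)
    mid_lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(mid_lat), math.degrees(mid_lon)


# Illustration only: the midpoint of the centroids of regions 8, 9 and 10.
regions_8_to_10 = [(45.85, -60.45), (46.4, -60.6), (46.2, -61.1)]
print(geographic_midpoint(regions_8_to_10))
```

Over an area this small the result is very close to a plain average of the latitudes and longitudes; the spherical version matters more for widely separated points.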

## Missing Data

- Here is a proposed strategy for dealing with missing data, based on the singular value decomposition (SVD). Madison and Laura, we discussed this when we talked about the "factors" in Canonical Correspondence Analysis (say), but many multivariate statistical techniques are built on the SVD.
- http://www.norsemathology.org/longa/research/MacKay/analysis/Mathematica/DataCorrection1922.nb
- **Steve**: I'm having trouble getting a numerical optimizer in Mathematica to work....
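To give a feel for how an SVD-based fill-in can work without an optimizer, here is a minimal Python sketch. It is not the method in the notebook above (which I am not reproducing here); it is a generic iterative scheme that assumes the data are well described by a single factor (rank 1). Missing cells start at their column mean, the filled matrix is replaced by its best rank-1 approximation, and only the missing cells are updated, over and over. The rank-1 approximation is found by power iteration rather than a library SVD routine, and the toy matrix and the function names are made up.

```python
import math


def rank1_approx(a, sweeps=100):
    """Best rank-1 approximation of matrix `a`, by power iteration on a^T a.

    Assumes `a` is not all zeros and that the all-ones starting vector is
    not orthogonal to the leading right singular vector (true for
    positive data).
    """
    n_rows, n_cols = len(a), len(a[0])
    v = [1.0] * n_cols
    for _ in range(sweeps):
        av = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(a[i][j] * av[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # a v equals sigma * u, so (a v) v^T is the rank-1 approximation.
    av = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return [[av[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]


def impute_rank1(data, rounds=200):
    """Fill None entries of `data` by repeated rank-1 approximation.

    Assumes every column has at least one observed value.
    """
    n_rows, n_cols = len(data), len(data[0])
    filled = [row[:] for row in data]
    # Start each missing cell at the mean of the observed cells in its column.
    for j in range(n_cols):
        seen = [data[i][j] for i in range(n_rows) if data[i][j] is not None]
        mean = sum(seen) / len(seen)
        for i in range(n_rows):
            if data[i][j] is None:
                filled[i][j] = mean
    # Refit, then overwrite only the cells that were missing.
    for _ in range(rounds):
        approx = rank1_approx(filled)
        for i in range(n_rows):
            for j in range(n_cols):
                if data[i][j] is None:
                    filled[i][j] = approx[i][j]
    return filled


# Made-up toy matrix: exactly rank 1, with the top-left value (1) removed.
toy = [[None, 2.0, 4.0],
       [2.0, 4.0, 8.0],
       [3.0, 6.0, 12.0]]
result = impute_rank1(toy)
print(round(result[0][0], 6))
```

Real phenology tables will not be exactly rank 1, so in practice one would probably centre the data first and keep more than one factor; the sketch only shows the basic loop.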