Introduction

Motivation

In this project, I conducted an exploratory analysis of donations made from the state of New York to candidates in the 2016 presidential election. My main goal was to understand which variables might have influenced how much each candidate raised. Since New York is the state where I live, I expected the results to be informative.

Summary of findings

At the univariate level, as expected from personal and national experience, I found that most donations from New York State went to Democratic candidates. I explore several more single variables throughout this project, and not all of them behaved as expected.

There are three salient findings from exploring this dataset – repeated and accompanied by pictures in the Multivariate analyses and selected plots section.

  • Candidates who raised most did so through many donations of small size

There is a positive correlation between total amount raised and number of donations received. That is, the candidates who raised the most money did so because they received many more donations, even though each was small. The sheer number of donations offset the fact that they were not large.

  • Party lines more important than gender lines

Another important finding is that donors pattern along party lines rather than along gender lines. Most donations to Democrats come from women, whereas most donations to Republicans, Greens, and Libertarians come from men; yet within each party both genders follow a similar pattern of donations over time.

  • Patterns over time (in number of donations and their amount) vary significantly by party.

Donors’ behavior over time shows that debate and national convention days coincided with peaks for those contributing to the Republican candidate, peaks that never repeated and in fact flattened towards the end. The opposite is true for donors to the Democratic candidate, whose contributions picked up steam as the race came to an end.

The data

The dataset contains campaign contributions by individuals from the state of NY. I merged, and later cleaned, a dataset from the Federal Election Commission (http://classic.fec.gov/disclosurep/pnational.do) with data from the USPS.

There are 640496 observations of 27 variables. Each row represents a donation. The subset of variables that I analyze in this exploratory analysis is:


  • party:
    • Factor with 4 levels – Democratic, Republican, Green, Libertarian
    • Party of the candidate to whom the donation was made
  • candidate:
    • Factor with 25 levels – Clinton Sanders Trump O’Malley Cruz Walker Bush Rubio Kasich Christie Stein Johnson Webb Graham Paul Fiorina Santorum Jindal Huckabee Pataki Gilmore Carson Lessig Perry McMullin
    • Presidential candidates to whom donations were made.
  • amount:
    • Number – range from 0.01 to 2700
    • Amount in dollars of donation.
  • amount bucket:
    • Factor with 4 levels – (0,15] (15,27] (27,100] (100,2700]
    • Quartiles of the variable “amount” for the entire dataset, also in dollars.
  • donor id:
    • Integer – range from 1 to 121922
    • Unique number to identify a donor.
  • gender:
    • Factor with 5 levels – both family female male undet
    • Gender of the donor as determined from the first name using the R package “gender” in the data wrangling step:
      • female: names given to females more than 90% of the time
      • male: names given to males more than 90% of the time
      • both: names given to either gender less than 90% of the time
      • family: donations made on behalf of more than one individual
      • undet: names for which gender could not be determined (only an initial, no name, or too uncommon in the US)
  • city:
    • Factor with 1455 levels
    • City to which the donor belongs
  • zip:
    • Character with levels
    • Zip to which the donor belongs
  • county:
    • Character with 63 unique values
    • County as determined by the zipcode provided by the donor
  • county population:
    • Number with 63 unique values
    • Population of each of the counties
  • day:
    • Integer – range from 1 to 31
    • Day of the month; for example, the maximum day for February 2016 is 29
  • month:
    • Integer – range from 1 to 12
    • Month of the year
  • year:
    • Integer – range from 13 to 16
    • Year of donations
  • time point:
    • Integer – range from 285 to 1463
    • All days numbered consecutively across months and years, so that if “Dec 31 2015” is time point n, then “Jan 1 2016” is time point n+1. The minimum value is 285 because the first entry was late in 2013, on its 285th day.
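A minimal sketch of how a consecutive-day index like the time point could be built, assuming that day 1 is January 1, 2013. The helper name `time_point_from_date` and the choice of origin are assumptions for illustration; the original wrangling code is not shown here and may count days differently.

```r
# Sketch: consecutive-day index, assuming day 1 is January 1, 2013.
# `time_point_from_date` is a hypothetical helper, not the original code.
time_point_from_date <- function(date, origin = as.Date("2012-12-31")) {
  as.integer(as.Date(date) - origin)
}

# The 285th day of 2013.
time_point_from_date("2013-10-12")

# Two dates on either side of a year boundary get consecutive values.
time_point_from_date(c("2015-12-31", "2016-01-01"))
```

Counting from a fixed origin avoids any special handling of month lengths or leap years, since date subtraction already accounts for them.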

Univariate analyses and plots

Summary of the variables contained in the dataset:

##        X            candidate           lastnm          firstnm      
##  Min.   :     1   Clinton:394612   SMITH   :  2677   MICHAEL: 11331  
##  1st Qu.:160125   Sanders:173351   JOHNSON :  2037   JOHN   : 10469  
##  Median :320248   Trump  : 35743   BROWN   :  1941   DAVID  : 10450  
##  Mean   :320248   Cruz   : 16060   MILLER  :  1901   ROBERT :  9579  
##  3rd Qu.:480372   Carson :  6552   COHEN   :  1675   SUSAN  :  7876  
##  Max.   :640496   Rubio  :  4389   WILLIAMS:  1607   JAMES  :  6814  
##                   (Other):  9789   (Other) :628658   (Other):583977  
##              wholename          amount              amount_bucket   
##  BODNICK, KATIE   :  1313   Min.   :   0.01   (0,15]       :169332  
##  BRUN, GINA       :   413   1st Qu.:  15.00   (100,2.7e+03]: 99625  
##  BRONER, NAHAMA   :   318   Median :  27.00   (15,27]      :153397  
##  SCHWARTZ, HILARY :   311   Mean   : 144.74   (27,100]     :218142  
##  GRODY, GORDON    :   307   3rd Qu.: 100.00                         
##  KILLORIN, MICHAEL:   290   Max.   :2700.00                         
##  (Other)          :637544                                           
##       day            month             year         time_point  
##  Min.   : 1.00   Min.   : 1.000   Min.   :13.00   Min.   : 285  
##  1st Qu.: 8.00   1st Qu.: 4.000   1st Qu.:16.00   1st Qu.:1177  
##  Median :17.00   Median : 7.000   Median :16.00   Median :1272  
##  Mean   :16.61   Mean   : 6.782   Mean   :15.91   Mean   :1256  
##  3rd Qu.:26.00   3rd Qu.:10.000   3rd Qu.:16.00   3rd Qu.:1363  
##  Max.   :31.00   Max.   :12.000   Max.   :16.00   Max.   :1463  
##                                                                 
##     gender       party           zip         latitude_zip  
##  both  : 13265   D:568459   Min.   :    0   Min.   :40.51  
##  family:   241   G:   999   1st Qu.:10029   1st Qu.:40.73  
##  female:323187   L:   780   Median :11201   Median :40.79  
##  male  :280477   R: 70258   Mean   :11353   Mean   :41.25  
##  undet : 23326              3rd Qu.:11931   3rd Qu.:41.29  
##                             Max.   :99999   Max.   :44.99  
##                             NA's   :37      NA's   :2691   
##  longitude_zip       county          population_county
##  Min.   :-79.70   Length:640496      Min.   :   4836  
##  1st Qu.:-74.00   Class :character   1st Qu.: 919040  
##  Median :-73.97   Mode  :character   Median :1585873  
##  Mean   :-74.28                      Mean   :1329146  
##  3rd Qu.:-73.84                      3rd Qu.:1585873  
##  Max.   :-71.94                      Max.   :2504700  
##  NA's   :2691                        NA's   :2691     
##         population_county_bucket        city           title_1      
##  (1.58e+06,1.6e+06] :203283      New York :204570   MR     : 12525  
##  (2.48e+06,2.51e+06]: 86425      Brooklyn : 86535   MRS    :  3635  
##  (9.3e+05,9.55e+05] : 51189      Bronx    : 13944   MS     :  3522  
##  (1.48e+06,1.5e+06] : 39468      Rochester: 10026   JR     :  1031  
##  (2.23e+06,2.25e+06]: 35771      Buffalo  :  8951   DR     :   875  
##  (Other)            :221669      (Other)  :316110   (Other):  2049  
##  NA's               :  2691      NA's     :   360   NA's   :616859  
##     title_2       title_3            contbr_employer  
##  JR     :   476   CCM :     1   N/A          : 82181  
##  SR     :   117   MS  :     2   SELF-EMPLOYED: 66742  
##  III    :    76   RET :    23   RETIRED      : 41371  
##  PHD    :    33   SR  :     3   NONE         : 32318  
##  RET    :    33   NA's:640467   NOT EMPLOYED : 21862  
##  (Other):   136                 (Other)      :395700  
##  NA's   :639625                 NA's         :   322  
##              contbr_occupation  election_tp       donor_id     
##  RETIRED              : 97865        :   612   Min.   :     1  
##  NOT EMPLOYED         : 47972   G2016:267065   1st Qu.:  3806  
##  ATTORNEY             : 26362   O2016:   237   Median : 13068  
##  INFORMATION REQUESTED: 16912   P2015:     1   Mean   : 23932  
##  TEACHER              : 15066   P2016:372578   3rd Qu.: 33597  
##  (Other)              :436244   P2020:     3   Max.   :121922  
##  NA's                 :    75                                  
##  state.x          fips         census_area          state.y      
##  NY:640496   36061  :203283   Min.   :  22.83   New York:637805  
##              36047  : 86425   1st Qu.:  22.83   Alabama :     0  
##              36119  : 51189   Median : 108.53   Alaska  :     0  
##              36103  : 39468   Mean   : 329.40   Arizona :     0  
##              36081  : 35771   3rd Qu.: 603.83   Arkansas:     0  
##              (Other):221669   Max.   :2680.38   (Other) :     0  
##              NA's   :  2691   NA's   :2691      NA's    :  2691  
##           geometry     
##  MULTIPOLYGON :637805  
##  NA LL        :  2691  
##  epsg:NA      :     0  
##  +proj=aeqd...:     0  
##                        
##                        
## 

Donations per candidate

As expected, the candidates that received the greatest number of donations were major candidates of each party. And it is unsurprising that the candidate that won the popular vote in the general election received the greatest number of donations of all. However, two things stand out:

  • The candidate who went on to be declared president received only the third greatest number of donations, a little more than a fifth of the number received by the candidate immediately above him.
  • The two candidates who received the most contributions were both from the Democratic party.

Party, count of donations to each

The Democratic Party received the greatest number of donations, followed by the Republican Party, and then the Green and Libertarian parties.

Percentage of the donation count from New York State that went to each party:

##     D     G     L     R 
## 88.75  0.16  0.12 10.97
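As a small worked check, these percentages can be recomputed from the party counts shown in the summary output above. The variable names are illustrative; on the full data frame, `round(100 * prop.table(table(ny$party)), 2)` would presumably give the same result.

```r
# Counts per party, copied from the summary() output above.
party_counts <- c(D = 568459, G = 999, L = 780, R = 70258)

# Share of all donations, in percent, rounded to two decimals.
party_pct <- round(100 * party_counts / sum(party_counts), 2)
print(party_pct)
```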

Amount of donations

Overall, donation amounts are on the smaller side, as the quartile distribution for all the data shows. Half of all donations are $27 or less.

summary(ny$amount)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.01   15.00   27.00  144.74  100.00 2700.00

And 95% are $500 or less.

quantile(ny$amount, .95)
## 95% 
## 500

The five most frequent donation amounts were $25, $50, $100, $10, and $5, in that order. The list below shows the number of times each of these amounts was donated.

## # A tibble: 5 x 2
##   amount count
##    <dbl> <int>
## 1     25 94308
## 2     50 72644
## 3    100 67365
## 4     10 55253
## 5      5 42681
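A minimal sketch of how such a frequency ranking can be produced, using base R and a small made-up vector in place of `ny$amount`. The output above is a tibble, so the original code presumably used dplyr; base R is used here only to keep the example dependency-free.

```r
# Toy data standing in for ny$amount (values are made up for illustration).
amount <- c(25, 25, 25, 50, 50, 100, 10, 19.29, 25, 50)

# Tally each distinct amount and sort from most to least frequent.
amount_counts <- sort(table(amount), decreasing = TRUE)

# Most frequent amounts first.
head(amount_counts, 3)

# Amounts that occur exactly once.
donated_once <- names(amount_counts)[amount_counts == 1]
donated_once
```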

At the opposite extreme were amounts that were donated only once. Interestingly, none of these single-instance amounts were round dollars; all included cents. The list below shows a randomly chosen sample of such amounts.

## # A tibble: 3 x 2
##   amount count
##    <dbl> <int>
## 1  19.29     1
## 2  35.80    30
## 3  23.97     1

It must be said that there were also plenty of non-round amounts (dollars plus cents) that were donated more than once.

Also, there were whole-dollar amounts that nobody donated, but these were all greater than $250. More specifically, every whole-dollar amount less than $268 was donated at least twice.

Donor: number of donations a person makes

Half of donors donated at most twice,

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    1.000    1.000    2.000    5.279    5.000 1313.000

and only 1% of donors donated more than 46 times.

## 99% 
##  46
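The per-donor figures above come from counting rows per donor. A minimal sketch with made-up donor ids (in the real data, each row is one donation and `donor_id` identifies the donor):

```r
# Toy donor ids, one entry per donation (made up for illustration).
donor_id <- c(1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5, 5)

# Number of donations made by each donor.
per_donor <- as.vector(table(donor_id))

summary(per_donor)         # distribution of donations per donor
quantile(per_donor, 0.99)  # count exceeded by only the top 1% of donors
```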

There could be people on automatic recurring donations, which might explain the 1218 donors who gave more than 45 times, with counts of up to 413.

(*There was one case of a donor who made 1313 donations. I eliminated these entries, since I think this must have been an error.)

Notice the change in scale between the two plots above.

Date of donations

As expected, the donation rate tracks the election, with most donations taking place in 2016 as November approaches.

The graphs below show the number of donations across consecutive days in the two years leading up to the election.

Zooming in on 2016.

Looking at number of donations by month.

Generally, in 2016, the end of the month is when most donations are made, except in November.

Gender, count of donations by each

Donations are split very similarly between males and females. While there are more donations by females, the counts for androgynous names plus those for names whose gender could not be determined would, if they were all male, nearly close the gap.

Percentage of each gender in the dataset.

##   both family female   male  undet 
##   2.07   0.04  50.46  43.79   3.64

Cities: count of donations from each

There are 1456 unique cities in the dataset. Half of them were home to 53 donations or less,

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##      1.0     13.0     53.0    439.9    210.5 204570.0

with 79 being home to a single donation,

## [1] 79

and 15 cities being the origin of donation counts greater than 3033.

## [1] 15

The following plot shows which cities were the origin of the greatest number of donations.

As expected, these correspond to densely populated areas, so the number of donations matches what we know about the population. Although city population is not part of this dataset, this indicates that it would be a relevant variable to include.

Zipcodes: count of donations from each

Three quarters of zips were home to 120 donations or fewer, with half of them being the origin of 25 or fewer.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     5.0    25.0   158.5   120.0  9251.0

The zips home to the greatest number of donations are all in the New York City or Brooklyn area, which agrees with the distribution of cities.

County: count of donations from each

The count of donations by zip is corroborated by the count of donations by county. Again, the more densely populated southeast of the state has the highest count of donations.

County population

Most counties have a small population, with the distribution of counties by population in the state heavily skewed to the right.

The map below shows the population of the counties plotted above, and we can already see an overlap between the regions home to the most donations and the most populous regions. While this isn’t surprising, it serves as reassurance and a sanity check.

Repeat donors

Out of the total donations, 589708 were repeat donations, which represents 92% of all donations; they came from 58% of all donors.

Most donors who repeat do so only a few times. Half of those who repeat donate 4 times or fewer, and three out of four donate 9 times or fewer.

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    2.000    2.000    4.000    8.419    9.000 1313.000

Only one out of ten donates 19 times or more.

## 90% 
##  19
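A minimal sketch of how the repeat shares can be computed, assuming that a "repeat donation" means any donation from a donor who gave at least twice. That definition is an assumption, although it is consistent with the 92% and 58% figures above; the donor ids are made up.

```r
# Toy donor ids, one entry per donation (made up for illustration).
donor_id <- c(1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5, 5)

per_donor <- as.vector(table(donor_id))  # donations per donor
is_repeat <- per_donor >= 2              # donors with two or more donations

# Share of donations coming from repeat donors, and share of donors who repeat.
share_repeat_donations <- sum(per_donor[is_repeat]) / sum(per_donor)
share_repeat_donors    <- mean(is_repeat)

round(100 * c(donations = share_repeat_donations, donors = share_repeat_donors))
```

The two shares differ because repeat donors contribute several rows each, so they weigh more heavily in the donation count than in the donor count.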

Dataset features

The main features are candidate and donation amount. I would like to describe what led to the total raised by each candidate.

Other features are party, date (at its various granularities), gender, and geography, as they could all interact and affect how much money reaches each candidate. They are also intrinsically interesting.

With the present dataset, I log transformed donation count and donation amount for each candidate, since this transformation reveals a correlation pattern.

Univariate summary of findings

Party

Democrats received several times more donations than any other party, and this was driven by two main candidates: Clinton and Sanders. The next party in number of donations was the Republican party, which received only about 1/8 of the Democrats’ count. The other two parties combined received less than a third of one percent of all donations.

Candidate

The candidates that received the greatest number of donations turned out to be major candidates of each party. With hindsight knowledge, we can notice two things stand out:

  • The candidate who went on to be declared president received only the third greatest number of donations, a little more than a fifth of the number received by the candidate immediately above him.

  • The two candidates who received the most contributions were both from the Democratic party.

Amount

Most donations are small: 75% were $100 or less, a fraction of the $2700 maximum allowed, and half of all donations were $27 or less. The five most frequent donation amounts were $25, $50, $100, $10, and $5, in that order. The least frequent amounts were all non-whole dollar amounts ending in cents.

Donor

Most donors contributed very few times: 75% gave five times or fewer, and half of donors gave only once or twice. Of those who donated more than once, half did so 4 times or fewer, and three out of four donated 9 times or fewer.

Gender

Donations are split approximately equally by gender, with women making only a slightly larger number of donations than men.

Time

The pace of donations follows the approach of the election. Most donations take place in 2016 as November approaches. Generally, in 2016, the end of the month is when most donations are made, except in November.

Geography

Most donations come from the southeast corner of the state, regardless of whether we look at zip code or county. Other important centers align with the location of cities that are the source of large numbers of donations: those in the New York City area, Rochester, Buffalo, Albany, Ithaca, Astoria, and Syracuse.

City

There are 1456 unique cities in the dataset. Half of them were home to 53 donations or less, with 79 being home to a single donation, and 15 cities being the origin of donation counts greater than 3033.

As expected, these correspond to densely populated areas, so the number of donations matches what we know about the population. Although city population is not part of this dataset, this indicates that it would be a relevant variable to include.

Zip

Three quarters of zips were home to 120 donations or fewer, with half of them being the origin of 25 or fewer. The zips home to the greatest number of donations are all in the New York City or Brooklyn area, which agrees with the distribution of cities.

County

The count of donations by zip is corroborated by the count of donations by county. Again, the more densely populated southeast of the state has the highest count of donations.

County population

Most counties have a small population, and a few have a very large one. The distribution of county population is strongly skewed to the right.

Bivariate analyses and plots

As we saw, 75% of all donations in the data are $100 or less. We also saw that parties and candidates varied in the share of donations they received. In this section, I explore whether small donations are spread evenly across parties and candidates, or whether specific candidates or parties drive this pattern. Finally, we will look at whether the gender or geographic location of the donor, or the date of the donation, was related to the amount contributed.

Amount by Party

Small donations were not a trait of all parties.

Distribution of amounts for the dataset.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.01   15.00   27.00  144.74  100.00 2700.00

While over half (53%) of the Democratic party’s donations belonged to the two lower quartiles, the opposite is true of the Republican party (73% of its donations belonged to the two upper quartiles).
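A minimal sketch of how per-party shares by quartile bucket can be computed: the amounts are cut at the quartiles of the whole dataset, then row-wise proportions are taken by party. The toy data frame and its values are made up; the real analysis uses `ny$amount` and `ny$party`.

```r
# Toy data (made up); the real analysis uses the full ny data frame.
toy <- data.frame(
  party  = c("D", "D", "D", "D", "R", "R", "R", "R"),
  amount = c(5, 10, 20, 27, 50, 100, 250, 2700)
)

# Quartile breaks of the whole dataset, with the lowest bucket opened at 0.
breaks <- c(0, quantile(toy$amount, c(0.25, 0.5, 0.75)), max(toy$amount))
toy$amount_bucket <- cut(toy$amount, breaks = breaks)

# Row-wise proportions: share of each party's donations in each bucket.
shares <- prop.table(table(toy$party, toy$amount_bucket), margin = 1)
round(100 * shares)
```

Using `margin = 1` makes each party's row sum to 100%, so parties with very different donation counts can be compared directly.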

The figure below shows what share of each party belonged to each quartile of the whole dataset.

The following boxplots show that the Democratic party received smaller donations than the Republican, in turn smaller than the Green and the Libertarian.

Since the donation amounts tend to concentrate on the low values, I performed a log10 transformation in order to more easily detect any differences.

I performed a one-way ANOVA test to determine whether there was any significant difference between the amounts donated to each party.

##                 Df    Sum Sq   Mean Sq F value Pr(>F)    
## party            3 1.001e+09 333549905    1841 <2e-16 ***
## Residuals   640492 1.161e+11    181226                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

And a Tukey test for significant differences confirmed that in fact donations to the Democratic party were significantly smaller than to any other party, followed by the Republican and Green parties (no significant difference between them), followed by the Libertarian party.

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = amount ~ party, data = ny)
## 
## $party
##           diff        lwr       upr     p adj
## G-D 127.632115  93.000063 162.26417 0.0000000
## L-D 182.528244 143.342316 221.71417 0.0000000
## R-D 124.262288 119.888715 128.63586 0.0000000
## L-G  54.896129   2.639954 107.15230 0.0350928
## R-G  -3.369827 -38.216620  31.47697 0.9946183
## R-L -58.265955 -97.641798 -18.89011 0.0008285
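A minimal sketch of the ANOVA and Tukey steps on simulated data. The model formula mirrors the `aov(formula = amount ~ party, data = ny)` call shown in the output above; the toy amounts are randomly generated and carry no information about the real donations.

```r
set.seed(1)  # toy data only; values are simulated, not from the FEC file

toy <- data.frame(
  party  = factor(rep(c("D", "G", "L", "R"), each = 50)),
  amount = c(rlnorm(50, log(30), 1), rlnorm(50, log(90), 1),
             rlnorm(50, log(120), 1), rlnorm(50, log(80), 1))
)

# One-way ANOVA of amount on party, then pairwise Tukey comparisons.
fit   <- aov(amount ~ party, data = toy)
tukey <- TukeyHSD(fit, "party", conf.level = 0.95)

summary(fit)
tukey$party  # one row per pair of parties: diff, lwr, upr, p adj
```

Each row of the Tukey table is the difference in mean amount between two parties; an interval (`lwr`, `upr`) that excludes zero corresponds to a significant pairwise difference at the family-wise level.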

Amount by Candidate

Small donations are not a trait of all candidates either. In fact, it seems that top contenders received smaller contributions than less competitive candidates. This is true of both major parties. However, given that the Democratic Party had a greater share of small contributions than the Republican Party, it’s no surprise that its top candidates had the greatest share of small contributions.

While the top Democratic contenders (Clinton and Sanders) received the most donations, each donation tended to be smaller than those to the Republican frontrunners (Trump and Cruz).

The following graph shows the distribution of donation amounts for all candidates.

Notice that the distributions for most candidates were skewed to the right, consistent with the finding that most people tended to give towards the lower rather than the upper limits.

Among the candidates who received 10,000 donations or more, three out of four donations given to Sanders were $50 or less, to Clinton and Cruz $100 or less, and to Trump $160 or less.

## # A tibble: 4 x 7
## # Groups:   candidate [4]
##   candidate  party median seventyfive_qtle ninetyfive_qtle  count
##      <fctr> <fctr>  <dbl>            <dbl>           <dbl>  <int>
## 1   Sanders      D  27.00               50          106.88 173351
## 2   Clinton      D  30.00              100          750.00 394612
## 3      Cruz      R  40.00              100          250.00  16060
## 4     Trump      R  65.72              160          500.00  35743
## # ... with 1 more variables: total_dollars <dbl>
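A minimal sketch of how a per-candidate summary with these columns can be built in base R. The column names follow the table above; the toy data are made up, and the original table was presumably produced with dplyr instead.

```r
# Toy data (made up); the real table is computed from the full ny data frame.
toy <- data.frame(
  candidate = c("A", "A", "A", "A", "B", "B", "B"),
  amount    = c(10, 20, 30, 40, 100, 200, 300)
)

# One row per candidate: median, 75th and 95th percentiles, count, total.
by_candidate <- do.call(rbind, lapply(split(toy$amount, toy$candidate), function(x) {
  data.frame(
    median           = median(x),
    seventyfive_qtle = unname(quantile(x, 0.75)),
    ninetyfive_qtle  = unname(quantile(x, 0.95)),
    count            = length(x),
    total_dollars    = sum(x)
  )
}))
by_candidate
```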

The candidates who received smaller donations also received the greatest number of donations.

A simple linear model of the log10 of total raised on the log10 of the 75th percentile of donation amount suggests a weak negative linear relationship between the two (slope -0.69, p = 0.04, R-squared = 0.17).

## 
## Call:
## lm(formula = log10(total_dollars) ~ log10(seventyfive_qtle), 
##     data = ny.candidate)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3194 -0.5267 -0.2108  0.6176  1.7896 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               7.4138     0.8914   8.317 2.19e-08 ***
## log10(seventyfive_qtle)  -0.6935     0.3195  -2.170   0.0405 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8749 on 23 degrees of freedom
## Multiple R-squared:   0.17,  Adjusted R-squared:  0.1339 
## F-statistic: 4.711 on 1 and 23 DF,  p-value: 0.04054

Receiving a large number of donations was enough for these candidates not only to match but to surpass, in total amount raised, the candidates whose contributors gave more money per donation.

A linear model of the log10 of donation count on the log10 of the 75th percentile of donation amount suggests a linear relationship that is, as expected, negative.

## 
## Call:
## lm(formula = log10(count) ~ log10(seventyfive_qtle), data = ny.candidate)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.46732 -0.60708 -0.09714  0.66092  1.68266 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               6.8173     0.9041    7.54 1.17e-07 ***
## log10(seventyfive_qtle)  -1.4519     0.3241   -4.48  0.00017 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8874 on 23 degrees of freedom
## Multiple R-squared:  0.466,  Adjusted R-squared:  0.4428 
## F-statistic: 20.07 on 1 and 23 DF,  p-value: 0.0001701

The result is that the candidates who received the most contributions also accumulated the greatest total.

A significant linear model suggests a positive linear relationship between the log10 of donation count and the log10 of total money raised.

## 
## Call:
## lm(formula = log10(count) ~ log10(total_dollars), data = ny.candidate)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.73294 -0.26986  0.00741  0.27933  0.71447 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -3.73528    0.48959  -7.629 9.58e-08 ***
## log10(total_dollars)  1.19287    0.08753  13.627 1.68e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4031 on 23 degrees of freedom
## Multiple R-squared:  0.8898, Adjusted R-squared:  0.885 
## F-statistic: 185.7 on 1 and 23 DF,  p-value: 1.678e-12
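A minimal sketch of fitting and reading a log-log model of this form. The toy per-candidate totals are made up; only the last line uses numbers from this report, namely the coefficients in the output above applied to a hypothetical total of $1,000,000.

```r
# Toy per-candidate totals (made up), built so that count grows roughly
# in proportion to total_dollars.
toy <- data.frame(
  total_dollars = c(1e3, 1e4, 1e5, 1e6, 1e7),
  count         = c(12, 95, 1100, 9800, 105000)
)

fit <- lm(log10(count) ~ log10(total_dollars), data = toy)
coef(fit)               # intercept and slope on the log10 scale
summary(fit)$r.squared  # share of variance explained

# Back-transform a prediction from the log10 scale to a donation count.
10^predict(fit, newdata = data.frame(total_dollars = 1e5))

# Reported coefficients from the output above, for a hypothetical $1,000,000 total.
10^(-3.73528 + 1.19287 * log10(1e6))
```

On the log-log scale, a slope of b means the donation count scales like the total raised to the power b, so a slope above 1 indicates that counts grow somewhat faster than totals.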

Donation size, number of donations received, total amount raised – by candidate and party.

The following two plots convey one of the main findings of this analysis, by putting these three important variables together: donation size, number of donations received, total amount raised. Top raisers totaled the most money despite receiving the smaller size individual contributions because they received a very large number of them.

In the next plot, the small size of donations is reflected in the value of their 75th percentile (although this holds for other quantiles, as we will see later). Small contributors make an enormous difference through the sheer power of their numbers, despite their small individual effect. It must be noted that this pattern persists across candidates of all parties.

The following plot confirms this, with a cosmetic change made to explore the visualization: size is mapped to the font of the candidate names themselves.

The following plot puts together total raised, donation size, and number of donations received, with candidates ranked from highest total to lowest. It shows that the candidates who raised the most also had the smallest 75th-percentile donation: marker size increases as total raised decreases. Conversely, number of donations and total raised go hand in hand: higher brightness values belong to those who received the most money.

Amount over time

Looking at all donations over time is impractical, as the following plot shows. However, it serves to identify donation amounts that remained popular over time. This plot also succeeds only in confirming that there is an increase in donations over time.

However, if we look at amounts in buckets, trends start to appear.

Very early on, there are few donations but they tend to be large. As 2016 progresses, smaller amounts begin to make up a larger share of all donations.

(*There were 44 entries of donations made in 2013 and 2014 to Rubio, Paul, Webb and Cruz. They added up to $47701.6 with a median of $500 and mean $1084.)

Broken down by month for each of the last two years leading up to the election.

Broken down by day within each month of the last two years leading up to the election.

Amount over time by party

If we break down donation amounts over time by party, we see that the two major parties show different trends. The greatest peak in donations for the Democratic party was shortly before the election, whereas the peak for the Republican party was around its convention. At its peak, the bulk of contributions to the Republican party belonged to the upper quartile (contributions greater than $100), and towards the end most were in either the second (15 to 27 dollars) or the upper (larger than 100) quartile. The opposite is true of the Democratic party, whose mix didn’t change much over time and which at its peak received most of its contributions from the lower three quartiles, donations of $100 or less.

It is also interesting to see a rise upwards in donations to the Democrats (and candidate Clinton) after each of the debates.

Number of donations by day throughout each month, for both 2015 and 2016.

2015

For Democrats, number of donations and their size go upward as 2015 progresses, with peaks within months that do not seem to follow any particular pattern. In five out of the nine months with donations, these tend to increase towards the end of the month.

There were considerably fewer donations to the Republican party in 2015 than to the Democratic party. But as with the Democratic party, the month-by-month breakdown does not show any particularly salient trends in 2015.

2016

The only salient pattern in donations by month in 2016 to the Democratic party is that the end of the month is a good time for donations. This pattern was present in a few months of 2015, although there were fewer donations then.

In 2016, donations to the Republican party do not follow the end-of-month surge that we see in donations to the Democratic party; instead, they seem flat except for spurts between mid-June and mid-August. I hypothesize that this might be related to patriotic feelings around the celebration of the 4th of July. It must be noted, however, that donations on July 4th itself are quite small. I must also note the difference in scale for 2016 between the Democratic and Republican party donations.

Amount by gender

Across the whole dataset, donations by females tend to be somewhat smaller than those by males (lower quartiles, median, and mean), although the two distributions overlap heavily and share the same maximum.

Females:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.01   15.00   27.00  125.31   75.00 2700.00

Males:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.04   19.00   35.00  167.24  100.00 2700.00

An independent two-group t-test finds the difference in means statistically significant, both for the actual data

## 
##  Welch Two Sample t-test
## 
## data:  female$amount and male$amount
## t = -37.537, df = 542260, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -44.12054 -39.74171
## sample estimates:
## mean of x mean of y 
##  125.3088  167.2399

and log10 transformed.

## 
##  Welch Two Sample t-test
## 
## data:  log10(female$amount) and log10(male$amount)
## t = -54.033, df = 582450, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.09054528 -0.08420646
## sample estimates:
## mean of x mean of y 
##  1.533923  1.621299
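The outputs above come from R. As a rough illustration of what a Welch test computes, here is a minimal Python sketch that derives the t statistic and the Welch-Satterthwaite degrees of freedom for raw and log10-transformed amounts. The function name `welch_t` and the sample amounts are hypothetical and are not taken from the dataset.

```python
import math
import statistics


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    # Squared standard error of each group mean (sample variance / n).
    sx = statistics.variance(x) / nx
    sy = statistics.variance(y) / ny
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sx + sy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    return t, df


# Hypothetical donation amounts in dollars; NOT taken from the FEC data.
female_amounts = [15.0, 27.0, 75.0, 10.0, 250.0]
male_amounts = [19.0, 35.0, 100.0, 50.0, 500.0]

t_raw, df_raw = welch_t(female_amounts, male_amounts)
t_log, df_log = welch_t(
    [math.log10(a) for a in female_amounts],
    [math.log10(a) for a in male_amounts],
)
print(f"raw:   t = {t_raw:.3f}, df = {df_raw:.1f}")
print(f"log10: t = {t_log:.3f}, df = {df_log:.1f}")
```

A negative t here simply means the first group (females) has the lower mean, which matches the sign of the statistics reported above.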

Amount by geographical location: city, zip and county

There is no significant relation between geography and the amount of donations. Big and small donations seem to originate similarly from all points, in proportion to the number of donations made there.

By city: By looking at three different grain sizes, it does not seem that there is any significant relationship between the number of donations coming from a city and the size of the donations.

By zip: Donation amounts come proportionally from all zips relative to the number of donations they are the source of.

Amount by zip by party

Unexpectedly for me, there are no differences in the amount donated to each of the two major parties based on geography. Both republicans and democrats received larger sums from densely populated areas and smaller amounts from more rural ones.

Amount by county population

The relationship between county population and donation size is tenuous. The following exploratory plots reveal no apparent trend between county population and either the average or the 75th percentile of the amount of donations received.

It is important to remember that the population distribution is skewed, with most counties having smaller population sizes.

When we examine the distribution of donation amount by population size bins, we likewise see that the distribution of the amount given per donation is very similar regardless of the size of the county.

The plots below represent population in bins of 100.
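A binned comparison of this kind can be sketched in Python as follows. This is only an illustration: the function name, the rows, and the bin width are hypothetical, and the text does not specify the unit of the bins. Quartiles are computed with the "inclusive" method, which uses the same interpolation as R's default quantile type.

```python
import statistics
from collections import defaultdict


def amount_quartiles_by_bin(rows, bin_width):
    """Group (county_population, amount) rows into population bins and
    return {bin_lower_bound: (q1, median, q3)} of the donation amounts."""
    bins = defaultdict(list)
    for population, amount in rows:
        bins[(population // bin_width) * bin_width].append(amount)
    return {
        lower: tuple(statistics.quantiles(amounts, n=4, method="inclusive"))
        for lower, amounts in sorted(bins.items())
        if len(amounts) >= 2  # skip bins too small to summarize
    }


# Hypothetical rows: (population of the donor's county, donation in dollars).
rows = [
    (50_000, 10), (60_000, 20), (80_000, 30),
    (1_500_000, 15), (1_550_000, 25), (1_590_000, 35),
]
# The bin width is an assumption made for this example only.
for lower, quartiles in amount_quartiles_by_bin(rows, 100_000).items():
    print(lower, quartiles)
```

If the quartiles look alike from bin to bin, the per-donation amount does not depend much on county size, which is the pattern described above.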

Gender by party

Most donations to the Democratic party came from women

##   both family female   male  undet 
##   2.17   0.04  52.76  41.29   3.74

whereas most donations to the Republican party came from men.

##   both family female   male  undet 
##   1.27   0.03  32.62  63.25   2.83

Both the Green and Libertarian parties received most of their donations from men.

## [1] "Libertarian"
##   both family female   male  undet 
##   0.77   0.00   7.05  89.10   3.08
## [1] "Green"
##   both family female   male  undet 
##   1.00   0.10  31.53  64.66   2.70
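Tables like the ones above are percentage breakdowns of a categorical variable within one party. A minimal Python sketch of that computation follows. The function name and the labels are hypothetical, and this is not the R code that produced the outputs.

```python
from collections import Counter


def percent_table(labels):
    """Return {category: percent of all labels}, rounded to two decimals."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: round(100 * n / total, 2) for cat, n in sorted(counts.items())}


# Hypothetical gender labels for the donations of a single party.
labels = ["female"] * 5 + ["male"] * 4 + ["undet"] * 1
print(percent_table(labels))
```

Running the same function once per party, or once per candidate, gives one row of percentages for each group.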

Gender by candidate

For the major candidates of the Democratic party, Sanders received about half of his contributions from males (51%) and somewhat fewer from females (43%), so the proportions of each gender are fairly similar. Clinton, on the other hand, received more donations from women (57%) than from men (37%), and the gap is larger.

## [1] "Sanders:"
##   both family female   male  undet 
##   2.24   0.12  42.81  50.96   3.87
## [1] "Clinton:"
##   both family female   male  undet 
##   2.14   0.00  57.16  37.00   3.69

Republican party candidates received a majority of their donations from men, across all candidates. This is the case unquestionably for the two leading candidates, Cruz and Trump.

## [1] "Cruz:"
##   both family female   male  undet 
##   1.28   0.00  29.76  66.28   2.68
## [1] "Trump:"
##   both family female   male  undet 
##   1.42   0.06  31.83  63.23   3.46

For the candidates of the Libertarian and Green parties, males were also the main contributors, even though the Green candidate was a woman.

## [1] "Stein:"
##   both family female   male  undet 
##   1.00   0.10  31.53  64.66   2.70
## [1] "Johnson"
##   both family female   male  undet 
##   0.77   0.00   7.05  89.10   3.08

We know that the gender distribution of the whole dataset is slightly more female than male. However, of all candidates, only Clinton received a majority of her contributions from females. This goes to show the margin by which donations to Clinton outnumbered those to all other candidates.

Gender over time

Men seem to have gotten a head start in donations in 2015, but women donated more shortly after the start of 2016 and made a large difference as election day approached.

Gender over time by party

If we look at gender over time by party we find that the trend of the overall data is very similar to the data exclusively for the Democratic party, which is consistent with the fact that the majority of donations were to them.

Another finding is that females and males behave similarly within party lines, even if there are differences based on party. Most donations for democrats come from women, whereas most donations for republicans, greens, and libertarians come from men, yet both genders seem to follow a similar pattern in the rate of their donations within their party, except for donors to the Libertarian party.

Democrats and Republicans

Greens and Libertarians

Amount over time by gender by party

If we break down the number of donations over time by the amount of the donations, we see that the pattern persists. Males and females of each party tend to make donations of similar size at similar times, the only difference being that one gender donates more often than the other.

Gender by county population

There does not seem to be a difference in the share of donations from each gender based on the population size of the county where they originate: most counties simply have an even proportion of male and female donors. There is greater variability among counties with smaller populations, but that’s unsurprising since there are more data points.

Gender by county

Most counties have a roughly similar proportion of female and male donors. Some, though not all, of the less populated areas show a skew towards donors of either gender.

Party by county

Most donations in the state were made to a candidate of the Democratic party. Despite this, we can see differences in some areas.

The following maps show what proportion of the total contributions from that county went to each of the four parties.

The bar plot below displays the same content, with counties sorted by count of donations.

Being the source of the most donations to a party does not mean that most of a county's donations went to that party. This is the case with New York county: it is home to the largest number of donations to all parties, yet it does not lead any party in share of donations.
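The difference between count and share can be illustrated with a short Python sketch. The county names and counts are hypothetical, chosen only to show that the county with the most donations to a party need not be the one with the largest share.

```python
def party_share_by_county(counts):
    """counts: {county: {party: number_of_donations}}.
    Return {county: {party: share of that county's donations}}."""
    shares = {}
    for county, by_party in counts.items():
        total = sum(by_party.values())
        shares[county] = {party: n / total for party, n in by_party.items()}
    return shares


# Hypothetical counts; NOT taken from the dataset.
counts = {
    "Big County": {"Democratic": 9000, "Republican": 1000},
    "Small County": {"Democratic": 95, "Republican": 5},
}
shares = party_share_by_county(counts)

# County with the most Democratic donations versus the largest Democratic share.
most_donations = max(counts, key=lambda c: counts[c]["Democratic"])
largest_share = max(shares, key=lambda c: shares[c]["Democratic"])
print(most_donations, "|", largest_share)
```

Sorting counties by count and by share therefore answers two different questions, which is why both views are useful.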

Democratic party

When we look at the share of donations we can see some interesting patterns. For instance, Tompkins county has the second greatest share of its donations going to the Democratic party, even though it trails in actual number of donations. Also, counties with

Republican party

Even though New York county was home to the largest number of donations to the Republican party, it ranked third-to-last among counties in the share of its total donations that went to that party. More rural counties were home to fewer donations to the Republican party, but they were home to fewer donations in general, so their leaning is revealed when we look at share of donations.

Libertarian and Green parties

There were counties from which no donations were made to the Libertarian or the Green party, and the share they make in those counties in which their contributors reside is very small.

Candidate by party

The exploration of single donation amount per candidate enriches the earlier findings of the distinction by party. Within the Democratic and Republican party, candidates varied in what proportion of their contributions were small or large. Green and Libertarian contributions were directed towards a single candidate for each party.

For example, 75% of donations to the Democratic party were $90 or less. For the frontrunners Sanders and Clinton, 75% of donations were up to $50 and $100 respectively, while for the other three candidates 75% of donations were up to $1000. However, these other candidates accounted for less than 0.1% of the party's donations, which explains their negligible impact on the stats of the party.

Summary of donation size for minor Democratic candidates.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     3.0   100.0   250.0   730.9  1000.0  2700.0

Proportion of Democratic party donations taken by minor Democratic candidates (about 0.09%).

## [1] 0.0008725343

Contributions to Clinton and Sanders made up about 99.91% of all contributions to the Democratic party in NY.

## [1] 99.91

For the Republican party, 75% of donations were $150 or less. For the most contributed-to candidates, Trump and Cruz, 75% of donations were up to $160 and $100 respectively, while for the other 16 candidates 75% of donations were up to $500. These other 16 candidates, however, make up 26% of the contributions the Republican party received in NY.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0    25.0   100.0   563.5   500.0  2700.0

Proportion of Republican party donations taken by minor Republican candidates (about 26%).

## [1] 0.2626747
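The two quantities used in this section, the proportion of a party's donations going to minor candidates and the 75th percentile of their donation amounts, can be sketched in Python as follows. The candidate labels and amounts are hypothetical, and the "inclusive" quantile method is chosen because it uses the same interpolation as R's default.

```python
import statistics


def minor_candidate_summary(donations, frontrunners):
    """donations: list of (candidate, amount) pairs for one party.
    Return (proportion of donations to non-frontrunners,
            75th percentile of those donations' amounts)."""
    minor = [amount for cand, amount in donations if cand not in frontrunners]
    proportion = len(minor) / len(donations)
    q3 = statistics.quantiles(minor, n=4, method="inclusive")[2]
    return proportion, q3


# Hypothetical donations; candidates "A" and "B" play the frontrunners.
donations = [
    ("A", 10), ("A", 20), ("A", 30),
    ("B", 50), ("B", 40), ("B", 25),
    ("C", 100), ("C", 1000), ("D", 500), ("D", 250),
]
proportion, q3 = minor_candidate_summary(donations, {"A", "B"})
print(f"proportion = {proportion:.2f}, 75th percentile = {q3:.0f}")
```

A proportion such as 0.26 corresponds to 26%, so a value printed on this scale needs to be multiplied by 100 before it is quoted as a percentage.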

Bivariate summary of findings

Multivariate analyses and selected plots

Candidates who raised most did so through many donations of small size

The most interesting finding of this exploration is the positive correlation between total amount raised and number of donations received, and the negative correlation between total amount raised and donation size. In times when people express hopelessness about ordinary individuals having any power over an electoral process with many rich donors, this finding can be very refreshing. A limitation of this analysis, of course, is that it only looks at donations from individuals, which is important to keep in mind.

Those candidates that received the largest total of money also tended to receive smaller donations. There was a negative correlation between the total money received and log10 of the 75th percentile of the donations’ amount (r = -.33, p = .10), which becomes stronger and reaches significance if total money received is also log10-transformed (r = -.41, p = .04). In summary: the more total money a candidate raised, the smaller the donations they received tended to be.

What offsets this is the fact that these candidates also received many more donations, even if small, than those whose donors gave more money per donation. There was a negative correlation between the count of donations received and the 75th percentile of the donations’ amount (r = -.44, p = .03), and with the count of donations log10-transformed the strength and significance of the correlation increase (r = -.68, p < .001). The larger the count of donations received, the smaller they tended to be.

The edifying conclusion is that there were so many small donations that they added up to the largest totals. There is a strong positive correlation between total amount raised and the count of donations (r = .95, p < .001), and the relationship still holds if both variables are log10-transformed (r = .94, p < .001).
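For readers who want to see how such correlations are computed, here is a minimal Python sketch of Pearson's r on raw and log10-transformed values, plus the t statistic on which the p-value is based. The per-candidate numbers are hypothetical, and the helper names are introduced only for this example.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical per-candidate values; NOT taken from the dataset.
donation_counts = [10, 100, 1000, 10000]
total_raised = [1000, 5000, 40000, 300000]

r_raw = pearson_r(donation_counts, total_raised)
r_log = pearson_r(
    [math.log10(v) for v in donation_counts],
    [math.log10(v) for v in total_raised],
)
print(f"raw r = {r_raw:.3f}, log10 r = {r_log:.3f}")
print(f"t for the raw correlation = {t_statistic(r_raw, len(donation_counts)):.2f}")
```

The log10 transform matters because a few candidates with very large totals can dominate a correlation computed on the raw scale.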

For that finding, I think two plots are crucial.

First, with candidates ranked according to total money raised and names clearly indicated on the x axis, we can see the relationships mentioned.

The next plot conveys the same information, plus party. I think it is very important to note that the pattern observed applies regardless of party affiliation.

Party lines more important than gender lines

Another important finding is that donors pattern along party lines rather than along gender. Most donations for democrats come from women, whereas most donations for republicans, greens, and libertarians come from men; yet for all parties both genders follow a similar pattern of donations over time.

Patterns over time (in number of donations and their amount) vary significantly by party.

Donors’ behavior over time shows that debate and national convention days caused peaks for those contributing to the Republican candidate, peaks that never repeated and in fact flattened towards the end. The opposite is true for donors to the Democratic candidate, whose donations picked up steam as the race came to an end.