In this project, I conducted a thorough exploratory analysis of donations made from the state of New York to candidates in the 2016 presidential election. My main goal was to understand which variables could have come into play in how much any given candidate raised. Since NY is the state where I live, this could prove informative.
At the univariate level, as predicted from personal and national experience, I found that most donations from NY state went to the Democratic candidates. I explore several more single variables throughout this project, with results that were not all expected.
Exploring this dataset yielded three salient findings, which are repeated and accompanied by plots in the Multivariate analyses and selected plots section.
There is a positive correlation between total amount raised and number of donations received. That is, the candidates who raised the most money did so because they received many more donations, even though those donations were small. The sheer number of donations offset their small size.
Another important finding is that donors pattern along party lines rather than along gender lines. Most donations for Democrats come from women, whereas most donations for Republicans, Greens, and Libertarians come from men; yet within every party except the Libertarian, both genders follow a similar pattern of donations over time.
Donors’ behavior over time shows that debate and national convention days caused peaks among contributors to the Republican candidate, peaks that never repeated and in fact flattened towards the end. The opposite is true of donors to the Democratic candidate, who picked up steam as the race came to an end.
The dataset contains campaign contributions by individuals from the state of NY. I merged a dataset from the Federal Election Commission (http://classic.fec.gov/disclosurep/pnational.do) with data from the USPS, and then cleaned the result.
There are 640496 observations of 27 variables, and each row represents a donation. The variables I focus on in this exploratory analysis are candidate, donation amount, party, date, gender, and geography (city, zip code, and county).
Summary of the variables contained in the dataset:
## X candidate lastnm firstnm
## Min. : 1 Clinton:394612 SMITH : 2677 MICHAEL: 11331
## 1st Qu.:160125 Sanders:173351 JOHNSON : 2037 JOHN : 10469
## Median :320248 Trump : 35743 BROWN : 1941 DAVID : 10450
## Mean :320248 Cruz : 16060 MILLER : 1901 ROBERT : 9579
## 3rd Qu.:480372 Carson : 6552 COHEN : 1675 SUSAN : 7876
## Max. :640496 Rubio : 4389 WILLIAMS: 1607 JAMES : 6814
## (Other): 9789 (Other) :628658 (Other):583977
## wholename amount amount_bucket
## BODNICK, KATIE : 1313 Min. : 0.01 (0,15] :169332
## BRUN, GINA : 413 1st Qu.: 15.00 (100,2.7e+03]: 99625
## BRONER, NAHAMA : 318 Median : 27.00 (15,27] :153397
## SCHWARTZ, HILARY : 311 Mean : 144.74 (27,100] :218142
## GRODY, GORDON : 307 3rd Qu.: 100.00
## KILLORIN, MICHAEL: 290 Max. :2700.00
## (Other) :637544
## day month year time_point
## Min. : 1.00 Min. : 1.000 Min. :13.00 Min. : 285
## 1st Qu.: 8.00 1st Qu.: 4.000 1st Qu.:16.00 1st Qu.:1177
## Median :17.00 Median : 7.000 Median :16.00 Median :1272
## Mean :16.61 Mean : 6.782 Mean :15.91 Mean :1256
## 3rd Qu.:26.00 3rd Qu.:10.000 3rd Qu.:16.00 3rd Qu.:1363
## Max. :31.00 Max. :12.000 Max. :16.00 Max. :1463
##
## gender party zip latitude_zip
## both : 13265 D:568459 Min. : 0 Min. :40.51
## family: 241 G: 999 1st Qu.:10029 1st Qu.:40.73
## female:323187 L: 780 Median :11201 Median :40.79
## male :280477 R: 70258 Mean :11353 Mean :41.25
## undet : 23326 3rd Qu.:11931 3rd Qu.:41.29
## Max. :99999 Max. :44.99
## NA's :37 NA's :2691
## longitude_zip county population_county
## Min. :-79.70 Length:640496 Min. : 4836
## 1st Qu.:-74.00 Class :character 1st Qu.: 919040
## Median :-73.97 Mode :character Median :1585873
## Mean :-74.28 Mean :1329146
## 3rd Qu.:-73.84 3rd Qu.:1585873
## Max. :-71.94 Max. :2504700
## NA's :2691 NA's :2691
## population_county_bucket city title_1
## (1.58e+06,1.6e+06] :203283 New York :204570 MR : 12525
## (2.48e+06,2.51e+06]: 86425 Brooklyn : 86535 MRS : 3635
## (9.3e+05,9.55e+05] : 51189 Bronx : 13944 MS : 3522
## (1.48e+06,1.5e+06] : 39468 Rochester: 10026 JR : 1031
## (2.23e+06,2.25e+06]: 35771 Buffalo : 8951 DR : 875
## (Other) :221669 (Other) :316110 (Other): 2049
## NA's : 2691 NA's : 360 NA's :616859
## title_2 title_3 contbr_employer
## JR : 476 CCM : 1 N/A : 82181
## SR : 117 MS : 2 SELF-EMPLOYED: 66742
## III : 76 RET : 23 RETIRED : 41371
## PHD : 33 SR : 3 NONE : 32318
## RET : 33 NA's:640467 NOT EMPLOYED : 21862
## (Other): 136 (Other) :395700
## NA's :639625 NA's : 322
## contbr_occupation election_tp donor_id
## RETIRED : 97865 : 612 Min. : 1
## NOT EMPLOYED : 47972 G2016:267065 1st Qu.: 3806
## ATTORNEY : 26362 O2016: 237 Median : 13068
## INFORMATION REQUESTED: 16912 P2015: 1 Mean : 23932
## TEACHER : 15066 P2016:372578 3rd Qu.: 33597
## (Other) :436244 P2020: 3 Max. :121922
## NA's : 75
## state.x fips census_area state.y
## NY:640496 36061 :203283 Min. : 22.83 New York:637805
## 36047 : 86425 1st Qu.: 22.83 Alabama : 0
## 36119 : 51189 Median : 108.53 Alaska : 0
## 36103 : 39468 Mean : 329.40 Arizona : 0
## 36081 : 35771 3rd Qu.: 603.83 Arkansas: 0
## (Other):221669 Max. :2680.38 (Other) : 0
## NA's : 2691 NA's :2691 NA's : 2691
## geometry
## MULTIPOLYGON :637805
## NA LL : 2691
## epsg:NA : 0
## +proj=aeqd...: 0
As expected, the candidates who received the greatest number of donations were the major candidates of each party, and it is unsurprising that the candidate who won the popular vote in the general election received the greatest number of donations of all. However, two things stand out: the candidate who went on to win the presidency received only the third greatest number of donations, and the two candidates who received the most donations were both Democrats.
The Democratic Party received the greatest number of donations, followed by the Republican Party, and then the Green and Libertarian parties.
Percentage of the donation count in New York state that went to each party:
## D G L R
## 88.75 0.16 0.12 10.97
Overall, donation amounts are on the smaller side, judging by the quartiles of the whole dataset: half of all donations are $27 or less.
summary(ny$amount)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01 15.00 27.00 144.74 100.00 2700.00
And 95% are $500 or less.
quantile(ny$amount, .95)
## 95%
## 500
The five most frequent donation amounts were $25, $50, $100, $10, and $5, in that order. The list below shows how many times each of these amounts was donated.
## # A tibble: 5 x 2
## amount count
## <dbl> <int>
## 1 25 94308
## 2 50 72644
## 3 100 67365
## 4 10 55253
## 5 5 42681
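A minimal base-R sketch of this frequency count, assuming the amounts are in a numeric vector such as `ny$amount`. The toy vector is hypothetical, and the tibble above suggests a dplyr pipeline produced the output, so this is one equivalent approach rather than the original code.

```r
# Hypothetical toy stand-in for ny$amount
amount <- c(rep(25, 6), rep(50, 5), rep(100, 4), rep(10, 3), rep(5, 2), 19.29, 250)

# Count how many times each distinct amount was donated
amount_counts <- table(amount)

# Keep the five most frequent amounts, most frequent first
top5 <- head(sort(amount_counts, decreasing = TRUE), 5)
top5_df <- data.frame(amount = as.numeric(names(top5)),
                      count  = as.integer(top5))
print(top5_df)
```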
At the opposite end from these popular amounts were amounts donated by only one person. Interestingly, none of these single-instance amounts were whole dollars; all of them included cents. The list below shows a small random sample of amounts that include cents, together with how many times each was donated.
## # A tibble: 3 x 2
## amount count
## <dbl> <int>
## 1 19.29 1
## 2 35.80 30
## 3 23.97 1
It must be said that there were also plenty of non-round amounts (dollars plus cents) that were donated more than once.
There were also whole-dollar amounts that nobody donated, but these were all above $250. More specifically, every whole-dollar amount below $268 was donated at least twice.
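The two checks above (amounts donated exactly once, and the first whole-dollar amount donated fewer than two times) can be sketched in base R as follows. The toy vector is hypothetical, and the $2,700 upper bound is taken from the maximum amount in the data.

```r
# Hypothetical toy stand-in for ny$amount
amount <- c(1:10, 1:10, 12:20, 12:20, 19.29, 23.97, 35.80, 35.80)

counts <- table(amount)

# Amounts that appear exactly once in the data
single_amounts <- as.numeric(names(counts)[counts == 1])

# TRUE if every single-instance amount includes cents
all_have_cents <- all(single_amounts != round(single_amounts))

# times[w] = number of donations of exactly $w, for whole dollars 1..2700
whole <- amount[amount == round(amount)]
times <- tabulate(whole, nbins = 2700)

# Smallest whole-dollar amount donated fewer than two times
first_rare <- which(times < 2)[1]
```

`tabulate()` avoids comparing every donation against every candidate amount, which matters with roughly 640,000 rows.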
Half of donors donated at most twice,
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 5.279 5.000 1313.000
and only 1% of donors donated more than 46 times.
## 99%
## 46
Some donors may have been on automatic recurring donations, which might explain the 1218 donors who made more than 45 donations each, up to a maximum of 413.
(*One donor made 1313 donations; I removed these entries because I believe they must be an error.)
Notice the change in scale between the two plots above.
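A sketch of the per-donor counts, including the removal of the single outlier donor, assuming a data frame with one row per donation and a `donor_id` column as in `ny`. The toy data and the names `donations` and `per_donor` are hypothetical.

```r
# Hypothetical toy data: one row per donation; donor 6 plays the role of the outlier
donations <- data.frame(donor_id = c(rep(1L, 5), rep(2L, 2), 3L, 4L, rep(5L, 3), rep(6L, 40)))

# Identify the donor with the most donations and drop all of that donor's rows
counts <- table(donations$donor_id)
outlier <- as.integer(names(counts)[which.max(counts)])
donations <- donations[donations$donor_id != outlier, , drop = FALSE]

# Number of donations per remaining donor, then its distribution
per_donor <- as.vector(table(donations$donor_id))
summary(per_donor)
quantile(per_donor, 0.99)
```

`which.max()` returns a single donor, which matches removing one clearly anomalous donor; a threshold rule would be needed if several donors were suspect.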
As expected, the donation rate tracks the election, with most donations taking place in 2016 as November approaches.
The graphs below show the number of donations across consecutive days in the two years leading up to the election.
Zooming in on 2016.
Looking at the number of donations by month.
Generally, in 2016, most donations are made at the end of the month, except in November.
Donations are distributed very similarly between males and females. While there are more donations by females (50.46% vs. 43.79%), if the donations from androgynous names ("both", 2.07%) and from names whose gender could not be determined ("undet", 3.64%) all came from men, they would close most of the gap (43.79 + 2.07 + 3.64 = 49.50%).
Percentage of each gender in the dataset.
## both family female male undet
## 2.07 0.04 50.46 43.79 3.64
There are 1456 unique cities in the dataset. Half of them were home to 53 donations or less,
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 13.0 53.0 439.9 210.5 204570.0
with 79 being home to a single donation,
## [1] 79
and 15 cities being the origin of donation counts greater than 3033.
## [1] 15
The following plot shows which cities were the origin of the greatest number of donations.
As expected, these correspond to densely populated areas, so the number of donations matches what we know about the population. Although city population is not part of this dataset, this suggests it would be a relevant variable to include.
Three quarters of zip codes were home to 120 donations or fewer, and half to 25 or fewer.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 5.0 25.0 158.5 120.0 9251.0
The zip codes home to the greatest number of donations are all in the New York City area (New York and Brooklyn), which is confirmed by the distribution of cities.
The count of donations by zip code is corroborated by the count of donations by county. Again, the densely populated southeast of the state is the origin of the highest count of donations.
Most counties have a small population, with the distribution of counties by population in the state heavily skewed to the right.
The map below shows the population of the counties plotted above, and we can already see an overlap between the regions home to the most donations and the most densely populated regions. While this is not surprising, it serves as reassurance and a bit of a sanity check.
Out of the total donations, 589708 were made by repeat donors, which represents 92% of all donations; repeat donors make up 58% of all donors.
Most repeat donors give only a few times: half of them donate 4 times or fewer, and three out of four donate 9 times or fewer.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.000 4.000 8.419 9.000 1313.000
Only one in ten repeat donors gives 19 times or more.
## 90%
## 19
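The repeat-donor figures above can be reproduced with a few vectorized operations. This is a sketch on a hypothetical toy vector of donor IDs (one entry per donation); on the real data, `donor_id` would be `ny$donor_id`.

```r
# Hypothetical toy donor IDs, one entry per donation
donor_id <- c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 5L, 5L, 5L)

per_donor <- table(donor_id)
counts <- as.vector(per_donor)

# Donors who gave more than once
repeat_ids <- as.integer(names(per_donor)[counts >= 2])

# Fraction of all donations made by repeat donors
share_of_donations <- mean(donor_id %in% repeat_ids)

# Fraction of all donors who gave more than once
share_of_donors <- mean(counts >= 2)

# Distribution of donation counts among repeat donors only
summary(counts[counts >= 2])
quantile(counts[counts >= 2], 0.90)
```

The two shares answer different questions: in the toy data, 60% of donors repeat, but they account for 80% of donations, the same kind of gap as the 58% vs. 92% above.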
The main features are candidate and donation amount. I would like to describe what led to the total raised by each candidate.
Other features are party, date (in its various sizes), gender, and geography, as they all could interact and affect how much money gets to each candidate. They are also intrinsically interesting.
For this dataset, I log-transformed the donation count and the donation amount for each candidate, since this transformation reveals a correlation pattern.
Democrats received several times more donations than any other party, driven by two main candidates: Clinton and Sanders. The party with the next greatest number of donations was the Republican party, which received only about 1/8 as many as the Democrats. The other two parties combined received less than 0.3% of all donations.
The candidates who received the greatest number of donations turned out to be the major candidates of each party. With hindsight, two things stand out:
The candidate who went on to be elected president received only the third greatest number of donations, a little more than a fifth of those received by the candidate immediately above him.
The two candidates who received the most contributions were both from the Democratic party.
Most donations are small: 75% were $100 or less, a fraction of the $2700 maximum allowed, and half of all donations were $27 or less. The five most frequent donation amounts were $25, $50, $100, $10, and $5, in that order. The amounts donated only once were all non-whole-dollar amounts that included cents.
Most donors contributed very few times: 75% gave five times or fewer, and half of donors gave once or twice. Of those who donated more than once, half did so 4 times or fewer, and three out of four donated 9 times or fewer.
Donations are split roughly evenly by gender, with women making slightly more donations than men.
The pace of donations follows the approach of the election: most donations take place in 2016 as November approaches. Generally, in 2016, most donations are made at the end of the month, except in November.
Most donations come from the southeast corner of the state, whether we look at zip code or county. Other important centers align with the cities that are the source of large numbers of donations: those in the New York City area, Rochester, Buffalo, Albany, Ithaca, Astoria, and Syracuse.
There are 1456 unique cities in the dataset. Half of them were home to 53 donations or less, with 79 being home to a single donation, and 15 cities being the origin of donation counts greater than 3033.
As expected, these correspond to densely populated areas, so the number of donations matches what we know about the population. Although city population is not part of this dataset, this suggests it would be a relevant variable to include.
Three quarters of zip codes were home to 120 donations or fewer, and half to 25 or fewer. The zip codes home to the greatest number of donations are all in the New York City area (New York and Brooklyn), which is confirmed by the distribution of cities.
The count of donations by zip code is corroborated by the count of donations by county. Again, the densely populated southeast of the state is the origin of the highest count of donations.
Most counties have a small population, and a few have a very large one: the distribution of county population is heavily skewed to the right.
As we saw, 75% of all donations in the data are $100 or less. We also saw that parties and candidates varied in the share of donations they received. In this section, I explore whether small donations are spread evenly across parties and candidates, or whether specific candidates or parties drive this pattern. Finally, we will look at whether the gender or geographic location of the donor, or the date of the donation, was related to the amount contributed.
Small donations were not a trait of all parties.
Distribution of amounts for the dataset.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01 15.00 27.00 144.74 100.00 2700.00
While over half (53%) of the Democratic party’s donations fell in the two lower quartiles, the opposite is true of the Republican party: 73% of its donations fell in the two upper quartiles.
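One way to compute these shares, sketched in base R: cut every donation at the quartile breaks of the whole dataset, then take row-wise proportions of a party-by-quartile table. The toy data frame is hypothetical; on the real data the breaks would come from `ny$amount`.

```r
# Hypothetical toy data standing in for ny$amount and ny$party
ny_toy <- data.frame(
  amount = c(5, 10, 20, 25, 30, 50, 100, 250, 500, 1000, 2700, 15),
  party  = c("D", "D", "D", "D", "D", "D", "R", "R", "R", "R", "R", "D")
)

# Quartile breaks computed on the whole dataset, not within each party
breaks <- quantile(ny_toy$amount, probs = seq(0, 1, 0.25))
ny_toy$quartile <- cut(ny_toy$amount, breaks = breaks, include.lowest = TRUE,
                       labels = c("Q1", "Q2", "Q3", "Q4"))

# Row-wise proportions: share of each party's donations in each quartile
share <- prop.table(table(ny_toy$party, ny_toy$quartile), margin = 1)
print(round(100 * share, 1))
```

Computing the breaks on the pooled data is what makes the party rows comparable; per-party quartiles would, by construction, put 25% of every party in each bucket.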
The figure below shows what share of each party's donations fell in each quartile of the whole dataset.
The following boxplots show that the Democratic party received smaller donations than the Republican, in turn smaller than the Green and the Libertarian.
Since the donation amounts tend to concentrate on the low values, I performed a log10 transformation in order to more easily detect any differences.
I performed a one-way ANOVA to determine whether there was any significant difference between the amounts donated to each party.
## Df Sum Sq Mean Sq F value Pr(>F)
## party 3 1.001e+09 333549905 1841 <2e-16 ***
## Residuals 640492 1.161e+11 181226
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A Tukey test confirmed that donations to the Democratic party were significantly smaller than those to any other party. Republican and Green donations came next, with no significant difference between them, and Libertarian donations were the largest.
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = amount ~ party, data = ny)
##
## $party
## diff lwr upr p adj
## G-D 127.632115 93.000063 162.26417 0.0000000
## L-D 182.528244 143.342316 221.71417 0.0000000
## R-D 124.262288 119.888715 128.63586 0.0000000
## L-G 54.896129 2.639954 107.15230 0.0350928
## R-G -3.369827 -38.216620 31.47697 0.9946183
## R-L -58.265955 -97.641798 -18.89011 0.0008285
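A minimal sketch of the ANOVA and Tukey comparison with base R's `aov()` and `TukeyHSD()`, run on simulated data. The groups A, B and C and the log-normal amounts are hypothetical stand-ins for parties and donations. Given the skew noted above, fitting the model on `log10(amount)` is a reasonable variant to try; the output above comes from the untransformed amounts.

```r
set.seed(1)
# Hypothetical simulated donations; group "B" is drawn with larger amounts on purpose
toy <- data.frame(
  party  = factor(rep(c("A", "B", "C"), each = 50)),
  amount = c(rlnorm(50, meanlog = 3), rlnorm(50, meanlog = 4), rlnorm(50, meanlog = 3))
)

# One-way ANOVA of donation amount by group
fit <- aov(amount ~ party, data = toy)
summary(fit)

# Pairwise differences in means with family-wise 95% confidence intervals
tukey <- TukeyHSD(fit, "party", conf.level = 0.95)
print(tukey)
```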
Small donations are not a trait of all candidates either. In fact, it seems that top contenders received smaller contributions than less competitive candidates, and this is true of both major parties. However, given that the Democratic party had a greater share of small contributions than the Republican party, it is no surprise that its top candidates had the greatest share of small contributions.
While the top Democratic contenders (Clinton and Sanders) received the most donations, their donations tended to be smaller than those to the Republican frontrunners (Trump and Cruz).
The following graph shows the distribution of donations amount for all candidates.
Notice that the distributions for most candidates were skewed to the right, consistent with the finding that most people tended to give towards the lower rather than the upper limits.
Among the candidates who received 10,000 donations or more, three out of four donations to Sanders were $50 or less; to Clinton and Cruz, $100 or less; and to Trump, $160 or less.
## # A tibble: 4 x 7
## # Groups: candidate [4]
## candidate party median seventyfive_qtle ninetyfive_qtle count
## <fctr> <fctr> <dbl> <dbl> <dbl> <int>
## 1 Sanders D 27.00 50 106.88 173351
## 2 Clinton D 30.00 100 750.00 394612
## 3 Cruz R 40.00 100 250.00 16060
## 4 Trump R 65.72 160 500.00 35743
## # ... with 1 more variables: total_dollars <dbl>
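The per-candidate statistics in the table above can be approximated in base R by splitting amounts by candidate and applying a small summary function. This is a sketch on a hypothetical two-candidate toy data frame; the column names mirror the table above, but the helper `candidate_stats` is an illustrative name, not part of the original analysis.

```r
# Hypothetical toy data: one row per donation
toy <- data.frame(
  candidate = c(rep("Alpha", 6), rep("Beta", 4)),
  amount    = c(10, 20, 25, 30, 50, 100, 100, 250, 500, 1000)
)

# Summary statistics for one candidate's donation amounts
candidate_stats <- function(x) {
  c(median           = median(x),
    seventyfive_qtle = unname(quantile(x, 0.75)),
    ninetyfive_qtle  = unname(quantile(x, 0.95)),
    count            = length(x),
    total_dollars    = sum(x))
}

# Apply to each candidate and stack the results into one data frame
by_candidate <- as.data.frame(
  do.call(rbind, lapply(split(toy$amount, toy$candidate), candidate_stats))
)
print(by_candidate)
```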
The candidates who received smaller donations also received the greatest number of donations.
A simple linear model of log10 of total raised against log10 of the 75th percentile donation (the amount that 75% of a candidate's donations do not exceed) suggests a weak negative linear relationship between the two (R² = 0.17, p = 0.04).
##
## Call:
## lm(formula = log10(total_dollars) ~ log10(seventyfive_qtle),
## data = ny.candidate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.3194 -0.5267 -0.2108 0.6176 1.7896
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.4138 0.8914 8.317 2.19e-08 ***
## log10(seventyfive_qtle) -0.6935 0.3195 -2.170 0.0405 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8749 on 23 degrees of freedom
## Multiple R-squared: 0.17, Adjusted R-squared: 0.1339
## F-statistic: 4.711 on 1 and 23 DF, p-value: 0.04054
Receiving a large number of donations was enough not only to offset smaller donation sizes but to surpass, in total amount raised, the candidates whose contributors gave more per donation.
A linear model of log10 donation count against log10 of the 75th percentile donation suggests, as expected, a negative linear relationship (R² = 0.47, p < 0.001).
##
## Call:
## lm(formula = log10(count) ~ log10(seventyfive_qtle), data = ny.candidate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.46732 -0.60708 -0.09714 0.66092 1.68266
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.8173 0.9041 7.54 1.17e-07 ***
## log10(seventyfive_qtle) -1.4519 0.3241 -4.48 0.00017 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8874 on 23 degrees of freedom
## Multiple R-squared: 0.466, Adjusted R-squared: 0.4428
## F-statistic: 20.07 on 1 and 23 DF, p-value: 0.0001701
As a result, the candidates who received the most contributions also accumulated the greatest totals.
A significant linear model suggests a strong positive linear relationship between log10 of donation count and log10 of total money raised (R² = 0.89).
##
## Call:
## lm(formula = log10(count) ~ log10(total_dollars), data = ny.candidate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.73294 -0.26986 0.00741 0.27933 0.71447
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.73528 0.48959 -7.629 9.58e-08 ***
## log10(total_dollars) 1.19287 0.08753 13.627 1.68e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4031 on 23 degrees of freedom
## Multiple R-squared: 0.8898, Adjusted R-squared: 0.885
## F-statistic: 185.7 on 1 and 23 DF, p-value: 1.678e-12
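The three models above are ordinary least-squares fits on log10-transformed, per-candidate values. A minimal sketch with base R's `lm()`, using a hypothetical data frame `cand` with made-up values in place of `ny.candidate`:

```r
# Hypothetical per-candidate values (made up, not the real totals)
cand <- data.frame(
  count         = c(1e5, 5e4, 1e4, 5e3, 1e3, 500, 100, 50),
  total_dollars = c(2e7, 8e6, 3e6, 1e6, 4e5, 1e5, 5e4, 1e4)
)

# Regress log10 of donation count on log10 of total raised
fit <- lm(log10(count) ~ log10(total_dollars), data = cand)
print(summary(fit))

slope     <- unname(coef(fit)[2])  # change in log10(count) per unit change in log10(total)
r_squared <- summary(fit)$r.squared
```

On the log10 scale, a slope b means a tenfold increase in total raised goes with a 10^b-fold increase in count; with the fitted slope of 1.19 above, that is roughly a 15.6-fold increase.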
The following two plots convey one of the main findings of this analysis by putting three important variables together: donation size, number of donations received, and total amount raised. Top raisers totaled the most money despite receiving smaller individual contributions, because they received a very large number of them.
In the next plot, the small size of donations is reflected in the value of each candidate's 75th percentile (although this holds for other quantiles too, as we will see later). Small contributors make an enormous difference by the sheer power of their numbers, despite their small individual effect. Note that this pattern persists across candidates of all parties.
The following plot confirms this, with a cosmetic change for visualization: size is encoded in the font of the candidate names themselves.
The following plot puts together total raised, donation size, and number of donations received, with candidates ranked from highest to lowest total. It shows that the candidates who raised the most also had the smallest 75th percentile donation: marker size increases as total raised decreases. Conversely, number of donations and total raised go hand in hand: higher brightness values mark the candidates who received the most money.
Looking at all donations over time is impractical, as the following plot shows. However, it helps identify donation amounts that remained popular over time. Beyond that, the plot only confirms that donations increase over time.
However, if we look at amounts in buckets, trends start to appear.
Very early on, there are few donations but they tend to be large. As 2016 progresses, smaller amounts begin to make up a larger share of all donations.
(*There were 44 donations made in 2013 and 2014 to Rubio, Paul, Webb, and Cruz. They added up to $47,701.60, with a median of $500 and a mean of $1084.)
Broken down by month for each of the last two years leading up to the election.
Broken down by day within each month of the last two years leading up to the election.
If we break down donation amounts over time by party, we see that the two major parties show different trends. The Democratic party's greatest peak in donations came shortly before the election, whereas the Republican party's peak was around its convention. At its peak, the bulk of contributions to the Republican party fell in the upper quartile (contributions greater than $100), and towards the end most were either in the second quartile ($15 to $27) or the upper quartile (greater than $100). The opposite is true of the Democratic party, whose pattern did not change much over time and which at its peak received most of its contributions from the lower three quartiles (donations of $100 or less).
It is also interesting to see a rise in donations to the Democrats (and candidate Clinton) after each of the debates.
For Democrats, number of donations and their size go upward as 2015 progresses, with peaks within months that do not seem to follow any particular pattern. In five out of the nine months with donations, these tend to increase towards the end of the month.
There were considerably fewer donations to the Republican party in 2015 than to the Democratic party. But as with the Democratic party, the month-by-month breakdown does not show any particularly salient trends in 2015.
The only salient pattern in donations by month in 2016 to the Democratic party is that the end of the month is a good time for donations. This pattern was present in a few months of 2015, although there were fewer donations then.
2016 donations to the Republican party do not follow the end-of-month surge that we see in Democratic donations; instead, they seem flat except for spurts between mid-June and mid-August. I hypothesize that this might be related to patriotic feelings around the Fourth of July celebrations. It must be noted, however, that donations on July 4th itself are quite small. Note also the difference in scale for 2016 between the Democratic and Republican party donations.
The amount given per donation differs only modestly between females and males across the whole dataset. Donations by females tend to be slightly smaller (lower quartiles, lower minimum), but do not seem substantially different from those by males.
Females:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01 15.00 27.00 125.31 75.00 2700.00
Males:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.04 19.00 35.00 167.24 100.00 2700.00
An independent two-group (Welch) t-test shows that, with a sample this large, even this modest difference is statistically significant, both for the raw amounts
##
## Welch Two Sample t-test
##
## data: female$amount and male$amount
## t = -37.537, df = 542260, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -44.12054 -39.74171
## sample estimates:
## mean of x mean of y
## 125.3088 167.2399
and for the log10-transformed amounts.
##
## Welch Two Sample t-test
##
## data: log10(female$amount) and log10(male$amount)
## t = -54.033, df = 582450, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.09054528 -0.08420646
## sample estimates:
## mean of x mean of y
## 1.533923 1.621299
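Both tests above are Welch two-sample t-tests, which is what base R's `t.test()` runs by default (`var.equal = FALSE`). A sketch on simulated log-normal amounts; `female_amount` and `male_amount` are hypothetical stand-ins for `female$amount` and `male$amount`. The last line converts the difference in mean log10 amounts into a ratio of geometric means, one way to judge whether a significant difference is also a large one.

```r
set.seed(42)
# Hypothetical simulated amounts; the male group is drawn slightly larger on purpose
female_amount <- rlnorm(200, meanlog = 3.5, sdlog = 1)
male_amount   <- rlnorm(200, meanlog = 3.7, sdlog = 1)

# Welch two-sample t-tests on the raw and log10-transformed amounts
raw_test <- t.test(female_amount, male_amount)
log_test <- t.test(log10(female_amount), log10(male_amount))

print(raw_test$conf.int)
print(log_test$p.value)

# Ratio of geometric means (male / female) implied by the log10 means
ratio_geo <- unname(10^(log_test$estimate[2] - log_test$estimate[1]))
```

Applied to the log10 means reported above, 10^(1.621299 - 1.533923) ≈ 1.22, so the geometric mean of male donations is roughly 22% higher than that of female donations.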
There is no apparent relationship between geography and donation amount. Large and small donations seem to originate from all locations in proportion to the number of donations from each.
By city: looking at three different grain sizes, there does not seem to be any relationship between the number of donations coming from a city and the size of those donations.
By zip code: donation amounts come from all zip codes in proportion to the number of donations each is the source of.
Somewhat unexpectedly, geography makes no difference to the amount donated to each of the two major parties. Both Republicans and Democrats received larger total sums from densely populated areas and smaller ones from more rural areas.
The relationship between county population and donation size is tenuous. The following exploratory plots reveal no apparent trend between county population and either the average or the 75th percentile donation amount.
It is important to remember that the population distribution is skewed, with most counties having smaller population sizes.
However, when we examine the distribution of donation amount by population size bins, we see that, regardless of the size of the county, the distribution of the amount given per donation is very similar across all population sizes.
The plots below represent population in bins of 100.
Most donations to the Democratic party came from women
## both family female male undet
## 2.17 0.04 52.76 41.29 3.74
whereas most donations to the Republican party came from men.
## both family female male undet
## 1.27 0.03 32.62 63.25 2.83
The Green and Libertarian parties also received most of their donations from men.
## [1] "Libertarian"
## both family female male undet
## 0.77 0.00 7.05 89.10 3.08
## [1] "Green"
## both family female male undet
## 1.00 0.10 31.53 64.66 2.70
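Gender percentages per party like the ones above can be obtained by taking row-wise proportions of a party-by-gender table. A sketch on a hypothetical toy data frame:

```r
# Hypothetical toy data: one row per donation
toy <- data.frame(
  party  = c("D", "D", "D", "D", "R", "R", "R", "R", "R"),
  gender = c("female", "female", "male", "undet", "male", "male", "male", "female", "both")
)

# Percentage of each party's donations by donor gender (each row sums to 100)
gender_by_party <- round(100 * prop.table(table(toy$party, toy$gender), margin = 1), 2)
print(gender_by_party)
```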
Among the major Democratic candidates, Sanders received slightly over half of his donations from males (51%) and 43% from females, so the proportions of the two genders are fairly similar. Clinton, on the other hand, received more donations from women than from men (57% vs. 37%), a larger gap.
## [1] "Sanders:"
## both family female male undet
## 2.24 0.12 42.81 50.96 3.87
## [1] "Clinton:"
## both family female male undet
## 2.14 0.00 57.16 37.00 3.69
Republican party candidates received a majority of their donations from men, across all candidates. This is clearly the case for the two leading candidates, Cruz and Trump.
## [1] "Cruz:"
## both family female male undet
## 1.28 0.00 29.76 66.28 2.68
## [1] "Trump:"
## both family female male undet
## 1.42 0.06 31.83 63.23 3.46
For the Libertarian and Green party candidates, men were also the main contributors, despite the fact that the Green candidate was a woman.
## [1] "Stein:"
## both family female male undet
## 1.00 0.10 31.53 64.66 2.70
## [1] "Johnson"
## both family female male undet
## 0.77 0.00 7.05 89.10 3.08
We know that the gender distribution of the whole dataset is slightly more female than male. However, of all candidates, only Clinton received a majority of her contributions from females. This shows how heavily Clinton's donations, over 60% of all donations in the dataset, weigh on the overall distribution.
Men seem to have gotten a head start in donations in 2015, but women donated more shortly after the start of 2016 and made a large difference as election day approached.
If we look at gender over time by party we find that the trend of the overall data is very similar to the data exclusively for the Democratic party, which is consistent with the fact that the majority of donations were to them.
Another finding is that females and males behave similarly within party lines, even if there are differences across parties. Most donations for Democrats come from women, whereas most donations for Republicans, Greens, and Libertarians come from men; yet both genders seem to follow a similar pattern in the rate of their donations within their party, except for donors to the Libertarian party.
If we break down the number of donations over time by donation amount, the pattern persists. Males and females within each party tend to make donations of similar size at similar times, the only difference being that one gender donates more than the other.
There does not seem to be a difference in the amount of donations based on the population size of the county where they originate.
Most counties simply have a roughly even proportion of male and female donors. Some, though not all, of the less populated counties show a skew towards donors of one gender. Greater variability among less populated counties is unsurprising, since there are more of them and each is the source of fewer donations, which makes their proportions noisier.
Most donations in the state were made to a candidate of the Democratic party. Despite this, we can see differences in some areas.
The following maps show what proportion of the total contributions from that county went to each of the four parties.
The bar plot below displays the same content, with counties sorted by count of donations.
Being the source of the most donations to a party does not mean that most of a county's donations went to that party. New York County, for example, was home to the largest number of donations to every party, yet it does not lead any party in share of donations.
When we look at the share of donations we can see some interesting patterns, such as that Tompkins county has the second greatest share of its donations going to the Democratic party, even though it trails in actual number of donations. Also, counties with
Even though New York was home to the largest number of donations to the Republican party, it was the third-to-last county in share of donations of its total. More rural counties were home to fewer donations to the Republican party, but they were home to fewer donations in general, so their leaning is revealed when we look at share of donations.
Some counties made no donations at all to the Libertarian or Green parties, and even in the counties where these parties' contributors reside, their share of donations is very small.
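The difference between leading a party in count and leading it in share can be made concrete with a small computation: count donations per (county, party), divide by each county's total, and rank counties by either metric. This is a minimal sketch on made-up data; the county names `Big` and `Small` and the record fields `county` and `party` are hypothetical.

```python
from collections import Counter

def county_party_stats(donations):
    """Return (count, share) dicts keyed by (county, party).

    share = donations to that party from the county / all donations from the county.
    """
    count = Counter((d["county"], d["party"]) for d in donations)
    county_total = Counter(d["county"] for d in donations)
    share = {key: n / county_total[key[0]] for key, n in count.items()}
    return count, share

def leader(metric, party):
    """County with the highest value of `metric` (count or share) for `party`."""
    rows = {county: v for (county, p), v in metric.items() if p == party}
    return max(rows, key=rows.get)

# Toy data, illustrative only: a large urban county and a small rural one.
donations = (
    [{"county": "Big", "party": "REP"}] * 30
    + [{"county": "Big", "party": "DEM"}] * 270
    + [{"county": "Small", "party": "REP"}] * 12
    + [{"county": "Small", "party": "DEM"}] * 8
)
count, share = county_party_stats(donations)
```

Here `Big` leads the Republican count (30 vs 12) while `Small` leads the Republican share (0.6 vs 0.1), the same kind of reversal described above for New York County and the more rural counties.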
The exploration of single donation amounts per candidate refines the earlier party-level findings. Within the Democratic and Republican parties, candidates varied in what proportion of their contributions were small or large. Green and Libertarian contributions each went to a single candidate.
For example, 75% of donations to the Democratic Party were $90 or less. While 75% of donations to the frontrunners, Sanders and Clinton, were up to $50 and $100 respectively, for the other three candidates 75% of donations were up to $1000. However, these other candidates accounted for less than 0.1% of the party's contributions (a proportion of about 0.00087), which explains their negligible impact on the party's statistics.
Summary of donation size for minor Democratic candidates.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.0 100.0 250.0 730.9 1000.0 2700.0
Share of Democratic contributions taken by minor Democratic candidates (as a proportion).
## [1] 0.0008725343
Contributions to Clinton and Sanders made up 99.91% of all contributions to the Democratic Party in NY.
## [1] 99.91
For the Republican party, 75% of donations were $150 or less. While the most contributed-to candidates, Trump and Cruz, received 75% of their donations in amounts up to $160 and $100 respectively, for the other 16 candidates 75% of donations were up to $500. These other 16 candidates, however, make up 26% of the contributions the Republican Party received in NY.
Summary of donation size for minor Republican candidates.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 25.0 100.0 563.5 500.0 2700.0
Share of Republican contributions taken by minor Republican candidates (as a proportion).
## [1] 0.2626747
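The per-candidate figures above can be reproduced in outline by computing the 75th percentile of donation amounts per candidate and for the whole party, plus the share of the party's donations that goes to non-frontrunners. This is a sketch under assumptions: the record fields and candidate names are hypothetical, and the share is measured by number of donations (summing `amount` instead would measure it in dollars). `statistics.quantiles` with `method="inclusive"` interpolates linearly, the same rule as R's default `quantile()` (type 7) used by `summary()`.

```python
import statistics
from collections import defaultdict

def q3(amounts):
    """75th percentile with linear interpolation (same rule as R's default type 7)."""
    return statistics.quantiles(amounts, n=4, method="inclusive")[2]

def party_breakdown(donations, party, frontrunners):
    """75th percentile per candidate and for the whole party, plus minor candidates' share.

    Share is measured by number of donations here (an assumption).
    """
    by_candidate = defaultdict(list)
    for d in donations:
        if d["party"] == party:
            by_candidate[d["candidate"]].append(d["amount"])
    all_amounts = [a for amts in by_candidate.values() for a in amts]
    minor = [a for c, amts in by_candidate.items() if c not in frontrunners for a in amts]
    return {
        "q3_by_candidate": {c: q3(a) for c, a in by_candidate.items()},
        "q3_party": q3(all_amounts),
        "minor_share": len(minor) / len(all_amounts),
    }

# Toy data: frontrunner "A" gets 100 small donations, minor candidate "C" gets 2 large ones.
donations = (
    [{"party": "DEM", "candidate": "A", "amount": a} for a in [10, 20, 30, 40] * 25]
    + [{"party": "DEM", "candidate": "C", "amount": a} for a in [500, 1000]]
)
result = party_breakdown(donations, "DEM", frontrunners={"A"})
```

In the toy data the minor candidate's 75th percentile is 875, yet the party-level value (40) stays close to the frontrunner's (32.5), because the minor candidate contributes only about 2% of the donations.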
The most interesting finding of this exploration is the positive correlation between total amount raised and number of donations received, together with the negative correlation between total amount raised and donation size. At a time when people express hopelessness about whether individuals of regular means have any power over an electoral process with many rich donors, this finding can be very refreshing. The limitation of this analysis, of course, is that it only looks at donations from individuals, which is important to keep in mind.
The candidates who received the largest total amount of money also tended to receive smaller donations. There was a negative correlation between the total money received and the log10 of the 75th percentile of donation amount: r=-.33, p=.10, which becomes stronger and statistically significant if we log10-transform the total money received: r=-.41, p=.04. In summary, the more total money a candidate raised, the smaller the donations they received tended to be.
What offsets this is that these candidates also received many more donations, even if small, than those whose donors gave more per donation. There was a negative correlation between the count of donations received and the 75th percentile of donation amount: r=-.44, p=.03, and with the count of donations log10-transformed, the strength and significance of the correlation increase: r=-.68, p<.001. The larger the count of donations received, the smaller they tended to be.
The encouraging conclusion is that small donations were so numerous that the count of donations is strongly and positively correlated with the total amount raised: r=.95, p<.001; after log10-transforming both variables, the relationship still holds: r=.94, p<.001.
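The correlations above are Pearson coefficients computed on raw and log10-transformed per-candidate values. The sketch below shows that computation and the t statistic used to test r against zero; the per-candidate counts and totals in it are hypothetical placeholders, not the NY data. If the analysis covers 25 candidates (5 Democratic, 18 Republican, one Green and one Libertarian, counting from the figures above), r=-.41 gives t ≈ -2.16 on 23 degrees of freedom, consistent with the reported p=.04. In practice, `scipy.stats.pearsonr` returns r together with its two-sided p-value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical per-candidate values, heavily right-skewed like donation data.
counts = [12, 150, 3400, 98000]          # number of donations received
totals = [9000, 40000, 510000, 6100000]  # total dollars raised

r_raw = pearson_r(counts, totals)
r_log = pearson_r([math.log10(v) for v in counts], [math.log10(v) for v in totals])
```

The log10 transform keeps one or two very large candidates from dominating the coefficient, which is why the report checks each relationship both ways.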
For that finding, I think two plots are crucial.
First, with candidates ranked according to total money raised and names clearly indicated on the x axis, we can see the relationships mentioned.
The next plot conveys the same information, plus party. I think it is very important to note that the pattern observed applies regardless of party affiliation.
Another important finding is that donors pattern along party lines rather than along gender. Most donations for Democrats come from women, whereas most donations for Republicans, Greens, and Libertarians come from men; yet for all parties both genders follow a similar pattern of donations over time.
Donors’ behavior over time shows that debate and national convention days caused peaks for those contributing to the Republican candidate, peaks that never repeated and in fact flattened towards the end. The opposite is true for donors to the Democratic candidate, who picked up steam as the race came to an end.