Name: Michael Zhang
Contact Info: 832-726-2026
Date: 6/3/2024
A minority of large donors have always had a disproportionate influence on the Houston elections. For example, the current mayor of Houston, John Whitmire, held a huge advantage during the recent mayoral race with his “war chest” campaign funding. With $9 million dollars from donations, his funding pool was 30 times the size of any of his competitors in the Houston mayoral race.
This trend doesn’t end with Whitmire either. Various incumbent Houston city council members, judges, and elected constables all have records showing large donations from mysterious, and often out-of-state donors. Yet little information exists online about who these donors are, and more importantly, what their motivations are to funding local Houston politicians.
Records: The data for this project comes from The City of Houston Open Data, which is updated and maintained by the City Secretary’s Office. The excel sheet lists a detailed history of campaign finance transactions, containing candidates, transaction amounts, and donor information from 2007-2019.
Using the records, we can hopefully shed some more light on the background and motivation of these local “big hitter” campaign donors. While the sheet has lacked sufficient upkeep, with its most recent update being in 2019, many listed in the document are currently in some kind of local political office. It would be interesting to see an expansive story on where these city politicans get all their campaign money, and how it could influence their policy decisions.
Non-Houstonian Donors often contributed large sums of money to the current political incumbent campaings. While some could be just friends of the politicians, a significant portion of the donors held high corporate positions or were some kind of investor.
There were also a large number of non Houstonian real estate developers, it would be worth looking into whether some of these incumbents were unfairly proportioning out contracts to developers which donated to their campaigns
Another story worth pursuing would be covering the lack of transparency for this data. The excel sheet I retrieved was not updated properly, and when I called the City of Houston Secretary’s Office, they didn’t even know the database listed on their website existed. Further, the website was being run by a contracted data company, with how badly this company is doing its job, it would be worthwhile to see how much the government pays this company for its badly kept data.
installing basic packages if needed
#install.packages("openxlsx")
#install.packages("tidyverse")
#install.packages("rio")
#install.packages("janitor")
#install.packages("dplyr")
#install.packages("scales")
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(rio)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(dplyr)
library(openxlsx)
library(kableExtra)
##
## Attaching package: 'kableExtra'
##
## The following object is masked from 'package:dplyr':
##
## group_rows
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
reading in the data
suppressWarnings({
show_col_types = FALSE
Campaign_data <- read_csv("campaign_finance.csv")
})
## Rows: 198231 Columns: 87
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (66): CandidateOrCommittee, CommitteeName, CandidateFirstName, Candidat...
## dbl (13): ReportID, CampaignYear, MostRecentReportVersion, 30th day before ...
## lgl (7): LoanGuarantorName, LoanGuarantorCity, LoanGuarantorState, LoanGua...
## dttm (1): ReportFiledDate
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Now, just to get a sense of the kind of transactions being tracked in this file, we do a quick count and arrange
# Perform the count and arrange operations
counted_data <- Campaign_data %>%
count(MasterTransactionType) %>%
rename(Count = n,`Transaction Type` = MasterTransactionType) %>%
arrange(desc(Count))
top_5_data <- head(counted_data, 5)
top_5_data %>%
kable(format = "html", caption = "<b style='color:black;'>Top 5 Political Transactions by Count</b>") %>%
kable_styling("striped")
Transaction Type | Count |
---|---|
Revenue | 111851 |
Expense | 72919 |
Political Contributions | 4791 |
Political Expenditures | 4509 |
Political Contributions Other Than Pledges or Loans | 3059 |
After getting a sense of the data the next step was to filter the data set for current incumbents, creating a new spreadsheet for each individual.
Sallie_Alcorn <- Campaign_data %>% filter(CandidateFullName %in% "Sallie Alcorn")
Willie_R_Davis <- Campaign_data %>% filter(CandidateFullName == "Willie R Davis")
Martha_Castex_Tatum <- Campaign_data %>% filter(CandidateFullName == "Martha Castex-Tatum")
Mr_Edward_J_Pollard <- Campaign_data %>% filter(CandidateFullName == "Mr. Edward J Pollard")
Adrian_Garcia <- Campaign_data %>% filter(CandidateFullName == "Adrian Garcia")
Dr_Carolyn_Evans_Shabazz <- Campaign_data %>% filter(CandidateFullName == "Dr. Carolyn Evans-Shabazz")
amy_peck_variants <- c("Ms. Amy Peck", "Ms. Amy L Peck", "Mrs. Amy Peck")
Amy_Peck <- Campaign_data %>% filter(CandidateFullName %in% amy_peck_variants)
edward_gonzalez_variants <- c("Mr. Edward Ed Gonzalez", "Edward Edward Gonzalez", "Edward Ed Gonzalez")
Edward_Gonzalez <- Campaign_data %>% filter(CandidateFullName %in% edward_gonzalez_variants)
From there, we can sort each of the incumbent dataframes by donation amount to see the most influential donors for each incumbent
suppressWarnings({
# Amy_Peck
Amy_Peck$TransactionAmount <- as.numeric(Amy_Peck$TransactionAmount)
Amy_Peck <- Amy_Peck %>% arrange(desc(TransactionAmount))
# Willie_R_Davis
Willie_R_Davis$TransactionAmount <- as.numeric(Willie_R_Davis$TransactionAmount)
Willie_R_Davis <- Willie_R_Davis %>% arrange(desc(TransactionAmount))
# Martha_Castex_Tatum
Martha_Castex_Tatum$TransactionAmount <- as.numeric(Martha_Castex_Tatum$TransactionAmount)
Martha_Castex_Tatum <- Martha_Castex_Tatum %>% arrange(desc(TransactionAmount))
# Mr_Edward_J_Pollard
Mr_Edward_J_Pollard$TransactionAmount <- as.numeric(Mr_Edward_J_Pollard$TransactionAmount)
Mr_Edward_J_Pollard <- Mr_Edward_J_Pollard %>% arrange(desc(TransactionAmount))
# Adrian_Garcia
Adrian_Garcia$TransactionAmount <- as.numeric(Adrian_Garcia$TransactionAmount)
Adrian_Garcia <- Adrian_Garcia %>% arrange(desc(TransactionAmount))
# Dr_Carolyn_Evans_Shabazz
Dr_Carolyn_Evans_Shabazz$TransactionAmount <- as.numeric(Dr_Carolyn_Evans_Shabazz$TransactionAmount)
Dr_Carolyn_Evans_Shabazz <- Dr_Carolyn_Evans_Shabazz %>% arrange(desc(TransactionAmount))
# Edward_Gonzalez
Edward_Gonzalez$TransactionAmount <- as.numeric(Edward_Gonzalez$TransactionAmount)
Edward_Gonzalez <- Edward_Gonzalez %>% arrange(desc(TransactionAmount))
# Sallie_Alcorn
Sallie_Alcorn$TransactionAmount <- as.numeric(Sallie_Alcorn$TransactionAmount)
Sallie_Alcorn <- Sallie_Alcorn %>% arrange(desc(TransactionAmount))
})
From there, we check that the candidate spreadsheet was properly sorted based on the transaction amount
Just as an example, we can use Sallie Alcorn’s transaction sheet to check this
result <- Sallie_Alcorn %>%
select(CampaignYear, TransactionAmount, TransactionPartyFullName, TransactionPartyEmployer, OfficeSought, TransactionAddressStreet, TransactionPartyCity, TransactionPartyOccupation, TransactionType) %>%
group_by(TransactionAmount)
result <- Sallie_Alcorn %>% filter(TransactionType %in% c("Political Contributions","Monetary Political Contributions"))
write.xlsx(result, file = "Sallie_Alcorn_Transactions.xlsx")
top_5_data <- result %>%
select(CampaignYear, TransactionAmount, TransactionPartyFullName, TransactionPartyCity, TransactionType)
top_5_data <- head(top_5_data, 10)
# Render the table
top_5_data %>%
kable(format = "html", caption = "<b style='color:black;'>Sallie Alcorn Transaction History</b>") %>%
kable_styling("striped")
CampaignYear | TransactionAmount | TransactionPartyFullName | TransactionPartyCity | TransactionType |
---|---|---|---|---|
2019 | 5000 | Mr. Scott Sparvero | Houston | Political Contributions |
2019 | 5000 | Anthony Riggins | Oakland | Political Contributions |
2019 | 5000 | Melinda Hildebrand | Houston | Political Contributions |
2019 | 5000 | Diana Hawkins | Houston | Political Contributions |
2019 | 5000 | Jeff Hildebrand | Houston | Political Contributions |
2019 | 5000 | Mrs. Rosemary Comstock | Houston | Political Contributions |
2019 | 5000 | Mrs. Virginia Elkins | Houston | Political Contributions |
2019 | 5000 | Stephen Costello | Houston | Political Contributions |
2019 | 2500 | Ms. Jan Bres | Houston | Political Contributions |
2019 | 2500 | Kate Gibson | Houston | Political Contributions |
Great, now we have seperated the dataset by incumbent, and can use that data for further analysis if needed.
To make it easier for reporters to play around with this data, we could use datasette lite. For proof of concept, feel free to load the following link in the load csv tab on datasette lite:
Going forward, if we want to look at in city/ outside city contributions, we need to create a incumbent transactions variable, and use a secondary software to clean the naming conventions
suppressWarnings({
incumbent_transactions <- read_csv("incumbent_transactions_cleaned.csv")
})
## Rows: 9512 Columns: 88
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (62): CandidateOrCommittee, ReportFiledDate, CandidateFirstName, Candida...
## dbl (16): Column, ReportID, CampaignYear, MostRecentReportVersion, 15th day ...
## lgl (10): CommitteeName, 15-Jul, BallotMeasureDescription, ExpenseCandidateB...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Let’s filter our data to only include contributions and get rid of NAs (For now at least, cleaning up the data more and figuring out what the NA means is necessary for later)
contributions_to_incumbents <- incumbent_transactions %>%
filter(transaction_type %in% c("Political Contributions", "Monetary Political Contributions")) %>%
filter(!is.na(transaction_amount))
A few candidates’ names show up as multiple values. We can use case_when() to standardize them.
contributions_to_incumbents <- contributions_to_incumbents %>%
mutate(candidate_full_name = case_when(
candidate_full_name == "Mr. Edward Ed Gonzalez" ~ "Edward Ed Gonzalez",
candidate_full_name == "Mrs. Amy Peck" ~ "Ms. Amy Peck",
candidate_full_name == "Ms. Amy L Peck" ~ "Ms. Amy Peck",
.default = candidate_full_name
))
I wanted to also take a look at which kind of groups are funding these politicans, so let’s do a quick count of that as well.
# Perform the count and arrange operations
counted_data <- contributions_to_incumbents %>%
count(TransactionPartyEmployer) %>%
arrange(desc(n))
top_5_data <- head(counted_data, 5)
# Render the table
top_5_data %>%
kable(format = "html", caption = "<b style='color:black;'>Top 5 Donor Occupations</b>") %>%
kable_styling("striped")
TransactionPartyEmployer | n |
---|---|
NA | 3231 |
Self | 402 |
Self-Employed | 112 |
City of Houston | 58 |
Information Requested | 53 |
Seems like a lot of the contributors are either self employed, or did not fill in that section of the form, regardless this data could be interesting as those listed as “self employed” are likely some kind of lobbyist or investor with some sort of business intentions.
Now back to the original analysis, let’s group all contributions by person and get the sum for each person.
contr_sums_per_cand <- contributions_to_incumbents %>%
group_by(candidate_full_name) %>%
summarise(total_contributions = sum(transaction_amount)) %>%
arrange(desc(total_contributions))
top_5_data <- head(contr_sums_per_cand, 10)
top_5_data <- top_5_data %>%
mutate(total_contributions = dollar(total_contributions))
top_5_data %>%
kable(format = "html", caption = "<b style='color:black;'>Total Incumbent Campaign Funds</b>") %>%
kable_styling("striped")
candidate_full_name | total_contributions |
---|---|
Adrian Garcia | $1,745,114 |
Edward Ed Gonzalez | $659,938 |
Dr. Carolyn Evans-Shabazz | $152,735 |
Sallie Alcorn | $145,907 |
Martha Castex-Tatum | $123,198 |
Willie R Davis | $36,125 |
Ms. Amy Peck | $14,099 |
Mr. Edward J Pollard | $11,285 |
In_state_table <- contributions_to_incumbents %>%
group_by(candidate_full_name) %>%
filter(str_detect(transaction_party_state, "TX")) %>%
summarise(In_state_contributions = sum(transaction_amount)) %>%
arrange(desc(In_state_contributions))
top_5_data <- head(In_state_table, 8)
top_5_data <- top_5_data %>%
mutate(In_state_contributions = dollar(In_state_contributions))
top_5_data %>%
kable(format = "html", caption = "<b style='color:black;'> Incumbent In-State Campaign Funds</b>") %>%
kable_styling("striped")
candidate_full_name | In_state_contributions |
---|---|
Adrian Garcia | $1,675,152 |
Edward Ed Gonzalez | $627,471 |
Sallie Alcorn | $135,177 |
Dr. Carolyn Evans-Shabazz | $128,260 |
Martha Castex-Tatum | $122,198 |
Willie R Davis | $35,575 |
Ms. Amy Peck | $12,600 |
Mr. Edward J Pollard | $10,785 |
Instate_outstate_comparison <- left_join(In_state_table, contr_sums_per_cand, by = "candidate_full_name")
Instate_outstate_comparison <- Instate_outstate_comparison %>%
mutate(in_state_percentage = In_state_contributions
/ total_contributions)
top_5_data <- head(Instate_outstate_comparison, 8)
top_5_data <- top_5_data %>%
mutate(In_state_contributions = dollar(In_state_contributions)) %>%
mutate(total_contributions = dollar(total_contributions)) %>%
mutate(in_state_percentage = percent(in_state_percentage))
top_5_data %>%
kable(format = "html", caption = "<b style='color:black;'>Incumbent Campaign Fund Breakdown</b>") %>%
kable_styling("striped")
candidate_full_name | In_state_contributions | total_contributions | in_state_percentage |
---|---|---|---|
Adrian Garcia | $1,675,152 | $1,745,114 | 95.99% |
Edward Ed Gonzalez | $627,471 | $659,938 | 95.08% |
Sallie Alcorn | $135,177 | $145,907 | 92.65% |
Dr. Carolyn Evans-Shabazz | $128,260 | $152,735 | 83.98% |
Martha Castex-Tatum | $122,198 | $123,198 | 99.19% |
Willie R Davis | $35,575 | $36,125 | 98.48% |
Ms. Amy Peck | $12,600 | $14,099 | 89.37% |
Mr. Edward J Pollard | $10,785 | $11,285 | 95.57% |
contr_sums_per_cand <- contributions_to_incumbents %>%
group_by(candidate_full_name) %>%
summarise(total_contributions = sum(transaction_amount)) %>%
arrange(desc(total_contributions))
In_city_table <- contributions_to_incumbents %>%
group_by(candidate_full_name) %>%
filter(str_detect(transaction_party_city, "Houston")) %>%
summarise(In_City_Contributions = sum(transaction_amount)) %>%
arrange(desc(In_City_Contributions))
InCity_OutCity_comparison <- left_join(In_city_table, contr_sums_per_cand, by = "candidate_full_name")
InCity_OutCity_comparison <- InCity_OutCity_comparison %>%
mutate(in_city_percentage = In_City_Contributions
/ total_contributions)
InCity_OutCity_comparison <- InCity_OutCity_comparison %>%
mutate(In_City_Contributions = dollar(In_City_Contributions)) %>%
mutate(total_contributions = dollar(total_contributions)) %>%
mutate(in_city_percentage = percent(in_city_percentage))
InCity_OutCity_comparison %>%
kable(format = "html", caption = "<b style='color:black;'>Incumbent Campaign Fund Breakdown</b>") %>%
kable_styling("striped")
candidate_full_name | In_City_Contributions | total_contributions | in_city_percentage |
---|---|---|---|
Adrian Garcia | $1,135,151 | $1,745,114 | 65.0% |
Edward Ed Gonzalez | $530,194 | $659,938 | 80.3% |
Sallie Alcorn | $131,477 | $145,907 | 90.1% |
Dr. Carolyn Evans-Shabazz | $105,710 | $152,735 | 69.2% |
Martha Castex-Tatum | $92,585 | $123,198 | 75.2% |
Willie R Davis | $27,675 | $36,125 | 76.6% |
Ms. Amy Peck | $10,025 | $14,099 | 71.1% |
Mr. Edward J Pollard | $9,235 | $11,285 | 81.8% |
now write the csv’s out
write.csv(Instate_outstate_comparison, "inState_outState_comparison.csv", row.names = FALSE)
write.csv(InCity_OutCity_comparison, "inCity_outCity_comparison.csv", row.names = FALSE)