Honeybees and Neonic Pesticides

Are Neonic pesticides connected to the decline of bees colonies?

Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly as py
import cufflinks as cf
cf.go_offline()
pd.set_option('display.max_columns', 30)

Data cleaning

Importing dataset

The data come from Kevin Zmith on Kaggle, inspired by the dataset Honey Production in the USA, extended to the period 1998-2017. Additionnaly, the data from USGS's Pesticide National Synthesis Project has been agregated, allowing evaluation of the statistical connections between Honey Production and the use of Neonicotinoid (neonic) pesticides.

In [2]:
data = pd.read_csv("./data/vHoneyNeonic_v03.csv")
In [3]:
data.head()
Out[3]:
state numcol yieldpercol totalprod stocks priceperlb prodvalue year StateName Region FIPS nCLOTHIANIDIN nIMIDACLOPRID nTHIAMETHOXAM nACETAMIPRID nTHIACLOPRID nAllNeonic
0 AL 14000.0 66 924000.0 92000.0 0.81 748000.0 1997 Alabama South 1 0.0 6704.8 0.0 0.0 0.0 6704.8
1 AL 15000.0 64 960000.0 96000.0 0.87 835000.0 1996 Alabama South 1 0.0 371.6 0.0 0.0 0.0 371.6
2 AL 16000.0 58 928000.0 28000.0 0.69 640000.0 1995 Alabama South 1 0.0 716.5 0.0 0.0 0.0 716.5
3 AL 18000.0 50 900000.0 99000.0 0.52 468000.0 1994 Alabama South 1 NaN NaN NaN NaN NaN NaN
4 AL 19000.0 45 855000.0 103000.0 0.59 504000.0 1993 Alabama South 1 NaN NaN NaN NaN NaN NaN
In [4]:
data.shape
Out[4]:
(1132, 17)
In [5]:
data.columns
Out[5]:
Index(['state', 'numcol', 'yieldpercol', 'totalprod', 'stocks', 'priceperlb',
       'prodvalue', 'year', 'StateName', 'Region', 'FIPS', 'nCLOTHIANIDIN',
       'nIMIDACLOPRID', 'nTHIAMETHOXAM', 'nACETAMIPRID', 'nTHIACLOPRID',
       'nAllNeonic'],
      dtype='object')

I convert the columns using pounds to kg

In [6]:
data.insert(loc=3, column='yieldpercol_kg', value=data["yieldpercol"]*0.45359237)
data.insert(loc=5, column='totalprod_kg', value=data["totalprod"]*0.45359237)
data.insert(loc=6, column='totalprod_to', value=data["totalprod"]*0.00045359237)
data.insert(loc=8, column='stocks_to', value=data["stocks"]*0.00045359237)
data.insert(loc=10, column='priceperkg', value=data["priceperlb"]/0.45359237)
data.insert(loc=11, column='pricepertonne', value=data["priceperlb"]/0.00045359237)
data.head()
Out[6]:
state numcol yieldpercol yieldpercol_kg totalprod totalprod_kg totalprod_to stocks stocks_to priceperlb priceperkg pricepertonne prodvalue year StateName Region FIPS nCLOTHIANIDIN nIMIDACLOPRID nTHIAMETHOXAM nACETAMIPRID nTHIACLOPRID nAllNeonic
0 AL 14000.0 66 29.937096 924000.0 419119.34988 419.119350 92000.0 41.730498 0.81 1.785744 1785.744324 748000.0 1997 Alabama South 1 0.0 6704.8 0.0 0.0 0.0 6704.8
1 AL 15000.0 64 29.029912 960000.0 435448.67520 435.448675 96000.0 43.544868 0.87 1.918022 1918.021681 835000.0 1996 Alabama South 1 0.0 371.6 0.0 0.0 0.0 371.6
2 AL 16000.0 58 26.308357 928000.0 420933.71936 420.933719 28000.0 12.700586 0.69 1.521190 1521.189609 640000.0 1995 Alabama South 1 0.0 716.5 0.0 0.0 0.0 716.5
3 AL 18000.0 50 22.679619 900000.0 408233.13300 408.233133 99000.0 44.905645 0.52 1.146404 1146.403763 468000.0 1994 Alabama South 1 NaN NaN NaN NaN NaN NaN
4 AL 19000.0 45 20.411657 855000.0 387821.47635 387.821476 103000.0 46.720014 0.59 1.300727 1300.727347 504000.0 1993 Alabama South 1 NaN NaN NaN NaN NaN NaN
In [7]:
data = data.rename(columns={"nCLOTHIANIDIN": "CLOTHIANIDIN", "nIMIDACLOPRID": "IMIDACLOPRID",
                     "nTHIAMETHOXAM": "THIAMETHOXAM", "nACETAMIPRID": "ACETAMIPRID",
                    "nTHIACLOPRID": "THIACLOPRID","nAllNeonic":"AllNeonic"})
data.to_csv('./data/vHoneyNeonic_v04.csv')
In [8]:
data.isnull().sum()
Out[8]:
state               0
numcol              0
yieldpercol         0
yieldpercol_kg      0
totalprod           0
totalprod_kg        0
totalprod_to        0
stocks              0
stocks_to           0
priceperlb          0
priceperkg          0
pricepertonne       0
prodvalue           0
year                0
StateName           0
Region              0
FIPS                0
CLOTHIANIDIN      237
IMIDACLOPRID      237
THIAMETHOXAM      237
ACETAMIPRID       237
THIACLOPRID       237
AllNeonic         237
dtype: int64

Deleting rows with missing values because they concern Neonic pesticides features and I want to analyze their effects on the honey production. We already know that 237 honey producers that were'nt analyze or did'nt want to display this information

In [9]:
data = data.dropna()
In [10]:
data.shape
Out[10]:
(895, 23)

Data Analyse

Top 10 of States producing honey

In [11]:
data.groupby("StateName")['totalprod_to'].sum().sort_values(ascending=False)[:10]
Out[11]:
StateName
North Dakota    294351.964626
California      227256.581256
South Dakota    183088.477819
Florida         171538.655302
Montana         108165.904512
Minnesota        95751.988530
Texas            77622.355093
Michigan         53207.292186
Wisconsin        50328.795006
Idaho            47393.598779
Name: totalprod_to, dtype: float64
In [12]:
data.groupby("Region")['totalprod_kg'].sum().sort_values(ascending=False)
Out[12]:
Region
Midwest      7.685361e+08
West         5.204415e+08
South        3.922662e+08
Northeast    6.140779e+07
Name: totalprod_kg, dtype: float64

Evolution of the price of honey

In [13]:
evo_price = data.groupby("year", as_index=False).agg({'priceperkg':'mean'})
evo_price.iplot(kind='line', x='year', xTitle='Year', color='orange',
           yTitle='Price of honey (dollars)', title='Evolution of the price of honey')

The price of honey has seen a five-fold increase in 12 years !

Production by year

In [14]:
prod_by_year = data.groupby("year", as_index=False).agg({'totalprod_to':'mean'})
prod_by_year.iplot(kind='bar', x='year', xTitle='Year', color='red',
           yTitle='Production of honey (Tonne)', title='Evolution of the production of honey')

Is there a correlation between the price and the production ?

In [15]:
data['priceperkg'].corr(data['totalprod_kg'])
Out[15]:
-0.2314051633828917

The production only has a 23% impact on the price of honey ! Other features should enter into account... The market ?

Use of Neonic by state

In [16]:
data.groupby("StateName")['AllNeonic'].sum().sort_values(ascending=False)
Out[16]:
StateName
California        1993527.1
Illinois          1978523.1
Iowa              1974038.9
Minnesota         1288217.3
Nebraska          1231300.7
Texas             1146020.6
Indiana           1135627.8
South Dakota       834293.9
Missouri           753323.4
Kansas             721914.5
North Dakota       683578.0
Ohio               683522.8
Florida            665890.4
Michigan           550048.4
Wisconsin          535772.7
Mississippi        501019.2
Arkansas           481926.0
Washington         476307.3
Georgia            455796.9
North Carolina     424006.2
Tennessee          381524.2
Louisiana          361382.7
Idaho              326371.6
Kentucky           326303.3
New York           274042.0
Pennsylvania       265164.6
Arizona            242326.7
Alabama            185986.5
Colorado           172937.9
Virginia           146227.4
Oregon             116928.4
Montana             93737.8
Maine               85701.7
New Mexico          46606.7
South Carolina      41571.0
New Jersey          36659.7
West Virginia       15852.8
Wyoming             13265.8
Vermont             11965.7
Utah                 8787.6
Maryland             3976.9
Oklahoma             2438.2
Nevada               2303.3
Name: AllNeonic, dtype: float64

Evolution of the use of Neonic pesticides

In [17]:
evo_neonic = data.groupby("year", as_index=False).agg({'AllNeonic':'mean'})
evo_neonic.iplot(kind='bar', x='year', xTitle='Year', color='green',
           yTitle='Use of Neonic pesticides (kg)', title='Evolution of the use of Neonic pesticides')

There is an 4460% increase of use of Neonic between 1995 and 2014 !

Is there a correlation between the production and the use of Neonic ?

In [18]:
data['totalprod_kg'].corr(data['AllNeonic'])
Out[18]:
0.11863126185912876

The correlation is low between the production of honey and the use of Neonic pesticides (11%)

In [19]:
evo_col = data.groupby("year", as_index=False).agg({'numcol':'count'})
evo_col.iplot(x='year', xTitle='Year', color='purple',
           yTitle='Number of colonies', title='Evolution of the number of colonies')

Is there a correlation between the number of colonies and the use of Neonic pesticides ?

In [20]:
data['numcol'].corr(data['AllNeonic'])
Out[20]:
0.18999631567917497

The correlation is low between the number of colonies and the use of Neonic pesticides (19%)

In [ ]: