### Data Science Day 5:

### Chi-Square Application 2:

**Testing the independence of two categorical variables**, also known as a **contingency table** test.

*We use the Chi-Square test of independence to check whether two categorical variables are independent or associated.*

**Example 1: Ice-Cream Flavor vs. Buyer’s Gender**

We want to see whether ice-cream flavor preference depends on the gender of the people buying it.

| Gender | Strawberry | Chocolate | Vanilla | Green Tea | Total |
|---|---|---|---|---|---|
| Men | 80 | 120 | 120 | 60 | 380 |
| Women | 50 | 200 | 250 | 120 | 620 |
| Total | 130 | 320 | 370 | 180 | 1000 |

**H0 (Null Hypothesis):** *Ice-cream flavor* preference and *buyer’s gender* are **independent** (there is no association between the flavor chosen and the buyer’s gender).
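Under H0, the expected count in each cell is (row total × column total) / grand total, and the chi-square statistic sums (observed − expected)² / expected over all cells. Here is a minimal NumPy sketch of that arithmetic using the counts from the table above. The variable names are illustrative and not part of the original code.

```python
import numpy as np

# Observed counts from the table.
# Rows: Men, Women. Columns: Strawberry, Chocolate, Vanilla, Green Tea
observed = np.array([[80, 120, 120, 60],
                     [50, 200, 250, 120]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1): 380, 620
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 4): 130, 320, 370, 180
grand_total = observed.sum()                      # 1000

# Expected counts if flavor and gender were independent
expected = row_totals * col_totals / grand_total

# Pearson chi-square statistic and its degrees of freedom
chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(np.round(expected, 1))
print(f"chi-square = {chi2_stat:.2f}, degrees of freedom = {dof}")
```

The larger the gap between observed and expected counts, the larger the statistic and the stronger the evidence against H0.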

**Solution:**

We will use the *chi2_contingency* function from the **SciPy** package in Python.

**Python Code:**

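*import numpy as np*

*from scipy import stats*

*# Rows: Men, Women. Columns: Green Tea, Vanilla, Chocolate, Strawberry*

*flavor_gender = np.array([[60, 120, 120, 80], [120, 250, 200, 50]])*

*stats.chi2_contingency(flavor_gender)*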
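The call returns the test statistic, the p-value, the degrees of freedom, and the table of expected counts. Here is a self-contained sketch that unpacks and prints them. The columns follow the order of the table (Strawberry first), which does not change the result, and the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Rows: Men, Women. Columns: Strawberry, Chocolate, Vanilla, Green Tea
observed = np.array([[80, 120, 120, 60],
                     [50, 200, 250, 120]])

# chi2_contingency returns (statistic, p-value, degrees of freedom, expected counts)
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"chi-square statistic: {chi2_stat:.2f}")
print(f"p-value: {p_value:.1e}")
print(f"degrees of freedom: {dof}")
print("expected counts under H0:")
print(np.round(expected, 1))
```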

**Result:**

The **p-value is 4.3e-08**, which is far below 0.05. So we **reject** the null hypothesis and conclude that **ice-cream flavor preference depends on the buyer’s gender.** *Note: if any expected cell count is below 5, the result might be biased; for a 2 x 2 table (two categorical variables with two categories each) with counts that small, we would use Fisher’s Exact Test instead.*
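The note about small counts can be checked in code. The sketch below looks at the smallest expected count, then runs Fisher’s Exact Test on a 2 x 2 table. Collapsing the data into Strawberry vs. all other flavors is an assumption made here purely for illustration, not a step from the original analysis; with counts this large the chi-square test is already adequate.

```python
import numpy as np
from scipy import stats

# Rows: Men, Women. Columns: Strawberry, Chocolate, Vanilla, Green Tea
observed = np.array([[80, 120, 120, 60],
                     [50, 200, 250, 120]])

# Rule-of-thumb check: every expected count should be at least 5
_, _, _, expected = stats.chi2_contingency(observed)
min_expected = expected.min()
print(f"smallest expected count: {min_expected:.1f}")

# Illustrative 2 x 2 table: Strawberry vs. every other flavor, by gender
strawberry = observed[:, 0]
other_flavors = observed[:, 1:].sum(axis=1)
two_by_two = np.column_stack([strawberry, other_flavors])

# fisher_exact accepts only 2 x 2 tables here and returns (odds ratio, p-value)
odds_ratio, p_fisher = stats.fisher_exact(two_by_two)
print(f"odds ratio: {odds_ratio:.2f}, Fisher p-value: {p_fisher:.1e}")
```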

**Data Visualization:**

From the visualization, we can see that more women than men indeed prefer Green Tea flavor ice cream. 🙂
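Because the two groups differ in size (380 men, 620 women), comparing shares within each gender is a fairer check than comparing raw counts. Here is a minimal sketch of those row percentages, using the counts from the table; the variable names are illustrative.

```python
import numpy as np

flavors = ["Strawberry", "Chocolate", "Vanilla", "Green Tea"]
genders = ["Men", "Women"]

# Rows: Men, Women. Columns follow the order of `flavors`
observed = np.array([[80, 120, 120, 60],
                     [50, 200, 250, 120]])

# Share of each flavor within each gender, in percent (each row sums to 100)
row_pct = observed / observed.sum(axis=1, keepdims=True) * 100

for gender, row in zip(genders, row_pct):
    shares = ", ".join(f"{flavor}: {pct:.1f}%" for flavor, pct in zip(flavors, row))
    print(f"{gender}: {shares}")
```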

Source Code:

To be continued…