Tuesday, November 1, 2016

Wednesday (11/2): Polling, measurement errors, causal reasoning #1

Announcements
  • Midterm grades entered
    • Overall pretty good
    • Notably, not rounded up
    • Some grades were on the borderline
    • Our conditional deal for students concerned about failing grades still holds
  • Quiz next week, no quiz this week

Plan for today
  • Polling, voting, and a PSA  
  • Lesson 4.2: Review selection bias and sources of measurement error in polling arguments
  • Review selected homework questions 
  • Lesson 4.3: Review Mill's method(s) of discovering causes, general structure of a causal argument, correlation vs. causation
  • Review selected homework questions


Polling, Voting, and a PSA

Speaking of polls...

At this point, the presidential race seems too close to call.  Polls show Hillary Clinton maintaining a narrow lead - and indications are that lead is narrowing.  

In some states, of course, the contest is pretty much decided. New York's going to go for Hillary. Texas is going to go for Trump. But in "battleground states," it ain't over til it's over.



 11/1/16 Map via FiveThirtyEight.com

Oh... look where you are!

Ohio is a battleground state.

"One week out, Ohio remains a tossup" 

Liz's PSA:
Exercise your right to vote this election season!

I know, I know... we've all got election fatigue. Media coverage of Trump, Hillary, and the various scandals afflicting their campaigns is constant. We're subject to a barrage of (often conflicting) polling results on a daily basis. In short, this is how we all feel:




But it would be a mistake to let election fatigue keep you from exercising your right to vote. 
You have a real chance to influence the outcome of the election.

Remember, if you can't get behind either major-party candidate, you can vote for a third-party candidate for president, or even write in a candidate. 
Of course, this is probably pointless. But hey, it's your right, and Governor Kasich even did it. 

Perhaps more crucially, while the presidential race is (understandably) dominating headlines, remember that it's not the only election on the ballot.

In Ohio, we're voting for a U.S. Senator, a U.S. Representative, several State Supreme Court justices, and more. State officials, and the laws they support, have a huge impact on your day-to-day life in Ohio - probably more than the president.

So...



My advice, for what it's worth, is to vote early. When I voted in the primary election, I spent the majority of my only free hour in the day waiting in line. Lines on election day are probably going to be even longer.
Ain't nobody got time for that. 
You can vote early at the Board of Elections office in the Wood County Courthouse. If you live in a different county, check with your local board of elections.
 
If you're interested in voting but you're not sure about where/when you can go, what you should bring, and so on, here's a link to get you started. Importantly, you'll also find a link to the Ohio 2016 ballot here, so you can do your research on candidates before heading to the polls. 


Lesson 4.2: Polling & Measurement errors

Key terms:
-Selection bias
-Self-selection
-Measurement error
-Margin of error - the range around a poll's result within which the true value is likely to fall (e.g., a poll that claims to be accurate +/- 4%)

Takeaway points: 
-Selection bias, esp. self-selection, can create biased samples and measurement errors
-Assess P1 in a polling argument for sample size (r/o hasty generalizations), representativeness (r/o biased samples)
-Assess P2 for sources of measurement error that might make you doubt the conclusion. These include: medium, vagueness, timing, location, second-hand reporting, "people are dumb but don't wanna look it," phrasing, and selection bias (incl. self-selection).



Polls are generalizations
  • Generalizations re: people's attitudes, beliefs, desires, etc.
  • Pertinent considerations:
    • Sample size and representativeness
    • Target population, i.e., the group you're generalizing about
    • Property in question that you're trying to measure  

General structure of a polling argument

P1) S is a representative sample of X's
P2) Proportion 1 of X's in S have property Y
C) Proportion 2 of X's have property Y

Example:

P1) This class is representative of all students at BGSU 
P2) 90% of students polled in this class don't like homework
C) So, 90% of BGSU students don't like homework
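
To see why sample size matters for P1, here is a rough sketch of the standard margin-of-error approximation for a sample proportion. It assumes a simple random sample and a 95% confidence level (z = 1.96); the sample sizes are hypothetical, and only the 90% figure comes from the example above. Note that it says nothing about representativeness: a big biased sample is still biased.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample; z=1.96 corresponds to ~95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical sample sizes: one class, several classes, a big survey.
for n in (30, 300, 3000):
    moe = margin_of_error(0.90, n)
    print(f"n={n}: 90% +/- {moe * 100:.1f} points")
```

The margin shrinks as n grows, which is the numerical side of ruling out a hasty generalization.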

Assessing P1:  

Sample size & representativeness (last week)

Selection bias (new this week)
When the way the sample is selected causes sample bias 

Self-selection is a common form of selection bias
  • When the selection method for collecting data relies on people volunteering to participate. 
  • Problem because, in general, only those with extreme/strong views on an issue will be represented.
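
A small simulation can make the self-selection problem concrete. Everything here is an assumption for illustration: a made-up population rating an issue from -5 to +5, and a made-up response model in which people with stronger views are more likely to volunteer.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: -5 = strongly against, +5 = strongly for.
population = [random.randint(-5, 5) for _ in range(10_000)]


def volunteers(people):
    """Assumed response model: the stronger the view, the likelier a response."""
    return [v for v in people if random.random() < abs(v) / 5]


def share_extreme(people):
    """Fraction of people holding a strong view (|rating| >= 4)."""
    return sum(1 for v in people if abs(v) >= 4) / len(people)


random_sample = random.sample(population, 500)
self_selected = volunteers(population)

print(f"population:    {share_extreme(population):.2f}")
print(f"random sample: {share_extreme(random_sample):.2f}")
print(f"self-selected: {share_extreme(self_selected):.2f}")
```

Under these assumptions, the self-selected group over-represents strong views, while the random sample stays close to the population.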



Assessing P2: Measurement errors


Measurement errors:

1. You're not measuring what you think you're measuring

2. The way you collect data biases results in one direction or the other 

Example: I want to measure how satisfied students are with this class. So, I ask students who come to my review session if they think the class is going well or poorly. The majority of the group says the class is going well. So, I conclude that most students in the class think the class is going well.

There are lots of problems with this poll. To name just two possible sources of measurement error, the way I collected info (face to face, not anonymous) and the timing of the survey (at a review session, i.e., just prior to an exam) are likely to skew results (more positive feedback).

Can you think of other problems? (It may help to formalize my argument.)


Sources of measurement error

1. The medium through which questions are asked can bias results. 

-In some situations, people are more likely to lie when the medium is impersonal (think internet surveys, phone interviews) than they are in face-to-face surveys. 

-In other situations, people are more likely to lie when the polling medium is personal (face to face) rather than impersonal.

Homework example: ?

2. Vagueness 

-If the terms aren't properly operationalized, the target population can interpret the terms differently than the researchers intend. 

Example: Do you drink frequently, rarely, or occasionally?

3. Time 

-The time at which a poll takes place can have a tremendous impact on results. 

Homework example?

4. Place 

-For a variety of reasons, location can cause measurement problems. 
-When people are asked questions while in groups or among friends, they're more likely to want to conform to what they think are the group's views rather than express their own. 

Asch line experiment:





 

5. Second-hand reporting 
-Newspapers and media outlets want eyeballs, so they might (okay, often) over-emphasize certain aspects of the poll or interpret the results in a way that sensationalizes them.

6. People are dumb/don't want to look uninformed

-They'll give you an answer even if they have no idea what you're asking about. 
-Instead of the relevant variable, you're measuring people's willingness to give an opinion, to be on TV, to not seem like idiots... 




Homework example?
 

7. Phrasing 
-How you phrase a question can have a large impact on people's responses. 
-Loaded questions set the tone in order to get the kinds of answers they want. 

Explain how questions like these might skew poll results: 

Ami (in an incredulous tone): I mean, did anyone think the last exam was difficult?

Recent studies show that 9/10 Ohioans favor Proposition X. Where do you stand on the issue?

8. Method of collecting data can lead to self-selection 
-A self-selected group leads to a non-representative sample. 
-Because I have a biased sample, I'm no longer measuring what I think I'm measuring (attitudes in the general population).

Homework example(s)?




Review selected homework questions

Link to Lesson 4.2 HW




Lesson 4.3: Causal Reasoning (1)

Key terms:
-Method of Agreement
-Method of Difference
-Joint Method of Agreement and Difference
-Method of Concomitant Variation
-Correlation
-Causation
-Direction of causation 
-Causal mechanism

Takeaway points:
-Causal arguments: general structure

-Mill's methods - Agreement, Difference, Joint Agreement & Difference, Concomitant Variation - are (partial) avenues by which to isolate causal factors
-Differentiating correlation from causation & importance of isolating a plausible causal mechanism
-Important: Mill's methods don't allow us to definitively conclude that variable X causes variable Y, but passing all the tests Mill proposed indicates we're on the right track 
  
Causal claims are generalizations   
  • The same considerations regarding sample size (must be large enough) and representativeness of the sample we discussed with respect to generalizations apply to causal arguments.
 Causal arguments are inductive arguments 
  • The conclusion contains a causal claim   
  • The conclusion is probabilistic 
    • Sometimes highly (approaching 100%) probable, e.g., "Gravity causes things to fall down." 
    • Sometimes plausible but somewhat more dubious, e.g., "Playing violent video games causes kids to be more violent." 

General structure of a causal argument

P1) X is correlated with Y.


P2) The correlation between X and Y is not due to chance (i.e., it is not merely statistical or temporal).* <- see notes below

P3) The correlation between X and Y is not due to some mutual cause Z or some other cause.

P4) Y is not the cause of X. (Direction of causation).

C): X causes Y.


 Regarding P2: Crucial to distinguish correlation vs. causation
  •  Roughly speaking, correlation = covariance 
    • Positive correlation: Two variables increase or decrease together
    • Negative correlation: One variable increases as the other decreases (& vice versa) 
  • Correlation does not imply causation 
    • If there is a correlative relationship between variables X and Y, one variable may be the cause of the other... but it may not be. 

How to distinguish correlation from causation?




*In (P2), to show that the correlation isn't merely due to chance there should be a proposed causal mechanism.

  • Hypothesize a plausible causal mechanism to explain how X causes Y.
  • Plainly put: you need a causal mechanism that makes sense! 
  • May require specialized knowledge in some cases, but often intuitive.
  • Example to clarify: 

Okay... there's a correlation here.
But can you offer a reasonable explanation as to how the IE market share caused the declining murder rate?

Upshot: If you can offer an explanation of how X causes Y that relies on a plausible causal mechanism, you've gone one step towards establishing that X causes Y, rather than (merely) correlates with Y. (Note: strengthens your case, but still not a sure thing.) 


Brief review:
Mill's methods for discovering causes

(Who was John Stuart Mill?)

1. Method of agreement 
2. Method of difference 
3. Joint method of agreement and difference (combo of 1&2)
Related: importance of control groups in research
4. Method of concomitant variation 
Application: dose-response relationship (med research) 

1. Method of agreement: What's the same?

If two or more events share only one relevant characteristic/variable then that variable must be the cause. 




Example from class: Food poisoning
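
One way to picture the method of agreement is as a set intersection: list what each sick person ate and keep only what they all have in common. The diners and menu items below are hypothetical, and the result is only a candidate cause, not a proven one.

```python
# Hypothetical diners who all got sick, and what each one ate.
sick_diners = {
    "Ana": {"salad", "chicken", "rice"},
    "Ben": {"chicken", "soup"},
    "Cara": {"chicken", "rice", "pie"},
}


def method_of_agreement(cases):
    """Return the factors shared by every case in which the effect occurred."""
    return set.intersection(*cases.values())


common = method_of_agreement(sick_diners)
print(common)
```

If more than one item survives the intersection, the method alone can't tell you which one to blame.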


2. Method of difference: What's different?

The relevant factor present when a phenomenon occurs, and absent when the phenomenon does not occur, must be the cause. 

To apply the method of difference:
1. List all the factors that are present prior to an event happening
2. Compare it to a list of the factors present when the event doesn't happen. 
Whatever is on the first list but not the second is the cause.



Example: You're testing a new migraine medication. You divide migraine sufferers into two groups. Group A gets the new medication. Group B gets a sugar pill. The results (controlling for the placebo effect & other variables as much as possible) show that Group A had shorter/less severe bouts of headache pain than Group B.
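
The two-list procedure for the method of difference amounts to a set difference. Here is a minimal sketch using the migraine trial above; the factor labels are my own shorthand for the setup, not data from a real study.

```python
def method_of_difference(present_with_effect, present_without_effect):
    """Return factors present when the effect occurs but absent when it doesn't."""
    return set(present_with_effect) - set(present_without_effect)


# Shorthand factor lists for the two groups (assumed for illustration).
group_a = {"migraine sufferer", "took a pill", "new medication"}  # shorter headaches
group_b = {"migraine sufferer", "took a pill"}                    # no improvement

candidate = method_of_difference(group_a, group_b)
print(candidate)
```

This only works as well as the lists are complete: any factor you forgot to write down can't show up as the difference.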


3. Joint method of agreement and difference
Combine the two methods. Establish that there exists a relevant variable, X, (and X alone) that regularly precedes event E, and that when X is absent E does not occur.

Formalization:

Instance 1: Factors a, b, and c are followed by E.

Instance 2: Factors a, b, and d are followed by E.

Instance 3: Factors b and c are not followed by E. (Look for counter-examples)

Instance 4: Factors b and d are not followed by E. (Look for counter-examples)

Therefore, factor a is probably the cause of E.
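
The formalization above can be sketched with sets: keep the factors present every time E occurs, then throw out any factor that also shows up when E does not occur. The function name is hypothetical, and the sketch assumes at least one instance where E occurs.

```python
# The four instances from the formalization: (factors present, did E follow?)
instances = [
    ({"a", "b", "c"}, True),
    ({"a", "b", "d"}, True),
    ({"b", "c"}, False),
    ({"b", "d"}, False),
]


def joint_method(cases):
    """Combine agreement and difference to get candidate causes of E."""
    with_e = [factors for factors, occurred in cases if occurred]
    without_e = [factors for factors, occurred in cases if not occurred]
    always_present = set.intersection(*with_e)    # agreement step
    seen_without_e = set().union(*without_e)      # counter-examples
    return always_present - seen_without_e        # difference step


print(joint_method(instances))
```

Factor b passes the agreement step but is ruled out by Instances 3 and 4, which is why the counter-examples matter.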
 

Example: "Cookies are always missing from the cookie jar whenever Johnny is in a group of children, and never when Johnny is missing from one or more of those same groups. This does not apply to any other child. We therefore suspect Johnny as the thief."

Credit for this example: http://www.baam.emich.edu/baamsciencelessons/baammillsrules.htm


4. Method of concomitant variation

Formalization of Method of Concomitant Variation

Instance 1: Factors a, b, and c are correlated with E.

Instance 2: Factors a, b and increased c are correlated with increased E.

Instance 3: Factors a, b, and decreased c are correlated with decreased E.*


Therefore, factor c is causally connected with E. 


*Instances 2 and 3 are crucial re: differentiating correlation & causation!
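
Here is a rough sketch of what checking for concomitant variation might look like with numbers. The dose and response values are made up for illustration, and the Pearson correlation helper is written out by hand so nothing beyond the standard library is assumed. A strong correlation fits the pattern, but by itself it still doesn't settle that c causes E.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up numbers: amount of factor c vs. size of effect E.
doses = [0, 1, 2, 3, 4]
effect = [0, 2, 3, 6, 8]

r = pearson(doses, effect)
print(f"r = {r:.2f}")
```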

 

Application of the method of concomitant variation: Dose-response relationship 
"One tequila, two tequila, three tequila, floor."


Regarding P3: "The correlation between X and Y is not due to some mutual cause Z or some other cause."

"Some other cause" - fairly self-explanatory
You were tracking the wrong thing

"Some mutual cause Z" - more interesting, often trickier to discern

The third variable problem

"A type of confounding [e.g., erroneous causal reasoning] in which a third variable leads to a mistaken causal relationship between two others. 


Example: Cities with a greater number of churches have a higher crime rate. 
However, more churches do not lead to more crime, but instead the third variable, population, leads to both more churches and more crime."

(Credit: http://onlinestatbook.com/2/glossary/third_variable_problem.html)
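
The churches-and-crime example can be simulated. In this sketch, population is the only thing driving both variables; all the numbers (city sizes, rates, noise) are invented for illustration. The raw counts come out strongly correlated, but once you divide by population the correlation should mostly disappear.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical cities: population (Z) drives both churches (X) and crimes (Y).
populations, churches, crimes = [], [], []
for _ in range(200):
    pop = random.randint(10_000, 1_000_000)
    populations.append(pop)
    churches.append(pop / 2_000 * random.uniform(0.8, 1.2))
    crimes.append(pop / 50 * random.uniform(0.8, 1.2))

raw_r = pearson(churches, crimes)
per_capita_r = pearson(
    [c / p for c, p in zip(churches, populations)],
    [k / p for k, p in zip(crimes, populations)],
)
print(f"raw counts: r = {raw_r:.2f}")
print(f"per capita: r = {per_capita_r:.2f}")
```

Controlling for the suspected third variable like this is one way to test P3 before concluding that X causes Y.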

 Related to homework example (substitute "video games" for "media violence"):

 
Can you explain the third variable problem in the above example?


Review selected homework problems (4.3) 

Link to HW 4.3 


Any questions??

 
