BULLETIN BOARD (Q & A)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question. How will the problem sets affect the overall grade in the
class? I remember you saying in class that they will only matter if
your grade is on the borderline between two grades, but the syllabus
says that they are worth 20% of your grade.
Answer. The short answer is that in practice the problem sets will
affect your course grade only if you are near AND BELOW the borderline
between two letter grades. The first thing I will do is average the
two test grades (25% each) and the final exam grade (50%), and you are
guaranteed to get at least that average as the course grade. But if
that average comes out to, say, 2.45 (a high C+, just short of the 2.5
borderline between a C and a B), then I would look at how many problem
sets you turned in and how well they were done. They could count for up
to 20% of the overall grade (depending on how many you turned in), and
they might boost your grade from a C+ (recorded on your transcript as a
C) to a B- (recorded as a B). On the other hand, if your tests average
out to, say, 2.55 (a low B-), you'd get a B in the course even if you
turned in no problem sets.
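The averaging rule above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the actual grading procedure: the 25/25/50 weights come from the answer, but the blending formula (a problem-set weight between 0 and 0.20 that grows with the number turned in, applied only when it helps) and the letter cutoffs at 0.5, 1.5, 2.5 and 3.5 are assumptions. Only the 2.5 borderline between C and B is stated above, and plus/minus grades are dropped because they are recorded as plain letters.

```python
# A minimal sketch of the grading rule described above.
# ASSUMPTIONS (not from the course): the blending formula, the 0-0.20 range
# for the problem-set weight, and letter cutoffs at 0.5/1.5/2.5/3.5.

def exam_average(test1: float, test2: float, final: float) -> float:
    """Weighted average on a 4-point scale: two tests at 25% each, final at 50%."""
    return 0.25 * test1 + 0.25 * test2 + 0.50 * final


def course_points(exam_avg: float, ps_avg: float, ps_weight: float) -> float:
    """Blend in problem sets (ps_weight between 0 and 0.20, assumed to grow with
    the number turned in), but never go below the exam average."""
    if not 0.0 <= ps_weight <= 0.20:
        raise ValueError("problem sets count for at most 20%")
    blended = (1 - ps_weight) * exam_avg + ps_weight * ps_avg
    return max(exam_avg, blended)  # problem sets can only help


def letter(points: float) -> str:
    """Letter recorded on the transcript (assumed cutoffs at x.5; only 2.5 is given)."""
    for cutoff, grade in ((3.5, "A"), (2.5, "B"), (1.5, "C"), (0.5, "D")):
        if points >= cutoff:
            return grade
    return "F"


if __name__ == "__main__":
    # The two cases from the answer (a problem-set average of 3.0 is hypothetical)
    print(letter(course_points(2.45, 3.0, 0.20)))  # B: boosted from a C
    print(letter(course_points(2.55, 0.0, 0.0)))   # B: no problem sets needed
```

The `max` call encodes the guarantee that you get at least the exam average, so a weak or missing problem-set record can never pull a grade down.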
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question: I am trying to decide whether it would be best to take
POLI 300 or POLI 400. I am planning on going into a law-related field
upon graduation and after law school. Which of these would you
recommend? Also, is a statistics course recommended for either of
these courses?
Answer: POLI 300 (Quantitative Methods) arguably has more relevance
beyond academic social/political science than POLI 400 (Qualitative
Methods). If the LSAT still has quantitative/logical/mathematical
questions (along the lines of the SAT Math test), POLI 300 should be
helpful there also. POLI 300 introduces some statistical ideas and
computations, but does so pretty informally. My general recommendation
is that, if students are going to take a STAT course (even STAT 121) at
some point, it's probably best to do this after taking POLI 300, on the
grounds that POLI 300 will give you a bit of a head start on some of
the topics and should also give you a sense of the practical utility
of the topics taught in a STAT course.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question: Would POLI 300 be more appropriate to take this fall as a
sophomore, a junior, or a senior?
Answer: While POLI 300 probably should not be taken by freshmen, and we
really want students to take it before their senior year (though
unfortunately many students do put it off), there are no powerful
reasons to prefer sophomore to junior year or vice versa. But if it
fits into your schedule as a sophomore, it would probably make sense to
take it then, since it can be helpful in some 300-level POLI courses
(POLI 324, 325, etc.) that you may be interested in.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question: I found the SPSS directions fairly simple to follow. I did,
however, find it difficult to conceptualize the cross-tabulation
percentages (i.e., rows, columns, totals: what info does each
provide?). I also found recoding to be difficult, but I assume that is
because we have not covered that material in class.
Answer: Yes, we will cover these and other topics in much more detail
later in the course. The point at this stage [Problem Set #1A] is to
make sure that you can carry out the SPSS commands discussed in the
handout, even if you don't fully understand what they mean.
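As a preview of the cross-tabulation question, here is a minimal Python sketch using made-up counts (the party and candidate labels and all numbers are hypothetical, not from the course data). It shows what each kind of percentage divides by: row percentages use the row totals, column percentages use the column totals, and total percentages use the grand total.

```python
# Hypothetical 2x2 crosstab: rows = PARTY ID, columns = vote (made-up counts).
counts = {
    "Democrat":   {"Candidate X": 40, "Candidate Y": 10},
    "Republican": {"Candidate X": 5,  "Candidate Y": 45},
}


def row_percents(table):
    """Each row sums to 100: how the column variable is distributed within each row."""
    return {r: {c: 100 * n / sum(cells.values()) for c, n in cells.items()}
            for r, cells in table.items()}


def column_percents(table):
    """Each column sums to 100: how the row variable is distributed within each column."""
    columns = next(iter(table.values())).keys()
    col_totals = {c: sum(cells[c] for cells in table.values()) for c in columns}
    return {r: {c: 100 * n / col_totals[c] for c, n in cells.items()}
            for r, cells in table.items()}


def total_percents(table):
    """All cells together sum to 100: each cell's share of all cases."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    return {r: {c: 100 * n / grand_total for c, n in cells.items()}
            for r, cells in table.items()}


print(row_percents(counts)["Democrat"])  # {'Candidate X': 80.0, 'Candidate Y': 20.0}
```

Which percentages to read depends on the question being asked: row percentages here answer "how did Democrats vote?", while column percentages answer "what share of Candidate X's voters were Democrats?"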
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question: I was searching around on the Web and I found a decent SPSS
tutorial. I don't expect anyone would have much trouble using SPSS,
but if they do, perhaps this tutorial would help further.
Answer: Thanks very much for the information on the SPSS tutorial
website. I was aware of that faculty group before, have heard some of
their presentations, and have some of their printed materials and
diskettes, but I was not aware of the website, which is much more
useful. I'll announce it in class and put a link on the course web
page. It goes beyond the topics needed for POLI 300, but the hypertext
format makes it easy to skip around to find what you need (and I'm
sure it's more helpful than the SPSS Help function). I'm sure I can
learn useful things from it, and I will refer other students and UMBC
faculty and staff members to it. [SPSS Tutorial]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question: I have recently read in the news that a researcher used an
online poll for his study. It is obvious, from what we've learned in
class, that there are serious reliability [really bias issues -- NRM]
issues here. According to the news excerpt [Reuters], out of some
40,000 responses from an MSNBC website he examined a random sample of
some 7,000 respondents and further narrowed the group down to 384. How
is this valid? Even if he took a random sample, wouldn't the response
be biased anyway? Further, why would PhD-level researchers use polls
like these when stupid undergrads like me know that it is a no-no?
Answer: The researcher may have regarded the 40,000 self-selected MSNBC
respondents as the population of interest but did not have the
resources to study all 40,000 cases, and therefore took a random sample
of 7,000 to study and then a subsample of 384 for more detailed study.
This would be an entirely reasonable and proper procedure. Indeed, it
would be of considerable interest to compare the opinions of people who
(voluntarily) view the MSNBC website and (voluntarily) respond to its
online survey with the opinions of the general public (based on a
standard survey asking the same questions). What would not be proper
(as you recognize; cf. the Ann Landers example) would be to assume that
the 40,000 original respondents constituted a representative sample of
the general population.
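The two-stage procedure described in the answer can be sketched with Python's `random` module. The respondent IDs are hypothetical stand-ins; only the approximate sizes (40,000, 7,000 and 384) come from the news excerpt as reported. The sketch makes one point: each random draw is representative of the pool it is drawn from, here the self-selected respondents, so randomizing at the later stages does nothing to correct the self-selection at the first.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is reproducible

# Hypothetical IDs standing in for the (roughly) 40,000 self-selected respondents.
# This list, not the general public, is the population being sampled.
respondents = list(range(40_000))

# Stage 1: simple random sample of about 7,000 respondents
sample = rng.sample(respondents, 7_000)

# Stage 2: random subsample of 384 for more detailed study
subsample = rng.sample(sample, 384)

print(len(sample), len(subsample))  # 7000 384
```

Conclusions drawn from `subsample` generalize, within sampling error, to the 40,000 MSNBC respondents, and to the general public only if one makes the improper assumption the answer warns against.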
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question: In Problem Set #3A (Identifying Variables), don't we
sometimes need to include more than one unit of analysis? For example,
in question #4, wouldn't you need to analyze both elections (to see if
they're competitive) and individuals (to find out Congressional
responsiveness)?
Answer: A very good question. What reduces this proposition to a single
unit of analysis is the fact that there is a one-to-one correspondence
between members of the House and House districts -- that is, each
district has exactly one member and each member comes from exactly one
district. Think about setting up an (Excel or SPSS) data array or
spreadsheet for the data you would need to assess the empirical truth
of statement #4, such that each row of data corresponds to one case and
each column to one variable. Each case can then be deemed to represent
either one district (in which event the variables are DEGREE OF
COMPETITIVENESS [of each district] and DEGREE OF RESPONSIVENESS OF
MEMBER [from each district]) or one member (in which event the
variables are DEGREE OF COMPETITIVENESS [of each member's district] and
DEGREE OF RESPONSIVENESS [of each member]). Either way you think of it,
it really comes to the same data arranged in the same way.
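The data array described in the answer can be sketched as a list of rows, one per case and one column per variable. The district names, member names and scores are hypothetical placeholders; "competitiveness" and "responsiveness" simply stand for whatever measures the problem set has in mind. Because districts and members correspond one-to-one, keying the rows by district or by member yields exactly the same pairs of values.

```python
# Hypothetical data array: one row per case, one column per variable.
columns = ("district", "member", "competitiveness", "responsiveness")
rows = [
    ("District 1", "Member A", 0.8, 0.6),
    ("District 2", "Member B", 0.3, 0.4),
    ("District 3", "Member C", 0.5, 0.7),
]

# Read each row as a district ...
by_district = {d: (comp, resp) for d, m, comp, resp in rows}
# ... or as that district's member: the (competitiveness, responsiveness)
# pairs are identical because the correspondence is one-to-one.
by_member = {m: (comp, resp) for d, m, comp, resp in rows}

assert sorted(by_district.values()) == sorted(by_member.values())
```

If a district could have two members (or a member two districts), the two views would stop matching and you would genuinely need to choose a unit of analysis.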
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question: We are confused by a number of terms. Can you clarify?
Answer: RELIABILITY -- see Handout on Measuring Variables; also Moore,
pp. 168-171; and (with respect to survey measurement and coding)
Weisberg, pp. 94, 143-144.
RAW DATA/DATA ARRAY (or Data Spreadsheet, etc.) -- illustrated by the
Student Survey Raw Data (Spreadsheet) that was distributed and that you
used in Problem Sets #5 (Q1&2), #6 (Q2&3), and #12, or by the
NES/SETUPS data you see when you open SPSS.
OBSERVATIONS (or OBSERVED VALUES) -- refers simply to the entries in a
data spreadsheet; e.g., we "observe" (on the basis of a response to a
survey question) that the value of the variable PARTY ID in a
particular case is "Weak Democrat."
MISSING DATA -- where we have failed, for one reason or another, to
observe any value of a variable, e.g., the respondent failed to answer
the question. In both the SETUPS and Student Survey data, all missing
data is coded "9." The difference between (Unadjusted) Relative
Frequencies and Adjusted Relative Frequencies ("Valid Percent" in SPSS)
is that missing data is excluded from the latter calculations.
UNIVARIATE ANALYSIS -- data analysis that involves only ONE variable at
a time (frequency distributions, histograms, measures of central
tendency and dispersion) as opposed to TWO (bivariate) or more
(multivariate) variables at a time (crosstabulations, scattergrams,
measures of association, regression equations).
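The difference between unadjusted and adjusted relative frequencies can be shown with a small univariate sketch. The responses are hypothetical; only the missing-data code 9 comes from the answer. The function name `relative_frequencies` is illustrative, not an SPSS command.

```python
from collections import Counter

MISSING = 9  # missing-data code used in the SETUPS and Student Survey data


def relative_frequencies(values, missing=MISSING):
    """Return (unadjusted, adjusted) relative frequencies as dicts.

    Unadjusted: every case, including missing ones, is in the denominator.
    Adjusted ("Valid Percent" in SPSS): missing cases are left out.
    """
    counts = Counter(values)
    n_all = len(values)
    n_valid = n_all - counts[missing]  # Counter returns 0 if no 9s are present
    unadjusted = {v: c / n_all for v, c in sorted(counts.items())}
    adjusted = {v: c / n_valid for v, c in sorted(counts.items()) if v != missing}
    return unadjusted, adjusted


# Hypothetical responses to one survey question; two respondents did not answer.
responses = [1, 2, 2, 3, 4, 4, 4, 5, 9, 9]
unadjusted, adjusted = relative_frequencies(responses)
print(unadjusted[4], adjusted[4])  # 0.3 0.375
```

Because the adjusted denominator is smaller, every valid category's adjusted relative frequency is at least as large as its unadjusted one, and the adjusted values alone sum to 1.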
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question: Referring to the charts for PS #7, we seemed to conclude in
class that the Bush chart has the smallest SD and the Perot chart has
the largest. But then you said the Bush chart didn't actually have the
smallest SD. Also, I calculated the rough SD for the Bush and Perot
charts, and the Perot SD is approx. 6.51, while the Bush one is about
15.68.
Answer: The general class discussion concluded
(once we established that the question pertains not to dispersion in
the height of the bars but to dispersion in the data represented by the
frequency bar charts) that dispersion was clearly greatest in the Perot
chart and seemed to be smallest in the Bush chart. I agreed with
the latter "eyeball" assessment but noted right at the end of the
period that the Bush chart turns out to have a slightly greater SD than
the Clinton chart.
I don't know how you arrived at them, but your calculations for the
Perot and Bush SDs are way off. Consider the following. The range in
all three charts is 4, because in each the maximum observed value is 5
and the minimum is 1. Obviously we can't use the range as a measure to
compare and contrast the dispersions in these three charts, so we need
to turn to a more informative measure of dispersion, such as the SD.
Once we think through the logic of the formula, it should be clear
that the SD can't exceed half the range, so your estimated SDs of 6.51
for the Perot chart and 15.68 for the Bush chart must be wrong.
Remember that the "building blocks" of the SD formula are the
deviations from the mean in each case. The largest positive deviation
is the deviation in the case that has the maximum value, and the
largest negative deviation is the deviation in the case that has the
minimum value. Ignoring the minus sign of the latter deviation, the sum
of these two deviations is equal to the range, while the average of
these two deviations is just half of that. Since by definition no other
cases have larger deviations and some may have smaller deviations, the
mean (and also the standard) deviation from the mean must be less than
half the range if there are any cases with values intermediate between
the two extremes.
Let's proceed more step-by-step.
The range is the answer to this question: how far apart are the maximum
and minimum observed values in the data? (In this case, the answer is 4
ideological "points" or "steps.")
The mean deviation MD is the answer to this question: how far apart on
average are all the observed values from the mean value, i.e., what is
the average absolute deviation from the mean? For the reason noted
above, the MD cannot be larger than half the range -- indeed it can be
that big only when half the cases have identical maximum values and the
other half have identical minimum values (maximum "polarization").
Otherwise, the MD clearly must be less than half the range, usually
much less. (In the case of the Bush chart, the MD happens to be
approximately 1.)
The variance is the answer to this question: what is the average
squared deviation from the mean? The standard deviation SD, which is
the square root of the variance, is never less than, and usually
somewhat larger than, the MD. In the special case of maximum
polarization described above, the SD (like the MD) is equal to half the
range. Otherwise the SD is less than half the range (but greater than
the MD). (In the case of the Bush chart, the variance is about 1.7 and
the SD is about 1.3.)
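These claims about the range, MD and SD can be checked numerically. Below is a minimal Python sketch. The two lists of ideological positions are hypothetical, chosen to show the maximum-polarization case and a case with intermediate values. SD uses the population formula (dividing by the number of cases), which matches "average squared deviation" above.

```python
import math


def spread(values):
    """Range, mean deviation MD, and standard deviation SD of a list of values.

    MD is the average absolute deviation from the mean; SD is the square root
    of the average squared deviation (population formula, dividing by N).
    """
    n = len(values)
    mean = sum(values) / n
    value_range = max(values) - min(values)
    md = sum(abs(v - mean) for v in values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return value_range, md, sd


# Maximum polarization: half the cases at 1, half at 5
r, md, sd = spread([1, 1, 5, 5])
print(f"range={r} MD={md:.2f} SD={sd:.2f}")  # range=4 MD=2.00 SD=2.00

# Intermediate values pull both MD and SD below half the range (2.00)
r, md, sd = spread([1, 1, 3, 4, 5, 5])
print(f"range={r} MD={md:.2f} SD={sd:.2f}")
```

In both cases the SD is no smaller than the MD, and neither exceeds half the range, so SDs like 6.51 or 15.68 are impossible on a 1-to-5 scale.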
The question did not ask you to actually calculate the SDs, but here is
how it would be done in the Bush case. We start with a relative
frequency table (as in Question 3), where the relative frequencies are
read off the bar chart and turned into decimal fractions. The mean
perceived Bush ideological position is 3.96 (calculated from the
relative frequencies as described in Handout #6, top of p. 3, and
PS #6:A&D, Q2(c)). Let's round this off to 4 to simplify the
arithmetic.
Values  Rel. Freqs.  Deviations  Sq. Devs. x Freq.  Abs. Devs. x Freq.
  1        .08          -3        9 x .08 = .72       3 x .08 = .24
  2        .08          -2        4 x .08 = .32       2 x .08 = .16
  3        .14          -1        1 x .14 = .14       1 x .14 = .14
  4        .20           0        0 x .20 = .00       0 x .20 = .00
  5        .50          +1        1 x .50 = .50       1 x .50 = .50
Total     1.00           *        var = 1.68          MD = 1.04
                                  SD = sqrt(1.68) = 1.30
* The deviations sum to zero once they have been weighted by relative
frequency. If you go to the final column and put the minus signs back
in the first three rows, you see that the weighted deviations sum to
-0.04. (This differs slightly from zero because we used an
approximation of the mean.)
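The table's arithmetic can be reproduced in a short Python sketch. The relative frequencies are the ones read off the Bush chart; the helper name `weighted_stats` is just illustrative. Passing the rounded mean of 4 reproduces the variance (1.68) and MD (1.04), gives an SD of sqrt(1.68), about 1.296, and shows the weighted deviations summing to -0.04. Passing the unrounded mean of 3.96 shows them summing to zero, as the footnote says.

```python
import math

# Relative frequencies read off the Bush bar chart (ideological positions 1-5)
rel_freq = {1: 0.08, 2: 0.08, 3: 0.14, 4: 0.20, 5: 0.50}


def weighted_stats(rel_freq, mean):
    """Variance, MD, SD, and the sum of signed deviations, each weighted by
    relative frequency, computed around the given mean."""
    var = sum(f * (v - mean) ** 2 for v, f in rel_freq.items())
    md = sum(f * abs(v - mean) for v, f in rel_freq.items())
    dev_sum = sum(f * (v - mean) for v, f in rel_freq.items())
    return var, md, math.sqrt(var), dev_sum


exact_mean = sum(v * f for v, f in rel_freq.items())  # 3.96

# Rounded mean (4), as in the table
var, md, sd, dev_sum = weighted_stats(rel_freq, round(exact_mean))
print(f"var={var:.2f} MD={md:.2f} SD={sd:.2f} dev_sum={dev_sum:.2f}")
# var=1.68 MD=1.04 SD=1.30 dev_sum=-0.04

# Unrounded mean: the weighted deviations sum to zero (up to floating-point error)
print(weighted_stats(rel_freq, exact_mean)[3])
```

Using the exact mean changes the results only slightly (the variance becomes about 1.678 and the MD about 1.056), which is why rounding to 4 is a reasonable shortcut for hand calculation.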