---------------------------------------------------------------------------------------------------------------------------------------
       log:  e:\stata611.log
  log type:  text
 opened on:  15 Sep 2008, 10:31:38

. do "C:\DOCUME~1\Tim's\LOCALS~1\Temp\STD01000000.tmp"

. *INTRODUCTION TO STATA, ECON611, UMBC, T. H. GINDLING, Fall, 2008 (using Stata10)
.
. *STATA CAN BE USED INTERACTIVELY OR RUN FROM AN OUTSIDE PROGRAM.
. *LET'S START INTERACTIVELY.
. *THE INTERACTIVE COMMANDS MUST BE WRITTEN IN THE STATA COMMAND WINDOW.
.
. *COMMENTS BEGIN WITH A *
. *IF THE LINE BEGINS WITH A * IT WILL NOT BE EXECUTED
.
. *STATA COMMANDS must be in lower case letters.
.
. *THE FIRST THING YOU NEED TO DO IS TO CREATE A LOG FILE. IF YOU DO NOT,
. *THEN THERE WILL BE NO RECORD OF YOUR WORK!!!!
.
. *log using e:\ECON611\stata611.log
.
. *STATA PUTS ALL DATA IT USES IN MEMORY, AND YOU NEED TO MAKE SURE
. *THAT IT HAS ENOUGH MEMORY AVAILABLE FOR THE DATA YOU ARE TO USE.
.
. set memory 34m

Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.909M
    set memory          34M     max. data space                 34.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                                37.163M

.
. *INPUTTING DATA: YOU MAY INPUT DATA DIRECTLY, INPUT DATA FROM AN EXTERNAL
. *FILE, OR USE THE DATA EDITOR. LET'S BEGIN BY INPUTTING DIRECTLY.
.
. input family person salary hours tall

        family     person     salary      hours       tall
  1. 1 1 10 5 4
  2. 1 2 20 5 5
  3. 2 1 30 6 5.5
  4. 2 2 30 7 5
  5. end

.
. *WINDOWS
. *COMMAND WINDOW
. *RESULTS WINDOW
. *   PRINT RESULTS (ON FILE MENU)
. *VARIABLES WINDOW
. *REVIEW WINDOW (IF YOU CLICK ON A COMMAND IN ANY WINDOW, IT SHOWS UP IN
. *THE COMMAND WINDOW).
.
. *HELP MENU
. *   CONTENTS
. *   SEARCH
. *   STATA COMMAND
. *   WEB SITE: USER SUPPORT, RESOURCES AND CLASSES FOR LEARNING MORE,
. *   tech-support@stata.com IS VERY GOOD.
.
. *DATA EDITOR
. *YOU CAN USE THE DATA EDITOR TO EXAMINE THE DATA, AND TO
. *CHANGE DATA ON A CASE-BY-CASE BASIS
. *"SORT" AND "PRESERVE" WITHIN THE DATA EDITOR ALLOW YOU TO SORT AND SAVE
. *THE CHANGES YOU MAKE
. *   EDITING
. *   SORT
. *   DELETE
. *   RESTORE (an "undo" command)
. *   PRESERVE
.
. save data1.dta
file data1.dta saved

.
. *THE save COMMAND WILL SAVE THE DATA AS DIR:\FNM.FTP (IT WILL
. *NOT WRITE OVER AN EXISTING DATA FILE).
.
. *USE EXPLORE TO SEE LOCATION OF data1.dta
.
. *CHANGING THE DEFAULT DIRECTORY WHERE STATA LOOKS FOR AND WRITES DATA FILES, LOG FILES,
. *AND PROGRAM FILES CAN SAVE YOU SOME TYPING.
.
. dir e:\*.dta
 486.1k   9/12/08 13:07  ps1.dta
   1.2k   9/15/08 10:31  data1.dta
 934.0k   9/12/08 12:01  march2008.dta
 412.1k   7/12/04 16:31  ps12003.dta

.
. cd e:\
e:\

. save data1.dta
. *TO OVER-WRITE AN EXISTING DATA FILE, YOU MUST USE
.
. save data1, replace
file data1.dta saved

.
. dir e:\*.dta
 486.1k   9/12/08 13:07  ps1.dta
   1.2k   9/15/08 10:33  data1.dta
 934.0k   9/12/08 12:01  march2008.dta
 412.1k   7/12/04 16:31  ps12003.dta

.
. *YOU CANNOT INPUT 2 DATA SETS AT ONCE. YOU MUST CLEAR THE DATA SET YOU ARE
. *WORKING WITH FROM MEMORY BEFORE INPUTTING A NEW DATA SET.
. *NOTE THAT USING CLEAR WILL GET RID OF ANY CHANGES THAT YOU MADE TO THE DATA
. *SINCE THE LAST "SAVE."
.
. clear
.
.
. *INPUTTING A STATA DATA SET FROM AN EXISTING FILE
. *(THE DEFAULT FILE TYPE IS .dta)
.
. use e:\data1.dta
.
. *YOU DO NOT NEED .dta. SINCE e:\ IS THE DEFAULT DIRECTORY, YOU DO NOT NEED e:\
.
. clear
. use data1
.
. *OR, YOU CAN USE THE "OPEN" COMMAND ON THE FILE MENU
.
. *EXAMINING THE DATA--describe, list, summarize, tabulate, corr
.
. clear
. use data1
.
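*A MINIMAL SKETCH, NOT PART OF THE ORIGINAL SESSION AND NOT RUN HERE: THE save,
*clear, AND use STEPS ABOVE CAN BE SHORTENED WITH THE replace AND clear OPTIONS.
*IT ASSUMES THAT e:\ IS THE CURRENT DIRECTORY AND THAT data1.dta WAS SAVED THERE,
*AS ABOVE.

```stata
* Sketch only; these lines were not run in the logged session.
* Overwrite the existing data1.dta with the data now in memory.
save data1, replace
* Load data1.dta again; the clear option drops whatever is in memory first,
* so any changes made since the last save are lost.
use data1, clear
```

*THE clear OPTION ON use DOES THE SAME JOB AS A SEPARATE clear COMMAND, SO THE SAME
*WARNING ABOUT LOSING UNSAVED CHANGES APPLIES.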
. describe

Contains data from data1.dta
  obs:             4
 vars:             5                          15 Sep 2008 10:33
 size:            96 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
family          float  %9.0g
person          float  %9.0g
salary          float  %9.0g
hours           float  %9.0g
tall            float  %9.0g
-------------------------------------------------------------------------------
Sorted by:

.
. list

     +-----------------------------------------+
     | family   person   salary   hours   tall |
     |-----------------------------------------|
  1. |      1        1       10       5      4 |
  2. |      1        2       20       5      5 |
  3. |      2        1       30       6    5.5 |
  4. |      2        2       30       7      5 |
     +-----------------------------------------+

.
. summarize salary

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      salary |         4        22.5    9.574271         10         30

. summarize salary hours

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      salary |         4        22.5    9.574271         10         30
       hours |         4        5.75    .9574271          5          7

. summarize salary, detail

                           salary
-------------------------------------------------------------
      Percentiles      Smallest
 1%           10             10
 5%           10             20
10%           10             30       Obs                   4
25%           15             30       Sum of Wgt.           4

50%           25                      Mean               22.5
                        Largest       Std. Dev.      9.574271
75%           30             10
90%           30             20       Variance       91.66667
95%           30             30       Skewness      -.4933822
99%           30             30       Kurtosis       1.628099

.
. tabulate salary

     salary |      Freq.     Percent        Cum.
------------+-----------------------------------
         10 |          1       25.00       25.00
         20 |          1       25.00       50.00
         30 |          2       50.00      100.00
------------+-----------------------------------
      Total |          4      100.00

. tabulate salary tall

           |               tall
    salary |         4          5        5.5 |     Total
-----------+---------------------------------+----------
        10 |         1          0          0 |         1
        20 |         0          1          0 |         1
        30 |         0          1          1 |         2
-----------+---------------------------------+----------
     Total |         1          2          1 |         4

. tabulate salary hours

           |              hours
    salary |         5          6          7 |     Total
-----------+---------------------------------+----------
        10 |         1          0          0 |         1
        20 |         1          0          0 |         1
        30 |         0          1          1 |         2
-----------+---------------------------------+----------
     Total |         2          1          1 |         4

.
. *CORRELATION COEFFICIENTS
. corr salary tall
(obs=4)

             |   salary     tall
-------------+------------------
      salary |   1.0000
        tall |   0.8992   1.0000

.
. *CREATING GRAPHS (AND PLOTS)
. *GRAPHING IS COMPLEX; HERE ARE SOME EXAMPLES
.
. *THE BASIC GRAPH IS A FREQUENCY DISTRIBUTION OR HISTOGRAM
.
. hist salary
(bin=2, start=10, width=10)

.
. *YOU CAN SPECIFY THE NUMBER OF CATEGORIES (AT MOST 50)
.
. hist salary, bin(2)
(bin=2, start=10, width=10)

.
. *SCATTER PLOTS
.
. twoway scatter salary tall
. *OR
. plot salary tall

[text-mode scatter plot: salary (vertical axis, 10 to 30) against tall (horizontal axis, 4 to 5.5), four points]

.
. *LINE GRAPHS
. twoway line salary tall
. *I DO NOT LIKE THE WAY THAT GRAPH LOOKS.
. sort tall
. twoway line salary tall
.
. *YOU CAN PRINT THE GRAPH FROM THE GRAPH MENU
. *SAVE GRAPH FROM GRAPH MENU (SAVE GRAPH1)
. *OR YOU CAN USE THE COMMAND
.
. graph save graph1
(file graph1.gph saved)

.
. *TO SEE GRAPH AGAIN
. graph use graph1
.
. *YOU CAN ALSO USE plot TO GRAPH TWO VARIABLES
. plot salary tall

[text-mode scatter plot: salary (vertical axis, 10 to 30) against tall (horizontal axis, 4 to 5.5), four points]

.
.
. *CREATING NEW VARIABLES--gen, replace, drop
.
. gen wage=salary/hours
. l wage salary hours

     +---------------------------+
     |     wage   salary   hours |
     |---------------------------|
  1. |        2       10       5 |
  2. | 4.285714       30       7 |
  3. |        4       20       5 |
  4. |        5       30       6 |
     +---------------------------+

.
. *TO DISCOVER WHAT YOU CAN DO WITH GEN,
. *LOOK IN THE HELP MENU, SEARCH FOR FUNCTIONS.
.
. gen big=1
. *I DID NOT WANT TO DO THAT
. drop big
.
. gen big=0
. replace big=1 if tall==5.5
(1 real change made)

.
. *NOTE THE DOUBLE EQUALS SIGN AFTER THE "IF" STATEMENT
. l

     +----------------------------------------------------------+
     | family   person   salary   hours   tall       wage   big |
     |----------------------------------------------------------|
  1. |      1        1       10       5      4          2     0 |
  2. |      2        2       30       7      5   4.285714     0 |
  3. |      1        2       20       5      5          4     0 |
  4. |      2        1       30       6    5.5          5     1 |
     +----------------------------------------------------------+

.
. drop big
. gen big=0
. replace big=1 if tall>5
(1 real change made)

. l

     +----------------------------------------------------------+
     | family   person   salary   hours   tall       wage   big |
     |----------------------------------------------------------|
  1. |      1        1       10       5      4          2     0 |
  2. |      2        2       30       7      5   4.285714     0 |
  3. |      1        2       20       5      5          4     0 |
  4. |      2        1       30       6    5.5          5     1 |
     +----------------------------------------------------------+

.
. drop if family==3
(0 observations deleted)

. l

     +----------------------------------------------------------+
     | family   person   salary   hours   tall       wage   big |
     |----------------------------------------------------------|
  1. |      1        1       10       5      4          2     0 |
  2. |      2        2       30       7      5   4.285714     0 |
  3. |      1        2       20       5      5          4     0 |
  4. |      2        1       30       6    5.5          5     1 |
     +----------------------------------------------------------+

.
. *sort AND by
.
. sort person
. l

     +----------------------------------------------------------+
     | family   person   salary   hours   tall       wage   big |
     |----------------------------------------------------------|
  1. |      1        1       10       5      4          2     0 |
  2. |      2        1       30       6    5.5          5     1 |
  3. |      2        2       30       7      5   4.285714     0 |
  4. |      1        2       20       5      5          4     0 |
     +----------------------------------------------------------+

. sort family
. l

     +----------------------------------------------------------+
     | family   person   salary   hours   tall       wage   big |
     |----------------------------------------------------------|
  1. |      1        2       20       5      5          4     0 |
  2. |      1        1       10       5      4          2     0 |
  3. |      2        2       30       7      5   4.285714     0 |
  4. |      2        1       30       6    5.5          5     1 |
     +----------------------------------------------------------+

.
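*A MINIMAL SKETCH, NOT PART OF THE ORIGINAL SESSION AND NOT RUN HERE: THE TWO-STEP
*gen big=0 / replace big=1 if tall>5 PATTERN ABOVE CAN ALSO BE WRITTEN IN ONE LINE.
*THE VARIABLE NAME big2 IS HYPOTHETICAL, CHOSEN ONLY SO THAT IT DOES NOT COLLIDE
*WITH big.

```stata
* Sketch only; these lines were not run in the logged session.
* (tall > 5) evaluates to 1 when true and 0 when false.
* Stata treats a missing value as larger than any number, so the
* if !missing(tall) qualifier leaves big2 missing when tall is missing.
gen big2 = (tall > 5) if !missing(tall)
* Compare the one-line indicator with the two-step version.
list tall big big2
```

*IN THIS DATA SET tall HAS NO MISSING VALUES, SO THE QUALIFIER CHANGES NOTHING HERE;
*IT IS INCLUDED AS A HABIT FOR LARGER DATA SETS.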
. by family: summarize wage

---------------------------------------------------------------------------------------------------------------------------------------
-> family = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        wage |         2           3    1.414214          2          4

---------------------------------------------------------------------------------------------------------------------------------------
-> family = 2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        wage |         2    4.642857    .5050764   4.285714          5

.
. *YOU CAN USE THE "by" COMMAND WITH MOST OTHER STATA COMMANDS ALSO.
. *SOMETIMES THE COMMAND GOES FIRST, SOMETIMES LAST; YOU NEED
. *TO CONSULT THE DOCUMENTATION.
.
.
. *USING "EGEN" PROVIDES ANOTHER WAY TO CALCULATE MEAN WAGE
. *BY FAMILY. EGEN SAVES THE VALUES WHILE SUMMARIZE DOES NOT.
.
. sort family
.
. egen msal=mean(wage), by(family)
. tabulate msal family

           |        family
      msal |         1          2 |     Total
-----------+----------------------+----------
         3 |         2          0 |         2
  4.642857 |         0          2 |         2
-----------+----------------------+----------
     Total |         2          2 |         4

.
. *ANOTHER WAY TO DO THIS, USING "COLLAPSE"
.
. collapse (mean) wage, by(family)
. l

     +-------------------+
     | family       wage |
     |-------------------|
  1. |      1          3 |
  2. |      2   4.642857 |
     +-------------------+

.
. *EGEN VS. GEN
. *EGEN CAN DO SUMS, MINIMUMS, MAXIMUMS, ETC.
. *SOMETIMES EGEN AND GEN HAVE THE SAME FUNCTION NAMES.
. *BE CAREFUL: EGEN AND GEN OFTEN DO DIFFERENT THINGS EVEN
. *THOUGH THE FUNCTION NAMES ARE THE SAME--READ THE MANUALS
. *BEFORE USING EGEN!!!!
. *EXAMPLE
.
. gen twage1=sum(wage)
. egen twage2=sum(wage)
. l twage1 twage2

     +---------------------+
     |   twage1     twage2 |
     |---------------------|
  1. |        3   7.642857 |
  2. | 7.642857   7.642857 |
     +---------------------+

. clear
.
. *RUNNING A PROGRAM FROM AN EXTERNAL FILE (A .do FILE).
. *OPEN DO-FILE EDITOR
. *COPY AND PASTE:
. use data1
. gen small=1
. replace small=0 if tall>5.5
(0 real changes made)

. tabulate small tall

           |               tall
     small |         4          5        5.5 |     Total
-----------+---------------------------------+----------
         1 |         1          2          1 |         4
-----------+---------------------------------+----------
     Total |         1          2          1 |         4

.
. *DO AND RUN WILL BOTH RUN COMMANDS.
. *WITH DO, THE COMMANDS AND RESULTS SHOW UP IN THE RESULTS WINDOW;
. *WITH RUN, THEY DO NOT SHOW UP IN THE RESULTS WINDOW
.
. *SAVE THIS FILE (THE do COMMAND BELOW USES e:\ECON611\test.do)
. *EXIT DO-FILE EDITOR
. *YOU CAN RUN A DO-FILE DIRECTLY FROM THE COMMAND MENU
.
. do e:\ECON611\test.do

. use data1
no; data in memory would be lost
r(4);

end of do-file
r(4);

end of do-file
r(4);

. do "C:\DOCUME~1\Tim's\LOCALS~1\Temp\STD01000000.tmp"

. *OOPS. STATA WILL NOT LET YOU USE A NEW DATA SET (OR EXIT) WITH UNSAVED DATA IN MEMORY.
.
. clear
.
end of do-file

. *Let's look at the data for problem set #1
. use "E:\ECON611\ps1.dta", clear
. describe

Contains data from E:\ECON611\ps1.dta
  obs:         9,479
 vars:            13                          12 Sep 2008 13:07
 size:       530,824 (98.5% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
GEDIV           float  %18.0g      FM4X       Geography - Recode-Census Division
FOWNU18         float  %21.0g      FM5X       Family recode, number own child under 18
A_MJOCC         float  %49.0g      FM6X       Indus.&Occ.-(main job) occupation, major groups - recode
A_AGE           float  %9.0g                  Demographics, Age
A_GRSWK         float  %40.0g      FM8X       Current job, Earnings, usual weekly amount
A_HGA           float  %41.0g      FM9X       Demographics, Educational attainment
A_MARITL        float  %19.0g      FM10X      Demographics, Marital status
A_SEX           float  %9.0g       FM11X      Demographics, Sex
FTOTVAL         float  %9.0g                  Total income amount - Family
PMHRUSLT        float  %35.0g      FM13X      Current job, Hours usually worked - Person
weight          float  %9.0g                  Weight, March supplement - Person
PRDTRACE        float  %36.0g      FM21X      Demographics- race of respondent
edyears         float  %9.0g
-------------------------------------------------------------------------------
Sorted by:

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       GEDIV |      9479    5.051904    2.594581          1          9
     FOWNU18 |      9479     .709041     1.05211          0          9
     A_MJOCC |      9479    4.171854    2.837943          1         10
       A_AGE |      9479    42.95664    10.86167         24         65
     A_GRSWK |      9479    940.5844    575.1104          2       2885
-------------+--------------------------------------------------------
       A_HGA |      9479    40.80198    2.505224         31         46
    A_MARITL |      9479     2.92341    2.543033          1          7
       A_SEX |      9479    1.443929    .4968723          1          2
     FTOTVAL |      9479    84272.85    69481.84      -9999     884845
    PMHRUSLT |      9479    43.40753    7.170774         40        142
-------------+--------------------------------------------------------
      weight |      9479    1599.851    1028.367      99.72   10380.99
    PRDTRACE |      9479    1.385484    1.203736          1         18
     edyears |      9479    14.10043    2.636811          0         20

. clear
. exit
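
*A MINIMAL SKETCH, NOT PART OF THE ORIGINAL SESSION AND NOT RUN HERE, REVISITING THE
*EGEN VS. GEN EXAMPLE EARLIER IN THIS LOG ON THE FOUR-PERSON DATA SET. IT ASSUMES
*THAT data1.dta IS IN THE CURRENT DIRECTORY; THE VARIABLE NAMES runwage, totwage,
*AND famwage ARE HYPOTHETICAL. IT USES egen's total() FUNCTION, WHICH IS THE
*DOCUMENTED NAME FOR THE OVERALL SUM, IN PLACE OF THE egen sum() USED ABOVE.

```stata
* Sketch only; these lines were not run in the logged session.
use data1, clear
gen wage = salary/hours

* gen with sum() gives a running (cumulative) sum down the observations.
gen runwage = sum(wage)

* egen with total() gives the same overall total on every observation.
egen totwage = total(wage)

* bysort sorts by family and then computes the family mean in one step.
bysort family: egen famwage = mean(wage)

list family wage runwage totwage famwage
```

*WRITING total() INSTEAD OF sum() WITH egen MAKES IT HARDER TO CONFUSE THE OVERALL
*TOTAL WITH gen's RUNNING SUM.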