*INTRODUCTION TO STATA, ECON611, UMBC, T. H. GINDLING, Fall, 2010 (using Stata10)
*STATA CAN BE USED INTERACTIVELY OR RUN FROM A PROGRAM FILE (A DO-FILE).
*LET'S START INTERACTIVELY.
*THE INTERACTIVE COMMANDS MUST BE WRITTEN IN THE STATA COMMAND WINDOW.
*COMMENTS BEGIN WITH A *
*IF THE LINE BEGINS WITH A * IT IS A COMMENT AND WILL NOT BE EXECUTED.
*STATA COMMANDS MUST BE TYPED IN LOWER CASE LETTERS. STATA IS CASE-SENSITIVE,
*SO salary AND Salary WOULD BE TWO DIFFERENT VARIABLES.
*THE FIRST THING YOU NEED TO DO IS TO CREATE A LOG FILE. IF YOU DO NOT,
*THEN THERE WILL BE NO RECORD OF YOUR WORK!!!!
*(IF e:\stata611.log ALREADY EXISTS, log using STOPS WITH AN ERROR; ADD THE
*replace OR append OPTION IN THAT CASE.)
log using e:\stata611.log
*STATA PUTS ALL DATA IT USES IN MEMORY, AND YOU NEED TO MAKE SURE
*THAT IT HAS ENOUGH MEMORY AVAILABLE FOR THE DATA YOU ARE TO USE.
*(set memory WORKS ONLY WHILE NO DATA ARE IN MEMORY. STATA 12 AND LATER
*MANAGE MEMORY AUTOMATICALLY, SO THERE THIS COMMAND IS NOT NEEDED.)
set memory 34m
*INPUTTING DATA: YOU MAY INPUT DATA DIRECTLY, INPUT DATA FROM AN EXTERNAL
*FILE, OR USE THE DATA EDITOR. LET'S BEGIN BY INPUTTING DIRECTLY.
*USE THE DO-FILE EDITOR WINDOW.
*WITH "DO" THE COMMANDS SHOW UP IN THE RESULTS WINDOW;
*WITH "RUN" YOU DO NOT SEE THE COMMANDS IN THE RESULTS WINDOW.
input family person salary hours tall
1 1 10 5 4
1 2 20 5 5
2 1 30 6 5.5
2 2 30 7 5
end
*OTHER WINDOWS
*RESULTS WINDOW
*   PRINT RESULTS (ON FILE MENU)
*VARIABLES WINDOW
*REVIEW WINDOW (IF YOU CLICK ON A COMMAND IN THIS WINDOW, IT SHOWS UP IN
*THE COMMAND WINDOW).
*HELP MENU
*   CONTENTS
*   SEARCH
*   STATA COMMAND
*   WEB SITE: USER SUPPORT, RESOURCES AND CLASSES FOR LEARNING MORE;
*   tech-support@stata.com IS VERY GOOD.
*DATA EDITOR
*YOU CAN USE THE DATA EDITOR TO EXAMINE THE DATA, AND TO EDIT IT.
save data1.dta
*THIS SAVES THE DATA AS directory\filename.filetype IN THE CURRENT DIRECTORY
*(IT WILL NOT WRITE OVER AN EXISTING FILE WITH THE SAME NAME).
*USE WINDOWS EXPLORER TO SEE WHERE data1.dta WAS SAVED.
*CHANGING THE DEFAULT DIRECTORY WHERE STATA LOOKS FOR AND WRITES DATA FILES, LOG FILES,
*AND PROGRAM FILES CAN SAVE YOU SOME TYPING.
dir e:\*.dta
cd e:\
save data1
*TO OVERWRITE AN EXISTING DATA FILE, YOU MUST USE THE replace OPTION:
save data1, replace
dir e:\*.dta
*YOU CANNOT HAVE 2 DATA SETS IN MEMORY AT ONCE. YOU MUST CLEAR THE DATA SET YOU ARE
*WORKING WITH FROM MEMORY BEFORE INPUTTING A NEW DATA SET.
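*A MINIMAL SKETCH, ASSUMING YOU ONLY NEED TO LOOK AT SOME OTHER DATA FOR A MOMENT:
*preserve KEEPS A TEMPORARY COPY OF THE DATA IN MEMORY, AND restore BRINGS IT
*BACK AFTERWARDS, SO YOU DO NOT HAVE TO SAVE AND CLEAR YOUR WORKING DATA.
*THE VARIABLES id AND x AND THE MACRO xmean ARE HYPOTHETICAL, MADE UP FOR THIS ILLUSTRATION.
preserve
*drop _all EMPTIES MEMORY; restore WILL STILL BRING BACK THE PRESERVED COPY
drop _all
input id x
1 10
2 20
end
summarize x
*r(mean) HOLDS THE MEAN JUST COMPUTED BY summarize; KEEP IT IN A LOCAL MACRO
local xmean = r(mean)
*PUT THE ORIGINAL DATA BACK IN MEMORY
restore
display "MEAN OF x IN THE TEMPORARY DATA: `xmean'"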
*NOTE THAT USING clear WILL GET RID OF ANY CHANGES THAT YOU MADE TO THE DATA
*SINCE THE LAST save.
clear
*INPUTTING A STATA DATA SET FROM AN EXISTING FILE
*(THE DEFAULT FILE TYPE IS .dta)
use e:\data1.dta
*YOU DO NOT NEED .dta. SINCE e:\ IS NOW THE DEFAULT DIRECTORY, YOU DO NOT NEED e:\ EITHER.
clear
use data1
*OR, YOU CAN USE THE "OPEN" COMMAND ON THE FILE MENU.
*EXAMINING THE DATA
describe
list
*FIRST, USE THE PULL-DOWN "STATISTICS" MENU.
*NOTE THAT THE RESULTS WINDOW SHOWS YOU THE COMMAND THAT THE MENU GENERATED.
*THIS IS USEFUL, BECAUSE YOU CAN USE IT TO SEE WHAT YOU NEED TO WRITE IN YOUR .do FILES.
summarize salary
summarize salary, detail
summarize salary hours
tabulate salary
tabulate salary tall
tabulate salary hours
tabulate salary, summarize(hours)
*CORRELATION COEFFICIENTS
corr salary tall
*CREATING GRAPHS (AND PLOTS)
*GRAPHING IS COMPLEX; HERE ARE SOME EXAMPLES.
*THE BASIC GRAPH IS A FREQUENCY DISTRIBUTION OR HISTOGRAM.
hist salary
*OR, USE THE PULL-DOWN "GRAPHICS" MENU.
*YOU CAN SPECIFY THE NUMBER OF CATEGORIES (AT MOST 50).
hist salary, bin(5)
*SCATTER PLOTS
twoway scatter salary tall
*OR
plot salary tall
*LINE GRAPHS
twoway line salary tall
*I DO NOT LIKE THE WAY THAT GRAPH LOOKS: twoway line CONNECTS THE POINTS IN THE
*ORDER THEY APPEAR IN THE DATA, SO SORT ON THE X-AXIS VARIABLE FIRST.
sort tall
twoway line salary tall
*YOU CAN PRINT THE GRAPH FROM THE GRAPH MENU.
*TO SAVE THE GRAPH, USE THE GRAPH MENU (SAVE IT AS graph1),
*OR YOU CAN USE THE COMMAND
graph save graph1
*TO SEE THE GRAPH AGAIN
graph use graph1
*YOU CAN ALSO USE plot TO GRAPH TWO VARIABLES
*(STATA ALMOST ALWAYS GIVES YOU TWO OR MORE WAYS TO DO ANYTHING).
plot salary tall
*CREATING NEW VARIABLES
gen wage=salary/hours
*l IS SHORT FOR list
l wage salary hours
*save, replace WRITES OVER THE FILE THE DATA WERE LOADED FROM (data1.dta)
save, replace
*TO DISCOVER WHAT YOU CAN DO WITH gen,
*LOOK IN THE HELP MENU AND SEARCH FOR FUNCTIONS.
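*A MINIMAL SKETCH OF A FEW gen EXPRESSIONS, USING A SMALL MADE-UP DATA SET INSIDE
*preserve/restore SO THE TUTORIAL DATA ARE NOT CHANGED. THE VARIABLES sal, hrs, w,
*lnw AND morethan5 AND THE MACROS nlong AND nmissw ARE HYPOTHETICAL.
*A LOGICAL EXPRESSION SUCH AS (hrs>5) EQUALS 1 WHEN TRUE AND 0 WHEN FALSE, SO IT
*CREATES A 0/1 DUMMY VARIABLE IN ONE STEP. CAUTION: STATA TREATS A MISSING VALUE (.)
*AS LARGER THAN ANY NUMBER, SO (hrs>5) WOULD BE 1 WHEN hrs IS MISSING; THE
*if !missing(hrs) QUALIFIER LEAVES THOSE OBSERVATIONS MISSING INSTEAD.
preserve
drop _all
input sal hrs
10 5
30 6
20 .
end
*ARITHMETIC WITH A MISSING VALUE GIVES MISSING, SO w IS . IN THE THIRD OBSERVATION
gen w = sal/hrs
*ln() IS ONE OF THE FUNCTIONS LISTED IN THE HELP FILES
gen lnw = ln(w)
gen morethan5 = (hrs>5) if !missing(hrs)
list
count if morethan5==1
local nlong = r(N)
count if missing(w)
local nmissw = r(N)
restore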
gen big=1
*I DID NOT WANT TO DO THAT
drop big
gen big=0
replace big=1 if tall==5.5
*NOTE THE DOUBLE EQUALS SIGN (==) IN THE if CONDITION: == TESTS FOR EQUALITY,
*WHILE A SINGLE = ASSIGNS A VALUE.
l
drop big
gen big=0
replace big=1 if tall>5
l
*(THESE DATA CONTAIN ONLY FAMILIES 1 AND 2, SO NO OBSERVATIONS ARE DROPPED HERE.)
drop if family==3
l
*SORTING THE DATA AND by-GROUP PROCESSING
sort person
l
sort family
l
by family: summarize wage
*YOU CAN USE by WITH MOST OTHER STATA COMMANDS ALSO.
*SOMETIMES by GOES FIRST (by family: summarize wage), SOMETIMES IT IS AN OPTION
*AT THE END (egen ..., by(family)); YOU NEED TO CONSULT THE DOCUMENTATION.
*USING egen PROVIDES ANOTHER WAY TO CALCULATE THE MEAN WAGE BY FAMILY.
*egen SAVES THE VALUES AS A NEW VARIABLE, WHILE summarize ONLY DISPLAYS THEM.
sort family
egen msal=mean(wage), by(family)
tabulate msal family
*ANOTHER WAY TO DO THIS, USING collapse.
*WARNING: collapse REPLACES THE DATA IN MEMORY WITH ONE OBSERVATION PER FAMILY.
collapse (mean) wage, by(family)
l
*EGEN VS. GEN
*egen CAN DO SUMS, MINIMUMS, MAXIMUMS, ETC.
*SOMETIMES egen AND gen HAVE THE SAME FUNCTION NAMES.
*BE CAREFUL: egen AND gen OFTEN DO DIFFERENT THINGS EVEN
*THOUGH THE FUNCTION NAMES ARE THE SAME. READ THE MANUALS
*BEFORE USING egen!!!!
*EXAMPLE: WITH gen, sum() IS A RUNNING (CUMULATIVE) SUM; WITH egen, sum() IS THE
*TOTAL, REPEATED ON EVERY OBSERVATION (egen total() IS THE SAME FUNCTION UNDER ITS NEWER NAME).
clear
use data1
gen twage1=sum(wage)
egen twage2=sum(wage)
l twage1 twage2
clear
*MORE COMMENTS ON DO-FILES
*YOU CAN EDIT A DO-FILE USING ANY TEXT EDITOR OR WORD
*PROCESSOR, BUT YOU MUST SAVE IT AS A PLAIN TEXT FILE.
*A USEFUL FEATURE OF STATA: YOU CAN SAVE THE LOG FILE, EDIT IT,
*AND THEN USE THE EDITED FILE AS A PROGRAM FILE.
*WITHIN A PROGRAM FILE,
*THE END OF THE LINE IS THE DEFAULT END OF A COMMAND. YOU SHOULD NOT
*END YOUR COMMAND WITH A ";" AS YOU DID IN SAS. IF YOUR COMMAND GOES BEYOND THE
*END OF THE LINE, YOU NEED TO SET A DIFFERENT END-OF-COMMAND DELIMITER.
*FOR EXAMPLE, #delimit ; TELLS STATA THAT THE CHARACTER ";" INDICATES THE
*END OF A COMMAND. AFTER THIS COMMAND, YOU WILL NEED TO END EACH
*COMMAND WITH A ";" (AS IN SAS). TO CHANGE BACK, USE THE COMMAND #delimit cr.
*(#delimit WORKS ONLY INSIDE DO-FILES, NOT IN THE COMMAND WINDOW.)
log close
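*A MINIMAL SKETCH OF TWO WAYS TO CONTINUE ONE COMMAND ACROSS LINES. IT MUST BE RUN
*FROM A DO-FILE, BECAUSE #delimit AND /// DO NOT WORK IN THE COMMAND WINDOW.
*THE MACRO NAMES msg1 AND msg2 ARE HYPOTHETICAL.
*NOTE: WHILE ";" IS THE DELIMITER, A * COMMENT ALSO RUNS UNTIL THE NEXT ";",
*SO NO COMMENTS ARE PLACED BETWEEN THE TWO #delimit LINES BELOW.
#delimit ;
local msg1 = "one command "
           + "on two lines";
#delimit cr
*WITH THE DEFAULT DELIMITER, /// AT THE END OF A LINE TELLS STATA THAT THE
*COMMAND CONTINUES ON THE NEXT LINE.
local msg2 = "one command " ///
           + "on two lines"
display "`msg1'"
display "`msg2'"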