1. Creating a Dataset

This page will give you information on how to deal with your dataset - how to set it up, how to create new variables and change existing ones, how to exclude data, and other useful things that you can do before you run analyses. Contents include:

Basic Creation (variable view, data view, inserting new cases and variables, importing files)

Manipulating Variables (computing formulas, "if/then" variables computations, missing values)

Managing Your Dataset (sort cases, merging data files, excluding data)

Basic Creation

To create a new dataset, start in the FILE menu, pull the mouse over to NEW, then select DATA (alternatively, press Command-N on a Mac). The data editor (pictured in two diagrams below) will open. There are two important parts of the data editor, listed on the tabs at the bottom: the Variable View (Figure 1.1) and the Data View (Figure 1.3).

Figure 1.1 Variable View

Variable View

The variable view is used to define all of your different variables. This view contains several columns, but the ones you need to worry about the most are: Name (what you want the variable to be called), Type (what kind of information SPSS should expect for the variable, "numeric"/numbers and "string"/letters are the ones you will probably use the most), and Values (important for when you use categorical variables like experimental condition or gender). Another useful column is Label, where you can describe what the variable is in more space than its name; it's up to you whether to use this or not.

Imagine that you wanted to do an analysis with data looking at the relationship between GPA and Gender. You would take the following steps for each one:

1.) GPA

a) Name the variable. Start by entering "GPA" on the first line of the NAME column. Now the variable is called "GPA".

b) Specify type. Next, click on the TYPE column (it will choose "Numeric" by default). Click on the box that appears next to "Numeric" to view the other variable types. In this case, you want "Numeric", though, so don't change it.

c) Specify width and decimals. The next column is WIDTH, which controls how many characters each cell will hold. For example, if you know that no GPA is going to be longer than 3 digits, you could specify "3" as the width. The default width is "8", which would allow you to type a number that's 8 digits long without SPSS doing anything funky to it (like converting it to scientific notation). It is not terribly important to specify WIDTH. Next is the DECIMALS column, where you specify how many decimal places you would like SPSS to display (the default is 2). Any extra decimal places you enter are rounded in the display, although SPSS keeps the full value for calculations. For the sake of simplicity, let's say that we want GPA entered to 2 decimal places (example: 3.32). We don't have to change anything about DECIMALS because 2 has already been selected by default.

d) Specify values. GPA is not a variable that you need to specify categories for; you can leave this column alone. There are other columns, but don't worry about them for now. We have set all of the necessary parameters in order for us to enter GPA data.

2.) Gender

a) Name the variable. Enter "Gender" on the second line of the NAME column ("GPA" should be on the first line).

b) Specify type. Next, because gender is not a number but rather a category represented by two strings ("male" and "female"), choose "String" in the TYPE column.

c) Specify width and decimals. Ignore WIDTH and DECIMALS, as they are not relevant for this kind of variable, and move on to the VALUES column.

d) Specify values. In the VALUES column, you will need to specify what groups exist in your variable. Start by clicking in the VALUES cell for your current row. A grey box should appear on the right side of the column, just as it did when you chose variable type. Click on the box and a window will open (Figure 1.2, below).

Figure 1.2 Labeling Values in the VALUES Column

In the case of the variable "Gender", you have two groups: "Male" and "Female". You must also assign a number, letter or word to each of these groups (for example, you could make "Male" = 1 and "Female" = 2, or "Male" = m and "Female" = f). Once you have worked that out, you are ready to enter your value labels. Put the number "1" in the space next to "Value". Then enter "Male" in the space next to "Value Label." Click on the "Add" button (or press return), and "1 = 'Male'" will appear in the larger space below. Do the same for female, then click on "OK". You're done defining your category!
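If it helps to see the idea behind value labels outside of SPSS, here is a minimal Python sketch. It is not SPSS syntax, and the names `gender_labels` and `gender_codes` are made up for illustration. It assumes the 1 = Male, 2 = Female coding from the example above.

```python
# Hypothetical mapping that mirrors the Value Labels window: stored value -> label.
gender_labels = {1: "Male", 2: "Female"}

# The data itself holds only the short codes, one per participant (made-up values).
gender_codes = [1, 2, 2, 1]

# Looking each code up in the mapping recovers the readable label.
decoded = [gender_labels[code] for code in gender_codes]
print(decoded)
```

The point is that the dataset stores only the compact codes, while the labels live in one place and can be changed without retyping any data.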

Data View

If you've been following along so far, you should see two labeled columns when you click onto the "Data View" tab. Now that your variables have been set up in the variable view, you can enter your data. In the GPA column, you enter each participant's GPA in a separate cell and move down vertically. You do the same for gender, entering "1" for male and "2" for female (as we specified earlier in the VALUES column of the variable view). Note that each line contains the data for a particular participant, so in Figure 1.3 participant 1 (on line 1) has a 3.32 GPA and is male (1), while participant 2 has a 4.0 GPA and is female (2).

Figure 1.3 Data View

Inserting new cases and variables between existing cases or variables

1) In the "Data View", select any cell in the row below the position where you wish to insert the new case or select any cell in the variable to the right of the position where you wish to insert a new variable.

2) From the "Data" menu, click on "Insert case" or "Insert Variable."
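As a rough illustration of what these two steps do to the data grid, here is a small Python sketch (not SPSS syntax). The rows, the placeholder variable name `VAR00001`, and the selected positions are all assumptions made for the example.

```python
# Each dict is one case (row); the values are made up for illustration.
cases = [
    {"GPA": 3.32, "Gender": 1},
    {"GPA": 4.00, "Gender": 2},
]
variables = ["GPA", "Gender"]  # column order as shown in the Data View

# Inserting a case: the new, empty row goes above the selected row.
selected_row = 1  # 0-based position of the row whose cell is selected
cases.insert(selected_row, {"GPA": None, "Gender": None})

# Inserting a variable: the new column goes to the left of the selected one.
selected_col = variables.index("Gender")
variables.insert(selected_col, "VAR00001")  # placeholder name for the example
for case in cases:
    case["VAR00001"] = None  # every existing case starts out missing

print(variables)
print(len(cases))
```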

Importing Files

SPSS can import files from less complex programs such as Excel or any plain text editor. When you do this, keep in mind that SPSS evaluates the incoming data and applies default attributes that seem appropriate to it. You can edit these attributes when importing text files.

For Excel files:

1) Under the "File" menu, click "Open" and then "Data...".

2) Select "Excel" in the "Enable" box.

3) Click on the Excel file you wish to open.

4) Keep in mind that each column is considered a variable and that blank cells are converted to missing data (indicated by a period). If you choose to read the first row of the Excel file as variable names, these names will be transferred into SPSS as the variable names. If not, SPSS creates default variable names.

5) Be careful when importing data from Excel to make sure that your variables are defined how you want (SPSS might think it's smarter than it actually is and define your data as the wrong type).

For text files:

1) Select "Read Text Data" from the "File" menu.

2) Next, select your text file from the "Open file" window.

3) This will open the text import wizard (see Figure 1.4) which will guide you through the steps to specify the attributes of your text file so that it can be imported correctly.

Figure 1.4 Text Import Wizard
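To make the import rules more concrete, here is a minimal Python sketch of the same ideas: the first row supplies the variable names, each column becomes a variable, and blank cells become missing values. This is only an illustration using Python's `csv` module, not what SPSS does internally. The sample text and the `parse_cell` helper are made up for the example.

```python
import csv
import io

# Made-up text file: the first row holds variable names, and one cell is blank.
raw_text = "GPA,Gender\n3.32,1\n,2\n4.0,2\n"

def parse_cell(text):
    """Turn a blank cell into None (missing) and numeric text into a float."""
    if text.strip() == "":
        return None
    try:
        return float(text)
    except ValueError:
        return text  # leave non-numeric text as a string

reader = csv.DictReader(io.StringIO(raw_text))
rows = [{name: parse_cell(value) for name, value in row.items()} for row in reader]

for row in rows:
    print(row)
```

Notice that the guess about each cell's type is made automatically, which is exactly why it pays to check the variable definitions after an import.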

Manipulating Variables

Computing formulas - creating new variables from old ones

Once you have entered your data, you may wish to compute new variables for analysis using your raw data. For example, you might want the mean score across several questions for one participant or you may want to conduct a log transformation on your data.

1) From the "Transform" menu, select "Compute."

2) You will see a window similar to Figure 1.5, where the list on the left is the list of variables in the data set and the list on the right is the list of mathematical functions recognized by SPSS. To compute a mean, one could use the function "MEAN(variable1, variable2...)" from the list on the right, or use the calculator in the center of the screen. Be sure to name your new variable in the upper left corner. Here, we are computing the difference between the number of hours people spend working during finals week (t4) and during the second week of classes (t1); this data is fictional.

Figure 1.5 Compute Variable Window

3) Once you have completed your new variable, SPSS will compute it for all data rows and place the results in the first unused column (see Figure 1.6). Make sure to name your variable something meaningful so you can remember how to recreate the computed variable, or describe it in the Label field of the variable. Furthermore, SPSS does not update computed variables when the data set changes. Make sure your formula is correct and that your data set is complete before you compute a formula. In the event that you make changes to your raw data, you must also recreate any computed variables.

Figure 1.6 Added Computed Variable in Data View
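The following Python sketch (not SPSS syntax) mimics the t4 minus t1 example and shows why a computed variable goes stale when the raw data change. The data values and the names `compute_diff` and `diff` are invented for illustration.

```python
# Fictional hours worked in the second week of classes (t1) and finals week (t4).
cases = [
    {"t1": 3, "t4": 9},
    {"t1": 5, "t4": 6},
    {"t1": 4, "t4": 10},
]

def compute_diff(rows):
    """Fill in diff = t4 - t1 for every row, like running Compute once."""
    for row in rows:
        row["diff"] = row["t4"] - row["t1"]

compute_diff(cases)
first_pass = [row["diff"] for row in cases]

# Editing the raw data afterwards does not touch the computed column...
cases[0]["t4"] = 12
stale = cases[0]["diff"]

# ...until the computation is run again.
compute_diff(cases)
second_pass = [row["diff"] for row in cases]

print(first_pass, stale, second_pass)
```

The computed column is a snapshot taken at the moment you run the computation, so the safe habit is to recompute after any edit to the raw data.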

"If/then" variables computations

Sometimes you may wish to include only data that fits certain criteria in your new variable. One common way to do this is by using the if/then variable computation feature.

1) From the "Transform" menu, select "Compute" and click on the "If" button (see Figure 1.5).

2) In the "If cases" box, click on "Include if case satisfies conditions".

3) Next, write an expression that suits your purpose using the calculator and the lists of variables and functions. For example, in Figure 1.7 we will compute a difference between t3 and t1 only if the participant did 4 or fewer hours of work at t1; here we use <= for "less than or equal to." After this information has been entered, click "Continue" to return to the familiar "Compute variable" window, where you will enter the computation for the new variable.

Figure 1.7 The "If Cases" window

4) As can be seen in Figure 1.8, only those participants who fulfilled the requirements of the if/then clause were included in the variable computation for the last five questions. Cases that do not meet the condition are left missing for the new variable.

Figure 1.8 Added if/then computed variable
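Here is a small Python sketch of the same conditional logic (not SPSS syntax): the difference between t3 and t1 is computed only when t1 is 4 or less, and every other case is left missing. The data and the variable name `diff31` are made up for the example.

```python
# Fictional hours worked at t1 and t3.
cases = [
    {"t1": 3, "t3": 7},
    {"t1": 5, "t3": 8},
    {"t1": 4, "t3": 6},
]

for row in cases:
    if row["t1"] <= 4:
        # Condition satisfied: compute the new variable for this case.
        row["diff31"] = row["t3"] - row["t1"]
    else:
        # Condition not satisfied: the new variable stays missing.
        row["diff31"] = None

print([row["diff31"] for row in cases])
```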

Missing values

When attempting to create a sum or a mean of several raw data scores, one must be careful of missing values in the raw data that can affect later analysis. SPSS offers an easy way to fill in data using the surrounding information.

1) Select "Replace Missing Values" from the "Transform" menu.

2) Choose the variable for which you want to replace missing values.

3) Choose the method of replacement, most often the "Series Mean", and click "OK".
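The "Series Mean" idea is simple enough to sketch in a few lines of Python (not SPSS syntax). The scores are invented, and the sketch assumes missing entries are represented as `None`. The filled-in values are written to a new list so the original scores stay untouched.

```python
from statistics import mean

# Made-up scores for one variable; None marks a missing value.
scores = [4.0, None, 6.0, 5.0, None]

# The series mean is the mean of the values that were actually observed.
observed = [s for s in scores if s is not None]
series_mean = mean(observed)

# Build a new series in which each missing entry is replaced by that mean.
filled = [series_mean if s is None else s for s in scores]

print(series_mean)
print(filled)
```

Keep in mind that every replaced value is identical, so this kind of replacement shrinks the variability of the variable.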

Managing Your Dataset

Sort cases

It is often useful to organize your data by different variables.

1) From the "Data" Menu, choose "Sort cases."

2) As seen in Figure 1.9, you can choose the variables by which to sort your data. Simply select the desired variables and click on the arrow in the center of the screen. Press "OK".

Figure 1.9 Sort Cases Window
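Sorting by more than one variable means that the first variable decides the order and the second one breaks ties. A minimal Python sketch of that idea follows (not SPSS syntax). The cases and the `id` column are made up for illustration, and ascending order is assumed for both variables.

```python
# Made-up cases; "id" is only there so the new order is easy to see.
cases = [
    {"id": 1, "Gender": 2, "GPA": 3.10},
    {"id": 2, "Gender": 1, "GPA": 3.90},
    {"id": 3, "Gender": 2, "GPA": 3.75},
    {"id": 4, "Gender": 1, "GPA": 3.32},
]

# Sort by Gender first, then by GPA within each gender (both ascending).
by_gender_then_gpa = sorted(cases, key=lambda row: (row["Gender"], row["GPA"]))

order = [row["id"] for row in by_gender_then_gpa]
print(order)
```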

Merging data files

Sometimes you may need to combine different datasets. This could happen when you are working on the same project at different computers, when you are working in groups, or when you have various datasets collected at different times that you want to merge into one. SPSS makes this process quite easy.

1) Open one of the datasets you wish to combine.

2) Select "Merge Files" from the "Data" menu.

3) Select "Add Cases."

4) In the "Add Cases From" selection window, SPSS will require you to select the dataset you wish to combine with the one you have open.

5) Next, as seen in Figure 1.10, SPSS will identify which variables exist in both datasets and list them in the right box. Variables that only exist in one dataset are listed in the left box. You can pair these variables using the "pair" button in the center of the window.

Figure 1.10 Merging Data Files Window

6) You may find it useful to check the "Indicate case source as variable" box. This feature allows you to create a new variable which identifies the initial dataset from which each case originated. When you check this box, the text box below will allow you to enter source names, for example "source01". If you create this variable, you will be able to sort your combined dataset by the initial source using the "sort cases" function (see above).
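The steps above can be pictured with a short Python sketch (not SPSS syntax). It stacks the cases from two made-up datasets and tags each case with its origin in a `source01` variable. Two things here are assumptions for the example: every variable is kept, with missing values where a dataset lacks it, and the source is coded 0 for the open dataset and 1 for the added one.

```python
# Two made-up datasets; file_b has an extra variable (t2) that file_a lacks.
file_a = [{"id": 1, "t1": 3}, {"id": 2, "t1": 5}]
file_b = [{"id": 3, "t1": 4, "t2": 6}]

# Collect every variable name, keeping the order in which names first appear.
variables = []
for row in file_a + file_b:
    for name in row:
        if name not in variables:
            variables.append(name)

# Stack the cases and record where each one came from.
merged = []
for source_flag, rows in ((0, file_a), (1, file_b)):
    for row in rows:
        new_row = {name: row.get(name) for name in variables}  # absent -> None
        new_row["source01"] = source_flag
        merged.append(new_row)

for row in merged:
    print(row)
```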

Excluding data using "select cases"

Sometimes you may want to conduct an analysis that only includes a certain selection of your entire dataset. You can do this using the "Select cases" feature.

1) Select "Select cases" from the "Data" menu.

2) In the "Select cases" window (Figure 1.11) you can choose whether to have all cases selected (the default), or to set a range of cases, a random sample of cases, or a set specified by a formula ("If condition is satisfied"). You can also decide whether to delete those cases which are not selected or to simply filter them out (the latter is recommended in many situations).

Figure 1.11 Select Cases Window

3) Click on the second option and then click on the "If" button.

4) In the "Select Cases: If" window (Figure 1.12), you can specify the conditions which must be fulfilled for each case to be included in the new subgroup. For example, below we have specified that only those cases in which t3 is greater than 4 will be included in the new group. When you have completed your selection criteria, click on "Continue."

Figure 1.12 Select Cases: If

5) Back at the "Select Cases" screen, click "OK".

You should notice that the numbers to the left of the cases that were not included in your new condition are crossed out, as shown in Figure 1.13. They will now be excluded from analyses until you include them again.

Figure 1.13 Excluded Cases
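The difference between filtering cases and deleting them can be sketched in Python as follows (not SPSS syntax). The data and the name `filter_flag` are hypothetical. The sketch assumes the "t3 greater than 4" condition from the example above.

```python
# Made-up cases with fictional hours worked at t3.
cases = [
    {"id": 1, "t3": 3},
    {"id": 2, "t3": 6},
    {"id": 3, "t3": 4},
    {"id": 4, "t3": 8},
]

# Filtering: mark each case as selected (1) or not (0) without removing anything.
for row in cases:
    row["filter_flag"] = 1 if row["t3"] > 4 else 0

# Analyses then use only the selected cases...
selected = [row for row in cases if row["filter_flag"] == 1]

# ...while the full dataset is still there when you need it again.
print([row["id"] for row in selected])
print(len(cases))
```

Because filtering only flags cases, you can switch the filter off later and get every case back, which is not possible once cases have been deleted.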

Important Note: When you save a data set with a selection filter in place, it will not be active when you load the dataset next time. However, you can easily access it as follows:

1) Go back into the "select cases" window.

2) Click on the "Use filter variable" option (see Figure 1.11).

3) Your old filter will be listed at the bottom of the list on the left. Select it and press the arrow under "Use filter variable" (see Figure 1.14).

Figure 1.14 Selecting a case filter