Factor and Principal Component Analysis - Assignment Example

SECTION A
1. Factor analysis is a method for investigating whether a number of variables of interest are linearly related to a smaller number of unobservable factors; the parameters of these linear functions are referred to as loadings. Primarily, factor analysis aids researchers in two fundamental functions of data analysis. The first is to identify the underlying dimensions or constructs in the data, for example to test whether the items of a scale are indicators of the same construct. The second is to reduce the number of variables to a more manageable set by eliminating redundancy. It is used as an exploratory technique when researchers want to summarise the structure of a set of variables. It is a method of transforming the original variables into new, uncorrelated variables called factors.
2. Principal component analysis aims to summarise the information in a larger set of variables in fewer factors, and is therefore used for scale reduction. Conceptually, principal component analysis is based on the total information (variance) in each variable. Its first objective is to generate a first factor that has the maximum explained variance. It then locates a second factor (after fixing the first factor and its associated loadings) that maximises the remaining explained variance. This procedure continues until there are as many factors as there are variables, or until the analyst concludes that the number of useful factors has been exhausted.
3. Communality is the proportion of a variable's variance that is shared with, or 'common' to, the other variables and therefore contributes to its correlation with them. It is obtained as one of the major outputs of factor analysis.
Variables with higher communalities have their variation represented fairly completely by the extracted factors, whereas variables with lower communalities have their variation incompletely represented by the extracted factors.
4. In PFA (principal axis factoring), factors account only for the common variance in the data and not for the maximal amount of variance. This is unlike principal component analysis (PCA), in which the eigenvalues ultimately account for all of the variance of the observed variables. Finally, the cumulative proportion must equal 100%, so the individual proportions are each less than 100% and together sum to 100%.
5. The central question of how many factors should be extracted can, in simple terms, be addressed as follows:
(a) Theoretically: As a rule of thumb, every included factor (prior to rotation) must explain at least as much variance as an 'average variable'. Also, a large drop in the variance explained between two factors signals the introduction of meaningless and relatively unimportant factors, which should therefore not be taken into account.
(b) Eigenvalue criterion: Factors with eigenvalues greater than 1.0 are retained.
(c) Scree plot criterion: The eigenvalues are plotted in descending order, and the factors lying before the point at which the curve levels off (the 'elbow') are extracted.
(d) Percentage of variance criterion: The number of factors extracted is decided on the basis of a satisfactory level of cumulative percentage of variance extracted by the factors. A general criterion of 60-70% is quite common.
(e) Significance test criterion: Only those factors that are statistically significant are retained, after determining the statistical significance of the separate eigenvalues.
6. An eigenvalue represents the amount of variance in the original variables that is associated with a factor.
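To make the eigenvalue idea concrete, here is a minimal sketch, assuming a principal-component extraction from the correlation matrix. The function name `kaiser_retained` and the six simulated items are hypothetical; they are not the assignment's data, and the sketch is not the SPSS procedure that produced the outputs cited in this paper.

```python
import numpy as np

def kaiser_retained(data):
    """Eigenvalues (descending), component loadings, and the number of
    components with eigenvalue > 1.0 (the eigenvalue criterion)."""
    corr = np.corrcoef(data, rowvar=False)       # standardised variables: variance 1.0 each
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loading of variable i on component j = eigenvector entry * sqrt(eigenvalue)
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    n_retained = int(np.sum(eigvals > 1.0))
    return eigvals, loadings, n_retained

# Simulated example (assumption): three items share one common factor, three are pure noise
rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
items = np.hstack([factor + 0.5 * rng.normal(size=(200, 3)),
                   rng.normal(size=(200, 3))])

eigvals, loadings, n_retained = kaiser_retained(items)
print("eigenvalues:", np.round(eigvals, 3))
print("sum of squared loadings per component:", np.round((loadings ** 2).sum(axis=0), 3))
print("components with eigenvalue > 1:", n_retained)
```

In this sketch the column sums of the squared loadings reproduce the eigenvalues, and the eigenvalues sum to the number of variables, which is why a component with an eigenvalue below 1.0 explains less than one standardised variable does.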
In other words, the sum of the squared loadings of all variables on a factor is that factor's eigenvalue, or the total variance explained by that factor. Hence, only factors with eigenvalues greater than 1.0 are retained. A factor with an eigenvalue less than 1.0 is no better than a single variable since, due to standardisation, each variable has a variance of 1.0.
7. Under-extraction of the factors retained for rotation can have a deleterious effect on the results. Certain factors that would have contributed to the overall solution may be discarded, and hence the scale would fail to measure the construct accurately or completely. The extracted factors would therefore under-represent the data and lead to false analysis and interpretations.
8. Structure of the solution: The initial solution, shown in Output 1, indicates that the loadings of each variable on the various extracted factors or components are ambiguous and not easily interpretable. This is because certain variables load highly onto more than one factor. For example, the variables B1, B3, B7, B5 and B6 have many cross-loadings, which make interpretation difficult; the structure obtained is therefore not simple.
Output 1
9. After rotation of the initially extracted solution using Varimax rotation, Output 2 was obtained. This matrix is much simpler than the previous one, with a new set of loadings that fits the data equally well and is more easily interpretable.
Output 2
10. The extracted solution after rotation fits the data only moderately well, as can be inferred from Output 3 below. The extracted solution explains 55.957% of the variance, whereas a figure above 60% is generally considered good in such cases.
Output 3
11.
Based on the factor analysis conducted and the resulting rotated matrix (Output 2), the decision to include items B2, B4, B7, B8 and B9 can be made. First, although the total variance explained indicates three factors to be extracted, there is a sharp drop in eigenvalue between the first and second factors, which suggests ignoring the last two factors. Second, items B2, B4, B7, B8 and B9 load unambiguously onto the first factor and are therefore taken into the scale.
12. Test for internal reliability:
(a) For Scale SOP (Self-oriented perfectionism): Items of the scale are a1 a3 a4r a7r a9 a10 a11 a13 a15 a17 a20 a22r a24r a27 a29. Output 4 shows a Cronbach's alpha of 0.8230 for this scale, which indicates very high reliability. However, on deleting item a7r the reliability increases even further, as also seen in Output 4. This item may therefore be considered for deletion (although this is not necessary, given the already high value of alpha).
Output 4
(b) For Scale SPP (Socially prescribed perfectionism): Items of the scale are a2 a5r a6 a8 a12 a14r a16 a18r a19 a21 a23 a25r a26 a28 a30r. Output 5 shows a Cronbach's alpha of 0.6551 for this scale, which also indicates good reliability. However, on deleting item a6 the reliability increases even further, as also seen in Output 5. This item may therefore be considered for deletion.
Output 5
(c) For Scale Satisfaction: Items of the scale are B2, B4, B7, B8, B9. Output 6 shows a Cronbach's alpha of 0.6299 for this scale, which indicates good reliability. However, on deleting item B7 the reliability increases even further, as also seen in Output 6. This item may therefore be considered for deletion.
Output 6
(d) For Scale Burnout: Items of the scale are C1, C2, C3, C4, C5. Output 7 shows a Cronbach's alpha of 0.6818 for this scale, which indicates good reliability. However, on deleting items C1 and C5 the reliability increases considerably, as also seen in Output 7. These items may therefore be considered for deletion.
Output 7
SECTION B
1. The appropriate test to assess the homogeneity of the responses between males and females is multivariate analysis of variance (MANOVA). MANOVA is appropriate in this case because there are multiple (four) dependent variables, namely self-oriented perfectionism (SOP), socially prescribed perfectionism (SPP), satisfaction (SAT) and burnout (BO), with gender (male or female) as the independent variable.
2. The assumptions for MANOVA are:
(a) Cell sizes: it is necessary to have more subjects in each cell than the number of dependent variables.
(b) Univariate and multivariate normality
(c) Linearity
(d) Homogeneity of regression
(e) Homogeneity of the variance-covariance matrices
(f) Absence of multicollinearity and singularity
3. Examination of test assumptions:
I. The cell sizes are approximately equal. Thus this assumption is not violated.
II. A) Univariate normality: The skewness and kurtosis values (Output 8) of each of the four variables are below absolute values of 1 and 3 respectively, so there is no serious non-normality. Also, the histograms (Output 9) of each of these dependent variables are nearly bell shaped, indicating normality; the normal plot (Output 10) shows all the values arranging themselves along the line; and the quartile plot (Output 11) has the mean lines approximately at the centre, which also indicates normality.
Output 8
Output 9
Output 10
Output 11
B) Multivariate outliers: The Mahalanobis distances, when compared with the critical chi-square value for four dependent variables at an alpha of .001 (18.5), indicate no outliers.
III.
Linearity: The scatter plots among the pairs of dependent variables confirm linearity; for example, see Output 12 for self-oriented perfectionism and satisfaction.
Output 12
IV. Homogeneity of regression: Since there is no expected order of priority among the dependent variables, step-down analysis is not required, and thus there is no need to test this assumption.
V. The other two assumptions are tested in the MANOVA analysis itself.
4. Box's M test is used to assess the multivariate homogeneity of the variance-covariance matrices. For the assumption of homogeneity of the variance-covariance matrices to be met, this test should be non-significant at an alpha of .001. This stringent alpha level is used because Box's M test is very sensitive.
5. The four major multivariate techniques considered are:
Multiple regression: It helps in estimating dependence between variables. However, its weakness is that it can assess only one dependent variable at a time.
Multivariate analysis of variance (MANOVA): It is used to assess dependence involving multiple variables. Its strength is that it can assess multiple dependent variables at a time.
Factor analysis: It is used to analyse the structure of variables and their interdependence. This technique is useful only if the variables are interdependent.
Multidimensional scaling (MDS): Its strength is its analysis of the positioning of variables in an interpretable, multidimensional space.
In this case, as is evident from the question, we need to analyse the effect of the independent variable (gender) on four dependent variables. This is a dependence analysis with multiple dependent variables. Hence, MANOVA is the most appropriate test in this case.
6. Inferences from the MANOVA analysis conducted: Box's M test is non-significant, F = 1.701 at p = 0.074 > 0.05 (Output 13), and thus the data meet the assumption of homogeneity of the variance-covariance matrices.
The univariate tests for homogeneity of variance for each of the dependent measures indicate that homogeneity of variance has not been violated for socially prescribed perfectionism (SPP) and burnout (BO). However, for self-oriented perfectionism (SOP) and satisfaction (SAT), Levene's test of equality of error variances is significant at 0.003 and 0.000 (p < 0.05). The multivariate effect of gender is non-significant (F = 2.281, p = 0.065 > 0.05). Therefore, there is no need to look further into the univariate / between-subjects effects. Thus, it can be said that there is no difference between the male and female athletes on the measures of perfectionism, burnout and satisfaction.
Output 15
7. The aim of the analysis here was to establish whether there is any difference between male and female athletes on the measures of perfectionism, burnout and satisfaction. Multivariate analysis of variance (MANOVA) was employed because this is a dependence test involving four dependent variables. The preliminary analyses indicated that the variables are sufficiently normal and have no multivariate outliers. Box's M test was non-significant (F = 1.701 at p = 0.074 > 0.05) and thus indicated homogeneity of the variance-covariance matrices. Levene's test was non-significant for two variables but significant for the other two (significance values of 0.003 and 0.000, p < 0.05). The multivariate tests (F = 2.281, p = 0.065 > 0.05) confirm our initial hypothesis, thus verifying that 'there is no difference between the male and female athletes in measuring the variables of perfectionism, burnout and satisfaction.'
SECTION C
1. A moderator is a variable that alters the direction or strength of the relation between a predictor and an outcome. A moderating variable produces an interaction effect, in which the effect of one variable depends on the level of another. A moderator is therefore a third variable on which the relationship between the first two variables depends.
2.
An example in this research could be the moderating effect of athletes' level of satisfaction on the relationship between perfectionism and burnout. Here, the effect of perfectionism in predicting the level of burnout can differ between athletes with high and athletes with low levels of satisfaction.
3. The type of moderation in the above case can be described as a 'buffering interaction', because the greater the satisfaction, the weaker the effect of the predictor variable (perfectionism) on the outcome variable (burnout).
4. Interactions can be assessed using both ANOVA and regression. However, when both predictor and moderator are categorical variables, multiple regression is preferred because of the flexibility it provides in coding categorical variables. Also, when both are continuous, regression procedures that retain the continuous nature of the variables are preferred over ANOVA.
5. In a normal regression, a stepwise regression model is the most appropriate way to identify a set of significant predictors from a larger set of predictor variables. This is because it allows for the later removal of variables that were previously entered.
6. For moderation to be detected, certain conditions need to be fulfilled so as to maintain the power of the test of interactions.
(a) Rely on theory when planning moderator analyses.
(b) Use an experimental design whenever appropriate.
(c) Determine and obtain the sample size needed to achieve adequate power, based on estimated effect sizes and other factors.
(d) Attempt to collect equal numbers of participants for the different levels of a categorical variable.
(e) Test the homogeneity of error variance assumption and use appropriate tests if it is violated.
(f) Choose highly reliable continuous variables.
(g) Obtain measures of continuous predictor and moderator variables that are normally distributed, and use outcome measures that are both reliable and sufficiently sensitive (i.e., have enough scale points to capture the interaction).
In addition to the variables needed to test a moderator effect, some researchers may want to consider including covariates to control for the effects of other variables, to increase the overall R² and thereby increase power, or to estimate change in an outcome variable over time.
7. A number of assumptions underpin the use of regression:
(a) Ratio of cases to independent variables: ideally there should be 20 times more cases than predictors for standard or hierarchical regression, and even more for stepwise regression; the minimum is five times more cases than predictors.
(b) Outliers: both univariate and bivariate outliers should be deleted.
(c) Multicollinearity and singularity: these should be checked by examining the correlations among the independent variables.
(d) Normality, linearity, homoscedasticity and independence of residuals
8. Test assumptions: First of all, the number of cases is well over five times the number of independent variables, and thus this assumption is met. Because no univariate outliers were found, casewise plots were not necessary. Had they been produced, these plots would have identified outlying cases lying more than three standard deviations from the mean. The scatterplot of residuals against predicted values (for example, Output 16) shows no clear relationship between the residuals and the predicted values, which is consistent with the assumption of linearity.
Output 16
An examination of the Mahalanobis distance values indicates that there are only three multivariate outliers among the independent variables, which is not a substantial number; that is, few values are greater than or equal to the critical chi-square value of 18.47 at an alpha of .001. Output 17 below confirms a bell-shaped normal curve. The normal plot of regression standardised residuals (Output 18) for the dependent variable also indicates a relatively normal distribution.
Output 17
Output 18
9. Multicollinearity refers to high correlations among the independent variables. An extreme case of multicollinearity is singularity, which occurs when perfect correlations exist among the independent variables. These problems affect the interpretation of the relationships between the predictor (independent) variables and the dependent variable.
10. In moderated regression, multicollinearity is very likely because the predictor and moderator variables are generally highly correlated with the interaction terms created from them.
11. In moderated regression, multicollinearity problems can be reduced by centering or standardising the continuous predictor and moderator variables. This is done by putting them into deviation units, that is, subtracting their sample means to produce revised sample means of zero.
12. Results of Model Summary:
Output 19
The model summary of the hierarchical moderated multiple regression is shown in Output 19. The independent variables by themselves account for around 19.7 per cent of the variance in athlete burnout and are significant predictors. However, when the moderating effect of satisfaction on the perfectionism variables is included, only an additional 2 per cent (approx.) of variance is explained, which is non-significant.
13.
Results of ANOVA table:
Output 20
The F-statistics in the ANOVA table are significant both for the independent variables alone and with their moderation effect included, at F = 8.895 (p = 0.000 < 0.05) and F = 5.851 (p = 0.000 < 0.05) respectively.
14. Interpretation of coefficients table:
Output 21
Socially prescribed perfectionism (SPPC) and satisfaction (SATC) significantly predict burnout, with coefficient values of B = .451 (p = .001 < 0.05) and B = .405 (p = .003 < 0.05) respectively. However, the SPPC and SATC interaction (B = .231, p = .173 > 0.05) did not reach statistical significance.
15. From Outputs 19, 20 and 21 and their interpretations, it can be deduced that, contrary to the hypothesised relationship, satisfaction did not moderate the relationship between perfectionism and total athlete burnout.
16. Summary of Section A: In this section we assessed the reliabilities of the scales to be used in measuring the interaction of perfectionism with level of satisfaction in predicting burnout. Specifically, the satisfaction scale was tested and its structure analysed. Rotation was conducted to simplify the initial structure, and finally items B1, B3, B5 and B6 were deleted from the scale. The reliabilities of the SOP, SPP, SAT and BO scales were then assessed and found to be good.
Summary of Section B: The main purpose of this section was to investigate whether the measures to be tested differed significantly between male and female athletes. A multivariate analysis of variance (MANOVA) was employed. The assumption tests were all consistent with normality, linearity and the absence of multivariate outliers. Moreover, a non-significant Box's M test, F = 1.701 at p = 0.074 > 0.05, confirmed homogeneity of the variance-covariance matrices.
Finally, a non-significant multivariate effect of gender (F = 2.281, p = 0.065 > 0.05) confirmed that there is no difference between the male and female athletes on the measures of perfectionism, burnout and satisfaction.
Summary of Section C: The aim of this section was to examine the moderating effect of satisfaction on the relationship between the predictor variable (perfectionism) and the outcome variable (athlete burnout). A moderated hierarchical multiple linear regression was carried out. The assumptions of linearity, normality and absence of multivariate outliers were all met. However, the main test (B = .231, p = .173 > 0.05) showed no significant moderating effect of satisfaction on the relationship between perfectionism and burnout. The independent variables by themselves account for around 19.7 per cent of the variance in athlete burnout and are significant predictors. When the moderating effect of satisfaction on the perfectionism variables is included, only an additional 2 per cent (approx.) of variance is explained, which is non-significant.
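The two-step moderated regression summarised for Section C, together with the mean-centering described in item 11, can be illustrated with a minimal sketch. It uses simulated data with no true interaction; the variable names (`spp`, `sat`, `burnout`), the helper `fit_ols` and all numeric settings are assumptions for illustration only, and the sketch does not reproduce the SPSS analysis or the figures reported in Outputs 19 to 21.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients and R-squared."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2

# Simulated scores (assumption): predictor, moderator, and an outcome with main effects only
rng = np.random.default_rng(1)
n = 300
spp = rng.normal(50, 10, n)
sat = rng.normal(20, 5, n)
burnout = 0.4 * spp - 0.3 * sat + rng.normal(0, 8, n)

# Centering: subtract the sample means so the revised means are zero
spp_c = spp - spp.mean()
sat_c = sat - sat.mean()

# Step 1: main effects only. Step 2: add the product (interaction) term.
_, r2_step1 = fit_ols(np.column_stack([spp_c, sat_c]), burnout)
coef_step2, r2_step2 = fit_ols(np.column_stack([spp_c, sat_c, spp_c * sat_c]), burnout)

# R-squared change and its F statistic (1 added predictor, 3 predictors in step 2)
delta_r2 = r2_step2 - r2_step1
f_change = delta_r2 / ((1.0 - r2_step2) / (n - 3 - 1))

# Correlation of the predictor with the product term, before and after centering
r_raw = np.corrcoef(spp, spp * sat)[0, 1]
r_centered = np.corrcoef(spp_c, spp_c * sat_c)[0, 1]

print("R2 step 1:", round(r2_step1, 3), "R2 step 2:", round(r2_step2, 3))
print("R2 change:", round(delta_r2, 4), "F change:", round(f_change, 3))
print("predictor-product correlation, raw vs centered:", round(r_raw, 3), round(r_centered, 3))
```

Moderation is judged from the change in R² between the two steps (or, equivalently, from the coefficient on the product term), and the final line shows how centering reduces the correlation between the predictor and the interaction term built from it.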
