# Unit VI Major Multivariate Data Analysis Techniques for Business Research Mcom sem 4 Delhi University

During the last two or three decades, multivariate statistical analysis has become increasingly popular. The theory has made great progress, and with rapid advances in computer technology, multivariate statistical methods are now implemented routinely in several statistical software packages. This makes it simple even for the novice to undertake fairly sophisticated multivariate analysis of the data at their disposal.

While this is certainly a welcome development, many users of statistical packages do not fully understand what they are doing, and this is particularly true of multivariate statistical methods. With the increasing use of these methods by business analysts, it is important for business-major students to develop an understanding of them.

Even though business executives are not generally required to undertake sophisticated statistical analysis themselves, they are often presented with reports and articles based on such analysis. Furthermore, many executives have heard about these techniques and would like to use them as analytic and decision support tools. The traditional approach to the teaching of multivariate statistical analysis, as exemplified by Anderson (1958), relies heavily on advanced matrix mathematics.

On the other hand, Flury and Riedwyl (1988) suggest that it is possible to understand most of the basic ideas underlying multivariate statistical analysis without a mastery of such mathematics, provided that these ideas are conveyed with the help of real data sets. Since most business data do not follow the usual normality assumption, there are often (possibly severe) limitations on the use of some of the standard multivariate statistical techniques.

Real data sets are therefore required not only to illustrate the statistical techniques concerned, but also to clarify the assumptions needed for these techniques to be valid. In this paper, we propose a non-mathematical data-driven approach for teaching multivariate statistical methods to business-major students. Despite this, we are mindful of the need for students to know some basic linear algebra and univariate statistical concepts. Such basic knowledge provides students with the foundation necessary for the application of the appropriate multivariate statistical procedures and for the interpretation of results.

### Business data and the approach to analyse them

Business data sets are usually large and undifferentiated. They are therefore generally closer to data encountered in the social sciences and differ somewhat from the more precise data encountered in the physical and natural sciences. Large and undifferentiated data sets contain so many inter-relationships that it is virtually impossible to make sense of them without first arriving at a summary description. Multivariate statistical analysis, when applied to such data sets, should allow us to “explore” the data sets with a view to discovering, describing and understanding the major inter-relationships.

While multivariate statistical techniques allow us to analyse, verify, test, and prove various hypotheses, it should be emphasised that with a large undifferentiated data set, we should, at least in the initial stages, be less involved with building a specific statistical model, or with the formal procedures of statistical inference. Thus, when teaching multivariate statistical analysis to business-major students, we should concentrate on the descriptive techniques rather than on the theory of the multivariate normal distribution. Teaching should begin with simple univariate statistical analysis and lead to standard multivariate techniques such as principal component analysis, factor analysis, and cluster analysis, which are applicable when the data are measured on a continuous or interval scale. Once the basic ideas have been conveyed, it would be a simple matter to introduce students to other related techniques, such as correspondence analysis and multi-dimensional scaling.

### Illustrative example

To illustrate the approach outlined above, we consider the sample data set ‘CARDATA’ provided with the computer package STATGRAPHICS. The data set comprises 155 observations on 11 variables: mpg, cylinders, displace, horsepower, accel, year, weight, origin, make, model and price. This data set is small enough for most computer packages to handle and large enough to approximate the type of data encountered in practice.

Faced with such a data set, students are generally at a loss to know how to begin to analyse it, particularly when the objectives of the analysis have not been specified. An obvious approach would be to begin by examining each of the variables in turn. Such an initial examination of the data has the merit of enabling students to obtain a better feel for the data. For example, on examining the variable mpg, we might consider the summary statistics tabulated in Table 1.

TABLE 1 Summary statistics for mpg

| Statistic | Value |
|---|---|
| Number of observations | 154 |
| Mean (mpg) | 28.79 |
| Standard deviation (mpg) | 7.37 |
| Coefficient of variation (%) | 25.62 |
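The statistics in Table 1 can be reproduced with a few lines of code. The following is a minimal sketch, assuming the sample (n − 1) standard deviation and a coefficient of variation defined as the standard deviation expressed as a percentage of the mean. The function name `summarize` and the readings in `sample_mpg` are hypothetical and are not taken from CARDATA.

```python
import math


def summarize(values):
    """Return count, mean, sample standard deviation and CV (%) of the non-missing values."""
    observed = [v for v in values if v is not None]  # drop missing values first
    n = len(observed)
    mean = sum(observed) / n
    variance = sum((v - mean) ** 2 for v in observed) / (n - 1)  # sample variance
    sd = math.sqrt(variance)
    return {"n": n, "mean": mean, "sd": sd, "cv": 100 * sd / mean}


# Hypothetical mpg readings (not from CARDATA); None marks a missing value.
sample_mpg = [18.0, 24.5, 31.0, None, 36.1, 27.2]
stats = summarize(sample_mpg)
print(stats)

# The same ratio applied to the rounded figures reported in Table 1.
table1_cv = 100 * 7.37 / 28.79
print(round(table1_cv, 1))
```

Applied to the rounded mean and standard deviation in Table 1, the ratio comes to about 25.6, close to the tabulated 25.62. The small gap is presumably a rounding effect.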

To enable students to familiarise themselves with the data, more elaborate univariate analysis can be undertaken. For example, students may construct histograms, plot stem-and-leaf diagrams, etc. Note also that because some variables may have missing values, the total number of observations reported may be less than 155. Once the initial examination of data has begun, students might be led to raise further questions.

For example, how does mpg vary with other variables? If the other variable in question is a classifying (i.e. nominal-scaled) variable, the observations can be grouped before analysis. As an illustration, consider the case when the other variable is origin. Table 2 presents the summary statistics.

TABLE 2 Summary statistics for mpg by origin

| Statistic | United States | Europe | Japan |
|---|---|---|---|
| Number of observations | 85 | 25 | 44 |
| Mean (mpg) | 25.26 | 32.55 | 33.48 |
| Standard deviation (mpg) | 6.12 | 8.18 | 5.27 |
| Coefficient of variation (%) | 24.22 | 25.13 | 15.75 |
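Grouping by a nominal variable before summarising, as in Table 2, can be sketched as follows. The helper `group_means` and the records in `cars` are hypothetical illustrations, not CARDATA itself. The second half uses only the counts and means reported in Table 2 to check that they are consistent with the overall figures in Table 1.

```python
from collections import defaultdict


def group_means(records, key, value):
    """Group records by `key` and return (count, mean) of `value` per group, skipping missing values."""
    groups = defaultdict(list)
    for record in records:
        if record[value] is not None:
            groups[record[key]].append(record[value])
    return {name: (len(vals), sum(vals) / len(vals)) for name, vals in groups.items()}


# Hypothetical records (not from CARDATA); None marks a missing mpg value.
cars = [
    {"origin": "Japan", "mpg": 32.0},
    {"origin": "Japan", "mpg": 34.0},
    {"origin": "Europe", "mpg": 30.0},
    {"origin": "United States", "mpg": None},
    {"origin": "United States", "mpg": 24.0},
]
by_origin = group_means(cars, key="origin", value="mpg")
print(by_origin)

# Consistency check using the (count, mean) pairs reported in Table 2.
table2 = {"United States": (85, 25.26), "Europe": (25, 32.55), "Japan": (44, 33.48)}
total_n = sum(n for n, _ in table2.values())
pooled_mean = sum(n * m for n, m in table2.values()) / total_n
print(total_n, round(pooled_mean, 2))
```

The group sizes add up to the 154 observations of Table 1, and the size-weighted average of the three group means recovers the overall mean of 28.79 mpg.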

These summary statistics suggest that Japanese cars, which have the highest mean mpg, are more economical than American or European cars. Students will then be led naturally to consider the possible reasons for this. Is it the case that Japanese cars are smaller or less powerful? Consideration of such questions will lead them to examine the variables weight and horsepower. Summary statistics presented in tables similar to Table 2 suggest that it is indeed the case that Japanese cars are lighter (i.e. smaller) and less powerful.

### Regression and correlation analysis

The simple initial examination of data described above will prompt students to consider the relationship between two or more variables. They now have reason to believe that Japanese cars are more fuel-economical because they are lighter and less powerful. Students will then be led to consider the relationship between fuel economy (mpg) and weight or horsepower. The suggestion of plotting a scatterplot of mpg against weight or horsepower can then be given. Students may then attempt to draw a “line of best fit” through the scatterplot. Figure 1, which shows the result of this exercise, indicates that mpg decreases as weight increases. Students could then be asked to consider how best to fit the line and thus be led to “least squares estimation”, obtaining the line of best fit:

mpg = 55.89 − 0.0101 × weight

This suggests that for every 1 lb increase in weight, fuel economy decreases by about 0.01 mpg. Furthermore, students can be led to ask how much of the variation of mpg can be explained by the variable weight. This leads naturally to R², which is the proportion of the variation of the variable mpg that is “explained” by weight.

In this case, R² = 0.6874, suggesting that 68.74% of the variation in mpg may be explained by the linear regression on weight.
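The least squares fit described above can be sketched in a few lines. This is only an illustration under stated assumptions: `fit_line` and `predict_mpg` are hypothetical helper names, and the four data points are made up so that they lie exactly on a straight line. Only the default coefficients in `predict_mpg` (55.89 and −0.0101) come from the fitted line quoted in the text.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x with one explanatory variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_mpg(weight_lb, intercept=55.89, slope=-0.0101):
    """Predicted mpg from the fitted line quoted in the text."""
    return intercept + slope * weight_lb


# Hypothetical data lying exactly on mpg = 50 - 0.01 * weight (not from CARDATA).
weights = [2000, 2500, 3000, 3500]
mpgs = [30.0, 25.0, 20.0, 15.0]
intercept, slope = fit_line(weights, mpgs)
print(intercept, slope)

# Using the line from the text: a 3000 lb car is predicted at roughly 25.6 mpg.
print(predict_mpg(3000))
```

Comparing `predict_mpg(3000)` with `predict_mpg(3001)` shows the interpretation of the slope directly: one extra pound lowers the predicted value by about 0.01 mpg.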

While discussing R², it can be explained that in the simple linear regression case (when there is only one explanatory variable), R² is just the square of the sample linear correlation coefficient. In this way, students are introduced naturally to the fact that the correlation coefficient is a measure of the closeness of the sample observations to a straight line. Once this idea is conveyed, students can discuss more complicated issues such as the testing of hypotheses about the parameters of the straight line as well as the assumptions needed to ensure that such tests are valid. Indeed, the appropriateness of the underlying assumptions of the linear regression model can also be examined by careful study of the scatter diagram, which would lead students to consider residual analysis and issues such as normality, outliers, etc.
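The identity between R² and the squared correlation coefficient can be checked numerically. The sketch below computes R² from the residuals of a least squares line and, separately, the sample correlation coefficient, and then compares the two. The function name and the data are hypothetical and are not drawn from CARDATA.

```python
import math


def r_squared_and_correlation(x, y):
    """Return (R^2 from the least squares residuals, sample correlation coefficient r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - ss_residual / syy          # proportion of variation explained
    r = sxy / math.sqrt(sxx * syy)      # sample linear correlation coefficient
    return r2, r


# Hypothetical weights (lb) and mpg values with some scatter around a line.
weights = [2000, 2400, 2900, 3300, 3800]
mpgs = [36.0, 31.5, 27.0, 24.5, 18.0]
r2, r = r_squared_and_correlation(weights, mpgs)
print(r2, r ** 2)

# For the regression in the text, R^2 = 0.6874 and the slope is negative,
# so the implied correlation between mpg and weight is about -0.83.
implied_r = -math.sqrt(0.6874)
print(round(implied_r, 3))
```

Note that R² alone does not reveal the direction of the relationship. The sign of the correlation has to be taken from the slope of the fitted line.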

Raw data, as they are received from the field in primary data collection, are in no condition for interpretation. Such data constitute bits of information recorded on many individual forms, and substantial work must be done on them. Therefore, the bits of raw data must be transformed into information that will answer the researcher’s study objectives. The decisions made about these preparatory steps are based on the assumptions involving general logic about the interpretative process and about the supposed nature of the data relative to the appropriate analysis. The transformation of raw data into useful information requires that the data be validated, edited, coded, and keypunched so that it may be transferred to a computer or any other data storage device.

If the amount of data gathered is large, there are many advantages in using a computer for data processing. Marketing researchers need to know enough about computer systems to communicate with computer technicians so that data requirements can be met correctly and efficiently. Everyone in an administrative position, in any type of organization, and everyone doing any sort of research should have this knowledge, since computers have become a universal tool of management.

### Data-processing Methods

The total task of data processing in carrying out the analytical program is to convert the crude fragments of observations and responses that have just been coded into orderly statistics that are ready for interpretation. Methods of processing data fall into two types: manual and computer. Electronic methods other than computers do exist but are no longer used widely enough to warrant mention. Each method has its own advantages and disadvantages, and a brief discussion of each will enable you to grasp the implications of using a particular method. Manual methods can themselves be divided into two types.

The first, tallying, is done completely by hand: each response is entered in the appropriate category on a worksheet. In this simple method, the ‘sorting’ is done individually for each observation by selecting the line on which to tally it. Tallying tends to be more accurate when two persons work on it, one calling off the responses while the other tallies. The second, sort-and-count, is exactly what its name says: first, sort all questionnaires or data forms into piles, one for each answer category; then, count each pile. This avoids the tallying danger of making entries on the wrong line and can be speedier, provided that it is easy to read and sort the entries for all questions and categories. A variant of sort-and-count is Keysort, a copyrighted name of Litton Industries, which uses a standard card that can be sorted and counted manually with simple equipment.

Along the edges of this card are rows of holes that may be designated as fields and given code numbers. At the appropriate places for the observed data, the margin is punched to make a notch. When all the cards are notched and assembled so that the holes are in line, a rod is inserted through the hole representing the data category being counted. When the rod is raised, the cards notched at that hole fall away; they are then counted. Keysort is a quicker and more accurate method.
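Although these are manual methods, their logic is easy to express in code, which may help in comparing them. The sketch below is an illustration only: the function names, the coded responses and the card layout are all hypothetical. `tally` handles one response at a time, `sort_and_count` first builds a pile per category and then counts the piles, and `needle_sort` mimics the Keysort step in which the cards notched at a chosen hole fall off the rod.

```python
from collections import Counter, defaultdict

# Hypothetical coded responses to a single question.
responses = ["yes", "no", "yes", "undecided", "yes", "no"]


def tally(items):
    """Tallying: take each response in turn and add a mark on its category line."""
    counts = Counter()
    for item in items:
        counts[item] += 1
    return dict(counts)


def sort_and_count(items):
    """Sort-and-count: place each form on its category pile, then count every pile."""
    piles = defaultdict(list)
    for form_number, item in enumerate(items):
        piles[item].append(form_number)
    return {category: len(pile) for category, pile in piles.items()}


def needle_sort(cards, category):
    """Keysort-style selection: cards notched at `category` fall, the rest stay on the rod."""
    fallen = [card for card in cards if category in card["notches"]]
    remaining = [card for card in cards if category not in card["notches"]]
    return fallen, remaining


# One hypothetical card per response, notched at the hole for its answer category.
cards = [{"id": i, "notches": {answer}} for i, answer in enumerate(responses)]
fallen, remaining = needle_sort(cards, "yes")

print(tally(responses))
print(sort_and_count(responses))
print(len(fallen), len(remaining))
```

Both counting methods give the same totals. They differ in the intermediate step, which is where the risk of entering a tally on the wrong line arises in the manual version.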