Plus One Economics Chapter 13

Chapter 13 :-

Organisation of Data

Plus One Economics Short Notes on Chapter 13 Organisation of Data

Introduction

The data collected in its original form is unorganised. Hence we call it raw data. This raw data is to be organised or classified so that it will become meaningful for the purpose of further statistical analysis.

Have you ever observed sorting of letters in a post office?

Letters collected in a post office are sorted into different lots on a geographical basis. They are then put in separate bags, each containing letters with a common characteristic, viz., having the same destination. In other words, they are classified to form groups of homogeneous character.

Similarly, when you arrange your books in a certain order, it will be easier for you to handle them. You may group (classify) them according to subjects. In such a case, each subject becomes a group or a class. If you require a book, say, Economics I, all you need to do is search for it in the group ‘Economics’. Otherwise, you have to search through the entire collection to find the particular book you require.

The activity taking place in the above instances is called classification. Similarly, the raw data collected have to be organised or classified to make them useful for statistical interpretation.

Raw Data

Data collected in their original form are highly disorganised. They are often very large and cumbersome to handle. It is a tedious task to draw meaningful conclusions from such raw data. Therefore, proper organisation and presentation of such data is required before any systematic statistical analysis is undertaken. Hence, after collecting data, the next and most important step is to organise and present them in a classified form.

Suppose you want to know the performance of students in Economics I. You collect data on the marks in Economics I of 100 students of your school. The data presented in a table will appear as follows:

Table 13.1 Marks in Economics I of 100 Students
46	44	11	10	50	55	48	36	88	41
61	58	57	56	57	48	54	57	100	40
43	69	63	60	59	58	59	64	65	51
65	29	37	51	54	55	71	81	57	91
70	50	52	49	48	49	70	71	54	54
48	46	49	54	55	54	60	59	56	66
49	45	63	63	62	61	59	59	49	65
65	41	26	25	24	27	18	24	39	23
45	52	45	44	43	45	46	12	24	34
47	45	58	57	59	58	59	60	74	24

The data shown in Table 13.1 are raw data or unclassified data. The numerical values are not shown in any order. If we want to know how many students got marks above 40, or between 40 and 60, etc., it is very difficult to find this out from the table. For proper analysis we have to classify the data. By classification, units having a common characteristic are placed in one class, and the whole data are thus divided into a number of classes.
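A short Python sketch shows how questions like these can be answered once the marks are held in a list. The variable names are illustrative and not part of the notes; the data are the 100 marks of the table above, copied row by row.

```python
# Marks in Economics I of 100 students, copied row by row from the table.
marks = [
    46, 44, 11, 10, 50, 55, 48, 36, 88, 41,
    61, 58, 57, 56, 57, 48, 54, 57, 100, 40,
    43, 69, 63, 60, 59, 58, 59, 64, 65, 51,
    65, 29, 37, 51, 54, 55, 71, 81, 57, 91,
    70, 50, 52, 49, 48, 49, 70, 71, 54, 54,
    48, 46, 49, 54, 55, 54, 60, 59, 56, 66,
    49, 45, 63, 63, 62, 61, 59, 59, 49, 65,
    65, 41, 26, 25, 24, 27, 18, 24, 39, 23,
    45, 52, 45, 44, 43, 45, 46, 12, 24, 34,
    47, 45, 58, 57, 59, 58, 59, 60, 74, 24,
]

# Count the students who scored above 40.
above_40 = sum(1 for m in marks if m > 40)

# Count the students who scored 40 or more but less than 60.
from_40_to_60 = sum(1 for m in marks if 40 <= m < 60)

print("Students:", len(marks))
print("Above 40:", above_40)
print("From 40 to below 60:", from_40_to_60)
```

Counting by eye over the unsorted table is slow and error-prone, which is why the data are classified before analysis.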

Classification of Data

Classification of data stands for grouping of related facts into classes. Facts in one class differ from those of another class with respect to some characteristics they possess. These characteristics form the basis of classification. The groups or classes of a classification can be formed in many ways.

Objectives of Classification

To condense the data for easy understanding
To help comparison
To eliminate unnecessary details
To make decision making possible
To enable further statistical treatments
To identify main features of the data

Types of Classification

Data can be classified on the following four bases:

Geographical, i.e., area wise
Chronological, i.e., on the basis of time
Qualitative, i.e., according to some attributes
Quantitative, i.e., in terms of magnitudes

i. Chronological Classification

It is the arrangement of data in ascending or descending order with reference to time. When data are observed over a period of time the type of classification is known as chronological classification. For example, population of India may be observed for a number of years and shown timewise.

13.2 Chronological Classification
Year	Population
2009	567
2010	638
2011	736
2012	758

ii. Geographical Classification

In this type, data are classified on the basis of geographical differences between the various items. It is also called spatial classification. For example, the population of India can be shown state-wise. This is a geographical classification.

It is the arrangement of data with reference to geographical location such as countries or states. The production of rice in different states of India is given in the table below.

13.3 Geographical Classification
States	Production of Rice
Andhra Pradesh	1200
Tamil Nadu	950
Kerala	830

iii. Qualitative Classification

Under this method, data are classified on the basis of some quality or attribute such as sex, colour of hair, literacy, religion, etc. Such attributes are not measurable; only their presence or absence can be known.

Classification according to attributes may be (a) simple or (b) manifold.

In simple classification, data are divided on the basis of only one attribute. For example, the population under study may be divided into two categories on the basis of sex, as male and female. In manifold classification, data are divided on the basis of more than one attribute. For example, the population of India may be divided on the basis of sex and literacy, so that there are four groups: (1) male literate, (2) male illiterate, (3) female literate and (4) female illiterate.
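The difference between simple and manifold classification can be sketched in a few lines of Python. This is only an illustration; the list names are assumptions, and the attribute values are the ones used in the example above.

```python
from itertools import product

# One attribute: simple classification.
sexes = ["male", "female"]
simple_groups = list(sexes)

# Two attributes: manifold classification.
literacy = ["literate", "illiterate"]

# Every combination of the two attributes forms one group.
manifold_groups = [f"{sex} {status}" for sex, status in product(sexes, literacy)]

print(simple_groups)
print(manifold_groups)
```

With two attributes of two values each, the number of groups is 2 × 2 = 4, which matches the four groups listed in the text.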

13.4 Qualitative Classification
States	Literacy
Kerala	99.5%
Karnataka	95.6%
Bihar	68%

iv. Quantitative Classification

Quantitative classification refers to the classification of data according to some quantitative measurement, such as height, weight, etc.

13.5 Quantitative Classification
Companies	Sales
Hyundai	800
Tata	638
Maruti	736

Quantitative Data:

Data that can be measured numerically, e.g., income, production, price, cost.

Qualitative Data:

Data that cannot be measured numerically, e.g., health, intelligence, ability. Such characteristics are also termed attributes.

Before discussing the process of classification, let us consider certain terms which are commonly used in our study.

Variables and Attributes

Variation is the order of the day. People have different life styles, habits, physical features, age, income, etc. Characteristics like height, weight, etc., are called quantitative characteristics while characteristics like sex, colour of hair, literacy, religion, etc., are called qualitative characteristics. A characteristic that can be measured numerically is called a quantitative characteristic. A characteristic that cannot be numerically measured but can only be expressed on the basis of quality or attributes is called qualitative characteristic. A quantitative characteristic which varies from unit to unit is a variable or variate. Thus weight, height, etc., are variables. Here we shall discuss the variation in characteristics which can be expressed quantitatively.

Continuous and Discrete Variables

In the last chapter you learnt the term variable, but that does not tell you how a variable varies. A variable is a characteristic whose value is capable of changing from unit to unit. Suppose the weight of one student in a class is 45 kg and that of another student is 52 kg. The quantitative characteristic, i.e., weight, changes its value from unit to unit. Hence weight is a variable.

Different variables vary differently. In other words, they differ on the basis of specific criteria. They are broadly classified into two types:

CONTINUOUS
DISCRETE

A continuous variable is one that can take any numerical value. It can take integers (3, 4, 5, 6, …), fractional values (\( \frac{1}{2} \), \( \frac{3}{4} \), \( \frac{2}{3} \), …), and values that are not exact fractions (irrational numbers) such as √3 or √7. Take the example of the height of a student. As the student grows, say, from 80 cm to 150 cm, his or her height takes all the values in between. It can take whole numbers like 85 cm, 100 cm, 112 cm, 145 cm, etc. It can also take fractional values like 91.45 cm, 103.35 cm, 148.89 cm, etc., which are not whole numbers.

Other examples of a continuous variable are weight, time, distance etc.

Unlike continuous variables, discrete variables are those which can take only certain values. Those values are isolated and discontinuous. The value of a discrete variable changes only by finite jumps: it jumps from one value to another but does not assume any intermediate value between them. For example, a variable like the number of employees in a firm, observed for different firms, would take only whole-number values. It cannot take a fractional value like 0.5, because half an employee is absurd. Therefore it cannot take a value like 28.5 between 28 and 29; it must be either 28 or 29. As its value changes from 28 to 29, the values in between, i.e., the fractions, are skipped.

The number of students in each Plus One class in your school is an example of a discrete variable. But we should not be under the impression that a discrete variable cannot take any fractional value. Suppose X is a variable that takes values like \( \frac{1}{7} \), \( \frac{1}{8} \), \( \frac{1}{32} \), \( \frac{1}{46} \), … Is it a discrete variable or a continuous variable? It is definitely a discrete variable, because although X takes fractional values, it cannot take any value in between \( \frac{1}{7} \) and \( \frac{1}{8} \) or between \( \frac{1}{8} \) and \( \frac{1}{32} \). In other words, X cannot take continuous values.

Statistical Series

When the items collected are arranged according to some logical order, they form a series. Statistical series may be divided into three types on the basis of their construction:

Individual series
Discrete series
Continuous series

Individual Series (Simple Array)

In this type the items are listed singly, showing the observations relating to them. Each value of the variable occurs usually once. It can be arranged either in ascending or descending order. It may also be called a simple array. For example, the wages earned by 3 workers a day can be shown in an individual series as follows:

13.6 Individual Series
Number of workers	Wage (Rs)
1	500
2	600
3	550

Discrete Series (Frequency Array)

Certain items occur many times in the data. In a discrete series the items are arranged in ascending or descending order, indicating the number of times each item occurs. A discrete series is also called a frequency array. In a discrete series, the statistical unit is either not divisible or is not divided. Each class is distinct and different from the other classes.

13.7 Discrete Series
Number of Children per couple	Number of Couples (Frequency)
0	21
1	19
2	10
Total	50

Continuous Series

In a continuous series, different values of the variable are stated in a continuous manner along with their frequencies. The statistical unit is capable of division and can be measured in fractions of any size. The values are expressed in class intervals, which are continuous from beginning to end.

13.8 Continuous Series
Marks (Class)	Number of Students (Frequency)
0 – 10	5
10 – 20	10
20 – 30	17
30 – 40	13
40 – 50	5
Total	50

The Array

The first step in organising raw data is to arrange them by magnitude. A mass of raw data put into an orderly arrangement by magnitude (ascending or descending order) is called an array. The following example will make this clear.

Suppose, the raw data obtained from a business unit with regard to the daily wages in rupees of 20 workers are as follows:

20, 35, 31, 33, 30, 27, 36, 21, 39, 23

34, 41, 47, 32, 46, 49, 42, 43, 52, 50

The raw data when arranged in ascending and descending order is shown below:

13.9 Array in Ascending Order
Rs	Rs
20	36
21	39
23	41
27	42
30	43
31	46
32	47
33	49
34	50
35	52

13.10 Array in Descending Order
Rs	Rs
52	35
50	34
49	33
47	32
46	31
43	30
42	27
41	23
39	21
36	20

A look at the arrayed figures in Table 13.9 or 13.10 gives us the lowest wage (Rs. 20) and the highest wage (Rs. 52). We also know that the range, i.e., the difference between the highest and the lowest wage, is Rs. 32 (Rs. 52 – Rs. 20). We also notice a concentration of wages between Rs. 30 and Rs. 40.

An array is useful when the number of items in the raw data is small. But if the items run to hundreds or thousands, an array is difficult to handle and time-consuming to prepare. Hence the data need to be condensed, and that is the second step in organising data. Here condensation or simplification of data is done through a process of classification into groups or classes.
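The array and the figures read off from it can be reproduced with a minimal Python sketch. The variable names are illustrative; the wages are the 20 values given above.

```python
# Daily wages (in rupees) of 20 workers, as given in the raw data.
wages = [20, 35, 31, 33, 30, 27, 36, 21, 39, 23,
         34, 41, 47, 32, 46, 49, 42, 43, 52, 50]

# An array is simply the data sorted by magnitude.
ascending = sorted(wages)
descending = sorted(wages, reverse=True)

# The lowest and highest values sit at the two ends of the array.
lowest, highest = ascending[0], ascending[-1]
wage_range = highest - lowest

# Wages lying between Rs. 30 and Rs. 40 (both ends included).
between_30_and_40 = [w for w in ascending if 30 <= w <= 40]

print(ascending)
print(descending)
print("Lowest:", lowest, "Highest:", highest, "Range:", wage_range)
print("Between 30 and 40:", between_30_and_40)
```

Eight of the 20 wages fall between Rs. 30 and Rs. 40, which is the concentration noted in the text.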

The Frequency Array

While making an array it is possible that some values occur repeatedly. The number of times a value or item occurs in a series is called its frequency. If we mark the number of times each value appears in the series, we get what is known as a frequency array. The frequency array is useful only when the number of items in the raw data is small. It exhibits the frequency of observations and indicates the concentration of items around certain values.

Let us arrange the following raw data of daily wages (in rupees) of 20 employees in a factory into a frequency array.

50, 54, 50, 55, 56, 54, 50, 57, 50, 55

56, 54, 54, 50, 54, 56, 55, 54, 50, 56

13.11 Frequency Array
Daily Wages	No. of Employees
50	6
54	6
55	3
56	4
57	1
Total	20
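The same frequency array can be built with Python's standard `collections.Counter`. This is a sketch for illustration only; the variable names are assumptions.

```python
from collections import Counter

# Daily wages (in rupees) of 20 employees, as given above.
wages = [50, 54, 50, 55, 56, 54, 50, 57, 50, 55,
         56, 54, 54, 50, 54, 56, 55, 54, 50, 56]

# Counter records how many times each distinct value occurs.
frequency = Counter(wages)

# Print the frequency array in ascending order of wage.
for wage in sorted(frequency):
    print(wage, frequency[wage])

total = sum(frequency.values())
print("Total", total)
```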

Frequency Distribution

A frequency distribution is an orderly arrangement of data classified according to the magnitude of observations. When data are grouped into classes of appropriate size indicating the number of observations in each class we get a frequency distribution. For example, the students of a college may be classified according to weight as follows:

13.12 Frequency Distribution of Weights
Weight (in Kg)	No. of Students
40 – 45	40
45 – 50	110
50 – 55	35
55 – 60	240
60 – 65	355
65 – 70	20
Total	800

There are two elements, viz., (1) the variable, i.e., the weight and (2) the frequency.

Construction of Frequency Distribution

Selection of Class

There is no hard and fast rule to determine number of classes
A class should not be too big or too small
There should be neither too many classes nor too few

Example:- 0 – 10, 10 – 20, 20 – 30…etc

Class Limit

The class limits are the lowest and the highest values that can be included in the class.
It is the two ends of a class.
In class 20 – 30, 20 is called the lower class limit and 30 is called upper class limit.

Class Interval

It is the difference between the upper and lower class limits.
Class interval is also known as class width or class size.
The class interval of the class 50 – 100 is 50 (100 – 50 = 50)

Class Midpoint

It is the middle value of a class. It is also known as mid value or class mark.
It lies half way between the lower and upper class limits of a class.

Magnitude of Class Interval

The difference between lower and upper class boundaries is called the magnitude of a class interval

Class frequency

The number of observations corresponding to a particular class is known as the class frequency.

Construction of Frequency Distribution

The following technical terms are important when a frequency distribution is formed:

Selection of class: The quality of a frequency distribution is determined by a wise choice of the number of classes. There is no hard and fast rule to determine the number of classes. Ordinarily, a frequency distribution should not contain more than 20 to 25 classes and not fewer than 6 to 8 classes, depending on the total number of items in the series. Suppose, in an example, 100 entries are given and the lowest value is 3 and the highest 96. In such a case we can have 10 classes: 0 – 10, 10 – 20, …, 90 – 100.
Class limits: The class limits are the lowest and the highest values that can be included in the class. For example, if we take the class 20 – 30, the lowest value 20 is the lower limit and 30 is the upper limit.
Class intervals: The difference between the upper and lower limits is known as the class interval. In the class 100 – 200, the class interval is 100 (i.e., 200 minus 100).
Class mid-point or class mark: It is the value lying half-way between the lower and upper class limits of a class interval. Class mark = 1/2 (lower limit + upper limit).
Magnitude of class interval: The difference between the lower and upper class boundaries is called the magnitude of a class interval.
Class frequency: The number of observations corresponding to a particular class is known as the frequency of that class or the class frequency.
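The terms above can be tried out in a small Python sketch. The helper names `make_classes`, `class_interval` and `class_mark` are hypothetical and chosen only for this illustration; the numbers come from the examples in the list (lowest value 3, highest 96, class width 10).

```python
def make_classes(lowest, highest, width):
    """Return (lower limit, upper limit) pairs of equal width covering the data."""
    # Start at the multiple of `width` at or below the lowest value.
    lower = (lowest // width) * width
    classes = []
    while lower <= highest:
        classes.append((lower, lower + width))
        lower += width
    return classes


def class_interval(lower, upper):
    """Class interval (width) = upper limit - lower limit."""
    return upper - lower


def class_mark(lower, upper):
    """Class mark (mid-point) = 1/2 (lower limit + upper limit)."""
    return (lower + upper) / 2


classes = make_classes(lowest=3, highest=96, width=10)
print(len(classes), "classes:", classes)
print("Interval of 100 - 200:", class_interval(100, 200))
print("Mid-point of 20 - 30:", class_mark(20, 30))
```

The width of 10 is a choice, not a rule; a different width would give a different number of classes for the same data.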

How to find Frequency of distribution?

We have seen that frequency means the number of times a value or item occurs; to get it, we count the number of times each value of the variable is repeated in the data. If the data set is large, simple counting invites errors. To avoid this we use the method of tally marks. Tally marks are short strokes (/) used for counting.

Let us create a frequency distribution for the following data.

70, 54, 35, 45, 45, 73, 56, 46, 3, 42, 43, 43, 43, 36, 47, 23, 57, 45, 25, 43, 55, 21, 65, 78, 39, 28, 42, 21, 27, 70, 23, 85, 41, 71, 24, 43, 17, 26, 56, 39, 87, 43, 8, 38, 12, 71, 68, 28, 47, 23, 67, 60, 34, 59, 2, 77, 91, 56, 28, 43, 40, 21, 80, 56, 55, 51, 34, 58, 28, 28, 54, 34, 68, 30, 45, 24, 32, 34, 21, 54, 7, 16, 49, 32, 26, 21, 5, 26, 29, 37, 34, 21, 29, 71, 35, 8, 34, 20, 21, 80.

Using tally marks, we can create a frequency distribution. First we draw a table with three columns: the class in the first column, the tally marks in the second and the frequency in the third. The first column is filled in with all the classes. Now look at the data given. The first entry is 70, which falls in the class 70 – 80. Strike off the entry 70 in the data and put a tally mark in the second column against the class 70 – 80. The second entry is 54, which falls in the class 50 – 60. Strike off the entry 54 and put a tally mark against the class 50 – 60. Repeat this process until every entry in the data has been struck off. One more thing to notice: after four tally marks have been placed side by side, the fifth is drawn across them, so that together they form a block of 5. The sixth tally mark is placed after a small gap from the first block. The table below was completed by following this process.

13.13 Frequency Distribution with Tally Mark
Class	Tally Marks	Frequency
0 – 10	///// /	6
10 – 20	///	3
20 – 30	///// ///// ///// ///// /////	25
30 – 40	///// ///// ///// /	16
40 – 50	///// ///// ///// ////	19
50 – 60	///// ///// ///	13
60 – 70	/////	5
70 – 80	///// ///	8
80 – 90	////	4
90 – 100	/	1
Total		100
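The tally procedure can be sketched in Python as follows. The function name `tally` is hypothetical, and a completed block of five is printed here as five strokes (`/////`), which is only one way of showing the crossed block in plain text. The sketch assumes classes of width 10 and that every value is below 100, as in the data above.

```python
# The 100 observations given above.
data = [
    70, 54, 35, 45, 45, 73, 56, 46, 3, 42,
    43, 43, 43, 36, 47, 23, 57, 45, 25, 43,
    55, 21, 65, 78, 39, 28, 42, 21, 27, 70,
    23, 85, 41, 71, 24, 43, 17, 26, 56, 39,
    87, 43, 8, 38, 12, 71, 68, 28, 47, 23,
    67, 60, 34, 59, 2, 77, 91, 56, 28, 43,
    40, 21, 80, 56, 55, 51, 34, 58, 28, 28,
    54, 34, 68, 30, 45, 24, 32, 34, 21, 54,
    7, 16, 49, 32, 26, 21, 5, 26, 29, 37,
    34, 21, 29, 71, 35, 8, 34, 20, 21, 80,
]

width = 10
frequencies = [0] * 10  # one counter per class: 0-10, 10-20, ..., 90-100

# Each value adds one tally to the class it falls in.
for value in data:
    frequencies[value // width] += 1


def tally(count):
    """Render a count as blocks of five strokes followed by the remainder."""
    blocks, rest = divmod(count, 5)
    groups = ["/////"] * blocks
    if rest:
        groups.append("/" * rest)
    return " ".join(groups)


for i, freq in enumerate(frequencies):
    print(f"{i * width} - {(i + 1) * width}\t{tally(freq)}\t{freq}")
print("Total\t\t", sum(frequencies))
```

Integer division by the class width places a value such as 70 in the class 70 – 80, which is the same placement used in the worked description above.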

Exclusive Method

When the class intervals are so fixed that the upper limit of one class is the lower limit of the next class, it is known as the exclusive method of classification. The classes are written, for example, as 5-10, 10-15, etc. Here a value of 10 is not included in the first class 5-10; it is included in the second class, 10-15.

13.14 Exclusive Classes
Marks (Class)
0 – 10
10 – 20
20 – 30

Inclusive Method

Under the inclusive method of classification, the upper limit of a class is included in that class itself. The classes under this method are written, for example, as 5-9, 10-14, etc. Here a value of 9 is included in the first class 5-9.

13.15 Inclusive Classes
Marks (Class)
0 – 9
10 – 19
20 – 29
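The two methods differ only in how the upper limit is treated, which a short Python sketch can make concrete. The function names are hypothetical; the classes are the ones used as examples in the two sections above.

```python
def find_class_exclusive(value, classes):
    """Exclusive method: the upper limit belongs to the next class."""
    for lower, upper in classes:
        if lower <= value < upper:
            return (lower, upper)
    return None


def find_class_inclusive(value, classes):
    """Inclusive method: the upper limit belongs to the class itself."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return (lower, upper)
    return None


exclusive_classes = [(5, 10), (10, 15)]
inclusive_classes = [(5, 9), (10, 14)]

print(find_class_exclusive(10, exclusive_classes))  # falls in 10-15
print(find_class_inclusive(9, inclusive_classes))   # falls in 5-9
```

Note that under the inclusive method a value such as 9.5 would fall in no class at all, which is one reason inclusive classes are converted to exclusive ones for continuous data.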

How to Convert Inclusive Classes into Exclusive Classes ?

Find the difference between the upper limit of a class and the lower limit of the next class. Find half the difference. Subtract this number from all the lower limits and add this number to all the upper limits.

Let us convert the below given inclusive type classes into exclusive type classes.

13.16 Inclusive Classes
Marks (Class)
0 – 9
10 – 19
20 – 29

Given classes, 0 – 9, 10 – 19 , 20 – 29

Difference between the upper limit of a class and the lower limit of the next class = 10 – 9 = 1

Half the difference: \( \frac{1}{2} \) (or 0.5).

Subtracting 0.5 from each lower limit and adding 0.5 to each upper limit, we get the exclusive type classes given below.

13.17 Exclusive Classes
Marks (Class)
-0.5 – 9.5
9.5 – 19.5
19.5 – 29.5
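The conversion rule can be written as a small Python function. The name `inclusive_to_exclusive` is hypothetical, and the sketch assumes the gap between consecutive classes is the same throughout, as it is in the example.

```python
def inclusive_to_exclusive(classes):
    """Convert inclusive (lower, upper) classes into exclusive ones."""
    # Difference between the upper limit of a class and
    # the lower limit of the next class.
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    # Subtract half the gap from every lower limit and add it to every upper limit.
    return [(lower - half, upper + half) for lower, upper in classes]


inclusive = [(0, 9), (10, 19), (20, 29)]
exclusive = inclusive_to_exclusive(inclusive)
print(exclusive)
```

After conversion the upper limit of each class equals the lower limit of the next, so the classes are continuous.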

Cumulative Series

In a cumulative series the frequencies are progressively totalled and aggregates are shown.

13.18 Cumulative Series
Marks (Class)	Number of Students (Cumulative Frequency)
Marks below 10	12
Marks below 20	18
Marks below 30	24
Marks below 40	30
Marks below 50	36

The cumulation may be upward or downward.
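A cumulative series can be computed from an ordinary frequency distribution with `itertools.accumulate`. The sketch below uses the frequencies of the continuous series shown earlier (5, 10, 17, 13, 5) and treats upward and downward cumulation as "less than" and "more than" totals, which is an interpretation rather than something the notes spell out.

```python
from itertools import accumulate

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [5, 10, 17, 13, 5]

# Upward cumulation: students scoring less than each upper limit.
less_than = list(accumulate(frequencies))

# Downward cumulation: students scoring at least each lower limit.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for (lower, upper), lt, mt in zip(classes, less_than, more_than):
    print(f"below {upper}: {lt}\t{lower} and above: {mt}")
```

The last "less than" figure and the first "more than" figure both equal the total number of observations.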

Loss of Information

When we classify data into a frequency distribution there is an inherent shortcoming. When it summarises the raw data to make it concise, it fails to give all details that are found in raw data. That is, while summarising it as a classified data, there is a loss of information. We noted that once the data are grouped into classes, an individual observation has no significance in further statistical computations. Consider an example of a class 30 – 40 containing 6 observations, 35, 35, 30, 32, 35 and 38. When we use the frequency table for further analysis, we will not attach any importance to the actual value of the items. We consider only the total number of items (6). All values in the class are taken to be equal to the middle value of the class interval (i.e., 35); individual values are not considered. This is true for other classes as well. Thus the use of mid value of each class in place of actual values of the observations in statistical methods involves considerable loss of information.
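The loss of information can be seen by comparing the actual values with the mid value that replaces them. The Python sketch below uses the six observations of the class 30 – 40 from the example; the variable names are illustrative.

```python
# The six observations that fall in the class 30 - 40.
observations = [35, 35, 30, 32, 35, 38]
lower, upper = 30, 40

# After grouping, every observation is treated as the mid value.
mid_value = (lower + upper) / 2

actual_total = sum(observations)
grouped_total = mid_value * len(observations)

actual_mean = actual_total / len(observations)

print("Actual total:", actual_total)
print("Total assumed from mid value:", grouped_total)
print("Actual mean:", round(actual_mean, 2), "Mid value:", mid_value)
```

The two totals differ because the individual values 30, 32 and 38 are no longer visible once only the class and its frequency are kept.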

Open end Class

If the lower limit of the first class or the upper limit of the last class is not given, the series is called an open-end class series.

13.19 Open end Class
Marks (Class)	Number of Students (Frequency)
Marks below 10	4
10 – 20	6
20 – 30	6
30 – 40	9
40 and above	5

Unequal Class

We are now familiar with frequency distributions of equal class intervals. But in some cases, frequency distributions with unequal class intervals will be more appropriate. If the classes in a distribution are not all of equal width, it is called an unequal class distribution. Observe the frequency distribution given below:

13.20 Frequency distribution of Marks in Economics
Marks (Class)	Mid Value	Number of Students (Frequency)
0 – 10	5	2
10 – 20	15	8
20 – 30	25	5
30 – 40	35	6
40 – 50	45	24
50 – 60	55	18
60 – 70	65	20
70 – 80	75	7
80 – 90	85	6
90 – 100	95	4

In the above frequency distribution we notice that most of the observations are concentrated in the classes 40 – 50, 50 – 60 and 60 – 70. The frequencies of these classes are 24, 18 and 20 respectively. This means that the majority of items (62 out of 100) are concentrated in these three classes, i.e., 62 per cent of the data lie in the middle range of 40 – 70. Only 38 per cent of the data are in the other seven classes, which are sparsely populated. Further, observations in the three crowded classes deviate more from their respective class marks than those in the other classes. Hence smaller classes are more suitable for this range, and unequal class intervals are more appropriate for this frequency distribution.

What we do is split each of the classes with the highest concentration (40 – 50, 50 – 60 and 60 – 70) into two classes: 40 – 50 into 40 – 45 and 45 – 50, 50 – 60 into 50 – 55 and 55 – 60, and 60 – 70 into 60 – 65 and 65 – 70. The other classes are retained as before (i.e., with a class interval of 10).

Class	Number of students
40 – 50	24
40 – 45	11 (assumed)
45 – 50	13 (assumed)
50 – 60	18
50 – 55	8 (assumed)
55 – 60	10 (assumed)
60 – 70	20
60 – 65	9 (assumed)
65 – 70	11 (assumed)

The new classification, along with the class marks and frequencies, is given in the following table. The new class mark values are more representative of the data in these classes than the old values.

13.21 Frequency distribution of unequal classes
Marks (Class)	Mid Value	Number of Students (Frequency)
0 – 10	5	2
10 – 20	15	8
20 – 30	25	5
30 – 40	35	6
40 – 45	42.5	11
45 – 50	47.5	13
50 – 55	52.5	8
55 – 60	57.5	10
60 – 65	62.5	9
65 – 70	67.5	11
70 – 80	75	7
80 – 90	85	6
90 – 100	95	4
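A quick Python check confirms that splitting the crowded classes changes the mid values but not the totals. The variable names are illustrative, and the split frequencies are the assumed figures used in the notes.

```python
# Unequal classes: (lower limit, upper limit, frequency).
unequal = [
    (0, 10, 2), (10, 20, 8), (20, 30, 5), (30, 40, 6),
    (40, 45, 11), (45, 50, 13), (50, 55, 8), (55, 60, 10),
    (60, 65, 9), (65, 70, 11),
    (70, 80, 7), (80, 90, 6), (90, 100, 4),
]

# Mid value of each class = 1/2 (lower limit + upper limit).
mid_values = [(lower + upper) / 2 for lower, upper, _ in unequal]

# Total number of students is unchanged by the split.
total = sum(freq for _, _, freq in unequal)

# Students in the middle range 40 - 70.
middle = sum(freq for lower, upper, freq in unequal if lower >= 40 and upper <= 70)
share_middle = 100 * middle / total

print(mid_values)
print("Total:", total, "In 40 - 70:", middle, f"({share_middle:.0f} per cent)")
```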

Univariate Distribution

The frequency distribution of a single variable is called a univariate frequency distribution. The table below shows the univariate distribution of the single variable ‘marks’; the number of students in each class is its frequency.

13.22 Univariate Distribution
Marks	Number of Students
40 – 50	5
50 – 60	8
60 – 70	15
70 – 80	20
80 – 90	7
90 – 100	2

Bivariate Distribution

A bivariate frequency distribution is the frequency distribution of two variables.

The following table shows the frequency distribution of two variables, sales and advertisement expenditure. The values of the variable sales are given in columns and the values of the variable advertisement expenditure (cost) are shown in rows.

13.23 Bivariate Distribution
Cost (rows) / Sales (columns)	100 – 200	200 – 300	300 – 400	400 – 500
40 – 50	5	3	2	1
50 – 60	8	4	3	1
60 – 70	8	3	1	1
70 – 80	6	1	2	1
80 – 90	4	1	1	2
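One way to read a bivariate table is to add up its rows and columns: each set of totals is the univariate distribution of one of the two variables. The Python sketch below does this for the table above; the variable names are illustrative and the idea of row and column totals goes slightly beyond what the notes state.

```python
sales_classes = ["100-200", "200-300", "300-400", "400-500"]
cost_classes = ["40-50", "50-60", "60-70", "70-80", "80-90"]

# Rows are cost classes, columns are sales classes.
table = [
    [5, 3, 2, 1],
    [8, 4, 3, 1],
    [8, 3, 1, 1],
    [6, 1, 2, 1],
    [4, 1, 1, 2],
]

# Row totals: distribution of advertisement expenditure (cost) alone.
row_totals = [sum(row) for row in table]

# Column totals: distribution of sales alone.
column_totals = [sum(column) for column in zip(*table)]

grand_total = sum(row_totals)

for label, value in zip(cost_classes, row_totals):
    print("Cost", label, value)
for label, value in zip(sales_classes, column_totals):
    print("Sales", label, value)
print("Total", grand_total)
```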
