Organisation of Data
Once data is collected, it is in raw, unorganised form. Raw data needs to be arranged systematically so that it becomes easy to understand and analyse. This process is called organisation of data.
Raw Data and the Need for Organisation
Raw data is unprocessed data in the form it was collected. Example: marks of 20 students listed in the order they were recorded. From raw data alone, it is difficult to identify patterns or make comparisons. Organisation converts raw data into a meaningful form.
Classification of Data
Classification is the process of arranging data into groups or classes based on similarities. It reduces bulk, highlights patterns, and facilitates comparison and analysis.
Bases of Classification:
- Geographical: by location (states, districts)
- Chronological: by time (year-wise, month-wise)
- Qualitative: by attribute (gender, religion)
- Quantitative: by numerical value (income, marks)
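All four bases follow the same procedure: pick a characteristic and put records that share it into one group. The minimal Python sketch below shows this. The student records and the helper names `classify` and `marks_class` are hypothetical, invented only for illustration. The quantitative basis uses exclusive classes of width 10.

```python
from collections import defaultdict

# Hypothetical student records, invented only to illustrate the four bases.
records = [
    {"name": "A", "state": "Kerala", "year": 2022, "gender": "F", "marks": 45},
    {"name": "B", "state": "Punjab", "year": 2023, "gender": "M", "marks": 72},
    {"name": "C", "state": "Kerala", "year": 2023, "gender": "M", "marks": 38},
    {"name": "D", "state": "Punjab", "year": 2022, "gender": "F", "marks": 41},
]

def classify(rows, key):
    """Group rows into classes: rows with the same key value go together."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row["name"])
    return dict(groups)

def marks_class(row, width=10):
    """Quantitative basis: place marks in an exclusive class of the given width."""
    lower = (row["marks"] // width) * width
    return f"{lower}-{lower + width}"

geographical = classify(records, lambda r: r["state"])   # by location
chronological = classify(records, lambda r: r["year"])   # by time
qualitative = classify(records, lambda r: r["gender"])   # by attribute
quantitative = classify(records, marks_class)            # by numerical value

print(geographical)   # {'Kerala': ['A', 'C'], 'Punjab': ['B', 'D']}
print(quantitative)   # {'40-50': ['A', 'D'], '70-80': ['B'], '30-40': ['C']}
```

Only the key function changes from one basis to the next. This is why a single dataset can be classified in several ways.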
Frequency Distribution
A frequency distribution is a table showing the number of times (frequency) each value or range of values occurs in the data.
- Terms:
- Class Interval (Width): The range of values covered by a class. Its width is the upper limit minus the lower limit, so 10-20 has a width of 10.
- Class Limits: The two boundary values of a class — lower class limit and upper class limit.
- Class Frequency: Number of observations falling in a particular class.
- Class Midpoint (Mid-value): (Lower limit + Upper limit) / 2
- Relative Frequency: Class frequency / Total frequency. Multiplied by 100, this gives the percentage frequency.
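The three formulas above translate directly into arithmetic. This short Python sketch uses illustrative function names (`class_width`, `mid_value`, `relative_frequency`). The `as_percent` flag is an assumption added so the same helper can return either the fraction or the percentage.

```python
def class_width(lower, upper):
    """Width of a class interval: upper limit minus lower limit."""
    return upper - lower

def mid_value(lower, upper):
    """Class midpoint: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2

def relative_frequency(class_freq, total_freq, as_percent=True):
    """Share of all observations in a class, optionally expressed out of 100."""
    if as_percent:
        return 100 * class_freq / total_freq
    return class_freq / total_freq

print(class_width(10, 20))         # 10
print(mid_value(20, 30))           # 25.0
print(relative_frequency(3, 10))   # 30.0
```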
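Exclusive (Continuous) Method: The upper limit of one class equals the lower limit of the next. A value equal to a class's upper limit is not counted in that class; it goes to the next class. Example: 0-10, 10-20, 20-30.
Inclusive Method: Both class limits are included in the class, so consecutive classes do not share a limit. Example: 0-9, 10-19, 20-29.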
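The two methods differ only in how the upper limit is treated. In the Python sketch below, each class is represented as a hypothetical `(lower, upper)` tuple. The sketch checks `lower <= value < upper` for the exclusive method and `lower <= value <= upper` for the inclusive method.

```python
def exclusive_class(value, classes):
    """Exclusive method: lower limit included, upper limit excluded."""
    for lower, upper in classes:
        if lower <= value < upper:
            return (lower, upper)
    return None  # value lies outside every class

def inclusive_class(value, classes):
    """Inclusive method: both limits included."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return (lower, upper)
    return None  # value lies outside every class (or in a gap between classes)

exclusive = [(0, 10), (10, 20), (20, 30)]
inclusive = [(0, 9), (10, 19), (20, 29)]
print(exclusive_class(10, exclusive))  # (10, 20)
print(inclusive_class(10, inclusive))  # (10, 19)
```

A value such as 9.5 falls in no inclusive class because it lies in the gap between 9 and 10. This is why the inclusive method suits data recorded as whole numbers.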
Tally Marks
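While constructing a frequency distribution, observations are counted with tally marks. Each observation is recorded as one vertical bar, and every fifth mark is drawn diagonally across the preceding four to close a group of five.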
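Turning a count into tally marks means finding the number of full groups of five and the leftover bars. The Python sketch below does this with `divmod`. The string `"||||/"` is an ASCII stand-in for a crossed group of five; this rendering is an assumption, since a handwritten tally draws the fifth stroke across the other four.

```python
def tally(count):
    """Render a count as tally marks; '||||/' stands for one group of five."""
    groups, remainder = divmod(count, 5)
    parts = ["||||/"] * groups          # each full group: four bars + a diagonal
    if remainder:
        parts.append("|" * remainder)   # leftover single bars
    return " ".join(parts)

print(tally(7))   # ||||/ ||
```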
Cumulative Frequency Distribution
Cumulative frequency is the running total of frequencies up to a particular class. It shows how many observations lie below (or above) a given value.
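A running total gives the "below" count. Running the total from the last class backwards gives the "above" count. This Python sketch uses `itertools.accumulate` for both. The helper names are illustrative, and the frequencies 5, 8, 12, 7 are the ones used in the worked example later in this section.

```python
from itertools import accumulate

def less_than_cf(freqs):
    """Running total from the first class: observations up to each class."""
    return list(accumulate(freqs))

def more_than_cf(freqs):
    """Running total from the last class: observations from each class onward."""
    return list(accumulate(reversed(freqs)))[::-1]

freqs = [5, 8, 12, 7]        # class frequencies, in class order
print(less_than_cf(freqs))   # [5, 13, 25, 32]
print(more_than_cf(freqs))   # [32, 27, 19, 7]
```

The last "less than" value and the first "more than" value both equal the total frequency (32). This is a quick check on the arithmetic.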
Worked Examples
Classify the following marks data and identify the basis of classification used.
Data: Marks of 5 students: 45, 72, 38, 91, 55.
Quantitative classification (by numerical value — marks). Arrange in ascending order: 38, 45, 55, 72, 91.
Construct a frequency distribution for the data: 12, 25, 17, 32, 25, 17, 12, 41, 25, 32 using class intervals 10-20, 20-30, 30-40, 40-50.
- 10-20: 12, 12, 17, 17 → frequency = 4
- 20-30: 25, 25, 25 → frequency = 3
- 30-40: 32, 32 → frequency = 2
- 40-50: 41 → frequency = 1
Total = 4 + 3 + 2 + 1 = 10, which matches the 10 observations. No value lies exactly on a class boundary, so the exclusive rule does not move any observation here.
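The counting above can be checked mechanically. The Python sketch below applies the exclusive rule (`lower <= x < upper`) to the same ten observations. It confirms that the class frequencies add up to the number of observations. The function name `frequency_distribution` is illustrative.

```python
data = [12, 25, 17, 32, 25, 17, 12, 41, 25, 32]
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]

def frequency_distribution(values, intervals):
    """Count observations per class using the exclusive rule lower <= x < upper."""
    table = {interval: 0 for interval in intervals}
    for x in values:
        for lower, upper in intervals:
            if lower <= x < upper:
                table[(lower, upper)] += 1
                break  # each observation belongs to exactly one class
    return table

dist = frequency_distribution(data, classes)
for (lower, upper), f in dist.items():
    print(f"{lower}-{upper}: {f}")       # 10-20: 4, 20-30: 3, 30-40: 2, 40-50: 1
print("Total =", sum(dist.values()))     # Total = 10
```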
Find the mid-value of the class 20-30.
Mid-value = (20 + 30) / 2 = 50 / 2 = 25
From a frequency distribution, if frequencies are 5, 8, 12, 7, find cumulative frequencies.
- Up to class 1: 5
- Up to class 2: 5 + 8 = 13
- Up to class 3: 13 + 12 = 25
- Up to class 4: 25 + 7 = 32
A data set has values ranging from 10 to 95. Suggest appropriate class intervals if we want 9 classes.
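Class width = (95 - 10) / 9 = 85 / 9 ≈ 9.4, rounded up to 10. Rounding down would not work: a width of 9 gives classes ending at 10 + 9 × 9 = 91, leaving values up to 95 uncovered. Use classes: 10-20, 20-30, ..., 90-100 (9 classes).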
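The same steps are shown in the Python sketch below. It divides the range by the number of classes, rounds the width up with `math.ceil`, and builds equal exclusive classes starting at the smallest value. The helper name `equal_classes` is hypothetical. Starting the first class at the minimum value is one convenient choice, not the only one.

```python
import math

def equal_classes(min_value, max_value, n_classes):
    """Build n equal-width exclusive classes starting at min_value."""
    raw_width = (max_value - min_value) / n_classes   # 85 / 9 ≈ 9.44 here
    width = math.ceil(raw_width)                      # round up so classes reach max_value
    return [(min_value + i * width, min_value + (i + 1) * width)
            for i in range(n_classes)]

classes = equal_classes(10, 95, 9)
print(classes[0], classes[-1])   # (10, 20) (90, 100)
```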
Distinguish between exclusive and inclusive class intervals with an example.
Exclusive: 0-10, 10-20; the value 10 goes to the 10-20 class. Inclusive: 0-9, 10-19; the value 10 goes to the 10-19 class. The inclusive method suits data recorded as whole numbers, because no observation can fall in the gap between 9 and 10.
Why is cumulative frequency useful?
It tells us how many observations lie below (or above) a particular value — useful for computing medians and percentiles.
Common mistakes
In the exclusive method, students often put a value at the upper limit into the lower class. Always remember — in exclusive intervals, the value at the upper limit belongs to the NEXT class (lower limit included, upper limit excluded).
Summary
Organisation of data involves classifying raw data into frequency distributions using appropriate class intervals. Key concepts include class limits, class width, tally marks, mid-values, and cumulative frequencies. Data can be classified geographically, chronologically, qualitatively, or quantitatively.