Data sets
Data is a collection of facts, information, or results obtained from an experiment or event. A data set is a collection of such data. Data sets are usually presented in graphs, charts, tables, or other pictorial forms so that they are easy to read. Each individual value in a data set is called a datum. A data set can have several rows and columns, and presenting related data together in the same representation makes it easier to study.
Types of data sets
The following are the five major types of data sets used to represent statistical data, for example in the corporate sector:
- Numerical data set – The data are represented by numbers, so this type is often referred to as quantitative data. A numerical data set contains only numerical values, which can be studied using arithmetic operations such as addition, subtraction, multiplication and division. Examples of numerical data sets include a collection of people's body measurements, the number of pages written by someone, and the test scores of sports players.
- Bivariate data set – A data set with exactly two variables is called a bivariate data set. It is used to compare the two variables or to study the relationship between them. Examples of bivariate data sets include the temperature of a city in the winter and summer seasons, and class test results in two subjects.
- Multivariate data set – Unlike a bivariate data set, which has only two variables, a multivariate data set uses more than two variables to describe the results of an experiment or study. Examples of multivariate data sets include body measurements of a whole class, the populations of different states in a country, and the world's currencies together with their equivalent values in Indian Rupees.
- Categorical data set – A categorical data set is qualitative. It records categories or characteristics of a particular person, machine, organization or comparable object. Categorical data is called dichotomous if a variable has only two possible categories (for example, yes or no) and polytomous if it has more than two. Examples of categorical data include data about people or animals and their statuses.
- Correlation data set – Such a data set is used to study the relationship between two or more variables. If the values are related to each other, the data is called correlated data, and the relationship between them is called a correlation. The relationship can take the form of a linear equation, a quadratic equation or another function. A correlation can be positive, negative or zero, depending on the data. For example, height increases with age during childhood, which is a positive correlation. In contrast, the quality of human vision tends to decline with age, which is a negative correlation.
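The sign of a correlation can be checked numerically. The following Python sketch computes Pearson's correlation coefficient r, which lies between -1 and 1. Note that r measures only the linear part of a relationship, so a quadratic relationship may be under-reported. The function names `pearson_r` and `describe_sign` and all sample values are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each variance).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


def describe_sign(r, tol=1e-9):
    """Label r as a positive, negative or zero correlation."""
    if r > tol:
        return "positive"
    if r < -tol:
        return "negative"
    return "zero"


# Hypothetical data: ages in years and made-up heights in cm.
ages = [2, 4, 6, 8, 10]
heights_cm = [86, 102, 115, 127, 138]
print(describe_sign(pearson_r(ages, heights_cm)))  # positive
```

A small tolerance is used when labelling r as zero because floating-point rounding rarely produces exactly 0.0.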
Properties of a data set
We need to understand the nature of the data we have collected. Various Exploratory Data Analysis (EDA) techniques help us identify the properties of the data, find relationships within it, and study it for statistical purposes. The main properties of a data set are:
- Centre of data
- Skewness of data
- Spread among the data members
- Presence of outliers
- Correlation among the data
- Type of probability distribution
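Several of these properties can be estimated directly from a one-variable numerical data set. The Python sketch below uses the standard `statistics` module for the centre (mean and median) and spread (population standard deviation and interquartile range). It computes skewness with the Fisher-Pearson moment coefficient and flags outliers with Tukey's 1.5 × IQR rule. These particular measures are common choices, not the only ones. The function name `summarize` and the sample values are hypothetical.

```python
import statistics


def summarize(data):
    """Estimate a few EDA properties of a one-variable numerical data set."""
    n = len(data)
    # Centre of the data.
    mean = statistics.fmean(data)
    median = statistics.median(data)
    # Spread: population standard deviation and interquartile range.
    stdev = statistics.pstdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    # Skewness: Fisher-Pearson moment coefficient m3 / m2 ** 1.5.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    # Outliers: values beyond Tukey's fences at 1.5 * IQR from the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low or x > high]
    return {
        "mean": mean,
        "median": median,
        "stdev": stdev,
        "iqr": iqr,
        "skewness": skewness,
        "outliers": outliers,
    }


# Hypothetical scores containing one unusually large value.
scores = [12, 15, 14, 10, 18, 13, 16, 14, 15, 60]
print(summarize(scores))
```

For this made-up list, 60 is flagged as an outlier, and the positive skewness agrees with the mean (18.7) lying above the median (14.5).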