Data management, as the term is used here, is another name for statistics: the science of collecting, organizing, presenting, analyzing, and interpreting numerical data. It covers both the methods used to collect data, including sampling, and the techniques used to process and analyze it.
Data management has two main branches. Descriptive statistics focuses on collecting, organizing, presenting, and summarizing numerical data to describe a situation. Inferential statistics analyzes organized data to make predictions or inferences, typically about a larger population from a sample. Its conclusions rest on facts and observed patterns, and they depend on appropriate descriptive measures having been computed first.
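The difference between the two branches can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `heights` sample is invented for the example, and the confidence interval uses a normal approximation chosen here for simplicity (for a sample this small, a t-based interval would usually be preferred).

```python
import math
import statistics

# Hypothetical sample: heights (cm) of 10 students, invented for illustration.
heights = [150.0, 152.5, 155.0, 157.5, 160.0, 160.0, 162.5, 165.0, 167.5, 170.0]

# Descriptive statistics: summarize the sample itself.
sample_mean = statistics.mean(heights)
sample_median = statistics.median(heights)
sample_sd = statistics.stdev(heights)  # sample standard deviation (divides by n - 1)

# Inferential statistics: use the sample to say something about the population.
# Assumption: normal approximation for a 95% confidence interval on the mean.
z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% critical value
standard_error = sample_sd / math.sqrt(len(heights))
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"mean={sample_mean:.1f}, median={sample_median:.1f}, sd={sample_sd:.2f}")
print(f"approximate 95% CI for the population mean: ({ci_low:.1f}, {ci_high:.1f})")
```

The first three values only describe the ten observations. The interval is an inference: a statement about the unobserved population mean, built from the descriptive measures.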
A variable is a characteristic being studied that varies across individuals or objects, such as age, race, gender, height, or weight. Variables are categorized as qualitative (representing differences in quality, character, or kind, like sex or marital status) or quantitative (numerical in nature, like weight or age). Quantitative variables can be discrete (counted in whole-number values, e.g., number of students) or continuous (able to assume any numerical value over an interval, e.g., height or temperature). In a study, a variable can also be dependent (the outcome being predicted) or independent (the predictor used to explain or predict that outcome).
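As a rough sketch, the qualitative/discrete/continuous distinction can be mirrored by the Python types used to record the observations. The function name `classify_variable` and the sample values are hypothetical, and the type-based rule is only a heuristic: a variable recorded as whole numbers (age in years, for example) may still be continuous in principle.

```python
def classify_variable(values):
    """Guess a variable's type from how its observations are recorded.

    Heuristic only: strings -> qualitative, integers -> discrete,
    any other real numbers -> continuous.
    """
    if not values:
        raise ValueError("need at least one observation")
    if all(isinstance(v, str) for v in values):
        return "qualitative"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "quantitative (discrete)"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative (continuous)"
    raise ValueError("mixed or unsupported value types")


# Hypothetical observations for three variables.
marital_status = ["single", "married", "single", "widowed"]
students_per_class = [32, 28, 35, 30]
height_cm = [151.2, 160.0, 172.8, 165.5]

print(classify_variable(marital_status))      # qualitative
print(classify_variable(students_per_class))  # quantitative (discrete)
print(classify_variable(height_cm))           # quantitative (continuous)
```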
Data is the primary element of data management: a collection of observations or factual information. It is the raw material statisticians work with, obtained through surveys, experiments, and research. Data can be primary (gathered directly from the original source) or secondary (taken from published or unpublished data previously gathered by others).
There are four scales of measurement for data. Nominal data uses numbers or labels only to identify membership in a category; the categories have no inherent order (e.g., electrical consumption types). Ordinal data shows inequalities or order, but not the magnitude of the differences between ranks (e.g., grades: A, B, C or socioeconomic status: low, medium, high). Interval data preserves greater than/less than relationships and also permits measuring how much more or less of a quantity one observation has than another, but it lacks a true zero point, so ratios of values are not meaningful (e.g., Fahrenheit temperature). Ratio data is similar to interval data but has an absolute zero, so multiples are meaningful (e.g., election votes, teacher-student ratio).
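The practical consequence of the four scales is that each one supports a different set of summaries. The sketch below records the conventional pairing in a lookup table and then uses temperature to show why ratios fail on an interval scale. The names `MEANINGFUL_SUMMARIES` and `fahrenheit_to_kelvin` and the two temperatures are assumptions made for this illustration.

```python
# Summaries conventionally regarded as meaningful on each scale.
# Each scale inherits the summaries of the scales before it.
MEANINGFUL_SUMMARIES = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean", "ratio of two values"],
}


def fahrenheit_to_kelvin(deg_f):
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale, absolute zero)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


# Hypothetical readings.
warm_f, cool_f = 80.0, 40.0

# On the Fahrenheit scale the arithmetic gives 2.0, but 0 degrees F is an
# arbitrary zero, so "twice as hot" is not a valid conclusion.
naive_ratio = warm_f / cool_f

# On the Kelvin scale, which has a true zero, the ratio is roughly 1.08.
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print(f"Fahrenheit ratio: {naive_ratio:.2f}")
print(f"Kelvin ratio:     {kelvin_ratio:.2f}")
```

Differences, by contrast, are meaningful on both scales: a 40-degree Fahrenheit gap corresponds to the same temperature change wherever it occurs on the scale.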