Lecture 3: Collecting raw data (descriptive statistical analysis)

First, the survey method.

A. Random sampling survey

Concept: A non-comprehensive survey. Some units are randomly selected from the whole population under investigation (the selection of sample units is not affected by subjective factors or other systematic factors, and every population unit has an equal chance of being selected), observed, and then used to infer the quantitative characteristics of the population from the sample data.

Note: If a sampling survey does not follow the principle of random sampling, the characteristics of the population cannot be inferred from the sample. Ensuring the randomness of sampling is therefore the primary problem of a sampling survey.

Conditions: Because sampling inference is based on probability theory, the size of the sampling error can be estimated, and the error can also be controlled through certain methods.
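As a minimal sketch of how the size of the sampling error can be estimated, the Python snippet below computes the standard error of a sample mean under simple random sampling. The function name, the sample values and the population size are illustrative assumptions, not part of the lecture; the finite population correction is applied only when a population size is supplied.

```python
import math
import statistics


def standard_error_of_mean(sample, population_size=None):
    """Estimate the standard error of the sample mean.

    Assumes simple random sampling. If population_size is given, the
    finite population correction sqrt(1 - n/N) for sampling without
    replacement is applied.
    """
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    if population_size is not None:
        se *= math.sqrt(1 - n / population_size)
    return se


# Made-up household incomes (illustrative values only).
incomes = [2, 4, 4, 4, 5, 5, 7, 9]
se_infinite = standard_error_of_mean(incomes)
se_finite = standard_error_of_mean(incomes, population_size=80)
print(se_infinite, se_finite)
```

Since the standard error shrinks as the sample size grows, choosing the sample size is one way of controlling the error.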

Scope of application of random sampling survey:

1. A comprehensive survey is impossible. For example, some product quality inspections are destructive.

2. A comprehensive survey is difficult because of the heavy workload, yet data on the whole population are needed. For example, understanding household income and expenditure across China.

3. Census data need to be revised and supplemented.

Advantages: saves time and labor, reduces cost, and is reliable and effective (the error can be controlled by scientific methods).

Aa. Simple random sampling (time point)

Premise: The population size is known in advance.

Concept: The most basic sampling method. Sample units are drawn from the population as it stands, without any restrictions. Examples are drawing lots, drawing balls, rolling dice, and the Excel random function =INT(RAND()*x+1), where x is the population size.
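A small Python sketch of the same idea, assuming the population units are numbered 1 to x. The function names and the figures (500 units, a sample of 10) are hypothetical. The second function mirrors the Excel formula; repeated calls can return the same unit twice, whereas `random.sample` draws without replacement.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no unit is picked twice.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


def excel_style_draw(population_size, rng):
    """Python analogue of =INT(RAND()*x+1): one unit number in 1..x."""
    return int(rng.random() * population_size + 1)


sample = simple_random_sample(population_size=500, sample_size=10, seed=42)
print(sample)

rng = random.Random(0)
draws = [excel_style_draw(500, rng) for _ in range(10)]  # may contain repeats
print(draws)
```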

Sampling error: The smaller the differences among the units, the more representative the sample and the smaller the sampling error.

Disadvantages: When the population is very large or infinite, numbering the units is very laborious or even impossible.

Ab. Equidistant (systematic) sampling (time point, time period)

Method 1 premise: The population size is known in advance.

Method 2 premise: The situation of the population is known in advance.

First, sort the population units.

Method 1: Sort by a mark unrelated to the survey content. For example, when surveying population income, sort by the number of strokes in the surname.

Method 2: Sort by a mark related to the survey content. For example, when surveying population income, sort by income.

Sampling error: The closer the relationship between the sorting mark and the survey content, the more consistent the order and the smaller the sampling error.

Caution: When the sorted population shows a certain periodicity, and especially when the period coincides with the sampling interval, systematic error arises and the representativeness of the sample suffers.

After numbering the units in order, draw the first sample unit.

Method 1: Draw the first sample unit by simple random sampling within the first interval. (The smaller the differences among the units in the interval, the more representative the sample and the smaller the sampling error.)

Method 2: Take the unit in the middle of the first interval (it represents the middle level of that interval and is the most representative, so a more representative sample is obtained).

Then, starting from the first sample unit, draw the remaining sample units at the specified interval.
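The procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the units are already sorted and numbered, the interval is taken as the population size divided by the sample size (rounded down), and the function name and its `start` parameter are hypothetical.

```python
import random


def systematic_sample(units, sample_size, start="random", seed=None):
    """Equidistant sampling from an already sorted, numbered list."""
    interval = len(units) // sample_size  # the sampling interval
    if interval < 1:
        raise ValueError("sample_size exceeds the number of units")
    if start == "random":
        # Method 1: simple random draw inside the first interval.
        first = random.Random(seed).randrange(interval)
    elif start == "middle":
        # Method 2: the unit in the middle of the first interval.
        first = interval // 2
    else:
        raise ValueError("start must be 'random' or 'middle'")
    return [units[first + i * interval] for i in range(sample_size)]


units = list(range(1, 101))  # 100 units numbered 1..100 after sorting
middle_start = systematic_sample(units, 10, start="middle")
random_start = systematic_sample(units, 10, start="random", seed=7)
print(middle_start)
print(random_start)
```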

Advantages: Compared with simple random sampling, it is simpler and cheaper, and the selected sample is more representative.
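The risk that arises when the ordering is periodic and the period matches the sampling interval can be illustrated with a small Python sketch. The daily sales figures are made up for the illustration (100 on weekdays, 200 on weekends), and the helper names are hypothetical.

```python
def systematic_sample(values, interval, first):
    """Take values[first], values[first + interval], ... to the end."""
    return values[first::interval]


def mean(values):
    return sum(values) / len(values)


# Made-up daily sales over 10 weeks: 100 on weekdays, 200 on weekends.
sales = [200 if day % 7 in (5, 6) else 100 for day in range(70)]

true_mean = mean(sales)
# Interval 7 matches the weekly cycle: every sampled day is the same weekday.
only_weekends = mean(systematic_sample(sales, interval=7, first=5))
only_weekdays = mean(systematic_sample(sales, interval=7, first=0))
# Interval 10 does not match the cycle, so the sample mixes all weekdays.
mixed = mean(systematic_sample(sales, interval=10, first=0))
print(true_mean, only_weekends, only_weekdays, mixed)
```

With the interval equal to the period, the sample mean depends entirely on which day is drawn first, which is the systematic error described above.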

Ac. Type (stratified) sampling (time point, time period)

Method 1 premise: The population size is known in advance.

Method 2 premise: The proportion of each group in the population and the situation within each group are known in advance.

Select the grouping mark:

Each grouping mark reflects one aspect of the survey purpose. Only by choosing the most appropriate grouping mark can the grouping result correctly reflect the essence of the phenomenon.

Compound grouping: grouping by two or more marks. The advantage is that it supports comprehensive, in-depth and specific analysis. The disadvantage is that too many groups not only increase the workload but also dilute the main survey purpose. It is therefore necessary to group by the main mark first and then refine with the secondary marks.

Determine the groups:

Under the same mark, units are grouped by their different characteristics (provided the range of the characteristic is known; some groupings can follow the national statistical grouping system). The groups should be mutually exclusive (any population unit fits only one group), complete (every population unit belongs to some group) and similar (comparable between groups). For example, agricultural production is grouped by terrain into mountain, hill and plain areas, and the population is grouped by age into 1~18 (juvenile), 19~30 (youth), 31~50 (middle age), 51 and above, and so on.
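A short Python sketch shows how boundaries like the age groups above give groups that are mutually exclusive and complete: every age of 1 or more maps to exactly one label. The function name and the label "51 and above" for the last group are assumptions made for the illustration.

```python
from bisect import bisect_left

# Upper bounds of the first three age groups; the last group is open-ended.
UPPER_BOUNDS = [18, 30, 50]
LABELS = ["juvenile", "youth", "middle age", "51 and above"]


def age_group(age):
    """Return the single group an age belongs to (ages start at 1)."""
    if age < 1:
        raise ValueError("age must be at least 1")
    # bisect_left finds the first upper bound that is >= age.
    return LABELS[bisect_left(UPPER_BOUNDS, age)]


for age in (1, 18, 19, 30, 31, 50, 51, 80):
    print(age, age_group(age))
```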

Sampling error: The more closely the chosen grouping mark relates to the survey purpose and the finer the qualitative and quantitative analysis, the smaller the differences within each group, the more representative the sample units drawn from it, and the smaller the sampling error.

Determine the number of sample units in each group:

Method 1: Allocate by the degree of variation within each group: the greater the variation, the more sample units; the smaller the variation, the fewer sample units. (time period)

Method 2: Allocate by the proportion of the group's units in the total number of units, drawing the same proportion of sample units from each group. This is proportional type sampling. (time point)
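Both allocation rules can be sketched in Python. For Method 1 the sketch weights each group by its size times its standard deviation (commonly called Neyman allocation); the lecture does not name a specific formula, so this choice is an assumption. The group sizes, standard deviations and function names are hypothetical, and fractional counts are rounded with a largest-remainder rule so that the total equals the sample size.

```python
def allocate(weights, sample_size):
    """Split sample_size across groups in proportion to weights."""
    total = sum(weights.values())
    exact = {g: sample_size * w / total for g, w in weights.items()}
    counts = {g: int(x) for g, x in exact.items()}  # round down first
    leftover = sample_size - sum(counts.values())
    # Give the remaining units to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - counts[g], reverse=True)
    for g in by_remainder[:leftover]:
        counts[g] += 1
    return counts


def proportional_allocation(group_sizes, sample_size):
    """Method 2: sample each group in proportion to its size."""
    return allocate(group_sizes, sample_size)


def variation_allocation(group_sizes, group_std, sample_size):
    """Method 1 (assumed Neyman form): weight = size * standard deviation."""
    weights = {g: group_sizes[g] * group_std[g] for g in group_sizes}
    return allocate(weights, sample_size)


sizes = {"mountain": 200, "hill": 300, "plain": 500}  # made-up group sizes
stds = {"mountain": 30, "hill": 10, "plain": 10}      # made-up variation
by_size = proportional_allocation(sizes, 100)
by_variation = variation_allocation(sizes, stds, 100)
print(by_size)
print(by_variation)
```

In this made-up case the mountain group is the smallest but the most variable, so the variation-based rule gives it more sample units than proportional allocation does.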

Sort the units within each group.

Method 1: Sort by a mark unrelated to the survey purpose. For example, when surveying population income, sort by the number of strokes in the surname.

Method 2: Sort by a mark related to the survey purpose. For example, when surveying population income, sort by income.

Sampling error: The closer the sorting mark is to the survey purpose, the more consistent the order and the smaller the sampling error.

After numbering the units of each group in order, draw the first sample unit.

Method 1: Draw the first sample unit by simple random sampling within the first interval. (The smaller the differences among the units in the interval, the more representative the sample and the smaller the sampling error.)

Method 2: Take the unit in the middle of the first interval (it represents the middle level of that interval and is the most representative, so a more representative sample is obtained).

Then, starting from the first sample unit, draw the remaining sample units at the specified interval.

Advantages: Because sample units are drawn from every group, all types are guaranteed to appear in the sample, so type sampling greatly improves the representativeness of the sample.

Ad. Cluster sampling (time point)

Premise: The population size is known in advance.

Concept: First divide the population into many groups (clusters) with similar attributes and characteristics, then number the groups in order and randomly select several whole groups as the sample.

Sampling error: The more similar the groups are to one another and the more evenly the population units are distributed, the more representative the sample and the smaller the sampling error.

Advantages: When the population is large and its units are scattered in time and space, this method saves manpower and material resources and reduces cost.

Disadvantages: The sample is often not evenly spread over the population, so its representativeness is relatively low. Use with caution.
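A minimal Python sketch of cluster sampling as described in this subsection: the groups are numbered, a few are drawn at random, and every unit of a chosen group enters the sample. The village and household names and the function name are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole groups; all units of a chosen group are sampled."""
    rng = random.Random(seed)
    numbered = sorted(clusters)  # number the groups in order
    chosen = sorted(rng.sample(numbered, n_clusters))
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Made-up population: households grouped by village.
villages = {
    "village_1": ["h01", "h02", "h03"],
    "village_2": ["h04", "h05"],
    "village_3": ["h06", "h07", "h08", "h09"],
    "village_4": ["h10", "h11", "h12"],
}
chosen, households = cluster_sample(villages, n_clusters=2, seed=1)
print(chosen)
print(households)
```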

B. Non-random sampling survey

Concept: A non-comprehensive survey. Sampling is not based on the equal-probability principle but on people's subjective judgment or other conditions.

Scope of application of non-random sampling survey:

1. In some cases strict random sampling is almost impossible. For example, the boundary of the survey population is unclear and no sampling frame can be built.

2. To achieve the purpose of the study, some studies have to pick a few representative individuals from the population as the sample.

3. Random sampling has a strict operating procedure and is troublesome and time-consuming to implement. If the purpose of the survey is only to explore problems, obtain research clues and put forward hypotheses, rather than to infer the population from the sample, random sampling is not needed.

Advantages: Non-random sampling saves time and effort. If the researchers understand the survey population and the respondents well, they can still obtain fairly accurate results.

Disadvantages: Because selection is subjective, there is no guarantee that the sample reproduces the distribution structure of the population. The sample tends to have low representativeness and large error, and the size of that error cannot be estimated. Inferring the population from such a sample is extremely unreliable.

Ba. Typical survey

From the survey population, deliberately select one or a few representative units and survey them.

Method of selecting typical units: sort (classify) the units, then select the typical points.

Bb. Key-point survey

From the survey population, select some key units and survey them. Although the key units are only a small part of the population, they account for a large share of the quantitative indicator being surveyed.

Advantages: It saves time and labor, reduces cost, and the sample is highly representative, so the inference is generally reliable.

Bc. Nearby (convenience) sampling

In whatever way is convenient for the investigator, take the units that happen to be encountered in the population as the sample.

Bd. Purposive (judgment) sampling

Select representative units as the sample according to subjective judgment.

Sampling error: The better the researchers know the population, the smaller the sampling error.

Be. Snowball sampling

When the overall situation of the population cannot be known, start by collecting data from a few units in the population and find more and more sample units through their referrals or other means.

Advantages: Suitable for studying minority groups.

Bf. Quota sampling

Premise: The population size and the proportion of each group in the population are known in advance.

Fix the sample size, set the proportion of each category in the sample equal to its proportion in the population (so that the sample mimics the population), and then sample each category according to its share.
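The quota idea can be sketched in Python: quotas are computed from the category shares in the population, and respondents are then accepted in the order they are met until every quota is full. The category counts, respondent list and function names are made up for the illustration, and the simple rounding used here is only safe when the shares divide the sample size evenly.

```python
def compute_quotas(category_counts, sample_size):
    """Quota per category, proportional to its share of the population."""
    total = sum(category_counts.values())
    # Simple rounding; the quotas may not add up to sample_size in general.
    return {c: round(sample_size * n / total) for c, n in category_counts.items()}


def fill_quotas(respondents, quotas):
    """Accept respondents in encounter order until each quota is met."""
    remaining = dict(quotas)
    selected = []
    for name, category in respondents:
        if remaining.get(category, 0) > 0:
            selected.append(name)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # every quota is full
    return selected


population = {"urban": 600, "rural": 400}  # made-up category counts
quotas = compute_quotas(population, sample_size=5)

# Made-up stream of people met by the investigator (name, category).
met = [("r1", "urban"), ("r2", "urban"), ("r3", "rural"), ("r4", "urban"),
       ("r5", "urban"), ("r6", "rural"), ("r7", "rural")]
selected = fill_quotas(met, quotas)
print(quotas)
print(selected)
```

Because respondents are taken as they come rather than drawn at random, the result is still a non-random sample even though its composition matches the population shares.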

C. Regular statistical reports

Concept: A survey method in which unified report forms are issued from the top down, and the completed and summarized report materials are then submitted from the bottom up, level by level.

Advantages: 1. Comprehensiveness and continuity of the data; 2. Uniformity and timeliness of the data; 3. Reliable data sources and accuracy.

Disadvantages: 1. False data may be mixed in under the influence of interests; 2. Too many reports increase the burden on grassroots units and may even cause confusion.

The shorter the reporting period, the more concise the content and the tighter the submission time;

The longer the reporting period, the more detailed the content and the looser the submission time.

D. Census

Concept: A one-time comprehensive survey organized by specialized agencies.

The survey content can be a phenomenon at a certain point in time (such as the population situation or the stock of fixed assets) or a process over a certain period (such as annual production and sales).

The main purpose of the survey is to collect data that cannot be, or is not suited to be, collected through regular statistical reports, so as to understand the overall situation of important companies and some important economic phenomena.

Advantages: The information is the most comprehensive, systematic and detailed.

Disadvantages: It consumes a great deal of manpower, material resources, money and time (although the survey and registration period is short, the complicated and detailed preparation and the huge volume of data processing take a long time), and registration errors occur easily.

Second, the investigation plan

Purpose of investigation:

Make clear what problems the investigation should solve. Only with a clear purpose can you know what information to collect.

Respondents:

The population that is surveyed or inferred. The individuals in the population are called investigation units (all of them or part of them are surveyed).

Reporting unit:

The unit responsible for submitting survey data.

Survey content:

1. The content should be what is necessary to meet the purpose of the investigation; optional or unnecessary content may be left out.

2. The content should contain only marks for which an exact answer can be obtained.

3. The content should be worded exactly and specifically, without ambiguity, so that all reporters understand it in the same way.

Questionnaire survey method: (omitted)

Survey form:

1. Header: includes the name of the form (center), the name, address and affiliation of the reporting unit (upper left corner), and the form number, the issuing unit and the approval number (upper right corner).

2. Body: the main part of the form, laid out as a table in which the survey contents are listed.

3. Footer: includes the name and signature of the investigator or informant, and the name and signature of the person in charge of the unit.

Survey forms come in three kinds: list forms, single forms and special forms.

1. Use a list form when surveying the sample as a whole.

2. Use a single form when surveying individual sample units.

3. Use special forms for different survey marks.

Investigation time:

The time to which the survey data refer is either a time period or a time point.

Investigation method:

Investigation period:

The deadline for submitting the survey data, set so that the information is obtained in time.

Survey location:

If the sample units are mobile or spread across different locations, the survey location should be clearly defined and stated.

Organization and implementation plan of investigation:

This includes the organization of the survey activities, personnel training arrangements, document preparation, the budget, the investigation methods, the data submission methods, trial runs, and so on.

Input of survey data:

Fabricating or tampering with data is a serious error that violates the spirit of statistics. Incorrect information should be firmly discarded at the time of entry.

Third, the collection of primary data (first-hand data):

Concept: Data obtained from direct sources, such as observation, experiment and questionnaire survey.

Fourth, the collection of secondary data (second-hand data):

Concept: Data obtained through indirect sources.

Examples: various publications, published compilations, online materials, etc.

When quoting such data, indicate the source: first, to respect the work of others; second, to demonstrate the reliability of the data.