Hard Performance Measurement (A+B)

王子轩

Date: 2024-3-13

Chapter 1: Introduction

Given S as a collection of N positive integers that are no more than V. For any given number c, you are supposed to find two integers a and b from S, so that a+b=c.

What needs to be done: Use Srand to randomly generate a non-repeating S array, then find two numbers from it and output them when they add up to equal C. You can consider using bucket sorting.

Why do it: Srand plus an array of record duplicates can meet the requirements of the topic S array, and then there can be two algorithms to solve the problem.And bucket sorting is very efficient.

Chapter 2: Algorithm Specification

Main function

sketch

1
2
3
4
5
6
7
8
9
10
11
12
graph TD;
a(Initialize variables Inputs n, v, c, k, etc. )
b(Inputs the method to be used Generates non-repeating random numbers and stores them in an array )
k(Randomly generate a non-repeating array S containing n numbers.)
c(Records the start time )
d(Performs k loops)
e(Calls the appropriate function according to the method chosen )
f(Prints the current number of loops )
g(Records the end time )
h(Calculates the program running time )
i(Outputs relevant information)
a-->b-->k-->c-->d-->e-->f-->g-->h-->i

Algorithm 1 : a brute-force approach to iterate through all pairs of numbers.

Input: n(the number of digits),c(the summed target value),cnt(Used to record the number of solutions),x(Iterations),s(the numeric set).

Output: cnt(the number of solutions),all the solutions(a and b)

sketch:

1
2
3
4
5
6
graph TD;
a(Iterate through the array s and find all pairs that satisfy the condition )
b(Output the pairs that satisfy the condition )
c(Count the number of pairs that satisfy the condition )
d(Return the number)
a-->b-->c-->d

Pseudo code:

  1. Let cnt $\leftarrow$ 0;
  2. for i = 0 to n-1
  3. ​ for j = i+1 to n-1
  4. ​ When s[i] + s[j] == c, print s[i] + s[j] = c and let cnt $\leftarrow$ cnt + 1;
  5. return cnt;

Algorithm 2 : a bucket sorting to optimize the search process.

Input: n(the number of digits),c(the summed target value),cnt(Used to record the number of solutions),x(Iterations),s(the numeric set).

Output: cnt(the number of solutions),all the solutions(a and b)

sketch:

1
2
3
4
5
6
7
graph TD;
a(Iterate through the array ton and find all pairs that satisfy the condition )
b(Output the pairs that match the conditions )
c(Count the number of pairs that match the conditions )
d(Return the number)
e(Counting using bucket sorting)
e-->a-->b-->c-->d

Pseudo code:

  1. define an array ton and let each element $\leftarrow$ 0;
  2. Let cnt $\leftarrow$ 0;
  3. for i = 0 to n-1
  4. ​ when s[i] < c, let ton[s[i]] $\leftarrow$ 1;
  5. for i = 1 to c/2
  6. ​ if not ((c mod 2 == 0) and (i == c/2)) and ton[i] + ton[c - i] == 2
  7. ​ print i + (c-i) = c and let cnt $\leftarrow$ cnt + 1;
  8. return cnt;

Chapter 3: Testing Results

Test Case Sheet :I tested the program’s runtime at different V-times and plotted the N-run_time graphs.

the purpose :

  1. Study the growth of time required by the algorithm with N.

  2. Compare horizontally the difference in efficiency, complexity, etc. between the two algorithms.

  3. Study how the algorithm reacts when V changes until it reaches the maximum value of the integer V.

the expected result:

  1. Algorithm 1’s running time grows with N in an O(n2) trend, while Algorithm 2 grows more gently.

  2. Algorithm 2 is more efficient and less complex than Algorithm 1.

  3. As V becomes large until it reaches an integer maximum, It takes more time.

the actual behavior of your program:

Note: In the case above, when V is smaller than N, the set S that satisfies the condition cannot be found and does not satisfy the question, so it is counted as /.

image-20240313182100484

Figure_1

The figure shows the running time of the two algorithms versus N when V is 50000, and it is clearly found that Algorithm 1 takes longer, showing a trend of O($n^2$), and Algorithm 2 is more efficient.

Figure_2

The figure shows the running time of the two algorithms versus N when V is 200000.The efficiency gap between the two algorithms widens.

Figure_3

The figure shows the running time of the two algorithms versus N when V is 1000000.The gap between the efficiency of the two algorithms widens further.

Figure_4

The figure shows the running time of the two algorithms versus N when V is 100000000.

Figure_5

The figure shows the running time of the two algorithms versus N when V is 2147483647.

From the last two charts above, I found a strange phenomenon:

For the second algorithm, why does the running time become less while V becomes bigger?

After continuous testing and pondering, I came to the following conclusion:

When V is much larger than n,a wide range of non-repeating random numbers are taken, which leads to a more scattered distribution of random numbers, and thus for a particular objective and c,there are fewer solutions that meet the requirements.

Precisely because no reasonable value can be found, many fewer “printf(…)” statements are executed, resulting in much less time.

So now I understand this phenomenon.

Chapter 4: Analysis and Comments

Figure_1

Figure_2

Figure_3

Figure_4

Figure_5

Time Complexity Derivation:

Algorithm func1:

  1. The number of outer loop iterations is n and the average number of inner loop iterations is n/2.

  2. Therefore, the total number of iterations is n * (n/2) = n^2 / 2.

  3. The time complexity is O(n^2).

Algorithm func2:

  1. The first loop iteration is n, and the time complexity is O(n).

  2. The second loop iteration is c/2, and since c is a constant, the time complexity is O(1).

  3. Thus, the total time complexity is O(n) + O(1) = O(n).

Derivation of space complexity:

Algorithm func1:

  1. Uses only the extra space at the constant level to store the variables, with a space complexity of O(1).

**Algorithm func2: **

  1. An array ton of size 2500000000 is used, with a space complexity of O(2500000000), i.e. O(1).

  2. Other than that, only constant-level extra space is used to store variables, with a space complexity of O(1).

**In summary, **

Algorithm func1 has a time complexity of O(n^2) and a space complexity of O(1); Algorithm func2 has a time complexity of O(n) and a space complexity of O(1).

Why is Algorithm 1 slower than Algorithm 2?

Because Algorithm 1 performs nesting of for loops and therefore grows at a fast rate, whereas Algorithm 2 just grows linearly, so when n is large, Algorithm 2 is very much faster.

Directions for improvement:

For the Algorithm func1 with a time complexity of O(n^2) and a space complexity of O(1), one potential optimization could be to implement a more efficient sorting algorithm if sorting is a bottleneck in the algorithm. Using a faster sorting algorithm like Quicksort or Mergesort could improve the overall performance.

As for Algorithm func2 , one way to optimize it could be to reduce the space complexity by finding ways to store data more efficiently or by utilizing data structures that require less memory. Additionally, optimizing the code logic to reduce redundant operations or improving the overall algorithm design could also lead to performance enhancements.

Appendix: Source Code (in C)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
#include <stdio.h>
#include <stdlib.h> //Importing standard libraries, in order to use random numbers
#include <time.h>
clock_t start, stop;
double duration; // Setting the time variable to record program runtime
int func1(int c, int s[], int cnt, int x, int n); // Packaging Method 1
int func2(int c, int s[], int cnt, int x, int n); // Packaging method 2
void print(long stop, long start, double duration, int k, int cnt); // output function
int main()
{
static int s[1000000]; // Collect n numbers into the set S
int met; // Types of recording methods
static int num[2200000000] = {0}; // Use this to record the number that has been generated before
int n, v, c;
int cnt = 0; // Record the number of solutions
int k; // Execute k times
printf("Please enter:\nn(the number of digits),\nv(the maximum value),\nc(the summed target value),\nk(Iterations).\n");
scanf("%d%d%d%d", &n, &v, &c, &k);
printf("Method1 or method2?(Please enter 1 or 2)\n");
scanf("%d", &met); // Input method type
srand((unsigned int)time(NULL)); // Using the current time as the seed for the random number generator
// Each time the program is run, the time is different, so the sequence of random numbers will also be different
for (int i = 0; i < n; i++) // Generate non-repeating n numbers to be stored in the S array
{
int ran = (rand() % 10 + rand() * 10 + rand() * 100) % v + 1; // Generate datasets by random numbers, using time to ensure randomness
if (num[ran] == 0) // Determine if the number has appeared before
{ // When it's not duplicated, it exists in an array
s[i] = ran;
num[ran] = 1; // Record the numbers that are already there
}
else // Reset the value of i when it is repeated
--i;
}
start = clock();
for (int x = 0; x < k; x++) // Execute k times
{
if (met == 1) // Judgment is method 1
{
cnt = func1(c, s, cnt, x, n);
}
else // Judgment is Method 2
{
cnt = func2(c, s, cnt, x, n);
}
printf("The above is the %dth cycle.\n\n", x + 1);
}
stop = clock();
duration = ((double)(stop - start)) / CLK_TCK; // Calculate the running time of the time
print(stop, start, duration, k, cnt); // Output related information
return 0;
}

int func1(int c, int s[], int cnt, int x, int n)
{ // Iterate through the whole array directly to find all the matching number
cnt = 0; // Initialized to 0
for (int i = 0; i < n; i++)
{
for (int j = i + 1; j < n; j++)
{
if (s[i] + s[j] == c) // Determine if the conditions are met
{

printf("%d + %d = %d\n", s[i], s[j], c); // Output a reasonable combination of numbers
cnt++; // Record the number of solutions
break; // Remove meaningless traversal after
}
}
}
return cnt;
}

int func2(int c, int s[], int cnt, int x, int n)
{
static int ton[2500000000] = {0}; // Open an array for easy bucket sorting
for (int i = 0; i < n; ++i)
{
if (s[i] < c)
ton[s[i]] = 1; // Counting Numbers with Bucket Sorting
}
cnt = 0; // Initialized to 0
for (int i = 1; i <= c / 2; i++)
{
if ((!((c % 2 == 0) && (i == c / 2))) && ton[i] + ton[c - i] == 2) // Determine whether conditions are met and rule out special cases
{
printf("%d + %d = %d\n", i, c - i, c); // Output reasonable results
cnt++; // Record the number of solutions
}
}
return cnt;
}

void print(long stop, long start, double duration, int k, int cnt)
{
printf("Iterations:%d\n", k); // print Iterations
printf("Ticks:%d\n", stop - start); // print Ticks
printf("Total time:%lfs\n", duration); // print Total time
printf("Duration:%lfs\n", duration / k); // print Duration
printf("The number of solutions:%d", cnt); // print The number of solutions
}

Declaration

I hereby declare that all the work done in this project titled “Hard Performance Measurement (A+B)” is of my independent effort.