I. Suggesting Similar Papers
A citation network is a directed network where the vertices are academic papers and there is a directed edge from paper A to paper B if paper A cites paper B in its bibliography. Google Scholar performs automated citation indexing and has a useful feature that allows users to find similar papers. In the following, we analyze two approaches for measuring similarity between papers.
Part (a): Co-citation network
Two papers are said to be cocited if they are both cited by the same third paper. In the cocitation network, the weight of the edge between papers i and j is the number of papers that cite both i and j. In this part, we derive the (weighted) adjacency matrix of the cocitation network from the adjacency matrix of the citation network.
- Problem setup: Our goal is to express the cocitation matrix as a function of the adjacency matrix of the citation network.
- Problem notation: An edge from paper i to paper j means that paper i cites paper j. We denote by A the corresponding adjacency matrix, so that $A_{ij} = 1$ if there is a directed edge from i to j (paper i cites paper j) and $A_{ij} = 0$ otherwise. We denote by C the weighted adjacency matrix of the cocitation network.
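To make the direction convention concrete, here is a small sketch that builds A from a list of (citing, cited) pairs. The helper name `adjacency_from_citations` and the four-paper toy network are hypothetical, and A is stored as a dense list of lists purely for readability.

```python
def adjacency_from_citations(n, citations):
    """Build the n x n citation matrix: A[i][j] = 1 iff paper i cites paper j.

    Row i is the bibliography of paper i; column j marks the papers citing j.
    """
    A = [[0] * n for _ in range(n)]
    for citing, cited in citations:
        A[citing][cited] = 1
    return A


# Hypothetical toy network: papers 0 and 1 both cite papers 2 and 3,
# and paper 2 cites paper 3.
A = adjacency_from_citations(4, [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
```

Here `A[0][2] == 1` (paper 0 cites paper 2) while `A[2][0] == 0`, so A is not symmetric.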
Part (a), Question 1
Let's analyze the given algorithm step by step to determine whether it generates the weighted adjacency matrix of the cocitation network and to find its time complexity.
Algorithm Analysis
- Initial setup:
  - Construct an empty (all-zero) n × n matrix C, where n is the number of papers.
- Iterating through the rows of A:
  - For each row r of A, check whether the row sum is strictly greater than 1.
  - If so, consider each pair (a, b) of columns whose entries in row r are nonzero, i.e., each pair of papers cited by paper r.
  - Add 1 to C at location (a, b).
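The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: A is a dense 0/1 list of lists, "each pair (a, b)" is read as every ordered pair of distinct cited papers (so both C[a][b] and C[b][a] are incremented and the diagonal stays zero), and the four-paper network is a hypothetical toy example.

```python
from itertools import permutations


def cocitation_by_rows(A):
    """Row-scanning construction of the cocitation matrix (sketch).

    Assumption: "each pair (a, b)" means every ordered pair of distinct
    papers cited by the same row, so C comes out symmetric with a zero
    diagonal.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]  # all-zero n x n matrix
    for row in A:  # row r = bibliography of paper r
        if sum(row) > 1:  # a paper citing at most one paper forms no pair
            cited = [j for j, x in enumerate(row) if x != 0]
            for a, b in permutations(cited, 2):  # ordered pairs with a != b
                C[a][b] += 1
    return C


# Hypothetical toy network: papers 0 and 1 cite papers 2 and 3; paper 2 cites 3.
A = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
C = cocitation_by_rows(A)
```

Papers 2 and 3 are cited together by papers 0 and 1, so `C[2][3] == C[3][2] == 2`, and every other entry is 0. Under this reading, a row with row sum $d_r$ contributes $d_r(d_r - 1)$ increments, on top of the O(n) scan of each row and the O(n^2) initialization of C.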
1. Does this generate the cocitation weighted adjacency matrix?
Cocitation Matrix Definition:
- The cocitation matrix C counts how many times each pair of papers is cited together.
- C(i, j) is the number of papers that cite both paper i and paper j.
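For checking small cases, the definition can also be transcribed directly into code. A minimal sketch, assuming the same dense 0/1 convention for A ($A_{ki} = 1$ when paper k cites paper i) and treating the diagonal as zero, since a paper is not counted as cocited with itself (an assumption; some treatments keep the diagonal). The function name and toy network are hypothetical.

```python
def cocitation_by_definition(A):
    """C[i][j] = number of papers k that cite both i and j (i != j).

    The diagonal is set to zero by assumption.
    """
    n = len(A)
    return [
        [
            0 if i == j
            else sum(1 for k in range(n) if A[k][i] and A[k][j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical toy network: papers 0 and 1 cite papers 2 and 3; paper 2 cites 3.
A = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
C_ref = cocitation_by_definition(A)
```

This direct count costs O(n^3) on a dense matrix (n^2 pairs, each checked against n citing papers), so it is useful only as a reference against which faster constructions can be compared.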
Analysis:
- The algorithm iterates over each row r of A, where row r lists the papers cited by paper r.
- For each row, if the paper cites more than one other paper (row sum > 1), the algorithm identifies all pairs of papers (a, b) that paper r cites together.
- For each such pair (a, b), it increments the corresponding entry of the cocitation matrix C.
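Summing these per-row increments gives a closed form. This derivation is a sketch assuming that "each pair (a, b)" means every ordered pair of distinct cited papers and that all entries of A are 0 or 1. Write $a_r$ for row r of A as a column vector, so that $(a_r)_i = A_{ri}$. Row r adds 1 to every entry (i, j) with $i \neq j$ and $A_{ri} = A_{rj} = 1$, which is exactly the off-diagonal part of the outer product $a_r a_r^\top$:

$$
C_{ij} \;=\; \sum_{r} A_{ri} A_{rj} \;=\; \left(A^\top A\right)_{ij} \quad (i \neq j),
\qquad
C \;=\; A^\top A - \operatorname{diag}\!\left(A^\top A\right).
$$

The removed diagonal, $(A^\top A)_{ii} = \sum_r A_{ri}$, is the number of citations paper i receives, and the algorithm never writes to it. A row with row sum at most 1 has no off-diagonal pairs, so the row-sum test only saves work and does not change C.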