textnets.network.Textnet
- class textnets.network.Textnet(data: TidyText | BiadjacencyMatrix | DataFrame, min_docs: int = 2, connected: bool = False, remove_weak_edges: bool = False, doc_attrs: dict[str, dict[str, Any]] | None = None)
Bases: TextnetBase, FormalContext
Textnet for the relational analysis of meanings.
A textnet is a bipartite network of documents and terms. Links exist only between the two types of nodes. Documents have a tie with terms they contain; the tie is weighted by tf-idf.
The bipartite network can be projected into two different kinds of single-mode networks: document-to-document, and term-to-term.
Experimental: The underlying bipartite adjacency matrix can also be turned into a formal context, which can be used to construct a concept lattice.
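The weighting described above can be illustrated with a minimal, standard-library sketch. It assumes a common tf-idf variant (relative term frequency times log(N / df)); the exact formula textnets uses may differ, and the toy corpus and the helper name `tfidf_edges` are hypothetical.

```python
import math
from collections import Counter

# Toy corpus: document label -> list of tokens (illustrative data only).
docs = {
    "d1": ["moon", "landing", "moon"],
    "d2": ["moon", "rocket"],
    "d3": ["rocket", "launch"],
}


def tfidf_edges(docs):
    """Return {(doc, term): weight} with weight = tf * log(N / df).

    This is one common tf-idf variant, assumed here for illustration.
    """
    n_docs = len(docs)
    # Document frequency: number of documents each term occurs in.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    edges = {}
    for doc, tokens in docs.items():
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[term])
            # Every edge links a document node to a term node, never
            # two nodes of the same type: the network is bipartite.
            edges[(doc, term)] = tf * idf
    return edges


edges = tfidf_edges(docs)
```

Terms that occur in many documents receive a low weight, so ties to distinctive terms dominate the network.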
- Parameters:
data (TidyText or BiadjacencyMatrix) – Either a DataFrame of tokens with per-document counts, as created by Corpus.tokenized, Corpus.ngrams, and Corpus.noun_phrases, or a bipartite adjacency matrix relating documents to terms.
min_docs (int, optional) – Minimum number of documents a term must appear in to be included in the network (default: 2).
connected (bool, optional) – Keep only the largest connected component of the network (default: False).
remove_weak_edges (bool, optional) – Remove edges with weights far below average (default: False).
doc_attrs (dict of dict, optional) – Additional attributes of document nodes.
- Raises:
ValueError – If the supplied data is empty.
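To make the effect of min_docs and connected concrete, here is a rough standard-library sketch on a toy edge list. The helpers `filter_min_docs` and `largest_component` are hypothetical illustrations of the ideas, not the library's implementation.

```python
from collections import defaultdict, deque

# Toy bipartite edge list of (document, term) pairs (illustrative only).
edges = [
    ("d1", "moon"), ("d1", "landing"),
    ("d2", "moon"), ("d2", "rocket"),
    ("d3", "rocket"),
    ("d4", "tide"), ("d5", "tide"),
]


def filter_min_docs(edges, min_docs=2):
    """Drop terms that appear in fewer than min_docs documents."""
    docs_per_term = defaultdict(set)
    for doc, term in edges:
        docs_per_term[term].add(doc)
    return [(d, t) for d, t in edges if len(docs_per_term[t]) >= min_docs]


def largest_component(edges):
    """Keep only edges inside the largest connected component."""
    adj = defaultdict(set)
    for doc, term in edges:
        # Tag nodes by type so a document and a term may share a label.
        adj[("doc", doc)].add(("term", term))
        adj[("term", term)].add(("doc", doc))
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects one component.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return [(d, t) for d, t in edges if ("doc", d) in best]


kept = filter_min_docs(edges, min_docs=2)
core = largest_component(kept)
```

Here "landing" is dropped because it occurs in a single document, and the small "tide" component is discarded when only the largest component is kept.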
Methods
Return the number of edges.
Load a textnet from file.
Plot the bipartite graph.
Project to one-mode network.
Save a textnet to file.
Save the underlying graph.
Show nodes sorted by bipartite clustering coefficient.
Show nodes sorted by BiRank.
Show top nodes per cluster, ranked by a chosen metric.
Show nodes sorted by CoHITS rank.
Show nodes sorted by unweighted degree.
Show nodes sorted by HITS rank.
Show nodes sorted by weighted degree.
Return the number of vertices (nodes).
Attributes
Calculate the unweighted bipartite clustering coefficient.
BiRank of nodes.
Weighted local clustering coefficient within each cluster's subgraph.
Weighted node degree within each cluster's subgraph.
Return graph partition.
CoHITS rank of nodes.
Return formal context of terms and documents.
Unweighted node degree.
Iterate over edges.
Direct access to the underlying igraph object.
HITS rank of nodes.
Weighted bipartite adjacency matrix of the bipartite graph.
Return modularity based on graph partition.
Return list of node types.
Iterate over nodes.
Weighted node degree.
Summary of underlying graph.
- property bipartite_cc: Series
Calculate the unweighted bipartite clustering coefficient.
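As a rough illustration of what such a coefficient measures, the sketch below follows the default "dot" mode of the networkx bipartite clustering function: for each node, it averages the Jaccard overlap of neighborhoods with every second-order neighbor. Whether textnets uses this exact mode is an assumption, and the helper name is hypothetical.

```python
from collections import defaultdict

# Toy bipartite edges of (document, term) pairs (illustrative only).
edges = [("d1", "a"), ("d1", "b"), ("d2", "b"), ("d2", "c"), ("d3", "c")]

adj = defaultdict(set)
for doc, term in edges:
    adj[doc].add(term)
    adj[term].add(doc)
adj = dict(adj)


def bipartite_clustering(adj):
    """Pairwise-overlap clustering coefficient for every node.

    c(u, v) = |N(u) & N(v)| / |N(u) | N(v)|, averaged over all
    second-order neighbors v of u (nodes of the same type that
    share at least one neighbor with u).
    """
    result = {}
    for node, neighbors in adj.items():
        second = {u for n in neighbors for u in adj[n]} - {node}
        total = sum(
            len(adj[u] & neighbors) / len(adj[u] | neighbors) for u in second
        )
        result[node] = total / len(second) if second else 0.0
    return result


cc = bipartite_clustering(adj)
```

Because a bipartite graph has no triangles, the ordinary clustering coefficient is always zero; overlap between neighborhoods of same-type nodes is used instead.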
- Returns:
The clustering coefficients indexed by node label.
- Return type:
Series
Notes
Adapted from the networkx implementation.
References
- property cluster_local_cc: Series
Weighted local clustering coefficient within each cluster’s subgraph.
- property clusters: VertexClustering
Return graph partition.
The partition is detected by the Leiden algorithm, unless a different partition was supplied to the setter.
- property context
Return formal context of terms and documents.
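For readers unfamiliar with formal concept analysis, the following sketch shows how a weighted document-term matrix could be reduced to a binary context and queried. It assumes documents act as objects and terms as attributes, and that any nonzero weight counts as incidence; both are assumptions, and the names `intent` and `extent` are illustrative rather than part of the textnets API.

```python
# Toy weighted biadjacency data: document -> {term: weight}.
m = {
    "d1": {"moon": 0.4, "landing": 0.7},
    "d2": {"moon": 0.4, "rocket": 0.2},
    "d3": {"rocket": 0.2},
}

objects = sorted(m)
attributes = sorted({term for row in m.values() for term in row})

# Binarize: a document "has" a term if the weight is positive.
incidence = {
    (doc, term) for doc, row in m.items() for term, w in row.items() if w > 0
}


def intent(docs):
    """Terms shared by all given documents."""
    return {t for t in attributes if all((d, t) in incidence for d in docs)}


def extent(terms):
    """Documents that contain all given terms."""
    return {d for d in objects if all((d, t) in incidence for t in terms)}


# A pair (A, B) is a formal concept when intent(A) == B and extent(B) == A.
moon_docs = extent({"moon"})
moon_terms = intent(moon_docs)
```

The set of all such concepts, ordered by inclusion of their extents, forms the concept lattice.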
- property edges: EdgeSeq
Iterate over edges.
- classmethod load(source: PathLike[Any] | str) → Textnet
Load a textnet from file.
- Parameters:
source (str or path) – File to read the textnet from. This should be a file created by Textnet.save.
- Raises:
FileNotFoundError – If the provided path does not exist.
- Return type:
Textnet
- property m: BiadjacencyMatrix
Weighted bipartite adjacency matrix of the bipartite graph.
- property nodes: VertexSeq
Iterate over nodes.
- plot(*, color_clusters: bool | VertexClustering = False, show_clusters: bool | VertexClustering = False, bipartite_layout: bool = False, sugiyama_layout: bool = False, circular_layout: bool = False, kamada_kawai_layout: bool = False, drl_layout: bool = False, node_opacity: float | None = None, edge_opacity: float | None = None, label_term_nodes: bool = False, label_doc_nodes: bool = False, label_nodes: bool = False, label_edges: bool = False, node_label_filter: Callable[[Vertex], bool] | None = None, edge_label_filter: Callable[[Edge], bool] | None = None, scale_nodes_by: str | None = None, **kwargs) → CairoPlot
Plot the bipartite graph.
- Parameters:
color_clusters (bool or VertexClustering, optional) – Color nodes according to clusters detected by the Leiden algorithm (default: False). Alternatively, a clustering object generated by another community detection algorithm can be passed.
show_clusters (bool or VertexClustering, optional) – Mark clusters detected by the Leiden algorithm (default: False). Alternatively, a clustering object generated by another community detection algorithm can be passed.
bipartite_layout (bool, optional) – Use a bipartite graph layout (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).
sugiyama_layout (bool, optional) – Use layered Sugiyama layout (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).
circular_layout (bool, optional) – Use circular Reingold-Tilford layout (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).
kamada_kawai_layout (bool, optional) – Use a layout created by the Kamada-Kawai algorithm (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).
drl_layout (bool, optional) – Use the DrL layout, suitable for large networks (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).
node_opacity (float, optional) – Opacity (between 0 and 1) to apply to nodes (default: no transparency).
edge_opacity (float, optional) – Opacity (between 0 and 1) to apply to edges (default: no transparency).
label_term_nodes (bool, optional) – Label term nodes (default: False).
label_doc_nodes (bool, optional) – Label document nodes (default: False).
label_nodes (bool, optional) – Label term and document nodes (default: False).
label_edges (bool, optional) – Show edge weights in plot (default: False).
node_label_filter (function, optional) – Function returning boolean value mapped to iterator of nodes to decide whether or not to suppress labels.
edge_label_filter (function, optional) – Function returning boolean value mapped to iterator of edges to decide whether or not to suppress labels.
scale_nodes_by (str, optional) – Name of centrality measure or node attribute to scale nodes by. Possible values: degree, strength, hits, cohits, birank, or any node attribute (default: None).
target (str or file, optional) – File or path that the plot should be saved to (e.g., plot.png).
kwargs – Additional arguments to pass to igraph.drawing.plot.
- Returns:
The plot can be directly displayed in a Jupyter notebook or saved as an image file.
- Return type:
CairoPlot
- project(*, node_type: Literal['doc', 'term'] | NodeType, connected: bool | None = False) → ProjectedTextnet
Project to one-mode network.
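One common way to obtain such a projection is to multiply the biadjacency matrix by its transpose, so that two documents are tied with a weight equal to the sum of products of their weights on shared terms. The sketch below assumes that approach; it is an illustration in plain Python, not the library's implementation, and the function `project` here is a hypothetical stand-in.

```python
from itertools import combinations

# Toy weighted biadjacency data: document -> {term: weight}.
m = {
    "d1": {"a": 1.0, "b": 2.0},
    "d2": {"b": 3.0, "c": 1.0},
    "d3": {"c": 2.0},
}


def project(m, node_type="doc"):
    """Return {(u, v): weight} for the chosen one-mode projection."""
    if node_type not in ("doc", "term"):
        raise ValueError("node_type must be 'doc' or 'term'")
    if node_type == "term":
        # Transpose so that rows are terms and columns are documents.
        rows = {}
        for doc, terms in m.items():
            for term, weight in terms.items():
                rows.setdefault(term, {})[doc] = weight
    else:
        rows = m
    weights = {}
    for u, v in combinations(sorted(rows), 2):
        shared = rows[u].keys() & rows[v].keys()
        weight = sum(rows[u][k] * rows[v][k] for k in shared)
        if weight > 0:
            # Nodes without shared neighbors stay unconnected.
            weights[(u, v)] = weight
    return weights


doc_net = project(m, node_type="doc")
term_net = project(m, node_type="term")
```

In the toy data, d1 and d3 share no term and therefore remain unconnected in the document projection.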
- Parameters:
node_type ({NodeType.DOC, NodeType.TERM, "doc", "term"}) – Either DOC or TERM, depending on desired node type.
connected (bool, optional) – Keep only the largest connected component of the projected network (default: False).
- Raises:
ValueError – If no valid node type is specified.
- Returns:
A one-mode textnet.
- Return type:
ProjectedTextnet
- save(target: PathLike[Any] | str) → None
Save a textnet to file.
- Parameters:
target (str or path) – File to save the textnet to. If the file exists, it will be overwritten.
- save_graph(target: str | bytes | PathLike[Any] | IO, format: str | None = None) → None
Save the underlying graph.
- Parameters:
target (str or path or file) – File or path that the graph should be written to.
format ({"dot", "edgelist", "gml", "graphml", "pajek", ...}, optional) – Optionally specify the desired format (otherwise it is inferred from the file suffix).
- top_bipartite_cc(n=10)
Show nodes sorted by bipartite clustering coefficient.
- Parameters:
n (int, optional) – How many nodes to show (default: 10).
- Returns:
Ranked nodes.
- Return type:
- top_birank(n=10)
Show nodes sorted by BiRank.
- Parameters:
n (int, optional) – How many nodes to show (default: 10).
- Returns:
Ranked nodes.
- Return type:
- top_cluster_nodes(n: int = 10, rank_nodes_by: str = 'cluster_strength') → DataFrame
Show top nodes per cluster, ranked by a chosen metric.
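- Parameters:
n (int, optional) – How many nodes to show per cluster (default: 10).
rank_nodes_by (str, optional) – Name of the metric to rank nodes by (default: "cluster_strength").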
- Returns:
Clusters with representative nodes.
- Return type:
DataFrame
- top_cohits(n=10)
Show nodes sorted by CoHITS rank.
- Parameters:
n (int, optional) – How many nodes to show (default: 10).
- Returns:
Ranked nodes.
- Return type:
- top_degree(n=10)
Show nodes sorted by unweighted degree.
- Parameters:
n (int, optional) – How many nodes to show (default: 10).
- Returns:
Ranked nodes.
- Return type:
- top_hits(n=10)
Show nodes sorted by HITS rank.
- Parameters:
n (int, optional) – How many nodes to show (default: 10).
- Returns:
Ranked nodes.
- Return type: