textnets.network.Textnet¶

class textnets.network.Textnet(data: TidyText | BiadjacencyMatrix | DataFrame, min_docs: int = 2, connected: bool = False, remove_weak_edges: bool = False, doc_attrs: dict[str, dict[str, Any]] | None = None)[source]¶

Bases: TextnetBase, FormalContext

Textnet for the relational analysis of meanings.

A textnet is a bipartite network of documents and terms. Links exist only between the two types of nodes. Documents have a tie with terms they contain; the tie is weighted by tf-idf.
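As a rough illustration of how document-term ties might be weighted, the sketch below computes tf-idf weights for a toy corpus using only the standard library. The corpus, the helper name `tfidf_edges`, and the specific formula (relative term frequency times `log(N / df)`) are assumptions made for illustration; the weighting used by the library may differ in detail.

```python
import math
from collections import Counter

# Hypothetical toy corpus: document label -> list of tokens.
docs = {
    "d1": ["moon", "landing", "moon"],
    "d2": ["moon", "rocket"],
    "d3": ["rocket", "launch"],
}

def tfidf_edges(docs):
    """Return {(doc, term): weight}, assuming weight = tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    edges = {}
    for doc, tokens in docs.items():
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)  # relative frequency within the document
            edges[(doc, term)] = tf * math.log(n_docs / df[term])
    return edges

edges = tfidf_edges(docs)
```

Each key of `edges` is a (document, term) pair, so every tie connects the two node types and never two nodes of the same type.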

The bipartite network can be projected into two different kinds of single-mode networks: document-to-document, and term-to-term.
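The idea of a projection can be sketched without the library: two documents become linked when they share at least one term. In the sketch below the projected weight is the sum of products of the bipartite weights (the entries of the matrix product of the biadjacency matrix with its transpose). That weighting, the toy data, and the name `project_docs` are assumptions for illustration, not a description of the library's implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weighted bipartite edges: (document, term) -> weight.
bipartite = {
    ("d1", "moon"): 0.5,
    ("d1", "landing"): 1.0,
    ("d2", "moon"): 0.5,
    ("d2", "rocket"): 0.4,
    ("d3", "rocket"): 0.4,
}

def project_docs(edges):
    """Link documents that share a term; weight = sum of weight products."""
    by_term = defaultdict(dict)
    for (doc, term), weight in edges.items():
        by_term[term][doc] = weight
    projected = defaultdict(float)
    for doc_weights in by_term.values():
        # Every pair of documents containing this term gains a tie.
        for a, b in combinations(sorted(doc_weights), 2):
            projected[(a, b)] += doc_weights[a] * doc_weights[b]
    return dict(projected)

doc_net = project_docs(bipartite)
```

A term-to-term projection follows the same pattern with the roles of documents and terms swapped.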

Experimental: The underlying bipartite adjacency matrix can also be turned into a formal context, which can be used to construct a concept lattice.
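To make the notion of a formal context more concrete, the following sketch treats a binarized document-term incidence as a context and derives one formal concept from it. The toy data and the helper names `common_terms` and `docs_with` are hypothetical; this illustrates the general idea from formal concept analysis rather than the library's own code.

```python
# Hypothetical binary incidence: document -> set of terms it contains.
incidence = {
    "d1": {"moon", "landing"},
    "d2": {"moon", "rocket"},
    "d3": {"rocket"},
}

def common_terms(docs_subset):
    """Terms shared by every document in the subset (the intent)."""
    sets = [incidence[d] for d in docs_subset]
    if not sets:
        # By convention, the empty set of documents shares all terms.
        return set().union(*incidence.values())
    return set.intersection(*sets)

def docs_with(terms):
    """Documents that contain every given term (the extent)."""
    return {d for d, doc_terms in incidence.items() if terms <= doc_terms}

# A formal concept is a pair (extent, intent) that is closed under both maps.
intent = common_terms({"d1", "d2"})
extent = docs_with(intent)
```

The concept lattice is the set of all such closed pairs, ordered by inclusion of their extents.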

Parameters:

data (TidyText or BiadjacencyMatrix) –
- DataFrame of tokens with per-document counts, as created by Corpus.tokenized, Corpus.ngrams, and Corpus.noun_phrases.
- A bipartite adjacency matrix relating documents to terms.
min_docs (int, optional) – Minimum number of documents a term must appear in to be included in the network (default: 2).
connected (bool, optional) – Keep only the largest connected component of the network (default: False).
remove_weak_edges (bool, optional) – Remove edges with weights far below average (default: False).
doc_attrs (dict of dict, optional) – Additional attributes of document nodes.
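The effect of the min_docs threshold can be sketched in a few lines: terms that occur in fewer than min_docs documents are dropped before the network is built. The toy pairs and the function name `filter_min_docs` are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical (document, term) pairs.
pairs = [
    ("d1", "moon"),
    ("d1", "landing"),
    ("d2", "moon"),
    ("d2", "rocket"),
    ("d3", "rocket"),
]

def filter_min_docs(pairs, min_docs=2):
    """Keep only pairs whose term appears in at least min_docs documents."""
    # Deduplicate first so a term repeated within one document counts once.
    doc_freq = Counter(term for _, term in set(pairs))
    return [(doc, term) for doc, term in pairs if doc_freq[term] >= min_docs]

kept = filter_min_docs(pairs, min_docs=2)
```

Here "landing" occurs in a single document, so its tie is removed while the ties for "moon" and "rocket" remain.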

Raises:

ValueError – If the supplied data is empty.

Methods

`ecount`	Return the number of edges.
`load`	Load a textnet from file.
`plot`	Plot the bipartite graph.
`project`	Project to one-mode network.
`save`	Save a textnet to file.
`save_graph`	Save the underlying graph.
`top_bipartite_cc`	Show nodes sorted by bipartite clustering coefficient.
`top_birank`	Show nodes sorted by BiRank.
`top_cluster_nodes`	Show top nodes per cluster, ranked by a chosen metric.
`top_cohits`	Show nodes sorted by CoHITS rank.
`top_degree`	Show nodes sorted by unweighted degree.
`top_hits`	Show nodes sorted by HITS rank.
`top_strength`	Show nodes sorted by weighted degree.
`vcount`	Return the number of vertices (nodes).

Attributes

`bipartite_cc`	Calculate the unweighted bipartite clustering coefficient.
`birank`	BiRank of nodes.
`cluster_local_cc`	Weighted local clustering coefficient within each cluster's subgraph.
`cluster_strength`	Weighted node degree within each cluster's subgraph.
`clusters`	Return graph partition.
`cohits`	CoHITS rank of nodes.
`context`	Return formal context of terms and documents.
`degree`	Unweighted node degree.
`edges`	Iterate over edges.
`graph`	Direct access to the underlying igraph object.
`hits`	HITS rank of nodes.
`m`	Weighted bipartite adjacency matrix of the bipartite graph.
`modularity`	Return modularity based on graph partition.
`node_types`	Return list of node types.
`nodes`	Iterate over nodes.
`strength`	Weighted node degree.
`summary`	Summary of underlying graph.

property bipartite_cc: Series¶

Calculate the unweighted bipartite clustering coefficient.

Returns:: The clustering coefficients indexed by node label.
Return type:: pandas.Series

Notes

Adapted from the networkx implementation.

References

[Latapy et al., 2008]
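For orientation, here is a minimal sketch of a Latapy-style bipartite clustering coefficient on an adjacency dictionary. It assumes the pairwise overlap measure |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, averaged over all second-order neighbors of a node; whether the property uses this variant or another is not stated above, so treat the formula, the toy graph, and the name `bipartite_clustering` as assumptions.

```python
# Hypothetical bipartite graph as an adjacency dictionary.
adj = {
    "d1": {"moon", "landing"},
    "d2": {"moon", "rocket"},
    "d3": {"rocket"},
    "moon": {"d1", "d2"},
    "landing": {"d1"},
    "rocket": {"d2", "d3"},
}

def bipartite_clustering(adj):
    """Average neighborhood overlap with each second-order neighbor."""
    cc = {}
    for v, nbrs in adj.items():
        # Nodes of the same type as v that are reachable in two steps.
        second = {u for n in nbrs for u in adj[n]} - {v}
        total = sum(len(adj[u] & nbrs) / len(adj[u] | nbrs) for u in second)
        cc[v] = total / len(second) if second else 0.0
    return cc

cc = bipartite_clustering(adj)
```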

property birank: Series¶: BiRank of nodes.

property cluster_local_cc: Series¶: Weighted local clustering coefficient within each cluster’s subgraph.

property cluster_strength: Series¶: Weighted node degree within each cluster’s subgraph.

property clusters: VertexClustering¶

Return graph partition.

The partition is detected by the Leiden algorithm, unless a different partition was supplied to the setter.

property cohits: Series¶: CoHITS rank of nodes.

property context¶: Return formal context of terms and documents.

property degree: Series¶: Unweighted node degree.

ecount() → int¶: Return the number of edges.

property edges: EdgeSeq¶: Iterate over edges.

property graph: Graph¶: Direct access to the underlying igraph object.

property hits: Series¶: HITS rank of nodes.

classmethod load(source: PathLike[Any] | str) → Textnet[source]¶

Load a textnet from file.

Parameters:: source (str or path) – File to read the textnet from. This should be a file created by Textnet.save.
Raises:: FileNotFoundError – If the provided path does not exist.
Return type:: Textnet

property m: BiadjacencyMatrix¶: Weighted bipartite adjacency matrix of the bipartite graph.

property modularity: float¶: Return modularity based on graph partition.

property node_types: list[NodeType]¶: Return list of node types.

property nodes: VertexSeq¶: Iterate over nodes.

plot(*, color_clusters: bool | VertexClustering = False, show_clusters: bool | VertexClustering = False, bipartite_layout: bool = False, sugiyama_layout: bool = False, circular_layout: bool = False, kamada_kawai_layout: bool = False, drl_layout: bool = False, node_opacity: float | None = None, edge_opacity: float | None = None, label_term_nodes: bool = False, label_doc_nodes: bool = False, label_nodes: bool = False, label_edges: bool = False, node_label_filter: Callable[[Vertex], bool] | None = None, edge_label_filter: Callable[[Edge], bool] | None = None, scale_nodes_by: str | None = None, **kwargs) → CairoPlot[source]¶

Plot the bipartite graph.

Parameters:

color_clusters (bool or VertexClustering, optional) – Color nodes according to clusters detected by the Leiden algorithm (default: False). Alternatively, a clustering object generated by another community detection algorithm can be passed.
show_clusters (bool or VertexClustering, optional) – Mark clusters detected by the Leiden algorithm (default: False). Alternatively, a clustering object generated by another community detection algorithm can be passed.
bipartite_layout (bool, optional) – Use a bipartite graph layout (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).
sugiyama_layout (bool, optional) – Use layered Sugiyama layout (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).
circular_layout (bool, optional) – Use circular Reingold-Tilford layout (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).
kamada_kawai_layout (bool, optional) – Use a layout created by the Kamada-Kawai algorithm (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).
drl_layout (bool, optional) – Use the DrL layout, suitable for large networks (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).
node_opacity (float, optional) – Opacity (between 0 and 1) to apply to nodes (default: no transparency).
edge_opacity (float, optional) – Opacity (between 0 and 1) to apply to edges (default: no transparency).
label_term_nodes (bool, optional) – Label term nodes (default: False).
label_doc_nodes (bool, optional) – Label document nodes (default: False).
label_nodes (bool, optional) – Label term and document nodes (default: False).
label_edges (bool, optional) – Show edge weights in plot (default: False).
node_label_filter (function, optional) – Function that takes a node and returns a boolean value. It is applied to each node to decide whether or not to suppress that node's label.
edge_label_filter (function, optional) – Function that takes an edge and returns a boolean value. It is applied to each edge to decide whether or not to suppress that edge's label.
scale_nodes_by (str, optional) – Name of centrality measure or node attribute to scale nodes by. Possible values: degree, strength, hits, cohits, birank or any node attribute (default: None).
target (str or file, optional) – File or path that the plot should be saved to (e.g., plot.png).
kwargs – Additional arguments to pass to igraph.drawing.plot.

Returns:

The plot can be directly displayed in a Jupyter notebook or saved as an image file.

Return type:

igraph.drawing.Plot

project(*, node_type: Literal['doc', 'term'] | NodeType, connected: bool | None = False) → ProjectedTextnet[source]¶

Project to one-mode network.

Parameters:

node_type ({NodeType.DOC, NodeType.TERM, "doc", "term"}) – Either DOC or TERM, depending on desired node type.
connected (bool, optional) – Keep only the largest connected component of the projected network (default: False).

Raises:

ValueError – If no valid node type is specified.

Returns:

A one-mode textnet.

Return type:

ProjectedTextnet
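The connected option keeps only the largest connected component. The sketch below shows what that means on a plain adjacency dictionary, using a depth-first search; the toy graph and the name `largest_component` are hypothetical, and the library relies on its underlying graph object rather than on code like this.

```python
def largest_component(adjacency):
    """Return the node set of the largest connected component."""
    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search from an unvisited node.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Hypothetical projected network with two components.
adjacency = {
    "d1": {"d2"},
    "d2": {"d1", "d3"},
    "d3": {"d2"},
    "d4": {"d5"},
    "d5": {"d4"},
}
giant = largest_component(adjacency)
```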

save(target: PathLike[Any] | str) → None[source]¶

Save a textnet to file.

Parameters:: target (str or path) – File to save the textnet to. If the file exists, it will be overwritten.

save_graph(target: str | bytes | PathLike[Any] | IO, format: str | None = None) → None¶

Save the underlying graph.

Parameters:

target (str or path or file) – File or path that the graph should be written to.
format ({"dot", "edgelist", "gml", "graphml", "pajek", ...}, optional) – Optionally specify the desired format (otherwise it is inferred from the file suffix).

property strength: Series¶: Weighted node degree.
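The difference between degree (unweighted) and strength (weighted degree) can be seen in a small sketch: degree counts a node's ties, while strength sums their weights. The edge data below is hypothetical.

```python
from collections import defaultdict

# Hypothetical weighted bipartite edges: (document, term) -> weight.
edges = {
    ("d1", "moon"): 0.5,
    ("d1", "landing"): 1.0,
    ("d2", "moon"): 0.25,
}

degree = defaultdict(int)
strength = defaultdict(float)
for (doc, term), weight in edges.items():
    # Each edge contributes to both of its endpoints.
    for node in (doc, term):
        degree[node] += 1
        strength[node] += weight
```

Two nodes with the same degree can therefore differ in strength, as "d1" and "moon" do here.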

property summary: str¶: Summary of underlying graph.

top_bipartite_cc(n=10)¶

Show nodes sorted by bipartite clustering coefficient.

Parameters:: n (int, optional) – How many nodes to show (default: 10).
Returns:: Ranked nodes.
Return type:: pandas.Series

top_birank(n=10)¶

Show nodes sorted by BiRank.

Parameters:: n (int, optional) – How many nodes to show (default: 10).
Returns:: Ranked nodes.
Return type:: pandas.Series

top_cluster_nodes(n: int = 10, rank_nodes_by: str = 'cluster_strength') → DataFrame¶

Show top nodes per cluster, ranked by a chosen metric.

Parameters:

n (int, optional) – How many nodes to show per cluster (default: 10).
rank_nodes_by (str, optional) – Metric to rank nodes within each cluster by (default: cluster_strength).

Returns:

Clusters with representative nodes.

Return type:

pandas.DataFrame
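Conceptually, this ranking groups nodes by cluster and keeps the n highest-scoring nodes in each group. The sketch below shows that logic with plain dictionaries; the node scores, cluster ids, and the name `top_per_cluster` are made up for illustration, and the actual method returns a pandas.DataFrame rather than a dictionary.

```python
from collections import defaultdict

# Hypothetical data: node label -> (cluster id, metric value).
nodes = {
    "moon": (0, 2.0),
    "landing": (0, 1.0),
    "d1": (0, 3.0),
    "rocket": (1, 1.5),
    "d3": (1, 0.5),
}

def top_per_cluster(nodes, n=2):
    """Return {cluster id: labels of the n highest-scoring nodes}."""
    grouped = defaultdict(list)
    for label, (cluster, score) in nodes.items():
        grouped[cluster].append((score, label))
    return {
        cluster: [label for _, label in sorted(members, key=lambda m: -m[0])[:n]]
        for cluster, members in grouped.items()
    }

top = top_per_cluster(nodes, n=2)
```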

top_cohits(n=10)¶

Show nodes sorted by CoHITS rank.

Parameters:: n (int, optional) – How many nodes to show (default: 10).
Returns:: Ranked nodes.
Return type:: pandas.Series

top_degree(n=10)¶

Show nodes sorted by unweighted degree.

Parameters:: n (int, optional) – How many nodes to show (default: 10).
Returns:: Ranked nodes.
Return type:: pandas.Series

top_hits(n=10)¶

Show nodes sorted by HITS rank.

Parameters:: n (int, optional) – How many nodes to show (default: 10).
Returns:: Ranked nodes.
Return type:: pandas.Series

top_strength(n=10)¶

Show nodes sorted by weighted degree.

Parameters:: n (int, optional) – How many nodes to show (default: 10).
Returns:: Ranked nodes.
Return type:: pandas.Series

vcount() → int¶: Return the number of vertices (nodes).