textnets.network.Textnet

class textnets.network.Textnet(data: TidyText | BiadjacencyMatrix | DataFrame, min_docs: int = 2, connected: bool = False, remove_weak_edges: bool = False, doc_attrs: dict[str, dict[str, Any]] | None = None)

Bases: TextnetBase, FormalContext

Textnet for the relational analysis of meanings.

A textnet is a bipartite network of documents and terms. Links exist only between the two types of nodes. Documents have a tie with terms they contain; the tie is weighted by tf-idf.
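As a rough illustration of how such tie weights arise, the sketch below computes tf-idf weights for a toy corpus in plain Python. It assumes the textbook formula tf × log(N / df). The `docs` data and the `tfidf_edges` helper are hypothetical and not part of textnets, whose exact weighting may differ.

```python
import math
from collections import Counter

# Toy corpus: document label -> list of tokens (hypothetical data).
docs = {
    "d1": ["network", "text", "network"],
    "d2": ["text", "meaning"],
    "d3": ["network", "meaning", "culture"],
}


def tfidf_edges(docs):
    """Return {(doc, term): weight} using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents each term occurs in.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    edges = {}
    for doc, tokens in docs.items():
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)  # relative frequency within the document
            idf = math.log(n_docs / df[term])  # rarer terms weigh more
            edges[(doc, term)] = tf * idf
    return edges


edges = tfidf_edges(docs)
```

Every key of `edges` pairs one document with one term, so the result is a weighted bipartite edge list: no document-document or term-term ties can occur.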

The bipartite network can be projected into two different kinds of single-mode networks: document-to-document, and term-to-term.
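The idea behind a projection can be sketched with a small biadjacency matrix: two documents are tied when they share terms, and two terms are tied when they share documents. The sketch below uses the dot product of weight vectors as the tie weight, which is only one common choice and an assumption here; the matrix `B` and the helper `project_rows` are hypothetical, and textnets may weight projected ties differently.

```python
# Biadjacency weights: rows are documents, columns are terms (toy values).
docs = ["d1", "d2", "d3"]
terms = ["network", "text", "meaning"]
B = [
    [0.5, 0.2, 0.0],
    [0.0, 0.3, 0.4],
    [0.1, 0.0, 0.6],
]


def project_rows(matrix, labels):
    """Tie every pair of rows by the dot product of their weight vectors."""
    ties = {}
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            weight = sum(a * b for a, b in zip(matrix[i], matrix[j]))
            if weight > 0:  # rows with nothing in common stay unconnected
                ties[(labels[i], labels[j])] = weight
    return ties


# Document-to-document network: project the rows.
doc_ties = project_rows(B, docs)
# Term-to-term network: transpose first, then project the rows.
term_ties = project_rows([list(col) for col in zip(*B)], terms)
```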

Experimental: The underlying bipartite adjacency matrix can also be turned into a formal context, which can be used to construct a concept lattice.
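To make the notion of a formal context more concrete, the following sketch treats the binarized matrix as a relation between documents and terms and closes a set of documents into a formal concept (a maximal set of documents together with the terms they all share). The `incidence` data and all helper names are hypothetical; this is plain Python and does not use the textnets context object.

```python
# Toy incidence relation: which document contains which term.
incidence = {
    "d1": {"network", "text"},
    "d2": {"text", "meaning"},
    "d3": {"network", "text", "meaning"},
}
all_terms = set().union(*incidence.values())


def shared_terms(doc_set):
    """Terms contained in every document of doc_set (all terms if empty)."""
    result = set(all_terms)
    for doc in doc_set:
        result &= incidence[doc]
    return result


def docs_containing(term_set):
    """Documents that contain every term in term_set."""
    return {doc for doc, terms in incidence.items() if term_set <= terms}


def concept_from_docs(doc_set):
    """Close a set of documents into a formal concept (extent, intent)."""
    intent = shared_terms(doc_set)
    extent = docs_containing(intent)
    return extent, intent


extent, intent = concept_from_docs({"d1"})
```

Here `d1` alone is not a concept extent, because `d3` also contains both of its terms. The concepts obtained this way, ordered by inclusion of their extents, form the concept lattice.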

Parameters:
  • data (TidyText or BiadjacencyMatrix or DataFrame) – Data to create the textnet from.

  • min_docs (int, optional) – Minimum number of documents a term must appear in to be included in the network (default: 2).

  • connected (bool, optional) – Keep only the largest connected component of the network (default: False).

  • remove_weak_edges (bool, optional) – Remove edges with weights far below average (default: False).

  • doc_attrs (dict of dict, optional) – Additional attributes of document nodes.
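The effect of min_docs and connected can be sketched on a plain edge list. This is an illustration under stated assumptions, not the library's code: the `edges` data and both helpers are hypothetical, and document and term labels are assumed to be distinct strings.

```python
from collections import defaultdict, deque

# Toy document-term edge list.
edges = [
    ("d1", "network"), ("d1", "text"), ("d1", "rare"),
    ("d2", "text"), ("d2", "meaning"),
    ("d3", "meaning"),
    ("d4", "jazz"), ("d5", "jazz"),
]


def filter_min_docs(edges, min_docs=2):
    """Drop terms that appear in fewer than min_docs documents."""
    docs_per_term = defaultdict(set)
    for doc, term in edges:
        docs_per_term[term].add(doc)
    return [(d, t) for d, t in edges if len(docs_per_term[t]) >= min_docs]


def largest_component(edges):
    """Keep only the edges inside the largest connected component."""
    adjacency = defaultdict(set)
    for doc, term in edges:
        adjacency[doc].add(term)
        adjacency[term].add(doc)
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return [(d, t) for d, t in edges if d in best]


kept = filter_min_docs(edges, min_docs=2)
core = largest_component(kept)
```

With min_docs=2, the terms "network" and "rare" disappear because each occurs in a single document; keeping the largest component then drops the isolated "jazz" cluster.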

Raises:

ValueError – If the supplied data is empty.

Methods

  • ecount – Return the number of edges.

  • load – Load a textnet from file.

  • plot – Plot the bipartite graph.

  • project – Project to one-mode network.

  • save – Save a textnet to file.

  • save_graph – Save the underlying graph.

  • top_bipartite_cc – Show nodes sorted by bipartite clustering coefficient.

  • top_birank – Show nodes sorted by BiRank.

  • top_cluster_nodes – Show top nodes per cluster, ranked by a chosen metric.

  • top_cohits – Show nodes sorted by CoHITS rank.

  • top_degree – Show nodes sorted by unweighted degree.

  • top_hits – Show nodes sorted by HITS rank.

  • top_strength – Show nodes sorted by weighted degree.

  • vcount – Return the number of vertices (nodes).

Attributes

  • bipartite_cc – Calculate the unweighted bipartite clustering coefficient.

  • birank – BiRank of nodes.

  • cluster_local_cc – Weighted local clustering coefficient within each cluster's subgraph.

  • cluster_strength – Weighted node degree within each cluster's subgraph.

  • clusters – Return graph partition.

  • cohits – CoHITS rank of nodes.

  • context – Return formal context of terms and documents.

  • degree – Unweighted node degree.

  • edges – Iterate over edges.

  • graph – Direct access to the underlying igraph object.

  • hits – HITS rank of nodes.

  • m – Weighted bipartite adjacency matrix of the bipartite graph.

  • modularity – Return modularity based on graph partition.

  • node_types – Return list of node types.

  • nodes – Iterate over nodes.

  • strength – Weighted node degree.

  • summary – Summary of underlying graph.

property bipartite_cc: Series

Calculate the unweighted bipartite clustering coefficient.

Returns:

The clustering coefficients indexed by node label.

Return type:

pandas.Series

Notes

Adapted from the networkx implementation.

References

[Latapy et al., 2008]
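For orientation, here is a minimal sketch of one variant of the bipartite clustering coefficient described by Latapy et al.: for each node, the overlap of its neighborhood with that of every second-order neighbor (shared neighbors divided by combined neighbors) is averaged. Which variant textnets computes is an assumption here, and the `adjacency` data and function below are hypothetical stand-ins.

```python
# Toy bipartite graph as adjacency sets: documents d1..d3, terms a..c.
adjacency = {
    "d1": {"a", "b"},
    "d2": {"b", "c"},
    "d3": {"c"},
    "a": {"d1"},
    "b": {"d1", "d2"},
    "c": {"d2", "d3"},
}


def bipartite_cc(adjacency):
    """Average pairwise neighborhood overlap with second-order neighbors."""
    coefficients = {}
    for node, neighbors in adjacency.items():
        # Second-order neighbors: nodes of the same type two steps away.
        second = {w for v in neighbors for w in adjacency[v]} - {node}
        total = sum(
            len(neighbors & adjacency[other]) / len(neighbors | adjacency[other])
            for other in second
        )
        coefficients[node] = total / len(second) if second else 0.0
    return coefficients


cc = bipartite_cc(adjacency)
```

The measure ignores edge weights, which matches the "unweighted" qualifier above.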

property birank: Series

BiRank of nodes.

property cluster_local_cc: Series

Weighted local clustering coefficient within each cluster’s subgraph.

property cluster_strength: Series

Weighted node degree within each cluster’s subgraph.

property clusters: VertexClustering

Return graph partition.

The partition is detected by the Leiden algorithm, unless a different partition was supplied to the setter.

property cohits: Series

CoHITS rank of nodes.

property context

Return formal context of terms and documents.

property degree: Series

Unweighted node degree.

ecount() → int

Return the number of edges.

property edges: EdgeSeq

Iterate over edges.

property graph: Graph

Direct access to the underlying igraph object.

property hits: Series

HITS rank of nodes.
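The general idea of HITS on a document-term network can be sketched as a power iteration over the biadjacency matrix: term scores are recomputed from the documents that contain them, document scores from the terms they contain, and both are rescaled each round. The matrix `W`, the function name, and the rescaling to a maximum of 1 are assumptions for illustration; the library's own computation may differ.

```python
# Toy biadjacency matrix: d1 contains t1 and t2, d2 contains only t2.
W = [
    [1.0, 1.0],
    [0.0, 1.0],
]


def hits(W, iterations=50):
    """Return (doc_scores, term_scores) after a fixed number of rounds."""
    n_docs, n_terms = len(W), len(W[0])
    doc_scores = [1.0] * n_docs
    term_scores = [1.0] * n_terms
    for _ in range(iterations):
        # A term scores highly when highly scored documents contain it.
        term_scores = [
            sum(W[i][j] * doc_scores[i] for i in range(n_docs))
            for j in range(n_terms)
        ]
        # A document scores highly when it contains highly scored terms.
        doc_scores = [
            sum(W[i][j] * term_scores[j] for j in range(n_terms))
            for i in range(n_docs)
        ]
        # Rescale so that the largest score in each node set equals 1.
        top_term, top_doc = max(term_scores), max(doc_scores)
        term_scores = [s / top_term for s in term_scores]
        doc_scores = [s / top_doc for s in doc_scores]
    return doc_scores, term_scores


doc_scores, term_scores = hits(W)
```

In this toy case `d1` outranks `d2` and `t2` outranks `t1`, since `d1` contains more terms and `t2` appears in more documents.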

classmethod load(source: PathLike[Any] | str) → Textnet

Load a textnet from file.

Parameters:

source (str or path) – File to read the textnet from. This should be a file created by Textnet.save.

Raises:

FileNotFoundError – If the provided path does not exist.

Return type:

Textnet

property m: BiadjacencyMatrix

Weighted bipartite adjacency matrix of the bipartite graph.

property modularity: float

Return modularity based on graph partition.
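As a reminder of what this number measures, the sketch below computes the standard (Newman) modularity of a weighted graph for a given partition: the share of edge weight that falls inside clusters, minus the share expected if ties were placed at random given node strengths. The `weights` and `membership` data and the function are hypothetical; whether the library uses edge weights or a bipartite-specific variant is not stated above, so treat this as an assumption.

```python
# Toy network: two dense document-term groups joined by one bridging tie.
weights = {
    ("d1", "t1"): 1.0, ("d1", "t2"): 1.0, ("d2", "t1"): 1.0, ("d2", "t2"): 1.0,
    ("d3", "t3"): 1.0, ("d3", "t4"): 1.0, ("d4", "t3"): 1.0, ("d4", "t4"): 1.0,
    ("d2", "t3"): 1.0,  # bridge between the two groups
}
membership = {
    "d1": 0, "d2": 0, "t1": 0, "t2": 0,
    "d3": 1, "d4": 1, "t3": 1, "t4": 1,
}


def modularity(weights, membership):
    """Newman modularity of a weighted, undirected graph."""
    total = sum(weights.values())
    strength = {}
    for (u, v), w in weights.items():
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
    q = 0.0
    for cluster in set(membership.values()):
        # Edge weight with both endpoints inside this cluster.
        inside = sum(
            w for (u, v), w in weights.items()
            if membership[u] == cluster and membership[v] == cluster
        )
        # Summed strength of the cluster's nodes.
        cluster_strength = sum(
            s for node, s in strength.items() if membership[node] == cluster
        )
        q += inside / total - (cluster_strength / (2 * total)) ** 2
    return q


q = modularity(weights, membership)
```

Putting every node into a single cluster yields a modularity of 0, so positive values indicate more within-cluster weight than expected by chance.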

property node_types: list[NodeType]

Return list of node types.

property nodes: VertexSeq

Iterate over nodes.

plot(*, color_clusters: bool | VertexClustering = False, show_clusters: bool | VertexClustering = False, bipartite_layout: bool = False, sugiyama_layout: bool = False, circular_layout: bool = False, kamada_kawai_layout: bool = False, drl_layout: bool = False, node_opacity: float | None = None, edge_opacity: float | None = None, label_term_nodes: bool = False, label_doc_nodes: bool = False, label_nodes: bool = False, label_edges: bool = False, node_label_filter: Callable[[Vertex], bool] | None = None, edge_label_filter: Callable[[Edge], bool] | None = None, scale_nodes_by: str | None = None, **kwargs) → CairoPlot

Plot the bipartite graph.

Parameters:
  • color_clusters (bool or VertexClustering, optional) – Color nodes according to clusters detected by the Leiden algorithm (default: False). Alternately a clustering object generated by another community detection algorithm can be passed.

  • show_clusters (bool or VertexClustering, optional) – Mark clusters detected by the Leiden algorithm (default: False). Alternately a clustering object generated by another community detection algorithm can be passed.

  • bipartite_layout (bool, optional) – Use a bipartite graph layout (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).

  • sugiyama_layout (bool, optional) – Use layered Sugiyama layout (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).

  • circular_layout (bool, optional) – Use circular Reingold-Tilford layout (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).

  • kamada_kawai_layout (bool, optional) – Use a layout created by the Kamada-Kawai algorithm (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).

  • drl_layout (bool, optional) – Use the DrL layout, suitable for large networks (default: False; a weighted Fruchterman-Reingold layout is used unless another layout is specified).

  • node_opacity (float, optional) – Opacity (between 0 and 1) to apply to nodes (default: no transparency).

  • edge_opacity (float, optional) – Opacity (between 0 and 1) to apply to edges (default: no transparency).

  • label_term_nodes (bool, optional) – Label term nodes (default: False).

  • label_doc_nodes (bool, optional) – Label document nodes (default: False).

  • label_nodes (bool, optional) – Label term and document nodes (default: False).

  • label_edges (bool, optional) – Show edge weights in plot (default: False).

  • node_label_filter (function, optional) – Function applied to each node that returns a boolean deciding whether that node's label is shown or suppressed.

  • edge_label_filter (function, optional) – Function applied to each edge that returns a boolean deciding whether that edge's label is shown or suppressed.

  • scale_nodes_by (str, optional) – Name of centrality measure or node attribute to scale nodes by. Possible values: degree, strength, hits, cohits, birank or any node attribute (default: None).

  • target (str or file, optional) – File or path that the plot should be saved to (e.g., plot.png).

  • kwargs – Additional arguments to pass to igraph.drawing.plot.
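How a label filter acts on nodes can be sketched without any plotting. The example assumes that a return value of True keeps the label and False suppresses it; the `Node` class is a hypothetical stand-in for an igraph Vertex, and `apply_label_filter` is an illustrative helper, not a textnets function.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for an igraph Vertex."""

    label: str
    n_ties: int

    def degree(self):
        return self.n_ties


nodes = [Node("network", 5), Node("text", 1), Node("meaning", 3)]


def apply_label_filter(nodes, keep_label):
    """Return one label per node, with None where the filter rejects it."""
    return [node.label if keep_label(node) else None for node in nodes]


# Label only well-connected nodes (assumed semantics: True keeps the label).
labels = apply_label_filter(nodes, lambda node: node.degree() > 2)
```

A predicate of this shape, taking one node and returning a boolean, matches the Callable[[Vertex], bool] type in the signature.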

Returns:

The plot can be directly displayed in a Jupyter notebook or saved as an image file.

Return type:

igraph.drawing.Plot

project(*, node_type: Literal['doc', 'term'] | NodeType, connected: bool | None = False) → ProjectedTextnet

Project to one-mode network.

Parameters:
  • node_type ({NodeType.DOC, NodeType.TERM, "doc", "term"}) – Either DOC or TERM, depending on desired node type.

  • connected (bool, optional) – Keep only the largest connected component of the projected network (default: False).

Raises:

ValueError – If no valid node type is specified.

Returns:

A one-mode textnet.

Return type:

ProjectedTextnet

save(target: PathLike[Any] | str) → None

Save a textnet to file.

Parameters:

target (str or path) – File to save the textnet to. If the file exists, it will be overwritten.

save_graph(target: str | bytes | PathLike[Any] | IO, format: str | None = None) → None

Save the underlying graph.

Parameters:
  • target (str or path or file) – File or path that the graph should be written to.

  • format ({"dot", "edgelist", "gml", "graphml", "pajek", ...}, optional) – Optionally specify the desired format (otherwise it is inferred from the file suffix).

property strength: Series

Weighted node degree.
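The difference between unweighted degree and strength (weighted degree) is easy to see on a toy edge list: degree counts a node's ties, strength sums their weights. The `edges` data and the helper below are hypothetical and only illustrate the two definitions.

```python
from collections import defaultdict

# Toy weighted document-term ties.
edges = {
    ("d1", "network"): 0.5,
    ("d1", "text"): 0.2,
    ("d2", "text"): 0.3,
}


def degree_and_strength(edges):
    """Return (degree, strength) dictionaries keyed by node label."""
    degree = defaultdict(int)
    strength = defaultdict(float)
    for (doc, term), weight in edges.items():
        for node in (doc, term):
            degree[node] += 1  # number of ties
            strength[node] += weight  # total weight of those ties
    return dict(degree), dict(strength)


degree, strength = degree_and_strength(edges)
```

Here `d1` and `text` both have degree 2, but `d1` has the higher strength (0.7 versus 0.5).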

property summary: str

Summary of underlying graph.

top_bipartite_cc(n=10)

Show nodes sorted by bipartite clustering coefficient.

Parameters:

n (int, optional) – How many nodes to show (default: 10).

Returns:

Ranked nodes.

Return type:

pandas.Series

top_birank(n=10)

Show nodes sorted by BiRank.

Parameters:

n (int, optional) – How many nodes to show (default: 10).

Returns:

Ranked nodes.

Return type:

pandas.Series

top_cluster_nodes(n: int = 10, rank_nodes_by: str = 'cluster_strength') → DataFrame

Show top nodes per cluster, ranked by a chosen metric.

Parameters:
  • n (int, optional) – How many nodes to show per cluster (default: 10).

  • rank_nodes_by (str, optional) – Metric to rank nodes within each cluster by (default: cluster_strength).

Returns:

Clusters with representative nodes.

Return type:

pandas.DataFrame
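The ranking logic can be sketched in plain Python: group nodes by cluster, sort each group by the chosen metric, and keep the first n. The `membership` and `scores` data and the helper are hypothetical, and the result is a dictionary here rather than the pandas.DataFrame the method returns.

```python
from collections import defaultdict

# Toy cluster membership and a per-node metric (for example, cluster strength).
membership = {"network": 0, "text": 0, "graph": 0, "meaning": 1, "culture": 1}
scores = {"network": 0.9, "text": 0.4, "graph": 0.7, "meaning": 0.8, "culture": 0.3}


def top_nodes_per_cluster(membership, scores, n=10):
    """Return {cluster: [node labels]} with the n best-scoring nodes each."""
    grouped = defaultdict(list)
    for node, cluster in membership.items():
        grouped[cluster].append(node)
    return {
        cluster: sorted(nodes, key=lambda node: scores[node], reverse=True)[:n]
        for cluster, nodes in grouped.items()
    }


top = top_nodes_per_cluster(membership, scores, n=2)
```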

top_cohits(n=10)

Show nodes sorted by CoHITS rank.

Parameters:

n (int, optional) – How many nodes to show (default: 10).

Returns:

Ranked nodes.

Return type:

pandas.Series

top_degree(n=10)

Show nodes sorted by unweighted degree.

Parameters:

n (int, optional) – How many nodes to show (default: 10).

Returns:

Ranked nodes.

Return type:

pandas.Series

top_hits(n=10)

Show nodes sorted by HITS rank.

Parameters:

n (int, optional) – How many nodes to show (default: 10).

Returns:

Ranked nodes.

Return type:

pandas.Series

top_strength(n=10)

Show nodes sorted by weighted degree.

Parameters:

n (int, optional) – How many nodes to show (default: 10).

Returns:

Ranked nodes.

Return type:

pandas.Series

vcount() → int

Return the number of vertices (nodes).