nestor.tagtrees module
__author__ = "Thurston Sexton"
-
get_relevant(df, col, topn=20)
DEPRECATED!
- Parameters
df – a dataframe containing columns of tag assignments (comma-separated strings)
col – which column to extract
topn – how many of the most frequent tags to return
- Returns
list of (tag,count,numpy.array) tuples
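As a rough illustration of the counting described above, the sketch below tallies comma-separated tags in one column. The helper name `top_tags` and the demo data are hypothetical, not part of nestor, and it returns only (tag, count) pairs; the numpy.array element of the documented tuples is not specified here, so it is left out.

```python
from collections import Counter

import pandas as pd


def top_tags(df, col, topn=20):
    """Hypothetical helper: count comma-separated tags found in df[col]."""
    counts = Counter()
    for cell in df[col].dropna():
        # Split on commas, strip whitespace, and skip empty fragments.
        counts.update(t.strip() for t in str(cell).split(",") if t.strip())
    # most_common returns (tag, count) pairs sorted by descending count.
    return counts.most_common(topn)


# Made-up example data: one row per event, tags as a comma-separated string.
demo = pd.DataFrame({"tags": ["pump, leak", "pump, motor", "leak, pump", None]})
result = top_tags(demo, "tags", topn=2)
print(result)
```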
-
heymann_taxonomy(dist_mat, cent_prog='pr', tau=0.0005, dynamic=False, dotfile=None, verbose=False)
- Parameters
dist_mat (pandas.DataFrame) – contains similarity matrix, indexed and named by tags
cent_prog (str) – algorithm to use in calculating node centrality:
pr: PageRank
eig: eigencentrality
btw: betweenness
cls: closeness
tau (float) – similarity threshold for retaining a node
dynamic (bool) – whether to re-calculate centrality after popping every tag
dotfile (str or None) – file location at which to save a .dot file, if any.
verbose (bool) – whether to print extra output while running
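To make the roles of the similarity matrix and tau more concrete, here is a minimal sketch of a Heymann-style greedy tree construction. It is not nestor's implementation: the function name `taxonomy_sketch` is hypothetical, weighted degree stands in for the centrality measures listed above, and attaching below-threshold tags to the root is an assumption that may differ from what heymann_taxonomy does.

```python
import numpy as np
import pandas as pd


def taxonomy_sketch(sim, tau=0.0005):
    """Greedy Heymann-style tree. Returns {tag: parent}; the root maps to None."""
    # Stand-in centrality (assumption): total similarity to all other tags.
    centrality = sim.sum(axis=1) - np.diag(sim.values)
    order = centrality.sort_values(ascending=False).index.tolist()
    root = order[0]
    parent_of = {root: None}
    for tag in order[1:]:
        placed = list(parent_of)
        # Most similar tag among those already in the tree.
        best = sim.loc[tag, placed].idxmax()
        # Assumption: tags at or below the threshold hang off the root.
        parent_of[tag] = best if sim.loc[tag, best] > tau else root
    return parent_of


# Made-up symmetric similarity matrix, indexed and named by tags.
tags = ["pump", "leak", "seal", "motor"]
sim = pd.DataFrame(
    [[1.0, 0.6, 0.5, 0.4],
     [0.6, 1.0, 0.7, 0.0],
     [0.5, 0.7, 1.0, 0.0],
     [0.4, 0.0, 0.0, 1.0]],
    index=tags, columns=tags,
)
tree = taxonomy_sketch(sim)
print(tree)
```

Processing tags in order of decreasing centrality means general tags are placed first and more specific tags attach beneath whichever placed tag they most resemble.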
-
node_adj_mat(tag_df, similarity='cosine', dag=False, pct_thres=None)
Calculate the similarity of tags, in the form of a similarity kernel. Used as input to graph/network methods.
- Parameters
tag_df (pandas.DataFrame) – standard Nestor tag occurrence matrix. Multi-column with top-level containing tag classifications (named-entity NE) and 2nd level containing tags. Each row corresponds to a single event (MWO), with binary indicators (1-occurs, 0-does not).
similarity (str) – similarity measure to use:
cosine: cosine similarity (from sklearn.metrics.pairwise)
count: count (the number of co-occurrences of each tag-tag pair)
dag (bool) – by default, the adjacency matrix spans all nodes. This option returns a directed acyclic graph (DAG), useful for things like Sankey diagrams. The current implementation returns a (P) -> (I) -> (S) structure (deleting others).
pct_thres (int or None) – If int, between [0,100]. The lower percentile at which to threshold edges/adjacency.
- Returns
pandas.DataFrame, containing adjacency measures for each tag-tag (row-column) occurrence
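The two similarity options and the percentile threshold can be sketched with plain NumPy. This is an illustration under stated assumptions, not the library's code: `adjacency_sketch` is a hypothetical name, the input uses flat (single-level) columns rather than the multi-level tag matrix, every tag is assumed to occur at least once, and the thresholding rule (zeroing entries below the percentile of all matrix entries) is a guess at the intent of pct_thres.

```python
import numpy as np
import pandas as pd


def adjacency_sketch(tags, similarity="cosine", pct_thres=None):
    """Tag-tag adjacency from a binary event-by-tag matrix with flat columns."""
    X = tags.to_numpy(dtype=float)
    # Entry (i, j) is the number of events in which tags i and j both occur.
    adj = X.T @ X
    if similarity == "cosine":
        # Cosine similarity between tag columns: dot product over the norms.
        norms = np.sqrt(np.diag(adj))
        adj = adj / np.outer(norms, norms)
    if pct_thres is not None:
        # Assumed rule: zero out entries below the given lower percentile.
        adj[adj < np.percentile(adj, pct_thres)] = 0.0
    return pd.DataFrame(adj, index=tags.columns, columns=tags.columns)


# Made-up data: four events (rows) and three tags (columns).
demo = pd.DataFrame(
    [[1, 1, 0], [1, 0, 1], [1, 1, 0], [0, 0, 1]],
    columns=["pump", "leak", "motor"],
)
count_adj = adjacency_sketch(demo, similarity="count")
cos_adj = adjacency_sketch(demo, similarity="cosine")
thr_adj = adjacency_sketch(demo, similarity="count", pct_thres=50)
print(cos_adj.round(3))
```

Here "pump" and "leak" co-occur in two events, so their count is 2 and their cosine similarity is 2 / sqrt(3 * 2).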
-
tag_df_network(tag_df, **node_adj_kws)
Starting from a multi-column binary tag-occurrence pandas.DataFrame (such as that output by the Nestor UI and the nestor.keyword.tag_extractor() method), create a networkx graph, along with node_info and edge_info dataframes for plotting convenience (e.g. in nestor.tagplots).
- Parameters
tag_df (pandas.DataFrame) – standard Nestor tag occurrence matrix. Multi-column with top-level containing tag classifications (named-entity NE) and 2nd level containing tags. Each row corresponds to a single event (MWO), with binary indicators (1-occurs, 0-does not).
node_adj_kws