nestor.tagplots module
author: Thurston Sexton
-
class TagPlot(data_file, cat_specifier='name', topn=10)
Bases: object
Central holder for holoviews dynamic-maps, to be served as a Bokeh App.
Apply a filter to the binary tag matrix (tag_df).
- Parameters
obj_type – passed to filter_type_name
obj_name – passed to filter_type_name
n_thres – only return nodes in the top n_thres percentile
- Returns
pd.DataFrame, filtered binary tag matrix
-
filter_type_name(self, obj_type, obj_name)
Build a mask to filter data on.
- Parameters
obj_type – class of object to filter on
obj_name – sub-class/instance to filter on
- Returns
pd.Series, mask for filtering df, tag_df.
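The masking idea can be sketched with plain pandas. This is only an illustration under stated assumptions, not the method's actual implementation: the table `df`, its column names, and the helper `build_mask` are hypothetical, and the sketch assumes that obj_type names a column and obj_name a value in that column.

```python
import pandas as pd

# Hypothetical metadata table: one row per maintenance work order (MWO).
df = pd.DataFrame(
    {
        "machine_type": ["lathe", "mill", "lathe", "press"],
        "technician": ["ann", "bob", "ann", "cy"],
    }
)


def build_mask(frame, obj_type, obj_name):
    """Return a boolean Series, True where column `obj_type` equals `obj_name`."""
    return frame[obj_type] == obj_name


# Assumed usage: the same mask can index any table that shares the row index.
mask = build_mask(df, "machine_type", "lathe")
filtered = df[mask]
```

Because the mask is indexed by row, it could be applied to both a metadata table and a tag matrix that share the same index.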
-
hv_bars(self, obj_type)
Generates a hv.DynamicMap with a bars/frequency representation of filtered tags.
- Parameters
obj_type – class of object to show
- Returns
hv.DynamicMap
-
tag_relation_net(tag_df, name=None, kind='coocc', layout=fruchterman_reingold_layout, layout_kws=None, padding=None, **node_adj_kws)
Explore tag relationships by creating a Holoviews Graph Element. Nodes are tags (colored by classification), and edges occur only when those tags occur together.
- Parameters
tag_df (pandas.DataFrame) – standard Nestor tag occurrence matrix. Multi-column with top-level containing tag classifications (named-entity NE) and 2nd level containing tags. Each row corresponds to a single event (MWO), with binary indicators (1-occurs, 0-does not).
name (str) – what to name this tag relation element. Creates a Holoviews group “name”.
kind (str) –
- coocc :
co-occurrence graph, where tags are connected if they occur in the same MWO, above the value calculated for pct_thres. Connects all types together.
- sankey :
Directed “flow” graph, currently implemented with a (P) -> (I) -> (S) structure. Requires dag=True. Alters the default to similarity='count'.
layout (object (function), optional) – must take a graph object as input and output 2D coordinates for node locations (e.g. all networkx.layout functions). Defaults to networkx.spring_layout.
layout_kws (dict, optional) – options to pass to networkx layout functions
padding (dict, optional) – contains “x” and “y” specifications for boundaries. Defaults: {'x':(-0.05, 1.05), 'y':(-0.05, 1.05)}. Only valid if kind is ‘coocc’.
node_adj_kws – keyword arguments for nestor.tagtrees.tag_df_network. Valid options are:
- similarity :
‘cosine’ (default) or ‘count’
- dag :
bool, default False (True if kind='sankey')
- pct_thres :
int or None. If int, between [0,100]. The lower percentile at which to threshold edges/adjacency.
- Returns
graph (holoviews.HoloMap or holoviews.Graph element, depending on whether the input is sankey or co-occurrence)
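As a rough illustration of the co-occurrence idea behind kind='coocc', the sketch below builds a toy tag_df, counts how often each pair of tags appears in the same MWO, and keeps only the edges at or above a percentile cutoff. It is not nestor's implementation (which delegates to nestor.tagtrees.tag_df_network). The tag names, the pct_thres value, the use of raw counts rather than cosine similarity, and the exact cutoff rule are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy tag occurrence matrix: rows are MWOs, columns are (NE class, tag).
cols = pd.MultiIndex.from_tuples(
    [("I", "pump"), ("I", "valve"), ("P", "leak"), ("S", "replace")],
    names=["NE", "tag"],
)
tag_df = pd.DataFrame(
    [[1, 0, 1, 1],
     [1, 1, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 1]],
    columns=cols,
)

# Entry (i, j) is the number of MWOs in which tags i and j both occur.
X = tag_df.to_numpy()
counts = X.T @ X
np.fill_diagonal(counts, 0)  # a tag co-occurring with itself is not an edge

# Assumed thresholding rule: keep edges at or above the given percentile
# of all pairwise weights (upper triangle only, so each pair is counted once).
pct_thres = 50
tags = tag_df.columns.get_level_values("tag")
iu = np.triu_indices(len(tags), k=1)
weights = counts[iu]
cutoff = np.percentile(weights, pct_thres)

edges = [
    (tags[i], tags[j], int(w))
    for i, j, w in zip(*iu, weights)
    if w >= cutoff
]
```

An edge list of this shape is the kind of input a graph layout function would then position in 2D.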
-
tagcalendarplot(tag_df, how='sum', yearlabels=True, yearascending=True, yearlabel_kws=None, subplot_kws=None, gridspec_kws=None, fig_kws=None, **kwargs)
Plot a timeseries of (binary) tag occurrences as a calendar heatmap over weeks in the year. Any columns passed will be explicitly plotted as rows, with each week in the year as a column. By default, occurrences are summed, not averaged, but this aggregation over weeks may be any valid option for the pandas.DataFrame.agg() method.
This function will separate out multiple years within the data as multiple calendars. The plotting has been heavily modified/altered/normalized, but the original version appeared here:
- adapted from:
‘Martijn Vermaat’ 14 Feb 2016 ‘martijn@vermaat.name’ ‘https://github.com/martijnvermaat/calmap’
- Parameters
tag_df (pandas.DataFrame) – standard Nestor tag occurrence matrix. Multi-column with top-level containing tag classifications (named-entity NE) and 2nd level containing tags. Each row corresponds to a single event (MWO), with binary indicators (1-occurs, 0-does not).
how (string) – Method for resampling data by day. If None, assume data is already sampled by day and don’t resample. Otherwise, this is passed to Pandas Series.resample.
yearlabels (bool) – Whether or not to draw the year for each subplot.
yearascending (bool) – Sort the calendar in ascending or descending order.
yearlabel_kws (dict) – Keyword arguments passed to the matplotlib set_ylabel call which is used to draw the year for each subplot.
subplot_kws (dict) – Keyword arguments passed to the matplotlib add_subplot call used to create each subplot.
gridspec_kws (dict) – Keyword arguments passed to the matplotlib GridSpec constructor used to create the grid the subplots are placed on.
fig_kws (dict) – Keyword arguments passed to the matplotlib figure call.
kwargs (other keyword arguments) – All other keyword arguments are passed to tagyearplot.
- Returns
fig, axes (matplotlib Figure and Axes) – Tuple where fig is the matplotlib Figure object and axes is an array of matplotlib Axes objects with the calendar heatmaps, one per year.
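The weekly aggregation and per-year split described above can be sketched with pandas alone. This is a minimal illustration, not the function's internals: the tag names and dates are hypothetical, the columns are flattened to a single level for brevity, and the sketch assumes the data carries a DatetimeIndex.

```python
import pandas as pd

# Hypothetical daily occurrences for two tags, spanning a year boundary.
idx = pd.date_range("2016-12-19", periods=14, freq="D")
occ = pd.DataFrame(
    {
        "pump": [1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1],
        "leak": [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    },
    index=idx,
)

how = "sum"  # any aggregation accepted by DataFrame.agg, e.g. "mean" or "max"

# One row per week; with the "W" frequency, weeks end on Sunday.
weekly = occ.resample("W").agg(how)

# One sub-table per calendar year, keyed by the year of each week's label.
by_year = {int(year): grp for year, grp in weekly.groupby(weekly.index.year)}
```

Each entry of `by_year` corresponds to the data behind one calendar; transposing it gives tags as rows and weeks as columns.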
-
tagyearplot(tag_df, year=None, how='sum', vmin=None, vmax=None, cmap='Reds', linewidth=1, linecolor=None, monthlabels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], monthticks=True, ax=None, **kwargs)
Plot a timeseries of (binary) tag occurrences as a calendar heatmap over weeks in the year. Any columns passed will be explicitly plotted as rows, with each week in the year as a column. By default, occurrences are summed, not averaged, but this aggregation over weeks may be any valid option for the pandas.DataFrame.agg() method.
adapted from: ‘Martijn Vermaat’ 14 Feb 2016 ‘martijn@vermaat.name’ ‘https://github.com/martijnvermaat/calmap’
- Parameters
tag_df (pandas.DataFrame) – standard Nestor tag occurrence matrix. Multi-column with top-level containing tag classifications (named-entity NE) and 2nd level containing tags. Each row corresponds to a single event (MWO), with binary indicators (1-occurs, 0-does not).
year (integer) – Only data indexed by this year will be plotted. If None, the first year for which there is data will be plotted.
how (string) – Method for resampling data by day. If None, assume data is already sampled by day and don’t resample. Otherwise, this is passed to Pandas Series.resample.
vmin, vmax (floats) – Values to anchor the colormap. If None, min and max are used after resampling data by day.
cmap (matplotlib colormap name or object) – The mapping from data values to color space.
linewidth (float) – Width of the lines that will divide each day.
linecolor (color) – Color of the lines that will divide each day. If None, the axes background color is used, or ‘white’ if it is transparent.
monthlabels (list) – Strings to use as labels for months, must be of length 12.
monthticks (list or int or bool) – If True, label all months. If False, don’t label months. If a list, only label months with these indices. If an integer, label every nth month.
ax (matplotlib Axes) – Axes in which to draw the plot, otherwise use the currently-active Axes.
kwargs (other keyword arguments) – All other keyword arguments are passed to matplotlib ax.pcolormesh.
- Returns
ax (matplotlib Axes) – Axes object with the calendar heatmap.
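The four accepted forms of monthticks can be summarized in a small helper. This is a hypothetical sketch of the documented behavior, not the library's code: the name `resolve_monthticks` is invented for illustration, and the assumption that "every nth month" counts from January (index 0) may differ from the actual offset used.

```python
def resolve_monthticks(monthticks, n_months=12):
    """Return the 0-based month indices that would receive a label."""
    # bool is a subclass of int, so check True/False before the int branch.
    if monthticks is True:
        return list(range(n_months))
    if monthticks is False:
        return []
    if isinstance(monthticks, int):
        # Assumption: every nth month, counting from January (index 0).
        return list(range(0, n_months, monthticks))
    # Otherwise treat the value as an explicit list of month indices.
    return list(monthticks)


quarterly = resolve_monthticks(3)
```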