nestor.tagplots module

author: Thurston Sexton

class TagPlot(data_file, cat_specifier='name', topn=10)[source]

Bases: object

Central holder for holoviews dynamic-maps, to be served as a Bokeh App.

filter_tags(self, obj_type, obj_name, n_thres=20)[source]

Apply a filter to the binary tag matrix (tag_df).

Parameters
  • obj_type (passed to filter_type_name)

  • obj_name (passed to filter_type_name)

  • n_thres (only return nodes in the top n_thres percentile)

Returns

pd.DataFrame, filtered binary tag matrix
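The n_thres description leaves the exact cutoff rule open. The sketch below shows one possible reading, in which a tag is kept when its occurrence count is at or above the (100 - n_thres)th percentile of all tag counts. The helper name keep_top_percentile and the toy data are assumptions for illustration and are not part of nestor.

```python
import numpy as np
import pandas as pd

# Toy binary tag matrix: rows are events (MWOs), columns are tags.
tag_df = pd.DataFrame({
    "leak":  [1, 1, 1, 1, 0],  # occurs 4 times
    "motor": [1, 1, 1, 0, 0],  # occurs 3 times
    "hose":  [1, 1, 0, 0, 0],  # occurs 2 times
    "seal":  [1, 0, 0, 0, 0],  # occurs 1 time
    "valve": [0, 0, 0, 0, 1],  # occurs 1 time
})


def keep_top_percentile(tags, n_thres=20):
    """Hypothetical helper: keep tags whose count is in the top n_thres percent.

    Assumption: "top n_thres percentile" means counts at or above the
    (100 - n_thres)th percentile of all tag counts.
    """
    counts = tags.sum()                              # occurrences per tag
    cutoff = np.percentile(counts, 100 - n_thres)    # count threshold
    return tags.loc[:, counts >= cutoff]


top20 = keep_top_percentile(tag_df, n_thres=20)
top50 = keep_top_percentile(tag_df, n_thres=50)
```

With this reading, a larger n_thres keeps more tags, so it acts as a simple knob for how crowded the resulting plot may become.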

filter_type_name(self, obj_type, obj_name)[source]

Build a mask to filter data on.

Parameters
  • obj_type (class of object to filter on)

  • obj_name (sub-class/instance to filter on)

Returns

pd.Series, mask for filtering df, tag_df.
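As a rough illustration of the mask idea, the sketch below builds a boolean pd.Series from a metadata table and applies it to a tag matrix that shares the same index. The column names, the data, and the helper build_mask are hypothetical, and the real method may build its mask differently.

```python
import pandas as pd

# Hypothetical metadata table: one row per maintenance work order (MWO).
df = pd.DataFrame(
    {"machine": ["A", "B", "A", "C"], "technician": ["x", "y", "y", "x"]},
    index=[0, 1, 2, 3],
)

# Binary tag matrix with the same index (1 = tag occurs in that MWO).
tag_df = pd.DataFrame(
    {"leak": [1, 0, 1, 0], "replace": [0, 1, 1, 0], "motor": [1, 1, 0, 1]},
    index=df.index,
)


def build_mask(frame, obj_type, obj_name):
    """Hypothetical helper: True where column `obj_type` equals `obj_name`."""
    return frame[obj_type] == obj_name


mask = build_mask(df, "machine", "A")

# Because both frames share an index, one mask filters either of them.
filtered_df = df.loc[mask]
filtered_tags = tag_df.loc[mask]
```

Keeping the mask as a separate object means the same selection can be reused for both the raw records and the tag matrix without recomputing it.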

hv_bars(self, obj_type)[source]

Generates a hv.DynamicMap with a bars/frequency representation of filtered tags.

Parameters
  • obj_type (class of object to show)

Returns

hv.DynamicMap

hv_flow(self, obj_type)[source]

Generates a hv.DynamicMap with a Sankey/flow representation of filtered tags.

Parameters
  • obj_type (class of object to show)

Returns

hv.DynamicMap

Generates a hv.DynamicMap with a nodelink representation of filtered tags.

Parameters
  • obj_type (class of object to show)

Returns

hv.DynamicMap

tag_relation_net(tag_df, name=None, kind='coocc', layout=fruchterman_reingold_layout, layout_kws=None, padding=None, **node_adj_kws)[source]

Explore tag relationships by creating a Holoviews Graph Element. Nodes are tags (colored by classification), and edges exist only between tags that occur together.

Parameters
  • tag_df (pandas.DataFrame) – standard Nestor tag occurrence matrix. Multi-column with top-level containing tag classifications (named-entity NE) and 2nd level containing tags. Each row corresponds to a single event (MWO), with binary indicators (1-occurs, 0-does not).

  • name (str) – what to name this tag relation element. Creates a Holoviews group “name”.

  • kind (str) –

    coocc :

    co-occurrence graph, where tags are connected if they occur in the same MWO, above the value calculated for pct_thres. Connects all types together.

    sankey :

    Directed “flow” graph, currently implemented with a (P) -> (I) -> (S) structure. Requires dag=True and changes the default similarity to ‘count’.

  • layout (object (function), optional) – must take a graph object as input and output 2D coordinates for node locations (e.g. any networkx.layout function). Defaults to networkx.spring_layout, which networkx also exposes under the name fruchterman_reingold_layout.

  • layout_kws (dict, optional) – options to pass to networkx layout functions

  • padding (dict, optional) –

    contains “x” and “y” specifications for boundaries. Defaults:

    {'x':(-0.05, 1.05), 'y':(-0.05, 1.05)}

    Only valid if kind is ‘coocc’.

  • node_adj_kws

    keyword arguments for nestor.tagtrees.tag_df_network. Valid options are

    similarity : ‘cosine’ (default) or ‘count’

    dag : bool, default=False (True if kind='sankey')

    pct_thres : int or None

    If int, between [0,100]. The lower percentile at which to threshold edges/adjacency.

Returns

graph (holoviews.HoloMap or holoviews.Graph element, depending on sankey or co-occurrence input.)
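To make the tag_df layout and the similarity and pct_thres options more concrete, the following sketch computes co-occurrence edges directly with pandas and NumPy. It does not call nestor.tagtrees.tag_df_network and is not that function's implementation. The helper cooccurrence_edges, the level names, and the toy values are assumptions, and the sketch assumes every tag occurs at least once so the cosine norms are non-zero.

```python
import numpy as np
import pandas as pd

# Toy tag occurrence matrix: top column level is the classification
# (P, I, S), second level is the tag. Each row is one event (MWO).
columns = pd.MultiIndex.from_tuples(
    [("I", "motor"), ("I", "hose"), ("P", "leak"), ("S", "replace")],
    names=["NE", "tag"],
)
tag_df = pd.DataFrame(
    [[1, 0, 1, 1],
     [1, 0, 0, 1],
     [0, 1, 1, 1],
     [0, 1, 1, 0]],
    columns=columns,
)


def cooccurrence_edges(tags, similarity="count", pct_thres=None):
    """Hypothetical helper returning (tag_a, tag_b, weight) edges."""
    X = tags.to_numpy(dtype=float)
    names = list(tags.columns.get_level_values(1))
    adj = X.T @ X                      # entry (i, j): MWOs containing both tags
    if similarity == "cosine":
        norms = np.sqrt(np.diag(adj))  # assumes each tag occurs at least once
        adj = adj / np.outer(norms, norms)
    rows, cols = np.triu_indices(len(names), k=1)   # each pair once, no self-loops
    weights = adj[rows, cols]
    # Lower percentile of the pair weights used as the edge cutoff.
    cutoff = 0.0 if pct_thres is None else np.percentile(weights, pct_thres)
    return [
        (names[i], names[j], float(w))
        for i, j, w in zip(rows, cols, weights)
        if w > 0 and w >= cutoff
    ]


count_edges = cooccurrence_edges(tag_df, similarity="count", pct_thres=50)
cosine_edges = cooccurrence_edges(tag_df, similarity="cosine")
```

Raising pct_thres removes the weakest edges first, which is usually what keeps a co-occurrence graph readable once the number of tags grows.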

tagcalendarplot(tag_df, how='sum', yearlabels=True, yearascending=True, yearlabel_kws=None, subplot_kws=None, gridspec_kws=None, fig_kws=None, **kwargs)[source]

Plot a timeseries of (binary) tag occurrences as a calendar heatmap over weeks in the year. Any columns passed will be explicitly plotted as rows, with each week in the year as a column. By default, occurrences are summed, not averaged, but this aggregation over weeks may be any valid option for the pandas.DataFrame.agg() method.

This function separates multiple years within the data into multiple calendars. The plotting has been heavily modified and normalized, but it is adapted from the original version by Martijn Vermaat (martijn@vermaat.name), 14 Feb 2016: https://github.com/martijnvermaat/calmap

Parameters
  • tag_df (pandas.DataFrame) – standard Nestor tag occurrence matrix. Multi-column with top-level containing tag classifications (named-entity NE) and 2nd level containing tags. Each row corresponds to a single event (MWO), with binary indicators (1-occurs, 0-does not).

  • how (string) – Method for resampling data by day. If None, assume data is already sampled by day and don’t resample. Otherwise, this is passed to Pandas Series.resample.

  • yearlabels (bool) – Whether or not to draw the year for each subplot.

  • yearascending (bool) – Sort the calendar in ascending or descending order.

  • yearlabel_kws (dict) – Keyword arguments passed to the matplotlib set_ylabel call which is used to draw the year for each subplot.

  • subplot_kws (dict) – Keyword arguments passed to the matplotlib add_subplot call used to create each subplot.

  • gridspec_kws (dict) – Keyword arguments passed to the matplotlib GridSpec constructor used to create the grid the subplots are placed on.

  • fig_kws (dict) – Keyword arguments passed to the matplotlib figure call.

  • kwargs (other keyword arguments) – All other keyword arguments are passed to yearplot.

Returns

fig, axes (matplotlib Figure and Axes) – Tuple where fig is the matplotlib Figure object and axes is an array of matplotlib Axes objects with the calendar heatmaps, one per year.
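The data preparation described above (weekly aggregation, tags as rows, one calendar per year) can be approximated with plain pandas. This is a sketch under assumptions, not the function's actual code: the helper weekly_tag_grid and the sample dates are hypothetical, and weeks are binned with the pandas "W" rule, which ends each week on Sunday.

```python
import pandas as pd

# Toy tag occurrences indexed by event date (flat columns for brevity).
dates = pd.to_datetime(
    ["2018-01-01", "2018-01-03", "2018-01-09", "2018-01-10", "2019-01-02"]
)
tag_df = pd.DataFrame(
    {"leak": [1, 0, 1, 1, 1], "motor": [0, 1, 1, 0, 0]},
    index=dates,
)


def weekly_tag_grid(tags, how="sum"):
    """Hypothetical helper: one (tags x weeks) grid per calendar year."""
    weekly = tags.resample("W").agg(how)   # aggregate events within each week
    return {
        int(year): grp.T                   # transpose: tags as rows, weeks as columns
        for year, grp in weekly.groupby(weekly.index.year)
    }


grids = weekly_tag_grid(tag_df, how="sum")
first_two_weeks_2018 = grids[2018].iloc[:, :2]
```

Each grid in the returned dictionary has the shape a heatmap call expects, so a plotting step would only need to loop over the years and draw one axes per grid.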

tagyearplot(tag_df, year=None, how='sum', vmin=None, vmax=None, cmap='Reds', linewidth=1, linecolor=None, monthlabels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], monthticks=True, ax=None, **kwargs)[source]

Plot a timeseries of (binary) tag occurrences as a calendar heatmap over weeks in the year. Any columns passed will be explicitly plotted as rows, with each week in the year as a column. By default, occurrences are summed, not averaged, but this aggregation over weeks may be any valid option for the pandas.DataFrame.agg() method.

Adapted from the original version by Martijn Vermaat (martijn@vermaat.name), 14 Feb 2016: https://github.com/martijnvermaat/calmap

Parameters
  • tag_df (pandas.DataFrame) – standard Nestor tag occurrence matrix. Multi-column with top-level containing tag classifications (named-entity NE) and 2nd level containing tags. Each row corresponds to a single event (MWO), with binary indicators (1-occurs, 0-does not).

  • year (integer) – Only data indexed by this year will be plotted. If None, the first year for which there is data will be plotted.

  • how (string) – Method for resampling data by day. If None, assume data is already sampled by day and don’t resample. Otherwise, this is passed to Pandas Series.resample.

  • vmin, vmax (floats) – Values to anchor the colormap. If None, min and max are used after resampling data by day.

  • cmap (matplotlib colormap name or object) – The mapping from data values to color space.

  • linewidth (float) – Width of the lines that will divide each day.

  • linecolor (color) – Color of the lines that will divide each day. If None, the axes background color is used, or ‘white’ if it is transparent.

  • monthlabels (list) – Strings to use as labels for months, must be of length 12.

  • monthticks (list or int or bool) – If True, label all months. If False, don’t label months. If a list, only label months with these indices. If an integer, label every nth month.

  • ax (matplotlib Axes) – Axes in which to draw the plot, otherwise use the currently-active Axes.

  • kwargs (other keyword arguments) – All other keyword arguments are passed to matplotlib ax.pcolormesh.

Returns

ax (matplotlib Axes) – Axes object with the calendar heatmap.
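The four accepted forms of monthticks can be reduced to a list of month indices before any labels are drawn. The sketch below shows one way to do that. The helper resolve_monthticks is hypothetical, and starting the "every nth month" sequence at index 0 is an assumption, since the description does not say where the sequence begins.

```python
def resolve_monthticks(monthticks, n_months=12):
    """Hypothetical helper: turn a monthticks argument into month indices."""
    # bool is a subclass of int, so True/False must be checked first.
    if monthticks is True:
        return list(range(n_months))            # label every month
    if monthticks is False:
        return []                               # label no months
    if isinstance(monthticks, int):
        # Assumption: a positive integer n labels every nth month from index 0.
        return list(range(0, n_months, monthticks))
    return list(monthticks)                     # explicit list of indices


monthlabels = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Labels that would be drawn for a quarterly tick spacing.
quarterly = [monthlabels[i] for i in resolve_monthticks(3)]
```

Checking the boolean cases before the integer case matters in Python, because isinstance(True, int) is true and would otherwise send True down the "every nth month" branch.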