Package 'finnsurveytext' reference manual

Title:	Analyse Open-Ended Survey Responses in Finnish
Description:	Annotates Finnish textual survey responses into CoNLL-U format using Finnish treebanks from <https://universaldependencies.org/format.html> using UDPipe as described in Straka and Straková (2017) <doi:10.18653/v1/K17-3009>. Formatted data is then analysed using single or comparison n-gram plots, wordclouds, summary tables and Concept Network plots. The Concept Network plots use the TextRank algorithm as outlined in Mihalcea, Rada & Tarau, Paul (2004) <https://aclanthology.org/W04-3252/>.
Authors:	Adeline Clarke [cre, aut], Krista Lagus [aut], Katja Laine [aut], Maria Litova [aut], Matti Nelimarkka [aut], Joni Oksanen [aut], Jaakko Peltonen [aut], Tuukka Oikarinen [aut], Jani-Matti Tirkkonen [aut], Ida Toivanen [aut], Maria Valaste [aut], Shannon Emilia Carson [ctb], Sirpa Lappalainen [ctb], Tuukka Puonti [ctb], Kimmo Vehkalahti [ctb], DARIAH-FI [cph, fnd]
Maintainer:	Adeline Clarke <[email protected]>
License:	MIT + file LICENSE
Version:	2.1.1
Built:	2025-03-06 16:27:09 UTC
Source:	https://github.com/dariah-fi-survey-concept-network/finnsurveytext

Child Barometer 2016 response data

Description

This data contains background variables and the responses to q3 "Missä asioissa olet hyvä? (Avokysymys)", q7 "Kertoisitko, mitä sinun mielestäsi kiusaaminen on? (Avokysymys)", and q11 "Mikä tekee sinut iloiseksi? (Avokysymys)" in the FSD3134 Lapsibarometri 2016 dataset.

Usage

child

Format

## 'child' A dataframe with 414 rows and 8 columns:

fsd_id: FSD case id
q3: 'Which things are you good at?' response text
q7: 'What do you think bullying is?' response text
q11: 'What makes you happy?' response text
paino: Weight
gender: Gender
major_region: Major region
daycare_before_school: Daycare before pre-school

Source

<https://urn.fi/urn:nbn:fi:fsd:T-FSD3134>

Young People's Views on Development Cooperation 2012 response data

Description

This data contains background variables and the responses to q11_1 'Jatka lausetta: Kehitysmaa on maa, jossa... (Avokysymys)', q11_2 'Jatka lausetta: Kehitysyhteistyö on toimintaa, jossa... (Avokysymys)', and q11_3 'Jatka lausetta: Maailman kolme suurinta ongelmaa ovat... (Avokysymys)' in the FSD2821 Nuorten ajatuksia kehitysyhteistyöstä 2012 dataset.

Usage

dev_coop

Format

## 'dev_coop' A dataframe with 925 rows and 9 columns:

fsd_id: FSD case id
q11_1: response text for q11_1
q11_2: response text for q11_2
q11_3: response text for q11_3
paino: Weight
gender: Gender
year_of_birth: Year of Birth
region: Region of Residence
education_level: Education level

Source

<https://urn.fi/urn:nbn:fi:fsd:T-FSD2821>

English Sample Survey Data: Patient Joe

Description

This data contains English text responses to "Joe’s doctor told him that he would need to return in two weeks to find out whether or not his condition had improved. But when Joe asked the receptionist for an appointment, he was told that it would be over a month before the next available appointment. What should Joe do?", as well as the categorisation of these responses by two coders as either destructive, passive, somewhat proactive, or proactive.

Usage

english_sample_survey

Format

## 'english_sample_survey' A dataframe with 585 rows and 5 columns:

id: ID
label: Label: destructive, passive, somewhat proactive, or proactive
label_coder1: Label from coder 1
label_coder2: Label from coder 2
text: Text of response

Source

<https://doi.org/10.7802/2474>

Child Barometer 2016 Bullying response data in CoNLL-U format with NLTK stopwords removed and background variables

Description

This data contains the responses to q7 "Kertoisitko, mitä sinun mielestäsi kiusaaminen on? (Avokysymys)" in the FSD3134 Lapsibarometri 2016 dataset in CoNLL-U format with NLTK stopwords and punctuation removed plus weights and background variables.

Usage

fst_child

Format

## 'fst_child' A dataframe with 1580 rows and 18 columns:

doc_id: the identifier of the document
paragraph_id: the identifier of the paragraph
sentence_id: the identifier of the sentence
sentence: the text of the sentence of which this token is part
token_id: Word index, integer starting at 1 for each new sentence; may be a range for multi-word tokens; may be a decimal number for empty nodes.
token: Word form or punctuation symbol.
lemma: Lemma or stem of word form.
upos: Universal part-of-speech tag.
xpos: Language-specific part-of-speech tag; underscore if not available.
feats: List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
head_token_id: Head of the current word, which is either a value of token_id or zero (0).
dep_rel: Universal dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.
deps: Enhanced dependency graph in the form of a list of head-deprel pairs.
misc: Any other annotation.
weight: Weight
gender: Gender
major_region: Major region
daycare_before_school: Daycare before pre-school

Source

<https://urn.fi/urn:nbn:fi:fsd:T-FSD3134>

Child Barometer 2016 Bullying response data in CoNLL-U format with NLTK stopwords removed

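Description

This data contains the responses to q7 "Kertoisitko, mitä sinun mielestäsi kiusaaminen on? (Avokysymys)" in the FSD3134 Lapsibarometri 2016 dataset in CoNLL-U format with NLTK stopwords and punctuation removed.
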
Usage

fst_child_2

Format

## 'fst_child_2' A dataframe with 1580 rows and 14 columns:

doc_id: the identifier of the document
paragraph_id: the identifier of the paragraph
sentence_id: the identifier of the sentence
sentence: the text of the sentence of which this token is part
token_id: Word index, integer starting at 1 for each new sentence; may be a range for multi-word tokens; may be a decimal number for empty nodes.
token: Word form or punctuation symbol.
lemma: Lemma or stem of word form.
upos: Universal part-of-speech tag.
xpos: Language-specific part-of-speech tag; underscore if not available.
feats: List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
head_token_id: Head of the current word, which is either a value of token_id or zero (0).
dep_rel: Universal dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.
deps: Enhanced dependency graph in the form of a list of head-deprel pairs.
misc: Any other annotation.

Source

<https://urn.fi/urn:nbn:fi:fsd:T-FSD3134>

Concept Network - Plot comparison Concept Network

Description

Creates a Concept Network plot from a list of edges and nodes (and their respective weights) which indicates unique words in this plot in comparison to another Network.

Usage

fst_cn_compare_plot(
  edges,
  nodes,
  concepts,
  unique_lemmas,
  name = NULL,
  concept_colour = "#cd1719",
  unique_colour = "#4DAF4A",
  min_edge = NULL,
  max_edge = NULL,
  min_node = NULL,
  max_node = NULL,
  title_size = 20
)

Arguments

`edges`	Output of 'fst_cn_edges()', dataframe of 'edges' connecting two words.
`nodes`	Output of 'fst_cn_nodes()', dataframe of relevant lemmas and their associated pagerank.
`concepts`	List of terms which have been searched for, separated by commas.
`unique_lemmas`	List of unique lemmas, output of 'fst_cn_get_unique()'
`name`	An optional "name" for the plot, default is 'NULL' and a generic title ("TextRank extracted keyword occurrences") will be used.
`concept_colour`	Colour to display concept words, default is '"#cd1719"'.
`unique_colour`	Colour to display unique words, default is '"#4DAF4A"'.
`min_edge`	A numeric value for the scale of the edges, the smallest co_occurrence value for an edge across all Networks to be plotted together.
`max_edge`	A numeric value for the scale of the edges, the largest co_occurrence value for an edge across all Networks to be plotted together.
`min_node`	A numeric value for the scale of the nodes, the smallest pagerank value for a node across all Networks to be plotted together.
`max_node`	A numeric value for the scale of the nodes, the largest pagerank value for a node across all Networks to be plotted together.
`title_size`	Size of the plot title, default is '20'.
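
When several Networks are plotted together, 'min_edge', 'max_edge', 'min_node' and 'max_node' are the extremes taken across all of them. A minimal base R sketch of gathering those values, assuming the edge tables carry a 'co_occurrence' column and the node tables a 'pagerank' column (the tables below are made-up stand-ins, not package output):

```r
# Hypothetical stand-ins for two edge tables and two node tables.
e1 <- data.frame(co_occurrence = c(0.010, 0.040))
e2 <- data.frame(co_occurrence = c(0.020, 0.090))
n1 <- data.frame(pagerank = c(0.12, 0.30))
n2 <- data.frame(pagerank = c(0.05, 0.25))

# Pool the values from every Network, then take the overall extremes.
edge_values <- c(e1$co_occurrence, e2$co_occurrence)
node_values <- c(n1$pagerank, n2$pagerank)

min_edge <- min(edge_values)
max_edge <- max(edge_values)
min_node <- min(node_values)
max_node <- max(node_values)
```

Passing the same four values to each 'fst_cn_compare_plot()' call should keep edge widths and node sizes comparable between the plots.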

Value

Plot of concept network with concept and unique words (nodes) highlighted.

Examples

pos_filter <- c("NOUN", "VERB", "ADJ", "ADV")
e1 <- fst_cn_edges(fst_child, "lyödä", pos_filter = pos_filter)
e2 <- fst_cn_edges(fst_child, "lyöminen", pos_filter = pos_filter)
n1 <- fst_cn_nodes(fst_child, e1)
n2 <- fst_cn_nodes(fst_child, e2)
u <- fst_cn_get_unique_separate(n1, n2)

fst_cn_compare_plot(e1, n1, "lyödä", unique_lemmas = u)
fst_cn_compare_plot(e2, n2, "lyöminen", u, unique_colour = "purple")

Concept Network - Get TextRank edges

Description

This function takes a string of terms (separated by commas) or a single term and, using 'fst_cn_search()', finds words connected to these searched terms. It then returns a dataframe of 'edges' between two words which are connected together in a frequently-occurring n-gram containing a concept term.

Usage

fst_cn_edges(
  data,
  concepts,
  threshold = NULL,
  norm = "number_words",
  pos_filter = NULL
)

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.
`concepts`	List of terms to search for, separated by commas.
`threshold`	A minimum number of occurrences threshold for 'edge' between searched term and other word, default is 'NULL'. Note, the threshold is applied before normalisation.
`norm`	The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses, default), '"number_resp"' (the number of responses), or 'NULL' (raw count returned, also used when weights are applied).
`pos_filter`	List of UPOS tags for inclusion, default is 'NULL' to include all UPOS tags.
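
The note that the threshold is applied before normalisation can be sketched in base R as below. The edge table, its column names, the use of '>=' for the threshold, and the denominator are all assumptions made for illustration; they are not taken from the package source.

```r
# Hypothetical raw co-occurrence counts between a concept and other lemmas.
edges <- data.frame(
  from = c("kiusata", "kiusata", "kiusata"),
  to = c("potkia", "haukkua", "satuttaa"),
  n = c(5, 2, 1),
  stringsAsFactors = FALSE
)

threshold <- 2       # assumed: keep edges seen at least this many times
total_words <- 1580  # assumed denominator for norm = "number_words"

# Step 1: filter on the raw counts.
kept <- edges[edges$n >= threshold, ]

# Step 2: normalise only the edges that survived the threshold.
kept$co_occurrence <- kept$n / total_words
```

Because the filter runs on raw counts, 'threshold' is expressed in occurrences regardless of which 'norm' setting is chosen.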

Value

Dataframe of co-occurrences between two connected words.

Examples

con <- "kiusata, lyöminen"
fst_cn_edges(fst_child, con, pos_filter = c("NOUN", "VERB", "ADJ", "ADV"))
fst_cn_edges(fst_child, con, pos_filter = 'VERB, NOUN')
fst_cn_edges(fst_child, "lyöminen", threshold = 2, norm = "number_resp")

Concept Network - Get unique nodes from a list of top n-grams tables

Description

Takes at least two tables of nodes and pagerank (output of 'fst_cn_nodes()') and finds nodes unique to one table.

Usage

fst_cn_get_unique(list)

Arguments

`list`	A list of top nodes

Value

Dataframe of words and whether word is unique or not.
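
The idea of a node being unique to one table can be illustrated with a small base R sketch. The 'lemma' and 'pagerank' column names, the 'is_unique' output column, and the sample values are assumptions for illustration only; the actual output of 'fst_cn_get_unique()' may be shaped differently.

```r
# Hypothetical node tables, standing in for outputs of fst_cn_nodes().
n1 <- data.frame(
  lemma = c("kiusata", "potkia", "satuttaa"),
  pagerank = c(0.50, 0.30, 0.20),
  stringsAsFactors = FALSE
)
n2 <- data.frame(
  lemma = c("kiusata", "haukkua", "potkia"),
  pagerank = c(0.45, 0.35, 0.20),
  stringsAsFactors = FALSE
)
tables <- list(n1, n2)

# Count in how many tables each lemma appears.
lemmas <- unlist(lapply(tables, function(tbl) unique(tbl$lemma)))
appearances <- table(lemmas)

# A lemma is unique when it appears in exactly one table.
result <- data.frame(
  lemma = names(appearances),
  is_unique = as.vector(appearances) == 1,
  stringsAsFactors = FALSE
)
```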

Examples

pos_filter <- 'NOUN, VERB, ADJ, ADV'
e1 <- fst_cn_edges(fst_child, "lyödä", pos_filter = pos_filter)
e2 <- fst_cn_edges(fst_child, "lyöminen", pos_filter = pos_filter)
n1 <- fst_cn_nodes(fst_child, e1)
n2 <- fst_cn_nodes(fst_child, e2)
list_of_nodes <- list(n1, n2)
fst_cn_get_unique(list_of_nodes)

Concept Network - Get unique nodes from separate top n-grams tables

Description

Takes at least two tables of nodes and pagerank (output of 'fst_cn_nodes()') and finds nodes unique to one table.

Usage

fst_cn_get_unique_separate(table1, table2, ...)

Arguments

`table1`	The first table.
`table2`	The second table.
`...`	Any other tables you want to include.

Value

Dataframe of words and whether word is unique or not.

Examples

pos_filter <- c("NOUN", "VERB", "ADJ", "ADV")
e1 <- fst_cn_edges(fst_child, "lyödä", pos_filter = pos_filter)
e2 <- fst_cn_edges(fst_child, "lyöminen", pos_filter = pos_filter)
n1 <- fst_cn_nodes(fst_child, e1)
n2 <- fst_cn_nodes(fst_child, e2)
fst_cn_get_unique_separate(n1, n2)

Concept Network - Get TextRank nodes

Description

This function takes a string of terms (separated by commas) or a single term and, using 'textrank_keywords()' from the 'textrank' package, filters data based on 'pos_filter' and ranks words, which are then filtered for those connected to the search terms.

Usage

fst_cn_nodes(data, edges, pos_filter = NULL)

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.
`edges`	Output of 'fst_cn_edges()', dataframe of co-occurrences between two words.
`pos_filter`	List of UPOS tags for inclusion, default is 'NULL' to include all UPOS tags.

Value

A dataframe containing relevant lemmas and their associated pagerank.

Examples

con <- "kiusata, lyöminen"
cb <- fst_child
edges <- fst_cn_edges(cb, con, pos_filter = c("NOUN", "VERB", "ADJ", "ADV"))
edges2 <- fst_cn_edges(cb, con, pos_filter = 'NOUN, VERB, ADJ, ADV')
fst_cn_nodes(cb, edges, c("NOUN", "VERB", "ADJ", "ADV"))
fst_cn_nodes(cb, edges, 'NOUN, VERB, ADJ, ADV')

Plot Concept Network

Description

Creates a Concept Network plot from a list of edges and nodes (and their respective weights).

Usage

fst_cn_plot(edges, nodes, concepts, title = NULL)

Arguments

`edges`	Output of 'fst_cn_edges()', dataframe of 'edges' connecting two words.
`nodes`	Output of 'fst_cn_nodes()', dataframe of relevant lemmas and their associated pagerank.
`concepts`	List of terms which have been searched for, separated by commas.
`title`	Optional title for plot, default is 'NULL' and a generic title ("TextRank extracted keyword occurrences") will be used.

Value

Plot of Concept Network.

Examples

con <- "kiusata, lyöminen"
cb <- fst_child
edges <- fst_cn_edges(cb, con, pos_filter = c("NOUN", "VERB", "ADJ", "ADV"))
nodes <- fst_cn_nodes(cb, edges, c("NOUN", "VERB", "ADJ", "ADV"))
fst_cn_plot(edges = edges, nodes = nodes, concepts = con)

Concept Network - Search TextRank for concepts

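Description

Searches the data for the concept terms given in 'concepts' (a single term or a string of terms separated by commas), optionally restricted to the UPOS tags in 'pos_filter', and returns the n-grams which contain them.
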
Usage

fst_cn_search(data, concepts, pos_filter = NULL)

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.
`concepts`	String of terms to search for, separated by commas.
`pos_filter`	List of UPOS tags for inclusion, default is 'NULL' to include all UPOS tags.
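
The examples in this manual pass 'pos_filter' either as a character vector or as a single comma-separated string. A minimal sketch of how both forms could be reduced to one vector of UPOS tags is shown below; 'parse_pos_filter()' is a hypothetical helper written for illustration, not a function exported by the package.

```r
# Hypothetical helper: accept NULL, a character vector, or a
# comma-separated string, and return a clean vector of tags.
parse_pos_filter <- function(pos_filter) {
  if (is.null(pos_filter)) {
    return(NULL)  # NULL means "include all UPOS tags"
  }
  tags <- unlist(strsplit(pos_filter, ",", fixed = TRUE))
  trimws(tags)
}

pf <- parse_pos_filter(c("NOUN", "VERB", "ADJ", "ADV"))
pf2 <- parse_pos_filter("NOUN, VERB, ADJ, ADV")
```

Under this sketch, 'pf' and 'pf2' hold the same four tags, which is why the two call styles in the examples would be expected to select the same words.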

Value

Dataframe of n-grams containing searched terms.

Examples

con <- "kiusata, lyöminen, lyödä, potkia"
pf <- c("NOUN", "VERB", "ADJ", "ADV")
pf2 <- "NOUN, VERB, ADJ, ADV"
fst_cn_search(fst_child, concepts = con, pos_filter = pf)
fst_cn_search(fst_child, concepts = con, pos_filter = pf2)
fst_cn_search(fst_child, concepts = con)

Make comparison cloud

Description

Creates a comparison wordcloud showing words that occur differently between each group. Data is split based on different values in the 'field' column of formatted data. Results will be shown within the plots pane.

Usage

fst_comparison_cloud(
  data,
  field,
  pos_filter = NULL,
  max = 100,
  norm = NULL,
  use_svydesign_weights = FALSE,
  use_svydesign_field = FALSE,
  id = "",
  svydesign = NULL,
  use_column_weights = FALSE,
  exclude_nulls = FALSE,
  rename_nulls = "null_data"
)

Arguments

`data`	A dataframe of text in CoNLL-U format with additional 'field' column for splitting data.
`field`	Column in 'data' used for splitting groups
`pos_filter`	List of UPOS tags for inclusion, default is 'NULL' which means all word types included.
`max`	The maximum number of words to display, default is '100'.
`norm`	The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses), '"number_resp"' (the number of responses), or 'NULL' (raw count returned, default, also used when weights are applied).
`use_svydesign_weights`	Option to weight words in the wordcloud using weights from a svydesign object containing the raw data, default is 'FALSE'
`use_svydesign_field`	Option to get 'field' for splitting the data from the svydesign object, default is 'FALSE'
`id`	ID column from raw data, required if 'use_svydesign_weights = TRUE' and must match the 'doc_id' in formatted 'data'.
`svydesign`	A svydesign object which contains the raw data and weights.
`use_column_weights`	Option to weight words in the wordcloud using weights from formatted data which includes an additional 'weight' column, default is 'FALSE'.
`exclude_nulls`	Whether to exclude NULLs in the 'field' column, default is 'FALSE'.
`rename_nulls`	What to fill NULL values with if 'exclude_nulls = FALSE', default is '"null_data"'.
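
The interaction between 'field', 'exclude_nulls' and 'rename_nulls' can be pictured with the base R sketch below. 'split_by_field()' is a hypothetical helper, and treating missing values as 'NA' is an assumption; the sketch only illustrates the grouping logic, not the package's implementation.

```r
# Hypothetical helper: split formatted data into groups by 'field',
# either dropping rows with a missing group or relabelling them.
split_by_field <- function(data, field, exclude_nulls = FALSE,
                           rename_nulls = "null_data") {
  values <- as.character(data[[field]])
  if (exclude_nulls) {
    keep <- !is.na(values)
    data <- data[keep, , drop = FALSE]
    values <- values[keep]
  } else {
    values[is.na(values)] <- rename_nulls
  }
  split(data, values)
}

# Made-up tokens with one missing 'gender' value.
tokens <- data.frame(
  lemma = c("kiusata", "potkia", "haukkua", "satuttaa"),
  gender = c("Girl", NA, "Boy", "Girl"),
  stringsAsFactors = FALSE
)

groups_all <- split_by_field(tokens, "gender")
groups_known <- split_by_field(tokens, "gender", exclude_nulls = TRUE)
```

With the defaults, rows lacking a 'field' value form their own group named by 'rename_nulls'; with 'exclude_nulls = TRUE' they are left out of the comparison.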

Value

A comparison cloud from wordcloud package.

Examples

fst_comparison_cloud(fst_child, 'gender', max = 50)
s <- survey::svydesign(id=~1, weights= ~paino, data = child)
i <- 'fsd_id'
c2 <- fst_child_2
fst_comparison_cloud(c2, 'gender', NULL, 100, NULL, TRUE, TRUE, i, s)
fst_comparison_cloud(
  fst_dev_coop, 'education_level', use_column_weights = TRUE
)
pf <- c("NOUN", "VERB", "ADJ", "ADV")
pf2 <- "NOUN, VERB, ADJ, ADV"
fst_comparison_cloud(fst_dev_coop, 'gender', pos_filter = pf)
fst_comparison_cloud(fst_dev_coop, 'gender', pos_filter = pf2)
fst_comparison_cloud(fst_dev_coop, 'gender', norm = 'number_resp')

Concept Network - Make Concept Network plot

Description

This function takes a string of terms (separated by commas) or a single term and, using 'textrank_keywords()' from 'textrank' package, filters data based on 'pos_filter' and finds words connected to search terms. Then it plots a Concept Network based on the calculated weights of these terms and the frequency of co-occurrences.

Usage

fst_concept_network(
  data,
  concepts,
  threshold = NULL,
  norm = "number_words",
  pos_filter = NULL,
  title = NULL
)

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.
`concepts`	List of terms to search for, separated by commas.
`threshold`	A minimum number of occurrences threshold for 'edge' between searched term and other word, default is 'NULL'. Note, the threshold is applied before normalisation.
`norm`	The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses, default), '"number_resp"' (the number of responses), or 'NULL' (raw count returned, also used when weights are applied).
`pos_filter`	List of UPOS tags for inclusion, default is 'NULL' to include all UPOS tags.
`title`	Optional title for plot, default is 'NULL' and a generic title ("TextRank extracted keyword occurrences") will be used.

Value

Plot of Concept Network.

Examples

data <- fst_child
con <- "kiusata, lyöminen"
pf <- c("NOUN", "VERB", "ADJ", "ADV")
title <- "Bullying Concept Network"
fst_concept_network(data, concepts = con, pos_filter = pf, title = title)

Concept Network - Compare and plot Concept Network

Description

This function takes a string of terms (separated by commas) or a single term and, using 'textrank_keywords()' from 'textrank' package, filters data based on 'pos_filter' and finds words connected to search terms for each group. Then it plots a Concept Network for each group based on the calculated weights of these terms and the frequency of co-occurrences, indicating any words that are unique to each group's Network plot.

Usage

fst_concept_network_compare(
  data,
  concepts,
  field,
  norm = NULL,
  threshold = NULL,
  pos_filter = NULL,
  use_svydesign_field = FALSE,
  id = "",
  svydesign = NULL,
  exclude_nulls = FALSE,
  rename_nulls = "null_data",
  title_size = 20,
  subtitle_size = 15
)

Arguments

`data`	A dataframe of text in CoNLL-U format with additional 'field' column for splitting data.
`concepts`	List of terms to search for, separated by commas.
`field`	Column in 'data' used for splitting groups
`norm`	The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses), '"number_resp"' (the number of responses), or 'NULL' (raw count returned, default).
`threshold`	A minimum number of occurrences threshold for 'edge' between searched term and other word, default is 'NULL'. Note, the threshold is applied before normalisation.
`pos_filter`	List of UPOS tags for inclusion, default is 'NULL' to include all UPOS tags.
`use_svydesign_field`	Option to get 'field' for splitting the data from a svydesign object, default is 'FALSE'
`id`	ID column from raw data, required if 'use_svydesign_field = TRUE' and must match the 'doc_id' in formatted 'data'.
`svydesign`	A svydesign object which contains the raw data and weights.
`exclude_nulls`	Whether to exclude NULLs in the 'field' column, default is 'FALSE'.
`rename_nulls`	What to fill NULL values with if 'exclude_nulls = FALSE', default is '"null_data"'.
`title_size`	Size of the plot title, default is '20'.
`subtitle_size`	Size of the title of each individual concept network, default is '15'.

Value

Multiple concept network plots with concept and unique words highlighted.

Examples

con1 <- "lyödä, lyöminen"
fst_concept_network_compare(fst_child, concepts = con1, field = 'gender')
s <- survey::svydesign(id=~1, weights= ~paino, data = child)
c2 <- fst_child_2
i <- 'fsd_id'
fst_concept_network_compare(c2, con1, 'gender', NULL, NULL, NULL, TRUE, i, s)
con2 <- "köyhyys, nälänhätä, sota"
fst_concept_network_compare(fst_dev_coop, con2, 'gender')

Young People's Views on Development Cooperation 2012 q11_3 response data in CoNLL-U format with NLTK stopwords removed and background variables

Description

This data contains the responses to q11_3 in the FSD2821 Nuorten ajatuksia kehitysyhteistyöstä 2012 (Young People's Views on Development Cooperation 2012) dataset in CoNLL-U format, with NLTK stopwords and punctuation removed, plus weights and background variables.

Usage

fst_dev_coop

Format

## 'fst_dev_coop' A dataframe with 4192 rows and 19 columns:

doc_id: the identifier of the document
paragraph_id: the identifier of the paragraph
sentence_id: the identifier of the sentence
sentence: the text of the sentence of which this token is part
token_id: Word index, integer starting at 1 for each new sentence; may be a range for multi-word tokens; may be a decimal number for empty nodes.
token: Word form or punctuation symbol.
lemma: Lemma or stem of word form.
upos: Universal part-of-speech tag.
xpos: Language-specific part-of-speech tag; underscore if not available.
feats: List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
head_token_id: Head of the current word, which is either a value of token_id or zero (0).
dep_rel: Universal dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.
deps: Enhanced dependency graph in the form of a list of head-deprel pairs.
misc: Any other annotation.
weight: Weight
gender: Gender
year_of_birth: Year of Birth
region: Region of Residence
education_level: Education level

Source

<https://urn.fi/urn:nbn:fi:fsd:T-FSD2821>

Young People's Views on Development Cooperation 2012 q11_3 response data in CoNLL-U format with NLTK stopwords removed

Description

This data contains the responses to q11_3 in the FSD2821 Nuorten ajatuksia kehitysyhteistyöstä 2012 (Young People's Views on Development Cooperation 2012) dataset in CoNLL-U format, with NLTK stopwords and punctuation removed.

Usage

fst_dev_coop_2

Format

## 'fst_dev_coop_2' A dataframe with 4192 rows and 14 columns:

doc_id: the identifier of the document
paragraph_id: the identifier of the paragraph
sentence_id: the identifier of the sentence
sentence: the text of the sentence of which this token is part
token_id: Word index, integer starting at 1 for each new sentence; may be a range for multi-word tokens; may be a decimal number for empty nodes.
token: Word form or punctuation symbol.
lemma: Lemma or stem of word form.
upos: Universal part-of-speech tag.
xpos: Language-specific part-of-speech tag; underscore if not available.
feats: List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
head_token_id: Head of the current word, which is either a value of token_id or zero (0).
dep_rel: Universal dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.
deps: Enhanced dependency graph in the form of a list of head-deprel pairs.
misc: Any other annotation.

Source

<https://urn.fi/urn:nbn:fi:fsd:T-FSD2821>

Get available stopwords lists

Description

Returns a tibble containing all available stopword lists for the language, their contents, and the size of the lists.

Usage

fst_find_stopwords(language = "fi")

Arguments

`language`	Two-letter ISO code of the language for the stopword list.

Value

A tibble containing the stopwords lists.

Examples

fst_find_stopwords()
fst_find_stopwords(language = 'et')

Annotate open-ended survey responses into CoNLL-U format

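Description

Annotates the open-ended responses in a survey dataframe into CoNLL-U format using a [udpipe] language model. Response weights and additional columns can optionally be carried over to the output.
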
Usage

fst_format(data, question, id, model = "ftb", weights = NULL, add_cols = NULL)

Arguments

`data`	A dataframe of survey responses which contains an open-ended question.
`question`	The column in the dataframe which contains the open-ended question.
`id`	The column in the dataframe which contains the ids for the responses.
`model`	A language model available for [udpipe]. '"ftb"' (default) or '"tdt"' are recognised as shorthand for "finnish-ftb" and "finnish-tdt". The full list is available in the [udpipe] documentation or via 'fst_print_available_models()'.
`weights`	Optional, the column of the dataframe which contains the respective weights for each response.
`add_cols`	Optional, a column (or columns) from the dataframe which contain other information you'd like to retain (for instance, covariate columns for splitting the data for comparison plots).

Value

Dataframe of annotated text in CoNLL-U format plus any additional columns.
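
How 'weights' and 'add_cols' end up alongside the token-level annotation can be pictured as a join on the response id. The base R sketch below uses made-up data and assumes the weight column is renamed to 'weight' in the output (as in the 'fst_child' dataset); it does not call [udpipe] and is not the package's implementation.

```r
# Made-up raw survey data: one row per response.
responses <- data.frame(
  fsd_id = c("r1", "r2"),
  paino = c(1.2, 0.8),
  gender = c("Girl", "Boy"),
  stringsAsFactors = FALSE
)

# Made-up annotation: one row per token, keyed by doc_id.
annotated <- data.frame(
  doc_id = c("r1", "r1", "r2"),
  token = c("potkiminen", "haukkuminen", "kiusaaminen"),
  upos = c("NOUN", "NOUN", "NOUN"),
  stringsAsFactors = FALSE
)

# Select the columns to retain and rename the weight column.
extra <- responses[, c("fsd_id", "paino", "gender")]
names(extra)[names(extra) == "paino"] <- "weight"

# Attach the response-level columns to every token of that response.
formatted <- merge(annotated, extra,
                   by.x = "doc_id", by.y = "fsd_id", all.x = TRUE)
```

Each token row repeats the weight and background variables of the response it came from, which is what allows later functions to split or weight by those columns.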

Examples

## Not run: 
i <- "fsd_id"
fst_format(data = child, question = "q7", id = i)
fst_format(data = child, question = "q7", id = i, model = "tdt")
fst_format(data = child, question = "q7", id = i, weights="paino")
cols <- c("gender", "major_region", "daycare_before_school")
fst_format(child, question = "q7", id = i, add_cols = cols)
fst_format(child, question = "q7", id = i, add_cols = "gender, major_region")
fst_format(child, question = 'q7', id = i, model = 'swedish-talbanken')
unlink("finnish-ftb-ud-2.5-191206.udpipe")
unlink("finnish-tdt-ud-2.5-191206.udpipe")
unlink("swedish-talbanken-ud-2.5-191206.udpipe")

## End(Not run)

Annotate open-ended survey responses within a 'svydesign' object into CoNLL-U format

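Description

Annotates the open-ended responses held in a 'svydesign' object into CoNLL-U format using a [udpipe] language model. The weights from the 'svydesign' object and additional columns can optionally be carried over to the output.
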
Usage

fst_format_svydesign(
  svydesign,
  question,
  id,
  model = "ftb",
  use_weights = TRUE,
  add_cols = NULL
)

Arguments

`svydesign`	A 'svydesign' object which contains an open-ended question.
`question`	The column in the dataframe which contains the open-ended question.
`id`	The column in the dataframe which contains the ids for the responses.
`model`	A language model available for [udpipe]. '"ftb"' (default) or '"tdt"' are recognised as shorthand for "finnish-ftb" and "finnish-tdt". The full list is available in the [udpipe] documentation or via 'fst_print_available_models()'.
`use_weights`	Optional, whether to use the weights within the 'svydesign', default is 'TRUE'.
`add_cols`	Optional, a column (or columns) from the dataframe which contain other information you'd like to retain (for instance, dimension columns for splitting the data for comparison plots).

Value

Dataframe of annotated text in CoNLL-U format plus any additional columns.

Examples

## Not run: 
i <- "fsd_id"
svy_child <- survey::svydesign(id=~1, weights= ~paino, data = child)
fst_format_svydesign(svy_child, question = 'q7', id = 'fsd_id')
fst_format_svydesign(svy_child, question = 'q7', id = i, use_weights = FALSE)
cols <- c('gender', 'major_region')
fst_format_svydesign(svy_child, 'q7', 'fsd_id', add_cols = cols)

svy_dev <- survey::svydesign(id = ~1, weights = ~paino, data = dev_coop)
fst_format_svydesign(svy_dev, 'q11_1', 'fsd_id', add_cols = 'gender, region')

fst_format_svydesign(svy_dev, 'q11_2', 'fsd_id', 'finnish-ftb')
unlink("finnish-ftb-ud-2.5-191206.udpipe")
unlink("finnish-tdt-ud-2.5-191206.udpipe")

## End(Not run)
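
The examples above pass 'add_cols' both as a character vector and as a single comma-separated string. The following base-R sketch shows one way the two forms could be reduced to the same vector of column names. 'split_cols' is a hypothetical helper written for illustration; it is not exported by the package, and the package's own parsing may differ.

```r
# Hypothetical helper (not part of finnsurveytext): accept either
# c("a", "b") or "a, b" and return a clean character vector.
split_cols <- function(add_cols) {
  if (is.null(add_cols)) {
    return(character(0))
  }
  # Split every element on commas, then strip surrounding whitespace.
  parts <- trimws(unlist(strsplit(add_cols, ",", fixed = TRUE)))
  # Drop empty pieces left by stray commas.
  parts[nzchar(parts)]
}

split_cols(c("gender", "major_region"))
split_cols("gender, region")
```

Both calls return a plain character vector, so later code can select the columns in the same way regardless of how they were supplied.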

Find and Plot Top Words

Description

Creates a plot of the most frequently-occurring words (unigrams) within the data. Optionally, weights can be provided either through a 'weight' column in the formatted data, or from a 'svydesign' object with the raw (preformatted) data.

Usage

fst_freq(
  data,
  number = 10,
  norm = NULL,
  pos_filter = NULL,
  strict = TRUE,
  name = NULL,
  use_svydesign_weights = FALSE,
  id = "",
  svydesign = NULL,
  use_column_weights = FALSE
)

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.
`number`	The number of top words to return, default is '10'.
`norm`	The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses), '"number_resp"' (the number of responses), or 'NULL' (raw count returned, default).
`pos_filter`	List of UPOS tags for inclusion, default is 'NULL' which means all word types included.
`strict`	Whether to strictly cut-off at 'number' (ties are alphabetically ordered), default is 'TRUE'.
`name`	An optional "name" for the plot to add to title, default is 'NULL'.
`use_svydesign_weights`	Option to weight words in the plot using weights from a 'svydesign' containing the raw data, default is 'FALSE'.
`id`	ID column from raw data, required if 'use_svydesign_weights = TRUE' and must match the 'docid' in formatted 'data'.
`svydesign`	A 'svydesign' which contains the raw data and weights, required if 'use_svydesign_weights = TRUE'.
`use_column_weights`	Option to weight words in the plot using weights from formatted data which includes an additional 'weight' column, default is 'FALSE'.

Value

Plot of top words.

Examples

fst_freq(fst_child, number = 12, norm = 'number_resp',  name = "All")
fst_freq(fst_child, use_column_weights = TRUE)
s <- survey::svydesign(id=~1, weights= ~paino, data = child)
i <- 'fsd_id'
fst_freq(fst_child_2, use_svydesign_weights = TRUE, svydesign = s, id = i)
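
To make the role of 'id' concrete, here is a minimal base-R sketch of weighting word counts by response weights. It assumes the formatted data identifies each response in a 'doc_id' column (the name udpipe gives it) and that the raw data holds the weight in 'paino'. The toy values and the merge-then-sum approach are illustrative assumptions, not the package's internal code.

```r
# Toy formatted data: one row per token, keyed by response id.
formatted <- data.frame(
  doc_id = c("1", "1", "2", "3", "3"),
  lemma  = c("kaveri", "perhe", "kaveri", "kaveri", "koulu")
)
# Toy raw data: one row per response, with a survey weight.
raw <- data.frame(
  fsd_id = c("1", "2", "3"),
  paino  = c(0.5, 1.5, 1.0)
)

# Attach each response's weight to its tokens by matching the id columns.
weighted <- merge(formatted, raw, by.x = "doc_id", by.y = "fsd_id")

# A word's weighted frequency is the sum of the weights of its occurrences.
weighted_freq <- vapply(split(weighted$paino, weighted$lemma), sum, numeric(1))
weighted_freq <- sort(weighted_freq, decreasing = TRUE)
weighted_freq
```

If the ids in the two tables do not match, the merge silently drops those tokens, which is why the 'id' column must correspond to the response ids in the formatted data.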

Compare and plot top words

Description

Find top and unique top words for different groups of participants. Data is split based on different values in the 'field' column of formatted data. Results will be shown within the plots pane.

Usage

fst_freq_compare(
  data,
  field,
  number = 10,
  norm = NULL,
  pos_filter = NULL,
  strict = TRUE,
  use_svydesign_weights = FALSE,
  use_svydesign_field = FALSE,
  id = "",
  svydesign = NULL,
  use_column_weights = FALSE,
  exclude_nulls = FALSE,
  rename_nulls = "null_data",
  unique_colour = "indianred",
  title_size = 20,
  subtitle_size = 15
)

Arguments

`data`	A dataframe of text in CoNLL-U format with additional 'field' column for splitting data.
`field`	Column in 'data' used for splitting groups.
`number`	The number of top words to return, default is '10'.
`norm`	The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses), '"number_resp"' (the number of responses), or 'NULL' (raw count returned, default, also used when weights are applied).
`pos_filter`	List of UPOS tags for inclusion, default is 'NULL' which means all word types included.
`strict`	Whether to strictly cut-off at 'number' (ties are alphabetically ordered), default is 'TRUE'.
`use_svydesign_weights`	Option to weight words in the plots using weights from a 'svydesign' object containing the raw data, default is 'FALSE'.
`use_svydesign_field`	Option to get 'field' for splitting the data from the 'svydesign' object, default is 'FALSE'.
`id`	ID column from raw data, required if 'use_svydesign_weights = TRUE' and must match the 'docid' in formatted 'data'.
`svydesign`	A 'svydesign' object which contains the raw data and weights.
`use_column_weights`	Option to weight words in the plots using weights from formatted data which includes an additional 'weight' column, default is 'FALSE'.
`exclude_nulls`	Whether to exclude NULLs in 'field' column, default is 'FALSE'.
`rename_nulls`	What to fill NULL values with if 'exclude_nulls = FALSE', default is '"null_data"'.
`unique_colour`	Colour to display unique words, default is '"indianred"'.
`title_size`	Size at which to display the plot title, default is '20'.
`subtitle_size`	Size at which to display the title of each individual top words plot, default is '15'.

Value

Plots of most frequent words in the plots pane with unique words highlighted.

Examples

fst_freq_compare(fst_child, 'gender', number = 10, norm = "number_resp")
fst_freq_compare(fst_child, 'gender', number = 10, norm = NULL)
s <- survey::svydesign(id=~1, weights= ~paino, data = child)
c2 <- fst_child_2
c <- fst_child
g <- 'gender'
fst_freq_compare(c2, g, 10, NULL, NULL, TRUE, TRUE, TRUE, 'fsd_id', s)
fst_freq_compare(c, g, use_column_weights = TRUE, strict = FALSE)

Make Top Words plot

Description

Plots most common words.

Usage

fst_freq_plot(table, number = NULL, name = NULL)

Arguments

`table`	Output of 'fst_freq_table()' or 'fst_ngrams_table()'.
`number`	Optional number of n-grams for the title, default is 'NULL'.
`name`	An optional "name" for the plot to add to title, default is 'NULL'.

Value

Plot of top words.

Examples

pf <- c("NOUN", "VERB", "ADJ", "ADV")
top_words <- fst_freq_table(fst_child, number = 15, pos_filter = pf)
fst_freq_plot(top_words, number = 15, name = "Bullying")

Make Top Words Table

Description

Creates a table of the most frequently-occurring words (unigrams) within the data. Optionally, weights can be provided either through a 'weight' column in the formatted data, or from a 'svydesign' object with the raw (preformatted) data.

Usage

fst_freq_table(
  data,
  number = 10,
  norm = NULL,
  pos_filter = NULL,
  strict = TRUE,
  use_svydesign_weights = FALSE,
  id = "",
  svydesign = NULL,
  use_column_weights = FALSE
)

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.
`number`	The number of top words to return, default is '10'.
`norm`	The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses), '"number_resp"' (the number of responses), or 'NULL' (raw count returned, default, also used when weights are applied).
`pos_filter`	List of UPOS tags for inclusion, default is 'NULL' which means all word types included.
`strict`	Whether to strictly cut-off at 'number' (ties are alphabetically ordered), default is 'TRUE'.
`use_svydesign_weights`	Option to weight words in the table using weights from a 'svydesign' containing the raw data, default is 'FALSE'.
`id`	ID column from raw data, required if 'use_svydesign_weights = TRUE' and must match the 'docid' in formatted 'data'.
`svydesign`	A 'svydesign' which contains the raw data and weights, required if 'use_svydesign_weights = TRUE'.
`use_column_weights`	Option to weight words in the table using weights from formatted data which includes an additional 'weight' column, default is 'FALSE'.

Value

A table of the most frequently occurring words in the data.

Examples

pf <- c("NOUN", "VERB", "ADJ", "ADV")
pf2 <- "NOUN, VERB, ADJ, ADV"
fst_freq_table(fst_child, number = 15, strict = FALSE, pos_filter = pf)
fst_freq_table(fst_child, number = 15, strict = FALSE, pos_filter = pf2)
fst_freq_table(fst_child, norm = 'number_words')
fst_freq_table(fst_child, use_column_weights = TRUE)
c2 <- fst_child_2
s <- survey::svydesign(id=~1, weights= ~paino, data = child)
i <- 'fsd_id'
fst_freq_table(c2, use_svydesign_weights = TRUE, svydesign = s, id = i)
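
The three 'norm' settings can be illustrated with a toy calculation. This sketch assumes that normalising means dividing each raw count by the total number of words or by the total number of responses, as the argument description suggests. The data and column names are made up, and the package's exact computation may differ.

```r
# Toy data: three responses containing six words in total.
tokens <- data.frame(
  doc_id = c("1", "1", "2", "2", "2", "3"),
  lemma  = c("kaveri", "perhe", "kaveri", "koulu", "perhe", "kaveri")
)

raw_count <- table(tokens$lemma)             # norm = NULL
n_words   <- nrow(tokens)                    # total words: 6
n_resp    <- length(unique(tokens$doc_id))   # total responses: 3

res <- data.frame(
  word     = names(raw_count),
  raw      = as.vector(raw_count),
  by_words = as.vector(raw_count) / n_words, # norm = "number_words"
  by_resp  = as.vector(raw_count) / n_resp   # norm = "number_resp"
)
res
```

Dividing by the number of responses can give values above the share of responses that mention a word, because a response may contain the same word more than once.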

Get unique n-grams from a list of top n-grams tables

Description

Takes a list containing at least two tables of n-grams and frequencies (either output of 'fst_freq_table()' or 'fst_ngrams_table()') and finds n-grams unique to one table.

Usage

fst_get_unique_ngrams(list_of_top_ngrams)

Arguments

`list_of_top_ngrams`	A list of top n-grams tables.

Value

Dataframe of words and whether word is unique or not.

Examples

top_child <- fst_freq_table(fst_child)
top_dev <- fst_freq_table(fst_dev_coop)
list_of_top_words <- list()
list_of_top_words <- append(list_of_top_words, list(top_child))
list_of_top_words <- append(list_of_top_words, list(top_dev))
fst_get_unique_ngrams(list_of_top_words)
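
The idea of an n-gram being "unique to one table" can be sketched in base R as follows. The column names ('words', 'occurrence', 'is_unique') and the toy values are assumptions made for illustration; the function's actual output columns may be named differently.

```r
# Toy top-word tables for two groups.
top_a <- data.frame(words = c("kaveri", "perhe", "koulu"),
                    occurrence = c(9, 7, 4))
top_b <- data.frame(words = c("kaveri", "raha", "koulu"),
                    occurrence = c(8, 6, 5))

# Count in how many tables each word appears.
all_words <- unlist(lapply(list(top_a, top_b), function(t) unique(t$words)))
n_tables  <- table(all_words)

# A word is unique when exactly one table contains it.
unique_flags <- data.frame(
  words     = names(n_tables),
  is_unique = as.vector(n_tables) == 1
)
unique_flags
```

Because the count is taken across a list, the same approach extends to three or more tables without change.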

Get unique n-grams from separate top n-grams tables

Description

Takes at least two separate tables of n-grams and frequencies (either output of 'fst_freq_table()' or 'fst_ngrams_table()') and finds n-grams unique to one table.

Usage

fst_get_unique_ngrams_separate(table1, table2, ...)

Arguments

`table1`	The first n-grams table.
`table2`	The second n-grams table.
`...`	Any other n-grams tables you want to include.

Value

Dataframe of words and whether word is unique or not.

Examples

top_child <- fst_freq_table(fst_child)
top_dev <- fst_freq_table(fst_dev_coop)
fst_get_unique_ngrams_separate(top_child, top_dev)

Merge N-grams table with unique words

Description

Merges list of unique words from 'fst_get_unique_ngrams()' with output of 'fst_freq_table()' or 'fst_ngrams_table()' so that unique words can be displayed on comparison plots.

Usage

fst_join_unique(table, unique_table)

Arguments

`table`	Output of 'fst_freq_table()' or 'fst_ngrams_table()'.
`unique_table`	Output of 'fst_get_unique_ngrams()'.

Value

A table of top n-grams, frequency, and whether the n-gram is "unique".

Examples

top_child <- fst_freq_table(fst_child)
top_dev <- fst_freq_table(fst_dev_coop)
unique_words <- fst_get_unique_ngrams_separate(top_child, top_dev)
fst_join_unique(top_child, unique_words)
fst_join_unique(top_dev, unique_words)
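
Conceptually, this merge is a left join of one group's table onto the table of uniqueness flags. A minimal sketch with 'merge()' is given below; the column names and values are hypothetical and are not taken from the package.

```r
# Toy table of top words for one group (column names are illustrative).
top <- data.frame(words = c("kaveri", "perhe"), occurrence = c(9, 7))

# Toy uniqueness flags computed across all groups.
flags <- data.frame(
  words     = c("kaveri", "perhe", "raha"),
  is_unique = c(FALSE, TRUE, TRUE)
)

# Left join: keep every row of 'top' and attach its flag.
joined <- merge(top, flags, by = "words", all.x = TRUE)
joined
```

Words that appear only in other groups, such as "raha" here, are left out of the result, so each group's plot shows only its own top words.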

Compare response lengths

Description

Compare length of text responses for different groups of participants. Data is split based on different values in the 'field' column of formatted data. Results will be shown within the plots pane.

Usage

fst_length_compare(
  data,
  field,
  incl_sentences = TRUE,
  exclude_nulls = FALSE,
  rename_nulls = "null_data"
)

Arguments

`data`	A dataframe of text in CoNLL-U format with additional 'field' column for splitting data.
`field`	Column in 'data' used for splitting groups.
`incl_sentences`	Whether to include sentence data in table, default is 'TRUE'.
`exclude_nulls`	Whether to exclude NULLs in 'field' column, default is 'FALSE'.
`rename_nulls`	What to fill NULL values with if 'exclude_nulls = FALSE'.

Value

Dataframe summarising response lengths.

Examples

fst_length_compare(fst_child, 'gender')
fst_length_compare(fst_dev_coop, 'education_level', incl_sentences = FALSE)

Make Length Summary Table

Description

Creates a table summarising distribution of the length of responses.

Usage

fst_length_summary(data, desc = "All responses", incl_sentences = TRUE)

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.
`desc`	An optional string describing responses in table, default is '"All responses"'.
`incl_sentences`	Whether to include sentence data in table, default is 'TRUE'.

Value

Table summarising distribution of lengths of responses.

Examples

fst_length_summary(fst_child, incl_sentences = FALSE)
fst_length_summary(fst_dev_coop, desc = "Q11_3")
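
As a rough illustration of what a length summary measures, the sketch below counts the words in each response of a toy token table and summarises the distribution. It assumes one row per token and a 'doc_id' column identifying the response; the statistics the function actually reports may differ.

```r
# Toy token table: three responses of different lengths.
tokens <- data.frame(
  doc_id = c("1", "1", "1", "2", "3", "3"),
  token  = c("se", "on", "ilkeaa", "lyominen", "toisen", "haukkumista")
)

# Number of words in each response.
words_per_resp <- vapply(split(tokens$token, tokens$doc_id), length, integer(1))

# Distribution of response lengths.
summary(words_per_resp)
```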

Find and Plot Top N-grams

Description

Creates a plot of the most frequently-occurring n-grams within the data. Optionally, weights can be provided either through a 'weight' column in the formatted data, or from a 'svydesign' object with the raw (preformatted) data.

Usage

fst_ngrams(
  data,
  number = 10,
  ngrams = 1,
  norm = NULL,
  pos_filter = NULL,
  strict = TRUE,
  name = NULL,
  use_svydesign_weights = FALSE,
  id = "",
  svydesign = NULL,
  use_column_weights = FALSE
)

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.
`number`	The number of top n-grams to return, default is '10'.
`ngrams`	The type of n-grams, default is '1'.
`norm`	The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses), '"number_resp"' (the number of responses), or 'NULL' (raw count returned, default).
`pos_filter`	List of UPOS tags for inclusion, default is 'NULL' which means all word types included.
`strict`	Whether to strictly cut-off at 'number' (ties are alphabetically ordered), default is 'TRUE'.
`name`	An optional "name" for the plot to add to title, default is 'NULL'.
`use_svydesign_weights`	Option to weight n-grams in the plot using weights from a 'svydesign' containing the raw data, default is 'FALSE'.
`id`	ID column from raw data, required if 'use_svydesign_weights = TRUE' and must match the 'docid' in formatted 'data'.
`svydesign`	A 'svydesign' which contains the raw data and weights, required if 'use_svydesign_weights = TRUE'.
`use_column_weights`	Option to weight n-grams in the plot using weights from formatted data which includes an additional 'weight' column, default is 'FALSE'.

Value

Plot of top n-grams.

Examples

fst_ngrams(fst_child, 12, ngrams = 2, strict = FALSE, name = "All")
c <- fst_child_2
s <- survey::svydesign(id=~1, weights= ~paino, data = child)
i <- 'fsd_id'
T <- TRUE
fst_ngrams(c, ngrams = 3, use_svydesign_weights = T, svydesign = s, id = i)
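
The 'ngrams' argument selects the length of the word sequences that are counted (1 for single words, 2 for bigrams, and so on). The base-R sketch below shows how such sequences can be built from the words of one response. 'make_ngrams' is a hypothetical helper for illustration and is not how the package necessarily constructs its n-grams.

```r
# Hypothetical helper: build n-grams from the tokens of one response.
make_ngrams <- function(tokens, n = 2) {
  if (length(tokens) < n) {
    return(character(0))   # response too short to contain an n-gram
  }
  starts <- seq_len(length(tokens) - n + 1)
  # Paste each window of n consecutive tokens into one string.
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

resp <- c("kiusaaminen", "on", "toisen", "haukkumista")
make_ngrams(resp, n = 2)
make_ngrams(resp, n = 3)
```

A response of four words yields three bigrams and two trigrams, so longer n-grams become sparse quickly in short survey answers.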

Compare and plot top n-grams

Description

Find top and unique top n-grams for different groups of participants. Data is split based on different values in the 'field' column of formatted data. Results will be shown within the plots pane.

Usage

fst_ngrams_compare(
  data,
  field,
  number = 10,
  ngrams = 1,
  norm = NULL,
  pos_filter = NULL,
  strict = TRUE,
  use_svydesign_weights = FALSE,
  use_svydesign_field = FALSE,
  id = "",
  svydesign = NULL,
  use_column_weights = FALSE,
  exclude_nulls = FALSE,
  rename_nulls = "null_data",
  unique_colour = "indianred",
  title_size = 20,
  subtitle_size = 15
)

Arguments

`data`	A dataframe of text in CoNLL-U format with additional 'field' column for splitting data.
`field`	Column in 'data' used for splitting groups.
`number`	The number of n-grams to return, default is '10'.
`ngrams`	The type of n-grams to return, default is '1'.
`norm`	The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses), '"number_resp"' (the number of responses), or 'NULL' (raw count returned, default, also used when weights are applied).
`pos_filter`	List of UPOS tags for inclusion, default is 'NULL' which means all word types included.
`strict`	Whether to strictly cut-off at 'number' (ties are alphabetically ordered), default is 'TRUE'.
`use_svydesign_weights`	Option to weight n-grams in the plots using weights from a 'svydesign' object containing the raw data, default is 'FALSE'.
`use_svydesign_field`	Option to get 'field' for splitting the data from the 'svydesign' object, default is 'FALSE'.
`id`	ID column from raw data, required if 'use_svydesign_weights = TRUE' and must match the 'docid' in formatted 'data'.
`svydesign`	A 'svydesign' object which contains the raw data and weights.
`use_column_weights`	Option to weight n-grams in the plots using weights from formatted data which includes an additional 'weight' column, default is 'FALSE'.
`exclude_nulls`	Whether to exclude NULLs in 'field' column, default is 'FALSE'.
`rename_nulls`	What to fill NULL values with if 'exclude_nulls = FALSE', default is '"null_data"'.
`unique_colour`	Colour to display unique n-grams, default is '"indianred"'.
`title_size`	Size at which to display the plot title, default is '20'.
`subtitle_size`	Size at which to display the title of each individual top n-grams plot, default is '15'.

Value

Plots of top n-grams in the plots pane with unique n-grams highlighted.

Examples

c <- fst_child
g <- 'gender'
fst_ngrams_compare(c, g, ngrams = 4, number = 10, norm = "number_resp")
fst_ngrams_compare(c, g, ngrams = 2, number = 10, norm = NULL)
s <- survey::svydesign(id=~1, weights= ~paino, data = child)
c2 <- fst_child_2
fst_ngrams_compare(c2, g, 10, 3, NULL, NULL, TRUE, TRUE, TRUE, 'fsd_id', s)
fst_ngrams_compare(c, g, 10, 2, use_column_weights = TRUE, strict = TRUE)

Plot comparison n-grams

Description

Plots frequency n-grams with unique n-grams highlighted.

Usage

fst_ngrams_compare_plot(
  table,
  number = 10,
  ngrams = 1,
  unique_colour = "indianred",
  name = NULL,
  override_title = NULL,
  title_size = 20
)

Arguments

`table`	The table of n-grams with unique n-grams flagged, output of 'fst_join_unique()'.
`number`	The number of n-grams, default is '10'.
`ngrams`	The type of n-grams, default is '1'.
`unique_colour`	Colour to display unique words, default is '"indianred"'.
`name`	An optional "name" for the plot, default is 'NULL'.
`override_title`	An optional title to override the automatic one for the plot. Default is 'NULL'. If 'NULL', title of plot will be 'number' "Most Common 'Term'". 'Term' is "Words", "Bigrams", or "N-Grams" where N > 2.
`title_size`	Size at which to display the plot title, default is '20'.

Value

Plot of top n-grams with unique terms highlighted.

Examples

top_child <- fst_freq_table(fst_child)
top_dev <- fst_freq_table(fst_dev_coop)
unique_words <- fst_get_unique_ngrams_separate(top_child, top_dev)
top_child_u <- fst_join_unique(top_child, unique_words)
top_dev_u <- fst_join_unique(top_dev, unique_words)
fst_ngrams_compare_plot(top_child_u, ngrams = 1, name = "Child")
fst_ngrams_compare_plot(top_dev_u, ngrams = 1, name = "Dev", title_size = 10)

Make N-grams plot

Description

Plots frequency n-grams.

Usage

fst_ngrams_plot(table, number = NULL, ngrams = 1, name = NULL)

Arguments

`table`	Output of 'fst_freq_table()' or 'fst_ngrams_table()'.
`number`	Optional number of n-grams for title, default is 'NULL'.
`ngrams`	The type of n-grams, default is '1'.
`name`	An optional "name" for the plot to add to title, default is 'NULL'.

Value

Plot of top n-grams.

Examples

top_bigrams <- fst_ngrams_table(fst_child, ngrams = 2, number = 15)
fst_ngrams_plot(top_bigrams, ngrams = 2, number = 15, name = "Children")

Make Top N-grams Table

Description

Creates a table of the most frequently-occurring n-grams within the data. Optionally, weights can be provided either through a 'weight' column in the formatted data, or from a 'svydesign' object with the raw (preformatted) data.

Usage

fst_ngrams_table(
  data,
  number = 10,
  ngrams = 1,
  norm = NULL,
  pos_filter = NULL,
  strict = TRUE,
  use_svydesign_weights = FALSE,
  id = "",
  svydesign = NULL,
  use_column_weights = FALSE
)

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.
`number`	The number of n-grams to return, default is '10'.
`ngrams`	The type of n-grams to return, default is '1'.
`norm`	The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses), '"number_resp"' (the number of responses), or 'NULL' (raw count returned, default, also used when weights are applied).
`pos_filter`	List of UPOS tags for inclusion, default is 'NULL' which means all word types included.
`strict`	Whether to strictly cut-off at 'number' (ties are alphabetically ordered), default is 'TRUE'.
`use_svydesign_weights`	Option to weight n-grams in the table using weights from a 'svydesign' containing the raw data, default is 'FALSE'.
`id`	ID column from raw data, required if 'use_svydesign_weights = TRUE' and must match the 'docid' in formatted 'data'.
`svydesign`	A 'svydesign' which contains the raw data and weights, required if 'use_svydesign_weights = TRUE'.
`use_column_weights`	Option to weight n-grams in the table using weights from formatted data which includes an additional 'weight' column, default is 'FALSE'.

Value

A table of the most frequently occurring n-grams in the data.

Examples

pf <- c("NOUN", "VERB", "ADJ", "ADV")
pf2 <- "NOUN, VERB, ADJ, ADV"
fst_ngrams_table(fst_child, norm = NULL)
fst_ngrams_table(fst_child, ngrams = 2, norm = "number_resp")
fst_ngrams_table(fst_child, ngrams = 2, pos_filter = pf)
fst_ngrams_table(fst_child, ngrams = 2, pos_filter = pf2)
c2 <- fst_child_2
s <- survey::svydesign(id=~1, weights= ~paino, data = child)
i <- 'fsd_id'
fst_ngrams_table(c2, use_svydesign_weights = TRUE, svydesign = s, id = i)
fst_ngrams_table(fst_child, use_column_weights = TRUE, ngrams = 3)
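
The effect of 'strict' is easiest to see when several n-grams tie at the cut-off. The sketch below orders a toy table by frequency with ties broken alphabetically, then shows a strict cut at 'number' next to a cut that keeps every n-gram tied with the last place. That 'strict = FALSE' keeps all ties is an assumption about the intended behaviour, and the column names are made up.

```r
# Toy frequency table with a three-way tie at frequency 5.
freq <- data.frame(
  words      = c("perhe", "kaveri", "koulu", "raha"),
  occurrence = c(5, 7, 5, 5)
)

# Order by descending frequency, breaking ties alphabetically.
freq <- freq[order(-freq$occurrence, freq$words), ]

number <- 2
# strict = TRUE: exactly 'number' rows are kept.
strict_top <- head(freq, number)
# strict = FALSE (assumed): rows tied with the last kept row are kept too.
loose_top <- freq[freq$occurrence >= freq$occurrence[number], ]

strict_top$words
loose_top$words
```

With a strict cut, which of the tied words survive depends only on alphabetical order, so a non-strict cut can be the safer choice when comparing groups.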

Make Top N-grams Table 2

Description

Usage

fst_ngrams_table2(
  data,
  number = 10,
  ngrams = 1,
  norm = NULL,
  pos_filter = NULL,
  strict = TRUE,
  use_svydesign_weights = FALSE,
  id = "",
  svydesign = NULL,
  use_column_weights = FALSE
)

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.
`number`	The number of n-grams to return, default is '10'.
`ngrams`	The type of n-grams to return, default is '1'.
`norm`	The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses), '"number_resp"' (the number of responses), or 'NULL' (raw count returned, default).
`pos_filter`	List of UPOS tags for inclusion, default is 'NULL' which means all word types included.
`strict`	Whether to strictly cut-off at 'number' (ties are alphabetically ordered), default is 'TRUE'.
`use_svydesign_weights`	Option to weight n-grams in the table using weights from a 'svydesign' containing the raw data, default is 'FALSE'.
`id`	ID column from raw data, required if 'use_svydesign_weights = TRUE' and must match the 'docid' in formatted 'data'.
`svydesign`	A 'svydesign' which contains the raw data and weights, required if 'use_svydesign_weights = TRUE'.
`use_column_weights`	Option to weight n-grams in the table using weights from formatted data which includes an additional 'weight' column, default is 'FALSE'.

Value

A table of the most frequently occurring n-grams in the data.

Examples

fst_ngrams_table2(fst_child, norm = NULL)
fst_ngrams_table2(fst_child, ngrams = 2, norm = "number_resp")
c <- fst_child_2
s <- survey::svydesign(id=~1, weights= ~paino, data = child)
i <- 'fsd_id'
T <- TRUE
fst_ngrams_table2(c, 10, 2, use_svydesign_weights = T, svydesign = s, id = i)

Make POS Summary Table

Description

Creates a summary table for the input CoNLL-U data which counts the number of words of each part-of-speech tag within the data.

Usage

fst_pos(data)

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.

Value

A dataframe with a count and proportion of each UPOS tag in the data and the full name of the tag.

Examples

fst_pos(fst_child)
fst_pos(fst_dev_coop)
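
A count and proportion per UPOS tag can be reproduced on toy data with 'table()'. This is only a sketch of the kind of summary described above; it assumes a 'upos' column as produced by udpipe and omits the full tag names that the function also returns.

```r
# Toy annotated tokens with their universal part-of-speech tags.
tokens <- data.frame(
  token = c("kiusaaminen", "on", "ilkeaa", "lyominen", "on", "vaarin"),
  upos  = c("NOUN", "AUX", "ADJ", "NOUN", "AUX", "ADV")
)

# Count each tag, then express the counts as shares of all tokens.
counts <- table(tokens$upos)
pos_summary <- data.frame(
  upos       = names(counts),
  count      = as.vector(counts),
  proportion = as.vector(counts) / sum(counts)
)
pos_summary
```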

Compare parts-of-speech

Description

Count each POS type for different groups of participants. Data is split based on different values in the 'field' column of formatted data. Results will be shown within the plots pane.

Usage

fst_pos_compare(data, field, exclude_nulls = FALSE, rename_nulls = "null_data")

Arguments

`data`	A dataframe of text in CoNLL-U format with additional 'field' column for splitting data.
`field`	Column in 'data' used for splitting groups.
`exclude_nulls`	Whether to exclude NULLs in 'field' column, default is 'FALSE'.
`rename_nulls`	What to fill NULL values with if 'exclude_nulls = FALSE'.

Value

Table of POS tag counts for the groups.

Examples

fst_pos_compare(fst_child, 'gender')
fst_pos_compare(fst_dev_coop, 'region')
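
The interplay of 'exclude_nulls' and 'rename_nulls' can be sketched as follows. The example treats missing values ('NA') in the splitting column as the "NULLs" referred to above, which is an assumption. 'split_by_field' is a hypothetical helper for illustration, not a package function.

```r
# Toy formatted data with a missing value in the splitting column.
tokens <- data.frame(
  token  = c("kaveri", "perhe", "koulu", "raha"),
  gender = c("Girl", NA, "Boy", "Girl")
)

# Hypothetical helper: split data on 'field', handling missing groups.
split_by_field <- function(data, field, exclude_nulls = FALSE,
                           rename_nulls = "null_data") {
  is_null <- is.na(data[[field]])
  if (exclude_nulls) {
    data <- data[!is_null, , drop = FALSE]   # drop rows without a group
  } else {
    data[[field]][is_null] <- rename_nulls   # keep them as their own group
  }
  split(data, data[[field]])
}

names(split_by_field(tokens, "gender"))
names(split_by_field(tokens, "gender", exclude_nulls = TRUE))
```

With the defaults, responses lacking a value form a separate group labelled "null_data", so no responses are silently lost from the comparison.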

Read In and format survey text responses

Description

Creates a dataframe in CoNLL-U format from a dataframe containing text, using the [udpipe] package and a language model, plus any additional columns that are included such as 'weights' or columns added through 'add_cols'. Stopwords and punctuation are removed unless the 'stopword_list' argument is "none".

Usage

fst_prepare(
  data,
  question,
  id,
  model = "ftb",
  stopword_list = "nltk",
  language = "fi",
  weights = NULL,
  add_cols = NULL,
  manual = FALSE,
  manual_list = ""
)

Arguments

`data`	A dataframe of survey responses which contains an open-ended question.
`question`	The column in the dataframe which contains the open-ended question.
`id`	The column in the dataframe which contains the ids for the responses.
`model`	A language model available for [udpipe]. '"ftb"' (default) or '"tdt"' are recognised as shorthand for "finnish-ftb" and "finnish-tdt". The full list is available in the [udpipe] documentation or via 'fst_print_available_models()'.
`stopword_list`	A valid stopword list (known as 'source' in 'stopwords::stopwords'), default is '"nltk"'. Use '"manual"' to indicate that a manual list will be provided, or '"none"' if you don't want to remove stopwords.
`language`	Two-letter ISO code for the language of the stopword list, default is '"fi"'.
`weights`	Optional, the column of the dataframe which contains the respective weights for each response.
`add_cols`	Optional, a column (or columns) from the dataframe which contains other information you'd like to retain (for instance, dimension columns for splitting the data for comparison plots).
`manual`	An optional boolean to indicate that a manual list will be provided; 'stopword_list = "manual"' can be used as well or instead.
`manual_list`	A manual list of stopwords.
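
The shorthand behaviour described for 'model' can be pictured with a small sketch. The helper 'resolve_model()' is hypothetical and is not part of finnsurveytext; it only shows the documented mapping of '"ftb"' and '"tdt"' to full model names, with any other value passed through unchanged.

```r
# Hypothetical helper (not a package function): expand the documented
# shorthand values and leave any other model name as given.
resolve_model <- function(model) {
  shorthand <- c(ftb = "finnish-ftb", tdt = "finnish-tdt")
  if (model %in% names(shorthand)) {
    unname(shorthand[model])
  } else {
    model
  }
}

resolve_model("ftb")
resolve_model("swedish-lines")
```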

Details

'fst_prepare()' produces a dataframe containing survey text responses in CoNLL-U format with stopwords optionally removed.

Value

A dataframe of text in CoNLL-U format.

Examples

## Not run: 
i <- "fsd_id"
cb <- child
dev <- dev_coop
fst_prepare(data = cb, question = "q7", id = 'fsd_id', weights = 'paino')
fst_prepare(data = dev, question = "q11_2", id = i, add_cols = c('gender'))
fst_prepare(data = dev, question = "q11_3", id = i, add_cols = 'gender')
fst_prepare(data = child, question = "q7", id = i, model = 'swedish-lines')
unlink("finnish-ftb-ud-2.5-191206.udpipe")
unlink("finnish-tdt-ud-2.5-191206.udpipe")
unlink("swedish-lines-ud-2.5-191206.udpipe")

## End(Not run)

Read In and format survey text responses from 'svydesign' object

Description

Creates a dataframe in CoNLL-U format from a 'svydesign' object including text, using the [udpipe] package and a language model. Weights are added if these are included in the 'svydesign' object, along with any columns added through 'add_cols'. Stopwords and punctuation are removed if the 'stopword_list' argument is not "none".

Usage

fst_prepare_svydesign(
  svydesign,
  question,
  id,
  model = "ftb",
  stopword_list = "nltk",
  language = "fi",
  use_weights = TRUE,
  add_cols = NULL,
  manual = FALSE,
  manual_list = ""
)

Arguments

`svydesign`	A 'svydesign' object which contains an open-ended question.
`question`	The column in the dataframe which contains the open-ended question.
`id`	The column in the dataframe which contains the ids for the responses.
`model`	A language model available for [udpipe]. '"ftb"' (default) or '"tdt"' are recognised as shorthand for "finnish-ftb" and "finnish-tdt". The full list is available in the [udpipe] documentation or via 'fst_print_available_models()'.
`stopword_list`	A valid stopword list, default is '"nltk"', or '"none"'.
`language`	Two-letter ISO code of the language for the stopword list.
`use_weights`	Optional, whether to use weights within the 'svydesign'
`add_cols`	Optional, a column (or columns) from the dataframe which contain other information you'd like to retain (for instance, dimension columns for splitting the data for comparison plots).
`manual`	An optional boolean to indicate that a manual list will be provided. 'stopword_list = "manual"' can be used as well or instead.
`manual_list`	A manual list of stopwords.

Details

'fst_prepare_svydesign()' produces a dataframe containing survey text responses in CoNLL-U format with stopwords optionally removed.

Value

A dataframe of text in CoNLL-U format.

Examples

## Not run: 
i <- "fsd_id"
svy_child <- survey::svydesign(id=~1, weights= ~paino, data = child)
fst_prepare_svydesign(svy_child, question = "q7", id = i, use_weights = TRUE)

svy_d <- survey::svydesign(id = ~1, weights = ~paino, data =dev_coop)
fst_prepare_svydesign(svy_d, question = "q11_2", id = i, add_cols = 'gender')

fst_prepare_svydesign(svy_d, 'q11_2', i, 'finnish-ftb', 'nltk', 'fi')
unlink("finnish-ftb-ud-2.5-191206.udpipe")
unlink("finnish-tdt-ud-2.5-191206.udpipe")

## End(Not run)

Find treebanks available for use

Description

Find treebanks available for use

Usage

fst_print_available_models(search = NULL)

Arguments

`search`	An optional string for filtering the list, name of language in English, e.g. 'estonian'.

Value

List of available treebanks, filtered by 'search' if it is provided.

Examples

fst_print_available_models()
fst_print_available_models(search = "swedish")

Remove stopwords and punctuation from CoNLL-U dataframe

Description

Removes stopwords and punctuation from a dataframe containing survey text data which is already in CoNLL-U format.

Usage

fst_rm_stop_punct(
  data,
  stopword_list = "nltk",
  language = "fi",
  manual = FALSE,
  manual_list = ""
)

Arguments

`data`	A dataframe of text in CoNLL-U format.
`stopword_list`	A valid stopword list, known as 'source' in 'stopwords::stopwords'. Default is '"nltk"'. Use '"manual"' to indicate that a manual list will be provided, or '"none"' if you don't want to remove stopwords.
`language`	Two-letter ISO code of the language for the stopword list.
`manual`	An optional boolean to indicate that a manual list will be provided. 'stopword_list = "manual"' can be used as well or instead.
`manual_list`	A manual list of stopwords.

Value

A dataframe of text in CoNLL-U format without stopwords and punctuation.

Examples

## Not run: 
c <- fst_format(child, question = 'q7', id = 'fsd_id')
fst_rm_stop_punct(c)
fst_rm_stop_punct(c, stopword_list = "snowball")
fst_rm_stop_punct(c, "stopwords-iso")

mlist <- c('en', 'et', 'ei', 'emme', 'ette', 'eivät', 'minä', 'minun')
mlist2 <- "en, et, ei, emme, ette, eivät, minä, minun"
fst_rm_stop_punct(c, manual = TRUE, manual_list = mlist)
fst_rm_stop_punct(c, stopword_list = "manual", manual_list = mlist)
unlink("finnish-ftb-ud-2.5-191206.udpipe")

## End(Not run)
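
The filtering idea behind this function can be sketched in base R on toy data. This is an illustration under stated assumptions, not the package's code: the toy dataframe is hypothetical, the match is made on the 'lemma' column, and the comma-separated string is split by hand with 'strsplit()'.

```r
# Toy CoNLL-U-style data (hypothetical): one row per token.
toy <- data.frame(
  doc_id = c("1", "1", "1", "1"),
  token  = c("en", "halua", "riitaa", "."),
  lemma  = c("ei", "haluta", "riita", "."),
  upos   = c("AUX", "VERB", "NOUN", "PUNCT"),
  stringsAsFactors = FALSE
)

# A manual stopword list given as one comma-separated string,
# converted to a character vector.
manual_string <- "en, et, ei, emme, ette"
manual_vec <- trimws(strsplit(manual_string, ",")[[1]])

# Keep rows whose lemma is not a stopword and which are not punctuation.
# Matching on 'lemma' is an assumption made for this sketch.
keep <- !(toy$lemma %in% manual_vec) & toy$upos != "PUNCT"
cleaned <- toy[keep, ]
print(cleaned)
```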

Make Summary Table

Description

Creates a summary table for the input CoNLL-U data which provides the response count and proportion, total number of words, the number of unique words, and the number of unique lemmas.

Usage

fst_summarise(data, desc = "All responses")

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.
`desc`	A string describing responses in table, default is '"All responses"'.

Value

A dataframe with summary information for the data including response rate and word counts.

Examples

fst_summarise(fst_child)
fst_summarise(fst_dev_coop, "Q11_3")
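
To make the counts in the summary table concrete, the following base R sketch computes comparable figures on toy data. The dataframe and the output column names are hypothetical and need not match what 'fst_summarise()' returns; the response proportion is omitted because it depends on how empty responses are recorded.

```r
# Toy CoNLL-U-style data (hypothetical): two responses, five tokens.
toy <- data.frame(
  doc_id = c("1", "1", "2", "2", "2"),
  token  = c("koira", "juoksee", "koirat", "juoksevat", "nopeasti"),
  lemma  = c("koira", "juosta", "koira", "juosta", "nopeasti"),
  stringsAsFactors = FALSE
)

# One-row summary; the column names here are illustrative only.
summary_row <- data.frame(
  Description   = "All responses",
  Responses     = length(unique(toy$doc_id)),
  Total_Words   = nrow(toy),
  Unique_Words  = length(unique(toy$token)),
  Unique_Lemmas = length(unique(toy$lemma))
)
print(summary_row)
```

Unique lemmas are fewer than unique words here because inflected forms such as 'koira' and 'koirat' share one lemma.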

Make comparison summary

Description

Compare text responses for different groups of participants. Data is split based on different values in the 'field' column of formatted data. Results will be shown within the plots pane.

Usage

fst_summarise_compare(
  data,
  field,
  exclude_nulls = FALSE,
  rename_nulls = "null_data"
)

Arguments

`data`	A dataframe of text in CoNLL-U format with additional 'field' column for splitting data.
`field`	Column in 'data' used for splitting groups
`exclude_nulls`	Whether to exclude NULLs in the 'field' column, default is 'FALSE'.
`rename_nulls`	What to fill NULL values with if 'exclude_nulls = FALSE'.

Value

Summary table of responses between groups.

Examples

fst_summarise_compare(fst_child, 'gender')
fst_summarise_compare(fst_dev_coop, 'gender')
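
The split-then-summarise idea can be sketched in base R as follows. This is a rough illustration, not the package's implementation: the toy data and output column names are hypothetical, and the 'exclude_nulls' flag is a local variable that imitates the documented argument.

```r
# Toy CoNLL-U-style data (hypothetical) with a 'gender' grouping column.
toy <- data.frame(
  doc_id = c("1", "1", "2", "3", "3"),
  lemma  = c("koira", "juosta", "kissa", "koira", "lintu"),
  gender = c("F", "F", "M", NA, NA),
  stringsAsFactors = FALSE
)

# Either drop rows with a missing group or give them a placeholder name.
exclude_nulls <- TRUE
if (exclude_nulls) {
  toy <- toy[!is.na(toy$gender), ]
} else {
  toy$gender[is.na(toy$gender)] <- "null_data"
}

# Summarise each group separately and stack the rows.
groups <- split(toy, toy$gender)
comparison <- do.call(rbind, lapply(names(groups), function(g) {
  d <- groups[[g]]
  data.frame(
    Group         = g,
    Responses     = length(unique(d$doc_id)),
    Total_Words   = nrow(d),
    Unique_Lemmas = length(unique(d$lemma))
  )
}))
print(comparison)
```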

Make Simple Summary Table

Description

Creates a summary table for the input CoNLL-U data which provides the total number of words, the number of unique words, and the number of unique lemmas.

Usage

fst_summarise_short(data)

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.

Value

A dataframe with summary information on word counts for the data.

Examples

fst_summarise_short(fst_child)
fst_summarise_short(fst_dev_coop)

Add 'svydesign' weights to CoNLL-U data

Description

This function takes data in CoNLL-U format and a 'svydesign' object (from the 'survey' package) that contains weights, and merges the weights and any additional columns into the formatted data.

Usage

fst_use_svydesign(data, svydesign, id, add_cols = NULL, add_weights = TRUE)

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.
`svydesign`	A 'svydesign' object containing the raw data which produced the 'data'
`id`	ID column from raw data, must match the 'docid' in formatted 'data'
`add_cols`	Optional, a column (or columns) from the dataframe which contain other information you'd need (for instance, covariate column for splitting the data for comparison plots).
`add_weights`	Optional, a boolean for whether to add weights from svydesign object, default is 'TRUE'.

Value

A dataframe of text in CoNLL-U format plus a 'weight' column and optional other columns.

Examples

svy_child <- survey::svydesign(id=~1, weights= ~paino, data = child)
fst_use_svydesign(data = fst_child_2, svydesign = svy_child, id = 'fsd_id')

svy_dev <- survey::svydesign(id = ~1, weights = ~paino, data = dev_coop)
fst_use_svydesign(data = fst_dev_coop_2, svydesign = svy_dev, id = 'fsd_id')
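
The merge that this function performs can be pictured with a base R sketch. It is an illustration only: plain dataframes stand in for the 'svydesign' object so that the example runs without the 'survey' package, and the document identifier column is assumed to be named 'doc_id'.

```r
# Raw survey data (hypothetical): one row per respondent with a weight.
raw <- data.frame(
  fsd_id = c(1, 2, 3),
  paino  = c(0.8, 1.2, 1.0)
)

# Formatted data (hypothetical): one row per token.
formatted <- data.frame(
  doc_id = c("1", "1", "2", "3"),
  lemma  = c("koira", "juosta", "kissa", "lintu"),
  stringsAsFactors = FALSE
)

# Build a lookup of weights keyed by the same id type as the formatted data.
weights <- data.frame(
  doc_id = as.character(raw$fsd_id),
  weight = raw$paino,
  stringsAsFactors = FALSE
)

# Left join: every token row receives the weight of its response.
merged <- merge(formatted, weights, by = "doc_id", all.x = TRUE)
print(merged)
```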

Make Wordcloud

Description

Creates a wordcloud from CoNLL-U data of frequently-occurring words. Optionally, weights can be provided either through a 'weight' column in the formatted data, or from a 'svydesign' object with the raw (preformatted) data.

Usage

fst_wordcloud(
  data,
  pos_filter = NULL,
  max = 100,
  use_svydesign_weights = FALSE,
  id = "",
  svydesign = NULL,
  use_column_weights = FALSE
)

Arguments

`data`	A dataframe of text in CoNLL-U format, with optional additional columns.
`pos_filter`	List of UPOS tags for inclusion, default is 'NULL' which means all word types included.
`max`	The maximum number of words to display, default is '100'.
`use_svydesign_weights`	Option to weight words in the wordcloud using weights from a 'svydesign' containing the raw data, default is 'FALSE'
`id`	ID column from raw data, required if 'use_svydesign_weights = TRUE' and must match the 'docid' in formatted 'data'.
`svydesign`	A 'svydesign' which contains the raw data and weights, required if 'use_svydesign_weights = TRUE'.
`use_column_weights`	Option to weight words in the wordcloud using weights from formatted data which includes an additional 'weight' column, default is 'FALSE'.

Value

A wordcloud from the data.

Examples

fst_wordcloud(fst_child)
fst_wordcloud(fst_child, pos_filter = c("NOUN", "VERB", "ADJ", "ADV"))
fst_wordcloud(fst_child, pos_filter = 'NOUN, VERB, ADJ')
fst_wordcloud(fst_child, use_column_weights = TRUE)
i <- 'fsd_id'
c <- fst_child_2
s <- survey::svydesign(id=~1, weights= ~paino, data = child)
fst_wordcloud(c, use_svydesign_weights = TRUE, id = i, svydesign = s)
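
How weighting can change which words dominate a wordcloud is sketched below in base R. This is not the package's implementation: the toy data is hypothetical, and summing the 'weight' column per lemma is an assumption about what weighting means here. 'max_words' plays the role of the 'max' argument.

```r
# Toy formatted data (hypothetical) with a per-token 'weight' column.
toy <- data.frame(
  lemma  = c("koira", "kissa", "koira", "lintu"),
  weight = c(0.5, 2.0, 0.5, 1.5),
  stringsAsFactors = FALSE
)

# Unweighted: each occurrence counts once.
unweighted <- sort(table(toy$lemma), decreasing = TRUE)

# Weighted: each occurrence counts with its response weight.
weighted <- sort(tapply(toy$weight, toy$lemma, sum), decreasing = TRUE)

# Keep only the most frequent words.
max_words <- 2
top_weighted <- head(weighted, max_words)
print(unweighted)
print(top_weighted)
```

In this toy data 'koira' is the most frequent word by raw count, but 'kissa' ranks first once weights are applied.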

Run Shiny App Demo

Description

Run Shiny App Demo

Usage

runDemo()

Value

Launches the Shiny demo app.

Examples

## Not run: 
  runDemo()

## End(Not run)
