Frontiers in Psychology (Front. Psychol.), ISSN 1664-1078, Frontiers Media S.A.
doi: 10.3389/fpsyg.2022.937058
Psychology, Original Research

On the Design and Validation of Assessing Tools for Measuring the Impact of Programs Promoting STEM Vocations

María Pilar Herce-Palomares 1*, Carmen Botella-Mascarell 2, Esther de Ves 2, Emilia López-Iñesta 3*, Anabel Forte 4, Xaro Benavent 2, Silvia Rueda 2

1 International PhD School (EIDUNED), National Distance Education University (UNED), Madrid, Spain
2 Department of Computer Science, Universitat de València, Burjassot, Spain
3 Department of Didactics of Mathematics, Universitat de València, València, Spain
4 Department of Statistics and Operational Research, Universitat de València, Burjassot, Spain

Edited by: Sergi Fàbregues, Open University of Catalonia, Spain

Reviewed by: Elizabeth G. Creamer, Virginia Tech, United States; Lina Montuori, Universitat Politècnica de València, Spain

*Correspondence: María Pilar Herce-Palomares mherce2@alumno.uned.es Emilia López-Iñesta emilia.lopez@uv.es

This article was submitted to Gender, Sex and Sexualities, a section of the journal Frontiers in Psychology

Published: 27 June 2022. Volume 13, article 937058. Received: 05 May 2022. Accepted: 31 May 2022. Copyright © 2022 Herce-Palomares, Botella-Mascarell, de Ves, López-Iñesta, Forte, Benavent and Rueda.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

This paper presents the design and validation process of a set of instruments to evaluate the impact of an informal learning initiative to promote Science, Technology, Engineering, and Mathematics (STEM) vocations in students, their families (parents), and teachers. The proposed set of instruments, beyond assessing the satisfaction of the public involved, allows collecting data to evaluate the impact in terms of changes in the consideration of the role of women in STEM areas and STEM vocations. The procedure followed to develop the set of instruments consisted of two phases. In the first phase, a preliminary version (v1) of the questionnaires was designed based on the objectives of the Girls4STEM initiative, an inclusive project promoting STEM vocations among boys and girls aged 6 to 18. Five specific questionnaires were designed, one for the families (post activity), two for the students (pre and post activity) and two for the teachers (pre and post activity). A refined version (v2) of each questionnaire was obtained with evidence of content validity after undergoing an expert judgment process. The second phase was the refinement of the (v2) instruments, to ascertain the evidence of reliability and validity so that a final version (v3) was derived. In the paper, a high-quality set of good practices focused on promoting diversity and gender equality in the STEM sector is presented from the perspective of a Higher Education Institution, the University of Valencia. The main contribution of this work is the achievement of a set of instruments, rigorously designed for the evaluation of the implementation and effectiveness of a STEM promoting program, with sufficient validity evidence. Moreover, the proposed instruments can be a reference for the evaluation of other projects aimed at diversifying the STEM sector.

Keywords: diversity in STEM, gender stereotypes, informal education, self-efficacy, questionnaire validation, mixed methods

Funding: Fundación Española para la Ciencia y la Tecnología (10.13039/501100011100), Generalitat Valenciana (10.13039/501100003359), Ministerio de Ciencia e Innovación (10.13039/501100004837)


      1. Introduction

      In recent years, multiple initiatives have emerged, from public and private institutions, to promote interest in disciplines related to Science, Technology, Engineering and Mathematics (STEM), especially among girls from an early age. These initiatives play a fundamental role in showing the relationship that exists between careers and professions in STEM areas and the generation of benefits in society. In addition, they serve to increase the visibility of proximity STEM female referents (UNESCO, 2017), helping to eliminate gender stereotypes (Sáinz et al., 2019).

      The School of Engineering of the University of Valencia (ETSE-UV), in Spain, launched in 2011 a pilot program focused on increasing and retaining the number of Information and Communication Technology (ICT) female students in the institution (Botella et al., 2019). The results showed an increase in the proportion of female students in highly male-dominated ICT-related disciplines with a lower proportion of women in general (López-Iñesta et al., 2020). However, it was also observed that a degree such as Chemical Engineering, traditionally with a higher presence of women, showed a constant decrease in female enrollment. This suggested that a continuous effort was needed from educational institutions, public entities, professionals, and families to break the gender diversity gap in STEM (Sáinz and Müller, 2018; López-Iñesta et al., 2020).

      The problem of the gender diversity gap in STEM disciplines, and especially in the ICT field, has been considered and analyzed from different perspectives (see Bian et al., 2017; Diekman et al., 2017, 2019; Sáinz and Müller, 2018; Botella et al., 2019; Sáinz et al., 2019; Benavent et al., 2020; López-Iñesta et al., 2020; Ayuso et al., 2021; Gladstone and Cimpian, 2021; Guenaga et al., 2022 and references therein). From these works, aspects such as the influence of gender stereotypes, the effectiveness of using role models, the concept of self-efficacy in STEM or understanding the impact of communal goal processes arise as fundamental factors to be covered by initiatives or programs focusing on pre-university students and aiming at diversifying STEM. There is a second pool of factors related to STEM working environments (i.e., perception of male-dominated environments, lack of work-life balance) which cannot be directly impacted by these types of initiatives. Instead, a broad agreement between different social and economic actors should be sought.

      In 2019, the Girls4STEM initiative was launched in the ETSE-UV as an evolution of the pilot program. The main feature of the project is that the target audience comprises pre-university students from 6 to 18 years old, as well as their families and teachers (Benavent et al., 2020). It is a project for both boys and girls, with an emphasis on girls, which is framed within the Sustainable Development Goals (SDGs) of the United Nations and is also aligned with the III Equality Plan of the University of Valencia (López-Iñesta et al., 2020). The specific objectives of the Girls4STEM initiative are: i) To awaken curiosity about STEM disciplines from an early age; ii) To encourage the participation of students, teachers, families, and companies as a fundamental part of the project; iii) To give visibility to women developing their professional work in STEM areas and show their research, developments and progress; and iv) To increase the number of students in STEM studies through outreach activities such as seminars, workshops or interviews with leading women in STEM. The initiative is arranged around two main activities, Girls4STEM Family, focused on pre-university students, their families and teachers, and Girls4STEM Professional, targeting a general audience. A full description of the initiative can be found in Benavent et al. (2020). The initiative builds upon a large database of volunteer female STEM professionals, who interact with the students and teachers via the family action or with the general audience via the professional action. The female STEM professionals thus act as proximity role models, mitigating the impact of gender stereotypes, while the database helps to increase the visibility of their contributions to society, reinforcing the link with communal goal objectives (Botella-Mascarell et al., 2021). In the family action, students meet the STEM experts and create 3-min videos about them, which are later uploaded to the Girls4STEM YouTube channel. A contest is then arranged between the participating schools, in which the Girls4STEM initiative selects the videos that best reflect the aims of the project.

      The Girls4STEM initiative has been consolidated over two editions, with the 2021–2022 edition currently ongoing. At this point, it is essential to have instruments with sufficient evidence of validity to evaluate with scientific rigor the impact of the initiative, as indicated by Tena Gallego and Couso (2019), beyond the satisfaction of the public involved. With this aim, this paper presents the design and validation process followed to obtain a set of instruments to evaluate the impact of the Girls4STEM initiative in the family action. To this end, the role of formal and informal learning contexts in STEM education is reviewed next, and the focus is then placed on informal education initiatives.

      1.1. State of the Art

      STEM education takes place in both formal and informal contexts and both need to be connected to promote students' STEM skills. Interestingly, informal education can overcome many of the shortcomings of formal education (Herce Palomares et al., 2022). Activities promoted by different initiatives or entities such as universities, museums, science fairs or contests are examples of informal education scenarios in which the participation of students, teachers, families or citizens is promoted (López-Iñesta et al., 2022). The audience and researchers/professionals in different fields can establish a useful bidirectional communication for fostering interest in STEM areas. From this point of view, the Girls4STEM initiative can be classified as an informal education/learning action organized by a Higher Education Institution. Girls4STEM builds bridges with formal education, involving both teachers and students' families from a systemic, integral and holistic educational vision. Although the word “informal” may suggest a lack of rigor, it actually refers to the features of the learning environment. As pointed out in Allen and Peterman (2019), informal learning might contribute to achieving high levels of area-specific expertise for motivated students. In addition, research suggests that educational experiences to promote STEM expertise in informal education play a decisive role (Herce Palomares and Román González, 2021) and they also contribute to challenging common ideas and beliefs linked to STEM fields in formal education, as well as others related to scientific education (Benavent et al., 2020). In informal education, evaluation is one of the key components. While helping to identify whether aims and objectives have been met, it can also assist with planning, provide evidence of impact, and support critical reflection for future engagement activities. Therefore, evaluation is a process that should run from the start of a project and continue after it has finished (Robinson and Murray, 2019).

      Evaluating the impact in informal learning contexts poses a set of particular challenges (Habig, 2020). Firstly, evaluations should preserve the informal nature of science experiences, while defining appropriate evaluation metrics, using a common language, goals, and theories (National Research Council, 2009). Coupling these challenges with constraints on time, money, and operational capacity, the difficulty of obtaining meaningful, reliable and feasible evaluations becomes clear. The evaluation should then tackle these challenges to provide useful evidence-based information (Fu et al., 2016). Secondly, formal learning experiences are primarily intended to impart scientific knowledge and skills. However, informal learning experiences are intended to arouse curiosity, interest and encourage intrinsic motivation as “stepping stones” for STEM learning. This increases the difficulty of the evaluation process, since constructs such as interest, motivation and curiosity are more difficult to define, operationalize and measure (National Research Council, 2009). In this sense, evaluating the impact of educational interventions in informal STEM education requires the design of instruments that address the project objectives.

      Three future directions for the measurement of the outcomes of informal STEM education actions are suggested in Grack Nelson et al. (2019). First, the measurement capacity should be enhanced. Currently, there is a small number of online repositories, which also cover a limited range of activities and audiences. Second, stronger collaborative networks should be established. These types of networks would make it possible to achieve shared measures combining different expertise (measurement experts, educational researchers, STEM experts). Finally, it is mandatory to increase the accessibility of shared measures. There are barriers related to intellectual property rights or instruments not accessible due to journal publishing options.

      Another challenge related to the evaluation of the impact in informal STEM education is the broad range of projects and the large variety of methods used to conduct the evaluation. The most common form of evaluation is the user survey (Robinson and Murray, 2019). When designed well and interpreted appropriately, self-report surveys can be used to gather useful data from large samples at relatively low-cost (Wolf et al., 2021). Note that informal education initiatives are usually constrained by low budgets and hence, sustainable implementations should be sought. Therefore, in this work, the user survey technique via questionnaires is proposed to evaluate the impact of the Girls4STEM initiative in the family action, by designing and validating a set of questionnaires targeting pre-university students, their families and teachers.

      With the increasing development and use of shared measures across the STEM education field comes the need for evaluators to better understand and assess instruments' technical qualities, in particular reliability and validity (Grack Nelson et al., 2019). On the one hand, the design of the evaluation instruments must be based on the objectives of the project. Moreover, the questionnaires must undergo a validation process. Content validity evidence relates to how well the construct of interest is represented in the content of an instrument (Haynes et al., 1995; AERA, 2014). Such evidence can be collected by reviewing the literature and gathering feedback from experts related to the construct being measured. Experts review how the construct was defined, identify what is missing from the definition, and help to ensure that the essence of the items or tasks in the measure adequately covers the content area. On the other hand, evidence of the reliability of the questionnaires, after being administered to a pilot sample, is needed. Cronbach's alpha is commonly used to examine the internal consistency or reliability of summated rating scales (Cronbach, 1951; Cronbach and Shavelson, 2004; AERA, 2014), although there is an ongoing discussion regarding its limitations (Trizano-Hermosilla and Alvarado, 2016; Xiao and Hau, 2022). Internal consistency describes the extent to which all the items in a test measure the same concept (or construct) and hence it is connected to the inter-relatedness of the items within the test. In addition to obtaining the reliability of the scale items, it is necessary to evaluate how open-response items work in the pilot sample. In this way, it is possible to check whether the answers given in the questionnaires have the same meaning for the target audiences as for the researchers interpreting the data (Wolf et al., 2021). Figure 1 summarizes the main advantages and challenges faced by STEM informal learning contexts, as well as the main constructs to measure and some hints about the design of the instruments.

      Figure 1. Advantages and disadvantages faced in STEM informal education, main constructs to measure and some hints about the evaluation.

      1.2. The Present Study

      This study tackles good practices focused on promoting gender diversity in the STEM sector from a Higher Education Institution perspective. A high-quality example of a gender-based intervention study in informal STEM education is presented, with sufficient evidence of the validity of a set of rigorously designed instruments for the evaluation of the implementation and effectiveness of the project. In addition, these instruments can be a reference for the evaluation of other projects aimed at reducing the gender diversity gap in STEM areas. The process and the results presented in this paper contribute to the directions suggested by Grack Nelson et al. (2019), since the measurement capacity is increased, the questionnaires are accessible to other researchers and hence, there is potential to build a collaborative network. The main objective of this work is then to design and obtain evidence of reliability and validity of a set of instruments designed to evaluate the impact of the Girls4STEM initiative. This objective can be broken down into a set of specific objectives:

      1. To design a set of questionnaires to evaluate the impact of the Girls4STEM initiative (family action). Each questionnaire will be specific for a different audience group: pre-university students, their families and teachers.

      2. To obtain evidence of content validity of the set of questionnaires.

      3. To obtain evidence of reliability of the set of questionnaires after administration to a sample and to assess whether the answers in self-assessment questionnaires have the same meaning for the target audiences and the researchers who interpret the data.

      As discussed in the introduction, the gender diversity gap in STEM has already been considered from different perspectives. In Spain, the percentage of enrolled female students in the different STEM disciplines is not uniform. For example, in 2020–2021, female students accounted for 59.9% of enrollment in life sciences. In Engineering, the share of enrolled female students drops to 26.1%, and to 14.2% in Computer Science¹. There are several initiatives or projects located in Spain that work toward diversifying the STEM sector (Botella et al., 2020). Most of them can be classified as informal education actions, and they also face the evaluation challenges discussed above. Note that some of these initiatives are nodes of international projects. Some representative examples in Spain are, first of all, the Inspira STEAM Program, which is a mentoring program for students between the ages of 10 and 12 years. Results of the program showed an impact on the students' attitudes toward technology, an increase in the number of female STEM referents the students knew, and an improvement of the students' opinion regarding vocations and professions related to science and technology. Moreover, a larger impact was measured among girls (Guenaga et al., 2022). Secondly, the program by the Inspiring Girls Foundation focuses on pre-university girls aged 12–16, who interact with female role models working in STEM fields. González-Pérez et al. (2020) report a set of benefits on mathematics enjoyment, importance attached to math, expectations of success in math, and girls' aspirations in STEM, and a negative effect on gender stereotypes, among others. Thirdly, the project Science and Technology as Feminine aims at students in the 1st to 3rd years of compulsory secondary education (therefore aged 11–14 years). Results in Santos et al. (2021) show that it should be possible to reduce the gender gap in the future career choices of young students, through the design of a set of activities addressed to individual students, the students' families and peers, schools and society at large, aimed at changing the habits which for many years have steered women away from STEM. Despite the relevance and impact of the above STEM education initiatives, there is a lack of instruments with evidence of reliability and validity to assess the impact of the projects themselves, since they either make use of questionnaires to measure specific dimensions (i.e., gender stereotypes (Colás Bravo and Villaciervos Moreno, 2007), mathematical self-efficacy (Schwarzer and Baessler, 1996) and attitudes toward technology (Kier et al., 2014)) or questionnaires without a sufficient design and validation process. To the best of our knowledge, this paper contributes to the state of the art of informal STEM education by providing the description of the process and evidence of reliability and validity of a set of instruments that were designed to specifically assess Girls4STEM's objectives.

      The paper is organized as follows. Section 2 presents the two phases followed for the design and validation of the proposed set of instruments. Details about the samples used in each one of the phases are given and the data analysis approach followed is explained. The section finishes providing the results obtained in terms of content validity and reliability for the set of instruments. Finally, section 3 discusses the main findings of this research.

      2. Materials and Methods

      The present work uses a Mixed Methods Research (MMR) approach whereby both qualitative and quantitative data are collected and analyzed in the same study. MMR is often used in social and behavioral studies, such as education or health, to strengthen the reliability of qualitative data, making it possible to place quantitative results in context and enriching the findings and conclusions (Creswell and Clark, 2003; Onwuegbuzie and Johnson, 2006; Anguera et al., 2012; Fàbregues et al., 2019). In the specific context of this work, using mixed methods can both increase the validity and reliability of the data collected with the designed instruments and improve the evaluation procedure to measure the impact of the initiative (Shekhar et al., 2019; Griffiths et al., 2021; Hargraves et al., 2021). In this sense, the aim of the study is to design and validate a set of different instruments for measuring the impact on students, parents and teachers of a program promoting STEM vocations that can be used on a large scale by other researchers.

      The procedure consisted of two phases. First, in phase I, a preliminary version of the questionnaires was designed by the leading researcher based on the objectives of the Girls4STEM initiative, obtaining a first version (v1) of each one. Afterwards, 6 experts participating in the project and with experience in instrument construction and validation modified and/or polished the items of the different questionnaires through an expert judgment process to obtain evidence of content validity, deriving the version (v2). In the second phase, phase II, the version (v2) instruments were distributed to a pilot sample. Evidence of reliability was gathered and a final refinement process was carried out, from which the final version (v3) was obtained. All the questionnaires collected socio-demographic information and some indicators in a response format combining open-ended answers, multiple choice answers and Likert scale options (1 to 5). Figure 2 summarizes the steps followed during the process of design and validation of the instruments.

      Figure 2. Phase I and phase II stages, and questionnaire versions obtained in each one of them.

      2.1. Instrument: Design and Validation Process

      In this subsection, the two-phase process for obtaining the instruments is detailed. Note that there are a total of five questionnaires targeting different groups: parents (post-activity), students-pre (prior to activity), students-post (post-activity), teachers-pre (prior to activity) and teachers-post (post-activity). The first instrument is a questionnaire for families, administered once the participation in the project is finished. It includes indicators on the overall impact of the initiative and on the individual (family member). An indicator is also provided on the possible improvement of the project and the promotion of STEM within the family. Secondly, there are two questionnaires for students that are applied before and after participating in the project. The pre questionnaire collects indicators on STEM interests, their perception of STEM competence and performance in STEM subjects. The post collects indicators on the degree of participation, the impact and possible improvement of the project. The teachers' questionnaires are also arranged in pre and post. The pre includes indicators on motivation and expectations of the project. The post questionnaire asks about their participation degree, the project impact, and suggestions for improvement.

      Phase I. Design and evidence of content validity using the expert judgment method. The first phase consisted of two parts. Firstly, an initial version (v1) of the questionnaires was designed by the leading researcher and secondly, evidence of content validity using the expert judgment method was obtained, after which a new version (v2) of each of the five questionnaires was available.

      The five questionnaires in their initial version (v1) were designed using as a reference the objectives of the Girls4STEM initiative. A set of items was generated to collect inputs from the subjects participating in the family action (families/parents, students and teachers), and the dimensions to be measured according to the objectives were specified. An ad hoc questionnaire for each of the five questionnaires was then prepared, which was distributed to the committee of experts for undergoing the expert judgment process. These ad hoc questionnaires asked about the pertinence/representativeness (whether the items are representative of the dimensions they are intended to measure), relevance (whether the items contribute with important information to the measurement of the dimension) and formulation (whether the items are understood, unambiguous and clear), all on a Likert scale from 1 (not at all in agreement) to 6 (totally in agreement).

      In addition, after each set of items, suggestions were requested in open-ended questions when the expert was not in complete agreement, and an open-ended question was provided at the end of each questionnaire for any relevant considerations on the design of the instrument. The five ad hoc questionnaires were distributed online to the committee of experts and were also sent to them in advance, so that the five questionnaires could be accessed before the experts made their judgments.

      Phase II. Distribution of the instruments to a pilot sample. In the second phase, the five instruments in version (v2) were administered through non-probabilistic purposive sampling to a pilot sample of families (parents), students and teachers participating in Girls4STEM in the 2020–2021 academic year. Before the start of the project and the distribution of each questionnaire, informed consent was requested and the current legislation on data protection was complied with, while maintaining the confidentiality of the data. A double analysis (quantitative and qualitative) was performed with the results. First, with the quantitative information, the reliability as internal consistency was calculated from the two-factor model based on the average correlation between the items, using the SPSS v27 program (George and Mallery, 2010), and studying the items on a Likert scale. Secondly, the open-ended questions were analyzed by the group of researchers by means of a content analysis to determine how the questionnaire worked in the population and to be refined if necessary.

      2.2. Sample

      In this subsection, a description of the sample of each one of the phases is provided.

      Phase I. Six female researchers made up the committee of experts. This is a non-probabilistic purposive sample, all of them being women. The selection meets the criteria proposed by Skjong and Wentworth (2000) for purposive sampling: experience in making judgments and decisions based on evidence or expertise, reputation in the community, availability and motivation to participate, impartiality and inherent qualities such as trustworthiness and adaptability.

      Phase II. A total of 8 schools, all of them located in the Valencian Community, participated in the Girls4STEM initiative during the 2020–2021 academic year. Of these schools, 6 were public and 2 were charter schools. Regarding their geographical origin, 2 of them were located in small cities (population < 30,000), 3 in medium-sized cities (population < 100,000), while 3 were located in large cities (population > 100,000). This brings the total group of students participating to 298, distributed between 84 in small cities, 109 in medium-sized cities and 105 in large cities.

      The final sample used for this study, eliminating those students who did not fill in the pre or post questionnaires, was 268 students, 18 teachers (16 female and 2 male teachers) and 113 family members (88 female and 25 male). Therefore, the sample was constructed by non-probability purposive sampling.

      Table 1 shows the distribution of participating students according to gender, with a higher percentage of female students (62%), and educational level, defining the following levels: primary, secondary with 2 subgroups by age, and professional studies.

      Regarding the education level, the table shows that the largest group was secondary education with students between 12 and 16 years old, accounting for 78% of the total sample. The educational level with the lowest representation in our sample corresponded to secondary education, aged 17–18 (3%).
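The shares by educational level follow directly from the student counts reported in Table 1. As a minimal sketch of that arithmetic (the variable names are illustrative and not part of the study), the percentages can be recomputed as follows:

```python
# Student counts per educational level, copied from Table 1.
counts = {
    "Primary": 32,
    "Secondary (12-16 years old)": 210,
    "Secondary (17-18 years old)": 9,
    "Professional studies": 17,
}

total = sum(counts.values())  # 268 students in the final sample

# Share of each level as a percentage of the final sample, one decimal place.
shares = {level: round(100 * n / total, 1) for level, n in counts.items()}

for level, share in shares.items():
    print(f"{level}: {share}%")
```

Note that 9 of 268 students is a proportion of about 0.03, that is, roughly 3% of the sample.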

      Table 1. Number of students who completed the pre and post questionnaires by gender and educational level.

                                     Gender
      Educational level              Male   Female   Undeclared   Total
      Primary                          16       15            1      32
      Secondary (12–16 years old)      74      135            1     210
      Secondary (17–18 years old)       5        4            –       9
      Professional studies              5       12            –      17
      Total                           100      166            2     268

      2.3. Data Analysis

      Data have been processed according to the specific objectives of the research and the established phases. A description of the process followed in each phase is included in this subsection.

      Phase I. The SPSS version 27 software was used to calculate the evidence of content validity. Firstly, the mean of the items of each questionnaire in the three dimensions under evaluation (representativeness, relevance, and formulation) was obtained. Given that the Likert scale consisted of 6 points, the criterion for refining an item was that a mean lower than 5 was obtained (a value of 5 suggested agreement and 6 suggested total agreement). Secondly, the internal consistency of the judgments issued was calculated by obtaining Cronbach's alpha as intraclass correlation coefficients, according to the two-way (bidirectional) random model of consistency suggested by Gwet (2014). Finally, the Content Validity Ratio (CVR) of each item was calculated by applying the model of Lawshe (1975) modified by Tristán-López (2008): $\mathrm{CVR} = n_e / N$, where $n_e$ is the number of experts who gave a favorable judgment (5 or 6 in representativeness) and $N$ is the total number of experts who responded to the ad hoc questionnaire. The CVR provides evidence of content validity for each indicator. In this model, an item is considered essential by an expert when it receives a score of 5 or 6 on the Likert representativeness scale. Any item with a CVR lower than 0.58 should be deleted (Tristán-López, 2008). The ad hoc questionnaires also offered open-ended questions to complete the assessments. In the event that an item needed to be refined, it was modified according to the suggestions of the experts.
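To make the item-level criteria concrete, the following is a minimal sketch of how the CVR and the mean-based refinement rule could be applied to the representativeness scores of six experts. The ratings are hypothetical and are not data from the study. The function names and the way the two rules are combined into a single keep/refine/delete decision are assumptions made for illustration. The study itself performed these calculations in SPSS.

```python
from statistics import mean

# Hypothetical representativeness ratings: one 1-6 Likert score per expert (N = 6).
ratings = {
    "item_1": [6, 6, 5, 5, 6, 5],
    "item_2": [6, 5, 4, 5, 3, 5],
    "item_3": [4, 3, 5, 4, 6, 2],
}

CVR_THRESHOLD = 0.58  # deletion cut-off attributed to Tristán-López (2008)
MEAN_THRESHOLD = 5    # refinement criterion on the 6-point scale


def content_validity_ratio(scores, favorable_min=5):
    """CVR = n_e / N: share of experts giving a favorable score (5 or 6)."""
    n_e = sum(1 for s in scores if s >= favorable_min)
    return n_e / len(scores)


def review_item(scores):
    """Return (CVR, decision); combining both rules this way is an assumption."""
    cvr = content_validity_ratio(scores)
    if cvr < CVR_THRESHOLD:
        decision = "delete"
    elif mean(scores) < MEAN_THRESHOLD:
        decision = "refine"
    else:
        decision = "keep"
    return cvr, decision


for name, scores in ratings.items():
    cvr, decision = review_item(scores)
    print(f"{name}: CVR = {cvr:.2f}, mean = {mean(scores):.2f}, decision = {decision}")
```

In this sketch only the representativeness dimension is used. In the procedure described above, the mean criterion was also checked for relevance and formulation.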

      Phase II. The data collected after administering the version (v2) instruments to a pilot sample of subjects (parents, teachers and students) were analyzed. With the quantitative information (Likert scale questions), Cronbach's alpha reliability coefficient was calculated. With the qualitative information, a content analysis was conducted in order to assess the performance of the instruments in the sample and to refine them if necessary. Groenvold et al. (1997) suggest that, although rarely investigated, it is necessary to check whether the answers in self-assessment questionnaires have the same meaning for the target audiences as for the researchers who interpret and report the data.
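As a reference for the reliability coefficient used in Phase II, the following sketch computes Cronbach's alpha from the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The study used SPSS; this pure-Python version and the response matrix are illustrative assumptions only.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one row per respondent, each row a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))                      # per-item columns
    item_variances = [variance(col) for col in items]  # sample variances
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to four Likert items (1-5 points).
pilot = [
    [4, 5, 4, 5],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
]
alpha = cronbach_alpha(pilot)
print(round(alpha, 3))
```

Rows with missing answers would have to be removed before calling the function, which corresponds to the distinction between valid cases and collected responses.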

      3. Results

      This section presents the results of the design and refinement process of the five questionnaires. Results of phase I provide evidence of content validity after the design process for each of the five questionnaires. Results of phase II include evidence of reliability of the scale items and an analysis of the performance of the qualitative items.

      3.1. Phase I

      First, the results related to specific objectives 1 and 2 of the paper are presented. Table 2 summarizes the questionnaires in version (v1), including the dimensions, items and scale used in each one. The questionnaires collected the information that was considered appropriate for measuring the initiative's objectives, although for the objective of increasing the number of students in STEM studies an indirect measurement was proposed, by assessing interest at the time of the evaluation. As can be seen in the table, the evaluation was not limited to measuring participant satisfaction. For each set of participants, the measurement of those aspects that were considered critical was proposed. In addition, indicators were included on issues relevant to achieving the aims of Girls4STEM, which are intended to be analyzed in further research, such as family involvement in promoting STEM interests, factors that contribute to student involvement in STEM studies, such as achievement or interest (UNESCO, 2017), or the role of teachers in promoting STEM vocations. Note that the questionnaires also collected socio-demographic data, which are outside the scope of this study.

      Design of the questionnaires (v1).

      Questionnaire Dimensions (item number) Scale
      Parents Overall impact (1–3) 2 multiple choice
      1 dichotomous (with open-ended question)
      Impact on parents (4–7) 4 Likert (1–5 points)
      Satisfaction and project improvement (8–10) 1 Likert (1–5 points)
      2 open-ended questions
      Students-pre STEM interests (1–2) 1 dichotomous (with open-ended question)
      1 Likert (1–5 points)
      Achievement in STEM subjects (3) 1 open-ended question
      Students-post Degree of participation (1–2) 2 open-ended questions
      Impact on students (3–6) 4 Likert (1–5 points)
      Satisfaction and project improvement (7–9) 1 Likert (1–5 points)
      2 open-ended questions
      Teachers-pre Motivation toward the project (1–2) 2 open-ended questions
      Expectations (students) (3–5) 3 open-ended questions
      Expectations (teachers) (6) 1 open-ended question
      Teachers-post Degree of participation (1–2) 2 open-ended questions
      Impact on students (3–5) 3 open-ended questions
      Impact on teachers (6–13) 1 open-ended question
      1 multiple choice
      6 Likert (1–5 points)
      Satisfaction and project improvement (14–15) 1 Likert (1–5 points)
      2 open-ended questions

      The dimensions (with item numbers) and the scale of each one are included in the second and third column, respectively.

      After the design of the questionnaires in their initial version (v1), the questionnaires were subjected to expert judgment to reach evidence of content validity and to refine the questionnaires, if necessary. Ad hoc questionnaires were distributed for expert judgment, and the obtained results for the inter-rater reliability (Cronbach's alpha) are summarized in Table 3. In the following, the evidence of content validity is discussed for each questionnaire, both considering the mean of the items of each questionnaire and the internal consistency of the judgments.

      Evidence of content validity of the parents questionnaire. The questionnaire for parents (v1) consisted of a total of 10 items (see Table 2). The means of the items after the expert judgment are shown in Table 4. In the representativeness dimension, the mean of all the items ranged between 5.67 and 6, so none of them had to be modified according to the criterion defined beforehand. Cronbach's alpha coefficient in Table 3 suggested sufficient consistency, with a value of 0.262. Finally, the CVRs for all the items were 1, which led to the conclusion that the questionnaire had sufficient evidence of content validity in the representativeness dimension, i.e., the items were representative of the dimensions they were intended to measure. In the relevance dimension, the results were similar to those in the representativeness dimension, with means between 5.67 and 6 (Table 4) and a Cronbach's alpha as intraclass correlation of 0.406 (Table 3). The formulation dimension pointed in another direction. Items 2 and 5 showed means below 5 (Table 4), so both needed to be reformulated. In spite of this, this dimension presented a high consistency, since Cronbach's alpha was 0.895. In order to proceed with the refinement, the open-ended questions were analyzed qualitatively. In item 2, two experts suggested introducing “in his/her family” and, in item 5, replacing “the role” with “participation.” The suggestions were accepted and both items were reformulated.

      Evidence of content validity of the students-pre questionnaire. The initial student questionnaire (v1) consisted of three items (see Table 2), although the first item offered a dichotomous response which, if affirmative, required an explanation in an open-ended question. Table 5 shows the means of the items after the expert judgment for each dimension. In the representativeness dimension, the means of the items ranged between 5.33 and 5.67, so none of the items needed rephrasing. These results were consistent, with a Cronbach's alpha of 0.8 (Table 3). In addition, none of the items needed to be deleted in terms of the CVR criterion, since all of them reached the maximum value (CVR = 1), except item 2 with CVR = 0.83, which also exceeded 0.58. In the relevance dimension, Cronbach's alpha (Table 3) again suggested consistency in the judgments (Cronbach's alpha = 0.6). The means were higher than in the previous dimension, with values between 5.67 and 6. However, as in the questionnaire for families, the formulation dimension showed a very high consistency (Cronbach's alpha = 0.944 in this case), but the means indicated the need to reformulate items 2 (mean = 4.17) and 3 (mean = 4.5). Therefore, the open-ended questions of the expert judgment that explained this result were studied. Given that in the Spanish educational system the subjects related to STEM contents differ between the primary and secondary education stages, the experts proposed to specify the term “STEM” in terms of the curricular subjects in both indicators and not to limit the answers to primary education subjects. For version (v2) of this questionnaire, STEM interests (item 2) and school performance (item 3) were defined on the basis of these subjects. 
Finally, in the open-ended questions at the end of the ad hoc questionnaire, it was suggested to incorporate a new dimension, self-efficacy (perceived achievement), as the expert judges considered it to be a relevant indicator in STEM education. A new indicator was added as requested by the experts.

      Evidence of content validity of the students-post questionnaire. The final student questionnaire (v1) consisted of 9 items (see Table 2). Table 6 shows the means of the items after the expert judgment for each dimension. In the representativeness dimension, items 1 and 2 were below the criterion (a mean of at least 5). In addition, the results showed a high consistency (Cronbach's alpha = 0.987) and the CVR warned about a low content validity of the first two items, since CVR = 0.33 and CVR = 0.17 for items 1 and 2, respectively. This indicated that both items should be deleted. The experts' feedback on the open-ended questions was reviewed. In item 1, they considered that it was not a decision for the students to take, so the item was not appropriate. For item 2, both in this dimension and in relevance, they suggested incorporating the measurement of the degree of participation with new indicators, such as justifying participation in the specific project and quantitatively specifying the degree of participation in number of hours. Items 1 and 2 were eliminated and two new items were created to evaluate the degree of participation. The results for relevance were similar to those for representativeness, with the first two items on the degree of participation being the ones that needed to be modified. The means of items 1 and 2 were again below the criterion. Cronbach's alpha reached a high value (Cronbach's alpha = 0.981) and the open-ended questions raised the same point found in the representativeness dimension. Both items 1 and 2 were reformulated. The formulation dimension showed much more satisfactory results, as all the means were above the criterion and Cronbach's alpha = 0.273, so the consistency was sufficient. No item was subject to change after the results in the formulation dimension. 
However, in the open-ended question of the final part of the ad hoc questionnaire, two experts suggested changing the order of presentation of items 4 and 5. They argued that item 5 was related to the interests raised in item 3, although in this case in relation to the professions. The suggested change in the presentation format was included.

      Evidence of content validity of the teachers-pre questionnaire. The initial teacher questionnaire (v1) consisted of 6 open-ended items (see Table 2). Table 7 shows the means of the items after the expert judgment for each dimension. In the representativeness dimension, only item 2 was below the criterion and needed refinement. The inter-rater reliability was sufficient (Cronbach's alpha = 0.935), but item 2 showed a CVR = 0.33, which indicated that the item should be removed from the questionnaire. Item 4 showed a CVR = 0.66, but it was kept in the questionnaire since it exceeded the criterion of 0.58 (Tristán-López, 2008). In the relevance dimension, item 2 was also below the criterion. The judgments were consistent, since Cronbach's alpha = 0.946. Finally, the formulation dimension did not require modification, since the means were above the criterion and Cronbach's alpha = 0.359. In summary, item 2 was eliminated, and version (v2) was composed of 5 items.

      Evidence of content validity of the teachers-post questionnaire. The final teacher questionnaire (v1) consisted of 16 items (see Table 2). Table 8 shows the means of the items after the expert judgment for each dimension. In the representativeness dimension, item 2 was below the criterion. The results were consistent, with Cronbach's alpha = 0.69. The CVR of all the items was 1, except for item 2, where CVR = 0.66. Since this value exceeded the criterion of 0.58, the item did not need to be removed. The means in the relevance dimension were generally higher, but item 2 was again below the criterion. The judgments were consistent, with Cronbach's alpha = 0.92. In the formulation dimension, the results were similar to the other dimensions, with Cronbach's alpha = 0.942 and the mean of item 2 below the criterion. The judges' open-ended responses were reviewed for item 2. The suggestion was to divide item 2 and quantify it. Hence, item 2 (“How much time have you spent on it and how much time have your students spent on it?”) was divided into two new items: “Indicate the number of hours you have spent” and “Number of videos in which you have participated.” After the modification, version (v2) was composed of 17 items.

      Inter-rater reliability (Cronbach's alpha).

      Questionnaire Dimension Cronbach's alpha
      Parents Representativeness 0.262
      Relevance 0.406
      Formulation 0.895
      Students-pre Representativeness 0.8
      Relevance 0.6
      Formulation 0.944
      Students-post Representativeness 0.987
      Relevance 0.981
      Formulation 0.273
      Teachers-pre Representativeness 0.935
      Relevance 0.946
      Formulation 0.359
      Teachers-post Representativeness 0.69
      Relevance 0.92
      Formulation 0.942

      Mean (parents).

      Mean Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Item 8 Item 9 Item 10
      Representative 5.83 6 5.83 6 5.67 6 6 5.67 6 6
      Relevance 5.83 6 5.67 6 5.67 6 6 6 6 6
      Formulation 6 4 5.83 5.83 4.17 6 6 6 6 6

      Mean (students-pre).

      Mean Item 1 Item 2 Item 3
      Representative 5.67 5.33 5.33
      Relevance 5.67 5.67 6
      Formulation 6 4.17 4.50

      Mean (students-post).

      Mean Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Item 8 Item 9
      Representative 3.17 3 6 5.83 5.67 6 6 6 6
      Relevance 2.50 2.83 5.83 6 5.50 6 5.67 5.83 5.83
      Formulation 5.83 6 5.67 5.83 6 6 6 6 6

      Mean (teachers-pre).

      Mean Item 1 Item 2 Item 3 Item 4 Item 5 Item 6
      Representative 5.67 3.17 5.67 5 5.67 6
      Relevance 5.67 3 5.67 5 5.5 6
      Formulation 5.83 5.50 6 6 6 5.83

      Mean (teachers-post).

      Mean Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Item 8 Item 9 Item 10 Item 11 Item 12 Item 13 Item 14 Item 15
      Representative 5.67 4.67 5.67 5.5 6 6 5.67 6 5.83 5.83 5.67 6 5.83 6 5.5
      Relevance 5.67 3.83 5.67 5.67 6 6 5.83 6 6 5.83 5.83 6 5 6 5.5
      Formulation 6 3.7 5.83 5.83 5 6 6 6 5.17 5.83 6 6 5.83 6 6

      Once phase I was completed, all five questionnaires were available in version (v2), with sufficient evidence of content validity in all of them.

      3.2. Phase II

      In order to collect data for phase II of this study, the pre-questionnaires were administered to students and teachers before they interacted with the STEM experts, so gender and professional career aspects had not yet been discussed. The post-questionnaires for students, teachers and families (parents) were administered after each school submitted the STEM expert biography video to the initiative. All the questionnaires were delivered using the Microsoft Forms platform. In the following, results related to specific objective 3 of the paper are analyzed both quantitatively and qualitatively.

      3.2.1. Evidence of Reliability

      The aim was to ascertain the evidence of reliability and to refine the questionnaires if necessary. To this end, the results were analyzed quantitatively. Table 9 summarizes the dimensions, scale and analysis type of the different version (v2) questionnaires. Reliability was calculated as internal consistency (using SPSS v27), from the two-factor model based on the average correlation between the items that were formulated using a Likert scale. As can be seen in Table 9, this analysis was feasible for all the questionnaires except the teachers-pre questionnaire, which contains no Likert items. Table 10 shows the results of evidence of reliability for each of the questionnaires. The second column indicates the number of items that were evaluated (formulated using a Likert scale), the third column gives the number of valid cases out of the total number of responses collected from the pilot sample, and the fourth column gives the value of the Cronbach's alpha coefficient. The fifth column provides the evaluated item number, column 6 shows the corrected item-total correlation and, finally, column 7 gives the Cronbach's alpha coefficient if the item is deleted. Note that item 11 of the teachers-post questionnaire yielded no results, since all the subjects gave the same answer (5 in this case), so the item has zero variance.
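The two item-level statistics reported in Table 10 can be sketched as follows. The corrected item-total correlation is taken here as the Pearson correlation between an item and the sum of the remaining items, and alpha if item deleted as Cronbach's alpha recomputed without that item. The data are invented, the function names are hypothetical, and returning `None` for a constant item is a choice of this sketch, not a description of how SPSS reports it.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    k = len(rows[0])
    item_var = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var / total_var)


def pearson(x, y):
    """Pearson correlation; None when either variable has zero variance."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def item_analysis(rows):
    """Per item: corrected item-total correlation and alpha if item deleted."""
    report = []
    for i in range(len(rows[0])):
        item = [r[i] for r in rows]
        rest = [r[:i] + r[i + 1:] for r in rows]   # scores on all other items
        report.append({
            "item": i + 1,
            "corrected_r": pearson(item, [sum(r) for r in rest]),
            "alpha_if_deleted": cronbach_alpha(rest),
        })
    return report


# Hypothetical data: every respondent answers 5 to item 3 (zero variance).
rows = [
    [4, 5, 5, 4],
    [3, 3, 5, 2],
    [5, 4, 5, 5],
    [2, 3, 5, 2],
    [4, 4, 5, 5],
]
report = item_analysis(rows)
for entry in report:
    print(entry)
```

With a constant item the correlation is undefined because its variance is zero, which is the situation described for item 11 of the teachers-post questionnaire.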

      Design of the questionnaires (v2).

      Questionnaire Dimensions (item number) Scale Analysis type
      Parents Overall impact (1–3) 2 multiple choice Qualitative
      1 dichotomous (with open-ended question) Qualitative
      Impact on parents (4–7) 4 Likert (1–5 points) Quantitative
      Satisfaction and project improvement (8–10) 1 Likert (1–5 points) Quantitative
      2 Open-ended questions Qualitative
      Students-pre STEM interests (1–2) 1 dichotomous (with open-ended question) Qualitative
      1 Likert (1–5 points) Quantitative
      Self-efficacy: perceived achievement (3) 1 Likert (1–5 points) Quantitative
      Achievement in STEM subjects (4) 1 open-ended question Qualitative
      Students-post Degree of participation (1–2) 2 open-ended questions Qualitative
      Impact on students (3–6) 4 Likert (1–5 points) Quantitative
      Satisfaction and project improvement (7–9) 1 Likert (1–5 points) Quantitative
      2 open-ended questions Qualitative
      Teachers-pre Motivation toward the project (1) 1 open-ended question Qualitative
      Expectations (students) (2–4) 3 open-ended questions Qualitative
      Expectations (teachers) (5) 1 open-ended question Qualitative
      Teachers-post Degree of participation (1–3) 2 open-ended questions Qualitative
      1 multiple choice answer Qualitative
      Impact on students (4–6) 3 open-ended questions Qualitative
      Impact on teachers (7–14) 1 open-ended question Qualitative
      1 multiple choice answer Qualitative
      6 Likert (1–5 points) Quantitative
      Satisfaction and project improvement (15–17) 1 Likert (1–5 points) Quantitative
      2 open-ended questions Qualitative

      The second column includes the dimensions and the item number in parentheses. The scale and the type of analysis are included in the third and fourth column, respectively.

      Summary of the Cronbach's alpha results in phase II.

      Questionnaire             N of items  N valid / N samples  Cronbach's alpha  Item  Corrected item-total  Cronbach's alpha
                                            (cases)              (questionnaire)         correlation           if item deleted
      Parents                   5           112 / 113            0.85              4     0.55                  0.85
                                                                                   5     0.73                  0.80
                                                                                   6     0.73                  0.80
                                                                                   7     0.67                  0.81
                                                                                   8     0.63                  0.83
      Students-pre (Primary)    4           32 / 32              0.49              2A    0.42                  0.25
                                                                                   2B    0.05                  0.59
                                                                                   3A    0.43                  0.24
                                                                                   3B    0.25                  0.45
      Students-pre (Secondary)  6           218 / 236            0.82              2A    0.56                  0.79
                                                                                   2B    0.52                  0.80
                                                                                   2C    0.66                  0.77
                                                                                   3A    0.55                  0.79
                                                                                   3B    0.56                  0.79
                                                                                   3C    0.61                  0.78
      Students-post             5           220 / 220            0.8               3     0.67                  0.73
                                                                                   4     0.65                  0.74
                                                                                   5     0.58                  0.76
                                                                                   6     0.36                  0.82
                                                                                   7     0.67                  0.73
      Teachers-post             6           14 / 14              0.65              9     0.33                  0.63
                                                                                   10    0.18                  0.66
                                                                                   11    -                     -
                                                                                   12    0.07                  0.68
                                                                                   13    0.71                  0.46
                                                                                   14    0.66                  0.47
                                                                                   15    0.49                  0.61
      George and Mallery (2010) suggest that a Cronbach's alpha coefficient above 0.7 is considered acceptable. Loewenthal and Lewis (2001) note that, in scales with fewer than 10 items, an internal consistency value of 0.6 can be considered acceptable. Results in Table 10 show that sufficient evidence of reliability was achieved in all the questionnaires in the sample used, except for the students-pre questionnaire (primary), with a Cronbach's alpha of 0.49, which is a low value. The corrected item-total correlations showed that item 2B had a low linear correlation with the total score of the scale (0.05). Moreover, Cronbach's alpha improved if this item was deleted (0.59). However, the item was kept, since it was actually the same question posed in 2A, but applied to the subject of natural sciences instead of mathematics. In addition, it should be noted that having only 4 items in this questionnaire may have contributed to the low Cronbach's alpha coefficient.
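The two rules of thumb quoted above can be applied to the coefficients of Table 10 with a short sketch. The function name is hypothetical, and treating the cutoffs as inclusive (`>=`) is an assumption of this example; none of the reported values lies on a boundary.

```python
def alpha_is_acceptable(alpha, n_items):
    """Apply the rules of thumb quoted in the text (inclusive cutoffs assumed)."""
    # Fewer than 10 items: 0.6 can be acceptable (Loewenthal and Lewis, 2001).
    # Otherwise: 0.7 (George and Mallery, 2010).
    threshold = 0.6 if n_items < 10 else 0.7
    return alpha >= threshold


# Values copied from Table 10: (questionnaire, number of Likert items, alpha).
table_10 = [
    ("Parents", 5, 0.85),
    ("Students-pre (Primary)", 4, 0.49),
    ("Students-pre (Secondary)", 6, 0.82),
    ("Students-post", 5, 0.80),
    ("Teachers-post", 6, 0.65),
]
flags = {name: alpha_is_acceptable(a, k) for name, k, a in table_10}
for name, ok in flags.items():
    print(f"{name}: {'acceptable' if ok else 'low'}")
```

Under these assumptions only the primary students-pre questionnaire falls below the cutoff, while the teachers-post questionnaire (0.65) is acceptable only because of the relaxed criterion for short scales.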

      3.2.2. Analysis of Qualitative Information

      The goal is to provide meaningful feedback about the respondents' thought processes when responding to survey items. It is therefore necessary to gather evidence that survey items and response options are well understood by respondents (Wolf et al., 2021). From the qualitative data, the answers given by all the participants were analyzed in parallel by each researcher to determine how the questionnaires worked in a real sample and to refine items if necessary. Researchers assessed the following questions for the items that had not been answered on a Likert scale in each questionnaire:

      q1. If the item was understood and corresponded to the measured dimension. In this way, it is possible to have evidence of face validity i.e., to recognize the pertinence of the evaluation system by analyzing the answers given. The researchers indicated yes or no. In case of a negative answer, the reasons were noted down.

      q2. If there were responses that could suggest presenting the item in another format or with some change in its presentation, in order to improve it. If they considered it appropriate, they suggested the reasons.

      q3. Observations, if they considered any comment necessary, when they had answered “no” in any of the previous items.

      Table 11 synthesizes by questionnaires and items the proposals of the group of 6 researchers. The columns “Relevance of the evaluation system” and “Presentation format” indicate the number of yes respondents from the 6 researchers. The last column, “comments,” includes the observations when the researchers disagreed or any other comments they considered of interest.

      Qualitative analysis (v2).

      Questionnaire Dimensions Item Relevance Presentation format Comments
      Parents Overall impact 1 6 6 No answer “nothing” or “other”
      2 6 6 No answer “other”
      3 6 6
      Satisfaction and project improvement 9 6 6
      10 6 6
      Students-pre STEM interests 1 6 6
      Achievement in STEM subjects 4 6 3 Modify to closed response (multiple choice)
      Students-post Degree of participation 1 6 6
      2 6 3 Modify to closed response (multiple choice)
      Satisfaction and project improvement 8 6 6
      9 6 6
      Teachers-pre Motivation toward the project 1 6 6
      Expectations (students) 2 6 6
      3 6 6
      4 6 6
      Expectations (teachers) 5 6 6
      Teachers-post Degree of participation 1 6 6
      2 6 3 Modify to closed response (multiple choice)
      3 6 6
      Impact on students 4 6 6
      5 6 6
      6 6 5 Add: “justify your answer” (some subjects indicate “positively” without explanation)
      Impact on teachers 7 6 6
      8 6 6
      Satisfaction and project improvement 16 6 6
      17 6 6

      Dimensions and number of items are included in the second and third column, respectively. The fourth and fifth columns collect the number of positive answers in each dimension, relevance and presentation format, respectively. Observations raised by the researchers are included in the last column.

      In general terms, it can be seen that all the responses to the items of the questionnaires met the objective for which they were designed, since all six researchers agreed that, after analyzing all the results, there was no response that did not meet the indicator. They also agreed that the presentation format was adequate in most of the items, but some needed to be revised. Fifty percent of the researchers proposed to modify the type of response in three items: i) in the initial questionnaire for students, item 4 (performance in STEM subjects); ii) in the final questionnaire for students, item 2 (degree of participation); and iii) in the final questionnaire for teachers, item 2 (degree of participation). In addition, other comments were raised on items 1 and 2 of the overall impact on parents, since some of the multiple-choice answers were not chosen, as indicated in the table. Following the parallel analysis, the researchers participated in a debriefing until a consensus was reached on the changes needed. The results and conclusions of the discussion were as follows:

      Parents questionnaire. One of the researchers noted that some of the multiple-choice options were not selected by any subject. Although she considered that the presentation format was adequate, she offered this topic for discussion. The researchers agreed that, since respondents in another sample might select these options, the presentation format should be maintained.

      Students-pre questionnaire. Fifty percent of the researchers suggested modifying the presentation format of the achievement in STEM subjects item (item 4). In the discussion it became clear that the item called for a numerical response, but the open response format led some students to give values with decimals, others to give intervals, others to state that they did not remember their grade, and others to write subjective sentences such as “very bad grade”. In the Spanish educational system, some secondary education subjects are optional, so not all students take them. Therefore, in order to improve the coding and interpretation of the results, the researchers agreed to present this item as a multiple-choice response with the following options: 0–3, 3.1–4.9, 5–5.9, 6–6.9, 7–8.9, 9–10, and I do not take this course.
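To illustrate how the closed options simplify coding, the sketch below maps a numeric grade on the 0 to 10 scale to one of the listed options. It is an assumption of this example, not part of the instrument, that grades are rounded to one decimal before being classified and that a missing grade (`None`) stands for a subject the student does not take; the names are hypothetical and the labels use plain hyphens.

```python
# Closed response options for item 4 (bounds are inclusive, one-decimal grades).
GRADE_OPTIONS = [
    (0.0, 3.0, "0-3"),
    (3.1, 4.9, "3.1-4.9"),
    (5.0, 5.9, "5-5.9"),
    (6.0, 6.9, "6-6.9"),
    (7.0, 8.9, "7-8.9"),
    (9.0, 10.0, "9-10"),
]
NOT_TAKEN = "I do not take this course"


def grade_option(grade):
    """Return the response option for a grade, or NOT_TAKEN when grade is None."""
    if grade is None:
        return NOT_TAKEN
    rounded = round(grade, 1)  # assumption: grades are reported to one decimal
    for low, high, label in GRADE_OPTIONS:
        if low <= rounded <= high:
            return label
    raise ValueError(f"grade outside the 0-10 scale: {grade}")


for value in (3.0, 3.1, 5, 8.9, 10, None):
    print(value, "->", grade_option(value))
```

Free-text answers such as “very bad grade” cannot be mapped in this way, which is the coding difficulty the closed format is meant to avoid.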

      Students-post and teachers-post questionnaires. Item 2, measuring the degree of participation, was discussed in both questionnaires. Fifty percent of the researchers suggested a closed response. As in the students-pre questionnaire discussion, a multiple-choice presentation format was adopted, since some answers provided intervals of hours of participation or subjective sentences such as “many” or “the class hours.” To avoid difficulties in processing the information, the options were specified as follows: 0–1 h, between 1 and 2 h, between 2 and 3 h, between 3 and 4 h, between 4 and 5 h, between 5 and 6 h, between 6 and 7 h, between 7 and 8 h, between 8 and 10 h, between 10 and 15 h and more than 15 h. These intervals were established based on the analysis of the answers given in the pilot sample. Finally, in the teachers-post questionnaire, a researcher suggested including “justify your answer” in item 6 on the impact on students, since she noticed that some of the answers evaluated the project “positively” without providing arguments. The suggestion was accepted by the rest of the researchers, so the formulation of the question was modified.

      4. Discussion

      The research presented in this paper aims to contribute to the state of the art of informal STEM education by describing how to obtain evidence of reliability and validity for a set of instruments. This set of instruments comprises five questionnaires for the evaluation of the impact of the family action of the Girls4STEM initiative, which includes all the participants: students, families (parents) and teachers. The initial specific objectives of this research have been fulfilled. Firstly, in phase I, the initial version (v1) of the questionnaires has been designed, considering the initiative's objectives and the important dimensions to measure. The five questionnaires have been subjected to expert judgment, to obtain evidence of validity of these instruments and to refine them if necessary. The results for all of them suggest high content validity through the calculation of the CVR, means and inter-rater reliability, which confirms the consistency of the results. Nevertheless, it has been necessary to delete some of the items, as well as to reformulate others. Specifically, the following changes have been necessary in the refinement process:

      Parents questionnaire: reformulation of items 2 and 5, given their means in the formulation dimension.

      Students-pre: reformulation of items 2 and 3, given their means in the formulation dimension. In addition, a new item on perceived achievement in STEM subjects has been added.

      Students-post: deletion of items 1 and 2, due to their CVR values and their low means in representativeness and relevance. Two new items have been constructed from open-ended questions to determine the degree of participation (given that former items 1 and 2 were dealing with this metric). The order of items 4 and 5 has been changed, following the proposal in the open-ended questions.

      Teachers-pre: deletion of item 2, due to its CVR, in addition to the fact that the means in representativeness and relevance pointed to a need for reformulation.

      Teachers-post: reformulation of item 2 due to its representativeness, relevance and formulation means. Former item 2 has been split into two new items.

      Despite the modifications, all the questionnaires in version (v2) measure the dimensions proposed in Table 2, except the initial questionnaire for students, which includes a new dimension, the perception of competence (self-efficacy). In addition, there are some changes in the number of items: the initial questionnaire for students goes from 3 to 4 items, the initial questionnaire for teachers loses one item (from 6 to 5) and the final questionnaire for teachers gains one item (from 16 to 17). The design and features of the questionnaires in version (v2) are given in Table 9.

      Once the objective of designing the instruments in phase I has been achieved and sufficient evidence of content validity has been obtained in this expert judgment, the analysis of the questionnaires in version (v2) has been carried out in a pilot sample. The pilot sample contains students from all pre-university academic cycles (primary, secondary), is gender balanced in line with the inclusive spirit of the project, and the schools are located in diverse contexts (from small urban centers to large cities).

      The results regarding the evidence of reliability in the applied sample suggest that there is sufficient internal consistency of the Likert-type items included in each of the questionnaires. After the qualitative analysis of the remaining items, it is concluded that they have been answered in their entirety, in accordance with the purpose for which they were designed, so that the administration of the questionnaires to the pilot sample allows us to conclude that the objective of phase II has been achieved. In spite of this, it has been necessary to modify some of the response formats. Specifically, in the initial student questionnaire, item 4 has changed from an open-ended question to a multiple-choice response to avoid the broad range of responses observed in the qualitative analysis. The same applies to item 2 of the final questionnaires for students and teachers. In addition, item 6 of the teachers-post questionnaire adds the instruction “justify your answer” to improve the quality of the gathered data. As a result of phase II, version (v3) of the five questionnaires has been obtained, in which the students-pre, students-post and teachers-post questionnaires have been modified as discussed above with respect to version (v2).

      The set of questionnaires, in its final version (v3), is a valuable resource for the evaluation of the family action of the Girls4STEM initiative, allowing the impact on all target audiences (students, families and teachers) to be assessed. The mixed methods methodology has made it possible to refine the set of instruments through the use of different techniques, such as expert judgment. Moreover, the analysis of the set of instruments administered to a pilot sample of the study population has enabled the collection of evidence that survey items and response options are well understood by respondents.

      This set of instruments has been designed and validated with the aim of overcoming the challenges faced when evaluating informal STEM education actions. On the one hand, the instruments incorporate features that are often overlooked in evaluation, such as improvement of the initiative, and take measures at different times, e.g., pre and post action for students and teachers. On the other hand, the questionnaires are formulated so that completing them does not require excessive time, which increases the likelihood that participants will complete them properly, including primary students from lower courses, who might be less familiar with filling in on-line forms without help. The fact that they can be delivered on-line simplifies the subsequent data analysis and contributes to the sustainability of the initiative. In addition, preliminary reliability and validity evidence has been gathered by a multidisciplinary team of researchers, which, to the best of our knowledge, positions this work as a core reference in informal STEM education contexts. Although the Girls4STEM initiative is located in Spain, the process followed to obtain the set of instruments in version (v3) can be applied, with a low-cost implementation, to the evaluation of any informal education initiative. Moreover, the set of instruments is openly offered for review or administration in other informal education experiences, so that particular features of different cultural contexts can be incorporated via each initiative's objectives. Nevertheless, it is desirable to continue researching and collecting new evidence in on-going and future editions of the initiative, in order to keep improving the rigor of the questionnaires as they are applied to other samples or adapted for administration in other STEM educational projects.

      Data Availability Statement

      The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

      Ethics Statement

      Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent from the patients/participants or the patients'/participants' legal guardian/next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements.

      Author Contributions

      MH-P designed the research. CB-M, EV, and XB collected the data. MH-P, CB-M, EV, EL-I, AF, XB, and SR analyzed the research. MH-P, CB-M, and EL-I searched the literature. MH-P, CB-M, EV, and EL-I wrote the manuscript. All authors contributed to the article and approved the submitted version.

      Funding

      This research was partially supported by the project FCT-20-15904 from the Fundación Española para la Ciencia y la Tecnología (FECYT) and the Ministerio de Ciencia e Innovación and the project GV/2021/110 from Generalitat Valenciana.

      Conflict of Interest

      The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

      Publisher's Note

      All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

      Acknowledgments

      The authors would like to thank all the entities that support the Girls4STEM initiative: the Vice-Principal of Equality, Diversity and Sustainability (University of Valencia); the Equality Unit (University of Valencia); the Scientific Culture and Innovation Unit (University of Valencia); and the Center for Training, Innovation, and Educational Resources in the Scientific, Technological, and Mathematical fields (CEFIRE STEM, Conselleria of Education, Research, Culture, and Sports of the Generalitat Valenciana), as well as the School of Engineering from the University of Valencia. Special thanks to all the primary and secondary schools that have participated in the three editions, the STEM experts, the project's sponsors, and last but not least, all the colleagues working in the project.

      Supplementary Material

      The Supplementary Material for this article can be found online at: /articles/10.3389/fpsyg.2022.937058/full#supplementary-material


      1Ministerio de Universidades. Students statistics. https://bit.ly/3yA6Bcs.
