Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining

Title: Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining

Authors: Simon Munzert, Christian Rubba, Peter Meissner, Dominic Nyhuis

Publisher: Wiley

Year: 2015

Pages: 480

Language: English

Format: epub

Size: 44.5 MB

A hands-on guide to web scraping and text mining for both beginners and experienced users of R.

Introduces the fundamental concepts behind the architecture of the web and of databases, and covers HTTP, HTML, XML, JSON, and SQL.

Provides basic techniques to query web documents and data sets (XPath and regular expressions).

An extensive set of exercises is presented to guide the reader through each technique.

Explores both supervised and unsupervised techniques as well as advanced techniques such as data scraping and text management.

Case studies are featured throughout along with examples for each technique presented.

R code and solutions to exercises featured in the book are provided on a supporting website.

Preface

1 Introduction

1.1 Case study: World Heritage Sites in Danger

1.2 Some remarks on web data quality

1.3 Technologies for disseminating, extracting, and storing web data

1.3.1 Technologies for disseminating content on the Web

1.3.2 Technologies for information extraction from web documents

1.3.3 Technologies for data storage

1.4 Structure of the book

Part One A Primer on Web and Data Technologies

2 HTML

2.1 Browser presentation and source code

2.2 Syntax rules

2.2.1 Tags, elements, and attributes

2.2.2 Tree structure

2.2.3 Comments

2.2.4 Reserved and special characters

2.2.5 Document type definition

2.2.6 Spaces and line breaks

2.3 Tags and attributes

2.3.1 The anchor tag <a>

2.3.2 The metadata tag <meta>

2.3.3 The external reference tag <link>

2.3.4 Emphasizing tags <b>, <i>, <strong>

2.3.5 The paragraphs tag <p>

2.3.6 Heading tags <h1>, <h2>, <h3>, ...

2.3.7 Listing content with <ul>, <ol>, and <dl>

2.3.8 The organizational tags <div> and <span>

2.3.9 The <form> tag and its companions

2.3.10 The foreign script tag <script>

2.3.11 Table tags <table>, <tr>, <td>, and <th>

2.4 Parsing

2.4.1 What is parsing?

2.4.2 Discarding nodes

2.4.3 Extracting information in the building process

Summary

Page created: 2019-02-26 12:40