Lesson 7 - Introduction to Web Scraping

Course Structure

Requests Basics

Learn to send HTTP requests with the requests library and retrieve web page content.

8 exercises
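
As a rough illustration of what this unit covers, the sketch below prepares a GET request and separately defines a helper that would send one. The `/search` path, the `q` and `page` query parameters, and the helper names `build_request` and `fetch_html` are assumptions made for the example, not part of the course material. Preparing the request without sending it shows the final URL without touching the network.

```python
import requests

# Assumed header value; many sites expect a browser-like User-Agent.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url, params=None):
    """Prepare a GET request without sending it, so the final URL can be inspected."""
    request = requests.Request("GET", url, params=params, headers=DEFAULT_HEADERS)
    return request.prepare()


def fetch_html(url, params=None, timeout=10):
    """Send a GET request and return the response body as text."""
    response = requests.get(url, params=params, headers=DEFAULT_HEADERS, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


# Hypothetical endpoint and parameters, used only to show URL construction.
prepared = build_request("https://news.example.com/search", {"q": "python", "page": 2})
print(prepared.url)
```

Calling `fetch_html("https://news.example.com")` would perform the actual download. The explicit `timeout` keeps the call from hanging indefinitely if the server never answers.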

Web Page Parsing

Parse HTML with BeautifulSoup and extract the data you need.

8 exercises
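
A small sketch of the parsing step, run on an inline HTML string so it works offline. The markup, including the `news-title` class and the link, is a made-up stand-in for a downloaded page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a page fetched earlier.
html = """
<html><body>
  <h2 class="news-title"> First headline </h2>
  <h2 class="news-title">Second headline</h2>
  <a href="/article/1">Read more</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag; get_text(strip=True) trims surrounding whitespace.
titles = [h2.get_text(strip=True) for h2 in soup.find_all("h2", class_="news-title")]

# href=True keeps only <a> tags that actually carry an href attribute.
links = [a["href"] for a in soup.find_all("a", href=True)]

print(titles)
print(links)
```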

Data Extraction

Master a range of data extraction techniques and work with structured data.

8 exercises
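
One common extraction task is turning raw scraped strings into typed, structured records. The sketch below uses only the standard library. The sample rows, field names, and the helpers `parse_price` and `clean_row` are assumptions chosen for illustration.

```python
import re
from datetime import datetime

# Hypothetical raw strings, as they might come out of a scraped page.
raw_rows = [
    {"name": "  Wireless Mouse ", "price": "¥1,299.00", "listed": "2024-03-05"},
    {"name": "USB-C Cable", "price": "Price: 39.5 yuan", "listed": "2024-03-07"},
]


def parse_price(text):
    """Return the first number found in the text as a float, or None if there is none."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def clean_row(row):
    """Strip whitespace, convert the price to a float and the date to a date object."""
    return {
        "name": row["name"].strip(),
        "price": parse_price(row["price"]),
        "listed": datetime.strptime(row["listed"], "%Y-%m-%d").date(),
    }


records = [clean_row(row) for row in raw_rows]
for record in records:
    print(record)
```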

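Story-Based Case Studies

Xiaohua's News Crawler

Xiaohua is a journalism student who needs to collect trending stories from major news sites for analysis. By writing a crawler, she can automatically gather each story's headline, publication time, and author, which makes data collection far more efficient.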

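# Xiaohua's news crawler example
import requests
from bs4 import BeautifulSoup

url = 'https://news.example.com'
response = requests.get(url, timeout=10)  # avoid hanging on a slow server
response.raise_for_status()  # stop early on an HTTP error status
soup = BeautifulSoup(response.text, 'html.parser')

# Extract the news headlines
news_titles = soup.find_all('h2', class_='news-title')
for title in news_titles:
    print(title.text.strip())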
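
The story also mentions publication time and author. A hedged sketch of collecting all three fields per article follows. It runs on an inline HTML string, and the `article`, `time`, and `span.author` markup is an assumption, since real news sites structure their pages differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the tag and class names below are assumptions.
html = """
<article class="news-item">
  <h2 class="news-title">City opens new library</h2>
  <time datetime="2024-05-01">May 1, 2024</time>
  <span class="author">Li Lei</span>
</article>
<article class="news-item">
  <h2 class="news-title">Local team wins final</h2>
  <time datetime="2024-05-02">May 2, 2024</time>
</article>
"""


def text_or_none(tag):
    """Return the tag's stripped text, or None when the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(html, "html.parser")
articles = []
for item in soup.find_all("article", class_="news-item"):
    time_tag = item.find("time")
    articles.append({
        "title": text_or_none(item.find("h2", class_="news-title")),
        "published": time_tag.get("datetime") if time_tag is not None else None,
        "author": text_or_none(item.find("span", class_="author")),
    })

for article in articles:
    print(article)
```

Searching inside each `article` element keeps a headline paired with its own time and author, and the `None` fallback keeps the loop from failing when a field is missing, as with the second article's author here.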

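Xiaomei's Product Information Crawler

Xiaomei runs an online store and needs to track competitors' prices and stock levels. She uses a crawler to automatically collect product information from e-commerce platforms, which helps her set more reasonable prices.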

# Xiaomei's product crawler
import requests

url = 'https://shop.example.com/api/products'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status

data = response.json()  # decode the JSON response body
for product in data['products']:
    print(f"Product: {product['name']} - Price: {product['price']}")
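
To connect the collected data to a pricing decision, one possible next step is a short summary of competitor prices for items that are in stock. The payload below is invented for the example, and the `stock` field is an assumption, since only `name` and `price` appear in the crawler above.

```python
import json
from statistics import median

# Hypothetical API payload; the 'stock' field is an assumption.
payload = """
{"products": [
  {"name": "Mug", "price": 25.0, "stock": 12},
  {"name": "Teapot", "price": 89.0, "stock": 0},
  {"name": "Coaster", "price": 9.5, "stock": 40}
]}
"""

data = json.loads(payload)
products = data.get("products", [])

# Keep only items a customer could actually buy right now.
in_stock = [p for p in products if p.get("stock", 0) > 0]
prices = [p["price"] for p in in_stock]

# This sample always has in-stock items; real data may need an empty-list check
# before calling min() and median().
summary = {
    "count": len(in_stock),
    "lowest": min(prices),
    "median": median(prices),
}
print(summary)
```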