Java Web Crawler Project in Practice (2)

Updated: 2024-10-26 19:34:30

Java Web Crawler Project in Practice (2): Scraping Transfer (调剂) Listings from 研招网

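1. Preface

Asking around during the re-examination (复试) period only gives a rough idea of which institutions accept transfer (调剂) applicants. Admissions-office staff generally will not say how many transfer places each institution actually offers. So before the transfer round begins, a crawler can collect the transfer listings that institutions publish for the relevant majors.

Listings usually start to appear about a day before the transfer system officially opens, so you can identify the institutions offering the most transfer places before it starts. This project uses the Electronic Information major as its example.

At the time of writing, the transfer system has already been open for about a week, so not many institutions are still posting Electronic Information transfer listings.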

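2. Approach

First, how does the page get the data it displays? As the screenshot shows, clicking the query button sends a data request to the server, and the response is then rendered on the page. So we need to find the request URL and the request parameters.

Right-click the page and choose Inspect, open the Network tab, and find sytjqexxcx.action. Clicking that entry shows the following details.

This panel shows the request URL and the request method, and it also shows that the request requires a logged-in session. Instead of scripting the login, you can put a concrete Cookie value in the request headers to act as a logged-in user. A Cookie is only valid for a limited time, so if you crawl again later, refresh the Cookie first.

Host and the other header fields shown above also need to be set in the request headers. Form Data holds the parameters the request has to send. Switching to the Preview tab shows the data that comes back.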
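
As a rough illustration of what the browser sends, the sketch below turns a parameter map into a form-encoded body and attaches a Cookie header to a POST request. It uses the JDK's built-in `java.net.http` API rather than the jsoup client this project uses, and it only builds the request without sending it. The URL, the cookie value, and the names `FormRequestSketch`, `encodeForm`, and `buildRequest` are placeholders invented for this example. The `Host` header is left out because the JDK client sets it itself and rejects manual values by default.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class FormRequestSketch {

    /** Encodes parameters as an application/x-www-form-urlencoded body. */
    static String encodeForm(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Builds (but does not send) a POST request that carries the login cookie. */
    static HttpRequest buildRequest(String url, String cookie, Map<String, String> params) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Cookie", cookie)
                .header("User-Agent", "Mozilla/5.0")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(params)))
                .build();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("pageSize", "20");
        params.put("start", "0");
        params.put("data_type", "json");

        // Placeholder URL and cookie: substitute the values from your own Network tab.
        HttpRequest request = buildRequest(
                "https://example.com/query.action", "JSESSIONID=placeholder", params);

        System.out.println(request.method() + " " + request.uri());
        System.out.println(encodeForm(params)); // prints pageSize=20&start=0&data_type=json
    }
}
```

Keeping the encoding in its own method makes it easy to compare the generated body against the Form Data panel in the browser before sending anything.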



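The response is JSON, so the fields we need can be extracted by parsing it. The table above lists what each field means.

In short, the overall approach is: send a request to the server URL, including both the header values and the form parameters; parse the JSON that comes back; and finally store the result in the database.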

3. Code

Environment and tools: IntelliJ IDEA, MySQL 8.0, Navicat

Dependencies to add to the Maven pom.xml:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.17</version>
</dependency>
<!-- .alibaba/fastjson -->
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.68</version>
</dependency>

CollectInformation.java

package com.kevin.Service;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.kevin.Dao.Dao;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.util.HashMap;

/**
 * @author Kevin xiao
 * @category java
 */
public class CollectInformation {

    // Cookie copied from a logged-in browser session; it expires, so refresh it before each run.
    public static String cookies = "JSESSIONID=4150DE9E72D495234A452D36CEEC29C8; acw_tc=2760828515890017509038118ede153ed16e334232db3e0bfe7f2a9ddeee72; _ga=GA1.3.1626918446.1589001752; zg_did=%7B%22did%22%3A%20%22171f7e29293631-039cd05f63a124-3b654406-144000-171f7e29294564%22%7D; _gid=GA1.3.343651325.1589721285; __utma=65168252.1626918446.1589001752.1589726297.1589726297.1; __utmz=65168252.1589726297.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmz=229973332.1589726376.1.1.utmcsr=blog.csdn|utmccn=(referral)|utmcmd=referral|utmcct=/lyc44813418/article/details/88739173; aliyungf_tc=AQAAAOKSnnqDKA8AU7+EdcB936eG6I8+; XSRF-CCKTOKEN=cafd5e3b2bbbd4edbe0a3545fcc86059; CHSICC_CLIENTFLAGYZ=5d9a178e980046720dfb5abc37a90f32; JSESSIONID=86B030C9B24414E288DB697BBB1C3119; __utma=229973332.1626918446.1589001752.1589726376.1589905059.2; __utmc=229973332; CHSICC_CLIENTFLAGSYTJ=f73563edd7e12d0e24836ab1fc94497b; __utmb=229973332.5.10.1589905059; zg_adfb574f9c54457db21741353c3b0aa7=%7B%22sid%22%3A%201589905235123%2C%22updated%22%3A%201589905530560%2C%22info%22%3A%201589721287765%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22yz.chsi%22%2C%22landHref%22%3A%20%22https%3A%2F%2Faccount.chsi%2Fpassport%2Flogin%3Fentrytype%3Dyzgr%26service%3Dhttps%253A%252F%252Fyz.chsi%252Fsytj%252Fj_spring_cas_security_check%23%23%23%22%2C%22cuid%22%3A%20%22f6335c4e1ffc05200091fb14e0fe2463%22%7D";
    public static String Origin = "";
    public static String Host = "yz.chsi";
    public static String Referer = ".html";
    public static String UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36";
    public static String LOGIN_URL = ".action";

    public static void main(String[] args) {
        // Request 324 pages of 20 records each; "start" is the offset of the first record on a page.
        for (int i = 0; i < 324; i++) {
            collectCollegeInfo(i * 20);
        }
    }

    public static void collectCollegeInfo(int num) {
        Document con;
        try {
//          con = (Document) Jsoup
//                  .connect(LOGIN_URL)
//                  .userAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36")
//                  .ignoreContentType(true)
//                  .ignoreHttpErrors(true)
//                  .timeout(1000 * 30)
//                  .header("accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8")
//                  .header("accept-encoding","gzip, deflate, br")
//                  .header("accept-language","zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7")
//                  .get();

            // Form parameters, as listed under Form Data in the browser's Network tab.
            HashMap<String, String> hashMap = new HashMap<String, String>();
            hashMap.put("pageSize", "20");
            hashMap.put("start", num + "");
            hashMap.put("orderBy", "");
            hashMap.put("mhcx", "1");
            hashMap.put("ssdm2", "");
            hashMap.put("dwmc", "");
            hashMap.put("xxfs2", "1");
            hashMap.put("dwmc2", "计算机科学与技术");
            hashMap.put("data_type", "json");
            hashMap.put("agent_from", "web");
            hashMap.put("pageid", "");

            // POST the form with the headers copied from the browser request.
            con = Jsoup.connect(LOGIN_URL)
                    .ignoreContentType(true)
                    .header("Cookie", cookies)
                    .header("User-Agent", UserAgent)
                    .header("Referer", Referer)
                    .header("Host", Host)
                    .header("Origin", Origin)
                    .ignoreHttpErrors(true)
                    .timeout(1000 * 30)
                    .data(hashMap)
                    .post();
//          System.out.println(con);

            // The response body is the JSON text.
            String other = con.getElementsByTag("body").text();
            // Parse it into a JSON object.
            JSONObject otherJsonObject = JSON.parseObject(other);
            // Walk down to the record list: data -> vo_list -> vos.
            JSONObject dataJsonObject = otherJsonObject.getJSONObject("data").getJSONObject("vo_list");
            JSONArray vosList = dataJsonObject.getJSONArray("vos");

            // Iterate over the listings and insert each one into the database.
            for (int i = 0; i < vosList.size(); i++) {
                JSONObject collegeObject = vosList.getJSONObject(i);

                HashMap<String, Object> map = new HashMap<>();
                map.put("fbsjStr", collegeObject.getString("fbsjStr")); // publish time
                map.put("dwmc", collegeObject.getString("dwmc"));       // institution name
                map.put("yxsmc", collegeObject.getString("yxsmc"));     // department name
                map.put("zymc", collegeObject.getString("zymc"));       // major name
                map.put("zydm", collegeObject.getString("zydm"));       // major code
                map.put("dwdm", collegeObject.getString("dwdm"));       // institution code
                map.put("xxfs", collegeObject.getString("xxfs"));       // study mode
                map.put("yjfxmc", collegeObject.getString("yjfxmc"));   // research direction name
                map.put("yxsdm", collegeObject.getString("yxsdm"));     // department code
                map.put("ssdm", collegeObject.getString("ssdm"));       // province code
                map.put("qers", collegeObject.getString("qers"));       // number of places offered
                map.put("hasit", collegeObject.getString("hasit"));
                map.put("bz", collegeObject.getString("bz"));
                map.put("gxsj", collegeObject.getString("gxsj"));
                map.put("zt", collegeObject.getString("zt"));
                map.put("tid", collegeObject.getString("id"));          // stored in column "tid"
                map.put("sfmzyq", collegeObject.getString("sfmzyq"));

                Dao.insertObj("tj", map);
            }
//          System.out.println(vosList.get(0));
//          System.out.println(vosList);
//          System.out.println(other);
            System.out.println("Done");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
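
The `main` method above hardcodes 324 pages of 20 records. The post does not say where 324 comes from. As a minimal sketch, assuming the total record count is known by some other means, the `start` offsets could be derived instead of hardcoded. `PageOffsetsSketch` and `startOffsets` are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PageOffsetsSketch {

    /** Returns the "start" value for every page needed to cover totalRecords. */
    static List<Integer> startOffsets(int totalRecords, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Integer> offsets = new ArrayList<>();
        for (int start = 0; start < totalRecords; start += pageSize) {
            offsets.add(start);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // 45 records at 20 per page need three requests.
        System.out.println(startOffsets(45, 20)); // prints [0, 20, 40]
    }
}
```

Each returned offset would be passed to `collectCollegeInfo` in place of `i * 20`.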

Dao.java

public static int insertObj(String tableName, Map<String, Object> dataItem) {
    String fieldStr = "";
    String valueStr = "";
    Object[] valueObjs = new Object[dataItem.size()];
    int i = 0;
    // Build the column list and a matching list of "?" placeholders.
    for (String key : dataItem.keySet()) {
        fieldStr = fieldStr + key + ",";
        valueStr = valueStr + "?" + ",";
        valueObjs[i] = dataItem.get(key);
        i++;
    }
    // Drop the trailing commas.
    fieldStr = fieldStr.substring(0, fieldStr.length() - 1);
    valueStr = valueStr.substring(0, valueStr.length() - 1);
    //System.out.println(fieldStr);
    //System.out.println(valueObjs);
    String sqlStr = "insert into " + tableName + " (" + fieldStr + ") values (" + valueStr + ")";
    int exe = execute(sqlStr, valueObjs);
    //System.out.println(exe);
    return exe;
}
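
`insertObj` calls an `execute` method that the post does not show. The sketch below is one plausible shape for it, not the author's implementation: it binds the values to a `PreparedStatement` and closes the statement with try-with-resources. It also rebuilds the SQL string with `StringJoiner`, which avoids the trailing-comma trimming and makes the empty-map case explicit. `InsertSqlSketch`, `buildInsertSql`, and the `execute` signature that takes a `Connection` are all assumptions made for this example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class InsertSqlSketch {

    /** Builds "insert into t (a,b) values (?,?)" from the map's keys, in iteration order. */
    static String buildInsertSql(String tableName, Map<String, Object> dataItem) {
        if (dataItem.isEmpty()) {
            throw new IllegalArgumentException("no columns to insert");
        }
        StringJoiner fields = new StringJoiner(",");
        StringJoiner marks = new StringJoiner(",");
        for (String key : dataItem.keySet()) {
            fields.add(key);
            marks.add("?");
        }
        return "insert into " + tableName + " (" + fields + ") values (" + marks + ")";
    }

    /** Hypothetical stand-in for the post's execute(): binds the values and runs the statement. */
    static int execute(Connection conn, String sql, Object[] values) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < values.length; i++) {
                ps.setObject(i + 1, values[i]); // JDBC parameters are 1-based
            }
            return ps.executeUpdate();
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("dwmc", "Example University");
        row.put("qers", "5");
        System.out.println(buildInsertSql("tj", row));
        // prints: insert into tj (dwmc,qers) values (?,?)
    }
}
```

Only the values go through placeholders. The table and column names are concatenated into the SQL text, so they should come from trusted code, as they do here, and never from user input.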

DButil.java

package com.kevin.Dao;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class DButil {

    static {
        try {
            Class.forName("com.mysql.cj.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    public static Connection con() {
        Connection connection = null;
        String username = "root";
        String password = "a18713837118";
        String database = "collegeinfo";
        String url = "jdbc:mysql://localhost:3306/" + database + "?serverTimezone=Asia/Shanghai&useUnicode=true&characterEncoding=utf8&useSSL=false&allowPublicKeyRetrieval=true";
        try {
            connection = DriverManager.getConnection(url, username, password);
        } catch (SQLException e) {
            e.printStackTrace();
            // TODO: handle exception
        }
        return connection;
    }

    public static void close(Connection conn) {
        if (conn != null) {
            try {
                conn.close();
            } catch (SQLException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("DButil");
    }
}

Creating the database table

The table's constraints could be made more specific and stricter.
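
The table definition itself is not reproduced in the post, so the following is only a guess at what a slightly stricter version might look like. The table name `tj` and the column names are taken from the keys passed to `Dao.insertObj`. Every type, length, and constraint is an assumption, including the choice of `tid` as primary key, which presumes the site's `id` is unique per listing. `CreateTableSketch` and `createTable` are names invented for this example.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class CreateTableSketch {

    // Column names mirror the keys used in Dao.insertObj("tj", map).
    // All types, lengths, and constraints below are assumptions for illustration.
    static final String CREATE_TJ =
            "CREATE TABLE IF NOT EXISTS tj ("
            + " tid VARCHAR(64) NOT NULL PRIMARY KEY,"
            + " fbsjStr VARCHAR(32),"
            + " dwmc VARCHAR(128) NOT NULL,"
            + " yxsmc VARCHAR(128),"
            + " zymc VARCHAR(128),"
            + " zydm VARCHAR(16),"
            + " dwdm VARCHAR(16),"
            + " xxfs VARCHAR(16),"
            + " yjfxmc VARCHAR(255),"
            + " yxsdm VARCHAR(16),"
            + " ssdm VARCHAR(16),"
            + " qers VARCHAR(16),"
            + " hasit VARCHAR(16),"
            + " bz TEXT,"
            + " gxsj VARCHAR(32),"
            + " zt VARCHAR(16),"
            + " sfmzyq VARCHAR(16)"
            + ") DEFAULT CHARSET = utf8mb4";

    /** Runs the DDL on an open connection, for example one returned by DButil.con(). */
    static void createTable(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(CREATE_TJ);
        }
    }

    public static void main(String[] args) {
        // Print the statement so it can be reviewed or pasted into Navicat.
        System.out.println(CREATE_TJ);
    }
}
```

With a primary key on `tid`, inserting the same listing twice fails instead of silently creating a duplicate row, so re-running the crawler would need to skip or update existing rows.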

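4. Closing remarks

"The sea is wide enough for fish to leap, the sky high enough for birds to fly." I hope everyone gets into the institution they are aiming for. A year of preparing for the postgraduate entrance exam is truly hard. I thought about giving up, but I held on. During a one-month internship at a company in Tianjin last September in particular, I wanted to quit many times and my morale collapsed more than once, but whenever I remembered that I had already put in more than six months of effort, I kept going. The road is not easy, and everyone who sees the exam through to the end is brave, whether or not they are admitted.

The transfer round is an information war that tests your mindset and resilience. Scores and ability are what actually win a transfer place; gathering information only helps. Finally, my skills are limited and this code leaves plenty of room for improvement, so corrections are welcome. Technology exists to serve society, and I hope crawlers are not misused!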

Published: 2024-03-23 18:14:03