Hi, I'm trying to scrape a web page in order to convert it to PDF. I have tried several DLLs, including the ones built into .NET, but I always get a 404 Not Found. However, when I put the URL in a browser, the page appears.
    public void ConvertPDF()
    {
        var url = "localhost:4200/statementaccount/id"; // id is a Guid, which comes from the DB
        WebClientService httpClient = new WebClientService(url);
        ResponseWebClientModel response = httpClient.RequestGet(url); // always responds 404
        if (!response.error)
        {
            // never enters here
            var content = response.content;
            // TODO: save the response to PDF
        }
    }
    public ResponseWebClientModel RequestGet(string url, bool setCookies = false, string urlcoockie = "", string urlcache = "")
    {
        response = new ResponseWebClientModel();
        HttpClient.GetAsync(url).ContinueWith((task, o) =>
        {
            try
            {
                var responseAsync = task.Result;
                // Throws HttpRequestException for any non-2xx status, e.g. 404 Not Found.
                responseAsync.EnsureSuccessStatusCode();
                response.refererUrl = responseAsync.RequestMessage.RequestUri.ToString();
                response.content = responseAsync.Content.ReadAsStringAsync().Result;
                if (setCookies)
                {
                    if (urlcoockie != "")
                    {
                        var uri = new Uri(urlcoockie);
                        var cookieCollection = _CookieContainer.GetCookies(uri);
                        IEnumerable<string> cookiesHeaders = responseAsync.Headers.GetValues("Set-Cookie");
                        var missingCookies = cookiesHeaders.Where(val => !cookieCollection.Cast<Cookie>().Any(c => val.StartsWith($"{c.Name}=")));
                        if (urlcache != "")
                        {
                            var cookieCollection2 = _CookieContainer.GetCookies(new Uri(urlcache));
                            missingCookies = cookiesHeaders.Where(val => !cookieCollection2.Cast<Cookie>().Any(c => val.StartsWith($"{c.Name}=")));
                        }
                        foreach (var missingCookie in missingCookies)
                        {
                            var keyValue = missingCookie.Split('=');
                            var cookieName = keyValue[0];
                            var value = keyValue[1].Split(';')[0];
                            if (urlcache != "")
                            {
                                _CookieContainer.Add(new Uri(urlcache), new Cookie(cookieName, value));
                            }
                        }
                    }
                }
            }
            catch (Exception e)
            {
                // A 404 ends up here: the exception message is stored as the content.
                response.error = true;
                response.content = e.Message;
            }
        }, null).Wait();
        return response;
    }
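In `ConvertPDF` the URL string ends in the literal segment `/id`, although the comment says the id is a Guid from the database, and it carries no `http://` scheme. If the string is used exactly as written, the server is asked for a statement whose id is the text "id", which could by itself explain a 404. A minimal sketch, assuming the real base address is something like `http://localhost:4200`: build the URL from an explicit absolute base and the actual Guid, and reject a base without an http(s) scheme. The class `StatementUrl` and its `Build` method are hypothetical, not part of the question's code.

```csharp
using System;

// Hypothetical helper (not from the question): builds the statement URL from an
// explicit base address and the real Guid instead of a hard-coded "/id" segment.
public static class StatementUrl
{
    public static Uri Build(string baseAddress, Guid id)
    {
        // Require an absolute http/https base; "localhost:4200" without a scheme is rejected here.
        if (!Uri.TryCreate(baseAddress, UriKind.Absolute, out var baseUri) ||
            (baseUri.Scheme != Uri.UriSchemeHttp && baseUri.Scheme != Uri.UriSchemeHttps))
        {
            throw new ArgumentException(
                $"Base address must be an absolute http(s) URL: {baseAddress}", nameof(baseAddress));
        }

        // "D" formats the Guid as 8-4-4-4-12 hex digits separated by hyphens.
        return new Uri(baseUri, $"statementaccount/{id:D}");
    }
}
```

Note that `new Uri(baseUri, relative)` resolves the relative part the way a browser would: if the base has a path (for example `http://host/app`), it should end with `/`, otherwise its last segment is replaced.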
Recommended answer
A 404 can mean either that the server cannot find the website or page, or that a REST service is reporting that it cannot find the requested item. What URL are you hitting? Could the website be expecting a header before it lets you view or get the HTML? If you are just trying to scrape the HTML, why not use GetStringAsync?
    HttpClient client = new HttpClient();
    var html = await client.GetStringAsync("microsoft");
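The answer's two suggestions (check which URL is actually hit, and whether the site expects a header) can be tried together. A minimal diagnostic sketch, assuming .NET Core 3.0 or later and C# 8: it sends the GET with a User-Agent and an Accept header, then returns the status code, the URI actually requested, and the body, instead of throwing on a 404 the way `EnsureSuccessStatusCode` does. The class `PageFetcher`, its methods, and the header values are hypothetical; whether this server needs any header at all is an assumption to test, not a known cause.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Diagnostic sketch: send a browser-like GET and report what the server answered.
public static class PageFetcher
{
    // One shared HttpClient; creating a new one per request can exhaust sockets.
    private static readonly HttpClient Client = new HttpClient();

    // Built separately so the request can be inspected without a network call.
    // The header values are assumptions: some sites reject requests without a User-Agent.
    public static HttpRequestMessage BuildRequest(Uri url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.UserAgent.ParseAdd("Mozilla/5.0 (compatible; PdfExporter/1.0)");
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("text/html"));
        return request;
    }

    public static async Task<(HttpStatusCode Status, Uri RequestedUri, string Body)> FetchAsync(Uri url)
    {
        using var request = BuildRequest(url);
        using var response = await Client.SendAsync(request);
        // Read the body even on 404: error pages often say which route or item was missing.
        string body = await response.Content.ReadAsStringAsync();
        return (response.StatusCode, response.RequestMessage?.RequestUri ?? url, body);
    }
}
```

Usage, inside an async method: `var (status, requested, body) = await PageFetcher.FetchAsync(new Uri(url));`, then log `requested` and `status`. If the target is a client-rendered single-page app (port 4200 is the Angular CLI development default, which may or may not apply here), even a 200 response may contain only the app shell, because the page content is produced by JavaScript running in the browser.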
More recommendations
C# HttpClient.GetAsync returns 404 Not Found