Public Dataset Downloads

About

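The China Service Computing Academic Community provides an open data-sharing platform where any organization or individual can easily obtain open datasets. Besides downloading datasets, you can also contribute your own for other researchers to use.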


Per-Topic Sentiment Index for a Single Day

Author: Kingdee Software (China) Co., Ltd.
Description: The per-topic sentiment index for a single day contains the following fields:
url: URL of the data source
title: topic
mark: sentiment index
haDate: date the data was generated
Sample:
url title mark haDate
http://www.zhihu.com/question/19553905 如何增加一个人的自信? -5.53 2016/3/9
http://www.zhihu.com/question/19573039 有哪些不错的网页设计素材网站? 25 2016/3/9
http://www.zhihu.com/question/19598612 安全感是什么? -3.57 2016/3/9

Size: 88K
Cite: Chen H., Li X.N., Zhang L.J., Huang Y.X., Cai X.S., Cloud-based Core Text Processing Services for Sentiment Analysis, IEEE Big Data Congress 2016.
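A minimal Python sketch for loading records laid out like the sample above. It assumes whitespace-separated columns in the order url, title, mark, haDate and dates written as year/month/day; the real file's delimiter and encoding are not documented here, and the names `TopicSentiment`, `parse_line`, `load` and `most_negative` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class TopicSentiment:
    url: str
    title: str
    mark: float  # sentiment index (may be negative)
    ha_date: date


def parse_line(line: str) -> TopicSentiment:
    """Parse one record laid out as: url title mark haDate.

    The URL is the first field and mark/haDate are the last two, so the
    title is whatever lies in between (this tolerates spaces in titles).
    """
    parts = line.split()
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 fields: {line!r}")
    url, mark, raw_date = parts[0], parts[-2], parts[-1]
    return TopicSentiment(
        url=url,
        title=" ".join(parts[1:-2]),
        mark=float(mark),
        ha_date=datetime.strptime(raw_date, "%Y/%m/%d").date(),  # e.g. 2016/3/9
    )


def load(lines):
    """Parse all records, skipping blank lines and the header row (assumed)."""
    records = []
    for line in lines:
        parts = line.split()
        if not parts or parts[0] == "url":
            continue
        records.append(parse_line(line))
    return records


def most_negative(records, n=5):
    """Return the n topics with the lowest sentiment index."""
    return sorted(records, key=lambda r: r.mark)[:n]
```

Splitting from both ends rather than on a fixed column count keeps the parser working if a topic title ever contains spaces.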




Daily Average Sentiment Index Across All Topics for January

Author: Kingdee Software (China) Co., Ltd.
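Description: The daily average sentiment index across all topics for January contains the following fields:
analyTarget: analysis target
aDate: date the data was generated
mark: sentiment index
Sample: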
analyTarget aDate mark
kanzhihu 2016/1/1 -6.76
kanzhihu 2016/1/2 -12.22
kanzhihu 2016/1/3 9.72
kanzhihu 2016/1/4 -4.33

Size: 88K
Cite: Chen H., Li X.N., Zhang L.J., Huang Y.X., Cai X.S., Cloud-based Core Text Processing Services for Sentiment Analysis, IEEE Big Data Congress 2016.
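A short Python sketch that parses rows in the sample layout above and summarizes the month. It assumes whitespace-separated `analyTarget aDate mark` rows with an optional header; `parse_daily` and `summarize` are hypothetical helper names, not part of the dataset.

```python
from datetime import date, datetime
from statistics import mean


def parse_daily(lines):
    """Parse 'analyTarget aDate mark' rows into (target, date, mark) tuples.

    Blank lines and the header row are skipped.
    """
    rows = []
    for line in lines:
        parts = line.split()
        if not parts or parts[0] == "analyTarget":
            continue
        target, raw_date, mark = parts  # exactly three fields expected
        rows.append((target, datetime.strptime(raw_date, "%Y/%m/%d").date(), float(mark)))
    return rows


def summarize(rows):
    """Return the mean index over all rows and the (date, mark) of the lowest day."""
    avg = mean(m for _, _, m in rows)
    _, low_day, low_mark = min(rows, key=lambda r: r[2])
    return avg, (low_day, low_mark)
```

On the four sample rows, the mean is -3.3975 and the lowest day is 2016-01-02 with -12.22.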




JTang Workflow BenchMark

Author: Bin Cao; Xueying Peng; Jiaxing Wang
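Description: The JTang Workflow BenchMark dataset consists of three parts: a repository of 100 processes to be searched, 10 query processes, and, for each query, a ranking of target processes based on human judgment. Different process retrieval algorithms run on this dataset produce different results, depending on the features they use to measure similarity. Comparing these results with the reference target processes given in our benchmark, for example in terms of precision and the order of the target processes, makes it possible to analyze how well an algorithm performs.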
Size: 146K
Cite:
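The comparison described above can be sketched in Python with two simple measures: precision of the top-k retrieved processes against the reference targets, and the fraction of reference pairs whose relative order the algorithm preserves. These are illustrative choices, not necessarily the metrics the benchmark authors used, and the function names and process IDs are hypothetical.

```python
from itertools import combinations


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved processes that are in the reference target set."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for p in retrieved[:k] if p in relevant) / k


def pairwise_order_agreement(retrieved, reference):
    """Fraction of reference pairs (a ranked above b) that the algorithm also ranks a above b.

    Only processes that appear in both rankings are compared; returns None if
    fewer than two such processes exist.
    """
    position = {p: i for i, p in enumerate(retrieved)}
    shared = [p for p in reference if p in position]  # kept in reference order
    pairs = list(combinations(shared, 2))
    if not pairs:
        return None
    agreeing = sum(1 for a, b in pairs if position[a] < position[b])
    return agreeing / len(pairs)
```

Because `combinations` preserves input order, each pair `(a, b)` has `a` ranked above `b` in the reference ranking, so counting `position[a] < position[b]` measures order agreement directly.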




500 Service Event Logs

Author: Zhiling Luo
Description: The dataset contains the event logs of 500 services deployed on 10 servers (50 services per server).
Each line in the log files is an event. For example:

2014-10-27 14:24:12,760 - INFO: [RecvDataS]
2014-10-27 14:24:26,198 - INFO: [RecvDataE] 1697KB
2014-10-27 14:24:26,199 - INFO: [TransactionS] trans92
2014-10-27 14:24:28,201 - INFO: [TransactionE] trans92
2014-10-27 14:24:28,201 - INFO: [NextIP] 192.168.0.57:10007
2014-10-27 14:24:28,201 - INFO: [SendDataS]
2014-10-27 14:24:30,301 - INFO: [SendDataE] 1686KB

In this segment, the service received 1697 KB of input data, started a transaction named trans92, and sent 1686 KB of data to 192.168.0.57:10007.

Size: 1M
Cite: Zhiling Luo, Ying Li, Jianwei Yin, "A Framework for Transmission Cost Aware Service Selection" in 22nd International Conference on Web Services (ICWS 2015).
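A Python sketch that parses log lines in the format shown above and reconstructs one segment. It assumes that events ending in `S` and `E` mark the start and end of a phase (as in `RecvDataS`/`RecvDataE`), which is inferred from the sample rather than documented; `parse_event` and `summarize_segment` are hypothetical names.

```python
import re
from datetime import datetime

# One log line: "<timestamp> - <LEVEL>: [<Event>] <optional argument>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - (?P<level>\w+): "
    r"\[(?P<event>\w+)\]\s*(?P<arg>.*)$"
)


def parse_event(line):
    """Return (timestamp, event, argument), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return ts, m["event"], m["arg"].strip()


def summarize_segment(lines):
    """Pair each ...S event with its ...E event and collect the segment's facts."""
    starts, info = {}, {"durations_s": {}}
    for line in lines:
        parsed = parse_event(line)
        if parsed is None:
            continue
        ts, event, arg = parsed
        if event == "NextIP":
            info["next_ip"] = arg
            continue
        phase, marker = event[:-1], event[-1]  # e.g. "RecvData", "S"
        if marker == "S":
            starts[phase] = ts
            if phase == "Transaction":
                info["transaction"] = arg
        elif marker == "E" and phase in starts:
            info["durations_s"][phase] = (ts - starts.pop(phase)).total_seconds()
            if phase == "RecvData":
                info["received_kb"] = int(arg.removesuffix("KB"))
            elif phase == "SendData":
                info["sent_kb"] = int(arg.removesuffix("KB"))
    return info
```

Applied to the seven sample lines, this yields received_kb 1697, transaction trans92, next_ip 192.168.0.57:10007 and sent_kb 1686, plus per-phase durations (about 13.4 s to receive, 2.0 s for the transaction, 2.1 s to send). `str.removesuffix` needs Python 3.9 or later.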




Source data for CloudScout

Author: Xinkui Zhao
Description: Monitoring logs of runtime resource usage and TCP/UDP connection counts for 290 virtual machines.
Size: 190M
Cite: none




Source data for vSpec

Author: Xinkui Zhao
Description: Monitoring data on system runtime resource usage while running Hadoop, IPTV, JTangTest and BlogBench. The data were collected on 03/13/2015 and cover 65 monitored dimensions in total, in the following format:
%user,%nice,%system,%iowait,%steal,%idle,proc/s,cswch/s,tps,rtps,wtps,bread/s,bwrtn/s,kbmemfree,kbmemused,%memused,kbbuffers,kbcached,kbcommit,%commit,kbactive,kbinact,kbswpfree,kbswpused,%swpused,kbswpcad,%swpcad,rd_sec/s,wr_sec/s,avgrq-sz,avgqu-sz,await,svctm,%util,rxpck/s,txpck/s,rxkB/s,txkB/s,rxcmp/s,txcmp/s,rxmcst/s
The monitored virtual machines are deployed on KVM; each runs Linux 3.5.0-23-generic (Ubuntu) and is configured with 2 GB of memory, a 52 GB disk and a 2-core CPU.
Size: 7.44M
Cite: none
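A Python sketch for working with the sar-style metrics listed above. It assumes a comma-separated file whose header row uses those column names; the actual file layout is not documented here, and `non_idle_cpu` and `column_mean` are hypothetical helpers. In sar output, `%idle` is the share of time the CPUs were idle with no outstanding disk I/O, so `100 - %idle` is the non-idle share.

```python
import csv
import io
from statistics import mean


def non_idle_cpu(csv_text):
    """Return 100 - %idle for every record (assumes a header row with a '%idle' column)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [100.0 - float(row["%idle"]) for row in reader]


def column_mean(csv_text, column):
    """Average of one named metric, e.g. '%memused' or 'rxkB/s'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return mean(float(row[column]) for row in reader)
```

Looking columns up by name rather than by position keeps the sketch usable even if a file contains more or fewer of the 65 dimensions than the list above shows.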




15,361 real-world Web services dataset

Author: Liang Chen
Description: We crawled 15,361 real-world Web services from the Internet. For each service, we crawled the corresponding WSDL document and some other information. Each line in the .csv file is one Web service information record; the following table describes the record format.
Size: 32M
Cite: "WTCluster: Utilizing Tags for Web Services Clustering", Liang Chen, Liukai Hu, Zibin Zheng, Jian Wu, Jianwei Yin, Ying Li, and Shuiguang Deng. 9th International Conference on Service Oriented Computing [ICSOC], Paphos, Cyprus, December 5-8, 2011, pp.204-218.




WSDL dataset (ws-dream)

Author: Zibin Zheng
Description: This dataset includes:
readme.txt (1 KB): descriptions of the dataset.
wslist.txt (343 KB): contains information of 3,738 real-world Web services.
folder wsdl (46.2 MB): includes 3,738 WSDL files.
Size: 47M
Cite: Yilei Zhang, Zibin Zheng, and Michael R. Lyu, "WSExpress: A QoS-aware Search Engine for Web Services," in Proceedings of the 8th International Conference on Web Services (ICWS2010), Miami, Florida, USA, July 5-10, 2010, pp.83-90.
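A Python sketch for listing the operations declared in the WSDL files of the wsdl folder, using the standard-library `xml.etree.ElementTree`. It assumes the documents are WSDL 1.1 (namespace `http://schemas.xmlsoap.org/wsdl/`); WSDL 2.0 files use `interface` elements instead and would not be matched. `wsdl_operations` and `operations_from_file` are hypothetical names.

```python
import xml.etree.ElementTree as ET

WSDL11_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace


def wsdl_operations(root):
    """List (portType name, operation name) pairs declared in a WSDL 1.1 document."""
    ops = []
    for port_type in root.iter(f"{{{WSDL11_NS}}}portType"):
        for op in port_type.findall(f"{{{WSDL11_NS}}}operation"):
            ops.append((port_type.get("name"), op.get("name")))
    return ops


def operations_from_file(path):
    """Parse one WSDL file and list its operations."""
    return wsdl_operations(ET.parse(path).getroot())
```

Some crawled WSDL files may be malformed; wrapping `ET.parse` in a `try`/`except ET.ParseError` lets a batch run skip them.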




142 * 4532 * 64 time-aware Web service QoS dataset (ws-dream)

Author: Zibin Zheng
Description: Real-world QoS evaluation results from 142 users on 4,532 Web services over 64 time slots.
This dataset includes:
readme.txt (1 KB): descriptions of the dataset.
rtRate (480 MB): response-time values of 4,532 Web services invoked by 142 service users over 64 time intervals, in the following format:
| Service User ID | Web Service ID | Time Interval ID | Response-Time (s) |
e.g.: 98 4352 33 0.311
tpRate (571 MB): throughput values of 4,532 Web services invoked by 142 service users over 64 time intervals, in the following format:
| Service User ID | Web Service ID | Time Interval ID | Throughput (kbps) |
e.g.: 91 1196 62 32.882355
Size: 1G
Cite: Yilei Zhang, Zibin Zheng, and Michael R. Lyu, "WSPred: A Time-Aware Personalized QoS Prediction Framework for Web Services", in Proceedings of the 22nd IEEE Symposium on Software Reliability Engineering (ISSRE 2011).
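A Python sketch that loads rtRate or tpRate lines into per-(user, service) time series. The column order is an assumption: the sample row "98 4352 33 0.311" has 98 in the first column, which exceeds the 64 available time intervals, while 98, 4352 and 33 fit the ranges of users (142), services (4,532) and intervals (64), so the sketch reads the columns as user, service, interval, value. `load_qos` is a hypothetical name.

```python
from collections import defaultdict


def load_qos(lines):
    """Map (user_id, service_id) -> {time_interval_id: value}.

    Assumes each line holds: user ID, service ID, time interval ID, value.
    Lines without exactly four fields are skipped.
    """
    series = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue
        user, service, interval = (int(x) for x in parts[:3])
        series[(user, service)][interval] = float(parts[3])
    return series
```

The function accepts any iterable of lines, so an open file can be passed directly without reading 480 MB into memory at once. For the full data, a dense array of 142 × 4,532 × 64 (about 41.2 million) 4-byte floats takes roughly 165 MB and is usually more compact than nested dictionaries.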




339 * 5825 Web service QoS dataset (ws-dream)

Author: Zibin Zheng
Description: Real-world QoS evaluation results from 339 users on 5,825 Web services.
This dataset includes:
readme.txt (1 KB): descriptions of the dataset.
userlist.txt (19KB): information of 339 service users. Format: | User ID | IP address of user | country | longitude | latitude |
wslist.txt (505KB): information of the 5825 Web services. Format: | WS ID | WSDL address | provider name | country name |
rtmatrix.txt (11MB): 339 * 5825 user-item matrix of response-time. Use UltraEdit to open the file, since it is too large for Notepad.
tpmatrix.txt (12MB): 339 * 5825 user-item matrix of throughput. Use UltraEdit to open the file, since it is too large for Notepad.
Size: 23M
Cite: Zibin Zheng, Yilei Zhang, and Michael R. Lyu, “Distributed QoS Evaluation for Real-World Web Services,” in Proceedings of the 8th International Conference on Web Services (ICWS2010), Miami, Florida, USA, July 5-10, 2010, pp.83-90.
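A Python sketch that computes the mean response time (or throughput) of each service from rtmatrix.txt or tpmatrix.txt, where each line is one user's row of 5,825 values. It assumes whitespace-separated values and treats negative entries as failed or missing invocations; that convention is an assumption and should be checked against readme.txt. `service_means` is a hypothetical name.

```python
def service_means(lines):
    """Column-wise mean of a user-by-service QoS matrix (one user per line).

    Negative entries are treated as missing (assumed convention) and skipped.
    Returns one mean per service, or None for a column with no valid entry.
    """
    sums, counts = None, None
    for row_no, line in enumerate(lines, start=1):
        values = [float(x) for x in line.split()]
        if not values:
            continue  # ignore blank lines
        if sums is None:
            sums, counts = [0.0] * len(values), [0] * len(values)
        elif len(values) != len(sums):
            raise ValueError(f"row {row_no} has {len(values)} columns, expected {len(sums)}")
        for j, v in enumerate(values):
            if v >= 0:
                sums[j] += v
                counts[j] += 1
    if sums is None:
        return []
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Passing an open file streams the matrix row by row, so the whole file never needs to be loaded into an editor or into memory.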




Web Service QoS Dataset

Author: Zibin Zheng
Description: We monitor 100 Web services using 150 distributed computer nodes located around the world.
The dataset contains 150 files; each file holds 10,000 invocations of the 100 Web services made by one service user.
PlanetLab is used to monitor the Web services.
In total, there are more than 1.5 million Web service invocations.
Each line in a file is the result of one Web service invocation; the following table provides some samples of the results.
Size: 2M
Cite: Zibin Zheng, Michael R. Lyu, "Collaborative Reliability Prediction for Service-Oriented Systems", in Proceedings of the ACM/IEEE 32nd International Conference on Software Engineering (ICSE2010), Cape Town, South Africa, May 2-8, 2010, pp. 35 - 44.




IBM工作流数据集

Author: jtang
Description: BPMN definition files, in XML format, for a set of workflows from IBM servers.
Size: 2M
Cite: Fahland, Dirk, et al. "Instantaneous soundness checking of industrial business process models." Business Process Management. Springer Berlin Heidelberg, 2009. 278-293.




WSTRank

Author: Liang Chen
Description: We extracted 185 Web services from dataset1 for the service clustering evaluation: 28 Web services in the "Weather" category, 21 in "Stock", 37 in "SMS", 21 in "Finance", 31 in "Tourism", 27 in "University", and 20 "noise" Web services. The .zip file includes 185 .xml files (WSDL documents), tag.txt (service names and their corresponding tags), and groundtruth.txt (ground truth for evaluation). Each service has two corresponding lines in tag.txt: for example, the first two lines of tag.txt hold the information for service 1 (i.e., 1.xml), and the third and fourth lines hold the information for service 2 (i.e., 2.xml). Taking the first two lines of tag.txt as an example:

us weather
tags:report,company,free,zip code,weather,usa

Here, "us weather" is the word segmentation result of the service name of 1.xml, and the second lin
Size: 3M
Cite: Liang Chen, Yilun Wang, Qi Yu, Zibin Zheng, and Jian Wu. "WT-LDA: User Tagging Augmented LDA for Web Service Clustering", 11th International Conference on Service Oriented Computing [ICSOC], Berlin, Germany, Dec 2 - 5, 2013. pp.162-176.
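A Python sketch for reading tag.txt in the two-lines-per-service layout described above: a segmented service name followed by a `tags:` line with comma-separated tags. It assumes blank lines carry no meaning and can be skipped, and that service numbering starts at 1 (entry 1 corresponds to 1.xml); `parse_tags` is a hypothetical name.

```python
def parse_tags(lines):
    """Return {service_number: (name_words, tags)} parsed from tag.txt.

    Assumes non-blank lines alternate between the segmented service name and a
    "tags:" line with comma-separated tags; numbering starts at 1 (1.xml).
    """
    content = [line.strip() for line in lines if line.strip()]
    if len(content) % 2 != 0:
        raise ValueError("tag.txt should contain an even number of non-blank lines")
    services = {}
    for i in range(0, len(content), 2):
        name_line, tag_line = content[i], content[i + 1]
        if not tag_line.startswith("tags:"):
            raise ValueError(f"expected a 'tags:' line after {name_line!r}")
        tags = [t.strip() for t in tag_line[len("tags:"):].split(",") if t.strip()]
        services[i // 2 + 1] = (name_line.split(), tags)
    return services
```

Tags are split only on commas, so multi-word tags such as "zip code" stay intact.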