Zehan Cui via gem5-users | 28 Jan 06:39 2015

how to support 8 cores for ARMv8 FS simulation

Hello everyone,

I have downloaded the latest aarch64 binaries and disk image from this link (http://www.gem5.org/dist/current/arm/aarch-system-2014-10.tar.xz). The command line I'm using is:

build/ARM/gem5.fast configs/example/fs.py --kernel=vmlinux.aarch64.20140821 --disk-image=aarch64-ubuntu-trusty-headless.img --dtb-filename=vexpress.aarch64.20140821.dtb --machine-type=VExpress_EMM64 --caches --l2cache --l3cache --l1d_size=32kB --l1i_size=32kB --l2_size=256kB --l3_size=16MB -n 8

However, I can only see four cores:

[    0.000000] Virtual kernel memory layout:
[    0.000000]     vmalloc : 0xffffff8000000000 - 0xffffffbbffff0000   (245759 MB)
[    0.000000]     vmemmap : 0xffffffbc01c00000 - 0xffffffbc02300000   (     7 MB)
[    0.000000]     modules : 0xffffffbffc000000 - 0xffffffc000000000   (    64 MB)
[    0.000000]     memory  : 0xffffffc000000000 - 0xffffffc020000000   (   512 MB)
[    0.000000]       .init : 0xffffffc000692000 - 0xffffffc0006c6200   (   209 kB)
[    0.000000]       .text : 0xffffffc000080000 - 0xffffffc0006914e4   (  6214 kB)
[    0.000000]       .data : 0xffffffc0006c7000 - 0xffffffc0007141e0   (   309 kB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] Preemptible hierarchical RCU implementation.
[    0.000000] RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[    0.000000] NR_IRQS:64 nr_irqs:64 0
[    0.000000] Architected cp15 timer(s) running at 100.00MHz (phys).
[    0.000002] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 2748779069440ns
[    0.000068] Console: colour dummy device 80x25
[    0.000072] Calibrating delay loop (skipped) preset value.. 3997.69 BogoMIPS (lpj=19988480)
[    0.000076] pid_max: default: 32768 minimum: 301
[    0.000110] Mount-cache hash table entries: 1024 (order: 1, 8192 bytes)
[    0.000114] Mountpoint-cache hash table entries: 1024 (order: 1, 8192 bytes)
[    0.000402] hw perfevents: no hardware support available
[    0.060098] CPU1: Booted secondary processor
[    0.080101] CPU2: Booted secondary processor
[    0.100108] CPU3: Booted secondary processor
[    0.100130] Brought up 4 CPUs
[    0.100138] SMP: Total of 4 processors activated.
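
(In case it helps with the diagnosis: the "nr_cpu_ids=4" line suggests the kernel only found four CPU nodes in the device tree, in which case -n 8 on the gem5 command line cannot take effect and a DTB describing eight CPUs would be needed. A quick way to check how many CPU nodes the DTB contains is sketched below; it assumes the device tree compiler (dtc) is installed and that the CPU nodes carry the usual device_type = "cpu" property.)

# Sketch: count CPU nodes in a DTB by decompiling it with dtc.
import subprocess

def count_cpu_nodes(dtb_path):
    # Decompile the binary DTB back to device tree source form.
    dts = subprocess.check_output(
        ["dtc", "-I", "dtb", "-O", "dts", dtb_path]).decode()
    # Each CPU is described by a node with device_type = "cpu".
    return dts.count('device_type = "cpu"')

print(count_cpu_nodes("vexpress.aarch64.20140821.dtb"))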


Does anyone know how to bring up 8 cores?

Thanks,
Zehan

MSHR assertion fail

Hi all,

  First off, thanks for your help with my previous question. However, the sad thing about fixing one bug is that it reveals another :-)

Currently my prefetcher works fine when it can only issue within the same page. When it is free to prefetch from different pages (while checking that the address is mapped), I get the following issue from the MSHRs:

gem5.opt: build/ALPHA_MESI_Two_Level/mem/cache/cache_impl.hh:1014: void Cache<TagStore>::recvTimingResp(PacketPtr) [with TagStore = RandomRepl; PacketPtr = Packet*]: Assertion `mshr->hasTargets()' failed.

It would seem that nobody called allocate or allocateTarget for the new page. Is that accurate, or am I going down the wrong path?
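
To make sure I understand the invariant behind that assertion, here is a tiny standalone sketch of what I think it encodes (this is not gem5 code; the class and method names are made up for illustration): every outstanding miss buffer must have at least one target registered before its response can be consumed, so a prefetch that goes out without the usual allocate/allocateTarget step would trip exactly this check.

# Toy model of the invariant (illustrative names, not gem5 code).
class ToyMSHR:
    def __init__(self, block_addr):
        self.block_addr = block_addr
        self.targets = []          # requests waiting on this miss

    def allocate_target(self, pkt):
        self.targets.append(pkt)

    def has_targets(self):
        return len(self.targets) > 0

def recv_timing_resp(mshr):
    # Mirrors the failing check: a response arriving for an MSHR with no
    # recorded targets means the miss went out without registering who
    # is waiting for the data.
    assert mshr.has_targets(), "MSHR has no targets"

m = ToyMSHR(0x1000)
m.allocate_target("prefetch request")   # omitting this reproduces the assert
recv_timing_resp(m)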

Thanks again,
  George M

Re: Gem5 on multiple cores

Hi Tiago,

As I already pointed out a while back, there are some good reasons to not implement fine-grained parallelisation in the simulation kernel.

In most use cases a large number of experiments will be run, either to compare various options or merely to get statistical significance in the results. Thus, when running 10, 100 or even 1000s of runs, there is already trivial parallelism, and adding synchronisation/locks etc. to gem5 would only serve to slow things down and throw potential performance away. As an example, just making the reference-counting pointers atomic costs ~15% performance, even for a single-core run that does not even use it.

All that said, it makes sense to parallelise chunks of a large-scale simulation that are already parallel with little interaction, such as multiple systems communicating via Ethernet. There have already been efforts in this direction (the support is there with multiple event queues), and I think any effort to turn this into a better-developed flow/methodology is valuable.

Andreas



From: Tiago Mück via gem5-users <gem5-users <at> gem5.org>
Reply-To: Tiago Mück <tmuck <at> uci.edu>, gem5 users mailing list <gem5-users <at> gem5.org>
Date: Monday, 26 January 2015 23:56
To: gem5 users mailing list <gem5-users <at> gem5.org>
Subject: Re: [gem5-users] Gem5 on multiple cores

Hi everyone,

Is there any new effort to make this parallelization work for arbitrary multi-core systems (e.g., scaling the number of threads/event queues according to the number of cores in the simulated system)?

Regards,
Tiago

On Tue, Aug 26, 2014 at 9:07 AM, Steve Reinhardt via gem5-users <gem5-users <at> gem5.org> wrote:
I'll mention that gem5 does have the foundation for parallelizing a single simulation across multiple cores; see for example http://repo.gem5.org/gem5/rev/2cce74fe359e.  However, if you want to model a non-trivial configuration (i.e., one where there is communication between threads), then you have to insert synchronization, and that does limit your speedup, as Andreas has mentioned.

Steve


On Tue, Aug 26, 2014 at 3:03 AM, Hussain Asad via gem5-users <gem5-users <at> gem5.org> wrote:
Thank you, Andreas
*moved to gem5-users :)


On Tue, Aug 26, 2014 at 8:39 AM, Andreas Hansson <Andreas.Hansson <at> arm.com> wrote:
Hi Hussain,

I’d suggest asking on the gem5-users list, for everyone’s benefit.

Multi-threading invariably comes at a cost, whereas if you want to run, say, 10 experiments, they are embarrassingly parallel. As one of the main purposes of gem5 is design-space exploration, most users will be running 10s or 100s of experiments. Thus, instead of making gem5 multi-threaded and “throwing performance away”, it is kept efficient as a single-threaded simulator, and I suggest running your experiments in parallel to make use of your many cores/servers.
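
As a minimal sketch of what that looks like in practice (the binary path, config script and option lists below are placeholders; substitute your own experiments), a small Python driver is usually all that is needed:

# Launch several independent gem5 runs in parallel, one process per
# configuration. Paths and options are placeholders for this sketch.
import subprocess
from concurrent.futures import ProcessPoolExecutor

CONFIGS = [
    ["--l2_size=256kB"],
    ["--l2_size=512kB"],
    ["--l2_size=1MB"],
]

def run_one(idx_and_opts):
    idx, extra_opts = idx_and_opts
    # Give each run its own output directory so the m5out files of the
    # different experiments do not clobber each other.
    cmd = (["build/ARM/gem5.fast", "--outdir=m5out.%d" % idx,
            "configs/example/se.py"] + extra_opts)
    return subprocess.call(cmd)

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(run_one, enumerate(CONFIGS))))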

Andreas

From: Hussain Asad <x7xcloudstrife <at> gmail.com>
Date: Tuesday, 26 August 2014 04:13
To: Andreas Hansson <andreas.hansson <at> arm.com>
Subject: Gem5 on multiple cores

Hi Andreas,

I have a quick question: I am running a gem5 build on a Core i7 system, but gem5 uses just one of the 8 available cores (4 cores + 4 threads).

Is this feature not yet implemented, or am I not compiling gem5 correctly? I would assume that if it were using all my CPU cores, the simulation would be much faster.

I'm running gem5 on Ubuntu 14 LTS with a Core i7 and 8 GB of RAM at the moment. Should I move to the university servers? Would it be faster on a server system?

Thanks
Best Regards
Hussain

Meng Wang via gem5-users | 27 Jan 04:56 2015

memory change during system call emulation

Hello,
I am writing a trace probe for AtomicSimpleCPU. The simulation is planned to run in SE mode. For user-level instructions, I can get the address and data of each memory access via the *traceData* member in BaseSimpleCPU. But for system calls, I don't know how to collect the memory changes made during syscall emulation, especially when the read/write syscalls are emulated. Can anybody give me a clue?

Thanks,
Meng

Stride prefetcher across pages

Hi all,

I noticed that the latest gem5 development tree, after the reorganization of the prefetcher structures, removed the option to perform strided prefetching across pages. The prefetcher now stays within the same page.

Is this assumption necessary for other components?

Forcing the option back produces a seg fault due to an unmapped address.
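
For reference, here is a minimal standalone sketch of the kind of same-page guard involved, and of why dropping it can lead straight to an unmapped address (this is not the gem5 code itself; the page size and names are assumptions for illustration):

# Standalone illustration of a same-page guard for strided prefetches.
PAGE_BYTES = 4096   # assumed page size

def same_page(a, b, page_bytes=PAGE_BYTES):
    return (a // page_bytes) == (b // page_bytes)

def next_prefetch(trigger_addr, stride):
    pf_addr = trigger_addr + stride
    if not same_page(trigger_addr, pf_addr):
        # Crossing the page boundary would need a fresh translation; with
        # physical addresses the neighbouring page may simply not be
        # backed, which is one way to end up at an unmapped address.
        return None
    return pf_addr

print(next_prefetch(0x1f80, 0x100))   # crosses a 4 KiB boundary -> None
print(next_prefetch(0x1000, 0x100))   # stays in the page -> 4352 (0x1100)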

Thanks,
  George M

GEM 5 & DSENT

Hi all,

Could anyone tell me how to run gem5 with the DSENT patch in the util directory?

Thanks,
MV

create cluster_size

Hello gem5 users,

Could you please help me?

I am using gem5 to simulate my project, and I am modifying mesh.py to partition the network into equally sized clusters. Where should I add a parameter for the cluster size, analogous to the existing --num-cpus option in gem5?
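
Here is a hedged sketch of what I have in mind (the option name and defaults are just illustrative; gem5's configs/common/Options.py used optparse at the time, so the same add_option style applies there, and the resulting options object can then be read in the modified mesh topology code):

# Illustrative only: add a --cluster-size option next to the existing
# ones and derive the number of clusters from it.
import optparse

parser = optparse.OptionParser()
parser.add_option("--num-cpus", type="int", default=16,
                  help="number of CPUs/nodes in the network")
parser.add_option("--cluster-size", type="int", default=4,
                  help="number of nodes grouped into one mesh cluster")
(options, args) = parser.parse_args([])   # [] -> use the defaults

num_clusters = options.num_cpus // options.cluster_size
print("partitioning %d nodes into %d clusters of %d"
      % (options.num_cpus, num_clusters, options.cluster_size))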





Thanks


WAR dependency

Hello Users,

I have a basic read-after-write and write-after-read dependency question. If the DRAM controller receives a write request, it is added to the write queue and its response is sent immediately. When the response is being sent, the access(pkt) method from AbstractMemory is called, which actually writes the data. If a read request to the same address was already pending in the read queue (note that this read arrived at the DRAM controller before the write), then it would return the newly written data. Wouldn't this violate the WAR dependency? (I am sure WAR is handled properly, otherwise benchmarks would produce incorrect output, but I can't seem to figure out where these dependencies are handled in the memory system.)
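
To make the scenario concrete, here is a tiny standalone model of the concern (not gem5 code; it only encodes the ordering question I am asking about):

# Toy model of the hazard described above. The backing store is updated
# as soon as the write is responded to, while an older read to the same
# address is still waiting in the read queue.
backing_store = {0x100: "old"}
read_queue = []

# 1. A read to 0x100 arrives first and is queued for service.
read_queue.append(0x100)

# 2. A write to 0x100 arrives later; its data is committed to the
#    backing store at response time, before the read is serviced.
backing_store[0x100] = "new"

# 3. The older read is finally serviced and observes "new", i.e. data
#    from a write that arrived after it.
for addr in read_queue:
    print(hex(addr), "->", backing_store[addr])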

Thank you,
-Rizwana

Tracing DRAM requests that target the software stack

I'm currently struggling to trace accesses to the software stack that
are not found in the cache (and, therefore, must be forwarded to the
DRAM).
By stack I mean the portion of memory dedicated to the stack (as in
the figure below).

---------------
|    Stack    |
|     ||      |   => Portion of memory that I'd like to keep track of.
|     \/      |
|             |
|             |
|     /\      |
|     ||      |
|    Heap     |
---------------
|     bss     |
---------------
|    data     |
---------------
|    text     |
---------------

Does anyone envision a way to do this without keeping track of the
addresses that are sent to the DRAM controller?
Is this something that a communication monitor could keep track of?
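
In case a communication monitor is indeed the way to go, a minimal configuration sketch is below; it assumes a classic-memory setup that exposes system.membus and a single system.mem_ctrl, and the port names are from memory, so treat them as assumptions rather than the exact API. Filtering for the stack's address range would then have to be done on the monitor's output, since the monitor itself sees all DRAM-bound traffic.

# Hedged sketch: splice a CommMonitor in front of the memory controller
# so that only requests which missed everywhere above pass through it.
from m5.objects import CommMonitor

def splice_monitor(system):
    monitor = CommMonitor()
    system.dram_monitor = monitor
    monitor.slave = system.membus.master     # requests coming off the membus
    system.mem_ctrl.port = monitor.master    # ...forwarded on to the DRAM
    return monitor
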
Zehan Cui via gem5-users | 26 Jan 04:03 2015

Gem5 freezing with x86 timing cpu

Hello George,

 

Have you solved this problem? I have just run into a similar one.

 

Thanks,

Zehan


McPAT: number of cache levels

Hello everyone

I am trying to understand and set various parameters in the XML file that is input to McPAT. I have a question about setting the number of cache levels. I have defined various parameters for the L1 caches (system.core0.icache/system.core0.dcache) and the shared L2 cache (system.L20).

1. My understanding is that I can choose whether or not the system has an L2 cache by setting the parameter "number_cache_levels" to 1 or 2. However, in either case I get the same power numbers from McPAT, against my expectation of seeing a difference in power. How do I really set the number of cache levels in the McPAT XML file? Am I missing something?

2. Could someone elaborate on the parameter "number_of_L2s" and whether it has anything to do with the caches? I am not able to understand the description given in the sample XML file.

I have attached my xml file for reference.
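
Regarding question 1, one sanity check I can think of is to flip the parameter programmatically and run McPAT on both variants, to confirm that the file being edited is the one actually read. A small sketch (assuming the usual <param name="..." value="..."/> layout of the template XML files) is:

# Write a copy of the McPAT input with number_cache_levels set to 1 so
# the two McPAT outputs can be diffed.
import xml.etree.ElementTree as ET

tree = ET.parse("l2.xml")
for param in tree.iter("param"):
    if param.get("name") == "number_cache_levels":
        param.set("value", "1")
tree.write("l2_one_level.xml")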

Thanks
Lokesh
Attachment (l2.xml): text/xml, 25 KiB